利用主鍵快速聚合單表數千萬MySQL的非索引欄位
場景
很多時候,為了寫入效率,在生產環境里業務大表(單表千萬行以上)是不允許隨意加索引的。而且就算加索引,因為鎖表問題,對業務也是有影響的。
我們一般會用離線的從庫進行一些數據統計,而生產環境的索引並不能很好的滿足統計的需求。沒有相應索引,我們又如何高效的進行欄位聚合呢?
利用主鍵索引進行range,然後再進行聚合。
舉例
表名:tbl_pay_orders
行數:29,311,362
引擎:InnoDB
欄位:
+------------------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+---------------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| amt | bigint(20) | NO | | NULL | |
| created_time | int(11) | NO | | NULL | |
需求:按天統計tbl_pay_orders金額
不考慮索引的情況下,SQL像這樣寫:
select left(from_unixtime(created_time),10) as day, sum(amt) from tbl_pay_orders group by day;
由於created_time沒有索引,MySQL 索引提示如下:
mysql> desc select left(from_unixtime(created_time),10) as day, sum(amt) from tbl_pay_orders group by day;
+----+-------------+------------------------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+------+---------------+------+---------+------+------+---------------------------------+
| 1 | SIMPLE | tbl_pay_orders | ALL | NULL | NULL | NULL | NULL | 29311362 | Using temporary; Using filesort |
+----+-------------+------------------------+------+---------------+------+---------+------+------+---------------------------------+
1 row in set (0.00 sec)
全表掃描,共29311362行。
那我們如果利用主鍵呢?SQL可能像這樣:
select left(from_unixtime(created_time),10) as day, sum(amt) from tbl_pay_orders where id >= 18000000 and id < 19000000 group by day;
索引提示是這樣的:
mysql> desc select left(from_unixtime(created_time),10) as day, sum(amt) from tbl_pay_orders where id >= 18000000 and id < 19000000 group by day;
+----+-------------+------------------------+-------+---------------+---------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+-------+---------------+---------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | tbl_pay_orders | range | PRIMARY | PRIMARY | 8 | NULL | 1879550 | Using where; Using temporary; Using filesort |
+----+-------------+------------------------+-------+---------------+---------+---------+------+---------+----------------------------------------------+
1 row in set (0.01 sec)
我們發現,仍然沒有走任何索引(當然了,因為我們並沒有改變索引),但是掃描的行數一下子降到了1879550了。在這個量級我們就可以用MySQL方便的做聚合了
問題來了,我怎麼知道每天的id範圍呢?
答案是離線先按天建索引,生成一個day到start_id的映射關係。
for i in file("idx.txt"):
last_rid, last_day = i.strip().split(",")wf = open(idx.txt, a)sql = "select id,left(from_unixtime(created_time), 10) as day from tbl_pay_orders where id > %s and id < %s" % (last_rid, int(last_rid) + 1000000)items = dao.select_sql(sql)for item in items: item["day"] = item["day"].replace("-", "").replace(" ", "") if item["day"] != last_day: wf.write("%s,%s
" % (item["id"], item["day"]))last_day = item["day"]
推薦閱讀:
※2017年MySQL資料庫技術嘉年華 —— 有態度的技術大會
※MySQL |Self Join
※自建MySQL實例遷移到雲平台實踐
※怎樣在 MySQL 表中存儲樹形結構數據?
※解決阿里雲VPS伺服器mysql自動關閉的問題
TAG:MySQL |