標籤:

利用主鍵快速聚合單表數千萬MySQL的非索引欄位

場景

很多時候,為了寫入效率,在生產環境里業務大表(單表千萬行以上)是不允許隨意加索引的。而且就算加索引,因為鎖表問題,對業務也是有影響的。

我們一般會用離線的從庫進行一些數據統計,而生產環境的索引並不能很好的滿足統計的需求。沒有相應索引,我們又如何高效的進行欄位聚合呢?

利用主鍵索引進行range,然後再進行聚合。

舉例

表名:tbl_pay_orders

行數:29,311,362

引擎:InnoDB

欄位:

+------------------------+---------------------+------+-----+---------+----------------+

| Field | Type | Null | Key | Default | Extra |

+------------------------+---------------------+------+-----+---------+----------------+

| id | bigint(20) | NO | PRI | NULL | auto_increment |

| amt | bigint(20) | NO | | NULL | |

| created_time | int(11) | NO | | NULL | |

需求:按天統計tbl_pay_orders金額

不考慮索引的情況下,SQL像這樣寫:

select left(from_unixtime(created_time),10) as day, sum(amt) from tbl_pay_orders group by day;

由於created_time沒有索引,MySQL 索引提示如下:

mysql> desc select left(from_unixtime(created_time),10) as day, sum(amt) from tbl_pay_orders group by day;

+----+-------------+------------------------+------+---------------+------+---------+------+------+---------------------------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+----+-------------+------------------------+------+---------------+------+---------+------+------+---------------------------------+

| 1 | SIMPLE | tbl_pay_orders | ALL | NULL | NULL | NULL | NULL | 29311362 | Using temporary; Using filesort |

+----+-------------+------------------------+------+---------------+------+---------+------+------+---------------------------------+

1 row in set (0.00 sec)

全表掃描,共29311362行。

那我們如果利用主鍵呢?SQL可能像這樣:

select left(from_unixtime(created_time),10) as day, sum(amt) from tbl_pay_orders where id >= 18000000 and id < 19000000 group by day;

索引提示是這樣的:

mysql> desc select left(from_unixtime(created_time),10) as day, sum(amt) from tbl_pay_orders where id >= 18000000 and id < 19000000 group by day;

+----+-------------+------------------------+-------+---------------+---------+---------+------+---------+----------------------------------------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+----+-------------+------------------------+-------+---------------+---------+---------+------+---------+----------------------------------------------+

| 1 | SIMPLE | tbl_pay_orders | range | PRIMARY | PRIMARY | 8 | NULL | 1879550 | Using where; Using temporary; Using filesort |

+----+-------------+------------------------+-------+---------------+---------+---------+------+---------+----------------------------------------------+

1 row in set (0.01 sec)

我們發現,仍然沒有走任何索引(當然了,因為我們並沒有改變索引),但是掃描的行數一下子降到了1879550了。在這個量級我們就可以用MySQL方便的做聚合了

問題來了,我怎麼知道每天的id範圍呢?

答案是離線先按天建索引,生成一個day到start_id的映射關係。

for i in file("idx.txt"):

last_rid, last_day = i.strip().split(",")

wf = open(idx.txt, a)

sql = "select id,left(from_unixtime(created_time), 10) as day from tbl_pay_orders where id > %s and id < %s" % (last_rid, int(last_rid) + 1000000)

items = dao.select_sql(sql)

for item in items:

item["day"] = item["day"].replace("-", "").replace(" ", "")

if item["day"] != last_day:

wf.write("%s,%s
" % (item["id"], item["day"]))

last_day = item["day"]

推薦閱讀:

2017年MySQL資料庫技術嘉年華 —— 有態度的技術大會
MySQL |Self Join
自建MySQL實例遷移到雲平台實踐
怎樣在 MySQL 表中存儲樹形結構數據?
解決阿里雲VPS伺服器mysql自動關閉的問題

TAG:MySQL |