MySQL數據丟失問題

02-01

這兩天遇到mysql宕機的問題，導致數據出現不一致的現象，結果發現有個特別重要的參數innodb_flush_log_at_trx_commit特別需要關注，默認情況下這個參數是1，即嚴格保證資料庫的一致性，事務在提交之後立即將log buffer的數據寫入到log file,同時調用文件系統的flush操作。

仔細分析一下mysql和文件系統的關係，基本上分為write和flush兩個主要的操作，mysql管理自己的log buffer，文件系統管理log file，mysql事務提交後會調用write方法將數據從log buffer寫入到log file，即持久化；學過操作系統的同學都知道文件系統為了提高IO的效率，本身會對每個文件做緩存，所以要想保證數據真正落地到磁碟上，有時會多一步flush操作，這個flush操作可能是文件系統自己flush，也可能是mysql通過系統調用強制讓文件系統去flush。

但是出現故障的時候innodb_flush_log_at_trx_commit參數的設置為2，根據下面文檔的解釋，為2時mysql會在事務提交之後立即向文件系統寫入log file，但並不會立即調用文件系統的flush操作，而是由定時任務調度每隔1秒flush一次，這就會引發數據丟失問題：當操作系統掛掉時，恰好文件系統自己也沒有調用flush操作，那麼那部分寫入到文件緩存的數據就會丟失。如果mysql自己掛掉，由於log buffer中的數據已經寫入文件系統，只要文件系統不掛掉，數據還是在的。

The default value of 1 is required for full ACID compliance. With this value, the contents of the InnoDBlog buffer are written out to thelog file at each transaction commit and the log file is flushed to disk.
With a value of 0, the contents of the InnoDBlog buffer are written to the log file approximately once per second and the log file is flushed to disk. No writes from the log buffer to the log file are performed at transaction commit. Once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions with any mysqld process crash.
With a value of 2, the contents of the InnoDBlog buffer are written to the log file after each transaction commit and the log file is flushed to disk approximately once per second. Once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues. Because the flush to disk operation only occurs approximately once per second, you can lose up to a second of transactions in an operating system crash or a power outage.

翻開mysql的源代碼，很好地解釋了innodb_flush_log_at_trx_commit為0、1、2時mysql是如何處理數據落地的問題的。innodb_flush_log_at_trx_commit為0時，什麼都不做，等著定時任務調度去每隔1秒write和flush，這種效率最高，但是丟失數據的風險最大，mysql進程掛掉數據就有可能丟到，更不用說操作系統掛掉；為1時事務提交後立即write和flush，能保證強一致，但IO效率很差；為2時事務提交後立即write，但是每秒flush一次，IO效率有所提升，但是仍有丟失數據的風險，大部分互聯網應用都是這個值。

/**********************************************************************//**nIf required, flushes the log to disk based on the value ofninnodb_flush_log_at_trx_commit. */nstaticnvoidntrx_flush_log_if_needed_low(n/*========================*/n lsn_t lsn) /*!< in: lsn up to which logs are to ben flushed. */n{n switch (srv_flush_log_at_trx_commit) {n case 0:n /* Do nothing */n break;n case 1:n /* Write the log and optionally flush it to disk */n log_write_up_to(lsn, LOG_WAIT_ONE_GROUP,n srv_unix_file_flush_method != SRV_UNIX_NOSYNC);n break;n case 2:n /* Write the log but do not flush it to disk */n log_write_up_to(lsn, LOG_WAIT_ONE_GROUP, FALSE);nn break;n default:n ut_error;n }n}n

建議對數據一致性要求比較高的，還是把innodb_flush_log_at_trx_commit設置為1，對一致性要求相對較弱的可以設置為2。