關(guān)于刪除MySQL Logs的問(wèn)題記錄
五一前,一個(gè)DBA同事反饋,在日常環(huán)境中刪除一個(gè)大的slow log文件(假設(shè)文件大小10G以上吧),然后在MySQL中執(zhí)行flush slow logs,會(huì)發(fā)現(xiàn)mysqld hang住。
今天嘗試著重現(xiàn)了此問(wèn)題,這里簡(jiǎn)要分析下原因。
重現(xiàn)步驟:
1. 構(gòu)造slow log (將long_query_time設(shè)成了0);
2. 觀察刪rm slow log瞬間,tps/qps變化;
3. 觀察執(zhí)行flush slow logs瞬間,tps/qps變化;
4. 記錄flush slow logs執(zhí)行時(shí), pstack打出的調(diào)用棧情況;
***步,沒(méi)啥好說(shuō)的。
第二步,tps/qps沒(méi)啥變化。
第三步,會(huì)發(fā)現(xiàn)tps/qps瞬間跌0,如下所示:
- [ 639s] threads: 32, tps: 1121.00, reads/s: 15843.98, writes/s: 4490.99
- [ 640s] threads: 32, tps: 792.99, reads/s: 10803.89, writes/s: 3150.97
- [ 641s] threads: 32, tps: 0.00, reads/s: 0.00, writes/s: 0.00
- [ 642s] threads: 32, tps: 0.00, reads/s: 0.00, writes/s: 0.00
- [ 643s] threads: 32, tps: 471.01, reads/s: 6860.08, writes/s: 1908.02
mysql命令行會(huì)發(fā)現(xiàn),flush slow logs執(zhí)行時(shí)間剛好為3s左右。
第四步,我們看下pstack的輸出結(jié)果,只記錄相關(guān)的:
- 610 Thread 5 (Thread 0x2afdc4101700 (LWP 30762)):
- 611 #0 0x0000003c6e40a7d6 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
- 612 #1 0x0000000000825135 in inline_mysql_rwlock_rdlock ()
- 613 #2 0x0000000000838004 in LOGGER::lock_shared() ()
- 614 #3 0x00000000008283bf in LOGGER::slow_log_print(THD*, char const*, unsigned int, unsigned long long) ()
- 615 #4 0x0000000000832b30 in slow_log_print(THD*, char const*, unsigned int, unsigned long long) ()
- 616 #5 0x0000000000609f23 in log_slow_statement(THD*) ()
- 617 #6 0x00000000006099d1 in dispatch_command(enum_server_command, THD*, char*, unsigned int) ()
- 618 #7 0x0000000000606e02 in do_command(THD*) ()
- 619 #8 0x00000000006f070f in do_handle_one_connection(THD*) ()
- 620 #9 0x00000000006f020d in handle_one_connection ()
- 621 #10 0x0000003c6e4077f1 in start_thread () from /lib64/libpthread.so.0
- 622 #11 0x0000003c6e0e570d in clone () from /lib64/libc.so.6
- 623 Thread 4 (Thread 0x2afdd0080700 (LWP 30763)):
- 624 #0 0x0000003c6e40a7d6 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
- 625 #1 0x0000000000825135 in inline_mysql_rwlock_rdlock ()
- 626 #2 0x0000000000838004 in LOGGER::lock_shared() ()
- 627 #3 0x00000000008283bf in LOGGER::slow_log_print(THD*, char const*, unsigned int, unsigned long long) ()
- 628 #4 0x0000000000832b30 in slow_log_print(THD*, char const*, unsigned int, unsigned long long) ()
- 629 #5 0x0000000000609f23 in log_slow_statement(THD*) ()
- 630 #6 0x00000000006099d1 in dispatch_command(enum_server_command, THD*, char*, unsigned int) ()
- 631 #7 0x0000000000606e02 in do_command(THD*) ()
- 632 #8 0x00000000006f070f in do_handle_one_connection(THD*) ()
- 633 #9 0x00000000006f020d in handle_one_connection ()
- 634 #10 0x0000003c6e4077f1 in start_thread () from /lib64/libpthread.so.0
- 635 #11 0x0000003c6e0e570d in clone () from /lib64/libc.so.6
- 636 Thread 3 (Thread 0x2afdd0101700 (LWP 30764)):
- 637 #0 0x0000003c6e40a7d6 in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
- 638 #1 0x0000000000825135 in inline_mysql_rwlock_rdlock ()
- 639 #2 0x0000000000838004 in LOGGER::lock_shared() ()
- 640 #3 0x00000000008283bf in LOGGER::slow_log_print(THD*, char const*, unsigned int, unsigned long long) ()
- 641 #4 0x0000000000832b30 in slow_log_print(THD*, char const*, unsigned int, unsigned long long) ()
- 642 #5 0x0000000000609f23 in log_slow_statement(THD*) ()
- 643 #6 0x00000000006099d1 in dispatch_command(enum_server_command, THD*, char*, unsigned int) ()
- 644 #7 0x0000000000606e02 in do_command(THD*) ()
- 645 #8 0x00000000006f070f in do_handle_one_connection(THD*) ()
- 646 #9 0x00000000006f020d in handle_one_connection ()
- 647 #10 0x0000003c6e4077f1 in start_thread () from /lib64/libpthread.so.0
- 648 #11 0x0000003c6e0e570d in clone () from /lib64/libc.so.6
- 649 Thread 2 (Thread 0x2afe18080700 (LWP 30855)):
- 650 #0 0x0000003c6e40e54d in close () from /lib64/libpthread.so.0
- 651 #1 0x00000000008f56ed in my_close ()
- 652 #2 0x0000000000825c16 in inline_mysql_file_close ()
- 653 #3 0x000000000082b305 in MYSQL_LOG::close(unsigned int) ()
- 654 #4 0x000000000082b634 in MYSQL_QUERY_LOG::reopen_file() ()
- 655 #5 0x0000000000828283 in LOGGER::flush_slow_log() ()
- 656 #6 0x000000000071d8fc in reload_acl_and_cache(THD*, unsigned long, TABLE_LIST*, int*) ()
- 657 #7 0x0000000000610200 in mysql_execute_command(THD*) ()
- 658 #8 0x000000000061534d in mysql_parse(THD*, char*, unsigned int, Parser_state*) ()
- 659 #9 0x00000000006086a0 in dispatch_command(enum_server_command, THD*, char*, unsigned int) ()
- 660 #10 0x0000000000606e02 in do_command(THD*) ()
- 661 #11 0x00000000006f070f in do_handle_one_connection(THD*) ()
- 662 #12 0x00000000006f020d in handle_one_connection ()
- 663 #13 0x0000003c6e4077f1 in start_thread () from /lib64/libpthread.so.0
- 664 #14 0x0000003c6e0e570d in clone () from /lib64/libc.so.6
會(huì)發(fā)現(xiàn)Thread 2在執(zhí)行flush slow logs操作,其他的線程都在等待鎖LOCK_log上邊。
背后的原因其實(shí)很簡(jiǎn)單,在shell中執(zhí)行rm slow log操作時(shí),由于mysqld仍有文件句柄打開(kāi)此文件,所以實(shí)際上此時(shí)文件并未刪除。執(zhí)行flush logs操作,其實(shí)際執(zhí)行的是1)close;2)open 操作(logger.flush_slow_log -> mysql_slow_log.reopen_file),在close操作執(zhí)行時(shí),文件系統(tǒng)真正刪除文件,此時(shí)該線程占用著LOCK_log鎖。
刪除時(shí)會(huì)執(zhí)行刷臟(當(dāng)然我構(gòu)造這個(gè)場(chǎng)景很極端,基本所有slow log文件的內(nèi)容都在文件系統(tǒng)緩存中),這個(gè)會(huì)很耗時(shí)間,比如我執(zhí)行這個(gè)語(yǔ)句耗了3s。此時(shí)間段內(nèi),如果連接發(fā)來(lái)的語(yǔ)句需要記log(server層的log:slow log/binlog/general log共有LOCK_log這把鎖)就會(huì)處于等待狀態(tài),那么系統(tǒng)對(duì)外的反應(yīng)就是hang住了。
flush slow logs中調(diào)用執(zhí)行的close所需時(shí)間和文件大小、以及文件系統(tǒng)緩存中該文件臟頁(yè)比例都有關(guān)系,比如我在執(zhí)行flush slow logs前使用sysctl vm.drop_caches=3清空
了文件系統(tǒng)緩存的話,同樣大小的flush slow logs操作執(zhí)行時(shí)間是0.42s,相應(yīng)的阻塞時(shí)間也會(huì)減少不少。
可以考慮在slow logs的文件句柄上執(zhí)行posix_fadvise調(diào)用,促使不會(huì)緩存很大的log文件內(nèi)容(slow log也沒(méi)啥需要緩存的),這有篇霸爺?shù)奈恼?,可以參考?posix_fadvise清除緩存的誤解和改進(jìn)措施 。
另外,peter在07年就討論過(guò)這個(gè)問(wèn)題, Be careful rotating MySQL logs 其給出的建議是先mv file,然后flush logs,再執(zhí)行刪除文件的操作,讓真正的刪除行為由自己而不是mysqld完成。比較遺憾的是,七年過(guò)去了,LOCK_log這把鎖的問(wèn)題還沒(méi)有完整的解決掉。
PS:
文章結(jié)尾記一點(diǎn)備忘,通過(guò)close/rm操作刪除一個(gè)10G大小的文件,在執(zhí)行sysctl vm.drop_caches=3清空緩存后,此操作的耗時(shí)仍在百毫秒量級(jí)(我的機(jī)器上是200ms+),其背后做了什么事情還需要找內(nèi)核組的同事了解下。