MySQL plante à environ 70 000 fichiers ouverts

Question

Serveur exécutant la communauté 8.0.14, nous avons atteint le nombre de fichiers ouverts il y a quelques jours, j'ai donc augmenté la limite, mais après un certain temps, le serveur s'est écrasé avec cela dans le journal des erreurs:

InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 03:07:21 UTC - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. Attempting to collect some information that could help diagnose the problem. As this is a crash and something is definitely wrong, the information collection process might fail. key_buffer_size=8388608 read_buffer_size=131072 max_used_connections=221 max_threads=1000 thread_count=4 connection_count=3 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 403301 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x7f9200000b20 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7fd571ceaca0 thread_stack 0x46000 /usr/sbin/mysqld(my_print_stacktrace(unsigned char*, unsigned long)+0x3d) [0x561bdb69757d] /usr/sbin/mysqld(handle_fatal_signal+0x423) [0x561bda8277c3] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7fe5095f6890] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7) [0x7fe507981e97] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141) [0x7fe507983801] /usr/sbin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x2d5) [0x561bdb94d2c5] /usr/sbin/mysqld(+0x1fe243c) [0x561bdb94f43c] /usr/sbin/mysqld(mem_heap_create_block_func(mem_block_info_t*, unsigned long, unsigned long)+0x9ba) [0x561bdb95638a] /usr/sbin/mysqld(mem_heap_add_block(mem_block_info_t*, unsigned long)+0x72) [0x561bdb956562] /usr/sbin/mysqld(trx_undo_update_rec_get_update(unsigned char const*, dict_index_t*, unsigned long, unsigned long, unsigned long, unsigned long, trx_t*, mem_block_info_t*, upd_t**, lob::undo_vers_t*, type_cmpl_t&)+0x1926) [0x561bdb924896] /usr/sbin/mysqld(row_purge_step(que_thr_t*)+0x845) [0x561bdb8b7ad5] /usr/sbin/mysqld(que_run_threads(que_thr_t*)+0x997) [0x561bdb859597] /usr/sbin/mysqld(srv_worker_thread()+0x29d) [0x561bdb8e85bd] /usr/sbin/mysqld(std::thread::_State_impl<std::thread::_Invoker<std::Tuple<Runnable, void (*)()> > >::_M_run()+0x63) [0x561bdb759883] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd57f) [0x7fe5083a757f] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fe5095eb6db] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fe507a6488f] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (0): Connection ID (thread ID): 0 Status: NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 2019-01-27T03:07:37.554520Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.14) starting as process 56489 InnoDB: Progress in percents: 1InnoDB MEMCACHED: Memcached uses atomic increment 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 1002019-01-27T03:12:07.973920Z 0 [System] [MY-010229] [Server] Starting crash recovery... 2019-01-27T03:12:07.982255Z 0 [System] [MY-010232] [Server] Crash recovery finished. 2019-01-27T03:12:10.581437Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed. 2

La machine est une 2x Xeon E5-2680v4 - 28c/56t - 2,4 GHz/3,3 GHz avec 512 Go de RAM, fichier de configuration:

server-id = 1 log-bin = /var/lib/mysql-binlogs/web07 relay-log = web07-relay-bin binlog_expire_logs_seconds = 259200 open_files_limit = 10000 max_connections = 1000 innodb_flush_method=O_DIRECT innodb_flush_log_at_trx_commit=0 innodb_log_file_size=38G innodb_buffer_pool_size=300G innodb_buffer_pool_instances=64 join_buffer_size=5M tmp_table_size=1G max_heap_table_size=512M skip_name_resolve=1 table_open_cache=20000 sql-mode="ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION,NO_AUTO_VALUE_ON_ZERO" relay-log-recovery=1 default-authentication-plugin=mysql_native_password

Après ce plantage, le serveur se bloque à nouveau si je ne le redémarre pas lorsque les fichiers ouverts deviennent trop nombreux (toutes les 24 heures environ), voici maintenant par exemple:

$ mysql -e "show global status like 'open_%';" +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | Open_files | 62507 | | Open_streams | 0 | | Open_table_definitions | 1843 | | Open_tables | 18779 | | Opened_files | 71902 | | Opened_table_definitions | 3168 | | Opened_tables | 18803 | +--------------------------+-------+

Pendant que j'écrivais cette question, le serveur s'est écrasé:

2019-01-29T22:27:51.031406Z 54298197 [ERROR] [MY-013132] [Server] The table '/tmp/#sql9ae8_33c8655_13' is full!

2019-01-29T22:28:01.844610Z 54321796 [ERROR] [MY-013132] [Server] The table '/tmp/#sql9ae8_33ce284_1' is full!

Comment puis-je déboguer pourquoi il y a tant de fichiers ouverts? Le trafic est élevé mais il n'y a généralement pas plus de 10 connexions ouvertes, alors pourquoi les tables temporaires ne sont pas fermées?

lsof: (~ 27k lignes au total, 10 heures après redémarrage)

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME mysqld 23964 mysql cwd DIR 9,3 4096 49549201 /var/lib/mysql mysqld 23964 mysql rtd DIR 9,3 4096 2 / mysqld 23964 mysql txt REG 9,3 56952368 6292097 /usr/sbin/mysqld mysqld 23964 mysql DEL REG 9,3 12858561 /tmp/mysql_temptable.As3HWG mysqld 23964 mysql DEL REG 9,3 12858557 /tmp/mysql_temptable.I4gP0F mysqld 23964 mysql DEL REG 9,3 12858554 /tmp/mysql_temptable.W139aF mysqld 23964 mysql DEL REG 9,3 12858553 /tmp/mysql_temptable.MhBwlE mysqld 23964 mysql DEL REG 9,3 12858551 /tmp/mysql_temptable.mcw0vD mysqld 23964 mysql DEL REG 9,3 12858550 /tmp/mysql_temptable.OFuZ94 mysqld 23964 mysql DEL REG 9,3 12858544 /tmp/mysql_temptable.2ffRvY mysqld 23964 mysql DEL REG 9,3 12858549 /tmp/mysql_temptable.VvXQPw mysqld 23964 mysql DEL REG 9,3 12858541 /tmp/mysql_temptable.KTTZRR mysqld 23964 mysql DEL REG 9,3 12858542 /tmp/mysql_temptable.eMVSbq mysqld 23964 mysql DEL REG 9,3 12858538 /tmp/mysql_temptable.6Kaay

Après les premières lignes, tous les fichiers sont (supprimés). Je pense que c'est le problème, pourquoi ces pointeurs ne sont pas publiés?

mysqld 27864 mysql 1658u REG 9,3 1048584 12846342 /tmp/mysql_temptable.wazlKx (deleted) mysqld 27864 mysql 1660u REG 9,3 1048584 12846352 /tmp/mysql_temptable.S5Oo1m (deleted) mysqld 27864 mysql 1662u REG 9,3 1048584 12846343 /tmp/mysql_temptable.l1CZjK (deleted) mysqld 27864 mysql 1666u REG 9,3 1048584 12846366 /tmp/mysql_temptable.VDaTxb (deleted)

2019-02-19: J'ai augmenté certaines limites comme suggéré, et j'ai fait un mysqlcheck --all-databases, sans changement.

Comme écrit dans les commentaires, le nombre de fichiers ouverts ne diminue pas même si j'arrête le trafic.

L'application s'est révélée stable et aucun changement significatif n'a été apporté au code ou à la quantité de trafic dans les jours précédant l'apparition du problème. (Je suis sûr à 100% car ce jour était un jour férié et il n'y a pas de commit sur la branche master)

the_nuts · Accepted Answer

Le problème est un bogue Mysql ( https://bugs.mysql.com/bug.php?id=94185 )

Quoi qu'il en soit, certains réglages ont atténué le problème et arrêté le nombre de fichiers ouverts à augmenter.

Plus précisément, nous définissons:

innodb_lru_scan_depth=100 innodb_flushing_avg_loops=5 thread_cache_size=216

Corrigé à partir de la prochaine version 8.0.16, et voici l'entrée du journal des modifications:

Les variables locales de thread statique définies à la mauvaise portée n'ont pas été libérées à la sortie du thread.

Wilson Hauck · Answer

Pour commencer, considérez max_connections = 500 plutôt que 1000 pour réduire considérablement votre empreinte RAM compte tenu du détail de max_connections_used de 221 lorsque votre événement d'échec d'assertion a été documenté.

La plupart des configurations que je vois ont tmp_table_size = 1G max_heap_table_size = 512M de tailles identiques.