Diagnosing and Fixing “MySQL Server Has Gone Away” Messages

MySQL Server Has Gone Away

MySQL Server Has Gone AwayWe all like when error messages are descriptive and give a clear idea about what is happening; however, there are some cases when a few possible reasons lay behind one error message. “MySQL server has gone away” is one of them. Most of the cases when the error occurs are described in MySQL documentation, but it can get tricky. And here, I’d like to talk about “tricky”.

There are only a few major cases when this happens:

1. MySQL Thread Was Killed by an Administrator or a Utility Such as pt-kill

The manual intervention is likely to be intermittent and, as it is a one-time thing in certain situations (e.g., a bad long-running query), probably would be known to a DBA. Pt-kill might be less noticeable, as it is often left running as a workaround to prevent those bad long queries from taxing system resources. Checking the system processlist should bring the commands to the surface:

$ ps xauf | grep pt-kill
taras 6069 0.1 0.1 111416 29452 pts/29 S+ 10:57 0:00 | | \_ perl /usr/bin/pt-kill --interval 1s --busy-time 5s --match-info (SELECT) h=127.0.0.1 --print --kill
taras 6913 0.0 0.0 21532 1112 pts/30 S+ 11:00 0:00 | \_ grep --color=auto pt-kill

A very convenient way is to use the audit plugin available for Percona Server for MySQL to determine where the kill command came from:

<AUDIT_RECORD>
  <NAME>Query</NAME>
  <RECORD>624484743_2020-06-30T17:38:14</RECORD>
  <TIMESTAMP>2020-06-30T17:57:35 UTC</TIMESTAMP>
  <COMMAND_CLASS>kill</COMMAND_CLASS>
  <CONNECTION_ID>17</CONNECTION_ID>
  <STATUS>0</STATUS>
  <SQLTEXT>KILL QUERY 16</SQLTEXT>
  <USER>taras[taras] @ localhost []</USER>
  <HOST>localhost</HOST>
  <OS_USER></OS_USER>
  <IP></IP>
  <DB></DB>
</AUDIT_RECORD>

It shows the hostname, user, and time when the connection got killed.

2. Big Data Chunk Transfer

For example, when using BLOB fields to store binary data in a table or there is an INSERT statement containing a lot of rows. It may happen when using the MySQL CLI client (one of the cases being loading an SQL dump), or it can happen within an application when it tries to store the BLOB data (for example, from a file upload).

There is a limit MySQL imposes on the amount of data that can be transmitted per query, and the max_allowed_packet variable defines it.

So, in both cases, we need to determine which table the data is being written to, for instance, grepping the SQL file for INSERT INTO statements and implementing logging on the application end. This way, the statement will be stored along with the error that prevented it from completing. A partial statement can be captured (as BLOBs could be a burden to log), but as long as there is a table name, it is possible to check the table structure and see if it does contain binary data.

Example of an INSERT statement with binary data (truncated):

INSERT INTO t1 VALUES (1, x'89504....82’)

To allow for a larger query execution, the variable needs to be adjusted:

SET GLOBAL max_allowed_packet = 128M ;

The variable can be set per session or globally, depending on the use case.

3. The Connection Was Closed by Timeout

It is trivial, but applications can be reusing already-established connections. During the time of inactivity or lower traffic, it is possible some connections will not be used for a while and closed on the MySQL end. It is traced best with the application logging; if there is an event that happened in the evening followed by a period of inactivity and then the error in the morning, it is very likely that MySQL closed the connection.

mysql> SET SESSION wait_timeout = 5 ;
Query OK, 0 rows affected (0.00 sec)

Wait for 5 seconds:

mysql> select 1 ;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 16
Current database: *** NONE ***

+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)

Usually, the connection is re-established, and the application continues normal operations; yet it is always possible to extend the timeout in MySQL configuration:

SET GLOBAL wait_timeout = 57600 ;

The default value for the variable is 28800 seconds (8 hours), which is enough in most cases.

Also, closing connections cleanly from an application end, after a period of inactivity, eliminates this problem.

4. MySQL Server Has Actually Gone Away

This one is probably the worst possible scenario when MySQL crashes on a query or due to some other reason, e.g., the OOM killer killed the process. However, it can be caused by a clean restart, too.

In this case, one should check MySQL uptime and the logs, MySQL error log, and syslog. Those should indicate whether the server restart occurred and if there was an error leading to the restart.

In case the server did crash, it is time to find the actual cause. Check the bug tracker, as the issue might have been reported and possibly fixed already; upgrade MySQL if needed. In case it was a clean restart, check if auto-updates are enabled or if someone else restarted the service interactively (yes, lack of communication is a thing too).


by Taras Onishchuk via Percona Database Performance Blog

Comments

Popular posts from this blog