A Simple Approach to Troubleshooting High CPU in MySQL

One of our customers recently asked whether it is possible to identify, from the MySQL side, the query that is causing high CPU usage on their system. Using simple OS tools to find the culprit has long been a common technique for PostgreSQL and Oracle DBAs, but it didn't work for MySQL, as historically we lacked the instrumentation to match an OS thread with an internal processlist thread – until recently.

Percona added support for mapping processlist ids to OS thread ids through the TID column of the information_schema.PROCESSLIST table, starting with Percona Server for MySQL 5.6.27. With the release of 5.7, MySQL followed with its own implementation, extending the PERFORMANCE_SCHEMA.THREADS table with a new column named THREAD_OS_ID, which Percona Server for MySQL adopted in place of its own, as it usually does to remain as close to upstream as possible.
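
For instance, on Percona Server for MySQL 5.6.27 or later the mapping can be read straight from the processlist. A minimal sketch (the TID value is whatever OS thread id your monitoring tool reports; the number below is only illustrative):

mysql > select ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO
        from information_schema.PROCESSLIST
        where TID = 32053;  -- OS thread id as seen by top/pidstat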

The following approach is useful for cases where a single query is overloading one particular CPU core while the other cores are performing normally. For general CPU usage issues, different methods can be used, such as the one described in this other blog post: Reducing High CPU on MySQL: A Case Study.

How can we use this new column to find out which session is using the most CPU resources in our database?

Let’s use an example:

To troubleshoot CPU issues, we can use several tools, such as top or pidstat (which requires the sysstat package). In the following example, we will use pidstat. The tool has an option (-t) that changes its view from processes (the default) to threads, showing the threads associated with a given process. We can use it to find out which thread is consuming the most CPU on our server. We add the -p parameter along with the mysqld process id so that the tool only shows MySQL threads, making it easier for us to troubleshoot. The last parameter (1) tells pidstat to print one sample per second:
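
If you don't already know the mysqld process id to pass with -p, any standard OS tool will do; for example (the pid shown is simply the one used throughout this post):

shell> pgrep -x mysqld
31258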

The command is pidstat -t -p <mysqld_pid> 1:

shell> pidstat -t -p 31258 1
03:31:06 PM   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
[...]
03:31:07 PM 10014         -     32039    5.00    1.00    0.00    6.00    22  |__mysqld
03:31:07 PM 10014         -     32040    5.00    1.00    0.00    6.00    23  |__mysqld
03:31:07 PM 10014         -     32042    6.00    1.00    0.00    7.00     8  |__mysqld
03:31:07 PM 10014         -     32047    5.00    1.00    0.00    6.00     6  |__mysqld
03:31:07 PM 10014         -     32048    5.00    1.00    0.00    6.00    15  |__mysqld
03:31:07 PM 10014         -     32049    5.00    1.00    0.00    6.00    14  |__mysqld
03:31:07 PM 10014         -     32052    5.00    1.00    0.00    6.00    14  |__mysqld
03:31:07 PM 10014         -     32053   94.00    0.00    0.00   94.00     9  |__mysqld
03:31:07 PM 10014         -     32055    4.00    1.00    0.00    5.00    10  |__mysqld
03:31:07 PM 10014         -      4275    5.00    1.00    0.00    6.00    10  |__mysqld
03:31:07 PM 10014         -      4276    5.00    1.00    0.00    6.00     7  |__mysqld
03:31:07 PM 10014         -      4277    6.00    1.00    0.00    7.00    15  |__mysqld
03:31:07 PM 10014         -      4278    5.00    1.00    0.00    6.00    18  |__mysqld
03:31:07 PM 10014         -      4279    5.00    1.00    0.00    6.00    10  |__mysqld
03:31:07 PM 10014         -      4280    5.00    1.00    0.00    6.00    12  |__mysqld
03:31:07 PM 10014         -      4281    5.00    1.00    0.00    6.00    11  |__mysqld
03:31:07 PM 10014         -      4282    4.00    1.00    0.00    5.00     2  |__mysqld
03:31:07 PM 10014         -     35261    0.00    0.00    0.00    0.00     4  |__mysqld
03:31:07 PM 10014         -     36153    0.00    0.00    0.00    0.00     5  |__mysqld


We can see that thread 32053 is consuming the most CPU by quite a large margin, and we verified that this consumption was constant across multiple pidstat samples. Using that information, we can log into the database and run the following query to find out which MySQL thread is the culprit:

mysql > select * from performance_schema.threads where THREAD_OS_ID = 32053 \G
*************************** 1. row ***************************
          THREAD_ID: 686
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 590
   PROCESSLIST_USER: msandbox
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: NULL
PROCESSLIST_COMMAND: Query
   PROCESSLIST_TIME: 0
  PROCESSLIST_STATE: Sending data
   PROCESSLIST_INFO: select * from test.joinit where b = 'a a eveniet ut.'
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: SSL/TLS
       THREAD_OS_ID: 32053
1 row in set (0.00 sec)


There we go! Now we know that the high CPU consumption comes from a query on the joinit table, executed by the user msandbox from localhost against the test database. Using this information, we can troubleshoot the query and check its execution plan with the EXPLAIN command to see if there is any room for improvement.

mysql > explain select * from test.joinit where b = 'a a eveniet ut.' \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: joinit
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 7170836
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)


In this case, it was a simple index that was missing!

mysql > alter table test.joinit add index (b) ;
Query OK, 0 rows affected (15.18 sec)
Records: 0  Duplicates: 0  Warnings: 0
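
To confirm that the optimizer is now using the new index, we can simply re-run the same EXPLAIN from above; the plan should change from a full table scan (type: ALL) to an index lookup on the newly created key on column b:

mysql > explain select * from test.joinit where b = 'a a eveniet ut.' \G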


After creating the index, we are no longer seeing CPU spikes:

shell> pidstat -t -p 31258 1
03:37:53 PM   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
[...]
03:37:54 PM 10014         -     32039   25.00    6.00    0.00   31.00     0  |__mysqld
03:37:54 PM 10014         -     32040   25.00    5.00    0.00   30.00    21  |__mysqld
03:37:54 PM 10014         -     32042   25.00    6.00    0.00   31.00    20  |__mysqld
03:37:54 PM 10014         -     32047   25.00    4.00    0.00   29.00    23  |__mysqld
03:37:54 PM 10014         -     32048   25.00    7.00    0.00   32.00    22  |__mysqld
03:37:54 PM 10014         -     32049   23.00    6.00    0.00   29.00     4  |__mysqld
03:37:54 PM 10014         -     32052   23.00    7.00    0.00   30.00    14  |__mysqld
03:37:54 PM 10014         -     32053   10.00    2.00    0.00   12.00    11  |__mysqld
03:37:54 PM 10014         -     32055   24.00    6.00    0.00   30.00     1  |__mysqld
03:37:54 PM 10014         -      4275   25.00    6.00    0.00   31.00     7  |__mysqld
03:37:54 PM 10014         -      4276   25.00    6.00    0.00   31.00     1  |__mysqld
03:37:54 PM 10014         -      4277   24.00    5.00    0.00   29.00    14  |__mysqld
03:37:54 PM 10014         -      4278   24.00    6.00    0.00   30.00     9  |__mysqld
03:37:54 PM 10014         -      4279   25.00    5.00    0.00   30.00     6  |__mysqld
03:37:54 PM 10014         -      4280   26.00    5.00    0.00   31.00    14  |__mysqld
03:37:54 PM 10014         -      4281   24.00    6.00    0.00   30.00    10  |__mysqld
03:37:54 PM 10014         -      4282   25.00    6.00    0.00   31.00    10  |__mysqld
03:37:54 PM 10014         -     35261    0.00    0.00    0.00    0.00     4  |__mysqld
03:37:54 PM 10014         -     36153    0.00    0.00    0.00    0.00     5  |__mysqld

Why not use this approach to troubleshoot IO and Memory issues?

The issue with measuring thread IO from the OS side is that most MySQL IO operations are performed by background threads, such as the read, write, and page cleaner threads. To measure per-thread IO you can use tools like pidstat with the -d option (IO statistics instead of CPU) or iotop, which also breaks usage down by thread. If you have a very IO-intensive thread you might be able to spot it, but be warned that the results can be misleading because of the background thread activity.
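
As an illustration, the same pidstat invocation used earlier can be switched to IO mode (a sketch only; as noted above, most of the reported activity will belong to background threads rather than to user sessions):

shell> pidstat -d -t -p 31258 1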

Memory consumption is an even trickier resource to measure from the OS side: all memory is allocated under the MySQL process, and since MySQL manages its own memory, the OS cannot tell which thread is consuming the most. For this, we can use the performance_schema memory instrumentation available starting in 5.7.
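
As a starting point, a per-thread breakdown can be pulled from performance_schema (a minimal sketch, assuming MySQL 5.7 or later; note that most memory/% instruments are disabled by default and may need to be enabled in performance_schema.setup_instruments before the figures become meaningful):

mysql > select t.THREAD_ID, t.PROCESSLIST_USER, t.PROCESSLIST_HOST,
               sum(m.CURRENT_NUMBER_OF_BYTES_USED) as current_bytes
        from performance_schema.memory_summary_by_thread_by_event_name m
        join performance_schema.threads t using (THREAD_ID)
        group by t.THREAD_ID, t.PROCESSLIST_USER, t.PROCESSLIST_HOST
        order by current_bytes desc
        limit 5;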

Conclusion

There are many approaches to troubleshooting high CPU usage, but here we presented a simple one, long and widely used on Oracle and PostgreSQL databases, that can now be applied to MySQL as well, starting with version 5.7 (or Percona Server for MySQL 5.6.27). By tracking OS thread consumption down to the database side, we can quickly detect CPU-intensive queries that are affecting system performance.




by Juan Arruti via Percona Database Performance Blog
