A Simple Approach to Troubleshooting High CPU in MySQL
One of our customers recently asked whether it is possible to identify, from the MySQL side, the query that is causing high CPU usage on their system. The use of simple OS tools to find the culprit has long been a common technique for PostgreSQL and Oracle DBAs, but it didn’t work for MySQL, as historically we lacked the instrumentation to match an OS thread with an internal processlist thread – until recently.
Percona added support for mapping processlist ids to OS thread ids through the TID column of the information_schema.PROCESSLIST table, starting with Percona Server for MySQL 5.6.27. With the release of 5.7, MySQL followed with its own implementation, extending the PERFORMANCE_SCHEMA.THREADS table with a new column named THREAD_OS_ID. Percona Server for MySQL adopted this column in place of its own, as it usually stays as close to upstream as possible.
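For example, on Percona Server for MySQL 5.6.27 the mapping could already be queried straight from the processlist. The query below is only a sketch: 12345 is a placeholder for an OS thread id reported by your OS tooling, not a value taken from this post:

mysql > select id, user, host, db, command, time, state, info, tid
        from information_schema.processlist
        where tid = 12345;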
The following approach is useful when a single query is overloading one particular CPU core while the other cores are performing normally. For general CPU usage issues, different methods can be used, such as the one described in the blog post Reducing High CPU on MySQL: A Case Study.
How can we use this new column to find out which session is using the most CPU resources in our database?
Let’s use an example:
To troubleshoot CPU issues, we can use several tools, such as top or pidstat (which requires the sysstat package). In the following example, we will use pidstat. The tool has an option (-t) that changes its view from processes (the default) to the threads within each process. We can use it to find out which thread is consuming the most CPU on our server. The -p parameter, together with the mysqld process id (which can be found, for example, with pgrep -x mysqld), restricts the output to MySQL threads, making it easier for us to troubleshoot. The last parameter (1) tells pidstat to display one sample per second:
The command is pidstat -t -p <mysqld_pid> 1:
shell> pidstat -t -p 31258 1
03:31:06 PM   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
[...]
03:31:07 PM 10014         -     32039    5.00    1.00    0.00    6.00    22  |__mysqld
03:31:07 PM 10014         -     32040    5.00    1.00    0.00    6.00    23  |__mysqld
03:31:07 PM 10014         -     32042    6.00    1.00    0.00    7.00     8  |__mysqld
03:31:07 PM 10014         -     32047    5.00    1.00    0.00    6.00     6  |__mysqld
03:31:07 PM 10014         -     32048    5.00    1.00    0.00    6.00    15  |__mysqld
03:31:07 PM 10014         -     32049    5.00    1.00    0.00    6.00    14  |__mysqld
03:31:07 PM 10014         -     32052    5.00    1.00    0.00    6.00    14  |__mysqld
03:31:07 PM 10014         -     32053   94.00    0.00    0.00   94.00     9  |__mysqld
03:31:07 PM 10014         -     32055    4.00    1.00    0.00    5.00    10  |__mysqld
03:31:07 PM 10014         -      4275    5.00    1.00    0.00    6.00    10  |__mysqld
03:31:07 PM 10014         -      4276    5.00    1.00    0.00    6.00     7  |__mysqld
03:31:07 PM 10014         -      4277    6.00    1.00    0.00    7.00    15  |__mysqld
03:31:07 PM 10014         -      4278    5.00    1.00    0.00    6.00    18  |__mysqld
03:31:07 PM 10014         -      4279    5.00    1.00    0.00    6.00    10  |__mysqld
03:31:07 PM 10014         -      4280    5.00    1.00    0.00    6.00    12  |__mysqld
03:31:07 PM 10014         -      4281    5.00    1.00    0.00    6.00    11  |__mysqld
03:31:07 PM 10014         -      4282    4.00    1.00    0.00    5.00     2  |__mysqld
03:31:07 PM 10014         -     35261    0.00    0.00    0.00    0.00     4  |__mysqld
03:31:07 PM 10014         -     36153    0.00    0.00    0.00    0.00     5  |__mysqld
We can see that thread 32053 is consuming the most CPU by quite a large margin, and we made sure this consumption stayed constant across multiple pidstat samples. With that information, we can log into the database and use the following query to find out which MySQL thread is the culprit:
mysql > select * from performance_schema.threads where THREAD_OS_ID = 32053 \G
*************************** 1. row ***************************
          THREAD_ID: 686
               NAME: thread/sql/one_connection
               TYPE: FOREGROUND
     PROCESSLIST_ID: 590
   PROCESSLIST_USER: msandbox
   PROCESSLIST_HOST: localhost
     PROCESSLIST_DB: NULL
PROCESSLIST_COMMAND: Query
   PROCESSLIST_TIME: 0
  PROCESSLIST_STATE: Sending data
   PROCESSLIST_INFO: select * from test.joinit where b = 'a a eveniet ut.'
   PARENT_THREAD_ID: NULL
               ROLE: NULL
       INSTRUMENTED: YES
            HISTORY: YES
    CONNECTION_TYPE: SSL/TLS
       THREAD_OS_ID: 32053
1 row in set (0.00 sec)
There we go! Now we know that the high CPU consumption comes from a query on the joinit table, executed by the user msandbox from localhost against the test database. With this information, we can troubleshoot the query and check the execution plan with the EXPLAIN command to see if there is any room for improvement:
mysql > explain select * from test.joinit where b = 'a a eveniet ut.' \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: joinit
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 7170836
     filtered: 10.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
In this case, it was simply a missing index!
mysql > alter table test.joinit add index (b);
Query OK, 0 rows affected (15.18 sec)
Records: 0  Duplicates: 0  Warnings: 0
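To confirm that the optimizer picks up the new index, we can simply re-run the earlier EXPLAIN; the plan should now report the index on b in the key column instead of a full table scan (output omitted here):

mysql > explain select * from test.joinit where b = 'a a eveniet ut.' \G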
After creating the index, we no longer see CPU spikes:
shell> pidstat -t -p 31258 1
03:37:53 PM   UID      TGID       TID    %usr %system  %guest    %CPU   CPU  Command
[...]
03:37:54 PM 10014         -     32039   25.00    6.00    0.00   31.00     0  |__mysqld
03:37:54 PM 10014         -     32040   25.00    5.00    0.00   30.00    21  |__mysqld
03:37:54 PM 10014         -     32042   25.00    6.00    0.00   31.00    20  |__mysqld
03:37:54 PM 10014         -     32047   25.00    4.00    0.00   29.00    23  |__mysqld
03:37:54 PM 10014         -     32048   25.00    7.00    0.00   32.00    22  |__mysqld
03:37:54 PM 10014         -     32049   23.00    6.00    0.00   29.00     4  |__mysqld
03:37:54 PM 10014         -     32052   23.00    7.00    0.00   30.00    14  |__mysqld
03:37:54 PM 10014         -     32053   10.00    2.00    0.00   12.00    11  |__mysqld
03:37:54 PM 10014         -     32055   24.00    6.00    0.00   30.00     1  |__mysqld
03:37:54 PM 10014         -      4275   25.00    6.00    0.00   31.00     7  |__mysqld
03:37:54 PM 10014         -      4276   25.00    6.00    0.00   31.00     1  |__mysqld
03:37:54 PM 10014         -      4277   24.00    5.00    0.00   29.00    14  |__mysqld
03:37:54 PM 10014         -      4278   24.00    6.00    0.00   30.00     9  |__mysqld
03:37:54 PM 10014         -      4279   25.00    5.00    0.00   30.00     6  |__mysqld
03:37:54 PM 10014         -      4280   26.00    5.00    0.00   31.00    14  |__mysqld
03:37:54 PM 10014         -      4281   24.00    6.00    0.00   30.00    10  |__mysqld
03:37:54 PM 10014         -      4282   25.00    6.00    0.00   31.00    10  |__mysqld
03:37:54 PM 10014         -     35261    0.00    0.00    0.00    0.00     4  |__mysqld
03:37:54 PM 10014         -     36153    0.00    0.00    0.00    0.00     5  |__mysqld
Why not use this approach to troubleshoot IO and Memory issues?
The issue with measuring thread IO from the OS side is that most MySQL IO operations are performed by background threads, such as the read, write, and page cleaner threads. To measure per-thread IO you can use tools like pidstat with the -d option (IO statistics instead of CPU) or iotop (which lists threads by default). If there is a single, very IO-intensive thread you might be able to spot it, but be warned that the results can be misleading because of the background thread activity.
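For reference, the same pidstat invocation used earlier can be switched to per-thread IO statistics (kB read and written per second for each thread). This is only a sketch, reusing the mysqld pid from the example above:

shell> pidstat -d -t -p 31258 1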
Memory consumption is an even trickier resource to measure from the OS side: all memory is allocated under the MySQL process, and since MySQL manages its own memory internally, the OS cannot tell which thread is consuming the most memory. For this, we can use the performance_schema memory instrumentation available starting in 5.7.
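For example, a query along the following lines (a sketch only; it assumes the relevant memory instruments are enabled in performance_schema.setup_instruments) shows which threads currently hold the most instrumented memory:

mysql > select thread_id, event_name, current_number_of_bytes_used
        from performance_schema.memory_summary_by_thread_by_event_name
        order by current_number_of_bytes_used desc
        limit 10;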
Conclusion
There are many approaches to troubleshooting high CPU usage, but here we presented a simple approach, widely used with Oracle and PostgreSQL databases, that can now be applied to MySQL as well, starting with version 5.7. By tracking an OS thread's CPU consumption down to the database side, we can quickly detect CPU-intensive queries that are affecting system performance.
Correctly understanding the true cause of database performance problems allows for a quick and efficient resolution – yet enterprises often lack this crucial information. Without it, your solution could require more time and resources than necessary, or fail to address the issue efficiently. And contrary to popular belief, the problem is not always the database itself!
Download: Top 5 Causes of Poor Database Performance
by Juan Arruti via Percona Database Performance Blog