Dangerous Edge Case Warning for Percona Toolkit and pt-online-schema-change

Percona Toolkit and pt-online-schema-change

Percona Toolkit and pt-online-schema-changeRecently I was dealing with an unexpected issue raised by our Support customer, in which data became inconsistent after a schema change was applied.

After some investigation, it turned out that affected tables had a special word in the comments of some columns, which triggered an already known (and fixed) issue with the TableParser.pm library of Percona Toolkit.  The problem is that the customer was using an outdated Toolkit version, where pt-online-schema-change was using that buggy parser.

This bug applies only to Percona Toolkit versions up to 3.0.10, so if you have already 3.0.11 or newer installed, you can skip the rest of this post as these are no longer affected.

I am writing this post to warn every user of pt-online-schema-change who has not upgraded the toolkit, as the problem is potentially very dangerous and can lead to the silent loss of data. 

The problem can manifest in two ways. The first one, although confusing, is not really dangerous as the operation is canceled. It happens when the columns with culprit comments do not allow NULL values. For example:

CREATE TABLE `test_not_null` (
`id` int NOT NULL,
`add_id` int NOT NULL COMMENT 'my generated test case',
PRIMARY KEY (`id`)
) ENGINE=InnoDB;

Schema change operation on this one will look like this:

$ ./pt-online-schema-change-3.0.10 u=msandbox,p=msandbox,h=localhost,S=/tmp/mysql_sandbox5735.sock,D=test,t=test_not_null --print --alter "engine=InnoDB" --execute
(...)
Altering `test`.`test_not_null`...
Creating new table...
CREATE TABLE `test`.`_test_not_null_new` (
`id` int(11) NOT NULL,
`add_id` int(11) NOT NULL COMMENT 'my generated test case',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Created new table test._test_not_null_new OK.
Altering new table...
ALTER TABLE `test`.`_test_not_null_new` engine=InnoDB
Altered `test`.`_test_not_null_new` OK.
2020-09-30T21:25:22 Creating triggers...
2020-09-30T21:25:22 Created triggers OK.
2020-09-30T21:25:22 Copying approximately 3 rows...
INSERT LOW_PRIORITY IGNORE INTO `test`.`_test_not_null_new` (`id`) SELECT `id` FROM `test`.`test_not_null` LOCK IN SHARE MODE /*pt-online-schema-change 1438 copy table*/
2020-09-30T21:25:22 Dropping triggers...
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_not_null_del`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_not_null_upd`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_not_null_ins`
2020-09-30T21:25:22 Dropped triggers OK.
2020-09-30T21:25:22 Dropping new table...
DROP TABLE IF EXISTS `test`.`_test_not_null_new`;
2020-09-30T21:25:22 Dropped new table OK.
`test`.`test_not_null` was not altered.
2020-09-30T21:25:22 Error copying rows from `test`.`test_not_null` to `test`.`_test_not_null_new`: 2020-09-30T21:25:22 Copying rows caused a MySQL error 1364:
Level: Warning
Code: 1364
Message: Field 'add_id' doesn't have a default value
Query: INSERT LOW_PRIORITY IGNORE INTO `test`.`_test_not_null_new` (`id`) SELECT `id` FROM `test`.`test_not_null` LOCK IN SHARE MODE /*pt-online-schema-change 1438 copy table*/

So the reason for the failed operation may be unclear, but at least no data gets damaged. A much worse result happens when the column with comment allows nulls:

CREATE TABLE `test_null` (
`id` int NOT NULL,
`add_id` int DEFAULT NULL COMMENT 'my generated test case',
PRIMARY KEY (`id`)
) ENGINE=InnoDB;

mysql [localhost:5735] {msandbox} (test) > select * from test_null;
+----+--------+
| id | add_id |
+----+--------+
| 1  |      1 |
| 2  |      2 |
| 3  |      3 |
+----+--------+
3 rows in set (0.01 sec)

For this one, the schema change command runs without any errors:

$ ./pt-online-schema-change-3.0.10 u=msandbox,p=msandbox,h=localhost,S=/tmp/mysql_sandbox5735.sock,D=test,t=test_null --print --alter "engine=InnoDB" --execute
(...)
Altering `test`.`test_null`...
Creating new table...
CREATE TABLE `test`.`_test_null_new` (
`id` int(11) NOT NULL,
`add_id` int(11) DEFAULT NULL COMMENT 'my generated test case',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Created new table test._test_null_new OK.
Altering new table...
ALTER TABLE `test`.`_test_null_new` engine=InnoDB
Altered `test`.`_test_null_new` OK.
2020-09-30T21:28:11 Creating triggers...
2020-09-30T21:28:11 Created triggers OK.
2020-09-30T21:28:11 Copying approximately 3 rows...
INSERT LOW_PRIORITY IGNORE INTO `test`.`_test_null_new` (`id`) SELECT `id` FROM `test`.`test_null` LOCK IN SHARE MODE /*pt-online-schema-change 3568 copy table*/
2020-09-30T21:28:11 Copied rows OK.
2020-09-30T21:28:11 Analyzing new table...
2020-09-30T21:28:11 Swapping tables...
RENAME TABLE `test`.`test_null` TO `test`.`_test_null_old`, `test`.`_test_null_new` TO `test`.`test_null`
2020-09-30T21:28:11 Swapped original and new tables OK.
2020-09-30T21:28:11 Dropping old table...
DROP TABLE IF EXISTS `test`.`_test_null_old`
2020-09-30T21:28:11 Dropped old table `test`.`_test_null_old` OK.
2020-09-30T21:28:11 Dropping triggers...
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_null_del`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_null_upd`
DROP TRIGGER IF EXISTS `test`.`pt_osc_test_test_null_ins`
2020-09-30T21:28:11 Dropped triggers OK.
Successfully altered `test`.`test_null`.

But… the table data is not the same after:

mysql [localhost:5735] {msandbox} (test) > select * from test_null;
+----+--------+
| id | add_id |
+----+--------+
|  1 |   NULL |
|  2 |   NULL |
|  3 |   NULL |
+----+--------+
3 rows in set (0.00 sec)

Summarizing, it is essential to make sure you are using the up to date Percona Toolkit, especially the pt-online-schema-change tool, to avoid potential disaster. The current latest stable release, as of when I am writing the post, is version 3.2.1, and the fixed version for this particular bug, 3.0.11, was released in July 2018.

References:

https://www.percona.com/doc/percona-toolkit/3.0/release_notes.html
https://jira.percona.com/browse/PT-1570

 


by Przemysław Malkowski via Percona Database Performance Blog

Comments