Replication Between Two Percona XtraDB Clusters, GTIDs and Schema Changes

I got this question during the “How to Avoid Pitfalls in Schema Upgrade with Percona XtraDB Cluster (PXC)” webinar and wanted to answer it in a separate post.

Will RSU have an effect on GTID consistency when replicating from one PXC cluster to another cluster?

The answer is: yes and no.

Galera assigns its own GTID to operations that are replicated to all nodes of the cluster. Such operations include DML (INSERT/UPDATE/DELETE) on InnoDB tables and DDL commands executed with the default TOI method. You can find more details on how GTIDs work in Percona XtraDB Cluster in this blog post.

However, DDL commands executed with the RSU method are applied locally, and each one gets its own GTID that is local to the node where it ran.

Let’s set up replication between two PXC clusters and see how it works.

First, let’s use the default wsrep_osu_method, TOI, and create one table on each of the three nodes of the source cluster:

node1> create table toi1(id int) engine=innodb;
Query OK, 0 rows affected (0,08 sec)

node1> show global variables like 'gtid_executed'\G
*************************** 1. row ***************************
Variable_name: gtid_executed
        Value: 24f602ff-cb98-11ea-beb2-ba09d9a11266:1
1 row in set (0,01 sec)

node2> create table toi2(id int) engine=innodb;
Query OK, 0 rows affected (0,07 sec)

node2> show global variables like 'gtid_executed'\G
*************************** 1. row ***************************
Variable_name: gtid_executed
        Value: 24f602ff-cb98-11ea-beb2-ba09d9a11266:1-2
1 row in set (0,01 sec)

node3> create table toi3(id int) engine=innodb;
Query OK, 0 rows affected (0,07 sec)

node3> show global variables like 'gtid_executed'\G
*************************** 1. row ***************************
Variable_name: gtid_executed
        Value: 24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3
1 row in set (0,01 sec)

You can see that all GTIDs have the same UUID, 24f602ff-cb98-11ea-beb2-ba09d9a11266, and that the transaction number increases no matter which node is the source of the change.
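
To make the structure of these values explicit, here is a minimal Python sketch that splits a gtid_executed string into its source UUID and transaction intervals. The helper name parse_gtid_set is hypothetical (it is not part of any MySQL or PXC tooling), and the sketch assumes the plain uuid:first-last textual format shown in the output above.

```python
def parse_gtid_set(text):
    """Parse a GTID set string into {uuid: [(first, last), ...]}.

    Hypothetical helper for illustration only. It assumes the plain
    'uuid:1-3:5,uuid2:1' format and ignores whitespace and newlines.
    """
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            first, _, last = interval.partition("-")
            # A single number such as '1' is the interval (1, 1).
            ranges.append((int(first), int(last or first)))
    return result


# gtid_executed values seen on node1, node2 and node3 after the TOI DDL
for value in (
    "24f602ff-cb98-11ea-beb2-ba09d9a11266:1",
    "24f602ff-cb98-11ea-beb2-ba09d9a11266:1-2",
    "24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3",
):
    print(parse_gtid_set(value))
```

Each parsed value has a single key, the shared UUID, and only the upper bound of the interval grows.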

All changes replicate successfully. As a result, the replica has received and applied these GTIDs, as can be seen in the SHOW SLAVE STATUS output:

mysql> show slave status\G
*************************** 1. row ***************************
...
           Retrieved_Gtid_Set: 24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3
            Executed_Gtid_Set: 24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3
...

With the RSU method, we execute the DDL on each node while it is out of sync with the rest of the cluster. After the operation finishes, nothing is replicated to the other nodes; RSU relies on the DBA to perform the change on each of them manually. Therefore, the GTID for such an operation uses the node’s local UUID:

node1> set wsrep_osu_method='rsu';
Query OK, 0 rows affected (0,00 sec)

node1> create table rsu(id int) engine=innodb;
Query OK, 0 rows affected (0,04 sec)

node1> show global variables like 'gtid_executed'\G
*************************** 1. row ***************************
Variable_name: gtid_executed
        Value: 24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3,
25394777-cb98-11ea-a23a-98af65266957:1
1 row in set (0,00 sec)

node2> set wsrep_osu_method='rsu';
Query OK, 0 rows affected (0,00 sec)

node2> create table rsu(id int) engine=innodb;
Query OK, 0 rows affected (0,04 sec)

node2> show global variables like 'gtid_executed'\G
*************************** 1. row ***************************
Variable_name: gtid_executed
        Value: 24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3,
322ff3eb-cb98-11ea-8a94-98af65266957:1
1 row in set (0,00 sec)

node3> set wsrep_osu_method='rsu';
Query OK, 0 rows affected (0,00 sec)

node3> create table rsu(id int) engine=innodb;
Query OK, 0 rows affected (0,04 sec)

node3> show global variables like 'gtid_executed'\G
*************************** 1. row ***************************
Variable_name: gtid_executed
        Value: 24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3,
3ab8cf00-cb98-11ea-b433-98af65266957:1
1 row in set (0,01 sec)

As you can see, this same operation created GTIDs with three different UUIDs on the three nodes: 25394777-cb98-11ea-a23a-98af65266957, 322ff3eb-cb98-11ea-8a94-98af65266957 and 3ab8cf00-cb98-11ea-b433-98af65266957.
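
One way to spot such node-local GTIDs is to compare the UUIDs present in gtid_executed on every node. The Python sketch below does that for the three values shown above. The helper gtid_uuids is hypothetical, and treating "present on every node" as "cluster-wide" is an assumption that holds for this test setup, not a general rule.

```python
def gtid_uuids(text):
    """Return the set of source UUIDs found in a GTID set string."""
    return {
        part.strip().split(":")[0].lower()
        for part in text.replace("\n", "").split(",")
        if part.strip()
    }


# gtid_executed as reported by each node after the RSU DDL
executed = {
    "node1": "24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3,\n"
             "25394777-cb98-11ea-a23a-98af65266957:1",
    "node2": "24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3,\n"
             "322ff3eb-cb98-11ea-8a94-98af65266957:1",
    "node3": "24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3,\n"
             "3ab8cf00-cb98-11ea-b433-98af65266957:1",
}

per_node = {node: gtid_uuids(value) for node, value in executed.items()}

# UUIDs every node knows about (assumed here to be the cluster-wide one)
shared = set.intersection(*per_node.values())

# UUIDs that exist on one node only, that is, the ones created by RSU
local_only = {node: sorted(uuids - shared) for node, uuids in per_node.items()}

for node, uuids in local_only.items():
    print(node, uuids)
```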

The replica cluster received the local GTID only from the node that is set up as its replication source:

mysql> show slave status\G
*************************** 1. row ***************************
...
           Retrieved_Gtid_Set: 24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3,
25394777-cb98-11ea-a23a-98af65266957:1
            Executed_Gtid_Set: 24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3,
25394777-cb98-11ea-a23a-98af65266957:1
...

So, as long as the replication source stays the same, RSU does not cause any issue with GTIDs.

However, if you later need to perform a failover and set up any other node as the replication source, the replica will try to apply that node’s local GTID and fail with an error:

mysql> stop slave;
Query OK, 0 rows affected (0,01 sec)

mysql> CHANGE MASTER TO master_host='127.0.0.1', master_port=13004, master_user='root', MASTER_AUTO_POSITION = 1;
Query OK, 0 rows affected, 1 warning (0,02 sec)

mysql> start slave;
Query OK, 0 rows affected (0,01 sec)

mysql> show slave status\G
*************************** 1. row ***************************
...
        Relay_Master_Log_File: binlog.000004
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
...
                   Last_Errno: 1050
                   Last_Error: Error 'Table 'rsu' already exists' on query. Default database: 'test'. Query: 'create table rsu(id int) engine=
...
               Last_SQL_Errno: 1050
               Last_SQL_Error: Error 'Table 'rsu' already exists' on query. Default database: 'test'. Query: 'create table rsu(id int) engine=innodb'
...
           Retrieved_Gtid_Set: 322ff3eb-cb98-11ea-8a94-98af65266957:1
            Executed_Gtid_Set: 24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3,
25394777-cb98-11ea-a23a-98af65266957:1
...
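
The failure can be predicted before the failover by checking which GTIDs the new source has that the replica has not executed. MySQL offers the GTID_SUBTRACT() function for this on the server side. The Python sketch below reproduces the same set difference offline for the values from this test. The helper expand is hypothetical and expands every interval into individual transactions, so it is only suitable for small sets like these.

```python
def expand(text):
    """Expand a GTID set string into a set of (uuid, number) pairs.

    Hypothetical helper: it enumerates every transaction number, which
    is fine for tiny sets but wasteful for real production GTID sets.
    """
    gtids = set()
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        for interval in intervals:
            first, _, last = interval.partition("-")
            for number in range(int(first), int(last or first) + 1):
                gtids.add((uuid.lower(), number))
    return gtids


# gtid_executed on node2, the new replication source
new_source = ("24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3,\n"
              "322ff3eb-cb98-11ea-8a94-98af65266957:1")

# Executed_Gtid_Set on the replica before the failover
replica = ("24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3,\n"
           "25394777-cb98-11ea-a23a-98af65266957:1")

# Transactions the replica will request from the new source
missing = sorted(expand(new_source) - expand(replica))
print(missing)
```

The only missing GTID is the one node2 generated for its local RSU statement, which is exactly what appears in Retrieved_Gtid_Set above.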

The only solution here is to inject an empty transaction on the replica in place of the one created by the RSU operation:

mysql> set gtid_next='322ff3eb-cb98-11ea-8a94-98af65266957:1';
Query OK, 0 rows affected (0,00 sec)

mysql> start transaction;
Query OK, 0 rows affected (0,00 sec)

mysql> commit;
Query OK, 0 rows affected (0,01 sec)

mysql> set gtid_next='automatic';
Query OK, 0 rows affected (0,00 sec)

mysql> start slave;
Query OK, 0 rows affected (0,01 sec)

mysql> show slave status\G
*************************** 1. row ***************************
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...
           Retrieved_Gtid_Set: 322ff3eb-cb98-11ea-8a94-98af65266957:1
            Executed_Gtid_Set: 24f602ff-cb98-11ea-beb2-ba09d9a11266:1-3,
25394777-cb98-11ea-a23a-98af65266957:1,
322ff3eb-cb98-11ea-8a94-98af65266957:1
...
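
If several local GTIDs have to be skipped, the statements can be generated rather than typed by hand. The following Python sketch only builds the SQL text for the same sequence used above; it does not connect to any server. The function name empty_transaction_sql is hypothetical, and the sketch assumes you have already verified that each listed change exists on the replica, since an empty transaction skips it for good.

```python
def empty_transaction_sql(gtids):
    """Build the statements that inject one empty transaction per GTID.

    `gtids` is an iterable of (uuid, number) pairs. Hypothetical helper:
    it returns SQL text only and executes nothing.
    """
    statements = []
    for uuid, number in gtids:
        statements += [
            f"SET gtid_next='{uuid}:{number}';",
            "START TRANSACTION;",
            "COMMIT;",
        ]
    if statements:
        # Hand GTID assignment back to the server once we are done.
        statements.append("SET gtid_next='automatic';")
    return statements


# The GTID created by the RSU operation on the new replication source
to_skip = [("322ff3eb-cb98-11ea-8a94-98af65266957", 1)]

for line in empty_transaction_sql(to_skip):
    print(line)
```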

Conclusion

Operations in RSU mode create local GTIDs with a UUID different from the one used cluster-wide. They do not cause any errors until you need to perform a failover and replace the current replication source with another node.


by Sveta Smirnova via Percona Database Performance Blog
