Percona Monthly Bug Report: October 2020


At Percona, we operate on the premise that full transparency makes a product better. We strive to build the best open-source database products, but also to help you manage any issues that arise in any of the databases that we support. And, in true open-source form, we report back on any issues or bugs you might encounter along the way.

We constantly update our bug reports and monitor other boards to ensure we have the latest information, but we wanted to make it a little easier for you to keep track of the most critical ones. This monthly post is a central place to get information on the most noteworthy open and recently resolved bugs. 

In this October 2020 edition of our monthly bug report, we have the following list of bugs:

Percona Server/MySQL Bugs

PS-7300 (MySQL#98869): Session temporary tablespace truncation on connection disconnect causes high CPU usage. This only affects MySQL servers with a large buffer pool (tested with a buffer pool size of ~150GB and above).

Affects Version/s: 8.0  [Tested/Reported version 8.0.19, 8.0.20, 8.0.21]

Fixed Version/s: PS-8.0.22

 

PS-5641 (MySQL#95863): Deadlock on MTS replica while running administrative commands. This is a design flaw in the way MTS locks interact with non-replication locks like MDL or ACL locks, and the lack of deadlock detection infrastructure in replication.  

There are three ways of hitting this issue:

  • SET GLOBAL read_only=[ON/OFF]
  • SET GLOBAL super_read_only=[ON/OFF]
  • FLUSH TABLE WITH READ LOCK

Affects Version/s: 5.7, 8.0  [Tested/Reported version 5.7.26, 8.0.15, 8.0.20]

 

MySQL#91977: Dropping a large table causes semaphore waits, due to which no other work is possible.

Affects Version/s: 5.7,8.0  [Tested/Reported version 5.7.23, 8.0.12]

Running DROP/ALTER on a large table can trigger this bug. The DROP/ALTER query gets stuck in the ‘checking permissions’ state and may later crash mysqld due to a long semaphore wait. This is critical since it can result in unplanned downtime. The issue is also evident with the pt-online-schema-change tool while performing ALTER operations on large tables.

This is a design flaw: because of InnoDB's protection against long semaphore waits, all naturally long-running DDL operations are at risk. To address this issue, there is a feature request (MySQL#92044) to make the long semaphore wait timeout, after which the server dies, configurable.

 

Percona XtraDB Cluster

PXC-3366: After upgrading a PXC 5.7 node with SST traffic enabled to PXC 8.0, no status or variables are shown; the output is always empty.

Also, taking a backup of the upgraded PXC 8.0 node with xtrabackup fails with a crash due to this bug.

Affects Version/s: 8.0   [Tested/Reported 8.0.19]

 

PXC-3456: If the wsrep_node_address value is specified as a hostname, the joiner node fails to join using SST. The workaround is to use an IP address for wsrep_node_address.
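As a rough sketch of that workaround, the snippet below resolves a hostname to an IPv4 address and formats the corresponding configuration line. The helper name wsrep_node_address_line and the sample address are illustrative assumptions, not part of PXC, and the lookup relies on whatever resolver the node is configured with.

```python
import socket


def wsrep_node_address_line(host: str) -> str:
    """Return a my.cnf line that sets wsrep_node_address to an IPv4 address.

    Hypothetical helper for illustration; not part of PXC or its tooling.
    """
    # gethostbyname returns an IPv4 literal unchanged and resolves hostnames
    # through the system resolver (it raises socket.gaierror on failure).
    ip = socket.gethostbyname(host)
    return f"wsrep_node_address={ip}"


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address used as a placeholder.
    print(wsrep_node_address_line("192.0.2.10"))
```

Resolving once and writing the literal address into the configuration keeps the setting independent of name resolution at SST time, which is the point of the workaround.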

Affects Version/s: 8.0   [Tested/Reported 8.0.20]

Fixed Version: 8.0.20-11.3

 

PXC-3396: SST failing because of the wrong binlog name.

Setting the log-bin name explicitly with the data-dir path will result in SST failure.

The workaround is to use a directory other than the data dir for the binlogs; SST then works fine. The issue also does not occur with the default binlog settings, which create binlogs in the data dir with the name “binlog.*”.

Affects Version/s: 8.0   [Tested/Reported 8.0.20-11.2]

 

PXC-3418: Cluster node locked up with ALTER TABLE (even when using pt-online-schema-change) and concurrent RW load. It is possible to hit a deadlock on the writer node when ALTER TABLE is run a few times on another node while the writer node handles a concurrent read/write load.

Affects Version/s: 5.7   [Tested/Reported 5.7.31]

 

PXC-3373: [ERROR] WSREP: Certification exception: Unsupported key prefix: ^B: 71 (Protocol error) at galera/src/key_set.cpp:throw_bad_prefix():152

Affects Version/s: 5.7   [Tested/Reported 5.7.30]

IST to a lower-version node fails with this error when a write load is running on the donor node. A rolling upgrade, the most commonly used upgrade method, leaves the cluster with PXC nodes on different versions for a while, and in such cases users can easily run into this issue.

Possible workarounds:

  • Upgrade the PXC nodes so that the whole cluster runs the same version.
  • Stop write load on the donor node while IST is running.

 

Percona XtraBackup

PXB-2237: PXB crashes during a backup when an encrypted table is updated

Affects Version/s:  2.4  [Tested/Reported version 2.4.20]

Databases with encrypted tables are affected by this bug. As a workaround, taking the backup during non-peak hours could avoid the crash; alternatively, block write access to the encrypted tables while the backup runs.

 

PXB-2178: Restoring datadir from partial backup results in an inconsistent data dictionary

Affects Version/s:  8.0  [Tested/Reported version 8.0.11]

As a result of this bug, after the restore you will see additional database/s that were not part of the partial backup. The issue is evident only in XtraBackup 8.0, due to the new data dictionary implementation in MySQL 8.0; it is not reproducible with XtraBackup 2.4.

The workaround for this issue is to use “DROP DATABASE IF EXISTS” for dropping unwanted extra database/s.

 

Percona Toolkit

PT-1570: Percona Toolkit tools fail to detect columns with the word GENERATED as part of the comment.

The issue is actually much more severe for other Percona Toolkit tools that use the same TableParser.pm package, such as pt-online-schema-change and pt-archiver. With the old package, the tool skips any column that has the word ‘generated’ in its comment during the copying process, which results in data loss.

Therefore, it is very important to use Percona Toolkit 3.0.11 or higher to avoid this serious problem.
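To illustrate this class of bug, here is a minimal Python sketch. It is not the TableParser.pm code; the function names and sample column definitions are made up for the example. It contrasts a check that searches the whole column definition for the keyword with one that removes the COMMENT clause first.

```python
import re

# Matches a COMMENT '...' clause; '' is an escaped quote inside the string.
# Backslash-escaped quotes are not handled in this simplified sketch.
COMMENT_RE = re.compile(r"\s+COMMENT\s+'(?:[^']|'')*'", re.IGNORECASE)
KEYWORD_RE = re.compile(r"\bGENERATED\b", re.IGNORECASE)


def naive_is_generated(column_def: str) -> bool:
    """Buggy check: searches the whole definition, comment text included."""
    return KEYWORD_RE.search(column_def) is not None


def is_generated(column_def: str) -> bool:
    """Safer check: drop the COMMENT clause before looking for the keyword."""
    without_comment = COMMENT_RE.sub("", column_def)
    return KEYWORD_RE.search(without_comment) is not None


# Hypothetical column definitions, as they might appear in SHOW CREATE TABLE.
plain = "`note` varchar(64) DEFAULT NULL COMMENT 'generated by the import job'"
virtual = "`total` int GENERATED ALWAYS AS (`a` + `b`) VIRTUAL COMMENT 'sum'"

if __name__ == "__main__":
    print(naive_is_generated(plain), is_generated(plain), is_generated(virtual))
```

The naive check classifies the ordinary `note` column as generated, so a copy routine built on it would skip that column, which matches the data loss described above.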

Affects Version/s:  3.0.10

Fixed Version/s: 3.0.11

 

PT-1747: pt-online-schema-change was bringing the database into a broken state when applying the “rebuild_constraints” foreign keys modification method if any of the child tables were blocked by the metadata lock.

Affects Version/s:  3.x   [Tested/Reported version 3.0.13]

This is a critical bug since it can cause data inconsistency in the user environment. It potentially affects anyone who rebuilds tables with foreign keys.

 

PT-169: pt-online-schema-change removes the old and new tables.

This is a critical bug: it destroys the original table when the pt-online-schema-change tool fails. For example, in one case an attempt to alter a data type on a parent table referenced by a child table (FK) not only failed but also resulted in the loss (drop) of both tables.

Affects Version/s:  3.2.1   [Tested/Reported version 3.2.1]

Fixed version: 3.3.0

 

PT-1853: pt-online-schema-change doesn’t handle self-referencing foreign keys properly

When using pt-osc to change a table that has a self-referencing FK, pt-osc creates the FK pointing to the old table instead of the _new table. Because of this, pt-osc needs to rebuild the FK after swapping the tables (dropping the FK and recreating it pointing to the _new table). This can cause issues because the INPLACE algorithm is supported only when foreign_key_checks is disabled; otherwise, only the COPY algorithm is supported.

Affects Version/s:  3.x  [Tested/Reported version 3.2.0]

Fixed Version/s: 3.2.1

This affects anyone who rebuilds tables with a self-referencing foreign key.

 

PMM  [Percona Monitoring and Management]

PMM-4547: MongoDB dashboard replication lag count incorrect.

With a hidden replica (with or without a delay specified), another replica will show a delay of the maximum unsigned integer value (about 136 years).

The problem is that MongoDB sometimes reports (in getDiagnosticData) that the primary is behind the secondary. Since timestamps are unsigned integers, subtracting a bigger number from a smaller one overflows, which is why we see a lag of 100+ years.
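The arithmetic can be sketched in a few lines of Python. This is an illustration only, assuming 32-bit timestamps in seconds (which is consistent with the "about 136 years" figure); the function names and sample timestamps are made up, and the clamped variant is just one possible guard, not necessarily how PMM fixed it.

```python
UINT32_MASK = 0xFFFFFFFF  # emulate a 32-bit unsigned integer in Python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def lag_unsigned(primary_ts: int, secondary_ts: int) -> int:
    """Unsigned 32-bit subtraction: wraps when secondary_ts > primary_ts."""
    return (primary_ts - secondary_ts) & UINT32_MASK


def lag_clamped(primary_ts: int, secondary_ts: int) -> int:
    """One possible guard: treat a negative difference as zero lag."""
    return max(primary_ts - secondary_ts, 0)


# Hypothetical sample: the primary is reported one second behind the secondary.
primary, secondary = 1_600_000_000, 1_600_000_001
wrapped = lag_unsigned(primary, secondary)

if __name__ == "__main__":
    print(wrapped, "seconds, roughly", int(wrapped / SECONDS_PER_YEAR), "years")
    print(lag_clamped(primary, secondary), "seconds with clamping")
```

A one-second difference in the wrong direction wraps to 4294967295 seconds, which is where the 136-year lag on the dashboard comes from.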

Affects Version/s:  2.9  [Tested/Reported version 2.0,2.9.0]

Fixed Version: PMM-2.12.0

 

PMM-6733: QAN explain, example, table not working properly when using --query-source=perfschema.

This is a documentation bug. In the reported case, the Examples, Explain, and Tables tabs in PMM QAN showed an error because the events_statements_history consumer was not enabled in performance_schema (in MySQL 5.6 it is disabled by default). Enabling events_statements_history fixes the issue for new queries.

Affects Version/s:  Documentation Bug

Fixed version/s: 2.10.0

 

PMM-5823: pmm-server log download and api to get the version failed with timeout

This occurs at irregular intervals and only affects PMM Docker installations where the pmm-server Docker container has no external internet access. The issue is only visible for a while (around 5-10 minutes) after starting pmm-server; after that, it no longer appears.

Affects Version/s:  2.x  [Tested/Reported version 2.2]

Fixed version: 2.12.0

 

Summary

We welcome community input and feedback on all our products. If you find a bug or would like to suggest an improvement or a feature, learn how in our post, How to Report Bugs, Improvements, New Feature Requests for Percona Products.

For the most up-to-date information, be sure to follow us on Twitter, LinkedIn, and Facebook.

Quick References:

Percona JIRA  

MySQL Bug Report

Report a Bug in a Percona Product

___

About Percona:

As the only provider of distributions for all three of the most popular open source databases—PostgreSQL, MySQL, and MongoDB—Percona provides expertise, software, support, and services no matter the technology.

Whether it's enabling developers or DBAs to realize value faster with tools, advice, and guidance, or making sure applications can scale and handle peak loads, Percona is here to help.

Percona is committed to being open source and preventing vendor lock-in. Percona contributes all changes to the upstream community for possible inclusion in future product releases.


by Lalit Choudhary via Percona Database Performance Blog
