Rate Limit (Throttle) for MySQL with ProxySQL

Rate Limit (Throttle) for MySQL with ProxySQL

Rate Limit (Throttle) for MySQL with ProxySQLMaybe one of the more “obscure” operations when dealing with replica lag, or, in general, when one needs to control writes to the database, is the Rate limit. It’s also lately one of the most popular conversations around the community.

But what is it? In plain words: holding up queries for a while, giving air to the replicas to breath and catch up. Something similar to the Galera’s Flow Control mechanism, although flow control, when it kicks in, stops all the writes while the nodes catch up. With a throttle no write is stopped, just delayed.

There are several ways to do this. A popular tool is Freno but this is also something that can be achieved with ProxySQL. Let’s see how.

Delay

ProxySQL has a variable called “mysql-default_query_delay”  that is pretty self-explanatory. It will add a delay, in milliseconds, to all the queries passing through ProxySQL.

Now, the trick is to monitor the replica lag and apply the delay. Something that in this case is solved with a small bash script:

#/bin/bash

set -o pipefail

proxysqlhost=127.0.0.1
proxysqladmport=6032
proxysqluser=admin
proxysqlpass=admin

replicahost=mysql2
replicauser=pmm
replicapass=pmm

lagThreshold=3


function check_lag() {
         lag=$(mysql -h$replicahost -u$replicauser -p$replicapass -e "SHOW SLAVE STATUS\G" | grep  "Seconds_Behind_Master" | awk -F\: '{print $2}' 2>&1)
}

function modifyProxysqlVariable () {
        out=$(mysql -u$proxysqluser -p$proxysqlpass -h$proxysqlhost -P6032 -e"set mysql-default_query_delay = $delay ; LOAD MYSQL VARIABLES TO RUNTIME;" 2>&1)
        echo $out
}

function destructor () {
        delay=0
        modifyProxysqlVariable
        echo "bye"
}

trap destructor EXIT INT TERM

while true; do

        check_lag
        echo $lag
        if [ $lag -ge $lagThreshold ]; then
                delay=1
                modifyProxysqlVariable
        elif [ $lag -eq 0 ]; then
                delay=0
                modifyProxysqlVariable
        fi
        sleep 0.001
done

Replica lag will be monitored on a loop, and when it’s bigger than the threshold, 1mS of delay will be added overall. And after lag is under control, the delay is removed.

For testing purposes, I have set up a basic Primary/Replica environment, running with sysbench on high concurrency setup to make the replica lag on purpose. I collected some graphs with Percona Monitoring and Management,  and here’s how it looks:

Rate Limit MySQL

The above graph shows the lag constantly growing until the script starts to run and the replica eventually catches up. Fantastic! Until you see the Questions graphs and it looks very….flat

throttle MySQL

So what happened is that until the lag reached the threshold (in this case, 3 seconds) all the traffic was delayed. Once the lag is under control, traffic is back but not at 100% since the throttle script continues to run. So it maintains the replica up to date at a cost of rate-limiting the writes.

And the above graph shows the same (questions) but from the ProxySQL dashboard perspective.

Now, what’s the alternative?

Delay on Rules

ProxySQL query rules can be set with delays per rule. Since the idea is to add less overhead, a good approach is to check which query rule is the least used and see if by adding a delay to it we can control the lag.

Admin> SELECT hits, mysql_query_rules.rule_id, match_digest, match_pattern, replace_pattern, cache_ttl, apply, delay FROM mysql_query_rules NATURAL JOIN stats.stats_mysql_query_rules ORDER BY mysql_query_rules.rule_id;
+-----------+---------+--------------+-----------------------+-----------------+-----------+-------+-------+
| hits      | rule_id | match_digest | match_pattern         | replace_pattern | cache_ttl | apply | delay |
+-----------+---------+--------------+-----------------------+-----------------+-----------+-------+-------+
| 0         | 1       | NULL         | ^SELECT .* FOR UPDATE | NULL            | NULL      | 0     | NULL  |
| 609897074 | 2       | NULL         | ^SELECT .*            | NULL            | NULL      | 0     | NULL  |
| 87128148  | 3       | NULL         | ^UPDATE .*            | NULL            | NULL      | 0     | 0     |
| 43561905  | 4       | NULL         | ^INSERT .*            | NULL            | NULL      | 0     | 0     |
| 43562935  | 5       | NULL         | ^DELETE .*            | NULL            | NULL      | 0     | 0     |
| 0         | 6       | NULL         | ^REPLACE .*           | NULL            | NULL      | 0     | NULL  |
+-----------+---------+--------------+-----------------------+-----------------+-----------+-------+-------+
6 rows in set (0.01 sec)

From that query, we can see that the rule with id 5, the one that routes the DELETEs, is the least used from all the rules that hit the Primary.

We just need to make a small adjustment to the script to modify the mysql_query_rules table, by adding a proper function. The full script with the modification looks like this:

#/bin/bash

set -o pipefail

proxysqlhost=127.0.0.1
proxysqladmport=6032
proxysqluser=admin
proxysqlpass=admin

replicahost=mysql2
replicauser=pmm
replicapass=pmm

lagThreshold=3

ruleId=5

function destructor () {
        delay=0
        modifyRuleDelay
        echo "bye"
}

function check_lag() {
         lag=$(mysql -h$replicahost -u$replicauser -p$replicapass -e "SHOW SLAVE STATUS\G" | grep  "Seconds_Behind_Master" | awk -F\: '{print $2}' 2>&1)
}

function modifyRuleDelay () {
        out=$(mysql -u$proxysqluser -p$proxysqlpass -h$proxysqlhost -P6032 -e"update mysql_query_rules set delay = $delay where rule_id = $ruleId ; LOAD MYSQL QUERY RULES TO RUNTIME;" 2>&1)
}

trap destructor EXIT INT TERM

while true; do

        check_lag
        echo $lag
        if [ $lag -ge $lagThreshold ]; then
                delay=1
                modifyRuleDelay
        elif [ $lag -eq 0 ]; then
                delay=0
                modifyRuleDelay
        fi
        sleep 0.001
done

Now, running the script we can see that the same behavior happens. Lag is controlled:

rate limit for Proxsql

What about the Questions? Well, now the drop is not complete, and some traffic can still be routed and when the replica catches up, the behavior is the same as a small reduction, being that the cost of keeping the lag controlled.

ProxySQL reports the same.

Now, we have talked about controlling lag but pretty much any status variable can be monitored. Perhaps your issues are with contention (Threads_running) or with IO-bound issues (InnoDB checkpoint age). Regardless of the variable, rate limit can be achievable with ProxySQL.


by Daniel Guzmán Burgos via Percona Database Performance Blog

Comments