Generating Numeric Sequences in MySQL

Generating Numeric Sequences in MySQL

Generating Numeric Sequences in MySQLWhat is the easiest way to generate a sequence of integers in MySQL? In other words, which “SELECT <something>” statement should I write to get 0, 1, 2, … N – 1?

This is the question I have been struggling with for years and it looks like I have finally got the answer (although I must admit I had to put some development efforts and add a few hundred lines to the server code).  Percona Server for MySQL 8.0.20-11 includes a new feature dedicated to solving exactly this problem.

However, before revealing all the secrets, let us first consider existing solutions. So, we want to get the following:

SELECT ???
+-------+
| value |
+-------+
|     0 |
|     1 |
|   ... |
| N - 1 |
+-------+
N rows in set (0.00 sec)

What are our options for doing so?

The Old School Way

Let us start with the most straightforward solutions.

UNION to the Rescue

It may sound a bit primitive but the simplest solution would be to combine the result of multiple SELECT statements into a single result set with UNION.

SELECT 0 AS value UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
+-------+
4 rows in set (0.00 sec)

Although this may work for smaller numbers, this solution is not very extensible.

Existing Table With a Unique Column

Let’s say that we already have a table with a unique column of any data type. For instance:

CREATE TABLE t1(id CHAR NOT NULL PRIMARY KEY);
Query OK, 0 rows affected (0.01 sec)

INSERT INTO t1 VALUES ('a'), ('b'), ('c'), ('d');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

SELECT * FROM t1;
+----+
| id |
+----+
| a  |
| b  |
| c  |
| d  |
+----+
4 rows in set (0.00 sec)

Now, we can join this table with a copy of itself and select the number of records for which id in the copy is less than or equal to the id in the original table.

SELECT COUNT(*) - 1 AS value FROM t1, t1 AS t2 WHERE t2.id <= t1.id GROUP BY t1.id;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
+-------+

The main drawback of this solution is its quadratic complexity on N that may cause significant resource utilization when N is big.

Session Variable Increment Within a SELECT

Provided that we already have a table t1 as in the previous example (although unique column constraint is not required here), we can join it with a single value SELECT that assigns initial value to a session variable. At the same time, for each record of the existing table, it will increment the value of this session variable.

SELECT (@val := @val + 1) - 1 AS value FROM t1, (SELECT @val := 0) AS tt;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
+-------+
4 rows in set, 2 warnings (0.00 sec)

This one is not bad: it’s extensible, linear complexity on N does not introduce unnecessary overhead, and the only drawback is a requirement to have an existing table.

Joining Multiple Views

We can always join several tables (or views) that contain more than one record to multiply the total number of records in the result set.

CREATE VIEW binary_v AS SELECT 0 AS v UNION ALL SELECT 1;
Query OK, 0 rows affected (0.00 sec)

SELECT * FROM binary_v ORDER BY v;
+---+
| v |
+---+
| 0 |
| 1 |
+---+
2 rows in set (0.00 sec)

SELECT b0.v + b1.v * 2 + b2.v * 4 AS value FROM binary_v b0, binary_v b1, binary_v b2 ORDER BY value;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
|     4 |
|     5 |
|     6 |
|     7 |
+-------+
8 rows in set (0.00 sec)

Using the same approach, by intersecting K instances of binary_v we can generate a sequence of 2^K values. Similarly, we can create a view for digits and as the result get 10^K values.

CREATE VIEW decimal_v AS SELECT 0 AS v UNION ALL SELECT 1
 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9;
Query OK, 0 rows affected (0.00 sec)

SELECT * FROM decimal_v ORDER by v;
+---+
| v |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
10 rows in set (0.00 sec)

SELECT d0.v + d1.v * 10 + d2.v * 100 AS value 
  FROM decimal_v d0, decimal_v d1, decimal_v d2 ORDER BY value;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|   ... |
|   998 |
|   999 |
+-------+
1000 rows in set (0.01 sec)

Although this seems to be pretty easy to understand, the execution plan of such a query is definitely far from being perfect.

Classicism

Stored Procedures

Basically, before selecting, we can create a temporary table and fill it with the required numbers utilizing a pre-created stored procedure.

CREATE TEMPORARY TABLE t1 (value BIGINT UNSIGNED NOT NULL PRIMARY KEY);
Query OK, 0 rows affected (0.01 sec)

CALL generate_seq(4);
Query OK, 1 row affected (0.01 sec)

SELECT * FROM t1 ORDER BY value;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
+-------+
4 rows in set (0.00 sec)

The stored procedure itself can be defined as follows:

delimiter |
CREATE PROCEDURE generate_seq(n BIGINT UNSIGNED)
BEGIN
  DECLARE i BIGINT UNSIGNED DEFAULT 0;
  WHILE i < n DO
    INSERT INTO t1 VALUES(i);
    SET i = i + 1;
  END WHILE;
END|
delimiter ;

The execution plan for this approach is almost perfect, the only drawback is the necessity to call generate_seq() before using the sequence.

Prepared Statements

Let us try to automate the UNION-based solution a bit. Instead of manually repeating the UNION clause we can generate this statement dynamically.

SET @generated_stmt = generate_seq_stmt(4);
Query OK, 0 rows affected (0.00 sec)

PREPARE stmt1 FROM @generated_stmt;
Query OK, 0 rows affected (0.00 sec)
Statement prepared

EXECUTE stmt1;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
+-------+
4 rows in set (0.00 sec)

DEALLOCATE PREPARE stmt1;
Query OK, 0 rows affected (0.00 sec)

Where generate_seq_stmt() can be defined as follows:

delimiter |
CREATE FUNCTION generate_seq_stmt(n BIGINT UNSIGNED) RETURNS TEXT DETERMINISTIC
BEGIN
  DECLARE res TEXT DEFAULT 'SELECT 0 AS value';
  DECLARE i BIGINT UNSIGNED DEFAULT 1;
  WHILE i < n DO
    SET res = CONCAT(res, ' UNION ALL SELECT ', i);
    SET i = i + 1;
  END WHILE;
  RETURN res;
END|
delimiter ;

This seems to work, but the main disadvantage of this solution is that it cannot be embedded directly into more complex queries (unless the latter are also converted into prepared statements).

Sequence Storage Engine (MariaDB)

MariaDB, on the other hand, took a completely different approach. Instead of extending SQL syntax and inventing new constructs, in version 10.0, they implemented Sequence Storage Engine that creates completely virtual, ephemeral tables automatically when users need them. All you have to do is execute the following:

SELECT * FROM seq_0_to_3;
+-----+
| seq |
+-----+
|   0 |
|   1 |
|   2 |
|   3 |
+-----+
4 rows in set (0.00 sec)

This seems nice and clear, however, the idea of reserving an almost infinite number of table names (seq_<int1>_<int2> and seq_<int1>_<int2>_<int3>) for each database is not very appealing. For instance, you cannot do:

CREATE TABLE seq_1_to_100 (col INT) ENGINE = InnoDB;
ERROR 1050 (42S01): Table 'seq_1_to_100' already exists

This approach requires a lot of unnecessary error handling and additional branching in the server code. Their documentation also mentions a couple of tricks (like ALTER TABLE seq_1_to_100 ENGINE = BLACKHOLE) you can apply to get around certain problems, but in general, they just add more complexity and corner cases.
Personally, I like the syntax SELECT * FROM <virtual_sequence_generator>, but implementing this construct as a new storage engine was not the best design decision.

Modern Way

Recursive Common Table Expressions (CTE)

In MySQL Server 8.0.1, Oracle introduced Common Table Expressions (CTE), both non-recursive and recursive.

WITH RECURSIVE seq AS (SELECT 0 AS value UNION ALL SELECT value + 1 FROM seq WHERE value < 3)
  SELECT * FROM seq;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
+-------+
4 rows in set (0.00 sec)

Although this solution can be used for a pretty wide range of upper bounds N and seems to be exactly what we need, I doubt it is readable/easy to understand.

Starting from MySQL Server 8.0.19, you can simplify this query a bit using LIMIT instead of WHERE.

WITH RECURSIVE seq AS (SELECT 0 AS value UNION ALL SELECT value + 1 FROM seq LIMIT 4)
  SELECT * FROM seq;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
+-------+
4 rows in set (0.00 sec)

However, both of those solutions have a limitation. By default, upper bound N cannot be very high.

WITH RECURSIVE seq AS (SELECT 0 AS value UNION ALL SELECT value + 1 FROM seq LIMIT 1002)
  SELECT * FROM seq;
ERROR 3636 (HY000): Recursive query aborted after 1001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.

Increasing cte_max_recursion_depth can shift this limitation though.

VALUES ROW(…), ROW(…) …

If you are a lucky one who has already upgraded to MySQL Server 8.0.19, you can use a VALUES Statement (a table value constructor that also functions as a standalone SQL statement).

VALUES ROW(0), ROW(1), ROW(2), ROW(3);
+----------+
| column_0 |
+----------+
|        0 |
|        1 |
|        2 |
|        3 |
+----------+
4 rows in set (0.00 sec)

This one is a bit easier than UNION-based but still lacks extensibility.

JSON_TABLE()

In MySQL Server 8.0.4 Oracle introduced a new JSON_TABLE() function that can extract data from a JSON document and return it as a relational table having the specified columns. But you may ask how can this even potentially be related to generating numerical sequences. Let’s consider the following example.

SELECT *
 FROM JSON_TABLE('[{"a":0},{"a":1},{"a":2},{"a":3}]',
                 "$[*]" COLUMNS(value BIGINT UNSIGNED PATH "$.a")
      ) AS tt;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
+-------+
4 rows in set (0.00 sec)

Here, we pass a simple JSON document to JSON_TABLE() function. That JSON-document is an array that consists of a series of objects with the predefined key “a”. We iterate over all elements of the array that match “$[*]” JSON path expression and extract the value of the “$.a” key into a column of type BIGINT UNSIGNED with the name value.
Although at this point the syntactic overhead is just overwhelming, I am starting to see the light at the end of the tunnel.

We can do better and improve our first JSON_TABLE() example.

SELECT tt.rowid - 1 AS value
  FROM JSON_TABLE('[{},{},{},{}]',
                  "$[*]" COLUMNS(rowid FOR ORDINALITY)
       ) AS tt;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
+-------+
4 rows in set (0.00 sec)

Here, we have an array of empty objects and use a special construct FOR ORDINALITY that is equivalent to specifying a column as AUTO_INCREMENT in a CREATE TABLE statement. Although I should note that we still have a predefined number of empty JSON objects in our array. That’s not enough – we have to go deeper.

SET @upper_bound = 4;
Query OK, 0 rows affected (0.00 sec)

SELECT tt.rowid - 1 AS value
  FROM JSON_TABLE(CONCAT('[{}', REPEAT(',{}', @upper_bound - 1), ']'),
                  "$[*]" COLUMNS(rowid FOR ORDINALITY)
       ) AS tt;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
+-------+
4 rows in set (0.00 sec)

We are almost there!

Post-Modern Way

Let us first summarize what we managed to achieve in the previous example.

  • Clear syntactic construct SELECT … FROM JSON_TABLE(…) AS tt (although function arguments are still non-trivial)
  • We made this construct generate a different number of rows depending on the value of the @upper_bound variable.
  • This construct can be used anywhere where any other derived table statement is allowed.
  • Not only may this construct be dependent on the value of a session variable, in case of joined tables, it may generate a different number of rows depending on the value from another table’s column.

SEQUENCE_TABLE()

In an ideal word it would be really great to have a function, say SEQUENCE_TABLE() that would behave identically to our JSON_TABLE(CONCAT(‘[{}’, REPEAT(‘,{}’, @upper_bound – 1), ‘]’), “$[*]” COLUMNS(rowid FOR ORDINALITY).
Unfortunately, MySQL Server 8.0.19 does not have such a function.

I must admit it would be really terrible from my side to finish this blog post at this point by saying “JSON_TABLE()-based solution is the best we’ve got so far. That’s all we can do, thanks for reading“, so I am not going to do so.
Instead, I am going to announce that in Percona Server 8.0.20-11 we implemented a new Percona-specific feature called ‘SEQUENCE_TABLE()’.

Long story short, now you can just write the following.

SELECT * FROM SEQUENCE_TABLE(4) AS tt;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
+-------+
4 rows in set (0.00 sec)

And “yes”, it is as easy and straightforward as you see it. Now, let us consider a bit more complicated examples.
What if we want to generate a sequence from 4 to 7 inclusive?

SELECT value AS result FROM SEQUENCE_TABLE(8) AS tt WHERE value >= 4;
+--------+
| result |
+--------+
|      4 |
|      5 |
|      6 |
|      7 |
+--------+
4 rows in set (0.00 sec)

Alternatively, you can write:

SELECT value + 4 AS result FROM SEQUENCE_TABLE(4) AS tt;
+--------+
| result |
+--------+
|      4 |
|      5 |
|      6 |
|      7 |
+--------+
4 rows in set (0.00 sec)

Another example, even numbers from 0 to 6 inclusive:

SELECT value AS result FROM SEQUENCE_TABLE(8) AS tt WHERE value % 2 = 0;
+--------+
| result |
+--------+
|      0 |
|      2 |
|      4 |
|      6 |
+--------+
4 rows in set (0.00 sec)

Alternatively:

SELECT value * 2 AS result FROM SEQUENCE_TABLE(4) AS tt;
+--------+
| result |
+--------+
|      0 |
|      2 |
|      4 |
|      6 |
+--------+
4 rows in set (0.00 sec)

Yet another example, numbers from 0 to 3 inclusive in reverse order:

SELECT value AS result FROM SEQUENCE_TABLE(4) AS tt ORDER BY value DESC;
+--------+
| result |
+--------+
|      3 |
|      2 |
|      1 |
|      0 |
+--------+
4 rows in set (0.00 sec)

Alternatively:

SELECT 3 - value AS result FROM SEQUENCE_TABLE(4) AS tt;
+--------+
| result |
+--------+
|      3 |
|      2 |
|      1 |
|      0 |
+--------+
4 rows in set (0.00 sec)

SEQUENCE_TABLE() can also be used to generate a set of random numbers:

SELECT FLOOR(RAND() * 100) AS result FROM SEQUENCE_TABLE(4) AS tt;
+--------+
| result |
+--------+
|      6 |
|     37 |
|     67 |
|     25 |
+--------+
4 rows in set (0.00 sec)

Please notice that usage patterns for SEQUENCE_TABLE() are not limited to numbers only. We can, for instance, generate a list of predefined string literals (convert a row into a column, if you wish).

SELECT ELT(value + 1, 'a', 'b', 'c', 'd') AS result FROM SEQUENCE_TABLE(4) AS tt;
+--------+
| result |
+--------+
| a      |
| b      |
| c      |
| d      |
+--------+
4 rows in set (0.00 sec)

Or the same but with repeating values:

SELECT ELT(value % 4 + 1, 'a', 'b', 'c', 'd') AS result FROM SEQUENCE_TABLE(8) AS tt;
+--------+
| result |
+--------+
| a      |
| b      |
| c      |
| d      |
| a      |
| b      |
| c      |
| d      |
+--------+
8 rows in set (0.00 sec)

Finally, this table function may also help with generating pseudo-random string values:

SELECT MD5(value) AS result FROM SEQUENCE_TABLE(4) AS tt;
+----------------------------------+
| result                           |
+----------------------------------+
| cfcd208495d565ef66e7dff9f98764da |
| c4ca4238a0b923820dcc509a6f75849b |
| c81e728d9d4c2f636f067f89cc14862c |
| eccbc87e4b5ce2fe28308fd9f2a7baf3 |
+----------------------------------+
4 rows in set (0.00 sec)

This construct can be used to fill an existing table:

CREATE TABLE t1 (id BIGINT UNSIGNED);
Query OK, 0 rows affected (0.00 sec)

INSERT INTO t1 SELECT * FROM SEQUENCE_TABLE(4) AS tt;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

SELECT * FROM t1;
+------+
| id   |
+------+
|    0 |
|    1 |
|    2 |
|    3 |
+------+
4 rows in set (0.00 sec)

Or even to create a new one with pre-filled values:

CREATE TABLE t1 AS SELECT * FROM SEQUENCE_TABLE(4) AS tt;
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

SELECT * FROM t1;
+-------+
| value |
+-------+
|     0 |
|     1 |
|     2 |
|     3 |
+-------+
4 rows in set (0.00 sec)

I am pretty sure there are a lot of other use cases (say, generating Fibonacci numbers or printing all prime numbers in a given range) and you will definitely be able to find a lot of your own.

Conclusion

In this blog post, I tried to show that SQL is a pretty powerful language and allows us to do a lot of exotic things. However, it’s very frustrating that in some very simple cases the only way to do what you want is using heavy artillery. The example with SEQUENCE_TABLE() that I demonstrated shows that if adding new functionality by extending server code is the only option remaining, don’t be scared – it is OK to do this, especially when you know what to do.


Our solution brief “Get Up and Running with Percona Server for MySQL” outlines setting up a MySQL® database on-premises using Percona Server for MySQL. It includes failover and basic business continuity components.

Download PDF


by Yura Sorokin via Percona Database Performance Blog

Comments

Popular posts from this blog