Introduction to MySQL 8.0 Recursive Common Table Expression (Part 2)

MySQL 8.0 Recursive Common Table Expression

MySQL 8.0 Recursive Common Table ExpressionsThis is the second part of a two-articles series. In the first part, we introduced the Common Table Expression (CTE), a new feature available on MySQL 8.0 as well as Percona Server for MySQL 8.0. In this article, we’ll present the Recursive Common Table Expression. SQL is generally poor at recursive structures, but it is now possible on MySQL to write recursive queries. Before MySQL 8.0, recursion was possible only by creating stored routines.

What is a Recursive Common Table Expression?

A recursive CTE is one having a subquery that refers to its own name. It is particularly useful in the following cases:

  • To generate series
  • Hierarchical or tree-structured data traversal

Let’s see the main components of a recursive CTE. The following is the syntax to create it:

WITH RECURSIVE cte AS (
   initial_query    -- "seed" member
   UNION ALL
   recursive_query    -- recusive member that references to the same CTE name
)
SELECT * FROM cte;    -- main query

First of all, the clause RECURSIVE is mandatory, and then there are two mandatory components. The seed member is the initial query, the one that will be executed at the first iteration. The recursive member is the query containing the reference to the same CTE name. This second component will generate all the remaining items of the main query.

The process stops when an iteration does not generate any rows. Be aware of that in order to avoid generating a lot of iterations that can exhaust the memory.

It is important for recursive CTEs that the recursive member includes a condition to terminate the recursion. As a development technique you can force termination by placing a limit on execution time:

  • The cte_max_recursion_depth system variable enforces a limit on the number of recursion levels for CTEs. The server terminates the execution of any CTE that recurses more levels than the value of this variable. The default value is 1000.
  • The max_execution_time system variable enforces an execution timeout for SELECT statements executed within the current session.
  • The MAX_EXECUTION_TIME optimizer hint enforces a per-query execution timeout for the SELECT statement in which it appears.

 

Generate Series

Let’s see now some simple usage of Recursive CTE to generate series.

One-Level Sequence

First, create a simple series of integer numbers from 1 to 10. This a one-level sequence because the N+1 value is a function of the previous one N only.

WITH RECURSIVE natural_sequence AS
  ( SELECT 1 AS n       -- seed member: our sequence starts from 1
    UNION ALL
    SELECT n + 1 FROM natural_sequence    -- recursive member: reference to itself
    WHERE n < 10                          -- stop condition
  )
SELECT * FROM natural_sequence;           -- main query
+------+
| n    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
|    7 |
|    8 |
|    9 |
|   10 |
+------+

# let's see what happen if we miss the stop condition
mysql> WITH RECURSIVE natural_sequence AS ( SELECT 1 AS n   UNION ALL SELECT n + 1 FROM natural_sequence   ) SELECT * FROM natural_sequence;
ERROR 3636 (HY000): Recursive query aborted after 1001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.

Another typical example is calculating the factorial.

mysql> WITH RECURSIVE factorial(n, fact) AS ( 
          SELECT 0, 1 
          UNION ALL  
          SELECT n + 1, fact * (n+1)  
          FROM factorial 
          WHERE n < 20 ) 
       SELECT * from factorial;
+------+---------------------+
| n    | fact                |
+------+---------------------+
|    0 |                   1 |
|    1 |                   1 |
|    2 |                   2 |
|    3 |                   6 |
|    4 |                  24 |
|    5 |                 120 |
|    6 |                 720 |
|    7 |                5040 |
|    8 |               40320 |
|    9 |              362880 |
|   10 |             3628800 |
|   11 |            39916800 |
|   12 |           479001600 |
|   13 |          6227020800 |
|   14 |         87178291200 |
|   15 |       1307674368000 |
|   16 |      20922789888000 |
|   17 |     355687428096000 |
|   18 |    6402373705728000 |
|   19 |  121645100408832000 |
|   20 | 2432902008176640000 |
+------+---------------------+

 

Two-Level Sequence

In this case, we would like to create a two-level sequence where the N+2 value is a function of the two previous values N+1 and N.

The typical example here is the Fibonacci Series; each number is the sum of the two preceding ones, starting from 0 and 1.  Let’s calculate the first 20 items of the Fibonacci series.

mysql> WITH RECURSIVE fibonacci (n, fib_n, next_fib_n) AS (   
          SELECT 1, 0, 1   
          UNION ALL   
          SELECT n + 1, next_fib_n, fib_n + next_fib_n     
          FROM fibonacci 
          WHERE n < 20 ) 
       SELECT * FROM fibonacci;
+------+-------+------------+
| n    | fib_n | next_fib_n |
+------+-------+------------+
|    1 |     0 |          1 |
|    2 |     1 |          1 |
|    3 |     1 |          2 |
|    4 |     2 |          3 |
|    5 |     3 |          5 |
|    6 |     5 |          8 |
|    7 |     8 |         13 |
|    8 |    13 |         21 |
|    9 |    21 |         34 |
|   10 |    34 |         55 |
|   11 |    55 |         89 |
|   12 |    89 |        144 |
|   13 |   144 |        233 |
|   14 |   233 |        377 |
|   15 |   377 |        610 |
|   16 |   610 |        987 |
|   17 |   987 |       1597 |
|   18 |  1597 |       2584 |
|   19 |  2584 |       4181 |
|   20 |  4181 |       6765 |
+------+-------+------------+

 

Date Sequence

Let’s consider having a simple table containing our shop’s sales such as the following:

CREATE TABLE sales (
id INT AUTO_INCREMENT PRIMARY KEY,
order_date DATE,
product VARCHAR(20),
price DECIMAL(10,2));

# populate the table
INSERT INTO sales(order_date, product, price) 
VALUES('2020-02-01','DVD PLAYER',100.50),('2020-02-01','TV',399.99),('2020-02-02','LAPTOP',1249.00),
('2020-02-04','DISHWASHER',500.00),('2020-02-04','TV',699.00),('2020-02-06','LAPTOP',990.50),('2020-02-06','HAIRDRYER',29.90),
('2020-02-06','GAME CONSOLE',299.00),('2020-02-08','BOOK',9.00),('2020-02-08','REFRIGERATOR',600.00);

# let's run a query to generate the sales report by day
SELECT order_date, SUM(price) AS sales
FROM sales 
GROUP BY order_date;
+------------+---------+
| order_date | sales   |
+------------+---------+
| 2020-02-01 |  500.49 |
| 2020-02-02 | 1249.00 |
| 2020-02-04 | 1199.00 |
| 2020-02-06 | 1319.40 |
| 2020-02-07 |  609.00 |
+------------+---------+

Notice, however, that our sales report has missing dates: Feb 3rd and Feb 5th. We would like to generate a report including even the dates with no sales.

A recursive CTE can help.

WITH RECURSIVE dates(date) AS (
   SELECT '2020-02-01' 
   UNION ALL
   SELECT date + INTERVAL 1 DAY 
   FROM dates
   WHERE date < '2020-02-07' )
SELECT dates.date, COALESCE(SUM(price), 0) sales
FROM dates LEFT JOIN sales ON dates.date = sales.order_date
GROUP BY dates.date;
+------------+---------+
| date       | sales   |
+------------+---------+
| 2020-02-01 |  500.49 |
| 2020-02-02 | 1249.00 |
| 2020-02-03 |    0.00 |
| 2020-02-04 | 1199.00 |
| 2020-02-05 |    0.00 |
| 2020-02-06 | 1319.40 |
| 2020-02-07 |  609.00 |
+------------+---------+

 

Hierarchical Data Traversal

Let’s take a look now at some other use cases for recursive CTE: a simple tree for an Org Chart, a more complex tree for family genealogy and a graph for train paths, of the following picture.

A Simple Tree: Org Chart

 

# create the table
CREATE TABLE orgchart(
id INT PRIMARY KEY,
name VARCHAR(20),
role VARCHAR(20),
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES orgchart(id));

# insert the rows
INSERT INTO orgchart VALUES(1,'Matthew','CEO',NULL), 
(2,'Caroline','CFO',1),(3,'Tom','CTO',1),
(4,'Sam','Treasurer',2),(5,'Ann','Controller',2),
(6,'Anthony','Dev Director',3),(7,'Lousie','Sys Admin',3),
(8,'Travis','Senior DBA',3),(9,'John','Developer',6),
(10,'Jennifer','Developer',6),(11,'Maria','Junior DBA',8);

# let's see the table, The CEO has no manager, so the manager_id is set to NULL
SELECT * FROM orgchat;
+----+----------+--------------+------------+
| id | name     | role         | manager_id |
+----+----------+--------------+------------+
|  1 | Matthew  | CEO          |       NULL |
|  2 | Caroline | CFO          |          1 |
|  3 | Tom      | CTO          |          1 |
|  4 | Sam      | Treasurer    |          2 |
|  5 | Ann      | Controller   |          2 |
|  6 | Anthony  | Dev Director |          3 |
|  7 | Lousie   | Sys Admin    |          3 |
|  8 | Travis   | Senior DBA   |          3 |
|  9 | John     | Developer    |          6 |
| 10 | Jennifer | Developer    |          6 |
| 11 | Maria    | Junior DBA   |          8 |
+----+----------+--------------+------------+

 

Let’s run some queries using recursive CTE to traverse this kind of hierarchy.

# find the reporting chain for all the employees
mysql> WITH RECURSIVE reporting_chain(id, name, path) AS ( 
          SELECT id, name, CAST(name AS CHAR(100))  
          FROM org_chart 
          WHERE manager_id IS NULL 
          UNION ALL 
          SELECT oc.id, oc.name, CONCAT(rc.path,' -> ',oc.name) 
          FROM reporting_chain rc JOIN org_chart oc ON rc.id=oc.manager_id) 
       SELECT * FROM reporting_chain;
+------+----------+---------------------------------------+
| id   | name     | path                                  |
+------+----------+---------------------------------------+
|    1 | Matthew  | Matthew                               |
|    2 | Caroline | Matthew -> Caroline                   |
|    3 | Tom      | Matthew -> Tom                        |
|    4 | Sam      | Matthew -> Caroline -> Sam            |
|    5 | Ann      | Matthew -> Caroline -> Ann            |
|    6 | Anthony  | Matthew -> Tom -> Anthony             |
|    7 | Lousie   | Matthew -> Tom -> Lousie              |
|    8 | Travis   | Matthew -> Tom -> Travis              |
|    9 | John     | Matthew -> Tom -> Anthony -> John     |
|   10 | Jennifer | Matthew -> Tom -> Anthony -> Jennifer |
|   11 | Maria    | Matthew -> Tom -> Travis -> Maria     |
+------+----------+---------------------------------------+

Please note the usage of the CAST function on the “seed” member of the CTE. This was done on purpose. Let’s look what happens in case you don’t use the CAST function:

mysql> WITH RECURSIVE reporting_chain(id, name, path) AS ( 
          SELECT id, name, name 
          FROM org_chart 
          WHERE manager_id IS NULL 
          UNION ALL 
          SELECT oc.id, oc.name, CONCAT(rc.path,' -> ',oc.name) 
          FROM reporting_chain rc JOIN org_chart oc ON rc.id=oc.manager_id) 
       SELECT * FROM reporting_chain;
ERROR 1406 (22001): Data too long for column 'path' at row 1

Why an error? The query is, in theory, correct, but the problem is that the type of column path is determined from the non-recursive SELECT only, and so it is CHAR(7) (Matthew length). On the recursive part of the CTE it would cause a character truncation, so: error!

Let’s look at a query to traverse the tree and calculate the level of the employees in the Org Chart.

mysql> WITH RECURSIVE reporting_chain(id, name, path, level) AS ( 
          SELECT id, name, CAST(name AS CHAR(100)), 1  
          FROM org_chart 
          WHERE manager_id IS NULL 
          UNION ALL 
          SELECT oc.id, oc.name, CONCAT(rc.path,' -> ',oc.name), rc.level+1 
          FROM reporting_chain rc JOIN org_chart oc ON rc.id=oc.manager_id) 
       SELECT * FROM reporting_chain ORDER BY level;
+------+----------+---------------------------------------+-------+
| id   | name     | path                                  | level |
+------+----------+---------------------------------------+-------+
|    1 | Matthew  | Matthew                               |     1 |
|    2 | Caroline | Matthew -> Caroline                   |     2 |
|    3 | Tom      | Matthew -> Tom                        |     2 |
|    4 | Sam      | Matthew -> Caroline -> Sam            |     3 |
|    5 | Ann      | Matthew -> Caroline -> Ann            |     3 |
|    6 | Anthony  | Matthew -> Tom -> Anthony             |     3 |
|    7 | Lousie   | Matthew -> Tom -> Lousie              |     3 |
|    8 | Travis   | Matthew -> Tom -> Travis              |     3 |
|    9 | John     | Matthew -> Tom -> Anthony -> John     |     4 |
|   10 | Jennifer | Matthew -> Tom -> Anthony -> Jennifer |     4 |
|   11 | Maria    | Matthew -> Tom -> Travis -> Maria     |     4 |
+------+----------+---------------------------------------+-------+

 

A More Complex Tree: Genealogy

Creating a table to represent the following genealogy with grandparents, parents, and sons.

CREATE TABLE genealogy(
id INT PRIMARY KEY,
name VARCHAR(20),
father_id INT,
mother_id INT,
FOREIGN KEY(father_id) REFERENCES genealogy(id),
FOREIGN KEY(mother_id) REFERENCES genealogy(id));

# populate the table
INSERT INTO genealogy VALUES(1,'Maria',NULL,NULL),
(2,'Tom',NULL,NULL),(3,'Robert',NULL,NULL),
(4,'Claire',NULL,NULL),(5,'John',2,1),
(6,'Jennifer',2,1),(7,'Sam',3,4),
(8,'James',7,6);

SELECT * FROM genealogy;
+----+----------+-----------+-----------+
| id | name     | father_id | mother_id |
+----+----------+-----------+-----------+
|  1 | Maria    |      NULL |      NULL |
|  2 | Tom      |      NULL |      NULL |
|  3 | Robert   |      NULL |      NULL |
|  4 | Claire   |      NULL |      NULL |
|  5 | John     |         2 |         1 |
|  6 | Jennifer |         2 |         1 |
|  7 | Sam      |         3 |         4 |
|  8 | James    |         7 |         6 |
+----+----------+-----------+-----------+

Let’s find all of James’s ancestors and the relationship:

mysql> WITH RECURSIVE ancestors AS ( 
          SELECT *, CAST('son' AS CHAR(20)) AS relationship, 0 level 
          FROM genealogy  
          WHERE name='James' 
          UNION ALL 
          SELECT g.*, CASE WHEN g.id=a.father_id AND level=0 THEN 'father' 
                           WHEN g.id=a.mother_id AND level=0 THEN 'mother' 
                           WHEN g.id=a.father_id AND level=1 THEN 'grandfather' 
                           WHEN g.id=a.mother_id AND level=1 THEN 'grandmother' 
                       END,
                       level+1 
           FROM genealogy g, ancestors a 
           WHERE g.id=a.father_id OR g.id=a.mother_id) 
        SELECT * FROM ancestors;
+------+----------+-----------+-----------+--------------+-------+
| id   | name     | father_id | mother_id | relationship | level |
+------+----------+-----------+-----------+--------------+-------+
|    8 | James    |         7 |         6 | son          |     0 |
|    6 | Jennifer |         2 |         1 | mother       |     1 |
|    7 | Sam      |         3 |         4 | father       |     1 |
|    1 | Maria    |      NULL |      NULL | grandmother  |     2 |
|    2 | Tom      |      NULL |      NULL | grandfather  |     2 |
|    3 | Robert   |      NULL |      NULL | grandfather  |     2 |
|    4 | Claire   |      NULL |      NULL | grandmother  |     2 |
+------+----------+-----------+-----------+--------------+-------+

Using the same query but changing the initial condition we can find out the ancestors of anyone in the hierarchy, for example, Jennifer:

mysql> WITH RECURSIVE ancestors AS ( 
          SELECT *, CAST('daughter' AS CHAR(20)) AS relationship, 0 level 
          FROM genealogy 
          WHERE name='Jennifer' 
          UNION ALL 
          SELECT g.*, CASE WHEN g.id=a.father_id AND level=0 THEN 'father' 
                           WHEN g.id=a.mother_id AND level=0 THEN 'mother' 
                           WHEN g.id=a.father_id AND level=1 THEN 'grandfather' 
                           WHEN g.id=a.mother_id AND level=1 THEN 'grandmother' 
                      END, 
                      level+1 
           FROM genealogy g, ancestors a 
           WHERE g.id=a.father_id OR g.id=a.mother_id) 
        SELECT * FROM ancestors;
+------+----------+-----------+-----------+--------------+-------+
| id   | name     | father_id | mother_id | relationship | level |
+------+----------+-----------+-----------+--------------+-------+
|    6 | Jennifer |         2 |         1 | daughter     |     0 |
|    1 | Maria    |      NULL |      NULL | mother       |     1 |
|    2 | Tom      |      NULL |      NULL | father       |     1 |
+------+----------+-----------+-----------+--------------+-------+

 

A Graph: Train Routes

Let’s create a graph representing train routes in Italy for the more important cities, from the image below:

 

Be aware of uni-directional and bi-directional connections. Each connection also has a distance in km.

CREATE TABLE train_route(
id INT PRIMARY KEY,
origin VARCHAR(20),
destination VARCHAR(20),
distance INT);

# populate the table
INSERT INTO train_route VALUES(1,'MILAN','TURIN',150),
(2,'TURIN','MILAN',150),(3,'MILAN','VENICE',250),
(4,'VENICE','MILAN',250),(5,'MILAN','GENOA',200),
(6,'MILAN','ROME',600),(7,'ROME','MILAN',600),
(8,'MILAN','FLORENCE',380),(9,'TURIN','GENOA',160),
(10,'GENOA','TURIN',160),(11,'FLORENCE','VENICE',550),
(12,'FLORENCE','ROME',220),(13,'ROME','FLORENCE',220),
(14,'GENOA','ROME',500),(15,'ROME','NAPLES',210),
(16,'NAPLES','VENICE',800);

SELECT * FROM train_route;
+----+----------+-------------+----------+
| id | origin   | destination | distance |
+----+----------+-------------+----------+
|  1 | MILAN    | TURIN       |      150 |
|  2 | TURIN    | MILAN       |      150 |
|  3 | MILAN    | VENICE      |      250 |
|  4 | VENICE   | MILAN       |      250 |
|  5 | MILAN    | GENOA       |      200 |
|  6 | MILAN    | ROME        |      600 |
|  7 | ROME     | MILAN       |      600 |
|  8 | MILAN    | FLORENCE    |      380 |
|  9 | TURIN    | GENOA       |      160 |
| 10 | GENOA    | TURIN       |      160 |
| 11 | FLORENCE | VENICE      |      550 |
| 12 | FLORENCE | ROME        |      220 |
| 13 | ROME     | FLORENCE    |      220 |
| 14 | GENOA    | ROME        |      500 |
| 15 | ROME     | NAPLES      |      210 |
| 16 | NAPLES   | VENICE      |      800 |
+----+----------+-------------+----------+

Returning all the train destinations with Milan as the origin:

mysql> WITH RECURSIVE train_destination AS ( 
          SELECT origin AS dest 
          FROM train_route 
          WHERE origin='MILAN'  
          UNION  
          SELECT tr.destination 
          FROM train_route tr 
          JOIN train_destination td ON td.dest=tr.origin) 
       SELECT * from train_destination;
+----------+
| dest     |
+----------+
| MILAN    |
| TURIN    |
| VENICE   |
| GENOA    |
| ROME     |
| FLORENCE |
| NAPLES   |
+----------+

Basically starting from any city, you can go wherever you want in Italy, but there are different paths. So let’s run a query to find out all the possible paths, and the total length of each, starting from Milan and Naples.

mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 
          FROM train_route 
          WHERE origin='MILAN'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin 
           AND  NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths;
+-------------------------------------------------------+----------+--------------+
| cur_path                                              | cur_dest | tot_distance |
+-------------------------------------------------------+----------+--------------+
| MILAN                                                 | MILAN    |            0 |
| MILAN -> TURIN                                        | TURIN    |          150 |
| MILAN -> VENICE                                       | VENICE   |          250 |
| MILAN -> GENOA                                        | GENOA    |          200 |
| MILAN -> ROME                                         | ROME     |          600 |
| MILAN -> FLORENCE                                     | FLORENCE |          380 |
| MILAN -> TURIN -> GENOA                               | GENOA    |          310 |
| MILAN -> GENOA -> TURIN                               | TURIN    |          360 |
| MILAN -> GENOA -> ROME                                | ROME     |          700 |
| MILAN -> ROME -> FLORENCE                             | FLORENCE |          820 |
| MILAN -> ROME -> NAPLES                               | NAPLES   |          810 |
| MILAN -> FLORENCE -> VENICE                           | VENICE   |          930 |
| MILAN -> FLORENCE -> ROME                             | ROME     |          600 |
| MILAN -> TURIN -> GENOA -> ROME                       | ROME     |          810 |
| MILAN -> GENOA -> ROME -> FLORENCE                    | FLORENCE |          920 |
| MILAN -> GENOA -> ROME -> NAPLES                      | NAPLES   |          910 |
| MILAN -> ROME -> FLORENCE -> VENICE                   | VENICE   |         1370 |
| MILAN -> ROME -> NAPLES -> VENICE                     | VENICE   |         1610 |
| MILAN -> FLORENCE -> ROME -> NAPLES                   | NAPLES   |          810 |
| MILAN -> TURIN -> GENOA -> ROME -> FLORENCE           | FLORENCE |         1030 |
| MILAN -> TURIN -> GENOA -> ROME -> NAPLES             | NAPLES   |         1020 |
| MILAN -> GENOA -> ROME -> FLORENCE -> VENICE          | VENICE   |         1470 |
| MILAN -> GENOA -> ROME -> NAPLES -> VENICE            | VENICE   |         1710 |
| MILAN -> FLORENCE -> ROME -> NAPLES -> VENICE         | VENICE   |         1610 |
| MILAN -> TURIN -> GENOA -> ROME -> FLORENCE -> VENICE | VENICE   |         1580 |
| MILAN -> TURIN -> GENOA -> ROME -> NAPLES -> VENICE   | VENICE   |         1820 |
+-------------------------------------------------------+----------+--------------+


mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 
          FROM train_route 
          WHERE origin='NAPLES'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin 
            AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths;
+-----------------------------------------------------------------+----------+--------------+
| cur_path                                                        | cur_dest | tot_distance |
+-----------------------------------------------------------------+----------+--------------+
| NAPLES                                                          | NAPLES   |            0 |
| NAPLES -> VENICE                                                | VENICE   |          800 |
| NAPLES -> VENICE -> MILAN                                       | MILAN    |         1050 |
| NAPLES -> VENICE -> MILAN -> TURIN                              | TURIN    |         1200 |
| NAPLES -> VENICE -> MILAN -> GENOA                              | GENOA    |         1250 |
| NAPLES -> VENICE -> MILAN -> ROME                               | ROME     |         1650 |
| NAPLES -> VENICE -> MILAN -> FLORENCE                           | FLORENCE |         1430 |
| NAPLES -> VENICE -> MILAN -> TURIN -> GENOA                     | GENOA    |         1360 |
| NAPLES -> VENICE -> MILAN -> GENOA -> TURIN                     | TURIN    |         1410 |
| NAPLES -> VENICE -> MILAN -> GENOA -> ROME                      | ROME     |         1750 |
| NAPLES -> VENICE -> MILAN -> ROME -> FLORENCE                   | FLORENCE |         1870 |
| NAPLES -> VENICE -> MILAN -> FLORENCE -> ROME                   | ROME     |         1650 |
| NAPLES -> VENICE -> MILAN -> TURIN -> GENOA -> ROME             | ROME     |         1860 |
| NAPLES -> VENICE -> MILAN -> GENOA -> ROME -> FLORENCE          | FLORENCE |         1970 |
| NAPLES -> VENICE -> MILAN -> TURIN -> GENOA -> ROME -> FLORENCE | FLORENCE |         2080 |
+-----------------------------------------------------------------+----------+--------------+

 

It’s quite easy now to find out which is the shortest path from one origin to any final destination. You just need to filter and order the main query. Here are some examples:

# shortest path from MILAN to NAPLES
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 FROM train_route WHERE origin='MILAN'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths 
       WHERE cur_dest='NAPLES' 
       ORDER BY tot_distance ASC LIMIT 1
+-------------------------+----------+--------------+
| cur_path                | cur_dest | tot_distance |
+-------------------------+----------+--------------+
| MILAN -> ROME -> NAPLES | NAPLES   |          810 |
+-------------------------+----------+--------------+

# shortest path from VENICE to GENOA
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 FROM train_route WHERE origin='VENICE'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths 
       WHERE cur_dest='GENOA' 
       ORDER BY tot_distance ASC LIMIT 1;
+--------------------------+----------+--------------+
| cur_path                 | cur_dest | tot_distance |
+--------------------------+----------+--------------+
| VENICE -> MILAN -> GENOA | GENOA    |          450 |
+--------------------------+----------+--------------+

# shortest path from VENICE to NAPLES
mysql> WITH RECURSIVE paths (cur_path, cur_dest, tot_distance) AS (     
          SELECT CAST(origin AS CHAR(100)), CAST(origin AS CHAR(100)), 0 FROM train_route WHERE origin='VENICE'   
          UNION     
          SELECT CONCAT(paths.cur_path, ' -> ', train_route.destination), train_route.destination, paths.tot_distance+train_route.distance        
          FROM paths, train_route        
          WHERE paths.cur_dest = train_route.origin AND NOT FIND_IN_SET(train_route.destination, REPLACE(paths.cur_path,' -> ',',') ) ) 
       SELECT * FROM paths 
       WHERE cur_dest='NAPLES' 
       ORDER BY tot_distance ASC LIMIT 1;
+-----------------------------------+----------+--------------+
| cur_path                          | cur_dest | tot_distance |
+-----------------------------------+----------+--------------+
| VENICE -> MILAN -> ROME -> NAPLES | NAPLES   |         1060 |
+-----------------------------------+----------+--------------+

 

Limitations

Apart from the limitations we have already seen for limiting the execution time and the number of iterations, there are other built-in limitations you should be aware of.

The recursive SELECT must not contain the following constructs:

  • An aggregate function such as SUM()
  • GROUP BY
  • ORDER BY
  • DISTINCT
  • Window functions

These limitations are not valid for non-recursive CTE. Also, the recursive SELECT part must reference the CTE only once and only in its FROM clause, not in any subquery.

Conclusion

Recursive common table expression is a new interesting feature to implement queries for your applications using MySQL 8.0. Recursion was already possible in the past by creating stored routines but now it’s simpler. Furthermore, you don’t need special and additional grants to create a recursive query.

Generally, recursive CTE is quite simple, but compared to non-recursive CTE, it is a little more complicated. Recursive CTE is more tricky because of recursion, obviously. It’s not a matter of syntax, of course, it’s only a matter of “thinking recursively”.


by Corrado Pandiani via Percona Database Performance Blog

Comments