Index Improvements in PostgreSQL 13

index improvements postgresql 13

index improvements postgresql 13Indexes are one of the core features of all the database management systems (DBMS). Indexes have a very long history in PostgreSQL, which has quite a rich set of index features. PostgreSQL has B-Tree, Hash,  GIN, GIST, and BRIN indexes. And because the development around indexes is still going on, PostgreSQL 13 provides some enhancements.

We can divide the overall improvements into two categories:

  1. Changes which are transparent to the user. There won’t be any visible changes, but they will get the benefit automatically after the upgrade, probably after a rebuild of the index.  No application change required.
  2. The second set of improvements require the user to explicitly use the new feature. 

Both types of improvements have been introduced in PostgreSQL Version 13. Sometimes it is difficult to extract the information from the release notes and comprehend for an end-user or even convince an end-user, so this blog contains examples of improvements done in PostgreSQL 13.

1. Deduplication of B-Tree Index [1]

Non-unique/primary B-Tree indexes generally contain a lot of duplicate values. The storage of duplicates plays a vital role, especially in B-Tree, which allows aggregate operations like COUNT or GROUP BY to use these indexes. However, a smart way of packing can eliminate storing the actual duplicate value by just maintaining the reference.

The process of deduplication eliminates the redundant/duplicate values from the tree. This “Deduplication” process reduces the storage requirement of the index, as each key will be stored only once to save space.

postgres=# CREATE TABLE foo(id INTEGER, name TEXT);

postgres=# INSERT INTO foo VALUES(generate_series(1, 1000000), 'value');
INSERT 0 1000000

postgres=# select pg_size_pretty(pg_relation_size('foo'));
 pg_size_pretty 
----------------
 42 MB
(1 row)

postgres=# CREATE INDEX idx ON foo (name);
postgres=# SELECT pg_size_pretty(pg_relation_size('idx'));
 pg_size_pretty 
----------------
 21 MB
(1 row)

 

postgres=# CREATE TABLE foo(id INTEGER, name TEXT);
postgres=# INSERT INTO foo VALUES(generate_series(1, 1000000), 'value');
INSERT 0 1000000

postgres=# select pg_size_pretty(pg_relation_size('idx'));
 pg_size_pretty 
----------------
 21 MB
(1 row)


postgres=# CREATE INDEX idx ON foo (name);
postgres=# select pg_size_pretty(pg_relation_size('idx'));
 pg_size_pretty 
----------------
 6792 kB
(1 row)

[Note: This is just an example to show the reduction in size, not a benchmark. You may get different numbers.]

A new index parameter is introduced called deduplicate_items which can be specified while creating the index. This parameter is used to enable/disable the deduplication. It is “ON” by default, which means that the benefit of the deduplication will be available without explicitly making any change*.

postgres=# CREATE INDEX idx1 ON foo (name) WITH (deduplicate_items = off);
CREATE INDEX

postgres=# SELECT pg_size_pretty(pg_relation_size('idx'));
 pg_size_pretty 
----------------
 6792 kB
(1 row)

postgres=# select pg_size_pretty(pg_relation_size('idx1'));
 pg_size_pretty 
----------------
 21 MB
(1 row)

Note: If you are upgrading PostgreSQL from older versions using the pg_upgrade, all indexes need to be REINDEX to avail the benefit of deduplication, regardless of which version you are upgrading from. 

2. Allow GiST [2.1] and SP-GiST [2.2] Indexes for Box/Point Distance Lookups

The GiST index is a template for developing further indexes over any kind of data, supporting any lookup over that data. By default, it supports a wide range of operators. In PostgreSQL 13, this new patch adds support for the missing “<-> (box, point)” operator to GiST box_ops as the ordering operator.

Let’s consider the example of a table with a BOX field, a POINT, and a CIRCLE field.

postgres=# CREATE TABLE foo (b BOX, p POINT, c CIRCLE);

Insert Data to the table:

postgres=# INSERT INTO foo
SELECT box(point(0.05*i, 0.05*i), POINT(0.05*i, 0.05*i)),
  POINT(0.05*i, 0.05*i),
  CIRCLE(point(0.05*i, 0.05*i), 1.0)
FROM generate_series(0,10000) as i;

Analyze the table:

postgres=# VACUUM ANALYZE foo;

Create a GiST index on the Box Field:

postgres=# CREATE INDEX idx on foo USING gist (b);
CREATE INDEX

Now let’s have a query to check all the boxes contained in another box area but sorted by distance to a point.

postgres=# EXPLAIN (costs off)                                                                                                                                                                                                         SELECT b FROM foo WHERE b <@ BOX(point(5,5), POINT(6,6))                                                                                                                                                                          ORDER BY b <-> POINT(5.2, 5.91);
                      QUERY PLAN                     
------------------------------------------------------
Index Only Scan using idx on foo
  Index Cond: (b <@ '(6,6),(5,5)'::box)
  Order By: (b <-> '(5.2,5.91)'::point)
(3 rows)

Please note the ordering operator usage in the above query like “b <-> POINT(5.2, 5.91);”. This was not possible in the previous PostgreSQL versions.

3. Allow GIN Indexes to More Efficiently Handle NOT Restrictions [3]

This is a performance enhancement for queries that use GIN indexes. Now the GIN index will be more efficient in handling the Negation restrictions. This improvement avoids the full scanning of GIN indexes.

postgres=# CREATE TABLE t_gin_test_tbl(i int4[], j int4[]);
postgres=# CREATE INDEX ON t_gin_test_tbl USING GIN (i, j); 
postgres=# NSERT INTO t_gin_test_tbl
VALUES
  (null,    null),
  ('{}',    null),
  ('{1}',   null),
  ('{1,2}', null),
  (null,    '{}'),
  (null,    '{10}'),
  ('{1,2}', '{10}'),
  ('{2}',   '{10}'),
  ('{1,3}', '{}'),
  ('{1,1}', '{10}');

postgres=# SET ENABLE_SEQSCAN = off;

postgres=# EXPLAIN (costs off)
SELECT * FROM t_gin_test_tbl WHERE array[0] <@ i;
                    QUERY PLAN    
---------------------------------------------------
 Bitmap Heap Scan on t_gin_test_tbl
   Recheck Cond: ('{0}'::integer[] <@ i)
   ->  Bitmap Index Scan on t_gin_test_tbl_i_j_idx
         Index Cond: (i @> '{0}'::integer[])
(4 rows)

4. Index Operator Class Parameters [4]

PostgreSQL has a variety of index access methods such as (1) GiST, (2) GIN, (3) SP-GiST, and (4) BRIN. While creating an index there is already an option to specify the operator class. The operator class contains the comparison function to be used for the index. Normally when we create an index without specifying the operator class, the default operator class is used and most of the time it is sufficient. But in some cases there is a need to have more than one meaningful behavior, therefore we need to specify the operator class.

These opclasses define the representation of keys and operations on them. Along with that it also defines the supported search strategies. To add some user-side decisions to opclass opclass_parameter is introduced. New syntax in INDEX creation is added to specify the operator class options.

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
postgres=# CREATE TABLE int_arr(id int, val int[]);
CREATE TABLE

postgres=# CREATE INDEX ON int_arr USING gist (                                                                                                                                                                                           val gist__intbig_ops (siglen = 32));
CREATE INDEX

postgres=# \d int_arr
               Table "public.int_arr"
 Column |   Type    | Collation | Nullable | Default 
--------+-----------+-----------+----------+---------
 id     | integer   |           |          | 
 val    | integer[] |           |          | 
Indexes:
    "int_arr_val_idx" gist (val gist__intbig_ops (siglen='32'))

5. GiST Signature Length

Allow CREATE INDEX to specify the GiST signature length and the maximum number of integer ranges. Now a user can specify the GIST Index parameter. 

CREATE INDEX name ON table USING GIST (column [ { DEFAULT | tsvector_ops } (siglen = number) ] );

6. Prevent Indexes That Use Non-Default Collations From Being Added as a Table’s Unique or Primary Key Constraint [5]

The index and column collations must now match so the index’s uniqueness matches the column’s uniqueness.

postgres=# create table foo(col varchar(255));
CREATE TABLE
postgres=# CREATE UNIQUE INDEX idx ON foo(col desc);
CREATE INDEX

postgres=# ALTER TABLE foo ADD CONSTRAINT unique_idx UNIQUE USING INDEX idx;
2020-09-07 01:33:55.971 PKT [11083] ERROR:  index "idx" column number 1 does not have default sorting behavior at character 21
2020-09-07 01:33:55.971 PKT [11083] DETAIL:  Cannot create a primary key or unique constraint using such an index.
2020-09-07 01:33:55.971 PKT [11083] STATEMENT:  ALTER TABLE foo ADD CONSTRAINT unique_idx UNIQUE USING INDEX idx;
ERROR:  index "idx" column number 1 does not have default sorting behavior
LINE 1: ALTER TABLE foo ADD CONSTRAINT unique_idx UNIQUE USING INDEX...
                        ^
DETAIL:  Cannot create a primary key or unique constraint using such an index.

*[Currently this is on by default, but a final decision will be on GA]

Note: All the information is based on PostgreSQL 13 beta.


by Ibrar Ahmed via Percona Database Performance Blog

Comments