Partitioned tables in PostgreSQL

This article intends to show some basics of how table partitioning works in PostgreSQL (PG), as well as enumerating some pros and cons I found when using it. It's a feature I recently started using from PG, so I'm just scratching the surface; If you want to see the demo code, just check it out here.

It would be helpful if you're familiar with PL/pgSQL, but the examples are easy to understand.

Why partition tables?

Simply put, partitioning tables in PG can dramatically reduce the amount of data that the server has to search in order to find the rows you need. This leads to performance improvements that might be drastic depending on the situation.

Imagine, for example, that you have time series data in PG for some reason (instead of using an OLAP database), and that you aren't working with retention periods - instead, you're just going to keep that data forever because you'll need it. And imagine you just keep ingesting more and more data, in large amounts. One of the problems with putting it in the same table is that, if you have too much data, you might start having performance issues, as your table will just keep growing forever. Your indexes might start getting too large for scans and hurting your application, for example. In this case, partitioning the table by periods will avoid it growing forever in a single place.

How does it work?

When you create a partitioned table, you still get a single logical table that can be used for querying, however that table is now composed by all of its partitions in terms of where the data resides (the root table doesn't hold the data itself, only its partitions do). PG then knows exactly what partitions to use from the queries you provide. In other words, if you insert data, and you have a partition configured for it already, PG will know that it's there that your row should be put at. Or, if you select data using constraints that match the partitioning criteria, PG will only scan inside the relevant partitions - bypassing the ones that will never contain data under those constraints.

And this is, in my opinion, the biggest benefit of using partitioned tables in PG: only searching in places where the data might be. This has the potential of dramatically improve querying performance, compared to traditional tables. You might have false positives, but never false negatives. As if it were table-level bloom filters.

Let me use a filesystem as an analogy: Imagine you have a bunch of files in subdirectories organized by a directory tree where you have year, month and day as subdirectories:

foo/2020/01/01/abc1.txt
foo/2020/01/02/abc2.txt
foo/2020/03/02/abc3.txt
foo/2021/03/02/abc4.txt

If you search for file "abc1.txt" in "foo/**", then you'll make your operating system search through 4 different directories; If you search in "foo/2020/", then it reduces the search area to 3 directories; If "foo/2020/01/01", then only 1 directory. The more you specify the subdirectory, the less work the OS will have to do to find that file. A similar thing happens to PG: by using partitioned tables, you reduce the search area to scan to find your data.

And remember all those indexes you wanted to create to speed up your select queries? They will all be created inside the partitions as well, meaning they won't just keep growing forever either, meaning even faster queries!

Example

Let's jump to the code. As previously mentioned, the example code can be found here, but I'll put each part down here to explain how they work in my example.

Root table

Here's the SQL script to create the root table:

CREATE TABLE IF NOT EXISTS data (
    uuid UUID NOT NULL DEFAULT gen_random_uuid(),
    country_code VARCHAR(3) NOT NULL,
    date DATE NOT NULL,
    description TEXT,
    PRIMARY KEY (uuid, country_code, date)
) PARTITION BY LIST (country_code);

It looks almost like a traditional table, with one key difference: the PARTITION BY part. Here, I'm using LIST as the partition type, as I want to partition by a list of values determined by the country_code field. So what this part means is that we want the data table to be partitioned by country code: we're creating a table that will support those partitions.

Notice, too, that the table has three primary keys: uuid, country_code and date. This is because you need to include, in the primary key, all the columns that will be used to partition the table into sub-tables.

However, PG doesn't know how to create partitions for you; You'll have to create them yourself - I believe PG has this approach because it's expensive to create partitions, so it leaves up to the user to define when and how to create the partitions. Let's see then how to create these partitions.

Partition creation

I had several options to create the partitions, and my choice was to create a function that handles the creation of the partitions:

CREATE OR REPLACE FUNCTION data_add_partition(
    date DATE, country_code VARCHAR(3)
) RETURNS VOID LANGUAGE plpgsql AS $$
    DECLARE
        year INTEGER := EXTRACT(YEAR FROM date);
        from_date DATE := FORMAT('%s-01-01', year);
        to_date DATE := FORMAT('%s-01-01', year + 1);
    BEGIN
        -- Create the country code partition
        EXECUTE FORMAT(
            'CREATE TABLE IF NOT EXISTS data_%s '
            'PARTITION OF data FOR VALUES IN (''%s'') '
            'PARTITION BY RANGE (date);',
            country_code, country_code
        );
        -- Create the year partition
        EXECUTE FORMAT(
            'CREATE TABLE IF NOT EXISTS data_%s_%s '
            'PARTITION OF data_%s FOR VALUES FROM (''%s'') TO (''%s'')',
            country_code, year, country_code, from_date, to_date
        );
    END;
$$;

This function basically creates two partitions: one intermediary one for the country code partition, and one lower-level partition for both the country code and the year. With this, I end up with a tree where the root table has partitions by country code as children, and each country code partition has sub-partitions by year as children. Notice, especially, the PARTITION OF part: this defines that the partition belongs to a certain parent table (which can be the real table or a parent partition).

Notice, too, that when using RANGE partitions the end part of the range is not inclusive - so if you want a partition for the whole year of 2020, you have to define it FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'), for example.

Inserting the data

Remember that the partitions aren't created automatically, we have to take care of that ourselves. So I also created a function that takes care of both triggering the creation of the partition and inserting the row:

CREATE OR REPLACE FUNCTION data_insert(
    date DATE, country_code VARCHAR(3), description TEXT
) RETURNS VOID LANGUAGE plpgsql AS $$
BEGIN
    PERFORM data_add_partition(date, country_code);
    INSERT INTO data (country_code, date, description)
        VALUES (country_code, date, description);
END;
$$;

Ideally the partition creation would be triggered separately and not upon every row we want to insert, since it's an expensive operation, and since most of the time it won't do anything as the partitions will already exist from previous runs. But I chose to keep it here for the sake of the example.

Now let's insert some data by calling that function:

SELECT data_insert('2020-01-01', 'US', 'Something 1');
SELECT data_insert('2020-01-02', 'US', 'Something 2');
SELECT data_insert('2020-01-03', 'BR', 'Something 3');
SELECT data_insert('2021-01-04', 'BR', 'Something 4');
SELECT data_insert('2022-01-05', 'FR', 'Something 5');

Don't be confused by SELECT here and PERFORM in the previous query; PERFORM has to be used in PL/pgSQL when there's no return value from the function being called, whereas in plain SQL we just use SELECT.

If you run all the scripts above in PG, they should run fine. So let's take a look at the data with the psql REPL.

Show me the data

Structure

First, let's take a look at how the tables are structured:

partitioning-demo=# \d+
                                             List of relations
 Schema |     Name     |       Type        |  Owner   | Persistence | Access method |  Size   | Description 
--------+--------------+-------------------+----------+-------------+---------------+---------+-------------
 public | data         | partitioned table | postgres | permanent   |               | 0 bytes | 
 public | data_br      | partitioned table | postgres | permanent   |               | 0 bytes | 
 public | data_br_2020 | table             | postgres | permanent   | heap          | 16 kB   | 
 public | data_br_2021 | table             | postgres | permanent   | heap          | 16 kB   | 
 public | data_fr      | partitioned table | postgres | permanent   |               | 0 bytes | 
 public | data_fr_2022 | table             | postgres | permanent   | heap          | 16 kB   | 
 public | data_us      | partitioned table | postgres | permanent   |               | 0 bytes | 
 public | data_us_2020 | table             | postgres | permanent   | heap          | 16 kB   |

Notice that some tables are marked as "partitioned table" - this is because they have further sub-partitions as children. And notice that they have 0 bytes - because only the leaf partitions actually hold the data for real.

Scans

Now let's see how PG plans some queries. First, a "select all" query:

partitioning-demo=# explain select * from data;
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Append  (cost=0.00..91.00 rows=3400 width=68)
   ->  Seq Scan on data_br_2020 data_1  (cost=0.00..18.50 rows=850 width=68)
   ->  Seq Scan on data_br_2021 data_2  (cost=0.00..18.50 rows=850 width=68)
   ->  Seq Scan on data_fr_2022 data_3  (cost=0.00..18.50 rows=850 width=68)
   ->  Seq Scan on data_us_2020 data_4  (cost=0.00..18.50 rows=850 width=68)

Notice that it scans through all partitions. And that it does a Seq Scan, obviously, in each partition, because we didn't specify any constraint that could motivate PG to plan an Index Scan instead.

Alright, let's cut down the partition scans a bit then:

partitioning-demo=# explain select * from data where country_code = 'BR';
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Append  (cost=0.00..41.29 rows=8 width=68)
   ->  Seq Scan on data_br_2020 data_1  (cost=0.00..20.62 rows=4 width=68)
         Filter: ((country_code)::text = 'BR'::text)
   ->  Seq Scan on data_br_2021 data_2  (cost=0.00..20.62 rows=4 width=68)
         Filter: ((country_code)::text = 'BR'::text)

Now that we specified country_code = 'BR', then PG at least knows that it should only search within the partitions that match that criterion - it planned to query only 2 of them.

Let's dive deeper:

partitioning-demo=# explain select * from data where country_code = 'BR' and date >= '2020-01-01' and date < '2021-01-01';
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Index Scan using data_br_2020_pkey on data_br_2020 data  (cost=0.15..22.79 rows=1 width=68)
   Index Cond: (((country_code)::text = 'BR'::text) AND (date >= '2020-01-01'::date) AND (date < '2021-01-01'::date))

And now PG decided to do an Index Scan, because we used two fields that are part of the primary key, so PG determined it's worth using the primary key index. Also, it chose to query only 1 partition, as it knew that the data couldn't possibly be in any other partition.

Conclusion

PG doesn't automatically manage partitions, but it does know how to use them very efficiently. And, depending on the system a partitioned table is part of, this approach might be a game changer.

There are many other things I haven't tested, though, like whether having too many partitions might cause any problems, but if used judiciously I think they are unlikely to cause harm.

And when it comes to creating partitions, don't just do what I did, which is to create the partitions upon every row insert; Instead, just define a better place and time to create them, for example just before you ingest a large portion of data for the same partition.

I hope this has been helpful for you, especially to get you as excited as I am about PostgreSQL partitioned tables!

#postgresql #partitioning