Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow infinite_time_partitions flag to work even if zero data in partition set #585

Closed
mbrancato opened this issue Oct 30, 2023 Discussed in #581 · 14 comments
Closed

Allow infinite_time_partitions flag to work even if zero data in partition set #585

mbrancato opened this issue Oct 30, 2023 Discussed in #581 · 14 comments

Comments

@mbrancato
Copy link

With infinite_time_partitions = true, partman will not create new partitions without data in some partition. When there is new data in some partition, it then makes partitions for all partitions up till now() + premake.

The goal was that simply running run_maintenance() would trigger new partition creation based on now(), and handle retention of old partitions, regardless if there was new data inside the partitions.

Discussed in #581

Originally posted by mbrancato October 23, 2023
I'm planning to run pg_partman without the background worker. To prepare for that, I've be just trying to manually create a test table, and then run run_maintenance() to ensure that forward-looking partitions are being created. My initial use-case was a bit more complex, so I reduced complexity for testing purposes. However, I can still not get any future partitions to be created.

For setup, I'm running this in a container, and this is my Dockerfile:

FROM postgres:15

RUN apt-get update \
      && apt-cache showpkg postgresql-$PG_MAJOR-partman \
      && apt-get install -y --no-install-recommends \
           postgresql-$PG_MAJOR-partman \
      && rm -rf /var/lib/apt/lists/*

extension data:

partitions_test=# \dx
                               List of installed extensions
    Name    | Version |   Schema   |                     Description                      
------------+---------+------------+------------------------------------------------------
 pg_partman | 5.0.0   | partman    | Extension to manage partitioned tables by time or ID
 plpgsql    | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

I'm just using the postgres user for now to connect to a test database:

CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman WITH SCHEMA partman;

CREATE TABLE public.my_table (
    id uuid NOT NULL,
    col1 integer NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW() NOT NULL
) PARTITION BY RANGE (created_at);

SELECT partman.create_parent(
    p_parent_table := 'public.my_table'
    , p_control := 'created_at'
    , p_interval := '1 hour'
    , p_start_partition := date_trunc('hour', NOW())::text
    , p_default_table := false
);
result
partitions_test=# SELECT partman.create_parent(
partitions_test(#     p_parent_table := 'public.my_table'
partitions_test(#     , p_control := 'created_at'
partitions_test(#     , p_interval := '1 hour'
partitions_test(#     , p_start_partition := date_trunc('hour', NOW())::text
partitions_test(#     , p_default_table := false
partitions_test(# );
 create_parent 
---------------
 t
(1 row)

partitions_test=# \d+ my_table                                                                                                                                                                                                                              Partitioned table "public.my_table"
   Column   |           Type           | Collation | Nullable | Default | Storage | Compression | Stats target | Description 
------------+--------------------------+-----------+----------+---------+---------+-------------+--------------+-------------
 id         | uuid                     |           | not null |         | plain   |             |              | 
 col1       | integer                  |           | not null |         | plain   |             |              | 
 created_at | timestamp with time zone |           | not null | now()   | plain   |             |              | 
Partition key: RANGE (created_at)
Partitions: my_table_p20231023_120000 FOR VALUES FROM ('2023-10-23 12:00:00+00') TO ('2023-10-23 13:00:00+00'),
            my_table_p20231023_130000 FOR VALUES FROM ('2023-10-23 13:00:00+00') TO ('2023-10-23 14:00:00+00'),
            my_table_p20231023_140000 FOR VALUES FROM ('2023-10-23 14:00:00+00') TO ('2023-10-23 15:00:00+00'),
            my_table_p20231023_150000 FOR VALUES FROM ('2023-10-23 15:00:00+00') TO ('2023-10-23 16:00:00+00'),
            my_table_p20231023_160000 FOR VALUES FROM ('2023-10-23 16:00:00+00') TO ('2023-10-23 17:00:00+00')

I've tried both 1 hour and 1 minute intervals for testing. In practice, I'll likely use 1 day intervals. Waiting a while and running run_maintenance() seems to do nothing. The same is true if I delete the last partition and then run run_maintenance(). I'm running it this way:

SELECT partman.run_maintenance();

The output with DEBUG1 looks like this:

output logs
2023-10-23 12:18:28.494 UTC [2712] DEBUG:  Parent table possibly removed from part_config by retenion
2023-10-23 12:18:28.494 UTC [2712] CONTEXT:  PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 115 at RAISE
2023-10-23 12:18:28.495 UTC [2712] DEBUG:  show_partitions: v_parent_schema: public, v_parent_tablename: my_table, v_datetime_string: YYYYMMDD_HH24MISS
2023-10-23 12:18:28.495 UTC [2712] CONTEXT:  PL/pgSQL function show_partitions(text,text,boolean) line 53 at RAISE
     SQL statement "SELECT partition_tablename                              FROM partman.show_partitions(v_row.parent_table, p_include_default := true) LIMIT 1"
     PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 147 at SQL statement
2023-10-23 12:18:28.495 UTC [2712] DEBUG:  show_partitions: v_default_sql: SELECT n.nspname::text AS partition_schemaname
             , c.relname::text AS partition_name
             FROM pg_catalog.pg_inherits h
             JOIN pg_catalog.pg_class c ON c.oid = h.inhrelid
             JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
             WHERE h.inhparent = 'public.my_table'::regclass
             AND pg_get_expr(relpartbound, c.oid) = 'DEFAULT'
2023-10-23 12:18:28.495 UTC [2712] CONTEXT:  PL/pgSQL function show_partitions(text,text,boolean) line 69 at RAISE
     SQL statement "SELECT partition_tablename                              FROM partman.show_partitions(v_row.parent_table, p_include_default := true) LIMIT 1"
     PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 147 at SQL statement
2023-10-23 12:18:28.498 UTC [2712] DEBUG:  show_partitions: v_sql: SELECT n.nspname::text AS partition_schemaname
             , c.relname::text AS partition_name
             FROM pg_catalog.pg_inherits h
             JOIN pg_catalog.pg_class c ON c.oid = h.inhrelid
             JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
             WHERE h.inhparent = 'public.my_table'::regclass
         AND pg_get_expr(relpartbound, c.oid) != 'DEFAULT' 
             ORDER BY (regexp_match(pg_get_expr(c.relpartbound, c.oid, true), $REGEX$\(([^)]+)\) TO \(([^)]+)\)$REGEX$))[1]::text::timestamptz ASC 
2023-10-23 12:18:28.498 UTC [2712] CONTEXT:  PL/pgSQL function show_partitions(text,text,boolean) line 112 at RAISE
     SQL statement "SELECT partition_tablename                              FROM partman.show_partitions(v_row.parent_table, p_include_default := true) LIMIT 1"
     PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 147 at SQL statement
2023-10-23 12:18:28.502 UTC [2712] DEBUG:  run_maint: v_partition_expression: created_at
2023-10-23 12:18:28.502 UTC [2712] CONTEXT:  PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 169 at RAISE
2023-10-23 12:18:28.503 UTC [2712] DEBUG:  show_partitions: v_parent_schema: public, v_parent_tablename: my_table, v_datetime_string: YYYYMMDD_HH24MISS
2023-10-23 12:18:28.503 UTC [2712] CONTEXT:  PL/pgSQL function show_partitions(text,text,boolean) line 53 at RAISE
     SQL statement "SELECT partition_tablename                       FROM partman.show_partitions(v_row.parent_table, 'DESC') LIMIT 1"
     PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 171 at SQL statement
2023-10-23 12:18:28.503 UTC [2712] DEBUG:  show_partitions: v_sql: SELECT n.nspname::text AS partition_schemaname
             , c.relname::text AS partition_name
             FROM pg_catalog.pg_inherits h
             JOIN pg_catalog.pg_class c ON c.oid = h.inhrelid
             JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
             WHERE h.inhparent = 'public.my_table'::regclass
         AND pg_get_expr(relpartbound, c.oid) != 'DEFAULT' 
             ORDER BY (regexp_match(pg_get_expr(c.relpartbound, c.oid, true), $REGEX$\(([^)]+)\) TO \(([^)]+)\)$REGEX$))[1]::text::timestamptz DESC 
2023-10-23 12:18:28.503 UTC [2712] CONTEXT:  PL/pgSQL function show_partitions(text,text,boolean) line 112 at RAISE
     SQL statement "SELECT partition_tablename                       FROM partman.show_partitions(v_row.parent_table, 'DESC') LIMIT 1"
     PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 171 at SQL statement
2023-10-23 12:18:28.504 UTC [2712] DEBUG:  run_maint: parent_table: public.my_table, v_last_partition: my_table_p20231023_121900
2023-10-23 12:18:28.504 UTC [2712] CONTEXT:  PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 172 at RAISE
2023-10-23 12:18:28.506 UTC [2712] DEBUG:  show_partition_info: v_child_schema: public, v_child_tablename: my_table_p20231023_121900
2023-10-23 12:18:28.506 UTC [2712] CONTEXT:  PL/pgSQL function show_partition_info(text,text,text) line 61 at RAISE
     SQL statement "SELECT child_start_time                                             FROM partman.show_partition_info(v_parent_schema||'.'||v_last_partition, v_row.partition_interval, v_row.parent_table)"
     PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 183 at SQL statement
2023-10-23 12:18:28.506 UTC [2712] DEBUG:  show_partitions: v_parent_schema: public, v_parent_tablename: my_table, v_datetime_string: YYYYMMDD_HH24MISS
2023-10-23 12:18:28.506 UTC [2712] CONTEXT:  PL/pgSQL function show_partitions(text,text,boolean) line 53 at RAISE
     PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 188 at FOR over SELECT rows
2023-10-23 12:18:28.506 UTC [2712] DEBUG:  show_partitions: v_sql: SELECT n.nspname::text AS partition_schemaname
             , c.relname::text AS partition_name
             FROM pg_catalog.pg_inherits h
             JOIN pg_catalog.pg_class c ON c.oid = h.inhrelid
             JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
             WHERE h.inhparent = 'public.my_table'::regclass
         AND pg_get_expr(relpartbound, c.oid) != 'DEFAULT' 
             ORDER BY (regexp_match(pg_get_expr(c.relpartbound, c.oid, true), $REGEX$\(([^)]+)\) TO \(([^)]+)\)$REGEX$))[1]::text::timestamptz DESC 
2023-10-23 12:18:28.506 UTC [2712] CONTEXT:  PL/pgSQL function show_partitions(text,text,boolean) line 112 at RAISE
     PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 188 at FOR over SELECT rows
2023-10-23 12:18:28.511 UTC [2712] DEBUG:  run_maint: v_current_partition_timestamp: <NULL>, v_max_time_default: <NULL>
2023-10-23 12:18:28.511 UTC [2712] CONTEXT:  PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 217 at RAISE

I've looked at this previous discussion, but while I am running the maintenance manually, its unclear if its saying I literally need to INSERT data to trigger new partitions to be created. Is there something I'm missing to trigger the creation of new partitions?

@dtseiler
Copy link
Contributor

dtseiler commented Nov 1, 2023

I'm seeing similar problems with similar debug logging. I use partition interval of 1 day but if I try to increase the premake value and manually call run_maintenance(), nothing happens and I get the debug logging similar to the OP.

@dtseiler
Copy link
Contributor

dtseiler commented Nov 1, 2023

I can't seem to repeat it now with a clean test. I do know that it was failing when I manually tried to change the datetime_string to YYYY_MM_DD and test run_maintenance. That did nothing so I changed it back to YYYYMMDD and it was still broken.

@mbrancato
Copy link
Author

mbrancato commented Nov 5, 2023

I wanted to add some information here. I've been testing this further and I've identified that for run_maintenance() has this issue on a per-parent table basis. This means that run_maintenance() will only create new partitions for parent tables that have new data. It will not work if any parent table has new data.

Regardless of new data, run_maintenance() will enforce retention and delete old tables.

I'm probably going to create a cron job to insert some dummy data every few hours just to make sure that some outage does not cause a bigger problem. I noticed that if you get to where retention is bumping up against the last partition, it will not be deleted. Any inserts when its in this state and data going into the default partition will also not trigger new partitions to be created so partman cannot get it out of this state and all data will go into the default partition forever.

@keithf4
Copy link
Collaborator

keithf4 commented Nov 6, 2023

This makes sense as each table's maintenance is simply run in a loop and each table is individually checked if it has an data

@keithf4 keithf4 changed the title Unable to get run_maintenance() to create future partitions Allow infinite_time_partitions flag to work even if zero data in partition set Nov 9, 2023
@keithf4 keithf4 self-assigned this Nov 9, 2023
@keithf4 keithf4 added this to the Future milestone Nov 9, 2023
@mikegh
Copy link

mikegh commented Jan 23, 2024

I faced the same problem. I think this is caused by AND in run_maintanance() in condition below.

IF v_row.infinite_time_partitions AND (v_current_partition_timestamp < CURRENT_TIMESTAMP) THEN

Replacing it with "OR" solves the problem.

@keithf4
Copy link
Collaborator

keithf4 commented Jan 23, 2024

I faced the same problem. I think this is caused by AND in run_maintanance() in condition below.

IF v_row.infinite_time_partitions AND (v_current_partition_timestamp < CURRENT_TIMESTAMP) THEN

Replacing it with "OR" solves the problem.

I don't think an OR condition would handle this as intended. In that case it would keep making partitions if infinite was set to false but the current/max value in the partition set was less than "now", which means the setting wouldn't make any difference in the end. But this did point me in the right direction of where to fix this issue, so thank you!

@keithf4 keithf4 modified the milestones: Future, 5.1 Jan 23, 2024
@keithf4
Copy link
Collaborator

keithf4 commented Jan 24, 2024

I have a beta PR up that should allow the infinite flag to work with no data. If you're able to test, any feedback would be appreciated

#615

@mikegh
Copy link

mikegh commented Feb 1, 2024

I just tested the beta version and it works well. But I found that in run_maintenance_proc of patch the remove lock is different from the one aquired. And it would be grate if you add infinite_time_partitions as parameter to create_parent function.

@keithf4
Copy link
Collaborator

keithf4 commented Feb 1, 2024

I just tested the beta version and it works well. But I found that in run_maintenance_proc of patch the remove lock is different from the one aquired. And it would be grate if you add infinite_time_partitions as parameter to create_parent function.

Thank you for catching the mismatched lock calls!

I don't think I'll be adding that option to create_parent() at this time. Trying to keep the parameter list under control a bit and would prefer to have settings there that are more difficult to change or have a larger impact after the partition set is initially created.

@keithf4
Copy link
Collaborator

keithf4 commented Mar 6, 2024

I've uploaded some fixes to the beta PR. If you have a chance to retest again that would be appreciated. I'll hopefully be releasing in the near future

@mikegh
Copy link

mikegh commented Mar 15, 2024

I tested the new version in the part of infinite partitions, and it works fine.

@keithf4
Copy link
Collaborator

keithf4 commented Mar 16, 2024

Thank you!

@TITANHACKY
Copy link

In amazon rds still facing the issue. The extention is not updated there. If I try to upgrade using update command, it is not working

@keithf4
Copy link
Collaborator

keithf4 commented Apr 5, 2024

Apologies, I have no control over the version of pg_partman in RDS. Please put in a support request with them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants