Postgres Source, can't replicate root partitioned table

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: GCP VM on debian
  • Memory / Disk: 4 VCPU and 16GB RAM
  • Deployment: Docker
  • Airbyte Version: What version are you using now? 0.38.1-alpha
  • Source name/version: 0.4.16 and incremental | append mode with pgoutput

I have one big Postgres Table which is partitioned by YYYY/MM/DD. It seems that I can replicate the partition individually like messsages_20220518 and messsages_20220519 but when I replicate the root table itself messages it doesn’t work (0 bytes; no records).

In the Postgres SQL I’ve found a special parameter WITH (publish_via_partition_root); available with Postgres 13+ that seems to fit my needs but doesn’t work (still 0 records) when I create the publication with this parameter like CREATE PUBLICATION airbyte_publication FOR ALL TABLES WITH (publish_via_partition_root);

Thanks

Are you trying to sync using CDC and Partioned Table? Could you share one example of command you used to partioned your table?
Did you try to create a view and sync data from the view? Maybe not because of the CDC right?

Yes exactly CDC with partitioned table in Postgre like this Logical Replication Partitioning With PostgreSQL 13 | Severalnines

I didn’t try to create a a view maybe it could be a solution but it seems with postgre 13+ it’s possible to use the CDC on the “root” table

here is the definition of the tables

create table messages
(
    uuid                   uuid                                   not null,
    published_at           timestamp(3)                           not null,
    primary key (uuid, published_at)
)
    partition by RANGE (published_at);

and the for each table partition table by date

create table messages_20220519
    partition of messages
        (
            primary key (uuid, published_at)
            )
        FOR VALUES FROM ('2022-05-19 00:00:00') TO ('2022-05-20 00:00:00');

With airbyte I can sync in incremental | append each partition messages_YYYYMMDD individually but can’t sync the messages “root” table

However it’s possible to make full refresh | overwrite on the messages “root” table but I need incremental sync

Do you have any solutions / workarounds / thoughts about this issue?
This is really a blocking point for us and I think this is a pretty usual use case when working with Postgres CDC as a source!

Sorry Lucien not updating you before. I was OOO. Tomorrow morning I’ll test the table scripts you send and see if I can reproduce the issue. If you don’t use CDC (as just a test) the sync works?

Hi thanks for your reply :slight_smile:

It’s really good news that you will work on that issue. Don’t hesitate to ping me (slack?) if you need help or share screening to reproduce this issue I would be glad to help :slight_smile:

Yes, if I sync the “root” table in Full Refresh without the CDC, it works well !

Any update on this issue @marcosmarxm ? :slight_smile:
Thanks :pray:

Had same issue! I’ll open a Github issue to database team investigate and find a solution but it will enter the current backlog. Lucien do you have a workaround for now?

Great that you were able to reproduce this one.

Not really … The only thing I have in mind is a database cron that reads all the partitions and put it into a dedicated table where we make a full refresh every day

If it were possible to add a new table without resetting all the data I could have ran a script every day to add the new partition table with octavia

Maybe do you have a workaround in mind ?

Unfortunately I don’t think there is a workaround for it.