- Is this your first time deploying Airbyte?: Yes
- OS Version / Instance: GCP VM on Debian
- Memory / Disk: 4 vCPU and 16 GB RAM
- Deployment: Docker
- Airbyte Version: 0.38.1-alpha
- Source name/version: Postgres 0.4.16, Incremental | Append mode with pgoutput
I have one big Postgres table which is partitioned by YYYY/MM/DD. It seems that I can replicate each partition individually, like messages_20220518 and messages_20220519, but when I replicate the root table itself, messages, it doesn't work (0 bytes; no records).
On the Postgres side I found a special publication parameter, publish_via_partition_root, available with Postgres 13+, that seems to fit my needs, but it doesn't work either (still 0 records) when I create the publication with it:
CREATE PUBLICATION airbyte_publication FOR ALL TABLES WITH (publish_via_partition_root);
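For reference, here is a quick check that the parameter actually took effect (assuming Postgres 13+, where pg_publication exposes a pubviaroot column):

-- pubviaroot should show true for airbyte_publication
SELECT pubname, pubviaroot FROM pg_publication;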
Thanks
Are you trying to sync using CDC and a partitioned table? Could you share an example of the command you used to partition your table?
Did you try to create a view and sync data from the view? Maybe not, because of the CDC, right?
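For reference, a minimal sketch of what I mean by the view approach (the view name is just an example; it would only help for non-CDC syncs, since logical replication can't capture changes through a view):

-- plain view over the partitioned root table, usable as a normal (non-CDC) Airbyte stream
create view messages_view as
    select uuid, published_at
    from messages;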
Yes, exactly: CDC with a partitioned table in Postgres, like this: Logical Replication Partitioning With PostgreSQL 13 | Severalnines
I didn't try to create a view; maybe it could be a solution, but it seems that with Postgres 13+ it's possible to use CDC on the "root" table.
Here is the definition of the tables:
create table messages
(
    uuid uuid not null,
    published_at timestamp(3) not null,
    primary key (uuid, published_at)
)
partition by RANGE (published_at);
and then, for each date, a partition table:
create table messages_20220519
    partition of messages
    (
        primary key (uuid, published_at)
    )
    FOR VALUES FROM ('2022-05-19 00:00:00') TO ('2022-05-20 00:00:00');
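For completeness, a rough sketch of the rest of the CDC setup assumed here (the slot name airbyte_slot is just an example; the publication matches the one mentioned above):

-- logical replication slot for the Airbyte source, using the pgoutput plugin
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');

-- publication covering the partitioned table, publishing changes under the root table's identity (Postgres 13+)
CREATE PUBLICATION airbyte_publication FOR ALL TABLES
    WITH (publish_via_partition_root = true);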
With Airbyte I can sync each messages_YYYYMMDD partition individually in Incremental | Append mode, but I can't sync the messages "root" table.
However, it is possible to run Full Refresh | Overwrite on the messages "root" table, but I need incremental sync.
Do you have any solutions / workarounds / thoughts about this issue?
This is really a blocking point for us, and I think this is a pretty common use case when working with Postgres CDC as a source!
Sorry Lucien for not updating you before, I was OOO. Tomorrow morning I'll test the table scripts you sent and see if I can reproduce the issue. If you don't use CDC (just as a test), does the sync work?
Hi, thanks for your reply!
It's really good news that you will work on this issue. Don't hesitate to ping me (Slack?) if you need help or want to screen-share to reproduce it; I'd be glad to help.
Yes, if I sync the "root" table in Full Refresh without CDC, it works well!
Any update on this issue, @marcosmarxm?
Thanks
I had the same issue! I'll open a GitHub issue so the database team can investigate and find a solution, but it will enter the current backlog. Lucien, do you have a workaround for now?
Great that you were able to reproduce this one.
Not really… The only thing I have in mind is a database cron that reads all the partitions and puts them into a dedicated table on which we run a full refresh every day.
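Something like this is what I have in mind (just a sketch; the messages_flat table name and the scheduling mechanism are assumptions):

-- plain, non-partitioned copy of the data, synced with Full Refresh | Overwrite
create table if not exists messages_flat
(
    uuid uuid not null,
    published_at timestamp(3) not null,
    primary key (uuid, published_at)
);

-- daily job (pg_cron, external cron, etc.): rebuild the flat copy from all partitions
truncate messages_flat;
insert into messages_flat
    select uuid, published_at from messages;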
If it were possible to add a new table without resetting all the data, I could have run a script every day to add the new partition table with Octavia.
Do you maybe have a workaround in mind?
Unfortunately I don’t think there is a workaround for it.