Issue with Incremental Data Sync from PostgreSQL to Snowflake

Summary

The user's PostgreSQL-to-Snowflake connection completes the initial full load, but incremental (CDC) syncs process zero rows even though the confirmed_flush_lsn of the Airbyte replication slot keeps advancing. The source is TimescaleDB with partitioned (hypertable) tables.


Question

Hi everyone,
I'm setting up a connection to sync a partitioned table from PostgreSQL to Snowflake, but the connection is not processing incremental data (CDC). The rows synced show as zero, despite updates to the confirmed_flush_lsn for the Airbyte replication slot.
Details:
Initial Full Load: I’m able to see the initial full load of data without any issues.
Publication: I've created the publication tspartitionedpub for the demo_xxx.entity_time_series table with the publish_via_partition_root option:
> CREATE PUBLICATION "tspartitionedpub" FOR TABLE demo_xxx.entity_time_series WITH (publish_via_partition_root = true);
> Table Structure:
> postgres=# \d+ demo_xxx.entity_time_series
> Column | Type | Nullable | Default | Storage | Compression | Description
> ------------+---------+----------+---------+----------+-------------+-------------
> entity_id | uuid | not null | | plain | |
> timestamp | bigint | not null | | plain | |
> org_id | uuid | not null | | plain | |
> attributes | jsonb | | | extended | |
> action | jsonb | | | extended | |
> Indexes:
> "entity_time_series_pkey" PRIMARY KEY, btree (entity_id, "timestamp")
> "entity_time_series_timestamp_idx" btree ("timestamp" DESC)
> Publications:
> "snowflakesinkpub"
> "tspartitionedpub"
> Triggers:
> ts_insert_blocker BEFORE INSERT ON demo_xxx.entity_time_series FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker()
> Child tables: _timescaledb_internal._hyper_1_1_chunk,
> _timescaledb_internal._hyper_1_2_chunk,
> _timescaledb_internal._hyper_1_45_chunk
Replication Slot: The confirmed_flush_lsn is getting updated for the "airbyte_slot_part_test" replication slot, as shown below:
> Before Insert:
>
> SELECT slot_name, confirmed_flush_lsn
> FROM pg_replication_slots
> WHERE slot_name = 'airbyte_slot_part_test';
> slot_name | confirmed_flush_lsn
> --------------------------+---------------------
> airbyte_slot_part_test | 70/B6A75D10
>
> After Inserts in Child Tables:
> slot_name | confirmed_flush_lsn
> --------------------------+---------------------
> airbyte_slot_part_test | 70/D272C458
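For reference, a related sanity check (just a sketch using standard PostgreSQL catalog columns) is whether the slot shows as active, i.e. whether a consumer is attached, and how far confirmed_flush_lsn lags the current WAL write position:
> SELECT slot_name, active, restart_lsn, confirmed_flush_lsn,
>        pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS flush_lag
> FROM pg_replication_slots
> WHERE slot_name = 'airbyte_slot_part_test';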
Question:
• The initial full load works, but the connection is not processing the incremental data (CDC) after that.
• The confirmed_flush_lsn updates are visible, but the rows synced show as zero.
• We're using TimescaleDB hypertables for time-series partitioning (as documented here: https://docs.timescale.com/getting-started/latest/tables-hypertables/). Could this be affecting the incremental data flow?
Has anyone experienced this issue or have suggestions on what might be causing the lack of incremental data syncing?
Thanks in advance!



This topic has been created from a Slack thread to give it more visibility. It is in read-only mode here.

['incremental-data', 'cdc', 'postgresql', 'snowflake', 'timescaledb', 'airbyte', 'replication-slot']

The issue may be related to TimescaleDB’s hypertable implementation. Try:

  1. Verify that the publication includes all child partitions.
  2. Check that wal_level is set to logical.
  3. Ensure replica identities are set correctly on the parent and child tables.
  4. Consider setting REPLICA IDENTITY FULL on the parent table if updates/deletes need to be captured.
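A rough sketch of how points 1-3 could be checked from psql, using the names from this thread (standard PostgreSQL catalogs only; adjust to your environment):
> -- 1. What the publication covers, and whether it publishes via the partition root
> SELECT pubname, puballtables, pubviaroot
> FROM pg_publication
> WHERE pubname = 'tspartitionedpub';
>
> SELECT * FROM pg_publication_tables
> WHERE pubname = 'tspartitionedpub';
>
> -- 2. WAL level must be 'logical' for CDC
> SHOW wal_level;
>
> -- 3. Replica identity of the parent table and its chunks
> --    (d = default/primary key, f = full, i = index, n = nothing)
> SELECT oid::regclass AS table_name, relreplident
> FROM pg_class
> WHERE relname = 'entity_time_series'
>    OR relname LIKE '_hyper_%_chunk';
>
> -- 4. Only if updates/deletes must be captured via full row images:
> -- ALTER TABLE demo_xxx.entity_time_series REPLICA IDENTITY FULL;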

Hi <@U01MMSDJGC9>
Yes, the publication includes all child tables, and WAL level and replica identities are set properly.

It works fine when adding each child table to the connection individually.

I'm checking Airbyte's replication behaviour for the parent table (partitioned as a hypertable).

The pain point with adding each child table is that the connection has to be managed dynamically: each new hypertable chunk table needs to be added via an API update, and chunk tables that have TTLed out need to be removed.
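For context, the chunk set itself is easy to enumerate from TimescaleDB (a sketch of what such a reconciliation script would query, using the hypertable name from this thread); the overhead is in diffing it against the connection's stream list on every change:
> -- Chunk tables currently backing the hypertable
> SELECT show_chunks('demo_xxx.entity_time_series');
>
> -- Or with schema and name split out; the view also exposes each chunk's
> -- time range, which helps decide when a chunk has TTLed out of scope
> SELECT chunk_schema, chunk_name
> FROM timescaledb_information.chunks
> WHERE hypertable_schema = 'demo_xxx'
>   AND hypertable_name = 'entity_time_series'
> ORDER BY chunk_name;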
<@U01MMSDJGC9>
Please suggest if there is any better solution for managing this.