Issue with Incremental Data Syncing from PostgreSQL to Snowflake

Summary

The user is facing an issue where the connection for syncing a partitioned table from PostgreSQL to Snowflake is not processing incremental data (CDC) despite seeing updates in the confirmed_flush_lsn for the Airbyte replication slot. The initial full load works, but the incremental data is not being synced. The user is using TimescaleDB hypertables for time-series partitioning and suspects it might be affecting the incremental data flow.


Question

Hi everyone,
<#C021JANJ6TY|>
I’m setting up a Connection for Syncing a partitioned table from PostgreSQL to Snowflake, but I’m encountering an issue where the connection is not processing the incremental data (CDC). The rows synced are showing as zero, despite seeing updates in the confirmed_flush_lsn for the AirBytes replication slot.
Details:
Initial Full Load: I’m able to see the initial full load of data without any issues.
Publication: I’ve created the publication tspartitionedpub for the demo_xxx.entity_time_series table with the publish_via_partition_root option
> CREATE PUBLICATION “tspartitionedpub” FOR TABLE demo_xxx.entity_timeseries WITH (publish_via_partition_root = true);
> Table Structure:
> postgres=# \d+ demo_xxx.entity_time_series
> Column | Type | Nullable | Default | Storage | Compression | Description
> ------------±--------±---------±--------±---------±------------±-------------
> entity_id | uuid | not null | | plain | |
> timestamp | bigint | not null | | plain | |
> org_id | uuid | not null | | plain | |
> attributes | jsonb | | | extended | |
> action | jsonb | | | extended | |
> Indexes:
> “entity_time_series_pkey” PRIMARY KEY, btree (entity_id, “timestamp”)
> “entity_time_series_timestamp_idx” btree (“timestamp” DESC)
> Publications:
> “snowflakesinkpub”
> “tspartitionedpub”
> Triggers:
> ts_insert_blocker BEFORE INSERT ON demo_ubq.entity_time_series FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker()
> Child tables: _timescaledb_internal._hyper_1_1_chunk,
> _timescaledb_internal._hyper_1_2_chunk,
> _timescaledb_internal._hyper_1_45_chunk
Replication Slot: The confirmed_flush_lsn is getting updated for the “airbyte_slot_part_test” replication slot, as shown below:
> Before Insert:
>
> SELECT slot_name, confirmed_flush_lsn
> FROM pg_replication_slots
> WHERE slot_name = ‘airbyte_slot_part_test’;
> slot_name | confirmed_flush_lsn
> --------------------------±--------------------
> airbyte_slot_part_test | 70/B6A75D10
>
> After Inserts in Child Tables:
> slot_name | confirmed_flush_lsn
> --------------------------±--------------------
> airbyte_slot_part_test | 70/D272C458
Question:
• The initial full load works, but the connection is not processing the incremental data (CDC) after that.
• The confirmed_flush_lsn updates are visible, but the rows synced show as zero.
• We’re using TimescaleDB - hypertables for time-series partitioning (<https://docs.timescale.com/getting-started/latest/tables-hypertables/|as documented here>) — could this be affecting the incremental data flow?
Has anyone experienced this issue or have suggestions on what might be causing the lack of incremental data syncing?
Thanks in advance!



This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want
to access the original thread.

Join the conversation on Slack

["incremental-data-syncing", "postgresql", "snowflake", "cdc", "timescaledb", "hypertables", "partitioning"]