Postgres to Snowflake replication slot growing in size

Summary

The user is experiencing a situation where the logical replication slot on the source database keeps growing in size despite successful syncs in Airbyte. The replication slot’s ‘active’ field is listed as ‘false’ even during a sync. User is seeking advice on how to resolve this issue.


Question

Hey guys, I’ve set up an (AWS) Postgres → Snowflake connection using WAL/CDC on the source, which is working well. All data is being replicated, nothing missing, on time, however the logical replication slot on the source database just keeps growing and growing. In 3 days it’s now up to 22GB, yet airbyte sync logs have all syncs successful, as does sanity checking the data in Snowflake. My experience previously is only using DMS, which would clear the replication slot automatically after each checkpoint had been copied. The “active” field of the slot is also listed as “false” - even during a sync. Anyone experienced this before? How can I fix this?



This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want
to access the original thread.

Join the conversation on Slack

["aws-postgres", "snowflake", "wal-cdc", "logical-replication-slot", "airbyte-sync"]

my max WAL is set to default of 2gb as well. so how it’s 22 :shrug:

Howdy - We recently did just this actually. I’d make sure that you have both a publication slot and subscription configured correctly, and confirm the other config options. Here’s what we have set for the relevant fields in the sync:

Under Advanced:
• Update Method
◦ Read Changes using Write-Ahead Log (CDC)
• Replication Slot matches what was created
• Publication matches what was created
• Optional Fields:
◦ LSN commit behavior
:black_small_square:︎ After loading Data in the destination
Notes from our DevOps team (who set up the DB publication and subscription details:
• Make sure that settings are the following (wal_level = on, wal_level = logical)
SELECT name,setting FROM pg_settings WHERE name IN ('wal_level','rds.logical_replication');
• They also granted rds_superuser and rds_replication roles to our DB user
The “active” field will only be true during a sync, but also only during a replication-based sync (initial syncs are not of course). Not sure when you checked those, but might be worth confirming.

Are you seeing any LSN details in the airbyte fields in your tables? Maybe poke around the airbyte_internal tables in the _airbyte_data field to see if you’re getting that info.

Lastly, I’d confirm the type of sync you’re doing for each table (we’re using Incremental | Append + Deduped). Full syncs might bypass the WAL (just a guess… still new to Airbyte tbh).

Our WAL generally stays between 250MB-800MB (we sync every 15 minutes).

Anyway, hope something in there helps!

Thanks Chris! I was doing Incremental | Append (no dedupe) which was the only difference between your setup, however after re-saving the source, it seems to be working as expected now. slot size gets to about 1gb before dropping again after the sync. Not sure what happened maybe a bad config save on initial save but seems to be all good for the last 24 hours - appreciate your detailed response!