Postgres transaction logs not cleaning up in CDC incremental mode sync to Snowflake

Summary

Postgres transaction logs are not being cleaned up as expected after each sync in CDC incremental mode from Postgres 16 to Snowflake. Postgres is running in AWS.


Question

We are using Airbyte in CDC incremental mode to sync Postgres 16 to Snowflake. We are observing that our Postgres transaction logs are not being cleaned up after each sync as much as we expect. Postgres is running in AWS. Any suggestions?



This topic has been created from a Slack thread to give it more visibility.
It is kept in read-only mode here.

["postgres", "cdc-incremental-mode", "snowflake", "transaction-logs", "aws"]

I’m facing the same issue with Postgres 15 to Redshift. I’ve upgraded to the latest connectors and Airbyte version and it’s still acting the same; I even downgraded back and I’m still seeing the same behavior.

Are your syncs randomly doing full refreshes of the data?

I think we were seeing random full refreshes. Let me post some of our findings

• Theory 1 - Debezium timeouts
◦ During moments of high load, Airbyte times out while parsing through the WAL, because it only waits 5 minutes for a relevant record
◦ It should have been waiting 20 minutes (1,200 seconds), but from the logs it clearly was not
◦ This could happen if it was searching for relevant data among many other WAL records, e.g. halfway through an 800 GB WAL backlog
◦ This would explain the syncs that never completed, which let the WAL grow over the last few weeks, and the immediate draining of the WAL once the replication slot moved ahead after we increased the timeout
◦ Q: What in the WAL successfully drained each night, and why was there so much left over? Replay should go oldest to newest, so shouldn’t the oldest segments just get purged? Were we only seeing lag based on where the sync failed because of this? (See the query sketch right after this list.)
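One way to sanity-check that last question (not something from the original thread, just a sketch assuming the slot is named `airbyte_slot` as in the output further down) is to compare the slot’s restart_lsn with its confirmed_flush_lsn: if confirmed_flush_lsn moves forward each night but restart_lsn stays far behind, the consumer is acknowledging data yet Postgres still has to retain the older WAL; if neither moves, the slot is not advancing at all.

```
-- Sketch only (assumes the Airbyte replication slot is named 'airbyte_slot'):
-- restart_lsn         = oldest WAL Postgres must keep for this slot
-- confirmed_flush_lsn = position the consumer (Debezium) has acknowledged
SELECT slot_name,
       active,
       restart_lsn,
       confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS unacknowledged_wal,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))         AS retained_wal
FROM pg_replication_slots
WHERE slot_name = 'airbyte_slot';
```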

As Joey mentioned, our replication slot just won’t advance. I really thought it was the dbz timeouts, but now it seems we are “caught up” and it doesn’t time out:

```
SELECT SLOT_NAME
     , PG_CURRENT_WAL_LSN()
     , REDO_LSN
     , RESTART_LSN
     , PG_SIZE_PRETTY(PG_WAL_LSN_DIFF(PG_CURRENT_WAL_LSN(), RESTART_LSN)) AS LAG
     , ACTIVE
FROM PG_CONTROL_CHECKPOINT(), PG_REPLICATION_SLOTS;
```
```
+------------+------------------+-------------+-------------+------+------+
|slot_name   |pg_current_wal_lsn|redo_lsn     |restart_lsn  |lag   |active|
+------------+------------------+-------------+-------------+------+------+
|airbyte_slot|1DE0/894564C8     |1DE0/75FDF240|1D8D/55E49CA0|333 GB|false |
+------------+------------------+-------------+-------------+------+------+
```

<@U01MMSDJGC9> Could you please offer any insights or guide us on how to debug this further? I’m guessing this is a huge flaw with the Postgres source connector, and it’s impacting us massively.

I assume you’re using RDS Postgres too?

This is also impacting us heavily so we should work together to find the resolution. Our RDS instance is constantly autoscaling storage because the transaction logs aren’t being cleaned up.

Me too and we’ve already hit maximum storage, and with RDS you can’t downscale storage :man-facepalming:

Oh wow, that is a lot. We are not near the limit yet

We are at 3613 GiB right now

Yeah, you can’t downscale storage, you have to create a new instance and migrate

ohhh I’ve set the limit at 1000 GB, more than that and it’s coming right out of my salary :joy:

Can both of you share which versions of the source and destination connectors and the Airbyte platform you are using?

Postgres 3.6.11
Snowflake 3.11.4
Platform 0.63.10

This also occurred in previous versions, but we were hoping that upgrading would fix it.

Source - Postgres: 3.4.26
Destination - Redshift: 2.6.4
Platform - 0.62.4

Disclaimer: I’ve already tried upgrading to:
Source - Postgres: 3.6.11
Destination - Redshift: 3.4.0
Platform - 0.63.11

and it was even worse: I would see full refreshes that would, for example, load the true 50 GB of data on the first attempt, then fail and load 100 GB on the second attempt, and might then randomly load 150 GB later

Joey, could you file a GitHub issue? I’ll ask someone from the database team to take a look into it :octavia-thanks:

Yeah, I will work on that shortly and post the link here. :pray:

https://github.com/airbytehq/airbyte/issues/42917