Summary
Inquiry about support for large initial CDC syncs with WASS on Airbyte 0.64.3 and Postgres source 3.6.18, after a large Postgres->Redshift initial sync failed.
Question
Does anyone know if WASS support for large initial CDC syncs (https://airbyte.com/blog/supporting-very-large-cdc-syncs-with-wass) is available on Airbyte 0.64.3 and Postgres source 3.6.18? The blog post on WASS suggests these versions should be recent enough, but the initial sync fails for seemingly the exact reason identified in the post (it failed 38GB into a 500+GB Postgres->Redshift sync). Thank you for your help.
This topic has been created from a Slack thread to give it more visibility.
["large-initial-cdc-syncs", "wass", "airbyte-0.64.3", "postgres-source-3.6.18", "initial-sync-failure", "postgres-redshift-sync"]
This happened to us, so we downgraded to pre-WASS (3.4.26).
For some reason, WASS was breaking our initial sync with the error: `Saved offset is before replication slot's confirmed lsn. Please reset the connection, and then increase WAL retention and/or increase sync frequency to prevent this from happening in the future.`
This is even though our WAL retention is indefinite and the WAL shouldn't be too big by the time this error would be thrown.
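For anyone debugging the same error, here is a minimal diagnostic sketch (assuming psycopg2, and using a placeholder slot name `airbyte_slot` and placeholder connection details) that inspects the replication slot's LSNs and how much WAL it is holding back:

```python
# Diagnostic sketch: inspect the logical replication slot used for CDC and show
# how much WAL it is retaining. Slot name and credentials are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="your-rds-host", dbname="your_db", user="your_user", password="..."
)
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT slot_name,
               active,
               restart_lsn,
               confirmed_flush_lsn,
               pg_size_pretty(
                   pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
               ) AS retained_wal
        FROM pg_replication_slots
        WHERE slot_name = %s;
        """,
        ("airbyte_slot",),  # replace with your slot name
    )
    for row in cur.fetchall():
        print(row)
```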
<@U05N5APSYBE> Thank you for sharing this! I hadn't expected to see WASS hurting the sync. I'll update this thread once the sync on Airbyte 1.0.0 runs, but if it doesn't go well, we'll see about downgrading the source. Thank you again!
Hi <@U07LR8G4C2H> and <@U05N5APSYBE>, yes, 3.6.18 contains WASS. If you feel there is a problem with it, could you file a ticket with your connection link or the sync log? We will take it from there.
Hi <@U073KSQ6Z53>! Our (new) sync has been running for 5 hours and we're observing two issues:
• The RDS OldestReplicationSlotLag is growing linearly, suggesting that WASS hasn't kicked in despite being on Airbyte 1.0.0/latest Postgres source. Is there some way to confirm this or force it to read the WAL? (See the monitoring sketch at the end of this message.)
• Only ~20GB of the >500GB have synced. Is there anything we can do to speed up the sync? Our concern (beyond it taking a long time) is the ~5 days of uncapped WAL growth.
cc <@U01MMSDJGC9> in case you have any ideas
Thank you for any ideas you might have.
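For the first bullet, one rough way to confirm whether the slot is being acknowledged is to sample the retained WAL over time; if the number only ever grows, CDC has not acknowledged anything yet. A sketch, again assuming psycopg2 plus a placeholder DSN and slot name:

```python
# Monitoring sketch (not an official Airbyte tool): print the WAL retained by the
# slot every few minutes so you can see whether acknowledgements are happening.
import time
import psycopg2

QUERY = """
    SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
    FROM pg_replication_slots
    WHERE slot_name = %s;
"""

conn = psycopg2.connect("postgresql://user:pass@your-rds-host/your_db")  # placeholder DSN

previous = None
while True:
    with conn, conn.cursor() as cur:
        cur.execute(QUERY, ("airbyte_slot",))  # placeholder slot name
        retained = int(cur.fetchone()[0])      # raises if the slot does not exist
    delta = "n/a" if previous is None else f"{retained - previous:+d} bytes"
    print(f"retained WAL: {retained / 1024**3:.2f} GiB (change since last sample: {delta})")
    previous = retained
    time.sleep(300)  # sample every 5 minutes
```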
Hi Adam, this is a little strange, as our current implementation has a 4-hour timeout for the initial sync, and then CDC kicks in.
<@U073KSQ6Z53> that's fascinating! Here's a screenshot of it running (my local time is 16:44, so 5 hours in). Is there a line in the logs we can look for to detect CDC/the WAL read kicking in?
Oh sorry, I just checked again; this timeout is 8 hours by default!
It's the last row, "Initial Load Timeout in Hours".
You can set that to a smaller value, maybe 4, to experiment with WASS.
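For reference, here is a rough sketch (not an official recipe) of changing that setting through the Airbyte Config API instead of the UI. The base URL, auth, and the field name assumed here (`replication_method.initial_load_timeout_hours`) should be verified against your deployment and the Postgres source spec for your version:

```python
# Hedged sketch: update "Initial Load Timeout in Hours" on a Postgres source via
# the Airbyte Config API. All identifiers below are placeholders/assumptions.
import requests

AIRBYTE_API = "http://localhost:8000/api/v1"           # adjust to your deployment
AUTH = ("airbyte", "password")                         # default OSS basic auth; adjust as needed
SOURCE_ID = "00000000-0000-0000-0000-000000000000"     # your Postgres source id

# Fetch the current source so the full configuration can be resubmitted.
resp = requests.post(f"{AIRBYTE_API}/sources/get", json={"sourceId": SOURCE_ID}, auth=AUTH)
resp.raise_for_status()
source = resp.json()

config = source["connectionConfiguration"]
config["replication_method"]["initial_load_timeout_hours"] = 4  # assumed field name

update = requests.post(
    f"{AIRBYTE_API}/sources/update",
    json={
        "sourceId": SOURCE_ID,
        "name": source["name"],
        "connectionConfiguration": config,
    },
    auth=AUTH,
)
update.raise_for_status()
```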
<@U073KSQ6Z53> OK, great, this is so helpful! And when it times out, nothing bad will happen as long as we can handle 8 hours of WAL backup? If so, that works for us.
Out of curiosity, is there any trick to increasing the sync throughput?
Thank you so much for your help.
Yes, once CDC kicks in, it will acknowledge the WAL entries. You should see the WAL size shrink.
Currently, the only way to increase sync throughput is to create multiple connections, letting each connection own a subset of the current set of streams. In Q4, we plan to migrate Postgres to our newly developed bulk-extract CDK, which provides concurrency and may give better throughput for each connection.
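As an illustration of the multiple-connections workaround (placeholder stream names; the connections themselves are still created in the Airbyte UI or API), a small sketch that round-robins a stream list into per-connection subsets:

```python
# Illustrative only: partition the stream (table) list so each Airbyte connection
# syncs its own subset. Very large tables may deserve a connection of their own
# rather than round-robin placement.
from itertools import cycle

streams = ["orders", "order_items", "customers", "events", "payments", "invoices"]
num_connections = 3

groups: list[list[str]] = [[] for _ in range(num_connections)]
for group, stream in zip(cycle(groups), streams):
    group.append(stream)

for i, group in enumerate(groups, start=1):
    print(f"connection {i}: {group}")
```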
> One follow-up: Since each timeout is treated like a sync failure and there are 5 retries per sync, will the sync be marked as failing and be unrecoverable after 5 timeouts?
Yue can provide a better answer, but in the latest version of Airbyte, if a sync made progress and then failed, it will retry without counting that as a hard failure.
<@U01MMSDJGC9> that’s comforting, thank you!
No amount of syncing/refreshing worked (they all threw the WAL error). We upgraded to 1.0.0 and are running a sync now to see if any of the throughput/WASS improvements resolve the issue.
Thank you <@U073KSQ6Z53>! I do see that CDC kicked in after 8 hours (twice). I also see the system making incremental progress, which is GREAT!
One follow-up: Since each timeout is treated like a sync failure and there are 5 retries per sync, will the sync be marked as failing and be unrecoverable after 5 timeouts?
In WASS, we will retry 20 times.