Slow Incremental Sync Performance compared to Full Refresh in Airbyte from Postgres to Clickhouse

Summary

The user is experiencing slow Incremental Sync performance compared to Full Refresh when replicating data from Postgres to Clickhouse using Airbyte. They have ~16 million records in a partitioned table and noticed that Incremental Syncs take more time to process far fewer records than the initial Full Refresh did.


Question

Hi everyone!

I’m trying to use Airbyte to replicate data from Postgres (AWS RDS) to Clickhouse (self-hosted on k8s so far), using the CDC method. For the first run I used Full refresh | Overwrite and then changed to Incremental | Append. I realised that almost all Incremental Appends run terribly slowly compared to the initial Full refresh. Why is that? I have ~16 million records in the table (a partitioned table). The initial Full refresh took 14m23s. I then scheduled an Incremental sync every two hours, and in most cases it took even more time to process far fewer records.
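When CDC incremental syncs crawl like this, one useful first check is how much WAL has accumulated behind the Airbyte replication slot on the source Postgres. A minimal diagnostic query (slot names will depend on your Airbyte source config):

```sql
-- How far behind the current WAL position each replication slot is;
-- a large lag means the CDC reader has a big WAL backlog to scan,
-- even if only a few of those changes belong to the synced tables.
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                      confirmed_flush_lsn)) AS replication_lag
FROM pg_replication_slots;
```

If the lag is large, the incremental sync spends most of its time reading WAL rather than emitting records, which would explain a long run for few rows.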

Thanks for any help!



This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here; the original thread is on Slack.

["airbyte", "postgres", "clickhouse", "cdc", "full-refresh", "incremental-sync", "performance", "slow"]

My WAL-related config in RDS:

```
max_wal_senders: 20
max_wal_size: 10240
min_wal_size: 192
wal_buffers: 131072
wal_compression: 1
wal_receiver_create_temp_slot: 0
wal_receiver_timeout: 30000
wal_sender_timeout: 30000
max_replication_slots: 20
```
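Note that RDS parameter groups take raw numbers, while Postgres interprets each parameter in its own unit (e.g. `max_wal_size` in MB, `wal_buffers` in 8 kB pages). A quick way to see the effective values with their units:

```sql
-- Show the effective value and unit of each WAL-related parameter,
-- so raw RDS parameter-group numbers aren't misread.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_wal_size', 'min_wal_size', 'wal_buffers',
               'max_wal_senders', 'max_replication_slots');
```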

Same here: an incremental update of 114,000 records takes almost 2 hours.
But when we synced the 50 million records (initial sync), it took “only” 12 hours.

I am wondering if it can be due to slow target destination.
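For scale, here is the rough per-record throughput implied by those timings (illustrative arithmetic only; record sizes and batching differ between the two modes, so treat it as a ballpark):

```sql
-- Records per second implied by the reported timings
-- (plain arithmetic; runnable on any Postgres).
SELECT round(50000000.0 / (12 * 3600))  AS initial_sync_rps,  -- ≈ 1157 rec/s
       round(114000.0  / (2 * 3600), 1) AS incremental_rps;   -- ≈ 15.8 rec/s
```

A ~70x drop in per-record throughput suggests the bottleneck is per-sync overhead (WAL scanning, destination round-trips) rather than raw record volume.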

Hi! I’m not sure exactly which change helped, but we applied a few things that seem to have fixed the issue:
• we tuned our ERP jobs that run on the Postgres database, which reduced the overall load on that DB
• we created a separate replication slot for each table we wanted to replicate (and configured them as separate sources)
• we increased the min_wal_size parameter to 512
• we increased the resource requests of the Clickhouse instance (we run it in k8s)
After these modifications, Incremental Appends are much faster.
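The per-table slot setup from the second bullet can be sketched like this (slot and publication names here are illustrative; Airbyte can also create these itself from the source config, so check your connector settings before creating them manually):

```sql
-- One logical replication slot and one publication per replicated table;
-- each Airbyte source is then pointed at its own slot/publication pair.
SELECT pg_create_logical_replication_slot('airbyte_slot_orders', 'pgoutput');
CREATE PUBLICATION airbyte_pub_orders FOR TABLE orders;
```

Scoping each publication to a single table means each slot’s CDC reader only decodes changes relevant to that source, instead of every slot scanning the full WAL stream.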