Slow Incremental Sync Performance compared to Full Refresh in Airbyte from Postgres to Clickhouse

Summary

The user is experiencing slow Incremental Sync performance compared to Full Refresh when replicating data from Postgres to Clickhouse using Airbyte. They have ~16 million records in a partitioned table and noticed that Incremental Syncs take more time to process far fewer records than the initial Full Refresh did.


Question

Hi everyone!

I’m trying to use Airbyte to replicate data from Postgres (AWS RDS) to Clickhouse (self-hosted on k8s so far), using the CDC method. For the first run I used Full refresh | Overwrite and then changed to Incremental | Append. I realised that almost all Incremental Appends run terribly slowly compared to the initial Full refresh. Why is that? I have ~16 million records in the table (a partitioned table). The initial Full refresh took 14m23s. I then scheduled an Incremental sync every two hours, and in most cases it took even more time to process far fewer records.
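When CDC incremental syncs crawl like this, one useful first check is how much WAL has accumulated behind the Airbyte replication slot on the source Postgres. A minimal diagnostic query (slot names will depend on your Airbyte source config):

```sql
-- How far behind the current WAL position each replication slot is;
-- a large lag means the CDC reader has a big WAL backlog to scan,
-- even if only a few of those changes belong to the synced tables.
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                      confirmed_flush_lsn)) AS replication_lag
FROM pg_replication_slots;
```

If the lag is large, the incremental sync spends most of its time reading WAL rather than emitting records, which would explain a long run for few rows.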

Thanks for any help!



This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here; the original thread is on Slack.

["airbyte", "postgres", "clickhouse", "cdc", "full-refresh", "incremental-sync", "performance", "slow"]

My WAL-related config in RDS:

```
max_wal_senders: 20
max_wal_size: 10240
min_wal_size: 192
wal_buffers: 131072
wal_compression: 1
wal_receiver_create_temp_slot: 0
wal_receiver_timeout: 30000
wal_sender_timeout: 30000
max_replication_slots: 20
```
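Note that RDS parameter groups take raw numbers, while Postgres interprets each parameter in its own unit (e.g. `max_wal_size` in MB, `wal_buffers` in 8 kB pages). A quick way to see the effective values with their units:

```sql
-- Show the effective value and unit of each WAL-related parameter,
-- so raw RDS parameter-group numbers aren't misread.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_wal_size', 'min_wal_size', 'wal_buffers',
               'max_wal_senders', 'max_replication_slots');
```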

Same here: an incremental update of 114,000 records takes almost 2 hours.
But when we synced the 50 million records (initial sync), it took “only” 12 hours.

I am wondering if it can be due to slow target destination.
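For scale, here is the rough per-record throughput implied by those timings (illustrative arithmetic only; record sizes and batching differ between the two modes, so treat it as a ballpark):

```sql
-- Records per second implied by the reported timings
-- (plain arithmetic; runnable on any Postgres).
SELECT round(50000000.0 / (12 * 3600))  AS initial_sync_rps,  -- ≈ 1157 rec/s
       round(114000.0  / (2 * 3600), 1) AS incremental_rps;   -- ≈ 15.8 rec/s
```

A ~70x drop in per-record throughput suggests the bottleneck is per-sync overhead (WAL scanning, destination round-trips) rather than raw record volume.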

Hi! I’m not sure exactly which change helped, but we applied a few things that seem to have fixed the issue:
• we tuned our ERP jobs that run on the Postgres database, which reduced the overall load on that DB
• we created a separate replication slot for each table we wanted to replicate (and configured them as separate sources)
• we increased the min_wal_size parameter to 512
• we increased the resource requests of the Clickhouse instance (we run it in k8s)
After these modifications, Incremental Appends are much faster.
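The per-table slot setup from the second bullet can be sketched like this (slot and publication names here are illustrative; Airbyte can also create these itself from the source config, so check your connector settings before creating them manually):

```sql
-- One logical replication slot and one publication per replicated table;
-- each Airbyte source is then pointed at its own slot/publication pair.
SELECT pg_create_logical_replication_slot('airbyte_slot_orders', 'pgoutput');
CREATE PUBLICATION airbyte_pub_orders FOR TABLE orders;
```

Scoping each publication to a single table means each slot’s CDC reader only decodes changes relevant to that source, instead of every slot scanning the full WAL stream.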