Continuous CDC and Incremental Sync Options in Airbyte for Postgres to Clickhouse Integration

Summary

The user is facing issues with setting up continuous CDC and Incremental Sync - Append + Deduped options in Airbyte for Postgres to Clickhouse integration.


Question

Hello All,

I am a first-time user of Airbyte and have set up a small test environment to check the feasibility of our requirement, which is to do CDC from Postgres to Clickhouse in real time. Most of our data (>95%) is transactional, and we get very frequent updates on the most recent data. I’ve installed Airbyte on an EC2 machine, configured a Postgres source in CDC mode, and set Clickhouse as the destination. I created a connection and tried to sync a test table’s data from Postgres to Clickhouse. Incremental overwrites and appends are working perfectly fine, and I am able to view the data in Clickhouse. However, I am running into the following issues:

  1. Airbyte offers no option to run a continuous stream that captures CDC changes and reflects them in Clickhouse in real time. It only lets me set a cron schedule or trigger syncs manually, which essentially makes the pipeline a batch job.
  2. My connection offers no Incremental Sync - Append + Deduped option, as shown in the screenshot below. I get only two options: overwrite or append. Neither is ideal for me: incremental overwrites are not transactional, so queries would return inconsistent results while a write is in progress, and incremental appends will bloat my destination tables.

Please let me know if there’s a workaround for these issues. I have also attached a screenshot of the library versions.
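
To make the difference in point 2 concrete, here is a minimal sketch in plain Python of what I expect from each mode. It only illustrates the semantics and is not how Airbyte or Clickhouse implements them; the function names, the `id` primary key, and the `updated_at` cursor field are all hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def append(destination: list[Row], batch: list[Row]) -> list[Row]:
    """Append mode: every change record becomes a new destination row."""
    return destination + batch


def append_deduped(
    destination: list[Row], batch: list[Row], primary_key: str, cursor: str
) -> list[Row]:
    """Append + deduped mode: keep only the newest row per primary key."""
    latest: dict[Any, Row] = {}
    for row in destination + batch:
        key = row[primary_key]
        # A later or equal cursor value replaces the stored version.
        if key not in latest or row[cursor] >= latest[key][cursor]:
            latest[key] = row
    return list(latest.values())


# Hypothetical change records: order 1 is updated three times.
changes = [
    {"id": 1, "status": "created", "updated_at": 1},
    {"id": 1, "status": "paid", "updated_at": 2},
    {"id": 2, "status": "created", "updated_at": 2},
    {"id": 1, "status": "shipped", "updated_at": 3},
]

appended = append([], changes)
deduped = append_deduped([], changes, primary_key="id", cursor="updated_at")

print(len(appended))  # one row per change record
print(deduped)        # one row per primary key, latest version only
```

With frequent updates on recent rows, the appended table grows with every change, while the deduped result stays at one row per key, which is the behavior I am looking for in the destination.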


This topic has been created from a Slack thread to give it more visibility.

["airbyte", "postgres", "clickhouse", "cdc", "real-time", "incremental-sync", "continuous-stream", "batch-job"]