Issues with CDC setup and memory utilization in Airbyte server

Summary

The CDC setup does not reflect current data in the destination despite successful syncs, and the Airbyte server has recurring memory utilization problems.


Question

I have two issues, and both are of concern.
Airbyte has been a great tool, one I have had the privilege to recommend and use at my current job. However, there are two issues that I haven't been able to solve.

  1. CDC setup with Postgres as source and BigQuery as destination: during auditing, we realised that many tables/columns do not hold the same current data as exists on the source, even though the connection syncs show as successful.
    I am open to guidelines or tips on how to resolve this. We have currently resorted to full loads at an interval, which is not ideal.

  2. Every now and then, the server running Airbyte runs out of RAM, with the same number of connections, until the server is restarted. Once the server is restarted, memory utilization can stay under 20% for 2 to 5 days, then jump to 50% at around 8-10 days, and eventually hit 96%, at which point Airbyte hangs. CPU utilization never exceeds 50%, even when the server finally hangs.

Airbyte currently runs on a machine with 32GB of RAM. The RAM was increased incrementally to solve this issue, yet the issue remains unsolved. Another thing we did to address it was to ensure that connection jobs run one after another (no more than two jobs run within the same time slot).



This topic has been created from a Slack thread to give it more visibility.

["cdc-setup", "postgres-connector", "bigquery-connector", "memory-utilization", "server-ram", "connection-jobs"]
  1. Is the problem related to the data types in BigQuery?
  2. This is something the team is working to improve: the limits and requirements for running Airbyte and what resources are needed.
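
For point 1, it may help to first narrow down which tables are affected. The following is only a minimal sketch, assuming the per-table row counts have already been collected separately from Postgres and BigQuery (for example with `SELECT COUNT(*)`). The function name `find_count_mismatches` and the sample table names and numbers are hypothetical, not taken from the thread. Row counts are only comparable when the destination holds one row per source row, and they will not reveal stale column values, so this is a first filter rather than a full audit.

```python
def find_count_mismatches(source_counts, destination_counts):
    """Return {table: (source_rows, destination_rows)} for tables whose counts differ.

    A table that is missing on one side is reported with None for that side.
    """
    mismatches = {}
    for table in sorted(set(source_counts) | set(destination_counts)):
        src = source_counts.get(table)
        dst = destination_counts.get(table)
        if src != dst:
            mismatches[table] = (src, dst)
    return mismatches


# Hypothetical counts, gathered separately on each side.
source = {"users": 1200, "orders": 5400, "payments": 310}
destination = {"users": 1200, "orders": 5391}

for table, (src, dst) in find_count_mismatches(source, destination).items():
    print(f"{table}: source={src} destination={dst}")
```

If the mismatches cluster on tables with particular column types, that would support the data-type question above. If they are spread evenly, the cause is more likely elsewhere in the CDC setup.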

Thanks for the feedback <@U03UPS0983Z> !