Sync Hangs on Kubernetes with Airbyte

Summary

Syncs hang at random on Airbyte running on Kubernetes, and Temporal shows them as (BLOCKED on Feature.get). CPU usage on the ‘worker’ pod spikes when a sync becomes stuck. The deployment uses helm charts 0.64.180 with GCS storage and CONTAINER_ORCHESTRATOR_ENABLED: false.


Question

Hi all,

We’re having trouble running Airbyte on Kubernetes. At random, on average at least once per day, a sync hangs. We couldn’t find any errors anywhere.
The job can stay stuck for a long time. For example, a sync that usually takes a few minutes was stuck for almost half a day until we cancelled it.
Resources don’t appear to be the problem: there are no limits on CPU or memory, and disk space seems fine. The Kubernetes node seems fine as well, with nothing major happening on it. However, we noticed that a big CPU spike on a worker pod correlates with a sync becoming stuck.
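
To make the CPU correlation easier to pin down, the output of `kubectl top pod` can be captured periodically and scanned for pods above a threshold. The following is a minimal sketch, assuming the usual three-column text output (NAME, CPU(cores), MEMORY(bytes)); the pod names and the 1000m threshold are hypothetical, and the sample text stands in for real command output.

```python
def parse_cpu_millicores(value: str) -> int:
    """Convert a CPU quantity such as '1850m' or '2' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def pods_over_threshold(top_output: str, threshold_millicores: int) -> list[tuple[str, int]]:
    """Return (pod name, millicores) for pods whose CPU usage exceeds the threshold."""
    hot = []
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed rows
        name, cpu = fields[0], parse_cpu_millicores(fields[1])
        if cpu > threshold_millicores:
            hot.append((name, cpu))
    return hot


# Hypothetical sample; in practice this would be the captured command output.
SAMPLE = """\
NAME                       CPU(cores)   MEMORY(bytes)
airbyte-worker-abc123      1850m        2310Mi
airbyte-server-def456      120m         900Mi
"""

print(pods_over_threshold(SAMPLE, threshold_millicores=1000))
```

Logging the flagged pods with a timestamp would let the spikes be lined up against the start times of the stuck syncs.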

Temporal shows the sync as (BLOCKED on Feature.get).

We’re using helm charts 0.64.180 with GCS storage and CONTAINER_ORCHESTRATOR_ENABLED: false (we had to disable it because we couldn’t deploy Airbyte successfully otherwise; the instructions on the Airbyte website aren’t working for us).

The only change we made was gradually adding more syncs over time. Decreasing concurrency (by a lot) didn’t resolve the issue.

Is there anything else we could look into that would help identify the problem?
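
As a stopgap while the root cause is unknown, a sync that runs far beyond its usual duration (like the half-day hang described above) can be flagged automatically instead of being noticed by hand. This is only a sketch under stated assumptions: the function name, the 10x factor, and the 30-minute floor are hypothetical, and how start times and past durations are obtained from Airbyte is left out.

```python
from datetime import datetime, timedelta
from statistics import median


def is_probably_stuck(started_at, now, past_durations,
                      factor=10.0, floor=timedelta(minutes=30)):
    """Flag a running sync whose elapsed time far exceeds its historical median.

    The limit is the larger of `floor` and `factor` times the median of
    `past_durations`; with no history, only `floor` applies.
    """
    elapsed = (now - started_at).total_seconds()
    limit = floor.total_seconds()
    if past_durations:
        typical = median(d.total_seconds() for d in past_durations)
        limit = max(limit, typical * factor)
    return elapsed > limit


# Hypothetical history: a sync that usually takes a few minutes.
history = [timedelta(minutes=3), timedelta(minutes=4), timedelta(minutes=5)]
start = datetime(2024, 1, 1, 9, 0)

print(is_probably_stuck(start, datetime(2024, 1, 1, 9, 6), history))
print(is_probably_stuck(start, datetime(2024, 1, 1, 21, 0), history))
```

With this history the limit works out to 40 minutes, so the six-minute run is not flagged and the twelve-hour run is.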



This topic has been created from a Slack thread to give it more visibility.

["sync-hangs", "kubernetes", "airbyte", "helm-charts", "gcs-storage", "cpu-usage", "worker-pod", "temporal"]