All syncs hanging for days

  • Is this your first time deploying Airbyte?: No
  • Memory / Disk: Up to 32x c6i.xlarge
  • Deployment: Kubernetes
  • Airbyte Version: 0.39.17
  • Source name/version: Postgres 0.4.18
  • Destination name/version: Snowflake 0.4.30
  • Step: Sync
  • Description: This is a continuation of this topic. Over the past weekend all of my connections (about 25) were stuck in the ‘Running’ state, though nothing was actually being synced. Here are the logs from five of our connections:
    logs-49582.txt (48.0 KB)
    logs-49584.txt (41.5 KB)
    logs-49591.txt (46.7 KB)
    logs-49590.txt (41.2 KB)
    logs-49601.txt (4.8 KB)

I tried deleting all of the pods in Kubernetes and letting them restart, but the syncs continued to hang in the ‘Running’ state. The only thing that ‘fixed’ it was going into each connection individually and cancelling the hanging sync.

Here are the env values, for reference:
JOB_MAIN_CONTAINER_MEMORY_REQUEST=1Gi
JOB_MAIN_CONTAINER_MEMORY_LIMIT=1Gi
MAX_SYNC_WORKERS=15
MAX_SPEC_WORKERS=15
MAX_CHECK_WORKERS=15
MAX_DISCOVER_WORKERS=15

Hey @andmo, thanks for your post. The most trivial thing I can recommend here is to upgrade your Airbyte version to the latest and see if that fixes some of the connections that are breaking. If you’ve already tried that, what were the results?

In addition, just to gather more info: what does your Kubernetes deployment look like?

Hi @sajarin,

I will try the upgrade first and let you know the result. I appreciate the help.

Also, I’m not a Kubernetes expert; what information should I gather to answer your second question?

Kind regards,

Andrew

Hey @andmo, I appreciate the reply. Have you gotten a chance to upgrade and has it had any meaningful impact?

As for the Kubernetes deployment, it’d just be helpful to know how you’re deploying your cluster. Are you using GKE or EKS? Is it a local instance? You can learn more about the different ways we support Kubernetes here: https://docs.airbyte.com/deploying-airbyte/on-kubernetes/

I am also facing the same issue, but I am not using Kubernetes. I am trying to sync 10M rows from Postgres to Snowflake. After 6 GB of data had been extracted, the sync got stuck and stayed that way until I canceled it manually.

Below is the place where it got stuck:

2022-08-02 03:54:44 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 698000 (6 GB)

2022-08-02 03:54:45 destination > 2022-08-02 03:54:45 INFO i.a.i.d.r.SerializedBufferingStrategy(flushWriter):93 - Flushing buffer of stream MTL_SYSTEM_ITEMS (200 MB)

2022-08-02 03:54:45 destination > 2022-08-02 03:54:45 INFO i.a.i.d.s.StagingConsumerFactory(lambda$flushBufferFunction$3):158 - Flushing buffer for stream MTL_SYSTEM_ITEMS (200 MB) to staging

2022-08-02 03:54:45 destination > 2022-08-02 03:54:45 INFO i.a.i.d.r.BaseSerializedBuffer(flush):131 - Wrapping up compression and write GZIP trailer data.

2022-08-02 03:54:45 destination > 2022-08-02 03:54:45 INFO i.a.i.d.r.BaseSerializedBuffer(flush):138 - Finished writing data to 4802c05e-271e-4b8c-b4d2-953dda97865a16960591700195388958.csv.gz (200 MB)
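
For anyone comparing several log files like the ones attached above, here is a rough Python sketch for finding where a sync last reported progress. It assumes the log lines look like the excerpt above (a leading `YYYY-MM-DD HH:MM:SS` timestamp and a `Records read: N` message); the function names `last_progress` and `stall_gap_seconds` are made up for this example and are not part of Airbyte.

```python
import re
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
# Leading timestamp that every line in the excerpt starts with.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
# Progress lines, e.g. "... - Records read: 698000 (6 GB)".
PROGRESS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*Records read: (\d+)")


def last_progress(lines):
    """Return (timestamp, records_read) for the last progress line, or None."""
    last = None
    for line in lines:
        m = PROGRESS.match(line)
        if m:
            last = (datetime.strptime(m.group(1), FMT), int(m.group(2)))
    return last


def stall_gap_seconds(lines):
    """Seconds between the last progress line and the newest timestamped line."""
    progress = last_progress(lines)
    stamps = [
        datetime.strptime(m.group(1), FMT)
        for m in (TIMESTAMP.match(line) for line in lines)
        if m
    ]
    if progress is None or not stamps:
        return None
    return (max(stamps) - progress[0]).total_seconds()


# Two lines copied from the excerpt above.
SAMPLE = [
    "2022-08-02 03:54:44 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 698000 (6 GB)",
    "2022-08-02 03:54:45 destination > 2022-08-02 03:54:45 INFO i.a.i.d.r.BaseSerializedBuffer(flush):138 - Finished writing data to 4802c05e-271e-4b8c-b4d2-953dda97865a16960591700195388958.csv.gz (200 MB)",
]

if __name__ == "__main__":
    print(last_progress(SAMPLE))
    print(stall_gap_seconds(SAMPLE))
```

To use it on a downloaded log, pass the file's lines, for example `last_progress(open("logs-49582.txt").read().splitlines())`. A large gap between the last progress timestamp and the time the sync was cancelled would suggest the job was idle rather than slow, though this only narrows down where to look.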

Hey @tagnev, thanks for your posts. This seems like a common issue with users who are attempting to sync millions of records. Would it be possible to share some more information about how you’ve deployed Airbyte, how many resources you’ve allocated for it, what versions you’re using for the connectors, and the logs associated with the failed sync?

@andmo, are you still experiencing this problem?

Yes, the issue still exists. Below is where I created the separate thread:

Hey @tagnev, this seems like a good issue for our GitHub repo, although we’d still need more information in order to try to reproduce it. Thanks for taking the time to resync. Let us know when you’ve posted the logs so we can escalate the issue to GitHub.

Sorry, what info is required in order to escalate this issue?

Hi @tagnev, let’s close this thread and focus on the discussion on the other thread: https://discuss.airbyte.io/t/oracle-large-source-schema-refresh-issue/2157.