RuntimeException during sync: "Cannot find pod while trying to retrieve exit code"

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 12 GB / 500 GB
  • Deployment: Kubernetes
  • Airbyte Version: 0.35.10-alpha
  • Source name/version: Custom python connector
  • Destination name/version: destination-snowflake 0.4.8
  • Step: sync
  • Description:

Scenario: we are syncing from a custom connector to the Snowflake destination connector.
~25 GB / ~25,000,000 records appear to import correctly over a little more than 2 hours.
The log states that the streams are finalized and that the tmp tables and stages have been successfully cleaned in the destination.

But then the following exception shows up in the airbyte-worker log:

2022-08-25 14:41:22 destination > 2022-08-25 14:41:22 INFO i.a.i.b.IntegrationRunner(runInternal):153 - Completed integration: io.airbyte.integrations.destination.snowflake.SnowflakeDestination
2022-08-25 14:41:54 INFO i.a.w.p.KubePodProcess(exitValue):710 - Closed all resources for pod airbyte-source-questmanager-6g-sync-2988-2-syksw
2022-08-25 14:41:54 INFO i.a.w.p.KubePodProcess(exitValue):710 - Closed all resources for pod airbyte-source-questmanager-6g-sync-2988-2-syksw
2022-08-25 14:41:54 INFO i.a.w.p.KubePodProcess(exitValue):710 - Closed all resources for pod airbyte-source-questmanager-6g-sync-2988-2-syksw
2022-08-25 14:41:54 INFO i.a.w.p.KubePodProcess(exitValue):710 - Closed all resources for pod airbyte-source-questmanager-6g-sync-2988-2-syksw
2022-08-25 14:41:54 ERROR i.a.w.DefaultReplicationWorker(run):141 - Sync worker failed.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: Cannot find pod while trying to retrieve exit code. This probably means the Pod was not correctly created.

Complete logs: logs-2988.txt (3.4 MB)
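
From what I can tell, the exception means the worker asked the Kubernetes API for the source pod's exit code after the pod was already gone. A rough sketch of that lookup, using the official `kubernetes` Python client (the pod name is taken from the log above; the `airbyte` namespace is an assumption, adjust for your install):

```python
# Sketch: replicate the exit-code lookup that fails in the worker.
# Assumes the official `kubernetes` Python client (pip install kubernetes)
# and a kubeconfig with access to the cluster. The namespace is an
# assumption; the pod name is the one from the log excerpt above.
from kubernetes import client, config
from kubernetes.client.exceptions import ApiException

config.load_kube_config()  # use config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

POD = "airbyte-source-questmanager-6g-sync-2988-2-syksw"
NAMESPACE = "airbyte"  # assumption: default Airbyte namespace

try:
    pod = v1.read_namespaced_pod(name=POD, namespace=NAMESPACE)
    for status in pod.status.container_statuses or []:
        terminated = status.state.terminated
        if terminated is not None:
            print(f"{status.name}: exit code {terminated.exit_code}")
        else:
            print(f"{status.name}: not terminated yet")
except ApiException as e:
    if e.status == 404:
        # This is the state the worker runs into: the pod has already
        # been deleted, so there is no exit code left to read.
        print("Pod not found; exit code can no longer be retrieved")
    else:
        raise
```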

After this exception the logs indicate that the sync continues as normal, and the normalization step finally completes successfully. The attempt is nevertheless flagged as ‘failed’ and a new attempt is started.

A few days ago there were 3 failed attempts in a row because of this, but the corresponding Kubernetes pods have since been deleted automatically, so I am unable to look at those logs.

I then performed a sync with the same connector on a much smaller subset of streams (40 MB). This time it succeeded.

After that I performed a sync with the original set of streams again. The first attempt failed in the same way, and I canceled the run during the second attempt. Those pods once again seem to have been deleted automatically, so I cannot look into those logs either.
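
To avoid losing the logs next time, I will try dumping the pod logs to a file while the pod still exists. A minimal sketch, again with the `kubernetes` Python client (pod name and namespace are placeholders for the attempt in question):

```python
# Sketch: save a sync pod's logs before the pod is cleaned up automatically.
# Pod name and namespace are placeholders; substitute the pod of the
# failing attempt (visible via `kubectl get pods` while the sync runs).
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

POD = "airbyte-source-questmanager-6g-sync-2988-2-syksw"  # placeholder
NAMESPACE = "airbyte"  # assumption: default Airbyte namespace

# For multi-container pods you can pass container="..."; check the
# container names with `kubectl describe pod <pod>` first.
logs = v1.read_namespaced_pod_log(name=POD, namespace=NAMESPACE)
with open(f"{POD}.log", "w") as f:
    f.write(logs)
print(f"Wrote {len(logs)} characters to {POD}.log")
```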

I would really appreciate any advice on how to debug this further, what might be causing the issue, and how to fix it. Let me know if you need any additional info. Thanks!

I have found an existing issue regarding this problem:

As indicated in the comments, it is a bug that should have been fixed in version 0.35.63-alpha. We are currently on 0.35.10-alpha, so we will have to update to a more recent version.

Hi @BravoDeltaBD, thanks for searching for the issue on GitHub. Let me know if you’re still experiencing the same problem after the upgrade.