Summary
The error message indicates a failure to create a pod for the check step in the Airbyte platform. The user is seeking direction for troubleshooting.
Question
I keep seeing this and I simply can’t figure out why. Short of destroying our installation, I have no clue how to debug this one. I’ve taken multiple steps in Rancher to isolate the issue, but nothing stands out. There are two related tickets, but no clear answer as to what is happening: https://github.com/airbytehq/airbyte/issues/35346 + https://github.com/airbytehq/airbyte/discussions/35301
```
io.airbyte.workers.exception.WorkerException: Failed to create pod for check step
    at io.airbyte.workers.process.KubeProcessFactory.create(KubeProcessFactory.java:197) ~[io.airbyte-airbyte-commons-worker-0.53.0.jar:?]
    at io.airbyte.workers.process.AirbyteIntegrationLauncher.check(AirbyteIntegrationLauncher.java:149) ~[io.airbyte-airbyte-commons-worker-0.53.0.jar:?]
    at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:71) ~[io.airbyte-airbyte-commons-worker-0.53.0.jar:?]
```
Any direction will be appreciated.
---
This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1710424150247589) if you want to access the original thread.
[Join the conversation on Slack](https://slack.airbyte.com)
<sub>
["debugging", "error", "airbyte-platform", "pod-creation", "troubleshooting"]
</sub>
Too little information to suggest anything yet.
Do you use Helm + Kubernetes? If yes, does the check pod start? Can you show its logs?
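For reference, one way to see whether the check pod is ever scheduled is to watch the pods while triggering a connection check (assuming the default `airbyte` namespace; adjust to your install — the pod name pattern can vary by version, but check pods usually have "check" in their name):

```
# Watch pods in the Airbyte namespace while triggering a check.
kubectl get pods -n airbyte -w

# Once the pod appears, describe it to see scheduling and init status.
kubectl describe pod <check-pod-name> -n airbyte
```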
The pod doesn’t start, no; it simply hangs and gets terminated.
And yes, I use the official Helm chart; the only custom part is adding an ingress to the webapp over SSL:
```
database:
  secretName: ''
  secretValue: ''
deploymentMode: oss
edition: community
env_vars: {}
jobs:
  kube:
    annotations: {}
    images:
      busybox: ''
      curl: ''
      socat: ''
    labels: {}
    main_container_image_pull_secret: ''
    nodeSelector: {}
    tolerations: []
  resources:
    limits:
      cpu: 2
      memory: 8Gi
    requests:
      cpu: 0.5
      memory: 128Mi
```
This app has been working for 200 days (this is not a fresh install).
It stopped working all of a sudden 2 days ago; nothing was upgraded or changed server-side.
Do you see any Warnings in the Kubernetes events?
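For anyone following along, warnings can be pulled out of the event stream with something like this (again assuming the default `airbyte` namespace):

```
# List only Warning events in the Airbyte namespace, newest last.
kubectl get events -n airbyte --field-selector type=Warning --sort-by=.lastTimestamp
```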
> The pod doesn’t start, no; it simply hangs and gets terminated.
Do you get an init error similar to this?
btw, this screenshot is from [k9s](https://github.com/derailed/k9s), a cool tool for k8s management.
I’ll quickly check, but yes, it does end like that.
From what I can tell, it’s struggling to copy a config file in under 60 seconds.
Can you check the logs of the failed pod?
Maybe there is something like this (it’s my usual problem with Airbyte):
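A rough way to pull those logs, assuming the default `airbyte` namespace and that the job pod uses an init container named `init` (the Airbyte worker creates one for job pods; the name may differ in your version):

```
# All containers of the failed check pod, including the init container.
kubectl logs <check-pod-name> -n airbyte --all-containers=true

# Or just the init container, where a config-copy timeout would show up.
kubectl logs <check-pod-name> -n airbyte -c init
```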
That’s exactly my error, yes.
Cleanup ran, so I’ve started a fresh sync.
Is there anything that can be done to slim down the config or make it faster again?
I don’t think the problem is with the config sizes (but I could be wrong).
From what I saw, the configs are quite small.
A good solution would be to move from `kubectl cp` to a more reliable tool (just my opinion, I’m not part of the Airbyte team, just a user).
Meanwhile, you can inspect your Kubernetes cluster. If `kubectl cp` fails regularly, there may be some problems with the network or cluster controller.
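One way to sanity-check `kubectl cp` outside of Airbyte is to time a copy into a throwaway pod; the pod and file names below are placeholders, and this only exercises the same exec/tar path that `kubectl cp` relies on:

```
# Start a temporary busybox pod in the Airbyte namespace.
kubectl run cp-test -n airbyte --image=busybox --restart=Never -- sleep 600
kubectl wait --for=condition=Ready pod/cp-test -n airbyte --timeout=120s

# Time a copy of a small local file into the pod.
time kubectl cp ./some-local-file airbyte/cp-test:/tmp/some-local-file

# Clean up.
kubectl delete pod cp-test -n airbyte
```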
We are busy upgrading from 1.25 to 1.27 in the hope that it helps.
But yes, this is completely random.