Summary
Syncs are failing with the error message ‘Failed to create pod orchestrator-repl-job-1367-attempt-0, pre-existing pod exists which didn’t advance out of the NOT_STARTED state.’ The user asks whether there are any settings that can be enabled to resolve this issue.
Question
Hi!
We are using Airbyte version 0.42.0.
Currently, all our syncs are failing with this error:
"message": "Failed to create pod orchestrator-repl-job-1367-attempt-0, pre-existing pod exists which didn't advance out of the NOT_STARTED state.",
Are there any settings that we need to enable to get around this issue?
This topic has been created from a Slack thread to give it more visibility.
["syncs-failing", "pod-orchestrator-error", "settings-enable"]
Errors
"message": "Encoded failure",
"source": "JavaSDK",
"stackTrace": "",
"encodedAttributes": {
"message": "Failed to create pod orchestrator-repl-job-1367-attempt-0, pre-existing pod exists which didn't advance out of the NOT_STARTED state.",
"stack_trace": "io.airbyte.workers.sync.LauncherWorker.lambda$run$3(LauncherWorker.java:198)\nio.airbyte.commons.temporal.TemporalUtils.withBackgroundHeartbeat(TemporalUtils.java:318)\nio.airbyte.workers.sync.LauncherWorker.run(LauncherWorker.java:116)\nio.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$6(TemporalAttemptExecution.java:202)\njava.base/java.lang.Thread.run(Thread.java:1589)\n"
},
"cause": {
"message": "Encoded failure",
"source": "JavaSDK",
"stackTrace": "",
"encodedAttributes": {
"message": "Timed out waiting for [300000] milliseconds for [Pod] with name:[orchestrator-repl-job-1367-attempt-0] in namespace [default].",
"stack_trace":```
Is this a Kubernetes deployment? Would you mind sharing how you’re deploying it?
It seems the pods are being launched in the default namespace. My guess is that the orchestrator doesn’t have permission to launch pods in that namespace.
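One way to check that guess is to look at the pod and at the permissions in the namespace named in the error. The snippet below is only a sketch: it parses the timeout message quoted above and prints standard `kubectl` commands to run by hand. The helper name `diagnostic_commands` is made up for illustration, and the service account name `airbyte-admin` is an assumption; replace it with whatever account your Airbyte deployment actually runs as.

```python
import re
import shlex

# Pattern for the timeout message quoted in the error payload above.
TIMEOUT_RE = re.compile(
    r"Timed out waiting for \[(?P<ms>\d+)\] milliseconds for \[Pod\] "
    r"with name:\[(?P<pod>[^\]]+)\] in namespace \[(?P<ns>[^\]]+)\]"
)


def diagnostic_commands(message, service_account="airbyte-admin"):
    """Return (timeout_ms, kubectl commands) for the pod named in `message`.

    `service_account` is an assumed name, not taken from the thread.
    """
    match = TIMEOUT_RE.search(message)
    if match is None:
        raise ValueError("message does not match the expected timeout format")
    pod, ns = match["pod"], match["ns"]
    commands = [
        # Pod status, scheduling conditions, and recent events.
        ["kubectl", "describe", "pod", pod, "--namespace", ns],
        # Events only, e.g. scheduling or image pull failures.
        ["kubectl", "get", "events", "--namespace", ns,
         "--field-selector", f"involvedObject.name={pod}"],
        # Whether the (assumed) service account may create pods there.
        ["kubectl", "auth", "can-i", "create", "pods", "--namespace", ns,
         "--as", f"system:serviceaccount:{ns}:{service_account}"],
    ]
    return int(match["ms"]), [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    msg = ("Timed out waiting for [300000] milliseconds for [Pod] with "
           "name:[orchestrator-repl-job-1367-attempt-0] in namespace [default].")
    timeout_ms, cmds = diagnostic_commands(msg)
    print(f"timeout: {timeout_ms / 1000:.0f} s")
    for cmd in cmds:
        print(cmd)
```

If `kubectl auth can-i` answers `no`, the permission guess is probably right. If it answers `yes`, the `describe` and `events` output should show why the pod never left its initial state within the 300000 ms (5 minute) wait.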
Yes, it is a Kubernetes deployment, deployed using Kustomize.
Can you share your YAML?
It seems you have a timeout set for the jobs.
Hi, all our syncs in prod are failing with the same error. Can you share any thoughts on this?
2024-04-23 17:40:35 - Additional Failure Information: Activity with activityType='RunWithJobOutput' failed: 'Activity task timed out'. scheduledEventId=77, startedEventId=78, activityId=e481d484-2c35-33f0-9e5e-a4b86949f5c3, identity='', retryState=RETRY_STATE_MAXIMUM_ATTEMPTS_REACHED