Summary
After upgrading Airbyte to the latest version, encountering issues with connectors failing to sync data due to Kubernetes pod initialization timeouts. User suspects the problem is related to Kubernetes even though they did not configure it. Seeking insights on how to resolve the issue.
Question
Hello guys I’m new to Airbyte and me and my team at work are having some trouble after we upgraded to the latest version.
We used to have an Airbyte (v0.50.35) deployment running on GCP via docker-compose and it was working fine, but we decided to upgrade to the latest version (v0.63.14) and now none of our connectors work and we can’t sync any data.
In order to upgrade our installation, we followed the official upgrade guide (https://docs.airbyte.com/using-airbyte/getting-started/oss-quickstart#migrating-from-docker-compose-optional) and when we ran this command:
abctl local install --migrate
We got a warning message saying that some Kubernetes pod(s) failed to initialize due to a timeout problem. It automatically retried a few times and eventually succeeded, so we didn’t give it much attention.
The problem is that we are getting a very similar issue when our connections start synchronizing data from our sources and we have no clue on what we should do. In the synchronization logs of one of our connectors (we didn’t change its definition after we upgraded), I see this error message:
message='io.airbyte.workers.exception.WorkloadLauncherException: io.airbyte.workload.launcher.pipeline.stages.model.StageError: io.airbyte.workload.launcher.pods.KubeClientException: Init container of orchestrator pod failed to start within allotted timeout of 900 seconds. (Timed out waiting for [900000] milliseconds for [Pod] with name:[orchestrator-repl-job-2016-attempt-1] in namespace [airbyte-abctl].) at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:46) at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:38) at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.$$access$$apply(Unknown Source) at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Exec.dispatch(Unknown Source) at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:456) at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:129) at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.doIntercept(InstrumentInterceptorBase.kt:61) at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.intercept(InstrumentInterceptorBase.kt:44) at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:138) at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.apply(Unknown Source) at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:24) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2571) at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2367) at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onSubscribe(FluxOnErrorResume.java:74) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:193) at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53) at reactor.core.publisher.Mono.subscribe(Mono.java:4552) at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.run(MonoSubscribeOn.java:126) at reactor.core.scheduler.ImmediateScheduler$ImmediateSchedulerWorker.schedule(ImmediateScheduler.java:84) at reactor.core.publisher.MonoSubscribeOn.subscribeOrReturn(MonoSubscribeOn.java:55) at reactor.core.publisher.Mono.subscribe(Mono.java:4552) at reactor.core.publisher.Mono.subscribeWith(Mono.java:4634) at reactor.core.publisher.Mono.subscribe(Mono.java:4395) at io.airbyte.workload.launcher.pipeline.LaunchPipeline.accept(LaunchPipeline.kt:50) at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:28) at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:12) at io.airbyte.commons.temporal.queue.QueueActivityImpl.consume(Internal.kt:87) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) at io.temporal.common.interceptors.ActivityInboundCallsInterceptorBase.execute(ActivityInboundCallsInterceptorBase.java:39) at io.temporal.opentracing.internal.OpenTracingActivityInboundCallsInterceptor.execute(OpenTracingActivityInboundCallsInterceptor.java:78) at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107) at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124) at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278) at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243) at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216) at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) at java.base/java.lang.Thread.run(Thread.java:1583) Caused by: io.airbyte.workload.launcher.pods.KubeClientException: Init container of orchestrator pod failed to start within allotted timeout of 900 seconds. (Timed out waiting for [900000] milliseconds for [Pod] with name:[orchestrator-repl-job-2016-attempt-1] in namespace [airbyte-abctl].) at io.airbyte.workload.launcher.pods.KubePodClient.waitOrchestratorPodInit(KubePodClient.kt:118) at io.airbyte.workload.launcher.pods.KubePodClient.launchReplication(KubePodClient.kt:97) at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:43) at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:24) at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42) ... 53 more Caused by: io.fabric8.kubernetes.client.KubernetesClientTimeoutException: Timed out waiting for [900000] milliseconds for [Pod] with name:[orchestrator-repl-job-2016-attempt-1] in namespace [airbyte-abctl]. at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:944) at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:98) at io.fabric8.kubernetes.client.extension.ResourceAdapter.waitUntilCondition(ResourceAdapter.java:175) at io.airbyte.workload.launcher.pods.KubePodLauncher$waitForPodInitCustomCheck$initializedPod$1.invoke(KubePodLauncher.kt:108) at io.airbyte.workload.launcher.pods.KubePodLauncher$waitForPodInitCustomCheck$initializedPod$1.invoke(KubePodLauncher.kt:104) at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand$lambda$0(KubePodLauncher.kt:253) at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243) at dev.failsafe.Functions.lambda$get$0(Functions.java:46) at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74) at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187) at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376) at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112) at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand(KubePodLauncher.kt:253) at io.airbyte.workload.launcher.pods.KubePodLauncher.waitForPodInitCustomCheck(KubePodLauncher.kt:104) at io.airbyte.workload.launcher.pods.KubePodLauncher.waitForPodInit(KubePodLauncher.kt:66) at io.airbyte.workload.launcher.pods.KubePodClient.waitOrchestratorPodInit(KubePodClient.kt:115) ... 57 more ', type='java.lang.RuntimeException', nonRetryable=false
We didn’t configure anything to do with Kubernetes (we don’t even have kubectl installed), but from the log message, it’s safe to say that this seem to be related with Kubernetes failing to spin up pods to execute our workloads.
I’m running Airbyte on a VM on Debian 11 on GCP.
Can someone offer any insights on how to solve this problem? My first guess would be to increase the timeout period, but this is my first time working with Airbyte and I don’t really know what I’m doing. I was just following the documentation’s instructions when the problem started.
Any ideas?
This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want
to access the original thread.