Failed to create pod for write step after abctl migration

Summary

The user is experiencing sync failures after migrating to abctl, ending in a ‘Failed to create pod for write step’ error. The syncs used to run smoothly with docker-compose, and they are seeking insights from the community.


Question

Hi Everyone,

I’ve just migrated to abctl and I get lots of fails and hang time in my syncs, finally ending up with this message:
io.airbyte.workers.exception.WorkerException: Failed to create pod for write step

These syncs used to run smoothly with docker-compose. Any insights on this, anyone?

Bests



This topic has been created from a Slack thread to give it more visibility. It will be in read-only mode here.


…and finally this one:

```
	at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:46)
	at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:38)
	at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.$$access$$apply(Unknown Source)
	at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Exec.dispatch(Unknown Source)
	at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:456)
	at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:129)
	at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.doIntercept(InstrumentInterceptorBase.kt:61)
	at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.intercept(InstrumentInterceptorBase.kt:44)
	at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:138)
	at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.apply(Unknown Source)
	at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:24)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158)
	at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2571)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
	at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2367)
	at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onSubscribe(FluxOnErrorResume.java:74)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117)
	at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:193)
	at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4552)
	at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.run(MonoSubscribeOn.java:126)
	at reactor.core.scheduler.ImmediateScheduler$ImmediateSchedulerWorker.schedule(ImmediateScheduler.java:84)
	at reactor.core.publisher.MonoSubscribeOn.subscribeOrReturn(MonoSubscribeOn.java:55)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4552)
	at reactor.core.publisher.Mono.subscribeWith(Mono.java:4634)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4395)
	at io.airbyte.workload.launcher.pipeline.LaunchPipeline.accept(LaunchPipeline.kt:50)
	at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:28)
	at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:12)
	at io.airbyte.commons.temporal.queue.QueueActivityImpl.consume(Internal.kt:87)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64)
	at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43)
	at io.temporal.common.interceptors.ActivityInboundCallsInterceptorBase.execute(ActivityInboundCallsInterceptorBase.java:39)
	at io.temporal.opentracing.internal.OpenTracingActivityInboundCallsInterceptor.execute(OpenTracingActivityInboundCallsInterceptor.java:78)
	at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107)
	at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124)
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278)
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243)
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216)
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: io.airbyte.workload.launcher.pods.KubeClientException: Init container of orchestrator pod failed to start within allotted timeout of 900 seconds. (Timed out waiting for [900000] milliseconds for [Pod] with name:[orchestrator-repl-job-12532-attempt-1] in namespace [airbyte-abctl].)
	at io.airbyte.workload.launcher.pods.KubePodClient.waitOrchestratorPodInit(KubePodClient.kt:118)
	at io.airbyte.workload.launcher.pods.KubePodClient.launchReplication(KubePodClient.kt:97)
	at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:43)
	at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:24)
	at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42)
	... 53 more
Caused by: io.fabric8.kubernetes.client.KubernetesClientTimeoutException: Timed out waiting for [900000] milliseconds for [Pod] with name:[orchestrator-repl-job-12532-attempt-1] in namespace [airbyte-abctl].
	at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:944)
	at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:98)
	at io.fabric8.kubernetes.client.extension.ResourceAdapter.waitUntilCondition(ResourceAdapter.java:175)
	at io.airbyte.workload.launcher.pods.KubePodLauncher$waitForPodInitCustomCheck$initializedPod$1.invoke(KubePodLauncher.kt:108)
	at io.airbyte.workload.launcher.pods.KubePodLauncher$waitForPodInitCustomCheck$initializedPod$1.invoke(KubePodLauncher.kt:104)
	at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand$lambda$0(KubePodLauncher.kt:253)
	at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243)
	at dev.failsafe.Functions.lambda$get$0(Functions.java:46)
	at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74)
	at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187)
	at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
	at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112)
	at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand(KubePodLauncher.kt:253)
	at io.airbyte.workload.launcher.pods.KubePodLauncher.waitForPodInitCustomCheck(KubePodLauncher.kt:104)
	at io.airbyte.workload.launcher.pods.KubePodLauncher.waitForPodInit(KubePodLauncher.kt:66)
	at io.airbyte.workload.launcher.pods.KubePodClient.waitOrchestratorPodInit(KubePodClient.kt:115)
	... 57 more
', type='java.lang.RuntimeException', nonRetryable=false
```

Hello <@U05UPQR97LN>, sorry about the migration problem. It is likely that some pod or service didn’t start properly.
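A quick way to confirm what didn’t start is to look at the pods directly. A minimal sketch, assuming abctl’s default kind cluster and the kubeconfig path it usually writes (~/.airbyte/abctl/abctl.kubeconfig); adjust if your setup differs:

```sh
# Point kubectl at the cluster that abctl manages
# (kubeconfig path is abctl's usual default; yours may differ)
export KUBECONFIG=~/.airbyte/abctl/abctl.kubeconfig

# List pods in the namespace from the error above; look for Pending, Init, or ImagePullBackOff states
kubectl get pods -n airbyte-abctl

# Describe the stuck pod named in the stack trace above to see its scheduling events
kubectl describe pod orchestrator-repl-job-12532-attempt-1 -n airbyte-abctl
```

Events like "FailedScheduling ... Insufficient cpu" or "Insufficient memory" would point at the resource constraints discussed below.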

Hi <@U01MMSDJGC9>, thanks for your answer. Do you think I should stop and re-install? Or do you have something in mind to fix the issue? All my pipelines are currently stuck or failing.

Are you running on your local machine or a VM?

EC2 instance t.large

My guess is that abctl is resource constrained and the pods are not able to spin up. This is an issue several people have run into now. I will work with the team to put a plan together to address our resource usage. In the meantime, I would suggest moving to a t2.2xlarge or t3.2xlarge.
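Kubernetes schedules on resource requests rather than actual usage, so an instance can look half idle while pods still fail to schedule. A sketch of how to check this on the same cluster as above (assuming the kubeconfig is already exported):

```sh
# Compare what the node can allocate with what is already requested by running pods
kubectl describe node | grep -A 10 "Allocated resources"

# Pods stuck in Pending whose events mention "Insufficient cpu" or "Insufficient memory"
# confirm that requests exceed what the instance offers
kubectl get pods -n airbyte-abctl --field-selector=status.phase=Pending
```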

Hi <@U07C8CCC68Y>, thanks for the insights. Our instance is already a t3.xlarge, and it seems to have more than half of its compute capacity left unused. I’m not familiar with Kubernetes, but if your guess is correct, maybe there are some system-level settings that could let the service use more resources (I’m thinking of something similar to Docker’s resource configuration).

Yeah, we are working on better default settings for smaller machines as we speak. We’re hoping to have this working on an xlarge by the end of the week, if not sooner.

Right now the connector pods request a large amount of resources, as that gives the best performance.
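If staying on a smaller instance is a hard requirement, one possible workaround is to lower the resources the job pods request. This is only a sketch, not an official recommendation: it assumes abctl’s --values flag for passing Helm overrides and the chart’s global.jobs.resources keys, and the hypothetical numbers below would need tuning for your connectors.

```sh
# Write an override file that shrinks the CPU/memory requested by job/connector pods
# (keys assume the Airbyte Helm chart's global.jobs.resources section)
cat > low-resource-values.yaml <<'EOF'
global:
  jobs:
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 2Gi
EOF

# Re-run the install with the override applied
abctl local install --values low-resource-values.yaml
```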

Hey!

I increased the instance size this morning and it’s now running smoothly. Thanks for the support, it’s much appreciated!

Fantastic, great to hear. We will also be pushing a new release today with a --low-resource flag that should help if you want to move to a smaller instance type in the future.
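For reference, a usage sketch of that option once it ships; the exact flag name may differ in the released version (recent abctl releases spell it --low-resource-mode), so check abctl local install --help:

```sh
# Re-install with reduced resource requests, intended for smaller instances
abctl local install --low-resource-mode
```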