Failed to create pod for write step after abctl migration

Summary

The user is experiencing sync failures after migrating to abctl, ending in a ‘Failed to create pod for write step’ error. The syncs used to run smoothly with docker-compose, and they are seeking insights from the community.


Question

Hi Everyone,

I’ve just migrated to abctl and I get lots of fails and hang time in my syncs, finally ending up with this message:
io.airbyte.workers.exception.WorkerException: Failed to create pod for write step

These syncs used to run smoothly with docker-compose. Any insights on this, anyone?

Bests



This topic has been created from a Slack thread to give it more visibility. It will be in read-only mode here.


…and finally this one:

```
	at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:46)
	at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:38)
	at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.$$access$$apply(Unknown Source)
	at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Exec.dispatch(Unknown Source)
	at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:456)
	at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:129)
	at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.doIntercept(InstrumentInterceptorBase.kt:61)
	at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.intercept(InstrumentInterceptorBase.kt:44)
	at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:138)
	at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.apply(Unknown Source)
	at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:24)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158)
	at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2571)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194)
	at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2367)
	at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onSubscribe(FluxOnErrorResume.java:74)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117)
	at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:193)
	at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4552)
	at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.run(MonoSubscribeOn.java:126)
	at reactor.core.scheduler.ImmediateScheduler$ImmediateSchedulerWorker.schedule(ImmediateScheduler.java:84)
	at reactor.core.publisher.MonoSubscribeOn.subscribeOrReturn(MonoSubscribeOn.java:55)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4552)
	at reactor.core.publisher.Mono.subscribeWith(Mono.java:4634)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4395)
	at io.airbyte.workload.launcher.pipeline.LaunchPipeline.accept(LaunchPipeline.kt:50)
	at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:28)
	at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:12)
	at io.airbyte.commons.temporal.queue.QueueActivityImpl.consume(Internal.kt:87)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64)
	at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43)
	at io.temporal.common.interceptors.ActivityInboundCallsInterceptorBase.execute(ActivityInboundCallsInterceptorBase.java:39)
	at io.temporal.opentracing.internal.OpenTracingActivityInboundCallsInterceptor.execute(OpenTracingActivityInboundCallsInterceptor.java:78)
	at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107)
	at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124)
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278)
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243)
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216)
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: io.airbyte.workload.launcher.pods.KubeClientException: Init container of orchestrator pod failed to start within allotted timeout of 900 seconds. (Timed out waiting for [900000] milliseconds for [Pod] with name:[orchestrator-repl-job-12532-attempt-1] in namespace [airbyte-abctl].)
	at io.airbyte.workload.launcher.pods.KubePodClient.waitOrchestratorPodInit(KubePodClient.kt:118)
	at io.airbyte.workload.launcher.pods.KubePodClient.launchReplication(KubePodClient.kt:97)
	at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:43)
	at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:24)
	at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42)
	... 53 more
Caused by: io.fabric8.kubernetes.client.KubernetesClientTimeoutException: Timed out waiting for [900000] milliseconds for [Pod] with name:[orchestrator-repl-job-12532-attempt-1] in namespace [airbyte-abctl].
	at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:944)
	at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:98)
	at io.fabric8.kubernetes.client.extension.ResourceAdapter.waitUntilCondition(ResourceAdapter.java:175)
	at io.airbyte.workload.launcher.pods.KubePodLauncher$waitForPodInitCustomCheck$initializedPod$1.invoke(KubePodLauncher.kt:108)
	at io.airbyte.workload.launcher.pods.KubePodLauncher$waitForPodInitCustomCheck$initializedPod$1.invoke(KubePodLauncher.kt:104)
	at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand$lambda$0(KubePodLauncher.kt:253)
	at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243)
	at dev.failsafe.Functions.lambda$get$0(Functions.java:46)
	at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74)
	at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187)
	at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
	at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112)
	at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand(KubePodLauncher.kt:253)
	at io.airbyte.workload.launcher.pods.KubePodLauncher.waitForPodInitCustomCheck(KubePodLauncher.kt:104)
	at io.airbyte.workload.launcher.pods.KubePodLauncher.waitForPodInit(KubePodLauncher.kt:66)
	at io.airbyte.workload.launcher.pods.KubePodClient.waitOrchestratorPodInit(KubePodClient.kt:115)
	... 57 more
', type='java.lang.RuntimeException', nonRetryable=false
```

Hello <@U05UPQR97LN>, sorry about the migration problem. It is likely that some pod or service didn’t start properly.
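A quick way to confirm what didn’t start is to look at the pods directly. A minimal sketch, assuming abctl’s default kind cluster and the kubeconfig path it usually writes (~/.airbyte/abctl/abctl.kubeconfig); adjust if your setup differs:

```sh
# Point kubectl at the cluster that abctl manages
# (kubeconfig path is abctl's usual default; yours may differ)
export KUBECONFIG=~/.airbyte/abctl/abctl.kubeconfig

# List pods in the namespace from the error above; look for Pending, Init, or ImagePullBackOff states
kubectl get pods -n airbyte-abctl

# Describe the stuck pod named in the stack trace above to see its scheduling events
kubectl describe pod orchestrator-repl-job-12532-attempt-1 -n airbyte-abctl
```

Events like "FailedScheduling ... Insufficient cpu" or "Insufficient memory" would point at the resource constraints discussed below.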

Hi <@U01MMSDJGC9>, thanks for your answer. Do you think I should stop and re-install? Or do you have something in mind to fix the issue? All my pipelines are currently stuck or failing.

Are you running on your local machine or a VM?

EC2 instance t.large

My guess is that abctl is resource constrained and the pods are not able to spin up. This is an issue several people have run into now. I will work with the team to put a plan together to address our resource usage. In the meantime, I would suggest moving to a t2.2xlarge or t3.2xlarge.
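Kubernetes schedules on resource requests rather than actual usage, so an instance can look half idle while pods still fail to schedule. A sketch of how to check this on the same cluster as above (assuming the kubeconfig is already exported):

```sh
# Compare what the node can allocate with what is already requested by running pods
kubectl describe node | grep -A 10 "Allocated resources"

# Pods stuck in Pending whose events mention "Insufficient cpu" or "Insufficient memory"
# confirm that requests exceed what the instance offers
kubectl get pods -n airbyte-abctl --field-selector=status.phase=Pending
```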

Hi <@U07C8CCC68Y>, thanks for the insights. Our instance is already a t3.xlarge, and it seems to have more than half of its compute capacity left unused. I’m not familiar with Kubernetes, but if your guess is correct, maybe there are some system-level settings that could let the service use more resources (I’m thinking of something similar to Docker’s resource configuration).

Yeah, we are working on better default settings for smaller machines as we speak. We’re hoping to have this working on an xlarge by the end of the week, if not sooner.

Right now the connector pods request a large amount of resources, as that gives the best performance.
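If staying on a smaller instance is a hard requirement, one possible workaround is to lower the resources the job pods request. This is only a sketch, not an official recommendation: it assumes abctl’s --values flag for passing Helm overrides and the chart’s global.jobs.resources keys, and the hypothetical numbers below would need tuning for your connectors.

```sh
# Write an override file that shrinks the CPU/memory requested by job/connector pods
# (keys assume the Airbyte Helm chart's global.jobs.resources section)
cat > low-resource-values.yaml <<'EOF'
global:
  jobs:
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 2Gi
EOF

# Re-run the install with the override applied
abctl local install --values low-resource-values.yaml
```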

Hey!

I increased the instance size this morning and it’s now running smoothly. Thanks for the support, it’s much appreciated!

Fantastic, great to hear. We will also be pushing a new release today with a --low-resource flag that should help if you want to move to a smaller instance type in the future.
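For reference, a usage sketch of that option once it ships; the exact flag name may differ in the released version (recent abctl releases spell it --low-resource-mode), so check abctl local install --help:

```sh
# Re-install with reduced resource requests, intended for smaller instances
abctl local install --low-resource-mode
```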