Summary
The sync process from PostgreSQL to BigQuery fails due to insufficient system resources. The logs show a Kubernetes timeout (900000 ms, i.e. 15 minutes) while waiting for the replication pod to start.
Question
I see these kinds of logs when running a sync of a single table from PostgreSQL to BigQuery:
```
2024-10-15 13:18:11 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: CLAIM — (workloadId = c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync) — (dataplaneId = local)
2024-10-15 13:18:10 platform > Executing worker wrapper. Airbyte version: 1.1.0
2024-10-15 13:18:10 platform > Creating workload c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync
2024-10-15 13:18:11 platform > Workload c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync is pending
2024-10-15 13:18:11 INFO i.a.w.l.c.WorkloadApiClient(claim):75 - Claimed: true for c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync via API for local
2024-10-15 13:18:11 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: CHECK_STATUS — (workloadId = c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync) — (dataplaneId = local)
2024-10-15 13:18:11 INFO i.a.w.l.p.s.CheckStatusStage(applyStage):59 - No pod found running for workload c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync
2024-10-15 13:18:11 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: BUILD — (workloadId = c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync) — (dataplaneId = local)
2024-10-15 13:18:11 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: MUTEX — (workloadId = c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync) — (dataplaneId = local)
2024-10-15 13:18:11 INFO i.a.w.l.p.s.EnforceMutexStage(applyStage):54 - Mutex key: c0134630-70db-4e74-849f-fa6a1362792b specified for workload: c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync. Attempting to delete existing pods...
2024-10-15 13:18:11 INFO i.a.w.l.p.s.EnforceMutexStage(applyStage):65 - Mutex key: c0134630-70db-4e74-849f-fa6a1362792b specified for workload: c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync found no existing pods. Continuing...
2024-10-15 13:18:11 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: LAUNCH — (workloadId = c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync) — (dataplaneId = local)
2024-10-15 13:18:11 INFO i.a.w.l.p.KubePodClient(launchReplication):84 - Launching replication pod: replication-job-4-attempt-0 with containers:
2024-10-15 13:18:11 INFO i.a.w.l.p.KubePodClient(launchReplication):85 - [source] image: airbyte/source-postgres:3.6.22 resources: ResourceRequirements(claims=[], limits={memory=24Gi, cpu=6000m}, requests={memory=24Gi, cpu=6000m}, additionalProperties={})
2024-10-15 13:18:11 INFO i.a.w.l.p.KubePodClient(launchReplication):86 - [destination] image: airbyte/destination-bigquery:2.9.0 resources: ResourceRequirements(claims=[], limits={memory=24Gi, cpu=6000m}, requests={memory=24Gi, cpu=6000m}, additionalProperties={})
2024-10-15 13:18:11 INFO i.a.w.l.p.KubePodClient(launchReplication):87 - [orchestrator] image: airbyte/container-orchestrator:1.1.0 resources: ResourceRequirements(claims=[], limits={memory=24Gi, cpu=6000m}, requests={memory=24Gi, cpu=6000m}, additionalProperties={})
2024-10-15 13:20:11 platform > Workload c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync is claimed
2024-10-15 13:22:11 platform > Workload c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync is claimed
2024-10-15 13:24:11 platform > Workload c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync is claimed
2024-10-15 13:26:11 platform > Workload c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync is claimed
2024-10-15 13:28:11 platform > Workload c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync is claimed
2024-10-15 13:30:11 platform > Workload c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync is claimed
2024-10-15 13:32:11 platform > Workload c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync is claimed
2024-10-15 13:33:11 ERROR i.a.w.l.p.h.FailureHandler(apply):39 - Pipeline Error
io.airbyte.workload.launcher.pipeline.stages.model.StageError: io.airbyte.workers.exception.ResourceConstraintException: Unable to start the REPLICATION pod. This may be due to insufficient system resources. Please check available resources and try again.
at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:46) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:38) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.$$access$$apply(Unknown Source) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Exec.dispatch(Unknown Source) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:456) ~[micronaut-inject-4.6.5.jar:4.6.5]
at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:134) ~[micronaut-aop-4.6.5.jar:4.6.5]
at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.doIntercept(InstrumentInterceptorBase.kt:61) ~[io.airbyte.airbyte-metrics-metrics-lib-1.1.0.jar:?]
at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.intercept(InstrumentInterceptorBase.kt:44) ~[io.airbyte.airbyte-metrics-metrics-lib-1.1.0.jar:?]
at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:143) ~[micronaut-aop-4.6.5.jar:4.6.5]
at io.airbyte.workload.launcher.pipeline.stages.$LaunchPodStage$Definition$Intercepted.apply(Unknown Source) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:24) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2571) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2367) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onSubscribe(FluxOnErrorResume.java:74) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:193) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.Mono.subscribe(Mono.java:4560) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.run(MonoSubscribeOn.java:126) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.scheduler.ImmediateScheduler$ImmediateSchedulerWorker.schedule(ImmediateScheduler.java:84) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.MonoSubscribeOn.subscribeOrReturn(MonoSubscribeOn.java:55) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.Mono.subscribe(Mono.java:4560) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.Mono.subscribeWith(Mono.java:4642) ~[reactor-core-3.6.9.jar:3.6.9]
at reactor.core.publisher.Mono.subscribe(Mono.java:4403) ~[reactor-core-3.6.9.jar:3.6.9]
at io.airbyte.workload.launcher.pipeline.LaunchPipeline.accept(LaunchPipeline.kt:50) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:28) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:12) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.commons.temporal.queue.QueueActivityImpl.consume(Internal.kt:87) ~[io.airbyte-airbyte-commons-temporal-core-1.1.0.jar:?]
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[?:?]
at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.common.interceptors.ActivityInboundCallsInterceptorBase.execute(ActivityInboundCallsInterceptorBase.java:39) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.opentracing.internal.OpenTracingActivityInboundCallsInterceptor.execute(OpenTracingActivityInboundCallsInterceptor.java:78) ~[temporal-opentracing-1.22.3.jar:?]
at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216) ~[temporal-sdk-1.22.3.jar:?]
at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105) ~[temporal-sdk-1.22.3.jar:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: io.airbyte.workers.exception.ResourceConstraintException: Unable to start the REPLICATION pod. This may be due to insufficient system resources. Please check available resources and try again.
at io.airbyte.workload.launcher.pods.KubePodClient.waitForPodInitComplete(KubePodClient.kt:313) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pods.KubePodClient.launchReplication(KubePodClient.kt:105) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:47) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:24) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
... 53 more
Caused by: io.fabric8.kubernetes.client.KubernetesClientTimeoutException: Timed out waiting for [900000] milliseconds for [Pod] with name:[replication-job-4-attempt-0] in namespace [airbyte-abctl].
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:944) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.waitUntilCondition(BaseOperation.java:98) ~[kubernetes-client-6.12.1.jar:?]
at io.fabric8.kubernetes.client.extension.ResourceAdapter.waitUntilCondition(ResourceAdapter.java:175) ~[kubernetes-client-api-6.12.1.jar:?]
at io.airbyte.workload.launcher.pods.KubePodLauncher$waitForPodInitComplete$initializedPod$1.invoke(KubePodLauncher.kt:83) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pods.KubePodLauncher$waitForPodInitComplete$initializedPod$1.invoke(KubePodLauncher.kt:79) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand$lambda$2(KubePodLauncher.kt:335) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.Functions.lambda$get$0(Functions.java:46) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376) ~[failsafe-3.3.2.jar:3.3.2]
at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112) ~[failsafe-3.3.2.jar:3.3.2]
at io.airbyte.workload.launcher.pods.KubePodLauncher.runKubeCommand(KubePodLauncher.kt:335) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pods.KubePodLauncher.waitForPodInitComplete(KubePodLauncher.kt:79) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pods.KubePodClient.waitForPodInitComplete(KubePodClient.kt:308) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pods.KubePodClient.launchReplication(KubePodClient.kt:105) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:47) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:24) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
... 53 more
2024-10-15 13:33:41 platform > Workload c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync has returned a terminal status of failure. Fetching output...
2024-10-15 13:33:41 platform > Replication output for workload c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync : null
2024-10-15 13:33:42 platform > Retry State: RetryManager(completeFailureBackoffPolicy=BackoffPolicy(minInterval=PT10S, maxInterval=PT30M, base=3), partialFailureBackoffPolicy=null, successiveCompleteFailureLimit=5, totalCompleteFailureLimit=10, successivePartialFailureLimit=1000, totalPartialFailureLimit=20, successiveCompleteFailures=1, totalCompleteFailures=1, successivePartialFailures=0, totalPartialFailures=0)
Backoff before next attempt: 10 seconds
2024-10-15 13:33:12 INFO i.a.w.l.c.WorkloadApiClient(updateStatusToFailed):54 - Attempting to update workload: c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync to FAILED.
2024-10-15 13:33:12 INFO i.a.w.l.p.h.FailureHandler(apply):62 - Pipeline aborted after error for workload: c0134630-70db-4e74-849f-fa6a1362792b_4_0_sync.
```
After 10 minutes or so it fails, saying there might be insufficient resources. I'm using abctl to set up Airbyte on an EC2 instance of type `t3.2xlarge`. I've also increased the memory and CPU limits in the `values.yaml` file.
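The launch log above shows each of the three containers (source, destination, orchestrator) requesting 24Gi of memory and 6 CPUs. If those are the effective requests, a single replication pod would need roughly 72Gi and 18 CPUs, which a `t3.2xlarge` (8 vCPUs, 32 GiB RAM) can never provide, so the pod would stay Pending until the 15-minute wait times out. Below is a minimal sketch of scaling the job resources back down in `values.yaml`, assuming the Airbyte Helm chart's `global.jobs.resources` layout (key names can differ between chart versions, so verify against the chart abctl installs):

```yaml
# Illustrative values.yaml for `abctl local install --values values.yaml`.
# The global.jobs.resources layout is an assumption based on the Airbyte Helm chart;
# check the exact keys against the chart version your abctl release uses.
global:
  jobs:
    resources:
      requests:
        cpu: "1"       # small enough for all three job containers to fit on one node
        memory: 2Gi
      limits:
        cpu: "2"
        memory: 4Gi    # stays well under the 32 GiB available on a t3.2xlarge
```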
When I uninstall abctl and reinstall, the sync process runs successfully once, but it never succeeds again after that when I click Run Sync.
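To confirm whether the replication pod is actually stuck in Pending for lack of schedulable resources, it can help to inspect the kind cluster that abctl manages. A rough diagnostic sketch follows; the kubeconfig path is an assumption (abctl usually writes one under `~/.airbyte/abctl/`), while the namespace and pod name are taken from the log above:

```sh
# Point kubectl at the abctl-managed kind cluster (path is an assumption; adjust as needed).
export KUBECONFIG=~/.airbyte/abctl/abctl.kubeconfig

# Check whether the replication pod is stuck in Pending.
kubectl -n airbyte-abctl get pods

# Scheduler events typically report "Insufficient cpu" / "Insufficient memory" when requests can't be met.
kubectl -n airbyte-abctl describe pod replication-job-4-attempt-0
kubectl -n airbyte-abctl get events --sort-by=.lastTimestamp | tail -n 20
```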
---
This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1728999038592549) if you want
to access the original thread.
[Join the conversation on Slack](https://slack.airbyte.com)
<sub>
["sync-process", "postgresql", "bigquery", "insufficient-resources", "kubernetes-timeout", "abctl", "ec2", "t3.2xlarge", "values.yaml", "replication-pod"]
</sub>