Airbyte upgrade causing connections to be stuck in Starting status

Summary

After upgrading Airbyte from v0.60.1 to v0.64.1 on GKE using Helm, connections are stuck in Starting status and not progressing.


Question

Hi everyone. I have Airbyte running in GKE and I’ve upgraded from v0.60.1 to v0.64.1 using the Helm chart from https://artifacthub.io/packages/helm/airbyte/airbyte. Now the connections are stuck: syncs sit in the Starting status and hang there for hours.
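For reference, a chart upgrade along these lines was used; the release name, namespace, and repo alias below are assumptions, not taken from the thread:

```sh
# Add/refresh the Airbyte Helm repo, then upgrade the release.
# The release name "airbyte" and namespace "airbyte" are assumptions.
helm repo add airbyte https://airbytehq.github.io/helm-charts
helm repo update
helm upgrade airbyte airbyte/airbyte --namespace airbyte
```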



This topic has been created from a Slack thread to give it more visibility. It will be in read-only mode here.


["airbyte", "upgrade", "GKE", "connections", "stuck", "starting status", "Helm"]

I’ve got this error. It seems some permissions changed after the upgrade:
```
io.airbyte.workload.launcher.pipeline.stages.model.StageError: io.airbyte.workload.launcher.pods.KubeClientException: Failed to create pod airbyte-googlecalendar-source-check-378-3-ftrvw.
	at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:46)
	at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.apply(LaunchPodStage.kt:38)
	... (Micronaut, Reactor, and Temporal frames elided)
Caused by: io.airbyte.workload.launcher.pods.KubeClientException: Failed to create pod airbyte-googlecalendar-source-check-378-3-ftrvw.
	at io.airbyte.workload.launcher.pods.KubePodClient.launchConnectorWithSidecar(KubePodClient.kt:352)
	at io.airbyte.workload.launcher.pods.KubePodClient.launchCheck(KubePodClient.kt:279)
	at io.airbyte.workload.launcher.pipeline.stages.LaunchPodStage.applyStage(LaunchPodStage.kt:44)
	at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42)
	... 53 more
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PATCH at: https://10.130.128.1:443/api/v1/namespaces/airbyte/pods/airbyte-googlecalendar-source-check-378-3-ftrvw?fieldManager=fabric8.
Message: pods "airbyte-googlecalendar-source-check-378-3-ftrvw" is forbidden: User "system:serviceaccount:airbyte:airbyte-admin" cannot patch resource "pods" in API group "" in the namespace "airbyte".
Received status: Status(apiVersion=v1, code=403, reason=Forbidden, status=Failure, ...)
	at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handlePatch(OperationSupport.java:419)
	at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.serverSideApply(BaseOperation.java:1179)
	at io.airbyte.workload.launcher.pods.KubePodLauncher$create$1.invoke(KubePodLauncher.kt:57)
	... (fabric8 and Failsafe frames elided)
```
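The 403 above shows that the airbyte-admin service account is missing the patch verb on pods; the launcher goes through serverSideApply, which issues PATCH requests. A minimal workaround sketch, assuming the role and binding names below (they are illustrative, not taken from the chart):

```sh
# Grant the missing "patch" verb on pods to the airbyte-admin
# service account. Role and binding names are illustrative.
kubectl create role airbyte-pod-patcher \
  --verb=patch --resource=pods \
  --namespace=airbyte
kubectl create rolebinding airbyte-pod-patcher-binding \
  --role=airbyte-pod-patcher \
  --serviceaccount=airbyte:airbyte-admin \
  --namespace=airbyte
```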

The database wasn’t migrated properly. Do you know how this can be solved?
```sql
select exists (
  select "public"."actor_definition"."id",
         "public"."actor_definition"."name",
         "public"."actor_definition"."icon",
         "public"."actor_definition"."actor_type",
         "public"."actor_definition"."source_type",
         "public"."actor_definition"."created_at",
         "public"."actor_definition"."updated_at",
         "public"."actor_definition"."tombstone",
         "public"."actor_definition"."resource_requirements",
         "public"."actor_definition"."public",
         "public"."actor_definition"."custom",
         "public"."actor_definition"."max_seconds_between_messages",
         "public"."actor_definition"."default_version_id",
         "public"."actor_definition"."icon_url",
         "public"."actor_definition"."support_refreshes"
  from "public"."actor_definition"
  where "public"."actor_definition"."id" = cast(? as uuid)
)
```
ERROR: column actor_definition.support_refreshes does not exist
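One way to confirm whether the bootloader migrations actually ran; the connection-string placeholder and the airbyte_configs_migrations Flyway history table name are assumptions based on Airbyte's defaults:

```sh
# Does the column the query expects actually exist?
psql "$AIRBYTE_CONFIG_DB_URL" -c "
  SELECT column_name
  FROM information_schema.columns
  WHERE table_schema = 'public'
    AND table_name = 'actor_definition'
    AND column_name = 'support_refreshes';"

# Inspect the most recent Flyway migrations and whether they succeeded.
psql "$AIRBYTE_CONFIG_DB_URL" -c "
  SELECT version, description, success, installed_on
  FROM airbyte_configs_migrations
  ORDER BY installed_rank DESC
  LIMIT 10;"
```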

This last error appeared while trying to downgrade the Airbyte version.

It seems like downgrading to Helm chart version 0.453.2 works well. :sparkler:
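A downgrade sketch for anyone in the same spot; the release name and namespace are assumptions:

```sh
# Roll the release back to the last known-good chart version.
helm upgrade airbyte airbyte/airbyte \
  --namespace airbyte \
  --version 0.453.2
```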

This is related to a RoleBinding problem that cropped up in a recent update. There are some thoughts/workarounds in this GitHub issue:
https://github.com/airbytehq/airbyte/issues/42859
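You can check whether your service account is affected by dry-running the exact action the launcher is being denied:

```sh
# Prints "yes" or "no" without touching any resources.
kubectl auth can-i patch pods \
  --namespace=airbyte \
  --as=system:serviceaccount:airbyte:airbyte-admin
```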

Thank you for the reply

We rolled out a change at the end of last week that should help with the RoleBinding issue. We have another change landing in the next few days that will make this work better.