Error in GKE Helm Deployment for Airbyte Connection Setup

Summary

The user is experiencing errors in their GKE Helm deployment when setting up a new connection in Airbyte. The error logs indicate a potential deadlock issue in the workflow processing.


Question

Hello, I’m experiencing errors in my GKE Helm deployment (`helm install airbyte-test airbyte/airbyte`) when I’m setting up a new connection (you can see in the screenshot at which step).
Can someone help me, please?
I can see error logs only in the worker pod:

```
2024-08-08 17:37:51 WARN i.t.i.r.ReplayWorkflowTaskHandler(failureToWFTResult):279 - Workflow task processing failure. startedEventId=3, WorkflowId=254d5fff-6e44-45d6-8de7-c83a68c90bc1, RunId=1d97f5a2-cd6e-445a-9fa5-5f5a91fb668d. If seen continuously the workflow might be stuck.
io.temporal.internal.statemachines.InternalWorkflowTaskException: Failure handling event 3 of type 'EVENT_TYPE_WORKFLOW_TASK_STARTED' during execution. {WorkflowTaskStartedEventId=3, CurrentStartedEventId=3}
        at io.temporal.internal.statemachines.WorkflowStateMachines.createEventProcessingException(WorkflowStateMachines.java:373) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:297) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.WorkflowStateMachines.handleEvent(WorkflowStateMachines.java:260) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.applyServerHistory(ReplayWorkflowRunTaskHandler.java:249) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTaskImpl(ReplayWorkflowRunTaskHandler.java:231) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler.handleWorkflowTask(ReplayWorkflowRunTaskHandler.java:165) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTaskWithQuery(ReplayWorkflowTaskHandler.java:133) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.replay.ReplayWorkflowTaskHandler.handleWorkflowTask(ReplayWorkflowTaskHandler.java:98) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handleTask(WorkflowWorker.java:413) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:320) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.worker.WorkflowWorker$TaskHandlerImpl.handle(WorkflowWorker.java:261) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105) ~[temporal-sdk-1.22.3.jar:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.lang.RuntimeException: WorkflowTask: failure executing SCHEDULED->WORKFLOW_TASK_STARTED, transition history is [CREATED->WORKFLOW_TASK_SCHEDULED]
        at io.temporal.internal.statemachines.StateMachine.executeTransition(StateMachine.java:163) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.StateMachine.handleHistoryEvent(StateMachine.java:103) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.EntityStateMachineBase.handleEvent(EntityStateMachineBase.java:84) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.WorkflowStateMachines.handleSingleEvent(WorkflowStateMachines.java:419) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:295) ~[temporal-sdk-1.22.3.jar:?]
        ... 13 more
Caused by: io.temporal.internal.sync.PotentialDeadlockException: Potential deadlock detected. Workflow thread "workflow-method-254d5fff-6e44-45d6-8de7-c83a68c90bc1-1d97f5a2-cd6e-445a-9fa5-5f5a91fb668d" didn't yield control for over a second. {detectionTimestamp=1723138671311, threadDumpTimestamp=1723138671317}

workflow-method-254d5fff-6e44-45d6-8de7-c83a68c90bc1-1d97f5a2-cd6e-445a-9fa5-5f5a91fb668d
        at java.base/java.lang.ClassLoader.defineClass1(Native Method)
        at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1027)
        at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
        at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
        at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:526)
        at io.temporal.api.command.v1.Command.toBuilder(Command.java:1198)
        at io.temporal.api.command.v1.Command.newBuilder(Command.java:1190)
        at io.temporal.internal.statemachines.StateMachineCommandUtils.createRecordMarker(StateMachineCommandUtils.java:32)
        at io.temporal.internal.statemachines.StateMachineCommandUtils.<clinit>(StateMachineCommandUtils.java:29)
        at io.temporal.internal.statemachines.VersionStateMachine$InvocationStateMachine.createMarkerExecuting(VersionStateMachine.java:243)
        at io.temporal.internal.statemachines.DynamicTransitionAction.apply(DynamicTransitionAction.java:39)
        at io.temporal.internal.statemachines.StateMachine.executeTransition(StateMachine.java:159)
        at io.temporal.internal.statemachines.StateMachine.handleExplicitEvent(StateMachine.java:93)
        at io.temporal.internal.statemachines.EntityStateMachineBase.explicitEvent(EntityStateMachineBase.java:95)
        at io.temporal.internal.statemachines.VersionStateMachine.getVersion(VersionStateMachine.java:383)
        at io.temporal.internal.statemachines.WorkflowStateMachines.getVersion(WorkflowStateMachines.java:950)
        at io.temporal.internal.replay.ReplayWorkflowContextImpl.getVersion(ReplayWorkflowContextImpl.java:304)
        at io.temporal.internal.sync.SyncWorkflowContext.getVersion(SyncWorkflowContext.java:934)
        at io.temporal.internal.sync.WorkflowInternal.getVersion(WorkflowInternal.java:506)
        at io.temporal.workflow.Workflow.getVersion(Workflow.java:947)
        at io.airbyte.workers.temporal.check.connection.CheckConnectionWorkflowImpl.checkUseWorkloadApiFlag(CheckConnectionWorkflowImpl.java:75)
        at io.airbyte.workers.temporal.check.connection.CheckConnectionWorkflowImpl.run(CheckConnectionWorkflowImpl.java:50)
        at CheckConnectionWorkflowImplProxy.run$accessor$RhNsiIHH(Unknown Source)
        at CheckConnectionWorkflowImplProxy$auxiliary$b78tpP6d.call(Unknown Source)
        at io.airbyte.micronaut.temporal.TemporalActivityStubInterceptor.execute(TemporalActivityStubInterceptor.java:79)
        at CheckConnectionWorkflowImplProxy.run(Unknown Source)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation$RootWorkflowInboundCallsInterceptor.execute(POJOWorkflowImplementationFactory.java:339)
        at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation.execute(POJOWorkflowImplementationFactory.java:314)
        at io.temporal.internal.sync.WorkflowExecutionHandler.runWorkflowMethod(WorkflowExecutionHandler.java:70)
        at io.temporal.internal.sync.SyncWorkflow.lambda$start$0(SyncWorkflow.java:135)
        at io.temporal.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:102)
        at io.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:107)
        at io.temporal.worker.ActiveThreadReportingExecutor.lambda$submit$0(ActiveThreadReportingExecutor.java:53)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)

        at io.temporal.internal.sync.WorkflowThreadContext.runUntilBlocked(WorkflowThreadContext.java:261) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.sync.WorkflowThreadImpl.runUntilBlocked(WorkflowThreadImpl.java:302) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.sync.DeterministicRunnerImpl.runUntilAllBlocked(DeterministicRunnerImpl.java:229) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.sync.SyncWorkflow.eventLoop(SyncWorkflow.java:192) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.replay.ReplayWorkflowExecutor.eventLoop(ReplayWorkflowExecutor.java:72) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.replay.ReplayWorkflowRunTaskHandler$StatesMachinesCallbackImpl.eventLoop(ReplayWorkflowRunTaskHandler.java:406) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.WorkflowStateMachines.eventLoop(WorkflowStateMachines.java:663) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.WorkflowStateMachines.access$700(WorkflowStateMachines.java:53) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.WorkflowStateMachines$WorkflowTaskCommandsListener.workflowTaskStarted(WorkflowStateMachines.java:1171) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.WorkflowTaskStateMachine.handleCompleted(WorkflowTaskStateMachine.java:139) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.WorkflowTaskStateMachine.handleStarted(WorkflowTaskStateMachine.java:129) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.FixedTransitionAction.apply(FixedTransitionAction.java:46) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.StateMachine.executeTransition(StateMachine.java:159) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.StateMachine.handleHistoryEvent(StateMachine.java:103) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.EntityStateMachineBase.handleEvent(EntityStateMachineBase.java:84) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.WorkflowStateMachines.handleSingleEvent(WorkflowStateMachines.java:419) ~[temporal-sdk-1.22.3.jar:?]
        at io.temporal.internal.statemachines.WorkflowStateMachines.handleEventsBatch(WorkflowStateMachines.java:295) ~[temporal-sdk-1.22.3.jar:?]
        ... 13 more
```

---

This topic has been created from a Slack thread to give it more visibility.
It is in read-only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1723139376926849) if you want to access the original thread.

[Join the conversation on Slack](https://slack.airbyte.com)

<sub>
["gke-helm-deployment", "airbyte-connection-setup", "error-logs", "potential-deadlock", "workflow-processing"]
</sub>

This is most likely your ingress/load balancer timing out while it waits for the check container to spin up (which may itself have had to wait for a node to be added).

If you’re using an HTTP/S LB on GCP as an ingress, the default timeout is 30 seconds. You really only see this during connection check and schema discovery, because those requests wait on the result, whereas real syncs run in the background.

I’d recommend bumping this to 600 or 1200 seconds. If you’re using an LB as described above, you would do this in the BackendConfig of your LB. If you’re using nginx, follow the nginx instructions instead.
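
For the GCP load balancer case, a minimal sketch of what that could look like with GKE Ingress is shown here. The `BackendConfig` resource and its `timeoutSec` field, as well as the `cloud.google.com/backend-config` Service annotation, are standard GKE features; the resource name `airbyte-backendconfig` and the Service name are assumptions made for illustration, so substitute whatever Service your Ingress actually routes to.

```yaml
# BackendConfig raising the load balancer's backend timeout (GKE Ingress).
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: airbyte-backendconfig   # hypothetical name
spec:
  timeoutSec: 600               # the default is 30 seconds
---
# Fragment of the Service behind the Ingress; only the annotation matters here.
# The Service name is an assumption; use the one your Ingress backend points at.
apiVersion: v1
kind: Service
metadata:
  name: airbyte-test-airbyte-webapp-svc   # assumed name
  annotations:
    cloud.google.com/backend-config: '{"default": "airbyte-backendconfig"}'
```

Because the Service is managed by the Helm release, it is probably safer to set the annotation through the chart's values than to edit the Service by hand, so that a later `helm upgrade` does not drop it.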

While less common than the ingress/LB case, there can also be timeouts in discovery related to slow endpoints or very large database schemas. The Enterprise docs include a note on handling that in `values.yaml` using ExtraEnv to increase `HTTP_IDLE_TIMEOUT` and `READ_TIMEOUT`, which also applies to OSS:
https://docs.airbyte.com/enterprise-setup/scaling-airbyte#schema-discovery-timeouts
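
As a rough sketch of that `values.yaml` change: the variable names `HTTP_IDLE_TIMEOUT` and `READ_TIMEOUT` come from the answer above, while the key path (`server.extraEnv`) and the duration values are assumptions for illustration. Check the linked docs and your chart version's `values.yaml` for the exact key name and accepted value format before applying it.

```yaml
# values.yaml fragment (sketch; key path and values are assumptions)
server:
  extraEnv:                     # assumed key; the answer calls it "ExtraEnv"
    - name: HTTP_IDLE_TIMEOUT
      value: 20m                # illustrative value
    - name: READ_TIMEOUT
      value: 30m                # illustrative value
```

It could then be applied to the existing release with something like `helm upgrade airbyte-test airbyte/airbyte -f values.yaml`.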

Thank you for the answer. The problem was a wrong Object Types declaration in the NetSuite plugin.