Sync jobs timing out after upgrading Airbyte deployment on GCP

Summary

After upgrading an Airbyte deployment on GCP, sync jobs are timing out without starting. Looking for assistance in resolving the issue.


Question

I recently upgraded an OSS deployment running with Helm/k8s on GCP from ~0.5x to 1.20, ensuring that outdated configuration was removed from the helm values and new requirements were satisfied. All of the services seem to be running correctly, and I can access the webapp, but all sync jobs seem to time out without ever starting. I’ll throw a few potentially-related errors in the thread in the hopes that someone has seen something similar. Any ideas?



This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here.


Tags: sync-jobs, upgrade, GCP, timeout, helm, kubernetes, configuration

This warning seems to occur on any sync job trigger:

```
WARN i.a.c.s.c.JobConverter(getWorkspaceId):403 - Unable to retrieve workspace ID for job null
```

The server also logs this stack trace (from the airbyte-server container):

```
java.lang.NullPointerException: null
	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:904)
	at com.google.common.cache.LocalCache.get(LocalCache.java:4016)
	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4040)
	at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4989)
	at io.airbyte.persistence.job.WorkspaceHelper.lambda$getWorkspaceForJobId$4(WorkspaceHelper.java:162)
	at io.airbyte.persistence.job.WorkspaceHelper.handleCacheExceptions(WorkspaceHelper.java:231)
	at io.airbyte.persistence.job.WorkspaceHelper.getWorkspaceForJobId(WorkspaceHelper.java:162)
	at io.airbyte.commons.server.converters.JobConverter.getWorkspaceId(JobConverter.java:401)
	at io.airbyte.commons.server.converters.JobConverter.getAttemptLogs(JobConverter.java:320)
	at io.airbyte.commons.server.converters.JobConverter.getSynchronousJobRead(JobConverter.java:349)
	at io.airbyte.commons.server.handlers.ConnectorDefinitionSpecificationHandler.getSourceSpecificationRead(ConnectorDefinitionSpecificationHandler.java:145)
	at io.airbyte.commons.server.handlers.ConnectorDefinitionSpecificationHandler.getSpecificationForSourceId(ConnectorDefinitionSpecificationHandler.java:76)
	at io.airbyte.server.apis.SourceDefinitionSpecificationApiController.lambda$getSpecificationForSourceId$1(SourceDefinitionSpecificationApiController.java:46)
	at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.kt:29)
	at io.airbyte.server.apis.SourceDefinitionSpecificationApiController.getSpecificationForSourceId(SourceDefinitionSpecificationApiController.java:46)
	at io.airbyte.server.apis.$SourceDefinitionSpecificationApiController$Definition$Exec.dispatch(Unknown Source)
	at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invokeUnsafe(AbstractExecutableMethodsDefinition.java:461)
	at io.micronaut.context.DefaultBeanContext$BeanContextUnsafeExecutionHandle.invokeUnsafe(DefaultBeanContext.java:4350)
	at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:272)
	at io.micronaut.web.router.DefaultUriRouteMatch.execute(DefaultUriRouteMatch.java:38)
	at io.micronaut.http.server.RouteExecutor.executeRouteAndConvertBody(RouteExecutor.java:498)
	at io.micronaut.http.server.RouteExecutor.lambda$callRoute$5(RouteExecutor.java:475)
	at io.micronaut.core.execution.ExecutionFlow.lambda$async$1(ExecutionFlow.java:87)
	at io.micronaut.core.propagation.PropagatedContext.lambda$wrap$3(PropagatedContext.java:211)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
```

Do you use taints and tolerations in kubernetes for Airbyte?

nope — it’s a pretty basic setup
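(For anyone checking the same thing on their own cluster, node taints can be listed directly; the output columns below are just an illustrative formatting choice.)

```shell
# List any taints set on the cluster nodes. An empty TAINTS column means
# scheduling constraints shouldn't be what keeps job pods from starting.
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'
```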

Is this on your own cluster or GKE?

I’m also seeing errors like:

```
405 - Failure during reporting of activity result to the server. ActivityId = ee3d7b55-87d5-3c8a-9b1c-48b2bed6bdeb, ActivityType = RunWithWorkload, WorkflowId=check_6291_source, WorkflowType=CheckConnectionWorkflow, RunId=a9dbb574-5c3c-4def-ae20-da392ca73715
```

and warnings like:

```
WARN i.a.c.s.h.h.StatsAggregationHelper(hydrateWithStats):150 - Missing stats for job 6293 attempt 0
```

Are you seeing any errors along these lines?
https://github.com/airbytehq/airbyte/issues/42859

Oh, good thought, but sadly no. The role and rolebinding appear to be in place.
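In case it helps anyone hitting that linked issue, here is a rough way to verify the RBAC objects; the namespace and service account name below are assumptions, so adjust them to your release.

```shell
# Check that the role and rolebinding created by the chart exist
kubectl get role,rolebinding -n airbyte

# Verify the service account used by the workers is allowed to create job pods
# ("airbyte-admin" is an assumed service account name; substitute your own)
kubectl auth can-i create pods -n airbyte \
  --as=system:serviceaccount:airbyte:airbyte-admin
```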

Hm. I’d also check whether any errors are being thrown by the bootloader, especially SQL-related ones (including Temporal).

Okay, it looks like newer versions of the Helm chart default to storing secrets in a Kubernetes secret called {deployment-name}-airbyte-secrets, but sync job pods are still being created with references to airbyte-config-secrets, which no longer exists.
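A sketch of how this mismatch shows up, assuming the release is installed in an airbyte namespace (substitute the name of one of your failing check/sync pods):

```shell
# The chart-managed secret ({deployment-name}-airbyte-secrets) should be listed;
# airbyte-config-secrets will be absent after the upgrade.
kubectl get secrets -n airbyte

# Inspect which secret names a failing job pod actually references
# in its env vars and volume mounts.
kubectl get pod <failing-job-pod> -n airbyte -o yaml \
  | grep -E 'secretName|secretKeyRef' -A1
```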

Here’s the issue that resulted from this investigation: https://github.com/airbytehq/airbyte/issues/48502
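Until that issue is resolved, one possible stopgap (my assumption, not something confirmed in the thread) is to copy the chart-generated secret to the legacy name that job pods still reference. This sketch assumes an airbyte namespace and uses jq to strip server-managed metadata before re-applying:

```shell
# Untested workaround sketch: duplicate the chart-managed secret under the
# old name "airbyte-config-secrets" that job pods still look for.
kubectl get secret <deployment-name>-airbyte-secrets -n airbyte -o json \
  | jq 'del(.metadata.uid, .metadata.resourceVersion,
            .metadata.creationTimestamp, .metadata.ownerReferences)
        | .metadata.name = "airbyte-config-secrets"' \
  | kubectl apply -f -
```

Note this copy will drift if the chart-managed secret is later rotated, so it is only a bridge until the referenced issue is fixed.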