Error upgrading Airbyte using Helm chart with airbyte-workers failure

Summary

When upgrading Airbyte to the latest version using the Helm chart, airbyte-workers fail with an error about a missing key in the ConfigMap.


Question

Hi, I am trying to upgrade Airbyte to the latest version using the Helm chart (chart version 0.53.227). When installing the chart, the airbyte-workers fail with the error:

```
Error: couldn't find key USE_STREAM_CAPABLE_STATE in ConfigMap dp-airbyte/airbyte-airbyte-env
```

Has anyone seen this before?
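A quick way to see which keys the generated env ConfigMap actually contains (release name and namespace are taken from the error message above):

```
# List the keys in the env ConfigMap and check whether the missing key is present.
kubectl -n dp-airbyte get configmap airbyte-airbyte-env -o yaml | grep -i STREAM
```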



This topic was created from a Slack thread to give it more visibility. It is read-only here; the original conversation remains on Slack.


["upgrade", "airbyte", "helm-chart", "airbyte-workers", "error", "configmap"]

Which version did you upgrade from? Did you try deleting the worker pod so it restarts?
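If it helps, a minimal way to do that (namespace taken from the error above; the pod name is a placeholder):

```
# Find the worker pod, then delete it so the Deployment recreates it.
kubectl -n dp-airbyte get pods | grep worker
kubectl -n dp-airbyte delete pod <worker-pod-name>
```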

<@U01MMSDJGC9> I was attempting to upgrade from 0.50.29, and yes I did delete the worker pod to restart it

This variable was removed last October and defaults to true. If you're using the latest version, the worker doesn't require this variable anymore. Not sure why this error pops up.
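If the worker image keeps asking for the removed key, one possible stop-gap (not an official fix; the Deployment name airbyte-worker is an assumption based on the release name, and Helm may overwrite the ConfigMap again on the next upgrade) is to re-add the key with its old default value and restart the worker:

```
# Re-add the removed key with its previous default value of "true".
kubectl -n dp-airbyte patch configmap airbyte-airbyte-env \
  --type merge -p '{"data":{"USE_STREAM_CAPABLE_STATE":"true"}}'

# Restart the worker so it re-reads the ConfigMap.
kubectl -n dp-airbyte rollout restart deployment airbyte-worker
```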

Can you share the output of kubectl describe pod for your worker?
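For reference, something along these lines (the pod name is a placeholder):

```
# Full describe output for the crashing worker pod.
kubectl -n dp-airbyte describe pod <worker-pod-name>

# Or just the events recorded for that pod.
kubectl -n dp-airbyte get events --field-selector involvedObject.name=<worker-pod-name>
```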

<@U01MMSDJGC9> I have actually tried the very latest Helm chart released today (0.53.250), and the missing-key ConfigMap error is gone. However, the worker now initialises but crashes with the error below. I wonder if it's due to the Temporal version; I noticed I was massively out of date with 1.7.0, and I am progressively upgrading it (it can only move one major version at a time).


```
Path Taken: new ApplicationInitializer() --> ApplicationInitializer.checkConnectionActivities --> List.checkConnectionActivities([CheckConnectionActivity checkConnectionActivity]) --> new CheckConnectionActivityImpl(WorkerConfigsProvider workerConfigsProvider,ProcessFactory processFactory,SecretsRepositoryReader secretsRepositoryReader,Path workspaceRoot,WorkerEnvironment workerEnvironment,LogConfigs logConfigs,AirbyteApiClient airbyteApiClient,String airbyteVersion,AirbyteMessageSerDeProvider serDeProvider,AirbyteProtocolVersionedMigratorFactory migratorFactory,FeatureFlags featureFlags,FeatureFlagClient featureFlagClient,GsonPksExtractor gsonPksExtractor,WorkloadApi workloadApi,WorkloadIdGenerator workloadIdGenerator,[JobOutputDocStore jobOutputDocStore],MetricClient metricClient) --> new JobOutputDocStore([DocumentStoreClient documentStoreClient],MetricClient metricClient)
io.micronaut.context.exceptions.BeanInstantiationException: Error instantiating bean of type  [io.airbyte.workers.workload.JobOutputDocStore]
```
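The path above shows JobOutputDocStore being constructed from a DocumentStoreClient, which is presumably built from the job/state storage settings, so one first check is whether the storage-related keys made it into the env ConfigMap. A rough sketch (the grep pattern is a guess, not the exact chart key names):

```
# See which storage/document-store related keys ended up in the env ConfigMap.
kubectl -n dp-airbyte get configmap airbyte-airbyte-env -o yaml | grep -iE 'STORAGE|MINIO|BUCKET|S3|GCS'
```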

The Events from kubectl describe pod:

```
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Normal   Nominated         7m32s                  karpenter          Pod should schedule on: nodeclaim/general-apps-ondemand-c4xxz
  Warning  FailedScheduling  6m46s (x7 over 7m33s)  default-scheduler  0/25 nodes are available: 1 Insufficient cpu, 14 node(s) didn't match Pod's node affinity/selector, 5 node(s) had untolerated taint {controlInstance: }, 5 node(s) had untolerated taint {spotInstanceClassification: }. preemption: 0/25 nodes are available: 1 No preemption victims found for incoming pod, 24 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  6m4s (x3 over 6m25s)   default-scheduler  0/26 nodes are available: 1 Insufficient cpu, 1 node(s) had untolerated taint {node.cilium.io/agent-not-ready: true}, 14 node(s) didn't match Pod's node affinity/selector, 5 node(s) had untolerated taint {controlInstance: }, 5 node(s) had untolerated taint {spotInstanceClassification: }. preemption: 0/26 nodes are available: 1 No preemption victims found for incoming pod, 25 Preemption is not helpful for scheduling.
  Normal   Scheduled         5m54s                  default-scheduler  Successfully assigned dp-airbyte/airbyte-worker-8cff5f6d-rdxff to 
  Normal   Pulling           5m54s                  kubelet            Pulling image "airbyte/worker:0.50.51"
  Normal   Pulled            5m21s                  kubelet            Successfully pulled image "airbyte/worker:0.50.51" in 32.173502068s
  Normal   Created           3m5s (x5 over 5m21s)   kubelet            Created container airbyte-worker-container
  Normal   Started           3m5s (x5 over 5m13s)   kubelet            Started container airbyte-worker-container
  Normal   Pulled            3m5s (x4 over 5m3s)    kubelet            Container image "airbyte/worker:0.50.51" already present on machine
  Warning  BackOff           48s (x20 over 4m53s)   kubelet            Back-off restarting failed container
```
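The BackOff events only say that the container keeps crashing; the actual reason is in the container logs. A quick way to pull them (using the pod name from the events above as an example):

```
# Logs from the last crashed container instance of the worker pod.
kubectl -n dp-airbyte logs airbyte-worker-8cff5f6d-rdxff --previous
```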

Nice move. The team is working on releasing a more stable version and upgrade process for the Helm chart.

OK, so can I do anything to solve the Micronaut error?

Also, <@U01MMSDJGC9> I spoke too soon :confused: some worker pods were already running before the upgrade and were not replaced. So to test, I forced a replacement by deleting them: now they crash with the same key error:

```
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  22s               default-scheduler  0/27 nodes are available: 1 node(s) had untolerated taint {node.cilium.io/agent-not-ready: true}, 14 node(s) didn't match Pod's node affinity/selector, 2 Insufficient cpu, 5 node(s) had untolerated taint {controlInstance: }, 5 node(s) had untolerated taint {spotInstanceClassification: }. preemption: 0/27 nodes are available: 2 No preemption victims found for incoming pod, 25 Preemption is not helpful for scheduling.
  Normal   Scheduled         21s               default-scheduler  Successfully assigned dp-airbyte/airbyte-worker-5b4f8854dc-5dx2s
  Normal   Nominated         21s               karpenter          Pod should schedule on: nodeclaim/general-apps-ondemand-vzc76
  Normal   Pulled            9s (x3 over 20s)  kubelet            Container image "dkr.ecr.eu-west-1.amazonaws.com/docker.io/airbyte/worker:0.50.29" already present on machine
  Warning  Failed            9s (x3 over 20s)  kubelet            Error: couldn't find key USE_STREAM_CAPABLE_STATE in ConfigMap dp-airbyte/airbyte-airbyte-env
```

<@U054E1QL1HN> can you check if the worker deployment was updated? It doesn't make sense for the worker to request this variable in the latest version.
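One way to check that (the Deployment name airbyte-worker is an assumption based on the pod names in this thread): compare the image the Deployment spec is templated with against what Helm believes is installed. In the events above the recreated pod is still running worker:0.50.29, which suggests the Deployment spec was not updated by the upgrade.

```
# Image currently set in the worker Deployment spec.
kubectl -n dp-airbyte get deployment airbyte-worker \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'

# Chart and app version Helm believes is installed for the release.
helm -n dp-airbyte list
```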