Changing Kubernetes Airbyte worker configuration

Summary

The user is trying to change the Kubernetes Airbyte worker configuration using Helm and a values.yml file. They are seeking validation on whether the provided approach is correct.


Question

Hi!
I would like to change the Kubernetes Airbyte worker configuration described here: https://docs.airbyte.com/operator-guides/configuring-airbyte#kubernetes-only
I am trying to do that with `helm upgrade airbyte-release1 airbyte/airbyte --values values.yml --debug`, where the values file looks like this:

```
airbyte:
  version: 0.50.43
global:
  env_vars:
    JOB_MAIN_CONTAINER_CPU_LIMIT: 1
```
Is this correct and can I validate somehow that it works correctly?

I also tried another approach, which seems to be working, but I want to follow the documentation you provided.
```
airbyte:
  version: 0.50.43
global:
  jobs:
    resources:
      limits:
        cpu: 1
      requests:
        cpu: 1
```


---

This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1708439794364159) if you want to access the original thread.

[Join the conversation on Slack](https://slack.airbyte.com)

<sub>
["kubernetes", "airbyte-worker-configuration", "helm", "values.yml", "validation"]
</sub>

The values approach (your second example) is the one I use.

Those ultimately result in environment variables set on the worker pod, which uses them as configuration when it spins up the job pods.

I suppose that is why it is documented as an environment variable; in effect you could set the environment variables directly, but you shouldn't, at least not for Kubernetes/Helm.

After setting those global values you should see this in your airbyte-airbyte-env configmap:

```
JOB_MAIN_CONTAINER_CPU_REQUEST: "1"
```
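To check that from the command line, something like this should work (assuming the default naming, where the configmap is called `<release>-airbyte-env`; add `-n <namespace>` if Airbyte is not in your current namespace):

```shell
# Dump the env configmap and filter for the job container resource keys.
kubectl get configmap airbyte-airbyte-env -o yaml | grep JOB_MAIN_CONTAINER
```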

These configmap keys are mapped to the worker pod:

```
    - name: JOB_MAIN_CONTAINER_CPU_REQUEST
      valueFrom:
        configMapKeyRef:
          key: JOB_MAIN_CONTAINER_CPU_REQUEST
          name: airbyte-airbyte-env
    - name: JOB_MAIN_CONTAINER_CPU_LIMIT
      valueFrom:
        configMapKeyRef:
          key: JOB_MAIN_CONTAINER_CPU_LIMIT
          name: airbyte-airbyte-env
```

At least on my install (I'm on 0.50.50) I can see that the source and destination main containers match what I've put into this configuration.
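If you want to verify that end to end, you can read the resources straight off a running job pod. A sketch (the pod name is just an example; use one of your own `source-*-read-*` pods from `kubectl get pods`):

```shell
# Print each container's name and resource block for a job pod.
kubectl get pod source-mysql-read-39-0-frrjc \
  -o jsonpath='{range .spec.containers[*]}{.name}: {.resources}{"\n"}{end}'
```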

https://artifacthub.io/packages/helm/airbyte/worker?modal=values&path=global.jobs

Those settings are documented in the Helm chart.

However, it would appear not all settings are exposed this way, and this can trip you up: you MUST use the `global.jobs` section to set the ones that are exposed this way, and you MUST NOT use extraEnv for them. Otherwise there will be two conflicting env configurations, one that references the configmap and your explicit env setting, and Kubernetes doesn't like that.

For the settings that are not exposed this way, you MAY use extraEnv, with the caveat that if a new version of the chart comes out that does expose a value for one of them, it will break your install.
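For reference, extraEnv usually follows the standard Helm convention of a list of Kubernetes EnvVar objects (worth double-checking against the chart version you're on). A minimal values.yml sketch, with a hypothetical variable name:

```yaml
worker:
  extraEnv:
    # Plain name/value EnvVar entries; anything here is injected verbatim
    # into the worker container's environment.
    - name: SOME_UNEXPOSED_SETTING   # hypothetical, for illustration only
      value: "example"
```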

Thanks! This is very helpful, as I am still somewhat new to Kubernetes.

Setting the resource limits works fine as you explained and I can see that in the configmap!

I still don’t seem to grasp how to set variables like MAX_SYNC_WORKERS, though.

If I understood your explanation I would have to use extraEnv in this case:

```
airbyte:
  version: 0.50.43
worker:
  extraEnv:
    MAX_SYNC_WORKERS: 2
```

However, after a helm upgrade and redeployment, when I start a sync I still see this with `kubectl get pods`:

```
source-mysql-read-39-0-frrjc    4/4    Running    0    19s
```

In the worker deployment I can see that the variable is visible, but I still get 4/4 containers per source pod as shown above.
Excerpt from the worker deployment:

```
        - name: WORKER_STATE_STORAGE_TYPE
          valueFrom:
            configMapKeyRef:
              key: WORKER_STATE_STORAGE_TYPE
              name: airbyte-release1-airbyte-env
        - name: CONTAINER_ORCHESTRATOR_ENABLED
          value: "false"
        - name: MAX_SYNC_WORKERS
          value: "2"
        image: airbyte/worker:0.50.48
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
```
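You can also double-check from inside the running pod rather than the deployment spec. A sketch, assuming the worker deployment is named `<release>-worker` (here airbyte-release1-worker; adjust to whatever `kubectl get deploy` shows):

```shell
# Print the variable as the worker process actually sees it.
kubectl exec deploy/airbyte-release1-worker -- env | grep MAX_SYNC_WORKERS
```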

Oh, my bad. What you see in the READY column of `kubectl get pods` is not the workers.

```
source-mysql-read-39-0-frrjc                                 4/4     Running     0               19s
```
It seems to be the containers for input, output, the heartbeat server, and such.
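If you're curious what makes up the 4/4, you can list the container names in the job pod directly (example pod name again; swap in your own):

```shell
# Print the names of all containers in a job pod; you should see the
# connector's main container plus its sidecars.
kubectl get pod source-mysql-read-39-0-frrjc \
  -o jsonpath='{.spec.containers[*].name}'
```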

I’ll open a new thread since I assume the configuration <@U04GVA45P43> proposed works and I am just missing a way to validate it.

I think you would have to use extraEnv to set that, yes.