Troubleshooting worker downscaling issue in local Airbyte deployment

Summary

The user is facing an issue with downscaling workers in a local Airbyte deployment despite configuring the values file. They are seeking advice on whether the config file is correct.


Question

Hi Team,
I'm trying to deploy Airbyte locally via abctl. My values file looks like below:

```
  env_vars:
    JOB_MAIN_CONTAINER_MEMORY_REQUEST: "8Gi"
    JOB_MAIN_CONTAINER_MEMORY_LIMIT: "16Gi"

  jobs:
    resources:
      requests:
        memory: "8Gi"
        cpu: "500m"
      limits:
        memory: "16Gi"
        cpu: "1000m"

worker:
  replicaCount: 1
  extraEnvs:
    - name: MAX_SYNC_WORKERS
      value: "2"
    - name: MAX_SPEC_WORKERS
      value: "2"
    - name: MAX_CHECK_WORKERS
      value: "2"
    - name: MAX_DISCOVER_WORKERS
      value: "2"
    - name: MAX_NOTIFY_WORKERS
      value: "2"

server:
  extraEnvs:
    - name: HTTP_IDLE_TIMEOUT
      value: 20m
    - name: READ_TIMEOUT
      value: 30m
```
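
For reference, in the Airbyte Helm chart the `env_vars:` and `jobs:` blocks are usually nested under a `global:` key; the leading indentation above suggests that parent line may have been dropped when pasting. A minimal sketch of that nesting, assuming that was the intent:

```
# Sketch only - assumes the indented env_vars/jobs blocks were meant to sit
# under the chart's global: key, following the Airbyte Helm chart values layout.
global:
  env_vars:
    JOB_MAIN_CONTAINER_MEMORY_REQUEST: "8Gi"
    JOB_MAIN_CONTAINER_MEMORY_LIMIT: "16Gi"
  jobs:
    resources:
      requests:
        memory: "8Gi"
        cpu: "500m"
      limits:
        memory: "16Gi"
        cpu: "1000m"
```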
But even with this config, I'm not able to downscale workers.
```
airbyte-abctl-worker-64dd58fddb-trt6p                     1/1     Running     0          55m
airbyte-abctl-workload-api-server-55b9cb45-49z5x          1/1     Running     0          55m
airbyte-abctl-workload-launcher-659c8dcbcb-7gftm          1/1     Running     0          55m
airbyte-db-0                                              1/1     Running     0          57m
airbyte-minio-0                                           1/1     Running     0          57m
destination-s3-write-15-0-aujdx                           5/5     Running     0          21m
orchestrator-repl-job-15-attempt-0                        1/1     Running     0          21m
source-bigquery-read-15-0-kvtlc                           4/4     Running     0          21m
```
Can someone please advise if the config file is correct?


---

This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1723707022751499) if you want to access the original thread.

[Join the conversation on Slack](https://slack.airbyte.com)

<sub>
["airbyte", "local-deployment", "abctl", "worker-downscaling", "config-file"]
</sub>

Can you explain more about what you mean by:
> But even with this one, Im not able to downscale workers.

Hi <@U01MMSDJGC9>, I was expecting MAX_SYNC_WORKERS to limit the pods deployed for read and write, but the local deployment shows 5 S3 write pods and 4 BigQuery read pods.
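
If it helps to rule out the config itself, a quick check that the MAX_*_WORKERS overrides actually reached the worker deployment might look like this (assuming the default abctl namespace `airbyte-abctl`, and that kubectl is already pointed at the abctl cluster; the deployment name is taken from the pod listing above):

```
# Print the worker's environment and filter for the MAX_*_WORKERS overrides.
kubectl -n airbyte-abctl exec deploy/airbyte-abctl-worker -- env | grep MAX_
```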

I'm seeing the memory footprint grow on the workers, eventually making them unresponsive. I was hoping to reduce the number of workers and see if that helps.

```
2024-08-15 10:21:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 3.48 GB (3563.53125 MB), %% used: 1.0 | Queue `ga4_au_site_events_weekly`, num records: 5084192, num bytes: 3.12 GB, allocated bytes: 3.12 GB | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-15 10:22:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 3.48 GB (3563.53125 MB), %% used: 1.0 | Queue `ga4_au_site_events_weekly`, num records: 5084192, num bytes: 3.12 GB, allocated bytes: 3.12 GB | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-15 10:22:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 2
2024-08-15 10:23:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 3.48 GB (3563.53125 MB), %% used: 1.0 | Queue `ga4_au_site_events_weekly`, num records: 5084192, num bytes: 3.12 GB, allocated bytes: 3.12 GB | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-15 10:23:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 2
2024-08-15 10:24:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 3.48 GB (3563.53125 MB), %% used: 1.0 | Queue `ga4_au_site_events_weekly`, num records: 5084192, num bytes: 3.12 GB, allocated bytes: 3.12 GB | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-15 10:24:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 2
2024-08-15 10:25:32 destination > INFO pool-2-thread-1 i.a.c.i.d.a.b.BufferManager(printQueueInfo):94 [ASYNC QUEUE INFO] Global: max: 3.48 GB, allocated: 3.48 GB (3563.53125 MB), %% used: 1.0 | Queue `ga4_au_site_events_weekly`, num records: 5084192, num bytes: 3.12 GB, allocated bytes: 3.12 GB | State Manager memory usage: Allocated: 10 MB, Used: 0 bytes, percentage Used 0.0
2024-08-15 10:25:32 destination > INFO pool-5-thread-1 i.a.c.i.d.a.FlushWorkers(printWorkerInfo):127 [ASYNC WORKER INFO] Pool queue size: 0, Active threads: 2
```
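
To see where the memory is actually going on the worker and connector pods, something like the following can help (this assumes metrics-server is available in the abctl kind cluster, which may not be the case by default, and uses the default `airbyte-abctl` namespace):

```
# Per-pod memory/CPU usage; requires metrics-server in the cluster.
kubectl -n airbyte-abctl top pod

# Per-container breakdown for one of the replication pods from the listing above.
kubectl -n airbyte-abctl top pod destination-s3-write-15-0-aujdx --containers
```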

Hey <@U07GKTS64KZ> - if it helps, the numbers you see in the output are actually the number of containers in each pod, not the number of pods. Not all of those containers handle replication, so it can be a misleading interpretation that 5 containers are writing or 4 containers are reading :slightly_smiling_face:
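
For example, to list the containers that make up one of those replication pods (pod name taken from the listing above, default abctl namespace assumed):

```
# Prints the container names inside the destination pod; only some of them do
# the actual write work, the rest are sidecars.
kubectl -n airbyte-abctl get pod destination-s3-write-15-0-aujdx \
  -o jsonpath='{.spec.containers[*].name}'
```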