Scaling Airbyte on k8s - increasing job parallelism

Context

Based on the info in the docs, there are a few options for adjusting job parallelism:

  • SUBMITTER_NUM_THREADS - hard upper limit on the total number of concurrent jobs that can run across the entire system.
  • MAX_*_WORKERS - upper limit on the number of concurrent jobs that a single airbyte worker can run.
  • TEMPORAL_WORKER_PORTS - my understanding of this is still a little hazy, but it sounds like the number of ports should equal the sum of the MAX_*_WORKERS variables, or else jobs will hang. Is this correct?
  • replicas of airbyte-worker - increases the number of Airbyte workers running jobs. Each Airbyte worker can run up to MAX_*_WORKERS jobs concurrently.
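
To make the interaction between these knobs concrete, here is a minimal sketch of how I read them, not Airbyte's actual scheduling logic. It assumes the system-wide limit and the per-worker limit simply combine as a minimum, and it leaves TEMPORAL_WORKER_PORTS out because its role is the open question. The function name `effective_ceiling` is made up for illustration.

```python
def effective_ceiling(submitter_num_threads: int,
                      max_sync_workers: int,
                      worker_replicas: int) -> int:
    """Rough upper bound on concurrent syncs under the reading above.

    Assumption: SUBMITTER_NUM_THREADS caps the whole system, and each
    airbyte-worker replica contributes up to MAX_SYNC_WORKERS slots.
    TEMPORAL_WORKER_PORTS is deliberately not modeled here.
    """
    cluster_slots = max_sync_workers * worker_replicas
    return min(submitter_num_threads, cluster_slots)


if __name__ == "__main__":
    # Hypothetical values, just to show which knob is the bottleneck.
    print(effective_ceiling(submitter_num_threads=10,
                            max_sync_workers=5,
                            worker_replicas=4))  # submitter limit wins
```

Under this reading, raising only one of the settings does nothing once another one becomes the bottleneck.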

Guidance needed

Could you give a concrete example of how to adjust these values to be able to run 50/100/1000 concurrent syncs?

Is there any more info on the relationship between TEMPORAL_WORKER_PORTS and MAX_*_WORKERS and how these values interact?

In the Kubernetes deployment, a single worker is limited to 40 parallel jobs, based on TEMPORAL_WORKER_PORTS. If you need more than that, you need to run multiple workers.
To run 100 sync jobs:

  • Number of Workers: 3
  • SUBMITTER_NUM_THREADS = 200
  • MAX_SYNC_WORKERS = 100
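
The worker count above follows from the 40-jobs-per-worker limit: 100 / 40 rounds up to 3. As a rough sketch, the same arithmetic can be applied to the other targets from the question. The 40-job cap is taken from this reply and may differ between Airbyte versions; the helper name `workers_needed` is hypothetical.

```python
import math

# Assumption: one airbyte-worker handles at most 40 parallel jobs,
# as stated above for the Kubernetes deployment.
JOBS_PER_WORKER = 40


def workers_needed(target_syncs: int, jobs_per_worker: int = JOBS_PER_WORKER) -> int:
    """Smallest number of worker replicas whose combined cap covers the target."""
    return math.ceil(target_syncs / jobs_per_worker)


if __name__ == "__main__":
    for target in (50, 100, 1000):
        print(f"{target} syncs -> {workers_needed(target)} workers")
    # 50 syncs -> 2 workers
    # 100 syncs -> 3 workers
    # 1000 syncs -> 25 workers
```

Since SUBMITTER_NUM_THREADS is a system-wide upper limit, it would also need to be at least as large as the target number of syncs.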

Read more here: On Kubernetes (Beta) | Airbyte Documentation

Thank you, that is exactly what I was looking for! I’ll give that a shot.