Understanding workers, jobs and concurrency

According to the documentation, all interactions with connectors run as jobs performed by a Worker, and a worker can be a spec worker, check connection worker, discovery worker, or sync worker.

Question 1 - Can a worker perform only a single job at a time, or more than one?

For configuring parallelization, there are also several parameters of the form MAX_*_WORKERS:

  • As per this link, each is the maximum number of * workers allowed to run in parallel
  • As per this second link, each is the maximum number of * workers that each Airbyte Worker container can support

Question 2 - Are these two definitions different? What is this Airbyte Worker? Is it different from the other four types of workers (spec/check/discover/sync)?

There is another configuration parameter, SUBMITTER_NUM_THREADS: the maximum number of concurrent jobs the Scheduler schedules.

Question 3 - How does this last configuration parameter relate to the MAX_*_WORKERS parameters?

Hey,

The Worker is responsible for creating the sync/check/discover/spec pods. You can read about these here: https://docs.airbyte.com/understanding-airbyte/jobs

Thus the MAX_*_WORKERS parameters control the creation of these pods.
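
To make the parameter names concrete, here is a minimal Python sketch that reads the four MAX_*_WORKERS variables from the environment. It only illustrates the naming pattern: the function name read_worker_caps and the fallback value are assumptions made for this example, not Airbyte code or Airbyte's documented defaults.

```python
import os

# Job types named in this thread; each maps to a MAX_<TYPE>_WORKERS variable.
JOB_TYPES = ("SPEC", "CHECK", "DISCOVER", "SYNC")

# Hypothetical fallback used only in this sketch, not an Airbyte default.
ILLUSTRATIVE_FALLBACK = 5


def read_worker_caps(env=os.environ):
    """Return {job_type: cap} read from the MAX_<TYPE>_WORKERS variables."""
    caps = {}
    for job_type in JOB_TYPES:
        raw = env.get(f"MAX_{job_type}_WORKERS", ILLUSTRATIVE_FALLBACK)
        caps[job_type] = int(raw)  # environment values arrive as strings
    return caps
```

Calling read_worker_caps({"MAX_SYNC_WORKERS": "10"}) would give a cap of 10 for SYNC and the illustrative fallback for the other three types.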

Hey Harshith, thanks for the reply. But I got confused reading the documentation link you mentioned. When I first wrote "As per the documentation", I meant that very page (I couldn't add more than two links to the question, so I had to remove it because I already had other links).

Anyway, could you please answer the three questions I asked specifically?

  1. A Worker can perform more than one job at a time; how many depends on the ports that are open.
  2. The Airbyte Worker is the worker container; spec/check/discover/sync are the jobs that the worker can schedule.
  3. MAX_*_WORKERS controls the jobs that are created by the worker.
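
One way to picture the three answers together is the toy model below: a single worker container that holds a separate cap for each job type and refuses to start a job once that type's cap is reached. This is a conceptual sketch only, not how Airbyte is implemented; the class WorkerContainer and its methods are hypothetical names, and the caps passed in stand in for the MAX_*_WORKERS values.

```python
import threading


class WorkerContainer:
    """Toy model of one worker container with a cap per job type."""

    def __init__(self, caps):
        # One bounded semaphore per job type; its size plays the role
        # of the matching MAX_*_WORKERS value (an assumption of this model).
        self._slots = {
            job_type: threading.BoundedSemaphore(cap)
            for job_type, cap in caps.items()
        }

    def try_start(self, job_type):
        """Claim a slot for job_type; return False if its cap is reached."""
        return self._slots[job_type].acquire(blocking=False)

    def finish(self, job_type):
        """Give the slot back once the job is done."""
        self._slots[job_type].release()
```

With caps of {"sync": 2, "check": 1}, two sync jobs and one check job can run side by side in the same container, while a third sync job is refused until one of the running sync jobs finishes.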