Increase Sync Worker Resources

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Kubernetes
  • Memory / Disk: 8GB
  • Deployment: Kubernetes
  • Airbyte Version: 0.39.32-alpha
  • Source name/version: Postgres 0.4.30
  • Destination name/version: Google Cloud Storage (GCS) 0.2.9
  • Step: Sync
  • Description:

I'm trying to increase the resources of the worker pods that read my Postgres database, e.g.:
source-postgres-read-69-0-qvblo

From what I understand, there are some variables to set in the worker deployment, such as JOB_MAIN_CONTAINER_MEMORY_REQUEST and JOB_MAIN_CONTAINER_MEMORY_LIMIT.

I tried setting some values (512Mi and 1GB), but my worker is still stuck with the following for the airbyte/source-postgres:0.4.30 container (for the other containers in the pod, I guess it doesn't impact performance):

  resources:
    requests:
      memory: 50Mi
    limits:
      memory: 25Mi
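As an aside, "512 Mi" and "1GB" mix two unit systems: Kubernetes quantities distinguish binary suffixes (Mi, Gi) from decimal ones (M, G). A minimal sketch of the conversion (the helper function is illustrative, not part of Airbyte or Kubernetes):

```python
# Convert a Kubernetes memory quantity string (e.g. "4Gi", "512Mi", "1G")
# to bytes, to show the difference between binary and decimal suffixes.
# Order matters: two-letter suffixes must be checked before one-letter ones.
SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "K": 1000, "M": 1000**2, "G": 1000**3,
}

def quantity_to_bytes(q: str) -> int:
    for suffix, factor in SUFFIXES.items():
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * factor)
    return int(q)  # plain number means bytes

# "512Mi" is binary (512 * 1024^2 bytes); "1G" is decimal (10^9 bytes).
assert quantity_to_bytes("512Mi") == 512 * 1024**2
assert quantity_to_bytes("1G") == 1000**3
```

So a limit of "4Gi" is about 7% larger than "4G"; being consistent about which suffix you use avoids confusion when comparing configured values against dashboard readings.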

Am I missing something, or doing something wrong?

Thanks for your help!

Hey, could you verify whether these values are reflected in the platform (GKE or EKS)? Ideally, those values should set the limit.

Also, do you see any errors? Can you explain your use case?
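For example, something like this should show what Kubernetes actually applied to the source container (a sketch: the pod name is taken from your example, the container name `main` from typical Airbyte sync pods, and the namespace may differ in your install):

```shell
# Inspect the resources Kubernetes actually applied to the "main" container
# of the source pod (adjust pod name and namespace to your deployment).
kubectl get pod source-postgres-read-69-0-qvblo -n airbyte \
  -o jsonpath='{.spec.containers[?(@.name=="main")].resources}'
```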

Here is my configuration in the Helm chart:

        - name: JOB_MAIN_CONTAINER_CPU_REQUEST
          value: "2"
        - name: JOB_MAIN_CONTAINER_CPU_LIMIT
          value: "2"
        - name: JOB_MAIN_CONTAINER_MEMORY_REQUEST
          value: 4Gi
        - name: JOB_MAIN_CONTAINER_MEMORY_LIMIT
          value: 4Gi

When I check the resources used by the pod in GKE, I see:

airbyte | source-hubspot-read-8-0-dyiam | main | 51.7 MiB

So maybe my question is more: how can I speed up my job to reduce the processing time?
I have way more memory available for the job (and the same story for the CPU).

Got it. If you think memory or CPU is the bottleneck, you can check that in two ways:

  1. Check in the GKE/EKS dashboard how much these pods are actually consuming.
  2. Remove the LIMITs for both and see if the process speeds up.
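The first check can also be done from the command line (a sketch: this assumes metrics-server is installed in the cluster, and the pod name/namespace are from the earlier example):

```shell
# Per-container CPU/memory usage for the sync pod (requires metrics-server).
kubectl top pod source-postgres-read-69-0-qvblo -n airbyte --containers
```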

Hi @harshith

I will do this on Monday.

But the thing is, even with my local docker-compose Airbyte, the memory and CPU used are very low:

CONTAINER ID   NAME                             CPU %     MEM USAGE / LIMIT     MEM %     NET I/O   BLOCK I/O     PIDS
51b3b81ad382   source-hubspot-read-67-0-upoyu   6.66%     32.79MiB / 7.667GiB   0.42%     0B / 0B   0B / 3.45MB   2

We can see that there is about 7.6GiB of RAM available, but Airbyte is using only ~30MiB…
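(For reference, a snapshot like the table above can be taken with something like the following; the container name is from the run above:)

```shell
# One-shot snapshot of resource usage for the source container.
docker stats --no-stream source-hubspot-read-67-0-upoyu
```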

Yeah, then we should dig into the HubSpot source logs and try to understand what is causing the slowness.

Hi @lucienfregosi, were you able to figure out how to choose appropriate values for these variables:

        - name: JOB_MAIN_CONTAINER_CPU_REQUEST
          value: "2"
        - name: JOB_MAIN_CONTAINER_CPU_LIMIT
          value: "2"
        - name: JOB_MAIN_CONTAINER_MEMORY_REQUEST
          value: 4Gi
        - name: JOB_MAIN_CONTAINER_MEMORY_LIMIT
          value: 4Gi

On what basis can we decide which resource limits to apply to the sync jobs (e.g. the read pod)? I would highly appreciate it if you could share your findings on this. I walked through the documentation but could not find anything related to it.
cc: @harshith