Optimizing Resource Usage in Airbyte Kubernetes Deployment

Summary

How to configure an Airbyte Kubernetes deployment so that connectors such as MySQL to BigQuery make effective use of the specified minimum resources


Question

Hi, I am using Airbyte with Kubernetes, and we have set it up with 9 CPUs and 12 GB of RAM, with a minimum of 3 CPUs and 6 GB of RAM. However, when we check the pods, we see that they use far less CPU and RAM; the connector that used the most consumed only 600 MB. How can I set it up so that it uses at least the specified minimum resources? I'm using MySQL as the source and BigQuery as the destination.



This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here.



Hi <@U05JENRCF7C>,

We already have it implemented as you mentioned; however, the pods don't seem to use the full resources. Maybe it is a constraint of the MySQL connector? What is odd is that we used to run Airbyte with Docker on a VM with similar resources, and the connectors were much faster, using all the resources when needed. With Kubernetes, however, we haven't managed to achieve that.
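One point that may explain the observation: in Kubernetes, a resource request is what the scheduler reserves for a container on a node, and a limit is only a ceiling. Neither forces the process to consume that much, so a connector that needs about 600 MB will use about 600 MB whatever its request is. The minimal sketch below converts Kubernetes quantity strings and reports usage as a fraction of the request. The helper names are illustrative and not part of Airbyte or any Kubernetes client, the usage figures would come from something like `kubectl top pod`, and treating the reported 600 MB as `600Mi` is an approximation.

```python
from decimal import Decimal

# Suffixes for Kubernetes resource quantities (illustrative subset).
# Binary suffixes are checked first so that "Mi" is not mistaken for "M".
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL_SUFFIXES = {"m": Decimal("0.001"), "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '6Gi', '600Mi', '3' or '500m' to a plain number."""
    for table in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, factor in table.items():
            if quantity.endswith(suffix):
                return Decimal(quantity[: -len(suffix)]) * factor
    return Decimal(quantity)  # no suffix: bytes for memory, cores for CPU


def usage_vs_request(usage: str, request: str) -> float:
    """Fraction of the requested amount that a container is actually using."""
    return float(parse_quantity(usage) / parse_quantity(request))


# Memory: roughly the 600 MB from the question against the 6 GB minimum.
print(f"memory: {usage_vs_request('600Mi', '6Gi'):.1%} of request")
# CPU: a hypothetical 500 millicores against the 3-CPU minimum.
print(f"cpu:    {usage_vs_request('500m', '3'):.1%} of request")
```

On this reading, a low percentage is not by itself a misconfiguration: the request only guarantees that the capacity is available if the connector asks for it.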

Hmm, in the database I found that the actor_definition table has a resource_requirements field.
I don't have any value there for the MySQL connector, so maybe some default values are used. I'm wondering if there is a way to tweak it :thinking_face:
For some connectors I've seen values like these:


```json
{
  "jobSpecific": [
    {
      "jobType": "sync",
      "resourceRequirements": {"cpu_limit": "4.0", "cpu_request": "1.0"}
    }
  ]
}
```
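To check which override, if any, a connector has, the column's JSON can be inspected directly. This is a minimal sketch that only assumes the `jobSpecific` / `jobType` / `resourceRequirements` shape shown in the snippet above; the function name is hypothetical, and fetching the raw column value from Airbyte's database is left out. An empty result corresponds to the MySQL case described here, where nothing is stored and some default presumably applies.

```python
import json
from typing import Optional


def sync_requirements(raw: Optional[str]) -> dict:
    """Return the resourceRequirements for the 'sync' job type, or {} if none are stored.

    An empty or NULL column means there is no per-connector override.
    """
    if not raw:
        return {}
    data = json.loads(raw)
    for entry in data.get("jobSpecific", []):
        if entry.get("jobType") == "sync":
            return entry.get("resourceRequirements", {})
    return {}


example = '{"jobSpecific": [{"jobType": "sync", "resourceRequirements": {"cpu_limit": "4.0", "cpu_request": "1.0"}}]}'
print(sync_requirements(example))  # {'cpu_limit': '4.0', 'cpu_request': '1.0'}
print(sync_requirements(None))     # {}
```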

Cloud. We are using GCP.

Check the GKE docs about cluster autoscaling and node auto-provisioning https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
and about node pools https://cloud.google.com/kubernetes-engine/docs/concepts/node-pools

By combining those two things:
• you can separate nodes into two node pools: one for Airbyte core components (such as the server, webapp, and so on) and a second for the jobs that perform synchronizations
• nodes in the jobs node pool can be autoscaled and auto-provisioned, so resources are used only for synchronizations and nodes can be terminated once a synchronization is done
• you can configure limits/requests for most pods in the core-components node pool, so you know exactly how much computing power you need to allocate for them
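A sketch of the two-pool layout, using hypothetical names (`airbyte-cluster`, `airbyte-jobs`, machine type `e2-standard-4`, taint `workload=airbyte-jobs`). It builds a `gcloud container node-pools create` command for an autoscaled jobs pool that can scale down to zero nodes, plus the `nodeSelector`/`tolerations` fragment a job pod would need in order to land on that pool. GKE labels every node with `cloud.google.com/gke-nodepool=<pool name>`, so the selector needs no custom label. How the fragment gets attached to Airbyte's job pods (Helm values, environment variables) depends on the Airbyte version and is not shown, and the `--zone`/`--region` flag is omitted.

```python
import json
import shlex

# Hypothetical names; replace with your own cluster, pool, and machine type.
CLUSTER = "airbyte-cluster"
JOBS_POOL = "airbyte-jobs"
TAINT_KEY, TAINT_VALUE = "workload", "airbyte-jobs"


def create_jobs_pool_cmd(min_nodes: int = 0, max_nodes: int = 3) -> list[str]:
    """gcloud command for an autoscaled, tainted node pool reserved for sync jobs."""
    return [
        "gcloud", "container", "node-pools", "create", JOBS_POOL,
        f"--cluster={CLUSTER}",
        "--machine-type=e2-standard-4",
        "--enable-autoscaling",
        f"--min-nodes={min_nodes}",
        f"--max-nodes={max_nodes}",
        # The taint keeps core components (which lack the toleration) off this pool.
        f"--node-taints={TAINT_KEY}={TAINT_VALUE}:NoSchedule",
    ]


def job_pod_placement() -> dict:
    """Pod-spec fragment that pins a job pod to the jobs pool and tolerates its taint."""
    return {
        "nodeSelector": {"cloud.google.com/gke-nodepool": JOBS_POOL},
        "tolerations": [
            {
                "key": TAINT_KEY,
                "operator": "Equal",
                "value": TAINT_VALUE,
                "effect": "NoSchedule",
            }
        ],
    }


print(shlex.join(create_jobs_pool_cmd()))
print(json.dumps(job_pod_placement(), indent=2))
```

The taint plus toleration pairing is what keeps the split clean: core components stay on the untainted pool, and only pods carrying the toleration are scheduled onto the autoscaled one.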
I won't give you any specific numbers, because every setup is different and you need to calculate them yourself. This is more general guidance on how to reduce costs.
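The calculation itself is arithmetic on requests: the scheduler places a pod only on a node where the pod's requests fit into the node's allocatable capacity, which is lower than the machine's raw capacity because of system reservations. A sketch with purely hypothetical figures, working in millicores and MiB so the integer arithmetic stays exact; the function names are illustrative.

```python
import math


def pods_per_node(alloc_cpu_m: int, alloc_mem_mi: int, req_cpu_m: int, req_mem_mi: int) -> int:
    """Number of pods with the given requests that fit on one node (CPU and memory both bound it)."""
    fit = min(alloc_cpu_m // req_cpu_m, alloc_mem_mi // req_mem_mi)
    if fit == 0:
        raise ValueError("pod requests exceed a single node's allocatable capacity")
    return fit


def nodes_needed(concurrent_pods: int, per_node: int) -> int:
    """Nodes required to run the given number of job pods at the same time."""
    return math.ceil(concurrent_pods / per_node)


# Hypothetical node: 3500m CPU and 13 GiB allocatable; hypothetical job pod: 1 CPU and 2 GiB requested.
per_node = pods_per_node(3500, 13 * 1024, 1000, 2048)
print(per_node)                  # 3 (CPU is the binding constraint here)
print(nodes_needed(5, per_node))  # 2
```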

So, in the DB that field is empty, so I assume the restriction being applied is the one from the pods, or at least that's what it should be, but it's not happening, haha.