Low memory footprint - how to set job main container memory parameters?

kyle-mackenzie-indee · June 10, 2022, 10:36am

I’m running a self-hosted Airbyte deployment with docker-compose on an EC2 instance.
It looks like my connections have a very small batch size:
2022-06-10 07:40:33 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):334 - Records read: 535000 (131 MB)

2022-06-10 07:40:33 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):334 - Records read: 536000 (131 MB)
Memory usage on the EC2 instance is also only around 19%.

From the docs, it seems this may need to be configured with the two parameters below in the .env file.

JOB_MAIN_CONTAINER_MEMORY_REQUEST=
JOB_MAIN_CONTAINER_MEMORY_LIMIT=

But it’s not clear to me how these should be set (playing around has not had any significant effect).

What should these values look like?
E.g.
JOB_MAIN_CONTAINER_MEMORY_REQUEST="28GB"
or
JOB_MAIN_CONTAINER_MEMORY_REQUEST=28000000
or something else?

Is this your first time deploying Airbyte?: Yes
OS Version / Instance: Ubuntu
Memory / Disk: 32GB or so
Deployment: Docker on EC2
Airbyte Version: 0.39.4
Source name/version: Postgres 0.4.16
Destination name/version: Postgres 0.3.20
Step: Unknown

alafanechere · June 10, 2022, 4:02pm

Hi @kyle-mackenzie-indee,
These values use the same format as the memory settings usually defined in docker-compose files.
You can set JOB_MAIN_CONTAINER_MEMORY_REQUEST=28g.
I’m not sure this will improve the throughput of your Postgres sync, though. I would suggest upgrading your source Postgres connector to the latest version. We recently improved this connector to use a dynamic fetch size; it was previously hardcoded to 1000 records.
You’ll find an interesting related discussion here about source database performance. The discussion is around MySQL but the core logic is exactly the same for Postgres.
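
For reference, here is a minimal sketch of how the two memory parameters might look in the .env file, using the `28g` format suggested above. The specific amounts are assumptions for a host with about 32GB of RAM (leaving some headroom for the other Airbyte containers), and setting the limit to the same value as the request is also just an illustrative choice, not a documented recommendation.

```
# Illustrative values only: assumes a ~32GB host, adjust to your instance.
# Format: a number followed by a unit suffix, as in docker-compose memory settings.
JOB_MAIN_CONTAINER_MEMORY_REQUEST=28g
JOB_MAIN_CONTAINER_MEMORY_LIMIT=28g
```

After editing .env, the Airbyte containers most likely need to be restarted (for example with docker-compose down followed by docker-compose up) for the new values to be picked up.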

kyle-mackenzie-indee · June 21, 2022, 11:02pm

It still seems to have a very low throughput:

2022-06-21 23:00:57 destination > 2022-06-21 23:00:57 INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(lambda$flushAll$1):84 - Flushing mytable: 104 records (24 MB)

2022-06-21 23:01:02 destination > 2022-06-21 23:01:02 INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(lambda$flushAll$1):84 - Flushing mytable: 70 records (24 MB)

2022-06-21 23:01:03 destination > 2022-06-21 23:01:03 INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(lambda$flushAll$1):84 - Flushing mytable: 130 records (24 MB)

But the question of how to set memory parameters is solved.

I’ll keep an eye on the Postgres improvements coming this quarter and see if they help us.
We’re also looking to move to CDC, which may have better throughput.
