Summary
After upgrading the Snowflake destination and removing S3 staging, data seems to be stored in memory instead of on disk. RAM usage is high even after setting max per stream to 2GB. Data is being pulled from Postgres source to Snowflake destination.
Question
Hi All, I’m really struggling here. I upgraded the Snowflake destination to the most recent version, and removed S3 staging. But now it looks to be storing all the data to be transferred in memory instead of on disk (or previously S3).
One of those is 14GB, another 4GB. Other piplines today were at 15-20GB doing initial syncs.
I have set the max per stream to 2GB in the airbyte DB
> select * from actor_definition where name like ‘Snowflake’
> {“jobSpecific”: [{“jobType”: “sync”, “resourceRequirements”: {“memory_limit”: “2Gi”, “memory_request”: “2Gi”}}]}
Nothing seems to work. I’m pulling data from Postgres source (3.4.1) to Snowflake destination (3.8.0) and can’t see why its using so much Ram. We have had to bump it up to 64GB machine, and stagger slots to only run two concurrently.
Hoping someone has some wisdom for me
This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want
to access the original thread.
Join the conversation on Slack
["snowflake-destination", "s3-staging", "ram-usage", "postgres-source", "airbyte-db"]
this one stream (one of our larger) is now over 36GB and climbing - this cannot be right that it stores all of it in memory when using INTERNAL_STAGING as the method surely?
It looks as though airbyte is using a schema but not a snowflake stage; and hence stores all data in memory. Anyone know how I can get this to store either direct in the snowflake internal named stage, or in the filesystem?
I can’t keep scaling RAM (currently at 64GB just to run one slot)
It looks as though when the additional containers spin up the global limits are not passed through, so they consume whatever they want. I cannot set these irrespective of what I try
> Creating docker container = destination-snowflake-write-3506-0-ruxnd with resources io.airbyte.config.ResourceRequirements@2f1d8b47[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}] and allowedHosts null
I have managed to resolve this in the database with
update actor_definition set resource_requirements = '{"jobSpecific": [{"jobType": "sync", "resourceRequirements": {"cpu_limit": "2", "cpu_request": "2", "memory_limit": "2861Mi", "memory_request": "2861Mi"}}]}' where id = '424892c4-daac-4491-b35d-c6688ba547ba';
update actor_definition set resource_requirements = '{"jobSpecific": [{"jobType": "sync", "resourceRequirements": {"cpu_limit": "2", "cpu_request": "2", "memory_limit": "1097Mi", "memory_request": "1097Mi"}}]}' where id = 'decd338e-5647-4c0b-adf4-da0e75f5a750';```
```-- set the connection limit for all connections
update connection set resource_requirements = '{"cpu_limit": "2", "cpu_request": "2", "memory_limit": "3814Mi", "memory_request": "2861Mi"}'
where namespace_definition = 'destination'```
*But when the connection resets it loses this setting, reverting to blank. Does anyone know how this can be persistent?*