Quickly running out of storage

Hi Team,

We run a self-hosted Airbyte instance on an AWS EC2 t2.2xlarge instance that initially had 30 GB of storage. Recently we changed all tables from Full Refresh | Overwrite to Incremental | Deduped + History and also changed the sync frequency to every hour. Only about two days later, we ran out of storage: all 30 GB were used. I added another 10 GB, and after a day we ran out of storage again.

I installed ncdu, and its analysis showed that /var/lib/docker/volumes/airbyte_workspace/_data is the main source of the growing storage usage. I purged that folder last night, but since then the size of each folder has been growing again (ncdu sorts by size, so I’m not sure the biggest folders are the most recent ones).
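Because ncdu sorts by size, it does not show which folders are new. A minimal Python sketch that lists the top-level entries of the workspace volume newest first, together with their total size, could help tell whether recent job folders are the ones growing. It assumes read access to the Docker volume path from the ncdu output (usually root); the function names `dir_stats` and `summarize` are hypothetical.

```python
from __future__ import annotations

import os
import sys
import time
from pathlib import Path

# Default path taken from the ncdu output in this thread; override with argv[1].
DEFAULT_WORKSPACE = "/var/lib/docker/volumes/airbyte_workspace/_data"


def dir_stats(path: Path) -> tuple[int, float]:
    """Return (total bytes of files, newest mtime) under path; symlinks are not followed."""
    total = 0
    newest = path.lstat().st_mtime  # the directory's own mtime as a starting point
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = (Path(root) / name).lstat()
            except FileNotFoundError:
                continue  # file was removed while we were scanning
            total += st.st_size
            newest = max(newest, st.st_mtime)
    return total, newest


def summarize(workspace: Path) -> list[tuple[str, int, float]]:
    """Return (name, bytes, newest mtime) per top-level entry, newest first."""
    rows = []
    for entry in workspace.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            size, mtime = dir_stats(entry)
        else:
            st = entry.lstat()
            size, mtime = st.st_size, st.st_mtime
        rows.append((entry.name, size, mtime))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    target = Path(sys.argv[1] if len(sys.argv) > 1 else DEFAULT_WORKSPACE)
    if target.is_dir():
        for name, size, mtime in summarize(target):
            stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
            print(f"{stamp}  {size / 1e6:10.1f} MB  {name}")
    else:
        print(f"{target} is not a directory (or is not readable)")
```

Sorting by the newest modification time inside each folder, rather than by the folder's own mtime, means a job folder that is still being written to shows up at the top even if it was created days ago.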


To accommodate this amount of data, we would need to pay for about 3 TB of storage for this EC2 instance alone, which is too expensive. What is wrong with our deployment? How can we configure the workspace data to be purged after, say, 3 days or more?

It’s worth mentioning that we still use Full Refresh | Overwrite for views, since the Incremental option is not available for them in Airbyte.

Sorry to hear that, Arash. It looks like there has been a problem since the airbyte-scheduler service was removed.
See this issue: https://github.com/airbytehq/airbyte/issues/15567

For now, a workaround is to clean the data folder so it doesn’t consume so much space. I’ll ask the engineering team to take a look and see what can be done.

@marcosmarxm Do you have any timeframe for fixing this issue? I think it’s very important.
Thanks

@arashlayeghi I’ve had the same issue. I manually delete logs older than 7 days every week or so. Pain in the ass, though I suppose I could set up a cron job. It would be great if this feature worked correctly in the Airbyte deployment.
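A cron job could automate this manual cleanup. The Python script below is a minimal sketch, assuming that files in the workspace volume not modified for N days belong to finished jobs and are safe to delete (an assumption, not something confirmed in this thread). It defaults to a dry run; the script name `purge_workspace.py`, the `--delete` flag, and the function `purge_older_than` are hypothetical.

```python
from __future__ import annotations

import os
import sys
import time
from pathlib import Path


def purge_older_than(root: Path, days: float, dry_run: bool = True) -> list[Path]:
    """Delete files under root that have not been modified for `days` days.

    Returns the files that were deleted (or, with dry_run=True, would be).
    Symlinks are skipped and directories are left in place.
    """
    cutoff = time.time() - days * 86400  # 86400 seconds per day
    removed: list[Path] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.is_symlink() or path.stat().st_mtime >= cutoff:
                    continue  # skip symlinks and files newer than the cutoff
                if not dry_run:
                    path.unlink()
                removed.append(path)
            except FileNotFoundError:
                continue  # already gone (e.g. removed by another process)
    return removed


if __name__ == "__main__":
    # Hypothetical CLI: purge_workspace.py <directory> <days> [--delete]
    if len(sys.argv) >= 3:
        target, max_age = Path(sys.argv[1]), float(sys.argv[2])
        delete = "--delete" in sys.argv[3:]
        for p in purge_older_than(target, max_age, dry_run=not delete):
            print(("deleted " if delete else "would delete ") + str(p))
    else:
        print("usage: purge_workspace.py <directory> <days> [--delete]")
```

It could then be scheduled with a crontab entry such as `0 3 * * * python3 /opt/purge_workspace.py /var/lib/docker/volumes/airbyte_workspace/_data 7 --delete`, which runs daily at 03:00 (the script location is hypothetical). Empty directories are deliberately left in place, since a running sync may have just created one and they take negligible space.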

Hello, an update here: the fix was merged.
However, only the next version, v0.40.5, will include the changes; we will probably have another update by the end of this week.

Please check PR https://github.com/airbytehq/airbyte/pull/16247

Thanks, Marcos,
I have updated to v0.40.5 now. Do I need to do anything, or does it clean up the storage automatically when needed?

Airbyte version 0.45.5 reimplemented the feature. You can override the default value by setting the TEMPORAL_HISTORY_RETENTION_IN_DAYS variable.


Hello Marcos,

This is our .env file, which contains TEMPORAL_HISTORY_RETENTION_IN_DAYS=1 at the end.
I ran docker-compose up -d more than 34 hours ago, but the storage still seems to be filling up without being cleaned up. What is wrong with my deployment?

.env (3.6 KB)
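Before digging further, it may be worth confirming that the value is actually picked up. Docker Compose uses the .env file to substitute variables in the compose file; a value reaches a container only if the compose file passes it (for example via `${VAR}` under `environment:` or by loading the file with `env_file`). The minimal Python sketch below, assuming a plain KEY=VALUE .env and a `docker-compose.yaml` in the current directory (both file names are assumptions), flags a missing or non-positive TEMPORAL_HISTORY_RETENTION_IN_DAYS and a compose file that never passes it. The functions `parse_env` and `check_retention` are hypothetical, and the check covers configuration only; it cannot tell whether the cleanup itself ran.

```python
from __future__ import annotations

import sys
from pathlib import Path

VAR = "TEMPORAL_HISTORY_RETENTION_IN_DAYS"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; in this parser later duplicates win."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # blank line, comment, or not an assignment
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop matching surrounding quotes
        values[key] = value
    return values


def check_retention(env_text: str, compose_text: str | None = None) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    values = parse_env(env_text)
    if VAR not in values:
        problems.append(f"{VAR} is not set in .env")
    else:
        try:
            days = int(values[VAR])
            if days < 1:
                problems.append(f"{VAR}={days} is not a positive number of days")
        except ValueError:
            problems.append(f"{VAR}={values[VAR]!r} is not an integer")
    if compose_text is not None and VAR not in compose_text and "env_file" not in compose_text:
        problems.append(f"docker-compose file neither references {VAR} nor uses env_file")
    return problems


if __name__ == "__main__":
    env_path = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".env")
    compose_path = Path(sys.argv[2]) if len(sys.argv) > 2 else Path("docker-compose.yaml")
    if env_path.is_file():
        compose = compose_path.read_text() if compose_path.is_file() else None
        for problem in check_retention(env_path.read_text(), compose) or ["no problems found"]:
            print(problem)
    else:
        print(f"{env_path} not found")
```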