Hi, when will a log retention policy be deployed?
Hi @laurencewilliams, could you please share what type of log retention policy you are expecting? On the Airbyte instance, the logs are regularly cleaned up by a specific job. Do you mean a log retention policy on your cloud provider bucket (AWS/GCP) where logs get replicated?
Hi, I’ll provide a bit of context on why:
- We spun up Airbyte in an EC2 instance in AWS and quickly ran out of disk space
- We have since increased the disk size but saw that the main culprit for this was the logs taking a lot of space (this is something I saw on the Slack community as well).
- Therefore, as a more sustainable solution, we would like to implement some sort of log retention policy. Our Tech-Ops team said they can put a quick fix in place on their end, but they were wondering whether a more standard policy exists that they could implement. Hope that makes sense.
Airbyte schedules periodic jobs to delete logs from syncs.
By default these logs are deleted every month or when their size exceeds 5GB.
You can change the values of the following environment variables according to your needs and your available disk space:
MINIMUM_WORKSPACE_RETENTION_DAYS - Defines the minimum configuration file age for sweeping. The Scheduler will do its best not to sweep files younger than this. Defaults to 1 day.
MAXIMUM_WORKSPACE_RETENTION_DAYS - Defines the oldest un-swept configuration file age. Files older than this will definitely be swept. Defaults to 60 days.
MAXIMUM_WORKSPACE_SIZE_MB - Defines the target workspace size: sweeping continues until the workspace is down to this size. Defaults to 5GB.
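As a minimal sketch of how these could be set, assuming a Docker Compose deployment where the variables live in the .env file next to docker-compose.yaml and are passed through to the container that runs the sweeping job (that pass-through is an assumption, so check your docker-compose.yaml). The values below are illustrative, not recommendations:

```shell
# .env (illustrative values only; tune them to your disk size)
# Files younger than this many days are left alone where possible.
MINIMUM_WORKSPACE_RETENTION_DAYS=1
# Files older than this many days are always swept.
MAXIMUM_WORKSPACE_RETENTION_DAYS=30
# Sweeping continues until the workspace is down to this size, in MB.
MAXIMUM_WORKSPACE_SIZE_MB=2000
```

The containers need to be recreated to pick up the new values, for example with docker compose up -d.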
Hi, thanks for that! This brings me to the following question:
- We actually encountered a disk full issue when it reached 8GB (in this case it exceeded 5GB but the logs were not deleted). Do you know what could have caused this?
- How many workspaces are there? I’m assuming the workspaces for: airbyte scheduler, airbyte worker and airbyte server. Is that correct?
Could you please share exactly which type of logs are taking up this disk space? The automatic deletion of logs is for sync logs.
We also recommend at least 30GB of disk space on your Airbyte host. What’s your current disk size?
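One way to find out what is actually using the space is to compare the host's overall disk usage with Docker's own accounting. This is only a sketch, assuming a Linux host with the Docker CLI installed and Airbyte running under Docker:

```shell
# Overall usage and free space on the host's root filesystem.
df -h /

# Docker's breakdown of space used by images, containers,
# local volumes and build cache. -v adds per-image and per-volume sizes.
docker system df -v
```

If most of the space shows up under volumes, the sync logs in the workspace are a likely suspect. If it shows up under images, the cause lies elsewhere.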
> How many workspaces are there? I’m assuming the workspaces for: airbyte scheduler, airbyte worker and airbyte server. Is that correct?
You usually have a single workspace on open-source deployments, unless you created new workspaces using our API.
@alafanechere qq for you on this, as my EC2 instance has maxed out 30GB of space with logs. I looked in my .env and docker-compose.yaml and neither has the three environment variables you laid out. I’ll go in and define them myself, but it seems like the default values may not be configured correctly given how my instance is chewing up space…
Solved my own issue again. It wasn’t logs taking up space but rather 100+ unused Docker images (perhaps related to old connectors?) that were eating up 20GB.
docker image prune -a cleaned things up, in case this thread is of use to anyone else who finds it in the future.
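For anyone repeating this cleanup, a slightly more cautious sequence might look like the sketch below. It assumes the Docker CLI on the Airbyte host, and the 240h cutoff is an arbitrary illustrative value:

```shell
# See which images exist and how large they are before deleting anything.
docker image ls

# Summary of reclaimable space per resource type.
docker system df

# Remove every image not used by at least one container.
# Docker asks for confirmation before deleting.
docker image prune -a

# Alternative: only remove unused images created more than 240 hours ago.
docker image prune -a --filter "until=240h"
```

Images for connectors that are still in use will be pulled again on the next sync, so the first sync after pruning may take longer.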