Quickly running out of storage

Hi Team,

We self-host Airbyte on an AWS EC2 t2.2xlarge instance with an initial 30 GB of storage. Recently we changed all tables from Full Refresh | Overwrite to Incremental | Deduped + History and also changed the sync frequency to 1 hour. About two days later we ran out of storage: all 30 GB was used. I added another 10 GB, and after a day we ran out of storage again.

I installed ncdu, and its analysis showed that /var/lib/docker/volumes/airbyte_workspace/_data is the main source of the growing storage usage. I purged that folder last night, but since then the size of each folder has been growing again (ncdu sorts by size, so I'm not sure whether the biggest folders are the most recent ones).
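
Because ncdu sorts by size, it does not tell you which job folders are new. A minimal Python sketch like the following (not an Airbyte tool; the script itself and the 20-row limit are assumptions) lists the top-level folders of the workspace volume newest first, with their total size, roughly combining `ls -lt` and `du -sh`. It uses each folder's own modification time, the same timestamp `ls -l` shows, and the default path is the one ncdu reported above.

```python
import os
import sys
from datetime import datetime

# Path reported by ncdu above; pass another path as the first argument if yours differs.
DEFAULT_ROOT = "/var/lib/docker/volumes/airbyte_workspace/_data"


def dir_size(path: str) -> int:
    """Total size in bytes of regular files under `path` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if os.path.islink(fp):
                continue
            try:
                total += os.path.getsize(fp)
            except OSError:
                pass  # file was removed while scanning
    return total


def job_dirs_by_recency(root: str):
    """Return (name, size_bytes, mtime) for each top-level folder, newest first."""
    rows = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                mtime = entry.stat(follow_symlinks=False).st_mtime
                rows.append((entry.name, dir_size(entry.path), mtime))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


def main(root: str, limit: int = 20) -> None:
    if not os.path.isdir(root):
        print(f"workspace not found: {root}")
        return
    for name, size, mtime in job_dirs_by_recency(root)[:limit]:
        stamp = datetime.fromtimestamp(mtime).strftime("%Y-%m-%d %H:%M")
        print(f"{stamp}  {size / 1024 ** 2:10.1f} MiB  {name}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else DEFAULT_ROOT)
```

Run it with `sudo python3` since the Docker volume directory is usually owned by root.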


To keep up with this growth we would need to pay for about 3 TB of storage for this single EC2 instance, which is too expensive. What is wrong with our deployment? How can we configure the workspace data to be purged after, say, 3 days or more?
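
As a rough way to size the disk: if the workspace grows by about g GB per day and everything older than R days is deleted, usage levels off around g × R instead of growing without bound. The figures in this post (30 GB in about two days, then 10 GB in about a day) suggest something like 10 to 15 GB/day, but that is only an estimate. The function names and the 20% safety headroom below are illustrative assumptions.

```python
import math


def steady_state_gb(daily_growth_gb: float, retention_days: float, headroom: float = 0.2) -> float:
    """Approximate disk needed when data older than `retention_days` is purged.

    Assumes roughly constant daily growth; `headroom` is an extra safety fraction.
    """
    if daily_growth_gb < 0 or retention_days < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    return daily_growth_gb * retention_days * (1 + headroom)


def growth_without_purge_gb(daily_growth_gb: float, days: float) -> float:
    """Disk consumed after `days` if nothing is ever deleted (linear growth)."""
    if daily_growth_gb < 0 or days < 0:
        raise ValueError("inputs must be non-negative")
    return daily_growth_gb * days


if __name__ == "__main__":
    rate = 15.0  # GB/day, rough estimate from the numbers above
    print(f"3-day retention: ~{steady_state_gb(rate, 3):.0f} GB")
    print(f"1 year, no purge: ~{growth_without_purge_gb(rate, 365):.0f} GB")
```

At 15 GB/day, a 3-day window needs roughly 54 GB including headroom, while a year without purging would accumulate about 5,475 GB, which is why a working retention mechanism matters more than a bigger volume.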

It's worth mentioning that we still use Full Refresh | Overwrite for views, as the incremental option is not available for them in Airbyte.

Sorry to hear that, Arash. It looks like there has been a problem since the airbyte-scheduler service was removed.
Check issue: https://github.com/airbytehq/airbyte/issues/15567

A workaround for now is to clean the data folder so it doesn't consume so much space. I'll ask the engineering team to take a look and see what can be done.

@marcosmarxm Do you have any timeframe for fixing this issue? I think it’s very important.
Thanks

@arashlayeghi I've had the same issue. I manually delete logs that are older than 7 days every week or so. Pain in the ass, though I suppose I could set up a cron job. It would be great if this feature worked correctly in the Airbyte deployment.
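
Until the built-in retention works, the manual cleanup described above can be scripted. This is a minimal sketch, not an Airbyte feature: the script, its `--root`, `--days` and `--delete` options, and the use of each job folder's modification time as its age are all assumptions. It only prints what it would remove unless `--delete` is passed. Keep the window well above your longest possible sync (the .env shared further down sets SYNC_JOB_MAX_TIMEOUT_DAYS=3) so that a running job's folder is never removed.

```python
import argparse
import os
import shutil
import time

# Hypothetical default: the workspace volume path reported earlier in this thread.
WORKSPACE_ROOT = "/var/lib/docker/volumes/airbyte_workspace/_data"


def stale_job_dirs(root: str, max_age_days: float, now: float = None) -> list:
    """Top-level folders whose own mtime is older than `max_age_days`, sorted by path."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False) and entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)


def purge(root: str, max_age_days: float, dry_run: bool = True) -> list:
    """Remove stale job folders (or only list them when dry_run is True)."""
    removed = []
    for path in stale_job_dirs(root, max_age_days):
        if not dry_run:
            shutil.rmtree(path)
        removed.append(path)
    return removed


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Delete old Airbyte workspace job folders (dry run by default).")
    parser.add_argument("--root", default=WORKSPACE_ROOT)
    parser.add_argument("--days", type=float, default=7.0)
    parser.add_argument("--delete", action="store_true", help="actually remove the folders")
    args = parser.parse_args(argv)
    if not os.path.isdir(args.root):
        print(f"workspace not found: {args.root}")
        return
    for path in purge(args.root, args.days, dry_run=not args.delete):
        print(("deleted " if args.delete else "would delete ") + path)


if __name__ == "__main__":
    main()
```

Once the dry-run output looks right, a root crontab entry such as `0 3 * * * python3 /path/to/purge_workspace.py --days 7 --delete` (script path hypothetical) would run it nightly.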

Hello, update here: the solution was merged.
But only the next version, v0.40.5, will include the changes; there will probably be another update by the end of this week.

Please check PR https://github.com/airbytehq/airbyte/pull/16247

Thanks, Marcos,
I updated it to v0.40.5 now. Do I need to do something or does it automatically clean up the storage when needed?

Airbyte version 0.45.5 reimplemented the feature; you can change the default retention by setting the variable TEMPORAL_HISTORY_RETENTION_IN_DAYS.

Hello Marcos,

This is our .env file, which contains TEMPORAL_HISTORY_RETENTION_IN_DAYS=1 at the end.
I ran docker-compose up -d more than 34 hours ago, but the storage keeps filling up without being cleaned. What is wrong with my deployment?

.env (3.6 KB)

Hi @marcosmarxm,
Following your instructions above, I set the variable TEMPORAL_HISTORY_RETENTION_IN_DAYS=7 in .env.
I ran docker compose up -d more than 7 days ago, but the storage is still full and has not been cleaned up.

Environment

  • Airbyte version: 0.40.18
  • OS Version / Instance: AWS EC2
  • Deployment: Docker compose
  • Step where error happened: Deploy with docker compose up
cat .env
# This file only contains Docker relevant variables.
#
# Variables with defaults have been omitted to avoid duplication of defaults.
# The only exception to the non-default rule are env vars related to scaling.
#
# See https://github.com/airbytehq/airbyte/blob/master/airbyte-config/config-models/src/main/java/io/airbyte/config/Configs.java
# for the latest environment variables.
#
# # Contributors - please organise this env file according to the above linked file.


### SHARED ###
VERSION=0.40.18

# When using the airbyte-db via default docker image
CONFIG_ROOT=/data
DATA_DOCKER_MOUNT=airbyte_data
DB_DOCKER_MOUNT=airbyte_db

# Workspace storage for running jobs (logs, etc)
WORKSPACE_ROOT=/tmp/workspace
WORKSPACE_DOCKER_MOUNT=airbyte_workspace

# Local mount to access local files from filesystem
# todo (cgardens) - when we are mount raw directories instead of named volumes, *_DOCKER_MOUNT must
# be the same as *_ROOT.
# Issue: https://github.com/airbytehq/airbyte/issues/578
LOCAL_ROOT=/tmp/airbyte_local
LOCAL_DOCKER_MOUNT=/tmp/airbyte_local
# todo (cgardens) - hack to handle behavior change in docker compose. *_PARENT directories MUST
# already exist on the host filesystem and MUST be parents of *_ROOT.
# Issue: https://github.com/airbytehq/airbyte/issues/577
HACK_LOCAL_ROOT_PARENT=/tmp

# Proxy Configuration
# Set to empty values, e.g. "" to disable basic auth
BASIC_AUTH_USERNAME=airbyte
BASIC_AUTH_PASSWORD=password

### DATABASE ###
# Airbyte Internal Job Database, see https://docs.airbyte.io/operator-guides/configuring-airbyte-db
DATABASE_USER=docker
DATABASE_PASSWORD=docker
DATABASE_HOST=db
DATABASE_PORT=5432
DATABASE_DB=airbyte
# translate manually DATABASE_URL=jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB} (do not include the username or password here)
DATABASE_URL=jdbc:postgresql://db:5432/airbyte
JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION=0.29.15.001

# Airbyte Internal Config Database, defaults to Job Database if empty. Explicitly left empty to mute docker compose warnings.
CONFIG_DATABASE_USER=
CONFIG_DATABASE_PASSWORD=
CONFIG_DATABASE_URL=
CONFIGS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION=0.35.15.001

### AIRBYTE SERVICES ###
TEMPORAL_HOST=airbyte-temporal:7233
INTERNAL_API_HOST=airbyte-server:8001
#CONNECTOR_BUILDER_API_HOST=airbyte-connector-builder-server:80 #FIXME: Uncomment this when enabling the connector-builder
WEBAPP_URL=http://localhost:8000/
# Although not present as an env var, required for webapp configuration.
API_URL=/api/v1/

### JOBS ###
# Relevant to scaling.
SYNC_JOB_MAX_ATTEMPTS=3
SYNC_JOB_MAX_TIMEOUT_DAYS=3
JOB_MAIN_CONTAINER_CPU_REQUEST=
JOB_MAIN_CONTAINER_CPU_LIMIT=
JOB_MAIN_CONTAINER_MEMORY_REQUEST=
JOB_MAIN_CONTAINER_MEMORY_LIMIT=

NORMALIZATION_JOB_MAIN_CONTAINER_MEMORY_LIMIT=
NORMALIZATION_JOB_MAIN_CONTAINER_MEMORY_REQUEST=
NORMALIZATION_JOB_MAIN_CONTAINER_CPU_LIMIT=
NORMALIZATION_JOB_MAIN_CONTAINER_CPU_REQUEST=

### LOGGING/MONITORING/TRACKING ###
TRACKING_STRATEGY=segment
JOB_ERROR_REPORTING_STRATEGY=logging
# Although not present as an env var, expected by Log4J configuration.
LOG_LEVEL=INFO


### APPLICATIONS ###
# Worker #
WORKERS_MICRONAUT_ENVIRONMENTS=control-plane
# Cron #
CRON_MICRONAUT_ENVIRONMENTS=control-plane
# Relevant to scaling.
MAX_SYNC_WORKERS=5
MAX_SPEC_WORKERS=5
MAX_CHECK_WORKERS=5
MAX_DISCOVER_WORKERS=5
# Temporal Activity configuration
ACTIVITY_MAX_ATTEMPT=
ACTIVITY_INITIAL_DELAY_BETWEEN_ATTEMPTS_SECONDS=
ACTIVITY_MAX_DELAY_BETWEEN_ATTEMPTS_SECONDS=
WORKFLOW_FAILURE_RESTART_DELAY_SECONDS=
TEMPORAL_HISTORY_RETENTION_IN_DAYS=7

### FEATURE FLAGS ###
AUTO_DISABLE_FAILING_CONNECTIONS=false
FORCE_MIGRATE_SECRET_STORE=false

### MONITORING FLAGS ###
# Accepted values are datadog and otel (open telemetry)
METRIC_CLIENT=
# Useful only when metric client is set to be otel. Must start with http:// or https://.
OTEL_COLLECTOR_ENDPOINT="http://host.docker.internal:4317"

USE_STREAM_CAPABLE_STATE=true
ls -lah airbyte_workspace
..............
drwxr-xr-x    3 root root   4096 Nov  4 03:39 8290/
drwxr-xr-x    3 root root   4096 Nov  4 04:02 8291/
drwxr-xr-x    3 root root   4096 Nov  4 04:02 8292/
drwxr-xr-x    3 root root   4096 Nov  4 04:02 8293/
drwxr-xr-x    3 root root   4096 Nov  4 04:02 8294/
drwxr-xr-x    3 root root   4096 Nov  4 04:03 8295/
drwxr-xr-x    3 root root   4096 Nov  4 04:09 8296/
drwxr-xr-x    3 root root   4096 Nov  4 04:30 8297/
drwxr-xr-x    3 root root   4096 Nov  4 04:32 8298/
drwxr-xr-x    3 root root   4096 Nov  4 04:32 8299/
..............

Sorry for the delay here, Arash and Quân; I'll need to take a deeper look during the week.

Are you having issues with storage in the latest version, Lucas?

Hey @marcosmarxm, I bumped the disk to 60 GB and am going to set TEMPORAL_HISTORY_RETENTION_IN_DAYS=7 to see if that solves it. I was migrating a table with 700M rows, and that was taking up all 30 GB of the previous disk.

I resolved the issue by doubling the disk size and turning off all other connections while I ran the backfill on the 700M-row table. It would be great to know a better solution, but disk is cheap.

Lucas, it would be better to open a new GitHub issue to track what is happening in your case.

@marcosmarxm Any word on a fix for this yet?

I’ve attempted to set TEMPORAL_HISTORY_RETENTION_IN_DAYS=7 but this has had no effect.

Typo in the version number here: 0.45.5 does not exist yet.

Hello Billy, PR https://github.com/airbytehq/airbyte/pull/20317 re-enabled log rotation in Airbyte version 0.40.26. Can you check that you're using this version or later?
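
Since the retention setting reportedly only takes effect from 0.40.26 onward, a quick sanity check of the .env file can catch the mismatch seen in this thread (for example, VERSION=0.40.18 with the variable set). The sketch below is a hypothetical helper, not part of Airbyte: its parser is deliberately simple (KEY=VALUE lines, `#` comment lines, optional quotes, no inline comments), and it only inspects the file, so it cannot confirm that the running containers picked up the values.

```python
import sys

MIN_VERSION = (0, 40, 26)  # release that re-enabled log rotation, per the reply above


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def parse_version(version: str) -> tuple:
    """Turn '0.40.28' (or 'v0.40.28') into (0, 40, 28); raises ValueError otherwise."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def check_env(text: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    env = parse_env(text)
    problems = []
    version = env.get("VERSION", "")
    try:
        if parse_version(version) < MIN_VERSION:
            problems.append(f"VERSION={version} is older than 0.40.26; retention may be ignored")
    except ValueError:
        problems.append(f"could not parse VERSION={version!r}")
    retention = env.get("TEMPORAL_HISTORY_RETENTION_IN_DAYS", "")
    if not retention.isdigit() or int(retention) < 1:
        problems.append("TEMPORAL_HISTORY_RETENTION_IN_DAYS is missing or not a positive integer")
    return problems


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else ".env"
    try:
        with open(path, encoding="utf-8") as fh:
            issues = check_env(fh.read())
    except FileNotFoundError:
        print(f"{path} not found")
    else:
        print("\n".join(issues) if issues else "OK")
```

Run from the directory containing docker-compose.yaml, the .env posted earlier in this thread (VERSION=0.40.18) would be flagged, matching the later report that upgrading to 0.40.28 fixed the problem.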

Hey @marcosmarxm, I can confirm I'm using version 0.40.22.
I will upgrade to the latest (v0.40.28) and report back.

Upgrading from 0.40.22 to 0.40.28 resolved the issue where setting TEMPORAL_HISTORY_RETENTION_IN_DAYS had no effect.

Thanks @marcosmarxm