No space left on device during sync

Hello,

We are evaluating the Airbyte Open Source version installed on an EC2 (m6a.large instance type) and we have to ingest some internal geo data from a postgresql database (the size given by select pg_size_pretty(pg_database_size('<geo_database>')) is around 1.5 GB).
Unfortunately, during the sync process, we have encountered the following error:

2022-06-23 13:50:38 normalization >     cast(nullif(created, '') as 
2022-06-23 13:50:38 normalization >     timestamp with time zone
2022-06-23 13:50:38 normalization > ) as created,
2022-06-23 13:50:38 normalization >     cast(geometry as text) as geometry,
2022-06-23 13:50:38 normalization >     cast(nullif(modified, '') as 
2022-06-23 13:50:38 normalization >     timestamp with time zone
2022-06-23 13:50:38 normalization > ) as modified,
2022-06-23 13:50:38 normalization >     cast(provider as text) as provider,
2022-06-23 13:50:38 normalization >     cast(admin_level as 
2022-06-23 13:50:38 normalization >     float
2022-06-23 13:50:38 normalization > ) as admin_level,
2022-06-23 13:50:38 normalization >     cast(country_code as text) as country_code,
2022-06-23 13:50:38 normalization >     _airbyte_ab_id,
2022-06-23 13:50:38 normalization >     _airbyte_emitted_at,
2022-06-23 13:50:38 normalization >     now() as _airbyte_normalized_at
2022-06-23 13:50:38 normalization > from __dbt__cte__geolocation_geolocation_ab1
2022-06-23 13:50:38 normalization > -- geolocation_geolocation
2022-06-23 13:50:38 normalization > where 1 = 1
2022-06-23 13:50:38 normalization > ),  __dbt__cte__geolocation_geolocation_ab3 as (
2022-06-23 13:50:38 normalization > 
2022-06-23 13:50:38 normalization > -- SQL model to build a hash column based on the values of this record
2022-06-23 13:50:38 normalization > -- depends_on: __dbt__cte__geolocation_geolocation_ab2
2022-06-23 13:50:38 normalization > select
2022-06-23 13:50:38 normalization >     md5(cast(coalesce(cast("id" as text), '') || '-' || coalesce(cast("name" as text), '') || '-' || coalesce(cast("type" as text), '') || '-' || coalesce(cast("value" as text), '') || '-' || coalesce(cast(created as text), '') || '-' || coalesce(cast(geometry as text), '') || '-' || coalesce(cast(modified as text), '') || '-' || coalesce(cast(provider as text), '') || '-' || coalesce(cast(admin_level as text), '') || '-' || coalesce(cast(country_code as text), '') as text)) as _airbyte_geolocation_geolocation_hashid,
2022-06-23 13:50:38 normalization >     tmp.*
2022-06-23 13:50:38 normalization > from __dbt__cte__geolocation_geolocation_ab2 tmp
2022-06-23 13:50:38 normalization > -- geolocation_geolocation
2022-06-23 13:50:38 normalization > where 1 = 1
2022-06-23 13:50:38 normalization > )-- Final base SQL model
2022-06-23 13:50:38 normalization > -- depends_on: __dbt__cte__geolocation_geolocation_ab3
2022-06-23 13:50:38 normalization > select
2022-06-23 13:50:38 normalization >     "id",
2022-06-23 13:50:38 normalization >     "name",
2022-06-23 13:50:38 normalization >     "type",
2022-06-23 13:50:38 normalization >     "value",
2022-06-23 13:50:38 normalization >     created,
2022-06-23 13:50:38 normalization >     geometry,
2022-06-23 13:50:38 normalization >     modified,
2022-06-23 13:50:38 normalization >     provider,
2022-06-23 13:50:38 normalization >     admin_level,
2022-06-23 13:50:38 normalization >     country_code,
2022-06-23 13:50:38 normalization >     _airbyte_ab_id,
2022-06-23 13:50:38 normalization >     _airbyte_emitted_at,
2022-06-23 13:50:38 normalization >     now() as _airbyte_normalized_at,
2022-06-23 13:50:38 normalization >     _airbyte_geolocation_geolocation_hashid
2022-06-23 13:50:38 normalization > from __dbt__cte__geolocation_geolocation_ab3
2022-06-23 13:50:38 normalization > -- geolocation_geolocation from "data_stack".gis_raw._airbyte_raw_geolocation_geolocation
2022-06-23 13:50:38 normalization > where 1 = 1
2022-06-23 13:50:38 normalization >   );
2022-06-23 13:50:38 normalization > 13:36:31.623428 [debug] [Thread-5  ]: SQL status: SELECT 832988 in 18.89 seconds
2022-06-23 13:50:38 normalization > 13:36:31.632373 [debug] [Thread-5  ]: Using postgres connection "model.airbyte_utils.geolocation_tag"
2022-06-23 13:50:38 normalization > 13:36:31.632623 [debug] [Thread-5  ]: On model.airbyte_utils.geolocation_tag: /* {"app": "dbt", "dbt_version": "1.0.0", "profile_name": "normalize", "target_name": "prod", "node_id": "model.airbyte_utils.geolocation_tag"} */
2022-06-23 13:50:38 normalization > alter table "data_stack"."gis_raw"."geolocation_tag" rename to "geolocation_tag__dbt_backup"
2022-06-23 13:50:38 normalization > 13:36:31.634469 [debug] [Thread-5  ]: SQL status: ALTER TABLE in 0.0 seconds
2022-06-23 13:50:38 normalization > 13:36:31.638089 [debug] [Thread-5  ]: Using postgres connection "model.airbyte_utils.geolocation_tag"
2022-06-23 13:50:38 normalization > 13:36:31.638268 [debug] [Thread-5  ]: On model.airbyte_utils.geolocation_tag: /* {"app": "dbt", "dbt_version": "1.0.0", "profile_name": "normalize", "target_name": "prod", "node_id": "model.airbyte_utils.geolocation_tag"} */
2022-06-23 13:50:38 normalization > alter table "data_stack".gis_raw."geolocation_tag__dbt_tmp" rename to "geolocation_tag"
2022-06-23 13:50:38 normalization > 13:36:31.639017 [debug] [Thread-5  ]: SQL status: ALTER TABLE in 0.0 seconds
2022-06-23 13:50:38 normalization > 13:36:31.653794 [debug] [Thread-5  ]: Using postgres connection "model.airbyte_utils.geolocation_tag"
2022-06-23 13:50:38 normalization > 13:36:31.654015 [debug] [Thread-5  ]: On model.airbyte_utils.geolocation_tag: /* {"app": "dbt", "dbt_version": "1.0.0", "profile_name": "normalize", "target_name": "prod", "node_id": "model.airbyte_utils.geolocation_tag"} */
2022-06-23 13:50:38 normalization > 
2022-06-23 13:50:38 normalization >     create  index if not exists
2022-06-23 13:50:38 normalization >   "f8c65d98598b32764827bae5bb8adb06"
2022-06-23 13:50:38 normalization >   on "data_stack".gis_raw."geolocation_tag" using btree
2022-06-23 13:50:38 normalization >   (_airbyte_emitted_at);
2022-06-23 13:50:38 normalization >   
2022-06-23 13:50:38 normalization > 13:36:32.510393 [debug] [Thread-5  ]: SQL status: CREATE INDEX in 0.86 seconds
2022-06-23 13:50:38 normalization > 13:36:32.518994 [debug] [Thread-5  ]: On model.airbyte_utils.geolocation_tag: COMMIT
2022-06-23 13:50:38 normalization > 13:36:32.519219 [debug] [Thread-5  ]: Using postgres connection "model.airbyte_utils.geolocation_tag"
2022-06-23 13:50:38 normalization > 13:36:32.519333 [debug] [Thread-5  ]: On model.airbyte_utils.geolocation_tag: COMMIT
2022-06-23 13:50:38 normalization > 13:36:32.529888 [debug] [Thread-5  ]: SQL status: COMMIT in 0.01 seconds
2022-06-23 13:50:38 normalization > 13:36:32.536762 [debug] [Thread-5  ]: Using postgres connection "model.airbyte_utils.geolocation_tag"
2022-06-23 13:50:38 normalization > 13:36:32.536950 [debug] [Thread-5  ]: On model.airbyte_utils.geolocation_tag: /* {"app": "dbt", "dbt_version": "1.0.0", "profile_name": "normalize", "target_name": "prod", "node_id": "model.airbyte_utils.geolocation_tag"} */
2022-06-23 13:50:38 normalization > drop table if exists "data_stack".gis_raw."geolocation_tag__dbt_backup" cascade
2022-06-23 13:50:38 normalization > 13:36:32.562170 [debug] [Thread-5  ]: SQL status: DROP TABLE in 0.03 seconds
2022-06-23 13:50:38 normalization > 13:36:32.563968 [debug] [Thread-5  ]: finished collecting timing info
2022-06-23 13:50:38 normalization > 13:36:32.564163 [debug] [Thread-5  ]: On model.airbyte_utils.geolocation_tag: Close
2022-06-23 13:50:38 normalization > 13:36:32.565065 [info ] [Thread-5  ]: 1 of 2 OK created table model gis_raw.geolocation_tag................................................................... [SELECT 832988 in 19.92s]
2022-06-23 13:50:38 normalization > 13:36:32.565434 [debug] [Thread-5  ]: Finished running node model.airbyte_utils.geolocation_tag
2022-06-23 13:50:38 normalization > 13:50:35.688300 [debug] [Thread-4  ]: Postgres adapter: Postgres error: could not write to file "base/pgsql_tmp/pgsql_tmp1360.7": No space left on device
2022-06-23 13:50:38 normalization > 
2022-06-23 13:50:38 normalization > 13:50:35.688688 [debug] [Thread-4  ]: On model.airbyte_utils.geolocation_geolocation: ROLLBACK
2022-06-23 13:50:38 normalization > 13:50:35.701830 [debug] [Thread-4  ]: finished collecting timing info
2022-06-23 13:50:38 normalization > 13:50:35.702071 [debug] [Thread-4  ]: On model.airbyte_utils.geolocation_geolocation: Close
2022-06-23 13:50:38 normalization > 13:50:35.702936 [debug] [Thread-4  ]: Database Error in model geolocation_geolocation (models/generated/airbyte_tables/gis_raw/geolocation_geolocation.sql)
2022-06-23 13:50:38 normalization >   could not write to file "base/pgsql_tmp/pgsql_tmp1360.7": No space left on device
2022-06-23 13:50:38 normalization >   compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_tables/gis_raw/geolocation_geolocation.sql
2022-06-23 13:50:38 normalization > 13:50:35.703371 [error] [Thread-4  ]: 2 of 2 ERROR creating table model gis_raw.geolocation_geolocation....................................................... [ERROR in 863.05s]
2022-06-23 13:50:38 normalization > 13:50:35.703786 [debug] [Thread-4  ]: Finished running node model.airbyte_utils.geolocation_geolocation
2022-06-23 13:50:38 normalization > 13:50:35.707811 [debug] [MainThread]: Acquiring new postgres connection "master"
2022-06-23 13:50:38 normalization > 13:50:35.708018 [debug] [MainThread]: Using postgres connection "master"
2022-06-23 13:50:38 normalization > 13:50:35.708131 [debug] [MainThread]: On master: BEGIN
2022-06-23 13:50:38 normalization > 13:50:35.708236 [debug] [MainThread]: Opening a new connection, currently in state closed
2022-06-23 13:50:38 normalization > 13:50:35.735940 [debug] [MainThread]: SQL status: BEGIN in 0.03 seconds
2022-06-23 13:50:38 normalization > 13:50:35.736249 [debug] [MainThread]: On master: COMMIT
2022-06-23 13:50:38 normalization > 13:50:35.736402 [debug] [MainThread]: Using postgres connection "master"
2022-06-23 13:50:38 normalization > 13:50:35.736538 [debug] [MainThread]: On master: COMMIT
2022-06-23 13:50:38 normalization > 13:50:35.736973 [debug] [MainThread]: SQL status: COMMIT in 0.0 seconds
2022-06-23 13:50:38 normalization > 13:50:35.737203 [debug] [MainThread]: On master: Close
2022-06-23 13:50:38 normalization > 13:50:35.738066 [info ] [MainThread]: 
2022-06-23 13:50:38 normalization > 13:50:35.738399 [info ] [MainThread]: Finished running 2 table models in 863.48s.
2022-06-23 13:50:38 normalization > 13:50:35.738719 [debug] [MainThread]: Connection 'master' was properly closed.
2022-06-23 13:50:38 normalization > 13:50:35.738967 [debug] [MainThread]: Connection 'model.airbyte_utils.geolocation_geolocation_ab1' was properly closed.
2022-06-23 13:50:38 normalization > 13:50:35.739225 [debug] [MainThread]: Connection 'model.airbyte_utils.geolocation_geolocation_ab3' was properly closed.
2022-06-23 13:50:38 normalization > 13:50:35.739354 [debug] [MainThread]: Connection 'model.airbyte_utils.geolocation_geolocation' was properly closed.
2022-06-23 13:50:38 normalization > 13:50:35.739455 [debug] [MainThread]: Connection 'model.airbyte_utils.geolocation_geolocation_ab2' was properly closed.
2022-06-23 13:50:38 normalization > 13:50:35.739549 [debug] [MainThread]: Connection 'model.airbyte_utils.geolocation_tag_ab3' was properly closed.
2022-06-23 13:50:38 normalization > 13:50:35.739660 [debug] [MainThread]: Connection 'model.airbyte_utils.geolocation_tag' was properly closed.
2022-06-23 13:50:38 normalization > 13:50:35.752055 [info ] [MainThread]: 
2022-06-23 13:50:38 normalization > 13:50:35.752392 [info ] [MainThread]: Completed with 1 error and 0 warnings:
2022-06-23 13:50:38 normalization > 13:50:35.752879 [info ] [MainThread]: 
2022-06-23 13:50:38 normalization > 13:50:35.753298 [error] [MainThread]: Database Error in model geolocation_geolocation (models/generated/airbyte_tables/gis_raw/geolocation_geolocation.sql)
2022-06-23 13:50:38 normalization > 13:50:35.753720 [error] [MainThread]:   could not write to file "base/pgsql_tmp/pgsql_tmp1360.7": No space left on device
2022-06-23 13:50:38 normalization > 13:50:35.754073 [error] [MainThread]:   compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_tables/gis_raw/geolocation_geolocation.sql
2022-06-23 13:50:38 normalization > 13:50:35.754436 [info ] [MainThread]: 
2022-06-23 13:50:38 normalization > 13:50:35.754776 [info ] [MainThread]: Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2
2022-06-23 13:50:38 INFO i.a.w.t.TemporalAttemptExecution(lambda$getWorkerThread$2):161 - Completing future exceptionally...
io.airbyte.workers.exception.WorkerException: Normalization Failed.
	at io.airbyte.workers.general.DefaultNormalizationWorker.run(DefaultNormalizationWorker.java:63) ~[io.airbyte-airbyte-

We made some investigation to understand where the cause could come from (space on the EC2, disk space allocated for the containers) but we have found nothing.
Do you have an idea?

Thanks in advance.

Source: postgres 11
Destination: postgres 11

Hey @dboissier,
I’m not sure if the disk space error is from your Airbyte instance or your destination Postgres Database.
Could you please double-check the disk space on your Postgres instance too?

Hi,

The postgres destination instance is a Aurora Serverless with Postgres support. In my current understanding, the disk space is managed by the service.

We ran the following SQL instruction:

SELECT datname, temp_files AS "Temporary files",pg_size_pretty(temp_bytes) AS "Size of temporary files" FROM pg_stat_database
where datname = 'data_stack';

And we got

  • Temporary Files: 225
  • Size of temporary files: 110 GB

Do you think it could be the root cause?

Regards.

I think base/pgsql_tmp/ is the directory where the postgres query executor stores temporary files for some operations. This is why I’m prone to say this problem is on Aurora side.
Could you please check this AWS documentation, they have some interesting suggestions. I’d also suggest boosting your Aurora RDS cluster size if it’s something you can act on.

Thanks for your advice. We will have a look on it and if it is too complicated, we may switch to a “classical” RDS.