I have a BigQuery dataset with a series of 3 GB tables that I am trying to sync to Google Cloud Postgres. The sync succeeded when I ran it for just one table, so I decided to try the incremental sync mode with deduped + history. To test the incremental sync, I then merged the second BigQuery table into the first one, which had already been synced. The problem is that near the end of that sync, this error happens:
Database Error in model contacts_common_part_10000000_stg (models/generated/airbyte_incremental/for_export/contacts_common_part_10000000_stg.sql)
temporary file size exceeds temp_file_limit (3807390kB)
compiled SQL at …/build/run/airbyte_utils/models/generated/airbyte_incremental/for_export/contacts_common_part_10000000_stg.sql,retryable=,timestamp=1667879986684]]]
2022-11-08 03:59:47 normalization-orchestrator >
2022-11-08 03:59:47 normalization-orchestrator > ----- END DEFAULT NORMALIZATION -----
2022-11-08 03:59:47 normalization-orchestrator >
2022-11-08 03:59:47 normalization-orchestrator > Writing async status SUCCEEDED for KubePodInfo[namespace=jobs, name=orchestrator-norm-job-804129-attempt-1, mainContainerInfo=KubeContainerInfo[image=airbyte/container-orchestrator:dev-4c6f520, pullPolicy=IfNotPresent]]…
2022-11-08 03:59:48 INFO i.a.c.t.TemporalUtils(withBackgroundHeartbeat):316 - Stopping temporal heartbeating…
2022-11-08 03:59:48 INFO i.a.w.t.TemporalAttemptExecution(get):162 - Stopping cancellation check scheduling…
2022-11-08 03:59:48 INFO i.a.c.t.TemporalUtils(withBackgroundHeartbeat):283 - Stopping temporal heartbeating…
2022-11-08 04:00:02 INFO i.a.w.t.s.ReplicationActivityImpl(getContainerLauncherWorkerFactory):301 - received response from from jobsApi.getJobInfoLight: class JobInfoLightRead {
job: class JobRead {
id: 806801
configType: sync
configId: 3b154632-e660-4d4f-9bec-afd006ff9453
createdAt: 1667880000
updatedAt: 1667880000
status: running
resetConfig: null
}
}
I am using the online version of Airbyte (cloud.airbyte.io), and my Postgres instance has 2 vCPUs and 13 GB of memory.
I am trying out the incremental sync because the BigQuery tables need to be merged in Postgres. The merged table should be around 75 GB, with more data to be added in the future.
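
For scale, the limit reported in the error can be converted to GiB. This is only a rough sketch: it assumes PostgreSQL's kB unit means 1024 bytes, and the helper name kb_to_gib is made up for illustration.

```python
# Convert the temp_file_limit value from the error message into GiB.
# Assumption: PostgreSQL's "kB" unit is 1024 bytes.

def kb_to_gib(kb: int) -> float:
    """Convert a size in kB (1024-byte units) to GiB."""
    return kb * 1024 / 1024**3


# Value copied from the error: temp_file_limit (3807390kB)
TEMP_FILE_LIMIT_KB = 3807390

limit_gib = kb_to_gib(TEMP_FILE_LIMIT_KB)
print(f"temp_file_limit is about {limit_gib:.2f} GiB")
```

So the limit is only slightly larger than one of the 3 GB source tables, which may be why the sync fails once a second table is merged in.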