Handling Full Refresh Overwrite Sync Interruptions in Airbyte

Summary

A user syncs from an S3 source to a Postgres destination using Full Refresh Overwrite. The single source file, {record_type}.parquet, is overwritten by a scripted process every hour. When the overwrite lands while the S3 source is still streaming the file, the full refresh is interrupted and the export is incomplete. The user is looking for recommendations to avoid this.


Question

Hi everyone - I am using an S3 source and a Postgres destination with a Full Refresh Overwrite sync. The sync targets a single file, {record_type}.parquet, which a scripted process overwrites every hour. On some intervals, the overwrite happens while the S3 source is still streaming the file, which interrupts the full refresh and results in an incomplete export.

Any recommendations on how to avoid this? I’ve chosen not to use incremental sync for this pipeline because of the large data volume and because I don't need history.
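
For anyone hitting the same race, here is a minimal sketch of one possible guard. It is not an Airbyte feature and not something confirmed in this thread. The idea is to fingerprint the object before and after a sync and to rerun the sync if the file was overwritten in between. It assumes a boto3-style S3 client (for example `boto3.client("s3")`), whose `head_object` call returns `ETag` and `LastModified`. The `run_sync` callable is hypothetical: it stands in for whatever triggers the Airbyte sync and blocks until it finishes. The names `object_fingerprint` and `sync_until_stable` are made up for this example.

```python
from typing import Callable, Tuple


def object_fingerprint(s3_client, bucket: str, key: str) -> Tuple[str, object]:
    """Return (ETag, LastModified); LastModified changes on every overwrite."""
    head = s3_client.head_object(Bucket=bucket, Key=key)
    return head["ETag"], head["LastModified"]


def sync_until_stable(
    s3_client,
    bucket: str,
    key: str,
    run_sync: Callable[[], None],
    max_attempts: int = 3,
) -> bool:
    """Run the sync, retrying while the source object changes underneath it.

    Returns True once a sync ran with the same object fingerprint before and
    after, and False if every attempt overlapped with an overwrite.
    """
    for _ in range(max_attempts):
        before = object_fingerprint(s3_client, bucket, key)
        run_sync()  # hypothetical: trigger the Airbyte sync and wait for it
        after = object_fingerprint(s3_client, bucket, key)
        if before == after:
            return True
    return False
```

This only detects an overlap after the fact and retries. It does not stop the hourly script from overwriting the file mid-read, so scheduling the sync away from the overwrite may still be needed.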



This topic has been created from a Slack thread to give it more visibility.

["s3-source", "postgres-destination", "full-refresh-overwrite", "interruptions", "incomplete-export"]