Hello. While running a connection, an error sometimes occurs while transferring data to the BigQuery destination. Because of this error the destination process stops, but normalization still continues; the sync is then marked as failed, retries, and succeeds on the 2nd attempt. However, tables that were processed in the 1st attempt end up with 2 entries in BigQuery, while the tables that could not be processed because of the error have only one entry. The problem is that data for already-processed tables is not deleted on the BigQuery side after the sync fails. Is there a mechanism in Airbyte to delete the data from the destination for already-synced tables after a connection sync fails?
Are you using the incremental deduped + history sync mode?
No. I am using full refresh append mode.
Today there is no mechanism to exclude those rows. Full refresh append will always append data to your destination, which generates duplicates. Why are the unfinished ones problematic? Are you running dbt after the sync?
Let me explain the problem with an example. Let's say there are 2 tables to sync in a connection, and an error occurs after table 1 has been synced, so Airbyte retries the sync. The data for table 1 is already in the destination, because it was synced before the error occurred. But when Airbyte retries, it syncs both table 1 and table 2 again. So table 1 is synced 2 times and table 2 only 1 time. This is what happens in my case whenever an error occurs. Does Airbyte do anything to prevent it?
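The scenario described above can be reproduced with a small simulation. This is only a sketch of append semantics under a retry, not Airbyte's actual code: `run_attempt`, `SimulatedSyncError`, the table names, and the rows are all hypothetical, and the destination is modelled as an in-memory dictionary of lists.

```python
from collections import defaultdict


class SimulatedSyncError(Exception):
    """Stands in for whatever error interrupts the first attempt (hypothetical)."""


def run_attempt(source, destination, fail_after=None):
    """Append every source table to the destination, optionally failing midway.

    fail_after: name of the table after which the simulated error is raised.
    """
    for table, rows in source.items():
        destination[table].extend(rows)  # append mode: existing rows are never cleared
        if table == fail_after:
            raise SimulatedSyncError(f"error after syncing {table}")


source = {"table_1": ["a", "b"], "table_2": ["x", "y", "z"]}
destination = defaultdict(list)

try:
    run_attempt(source, destination, fail_after="table_1")  # attempt 1 fails after table_1
except SimulatedSyncError:
    run_attempt(source, destination)  # attempt 2 (the retry) syncs both tables

# Number of full copies of each source table now sitting in the destination.
copies = {t: len(destination[t]) // len(source[t]) for t in source}
print(copies)  # {'table_1': 2, 'table_2': 1}
```

In this model nothing removes the rows written by the failed attempt, so the retry appends table_1 a second time while table_2 is written only once.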
If duplication of data is a concern, why not use full refresh instead?
I am using full refresh append. Do you mean full refresh overwrite?
Yes, sorry, I wasn’t clear.
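To illustrate why overwrite avoids the duplication, here is a minimal sketch that assumes each attempt replaces a table's contents instead of adding to them. It is a simplified model rather than Airbyte's implementation; `sync_overwrite`, the table names, and the rows are hypothetical.

```python
def sync_overwrite(source, destination, fail_after=None):
    """Replace each destination table with the source rows, optionally failing midway.

    fail_after: name of the table after which a simulated error is raised.
    """
    for table, rows in source.items():
        destination[table] = list(rows)  # overwrite mode: previous rows are discarded
        if table == fail_after:
            raise RuntimeError(f"simulated error after syncing {table}")


source = {"table_1": ["a", "b"], "table_2": ["x", "y", "z"]}
destination = {}

try:
    sync_overwrite(source, destination, fail_after="table_1")  # attempt 1 fails after table_1
except RuntimeError:
    sync_overwrite(source, destination)  # attempt 2 (the retry) syncs both tables

# Row counts per table after the retry.
row_counts = {t: len(rows) for t, rows in destination.items()}
print(row_counts)  # {'table_1': 2, 'table_2': 3}
```

Because every attempt replaces whatever was there, the rows left behind by the failed first attempt are discarded on retry and each table ends up with exactly one copy of the source data. The trade-off is that history from earlier syncs is not retained in the destination.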