Unexpected Record Updates and Data Integrity Issues in Airbyte Incremental Sync to Snowflake

Summary

The user is facing unexpected record updates and data integrity issues while using Airbyte OSS on AWS EC2 to sync data from the Google Ads and Meta Ads APIs to Snowflake in incremental + append (deduped) mode. Records are updated even when their values have not changed, and some data becomes incorrect over time. The primary key is the combination of campaign ID, date, and device. The user asks why this is happening and how to troubleshoot or resolve it.
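As a rough mental model of the mode described above, append + dedup can be thought of as "keep the most recently extracted version of each primary key." The sketch below is a simplified Python model under that assumption, not Airbyte's actual implementation. The `AdRecord` fields are hypothetical, and `extracted_at` only stands in for the per-sync timestamp Airbyte writes.

```python
from dataclasses import dataclass
from datetime import datetime


# Hypothetical record shape; field names are illustrative, not the connector schema.
@dataclass(frozen=True)
class AdRecord:
    campaign_id: str
    date: str               # the date cursor, e.g. "2024-07-28"
    device: str
    spend: float
    extracted_at: datetime  # stand-in for the per-sync Airbyte timestamp


def primary_key(r: AdRecord) -> tuple[str, str, str]:
    # Composite primary key from the question: campaign ID, date, device.
    return (r.campaign_id, r.date, r.device)


def dedupe(raw: list[AdRecord]) -> dict[tuple[str, str, str], AdRecord]:
    """Simplified append + dedup: keep the latest-extracted row per key."""
    final: dict[tuple[str, str, str], AdRecord] = {}
    for r in raw:
        k = primary_key(r)
        # A later extraction replaces the stored row even if the metrics are identical.
        if k not in final or r.extracted_at >= final[k].extracted_at:
            final[k] = r
    return final


# The same campaign/date/device row, emitted by three different syncs.
raw = [
    AdRecord("c1", "2024-07-28", "mobile", 10.0, datetime(2024, 7, 29)),
    AdRecord("c1", "2024-07-28", "mobile", 10.0, datetime(2024, 7, 30)),
    AdRecord("c1", "2024-07-28", "mobile", 12.5, datetime(2024, 7, 31)),
]
final = dedupe(raw)
```

Under this model, every re-emitted row replaces the stored one and advances its timestamp, whether or not the metrics changed, and a revised value (spend 10.0 → 12.5 here) overwrites the earlier one.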


Question

I’m using Airbyte OSS on AWS EC2 to sync data from the Google Ads and Meta Ads APIs to Snowflake. Initially, I ran a full load of all existing data, with the connection already set to incremental + append (deduped) mode. The data includes KPIs such as spend, clicks, and impressions, recorded daily and segmented by dimensions such as date and device.
I’ve set the sync strategy to “incremental + append (deduped)” with a composite primary key of campaign ID, date, and device, since together these uniquely identify each row. However, I’m encountering some unexpected issues:

  1. Unexpected Record Updates: The _AIRBYTE_UPDATED_AT column in Snowflake changes for records even when their values haven’t changed. For instance, records dated “2024-07-28” were first synced on “2024-07-29” but were updated again on “2024-07-30” and “2024-07-31”. I expected that records whose date cursor is older than the last sync date would not be picked up by subsequent syncs.
  2. Data Integrity Issues: Beyond the unexpected updates, some values also become incorrect over time. Immediately after the full load, the data matched the Ads dashboards exactly; the discrepancies seem to appear during incremental syncs.
Has anyone experienced similar issues with Airbyte, or does anyone have insights into why this might be happening? Could it be related to how Airbyte handles deduplication or the incremental sync logic? Any advice on how to troubleshoot or resolve these issues would be greatly appreciated.
Thank you!
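The expectation in the first issue (dates older than the saved cursor are never requested again) can be contrasted with a cursor that re-reads a trailing window. Ad platforms can revise recent days' metrics after the fact, and a connector that re-reads a window of already-synced dates would re-emit those rows on later syncs. Whether the Google Ads and Meta Ads connectors do this, and with what window, is an assumption to verify against their documentation; `dates_to_request` and `lookback_days` below are hypothetical names for illustration only.

```python
from datetime import date, timedelta


def dates_to_request(last_cursor: date, today: date, lookback_days: int) -> list[date]:
    """Hypothetical: which report dates an incremental sync would request.

    lookback_days=0 matches the expectation in the question: only dates
    after the saved cursor are read. A positive lookback (an assumption,
    not confirmed connector behaviour) re-reads recent, already-synced dates.
    """
    start = last_cursor + timedelta(days=1) - timedelta(days=lookback_days)
    # Inclusive range from start to today; empty if start is after today.
    return [start + timedelta(days=i) for i in range((today - start).days + 1)]


# Without a lookback, the 2024-07-30 sync only asks for 2024-07-30.
no_lookback = dates_to_request(date(2024, 7, 29), date(2024, 7, 30), 0)

# With a 3-day lookback, 2024-07-28 is requested again on 07-30 and on 07-31.
sync_0730 = dates_to_request(date(2024, 7, 29), date(2024, 7, 30), 3)
sync_0731 = dates_to_request(date(2024, 7, 30), date(2024, 7, 31), 3)
```

Combined with key-based dedup, a re-read window like this would explain both symptoms: rows for older dates get a new timestamp on each sync, and their metrics can change if the platform revised them in the meantime.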


This topic has been created from a Slack thread to give it more visibility. It is read-only here.


["airbyte-oss", "aws-ec2", "google-ads-api", "meta-ads-api", "snowflake", "incremental-sync", "append-deduped", "data-integrity", "record-updates", "primary-key", "troubleshoot"]