Website_overview stream creates multiple rows for each date in Incremental/Deduped sync mode

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Airbyte deployed via plural.sh
  • Memory / Disk: n/a
  • Deployment: n/a
  • Airbyte Version: 0.44.2
  • Source name/version: Google Analytics 4 - v0.1.3
  • Destination name/version: BigQuery - v1.2.19
  • Step: During sync
  • Description: In BigQuery, the website_overview stream has multiple rows for each date. This produces incorrect figures in reports, e.g. in Looker Studio. When I compare the data in BigQuery with the Google Analytics 4 dashboard directly, I can see that only the latest row for each date has the correct data.

My questions are:

  1. Why are there multiple rows for each date? I am using the Incremental/Deduped sync mode with the date field as the Cursor field, so I would expect the existing row to be updated on each sync rather than a new row being created.
  2. Any advice on how to fix this issue? One idea is to create a new table that stores only the latest entry for each date (determined by the _airbyte_emitted_at field). However, this seems like working around a problem that shouldn’t be there in the first place.
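
To make the workaround in question 2 concrete, here is a minimal Python sketch of the logic I have in mind: for each date, keep only the row with the greatest _airbyte_emitted_at. The function name, the total_users column, and the sample rows are made up for illustration and are not real data. In practice this would probably be a query or view in BigQuery rather than Python.

```python
from datetime import datetime


def latest_row_per_date(rows, key_field="date", emitted_field="_airbyte_emitted_at"):
    """Keep one row per key_field value: the one with the greatest emitted_field."""
    latest = {}
    for row in rows:
        key = row[key_field]
        # Replace the stored row only if this one was emitted later.
        if key not in latest or row[emitted_field] > latest[key][emitted_field]:
            latest[key] = row
    # Return the surviving rows ordered by date for readability.
    return sorted(latest.values(), key=lambda r: r[key_field])


# Hypothetical rows: two syncs wrote a row for 20230501, one sync for 20230502.
rows = [
    {"date": "20230501", "total_users": 10, "_airbyte_emitted_at": datetime(2023, 5, 1, 12, 0)},
    {"date": "20230501", "total_users": 12, "_airbyte_emitted_at": datetime(2023, 5, 2, 12, 0)},
    {"date": "20230502", "total_users": 7, "_airbyte_emitted_at": datetime(2023, 5, 2, 12, 0)},
]

deduped = latest_row_per_date(rows)
for row in deduped:
    print(row["date"], row["total_users"])
```

This only hides the duplicates after the fact. It does not explain why the sync writes more than one row per date.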