We’ve been trying to set up our universal analytics to big query connector for months now and i keep running into issues. A quick description of our situation:
We have a custom report that uses a custom dimension which is unique for every session. This means for some days we have upwards of 60K rows per day that need to be synched.
Now here is the issue:
Originally i thought to get all the data (since January 2020) i set “replication start date” to 2020-01-01 and the sync mode to dedup+history.
This works fine for the initial refresh. For every subsequent refresh however (which i set to daily and experimented with 3 times a day as well) there are a lot of rows missing in the days since the initial sync. I have read somewhere (can’t find the link) that this is somehow related to the “cursor field” ga_date and ambiguity therein. If i don’t use ga_date however (which could be an option), dedup + history is not possible anymore as i can’t set a different cursor.
I then tried the following:
go back to full refresh overwrite. This is not an option because the query is way too big to repeat daily. It takes over a day each time and is not efficient enough.
Do a simple append. It leads to even more problems than the dedup + history version. Some rows are missing while others are doubled now (which i could of course clean up but the missing rows are still an issue)
manually or automatically reset the “replication start date” to 30 days ago and use full overwrite. This deletes even the older entries from the destination database and is therefore also not useful unless there is a way to somehow ONLY overwrite data newer than 30daysago. and keep the older data in the destination but i can’t find that anywhere
The only solution i see is to somehow manually change the lookback window. The google reporting api has a method for that but unfortunately it doesn’t seem to be a valid input in airbyte.
I’m all out of ideas and would love some help here