Universal analytics connector issue

Hi there,

We’ve been trying to set up our universal analytics to big query connector for months now and i keep running into issues. A quick description of our situation:

We have a custom report that uses a custom dimension which is unique for every session. This means for some days we have upwards of 60K rows per day that need to be synched.

Now here is the issue:
Originally i thought to get all the data (since January 2020) i set “replication start date” to 2020-01-01 and the sync mode to dedup+history.
This works fine for the initial refresh. For every subsequent refresh however (which i set to daily and experimented with 3 times a day as well) there are a lot of rows missing in the days since the initial sync. I have read somewhere (can’t find the link) that this is somehow related to the “cursor field” ga_date and ambiguity therein. If i don’t use ga_date however (which could be an option), dedup + history is not possible anymore as i can’t set a different cursor.

I then tried the following:

  1. go back to full refresh overwrite. This is not an option because the query is way too big to repeat daily. It takes over a day each time and is not efficient enough.

  2. Do a simple append. It leads to even more problems than the dedup + history version. Some rows are missing while others are doubled now (which i could of course clean up but the missing rows are still an issue)

  3. manually or automatically reset the “replication start date” to 30 days ago and use full overwrite. This deletes even the older entries from the destination database and is therefore also not useful unless there is a way to somehow ONLY overwrite data newer than 30daysago. and keep the older data in the destination but i can’t find that anywhere

The only solution i see is to somehow manually change the lookback window. The google reporting api has a method for that but unfortunately it doesn’t seem to be a valid input in airbyte.

I’m all out of ideas and would love some help here

Hi @liomm, you are correct that the problem lies in the cursor field: I’ve seen this issue come up with this connector before and it looks like the ga_date field is not granular enough to work correctly in your case. Could you give me an example of what that field looks like? If it does not include a timestamp, I suggest you open a feature request to add it.

Regarding the lookback window:
We’ve got an issue open for that:
https://github.com/airbytehq/airbyte/issues/17030

I know this isn’t the solution you’re looking for, but I think the best way forward is adding granularity to that field and putting a PR in for the other open issue. If you’ve got time to contribute, please feel free!

What is the best solution you found so far now?
I am doing the same and facing a lot of issues when the table gets to have in destination end.

I am trying to pull till 2007
what is the best sync method you think?

given that UA is deprecated now this particular issue is not relevant anymore as there’s no more new data to be expected.

Thus a full refresh would be the only reasonable approach.

I think there’s a limit to how far back in the past you can still get data with the API. I don’t remember exactly but i can’t imagine it’s possible to get data all the way back to 2007. That might actually be your problem. I would suggest you try 1 or 2 years and see if that works and then increase the the timeframe until you get an error. If there is no timelimit but rather an API load limit you can also just create different datasets for different years and join them later inside your database. This should be pretty clean as there is no new universal analytics data coming in