Source Google Analytics 4- Ga4 api to bigquery: problem with incremental data

Hi, :smile:
I succeeded in getting a custom report data from GA4 API , then send it to bigquery.
Now, I wanna update only incremental data from the source but there is nowhere to setup custom cursor field or set up the end date is yesterday’s date.

Is there any way to do so?

Let’s me explain a little bit:
So, here is some of my setup info:

  • GA4 version: Google Analytics Data API ALPHA 0.0.3

  • Custom reports: [{“name”: “Revenue by ad events”, “dimensions”: [“date”, “country”, “appVersion”, “eventName”], “metrics”: [“totalAdRevenue”, “totalPurchasers”, “totalRevenue”]}]

If I choose Sync Mode = Incremental | Append (or Deduped History), the Cursor Field is default as date, which means when I call api at, ex: 9am 15/09/2022, then all record after 9am of 15/09/2022 won’t be taken anymore.

Even though I can use Sync Mode = Full Refresh | Overwrite, this means is definitely a bad idea for long term usage (in case I get more custom report, and the date range is longer)

Hope someone can help!
Really appreciate!

1 Like

Hello there! You are receiving this message because none of your fellow community members has stepped in to respond to your topic post. (If you are a community member and you are reading this response, feel free to jump in if you have the answer!) As a result, the Community Assistance Team has been made aware of this topic and will be investigating and responding as quickly as possible.
Some important considerations that will help your to get your issue solved faster:

  • It is best to use our topic creation template; if you haven’t yet, we recommend posting a followup with the requested information. With that information the team will be able to more quickly search for similar issues with connectors and the platform and troubleshoot more quickly your specific question or problem.
  • Make sure to upload the complete log file; a common investigation roadblock is that sometimes the error for the issue happens well before the problem is surfaced to the user, and so having the tail of the log is less useful than having the whole log to scan through.
  • Be as descriptive and specific as possible; when investigating it is extremely valuable to know what steps were taken to encounter the issue, what version of connector / platform / Java / Python / docker / k8s was used, etc. The more context supplied, the quicker the investigation can start on your topic and the faster we can drive towards an answer.
  • We in the Community Assistance Team are glad you’ve made yourself part of our community, and we’ll do our best to answer your questions and resolve the problems as quickly as possible. Expect to hear from a specific team member as soon as possible.

Thank you for your time and attention.
Best,
The Community Assistance Team

Hi @phucdinh,

Thanks for your post and welcome to the community! I may be mistaken but the Incremental | Deduped + History mode should sync all of the data as long it’s a daily sync. Have you tried creating this connection and are there indeed records that are not synced?

Unfortunately, the GA4 connector doesn’t support any other cursors other than the date.

Hi @sajarin,
Thanks for replying.
I did try Incremental | Deduped + History mode first, and the problem occurs like I explained above:

  • When I made a daily API call at 7am. And saying for example today is 19/09/2022

  • Then any record between 7AM of 19/09/2022 and 12AM 20/09/2022 will NOT be updated since Cursors Field of the Source is Date

(I assume Airbyte syncs data of date 19/09 at 7AM, and when it checks the source again, the date 19/09 already exists, so it skips updating record between 7AM and 12AM)

Hey @phucdinh,

Thanks for clarifying. This issue happens because the date field is not granular enough to pick up records that are updated in the course of a day. You can either schedule one sync within a 24 hour period everyday, or use Incremental | Append (if the specific stream supports it) and perform a clean job to remove the duplicate/stale records. Another option would be to update the GA4 connector to use something more granular for the cursor but that would require some time to investigate and merge onto our main branch.

1 Like

Hi @sajarin,
Hope you can help!
I tried to apply Incremental | Append methods since my data full refresh are getting bigger and bigger (take hours to query if I use Full Refresh | Overwrite).
So I have read about cursor field value in this doc. And in Known Limitations Part, it talks about max cursor field value.
Thus, I have a question about where the ‘last_sync_max_cursor_field_value’ stored? Is there anyway I can modify this value without writing custom code?

FYI: I did try to delete today’s records in destination (in my case is Big Query), then sync again using Incremental | Append, but it always returns no new records.

Here some logs:

  • First sync return all records up to today’s date which is 03/10/2022. Then I deleted all 03/10/2022 record in destination

  • Second sync when using Incremental | Append

2022-10-03 03:23:22 source > Starting syncing SourceGoogleAnalyticsDataApi
2022-10-03 03:23:22 source > Syncing stream: revenue by ad events 
2022-10-03 03:23:22 source > Setting state of revenue by ad events stream to {'date': '2022-10-03'}
2022-10-03 03:23:22 source > Read 0 records from revenue by ad events stream
2022-10-03 03:23:22 source > Finished syncing revenue by ad events
2022-10-03 03:23:22 source > SourceGoogleAnalyticsDataApi runtimes:
  • Try to sync again:
2022-10-03 03:32:04 source > Starting syncing SourceGoogleAnalyticsDataApi
2022-10-03 03:32:04 source > Syncing stream: revenue by ad events 
2022-10-03 03:32:04 source > Setting state of revenue by ad events stream to {'date': '2022-10-04'}
2022-10-03 03:32:04 source > Read 0 records from revenue by ad events stream
2022-10-03 03:32:04 source > Finished syncing revenue by ad events
2022-10-03 03:32:04 source > SourceGoogleAnalyticsDataApi runtimes:

The date cursor field automatedly increase to tomorrow’s date!!

If you have any solution, hope to hear it too!

Hey @phucdinh,

Sorry for the delay. I saw that you opened up a new post for this issue. Can you try adding data instead of removing data and see if it appends properly? My guess is that when you delete data it’s not resetting the state of the cursor (it’s still 10/3/22) and so it bumps whatever is the current state is which is reflective of the data.

Let me know if that makes sense.

Hello @phucdinh.

I’m having the same problem, did you find a solution?

The first sync works, but subsequent incrementals return 0 rows.

Hi @meloxbr you can use image that I build custom for GA4
phucdinh/airbyte-source-google-analytics-ga4-custom general | Docker Hub
After pulling the docker image to your local, then you can register new sources in Airbyte
Guide: Custom or New Connector | Airbyte Documentation
And finally you can use Incremental Append up to yesterday data

Hey there
I would also love to try your fix to the GA incremental issue
Any chance there is also some PR in airbyte git to see the code fix ?
I’m wrapping my head around this issue for the last few days, but I’m airbyte noob :slight_smile:

hi @phucdinh I already pull your docker image and add the source in my airbyte, but I got error when I create connection from your source to redshit, this is the error. I hope you can help me