Summary
The user is asking about using a custom cursor of ‘time - 2 days’ to retrieve late-arriving data in the Mixpanel to BigQuery integration. They are unsure if a custom connector is needed for this task.
Question
We need to use a cursor of ‘time - 2 days’ to get late-arriving data. Do we need a custom connector to do this? I’m having this issue on Mixpanel -> BigQuery. I cannot change the source-defined cursor from time to something like mp_processing_time_ms, which would be a better cursor and would not require me to recheck ‘older’ dates. I found <@U07513P6DT9>'s Slack thread, but that seems to require running open-source rather than Cloud?
This topic has been created from a Slack thread to give it more visibility.
["custom-cursor", "late-arriving-data", "mixpanel", "bigquery", "custom-connector"]
Thanks for responding <@U07513P6DT9>, you mentioned:
> I was able to resolve the issue of another process resetting the stream state. Here is the POST method: https://airbyte-public-api-docs.s3.us-east-2.amazonaws.com/rapidoc-api-docs.html#post-/v1/state/create_or_update
> I implement a lookback window by running a daily job that adjusts the cursor back by a defined period, ensuring that any delayed data is captured
> we update the cursor to current_time - 4 days, allowing the system to re-read and process all data from the last 4 days, effectively filling any data gaps we found
That is a nice solution. Does this mean you are running your own Airbyte deployment, or can you use this API against Airbyte Cloud?
We are using the self-hosted open-source version. I believe there should be an equivalent API method available for the cloud version.
Cool. Lastly, is your sync mode Incremental + Append + Deduped?
Yes, this approach prevents duplication and eliminates the need for downstream handling.
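For anyone wanting to try the lookback approach quoted above, here is a minimal Python sketch of the daily job. It only assumes the `/v1/state/create_or_update` path from the linked docs. Everything else is an assumption to verify against your own deployment: the base URL, the auth headers, the stream name `export`, the cursor field `time`, the timestamp format, and the exact shape of the state payload. The helper names (`lookback_cursor`, `build_state_payload`, `post_state`) are hypothetical and not part of any Airbyte client.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def lookback_cursor(now: datetime, days: int) -> str:
    """Return the cursor `days` before `now` as a UTC ISO 8601 string.

    `now` should be timezone-aware. The output format is an assumption;
    match it to whatever your stream's saved state currently contains.
    """
    shifted = (now - timedelta(days=days)).astimezone(timezone.utc)
    return shifted.strftime("%Y-%m-%dT%H:%M:%SZ")


def build_state_payload(connection_id: str, stream_name: str,
                        cursor_field: str, cursor_value: str) -> dict:
    """Build a per-stream state body (assumed shape, check the API docs)."""
    return {
        "connectionId": connection_id,
        "connectionState": {
            "connectionId": connection_id,
            "stateType": "stream",
            "streamState": [
                {
                    "streamDescriptor": {"name": stream_name},
                    "streamState": {cursor_field: cursor_value},
                }
            ],
        },
    }


def post_state(base_url: str, payload: dict, headers: dict) -> dict:
    """POST the payload to the state endpoint; base_url and headers are deployment-specific."""
    request = urllib.request.Request(
        f"{base_url}/v1/state/create_or_update",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Dry run: print the payload instead of sending it.
    cursor = lookback_cursor(datetime.now(timezone.utc), days=4)
    payload = build_state_payload("<your-connection-id>", "export", "time", cursor)
    print(json.dumps(payload, indent=2))
```

Scheduled once a day before the sync, this would move the cursor back 4 days so the next Incremental run re-reads that window. With the Deduped sync mode, the re-read rows should collapse on the primary key rather than duplicate. Whether the same endpoint is reachable on Airbyte Cloud is the open question in this thread.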