Summary
Explanation of how to use the incremental_sync yaml node in Full Refresh mode and how to implement an incremental stream with an updated_at cursor while initially running a full refresh of all objects.
Question
Hello everyone. Can anyone explain how the incremental_sync yaml node should be used in Full Refresh mode? I’d expect that in Full Refresh mode the incremental_sync yaml node would simply be ignored and the stream would fetch all data using the defined pagination strategy, without any incremental parameters. Apparently this is not the case: from what I observed, Full Refresh mode just doesn’t record the final stream state. That said, is there a way to implement an incremental stream (let’s say with an updated_at cursor) that, when the stream is first created, runs a full refresh of all objects regardless of when they were last updated?
This topic has been created from a Slack thread to give it more visibility.
["incremental-sync", "full-refresh", "pagination-strategy", "stream-state", "updated-at-cursor"]
> That said, is there a way to implement an incremental stream (let’s say with updated_at cursor), but initially when a stream is created run a full refresh of all objects regardless when they were last updated?
If the state is empty (no previous syncs), that is already how an incremental stream should behave on its first sync. Which records are you getting, if not all of them?
> Apparently this seems not to be the case and from what I observed the Full Refresh mode just doesn’t record the final stream state.
Can you clarify what you’re seeing? Are you seeing that in Full Refresh mode your stream is not re-syncing all data?
On the initial sync (when the stream state is empty) it still passes the incremental stream parameters, using the values defined by start_datetime/end_datetime. So to do a full sync I need to set start_datetime far in the past so that the whole data set is loaded.
I guess my problems came from the common incremental_sync configuration that I had created for another stream, which had a start_datetime defined. If I set it to something like 1970-01-01, that should do the job.
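For anyone landing here later, the workaround described above might look roughly like the following in the stream's own incremental_sync node. This is only a sketch, not a verified manifest: the cursor field updated_at comes from the question, while the datetime format and the updated_since request parameter name are assumptions that depend on the API being synced.

```yaml
# Sketch of a per-stream incremental_sync node (low-code manifest).
# The datetime format and the "updated_since" parameter are assumptions.
incremental_sync:
  type: DatetimeBasedCursor
  cursor_field: updated_at                # record field used as the cursor
  datetime_format: "%Y-%m-%dT%H:%M:%SZ"
  start_datetime:
    type: MinMaxDatetime
    # Far in the past, so a sync with empty state covers all objects.
    datetime: "1970-01-01T00:00:00Z"
    datetime_format: "%Y-%m-%dT%H:%M:%SZ"
  start_time_option:
    type: RequestOption
    field_name: updated_since             # hypothetical query parameter name
    inject_into: request_parameter
```

Defining this node on the stream itself, rather than reusing a shared definition from another stream, avoids accidentally inheriting that stream's start_datetime.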
Ah, that would make more sense (although it’s understandable that it’s confusing if you’re using a shared config when you thought the streams had separate ones). The incremental functionality is supported by the partitioner, not the start and end dates.