Incremental sync design for API without cursor value in returned data

Hello :wink:

I’m developing a connector for an API that lets me pass lastmodified as a query param, but the rows it returns carry no information about modification time. Do I understand correctly that, to guard against upstream problems (e.g. incomplete data), I have to pull data in overlapping time periods, e.g. every week (7 days) grab data with lastmodified = now - 8 days?

Hi @druid,
If your stream is incremental, you can store the lastmodified value in the state and use it as a cursor. On the next run, the sync will pick up the latest cursor value from the state and start syncing from that lastmodified value.

You mention that the lastmodified value is not available in the data returned by the API. Here’s what I suggest, assuming your API endpoint allows you to query by lastmodified ranges:

  • Leverage stream slicing to dynamically generate lastmodified range intervals: your stream_slices returns a generator of (start_lastmodified, stop_lastmodified) timestamp pairs with a timedelta of 1 minute (or whatever interval best matches your incoming data).
  • Use start_lastmodified and stop_lastmodified from the stream slice to build your API request (in the request_params function, for example).
  • In read_records, use stop_lastmodified from the stream slice to set the lastmodified field on each record and to update self._cursor_value.
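
The three steps above can be sketched roughly as follows. This is a minimal, framework-free illustration written as plain functions rather than as methods on a CDK stream class, so the signatures differ from the real ones. The query parameter names lastmodified_from and lastmodified_to are hypothetical; substitute whatever your API actually accepts.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, Iterator


def stream_slices(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Dict[str, datetime]]:
    """Yield back-to-back lastmodified windows covering [start, end)."""
    cursor = start
    while cursor < end:
        stop = min(cursor + step, end)  # the last window may be shorter
        yield {"start_lastmodified": cursor, "stop_lastmodified": stop}
        cursor = stop


def request_params(stream_slice: Dict[str, datetime]) -> Dict[str, str]:
    """Build query params from a slice (parameter names are assumptions)."""
    return {
        "lastmodified_from": stream_slice["start_lastmodified"].isoformat(),
        "lastmodified_to": stream_slice["stop_lastmodified"].isoformat(),
    }


def stamp_records(
    rows: Iterable[Dict[str, Any]], stream_slice: Dict[str, datetime]
) -> Iterator[Dict[str, Any]]:
    """Attach the slice's upper bound as the cursor field of each row."""
    stamp = stream_slice["stop_lastmodified"].isoformat()
    for row in rows:
        yield {**row, "lastmodified": stamp}
```

In a real stream, the start argument would come from the saved state (or a configured start date on the first run), and the value stamped on the records is what you would also assign to self._cursor_value.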

The state will be automatically checkpointed in the database at the end of each successfully processed slice.
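
To illustrate why per-slice checkpointing matters, here is a small sketch of the idea, not the CDK's actual implementation. The sync_with_checkpoints function and the fetch callback are hypothetical names used only for this example, and timestamps are plain strings to keep it short. If a slice fails, the state still points at the end of the last slice that succeeded, so the next run resumes from there instead of starting over.

```python
from typing import Any, Callable, Dict, Iterable, List

Slice = Dict[str, str]


def sync_with_checkpoints(
    slices: Iterable[Slice],
    fetch: Callable[[Slice], List[Dict[str, Any]]],
    state: Dict[str, str],
) -> List[Dict[str, Any]]:
    """Process slices in order, advancing the state after each success."""
    emitted: List[Dict[str, Any]] = []
    for stream_slice in slices:
        # If fetch raises, the state keeps the previous slice's upper bound.
        rows = fetch(stream_slice)
        emitted.extend(rows)
        # Checkpoint only after the whole slice has been read.
        state["lastmodified"] = stream_slice["stop_lastmodified"]
    return emitted
```

A smaller slice interval means less data is re-read after a failure, at the cost of more API requests.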

Feel free to check out our documentation about reading data for incremental streams here.