The documentation for incremental sync with deduped history says:
When replicating data incrementally, Airbyte provides an at-least-once delivery guarantee. This means that it is acceptable for sources to re-send some data when ran incrementally. One case where this is particularly relevant is when a source’s cursor is not very granular. For example, if a cursor field has the granularity of a day (but not hours, seconds, etc), then if that source is run twice in the same day, there is no way for the source to know which records that are that date were already replicated earlier that day. By convention, sources should prefer resending data if the cursor field is ambiguous.
The MSSQL source code builds the incremental query like this:
final String sql = String.format("SELECT %s FROM %s WHERE %s > ?",
    String.join(",", newColumnNames),
    sourceOperations.getFullyQualifiedTableNameWithQuoting(connection, schemaName, tableName),
    sourceOperations.enquoteIdentifier(connection, cursorField));
Say I have a table that I’m syncing, and the only cursor available is a date column (no hours, minutes, or seconds). I run a daily sync at 14:00. After the sync on 2022-07-29, my cursor becomes 2022-07-29 (the latest value in the cursor column).
The next day at 14:00, the query executes "SELECT %s FROM %s WHERE %s > 2022-07-29". However, other rows were inserted into that table between 14:00 and midnight on 2022-07-29. Those rows also carry the cursor value 2022-07-29, so the strict > comparison excludes them and the source connector never picks them up.
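To make the gap concrete, here is a minimal in-memory sketch of the scenario above. It does not touch Airbyte or MSSQL; the `Row` class, the sample data, and the `select` helper are all made up for illustration, and `select` only mimics the WHERE clause.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CursorGapDemo {

    // Hypothetical row: an id plus a date-only cursor value (no time component).
    public static final class Row {
        final int id;
        final LocalDate cursorDate;

        Row(int id, LocalDate cursorDate) {
            this.id = id;
            this.cursorDate = cursorDate;
        }
    }

    // Made-up sample data: row 2 is inserted after the 14:00 sync on 2022-07-29,
    // so it carries the same cursor value as row 1, which was already synced.
    public static List<Row> sampleTable() {
        return Arrays.asList(
            new Row(1, LocalDate.of(2022, 7, 29)),  // inserted before 14:00, already synced
            new Row(2, LocalDate.of(2022, 7, 29)),  // inserted after 14:00, not yet synced
            new Row(3, LocalDate.of(2022, 7, 30))); // inserted the next day
    }

    // Mimics "WHERE cursor > ?" (inclusive = false) or "WHERE cursor >= ?" (inclusive = true).
    public static List<Integer> select(List<Row> table, LocalDate cursor, boolean inclusive) {
        List<Integer> ids = new ArrayList<>();
        for (Row row : table) {
            int cmp = row.cursorDate.compareTo(cursor);
            if (cmp > 0 || (inclusive && cmp == 0)) {
                ids.add(row.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        LocalDate savedCursor = LocalDate.of(2022, 7, 29);
        System.out.println(select(sampleTable(), savedCursor, false)); // [3]
        System.out.println(select(sampleTable(), savedCursor, true));  // [1, 2, 3]
    }
}
```

With the strict comparison, row 2 is skipped for good. With the inclusive comparison, row 1 is sent a second time, but row 2 is no longer lost.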
Would there eventually be a way to tell the connector which operator to use with the cursor (> or >=)? That way the MSSQL query could be "SELECT %s FROM %s WHERE %s >= ?" (greater than or equal) when the cursor field is ambiguous.
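Something like the following is what I have in mind. This is only a sketch of the idea, not existing Airbyte code: the `CursorOperator` enum and the `buildIncrementalQuery` helper are hypothetical names, and the table and column names in `main` are made up. The identifiers are assumed to be quoted already, as the connector does today.

```java
public class CursorQuerySketch {

    // Hypothetical option; this is not an existing Airbyte connector setting.
    public enum CursorOperator {
        GREATER_THAN(">"),
        GREATER_THAN_OR_EQUAL(">=");

        final String sql;

        CursorOperator(String sql) {
            this.sql = sql;
        }
    }

    // Builds the same query shape as the connector, but with a selectable operator.
    // The cursor value itself stays a bind parameter ("?").
    public static String buildIncrementalQuery(String columns,
                                               String quotedTable,
                                               String quotedCursor,
                                               CursorOperator operator) {
        return String.format("SELECT %s FROM %s WHERE %s %s ?",
            columns, quotedTable, quotedCursor, operator.sql);
    }

    public static void main(String[] args) {
        // Prints: SELECT id,updated_on FROM [dbo].[orders] WHERE [updated_on] >= ?
        System.out.println(buildIncrementalQuery(
            "id,updated_on", "[dbo].[orders]", "[updated_on]",
            CursorOperator.GREATER_THAN_OR_EQUAL));
    }
}
```

Using >= would re-send the rows that share the saved cursor date, which seems consistent with the at-least-once guarantee quoted above, and the deduped history mode should collapse the duplicates.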
Ideally, my source would have a complete timestamp for the cursor, but unfortunately we don’t live in a perfect data world.