Question about Incremental Sync with Snowflake Connector

Summary

The user is inquiring about potential issues with the date cursor and ingestion of large tables from Snowflake using the Airbyte Snowflake connector.


Question

Hi folks, I have a question regarding incremental sync. I am trying to ingest a very large table from Snowflake. The only column in the table that can be used as a cursor is of type date (for example, `order_day`). However, each `order_day` can have about 8 million rows. I see that the intermediate state emission frequency for the Snowflake connector is set to 10k records (see `airbyte-integrations/connectors/source-snowflake/src/main/java/io.airbyte.integrations.source.snowflake/SnowflakeSource.java` on master in the airbytehq/airbyte repo). I’m wondering if this would cause any issues with the date cursor?

By looking at the code, it seems like Airbyte runs a query similar to:

```sql
SELECT *
FROM table
WHERE order_day >= '2024-01-01'
ORDER BY order_day ASC
```

and ingests the top 10K rows until `actualRecordCount == cursorInfo.getCursorRecordCount()`. Given that each day will result in 8M rows, would it miss ingesting data from Snowflake?
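The checkpoint-and-resume behavior described above can be illustrated with a rough simulation. This is a hedged sketch, not Airbyte's actual code: the `sync` function, `CHECKPOINT_EVERY` constant, and row layout are all hypothetical. It assumes the connector checkpoints a `(cursor_value, record_count_at_that_cursor_value)` pair every N rows, and on resume re-queries `cursor >= saved_value` and skips the already-emitted count, which requires a stable sort order within a single `order_day`:

```python
# Hypothetical simulation of a date cursor checkpointed with a
# per-cursor-value record count (NOT the real connector logic).

CHECKPOINT_EVERY = 3  # stand-in for the connector's 10k interval


def sync(rows, state=None):
    """rows: list of (order_day, row_id) pairs, sorted by order_day ASC.
    state: optional (cursor_value, cursor_record_count) checkpoint.
    Returns (emitted_row_ids, checkpoints)."""
    if state:
        cursor, seen_at_cursor = state
        # Resume: re-read rows with cursor >= saved value, skipping the
        # rows already emitted for that exact cursor value. Correct only
        # if the query returns rows in a stable order within a day.
        rows = [r for r in rows if r[0] >= cursor]
        rows = rows[seen_at_cursor:]
    else:
        cursor, seen_at_cursor = None, 0

    emitted, checkpoints = [], []
    for i, (day, row_id) in enumerate(rows, 1):
        if day == cursor:
            seen_at_cursor += 1
        else:
            cursor, seen_at_cursor = day, 1
        emitted.append(row_id)
        if i % CHECKPOINT_EVERY == 0:
            checkpoints.append((cursor, seen_at_cursor))
    return emitted, checkpoints
```

Under these assumptions a day with far more rows than the checkpoint interval is not skipped: the cursor value simply stays on the same date across many checkpoints, and the record count is what distinguishes progress within the day. The failure mode to watch for is an unstable ordering within a day, which would make the skip-by-count resume re-emit or drop rows.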


---

This topic has been created from a Slack thread to give it more visibility.
It will be in Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1724194980390379) if you want to access the original thread.

[Join the conversation on Slack](https://slack.airbyte.com)

<sub>
["incremental-sync", "snowflake-connector", "large-tables", "date-cursor", "ingestion"]
</sub>