Summary
The user asks whether a date-type cursor with millions of rows per cursor value can cause data to be missed when incrementally syncing a very large table with the Airbyte Snowflake connector.
Question
Hi folks, I have a question regarding incremental sync. I am trying to ingest a very large table from Snowflake. The only column in the table that can be used as a cursor is of type date (example: `order_day`). However, each `order_day` can have about 8 million rows. I see that the intermediate state emission frequency for the Snowflake connector is set to 10k (see `airbyte-integrations/connectors/source-snowflake/src/main/java/io.airbyte.integrations.source.snowflake/SnowflakeSource.java` at master in airbytehq/airbyte). I'm wondering if this would cause any issues with the date cursor?
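
For context, here is a minimal sketch, using assumed names rather than the actual Airbyte CDK classes, of how a source can checkpoint intermediate state every N records while walking a result set ordered by a date cursor. The detail that matters for this question is that a checkpoint can carry both the cursor value and the number of rows already emitted at that value, which becomes relevant when a single `order_day` holds millions of rows.

```java
import java.time.LocalDate;
import java.util.Iterator;

class StateEmittingIterator {
  interface Row { LocalDate orderDay(); }

  private final Iterator<Row> rows;      // rows ordered by the cursor column
  private final long emitFrequency;      // e.g. 10_000, as in the Snowflake source
  private LocalDate lastCursorValue;     // latest order_day seen so far
  private long rowsAtCursorValue;        // rows already emitted for that order_day
  private long rowsSinceLastState;

  StateEmittingIterator(Iterator<Row> rows, long emitFrequency) {
    this.rows = rows;
    this.emitFrequency = emitFrequency;
  }

  void run() {
    while (rows.hasNext()) {
      Row r = rows.next();
      if (r.orderDay().equals(lastCursorValue)) {
        rowsAtCursorValue++;
      } else {
        lastCursorValue = r.orderDay();
        rowsAtCursorValue = 1;
      }
      emitRecord(r);
      if (++rowsSinceLastState >= emitFrequency) {
        // Checkpoint the cursor value plus how many rows were already emitted
        // at that value, so a resumed sync knows the day was only partially read.
        emitState(lastCursorValue, rowsAtCursorValue);
        rowsSinceLastState = 0;
      }
    }
    emitState(lastCursorValue, rowsAtCursorValue); // final checkpoint
  }

  void emitRecord(Row r) { /* pass the row downstream */ }
  void emitState(LocalDate cursor, long countAtCursor) { /* persist checkpoint */ }
}
```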
By looking at the code it seems like Airbyte runs a query similar to:
```sql
SELECT *
FROM table
WHERE order_day >= '2024-01-01'
ORDER BY order_day ASC
```
and ingests the top 10K rows until `actualRecordCount == cursorInfo.getCursorRecordCount()`. Given that each day will result in 8M rows, would it miss ingesting data from Snowflake?
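
To make the concern concrete, here is a hedged sketch, with hypothetical names (`countRowsAtCursor`, `buildIncrementalQuery`), of the kind of resume logic that the `actualRecordCount == cursorInfo.getCursorRecordCount()` comparison suggests: count the rows currently at the checkpointed cursor value, compare with the count stored in state, and pick `>` or `>=` accordingly.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ResumeQuerySketch {

  /**
   * Counts how many rows currently share the saved cursor value,
   * e.g. how many rows exist for the checkpointed order_day.
   */
  static long countRowsAtCursor(Connection conn, String table,
                                String cursorColumn, Date cursorValue) throws SQLException {
    String sql = "SELECT COUNT(*) FROM " + table + " WHERE " + cursorColumn + " = ?";
    try (PreparedStatement stmt = conn.prepareStatement(sql)) {
      stmt.setDate(1, cursorValue);
      try (ResultSet rs = stmt.executeQuery()) {
        rs.next();
        return rs.getLong(1);
      }
    }
  }

  /**
   * Builds the incremental query for the next sync. If every row at the saved
   * cursor value was already emitted (actual count == checkpointed count), a
   * strict ">" safely skips that day; otherwise ">=" re-reads the whole day so
   * no rows are missed, at the cost of emitting some duplicates.
   */
  static String buildIncrementalQuery(String table, String cursorColumn,
                                      long actualRecordCount, long checkpointedRecordCount) {
    String comparison = actualRecordCount == checkpointedRecordCount ? ">" : ">=";
    return "SELECT * FROM " + table
        + " WHERE " + cursorColumn + " " + comparison + " ?"
        + " ORDER BY " + cursorColumn + " ASC";
  }
}
```

Under this sketch's assumptions, a partially read `order_day` would be re-read with `>=` on the next attempt, so the failure mode would be duplicate rows rather than missed rows; whether the real connector behaves exactly this way should be confirmed against the linked source.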
---
This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1724194980390379) if you want
to access the original thread.
[Join the conversation on Slack](https://slack.airbyte.com)
<sub>
["incremental-sync", "snowflake-connector", "large-tables", "date-cursor", "ingestion"]
</sub>