Soft Reset Triggering in S3 to Snowflake Incremental Sync

Summary

Airbyte is executing a Soft Reset every time the sync runs from S3 to Snowflake, so syncs that should take a few minutes now take 2 to 3 hours. The logs show conflicting information on whether a Soft Reset is needed. The issue is becoming cost-prohibitive as the internal table grows.


Question

Has anyone seen Airbyte execute a Soft Reset on every sync when syncing data from S3 to Snowflake (incremental append)? We have some large datasets where performance degrades significantly as the airbyte_internal table continues to grow. Because a Soft Reset happens on every run, syncs that should take only a few minutes now take 2 to 3 hours: each run sets the loaded_at timestamp to NULL across all internal table records, rebuilds the entire final table, and then resets the loaded_at timestamp to the current time.
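
To make the cost difference concrete, here is a toy in-memory model of the behaviour described above. It is only a sketch under my own assumptions, not Airbyte's actual implementation: the function names (type_and_dedupe, soft_reset, simulate) and the row layout are hypothetical, and "rows processed" stands in for warehouse work.

```python
from datetime import datetime, timezone


def type_and_dedupe(raw_rows, final_table):
    """Load every raw row whose loaded_at is None; return how many were processed."""
    now = datetime.now(timezone.utc)
    processed = 0
    for row in raw_rows:
        if row["loaded_at"] is None:
            final_table.append(row["data"])
            row["loaded_at"] = now
            processed += 1
    return processed


def soft_reset(raw_rows, final_table):
    """Model of a soft reset: NULL out loaded_at everywhere, then rebuild from scratch."""
    for row in raw_rows:
        row["loaded_at"] = None
    final_table.clear()
    return type_and_dedupe(raw_rows, final_table)


def simulate(existing, new, force_soft_reset):
    """Return rows processed by one sync with `existing` loaded rows and `new` arrivals."""
    now = datetime.now(timezone.utc)
    raw = [{"data": i, "loaded_at": now} for i in range(existing)]
    final = [row["data"] for row in raw]
    raw += [{"data": existing + i, "loaded_at": None} for i in range(new)]
    if force_soft_reset:
        return soft_reset(raw, final)
    return type_and_dedupe(raw, final)
```

In this model, simulate(1000, 10, False) processes only the 10 new rows, while simulate(1000, 10, True) processes all 1010, which is why the runtime tracks the size of the internal table rather than the size of each increment.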

What’s interesting is that, reviewing the logs, I see this line indicating that a Soft Reset is not needed:
2024-08-28 12:35:49 destination > INFO main i.a.c.d.j.JdbcDatabase(executeWithinTransaction$lambda$1):46 executing query within transaction: insert into "airbyte_internal"."_airbyte_destination_state" ("name", "namespace", "destination_state", "updated_at") values ('airbyte_brand', 'TALENTREEF', '{"needsSoftReset":false,"airbyteMetaPresentInRaw":true}', '2024-08-28T12:35:48.549790446Z')
Shortly thereafter, however, I see:
2024-08-28 12:35:51 destination > INFO sync-operations-3 i.a.i.b.d.t.TyperDeduperUtil(executeTypeAndDedupe):212 Attempting typing and deduping for TALENTREEF.airbyte_brand with suffix _ab_soft_reset
Our syncs run every 8 hours, and what I suspect is a bug is quickly becoming cost-prohibitive, since sizing up the Snowflake warehouse is not a long-term solution.
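
For anyone who wants to check their own logs for the same contradiction, here is a rough sketch of a scanner. It assumes the destination log lines look like the two quoted above; the regular expressions and the find_conflicts helper are my own and are not part of Airbyte.

```python
import json
import re

# Matches the state upsert line and captures stream name, namespace, and state JSON.
STATE_RE = re.compile(
    r"values \('(?P<name>[^']+)', '(?P<namespace>[^']+)', '(?P<state>\{[^']*\})'"
)
# Matches a typing-and-deduping attempt that uses the soft reset suffix.
SOFT_RESET_RE = re.compile(
    r"Attempting typing and deduping for (?P<namespace>[^.\s]+)\.(?P<name>\S+)"
    r" with suffix _ab_soft_reset"
)


def find_conflicts(log_lines):
    """Return (namespace, stream) pairs that were soft reset despite needsSoftReset=false."""
    recorded = {}
    attempted = set()
    for line in log_lines:
        match = STATE_RE.search(line)
        if match:
            state = json.loads(match.group("state"))
            key = (match.group("namespace"), match.group("name"))
            recorded[key] = state.get("needsSoftReset")
            continue
        match = SOFT_RESET_RE.search(line)
        if match:
            attempted.add((match.group("namespace"), match.group("name")))
    return sorted(key for key in attempted if recorded.get(key) is False)
```

Passing the lines of a saved sync log to find_conflicts lists every stream where the recorded state says no Soft Reset is needed but one was attempted anyway; for the two lines above it reports ("TALENTREEF", "airbyte_brand").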

Kapa did not provide sufficient information on why the Soft Reset was triggering in this case.

Has anyone encountered this before and can anyone from the Airbyte Team assist?

cc <@U05L207H1BJ> / <@U06KPKLUK26>



This topic has been created from a Slack thread to give it more visibility.


["s3", "snowflake", "incremental-append", "soft-reset", "performance-degradation", "bug"]