Handling schema drift in Airbyte for CDC data replication from MSSQL to Snowflake

Summary

A user replicating CDC data from MSSQL to Snowflake with Airbyte ran into schema drift after adding a new column and dropping an old one in the source. Airbyte detects both changes, but accepting them adds the new column and also deletes the old column from the target SCD table, causing data loss in the warehouse. The user is looking for a way to refresh the target without losing that data.


Question

Hey team,
I seem to have hit a roadblock. I’ve set up an Airbyte connection to capture CDC data from an MSSQL source database into Snowflake.
During a schema drift (where I added a new column and dropped an old one), Airbyte detected the stream changes, as shown in the attached image.

However, I’m unable to select either option to properly refresh the target warehouse.
The issue I’m facing is that if I allow both options, Airbyte creates the new column but also deletes the old column from the target SCD table, resulting in data loss in the warehouse.
Any ideas on what I might be missing here or how to resolve this?
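
To make the scenario concrete, here is a minimal Python sketch of the column-level diff involved. It is not Airbyte's implementation, and the column names are hypothetical; it only shows why one added and one dropped column in the source translate into one addition and one removal on the target if both changes are applied.

```python
def diff_columns(old_columns, new_columns):
    """Compare two column lists and return (added, removed), each sorted."""
    old_set, new_set = set(old_columns), set(new_columns)
    added = sorted(new_set - old_set)      # present only in the new schema
    removed = sorted(old_set - new_set)    # present only in the old schema
    return added, removed


# Hypothetical source schema before and after the drift.
before = ["id", "name", "legacy_code"]
after = ["id", "name", "region"]

added, removed = diff_columns(before, after)
print("added:", added)
print("removed:", removed)
```

Any column that ends up in `removed` is the one whose historical values would be lost if the drop is propagated to the target table.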



This topic has been created from a Slack thread to give it more visibility.

["schema-drift", "cdc-data-replication", "mssql", "snowflake", "data-loss", "refresh", "target-warehouse"]