Error with CDC in MS SQL Server to Snowflake data integration

slack-user-airbyte · June 26, 2024, 6:11am

Summary

Error related to saved offset not present on the server in CDC process. Need to understand how to mitigate this error and handle full refresh in case of database restart or failover. Question about forcing snapshot creation and clarification on the server mentioned in the error message.

Question

Hi, I am extracting data from MS SQL Server database using cdc and loading into snowflake. Every now and then, the job runs into this error. Post this, I need to reset the data and trigger the load from scratch.
io.airbyte.commons.exceptions.ConfigErrorException: Saved offset no longer present on the server. Please reset the connection, and then increase binlog retention and/or increase sync frequency.

I’m new to airbyte and trying to understand how cdc works. I could make out that this has to do with the last offset saved on sql server, which airbyte maintains as a checkpoint to start. However, isn’t it easy to lose that lsn, whenever the database is restarted or a failover happens? How can we mitigate this error/full refresh in that scenario? Is it possible to force snapshot creation after the initial snapshot, so that this lsn is updated to a recent value? I would also like to understand if the server referred to in the error message, is it the db server or airbyte server. Thanks!

This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want
to access the original thread.

Join the conversation on Slack

_{["cdc", "ms-sql-server", "snowflake", "saved-offset", "binlog-retention", "sync-frequency", "snapshot-creation", "failover"]}

slack-user-airbyte · July 25, 2024, 6:17am

this is the offset info from the connection state :
"mssql_cdc_offset": {
"[\"ZoomSTG\",{\"server\":\"ZoomSTG\",\"database\":\"ZOOMSTG\"}]": "{\"transaction_id\":null,\"event_serial_no\":1,\"commit_lsn\":\"0017c0b6:0002465a:0013\",\"change_lsn\":\"0017c0b6:0002465a:0010\"}",
"[\"ZoomSTG\",{\"server\":\"ZoomSTG\",\"database\":\"ZoomSTG\"}]": "{\"commit_lsn\":\"0017c0b3:0009d2f1:0001\",\"snapshot\":true,\"snapshot_completed\":true}"
},

The snapshot commit_lsn 0017c0b3:0009d2f1:0001 is the value noted from a full refresh. This value doesn’t change until I hit the 10 day cdc retention mark in sql server and then it fails with the error that the saved offset is no longer present on the server. The other commit_lsn from incremental changes is updated and data is pulled accurately for 10 days. Is this correct behaviour?

I also noticed this error in the logs, even when the sync succeeds and pulls incremental data successfully
source > ERROR debezium-sqlserverconnector-change-event-source-coordinator i.d.p.ErrorHandler(setProducerThrowable):52 Producer failure io.debezium.DebeziumException: java.util.concurrent.ExecutionException: java.lang.InterruptedException: Interrupted while snapshotting table student The logs also show that the debezium engine was stopped for some reason before hitting this error.

The ``snapshot_completed":true}` is true in the offset file. Does it mean that the snapshot was completed successfully after failing initially. Or the snapshot offset not changing is a result of the above error that the snapshotting was interrupted?

Any pointers in this regard are appreciated. Thanks! <@U01MMSDJGC9> <@U01AB6V6NMQ>

Topic		Replies	Views
Understanding MSSQL CDC Offset and Debezium Snapshotting Connector Questions connector , question , sync-error , mssql-cdc-offset , debezium-snapshotting	0	225	June 30, 2024
Handling schema drift in Airbyte for CDC data replication from MSSQL to Snowflake Connector Questions data-loss , refresh , connector , question , mssql	0	25	September 11, 2024
Error syncing tables from MS SQL Server to Snowflake with CDC enabled Connector Questions connector , error , question , snowflake , ms-sql-server	1	132	July 20, 2024
Issue with MSSQL Connector and Initial Snapshot LSN Check Connector Questions connector , github-issue , bug , mssql-connector , incremental-syncs	0	9	November 19, 2024
Regarding CDC option for Replication Method (MSSQL Server connector) Connector Questions & Issues connectors	11	1311	April 26, 2022

Error with CDC in MS SQL Server to Snowflake data integration

Summary

Question

Related topics