Regarding CDC option for Replication Method (MSSQL Server connector)

Hi Team, we are using airbyte 0.35.60-alpha version running in kubernetes, we have a question from our POC on Airbyte for data ingestion from mssql server to snowflake datawareshouse.

InAirbyte mssql server source connection, For replication method, we have the following description: “Replication Method - The replication method used for extracting data from the database. STANDARD replication requires no setup on the DB side but will not be able to represent deletions incrementally. CDC uses {TBC} to detect inserts, updates, and deletes. This needs to be configured on the source database itself.” **

When we tried CDC option from above for replication method,** airbyte has thrown error “MSSQL requires snapshot isolation enabled for CDC” error.

However we have gone through in detail the usage of snapshot isolation configuration for mssql server, and we have below question:

  1. Why do we need to enable the “ALLOW_SNAPSHOT_ISOLATION” db option on the mssql server databases? This is not a requirement of CDC in mssql server.

Can anyone clarify the above use of “ALLOW_SNAPSHOT_ISOLATION” db option on the mssql server database? (since we dont want to enable this in mssql server unless we understand the intended use of “ALLOW_SNAPSHOT_ISOLATION” db option in capturing CDC data)

Thanks,
Kiran

Please read this article about the topic you’re curious about: What, when and who? Auditing 101 - Part 2 – SQLServerCentral

Thanks Marco, I have gone through the article you have sent.
It was mentioned, that SNAPSHOT_ISOLATION is only advisable and not mandatory requirement for CDC.

We are confident because, we are using from last 3 years CDC enabled on MSSQL in production with snapshot isolation turned off and there were no issues upto date.

Based on the above article(that you have provided) and analysis, can we have changes in the airbyte, that will allow us to proceed further in setting up connection for CDC without snapshot isolation turned on, as it is not mandatory requirement.

Currently in airbyte, we are getting error that “MSSQL” required snapshot isolation turned on and it is error out when setting up connection.

Is it possible on airbyte not to force this requirement (since it is not mandatory as you can see) and it leave up to user to decide on later to enable/disable?

We are confident that the above should work. Please let me know if you need any details.

Thanks,
Kiran

Hi Team,

As mentioned above, can we make have SNAPSHOT_ISOLATION as optional parameter rather than mandatory?

Moreover, we are also using ReadReplica can’t block things btw

So technically blocking(Isolation levels/Optimistic Locking) is nolonger a factor

Can you please check this.

Thanks,
Kiran

Sorry the long delay in @kiran I’ll return tomorrow

Thanks marcos.

Mainly we are looking for having SNAPSHOT_ISOLATION as optional parameter when doing the validation when creating the source mssql connection.

Hi Marcos, sorry for asking again, just wanted to check on this?

Sorry the long delay @kiran
Maybe it’s possible to use CDC without Snapshot Isolation but the way the CDC mode is build in Airbyte you must turn it on. The reason is to not block your tables when doing the initial reading.

Thanks marcos for the reply. Really appreciate responding to my concerns.

I will test and let you know more details.

Thanks,
Kiran

It seems like in the mssql connector, “cdc” schema is excluded.

public Set getExcludedInternalNameSpaces() {
return Set.of(
“INFORMATION_SCHEMA”,
“sys”,
“spt_fallback_db”,
“spt_monitor”,
“spt_values”,
“spt_fallback_usg”,
“MSreplication_options”,
“spt_fallback_dev”,
“cdc”); // is this actually ok? what if the user wants cdc schema for some reason?
}

Hi marcos,

Folowing up from earlier converstaion, actually we want the cdc schema tables in our case for incremental append.

Any reason why these are excluded? and if yes, can this be changed be include "cdc schema.

Thanks,
Kiran

@kiran can you open a Github issue requesting it and explaining your use case? This will be important to us understand better the reason to allow this internal schema be used.

Thanks marcos for the reply. I have raised above issue in github.