CDC Mechanism in MongoDB Connector

slack-user-airbyte · July 12, 2024, 6:11am

Summary

The user is inquiring about the behavior of the CDC mechanism in the MongoDB connector, specifically regarding the handling of deleted records from the source.

Question

Hello all,
We recently switched to the latest MongoDB connector, and because of our data volume, CDC is the only suitable ingestion mode.
However, I’m surprised to read that the CDC mechanism doesn’t yet flag records deleted in the source and just deletes them in the destination. I thought that was one of the main benefits of CDC.

Am I missing anything (setup, config, etc.), or is this correct?

> Based on the information provided in the knowledge sources, Airbyte currently removes deleted rows from the final table in the destination as part of the deduplication process for CDC database sources. This means that if a row is deleted in your source (like MongoDB), it will not be present in the final table in your destination (like Redshift).

This topic has been created from a Slack thread to give it more visibility.

Tags: mongodb-connector, cdc-mechanism, deleted-records, ingestion-mode, deduplication

slack-user-airbyte · July 14, 2024, 6:17am

Is the quoted answer from kapa.ai?

We use CDC incremental append to sync from MySQL/Postgres, and with that we keep track of all changes, including deletions.

slack-user-airbyte · July 14, 2024, 6:17am

the deletion flag is _ab_cdc_deleted_at
https://docs.airbyte.com/understanding-airbyte/cdc/#syncing

slack-user-airbyte · July 14, 2024, 6:17am

Thanks for your answer. Yes, correct, this is the answer I received from kapa.ai.

We currently use the incremental sync mode “append + deduped”, and I noticed in the log that Airbyte removes from the destination the records that were deleted in the source:

2024-07-11 14:09:01 destination > INFO type-and-dedupe i.a.c.d.j.JdbcDatabase(executeWithinTransaction$lambda$1):48 done executing query within transaction: delete from "raw_xxxx"."xxxx_xxxx" where "_ab_cdc_deleted_at" is not null;

Maybe if we switch to sync mode "append" (without deduped), we would keep the deleted records, with "_ab_cdc_deleted_at" not null? But without deduped, I'm worried that we'll end up with a lot of duplicated records.

slack-user-airbyte · July 14, 2024, 6:17am

> Maybe if we switch to sync mode “append” (without deduped), we would have the deleted records with “_ab_cdc_deleted_at” not null ?

Exactly, this is what we get.

> But without deduped, I’m worried that we’ll end up with a lot of duplicated records.

There are some duplicated and historical records, but in our case we can deduplicate them according to our needs with a transformation tool (e.g. dbt). For us, it’s important to keep track of the changes rather than only getting snapshots.

So when we want the current state of the table, we deduplicate and keep only the latest state of each row, with something like partition by primary_key(s) order by _ab_cdc_log_file desc, _ab_cdc_log_pos desc, _ab_cdc_updated_at desc. Might this also work for you?
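
For readers who want to try the idea, here is a minimal sketch of that "latest state per primary key" deduplication, run against an in-memory SQLite table so it is self-contained. The table name `orders_raw`, the primary key `id`, the `status` column, and the sample rows are all hypothetical. The ordering columns are the ones named in the reply above, which describes a MySQL/Postgres setup; other sources may expose different CDC columns, so treat the `ORDER BY` as an assumption to adapt. In practice the same query shape would typically live in a dbt model rather than in Python.

```python
import sqlite3

# In-memory stand-in for a destination table loaded in "append" mode.
# Table name, primary key and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders_raw (
        id INTEGER,
        status TEXT,
        _ab_cdc_log_file TEXT,
        _ab_cdc_log_pos INTEGER,
        _ab_cdc_updated_at TEXT,
        _ab_cdc_deleted_at TEXT
    )
    """
)
rows = [
    # id 1: inserted, then updated
    (1, "new", "binlog.000001", 100, "2024-07-10T10:00:00Z", None),
    (1, "paid", "binlog.000001", 250, "2024-07-10T11:00:00Z", None),
    # id 2: inserted, then hard-deleted in the source
    (2, "new", "binlog.000001", 300, "2024-07-10T12:00:00Z", None),
    (2, "new", "binlog.000002", 40, "2024-07-11T09:00:00Z", "2024-07-11T09:00:00Z"),
]
conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?, ?, ?, ?)", rows)

# Rank the change records of each primary key from newest to oldest
# and keep only the newest one (rn = 1).
LATEST_STATE_SQL = """
    SELECT id, status, _ab_cdc_deleted_at
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY id
                   ORDER BY _ab_cdc_log_file DESC,
                            _ab_cdc_log_pos DESC,
                            _ab_cdc_updated_at DESC
               ) AS rn
        FROM orders_raw
    )
    WHERE rn = 1
    ORDER BY id
"""
latest = conn.execute(LATEST_STATE_SQL).fetchall()

# Rows whose latest change is not a delete form the current state;
# the others are records that were deleted in the source.
current = [(pk, status) for pk, status, deleted_at in latest if deleted_at is None]
deleted_ids = [pk for pk, _status, deleted_at in latest if deleted_at is not None]

print(current)
print(deleted_ids)
```

Keeping the deleted keys in a separate list (or a flag column) preserves the deletion history, while `current` matches what a deduped table would contain.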

slack-user-airbyte · July 14, 2024, 6:17am

Thank you very much!

> might this also work for you…?
I’m not sure that would work for us: the scope of data is large, with very frequent changes, while hard-deleted records in the source should be quite rare. We’ll investigate. Thank you very much for your feedback!
