What is the best practice regarding mapping data between the source and destination?

pinglin · July 31, 2022, 2:30am

Hello team,

I am wondering what the best practice is for mapping data between the source and destination. I know Airbyte destination connectors generally have the _airbyte_ab_id uuid in the destination db table, but how do I use that uuid to find the original record in the source? Thanks for any tips.

sajarin · August 2, 2022, 3:12pm

Hey @pinglin, thanks for the question. The _airbyte_ab_id uuid in the destination table is a metadata column created during the destination normalization process. According to the docs (Basic Normalization | Airbyte Documentation), it’s just a random uuid and cannot be used to find the original record in the source.

What connector are you using for the source and does the data in your source use primary keys for the records?

pinglin · August 15, 2022, 4:21pm

Hey @sajarin, thanks for the follow-up.

We are looking for a general approach here that can apply to all Airbyte destination connectors.

Context: We are the team behind VDP. VDP is an unstructured visual data ETL adopting Airbyte Protocol and integrating Airbyte’s destination connectors for its pipeline.

As the AirbyteMessage/AirbyteRecordMessage data to write to the destination is formed on the fly (i.e., as described in this section), we will need to code the data association logic into VDP’s backend codebase. In other words, there is no primary key for the records in this context. Nonetheless, we can still generate some sort of unique index for each record injected from the source (i.e., the input image).
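
To make the idea above concrete, here is a rough sketch of what such a unique index could look like. It is only an illustration under stated assumptions, not VDP's or Airbyte's actual implementation: the function names, the `source_index` field name, and the choice of a SHA-256 content hash are all hypothetical. The only part taken from the Airbyte Protocol is the general shape of a RECORD message (`type`, and `record` with `stream`, `data`, and `emitted_at` in epoch milliseconds).

```python
import hashlib
import json
import time


def make_source_index(image_bytes: bytes) -> str:
    """Derive a deterministic index from the input content.

    Assumption: a SHA-256 hex digest of the raw bytes is an acceptable
    identifier. The same bytes always yield the same index.
    """
    return hashlib.sha256(image_bytes).hexdigest()


def build_record_message(stream: str, source_index: str, payload: dict) -> dict:
    """Build an AirbyteMessage of type RECORD that carries the index.

    `source_index` is a hypothetical field name; it is placed inside `data`
    so that it is written to the destination alongside the payload.
    """
    data = dict(payload)  # copy so the caller's dict is not mutated
    data["source_index"] = source_index
    return {
        "type": "RECORD",
        "record": {
            "stream": stream,
            "data": data,
            "emitted_at": int(time.time() * 1000),  # epoch milliseconds
        },
    }


if __name__ == "__main__":
    image = b"raw image bytes go here"  # placeholder input
    message = build_record_message(
        "detections",  # hypothetical stream name
        make_source_index(image),
        {"label": "dog", "score": 0.97},
    )
    # A destination connector reads messages as JSON lines on stdin.
    print(json.dumps(message))
```

Because the index travels inside `data`, it ends up in the destination next to whatever metadata the connector adds, so a row can be traced back to its input without relying on _airbyte_ab_id. One trade-off of a content hash is that two identical inputs get the same index; if each submission must be distinguishable, a per-request identifier generated by the backend would be needed instead.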

So far, the Airbyte integration for VDP has been implemented entirely by ourselves, by referring to the Airbyte documentation and tracing the source code. It would be nice to discuss this further with Airbyte engineers to make sure we are not doing something ineffective.

Thanks.

sajarin · August 17, 2022, 5:17pm

Hey @pinglin,

Thanks for the additional context and thanks for using Airbyte in your project. My recommendation is to post your questions here in the forums, and our team will help answer them or redirect them to those who can. Alternatively, you can create issues in our GitHub repo, but it may take longer to get a response there.
