What is the best practice regarding mapping data between the source and destination?

Hello team,

I am wondering what the best practice is for mapping data between the source and destination. I know Airbyte destination connectors generally add the _airbyte_ab_id uuid column to the destination db table, but how do I use that uuid to find the original record in the source? Thanks for any tips.

Hey @pinglin, thanks for the question. The _airbyte_ab_id uuid in the destination table is a metadata column created during the destination normalization process. According to the docs (Basic Normalization | Airbyte Documentation), it's just a random uuid and cannot be used to find the original record in the source.

What connector are you using for the source and does the data in your source use primary keys for the records?

Hey @sajarin, thanks for the follow-up.

We are looking for a general approach here that can apply to all Airbyte destination connectors.

Context: We are the team behind VDP. VDP is an ETL tool for unstructured visual data that adopts the Airbyte Protocol and integrates Airbyte's destination connectors into its pipeline.

As the AirbyteMessage/AirbyteRecordMessage data to write to the destination is formed on the fly (i.e., as described in this section), we will need to code the data association logic into VDP’s backend codebase. In other words, there is no primary key for the records in this context. Nonetheless, we can still generate some sort of unique index for each record injected from the source (i.e., the input image).
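
To make the idea concrete, here is a minimal sketch of what we have in mind, not a confirmed Airbyte pattern. It assumes the record ID is a SHA-256 hash of the input image bytes and that it travels inside the record's data under a column we call source_record_id. That column name and both helper functions are hypothetical. Only the stream, data, and emitted_at fields come from the AirbyteRecordMessage definition.

```python
import hashlib
import json
import time


def source_record_id(image_bytes: bytes) -> str:
    """Derive a deterministic ID from the raw input (assumption: a content hash)."""
    return hashlib.sha256(image_bytes).hexdigest()


def make_record_message(stream: str, image_bytes: bytes, payload: dict) -> dict:
    """Build an AirbyteMessage of type RECORD that carries the source-side ID."""
    data = dict(payload)  # copy so the caller's dict is not mutated
    # Hypothetical column name; pick one that does not clash with your schema.
    data["source_record_id"] = source_record_id(image_bytes)
    return {
        "type": "RECORD",
        "record": {
            "stream": stream,
            "data": data,
            "emitted_at": int(time.time() * 1000),  # epoch milliseconds
        },
    }


if __name__ == "__main__":
    msg = make_record_message("detections", b"raw image bytes", {"label": "cat"})
    print(json.dumps(msg))
```

Because the ID is part of the data, it should land in the destination table as a regular column, so a row can be traced back to its input without relying on _airbyte_ab_id. One caveat of a pure content hash is that two identical images get the same ID; mixing in a pipeline or request identifier would avoid that if it matters.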

So far, we have implemented the Airbyte integration for VDP entirely on our own by referring to the Airbyte documentation and tracing the source code. It would be nice to discuss this further with Airbyte engineers to make sure we are not doing something ineffective.

Thanks.

Hey @pinglin,

Thanks for the additional context and thanks for using Airbyte in your project. My recommendation is to post your questions here in the forums, and our team will help answer them or redirect them to those who can. Alternatively, you can create issues in our GitHub repo, but it may take longer to get a response there.