How to work with metadata and Airbyte

Hi!
My organization and I are new to Airbyte, and we are currently investigating which data integration tool to use. A few questions remain unanswered regarding Airbyte, specifically around metadata.
We will use dbt on Azure, and right now our only option is to stage the data in Azure Blob Storage as CSV or JSON. We hope the Databricks connector gets Azure support soon, which would be a better option with regard to metadata.
Our questions are:

  • We would like to automate the creation of dbt source definitions. Airbyte has this information, but how can we get hold of it to pass to dbt (i.e., generate the YAML files)?
  • Can Airbyte notify us about schema drift?
  • Are new columns/fields added to streams automatically?
  • Is the removal or renaming of columns/fields propagated to streams automatically, or does the job fail?

Singer.io can discover the schema of a source and generate a schema definition in JSON Schema format. Is this something Airbyte can do today, or is it on the roadmap?

Thank you for your input!

Hey, thanks a lot for reaching out.

  1. We generate the dbt project automatically, and we also offer a custom dbt option, so I'd recommend switching to custom dbt once the generated one has run successfully.
  2. We don't have this right now; I'd suggest creating a GitHub issue for it.
  3. We don't do that automatically, but there is an API you can use to discover the schema and then update the connection (see the sketch after this list).
  4. Same as above: I'd suggest using the API to update the columns/fields, and ideally it shouldn't fail the sync.
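
To make points 1 and 3 concrete, here's a rough sketch of calling the Configuration API to discover a source's schema and writing the resulting catalog out as a dbt sources.yml. The host, source id, and the dbt source/schema names are placeholders for a local Docker deployment; double-check paths and response shape against your instance:

```python
# Sketch: discover a source's schema via the Airbyte Configuration API and
# emit a dbt sources.yml. Host, sourceId, and schema names are placeholders.
import requests
import yaml

AIRBYTE_URL = "http://localhost:8000/api/v1"  # assumption: local Docker deploy
SOURCE_ID = "<your-source-id>"                # from the Airbyte UI or API

# POST /sources/discover_schema returns the catalog Airbyte discovered.
resp = requests.post(
    f"{AIRBYTE_URL}/sources/discover_schema",
    json={"sourceId": SOURCE_ID},
)
resp.raise_for_status()
catalog = resp.json()["catalog"]

# Map each discovered stream to a dbt source table, keeping the column
# names from the stream's JSON Schema as documented columns.
tables = []
for entry in catalog["streams"]:
    stream = entry["stream"]
    columns = [
        {"name": col}
        for col in stream.get("jsonSchema", {}).get("properties", {})
    ]
    tables.append({"name": stream["name"], "columns": columns})

sources_yml = {
    "version": 2,
    "sources": [
        {
            "name": "airbyte_raw",  # placeholder dbt source name
            "schema": "my_schema",  # placeholder target schema
            "tables": tables,
        }
    ],
}

with open("models/sources.yml", "w") as f:
    yaml.safe_dump(sources_yml, f, sort_keys=False)
```

You could run something like this as a pre-step in your orchestration so the dbt sources stay in sync with what Airbyte discovers.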

Thanks!

On 1, are you suggesting using a custom dbt project for a connection/destination? I think we would like to run dbt separately from Airbyte, orchestrated with a tool like Prefect. What are the pros and cons of letting Airbyte run the transformation? I'm not even sure that would be possible, since we will use Databricks as our lakehouse.
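
For context, the pattern we have in mind is roughly: trigger the Airbyte sync over its API, wait for the job to finish, then run dbt ourselves. A rough sketch (endpoints and IDs are assumptions for a local Airbyte deployment; in practice each function would be wrapped as a Prefect task):

```python
# Sketch: run dbt outside Airbyte, triggered after a sync completes.
import subprocess
import time

import requests

AIRBYTE_URL = "http://localhost:8000/api/v1"  # assumption: local Docker deploy
CONNECTION_ID = "<your-connection-id>"


def trigger_sync() -> int:
    """Kick off a manual sync and return the Airbyte job id."""
    resp = requests.post(
        f"{AIRBYTE_URL}/connections/sync",
        json={"connectionId": CONNECTION_ID},
    )
    resp.raise_for_status()
    return resp.json()["job"]["id"]


def wait_for_job(job_id: int, poll_seconds: int = 30) -> None:
    """Poll the job until it reaches a terminal status."""
    while True:
        resp = requests.post(f"{AIRBYTE_URL}/jobs/get", json={"id": job_id})
        resp.raise_for_status()
        status = resp.json()["job"]["status"]
        if status == "succeeded":
            return
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"Airbyte job {job_id} ended with status {status}")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    wait_for_job(trigger_sync())
    # Run dbt against the staged data once the sync has landed it.
    subprocess.run(["dbt", "run"], check=True)
```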

After reading the documentation here I realized that our destination (Azure Blob Storage) doesn't support normalization…

I also think it might be possible to extract the source and/or destination schema with the octavia CLI. That would require us to transform the configuration.yaml into a dbt source YAML. It would be cool if octavia cli could do that for us (a rough idea of the transform is sketched below).
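
Something along these lines is what I mean. I'm assuming the connection's configuration.yaml exposes the sync catalog under configuration.sync_catalog.streams; the exact key names should be verified against the file octavia actually writes:

```python
# Sketch: turn an octavia-generated connection configuration.yaml into a dbt
# sources.yml. The sync_catalog path is an assumption -- check the real file.
import yaml

with open("configuration.yaml") as f:
    config = yaml.safe_load(f)

streams = config["configuration"]["sync_catalog"]["streams"]

# One dbt source table per stream, with columns taken from the JSON Schema.
tables = []
for entry in streams:
    stream = entry["stream"]
    tables.append(
        {
            "name": stream["name"],
            "columns": [
                {"name": col}
                for col in stream.get("json_schema", {}).get("properties", {})
            ],
        }
    )

dbt_sources = {
    "version": 2,
    "sources": [
        {
            "name": "airbyte",  # placeholder dbt source name
            "schema": "raw",    # placeholder schema for the staged data
            "tables": tables,
        }
    ],
}

with open("models/sources.yml", "w") as f:
    yaml.safe_dump(dbt_sources, f, sort_keys=False)
```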

We run both custom dbt and normalization on the Airbyte platform, so you can go ahead and explore that option. On the octavia CLI part, feel free to create a GitHub issue so the team can check and get back to you.
