How to work with metadata and Airbyte

Hi!
My organization and I are new to Airbyte, and we are currently investigating which data integration tool to use. A few questions remain unanswered regarding Airbyte, specifically around metadata.
We will use dbt on Azure, and right now our only option is to stage the data in Azure Blob Storage as CSV or JSON. We hope the Databricks connector gets Azure support soon, which would be a better option with regard to metadata.
Our questions are:

  • We would like to automate the creation of dbt source definitions. Airbyte has this information, but how can we get hold of it to pass to dbt (i.e., generate the YAML files)?
  • Can Airbyte notify us about schema drift?
  • Are new columns/fields added to streams automatically?
  • Is the removal or renaming of columns/fields propagated to streams automatically, or does the job fail?

Singer.io can discover the schema of a source and generate a schema definition in JSON Schema format. Is this something Airbyte can do today, or is it on the roadmap?

Thank you for your input!

Hey, thanks a lot for reaching out.

  1. We generate the dbt project automatically, and we also offer a custom dbt option, so I'd recommend switching to custom dbt once the generated one has run successfully.
  2. We don't have this right now; I'd suggest creating a GitHub issue for it.
  3. We don't do that automatically, but there is an API you can use to discover the schema and then update the connection (see the sketch after this list).
  4. Same as above: I'd suggest using the API to update the columns/fields, and ideally it shouldn't fail the sync.
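
To make points 1 and 3 concrete, here's a rough sketch of calling the Configuration API to discover a source's schema and writing the resulting catalog out as a dbt sources.yml. The host, source id, and the dbt source/schema names are placeholders for a local Docker deployment; double-check paths and response shape against your instance:

```python
# Sketch: discover a source's schema via the Airbyte Configuration API and
# emit a dbt sources.yml. Host, sourceId, and schema names are placeholders.
import requests
import yaml

AIRBYTE_URL = "http://localhost:8000/api/v1"  # assumption: local Docker deploy
SOURCE_ID = "<your-source-id>"                # from the Airbyte UI or API

# POST /sources/discover_schema returns the catalog Airbyte discovered.
resp = requests.post(
    f"{AIRBYTE_URL}/sources/discover_schema",
    json={"sourceId": SOURCE_ID},
)
resp.raise_for_status()
catalog = resp.json()["catalog"]

# Map each discovered stream to a dbt source table, keeping the column
# names from the stream's JSON Schema as documented columns.
tables = []
for entry in catalog["streams"]:
    stream = entry["stream"]
    columns = [
        {"name": col}
        for col in stream.get("jsonSchema", {}).get("properties", {})
    ]
    tables.append({"name": stream["name"], "columns": columns})

sources_yml = {
    "version": 2,
    "sources": [
        {
            "name": "airbyte_raw",  # placeholder dbt source name
            "schema": "my_schema",  # placeholder target schema
            "tables": tables,
        }
    ],
}

with open("models/sources.yml", "w") as f:
    yaml.safe_dump(sources_yml, f, sort_keys=False)
```

You could run something like this as a pre-step in your orchestration so the dbt sources stay in sync with what Airbyte discovers.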

Thanks!

On 1, are you suggesting using a custom dbt project for a connection/destination? I think we would like to run dbt separately from Airbyte, orchestrated with a tool like Prefect. What are the pros and cons of letting Airbyte run the transformation? I'm not even sure that would be possible, since we will use Databricks as our lakehouse.
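
For context, the pattern we have in mind is roughly: trigger the Airbyte sync over its API, wait for the job to finish, then run dbt ourselves. A rough sketch (endpoints and IDs are assumptions for a local Airbyte deployment; in practice each function would be wrapped as a Prefect task):

```python
# Sketch: run dbt outside Airbyte, triggered after a sync completes.
import subprocess
import time

import requests

AIRBYTE_URL = "http://localhost:8000/api/v1"  # assumption: local Docker deploy
CONNECTION_ID = "<your-connection-id>"


def trigger_sync() -> int:
    """Kick off a manual sync and return the Airbyte job id."""
    resp = requests.post(
        f"{AIRBYTE_URL}/connections/sync",
        json={"connectionId": CONNECTION_ID},
    )
    resp.raise_for_status()
    return resp.json()["job"]["id"]


def wait_for_job(job_id: int, poll_seconds: int = 30) -> None:
    """Poll the job until it reaches a terminal status."""
    while True:
        resp = requests.post(f"{AIRBYTE_URL}/jobs/get", json={"id": job_id})
        resp.raise_for_status()
        status = resp.json()["job"]["status"]
        if status == "succeeded":
            return
        if status in ("failed", "cancelled"):
            raise RuntimeError(f"Airbyte job {job_id} ended with status {status}")
        time.sleep(poll_seconds)


if __name__ == "__main__":
    wait_for_job(trigger_sync())
    # Run dbt against the staged data once the sync has landed it.
    subprocess.run(["dbt", "run"], check=True)
```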

After reading the documentation here I realized that our destination (Azure Blob Storage) doesn't support normalization…

I also think it might be possible to extract the source and/or destination schema with the octavia CLI. That would require us to transform the configuration.yaml into a dbt source YAML. It would be cool if octavia cli could do that for us (a rough idea of the transform is sketched below).
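
Something along these lines is what I mean. I'm assuming the connection's configuration.yaml exposes the sync catalog under configuration.sync_catalog.streams; the exact key names should be verified against the file octavia actually writes:

```python
# Sketch: turn an octavia-generated connection configuration.yaml into a dbt
# sources.yml. The sync_catalog path is an assumption -- check the real file.
import yaml

with open("configuration.yaml") as f:
    config = yaml.safe_load(f)

streams = config["configuration"]["sync_catalog"]["streams"]

# One dbt source table per stream, with columns taken from the JSON Schema.
tables = []
for entry in streams:
    stream = entry["stream"]
    tables.append(
        {
            "name": stream["name"],
            "columns": [
                {"name": col}
                for col in stream.get("json_schema", {}).get("properties", {})
            ],
        }
    )

dbt_sources = {
    "version": 2,
    "sources": [
        {
            "name": "airbyte",  # placeholder dbt source name
            "schema": "raw",    # placeholder schema for the staged data
            "tables": tables,
        }
    ],
}

with open("models/sources.yml", "w") as f:
    yaml.safe_dump(dbt_sources, f, sort_keys=False)
```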

We run both custom dbt and normalization on the Airbyte platform, so you can go ahead and explore that option. On the octavia CLI part, feel free to create a GitHub issue so the team can check and get back to you.
