Summary
Inquiring whether data normalization is supported in the Airbyte OSS version or is an Enterprise-only feature.
Question
Does Airbyte OSS support normalization of data? I have an API that outputs several fields as JSON, which I want to flatten and normalize. I searched online and saw that this appears to be possible, but I cannot find the option in the OSS version.
Is this an Enterprise-only feature?
Thanks
This topic has been created from a Slack thread to give it more visibility.
["airbyte-oss", "data-normalization", "enterprise-feature", "api", "json"]
Previously there was a normalization option (which used a built-in dbt run), but it was removed. It was very flaky, and the dbt jobs were slow and created challenges, so it was dropped as part of Destinations V2 in favor of the simpler Typing & Deduping.
In Airbyte Cloud, there’s a dbt Cloud integration so you can trigger your jobs after a sync completes. But in OSS, you either need to use your own orchestration tools (e.g. Dagster, Airflow, etc.)—or your connector needs to handle mapping the fields to the top level (which is a bit manual, but in Builder you can do this through Transformations by pulling the fields out one at a time and then deleting the nested object).
It really depends on how you want to tackle the problem. It may seem counterintuitive, but I can assure you that normalization going away was actually a good thing.
The Connector Builder in the OSS version supports flattening the data, for example when an API returns a single block containing multiple records.
Yes, that works at the top level. But if there are records below that first level that you want pulled out to the top level (instead of being returned as nested JSON objects), you need to use Transformations to map them and then delete the nested object (also with a Transformation).
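To make the "map them, then delete the nested object" idea concrete, here is a minimal Python sketch of what those two steps do to a single record. This is not Builder or Airbyte code; the function name `promote_nested`, the sample record, and the `key_field` naming convention are all assumptions made for illustration.

```python
from typing import Any, Dict


def promote_nested(record: Dict[str, Any], key: str) -> Dict[str, Any]:
    """Copy each field of record[key] to the top level, then drop record[key]."""
    result = dict(record)  # shallow copy so the input record is not mutated
    nested = result.pop(key, None)
    if not isinstance(nested, dict):
        # Nothing to promote: put the original value back if the key existed.
        if key in record:
            result[key] = nested
        return result
    for name, value in nested.items():
        # Prefix with the parent key (an assumed convention) to avoid
        # overwriting an existing top-level field with the same name.
        result[f"{key}_{name}"] = value
    return result


record = {"id": 1, "address": {"city": "Lisbon", "zip": "1000"}}
flat = promote_nested(record, "address")
print(flat)
```

In Builder, each copied field would be its own Transformation, and the final removal of `address` would be one more.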
So how are people flattening multi-level JSON objects? Does it mean I need to build that outside of Airbyte using other tools and orchestration? That makes the solution a bit weak for such cases, especially ones that rely on complex APIs, I guess.
In my experience, the vast majority of people are already feeding Airbyte data to downstream modeling layers (e.g. dbt) in their pipelines. These are obviously purpose-built for the task, but Builder provides some light utility through Transformations for simple cases. They’re also working to add drop-in dbt models for common sources to get you to a sane flat structure without having to write it yourself.
I do think there are times where it makes sense not to add that complexity, so I personally hope they expand Transformations in Builder to handle more complex operations without things getting too verbose (e.g. “take this whole object and expand it at the top level” vs. having to do it one field at a time). It’s much easier to do this in the CDK, where you’re really controlling your structure much more deeply by default.
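For the multi-level case, the kind of "expand the whole object" logic you could write yourself (in a custom connector or in a step after the sync) might look like the sketch below. It is plain Python and does not use any CDK API; the function name `flatten`, the `_` separator, and the sample record are assumptions for illustration.

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Recursively lift nested dict fields to the top level with prefixed keys."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict) and value:
            # Non-empty nested object: recurse and merge its flattened fields.
            flat.update(flatten(value, name, sep))
        else:
            # Scalars, lists, and empty dicts are kept as-is.
            flat[name] = value
    return flat


record = {
    "id": 7,
    "owner": {"name": "Ana", "contact": {"email": "ana@example.com"}},
    "tags": ["a", "b"],
}
flat_record = flatten(record)
print(flat_record)
```

Lists are deliberately left untouched here; whether to explode them into rows or keep them as JSON is a modeling decision that this sketch does not try to make.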
I’d suggest you look and see if there’s a feature request in that vein and vote for it, and if not create one. Sometimes their team just needs to understand the common use cases that people have, and it’s a great way to contribute to the project and make it better!
What is Builder? Apologies if this is a basic question, but I want to take a peek. My scenario is not traditional BI ETL. I want to use Airbyte as a tool that simply moves data from A to B without any business logic, but it should handle needs such as flattening structures when required.
Connector Builder (https://docs.airbyte.com/connector-development/connector-builder-ui/overview) is a no-code UI wrapper in the Airbyte UI (what it outputs is a low-code, YAML-based connector definition). You can also make low-code (declarative manifest) connectors directly, or using the Connector Development Kit (CDK) in your preferred language. Builder/Low-Code have several advantages in terms of maintainability and testing/authoring, so I’d recommend starting there.
Within Connector Builder, you have options to expand fields using Transformations (https://docs.airbyte.com/connector-development/connector-builder-ui/record-processing#transformations), but these generally have to be specified one field at a time. So if you have a nested object with 5 fields, you’d need to add 6 transformations to bring them to the top (5 adding the nested fields to the top level, plus 1 to delete the original nested copy). Again, not ideal, but not so bad. Just a pain when it’s 50 or 100 or you need those top-level fields to be dynamic across multiple connections (e.g. a CRM with custom properties).
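If typing out dozens of these by hand is the pain point, one option is to generate the list with a small script and paste the result into the YAML manifest. The sketch below assumes the low-code manifest's `AddFields` / `RemoveFields` shapes as I understand them (`fields` with `path` and `value`, and `field_pointers`); please check those against the current docs before relying on it. The helper name `build_transformations` and the `properties` object with fields `a` to `e` are hypothetical.

```python
from typing import Any, Dict, List


def build_transformations(nested_key: str, field_names: List[str]) -> List[Dict[str, Any]]:
    """Build one add-field step per nested field, plus one step removing the parent."""
    transformations: List[Dict[str, Any]] = []
    for name in field_names:
        transformations.append(
            {
                "type": "AddFields",  # assumed manifest component name
                "fields": [
                    {
                        "path": [name],
                        # Interpolation string that reads the nested value.
                        "value": "{{ record['" + nested_key + "']['" + name + "'] }}",
                    }
                ],
            }
        )
    # Final step: drop the original nested object.
    transformations.append({"type": "RemoveFields", "field_pointers": [[nested_key]]})
    return transformations


steps = build_transformations("properties", ["a", "b", "c", "d", "e"])
print(len(steps))
```

For the 5-field example above this yields 6 entries, matching the count described. It does not solve the dynamic-fields case (e.g. custom CRM properties), since the field names still have to be known when the list is generated.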
Thanks for the explanation!