Automatic Schema Propagation in Airbyte Builder for API Source

Summary

How can automatic schema propagation be enabled for an API source stream built in Airbyte Builder when a new field is added to the source API?


Question

Hi all, I built a custom source via Airbyte Builder that fetches data from an API. The stream was created with automatic schema detection, for example (a, b, c), and published as a source. The connection has been running fine for some time.
After some days, a new field is added in the source API. Do we need to publish a new release of this stream via Builder with the new schema every time?
How can I enable automatic propagation of all schema changes from the API to the source published via Builder?



This topic has been created from a Slack thread to give it more visibility.

["automatic-schema-propagation", "airbyte-builder", "api-source", "schema-changes"]

Those settings apply to sources with dynamic schemas, like our database sources - ones which upon discover will return the latest schema from the source. Most api sources and all builder sources have source-defined schemas, which means they output a static catalog that is versioned with the source version.

This isn’t supported in the builder yet, as we’re mostly focused on the static api use case at the moment

<@U047ANT3J84> Is there any way I can do it programmatically via code or via airbyte API?

Thanks for the input <@U035912NS77>! Would you be willing to write up a feature request github issue? It’d be good to have this somewhere. I’ll admit I’m not sure how big of a lift it would be.

<@U052BHR0NJY> You can do this programmatically with a non-builder custom connector. I’d recommend looking at the discover methods of some connectors with similar behavior, like Airtable: https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-airtable/source_airtable/source.py#L69
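To make the "dynamic discover" idea a bit more concrete, here is a minimal sketch of building a JSON schema at discover time from field metadata instead of shipping a static one. It is not taken from the Airtable connector: the `fields` input shape, the `TYPE_MAP` entries, and the `build_json_schema` name are all assumptions for illustration. In a non-builder Python connector, something like this would presumably feed the schema the stream reports on discover.

```python
import copy
from typing import Any, Dict, List

# Hypothetical mapping from an API's own field types to JSON Schema fragments.
TYPE_MAP: Dict[str, Dict[str, Any]] = {
    "text": {"type": ["null", "string"]},
    "number": {"type": ["null", "number"]},
    "checkbox": {"type": ["null", "boolean"]},
    "date": {"type": ["null", "string"], "format": "date"},
}

# Fallback for field types the mapping does not know about.
DEFAULT_TYPE: Dict[str, Any] = {"type": ["null", "string"]}


def build_json_schema(fields: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build a JSON schema from field metadata.

    `fields` is assumed to look like [{"name": "a", "type": "text"}, ...],
    as it might come back from a (hypothetical) metadata endpoint.
    """
    properties: Dict[str, Any] = {}
    for field in fields:
        fragment = TYPE_MAP.get(field["type"], DEFAULT_TYPE)
        # Deep copy so callers can mutate the result without touching TYPE_MAP.
        properties[field["name"]] = copy.deepcopy(fragment)
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "additionalProperties": True,
        "properties": properties,
    }
```

Because the schema is rebuilt on every discover, a field added upstream shows up the next time the connection's schema is refreshed, without a new connector release.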

<@U047ANT3J84> Happy to, and don’t hesitate to reach out if you or anyone else has any questions related to this. We’ve built a lot of connectors and are trying to move as many as possible to builder/low-code to improve maintainability, so this is definitely a pain point. I’d love to be able to contribute more of these to the community as well, if we can make them fit more than just our clients’ specific fields!

Feature issue is here if you want to take a look or share it with anyone else:
https://github.com/airbytehq/airbyte/discussions/37779

Thank you - pulling it into my team to discuss!

> We’ve built a lot of connectors and are trying to move as many as possible to builder/low-code to improve maintainability
Sounds great - we are too! Let’s keep in touch

Hi, <@U035912NS77>. Thanks for opening this discussion. You took the words out of my mouth. I have the same issue with a CRM platform where arbitrary custom fields get created. This connection needs to be replicated across anywhere from 1 to hundreds of different clients. We will not be able to track which custom fields were changed, when, and by which client/connection. For us, it is very important that these dynamic custom field changes are integrated automatically.

If I am not wrong, Airbyte still does not support automatic schema evolution. The connector must know the schema beforehand.

So every time that your source adds a new column, it will be necessary to update the source schema, release a new version and then refresh the schema on each connection that uses this source.

Usually Airbyte will suggest resetting the connection to avoid issues.
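If you are stuck with that manual cycle, it at least helps to know when a new release is needed. Below is a rough sketch, not an Airbyte feature, that compares the schema you published with sample records pulled from the API and lists any top-level fields the schema does not declare. The function name and the input shapes are assumptions.

```python
from typing import Any, Dict, Iterable, List


def find_undeclared_fields(
    declared_schema: Dict[str, Any],
    records: Iterable[Dict[str, Any]],
) -> List[str]:
    """Return top-level record keys that are missing from the declared schema."""
    declared = set(declared_schema.get("properties", {}))
    seen = set()
    for record in records:
        seen.update(record.keys())
    # Sorted so the result is stable and easy to diff between runs.
    return sorted(seen - declared)


if __name__ == "__main__":
    published = {
        "type": "object",
        "properties": {"a": {}, "b": {}, "c": {}},
    }
    sample = [{"a": 1, "b": 2, "c": 3, "d": 4}]
    print(find_undeclared_fields(published, sample))  # prints ['d']
```

A non-empty result is the signal to update the schema in Builder, publish a new version, and refresh the schema on the affected connections.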

Yes, that should be the case. <@U052BHR0NJY> do you expect your upstream source to change often? Usually we are working with pretty stable APIs for API sources and builder sources especially.

Is it something closer to Airtable, where it’s more of a db-via-api? Outside of the builder we tend to do this more dynamically.

Hi <@U047ANT3J84>, yes, I am expecting my upstream source to change often and unpredictably, and we cannot handle it manually every time. Is there a way for Airbyte to pick up the changes automatically?
I expected this setting in the connection, under the Replication tab, to do the work for me:
Detect and propagate schema changes: Propagate all changes
If not, then what do these settings mean?

<@U047ANT3J84> We run into this quite often as well with vertical-specific (nonprofit) CRMs and similar platforms that often return things like custom fields at the top level of standard endpoints (similar to how Salesforce does).

This means that different organizations using the same platform can return different schema results for the same API.

So I think there’s definitely value in being able to configure in Builder whether each endpoint should allow for schema evolution of any kind (potentially just bubbling those selections up to the Connection level, basically just letting the user indicate which schema options should be available).

In most cases these are all simple fields, and that could even be specified as a limitation. But it would help to avoid having to create multiple copies of a connector for different organizations returning different columns from the same endpoint.
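To illustrate what a "simple fields only" limitation could look like, here is a hedged sketch that extends a base schema with new top-level scalar fields seen in records and skips nested objects and arrays. This is not something Builder does today; `extend_schema_with_simple_fields` and its inputs are hypothetical.

```python
import copy
from typing import Any, Dict, Iterable, Optional


def _simple_json_type(value: Any) -> Optional[str]:
    """Map a scalar Python value to a JSON Schema type, or None if not simple."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    return None  # None, dicts and lists are not treated as simple fields


def extend_schema_with_simple_fields(
    base_schema: Dict[str, Any],
    records: Iterable[Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of base_schema with undeclared scalar fields added."""
    schema = copy.deepcopy(base_schema)
    properties = schema.setdefault("properties", {})
    for record in records:
        for name, value in record.items():
            if name in properties:
                continue  # never override what the connector already declares
            json_type = _simple_json_type(value)
            if json_type is not None:
                properties[name] = {"type": ["null", json_type]}
    return schema
```

With this approach one connector could serve organizations that return different custom columns from the same endpoint, while anything more complex than a scalar still requires an explicit schema change.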