Questions about upgrading connectors in self-hosted Airbyte

Summary

User is seeking clarification on the impact of upgrading the Databricks connector from v1.1.0 to version 3.x in a self-hosted Airbyte environment. They specifically ask how the [Recommended] Managed tables data source option and the Amazon S3 data source option behave during and after the upgrade.


Question

Hey! We’re running self-hosted Airbyte and currently using the old Databricks connector (v1.1.0). I’m not 100% clear on what happens when I upgrade to 3.x.

When I use the [Recommended] Managed tables data source option, the Airbyte job doesn’t actually create a table with real columns but just gives me the raw data in an _airbyte_data column. Will that change with 3.x?

When I use Amazon S3 as the data source option, I do get tables with a proper schema, but I’m not sure what happens to them on upgrade. From the docs it sounds like I have to drop and recreate them, but will they keep the same names, or what exactly happens?
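One way to answer the naming question empirically is to snapshot the table and column names the S3 path currently produces and diff them against the output after the 3.x upgrade. Below is a minimal sketch of that idea, assuming a Databricks notebook or job where a SparkSession is available; the schema name "analytics" and the snapshot table name are placeholders, not anything from the thread.

```python
# Sketch: snapshot the table names and column schemas Airbyte currently writes,
# so they can be diffed against the 3.x output after the connector upgrade.
# "analytics" is a placeholder for the actual target schema.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target_schema = "analytics"  # placeholder: the schema the Airbyte connection writes to

snapshot = []
for table in spark.catalog.listTables(target_schema):
    for col in spark.catalog.listColumns(table.name, target_schema):
        snapshot.append((table.name, col.name, col.dataType))

# Persist the snapshot so it survives the upgrade and can be compared later.
(spark.createDataFrame(snapshot, "table_name string, column_name string, data_type string")
      .write.mode("overwrite")
      .saveAsTable(f"{target_schema}.pre_upgrade_schema_snapshot"))
```

Running the same snapshot again after the upgrade and joining the two tables on table_name/column_name would show any renames or type changes directly.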



This topic has been created from a Slack thread to give it more visibility. It is in read-only mode here; the original thread is available on Slack.


["self-hosted", "airbyte", "databricks-connector", "upgrade", "managed-tables", "amazon-s3", "data-source", "upgrade-process"]

<@U02NHDB84H5> can you help provide clarity?

For sure! In general, we’re bringing Databricks in line with other destinations (the “dv2” feature set: https://docs.airbyte.com/release_notes/upgrading_to_destinations_v2/). But to answer your specific questions:
• 3.x will create tables with actual columns (as well as supporting deduping); there’s a rough sketch of the new layout after this list. If you were relying on the _airbyte_data column, that’s still available in the raw tables, though it’s less recommended now.
• I would expect table/column names to stay the same (possibly modulo some small changes around special characters), but that’s honestly something we’re looking for feedback on, i.e. were there significant unexpected changes, is anything better or worse than before, etc.
  ◦ Data types are probably a bit different, though, and again, we’re looking for feedback on this.
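For illustration, here is roughly what the dv2-style output looks like from the Databricks side. This is only a sketch based on the general dv2 conventions (final tables with real columns, plus raw tables kept in an airbyte_internal schema); the schema name "analytics", the stream name "users", the column names, and the exact raw-table naming are placeholders and may differ for the Databricks destination.

```python
# Rough sketch of the Destinations V2 layout after upgrading to 3.x.
# Assumes a Databricks notebook/job where a SparkSession is available.
# "analytics" (target schema), "users" (stream), the field names, and the
# raw-table name below follow the general dv2 conventions and are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Final table: one real column per stream field, plus Airbyte metadata columns.
final_df = spark.sql("""
    SELECT id, email, updated_at, _airbyte_raw_id, _airbyte_extracted_at
    FROM analytics.users
""")

# Raw table: the JSON blob in _airbyte_data is still there for anything
# downstream that depends on it.
raw_df = spark.sql("""
    SELECT _airbyte_raw_id, _airbyte_extracted_at, _airbyte_data
    FROM airbyte_internal.analytics_raw__stream_users
""")

final_df.show(5)
raw_df.show(5)
```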

Super helpful, thanks! I’ll start with the Amazon S3 path and report back once we upgrade.