Setting up connection between S3 and SQL Server for data import

Summary

Exploring how to import S3 data into a SQL Server table with matching schema


Question

Is it possible to setup a connection between s3 and Sql Server where the s3 data is imported into my table structure that matches the s3 file schema? So far all I’ve been able to do is import the data into a table that airbyte created.



This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want
to access the original thread.

Join the conversation on Slack

["s3", "sql-server", "data-import", "table-structure", "schema", "airbyte"]

In the Airbyte S3 source, the default option is to infer the schema from the source, per <S3 | Airbyte Documentation docs>:
> (Optional) If you want to enforce a specific schema, you can enter a Input schema. By default, this value is set to {} and will automatically infer the schema from the file(s) you are replicating.
(Only under when using the Schemaless option should it just drop everything under a nested object)

I have done this but I still get the same results. I don’t think I’m explaining myself well. When I run the sync, my csv file is imported into a table named airbyte_internal.dbo_raw_stream_test_csv , and my data is a JSON object in the _airbyte_datacol. What I want is to have the csv file imported into a table I have created named test and the data to be parsed into the column matching the csv column name. I have looked through the documentation and have not found a scenario like this yet.

so usually there’s a raw data table that is temporary which is such as you described. Then, a final table is generally created at the end of the run which types and dedupes the data as described here.

The only case in which you should ONLY have the raw table is if you’ve disabled normalization.

It looks like the MSSQL destination (which is a community connector) requires ALTER TABLE permissions for the Airbyte users. Per <MSSQL | Airbyte Documentation docs>:

Permissions
You need a user configured in SQL Server that can create tables and write rows. We highly recommend creating an Airbyte-specific user for this purpose. In order to allow for normalization, please grant ALTER permissions for the user configured.

There’s also this additional note <MSSQL | Airbyte Documentation the docs>:
> Normalization Requirements
> To sync with normalization you’ll need to use MS SQL Server of the following versions: SQL Server 2019, SQL Server 2017, SQL Server 2016, SQL Server 2014. The work of normalization on SQL Server 2012 and bellow are not guaranteed.

Good luck! We’re mostly writing to BigQuery, so I have limited experience with the MSSQL-specific destination, but it does follow similar patterns to other legacy destination types (before the work that’s been done for Destinations V2 for the Certified destination connectors).

(I’m guessing there will eventually be a certified MSSQL destination as well which will likely implement Destinations V2, but I’m not sure where that sit on the Airbyte team’s roadmap.)

In Destinations V2, Normalization is replaced with Typing and Deduping, which functions a bit differently

I’m super new to this and just trying to figure stuff out. Thanks for the help. I know I’ve seen the normalization option, but I can’t find it now. Is it only available during the setup of the destination?

Though the documentation of the mssql connector says it supports normalization, looks like the support was removed in version 1.0.0. Now I have to figure out how to use an older version of the connector, or add that feature.

Interesting, hadn’t seen that before. I’m guessing this is because they’ve been removing normalization from the others, and probably wanted to pull that code entirely. But this one doesn’t have Destinations V2, so there’s nothing to replace it.

Keep in mind, you can query the JSON values directly in SQL, using something like:
SELECT JSON_VALUE(_airbyte_data,'$.yourColName') AS yourColName

. . . or run this through a modeling tool to produce the output table. But I know that’s another step as well.

Someone from the Airbyte team may be able to chime in with plans on the MSSQL destination, but you might also want to check the open Issues on GitHub as this discussion may already be happening there.

I was reading a bit about it and the MSSQL connector is not actually certified, it’s a community connector. The discussion on the PR where they removed normalization says there’s no immediate plan to address that. It has something to do with V2, but I don’t yet understand it.