Summary
Issue with schema preservation during data migration from MSSQL to Postgres using Airbyte connector
Question
Hello community,
My AIM: To migrate all existing data from MSSQL to Postgres (keeping the schema unchanged)
I deployed Airbyte on a linux virtual machine using docker compose official documentation.
Then I created a connection between Source - MSSQL database & Destination - Postgres Database.
Both the mssql and postgres have the same tables and schema initially, the mssql database has all the data and the postgres has no data.
Then I started the sync and all the data got copied in postgres database but the schema got changed in the postgresql tables , for example - I had few columns of type ‘uuid’ which got changed to varchar after the migration. Many other columns also got changed to varchar.
I want to migrate all the data ensuring the schema in the destination is not changed at all.
Please help!!
Thanks
This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want
to access the original thread.
Join the conversation on Slack
["data-migration", "mssql", "postgres", "schema-preservation", "airbyte-connector"]
This behavior is fairly common to handle differences between platforms for certain data types.
Each Airbyte source has a data type mapping to an internal “Airbyte Type” which is then mapped out in the destination. So in your case you’d want to look at the <Postgres | Airbyte Documentation mapping for the MSSQL source> and then the <Postgres | Airbyte Documentation tables data type mapping for the Postgres destination>. Almost any type that isn’t explicitly listed on a source’s data type mapping (UUID in this case) will be treated as an Airbyte string
type as is noted on the source page:
> If you do not see a type in this list, assume that it is coerced into a string. We are happy to take feedback on preferred mappings.
It seems like Airbyte tries to keep a very minimal list of internal types to simplify the mappings between systems, so I would personally think it unlikely that they would introduce an internal type as specific as UUID . . . but they may be willing to discuss whether it should be treated as binary such (since a UUID is ultimately just a 128-bit binary value).
In this case I would personally expect to apply some sort of downstream transformation, or work off the raw tables to produce an appropriate field type.
While not ideal on very large tables/joins, it’s also worth noting that—as long as the column is indexed appropriately—treating UUIDs a strings is unlikely to produce a massive performance issue.