Summary
Three issues with a MongoDB-to-Postgres sync: data size grows in Postgres compared to MongoDB because Airbyte creates 3 tables per collection; data type normalization loses type information (for example, timestamps are converted to text); and the MongoDB source connector has a hardcoded SSL requirement, which needed a workaround.
Question
I’m trying to use Airbyte with a MongoDB source and a Postgres destination. I’ve encountered a few problems:
• Data size grows dramatically. In my case it’s 9 times bigger in Postgres than in Mongo. Of course, part of that is Mongo identifiers being converted to text, but the main issue is that Airbyte creates 3 tables per Mongo collection, each one holding practically the whole data set. Are there approaches that can mitigate this?
• Data type normalization. For example, Mongo timestamps are converted to text. Is there a way to convert/normalize them during import, without dbt?
• The third problem I have worked around. The Mongo source has a hardcoded SSL requirement, whereas my instance does not use SSL. I had to fork the connector and make local changes, but at least that works.
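To make the second point concrete, here is a minimal sketch of the kind of conversion in question, done after the load rather than during import. It only builds a standard Postgres `ALTER TABLE ... ALTER COLUMN ... TYPE ... USING` statement. The table and column names (`orders`, `created_at`) and both helper functions are hypothetical, and it assumes the text values are in a format Postgres can cast to `timestamptz`.

```python
def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def timestamp_cast_sql(table: str, column: str) -> str:
    """Build an ALTER TABLE statement that converts a text column to timestamptz."""
    col = quote_ident(column)
    return (
        f"ALTER TABLE {quote_ident(table)} "
        f"ALTER COLUMN {col} TYPE timestamptz "
        f"USING {col}::timestamptz;"
    )


# Hypothetical table and column names, for illustration only.
print(timestamp_cast_sql("orders", "created_at"))
```

This does not address whether the changed column type survives a later sync, and it is not a substitute for converting during import, which is what the question asks about.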
This topic has been created from a Slack thread to give it more visibility.