Data Size and Data Type Issues with MongoDB Source and Postgres Destination in Airbyte


Data size in Postgres grows compared to MongoDB because Airbyte creates 3 tables per collection, data type normalization has problems such as timestamps being converted to text, and the MongoDB source connector has a hardcoded SSL requirement that needs a workaround.


I’m trying to use Airbyte with a MongoDB source and a Postgres destination. I’ve encountered a few problems:

  • Data size grows dramatically. In my case it’s 9 times bigger in Postgres than in Mongo. Part of that is of course Mongo identifiers being converted to text, but the main issue is that Airbyte keeps 3 tables per Mongo collection, each one holding practically all of the data. Are there approaches that can mitigate this?
  • Data type normalization. For example, Mongo timestamps are converted to text. Is there a way to convert/normalize during import without dbt?
  • The third problem I have worked around. The Mongo source has a hardcoded SSL requirement, whereas my instance does not use SSL. I had to fork the connector and make local changes, but at least that works.
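
On the data type question, one possible approach that avoids dbt is to leave the synced table untouched and expose a Postgres view that casts the text columns to `timestamptz`. This is only a minimal sketch, not an Airbyte feature: the table name `orders`, the column names, and the helper `build_cast_view_sql` are hypothetical, and it assumes the text values are in a format Postgres can cast directly (for example ISO 8601).

```python
def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_cast_view_sql(table, columns, timestamp_columns, view_suffix="_typed"):
    """Return CREATE VIEW SQL that casts the given text columns to timestamptz.

    table             -- name of the table Airbyte wrote (hypothetical here)
    columns           -- all columns to expose in the view, in order
    timestamp_columns -- subset of columns stored as text that hold timestamps
    """
    unknown = set(timestamp_columns) - set(columns)
    if unknown:
        raise ValueError(f"columns not in table: {sorted(unknown)}")

    select_items = []
    for col in columns:
        quoted = quote_ident(col)
        if col in timestamp_columns:
            # Cast at query time; the underlying table keeps the text value.
            select_items.append(f"{quoted}::timestamptz AS {quoted}")
        else:
            select_items.append(quoted)

    view = quote_ident(table + view_suffix)
    return (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        f"SELECT {', '.join(select_items)}\n"
        f"FROM {quote_ident(table)};"
    )


if __name__ == "__main__":
    # Hypothetical collection "orders" with one timestamp stored as text.
    print(build_cast_view_sql("orders", ["_id", "created_at", "total"], ["created_at"]))
```

The generated statement can be run once against the destination database. A view adds no storage, but the cast runs at query time, so a row holding a value Postgres cannot parse will make queries against the view fail.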

This topic has been created from a Slack thread to give it more visibility.


["mongodb-source", "postgres-destination", "data-size", "data-types", "normalization", "ssl", "connector"]