Optimizing initial data load performance in Airbyte OSS from MySQL to Postgres


The user is experiencing slow performance during the initial data load process from MySQL to Postgres in Airbyte OSS. The issue is related to the insert-select operation taking a long time for tables with millions of rows. The user is seeking advice on optimizing the initial load for speed.


hi all,
i’m trying to determine if the performance we are seeing is typical or if i need to investigate.
we have set up airbyte oss on an ec2 instance with 16 vCPU / 128 GB, with the intention of syncing data from mysql to postgres.
we have set up some initial tests using incremental append + dedupe. We are not using cdc.
the issue we are seeing is that after the initial load of data from mysql into the airbyte_internal schema’s raw tables, airbyte then tries to insert-select the data, expanding _airbyte_data jsonb into the target schema table, in one single enormous transaction. On first load, for a table with 8m rows, that insert has been running for over 2h30m.
I have several dozen tables to sync, some with 80m+ rows.
How can i optimize the initial load for speed?
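
To make the step described above concrete, here is a rough sketch of the shape of an insert-select that unpacks a jsonb column into typed columns. It is not the statement Airbyte generates (that one also handles dedupe and more), and the table names, column names, and the helper `jsonb_insert_select` are all hypothetical; only `_airbyte_data` and the `airbyte_internal` schema come from the post. The helper only builds SQL text and does no identifier quoting, so it is for illustration and manual timing experiments, not production use.

```python
# Sketch only: table and column names passed in are hypothetical examples.
# The function builds SQL text; it does not connect to a database.

def jsonb_insert_select(raw_table, target_table, columns, where=None):
    """Build an INSERT ... SELECT that unpacks _airbyte_data (jsonb) into columns.

    columns maps a target column name to a Postgres type, e.g. {"id": "bigint"}.
    where is an optional filter, useful for timing the statement on a slice.
    """
    col_list = ", ".join(columns)
    # ->> extracts a jsonb field as text; the cast converts it to the target type.
    exprs = ", ".join(
        f"(_airbyte_data ->> '{name}')::{pg_type}"
        for name, pg_type in columns.items()
    )
    sql = f"INSERT INTO {target_table} ({col_list}) SELECT {exprs} FROM {raw_table}"
    if where:
        sql += f" WHERE {where}"
    return sql


if __name__ == "__main__":
    # Hypothetical raw and target tables with two columns.
    print(
        jsonb_insert_select(
            "airbyte_internal.raw_orders",
            "public.orders",
            {"id": "bigint", "name": "text"},
        )
    )
```

Run without a `where` filter, a statement of this shape processes every raw row in one transaction, which matches the behaviour described above. Prefixing a filtered version with `EXPLAIN (ANALYZE, BUFFERS)` in psql is one way to check whether the time goes into jsonb extraction, index maintenance on the target table, or something else.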

This topic has been created from a Slack thread to give it more visibility.

["optimizing", "initial-load", "performance", "airbyte-oss", "mysql", "postgres", "insert-select"]