Summary
The user is experiencing slow performance and a large number of small files in the S3 destination while ingesting data from MySQL to S3 using CDC. They are using Airbyte and have configured a block size of 128 MB for the S3 destination. The largest table contains around 1 GB per day. They are seeking advice on identifying the cause of this behavior.
Question
Hello! We have an ingestion from MySQL (v3.3.13) to S3 (v0.5.9) based on CDC and we’re struggling to get decent performance. For reference, a job extracting 820 MB took 2 hours to complete!
We’ve also noticed that Airbyte is creating an abnormally high number of files, each on the order of a few KB, even though the configured block size for the S3 destination is 128 MB.
We’re ingesting around 20 tables of varying size; the biggest contains around 1 GB per day.
Do you know what might be the culprit for this behaviour?
Thanks!
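To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. The function names are hypothetical, and it assumes, purely for illustration, that the configured block size maps directly to output file size, which may not be how the S3 destination actually interprets that setting.

```python
import math


def throughput_mb_per_s(size_mb: float, duration_s: float) -> float:
    """Average transfer rate in MB/s for a job of the given size and duration."""
    return size_mb / duration_s


def expected_file_count(daily_mb: float, block_mb: float) -> int:
    """Files per day if each file were filled up to the block size (assumption)."""
    return math.ceil(daily_mb / block_mb)


if __name__ == "__main__":
    # 820 MB extracted in 2 hours, as reported in the question.
    rate = throughput_mb_per_s(820, 2 * 3600)
    print(f"Average throughput: {rate * 1024:.0f} KB/s")

    # Biggest table: roughly 1 GB (1024 MB) per day, block size 128 MB.
    files = expected_file_count(1024, 128)
    print(f"Expected files per day for the biggest table: {files}")
```

Under that assumption the biggest table would produce only a handful of files per day, so files of a few KB each suggest that something other than the block size is deciding when a file gets closed.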
This topic has been created from a Slack thread to give it more visibility.