Slow data processing on Airbyte cloud trial vs self-hosted OSS

Summary

Data processing seems slow on Airbyte cloud trial compared to self-hosted OSS. User is experiencing long runtimes for Google Analytics import and MySQL database replication. Seeking insights on potential reasons for the slowness.


Question

Hey friends, I’m presently kicking the tires on an Airbyte Cloud trial and noticing that the data processing seems a bit slow. Does anyone know if this is a limitation of the trial, or just the cloud infrastructure? Would things be faster on a self-hosted Airbyte OSS instance?

For example, I have a Google Analytics import that’s been running for 19 hours. I had a similar experience trying to replicate a MySQL database. Obviously the initial import is expected to be big, but I’d expect it to be faster. Can anyone provide some insights?



This topic has been created from a Slack thread to give it more visibility.

["slow-data-processing", "airbyte-cloud", "self-hosted-oss", "google-analytics-import", "mysql-database-replication"]

Hi Brian :wave:

What destination are you syncing to?

For performance, there are a variety of factors that impact overall throughput for syncs.

For APIs, rate limiting is often the overarching factor.
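
As a rough illustration of the rate-limiting point, here is a back-of-the-envelope sketch in Python. The function name and every number in it are hypothetical (they are not real Google Analytics quotas), and it assumes the API quota is the only bottleneck:

```python
import math


def min_sync_hours(total_rows: int, rows_per_request: int, requests_per_hour: int) -> float:
    """Lower bound on sync duration when the API quota is the only bottleneck."""
    # Number of API calls needed to page through every row.
    requests_needed = math.ceil(total_rows / rows_per_request)
    # The quota caps how many of those calls can be made per hour.
    return requests_needed / requests_per_hour


# Hypothetical numbers for illustration only, not actual API limits.
hours = min_sync_hours(total_rows=50_000_000, rows_per_request=10_000, requests_per_hour=250)
print(f"At least {hours:.1f} hours")  # prints "At least 20.0 hours"
```

If an estimate like this lands in the same range as the runtime you are seeing, the source API's quota is probably the limit, and moving to self-hosted would not change much for that connection.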

For databases, connectivity method can have a big effect. So if you have to use an SSH tunnel on Cloud, it could be more performant to self-host and avoid an intermediary hop.

Lastly, at the platform level, Airbyte allocates a set amount of CPU and memory that a job can use for replication. For Cloud, there is a default configuration for trials. For self-managed instances running on your own infra, you’ll be able to “juice” the amount of CPU and memory a job can request.

Hey, thanks for the info. This is all syncing into a BigQuery dataset. Is there a way to “juice” a non-trial Cloud account?