Deploy EC2: Slow runtime on t3.micro

Hi, we tried installing Airbyte on an EC2 instance (t3.micro) following the documentation. The installation goes smoothly, and for the first 5 minutes we are able to add sources/destinations. After that, however, everything becomes extremely slow and freezes. We can't even SSH into the instance and have to restart it, after which the same issue repeats after 5 minutes. Any suggestions on where we are going wrong? The same setup worked perfectly on Docker Desktop on my Mac.

When I monitor CPU %, it stays below 50%.

Is there a minimum instance type requirement without which this won't work?
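CPU below 50% does not rule out resource exhaustion: if the instance runs out of memory, it can stop responding (including to SSH) while CPU still looks moderate. The thread does not confirm this as the cause, but it is cheap to check. Below is a minimal sketch, assuming a Linux host, that reads `MemAvailable` and `MemTotal` from `/proc/meminfo`; `parse_meminfo` and `available_fraction` are hypothetical helper names, not part of Airbyte.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into a {field: value} mapping (values in kB)."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            # The first token is the number; the optional second token is the "kB" unit.
            values[key.strip()] = int(parts[0])
    return values


def available_fraction(meminfo: dict[str, int]) -> float:
    """Share of total memory the kernel estimates is still available."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


if __name__ == "__main__":
    meminfo_path = Path("/proc/meminfo")  # Linux only
    if meminfo_path.exists():
        info = parse_meminfo(meminfo_path.read_text())
        frac = available_fraction(info)
        print(f"MemAvailable: {info['MemAvailable'] // 1024} MiB ({frac:.0%} of total)")
```

Running it every few seconds while a sync is in progress would show whether available memory drops toward zero shortly before the freeze.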

Hi @slunia, a t3.micro is not recommended. According to our documentation, we recommend a t2.medium for testing or a t2.large for production.

https://docs.airbyte.com/deploying-airbyte/on-aws-ec2/#:~:text=For%20testing%20out%20Airbyte%2C%20a%20t2.medium%20instance%20is%20likely%20sufficient.
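For context, the main gap between a t3.micro and the recommended types is memory: all three have 2 vCPUs, but a t3.micro has 1 GiB of RAM versus 4 GiB on a t2.medium and 8 GiB on a t2.large. A rough sanity check, using the published EC2 specs for the instance types mentioned in this thread; `INSTANCE_SPECS` and `meets_baseline` are illustrative names, and this is not Airbyte's own sizing logic.

```python
# Published EC2 specs (vCPUs, memory in GiB) for the instance types discussed here.
INSTANCE_SPECS = {
    "t3.micro": {"vcpu": 2, "memory_gib": 1},
    "t2.medium": {"vcpu": 2, "memory_gib": 4},
    "t2.large": {"vcpu": 2, "memory_gib": 8},
    "t2.xlarge": {"vcpu": 4, "memory_gib": 16},
}


def meets_baseline(instance_type: str, baseline: str = "t2.medium") -> bool:
    """True if the instance has at least the baseline's vCPUs and memory."""
    spec = INSTANCE_SPECS[instance_type]
    base = INSTANCE_SPECS[baseline]
    return spec["vcpu"] >= base["vcpu"] and spec["memory_gib"] >= base["memory_gib"]


for name in INSTANCE_SPECS:
    # Compare against the testing (t2.medium) and production (t2.large) recommendations.
    print(name, meets_baseline(name), meets_baseline(name, "t2.large"))
```

Meeting the baseline is a floor rather than a guarantee: later posts report slow syncs on a t2.xlarge and freezes on a t2.medium.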

I am on a t2.xlarge at 50% CPU and having the same issues.

I am also running a 24-hour incremental + deduped backfill from Postgres to Snowflake. I tried tweaking the workers in .env, but it hasn't seemed to help.

Any ideas on what else I can do to speed up that runtime?

I read the issue about FetchSize (Investigate the performance bottleneck of source database connectors · Issue #12532 · airbytehq/airbyte · GitHub). It seems that if the fetch size defaulted to being dynamic but could be overridden with a fixed value, that might help.
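The "dynamic by default, fixed value as override" idea can be sketched roughly as follows: estimate how many rows fit in a memory budget from the average row size, clamp that to sane bounds, and let an explicit setting win. This is a hypothetical illustration, not Airbyte's implementation; the function name, the 200 MiB buffer, and the 10 / 100,000 row bounds are all assumptions. In a JDBC-based connector the resulting value would ultimately be passed to `Statement.setFetchSize`.

```python
from typing import Optional


def choose_fetch_size(
    avg_row_bytes: int,
    buffer_bytes: int = 200 * 1024 * 1024,  # assumed memory budget per fetch
    min_rows: int = 10,  # assumed lower bound
    max_rows: int = 100_000,  # assumed upper bound
    override: Optional[int] = None,
) -> int:
    """Rows to request per round trip; a fixed override wins over the dynamic estimate."""
    if override is not None:
        if override <= 0:
            raise ValueError("override must be a positive number of rows")
        return override
    if avg_row_bytes <= 0:
        # No row-size estimate yet: start conservatively.
        return min_rows
    estimate = buffer_bytes // avg_row_bytes
    return max(min_rows, min(max_rows, estimate))


print(choose_fetch_size(10_000))  # wide rows: dynamic estimate
print(choose_fetch_size(1_000))  # narrow rows: capped at max_rows
print(choose_fetch_size(1_000, override=5_000))  # user-pinned value
```

The trade-off is memory versus round trips: larger fetches mean fewer trips to the database but more rows held in memory at once, which matters on small instances.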

I don't know the trade-offs, but it looks like these long-running (now failed) streams spend most of their time on lines like:
2022-08-25 03:22:24 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):329 - Records read: 6909000 (2 GB)
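A single "Records read" line only shows a cumulative count, so comparing two of them is a quick way to tell whether a stream is progressing slowly or has stalled. Below is a minimal sketch that assumes the timestamp format and the "Records read:" text match the line quoted above; other Airbyte versions may log differently, and `parse_progress` / `records_per_second` are hypothetical helper names.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Assumed shape: "<YYYY-MM-DD HH:MM:SS> ... Records read: <count> (<size>)"
PROGRESS_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*Records read: (?P<records>\d+)"
)


def parse_progress(line: str) -> Optional[Tuple[datetime, int]]:
    """Return (timestamp, cumulative records) for a progress line, else None."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
    return ts, int(match.group("records"))


def records_per_second(lines: Iterable[str]) -> Optional[float]:
    """Average read rate between the first and last progress lines found."""
    points = [p for p in map(parse_progress, lines) if p is not None]
    if len(points) < 2:
        return None
    (t0, r0), (t1, r1) = points[0], points[-1]
    seconds = (t1 - t0).total_seconds()
    return (r1 - r0) / seconds if seconds > 0 else None
```

Feeding it the sync log for a failed stream would show whether throughput stayed steady or dropped off before the failure.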

As a follow-up: following Airbyte's recommendation, I have noticed that parallelizing connections, with one connection per source table, improves overall performance (including fetching), though it does use more CPU.
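Why one connection per table can help: if a single connection reads its tables one after another, its wall-clock time is roughly the sum of the per-table times, while separate connections running concurrently finish closer to the slowest table. A rough sketch, assuming streams within one connection are read sequentially and using made-up per-table durations, that estimates total time with greedy longest-first scheduling; `makespan` is an illustrative name.

```python
import heapq
from typing import List


def makespan(durations: List[float], workers: int) -> float:
    """Estimate wall-clock time when at most `workers` table syncs run at once.

    Greedy longest-first: each table goes to the currently least-loaded worker.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    loads = [0.0] * min(workers, len(durations)) or [0.0]
    heapq.heapify(loads)
    for d in sorted(durations, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + d)
    return max(loads)


tables_hours = [8, 1, 1, 1, 1]  # hypothetical: one large table, four small ones
print(makespan(tables_hours, 1))  # sequential, one connection
print(makespan(tables_hours, 5))  # one connection per table
```

Two things follow from this model: the largest table sets a floor that parallelism cannot beat, and every extra concurrent sync needs its own CPU and memory, which matches the higher CPU usage observed above.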

Hey @lucaswiley,

Apologies for the delay, and thanks for following up with what worked for you. We're definitely still working on improving our database connectors. Our general recommendation is to parallelize large syncs, since different tables can affect performance disproportionately. Feel free to post more questions in our forums here!

We are testing on an EC2 t2.medium, and it freezes every time after about 10 minutes when running a simple import of three tables from Salesforce into Redshift.

Are there better guidelines available at this time? Does anyone have advice on what capacity and what kind of EC2 instance would work with a simple import?

It's not exactly straightforward to increase the capacity of an EC2 instance. This is a promising tool, but if we cannot determine its minimum requirements, it's going to be difficult to convince the team to keep putting resources into it. Thanks for any help!