Ideas for speeding up sync jobs

Hey there.

I’m evaluating Airbyte. So far I have configured a Postgres source and a Snowflake destination.

I have everything deployed to k8s using the public k8s guide.

After running a sample job I’m getting the following speed: “23.31 MB | 44,800 emitted records | 44,800 committed records | 8m 8s | Sync”

So it’s around 3 MB per minute. How can I make my jobs faster? I followed some advice from the “Scaling Airbyte” page in the Airbyte documentation, but it didn’t really do much for me. Any tips on how to tune Airbyte, especially on k8s?
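For reference, the rate implied by that summary line can be worked out directly. This is a minimal sketch that uses only the figures quoted in the job summary; the variable names are just for illustration.

```python
# Figures copied from the job summary line:
# "23.31 MB | 44,800 emitted records | ... | 8m 8s"
size_mb = 23.31
records = 44_800
duration_s = 8 * 60 + 8  # 8m 8s expressed in seconds

# Convert to per-minute and per-second rates.
mb_per_min = size_mb / (duration_s / 60)
records_per_s = records / duration_s

print(f"{mb_per_min:.2f} MB/min, {records_per_s:.1f} records/s")
```

That comes out a little under 3 MB per minute, or roughly 90 records per second.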

Hey!
I think you first need to identify the bottleneck.
I gave some insights on a pretty similar question here. The fetchSize could be the bottleneck on the source side.
I’d also suggest trying the E2E testing source and destination connectors to check the best throughput you can achieve on the destination and on the source side.

Any tips on how to tune Airbyte, especially on k8s?

If you realize your bottleneck is resource-related, you can change the following environment variables to give more resources to the pod running the sync:

JOB_MAIN_CONTAINER_CPU_REQUEST=
JOB_MAIN_CONTAINER_CPU_LIMIT=
JOB_MAIN_CONTAINER_MEMORY_REQUEST=
JOB_MAIN_CONTAINER_MEMORY_LIMIT=
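As a rough illustration of what filled-in values could look like, here is a small sketch that renders the four variables as env-file lines. Only the variable names come from this thread; the values are hypothetical placeholders written in Kubernetes resource quantity notation, and where the lines belong depends on how Airbyte was deployed.

```python
# Hypothetical values, for illustration only; size them for your own cluster.
# CPU is given in cores, memory in Kubernetes quantity notation (e.g. "2Gi").
resources = {
    "JOB_MAIN_CONTAINER_CPU_REQUEST": "1",
    "JOB_MAIN_CONTAINER_CPU_LIMIT": "2",
    "JOB_MAIN_CONTAINER_MEMORY_REQUEST": "1Gi",
    "JOB_MAIN_CONTAINER_MEMORY_LIMIT": "2Gi",
}

# Render one KEY=value line per variable, in the order listed above.
env_lines = [f"{name}={value}" for name, value in resources.items()]
print("\n".join(env_lines))
```

Keeping each request at or below its matching limit matters, since Kubernetes rejects a container spec whose request exceeds its limit.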

Thank you for getting back to me on this.

I’ve read through the linked issues on fetchSize. It’s not possible to configure it yet, right?

Anton

I’ve read through the linked issues on fetchSize. It’s not possible to configure it yet, right?

No, it’s not possible, but the fetchSize should now be set dynamically, and more cleverly, according to the volume of your records.

@podviaznikov, based on my experiments, the dynamic fetchSize won’t provide much performance boost. It is primarily meant to deal with out-of-memory issues.

Have you selected normalization on the Snowflake connector? 44K records in 8 minutes is definitely too slow. Based on our internal benchmark, the expected velocity is 2K-7K records per second. I suspect that, because you are only syncing a very small dataset, the normalization overhead accounts for a significant share of the time, if you do have normalization activated. If that’s the case, you will likely see a better velocity, without changing anything, when you sync a larger dataset (e.g. 5GB).
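To put the gap in numbers, here is a quick sketch comparing the observed run with the 2K-7K records per second benchmark range. It assumes, purely for the sake of comparison, that the benchmark rate would apply uniformly to a dataset this small with no fixed startup or normalization cost, which is exactly the assumption that probably does not hold here.

```python
records = 44_800
observed_s = 8 * 60 + 8  # 8m 8s from the job summary

# Benchmark range quoted above, in records per second.
low_rate, high_rate = 2_000, 7_000

# Time the same sync would take at each end of the benchmark range.
slowest_expected_s = records / low_rate
fastest_expected_s = records / high_rate

# How many times slower the observed run is than the slow end of the range.
slowdown = observed_s / slowest_expected_s

print(f"expected {fastest_expected_s:.1f}-{slowest_expected_s:.1f} s, "
      f"observed {observed_s} s, about {slowdown:.0f}x slower")
```

At benchmark speed the rows themselves would move in well under half a minute, so most of the 8 minutes is likely fixed overhead rather than per-record cost.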