Ideas for sync jobs speed up

podviaznikov · May 20, 2022, 4:54pm

Hey there.

I’m evaluating Airbyte. So far I’ve configured a PG source and a Snowflake destination.

I have everything deployed to k8s using the public k8s guide.

After running a sample job I’m getting the following speed: “23.31 MB | 44,800 emitted records | 44,800 committed records | 8m 8s | Sync”

So it’s around 3 MB per minute. How can I make my jobs faster? I followed some of the advice in Scaling Airbyte | Airbyte Documentation, but it didn’t really do much for me. Any tips on how to tune Airbyte, especially on k8s?

alafanechere · May 20, 2022, 8:02pm

Hey!
I think you first need to identify the bottleneck.
I gave some insights on a pretty similar question here. The fetchSize could be the bottleneck on the source side.
I’d also suggest trying the E2E testing source and destination connectors to check the best throughput you can achieve on the source side and on the destination side.

Any tips on how to tune Airbyte, especially on k8s?

If you find that your bottleneck is resource-related, you can change the following environment variables to give more resources to the pod running the sync:

JOB_MAIN_CONTAINER_CPU_REQUEST=
JOB_MAIN_CONTAINER_CPU_LIMIT=
JOB_MAIN_CONTAINER_MEMORY_REQUEST=
JOB_MAIN_CONTAINER_MEMORY_LIMIT=
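
To make the variables above concrete, here is a minimal Python sketch that renders them as KEY=VALUE lines. The values are hypothetical placeholders written in Kubernetes quantity notation, not recommendations, and where the lines go depends on how your deployment defines Airbyte's environment.

```python
# Hypothetical values for illustration only; size them for your own cluster.
job_resources = {
    "JOB_MAIN_CONTAINER_CPU_REQUEST": "1",
    "JOB_MAIN_CONTAINER_CPU_LIMIT": "2",
    "JOB_MAIN_CONTAINER_MEMORY_REQUEST": "1Gi",
    "JOB_MAIN_CONTAINER_MEMORY_LIMIT": "2Gi",
}


def render_env(settings):
    """Render a mapping as KEY=VALUE lines, one variable per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


env_text = render_env(job_resources)
print(env_text)
```

Keeping each request at or below its limit follows the usual Kubernetes convention for container resources.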

podviaznikov · May 23, 2022, 7:56am

Thank you for getting back to me on this.

I’ve read through the linked issues on fetchSize. It’s not possible to configure it yet, right?

Anton

alafanechere · May 23, 2022, 1:27pm

I’ve read through the linked issues on fetchSize. It’s not possible to configure it yet, right?

No, it’s not possible, but the connector should now be smarter about setting the fetchSize dynamically according to the volume of your records.

tuliren · May 25, 2022, 9:47pm

@podviaznikov, based on my experiments, the dynamic fetchSize won’t provide much of a performance boost. It is primarily meant to deal with out-of-memory issues.

Have you selected normalization on the Snowflake connector? 44K records in 8 minutes is definitely too slow. Based on our internal benchmark, the expected velocity is 2K-7K records per second. If you do have normalization activated, I suspect the normalization overhead takes up a significant share of the time because you are only syncing a very small dataset. If that’s the case, you will likely see a better velocity, without changing anything, when you sync a larger dataset (e.g. 5GB).
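
For reference, the gap can be worked out from the job summary quoted in the first post. This is a rough sketch that only uses the reported figures (44,800 records, 23.31 MB, 8m 8s); the helper name `throughput` is made up for illustration.

```python
def throughput(records, megabytes, minutes, seconds):
    """Return (records per second, megabytes per minute) for one sync run."""
    elapsed_seconds = minutes * 60 + seconds
    records_per_second = records / elapsed_seconds
    megabytes_per_minute = megabytes / (elapsed_seconds / 60)
    return records_per_second, megabytes_per_minute


# Figures taken from the job summary in the original question.
rec_per_s, mb_per_min = throughput(44_800, 23.31, 8, 8)
print(f"{rec_per_s:.1f} records/s, {mb_per_min:.2f} MB/min")
# prints "91.8 records/s, 2.87 MB/min"
```

At about 92 records per second, the run is roughly 20 times below the low end of the 2K-7K records per second range mentioned above.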

marcosmarxm · July 13, 2022, 12:00am

Hi there from the Community Assistance team.
We’re letting you know about an issue we discovered with the back-end process we use to handle topics and responses on the forum. If you experienced a situation where you posted the last message in a topic that did not receive any further replies, please open a new topic to continue the discussion. In addition, if you’re having a problem and find a closed topic on the subject, go ahead and open a new topic on it and we’ll follow up with you. We apologize for the inconvenience, and appreciate your willingness to work with us to provide a supportive community.
