Airbyte ingestion slower after connector upgrade to 1.X

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 4 GB
  • Deployment: Kubernetes
  • Airbyte Version: 0.40.2
  • Source name/version: Postgres 1.0.4
  • Destination name/version: GCS 0.2.10
  • Step: Sync
  • Description:

Since I upgraded the Postgres connector from 0.4.X to 1.X, it seems that ingestion is much slower :roll_eyes:

From around 15h to 38h for the same data source.
Do you have any idea why?

430.65 GB | 528,557,241 emitted records | 528,557,241 committed records | 38h 18m 26s | Sync

Is an ingestion that long expected?
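For reference, the sync summary above works out to roughly 3,800 rows/s (about 3 MB/s); a quick back-of-the-envelope check:

```python
# Back-of-the-envelope throughput from the sync summary above.
records = 528_557_241
gigabytes = 430.65  # decimal GB, as reported

# 38h 18m 26s expressed in seconds
duration_s = 38 * 3600 + 18 * 60 + 26

rows_per_sec = records / duration_s
mb_per_sec = gigabytes * 1_000 / duration_s  # decimal MB/s

print(f"{rows_per_sec:,.0f} rows/s, {mb_per_sec:.2f} MB/s")
```

By the same arithmetic, the earlier 15h runs were moving roughly 9,800 rows/s, so the slowdown is a bit more than 2.5×.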

Hi @lucienfregosi, there is an issue open regarding performance bottlenecks like this:
https://github.com/airbytehq/airbyte/issues/12532

Do you have multiple tables? You could try isolating each stream into a separate sync job to help performance. Would you be able to share the sync log so I can see where the bottleneck might be happening? Are there any other performance metrics you could share that we could use as a benchmark?

Hi

I did isolate each table, but the syncs are still very slow…

I don’t understand why it’s now more than 2× slower after the connector upgrade…

I will share a log file; hopefully you will find something.

Other metrics that I have are the memory/CPU used by the sync pod (which are very low)

Thank you for the logs. I’m waiting on some input from the team and should have some more information for you after the holiday weekend!


Any update, @natalyjazzviolin? :slight_smile:

Yes:

  1. Could you try allocating more memory to the connection?
    https://docs.airbyte.com/operator-guides/configuring-connector-resources/#configuring-connection-specific-requirements

  2. Could you share logs for an individual sync? I see that you’ve combined logs from different syncs in one file, and that is difficult to decipher.
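For point (1), connection-specific resources are stored as a `resource_requirements` JSON blob on the connection (the exact values below are illustrative only, not a recommendation):

```json
{
  "cpu_limit": "2",
  "cpu_request": "2",
  "memory_limit": "8Gi",
  "memory_request": "8Gi"
}
```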

I spoke to the engineering team, and their opinion is that the connector upgrade from 0.4.X to 1.0.4 probably included many changes that traded speed for stability.

hi @natalyjazzviolin

I didn’t see a difference after allocating more memory as described in the link you shared.

Oh, sorry about the log; I will share a clean one.

Thanks for your help

Could you specify how much more memory you allocated? Thank you for the log!

Hi @natalyjazzviolin

I allocated the memory with this command:

update connection
set resource_requirements = '{"cpu_limit": "1", "cpu_request": "1", "memory_limit": "4Gi", "memory_request": "4Gi"}'
where id = '5dc1787f-cfc0-4411-a2c6-4423d0ddff29';

Let me know

Got it, thanks. I’m waiting for more input from the engineering team and hope to have some more thoughts for you early next week!

Got feedback from the engineering team: the throughput of 2K-3K rows per second that you are getting is within the normal range of our database connector. Unfortunately there’s no current or obvious way to speed up the connection.

Ok, thanks for your answer :slight_smile: