Airbyte ingestion slower after connector upgrade to 1.X

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 4 GB
  • Deployment: Kubernetes
  • Airbyte Version: 0.40.2
  • Source name/version: Postgres 1.0.4
  • Destination name/version: GCS 0.2.10
  • Step: Sync
  • Description:

Since I updated the Postgres connector from 0.4.x to 1.x, ingestion seems to be much slower :roll_eyes:

It went from around 15h to 38h for the same data source.
Do you have any idea why?

430.65 GB | 528,557,241 emitted records | 528,557,241 committed records | 38h 18m 26s | Sync
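As a rough cross-check, the sync statistics above can be converted into throughput figures. This is a minimal sketch: `parse_duration` and `throughput` are hypothetical helpers (not Airbyte APIs), and it assumes 1 GB = 10^9 bytes, although the Airbyte UI may report binary units.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration string such as '38h 18m 26s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    total = 0
    for value, unit in re.findall(r"(\d+)\s*([hms])", text):
        total += int(value) * units[unit]
    return total


def throughput(records: int, gigabytes: float, duration: str) -> tuple[float, float]:
    """Return (records per second, megabytes per second).

    Assumes 1 GB = 1000 MB = 10**9 bytes (an assumption, not confirmed by the UI).
    """
    seconds = parse_duration(duration)
    return records / seconds, gigabytes * 1000 / seconds


# Figures copied from the sync summary above.
rows_per_s, mb_per_s = throughput(528_557_241, 430.65, "38h 18m 26s")
print(f"{rows_per_s:,.0f} records/s, {mb_per_s:.2f} MB/s")
```

For this sync the script prints `3,833 records/s, 3.12 MB/s`, i.e. 528,557,241 records over 137,906 seconds.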

Is an ingestion this long expected?

Hi @lucienfregosi, there is an issue open regarding performance bottlenecks like this:
https://github.com/airbytehq/airbyte/issues/12532

Do you have multiple tables? You could try isolating each stream into a separate sync job to help performance. Would you be able to share the sync log so I can see where the bottleneck might be happening? Are there any other performance metrics you could share that we could use as a benchmark?

Hi

I did isolate each table, but it’s still very slow…

I don’t understand why it’s now more than 2x slower after the connector upgrade…

I will share a log file; I hope you find something.

The other metrics I have are the memory/CPU usage of the sync pod, which is very low.

Thank you for the logs. I’m waiting on some input from the team and should have some more information for you after the holiday weekend!

Any update @natalyjazzviolin ? :slight_smile:

Yes:

  1. Could you try allocating more memory to the connection?
    https://docs.airbyte.com/operator-guides/configuring-connector-resources/#configuring-connection-specific-requirements

  2. Could you share logs for an individual sync? I see that you’ve combined logs from different syncs in one file, and that is difficult to decipher.

I spoke to the engineering team and their opinion is that an upgrade from 0.40.2 to 1.0.4 probably included many changes that traded speed for stability.

Hi @natalyjazzviolin

I didn’t see a difference after allocating more memory as described in the link you shared.

Oh, sorry about the log; I will share a clean one.

Thanks for your help

Could you specify how much more memory you allocated? Thank you for the log!

Hi @natalyjazzviolin

I allocated the memory with this command:
update connection set resource_requirements = '{"cpu_limit": "1", "cpu_request": "1", "memory_limit": "4Gi", "memory_request": "4Gi"}' where id = '5dc1787f-cfc0-4411-a2c6-4423d0ddff29';

Let me know

Got it, thanks. I’m waiting for more input from the engineering team and hope to have some more thoughts for you early next week!

Got feedback from the engineering team: the throughput of 2K-3K rows per second that you are getting is within the normal range of our database connector. Unfortunately there’s no current or obvious way to speed up the connection.

OK, thanks for your answer :slight_smile:

Hi, I’m reopening this thread because after an upgrade to 1.10 the throughput is even lower.

3.69 GB | 4,761,489 emitted records | 4,761,489 committed records | 2h 54m 27s | Sync

3 hours for 3 GB… This is starting to become a blocker for us, and sadly we will have to consider other options.
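To put the two sync summaries in this thread side by side, the script below converts each one into records per second and megabytes per second. It is a minimal sketch: `parse_duration` is a hypothetical helper, the 1 GB = 10^9 bytes conversion is an assumption, and the two syncs moved very different data volumes (430.65 GB vs 3.69 GB), so the comparison is only rough.

```python
import re

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert a duration string such as '2h 54m 27s' to seconds."""
    return sum(int(v) * UNIT_SECONDS[u] for v, u in re.findall(r"(\d+)\s*([hms])", text))


# Sync statistics copied from this thread: (label, GB, records, duration).
syncs = [
    ("Postgres 1.0.4", 430.65, 528_557_241, "38h 18m 26s"),
    ("after upgrade to 1.10", 3.69, 4_761_489, "2h 54m 27s"),
]

results = {}
for label, gigabytes, records, duration in syncs:
    seconds = parse_duration(duration)
    # Assumes 1 GB = 1000 MB; the Airbyte UI may use binary units instead.
    results[label] = (records / seconds, gigabytes * 1000 / seconds)
    rows_s, mb_s = results[label]
    print(f"{label}: {rows_s:,.0f} records/s, {mb_s:.2f} MB/s")
```

This prints `Postgres 1.0.4: 3,833 records/s, 3.12 MB/s` followed by `after upgrade to 1.10: 455 records/s, 0.35 MB/s`.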

Sorry to hear that! I’ve escalated this to GitHub:
https://github.com/airbytehq/airbyte/issues/17321

I’ve triaged it and will ask for more input from the engineering team on this.

Thanks @natalyjazzviolin