Recommended way to sync PG tables of varying sizes to BigQuery in Airbyte

Summary

When syncing PostgreSQL tables to BigQuery in Airbyte, should you use a single connection for all tables or multiple connections to split the load? The asker also wants the rationale behind the recommendation.


Question

Hi, I’m trying to connect PG to BigQuery using Airbyte. I have a lot of tables of varying sizes in my PG database which I’d like to sync. What is the recommended way to do this: a single connection for all of it, or multiple connections to split the load? I did read through this post (https://discuss.airbyte.io/t/does-running-multiple-connections-to-the-same-data-source-help-with-parallelisation-performance/1863), but it is outdated. If the suggestion still holds, I’d like to understand the rationale behind it (not at a high level, but logically why).



This topic has been created from a Slack thread to give it more visibility.

["pg", "bigquery", "sync", "postgresql", "tables", "multiple-connections", "single-connection", "performance"]

hi <@U07MN2AN85S>, yes the post still holds. Later this year, we do plan to have concurrency support that allows syncing multiple tables at the same time in the same connection.

If you have multiple connections, each connection will result in a new container running our PG source connector and the BigQuery destination connector.

If you only use one connection, we will serialize the sync, doing the tables one by one using a single source connector and a single destination connector.
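To make the serial-versus-parallel reasoning above concrete, here is a rough back-of-the-envelope sketch. It is not Airbyte code and calls no Airbyte API: the table names and per-table durations are made-up numbers, `serial_time` and `split_across_connections` are hypothetical helpers, and it assumes the connections run fully in parallel without slowing each other down (which may not hold if the source database or the Airbyte workers are the bottleneck).

```python
import heapq


def serial_time(durations):
    """One connection: tables sync one after another, so the times add up."""
    return sum(durations.values())


def split_across_connections(durations, n_connections):
    """Greedily assign tables (longest first) to the least-loaded connection.

    Returns (groups, wall_clock), where wall_clock is the total of the slowest
    group, assuming all connections run at the same time.
    """
    # Min-heap of (minutes_assigned_so_far, connection_index).
    heap = [(0, i) for i in range(n_connections)]
    heapq.heapify(heap)
    groups = [[] for _ in range(n_connections)]

    by_size = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for table, minutes in by_size:
        total, idx = heapq.heappop(heap)
        groups[idx].append(table)
        heapq.heappush(heap, (total + minutes, idx))

    wall_clock = max(total for total, _ in heap)
    return groups, wall_clock


# Hypothetical per-table sync durations in minutes (illustrative only).
durations = {"events": 60, "orders": 30, "users": 20, "products": 10}

print(serial_time(durations))  # 120
groups, wall = split_across_connections(durations, 2)
print(groups, wall)  # [['events'], ['orders', 'users', 'products']] 60
```

In this toy example the total work is the same 120 minutes either way; splitting only shortens the wall-clock time, and only down to the duration of the largest table.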

Thanks for the response <@U073KSQ6Z53>, that makes perfect sense.
Just a small follow up!

I understand that having multiple connections at the moment would give true parallelism (which is the advantage of going the multiple-connections route). But I need your help to understand whether there is any other benefit.

Does it ALSO help in any way by

  1. Reducing the pressure/load on RDS?
  2. Reducing the load on Airbyte?
  3. And is there any upper bound on the number of connections we can have in Airbyte?

My guess is

  1. Not really (it’s still the same work being done)
  2. Not really (it’s still the same work being done)
  3. There is (but what is the number?)

Please validate this.

Maybe one other benefit could be that issues stay localised when a connection breaks down.

Hey <@U073KSQ6Z53> thanks for taking time to help me out, though my follow up is a low priority, it would really help me get the full picture. eagerly awaiting your response, thanks again :")

hi <@U07MN2AN85S>, your guesses on 1 and 2 are correct. On 3, there is currently no limit, but we plan to add one soon, maybe in the next quarter.

Awesomee and noted! Thanks <@U073KSQ6Z53> for getting back!!!