Initial load of huge table (>500 GB) fails - Postgres Connector

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 16GB/100GB
  • Deployment: Docker
  • Airbyte Version: 0.40.22
  • Source name/version: Postgres/ 1.0.36
  • Destination name/version: Snowflake/ 0.4.40
  • Step: Initial Sync
  • Description:

We have been trying since 11/01/2023 to sync one huge table (~500 GB) from a Postgres database into Snowflake using logical replication. We have not yet succeeded: every attempt fails with the same error at the initial sync step (you can find the stack trace in the attached file).

Investigation Notes

  1. All of the exceptions returned from Debezium operations propagate to

io.debezium.relational.RelationalSnapshotChangeEventSource.rollbackTransaction(RelationalSnapshotChangeEventSource.java:545)

  2. We upgraded our Postgres connector to the latest version and still get the same error.
  3. We configured the connector’s memory limit to 2.5 GB.
  4. All of the sync attempts fail after fetching roughly the same amount of data (~250 GB).
  5. Searching the Debezium issue tracker, we found an issue that seems to be related to our exception message.
  6. Storage space (EBS) does not appear to be the problem, since 75% of it is still free (see the diagnostic sketch below).
    large_table_initial_sync_exception.txt (63.8 KB)

(We have not uploaded the full sync logs, since they are too large, >200 MB.)
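For anyone investigating something similar, a couple of read-only queries on the source can help confirm the real on-disk size of the table and how much WAL the logical replication slot is retaining while a long initial snapshot runs. This is only a sketch: the table name is a placeholder, and managed services such as Aurora may behave slightly differently.

```sql
-- Placeholder table name; run against the source database (Postgres 10+).

-- Actual on-disk size of the table, including indexes and TOAST:
SELECT pg_size_pretty(pg_total_relation_size('public.huge_table'));

-- How much WAL each replication slot is holding back; large values can put
-- pressure on storage while the initial snapshot is still in progress:
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```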

Hi @gianniskod, a few thoughts while I look into this:

  1. Have you seen this doc on scaling?
    https://docs.airbyte.com/operator-guides/scaling-airbyte/

  2. Could there be any type errors?

  3. Could you create a view of this table for a smaller test sync?
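For reference, a scoped view for a smaller test sync could look something like the following; the schema, table, and column names are hypothetical:

```sql
-- Hypothetical names: expose only a recent slice of the large table for a test sync.
CREATE VIEW public.huge_table_recent AS
SELECT *
FROM public.huge_table
WHERE created_at >= now() - interval '7 days';
```

Note that a view like this can only be synced with the standard (non-CDC) replication method, since logical replication only captures changes on base tables.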

Hello @natalyjazzviolin and thanks for the fast response.

  1. Yes, I have seen this doc; that’s why I tried increasing the memory via JOB_MAIN_CONTAINER_MEMORY_LIMIT, and also increased the resources of the specific connection following your guideline in Configuring Connector Resources | Airbyte Documentation (a sketch of the relevant environment variables follows after this list).
  2. Do you mean an erroneous entry that fails schema validation? If that’s the case, I haven’t seen such a log so far, but I will investigate further.
  3. We have synced a smaller version of that table from a staging database and it worked. Do you believe that creating a scoped view on this specific table could give us more insight?
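For context, point 1 refers to the job-container environment variables that Airbyte reads from its .env file in a Docker deployment. The values below are purely illustrative; check the scaling documentation for the exact format your deployment accepts.

```
# Illustrative values only - see the Airbyte scaling docs for the accepted format.
JOB_MAIN_CONTAINER_MEMORY_REQUEST=2g
JOB_MAIN_CONTAINER_MEMORY_LIMIT=4g
```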

Hello,

I want to provide an update on this one in case anyone faces a similar problem in the future.

After further investigation by our team, we found that the problem is most probably related to the Postgres version running in our Aurora cluster, since we are on Postgres 10.21. More specifically, by searching the writer’s logs of the Aurora cluster, we found that at the exact times when Airbyte failed, the database logged a TOO_MANY_KEY_UPDATES error, which interrupted Airbyte’s connection.

This error was not very descriptive on its own, but after some extra research we found this OpenSSL issue on GitHub. Since we are on Postgres 10 (a major version first released in October 2017), it is quite plausible that the OpenSSL version it is built against contains the described issue.
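For anyone who wants to rule this out quickly, the exact server version in use can be confirmed directly from the database; what that implies about the linked OpenSSL build still depends on how the server was packaged:

```sql
-- Full version string of the running server, including build details.
SELECT version();
```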

Thank you so much for the update!