Source Postgres hitting OOM and failing without doing anything with the data

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Amazon Linux 2 AMI
  • Memory / Disk: t3a.xlarge / 2TB
  • Deployment: docker compose
  • Airbyte Version: 0.35.65-alpha
  • Source name/version: Postgres 0.4.30
  • Destination name/version: Snowflake 0.4.30
  • Step: Loading data from Postgres to Snowflake
  • Description:
    Trimmed log attached, please advise :slight_smile:
    (the max upload size is 8MB; mine was 9.40MB)

logs-209.txt (1.7 MB)

Hi @anton-epsagon,
What makes you think it’s an out-of-memory error?
In your log I found the following error:

Connection org.postgresql.jdbc.PgConnection@1280851e marked as broken because of SQLSTATE(08006), ErrorCode(0)

I think the connection to your source database was lost after 3 hours.
Could you please check whether your source database is under heavy load while Airbyte is consuming its data? A lack of resources on the source database might be the reason for a connection reset.
Could you also try tweaking the JDBC URL params to customize the timeouts? For example: socketTimeout=600&options=-c%20statement_timeout=5min
The full list of available JDBC params is available here.
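
For reference, the Postgres source should append whatever you put in the JDBC URL params field to the connection string it builds, so with the example above the effective URL would look roughly like this (host and database are placeholders):

jdbc:postgresql://<host>:5432/<database>?socketTimeout=600&options=-c%20statement_timeout=5min

Note that socketTimeout is in seconds, so 600 corresponds to 10 minutes.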

@alafanechere thanks for the response.
I’m not quite sure; maybe it was something I read somewhere else.

Trying the extra params, but I’ve got a table that’ll take a few days to move, so I’m guessing the values should be upped a bit? :thinking:

Also, any tips on optimizing?
Both the Postgres instance and the EC2 instance are running in the same AZ.

Do you have any metrics from the Postgres instance to evaluate the load it’s under while the sync is running?


(screenshot: last 6h stats)

The table is ~1.3TB

The Postgres instance has been downsized from an 8xl to a db.r4.2xlarge (a promoted read replica).

2022-07-06 11:14:57 source > 2022-07-06 11:14:57 WARN c.z.h.p.ProxyConnection(checkException):177 - HikariPool-2 - Connection org.postgresql.jdbc.PgConnection@5e840abf marked as broken because of SQLSTATE(08006), ErrorCode(0)

Still getting this even when using

socketTimeout=600&options=-c%20statement_timeout=5min

also tried

socketTimeout=6000&options=-c%20statement_timeout=30min

Gonna try 0 for both, see if that helps

I asked our database connector team for more hints about this error and possible workarounds. We’ll keep you posted. Let me know if any specific tweaks to the JDBC params help.

I was told by @tuliren that the timeout value is hardcoded in the Postgres connector, so extra JDBC params related to timeouts might not override it (it’s a 1-minute timeout, i.e. 60000 ms).
I opened an issue here to make this a user-defined value. Please subscribe to the issue to receive updates on this topic.

The team also suggested another, more probable root cause for this error:

  • You are reading from a replica.
  • While the Airbyte read query runs, the replica’s data changes.
  • This makes the connector’s Postgres driver drop the connection.

This is also an issue we’re currently working on solving. Please subscribe to receive updates.

@alafanechere that has previously been an issue, but this is a promoted replica

Just to confirm: from what I read in the GCP docs, a promoted replica is an instance that used to be a replica but is now standalone and no longer in sync with the primary instance?

@alafanechere That’s correct; also, we’re on AWS.
I’ve tried changing connectionTimeoutMs locally to 10 minutes and to an hour.
The 10-minute value seemed to prolong the process, but the sync still fails; the 1-hour value doesn’t seem to make much of a difference.

Do you think you could try to upsize your Postgres instance to give it more memory? :thinking:

It’s got 256GB right now.
I’ll have to move it up to a db.x1.32xlarge or something to accommodate a single partition, if that’s what’s limiting us…

Also, there isn’t a ton of memory in use, despite the connection being at around 130GB:

Do you know approximately the volume of data you’re trying to sync from Postgres?

Around 1-2TB per table, 10 tables

OK, so I think the volume is probably responsible for the failure… Could you please create a view on top of one of these tables covering ~50GB of records and check whether you still encounter this problem (see the sketch below)? Could you also share what memory usage looks like on the Airbyte container side while a sync is running (via the docker stats command)?
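
Here’s a minimal sketch of what I mean, assuming psycopg2 and placeholder table/view names; the LIMIT is an arbitrary row cap to tune so the view covers roughly 50GB for your average row size. You can equally just run the CREATE VIEW statement directly in psql.

# Minimal sketch: create a smaller view so the sync can be tested on a subset.
import psycopg2

conn = psycopg2.connect("host=<host> dbname=<db> user=<user> password=<pw>")
with conn, conn.cursor() as cur:  # commits the transaction on success
    cur.execute("""
        CREATE VIEW public.big_table_sample AS
        SELECT * FROM public.big_table
        LIMIT 50000000  -- placeholder row cap, tune to ~50GB
    """)
conn.close()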

New question: let’s say I need to set up connections for ~100 tables, and for each connection I want to bring in only 4 of the 100. I don’t want to do it manually, since there’s no working ‘uncheck all’ for the schema setup (EDIT: never mind, I didn’t understand the UI).

Can I do it via API/CLI?
Breaking up the list into chunks of 4 shouldn’t be an issue

Hey, yes, we have API documentation here: https://airbyte-public-api-docs.s3.us-east-2.amazonaws.com/rapidoc-api-docs.html#post-/v1/workspaces/create. You can use the API to set up the connections.
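
As a rough, untested sketch of the chunking idea, assuming the Config API endpoints sources/discover_schema and connections/create (please verify the paths and payload fields against the docs above for your Airbyte version): discover the catalog once, then create one connection per group of 4 streams.

# Rough sketch: split a source's streams into connections of 4 each.
import requests

AIRBYTE_API = "http://localhost:8000/api/v1"  # placeholder host
SOURCE_ID = "<source-id>"
DESTINATION_ID = "<destination-id>"

# Discover the source's full catalog of streams.
resp = requests.post(f"{AIRBYTE_API}/sources/discover_schema",
                     json={"sourceId": SOURCE_ID})
resp.raise_for_status()
streams = resp.json()["catalog"]["streams"]

# One connection per chunk of 4 streams.
for i in range(0, len(streams), 4):
    chunk = streams[i:i + 4]
    for entry in chunk:
        entry["config"]["selected"] = True  # sync only this chunk
    requests.post(
        f"{AIRBYTE_API}/connections/create",
        json={
            "name": f"pg-to-snowflake-part-{i // 4}",
            "sourceId": SOURCE_ID,
            "destinationId": DESTINATION_ID,
            "status": "active",
            "syncCatalog": {"streams": chunk},
        },
    ).raise_for_status()

Depending on the version, you may need to pass the full catalog with only the chunk’s streams marked as selected rather than a subset; the docs above show the exact ConnectionCreate schema.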