CockroachDB Source Connector Failures at >10M records

Information

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 16 GB / 1 TB
  • Deployment: Docker
  • Airbyte Version: 0.39.5-alpha
  • Source name/version: CockroachDB 0.1.12
  • Destination name/version: Postgres 0.3.20
  • Step: During Sync
  • Description: The CockroachDB source connector is OOM-killed when syncing a table with ~57M rows.

Use case

I am using the CockroachDB connector to transfer data to a postgres database, which I intend to switch to another instance once I have things up and running. The tables I want to sync look loosely like this:

root@:26257/core> SELECT * FROM core.crdb_internal.table_row_statistics ORDER BY estimated_row_count DESC;

  table_id |     table_name     | estimated_row_count
-----------+--------------------+----------------------
       120 | triage_flow_events |            57232826
        69 | triage_sessions    |              522372

Error

When importing the smallest table, everything works fine. However, when I try to import the largest table, either in conjunction with the other or by itself, the CockroachDB source connector breaks. Error logs: logs-17.txt (119.6 KB)

Questions

  1. How do I ameliorate this? Should I “just” increase the RAM size of the deployment?
  2. Is there a way to do multiple incremental syncs sequentially, so as to avoid this crash?

Thanks for any help :slight_smile:

What are the resources for the Airbyte instance? (This was requested in the template…)
From error logs:

2022-05-31 08:56:53 source > /airbyte/javabase.sh: line 26:     9 Killed                  /airbyte/bin/"$APPLICATION" "$@"
2022-05-31 08:56:53 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):337 - Source has no more messages, closing connection.

Looks like an OOM issue.

Thanks @marcosmarxm :pray:
The resources for the VM were initially 8 GB / 1 TB, but I retried with 16 GB / 1 TB to no avail. (I also updated the original question.)

Is there a way to decrease batch sizes or any other way to fix OOM issues?

The latest version of the connector should use a dynamic batch fetch system. Magnus, is it possible for you to sync a smaller table to check whether the problem is OOM?
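
To make the "batch fetch" point concrete: in a JDBC-based source, the batch size corresponds to the driver's fetch size, and with the pgjdbc driver (CockroachDB speaks the PostgreSQL wire protocol) streaming only happens when autocommit is off and a fetch size is set; otherwise the whole result set is buffered in heap. The snippet below is an illustrative sketch, not the connector's actual code; the connection string and table name are assumptions based on this thread.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamingReadSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; CockroachDB speaks the PostgreSQL wire
        // protocol, so the standard pgjdbc driver is used here.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:26257/core", "root", "")) {

            // pgjdbc only streams with a server-side cursor when autocommit is off
            // and a fetch size is set; otherwise the whole result set is buffered
            // in heap, which is exactly the failure mode seen on a 57M-row table.
            conn.setAutoCommit(false);

            try (PreparedStatement stmt =
                         conn.prepareStatement("SELECT * FROM triage_flow_events")) {
                stmt.setFetchSize(10_000); // rows held in memory per round trip

                try (ResultSet rs = stmt.executeQuery()) {
                    long rows = 0;
                    while (rs.next()) {
                        // Emit each row downstream instead of accumulating it in a list.
                        rows++;
                    }
                    System.out.println("Streamed rows: " + rows);
                }
            }
        }
    }
}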

I think you’re right that the problem is most likely OOM. The total size of the table is 18 GB, but all other tables (up to 1M records) sync without problems.
So basically I’m wondering whether I can change any configuration in Airbyte to make this sync succeed, instead of adding RAM to the machine it runs on?

I’ve retested syncing the 18 GB (57M rows) table through the CockroachDB connector; it fills up to 49.7 GB of RAM before it is OOM-killed. Any advice on how to triage and fix this issue? I find it strange that it should take more than twice the table size in RAM, but maybe I have overlooked something?
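
On the "more than 2x the table size" observation: a row materialized as JVM objects usually occupies several times its on-disk size (object headers, boxed values, UTF-16 strings, JSON wrappers), so heap usage well beyond 18 GB is plausible if large batches are held at once. The sketch below is purely illustrative (not Airbyte's actual estimator) and shows how a fetch size could be derived from the heap budget instead of a fixed row count; the 10% budget, the 4x inflation factor, and the per-row byte figure are assumptions.

public class FetchSizeSketch {

    /**
     * Illustrative only (not Airbyte's actual estimator): derive a fetch size
     * from the JVM heap budget and an estimated in-heap row size, instead of
     * using a fixed row count.
     */
    static int estimateFetchSize(long meanRowBytesOnDisk) {
        // Spend at most ~10% of the max heap on one in-flight batch (assumption).
        long batchBudgetBytes = Runtime.getRuntime().maxMemory() / 10;

        // Assumption: a row materialized as JVM objects (object headers, boxing,
        // UTF-16 strings, JSON wrappers) takes roughly 4x its on-disk size.
        long inHeapRowBytes = Math.max(1, meanRowBytesOnDisk * 4);

        long fetchSize = batchBudgetBytes / inHeapRowBytes;
        // Clamp so tiny rows don't produce absurdly large batches.
        return (int) Math.max(100, Math.min(fetchSize, 100_000));
    }

    public static void main(String[] args) {
        // Numbers from this thread: 18 GB / 57,232,826 rows ≈ 330 bytes per row on disk.
        System.out.println("Suggested fetch size: " + estimateFetchSize(330));
    }
}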

Magnus, there is probably a memory leak in the connector. Do you mind creating a GitHub issue for further investigation? It should not be normal to use double the memory to sync a table.

