CockroachDB Source Connector Failures at >10M records

Information

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 16 GB / 1 TB
  • Deployment: Docker
  • Airbyte Version: 0.39.5-alpha
  • Source name/version: CockroachDB 0.1.12
  • Destination name/version: Postgres 0.3.20
  • Step: During Sync
  • Description: The CockroachDB source connector is killed mid-sync when reading a table with ~57M rows.

Use case

I am using the CockroachDB connector to transfer data to a Postgres database, which I intend to switch over to another instance once I have things up and running. The tables I want to sync look roughly like this:

root@:26257/core> SELECT * FROM core.crdb_internal.table_row_statistics ORDER BY estimated_row_count DESC;

  table_id |     table_name     | estimated_row_count
-----------+--------------------+---------------------
       120 | triage_flow_events |            57232826
        69 | triage_sessions    |              522372

Error

When importing the smallest table, everything works fine. However, when I try to import the largest table, either in conjunction with the other or by itself, the CockroachDB source connector breaks. Error logs: logs-17.txt (119.6 KB)

Questions

  1. How do I ameliorate this? Should I “just” increase the RAM size of the deployment?
  2. Is there a way to do multiple incremental syncs sequentially, so as to avoid this crash? (Roughly what I have in mind is sketched below.)
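
To illustrate what I mean in question 2, the rough pattern I have in mind is a chunked, cursor-based read like the sketch below. This is only an illustration of the idea, not something I know the connector supports; the connection details and the created_at cursor column are assumptions on my part.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class ChunkedRead {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; CockroachDB speaks the Postgres wire protocol.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:26257/core", "root", "")) {
            Timestamp cursor = Timestamp.valueOf("1970-01-01 00:00:00");
            final int chunkSize = 100_000;
            while (true) {
                // Each pass reads one bounded chunk ordered by the cursor column,
                // so at most chunkSize rows are handled per query.
                // Note: rows sharing the boundary timestamp could be skipped; a
                // strictly increasing cursor column avoids that.
                int rows = 0;
                try (PreparedStatement stmt = conn.prepareStatement(
                        "SELECT * FROM core.triage_flow_events "
                                + "WHERE created_at > ? ORDER BY created_at LIMIT ?")) {
                    stmt.setTimestamp(1, cursor);
                    stmt.setInt(2, chunkSize);
                    try (ResultSet rs = stmt.executeQuery()) {
                        while (rs.next()) {
                            cursor = rs.getTimestamp("created_at"); // advance the cursor
                            rows++;
                            // ... hand the row to the destination here ...
                        }
                    }
                }
                if (rows == 0) {
                    break; // no more chunks left
                }
            }
        }
    }
}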

Thanks for any help :slight_smile:

What are the resources for the Airbyte instance? (This was requested in the template…)
From the error logs:

2022-05-31 08:56:53 source > /airbyte/javabase.sh: line 26: 9 Killed /airbyte/bin/"$APPLICATION" "$@"
2022-05-31 08:56:53 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):337 - Source has no more messages, closing connection.

Looks like an OOM issue.

Thanks @marcosmarxm :pray:
The resources for the VM were initially 8 GB / 1 TB, but I retried with 16 GB / 1 TB to no avail. (I have also updated the original question.)

Is there a way to decrease batch sizes or any other way to fix OOM issues?

The latest version of the connector should use a dynamic batch fetch system. Magnus, is it possible for you to sync a smaller table to check whether the problem is OOM?
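
For context, this is roughly the mechanism such fetching relies on: with the PostgreSQL JDBC driver (which CockroachDB also understands, being Postgres wire compatible), a plain query buffers the whole result set in the JVM heap, while turning off auto-commit and setting a fetch size makes it stream in batches. A minimal sketch of that behaviour, not the connector's actual code, with placeholder connection details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingRead {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:26257/core", "root", "")) {
            // Without the two settings below, the pgjdbc driver materialises the
            // entire result set in memory, which matches the "Killed" source
            // container seen in the logs above.
            conn.setAutoCommit(false);     // cursor-based fetching requires a transaction
            try (Statement stmt = conn.createStatement()) {
                stmt.setFetchSize(10_000); // stream ~10k rows per round trip
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT * FROM core.triage_flow_events")) {
                    while (rs.next()) {
                        // ... emit the row; only one fetch batch is held in memory ...
                    }
                }
            }
            conn.commit();
        }
    }
}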

I think you’re right that the problem is most likely OOM. The total size of the table is 18 GB, but all other tables (up to 1M records) sync without problems.
So basically I’m wondering whether there is any configuration I can change in Airbyte to make this sync succeed, rather than adding more RAM to the machine it runs on?

I’ve retested syncing the 18 GB (57M rows) table through the CockroachDB connector; it fills up to 49.7 GB of RAM before it is killed with an OOM. Any advice on how to triage and fix this? I find it strange that it should take more than 2x the table size in RAM, but maybe I have overlooked something?

Magnus, there is probably a memory leak in the connector. Do you mind creating a GitHub issue for further investigation? It should not be normal to use double the memory to sync a table.
