Loading large tables from ClickHouse using the Airbyte JDBC connector

Summary

How to chunk up the full + incremental load in the Airbyte connector for ClickHouse to avoid out-of-memory errors


Question

Does anyone have experience loading large tables with Airbyte? I’m specifically loading from ClickHouse, and the way the JDBC connector works is that it runs
> SELECT col1, col2, col3 FROM table ORDER BY time ASC
but this causes an out-of-memory error: `DB::Exception: Memory limit (total) exceeded`.

Is there any way to tell the Airbyte connector to chunk up its full + incremental load?
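For illustration, the chunking being asked about is keyset pagination: instead of one unbounded sorted scan, the client issues a series of bounded queries, carrying the last cursor value forward between them. A minimal sketch, assuming `time` is the cursor column and a chunk size of 100,000 rows (both are assumptions, not something Airbyte exposes in the thread):

```sql
-- Instead of one unbounded sorted scan:
--   SELECT col1, col2, col3 FROM table ORDER BY time ASC
-- read the table in chunks, substituting the last `time` value
-- seen in the previous chunk for {last_time_seen}:
SELECT col1, col2, col3
FROM table
WHERE time > {last_time_seen}
ORDER BY time ASC
LIMIT 100000;
```

Each query then only needs to sort and stream a bounded window of rows, which is what keeps the server under its memory limit.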



This topic has been created from a Slack thread to give it more visibility. It is in read-only mode here; the original thread is on Slack.


Tags: loading-large-tables, clickhouse, airbyte-jdbc-connector, chunking, incremental-load

I haven’t dealt with ClickHouse (so I’m not familiar with the Airbyte config for it), but I wonder if you could work around the memory cost of the ORDER BY by specifying a sort key on the table, which should eliminate or greatly reduce the memory needed for the sort. Partitioning the table should also break up the internal operations into smaller pieces. So there may be some workarounds if you can’t find an Airbyte-native solution.
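For reference, both suggestions above map to the table definition in ClickHouse: the engine-level ORDER BY clause defines the sort key (so the data is stored pre-sorted by that column, and a query ordering by it avoids a full in-memory re-sort), while PARTITION BY splits the table into independently processed parts. A sketch only — the column names and the monthly partitioning granularity are assumptions, not from the thread:

```sql
-- Hypothetical table layout; adjust columns and partition
-- granularity to the actual data.
CREATE TABLE events
(
    col1 String,
    col2 String,
    col3 String,
    time DateTime
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(time)  -- split data into per-month parts
ORDER BY time;               -- sort key: data stored sorted by `time`
```

With this layout, `SELECT ... ORDER BY time ASC` can read the parts in key order rather than sorting the whole table in memory.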

Good idea <@U035912NS77>, I’ll try that route as well. Thanks and happy 4th!