Optimizing data loading from AWS S3 to Teradata using custom connector in Java and Kotlin

Summary

The user is looking to optimize data loading from AWS S3 to Teradata using a custom connector in Java and Kotlin. They want to modify the batch size and implement additional tweaks to improve performance.


Question

Hello, I am coding my own connector in Java and Kotlin, fetching data from an S3 bucket (AWS) into Teradata. I have about 36 GB of data to load; each record contains an id, a JSON object, and a date (162M records in total). The load is taking too long: 7+ hours, and I want to optimize it. How can I modify the batch_size? The default value seems to be 25 MB.
Can you suggest additional tweaks to improve performance, please?

Info: I am using the com.teradata.jdbc:terajdbc4:17.20.00.12 driver.
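One common way to speed up bulk INSERTs with terajdbc4 is to enable JDBC FastLoad via the `TYPE=FASTLOAD` connection URL parameter and send rows in large PreparedStatement batches with autocommit off. A minimal sketch under those assumptions (the host, table name, column names, and `SESSIONS` value are illustrative, not from the thread):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;

public class FastLoadSketch {
    // TYPE=FASTLOAD and SESSIONS are terajdbc4 connection URL parameters;
    // host and database names here are hypothetical.
    static String fastloadUrl(String host, String database) {
        return "jdbc:teradata://" + host + "/DATABASE=" + database
             + ",TYPE=FASTLOAD,SESSIONS=8";
    }

    // Batch-insert (id, json payload, date) rows, flushing every batchSize rows.
    static void load(Iterable<Object[]> rows, Connection conn, int batchSize) throws Exception {
        conn.setAutoCommit(false); // FastLoad batching requires autocommit off
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO my_table (id, payload, created_at) VALUES (?, ?, ?)")) {
            int n = 0;
            for (Object[] row : rows) {
                ps.setString(1, (String) row[0]);
                ps.setString(2, (String) row[1]);
                ps.setTimestamp(3, (Timestamp) row[2]);
                ps.addBatch();
                if (++n % batchSize == 0) {
                    ps.executeBatch(); // ship one batch to Teradata
                }
            }
            ps.executeBatch();         // flush the remainder
            conn.commit();
        }
    }
}
```

With 162M rows, raising the batch size well above the default (tens of thousands of rows per `executeBatch()`) usually cuts round trips dramatically; the right value depends on row width and driver memory.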



This topic has been created from a Slack thread to give it more visibility.

["optimize-performance", "aws-s3", "teradata", "java", "kotlin", "custom-connector", "batch-size", "terajdbc4"]

Just to be sure, are you using the source-s3 connector and writing your own Teradata connector?

I might be wrong, but I think the batch size is based on 10_000 records: https://github.com/airbytehq/airbyte/blob/ea68817cf96976d54ca041551b4c409683868eab/airbyte-cdk/python/airbyte_cdk/sources/file_based/stream/cursor/default_file_based_cursor.py#L17

Depending on how your data is organized, it might be possible to configure multiple connections, one per S3 prefix, so you could synchronize the data in parallel.
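The per-prefix idea above can be sketched with a fixed thread pool, one worker per S3 prefix, each worker owning its own JDBC connection. This is only an outline under that assumption; the prefix names are hypothetical and `loadPrefix` is a placeholder for the actual list-and-insert logic:

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelPrefixLoad {
    // Tracks which prefixes were handled, so the sketch is observable;
    // a real worker would open its own Connection and batch-insert instead.
    static final Set<String> done = ConcurrentHashMap.newKeySet();

    static void loadPrefix(String prefix) {
        // placeholder for: list s3://bucket/<prefix>, stream rows, batch insert
        done.add(prefix);
    }

    static void loadAll(List<String> prefixes, int parallelism) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        for (String p : prefixes) {
            pool.submit(() -> loadPrefix(p));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

Note that plain JDBC FastLoad sessions may not combine freely with many parallel writers into the same table, so it is worth testing whether parallelism pays off per-table or only across separate staging tables.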

I’m just asking because there is a destination-teradata connector: https://docs.airbyte.com/integrations/destinations/teradata