Summary
Determining the impact of data volume versus number of tables on data pipeline ingestion performance.
Question
Hello,
We are ingesting data from MySQL db in AWS to BigQuery and we had some pipelines failing. After splitting the pipelines into smaller pipelines with less tables, all pipelines worked fine. Therefore my question is:
• Which has more impact on data pipeline ingestion — the amount of data or the number of tables? For example, when a pipeline with 50 tables fails, should we split it into two pipelines of 25 tables each, or split it so that each pipeline ingests roughly the same amount of data across its tables?
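
For the second splitting strategy, one common approach is to balance pipelines by estimated table size rather than table count. Below is a minimal sketch, assuming per-table sizes can be estimated from MySQL's information_schema; the function and variable names here (balance_tables, sizes) are illustrative, not part of any product API:

```python
# Hypothetical sketch: greedily assign tables to pipelines so that the
# total estimated bytes per pipeline stay roughly equal.
import heapq

def balance_tables(table_sizes: dict[str, int], num_pipelines: int) -> list[list[str]]:
    """Greedy bin-packing: assign each table (largest first) to the
    currently lightest pipeline."""
    # Min-heap of (total_bytes_assigned, pipeline_index)
    heap = [(0, i) for i in range(num_pipelines)]
    groups: list[list[str]] = [[] for _ in range(num_pipelines)]
    for table, size in sorted(table_sizes.items(), key=lambda kv: -kv[1]):
        total, idx = heapq.heappop(heap)
        groups[idx].append(table)
        heapq.heappush(heap, (total + size, idx))
    return groups

# Example sizes in bytes, e.g. obtained with:
#   SELECT table_name, data_length + index_length
#   FROM information_schema.tables WHERE table_schema = 'mydb';
sizes = {"orders": 80_000_000, "users": 5_000_000, "events": 120_000_000,
         "items": 40_000_000, "logs": 60_000_000}
for i, group in enumerate(balance_tables(sizes, 2)):
    print(f"pipeline {i}: {group}")
```

This greedily keeps each pipeline's total data volume close, at the cost of uneven table counts per pipeline.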
This topic has been created from a Slack thread to give it more visibility.
It will be in read-only mode here.