Failing to ingest a "big" MySQL table

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 64 GB
  • Deployment: Docker
  • Airbyte Version: 0.39.10
  • Source name/version: MySQL 0.5.11
  • Destination name/version: Snowflake (0.4.28)
  • Step: Sync
  • Description: MySQL to Snowflake unable to ingest large table, the first sync for a CDC incremental + dedupe sync

NOTE: This was previously working, but then the Airbyte instance died and lost its binlog position, which meant it needed to perform a full-refresh sync. I was lucky to get the full-refresh sync to work a month ago, and since then it had been working fine incrementally. I just cannot get past this first initial sync! Can you please review the logs and let me know what's going wrong?

57.77 GB | 127,417,153 emitted records | no records | 5h 0m 42s

mylogs-111.txt (62.1 KB)

FYI, not a timeout issue AFAIK.

MySQL has its wait_timeout variable default value set to 28800 seconds (8 hours).

Therefore, if both sides of the connection keep the defaults, the problem should never occur, since MySQL will not time out the connection before Airbyte does.
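The arithmetic bears this out (a minimal sketch; the 5h 0m 42s duration is taken from the sync summary above):

```python
# Compare the observed sync duration against MySQL's default wait_timeout
# to rule out an idle-connection timeout (values taken from this thread).
wait_timeout_s = 8 * 3600                  # MySQL default wait_timeout: 28800 s
sync_duration_s = 5 * 3600 + 0 * 60 + 42   # 5h 0m 42s from the sync summary

# The sync ended well inside the timeout window.
print(sync_duration_s < wait_timeout_s)  # → True
```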

Hey @danieldiamond,
I’m wondering if the root cause could be on the destination side.
Do you see part of the data being written to Snowflake? Which loading method did you choose? We recommend using internal staging.
Moreover, I did not quite understand whether you are attempting an initial load again after your Airbyte instance died, or whether this problem happens for your incremental loads?

This is an issue with the initial load after the instance died. There are no issues with incremental loads.

We use AWS S3 staging for loading. Isn't that more efficient than internal staging for large volumes of data?

What is the issue from the logs? I thought it was an issue with Debezium or the source, and not necessarily on the Snowflake side.

What is the issue from the logs?

I can’t tell; there are no errors displayed. It looks like the destination gets cancelled: `2022-06-07 01:22:50 INFO i.a.w.g.DefaultReplicationWorker(cancel):441 - Cancelling destination...`

Isn’t that more efficient for loading large pieces of data instead of internal?

The recommended way is to use internal staging because it leverages Snowflake’s own library to perform the staging, which can be more efficient.

Could you please check the following:

  • Do you have partial writes to your Snowflake destination in the raw tables?
  • What is the memory consumption of your source and destination containers while the sync runs?
  • Try using internal staging to see if it changes anything.
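For the memory question above, one quick way to check is a one-shot `docker stats` snapshot while the sync is running (a diagnostic sketch; adjust for your own container names):

```shell
# Snapshot memory usage of all running containers once (no live stream),
# showing each container's name, memory usage, and memory percentage.
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
```

Running this a few times during the sync shows whether the source or destination container is approaching its memory limit before the cancellation.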

Do you mind sharing your full logs? I think you truncated them; maybe you missed an error :thinking:?