MySQL to Snowflake Fails Normalization (Doesn't build RAW table)

danieldiamond · October 24, 2022, 5:27am

Is this your first time deploying Airbyte?: No
OS Version / Instance: Ubuntu
Memory / Disk: 48gb
Deployment: Docker
Airbyte Version: 0.40.17
Source name/version: MySQL 1.0.6
Destination name/version: Snowflake 0.4.38
Step: Source sync fails (possibly due to timeout)

I created one connection with 4 tables and it fails.
I created 4 separate connections and 3 out of 4 pass and one fails.

FAILED: 6.29 GB | 20,834,086 emitted records | no records | 21m 50s
SUCCEEDED: 3.81 GB | 10,427,393 emitted records | 10,427,393 committed records | 12m 29s
SUCCEEDED: 2.8 GB | 3,890,211 emitted records | 3,890,211 committed records | 6m 11s
SUCCEEDED: 38.47 GB | 50,160,338 emitted records | 50,160,338 committed records | 1h 6m 36s

It is weird that a mid-sized one failed, i.e. not the largest.

danieldiamond · October 24, 2022, 5:34am

Logs

d39daee2_294b_4e75_8522_740f14df7112_logs_51001_txt.txt (255.6 KB)

danieldiamond · October 24, 2022, 5:41am

Has the normalization process been updated? Unless I’ve done something wrong (and I can’t imagine what, as this is a brand new source connection and I’m using the most recent versions), there really needs to be much better regression testing going on.

danieldiamond · October 24, 2022, 1:17pm

From the logs it appears to be failing before normalisation. Is it possible that this is caused by a timeout of 120 seconds in the source configuration instead of the default 300 seconds?

marcosmarxm · October 24, 2022, 3:03pm

Hello there! You are receiving this message because none of your fellow community members has stepped in to respond to your topic post. (If you are a community member and you are reading this response, feel free to jump in if you have the answer!) As a result, the Community Assistance Team has been made aware of this topic and will be investigating and responding as quickly as possible.
Some important considerations that will help you get your issue solved faster:

It is best to use our topic creation template; if you haven’t yet, we recommend posting a followup with the requested information. With that information the team will be able to search for similar issues with connectors and the platform more quickly, and troubleshoot your specific question or problem faster.
Make sure to upload the complete log file; a common investigation roadblock is that sometimes the error for the issue happens well before the problem is surfaced to the user, and so having the tail of the log is less useful than having the whole log to scan through.
Be as descriptive and specific as possible; when investigating it is extremely valuable to know what steps were taken to encounter the issue, what version of connector / platform / Java / Python / docker / k8s was used, etc. The more context supplied, the quicker the investigation can start on your topic and the faster we can drive towards an answer.
We in the Community Assistance Team are glad you’ve made yourself part of our community, and we’ll do our best to answer your questions and resolve the problems as quickly as possible. Expect to hear from a specific team member as soon as possible.

Thank you for your time and attention.
Best,
The Community Assistance Team

natalyjazzviolin · October 24, 2022, 3:12pm

Hi!
I was looking through past threads and see that you’ve had a similar issue before:
https://discuss.airbyte.io/t/mysql-to-snowflake-cdc-sync-fails-race-condition-raw-table-not-found-but-still-being-written-to-snowflake/1426

Is this also a CDC sync?

I also found this article in the Snowflake knowledge base:
https://community.snowflake.com/s/article/Error-Procedure-does-not-exist-or-not-authorized-though-the-procedure-exists-in-the-schema

I’m looking more into this and getting input from the team, and hope to have some ideas for you soon! Thanks for your patience.

danieldiamond · October 24, 2022, 10:02pm

@natalyjazzviolin thank you so much for your prompt reply and for sharing that previous issue - I had completely forgotten about that one.

Re-reviewing the logs it appears there’s something wrong with the source. And yes it is CDC.

I’m not sure how your snowflake community post relates to this issue as - from the logs - it appears to be an issue with the source.

danieldiamond · October 25, 2022, 8:33am

Updates: I just tried the connection with 1 table instead of 4 and it succeeded. So that is 3M records (2.8 GB) instead of 85M records (50 GB).
It appears that MySQL CDC is unstable for large syncs.

danieldiamond · October 25, 2022, 11:00am

@natalyjazzviolin actually, honestly thank you again for sharing that previous issue. This does seem to be a timeout issue with the source.

I broke the connector into separate connectors (one for each stream) and one still failed. It seems to hang until the timeout limit specified in the source configuration.

Look at the timestamps:

2022-10-25 10:48:36 source > Stopping the embedded engine
2022-10-25 10:48:36 source > Waiting for PT5M for connector to stop
2022-10-25 10:50:35 source > Oct 25, 2022 10:50:35 AM com.github.shyiko.mysql.binlog.BinaryLogClient$5 run
2022-10-25 10:50:35 source > INFO: Keepalive: Trying to restore lost connection to ...
2022-10-25 10:53:36 source > Stopping the task and engine
2022-10-25 10:53:36 source > Stopping down connector
2022-10-25 10:55:06 source > Coordinator didn't stop in the expected time, shutting down executor now
2022-10-25 10:56:36 source > Connection gracefully closed
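
The gaps between those log lines can be checked mechanically. Below is a minimal Python sketch, assuming only the log format quoted above (a `YYYY-MM-DD HH:MM:SS` prefix followed by a space). The helper name `gaps` is made up for illustration and is not part of Airbyte or Debezium.

```python
from datetime import datetime

# Log lines copied from the post above.
LOG = """\
2022-10-25 10:48:36 source > Stopping the embedded engine
2022-10-25 10:48:36 source > Waiting for PT5M for connector to stop
2022-10-25 10:50:35 source > Oct 25, 2022 10:50:35 AM com.github.shyiko.mysql.binlog.BinaryLogClient$5 run
2022-10-25 10:50:35 source > INFO: Keepalive: Trying to restore lost connection to ...
2022-10-25 10:53:36 source > Stopping the task and engine
2022-10-25 10:53:36 source > Stopping down connector
2022-10-25 10:55:06 source > Coordinator didn't stop in the expected time, shutting down executor now
2022-10-25 10:56:36 source > Connection gracefully closed
"""

def gaps(log_text):
    """Return (seconds_since_previous_line, message) for each non-empty log line."""
    result = []
    previous = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # The first 19 characters hold the timestamp; the message starts after the space.
        stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        delta = 0 if previous is None else int((stamp - previous).total_seconds())
        result.append((delta, line[20:]))
        previous = stamp
    return result

for seconds, message in gaps(LOG):
    print(f"+{seconds:>4}s  {message}")
```

On this excerpt, "Stopping the task and engine" comes 300 seconds after "Waiting for PT5M for connector to stop", which matches the five-minute wait, and the whole shutdown takes 480 seconds.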

danieldiamond · October 25, 2022, 11:00am

Related issue? Source Mysql: syncing timeout (fetch size may be ignored) · Issue #9784 · airbytehq/airbyte · GitHub

danieldiamond · October 25, 2022, 11:05am

I created one connection with 4 tables and it fails.
I created 4 separate connections and 3 out of 4 pass and one fails.

FAILED: 6.29 GB | 20,834,086 emitted records | no records | 21m 50s
SUCCEEDED: 3.81 GB | 10,427,393 emitted records | 10,427,393 committed records | 12m 29s
SUCCEEDED: 2.8 GB | 3,890,211 emitted records | 3,890,211 committed records | 6m 11s
SUCCEEDED: 38.47 GB | 50,160,338 emitted records | 50,160,338 committed records | 1h 6m 36s

It is weird that a mid-sized one failed, i.e. not the largest.

danieldiamond · October 25, 2022, 11:05am

I have tried extending the timeout in the source configuration (300, 600 and 900s) but no luck.

natalyjazzviolin · October 25, 2022, 6:30pm

So glad the past thread was helpful! I see the GitHub comment, and all your follow-ups here are super helpful. I’m writing to the engineering team and hope to hear from them soon!

danieldiamond · October 27, 2022, 12:32am

Updates:
The destination config (Snowflake credentials) was updated and the sync started working, but after a few incremental + dedupe syncs it started failing again. There are no failed queries in Snowflake. It appears to be an issue with the source (or Debezium).

Can someone please investigate?

danieldiamond · October 28, 2022, 5:28am

To be clear: I have one MySQL source with 4 tables. Using that one connector I can sync all tables except for one. I have tried syncing one table, which syncs successfully; then I unselected it and selected the other one, and it fails. The connections are all the same, and there are no existing SCD tables or anything.
The schema does not seem problematic.
The various tables are roughly the same size.
I can’t seem to understand what is going wrong from the logs.

Can someone please assist?

danieldiamond · October 30, 2022, 9:48pm

I think it is related to debezium

Coordinator didn't stop in the expected time, shutting down executor now

that line doesn’t appear in successful syncs
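
A quick way to compare failing and successful runs is to search each downloaded log for that line. This is a minimal Python sketch; the helper `find_marker` and the two sample strings are made up for illustration, and the "clean" sample is placeholder text, not a real successful log.

```python
# Message quoted in this thread; reported here only for failing syncs.
MARKER = "Coordinator didn't stop in the expected time"

def find_marker(log_text, marker=MARKER):
    """Return the 1-based numbers of the lines in log_text that contain marker."""
    return [
        number
        for number, line in enumerate(log_text.splitlines(), start=1)
        if marker in line
    ]

# Sample built from lines quoted earlier in this thread.
failed_sample = """\
2022-10-25 10:53:36 source > Stopping down connector
2022-10-25 10:55:06 source > Coordinator didn't stop in the expected time, shutting down executor now
"""

# Hypothetical placeholder standing in for a log without the marker.
clean_sample = "source > some other log line\n"

print(find_marker(failed_sample))  # line numbers where the marker appears
print(find_marker(clean_sample))   # empty list when the marker is absent
```

To use it on a real run, read the downloaded log file into a string and pass it to `find_marker`.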

danieldiamond · October 30, 2022, 11:45pm

After investigating the mysql RDS logs - it appears to be an issue with connecting to RDS in the VPC

IP address 'X.X.X.X' could not be resolved

Following stackoverflow MySQL warning "IP address could not be resolved" - Server Fault

danieldiamond · November 1, 2022, 12:34pm

Updates: that seems to be unrelated. After a few successful syncs it’s now randomly failing every subsequent incremental sync

I wonder if this issue is related

github.com/airbytehq/airbyte

MySQL to Snowflake incremental loading fails

opened 05:12PM - 03 Oct 22 UTC

marcosmarxm

type/bug temporal connectors/source/mysql zendesk team/db-dw-sources

This Github issue is synchronized with Zendesk:
Ticket ID: #2471 (https://…airbyte7538.zendesk.com/agent/tickets/2471)
Priority: normal
Group: Community Assistance Engineer
Assignee: Nataly Merezhuk
Original ticket description:

Is this your first time deploying Airbyte?: No
OS Version / Instance: EC2 Linux AMI 2
Memory / Disk: M6a.xlarge instance - 4 vCPUs, 16 GiB memory, 60 Gb storage
Deployment: Docker
Airbyte Version: 0.40.9 (latest as of writing)
Source name/version: MySQL 0.6.14 (latest as of writing) connected through SSH tunnel
Destination name/version: Snowflake 0.4.38 (latest as of writing) internal staging
Step: During incremental sync
Description:

Hi.

I am trying to sync moodle data sitting in Amazon Aurora to Snowflake. I was able to do so successfully in a previous Airbyte version 0.39.19-alpha.

Syncing 7.21Gb of data (about 40 tables) is successful on the initial run.
All subsequent syncs are failing.
See image below.

(image: historical-success-incremental-fail, https://aws1.discourse-cdn.com/business7/uploads/airbyte/original/2X/a/ab87d0b2c21fd7de2036abb527efbaf07b725eb4.png)

Here is the log for the successful initial sync:
success-logs.txt (7.0 MB): https://discuss.airbyte.io/uploads/short-url/2Tud15TexlxX7Xb9V4VGJggJVsn.txt
Errors/warnings in the log that may be of interest:

JSON schema validation failed
Signalling close because record’s binlog file : mysql-bin-changelog.002503 , position : 75169383 is after target file : mysql-bin-changelog.002503 , target position : 75125447
The main thread is exiting while children non-daemon threads from a connector are still active. Ideally, this should not happen

Here is the log for the first failed sync:
failure-logs.txt (5.3 MB): https://discuss.airbyte.io/uploads/short-url/7JmUomae5tbeMucpn97dAMO3Zq0.txt
Errors/warnings in the log that may be of interest:

2022-09-28 19:47:53 - Additional Failure Information: ScheduleActivityTaskCommandAttributes.Input exceeds size limit.
Repeat of the errors/warnings found in successful.logs

Even with the warnings, the initial sync is successful. However, subsequent syncs are failing. I believe there is enough CPU and RAM for this sync. The error:

ScheduleActivityTaskCommandAttributes.Input exceeds size limit.

seems to be related to temporal, but it makes no sense that the initial sync ran without an issue.

Please could someone assist me on this issue. Thank you.

[Discourse post: https://discuss.airbyte.io/t/mysql-to-snowflake-incremental-loading-fails/2740/1]

danieldiamond · November 3, 2022, 4:42am

FYI CDC full-refresh overwrite works fine. It’s the incremental + dedupe that fails.

adam · March 22, 2023, 5:45pm

@danieldiamond were you able to isolate this any further? We’ve been hitting issues on a few Postgres sources where Debezium fails to shut down correctly on syncs. Sometimes it’ll work after a reset. Other times, no luck.
