Logs
d39daee2_294b_4e75_8522_740f14df7112_logs_51001_txt.txt (255.6 KB)
Has the normalization process been updated? Unless I’ve done something wrong - and I can’t imagine what, since this is a brand-new source connection and I’m using the most recent versions - there really needs to be much better regression testing going on.
From the logs it appears to be failing before normalisation. Is it possible that this is caused by a timeout of 120 seconds in the source configuration instead of the default 300 seconds?
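For reference, the timeout in question lives in the MySQL source’s CDC settings. Here’s a minimal sketch of the relevant part of the source configuration, assuming the connector exposes it as replication_method.initial_waiting_seconds (the field names are an assumption and may vary by connector version):

```python
# Sketch of the MySQL source (CDC) settings under discussion. Field names follow
# the source spec as I understand it and are an assumption, not gospel.
source_config = {
    "replication_method": {
        "method": "CDC",
        # Default is 300s; a 120s value would give the CDC engine far less time
        # to finish reading the binlog before the source gives up.
        "initial_waiting_seconds": 300,
    },
}
```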
Hello there! You are receiving this message because none of your fellow community members has stepped in to respond to your topic post. (If you are a community member and you are reading this response, feel free to jump in if you have the answer!) As a result, the Community Assistance Team has been made aware of this topic and will be investigating and responding as quickly as possible.
Some important considerations that will help you get your issue solved faster:
Thank you for your time and attention.
Best,
The Community Assistance Team
Hi!
I was looking through past threads and see that you’ve had a similar issue before:
https://discuss.airbyte.io/t/mysql-to-snowflake-cdc-sync-fails-race-condition-raw-table-not-found-but-still-being-written-to-snowflake/1426
Is this also a CDC sync?
I also found this article in the Snowflake knowledge base:
https://community.snowflake.com/s/article/Error-Procedure-does-not-exist-or-not-authorized-though-the-procedure-exists-in-the-schema
I’m looking more into this and getting input from the team, hope to have some ideas for you soon! Thanks for your patience.
@natalyjazzviolin thank you so much for your prompt reply and for sharing that previous issue - I had completely forgotten about that one.
Re-reviewing the logs, it appears there’s something wrong with the source. And yes, it is CDC.
I’m not sure how the Snowflake community post relates to this issue as - from the logs - it appears to be an issue with the source.
Updates: I just tried the connection with 1 table instead of 4 and it succeeded. So that is 3M records (2.8 GB) instead of 85M records (50 GB).
It appears that MySQL CDC is unstable for large syncs.
@natalyjazzviolin actually, honestly thank you again for sharing that previous issue. This does seem to be a timeout issue with the source.
I broke the connection into separate connections (one for each stream) and one still failed. It seems to hang until the timeout limit specified in the source configuration.
Look at the timestamps:
2022-10-25 10:48:36 source > Stopping the embedded engine
2022-10-25 10:48:36 source > Waiting for PT5M for connector to stop
2022-10-25 10:50:35 source > Oct 25, 2022 10:50:35 AM com.github.shyiko.mysql.binlog.BinaryLogClient$5 run
2022-10-25 10:50:35 source > INFO: Keepalive: Trying to restore lost connection to ...
2022-10-25 10:53:36 source > Stopping the task and engine
2022-10-25 10:53:36 source > Stopping down connector
2022-10-25 10:55:06 source > Coordinator didn't stop in the expected time, shutting down executor now
2022-10-25 10:56:36 source > Connection gracefully closed
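To make that concrete, here’s a tiny sketch that just replays the timestamps above and prints the spacing between the lines (the engine waits PT5M, i.e. 5 minutes, but the coordinator-timeout line only appears about 6.5 minutes after the shutdown began):

```python
from datetime import datetime

# Timestamps copied from the log excerpt above; the point is the multi-minute
# gaps between "Waiting for PT5M for connector to stop" and the final shutdown.
events = [
    ("2022-10-25 10:48:36", "Stopping the embedded engine"),
    ("2022-10-25 10:48:36", "Waiting for PT5M for connector to stop"),
    ("2022-10-25 10:50:35", "Keepalive: Trying to restore lost connection"),
    ("2022-10-25 10:53:36", "Stopping the task and engine"),
    ("2022-10-25 10:55:06", "Coordinator didn't stop in the expected time"),
    ("2022-10-25 10:56:36", "Connection gracefully closed"),
]
prev = None
for ts, msg in events:
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    gap = "" if prev is None else f"  (+{(t - prev).seconds // 60}m {(t - prev).seconds % 60}s)"
    print(f"{ts}  {msg}{gap}")
    prev = t
```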
Related issue? Source Mysql: syncing timeout (fetch size may be ignored) · Issue #9784 · airbytehq/airbyte · GitHub
I created one connection with 4 tables and it fails.
I created 4 separate connections and 3 out of 4 pass and one fails.
6.29 GB  | 20,834,086 emitted records | no committed records         | 21m 50s
3.81 GB  | 10,427,393 emitted records | 10,427,393 committed records | 12m 29s
2.8 GB   | 3,890,211 emitted records  | 3,890,211 committed records  | 6m 11s
38.47 GB | 50,160,338 emitted records | 50,160,338 committed records | 1h 6m 36s
It is weird that the one that failed is mid-sized, i.e. not the largest.
I have tried extending the timeout in the source configuration (300, 600 and 900s) but no luck.
So glad the past thread was helpful! I see the GitHub comment, and all your follow-ups here are super helpful. I’m writing to the engineering team and hope to hear from them soon!
Updates:
The destination config (Snowflake credentials) was updated and the sync started working, but after a few incremental + dedupe syncs it started failing again. There are no failed queries in Snowflake. It appears to be an issue with the source (or Debezium).
Can someone please investigate?
To be clear: I have one MySQL source with 4 tables. Using that one connector I can sync all tables except for one. I have tried syncing one table and it syncs successfully; then I unselect it, select the other one, and it fails. The connections are all the same, and there are no existing SCD tables or anything.
The schema does not seem problematic.
The various tables are roughly the same size.
I can’t seem to understand what is going wrong from the logs.
Can someone please assist?
I think it is related to Debezium:
Coordinator didn't stop in the expected time, shutting down executor now
That line doesn’t appear in successful syncs.
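One way to check that is to search the attached log (and a successful sync’s log) for the marker; a small sketch, assuming the attached filename:

```python
# Search a sync log for the Debezium coordinator-timeout marker; run this against
# the failed and successful sync logs to compare (filename is the attachment above).
marker = "Coordinator didn't stop in the expected time"
with open("d39daee2_294b_4e75_8522_740f14df7112_logs_51001_txt.txt", encoding="utf-8") as f:
    hits = [lineno for lineno, line in enumerate(f, start=1) if marker in line]
print(f"{len(hits)} occurrence(s) at lines {hits}")
```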
After investigating the MySQL RDS logs, it appears to be an issue with connecting to RDS in the VPC:
IP address 'X.X.X.X' could not be resolved
Following this Server Fault thread: MySQL warning "IP address could not be resolved" - Server Fault
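That warning usually comes from MySQL attempting a reverse-DNS lookup for each connecting client, which the linked thread suggests disabling with skip-name-resolve. A minimal check, with placeholder connection details (on RDS the variable is changed through the DB parameter group, not at runtime):

```python
# Check whether the server still attempts reverse-DNS lookups for client IPs.
# Host/user/password are placeholders for illustration only.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(host="my-rds-endpoint", user="airbyte", password="***")
cur = conn.cursor()
cur.execute("SHOW VARIABLES LIKE 'skip_name_resolve'")
print(cur.fetchone())  # e.g. ('skip_name_resolve', 'OFF') -> lookups (and the warning) still happen
```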
Updates: that seems to be unrelated. After a few successful syncs, it’s now randomly failing on every subsequent incremental sync.
I wonder if this issue is related
FYI: CDC full-refresh overwrite works fine. It’s the incremental + dedupe that fails.
@danieldiamond were you able to isolate this any further? We’ve been hitting issues on a few Postgres sources where Debezium fails to shut down correctly on syncs. Sometimes it’ll work after a reset. Other times, no luck.
cdc: fix debezium shutdown interruption bug by subodh1810 · Pull Request #24166 · airbytehq/airbyte · GitHub will likely resolve the original issue