MySQL to Snowflake incremental loading fails

@natalyjazzviolin

I am not 100% sure that the cursor field is timemodified, though, as I am using CDC and don’t have the option of selecting a cursor field.

Interestingly, this link describes that the source does not necessarily need a suitable cursor field.

“On the other hand, CDC incremental replication reads a log of the changes that have been made to the source database and transmits these changes to the destination. Because changes are read from a transaction log when using CDC, it is not necessary for the source data to have a suitable cursor field.”
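To make the difference in that quote concrete, here is a minimal sketch in Python. It is not Airbyte's implementation; the `ChangeEvent` type, the function names, and the sample rows are all hypothetical. It only illustrates that cursor-based replication filters rows on a column such as timemodified, while CDC replays a log of changes and therefore needs no such column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """Hypothetical stand-in for one entry read from a transaction log."""
    op: str               # "insert", "update", or "delete"
    key: int              # primary key of the affected row
    row: Optional[dict]   # new row contents; None for deletes


def cursor_incremental(rows, cursor_field, last_cursor):
    """Cursor-based: pick rows whose cursor value is newer than the saved one.

    Deleted rows never show up here, because they no longer exist in the table.
    """
    return [r for r in rows if r[cursor_field] > last_cursor]


def apply_change_log(destination, events):
    """CDC-style: replay logged changes in order; no cursor column is needed."""
    for event in events:
        if event.op == "delete":
            destination.pop(event.key, None)
        else:
            destination[event.key] = event.row
    return destination


rows = [{"id": 1, "timemodified": 10}, {"id": 2, "timemodified": 20}]
newer = cursor_incremental(rows, "timemodified", 10)

dest = {1: {"id": 1, "name": "a"}}
log = [
    ChangeEvent("update", 1, {"id": 1, "name": "b"}),
    ChangeEvent("insert", 2, {"id": 2, "name": "c"}),
    ChangeEvent("delete", 1, None),
]
replayed = apply_change_log(dest, log)
```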

I’ve also tested the same connection on a slightly older Airbyte 0.39.19-alpha instance with source MySQL 0.5.11 and destination Snowflake 0.4.28, and the incremental loading works there.

It is really odd that you’re having this issue with only 7 GB of data. Let me get some input from my team and I’ll get back to you shortly!

Could you update to Airbyte 0.40.10? I’m getting input from my team, and we think updating could fix the problem.

@natalyjazzviolin
Hope you had a great weekend.

The incremental syncs are still failing after upgrading to Airbyte 0.40.10. Connectors are the latest versions.
The initial historical sync also succeeded after failing once.

Here is a screenshot of the UI:

Here is the historical sync log that failed: (/tmp/workspace/46/0)
historical-0-fail.log (4.1 MB)

Here is the historical sync log that succeeded: (/tmp/workspace/46/1)
historical-1-success.log (7.6 MB)
Note: I’ve removed some “Records read” logs to reduce size.

Here is the incremental sync log that fails:
failure-logs.log (5.0 MB)

Could you share your schema please?

You should be able to see the whole Moodle schema here:
https://www.examulator.com/er/3.11/index.html
or:
https://docs.moodle.org/dev/Database_Schema

@natalyjazzviolin It also appears on the historical success logs in this format:

For table ‘uctohsDB.mdl_glossary’ using select statement: ‘SELECT’ id, course, name, intro

Hope this is what you’re looking for.

After discussing this with the team, I’ve escalated this to GitHub. You can follow the issue here:
https://github.com/airbytehq/airbyte/issues/17512

This looks like it is failing because of the StandardSyncInput, which contains the full catalog and the state. It is hard to estimate how large that will be for this specific connection. The state is only added on the second sync, which is why the initial sync works.

Could you try a few things? These could be short-term fixes while we work on a long-term one.

  1. Split the tables into 2 or more connections so as not to hit the maximum message size.
  2. Tweak the dynamic.config and build Temporal with a bigger BlobSize.
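
As a rough illustration of option 1, the sketch below groups stream definitions so that each group's serialised size stays under a chosen limit. Everything in it is an assumption: the helper names, the JSON shape used as a stand-in for StandardSyncInput, and the limit itself are hypothetical, and the real message also carries state. The point is only that the size that matters is that of the serialised catalog and state, not the volume of table data.

```python
import json


def payload_size(catalog_streams, state):
    """Bytes of a JSON stand-in for the sync input (hypothetical shape)."""
    body = {"catalog": catalog_streams, "state": state}
    return len(json.dumps(body).encode("utf-8"))


def split_streams(streams, max_bytes):
    """Greedily group streams so each group's payload stays within max_bytes.

    A single stream larger than max_bytes still gets a group of its own.
    """
    groups, current = [], []
    for stream in streams:
        candidate = current + [stream]
        if current and payload_size(candidate, {}) > max_bytes:
            groups.append(current)
            current = [stream]
        else:
            current = candidate
    if current:
        groups.append(current)
    return groups


# Illustrative stream definitions; table and column names are made up.
streams = [
    {"name": f"mdl_table_{i}", "columns": ["id", "course", "name", "intro"]}
    for i in range(10)
]
limit = payload_size(streams[:3], {})  # pretend three streams just fit
groups = split_streams(streams, limit)
```

Each resulting group would then become its own connection.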

Let me know if this helps!

@natalyjazzviolin

I have about 65 tables of interest that are about 21 GB in size.
To split the tables into 2 or more connections, I selected only a few tables that add up to 1.74 GB, which I believe is small, yet the incremental sync keeps failing.

This error came up in Cloud as well, and a fix has been merged:
https://github.com/airbytehq/airbyte/pull/17538

Please update Airbyte and let me know if the incremental sync works!

Hi @natalyjazzviolin

I have a critical issue. The connections are passing, but there is a huge data integrity issue.
The record counts are significantly off. Could you and the team please look into this urgently?
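
One way to pin down which tables are off is to collect per-table counts on both sides (for example with `SELECT COUNT(*)` against MySQL and Snowflake) and compare them. A minimal sketch, assuming the counts have already been gathered into dictionaries; the function name and the numbers below are purely illustrative, not real results:

```python
def compare_counts(source_counts, destination_counts):
    """Return {table: (source_count, destination_count)} for every mismatch.

    A table missing on one side is reported with None for that side.
    """
    mismatches = {}
    for table in sorted(set(source_counts) | set(destination_counts)):
        src = source_counts.get(table)
        dst = destination_counts.get(table)
        if src != dst:
            mismatches[table] = (src, dst)
    return mismatches


# Made-up numbers for illustration only.
source = {"mdl_glossary": 100, "mdl_course": 42, "mdl_user": 7}
destination = {"mdl_glossary": 90, "mdl_course": 42}
report = compare_counts(source, destination)
```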

And thank you for all the help thus far.

Hi! Please make a GitHub issue with all the details and we’ll be able to look into it!

Hey @jamo, there is a workaround in the GitHub issue “Sync hangs with error: ScheduleActivityTaskCommandAttributes.Input exceeds size limit” (Issue #16236, airbytehq/airbyte).

Thanks @alexnikitchuk, I’ve updated to the latest version and I no longer get that error.

@natalyjazzviolin I can’t seem to replicate the previous issue, but after updating to the latest version the incremental syncs still aren’t fully reliable.

The syncs work at times,

and fail or retry at other times, even though they look successful.

Logs:
fail-logs.log (7.8 MB)

I’ve found an issue for the latest error you’re encountering. Looks like it happens sporadically just like in your case, and the only current fix is to reset the data and start a new sync, unfortunately.
https://github.com/airbytehq/airbyte/issues/17372

My suggestion would be to update the MySQL source to 1 or higher, as the user in the issue is also on a 6.x version.

We stick to one issue per forum post for documentation purposes. If you’d like to discuss this further, please start a new thread!

Thank you @natalyjazzviolin. It seems I am still facing the same issue. I will continue on that GitHub thread :+1:

Sorry to hear it’s still happening! I’ll ask one of the engineers on the databases team for their input and will post in the GitHub thread as well when I get new info!

I’m suddenly experiencing this as well.

airbyte 0.40.17
airbyte/source-mysql:1.0.8
airbyte/destination-snowflake:0.4.38

CDC and incremental+dedupe

Sorry to hear that! Please upvote and comment on that GitHub thread!

Sharing related discourse thread