Error in Airbyte MySQL CDC Connector during Initial Sync

Summary

The user is facing an error in the Airbyte MySQL CDC Connector during the initial sync process, encountering a ‘Could not find first log file name in binary log index file’ error. They have questions regarding the duration of the job, resuming from failure, and resolving the error.


Question

Hi Team,

We are using Airbyte's new MySQL CDC connector to sync around 500 GB of data from the source to S3.
After the initial sync of 500 GB, it starts creating the initial snapshot (reading the CDC binlogs), but after some time it fails with this error:

```
2024-03-06 16:22:00 source > io.debezium.DebeziumException: Could not find first log file name in binary log index file Error code: 1236; SQLSTATE: HY000.
2024-03-06 16:22:00 source >         at io.debezium.connector.mysql.MySqlStreamingChangeEventSource.wrap(MySqlStreamingChangeEventSource.java:1254) ~[debezium-connector-mysql-2.4.0.Final.jar:2.4.0.Final]
2024-03-06 16:22:00 source >         at io.debezium.connector.mysql.MySqlStreamingChangeEventSource$ReaderThreadLifecycleListener.onCommunicationFailure(MySqlStreamingChangeEventSource.java:1299) ~[debezium-connector-mysql-2.4.0.Final.jar:2.4.0.Final]
2024-03-06 16:22:00 source >         at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1079) ~[mysql-binlog-connector-java-0.28.1.jar:0.28.1]
2024-03-06 16:22:00 source >         at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:631) ~[mysql-binlog-connector-java-0.28.1.jar:0.28.1]
2024-03-06 16:22:00 source >         at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:932) ~[mysql-binlog-connector-java-0.28.1.jar:0.28.1]
2024-03-06 16:22:00 source >         at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
2024-03-06 16:22:00 source > Caused by: com.github.shyiko.mysql.binlog.network.ServerException: Could not find first log file name in binary log index file
2024-03-06 16:22:00 source >         at com.github.shyiko.mysql.binlog.BinaryLogClient.listenForEventPackets(BinaryLogClient.java:1043) ~[mysql-binlog-connector-java-0.28.1.jar:0.28.1]
2024-03-06 16:22:00 source >         ... 3 more
```
We have a couple of questions:
1. Is it normal for this kind of job to run for 24 hrs? (Note that we are fetching only a single large table.)
 If yes, is there any way we can speed it up?
2. After the failure, the job went to attempt 2, which started again from the initial sync. (I guess it should be resumable from the offset?)
3. What about the above error? (I was also facing it on smaller tables on a daily basis.) How do we resolve it?
Some info:
```
log_bin                    | ON
sql_log_bin                | ON
binlog_format              | ROW
binlog_row_image           | FULL
binlog_expire_logs_seconds | 2592000
```
Any help would be much appreciated
Thanks.


---

This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1709797718905729) if you want to access the original thread.


<sub>
["airbyte", "mysql-cdc-connector", "initial-sync", "error", "binary-log", "resumable", "binlog-format"]
</sub>

<@U02RZ7YQNKY> <@U01AB6V6NMQ>
Requesting that someone look into this, as this MySQL connector is a certified one…

Thanks for the cooperation…

Hi Himanshu!
we actually provide direct support only to our Airbyte Self-Managed or Airbyte Cloud customers.
let me know if you’re interested in either.

Hello <@U05367G70MB> can you share what version of the connector (mysql, s3) and the platform you’re using?

Searching for the error message, I found this [info](https://lokesh1729.com/posts/change-data-capture-debezium/#could-not-find-the-first-log-file-name-in-the-binary-log-index-file-error-code-1236-sqlstate-hy000):
> This generally happens if you had upgraded MySQL, and cleaned up the binlogs. Delete the connector and re-deploy it again using REST API.

A lot depends on how you’ve set things up:
• Yes, we need to first make an initial snapshot of your data before we read the CDC changes onward. We note the CDC position at the start of the sync, and once the snapshot is complete, we then start reading the CDC logs from then until now.
• If the sync was slow enough, or your data is changing fast enough, it’s possible that the CDC position noted at the start of the sync has been flushed away by then.
• You’ll need to do one of:
◦ Make the airbyte sync faster - can you provision larger pods with more memory?
◦ Increase the CDC retention period on your database?
◦ Split the sync into a few syncs, so that there’s less to do in those initial snapshots
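
A rough way to test the "position has been flushed away" explanation above is to compare the binlog file recorded at the start of the sync with the files the server still lists. The following is a minimal Python sketch, not Airbyte's or Debezium's own logic. The function names and the example file names are hypothetical, and it assumes binlog names follow the usual `<base>.<sequence>` pattern. The list of available files would come from the first column of MySQL's `SHOW BINARY LOGS` output.

```python
def split_binlog_name(name: str) -> tuple[str, int]:
    """Split e.g. 'mysql-bin.000123' into ('mysql-bin', 123).

    Assumes the '<base>.<sequence>' naming pattern; raises ValueError otherwise.
    """
    base, _, seq = name.rpartition(".")
    return base, int(seq)


def classify_saved_binlog(saved_file: str, available_files: list[str]) -> str:
    """Heuristically classify the binlog file recorded at the start of the sync.

    Returns:
      'available' - the file is still listed by the server
      'purged'    - files with the same base name exist, but only newer ones
      'unknown'   - no match; possibly a different server or base name
    """
    if saved_file in available_files:
        return "available"
    saved_base, saved_seq = split_binlog_name(saved_file)
    same_base_seqs = [
        seq
        for base, seq in map(split_binlog_name, available_files)
        if base == saved_base
    ]
    if same_base_seqs and saved_seq < min(same_base_seqs):
        return "purged"
    return "unknown"


# Hypothetical file names, as they might appear in SHOW BINARY LOGS.
available = ["mysql-bin.000005", "mysql-bin.000006"]
print(classify_saved_binlog("mysql-bin.000003", available))
print(classify_saved_binlog("mysql-bin.000005", available))
```

A result of `unknown` would be more consistent with connecting to a different server than the one the position was recorded on, though that is only a hint and not a diagnosis.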

Hi,
Thank you guys for looking into it.

Please find my responses below :
<@U01MMSDJGC9> MySQL Connector Version: 3.3.7,
S3 Connector Version: 4.5.3
Platform: Airbyte deployed on a single EC2 machine [4 cores, 16 GB RAM]
I guess we need to move it to a pod environment with auto scaling.

<@U0397RTD8E4>
> You’ll need to do one of:
> Make the airbyte sync faster - can you provision larger pods with more memory?

Yes, we will do the same…

> Increase the CDC retention period on your database?

It’s already set to 30 days.

> Split the sync into a few syncs, so that there's less to do in those initial snapshots

How can we achieve this? (We are syncing a single large table and cannot create views on top of that table.)

But we have a concern: the MySQL instance we are syncing from is standalone, no writes/updates are being performed on it, and binlog expiration is set to 30 days,
so we should not have hit the error
Could not find first log file name in binary log index file Error code: 1236; SQLSTATE: HY000

any more insights?
thanks for all the help till now…
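
The retention arithmetic in the message above can be checked with a few lines of Python. This is only a sketch of the reasoning: the helper names are made up for illustration, and it considers time-based expiry alone. It does not account for binlogs removed by other means, such as a manual `PURGE BINARY LOGS`.

```python
SECONDS_PER_DAY = 86_400


def retention_days(binlog_expire_logs_seconds: int) -> float:
    """Convert MySQL's binlog_expire_logs_seconds setting into days."""
    return binlog_expire_logs_seconds / SECONDS_PER_DAY


def sync_outlives_retention(sync_hours: float, binlog_expire_logs_seconds: int) -> bool:
    """True if a sync of this length runs longer than the binlog retention window."""
    return sync_hours * 3600 > binlog_expire_logs_seconds


# Values reported in this thread: 2592000 seconds of retention, a roughly 24-hour sync.
print(retention_days(2_592_000))               # 30.0
print(sync_outlives_retention(24, 2_592_000))  # False
```

With these numbers, time-based expiry alone would not explain the error, which is why the other explanations in the thread are worth checking.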

It might be the case that you aren’t connecting to the leader node? (Relevant Database Administrators Stack Exchange question: mysql - Error 1236 - "Could not find first log file name in binary log index file")