Privilleges required for the MySQL connector (CDC replication)

Dear all. This is my first message so I hope I’m posting at the right place.

At my company, we use AWS RDS with MySQL 5.7 for our transactional data. Our warehouse is Big Query.

We have a production MySQL instance as well as a replica. I want to ingest our database data (CDC) into Big Query using the airbyte cloud service (we already bought credits). Following the article here I asked our technical team to

  1. enable binary logging in our replica, and
  2. create an airbyte user with access to the replica, and permissions including RELOAD, REPLICATION SLAVE, and REPLICATION CLIENT

My technical team challenged me about this, asking:

  1. What do these privilleges exactly do and why they are needed in my case
  2. Will it affect anything on the production database (such as performance, or anything else for that matter)
  3. Some kind of guarantee that it is safe to use a 3d party service, such as airbyte, with our private data (we are also based in EU, and want to comply with GDPR).

Now, forgive my lack of knowledge, my background on databases is fairly basic. I understand that REPLICATION SLAVE/CLIENT are used to access the binary logs. I’m trying to understand what RELOAD does, but I’m not 100% sure. I see some mention of accessing the master database for the logs, so I’m a bit worried. In any case, I don’t feel confident enough to answer the 2nd question specifically, i.e. whether this will affect our production database somehow. Could somebody please explain to me if this is the case? And any other comments about those questions, are more than welcome.

Thank you very much in advance.

Hi @Giorgos_Tzanakis, welcome to the forum!

What do these privileges exactly do and why they are needed in my case

Our CDC connectors are using Debezium under the hood to perform CDC replication. You can find more details about the required permissions and there meaning [here]. CDC is a lower level replication operation than standard replication, it reads the binlogs of a database to capture changes on the fly. This is why the permission set is a bit different than a classic read-only user: the user must have privileges read the binlog, track updates made on a replica, aquires locks etc. If you need more details, the Debezium documentation is probably the best place to get these.

I’m trying to understand what RELOAD does

From what I read in Debezium documentation the RELOAD permission is used by Debezium to make a snapshot.
Quoting their docs: When a Debezium MySQL connector is first started, it performs an initial consistent snapshot of your database. The following flow describes how the connector creates this snapshot. This flow is for the default snapshot mode, which is initial.
If you are worried about this permission I would suggest revoking it once your initial replicate succeeded, Debezium might not need it afterward (I’m not sure of this though :grin: )

Will it affect anything on the production database (such as performance, or anything else for that matter)

If you target a replica that is not used by the production database the read operations from Airbyte will not impact the performance of the master database that your production app is probably using. If you are not sure I would suggest to create a fresh replica dedicated to Airbyte.

Some kind of guarantee that it is safe to use a 3d party service, such as airbyte, with our private data (we are also based in EU, and want to comply with GDPR).

Airbyte only acts as a data movement solution, it is not storing any data that transfers through it nor expose this data to a 3d party service. And Airbyte, as a company, has no means to read the data that is flowing through your deployment.
You can find more details on this documentation.
Do you confirm that you are using a self-deployed version of Airbyte that you are running on your own infrastructure (I guessed you are because our Cloud offer is not yet available in EU)?
Let me know if you have any other concerns on this topic.

I hope it will help you move forward with CDC replication and Airbyte!

Hi @alafanechere,

First off, thank you for your quick reply and apologies for my very delayed response.

Everything you discuss about the permissions part of my questions are very clear. I will further dig into the debezium documenation to understand things better.

To be honest I’m a bit confused about that. No, I’m not using a self-hosted deployment… I am based in Greece, and I have already signed up for airbyte cloud as well as bought $500 worth of credit. I haven’t used it yet, although I had ran some experiments with a mailchimp integration when I was playing around with the trial period. So, is there any problem with that?!

Hi there from the Community Assistance team.
We’re letting you know about an issue we discovered with the back-end process we use to handle topics and responses on the forum. If you experienced a situation where you posted the last message in a topic that did not receive any further replies, please open a new topic to continue the discussion. In addition, if you’re having a problem and find a closed topic on the subject, go ahead and open a new topic on it and we’ll follow up with you. We apologize for the inconvenience, and appreciate your willingness to work with us to provide a supportive community.