Help Needed to Configure Airbyte for MySQL and BigQuery Real-Time Data Synchronisation

Hello Everyone :hugs:,

As a new user of Airbyte, I’ve been exploring how it could help me set up a near-real-time data synchronisation pipeline from MySQL to Google BigQuery for my company’s internal analytics. I’ve read the documentation but am still running into trouble, so I’m hoping someone here has built a similar setup before.

This is what I want to accomplish:

  • MySQL (hosted on AWS RDS) is the source.
  • Google BigQuery is the destination.
  • Moderate data volume: roughly two to three million rows per day.
  • Sync mode: incremental updates, as close to real time as feasible.

Here are the specific difficulties I’m running into:

Latency: I’m aiming for near-real-time synchronisation, but Airbyte syncs appear to run on a schedule by default. Is there a recommended configuration for faster syncs from MySQL to BigQuery, or another way to get closer to real time?
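For context on the latency question, here is a minimal sketch of what a cursor-based incremental sync does on each scheduled run. It assumes a hypothetical `orders` table with an `updated_at` cursor column, and Python’s built-in `sqlite3` stands in for MySQL so the example runs anywhere. It shows why freshness is bounded by the schedule: a row written between runs only arrives on the next run. Airbyte’s MySQL source also offers a CDC mode that reads the binary log and additionally captures deletes, but as far as I understand it is still triggered by the sync schedule rather than streaming continuously.

```python
import sqlite3


def incremental_sync(conn, cursor_value):
    """Return rows changed since cursor_value and the advanced cursor.

    Hypothetical helper: sqlite3 stands in for MySQL, and `orders` /
    `updated_at` are illustrative names, not part of Airbyte.
    Note: a strict `>` can miss rows that share the last cursor value but
    are committed after the run; real connectors must handle that edge case.
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (cursor_value,),
    ).fetchall()
    # Advance the cursor to the newest value seen; keep the old one if nothing changed.
    new_cursor = rows[-1][2] if rows else cursor_value
    return rows, new_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01T10:00:00"), (2, 25.5, "2024-01-01T10:05:00")],
)

# First scheduled run: every row newer than the initial cursor is picked up.
batch, cursor = incremental_sync(conn, "1970-01-01T00:00:00")

# A row written between runs stays invisible until the next scheduled run.
conn.execute("INSERT INTO orders VALUES (3, 7.25, '2024-01-01T10:20:00')")
batch2, cursor = incremental_sync(conn, cursor)
```

Cursor-based queries like this cannot see hard-deleted rows, which is one reason CDC is usually suggested for MySQL sources.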

Schema Changes: What is the best way to handle schema changes in MySQL? If I change a table’s structure in the source database (for example, by adding columns), will Airbyte propagate the change to BigQuery automatically, or is manual intervention required?
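Whatever Airbyte handles automatically, it may help to detect schema drift on the MySQL side independently. A minimal sketch, assuming you periodically snapshot each table’s `COLUMN_NAME`/`COLUMN_TYPE` pairs from MySQL’s `information_schema.COLUMNS` into a dict; `diff_schemas` is a hypothetical helper, not part of Airbyte.

```python
def diff_schemas(old, new):
    """Compare two {column_name: column_type} snapshots of the same table.

    Hypothetical helper for spotting drift between snapshots; not an Airbyte API.
    """
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # Columns present in both snapshots whose declared type changed.
    retyped = sorted(col for col in set(old) & set(new) if old[col] != new[col])
    return {"added": added, "removed": removed, "retyped": retyped}


# Illustrative snapshots taken before and after an ALTER TABLE.
before = {"id": "int", "amount": "decimal(10,2)", "updated_at": "datetime"}
after = {
    "id": "int",
    "amount": "decimal(12,2)",
    "updated_at": "datetime",
    "currency": "varchar(3)",
}
result = diff_schemas(before, after)
```

Here `result` reports `currency` as added and `amount` as retyped, which is the kind of change worth checking on the BigQuery side after a sync.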

Managing Huge Data Volumes: Are there best practices I should follow to maximise sync speed at this data volume? I’m especially concerned about how incremental updates will scale over time as the datasets grow.
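To put the volume in perspective, here is a back-of-the-envelope calculation of how many rows each scheduled sync would move at 3 million rows per day (the upper end of my estimate). `rows_per_sync` is just an illustrative helper, and the figures are averages; real traffic is usually burstier.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def rows_per_sync(rows_per_day, interval_minutes):
    """Average number of rows each scheduled sync must transfer."""
    syncs_per_day = 24 * 60 / interval_minutes
    return rows_per_day / syncs_per_day


daily_rows = 3_000_000  # upper end of the 2-3 million rows/day estimate
print(f"average rate: {daily_rows / SECONDS_PER_DAY:.1f} rows/s")
for minutes in (5, 15, 60):
    print(f"every {minutes} min: ~{rows_per_sync(daily_rows, minutes):,.0f} rows per sync")
```

At 15-minute intervals that is about 31,250 rows per run. Assuming the cursor column is indexed, each incremental run should only scan rows changed since the last cursor, so per-run cost ought to track the change rate rather than the total table size.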

https://discuss.airbyte.io/t/missing-cdc-mode-in-airbyte-ui-for-real-time-sync/gen-ai

I would greatly appreciate any advice on overcoming these obstacles, or hearing from anyone who has worked on comparable pipelines!

Thanks in advance for your help and support.