MySQL to Snowflake CDC Sync Fails: Race condition, raw table not found but still being written to snowflake

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 64 GB
  • Deployment: Docker
  • Airbyte Version: 0.39.10
  • Source name/version: MySQL 0.5.11
  • Destination name/version: Snowflake (0.4.28)
  • Step: Sync
  • Description: First sync (full refresh). The sync fails with a normalization error:
    Object 'MYDB.MYSCHEMA._AIRBYTE_RAW_MYTABLE' does not exist or not authorized.
    But looking at the Snowflake history, I can see this query is still running:
COPY INTO MYSCHEMA._airbyte_tmp_mbc_mytable FROM 's3://my-bucket/MY_SCHEMA/2022/06/13/23/CBA85027-1111-4DCC-BDB4-052BA187146B/' CREDENTIALS=(aws_key_id='☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺' aws_secret_key='☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺☺') file_format = (type = csv compression = auto field_delimiter = ',' skip_header = 0 FIELD_OPTIONALLY_ENCLOSED_BY = '"') files = ('0.csv.gz','1.csv.gz','2.csv.gz','3.csv.gz','4.csv.gz','5.csv.gz','6.csv.gz','7.csv.gz','8.csv.gz','9.csv.gz','10.csv.gz','11.csv.gz','12.csv.gz','13.csv.gz','14.csv.gz','15.csv.gz','16.csv.gz','17.csv.gz','18.csv.gz','19.csv.gz','20.csv.gz','21.csv.gz','22.csv.gz','23.csv.gz','24.csv.gz','25.csv.gz','26.csv.gz','27.csv.gz','28.csv.gz','29.csv.gz','30.csv.gz','31.csv.gz','32.csv.gz');

cdc_logs.txt (188.1 KB)

I published a custom MySQL source connector with the updated Debezium configuration.
It failed with a normalization issue. Posting the issue here: MySQL to Snowflake CDC Sync Fails: Race condition, raw table not found but still being written to snowflake

Some interesting things to note:

  1. This normalisation issue occurred last week when I was trying to get CDC full refresh to work, so I don't believe it's related to the updated Debezium spec (but it could be).
  2. I'm seeing a delay of over an hour (about 70 minutes in the logs below) between the source completing and the next step:
2022-06-14 03:12:10 source > 2022-06-14 03:12:10 INFO i.a.i.s.r.AbstractDbSource(lambda$read$2):132 - Closed database connection pool.
2022-06-14 03:12:10 source > 2022-06-14 03:12:10 INFO i.a.i.b.IntegrationRunner(runInternal):171 - Completed integration: io.airbyte.integrations.base.ssh.SshWrappedSource
2022-06-14 03:12:10 source > 2022-06-14 03:12:10 INFO i.a.i.s.m.MySqlSource(main):213 - completed source: class io.airbyte.integrations.source.mysql.MySqlSource
2022-06-14 04:22:10 source > 2022-06-14 04:22:10 ERROR i.a.i.b.IntegrationRunner(lambda$watchForOrphanThreads$8):266 - Failed to interrupt children non-daemon threads, forcefully exiting NOW...
  3. That active non-daemon thread error (which I know isn't an issue) doesn't show up with my other connectors.
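
For anyone who wants to locate a stall like the one in point 2 in their own sync logs, here is a minimal sketch. It assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the excerpt above; the helper names `leading_timestamp` and `largest_gap` are made up for illustration and are not part of Airbyte.

```python
import re
from datetime import datetime

# Leading "YYYY-MM-DD HH:MM:SS" timestamp at the start of each log line
# (assumed format, based on the excerpt in this thread).
LEADING_TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def leading_timestamp(line):
    """Return the timestamp at the start of a log line, or None if absent."""
    match = LEADING_TS.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")


def largest_gap(lines):
    """Return (gap, earlier_line, later_line) for the biggest time jump.

    Lines without a leading timestamp are skipped. Returns None when fewer
    than two timestamped lines are present.
    """
    stamped = []
    for line in lines:
        ts = leading_timestamp(line)
        if ts is not None:
            stamped.append((ts, line))

    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best


if __name__ == "__main__":
    # Shortened copies of the log lines quoted above.
    sample = [
        "2022-06-14 03:12:10 source > Closed database connection pool.",
        "2022-06-14 03:12:10 source > completed source: MySqlSource",
        "2022-06-14 04:22:10 source > Failed to interrupt children non-daemon threads",
    ]
    gap, before, after = largest_gap(sample)
    print("largest gap:", gap)
    print("before:", before)
    print("after: ", after)
```

Run against the lines quoted above, this reports a gap of 1 hour 10 minutes between the "completed source" line and the "Failed to interrupt children non-daemon threads" line. To scan a full log file, pass the result of `open(path).read().splitlines()` to `largest_gap`.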

This was using AWS S3 staging. I have just tried internal staging (recommended) and hit the same issue.

Reviewing the Snowflake history, it doesn't appear that the _AIRBYTE_RAW_TABLE is being created. I see the schema, tmp table, and stage all being created, but not the raw table.

The problem occurs way before normalization. It looks like the source failed and was not even able to start creating the destination table.

Which line are you looking at? Are you referring to:
The main thread is exiting while children non-daemon threads from a connector are still active.

Caused by: io.airbyte.workers.exception.WorkerException: Source process exit with code 2. This warning is normal if the job was cancelled.

This warning is normal if the job was cancelled.
Doesn't the sync actively force-cancel the source when it hangs? I.e., that is how this previous issue was resolved: Failed CDC job does not cancel job · Issue #5516 · airbytehq/airbyte

If this is not normal, what's the underlying cause?

From the logs it is not clear why the sync failed :( CDC for MySQL is quite unstable currently and will be improved in the future.