Square to MySQL normalization errors

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: MacOS Monterey 12.5
  • Memory / Disk: 16Gb / 2 Tb
  • Deployment: Docker
  • Airbyte Version: 0.39.42-alpha
  • Source name/version: Square (0.1.4)
  • Destination name/version: MySQL (0.1.20)
  • Step: During sync, Normalization phase from what I can tell
  • Description: I am new, first sync attempt. I finally got the MySQL destination set up and connected to the Square source. It is dying during Normalization, it looks like it is generating SQL code invalid for MySQL… ?
2022-08-09 05:21:45 e[42mnormalizatione[0m > Found 392 models, 0 tests, 0 snapshots, 0 analyses, 563 macros, 0 operations, 0 seed files, 7 sources, 0 exposures, 0 metrics
2022-08-09 05:21:46 e[42mnormalizatione[0m > Concurrency: 1 threads (target='prod')
2022-08-09 05:21:46 e[42mnormalizatione[0m > 1 of 98 START table model airbyte.categories............................................................................ [RUN]
2022-08-09 05:21:46 e[42mnormalizatione[0m > 1 of 98 ERROR creating table model airbyte.categories................................................................... [e[31mERRORe[0m in 0.05s]
2022-08-09 05:21:46 e[42mnormalizatione[0m > 2 of 98 START table model airbyte.customers............................................................................. [RUN]
2022-08-09 05:21:46 e[42mnormalizatione[0m > 2 of 98 ERROR creating table model airbyte.customers.................................................................... [e[31mERRORe[0m in 0.04s]
2022-08-09 05:21:46 e[42mnormalizatione[0m > 3 of 98 START table model airbyte.items................................................................................. [RUN]
2022-08-09 05:21:46 e[42mnormalizatione[0m > 3 of 98 ERROR creating table model airbyte.items........................................................................ [e[31mERRORe[0m in 0.02s]
2022-08-09 05:21:46 e[42mnormalizatione[0m > 4 of 98 START table model airbyte.locations............................................................................. [RUN]
2022-08-09 05:21:46 e[42mnormalizatione[0m > 4 of 98 ERROR creating table model airbyte.locations.................................................................... [e[31mERRORe[0m in 0.03s]
2022-08-09 05:21:46 e[42mnormalizatione[0m > 5 of 98 START table model airbyte.orders................................................................................ [RUN]
2022-08-09 05:21:46 e[42mnormalizatione[0m > 5 of 98 ERROR creating table model airbyte.orders....................................................................... [e[31mERRORe[0m in 0.04s]
2022-08-09 05:21:46 e[42mnormalizatione[0m > 6 of 98 START table model airbyte.payments.............................................................................. [RUN]
2022-08-09 05:21:46 e[42mnormalizatione[0m > 6 of 98 ERROR creating table model airbyte.payments..................................................................... [e[31mERRORe[0m in 0.05s]
2022-08-09 05:21:46 e[42mnormalizatione[0m > 7 of 98 START table model airbyte.refunds............................................................................... [RUN]
2022-08-09 05:21:47 e[42mnormalizatione[0m > 7 of 98 ERROR creating table model airbyte.refunds...................................................................... [e[31mERRORe[0m in 0.04s]
2022-08-09 05:21:47 e[42mnormalizatione[0m > 8 of 98 SKIP relation airbyte.categories_category_data.................................................................. [e[33mSKIPe[0m]

Then there are a set of these kinds of errors:

2022-08-09 05:21:47 e[42mnormalizatione[0m > Finished running 98 table models in 1.77s.
2022-08-09 05:21:47 e[42mnormalizatione[0m > e[31mCompleted with 7 errors and 0 warnings:e[0m
2022-08-09 05:21:47 e[42mnormalizatione[0m > e[33mDatabase Error in model categories (models/generated/airbyte_tables/airbyte/categories.sql)e[0m
2022-08-09 05:21:47 e[42mnormalizatione[0m >   1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'RETURNING CHAR) as id,
2022-08-09 05:21:47 e[42mnormalizatione[0m >       json_value(_airbyte_data, 
2022-08-09 05:21:47 e[42mnormalizatione[0m >       '$."type"' RETURNIN...' at line 14
2022-08-09 05:21:47 e[42mnormalizatione[0m >   compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_tables/airbyte/categories.sql
2022-08-09 05:21:47 e[42mnormalizatione[0m > e[33mDatabase Error in model customers (models/generated/airbyte_tables/airbyte/customers.sql)e[0m
2022-08-09 05:21:47 e[42mnormalizatione[0m >   1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'RETURNING CHAR) as id,
2022-08-09 05:21:47 e[42mnormalizatione[0m >       json_extract(_airbyte_data, 
2022-08-09 05:21:47 e[42mnormalizatione[0m >       '$."cards"') as c...' at line 14

There are further errors, but I think they may be the result of the previous failures… ?

Anyway, this may be a configuration error, but I didn’t really change anything beyond setting up the source and destination. Maybe I needed to?

I was able to copy one of the generated SQL statements out:



  create  table
    airbyte.`locations__dbt_tmp`
  as (
    
with __dbt__cte__locations_ab1 as (

-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema
-- depends_on: airbyte._airbyte_raw_locations
select
    json_value(_airbyte_data, 
    '$."id"' RETURNING CHAR) as id,
    json_value(_airbyte_data, 
    '$."mcc"' RETURNING CHAR) as mcc,
    json_value(_airbyte_data, 
    '$."name"' RETURNING CHAR) as `name`,
    json_value(_airbyte_data, 
    '$."type"' RETURNING CHAR) as `type`,
    json_value(_airbyte_data, 
    '$."status"' RETURNING CHAR) as `status`,
    
        json_extract(table_alias._airbyte_data, 
    '$."address"')
     as address,
    json_value(_airbyte_data, 
    '$."country"' RETURNING CHAR) as country,
    json_value(_airbyte_data, 
    '$."currency"' RETURNING CHAR) as currency,
    json_value(_airbyte_data, 
    '$."timezone"' RETURNING CHAR) as timezone,
    json_value(_airbyte_data, 
    '$."created_at"' RETURNING CHAR) as created_at,
    json_value(_airbyte_data, 
    '$."description"' RETURNING CHAR) as `description`,
    json_value(_airbyte_data, 
    '$."merchant_id"' RETURNING CHAR) as merchant_id,
    json_value(_airbyte_data, 
    '$."website_url"' RETURNING CHAR) as website_url,
    json_extract(_airbyte_data, 
    '$."capabilities"') as capabilities,
    json_value(_airbyte_data, 
    '$."facebook_url"' RETURNING CHAR) as facebook_url,
    json_value(_airbyte_data, 
    '$."phone_number"' RETURNING CHAR) as phone_number,
    json_value(_airbyte_data, 
    '$."business_name"' RETURNING CHAR) as business_name,
    json_value(_airbyte_data, 
    '$."language_code"' RETURNING CHAR) as language_code,
    json_value(_airbyte_data, 
    '$."business_email"' RETURNING CHAR) as business_email,
    
        json_extract(table_alias._airbyte_data, 
    '$."business_hours"')
     as business_hours,
    json_value(_airbyte_data, 
    '$."twitter_username"' RETURNING CHAR) as twitter_username,
    json_value(_airbyte_data, 
    '$."instagram_username"' RETURNING CHAR) as instagram_username,
    _airbyte_ab_id,
    _airbyte_emitted_at,
    
    CURRENT_TIMESTAMP
 as _airbyte_normalized_at
from airbyte._airbyte_raw_locations as table_alias
-- locations
where 1 = 1
),  __dbt__cte__locations_ab2 as (

-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type
-- depends_on: __dbt__cte__locations_ab1
select
    cast(id as char(1024)) as id,
    cast(mcc as char(1024)) as mcc,
    cast(`name` as char(1024)) as `name`,
    cast(`type` as char(1024)) as `type`,
    cast(`status` as char(1024)) as `status`,
    cast(address as json) as address,
    cast(country as char(1024)) as country,
    cast(currency as char(1024)) as currency,
    cast(timezone as char(1024)) as timezone,
    cast(created_at as char(1024)) as created_at,
    cast(`description` as char(1024)) as `description`,
    cast(merchant_id as char(1024)) as merchant_id,
    cast(website_url as char(1024)) as website_url,
    capabilities,
    cast(facebook_url as char(1024)) as facebook_url,
    cast(phone_number as char(1024)) as phone_number,
    cast(business_name as char(1024)) as business_name,
    cast(language_code as char(1024)) as language_code,
    cast(business_email as char(1024)) as business_email,
    cast(business_hours as json) as business_hours,
    cast(twitter_username as char(1024)) as twitter_username,
    cast(instagram_username as char(1024)) as instagram_username,
    _airbyte_ab_id,
    _airbyte_emitted_at,
    
    CURRENT_TIMESTAMP
 as _airbyte_normalized_at
from __dbt__cte__locations_ab1
-- locations
where 1 = 1
),  __dbt__cte__locations_ab3 as (

-- SQL model to build a hash column based on the values of this record
-- depends_on: __dbt__cte__locations_ab2
select
    md5(cast(concat(coalesce(cast(id as char), ''), '-', coalesce(cast(mcc as char), ''), '-', coalesce(cast(`name` as char), ''), '-', coalesce(cast(`type` as char), ''), '-', coalesce(cast(`status` as char), ''), '-', coalesce(cast(address as char), ''), '-', coalesce(cast(country as char), ''), '-', coalesce(cast(currency as char), ''), '-', coalesce(cast(timezone as char), ''), '-', coalesce(cast(created_at as char), ''), '-', coalesce(cast(`description` as char), ''), '-', coalesce(cast(merchant_id as char), ''), '-', coalesce(cast(website_url as char), ''), '-', coalesce(cast(capabilities as char), ''), '-', coalesce(cast(facebook_url as char), ''), '-', coalesce(cast(phone_number as char), ''), '-', coalesce(cast(business_name as char), ''), '-', coalesce(cast(language_code as char), ''), '-', coalesce(cast(business_email as char), ''), '-', coalesce(cast(business_hours as char), ''), '-', coalesce(cast(twitter_username as char), ''), '-', coalesce(cast(instagram_username as char), '')) as char)) as _airbyte_locations_hashid,
    tmp.*
from __dbt__cte__locations_ab2 tmp
-- locations
where 1 = 1
)-- Final base SQL model
-- depends_on: __dbt__cte__locations_ab3
select
    id,
    mcc,
    `name`,
    `type`,
    `status`,
    address,
    country,
    currency,
    timezone,
    created_at,
    `description`,
    merchant_id,
    website_url,
    capabilities,
    facebook_url,
    phone_number,
    business_name,
    language_code,
    business_email,
    business_hours,
    twitter_username,
    instagram_username,
    _airbyte_ab_id,
    _airbyte_emitted_at,
    
    CURRENT_TIMESTAMP
 as _airbyte_normalized_at,
    _airbyte_locations_hashid
from __dbt__cte__locations_ab3
-- locations from airbyte._airbyte_raw_locations
where 1 = 1
  )

It seems this is due to an incompatibility between MySQL JSON_VALUE() and MariaDB JSON_VALUE(). I thought the documentation said it would work with a recent version of MariaDB but maybe not.

Hi @n9yty, I think you’re right about the root cause of your issue! The problem seems to be in the SQL syntax.

Yup, after finally migrating everything over to MySQL (I really should have just spun up a docker container LOL) it works OK. So maybe the documentation could use an update to mention MariaDB is not compatible with the normalization process.

Could you please point me to where in the documentation you think this should go? I’d be happy to add that note in there, or if you’d rather submit a PR yourself - please go for it! So glad this got figured out, I’m sure it’ll be really helpful to future users.