BigQuery source array not normalized correctly in Snowflake destination

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: Ubuntu (Linux airbyte 5.4.0-1086-azure #91~18.04.1-Ubuntu SMP Thu Jun 23 20:33:05 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux)
  • Memory / Disk: you can use something like 32GB/100GB
  • Deployment: Docker
  • Airbyte Version: 0.39.28-alpha
  • Source name/version: BigQuery 0.1.8
  • Destination name/version: Snowflake 0.4.30
  • Step: normalization
  • Description:
    I have a BigQuery source that has a few tables with columns defined as arrays. For the columns defined as ARRAY, the data makes it to the raw table just fine, but the datatype for the array column is FLOAT.

The sync ends up failing during normalization with errors like:

2022-07-19 22:52:44 normalization > 22:52:39.115256 [error] [MainThread]: Database Error in model VSCHEDULEOPENSHIFT (models/generated/airbyte_tables/DIMENSIONS/VSCHEDULEOPENSHIFT.sql)
2022-07-19 22:52:44 normalization > 22:52:39.115606 [error] [MainThread]:   100038 (22018): Numeric value '[5135]' is not recognized

Based on the normalization docs, I would have expected a second table to be created, but I don’t see that happening.

Right now, my only workaround to not have a failure is to have two separately configured connections between BigQuery and Snowflake - one to handle the set of tables that don’t have any array columns with normalization, and one to handle the others only as raw data, and then create views in Snowflake to handle the normalization for end users.

Am I missing something?

Hello and welcome to the community, @bnam! Could you please try updating the Airbyte version to the latest one as a first step?

I just got the same result on 0.39.37-alpha with BigQuery source upgraded to 0.1.9 and Snowflake destination to 0.4.33.

Could you provide the full logs you’re getting? And are you using SSO authentication in Snowflake by any chance?

No SSO auth.

The log file is attached.
failed_normalization.txt (341.6 KB)

Could you please go into Connections → Replication to check if the field is being recognized as an array by Airbyte? I want to make sure it’s being ingested correctly.

In Airbyte, it shows the column as a Number. Since this is a view, I also checked the underlying table - same datatype.

In DataGrip, which I’m using to query BigQuery, in both the view and in the table it shows the datatype as ARRAY

Just to be sure, I refreshed the schema and synced and still have the same result.

I reproduced your issue locally, this is a bug that’s happening with the BigQuery source. I’ve escalated this to Github here:

Feel free to keep an eye on it, and I will let you know when any progress is made!