Normalization Fails for bigquery-destination

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: Ubuntu
  • Memory / Disk: you can use something like 8Gb / 1 Tb
  • Deployment: Docker
  • Airbyte Version: 0.39.20-alpha
  • Source name/version: HubSpot (0.1.69)
  • Destination name/version: BigQuery (1.1.8)
  • Step: The issue is happening during sync.
  • Description:

The sync got failed with following error. This looks happening because “Array cannot have a null element” in model email_events_browser.

How can I avoid this error and stream data to BigQuery?

Thank you for your advice:)

[MainThread]: 
2022-06-17 03:36:10 e[42mnormalizatione[0m > 03:35:59.750337 [info ] [MainThread]: e[31mCompleted with 1 error and 0 warnings:e[0m
2022-06-17 03:36:10 e[42mnormalizatione[0m > 03:35:59.750949 [info ] [MainThread]: 
2022-06-17 03:36:10 e[42mnormalizatione[0m > 03:35:59.751477 [error] [MainThread]: e[33mDatabase Error in model email_events_browser (models/generated/airbyte_incremental/hubspot_test_gcs/email_events_browser.sql)e[0m
2022-06-17 03:36:10 e[42mnormalizatione[0m > 03:35:59.752051 [error] [MainThread]:   Array cannot have a null element; error in writing field version
2022-06-17 03:36:10 e[42mnormalizatione[0m > 03:35:59.752554 [error] [MainThread]:   compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_incremental/hubspot_test_gcs/email_events_browser.sql
2022-06-17 03:36:10 e[42mnormalizatione[0m > 03:35:59.753525 [info ] [MainThread]: 
2022-06-17 03:36:10 e[42mnormalizatione[0m > 03:35:59.753936 [info ] [MainThread]: Done. PASS=113 WARN=0 ERROR=1 SKIP=0 TOTAL=114

Hey could you share the below details

  1. Complete log of the job
  2. Also the schema you are using for this connection?

Hi, harshith. Thank you for your reply.

  1. I attached my log.
    logs-28.txt (2.7 KB)

  2. The schema for email_event_browser is as follows and how the config panel look like is inserted below. Please let me know how to collect if additional info is required.

_airbyte_email_events_hashid STRING NULLABLE
url STRING NULLABLE
name STRING NULLABLE
type STRING NULLABLE
family STRING NULLABLE
version STRING REPEATED
producer STRING NULLABLE
producerUrl STRING NULLABLE
_airbyte_ab_id STRING NULLABLE
_airbyte_emitted_at TIMESTAMP NULLABLE
_airbyte_normalized_at TIMESTAMP NULLABLE
_airbyte_browser_hashid STRING NULLABLE

Your support will be much appreciated…

Hey could you check these

  1. Can you uncheck email_events_browser and run the sync and see if the sync is successful
  2. Could you create a new connection with just email_events as the stream and check the value for browser > version key?

Thank you very much for your message.

The 1. task succeeded.

The 2. task the log is as follows and it complains about normalization problem.
logs-34.txt (547.1 KB)
And the query “SELECT browser FROM analytics-platform-201110.hubspot_test_email.email_events” returned following csv files where it contains “version”: as well as “version”:[“”]

What’s next?

Thank you for your support!

Looks like this is the problem https://www.yuichiotsuka.com/google-bigquery-unnest-arrays/#Error_Message_array_cannot_have_a_null_element_error_in_writing_field.

Is it possible to get all the version data? You can do this by disabling normalisation and directly querying from Bigquery.

To avoid this problem temporarily in normalisation you can

  1. Do a get request for this connection https://airbyte-public-api-docs.s3.us-east-2.amazonaws.com/rapidoc-api-docs.html#post-/v1/workspaces/create
  2. Take all the catalog and remove the version key
  3. You can update the connection back again https://airbyte-public-api-docs.s3.us-east-2.amazonaws.com/rapidoc-api-docs.html#post-/v1/connections/update

This is the extracted data for versions.
bquxjob_versions.txt (1.2 MB)

I got catalog and removed version in email_event > browser and updated the connection through API and got “Response Status: OK:200” but sync throughs other errors with normalization failure…

logs-46.txt (190.7 KB)

What should I do next?

I think in versions some are strings also not just arrays. I have created an issue for this so that team can look into it. https://github.com/airbytehq/airbyte/issues/14039

Thank you. How long do you estimate this fix will take?

Also, why the API connection update is not working?

Not sure on the API part as that looks like a different error and is not related to the API. And on the estimate team should triage it and then the respective team will start taking a look into it