Handling JSON columns in destination-bigquery

Summary

Exploring how to handle JSON columns in destination-bigquery instead of nested columns.


Question

Hello,
I’d like to know how others are handling the JSON columns (instead of nested columns) being loaded in destination-bigquery ?



This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want
to access the original thread.

Join the conversation on Slack

["json-columns", "destination-bigquery", "data-loading"]

We model these out in our existing dbt pipelines. To be honest, I’ve been far happier since they removed Normalization, since it was a common source of both poor performance and phantom job failures.

Keep in mind that simple JSON objects can be referenced with dot-notation now: SELECT colName.propName. When you need typed data, you can use safe conversion with things like LAX_STRING(), LAX_BOOL(), LAX_INT64(), etc. (and then the whole host of JSON_* functions when you need more.

A common case is just using CROSS JOIN UNNEST(JSON_EXTRACT_ARRAY(colName)) AS jsonArray and then working with jsonArray.fieldName. For simple string arrays, you just swap JSON_EXTRACT_ARRAY for JSON_EXTRACT_STRING_ARRAY.

So all of that works very well at the query/modeling level, and there are very few cases where you need to get fancy.

When you do, relying on your modeling tool’s feature set. For example, we have cases where I need to know all the unique fields in nested JSON objects across all rows; so I abstract that into a dbt macro so that I can model it accordingly.

If you aren’t using dbt, keep in mind that you can use Dataform (same functional purpose) for free within the BigQuery UI. Their SQLX format is based on JavaScript, so very easy for people new to data engineering to pick up. And has all the same built-in version control of the models that dbt does. So if you don’t need to deal with non-BigQuery sources/destinations, Dataform is an easy win if you want to get out of query/view-based model hell.

Hey Justin
Thanks for this detailed answer :smile: