Storing Nested Objects in ClickHouse Table

Summary

The user is facing an issue with storing nested objects like ‘email_addresses’ in ClickHouse table when connecting data from Merge.dev CRM. They are looking for ideas on how to handle this.


Question

Hello ! I’ve built a connector for http://Merge.dev|Merge.dev that retrieves “CRM” data but I’m facing a problem / question. For example, here’s an example of the response I get for a “contact”.

    "id": "b2abfe13-8be0-44d0-b183-d23e64b8f4da",
    "remote_id": "6771951",
    "created_at": "2024-03-06T00:13:57.074137Z",
    "modified_at": "2024-03-06T00:13:57.074146Z",
    "first_name": "Elon",
    "last_name": "Musk",
    "owner": "02ea3230-4587-427f-a23c-c8cb5f85c9ee",
    "addresses": [
      {
        "created_at": "2024-03-04T22:46:08.333060Z",
        "modified_at": "2024-03-04T22:46:08.333080Z",
        "country": "FR"
      }
    ],
    "email_addresses": [
      {
        "created_at": "2024-03-06T00:13:57.391366Z",
        "modified_at": "2024-03-06T00:13:57.391373Z",
        "email_address": "<mailto:elon.musk@tesla.com|elon.musk@tesla.com>",
        "email_address_type": "PRIMARY"
      }
    ],
    "phone_numbers": [
      {
        "created_at": "2024-03-06T00:13:57.361615Z",
        "modified_at": "2024-03-06T00:13:57.361622Z",
        "phone_number": "+33600000000"
      }
    ],
    "remote_created_at": "2024-03-05T20:39:44Z",
    "remote_was_deleted": false
  },```
I then connect the data to ClickHouse so that I can play with the data. The problem is that objects like "email_addresses" are stored in JSON Format in my contact table.

With some connectors, another table is created, for example contact_addresses, and the link can be made via a join.

Does anyone have any ideas to help me :smile: Thx a lot !!

<br>

---

This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C027KKE4BCZ/p1709733165540029) if you want to access the original thread.

[Join the conversation on Slack](https://slack.airbyte.com)

<sub>
["merge.dev", "crm", "connector", "clickhouse", "nested-objects", "json-format"]
</sub>

New message text here

You can add “Field transformations → Add field” and add a field for email_address that would key into the address field on the json object. That way, you’ll have a string email address in clickhouse.

That would be a naive and simple normalization. For more advanced / consistent cleanup, you’d probably a more robust normalization workflow.

I.e. dbt: https://clickhouse.com/docs/en/integrations/dbt