When syncing to Snowflake, we get a column called _AIRBYTE_<TABLE_NAME>_HASHID. I’m assuming it might be MD5? How is it generated?
Hey, it is computed over the whole row. Ideally, it also serves as an identifier for that row.
Hi @harshith,
Thanks for the quick reply. I know it’s an identifier for that row. My question was more about how it is generated. Is it a random hash? I’m trying to find where in the code that string is generated for a row.
Is it by concatenating all values, then applying md5 to the result? Can you pinpoint where in the code it’s implemented?
Hey, my bad. It’s md5. You can also read through the docs here: https://docs.airbyte.com/understanding-airbyte/basic-normalization#normalization-metadata-columns
Also, all of the normalisation happens here: https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/bases/base-normalization
Hi @philippeboyd, did you find out how it is generated?
Is it a random hash, or a string concatenation of all columns in that row?
I tried to read through the normalization code, but couldn’t find it!
Hi @phucdinh, it is a string concatenation of all columns, hashed with md5 (the surrogate_key function from the dbt_utils package).
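To make the idea concrete, here is a rough Python sketch of "concatenate all columns, then md5". It is not the Airbyte or dbt_utils code itself. The function name surrogate_key_sketch is made up for illustration, and the "-" separator and the mapping of NULL to an empty string are assumptions about how surrogate_key behaves. The real macro runs in SQL, where casting values to strings can differ from Python's str(), so don't expect these hashes to match the values in Snowflake.

```python
import hashlib
from typing import Any, Optional, Sequence


def surrogate_key_sketch(
    values: Sequence[Optional[Any]], separator: str = "-"
) -> str:
    """Hypothetical illustration: stringify every column value, join, then md5.

    Assumptions (not confirmed by the thread): NULL/None becomes an empty
    string and the values are joined with "-".
    """
    # Cast every column value to a string; treat None (SQL NULL) as "".
    parts = ["" if value is None else str(value) for value in values]
    # Concatenate all column strings into one string for the whole row.
    joined = separator.join(parts)
    # md5 of the concatenated string, as a 32-character hex digest.
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Example row with three columns, one of them NULL.
row = [42, "alice", None]
print(surrogate_key_sketch(row))
```

The takeaway is that the hash is deterministic, not random: the same column values in the same order always give the same hash ID, and changing any column value changes it.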