Normalization Issue with Upgraded Airbyte and New Destinations V2

slack-user-airbyte · May 16, 2024, 12:48pm

Summary

After upgrading to the new version of Airbyte with destinations v2, normalization is disabled, causing issues with downstream dependencies of normalized tables. Column names are changing in Redshift.

Question

We previously had a connection with normalization. Since we upgraded, normalization is disabled. But this broke a lot of downstream dependencies of the normalized tables. We are using the new destinations v2 ? This isn’t clarified anywhere in the documentation I think.
In Redshift it is changing my column names.

This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want to access the original thread.

Join the conversation on Slack

_{["normalization", "upgrade", "destinations-v2", "documentation", "redshift", "column-names"]}

slack-user-airbyte · June 20, 2024, 6:19am

Custom normalization and transformations was deprecated.

slack-user-airbyte · June 20, 2024, 6:19am

Here’s where it’s noted in the docs how Destinations v2 works:
https://docs.airbyte.com/integrations/destinations/redshift-migrations

slack-user-airbyte · June 20, 2024, 6:19am

so previously did it pass the kanji characters through to the final table, and now it’s replacing them with underscores?

slack-user-airbyte · June 20, 2024, 6:19am

Yes absolutely right.

slack-user-airbyte · June 20, 2024, 6:19am

Seems as though since they’re quoted identifiers they should be allowed to be preserved. Based on a quick read of the source, I’m thinking the issue is that the RedshiftSQLNameTransformer is basically just extending the io.airbyte.cdk.integrations.destination.StandardNameTransformer add forcing it to lowercase, even though this destination can support a broader set of characters in field names.

Code here:
https://github.com/airbytehq/airbyte/blob/fbb480fe8aef2b594b93e9b6199597715b80c59b/airbyte-integrations/connectors/destination-redshift/src/main/java/io/airbyte/integrations/destination/redshift/RedshiftSQLNameTransformer.java#L9|https://github.com/airbytehq/airbyte/blob/fbb480fe8aef2b594b93e9b6199597715b80c59b[…]tegrations/destination/redshift/RedshiftSQLNameTransformer.java

More than likely, it’s the upstream call to toAlphanumericAndUnderscore that is leading to this replacement:
https://github.com/airbytehq/airbyte/blob/fbb480fe8aef2b594b93e9b6199597715b80c59b/airbyte-cdk/java/airbyte-cdk/dependencies/src/main/kotlin/io/airbyte/commons/text/Names.kt#L27|https://github.com/airbytehq/airbyte/blob/fbb480fe8aef2b594b93e9b6199597715b80c59b[…]k/dependencies/src/main/kotlin/io/airbyte/commons/text/Names.kt

I’d recommend putting an issue in on GitHub in hopes that the team can adjust accordingly to support the wider naming support in RedShift (and I’m assuming Postgres behaves the same under Destinations V2)

slack-user-airbyte · June 20, 2024, 6:19am

So don’t we have any settings to handle this in the connector level?

slack-user-airbyte · June 21, 2024, 6:18am

<@U04SE163WC8> My question is why it is changing the column names in my redshift table?

insert into “raw_daas_jp”.“raw_fas_pfqdi002” (“__”, “__cd”, “__seq”, “____”, “_____1”, “2", “____cd”, “____no”, "”, “1", "”, “_______1”, “ab_source_file_url", "cd", "”, “________1”, “______2", “3", "4", "ad", "”, “_1", "ab_source_file_last_modified", "”, “_airbyte_raw_id”, “_airbyte_extracted_at”, “_airbyte_meta”) with “intermediate_data” as (select CASE WHEN (JSON_TYPEOF(“_airbyte_data”.“メモ”) <> ‘string’ and “_airbyte_data”.“メモ” is not null) THEN JSON_SERIALIZE(“_airbyte_data”.“メモ”) ELSE cast(“_airbyte_data”.“メモ” as varchar(65535)) END as "”, CASE WHEN (JSON_TYPEOF(“_airbyte_data”.“包形CD”) <> ‘string’ and “_airbyte_data”.“包形CD” is not null) THEN JSON_SERIALIZE(“_airbyte_data”.“包形CD”) ELSE cast(“_airbyte_data”.“包形CD” as varchar(65535)) END as “__cd”, CASE WHEN (JSON_TYPEOF(“_airbyte_data”.“更新SEQ”) <> ‘string’ and “_airbyte_data”.“更新SEQ” is not null) THEN JSON_SERIALIZE(“_airbyte_data”.“更新SEQ”) ELSE cast(“_airbyte_data”.“更新SEQ” as varchar(65535)) END as “__seq”, CASE WHEN (JSON_TYPEOF(“airbyte_data".“充填順序”) <> ‘string’ and "airbyte_data".“充填順序” is not null) THEN JSON_SERIALIZE("airbyte_data".“充填順序”) ELSE cast("airbyte_data".“充填順序” as varchar(65535)) END as "”, CASE WHEN (JSON_TYPEOF(“_airbyte_data”.“更新日時”) <> ‘string’ and “_airbyte_data”.“更新日時” is not null) THEN JSON_SERIALIZE(“_airbyte_data”.“更新日時”) ELSE cast(“_airbyte_data”.“更新日時” as varchar(65535)) END as “_____1”, CASE WHEN (JSON_TYPEOF(“_airbyte_data”.“登録日時”) <> ‘string’ and “_airbyte_data”.“登録日時”

Topic		Replies	Views
Issue with Redshift destination V2 and Hubspot connection Connector Questions connector , bug , redshift-destination-v2 , hubspot-connection , incremental-normalization	5	38	June 6, 2024
Normalization error and final table not created with Jira to Redshift Connector Questions & Issues source-jira	8	575	September 7, 2022
Downgrading Redshift destination connector not working Connector Questions & Issues destination-redshift , normalization	15	1069	July 14, 2022
Destination Snowflake version 0.4.24: KeyError: 'password' - Normalization failure Connector Questions & Issues normalization , data-loading	5	396	July 14, 2022
Google Analytics data NULL'ing out in redshift due to bad macros Connector Questions & Issues destination-redshift , normalization	4	290	July 14, 2022

Normalization Issue with Upgraded Airbyte and New Destinations V2

Summary

Question

Related topics