Normalization Issue with Upgraded Airbyte and New Destinations V2

Summary

After upgrading to the new version of Airbyte with destinations v2, normalization is disabled, causing issues with downstream dependencies of normalized tables. Column names are changing in Redshift.


Question

We previously had a connection with normalization. Since we upgraded, normalization is disabled. But this broke a lot of downstream dependencies of the normalized tables. We are using the new destinations v2 ? This isn’t clarified anywhere in the documentation I think.
In Redshift it is changing my column names.



This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want to access the original thread.

Join the conversation on Slack

["normalization", "upgrade", "destinations-v2", "documentation", "redshift", "column-names"]

Custom normalization and transformations was deprecated.

Here’s where it’s noted in the docs how Destinations v2 works:
https://docs.airbyte.com/integrations/destinations/redshift-migrations

so previously did it pass the kanji characters through to the final table, and now it’s replacing them with underscores?

Yes absolutely right.

Seems as though since they’re quoted identifiers they should be allowed to be preserved. Based on a quick read of the source, I’m thinking the issue is that the RedshiftSQLNameTransformer is basically just extending the io.airbyte.cdk.integrations.destination.StandardNameTransformer add forcing it to lowercase, even though this destination can support a broader set of characters in field names.

Code here:
https://github.com/airbytehq/airbyte/blob/fbb480fe8aef2b594b93e9b6199597715b80c59b/airbyte-integrations/connectors/destination-redshift/src/main/java/io/airbyte/integrations/destination/redshift/RedshiftSQLNameTransformer.java#L9|https://github.com/airbytehq/airbyte/blob/fbb480fe8aef2b594b93e9b6199597715b80c59b[…]tegrations/destination/redshift/RedshiftSQLNameTransformer.java

More than likely, it’s the upstream call to toAlphanumericAndUnderscore that is leading to this replacement:
https://github.com/airbytehq/airbyte/blob/fbb480fe8aef2b594b93e9b6199597715b80c59b/airbyte-cdk/java/airbyte-cdk/dependencies/src/main/kotlin/io/airbyte/commons/text/Names.kt#L27|https://github.com/airbytehq/airbyte/blob/fbb480fe8aef2b594b93e9b6199597715b80c59b[…]k/dependencies/src/main/kotlin/io/airbyte/commons/text/Names.kt

I’d recommend putting an issue in on GitHub in hopes that the team can adjust accordingly to support the wider naming support in RedShift (and I’m assuming Postgres behaves the same under Destinations V2)

So don’t we have any settings to handle this in the connector level?

<@U04SE163WC8> My question is why it is changing the column names in my redshift table?

insert into “raw_daas_jp”.“raw_fas_pfqdi002” (“__”, “__cd”, “__seq”, “____”, “_____1”, “2", “____cd”, “____no”, "”, “1", "”, “_______1”, “ab_source_file_url", "cd", "”, “________1”, “______2", “3", "4", "ad", "”, “_1", "ab_source_file_last_modified", "”, “_airbyte_raw_id”, “_airbyte_extracted_at”, “_airbyte_meta”) with “intermediate_data” as (select CASE WHEN (JSON_TYPEOF(“_airbyte_data”.“メモ”) <> ‘string’ and “_airbyte_data”.“メモ” is not null) THEN JSON_SERIALIZE(“_airbyte_data”.“メモ”) ELSE cast(“_airbyte_data”.“メモ” as varchar(65535)) END as "”, CASE WHEN (JSON_TYPEOF(“_airbyte_data”.“包形CD”) <> ‘string’ and “_airbyte_data”.“包形CD” is not null) THEN JSON_SERIALIZE(“_airbyte_data”.“包形CD”) ELSE cast(“_airbyte_data”.“包形CD” as varchar(65535)) END as “__cd”, CASE WHEN (JSON_TYPEOF(“_airbyte_data”.“更新SEQ”) <> ‘string’ and “_airbyte_data”.“更新SEQ” is not null) THEN JSON_SERIALIZE(“_airbyte_data”.“更新SEQ”) ELSE cast(“_airbyte_data”.“更新SEQ” as varchar(65535)) END as “__seq”, CASE WHEN (JSON_TYPEOF(“airbyte_data".“充填順序”) <> ‘string’ and "airbyte_data".“充填順序” is not null) THEN JSON_SERIALIZE("airbyte_data".“充填順序”) ELSE cast("airbyte_data".“充填順序” as varchar(65535)) END as "”, CASE WHEN (JSON_TYPEOF(“_airbyte_data”.“更新日時”) <> ‘string’ and “_airbyte_data”.“更新日時” is not null) THEN JSON_SERIALIZE(“_airbyte_data”.“更新日時”) ELSE cast(“_airbyte_data”.“更新日時” as varchar(65535)) END as “_____1”, CASE WHEN (JSON_TYPEOF(“_airbyte_data”.“登録日時”) <> ‘string’ and “_airbyte_data”.“登録日時”