Destination Redshift - Workaround for failed DBT normalization of SUPER datatype

josephbrownskilljar · January 26, 2023, 7:24pm

Is this your first time deploying Airbyte?: Yes
OS Version / Instance: AWS Linux
Memory / Disk: you can use something like 32Gb / 100 GB
Deployment: Are you using Docker or Kubernetes deployment? No
Airbyte Version: v0.40.28
Source name/version: Jira
Destination name/version: Redshift
Step: During sync
Description:

Similar to the “Hubspot Normalization Failure” thread, we are seeing a normalization error in DBT when syncing from Jira to Redshift:

  compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_tables/source_jira/jira__issues.sql
19 of 57 ERROR creating table model source_jira.jira__issues............................................................ [ERROR in 1.34s]
Database Error in model jira__issues (models/generated/airbyte_tables/source_jira/jira__issues.sql)
  Invalid input
  DETAIL:  
    -----------------------------------------------
    error:  Invalid input
    code:      8001
    context:   SUPER value exceeds export size.
    query:     11119536
    location:  partiql_export.cpp:9

This is clearly a DBT problem: these error messages look very familiar, and the raw table, which holds the data before it’s run through DBT normalization, is also of datatype SUPER and already contains the data!

CREATE TABLE _airbyte_raw_jira__issues
  (_airbyte_ab_id      varchar(256) NOT NULL DISTKEY
     PRIMARY KEY,
   _airbyte_data       super ENCODE ZSTD,
   _airbyte_emitted_at timestamp WITH TIME ZONE DEFAULT ('now'::text)::timestamp WITH TIME ZONE ENCODE AZ64);

So I’m wondering if anyone can advise on a few things:

1) Should I add these details to the associated Github ticket in the above thread? It has been open since last summer (7/22).
2) Should we use the “Raw data (JSON)” transformation and parse the table ourselves?
3) Should we use a “custom transformation”?

Is there a nice walkthrough of 1) or 2) that I could apply in this situation?

Thanks for any help!

josephbrownskilljar · January 26, 2023, 7:28pm

P.S. The Redshift SUPER type actually has an upper limit of 16MB, so the 1MB limit described in this ticket is probably a DBT limitation, not a Redshift limitation.

bpjena · January 27, 2023, 6:06pm

We have been facing the same issue. We are on Airbyte version 0.40.27.

adam · January 30, 2023, 3:34pm

Redshift SUPERs are still limited to 1MB. 16MB is coming soon though! It’s in technical preview now. They finally added a disclaimer at the top of the “Limitations - Amazon Redshift” docs page. I’d been trying to figure out what was going on for the last few weeks, since they never formally announced 16MB and just started changing some docs, while values greater than 1MB were still throwing errors. Glad to finally see a docs clarification!
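
Since oversized SUPER values are what trip the error in this thread, one way to narrow things down is to measure how large each record is once serialized as JSON. The following is a minimal Python sketch, not anything taken from Airbyte or Redshift: the names `SUPER_LIMIT_BYTES`, `serialized_size`, and `oversized_records` are hypothetical, and the 1MB threshold is assumed from the figure discussed here. Redshift's internal size accounting may differ from a compact JSON byte count, so treat the result as a rough indicator only.

```python
import json

# Assumed threshold: the 1MB SUPER limit discussed in this thread.
SUPER_LIMIT_BYTES = 1024 * 1024


def serialized_size(record):
    """Return the UTF-8 byte length of the record as compact JSON."""
    text = json.dumps(record, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def oversized_records(records, limit=SUPER_LIMIT_BYTES):
    """Yield (index, size_in_bytes) for each record larger than the limit."""
    for index, record in enumerate(records):
        size = serialized_size(record)
        if size > limit:
            yield index, size


if __name__ == "__main__":
    # Made-up sample data: one small record and one roughly 2MB record.
    sample = [
        {"key": "A-1", "description": "short"},
        {"key": "A-2", "description": "x" * (2 * 1024 * 1024)},
    ]
    for index, size in oversized_records(sample):
        print(f"record {index} serializes to {size} bytes")
```

Running this over records exported from the raw table (for example, each `_airbyte_data` value loaded as a Python dict) would point at the rows most likely to fail normalization.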
