Destination Redshift - Workaround for failed DBT normalization of SUPER datatype

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: AWS Linux
  • Memory / Disk: 32 GB / 100 GB
  • Deployment: No (neither Docker nor Kubernetes)
  • Airbyte Version: v0.40.28
  • Source name/version: Jira
  • Destination name/version: Redshift
  • Step: During sync
  • Description:

Similar to Hubspot Normalization Failure, we are seeing a normalization error in DBT when syncing from Jira to Redshift:

  compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_tables/source_jira/jira__issues.sql
19 of 57 ERROR creating table model source_jira.jira__issues............................................................ [ERROR in 1.34s]
Database Error in model jira__issues (models/generated/airbyte_tables/source_jira/jira__issues.sql)
  Invalid input
  DETAIL:  
    -----------------------------------------------
    error:  Invalid input
    code:      8001
    context:   SUPER value exceeds export size.
    query:     11119536
    location:  partiql_export.cpp:9

This looks like a DBT problem: these error messages look very familiar, and the raw table (which holds the data before it's run through DBT during normalization) is also of datatype SUPER and already contains the data!

CREATE TABLE _airbyte_raw_jira__issues
  (_airbyte_ab_id      varchar(256) NOT NULL DISTKEY
     PRIMARY KEY,
   _airbyte_data       super ENCODE ZSTD,
   _airbyte_emitted_at timestamp WITH TIME ZONE DEFAULT ('now'::text)::timestamp WITH TIME ZONE ENCODE AZ64);

So I’m wondering if anyone can advise on a few things:

  1. Should I add these details to the associated GitHub ticket in the above thread? It has been open since last summer (July 2022).
  2. Should we use the “Raw data (JSON)” transformation and parse the table ourselves?
  3. Should we use a “custom transformation”?

Is there a nice walkthrough of 2) or 3) that I could apply in this situation?
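For what it's worth, here is a minimal sketch of what option 2) (sync "Raw data (JSON)" and parse the table ourselves) might look like as a custom transformation step. All field names below (`fields`, `summary`, `status`) are hypothetical guesses at the Jira payload shape, and the idea is simply to extract only the columns we need rather than letting normalization materialize the entire SUPER value:

```python
import json

# Hypothetical example of one record as it might appear in the
# _airbyte_raw_jira__issues._airbyte_data column (shape is assumed).
raw_record = json.dumps({
    "id": "10001",
    "key": "PROJ-1",
    "fields": {
        "summary": "Example issue",
        "status": {"name": "Done"},
        "description": "a very long description " * 10,
    },
})

def flatten_issue(raw_json: str) -> dict:
    """Pick out only the columns we need from a raw Jira issue record,
    instead of expanding the full nested payload."""
    issue = json.loads(raw_json)
    fields = issue.get("fields", {})
    return {
        "issue_id": issue.get("id"),
        "issue_key": issue.get("key"),
        "summary": fields.get("summary"),
        "status": (fields.get("status") or {}).get("name"),
    }

print(flatten_issue(raw_record))
```

In a real custom transformation this selection would run inside the warehouse (e.g. as a SQL model over the raw table), but the principle is the same: project out a small, fixed set of fields and drop the oversized remainder.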

Thanks for any help!

P.S. The Redshift SUPER type actually has an upper limit of 16 MB, so the 1 MB limit described in this ticket is probably a DBT limitation, not a Redshift limitation.

We have been facing the same issue. We are on Airbyte version 0.40.27.

Redshift SUPERs are still limited to 1 MB. 16 MB is coming soon, though; it's in technical preview now. They finally added a disclaimer at the top of Limitations - Amazon Redshift. I'd been trying to figure out what was going on for the last few weeks, since they never announced 16 MB formally, just started changing some docs, and values greater than 1 MB were still throwing errors. Glad to finally see a docs clarification!
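If it helps anyone debug, here is a small sketch (hypothetical names, written in Python) for flagging which raw records would exceed the 1 MB SUPER limit before normalization runs. It just measures the serialized JSON size, which is a rough proxy for what Redshift exports from the SUPER column:

```python
import json

# Redshift currently limits SUPER values to 1 MB; records larger than
# this are the likely culprits behind "SUPER value exceeds export size".
SUPER_LIMIT_BYTES = 1 * 1024 * 1024

def is_oversized(raw_json: str, limit: int = SUPER_LIMIT_BYTES) -> bool:
    """Return True if the serialized record exceeds the SUPER size limit."""
    return len(raw_json.encode("utf-8")) > limit

# Two synthetic records: one small, one deliberately over 1 MB.
small = json.dumps({"id": "1", "description": "short"})
big = json.dumps({"id": "2", "description": "x" * (2 * 1024 * 1024)})

print(is_oversized(small))  # False
print(is_oversized(big))    # True
```

Running a check like this over the raw table (or over the source extract) makes it easy to isolate and, if necessary, truncate or exclude the offending rows.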
