Postgres Source -> Redshift Destination Connection Error

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: Amazon Linux 2 (worker/job containers run on AWS Fargate node, the rest run on EKS managed EC2 nodes)
  • Memory / Disk: Fargate 2vCPU 8GB memory node per worker/job container
  • Deployment: Kubernetes
  • Airbyte Version: 0.37.0-alpha
  • Source name/version: Postgres 0.4.14
  • Destination name/version: Redshift 0.3.32
  • Step: Destination Redshift write after Postgres source read has completed
  • Description:

Hey everyone! I’m running Airbyte Open Source on an EKS k8s cluster. Worker and job pods are configured to run on AWS Fargate nodes.

I’m trying to run a full sync of a Postgres source to a Redshift destination and running into a strange issue I’m hoping someone can help with!

The Redshift destination is configured to use the S3 COPY strategy.

The Postgres read source job container completes successfully and all data seems to be properly written to S3.

The Redshift write job container then starts copying the S3 data into Redshift temporary tables (I can see data in the temp tables before the job fails and they are cleaned up).

The ‘raw’ tables are also created but never populated with any data.

The final output tables are not created.

The write container is consistently failing with the following error…

2022-05-09 14:44:08 destination > 2022-05-09 14:44:08 INFO i.a.i.d.r.RedshiftSqlOperations(onDestinationCloseOperations):110 - Executing operations for Redshift Destination DB engine...
2022-05-09 14:44:08 destination > 2022-05-09 14:44:08 INFO i.a.i.d.r.RedshiftSqlOperations(discoverNotSuperTables):129 - Discovering NOT SUPER table types...
2022-05-09 14:44:08 destination > 2022-05-09 14:44:08 ERROR i.a.i.d.r.RedshiftSqlOperations(discoverNotSuperTables):146 - Error during discoverNotSuperTables() appears: 
2022-05-09 14:44:08 destination > java.sql.SQLException: [Amazon](500310) Invalid operation: syntax error at or near "null" 
2022-05-09 14:44:08 destination > Position: 20;
2022-05-09 14:44:08 destination > 	at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-05-09 14:44:08 destination > 	at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-05-09 14:44:08 destination > 	at com.amazon.redshift.client.PGMessagingContext.handleMessage(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-05-09 14:44:08 destination > 	at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-05-09 14:44:08 destination > 	at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-05-09 14:44:08 destination > 	at com.amazon.redshift.client.PGMessagingContext.moveThroughMetadata(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
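For context on why the error reads syntax error at or near "null": that message typically means a literal null was interpolated into the SQL text itself. A minimal sketch of how that can happen (hypothetical builder, not the actual Airbyte connector code):

```java
public class NullSchemaDemo {
    // Hypothetical query builder mirroring the shape of a schema-discovery
    // query. NOT the real connector code; for illustration only.
    static String buildDiscoveryQuery(String schemaName) {
        // String.format renders a null argument as the literal text "null".
        // Since the value is interpolated unquoted, the generated SQL ends up
        // with the bare word null where Redshift expects an identifier or
        // literal, which it rejects with: syntax error at or near "null"
        return String.format(
            "select tablename, schemaname from pg_table_def where schemaname = %s",
            schemaName);
    }

    public static void main(String[] args) {
        // With an unset schema name, the broken statement is produced:
        System.out.println(buildDiscoveryQuery(null));
    }
}
```

This is only an illustration of the failure mode; the actual location of the null in the connector would need to be confirmed in the source.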

I can complete the sync on the same source/destination if I use the INSERT strategy on the Redshift destination (it’s just wayyyyy slower).

I’m happy to provide the full job logs and any other details that may be helpful. Thanks in advance!

Hi @damiantw,
This error happens in the discoverNotSuperTables method of the Redshift connector (it’s declared here).
This method tries to find all the tables that do not have super-type columns.
It’s running this query against Redshift:

select tablename, schemaname
from pg_table_def
where tablename in (
    select tablename as tablename
    from pg_table_def
    where schemaname = '<YOURSCHEMANAME>'
      and tablename like '%%airbyte_raw%%'
      and "column" in ('_airbyte_data', '_airbyte_emitted_at', '_airbyte_ab_id')
    group by tablename
    having count(*) = 3
)
and schemaname = '<YOURSCHEMANAME>'
and type <> 'super'
and "column" = '_airbyte_data';

Could you please try to run this query yourself, check if you have an error, and share the output?

This method is a recent addition to the connector, and I found a similar issue mentioning that downgrading the connector might temporarily solve the problem.

@alafanechere that query runs fine against Redshift :thinking: (though it returns no results).

Downgrading to Redshift destination 0.3.28 did resolve the issue though.

Thanks! I’ll test again after a new version of the Redshift destination connector is released.

Ok, so the error could come from the fact that this query does not return anything on your Redshift cluster…
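If the empty result set is indeed the trigger, the defensive fix would be to skip the SUPER-type migration when discovery finds no candidate tables, rather than building SQL from an empty result. A hedged sketch with hypothetical names (not the actual connector patch):

```java
import java.util.ArrayList;
import java.util.List;

public class SuperMigrationGuard {
    // Hypothetical helper: build the per-table migration statements only when
    // the discovery step actually found candidate tables. Not the real patch.
    static List<String> buildMigrationStatements(List<String> discoveredTables) {
        List<String> statements = new ArrayList<>();
        if (discoveredTables == null || discoveredTables.isEmpty()) {
            // Nothing to migrate. Generating and executing SQL from an
            // empty/null discovery result is the suspected source of the
            // "syntax error at or near null" failure.
            return statements;
        }
        for (String table : discoveredTables) {
            // Illustrative statement shape only; the real connector's
            // migration SQL may differ.
            statements.add(String.format(
                "alter table %s alter column _airbyte_data type super;", table));
        }
        return statements;
    }
}
```

With this guard, an empty discovery result becomes a no-op instead of a malformed statement sent to Redshift.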
Thank you for trying, feel free to subscribe to the issue I linked above as I think it’s closely related to your error, and once fixed you can upgrade to the latest connector version again.

Hey @damiantw,
The bug should be resolved now that this PR was merged. Could you please try upgrading your Redshift destination connector to 0.3.31?

If you see any additional errors, please make sure you are running the following Airbyte configuration:

destination-redshift  0.3.34
base-normalization    0.1.77+ (airbyte platform v0.36.2-alpha+)

@damiantw
CC: @alafanechere
Thanks.