Destination S3 key error when moving a large amount of data

  • Is this your first time deploying Airbyte?: Yes

  • OS Version / Instance: Ubuntu

  • Memory / Disk: 16Gb / 200Gb

  • Deployment: Docker

  • Airbyte Version: 0.39.20-alpha

  • Source name/version: source-postgres/0.4.25

  • Destination name/version: destination-redshift/0.3.39

  • Step: The issue is happening during sync, when loading data from s3 to redshift

  • Description: This doesn’t happen when I move a small amount of data, but when I move a big chunk (50 GB and up) I find something like this in the log:

2022-06-19 05:50:54 destination > Details: -----------------------------------------------
2022-06-19 05:50:54 destination >   error:  Mandatory url is not present in manifest file.
2022-06-19 05:50:54 destination >   code:      8001
2022-06-19 05:50:54 destination >   context:   Manifest file location=s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/0e670848-f88f-491d-b31c-c3b23b893c25.manifest url=s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_
2022-06-19 05:50:54 destination >   query:     84680
2022-06-19 05:50:54 destination >   location:  s3_utility.cpp:400
2022-06-19 05:50:54 destination >   process:   padbmaster [pid=16418]
2022-06-19 05:50:54 destination >   -----------------------------------------------;
2022-06-19 05:50:54 destination > 	at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-06-19 05:50:54 destination > 	at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-06-19 05:50:54 destination > 	at com.amazon.redshift.client.PGMessagingContext.handleMessage(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-06-19 05:50:54 destination > 	at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-06-19 05:50:54 destination > 	at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-06-19 05:50:54 destination > 	at com.amazon.redshift.client.PGMessagingContext.getErrorResponse(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-06-19 05:50:54 destination > 	at com.amazon.redshift.client.PGClient.handleErrorsScenario3(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-06-19 05:50:54 destination > 	at com.amazon.redshift.client.PGClient.handleErrors(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-06-19 05:50:54 destination > 	at com.amazon.redshift.dataengine.PGQueryExecutor$CallableExecuteTask.call(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-06-19 05:50:54 destination > 	at com.amazon.redshift.dataengine.PGQueryExecutor$CallableExecuteTask.call(Unknown Source) ~[redshift-jdbc42-no-awssdk-1.2.51.1078.jar:RedshiftJDBC_1.2.51.1078]
2022-06-19 05:50:54 destination > 	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
2022-06-19 05:50:54 destination > 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
2022-06-19 05:50:54 destination > 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
2022-06-19 05:50:54 destination > 2022-06-19 05:50:54 ERROR i.a.i.b.AirbyteExceptionHandler(uncaughtException):26 - Something went wrong in the connector. See the logs for more details.
2022-06-19 05:50:54 destination > java.lang.RuntimeException: Failed to upload data from stage source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/
2022-06-19 05:50:54 destination > 	at io.airbyte.integrations.destination.staging.StagingConsumerFactory.lambda$onCloseFunction$4(StagingConsumerFactory.java:204) ~[io.airbyte.airbyte-integrations.connectors-destination-jdbc-0.39.7-alpha.jar:?]
2022-06-19 05:50:54 destination > 	at io.airbyte.integrations.destination.buffered_stream_consumer.OnCloseFunction.accept(OnCloseFunction.java:9) ~[io.airbyte.airbyte-integrations.bases-base-java-0.39.7-alpha.jar:?]
2022-06-19 05:50:54 destination > 	at io.airbyte.integrations.destination.buffered_stream_consumer.BufferedStreamConsumer.close(BufferedStreamConsumer.java:179) ~[io.airbyte.airbyte-integrations.bases-base-java-0.39.7-alpha.jar:?]
2022-06-19 05:50:54 destination > 	at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.lambda$close$0(FailureTrackingAirbyteMessageConsumer.java:67) ~[io.airbyte.airbyte-integrations.bases-base-java-0.39.7-alpha.jar:?]
2022-06-19 05:50:54 destination > 	at io.airbyte.integrations.base.sentry.AirbyteSentry.executeWithTracing(AirbyteSentry.java:54) ~[io.airbyte.airbyte-integrations.bases-base-java-0.39.7-alpha.jar:?]
2022-06-19 05:50:54 destination > 	at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.close(FailureTrackingAirbyteMessageConsumer.java:67) ~[io.airbyte.airbyte-integrations.bases-base-java-0.39.7-alpha.jar:?]
2022-06-19 05:50:54 destination > 	at io.airbyte.integrations.base.IntegrationRunner.runInternal(IntegrationRunner.java:166) ~[io.airbyte.airbyte-integrations.bases-base-java-0.39.7-alpha.jar:?]
2022-06-19 05:50:54 destination > 	at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:107) ~[io.airbyte.airbyte-integrations.bases-base-java-0.39.7-alpha.jar:?]
2022-06-19 05:50:54 destination > 	at io.airbyte.integrations.destination.redshift.RedshiftDestination.main(RedshiftDestination.java:62) ~[io.airbyte.airbyte-integrations.connectors-destination-redshift-0.39.7-alpha.jar:?]
2022-06-19 05:50:54 destination > Caused by: java.lang.RuntimeException: java.sql.SQLException: [Amazon](500310) Invalid operation: Mandatory url is not present in manifest file

It seems the files listed in the manifest could not be found in S3, so I checked the manifest file in the bucket and printed the URLs of all the files it lists:

s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/0.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/1.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/2.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/3.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/4.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/5.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/6.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/7.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/8.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/9.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/10.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/11.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/12.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/13.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/14.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/15.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/16.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/17.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/18.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/19.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/20.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/21.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/22.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/23.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/24.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/25.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/26.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/27.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/28.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/29.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/30.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/31.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/32.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/33.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/34.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/35.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/0.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/1.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/2.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/3.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/4.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/5.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/6.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/7.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/8.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/9.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/10.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/11.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/12.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/13.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/14.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/15.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/16.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/17.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/18.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/19.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/20.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/21.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/22.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/23.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/24.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/25.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/26.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/27.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/28.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/29.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/30.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/31.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/32.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/33.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/34.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/35.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/36.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/37.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/38.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/39.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/40.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/41.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/0.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/1.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/2.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/3.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/4.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/5.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/6.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/7.csv.gz
s3://airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/8.csv.gz

It seems that while Airbyte was writing the data into files, at certain moments it restarted the numbering from 0 (e.g. 0 ~ 35, then suddenly 0 ~ 41, and finally 0 ~ 8).
I also checked the files that are in the S3 bucket to be loaded into Redshift:

xxx-ooo@ip-172-29-0-35:~$ aws s3 ls airbyte-dev-data/source_dp_indicator_daily_test_20220517/2022_06_18_13_b8dc2c44-6686-4386-87ff-84b82b111ce4/
2022-06-19 06:35:23  209727805 0.csv.gz
2022-06-19 07:50:54      13034 0e670848-f88f-491d-b31c-c3b23b893c25.manifest
2022-06-19 06:45:49  209720864 1.csv.gz
2022-06-19 06:57:04  209729973 2.csv.gz
2022-06-19 07:07:06  209718819 3.csv.gz
2022-06-19 07:15:58  209726267 4.csv.gz
2022-06-19 07:24:52  209724788 5.csv.gz
2022-06-19 07:34:14  209735005 6.csv.gz
2022-06-19 07:43:20  209717309 7.csv.gz
2022-06-19 07:50:53  157915457 8.csv.gz

It seems only nine data files (0 ~ 8) are present in the S3 bucket; in other words, the files from the earlier runs recorded in the manifest (0 ~ 35 and 0 ~ 41) are gone. I think this is what raises the error.
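
The comparison described above (manifest entries versus objects actually present in the bucket) can be sketched in Python. This is a minimal sketch, assuming the manifest follows the Redshift COPY manifest layout (`{"entries": [{"url": ..., "mandatory": ...}]}`). The function name `audit_manifest`, the bucket name, and the sample data are hypothetical, and downloading the manifest and listing the bucket are left out.

```python
import json
from collections import Counter


def audit_manifest(manifest_text, existing_urls):
    """Compare manifest entries with the objects present in the bucket.

    manifest_text: JSON text of a Redshift COPY manifest (assumed layout).
    existing_urls: iterable of s3:// URLs that actually exist.
    Returns (duplicated, missing): URLs listed more than once, and
    mandatory URLs that are listed but absent from the bucket.
    """
    entries = json.loads(manifest_text)["entries"]
    counts = Counter(entry["url"] for entry in entries)
    existing = set(existing_urls)

    duplicated = sorted(url for url, n in counts.items() if n > 1)
    mandatory = {e["url"] for e in entries if e.get("mandatory", False)}
    missing = sorted(mandatory - existing)
    return duplicated, missing


if __name__ == "__main__":
    # Hypothetical miniature of the situation in this thread.
    prefix = "s3://example-bucket/stage/"
    names = ["0.csv.gz", "1.csv.gz", "2.csv.gz", "0.csv.gz", "1.csv.gz"]
    manifest = json.dumps(
        {"entries": [{"url": prefix + n, "mandatory": True} for n in names]}
    )
    in_bucket = [prefix + "0.csv.gz", prefix + "1.csv.gz"]
    duplicated, missing = audit_manifest(manifest, in_bucket)
    print("duplicated:", duplicated)
    print("missing:", missing)
```

Any URL reported as missing would be enough for a COPY with a manifest to fail on a mandatory entry, which matches the error message in the log.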

I’m not sure whether I’m missing something or it’s a bug. Thank you in advance for your help.

Hey @khungCU,
Thank you for your investigation! I have a few questions to help you find the root cause:

  • Could you please try to upgrade your destination redshift connector to 0.3.40?
  • Did you try to change the stream part size and observe if you have a different behavior?
  • Do you know if the duplicate 0.csv.gz entries in the list refer to the same file? I’m thinking that another process could have written a different .gz archive to S3 with the same name. Are you syncing multiple tables? You should try to load a single table and check if you get the same problem. I would also suggest tracking the data loading on S3 and checking whether 0.csv.gz keeps the same size.
  • Could you please try to upgrade your destination redshift connector to 0.3.40?
    Will do
  • Did you try to change the stream part size and observe if you have a different behavior?
    Could you tell me more about the expected behavior here? For a large table, should I increase or decrease this setting? Once I update my connector to 0.3.40, the stream part size setting does not exist anymore.
  • Do you know if the duplicate 0.csv.gz file in the list refer to the same file?
    It’s a bit hard to monitor that. Is there any way I could identify whether they are the same file?
  • Are you syncing multiple tables?
    Since it’s a big table, I created a separate connection for it.
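
One way to tell whether a key such as 0.csv.gz has been overwritten is to take two listings of the staging folder some time apart and compare them. The sketch below is a hypothetical helper, not part of Airbyte. It assumes the listing text has the `date time size key` layout that `aws s3 ls` prints, and the function names `parse_listing` and `diff_listings` are made up for illustration.

```python
def parse_listing(text):
    """Parse `aws s3 ls` style output into {key: (timestamp, size)}."""
    objects = {}
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) != 4 or not parts[2].isdigit():
            continue  # skip blank lines and "PRE folder/" entries
        date, time, size, key = parts
        objects[key] = (f"{date} {time}", int(size))
    return objects


def diff_listings(before, after):
    """Return (replaced, vanished) keys between two listings.

    replaced: keys present in both whose timestamp or size changed,
              which suggests the object was written again.
    vanished: keys present in `before` but not in `after`.
    """
    old, new = parse_listing(before), parse_listing(after)
    replaced = sorted(k for k in old if k in new and old[k] != new[k])
    vanished = sorted(k for k in old if k not in new)
    return replaced, vanished


if __name__ == "__main__":
    # The second snapshot is invented for the example.
    first = (
        "2022-06-19 06:35:23  209727805 0.csv.gz\n"
        "2022-06-19 06:45:49  209720864 1.csv.gz\n"
    )
    second = "2022-06-19 09:10:00  209700000 0.csv.gz\n"
    print(diff_listings(first, second))
```

A key that shows up as replaced has been uploaded again under the same name; a key that vanished was deleted between the two listings.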

Thank you for the details. Let me know if the error persists after the upgrade.

  1. It works well after upgrading the Redshift connector to 0.3.40.
  2. Could you tell me how the stream part size setting affects the way Airbyte writes files into S3?

Thanks a lot for the help!!

It’s great that the upgrade solved your problem! The latest version does not expose the stream part size anymore and it’s managed internally by the connector. More detail here and there

The same issue happened again with the Redshift connector at 0.3.40. (It’s a CDC sync, but this was the first run, hence a snapshot of the full table.)
At some point Airbyte purges all the files it has uploaded to S3 and starts writing to S3 again from file 0.

Below is the log:

First, 0.csv.gz is uploaded to S3 (around 2022-06-28 09:44:56):

2022-06-28 09:44:56 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):344 - Records read: 4652000 (2 GB)
2022-06-28 09:44:56 destination > 2022-06-28 09:44:56 INFO i.a.i.d.r.SerializedBufferingStrategy(flushWriter):93 - Flushing buffer of stream dp_indicator_daily (200 MB)
2022-06-28 09:44:56 destination > 2022-06-28 09:44:56 INFO i.a.i.d.s.StagingConsumerFactory(lambda$flushBufferFunction$3):158 - Flushing buffer for stream dp_indicator_daily (200 MB) to staging
2022-06-28 09:44:56 destination > 2022-06-28 09:44:56 INFO i.a.i.d.r.BaseSerializedBuffer(flush):131 - Wrapping up compression and write GZIP trailer data.
2022-06-28 09:44:56 destination > 2022-06-28 09:44:56 INFO i.a.i.d.r.BaseSerializedBuffer(flush):138 - Finished writing data to b7a1cab5-f22d-433e-8bb4-e1e0eab79b0016208607828865667773.csv.gz (200 MB)
2022-06-28 09:44:56 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):344 - Records read: 4653000 (2 GB)
2022-06-28 09:44:57 destination > 2022-06-28 09:44:57 INFO a.m.s.StreamTransferManager(getMultiPartOutputStreams):329 - Initiated multipart upload to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with full ID YPRTK3GkxPbqEQrjM175DvzwHY8y6xAM8erxbrA4FfEGPSVYNYVb.3V35pAO3wR9WHy8Nm86quRT8qFonda.2Qc2Hpy5Su3pYQrlKKYf.fSqM0_4Y6vIla9QDL5SyNFF
2022-06-28 09:44:58 destination > 2022-06-28 09:44:58 INFO a.m.s.MultiPartOutputStream(close):158 - Called close() on [MultipartOutputStream for parts 1 - 10000]
2022-06-28 09:44:59 destination > 2022-06-28 09:44:59 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id YPRTK3Gkx...QDL5SyNFF]: Finished uploading [Part number 1 containing 10.01 MB]
2022-06-28 09:44:59 destination > 2022-06-28 09:44:59 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id YPRTK3Gkx...QDL5SyNFF]: Finished uploading [Part number 5 containing 10.01 MB]
2022-06-28 09:44:59 destination > 2022-06-28 09:44:59 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id YPRTK3Gkx...QDL5SyNFF]: Finished uploading [Part number 10 containing 10.01 MB]
2022-06-28 09:44:59 destination > 2022-06-28 09:44:59 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id YPRTK3Gkx...QDL5SyNFF]: Finished uploading [Part number 2 containing 10.01 MB]
2022-06-28 09:44:59 destination > 2022-06-28 09:44:59 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id YPRTK3Gkx...QDL5SyNFF]: Finished uploading [Part number 6 containing 10.01 MB]
2022-06-28 09:44:59 destination > 2022-06-28 09:44:59 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id YPRTK3Gkx...QDL5SyNFF]: Finished uploading [Part number 9 containing 10.01 MB]
2022-06-28 09:44:59 destination > 2022-06-28 09:44:59 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id YPRTK3Gkx...QDL5SyNFF]: Finished uploading [Part number 3 containing 10.01 MB]
2022-06-28 09:44:59 destination > 2022-06-28 09:44:59 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id YPRTK3Gkx...QDL5SyNFF]: Finished uploading [Part number 4 containing 10.01 MB]
2022-06-28 09:44:59 destination > 2022-06-28 09:44:59 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id YPRTK3Gkx...QDL5SyNFF]: Finished uploading [Part number 8 containing 10.01 MB]
2022-06-28 09:44:59 destination > 2022-06-28 09:44:59 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id YPRTK3Gkx...QDL5SyNFF]: Finished uploading [Part number 7 containing 10.01 MB]

Hours later, another 0.csv.gz is uploaded to S3 (at 2022-06-28 16:39:54):

2022-06-28 16:39:54 destination > 2022-06-28 16:39:54 INFO a.m.s.MultiPartOutputStream(close):158 - Called close() on [MultipartOutputStream for parts 1 - 10000]
2022-06-28 16:39:54 destination > 2022-06-28 16:39:54 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id 4ejtf5T.F...6idQuqWoF]: Finished uploading [Part number 14 containing 10.01 MB]
2022-06-28 16:39:54 destination > 2022-06-28 16:39:54 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id 4ejtf5T.F...6idQuqWoF]: Finished uploading [Part number 17 containing 10.01 MB]
2022-06-28 16:39:54 destination > 2022-06-28 16:39:54 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id 4ejtf5T.F...6idQuqWoF]: Finished uploading [Part number 16 containing 10.01 MB]
2022-06-28 16:39:54 destination > 2022-06-28 16:39:54 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id 4ejtf5T.F...6idQuqWoF]: Finished uploading [Part number 18 containing 10.01 MB]
2022-06-28 16:39:54 destination > 2022-06-28 16:39:54 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id 4ejtf5T.F...6idQuqWoF]: Finished uploading [Part number 20 containing 9.87 MB]
2022-06-28 16:39:54 destination > 2022-06-28 16:39:54 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id 4ejtf5T.F...6idQuqWoF]: Finished uploading [Part number 19 containing 10.01 MB]
2022-06-28 16:39:55 destination > 2022-06-28 16:39:55 INFO a.m.s.StreamTransferManager(complete):395 - [Manager uploading to airbyte-dev-data/source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz with id 4ejtf5T.F...6idQuqWoF]: Completed
2022-06-28 16:39:55 destination > 2022-06-28 16:39:55 INFO i.a.i.d.s.S3StorageOperations(loadDataIntoBucket):178 - Uploaded buffer file to storage: bd489602-eeaa-40ab-a4c6-aa311c45bbd34089681396285297501.csv.gz -> source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/0.csv.gz (filename: 0.csv.gz)
2022-06-28 16:39:55 destination > 2022-06-28 16:39:55 INFO i.a.i.d.r.FileBuffer(deleteFile):81 - Deleting tempFile data bd489602-eeaa-40ab-a4c6-aa311c45bbd34089681396285297501.csv.gz
2022-06-28 16:39:55 destination > 2022-06-28 16:39:55 INFO i.a.i.d.r.SerializedBufferingStrategy(lambda$addRecord$0):48 - Starting a new buffer for stream dp_indicator_daily (current state: -100678 bytes in 0 buffers)
2022-06-28 16:39:55 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):344 - Records read: 181901000 (96 GB)
2022-06-28 16:39:55 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):344 - Records read: 181902000 (96 GB)
2022-06-28 16:39:55 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):344 - Records read: 181903000 (96 GB)
2022-06-28 16:39:55 source > 2022-06-28 16:39:55 INFO i.d.r.RelationalSnapshotChangeEventSource(createDataEventsForTable):374 - 	 Exported 181921793 records for table 'dp.dp_indicator_daily' after 07:05:53.979
2022-06-28 16:39:55 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):344 - Records read: 181904000 (96 GB)
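
To locate every moment at which the numbering starts over, the destination log can be scanned for upload confirmations. The following is a rough sketch, assuming each completed upload produces a line of the form `Uploaded buffer file to storage: ... (filename: N.csv.gz)` as in the excerpt above; the helper `find_restarts` and the sample lines are hypothetical.

```python
import re

# Matches the leading timestamp and the numeric part of the staged file name.
UPLOAD_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*"
    r"Uploaded buffer file to storage: .*\(filename: (\d+)\.csv\.gz\)"
)


def find_restarts(log_lines):
    """Return (timestamp, previous_index, new_index) for each upload whose
    file index does not increase compared with the previous upload."""
    restarts = []
    previous = None
    for line in log_lines:
        match = UPLOAD_RE.search(line)
        if not match:
            continue
        timestamp, index = match.group(1), int(match.group(2))
        if previous is not None and index <= previous:
            restarts.append((timestamp, previous, index))
        previous = index
    return restarts


if __name__ == "__main__":
    # Shortened, invented lines in the same shape as the real log.
    sample = [
        "2022-06-28 16:20:00 destination > INFO Uploaded buffer file to storage: a.csv.gz -> stage/36.csv.gz (filename: 36.csv.gz)",
        "2022-06-28 16:39:55 destination > INFO Uploaded buffer file to storage: b.csv.gz -> stage/0.csv.gz (filename: 0.csv.gz)",
    ]
    print(find_restarts(sample))
```

The timestamps it reports can then be compared with the start and end times of other jobs that use the same bucket.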

This is what I saw while monitoring the S3 bucket (screenshots):

  1. At first it keeps writing files into the bucket.
  2. All of a sudden the files are gone and it starts all over from 0.

The error log is attached (the full file is too large to upload here, so I only captured the main error part):
airbyte_error.txt (67.1 KB)

I think this part in the error log is the main reason why airbyte failed to do the first snapshot:

2022-06-29 06:25:56 destination > 2022-06-29 06:25:56 INFO i.a.i.d.s.StagingConsumerFactory(lambda$onCloseFunction$4):195 - Copying stream dp_indicator_daily of schema source into tmp table _airbyte_tmp_ffp_dp_indicator_daily to final table _airbyte_raw_dp_indicator_daily from stage path source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/ with 106 file(s) [0.csv.gz,1.csv.gz,2.csv.gz,3.csv.gz,4.csv.gz,5.csv.gz,6.csv.gz,7.csv.gz,8.csv.gz,9.csv.gz,10.csv.gz,11.csv.gz,12.csv.gz,13.csv.gz,14.csv.gz,15.csv.gz,16.csv.gz,17.csv.gz,18.csv.gz,19.csv.gz,20.csv.gz,21.csv.gz,22.csv.gz,23.csv.gz,24.csv.gz,25.csv.gz,26.csv.gz,27.csv.gz,28.csv.gz,29.csv.gz,30.csv.gz,31.csv.gz,32.csv.gz,33.csv.gz,34.csv.gz,35.csv.gz,36.csv.gz,0.csv.gz,1.csv.gz,2.csv.gz,3.csv.gz,4.csv.gz,5.csv.gz,6.csv.gz,7.csv.gz,8.csv.gz,9.csv.gz,10.csv.gz,11.csv.gz,12.csv.gz,13.csv.gz,14.csv.gz,15.csv.gz,16.csv.gz,17.csv.gz,18.csv.gz,19.csv.gz,20.csv.gz,21.csv.gz,22.csv.gz,23.csv.gz,24.csv.gz,25.csv.gz,26.csv.gz,27.csv.gz,28.csv.gz,29.csv.gz,30.csv.gz,31.csv.gz,32.csv.gz,33.csv.gz,34.csv.gz,35.csv.gz,36.csv.gz,37.csv.gz,38.csv.gz,39.csv.gz,0.csv.gz,1.csv.gz,0.csv.gz,1.csv.gz,2.csv.gz,3.csv.gz,4.csv.gz,5.csv.gz,6.csv.gz,7.csv.gz,8.csv.gz,9.csv.gz,10.csv.gz,11.csv.gz,12.csv.gz,13.csv.gz,14.csv.gz,15.csv.gz,16.csv.gz,17.csv.gz,18.csv.gz,19.csv.gz,20.csv.gz,21.csv.gz,22.csv.gz,23.csv.gz,24.csv.gz,25.csv.gz,26.csv.gz]
2022-06-29 06:25:56 destination > 2022-06-29 06:25:56 INFO i.a.i.d.r.o.RedshiftS3StagingSqlOperations(copyIntoTmpTableFromStage):90 - Starting copy to tmp table from stage: _airbyte_tmp_ffp_dp_indicator_daily in destination from stage: source_dp_indicator_daily/2022_06_28_09_0c089d8b-14d9-407b-8847-ea88bf892c1d/, schema: source, .

Why were the files written to S3 with the names below?
[0.csv.gz,1.csv.gz,2.csv.gz,3.csv.gz,4.csv.gz,5.csv.gz,6.csv.gz,7.csv.gz,8.csv.gz,9.csv.gz,10.csv.gz,11.csv.gz,12.csv.gz,13.csv.gz,14.csv.gz,15.csv.gz,16.csv.gz,17.csv.gz,18.csv.gz,19.csv.gz,20.csv.gz,21.csv.gz,22.csv.gz,23.csv.gz,24.csv.gz,25.csv.gz,26.csv.gz,27.csv.gz,28.csv.gz,29.csv.gz,30.csv.gz,31.csv.gz,32.csv.gz,33.csv.gz,34.csv.gz,35.csv.gz,36.csv.gz,0.csv.gz,1.csv.gz,2.csv.gz,3.csv.gz,4.csv.gz,5.csv.gz,6.csv.gz,7.csv.gz,8.csv.gz,9.csv.gz,10.csv.gz,11.csv.gz,12.csv.gz,13.csv.gz,14.csv.gz,15.csv.gz,16.csv.gz,17.csv.gz,18.csv.gz,19.csv.gz,20.csv.gz,21.csv.gz,22.csv.gz,23.csv.gz,24.csv.gz,25.csv.gz,26.csv.gz,27.csv.gz,28.csv.gz,29.csv.gz,30.csv.gz,31.csv.gz,32.csv.gz,33.csv.gz,34.csv.gz,35.csv.gz,36.csv.gz,37.csv.gz,38.csv.gz,39.csv.gz,0.csv.gz,1.csv.gz,0.csv.gz,1.csv.gz,2.csv.gz,3.csv.gz,4.csv.gz,5.csv.gz,6.csv.gz,7.csv.gz,8.csv.gz,9.csv.gz,10.csv.gz,11.csv.gz,12.csv.gz,13.csv.gz,14.csv.gz,15.csv.gz,16.csv.gz,17.csv.gz,18.csv.gz,19.csv.gz,20.csv.gz,21.csv.gz,22.csv.gz,23.csv.gz,24.csv.gz,25.csv.gz,26.csv.gz]

With 106 files, they were supposed to be 0.csv.gz ~ 105.csv.gz.
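
To make the pattern in that list easier to read, the names can be collapsed into consecutive runs. A small sketch follows; the helper `split_into_runs` is hypothetical, and the input is rebuilt from the run lengths visible in the log line rather than parsed from it.

```python
def split_into_runs(file_names):
    """Collapse names like '0.csv.gz', '1.csv.gz', ... into (first, last)
    pairs, starting a new pair whenever the numbering stops increasing by 1."""
    runs = []
    previous = None
    for name in file_names:
        index = int(name.split(".", 1)[0])
        if previous is None or index != previous + 1:
            runs.append([index, index])  # numbering restarted or jumped
        else:
            runs[-1][1] = index  # extend the current run
        previous = index
    return [tuple(run) for run in runs]


if __name__ == "__main__":
    # Same shape as the 106-file list above: four runs of 37, 40, 2 and 27 names.
    names = [f"{i}.csv.gz" for length in (37, 40, 2, 27) for i in range(length)]
    print(len(names))              # 106
    print(split_into_runs(names))  # [(0, 36), (0, 39), (0, 1), (0, 26)]
```

Four runs mean the numbering restarted three times during this sync, and at most 40 distinct names were ever used.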

I found the root cause of this issue:
I have two connections using the same destination, so they point to the same S3 Bucket Path.
While a long-running connection job was running, another short-running connection kicked off. When the short-running connection ended, it cleaned up the S3 Bucket Path (the default setting), including the files that the long-running connection had already uploaded. Airbyte seems to be unaware of this, so it starts uploading from 0.csv.gz again.
However, when two short-running connections run at the same time, this issue doesn’t occur. Could you explain why?
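
The failure mode described above can be reproduced in a small in-memory simulation. This is only a sketch under an assumption that this thread does not confirm: that the destination names each new staged file after the number of objects currently under the staging path, while keeping its own in-memory list of everything it has staged. `FakeBucket`, `Writer`, and the purge step are hypothetical stand-ins, not Airbyte code.

```python
class FakeBucket:
    """In-memory stand-in for one shared S3 bucket path."""

    def __init__(self):
        self.objects = set()

    def purge(self):
        # Models the cleanup that another connection runs when it finishes.
        self.objects.clear()


class Writer:
    """Models the long-running sync that stages files and remembers them."""

    def __init__(self, bucket):
        self.bucket = bucket
        self.staged = []  # names that would later go into the manifest

    def upload_next(self):
        # ASSUMPTION: the next name is derived from the current object count.
        name = f"{len(self.bucket.objects)}.csv.gz"
        self.bucket.objects.add(name)
        self.staged.append(name)
        return name


def simulate():
    bucket = FakeBucket()
    long_job = Writer(bucket)
    for _ in range(3):
        long_job.upload_next()  # stages 0, 1, 2
    bucket.purge()              # the short-running job cleans the shared path
    for _ in range(2):
        long_job.upload_next()  # numbering restarts: 0, 1
    missing = [n for n in long_job.staged if n not in bucket.objects]
    return long_job.staged, sorted(bucket.objects), missing


if __name__ == "__main__":
    staged, in_bucket, missing = simulate()
    print("manifest:", staged)
    print("bucket:  ", in_bucket)
    print("missing: ", missing)
```

Under that assumption the manifest ends up with repeated names and with entries for files that no longer exist, which is the same shape as the manifest and bucket listing earlier in this thread. If the assumption holds, giving each connection its own S3 Bucket Path would avoid the interference.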

