Source S3 - One row of mostly nulled data

I have a connection set up and working using deduped + history; however, it appears the final item in our CSV file has all of its data nulled out except for a single column, which is correct. Interestingly, all the information for this record is present in _ab_additional_properties. The record exists in all of the CSV files we have loaded, and the issue persists after resetting the data and re-syncing. Hopefully you have some thoughts on this issue. Thanks

Hey @wattsjon2, thanks for the post and welcome to the community.

Could you share some more info? What destination are you using? What version of Airbyte are you running, and what version of the S3 source connector?

What does this last row look like? Sometimes a simple upgrade of the connectors or Airbyte resolves these issues. Have you tried that? Do you have a log file you can share?

We are connecting a CSV in an S3 bucket to Redshift. The Airbyte version we are using is 0.39.41-alpha. The S3 source version is 0.1.18 and the Redshift destination version is 0.3.49, both of which are the latest versions.

I checked and all the versions are up to date. I didn’t see anything out of the ordinary in the log. I can’t post it here because it’s too long.

Hey @wattsjon2, would it be too much trouble to upgrade Airbyte to the latest version (0.40.6)?

Then rerun the sync and post an abridged log if you’re still experiencing the same problem?

Thank you for your quick replies. I’ll report back with the result and logs as soon as our Airbyte instance has been updated.

Unfortunately, updating the version did not fix the problem. Here are the logs (some of the start and end have been removed so they fit):

2022-09-14 19:52:16 INFO i.a.w.i.DefaultAirbyteStreamFactory(internalLog):99 - initialised stream with format: {'encoding': 'utf8', 'filetype': 'csv', 'delimiter': ',', 'block_size': 10000, 'quote_char': '"', 'double_quote': True, 'infer_datatypes': True, 'advanced_options': '{}', 'newlines_in_values': False, 'additional_reader_options': '{}'}

2022-09-14 19:52:16 INFO i.a.w.i.DefaultAirbyteStreamFactory(internalLog):99 - Iterating S3 bucket 'act-data-inbox-phi' with prefix: 'omaha-path-assist'

2022-09-14 19:52:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(internalLog):99 - Check succeeded

2022-09-14 19:52:17 INFO i.a.w.t.TemporalAttemptExecution(get):131 - Stopping cancellation check scheduling...

2022-09-14 19:52:17 INFO i.a.w.t.TemporalAttemptExecution(get):105 - Docker volume job log path: /tmp/workspace/204/0/logs.log

2022-09-14 19:52:17 INFO i.a.w.t.TemporalAttemptExecution(get):110 - Executing worker wrapper. Airbyte version: 0.40.6

2022-09-14 19:52:17 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/destination-redshift:0.3.49 exists...

2022-09-14 19:52:17 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/destination-redshift:0.3.49 was found locally.

2022-09-14 19:52:17 INFO i.a.w.p.DockerProcessFactory(create):108 - Creating docker job ID: 204

2022-09-14 19:52:17 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/204/0 --log-driver none --name destination-redshift-check-204-0-epnkb --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/destination-redshift:0.3.49 -e AIRBYTE_VERSION=0.40.6 -e WORKER_JOB_ID=204 airbyte/destination-redshift:0.3.49 check --config source_config.json

2022-09-14 19:52:18 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:18 INFO i.a.i.d.r.RedshiftDestination(main):63 - starting destination: class io.airbyte.integrations.destination.redshift.RedshiftDestination

2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO i.a.i.b.IntegrationCliParser(parseOptions):118 - integration args: {check=null, config=source_config.json}

2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO i.a.i.b.IntegrationRunner(runInternal):104 - Running integration: io.airbyte.integrations.destination.redshift.RedshiftDestination

2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO i.a.i.b.IntegrationRunner(runInternal):105 - Command: CHECK

2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO i.a.i.b.IntegrationRunner(runInternal):106 - Integration config:

IntegrationConfig{command=CHECK, configPath='source_config.json', catalogPath='null', statePath='null'}

2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword

2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword examples - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword

2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword

2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 WARN i.a.i.d.r.RedshiftDestination(determineUploadMode):54 - The "standard" upload mode is not performant, and is not recommended for production. Please use the Amazon S3 upload mode if you are syncing a large amount of data.

2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO i.a.i.d.j.c.SwitchingDestination(check):55 - Using destination type: STANDARD

2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO c.z.h.HikariDataSource():80 - HikariPool-1 - Starting...

2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO c.z.h.HikariDataSource():82 - HikariPool-1 - Start completed.

2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO c.z.h.p.PoolBase(getAndSetNetworkTimeout):536 - HikariPool-1 - Driver does not support get/set network timeout for connections. ([Amazon]JDBC Driver does not support this optional feature.)

2022-09-14 19:52:22 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:22 INFO c.z.h.HikariDataSource(close):350 - HikariPool-1 - Shutdown initiated...

2022-09-14 19:52:22 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:22 INFO c.z.h.HikariDataSource(close):352 - HikariPool-1 - Shutdown completed.

2022-09-14 19:52:22 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:22 INFO i.a.i.b.IntegrationRunner(runInternal):152 - Completed integration: io.airbyte.integrations.destination.redshift.RedshiftDestination

2022-09-14 19:52:22 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:22 INFO i.a.i.d.r.RedshiftDestination(main):65 - completed destination: class io.airbyte.integrations.destination.redshift.RedshiftDestination

2022-09-14 19:52:23 INFO i.a.w.t.TemporalAttemptExecution(get):131 - Stopping cancellation check scheduling...

2022-09-14 19:52:23 INFO i.a.w.t.TemporalAttemptExecution(get):105 - Docker volume job log path: /tmp/workspace/204/0/logs.log

2022-09-14 19:52:23 INFO i.a.w.t.TemporalAttemptExecution(get):110 - Executing worker wrapper. Airbyte version: 0.40.6

2022-09-14 19:52:23 INFO i.a.w.g.DefaultReplicationWorker(run):116 - start sync worker. job id: 204 attempt id: 0

2022-09-14 19:52:23 INFO i.a.w.g.DefaultReplicationWorker(run):128 - configured sync modes: {null.path_assist_addresses=incremental - append_dedup}

2022-09-14 19:52:23 INFO i.a.w.i.DefaultAirbyteDestination(start):69 - Running destination...

2022-09-14 19:52:23 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/destination-redshift:0.3.49 exists...

2022-09-14 19:52:23 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/destination-redshift:0.3.49 was found locally.

2022-09-14 19:52:23 INFO i.a.w.p.DockerProcessFactory(create):108 - Creating docker job ID: 204

2022-09-14 19:52:23 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/204/0 --log-driver none --name destination-redshift-write-204-0-agncm --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/destination-redshift:0.3.49 -e AIRBYTE_VERSION=0.40.6 -e WORKER_JOB_ID=204 airbyte/destination-redshift:0.3.49 write --config destination_config.json --catalog destination_catalog.json

2022-09-14 19:52:23 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/source-s3:0.1.18 exists...

2022-09-14 19:52:23 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/source-s3:0.1.18 was found locally.

2022-09-14 19:52:23 INFO i.a.w.p.DockerProcessFactory(create):108 - Creating docker job ID: 204

2022-09-14 19:52:23 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/204/0 --log-driver none --name source-s3-read-204-0-nnnwj --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/source-s3:0.1.18 -e AIRBYTE_VERSION=0.40.6 -e WORKER_JOB_ID=204 airbyte/source-s3:0.1.18 read --config source_config.json --catalog source_catalog.json --state input_state.json

2022-09-14 19:52:23 INFO i.a.w.g.DefaultReplicationWorker(run):170 - Waiting for source and destination threads to complete.

2022-09-14 19:52:23 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):299 - Replication thread started.

2022-09-14 19:52:23 INFO i.a.w.g.DefaultReplicationWorker(lambda$getDestinationOutputRunnable$7):406 - Destination output thread started.

2022-09-14 19:52:25 source > initialised stream with format:

2022-09-14 19:52:25 source > Starting syncing SourceS3

2022-09-14 19:52:25 source > initialised stream with format: {'encoding': 'utf8', 'filetype': 'csv', 'delimiter': ',', 'block_size': 10000, 'quote_char': '"', 'double_quote': True, 'infer_datatypes': True, 'advanced_options': '{}', 'newlines_in_values': False, 'additional_reader_options': '{}'}

2022-09-14 19:52:25 source > Syncing stream: path_assist_addresses

2022-09-14 19:52:25 source > Iterating S3 bucket 'act-data-inbox-phi' with prefix: 'omaha-path-assist'

2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 INFO i.a.i.d.r.RedshiftDestination(main):63 - starting destination: class io.airbyte.integrations.destination.redshift.RedshiftDestination

2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 INFO i.a.i.b.IntegrationCliParser(parseOptions):118 - integration args: {catalog=destination_catalog.json, write=null, config=destination_config.json}

2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 INFO i.a.i.b.IntegrationRunner(runInternal):104 - Running integration: io.airbyte.integrations.destination.redshift.RedshiftDestination

2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 INFO i.a.i.b.IntegrationRunner(runInternal):105 - Command: WRITE

2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 INFO i.a.i.b.IntegrationRunner(runInternal):106 - Integration config: IntegrationConfig{command=WRITE, configPath='destination_config.json', catalogPath='destination_catalog.json', statePath='null'}

2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword

2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword examples - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword

2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword

2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 WARN i.a.i.d.r.RedshiftDestination(determineUploadMode):54 - The "standard" upload mode is not performant, and is not recommended for production. Please use the Amazon S3 upload mode if you are syncing a large amount of data.

2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 INFO i.a.i.d.j.c.SwitchingDestination(getConsumer):65 - Using destination type: STANDARD

2022-09-14 19:52:26 source > Detected mismatched datatype on column 'postal_code', in file 'omaha-path-assist/year=2022/month=09/day=12/addresses.csv'. Should be 'string', but found 'integer'. Airbyte will attempt to coerce this to string on read.

2022-09-14 19:52:26 destination > 2022-09-14 19:52:26 INFO c.z.h.HikariDataSource():80 - HikariPool-1 - Starting...

2022-09-14 19:52:26 destination > 2022-09-14 19:52:26 INFO c.z.h.HikariDataSource():82 - HikariPool-1 - Start completed.

2022-09-14 19:52:26 source > Detected mismatched datatype on column 'postal_code', in file 'omaha-path-assist/year=2022/month=09/day=14/addresses.csv'. Should be 'string', but found 'integer'. Airbyte will attempt to coerce this to string on read.

2022-09-14 19:52:26 source > determined master schema: {'postal_code': 'string', 'id': 'string', 'restricted': 'boolean', 'state': 'string', 'ancestry': 'string', 'owner_id': 'string', 'creator_id': 'string', 'updater_id': 'string', 'owner_organization_id': 'string', 'created_at': 'string', 'updated_at': 'string', 'addressable_type': 'string', 'addressable_id': 'string', 'person_id': 'string', 'kind': 'string', 'street1': 'string', 'street2': 'string', 'city': 'string', 'region': 'string', 'country': 'string'}

2022-09-14 19:52:26 destination > 2022-09-14 19:52:26 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$toWriteConfig$0):98 - Write config: WriteConfig{streamName=path_assist_addresses, namespace=null, outputSchemaName=public, tmpTableName=_airbyte_tmp_cpx_path_assist_addresses, outputTableName=_airbyte_raw_path_assist_addresses, syncMode=append_dedup}

2022-09-14 19:52:26 destination > 2022-09-14 19:52:26 INFO i.a.i.d.b.BufferedStreamConsumer(startTracked):116 - class io.airbyte.integrations.destination.buffered_stream_consumer.BufferedStreamConsumer started.

2022-09-14 19:52:26 destination > 2022-09-14 19:52:26 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):123 - Preparing tmp tables in destination started for 1 streams

2022-09-14 19:52:26 destination > 2022-09-14 19:52:26 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):127 - Preparing tmp table in destination started for stream path_assist_addresses. schema: public, tmp table name: _airbyte_tmp_cpx_path_assist_addresses

2022-09-14 19:52:27 destination > 2022-09-14 19:52:27 INFO c.z.h.p.PoolBase(getAndSetNetworkTimeout):536 - HikariPool-1 - Driver does not support get/set network timeout for connections. ([Amazon]JDBC Driver does not support this optional feature.)

2022-09-14 19:52:29 destination > 2022-09-14 19:52:29 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):133 - Preparing tables in destination completed.

2022-09-14 19:52:29 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 1000 (783 KB)

2022-09-14 19:52:29 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 2000 (1 MB)

2022-09-14 19:52:30 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 3000 (2 MB)

2022-09-14 19:52:30 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 4000 (3 MB)

2022-09-14 19:52:31 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 5000 (3 MB)

2022-09-14 19:52:31 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 6000 (4 MB)

2022-09-14 19:52:32 source > finished reading a stream slice

2022-09-14 19:52:32 source > Read 6626 records from path_assist_addresses stream

2022-09-14 19:52:32 source > Finished syncing path_assist_addresses

2022-09-14 19:52:32 source > SourceS3 runtimes:

Syncing stream path_assist_addresses 0:00:07.005665

2022-09-14 19:52:32 source > Finished syncing SourceS3

2022-09-14 19:52:32 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):336 - Total records read: 6627 (5 MB)

2022-09-14 19:52:32 INFO i.a.w.g.DefaultReplicationWorker(run):175 - One of source or destination thread complete. Waiting on the other.

2022-09-14 19:52:32 destination > 2022-09-14 19:52:32 INFO i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):62 - Airbyte message consumer: succeeded.

2022-09-14 19:52:32 destination > 2022-09-14 19:52:32 INFO i.a.i.d.b.BufferedStreamConsumer(close):171 - executing on success close procedure.

2022-09-14 19:52:32 destination > 2022-09-14 19:52:32 INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(flushAll):84 - Flushing path_assist_addresses: 6626 records (20 MB)

2022-09-14 19:52:32 destination > 2022-09-14 19:52:32 INFO i.a.i.d.r.o.RedshiftSqlOperations(insertRecordsInternal):89 - actual size of batch: 6626

2022-09-14 19:52:38 destination > 2022-09-14 19:52:38 INFO i.a.i.d.r.o.RedshiftSqlOperations(onDestinationCloseOperations):139 - Executing operations for Redshift Destination DB engine...

2022-09-14 19:52:38 destination > 2022-09-14 19:52:38 INFO i.a.i.d.r.o.RedshiftSqlOperations(discoverNotSuperTables):189 - Discovering NOT SUPER table types...

2022-09-14 19:52:38 destination > 2022-09-14 19:52:38 INFO i.a.i.d.r.o.RedshiftSqlOperations(onDestinationCloseOperations):157 - Executing operations for Redshift Destination DB engine completed.

2022-09-14 19:52:38 destination > 2022-09-14 19:52:38 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):163 - Finalizing tables in destination started for 1 streams

2022-09-14 19:52:38 destination > 2022-09-14 19:52:38 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):168 - Finalizing stream path_assist_addresses. schema public, tmp table _airbyte_tmp_cpx_path_assist_addresses, final table _airbyte_raw_path_assist_addresses

2022-09-14 19:52:39 destination > 2022-09-14 19:52:39 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):181 - Executing finalization of tables.

2022-09-14 19:52:40 destination > 2022-09-14 19:52:40 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):183 - Finalizing tables in destination completed.

2022-09-14 19:52:40 destination > 2022-09-14 19:52:40 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):186 - Cleaning tmp tables in destination started for 1 streams

2022-09-14 19:52:40 destination > 2022-09-14 19:52:40 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):190 - Cleaning tmp table in destination started for stream path_assist_addresses. schema public, tmp table name: _airbyte_tmp_cpx_path_assist_addresses

2022-09-14 19:52:41 destination > 2022-09-14 19:52:41 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):195 - Cleaning tmp tables in destination completed.

2022-09-14 19:52:41 INFO i.a.w.g.DefaultReplicationWorker(lambda$getDestinationOutputRunnable$7):416 - State in DefaultReplicationWorker from destination: io.airbyte.protocol.models.AirbyteMessage@6244a484[type=STATE,log=,spec=,connectionStatus=,catalog=,record=,state=io.airbyte.protocol.models.AirbyteStateMessage@6c8356bc[type=,stream=,global=,data={"path_assist_addresses":{"_ab_source_file_last_modified":"2022-09-14T13:23:13+0000","schema":{"postal_code":"string","_ab_additional_properties":"object","_ab_source_file_last_modified":"string","_ab_source_file_url":"string"},"history":{"2022-09-11":["omaha-path-assist/year=2022/month=09/day=11/addresses.csv"],"2022-09-12":["omaha-path-assist/year=2022/month=09/day=12/addresses.csv"],"2022-09-14":["omaha-path-assist/year=2022/month=09/day=14/addresses.csv"]}}},additionalProperties={}],trace=,additionalProperties={}]

2022-09-14 19:52:41 destination > 2022-09-14 19:52:41 INFO i.a.i.b.IntegrationRunner(runInternal):152 - Completed integration: io.airbyte.integrations.destination.redshift.RedshiftDestination

2022-09-14 19:52:41 destination > 2022-09-14 19:52:41 INFO i.a.i.d.r.RedshiftDestination(main):65 - completed destination: class io.airbyte.integrations.destination.redshift.RedshiftDestination

2022-09-14 19:52:41 INFO i.a.w.g.DefaultReplicationWorker(run):177 - Source and destination threads complete.

2022-09-14 19:52:41 INFO i.a.w.g.DefaultReplicationWorker(run):240 - sync summary: io.airbyte.config.ReplicationAttemptSummary@7e83a04c[status=completed,recordsSynced=6626,bytesSynced=5314961,startTime=1663185143157,endTime=1663185161692,totalStats=io.airbyte.config.SyncStats@6b55a73f[recordsEmitted=6626,bytesEmitted=5314961,stateMessagesEmitted=1,recordsCommitted=6626],streamStats=[io.airbyte.config.StreamSyncStats@520f4042[streamName=path_assist_addresses,stats=io.airbyte.config.SyncStats@a43bb4b[recordsEmitted=6626,bytesEmitted=5314961,stateMessagesEmitted=,recordsCommitted=6626]]]]

2022-09-14 19:52:41 INFO i.a.w.g.DefaultReplicationWorker(run):267 - Source output at least one state message

2022-09-14 19:52:41 INFO i.a.w.g.DefaultReplicationWorker(run):273 - State capture: Updated state to: Optional[io.airbyte.config.State@69a1eb8f[state={"path_assist_addresses":{"_ab_source_file_last_modified":"2022-09-14T13:23:13+0000","schema":{"postal_code":"string","_ab_additional_properties":"object","_ab_source_file_last_modified":"string","_ab_source_file_url":"string"},"history":{"2022-09-11":["omaha-path-assist/year=2022/month=09/day=11/addresses.csv"],"2022-09-12":["omaha-path-assist/year=2022/month=09/day=12/addresses.csv"],"2022-09-14":["omaha-path-assist/year=2022/month=09/day=14/addresses.csv"]}}}]]

2022-09-14 19:52:41 [32mINFO[m i.a.w.t.TemporalAttemptExecution(get):131 - Stopping cancellation check scheduling…

2022-09-14 19:52:41 INFO i.a.w.t.s.ReplicationActivityImpl(lambda$replicate$3):161 - sync summary: io.airbyte.config.StandardSyncOutput@72e44d1[standardSyncSummary=io.airbyte.config.StandardSyncSummary@33476a99[status=completed,recordsSynced=6626,bytesSynced=5314961,startTime=1663185143157,endTime=1663185161692,totalStats=io.airbyte.config.SyncStats@6b55a73f[recordsEmitted=6626,bytesEmitted=5314961,stateMessagesEmitted=1,recordsCommitted=6626],streamStats=[io.airbyte.config.StreamSyncStats@520f4042[streamName=path_assist_addresses,stats=io.airbyte.config.SyncStats@a43bb4b[recordsEmitted=6626,bytesEmitted=5314961,stateMessagesEmitted=,recordsCommitted=6626]]]],normalizationSummary=,state=io.airbyte.config.State@69a1eb8f[state={"path_assist_addresses":{"_ab_source_file_last_modified":"2022-09-14T13:23:13+0000","schema":{"postal_code":"string","_ab_additional_properties":"object","_ab_source_file_last_modified":"string","_ab_source_file_url":"string"},"history":{"2022-09-11":["omaha-path-assist/year=2022/month=09/day=11/addresses.csv"],"2022-09-12":["omaha-path-assist/year=2022/month=09/day=12/addresses.csv"],"2022-09-14":["omaha-path-assist/year=2022/month=09/day=14/addresses.csv"]}}}],outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@481856b1[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@5f7f004b[stream=io.airbyte.protocol.models.AirbyteStream@2a02cfb9[name=path_assist_addresses,jsonSchema={"type":"object","properties":{"id":{"type":["null","string"]},"city":{"type":["null","string"]},"kind":{"type":["null","string"]},"state":{"type":["null","string"]},"region":{"type":["null","string"]},"country":{"type":["null","string"]},"street1":{"type":["null","string"]},"street2":{"type":["null","string"]},"ancestry":{"type":["null","string"]},"owner_id":{"type":["null","string"]},"person_id":{"type":["null","string"]},"created_at":{"type":["null","string"]},"creator_id":{"type":["null","string"]},"restricted":{"type":["null","boolean"]},"updated_at":{"type":["null","string"]},"updater_id":{"type":["null","string"]},"postal_code":{"type":["null","string"]},"addressable_id":{"type":["null","string"]},"addressable_type":{"type":["null","string"]},"_ab_source_file_url":{"type":"string"},"owner_organization_id":{"type":["null","string"]},"_ab_additional_properties":{"type":"object"},"_ab_source_file_last_modified":{"type":"string","format":"date-time"}}},supportedSyncModes=[full_refresh, incremental],sourceDefinedCursor=true,defaultCursorField=[_ab_source_file_last_modified],sourceDefinedPrimaryKey=,namespace=,additionalProperties={}],syncMode=incremental,cursorField=[_ab_source_file_last_modified],destinationSyncMode=append_dedup,primaryKey=[[id]],additionalProperties={}]],additionalProperties={}],failures=]

2022-09-14 19:52:41 INFO i.a.w.t.TemporalUtils(withBackgroundHeartbeat):291 - Stopping temporal heartbeating...

2022-09-14 19:52:41 INFO i.a.w.t.TemporalAttemptExecution(get):105 - Docker volume job log path: /tmp/workspace/204/0/logs.log

2022-09-14 19:52:41 INFO i.a.w.t.TemporalAttemptExecution(get):110 - Executing worker wrapper. Airbyte version: 0.40.6

2022-09-14 19:52:41 INFO i.a.w.g.DefaultNormalizationWorker(run):50 - Running normalization.

2022-09-14 19:52:41 INFO i.a.w.n.DefaultNormalizationRunner(runProcess):122 - Running with normalization version: airbyte/normalization-redshift:0.2.12

2022-09-14 19:52:41 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/normalization-redshift:0.2.12 exists...

2022-09-14 19:52:41 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/normalization-redshift:0.2.12 was found locally.

2022-09-14 19:52:41 INFO i.a.w.p.DockerProcessFactory(create):108 - Creating docker job ID: 204

2022-09-14 19:52:41 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/204/0/normalize --log-driver none --name normalization-redshift-normalize-204-0-weixy --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e AIRBYTE_VERSION=0.40.6 airbyte/normalization-redshift:0.2.12 run --integration-type redshift --config destination_config.json --catalog destination_catalog.json

2022-09-14 19:52:42 normalization > Running: transform-config --config destination_config.json --integration-type redshift --out /data/204/0/normalize

2022-09-14 19:52:42 normalization > Namespace(config='destination_config.json', integration_type=<DestinationType.REDSHIFT: 'redshift'>, out='/data/204/0/normalize')

2022-09-14 19:52:42 normalization > transform_redshift

2022-09-14 19:52:42 normalization > Running: transform-catalog --integration-type redshift --profile-config-dir /data/204/0/normalize --catalog destination_catalog.json --out /data/204/0/normalize/models/generated/ --json-column _airbyte_data

2022-09-14 19:52:43 normalization > Processing destination_catalog.json...

2022-09-14 19:52:43 normalization > Generating airbyte_ctes/public/path_assist_addresses_ab1.sql from path_assist_addresses

2022-09-14 19:52:43 normalization > Generating airbyte_ctes/public/path_assist_addresses_ab2.sql from path_assist_addresses

2022-09-14 19:52:43 normalization > Generating airbyte_views/public/path_assist_addresses_stg.sql from path_assist_addresses

2022-09-14 19:52:43 normalization > Generating airbyte_incremental/scd/public/path_assist_addresses_scd.sql from path_assist_addresses

2022-09-14 19:52:43 normalization > Generating airbyte_incremental/public/path_assist_addresses.sql from path_assist_addresses

2022-09-14 19:52:43 normalization > detected no config file for ssh, assuming ssh is off.

2022-09-14 19:52:45 normalization > [--event-buffer-size EVENT_BUFFER_SIZE]

2022-09-14 19:52:45 normalization > --event-buffer-size EVENT_BUFFER_SIZE

2022-09-14 19:52:45 normalization >

2022-09-14 19:52:45 normalization > DBT >=1.0.0 detected; using 10K event buffer size

2022-09-14 19:52:45 normalization >

2022-09-14 19:52:48 normalization > 19:52:48 Running with dbt=1.0.0

2022-09-14 19:52:48 normalization > 19:52:48 Partial parse save file not found. Starting full parse.

2022-09-14 19:52:50 normalization > 19:52:50 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.

2022-09-14 19:52:50 normalization > There are 1 unused configuration paths:

2022-09-14 19:52:50 normalization > - models.airbyte_utils.generated.airbyte_tables

2022-09-14 19:52:50 normalization >

2022-09-14 19:52:50 normalization > 19:52:50 Found 5 models, 0 tests, 0 snapshots, 0 analyses, 584 macros, 0 operations, 0 seed files, 1 source, 0 exposures, 0 metrics

2022-09-14 19:52:50 normalization > 19:52:50

2022-09-14 19:52:51 normalization > 19:52:51 Concurrency: 4 threads (target='prod')

2022-09-14 19:52:51 normalization > 19:52:51

2022-09-14 19:52:55 normalization > 19:52:55 1 of 3 START view model _airbyte_public.path_assist_addresses_stg... [RUN]

2022-09-14 19:52:56 normalization > 19:52:56 1 of 3 OK created view model _airbyte_public.path_assist_addresses_stg... [CREATE VIEW in 1.19s]

2022-09-14 19:52:56 normalization > 19:52:56 2 of 3 START incremental model public.path_assist_addresses_scd... [RUN]

2022-09-14 19:53:05 normalization > 19:53:05 2 of 3 OK created incremental model public.path_assist_addresses_scd... [INSERT 0 3365 in 8.94s]

2022-09-14 19:53:05 normalization > 19:53:05 3 of 3 START incremental model public.path_assist_addresses... [RUN]

2022-09-14 19:53:09 normalization > 19:53:09 3 of 3 OK created incremental model public.path_assist_addresses... [INSERT 0 1 in 3.20s]

2022-09-14 19:53:09 normalization > 19:53:09

2022-09-14 19:53:09 normalization > 19:53:09 Finished running 1 view model, 2 incremental models in 18.42s.

2022-09-14 19:53:09 normalization > 19:53:09

2022-09-14 19:53:09 normalization > 19:53:09 Completed successfully

2022-09-14 19:53:09 normalization > 19:53:09

2022-09-14 19:53:09 normalization > 19:53:09 Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3

Hey @wattsjon2, thanks for taking the time to post the logs (I wish there were a better way to share logs that exceed the size limit).

This is quite an unusual issue. At first I thought it was something with normalization, but there are no errors in the normalization step, probably because the last record isn’t even being emitted correctly. I’d like to escalate this to GitHub, but before we do, would it be okay to try a few more things?

  1. Have you tried running this sync as full refresh, or as incremental without dedup? I’m wondering whether the dedup step has anything to do with the last row being nulled.
  2. Have you tried using a user-defined schema? More info: https://docs.airbyte.com/integrations/sources/s3/#user-schema
  3. Is there anything configuration-wise you could look into that might solve this? More info: https://docs.airbyte.com/integrations/sources/s3/#user-schema (specifically additional_reader_options)
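For reference, the S3 source’s user-defined schema is a JSON string mapping column names to types. The columns below are a subset taken from the master schema in your logs, so treat this as an illustrative sketch rather than your exact config — a real schema would list every column in the file:

```json
{
  "id": "string",
  "postal_code": "string",
  "restricted": "boolean",
  "street1": "string",
  "city": "string"
}
```

Columns not covered by a user-defined schema may not be parsed into their own fields and can end up in _ab_additional_properties instead.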

Sorry I don’t have any immediate answers here. Let me know if you decide to try any of the above; if nothing else, I’ll create an issue on GitHub.
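One more thing that might help narrow it down: compare the raw record Airbyte landed with the normalized one. A hedged sketch using the table names from your logs (replace the id literal with the affected record’s id):

```sql
-- Inspect the raw JSON for the affected record
SELECT _airbyte_data
FROM public._airbyte_raw_path_assist_addresses
WHERE json_extract_path_text(_airbyte_data, 'id') = '<affected-id>';

-- Compare with the normalized row
SELECT *
FROM public.path_assist_addresses
WHERE id = '<affected-id>';
```

If the raw JSON is already missing the values, the problem is on the source side; if the raw JSON is complete but the normalized row is null, it’s the normalization step.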

I believe I found the issue. Whoever set up this connection had defined a user schema for only one column. After defining all the columns, the issue appears to be fixed.
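For anyone hitting the same thing: a user-defined schema needs to list every column you want parsed. A quick way to generate a complete schema string from a CSV header — the helper name here is hypothetical, and every column defaults to "string" unless you override it:

```python
import csv
import io
import json

def build_user_schema(csv_text, overrides=None):
    """Build an S3-source-style user schema (a JSON string of column -> type)
    covering every column in the CSV header, defaulting each to "string"."""
    header = next(csv.reader(io.StringIO(csv_text)))
    schema = {col: "string" for col in header}
    schema.update(overrides or {})
    return json.dumps(schema)

sample = "id,postal_code,restricted,city\n1,68102,false,Omaha\n"
print(build_user_schema(sample, overrides={"restricted": "boolean"}))
```

Paste the resulting string into the connector’s schema field so no column falls through to _ab_additional_properties.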

Thanks @wattsjon2 for following up here. Happy to hear you were able to figure out the problem and that the sync is now working! Feel free to open additional topics on our forum if you run into any other issues.