I have a connection set up and working using deduped + history; however, the final item in our CSV file has all its data nulled out except for a single column, which is correct. Interestingly, all the information for this record is correct in _ab_additional_properties. The record exists in all the CSV files we have loaded, and the issue persists after resetting data and syncing. Hopefully you have some thoughts on this issue. Thanks!
Hey @wattsjon2, thanks for the post and welcome to the community.
Could you share some more info about what destination you’re using? What version of Airbyte are you using? What version of the source connector are you using?
What does this last row look like? Sometimes a simple upgrade of the connectors or Airbyte resolves these issues. Have you tried that? Do you have a log file you can share?
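If it helps, here’s a quick way to eyeball that last record locally before we dig further. This is just a minimal stdlib sketch; "addresses.csv" is a placeholder for your actual file:

```python
# Minimal check of the last CSV record (stdlib only).
# "addresses.csv" is a placeholder for the real file path.
import csv

with open("addresses.csv", newline="", encoding="utf-8") as f:
    last_row = list(csv.DictReader(f))[-1]

print(last_row)  # are all the columns populated, or only one?
```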
We are connecting a CSV in an S3 bucket to Redshift. The Airbyte version we are using is 0.39.41-alpha. The S3 source version we are using is 0.1.18 and the Redshift destination version we are using is 0.3.49, both of which are the latest versions.
I checked and all the versions are up to date. I didn’t see anything out of the ordinary in the log. I can’t post it here because it’s too long.
Hey @wattsjon2, would it be too much trouble to upgrade Airbyte to the latest version (0.40.6), rerun the sync, and then post an abridged log if you’re still experiencing the same problem?
Thank you for your quick replies. I’ll reply with the result and the log as soon as our Airbyte version gets updated.
Unfortunately, updating the version did not fix the problem. Here are the logs (some of the start and finish have been removed so they fit):
2022-09-14 19:52:16 INFO i.a.w.i.DefaultAirbyteStreamFactory(internalLog):99 - initialised stream with format: {'encoding': 'utf8', 'filetype': 'csv', 'delimiter': ',', 'block_size': 10000, 'quote_char': '"', 'double_quote': True, 'infer_datatypes': True, 'advanced_options': '{}', 'newlines_in_values': False, 'additional_reader_options': '{}'}
2022-09-14 19:52:16 INFO i.a.w.i.DefaultAirbyteStreamFactory(internalLog):99 - Iterating S3 bucket 'act-data-inbox-phi' with prefix: 'omaha-path-assist'
2022-09-14 19:52:17 INFO i.a.w.i.DefaultAirbyteStreamFactory(internalLog):99 - Check succeeded
2022-09-14 19:52:17 INFO i.a.w.t.TemporalAttemptExecution(get):131 - Stopping cancellation check scheduling...
2022-09-14 19:52:17 INFO i.a.w.t.TemporalAttemptExecution(get):105 - Docker volume job log path: /tmp/workspace/204/0/logs.log
2022-09-14 19:52:17 INFO i.a.w.t.TemporalAttemptExecution(get):110 - Executing worker wrapper. Airbyte version: 0.40.6
2022-09-14 19:52:17 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/destination-redshift:0.3.49 exists...
2022-09-14 19:52:17 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/destination-redshift:0.3.49 was found locally.
2022-09-14 19:52:17 INFO i.a.w.p.DockerProcessFactory(create):108 - Creating docker job ID: 204
2022-09-14 19:52:17 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/204/0 --log-driver none --name destination-redshift-check-204-0-epnkb --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/destination-redshift:0.3.49 -e AIRBYTE_VERSION=0.40.6 -e WORKER_JOB_ID=204 airbyte/destination-redshift:0.3.49 check --config source_config.json
2022-09-14 19:52:18 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:18 INFO i.a.i.d.r.RedshiftDestination(main):63 - starting destination: class io.airbyte.integrations.destination.redshift.RedshiftDestination
2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO i.a.i.b.IntegrationCliParser(parseOptions):118 - integration args: {check=null, config=source_config.json}
2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO i.a.i.b.IntegrationRunner(runInternal):104 - Running integration: io.airbyte.integrations.destination.redshift.RedshiftDestination
2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO i.a.i.b.IntegrationRunner(runInternal):105 - Command: CHECK
2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO i.a.i.b.IntegrationRunner(runInternal):106 - Integration config:
IntegrationConfig{command=CHECK, configPath='source_config.json', catalogPath='null', statePath='null'}
2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword examples - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 WARN i.a.i.d.r.RedshiftDestination(determineUploadMode):54 - The "standard" upload mode is not performant, and is not recommended for production. Please use the Amazon S3 upload mode if you are syncing a large amount of data.
2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO i.a.i.d.j.c.SwitchingDestination(check):55 - Using destination type: STANDARD
2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO c.z.h.HikariDataSource():80 - HikariPool-1 - Starting...
2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO c.z.h.HikariDataSource():82 - HikariPool-1 - Start completed.
2022-09-14 19:52:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:19 INFO c.z.h.p.PoolBase(getAndSetNetworkTimeout):536 - HikariPool-1 - Driver does not support get/set network timeout for connections. ([Amazon]JDBC Driver does not support this optional feature.)
2022-09-14 19:52:22 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:22 INFO c.z.h.HikariDataSource(close):350 - HikariPool-1 - Shutdown initiated...
2022-09-14 19:52:22 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:22 INFO c.z.h.HikariDataSource(close):352 - HikariPool-1 - Shutdown completed.
2022-09-14 19:52:22 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:22 INFO i.a.i.b.IntegrationRunner(runInternal):152 - Completed integration: io.airbyte.integrations.destination.redshift.RedshiftDestination
2022-09-14 19:52:22 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-09-14 19:52:22 INFO i.a.i.d.r.RedshiftDestination(main):65 - completed destination: class io.airbyte.integrations.destination.redshift.RedshiftDestination
2022-09-14 19:52:23 INFO i.a.w.t.TemporalAttemptExecution(get):131 - Stopping cancellation check scheduling...
2022-09-14 19:52:23 INFO i.a.w.t.TemporalAttemptExecution(get):105 - Docker volume job log path: /tmp/workspace/204/0/logs.log
2022-09-14 19:52:23 INFO i.a.w.t.TemporalAttemptExecution(get):110 - Executing worker wrapper. Airbyte version: 0.40.6
2022-09-14 19:52:23 INFO i.a.w.g.DefaultReplicationWorker(run):116 - start sync worker. job id: 204 attempt id: 0
2022-09-14 19:52:23 INFO i.a.w.g.DefaultReplicationWorker(run):128 - configured sync modes: {null.path_assist_addresses=incremental - append_dedup}
2022-09-14 19:52:23 INFO i.a.w.i.DefaultAirbyteDestination(start):69 - Running destination...
2022-09-14 19:52:23 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/destination-redshift:0.3.49 exists...
2022-09-14 19:52:23 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/destination-redshift:0.3.49 was found locally.
2022-09-14 19:52:23 INFO i.a.w.p.DockerProcessFactory(create):108 - Creating docker job ID: 204
2022-09-14 19:52:23 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/204/0 --log-driver none --name destination-redshift-write-204-0-agncm --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/destination-redshift:0.3.49 -e AIRBYTE_VERSION=0.40.6 -e WORKER_JOB_ID=204 airbyte/destination-redshift:0.3.49 write --config destination_config.json --catalog destination_catalog.json
2022-09-14 19:52:23 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/source-s3:0.1.18 exists...
2022-09-14 19:52:23 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/source-s3:0.1.18 was found locally.
2022-09-14 19:52:23 INFO i.a.w.p.DockerProcessFactory(create):108 - Creating docker job ID: 204
2022-09-14 19:52:23 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/204/0 --log-driver none --name source-s3-read-204-0-nnnwj --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/source-s3:0.1.18 -e AIRBYTE_VERSION=0.40.6 -e WORKER_JOB_ID=204 airbyte/source-s3:0.1.18 read --config source_config.json --catalog source_catalog.json --state input_state.json
2022-09-14 19:52:23 INFO i.a.w.g.DefaultReplicationWorker(run):170 - Waiting for source and destination threads to complete.
2022-09-14 19:52:23 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):299 - Replication thread started.
2022-09-14 19:52:23 INFO i.a.w.g.DefaultReplicationWorker(lambda$getDestinationOutputRunnable$7):406 - Destination output thread started.
2022-09-14 19:52:25 source > initialised stream with format:
2022-09-14 19:52:25 source > Starting syncing SourceS3
2022-09-14 19:52:25 source > initialised stream with format: {'encoding': 'utf8', 'filetype': 'csv', 'delimiter': ',', 'block_size': 10000, 'quote_char': '"', 'double_quote': True, 'infer_datatypes': True, 'advanced_options': '{}', 'newlines_in_values': False, 'additional_reader_options': '{}'}
2022-09-14 19:52:25 source > Syncing stream: path_assist_addresses
2022-09-14 19:52:25 source > Iterating S3 bucket 'act-data-inbox-phi' with prefix: 'omaha-path-assist'
2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 INFO i.a.i.d.r.RedshiftDestination(main):63 - starting destination: class io.airbyte.integrations.destination.redshift.RedshiftDestination
2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 INFO i.a.i.b.IntegrationCliParser(parseOptions):118 - integration args: {catalog=destination_catalog.json, write=null, config=destination_config.json}
2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 INFO i.a.i.b.IntegrationRunner(runInternal):104 - Running integration: io.airbyte.integrations.destination.redshift.RedshiftDestination
2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 INFO i.a.i.b.IntegrationRunner(runInternal):105 - Command: WRITE
2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 INFO i.a.i.b.IntegrationRunner(runInternal):106 - Integration config: IntegrationConfig{command=WRITE, configPath='destination_config.json', catalogPath='destination_catalog.json', statePath='null'}
2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword examples - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 WARN i.a.i.d.r.RedshiftDestination(determineUploadMode):54 - The "standard" upload mode is not performant, and is not recommended for production. Please use the Amazon S3 upload mode if you are syncing a large amount of data.
2022-09-14 19:52:25 destination > 2022-09-14 19:52:25 INFO i.a.i.d.j.c.SwitchingDestination(getConsumer):65 - Using destination type: STANDARD
2022-09-14 19:52:26 source > Detected mismatched datatype on column 'postal_code', in file 'omaha-path-assist/year=2022/month=09/day=12/addresses.csv'. Should be 'string', but found 'integer'. Airbyte will attempt to coerce this to string on read.
2022-09-14 19:52:26 destination > 2022-09-14 19:52:26 INFO c.z.h.HikariDataSource():80 - HikariPool-1 - Starting...
2022-09-14 19:52:26 destination > 2022-09-14 19:52:26 INFO c.z.h.HikariDataSource():82 - HikariPool-1 - Start completed.
2022-09-14 19:52:26 source > Detected mismatched datatype on column 'postal_code', in file 'omaha-path-assist/year=2022/month=09/day=14/addresses.csv'. Should be 'string', but found 'integer'. Airbyte will attempt to coerce this to string on read.
2022-09-14 19:52:26 source > determined master schema: {'postal_code': 'string', 'id': 'string', 'restricted': 'boolean', 'state': 'string', 'ancestry': 'string', 'owner_id': 'string', 'creator_id': 'string', 'updater_id': 'string', 'owner_organization_id': 'string', 'created_at': 'string', 'updated_at': 'string', 'addressable_type': 'string', 'addressable_id': 'string', 'person_id': 'string', 'kind': 'string', 'street1': 'string', 'street2': 'string', 'city': 'string', 'region': 'string', 'country': 'string'}
2022-09-14 19:52:26 destination > 2022-09-14 19:52:26 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$toWriteConfig$0):98 - Write config: WriteConfig{streamName=path_assist_addresses, namespace=null, outputSchemaName=public, tmpTableName=_airbyte_tmp_cpx_path_assist_addresses, outputTableName=_airbyte_raw_path_assist_addresses, syncMode=append_dedup}
2022-09-14 19:52:26 destination > 2022-09-14 19:52:26 INFO i.a.i.d.b.BufferedStreamConsumer(startTracked):116 - class io.airbyte.integrations.destination.buffered_stream_consumer.BufferedStreamConsumer started.
2022-09-14 19:52:26 destination > 2022-09-14 19:52:26 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):123 - Preparing tmp tables in destination started for 1 streams
2022-09-14 19:52:26 destination > 2022-09-14 19:52:26 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):127 - Preparing tmp table in destination started for stream path_assist_addresses. schema: public, tmp table name: _airbyte_tmp_cpx_path_assist_addresses
2022-09-14 19:52:27 destination > 2022-09-14 19:52:27 INFO c.z.h.p.PoolBase(getAndSetNetworkTimeout):536 - HikariPool-1 - Driver does not support get/set network timeout for connections. ([Amazon]JDBC Driver does not support this optional feature.)
2022-09-14 19:52:29 destination > 2022-09-14 19:52:29 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onStartFunction$1):133 - Preparing tables in destination completed.
2022-09-14 19:52:29 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 1000 (783 KB)
2022-09-14 19:52:29 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 2000 (1 MB)
2022-09-14 19:52:30 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 3000 (2 MB)
2022-09-14 19:52:30 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 4000 (3 MB)
2022-09-14 19:52:31 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 5000 (3 MB)
2022-09-14 19:52:31 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):325 - Records read: 6000 (4 MB)
2022-09-14 19:52:32 source > finished reading a stream slice
2022-09-14 19:52:32 source > Read 6626 records from path_assist_addresses stream
2022-09-14 19:52:32 source > Finished syncing path_assist_addresses
2022-09-14 19:52:32 source > SourceS3 runtimes:
Syncing stream path_assist_addresses 0:00:07.005665
2022-09-14 19:52:32 source > Finished syncing SourceS3
2022-09-14 19:52:32 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):336 - Total records read: 6627 (5 MB)
2022-09-14 19:52:32 INFO i.a.w.g.DefaultReplicationWorker(run):175 - One of source or destination thread complete. Waiting on the other.
2022-09-14 19:52:32 destination > 2022-09-14 19:52:32 INFO i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):62 - Airbyte message consumer: succeeded.
2022-09-14 19:52:32 destination > 2022-09-14 19:52:32 INFO i.a.i.d.b.BufferedStreamConsumer(close):171 - executing on success close procedure.
2022-09-14 19:52:32 destination > 2022-09-14 19:52:32 INFO i.a.i.d.r.InMemoryRecordBufferingStrategy(flushAll):84 - Flushing path_assist_addresses: 6626 records (20 MB)
2022-09-14 19:52:32 destination > 2022-09-14 19:52:32 INFO i.a.i.d.r.o.RedshiftSqlOperations(insertRecordsInternal):89 - actual size of batch: 6626
2022-09-14 19:52:38 destination > 2022-09-14 19:52:38 INFO i.a.i.d.r.o.RedshiftSqlOperations(onDestinationCloseOperations):139 - Executing operations for Redshift Destination DB engine...
2022-09-14 19:52:38 destination > 2022-09-14 19:52:38 INFO i.a.i.d.r.o.RedshiftSqlOperations(discoverNotSuperTables):189 - Discovering NOT SUPER table types...
2022-09-14 19:52:38 destination > 2022-09-14 19:52:38 INFO i.a.i.d.r.o.RedshiftSqlOperations(onDestinationCloseOperations):157 - Executing operations for Redshift Destination DB engine completed.
2022-09-14 19:52:38 destination > 2022-09-14 19:52:38 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):163 - Finalizing tables in destination started for 1 streams
2022-09-14 19:52:38 destination > 2022-09-14 19:52:38 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):168 - Finalizing stream path_assist_addresses. schema public, tmp table _airbyte_tmp_cpx_path_assist_addresses, final table _airbyte_raw_path_assist_addresses
2022-09-14 19:52:39 destination > 2022-09-14 19:52:39 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):181 - Executing finalization of tables.
2022-09-14 19:52:40 destination > 2022-09-14 19:52:40 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):183 - Finalizing tables in destination completed.
2022-09-14 19:52:40 destination > 2022-09-14 19:52:40 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):186 - Cleaning tmp tables in destination started for 1 streams
2022-09-14 19:52:40 destination > 2022-09-14 19:52:40 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):190 - Cleaning tmp table in destination started for stream path_assist_addresses. schema public, tmp table name: _airbyte_tmp_cpx_path_assist_addresses
2022-09-14 19:52:41 destination > 2022-09-14 19:52:41 INFO i.a.i.d.j.JdbcBufferedConsumerFactory(lambda$onCloseFunction$3):195 - Cleaning tmp tables in destination completed.
2022-09-14 19:52:41 INFO i.a.w.g.DefaultReplicationWorker(lambda$getDestinationOutputRunnable$7):416 - State in DefaultReplicationWorker from destination: io.airbyte.protocol.models.AirbyteMessage@6244a484[type=STATE,log=,spec=,connectionStatus=,catalog=,record=,state=io.airbyte.protocol.models.AirbyteStateMessage@6c8356bc[type=,stream=,global=,data={"path_assist_addresses":{"_ab_source_file_last_modified":"2022-09-14T13:23:13+0000","schema":{"postal_code":"string","_ab_additional_properties":"object","_ab_source_file_last_modified":"string","_ab_source_file_url":"string"},"history":{"2022-09-11":["omaha-path-assist/year=2022/month=09/day=11/addresses.csv"],"2022-09-12":["omaha-path-assist/year=2022/month=09/day=12/addresses.csv"],"2022-09-14":["omaha-path-assist/year=2022/month=09/day=14/addresses.csv"]}}},additionalProperties={}],trace=,additionalProperties={}]
2022-09-14 19:52:41 destination > 2022-09-14 19:52:41 INFO i.a.i.b.IntegrationRunner(runInternal):152 - Completed integration: io.airbyte.integrations.destination.redshift.RedshiftDestination
2022-09-14 19:52:41 destination > 2022-09-14 19:52:41 INFO i.a.i.d.r.RedshiftDestination(main):65 - completed destination: class io.airbyte.integrations.destination.redshift.RedshiftDestination
2022-09-14 19:52:41 INFO i.a.w.g.DefaultReplicationWorker(run):177 - Source and destination threads complete.
2022-09-14 19:52:41 INFO i.a.w.g.DefaultReplicationWorker(run):240 - sync summary: io.airbyte.config.ReplicationAttemptSummary@7e83a04c[status=completed,recordsSynced=6626,bytesSynced=5314961,startTime=1663185143157,endTime=1663185161692,totalStats=io.airbyte.config.SyncStats@6b55a73f[recordsEmitted=6626,bytesEmitted=5314961,stateMessagesEmitted=1,recordsCommitted=6626],streamStats=[io.airbyte.config.StreamSyncStats@520f4042[streamName=path_assist_addresses,stats=io.airbyte.config.SyncStats@a43bb4b[recordsEmitted=6626,bytesEmitted=5314961,stateMessagesEmitted=,recordsCommitted=6626]]]]
2022-09-14 19:52:41 INFO i.a.w.g.DefaultReplicationWorker(run):267 - Source output at least one state message
2022-09-14 19:52:41 INFO i.a.w.g.DefaultReplicationWorker(run):273 - State capture: Updated state to: Optional[io.airbyte.config.State@69a1eb8f[state={"path_assist_addresses":{"_ab_source_file_last_modified":"2022-09-14T13:23:13+0000","schema":{"postal_code":"string","_ab_additional_properties":"object","_ab_source_file_last_modified":"string","_ab_source_file_url":"string"},"history":{"2022-09-11":["omaha-path-assist/year=2022/month=09/day=11/addresses.csv"],"2022-09-12":["omaha-path-assist/year=2022/month=09/day=12/addresses.csv"],"2022-09-14":["omaha-path-assist/year=2022/month=09/day=14/addresses.csv"]}}}]]
2022-09-14 19:52:41 INFO i.a.w.t.TemporalAttemptExecution(get):131 - Stopping cancellation check scheduling...
2022-09-14 19:52:41 INFO i.a.w.t.s.ReplicationActivityImpl(lambda$replicate$3):161 - sync summary: io.airbyte.config.StandardSyncOutput@72e44d1[standardSyncSummary=io.airbyte.config.StandardSyncSummary@33476a99[status=completed,recordsSynced=6626,bytesSynced=5314961,startTime=1663185143157,endTime=1663185161692,totalStats=io.airbyte.config.SyncStats@6b55a73f[recordsEmitted=6626,bytesEmitted=5314961,stateMessagesEmitted=1,recordsCommitted=6626],streamStats=[io.airbyte.config.StreamSyncStats@520f4042[streamName=path_assist_addresses,stats=io.airbyte.config.SyncStats@a43bb4b[recordsEmitted=6626,bytesEmitted=5314961,stateMessagesEmitted=,recordsCommitted=6626]]]],normalizationSummary=,state=io.airbyte.config.State@69a1eb8f[state={"path_assist_addresses":{"_ab_source_file_last_modified":"2022-09-14T13:23:13+0000","schema":{"postal_code":"string","_ab_additional_properties":"object","_ab_source_file_last_modified":"string","_ab_source_file_url":"string"},"history":{"2022-09-11":["omaha-path-assist/year=2022/month=09/day=11/addresses.csv"],"2022-09-12":["omaha-path-assist/year=2022/month=09/day=12/addresses.csv"],"2022-09-14":["omaha-path-assist/year=2022/month=09/day=14/addresses.csv"]}}}],outputCatalog=io.airbyte.protocol.models.ConfiguredAirbyteCatalog@481856b1[streams=[io.airbyte.protocol.models.ConfiguredAirbyteStream@5f7f004b[stream=io.airbyte.protocol.models.AirbyteStream@2a02cfb9[name=path_assist_addresses,jsonSchema={"type":"object","properties":{"id":{"type":["null","string"]},"city":{"type":["null","string"]},"kind":{"type":["null","string"]},"state":{"type":["null","string"]},"region":{"type":["null","string"]},"country":{"type":["null","string"]},"street1":{"type":["null","string"]},"street2":{"type":["null","string"]},"ancestry":{"type":["null","string"]},"owner_id":{"type":["null","string"]},"person_id":{"type":["null","string"]},"created_at":{"type":["null","string"]},"creator_id":{"type":["null","string"]},"restricted":{"type":["null","boolean"]},"updated_at":{"type":["null","string"]},"updater_id":{"type":["null","string"]},"postal_code":{"type":["null","string"]},"addressable_id":{"type":["null","string"]},"addressable_type":{"type":["null","string"]},"_ab_source_file_url":{"type":"string"},"owner_organization_id":{"type":["null","string"]},"_ab_additional_properties":{"type":"object"},"_ab_source_file_last_modified":{"type":"string","format":"date-time"}}},supportedSyncModes=[full_refresh, incremental],sourceDefinedCursor=true,defaultCursorField=[_ab_source_file_last_modified],sourceDefinedPrimaryKey=,namespace=,additionalProperties={}],syncMode=incremental,cursorField=[_ab_source_file_last_modified],destinationSyncMode=append_dedup,primaryKey=[[id]],additionalProperties={}]],additionalProperties={}],failures=]
2022-09-14 19:52:41 INFO i.a.w.t.TemporalUtils(withBackgroundHeartbeat):291 - Stopping temporal heartbeating...
2022-09-14 19:52:41 INFO i.a.w.t.TemporalAttemptExecution(get):105 - Docker volume job log path: /tmp/workspace/204/0/logs.log
2022-09-14 19:52:41 INFO i.a.w.t.TemporalAttemptExecution(get):110 - Executing worker wrapper. Airbyte version: 0.40.6
2022-09-14 19:52:41 INFO i.a.w.g.DefaultNormalizationWorker(run):50 - Running normalization.
2022-09-14 19:52:41 INFO i.a.w.n.DefaultNormalizationRunner(runProcess):122 - Running with normalization version: airbyte/normalization-redshift:0.2.12
2022-09-14 19:52:41 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/normalization-redshift:0.2.12 exists...
2022-09-14 19:52:41 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/normalization-redshift:0.2.12 was found locally.
2022-09-14 19:52:41 INFO i.a.w.p.DockerProcessFactory(create):108 - Creating docker job ID: 204
2022-09-14 19:52:41 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/204/0/normalize --log-driver none --name normalization-redshift-normalize-204-0-weixy --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE= -e AIRBYTE_VERSION=0.40.6 airbyte/normalization-redshift:0.2.12 run --integration-type redshift --config destination_config.json --catalog destination_catalog.json
2022-09-14 19:52:42 normalization > Running: transform-config --config destination_config.json --integration-type redshift --out /data/204/0/normalize
2022-09-14 19:52:42 normalization > Namespace(config='destination_config.json', integration_type=<DestinationType.REDSHIFT: 'redshift'>, out='/data/204/0/normalize')
2022-09-14 19:52:42 normalization > transform_redshift
2022-09-14 19:52:42 normalization > Running: transform-catalog --integration-type redshift --profile-config-dir /data/204/0/normalize --catalog destination_catalog.json --out /data/204/0/normalize/models/generated/ --json-column _airbyte_data
2022-09-14 19:52:43 normalization > Processing destination_catalog.json...
2022-09-14 19:52:43 normalization > Generating airbyte_ctes/public/path_assist_addresses_ab1.sql from path_assist_addresses
2022-09-14 19:52:43 normalization > Generating airbyte_ctes/public/path_assist_addresses_ab2.sql from path_assist_addresses
2022-09-14 19:52:43 normalization > Generating airbyte_views/public/path_assist_addresses_stg.sql from path_assist_addresses
2022-09-14 19:52:43 normalization > Generating airbyte_incremental/scd/public/path_assist_addresses_scd.sql from path_assist_addresses
2022-09-14 19:52:43 normalization > Generating airbyte_incremental/public/path_assist_addresses.sql from path_assist_addresses
2022-09-14 19:52:43 normalization > detected no config file for ssh, assuming ssh is off.
2022-09-14 19:52:45 normalization > [--event-buffer-size EVENT_BUFFER_SIZE]
2022-09-14 19:52:45 normalization > --event-buffer-size EVENT_BUFFER_SIZE
2022-09-14 19:52:45 normalization >
2022-09-14 19:52:45 normalization > DBT >=1.0.0 detected; using 10K event buffer size
2022-09-14 19:52:45 normalization >
2022-09-14 19:52:48 normalization > 19:52:48 Running with dbt=1.0.0
2022-09-14 19:52:48 normalization > 19:52:48 Partial parse save file not found. Starting full parse.
2022-09-14 19:52:50 normalization > 19:52:50 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
2022-09-14 19:52:50 normalization > There are 1 unused configuration paths:
2022-09-14 19:52:50 normalization > - models.airbyte_utils.generated.airbyte_tables
2022-09-14 19:52:50 normalization >
2022-09-14 19:52:50 normalization > 19:52:50 Found 5 models, 0 tests, 0 snapshots, 0 analyses, 584 macros, 0 operations, 0 seed files, 1 source, 0 exposures, 0 metrics
2022-09-14 19:52:50 normalization > 19:52:50
2022-09-14 19:52:51 normalization > 19:52:51 Concurrency: 4 threads (target='prod')
2022-09-14 19:52:51 normalization > 19:52:51
2022-09-14 19:52:55 normalization > 19:52:55 1 of 3 START view model _airbyte_public.path_assist_addresses_stg... [RUN]
2022-09-14 19:52:56 normalization > 19:52:56 1 of 3 OK created view model _airbyte_public.path_assist_addresses_stg... [CREATE VIEW in 1.19s]
2022-09-14 19:52:56 normalization > 19:52:56 2 of 3 START incremental model public.path_assist_addresses_scd... [RUN]
2022-09-14 19:53:05 normalization > 19:53:05 2 of 3 OK created incremental model public.path_assist_addresses_scd... [INSERT 0 3365 in 8.94s]
2022-09-14 19:53:05 normalization > 19:53:05 3 of 3 START incremental model public.path_assist_addresses... [RUN]
2022-09-14 19:53:09 normalization > 19:53:09 3 of 3 OK created incremental model public.path_assist_addresses... [INSERT 0 1 in 3.20s]
2022-09-14 19:53:09 normalization > 19:53:09
2022-09-14 19:53:09 normalization > 19:53:09 Finished running 1 view model, 2 incremental models in 18.42s.
2022-09-14 19:53:09 normalization > 19:53:09
2022-09-14 19:53:09 normalization > 19:53:09 Completed successfully
2022-09-14 19:53:09 normalization > 19:53:09
2022-09-14 19:53:09 normalization > 19:53:09 Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
Hey @wattsjon2, thanks for taking the time to post the logs (I wish there were a better way to share logs that go over the size limit).
This is quite an unusual issue. At first I thought it was something with normalization, but it looks like there are no errors there, probably because the last record isn’t even being emitted. I want to escalate this to GitHub, but before we do, would it be okay if we tried a few more things?
- Have you tried running this sync as full refresh, or as incremental without dedup? I’m wondering whether the dedup step has anything to do with the last row being nulled.
- Have you tried using a user-defined schema (see the sketch at the end of this post)? More info: https://docs.airbyte.com/integrations/sources/s3/#user-schema
- Is there anything configuration-wise you’ve looked into that might solve this, specifically additional_reader_options? More info: https://docs.airbyte.com/integrations/sources/s3/#user-schema
Sorry I don’t have any immediate answers here. Let me know if you decide to try any of the above; if nothing else works, I’ll create an issue on GitHub.
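For reference on the second point: the user-provided schema for the S3 source is a JSON object mapping column names to types. A minimal sketch, assuming a few of the column names from the master schema in your log (you’d extend it to cover every column):

```python
# Hypothetical user-provided schema for the S3 source: a JSON map of
# column name -> type. Column names below are taken from the "determined
# master schema" line in the log; extend this to cover every column.
import json

user_schema = {
    "id": "string",
    "postal_code": "string",
    "restricted": "boolean",
    "created_at": "string",
    "updated_at": "string",
}

# The connector's schema field expects the JSON string form.
print(json.dumps(user_schema))
```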
I believe I found the issue. Whoever set up this connection had defined a user schema for only one column. After assigning all the columns, the issue appears to be fixed.
Thanks @wattsjon2 for following up here. Happy to hear you were able to figure out the problem and that the sync is now working! Feel free to open additional topics on our forums if you have any other issues.