BigQuery with GCS staging seems to fail when syncing Typeform

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Debian
  • Memory / Disk: 4 GB / 200 GB
  • Deployment: Docker
  • Airbyte Version: 0.39.17-alpha
  • Source name/version: Typeform (0.1.6)
  • Destination name/version: BigQuery (1.1.8)
  • Step: Add a BigQuery destination using GCS staging, add a Typeform source, create the connection
  • Description: The initial sync from Typeform to BigQuery with GCS staging fails. Note the errors regarding GCS hmac_keys etc. in the log.

Full log:

2022-06-15 07:37:13 INFO i.a.v.j.JsonSchemaValidator(test):71 - JSON schema validation failed.
errors: $.credential: is not defined in the schema and the schema does not allow additional properties, $.part_size_mb: is not defined in the schema and the schema does not allow additional properties, $.gcs_bucket_name: is not defined in the schema and the schema does not allow additional properties, $.gcs_bucket_path: is not defined in the schema and the schema does not allow additional properties, $.keep_files_in_gcs-bucket: is not defined in the schema and the schema does not allow additional properties, $.method: must be a constant value Standard
2022-06-15 07:37:13 INFO i.a.v.j.JsonSchemaValidator(test):71 - JSON schema validation failed.
errors: $.credential.hmac_key_access_id: object found, string expected, $.credential.hmac_key_secret: object found, string expected
2022-06-15 07:37:13 INFO i.a.w.t.TemporalAttemptExecution(get):108 - Docker volume job log path: /tmp/workspace/2964/0/logs.log
2022-06-15 07:37:13 INFO i.a.w.t.TemporalAttemptExecution(get):113 - Executing worker wrapper. Airbyte version: 0.39.17-alpha
2022-06-15 07:37:13 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/source-typeform:0.1.6 exists...
2022-06-15 07:37:13 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/source-typeform:0.1.6 was found locally.
2022-06-15 07:37:13 INFO i.a.w.p.DockerProcessFactory(create):108 - Creating docker job ID: 2964
2022-06-15 07:37:13 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/2964/0 --log-driver none --name source-typeform-check-2964-0-xqizm --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/source-typeform:0.1.6 -e AIRBYTE_ROLE= -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_VERSION=0.39.17-alpha -e WORKER_JOB_ID=2964 airbyte/source-typeform:0.1.6 check --config source_config.json
2022-06-15 07:37:15 INFO i.a.w.i.DefaultAirbyteStreamFactory(internalLog):97 - Check succeeded
2022-06-15 07:37:15 INFO i.a.w.t.TemporalAttemptExecution(get):134 - Stopping cancellation check scheduling...
2022-06-15 07:37:15 INFO i.a.w.t.TemporalAttemptExecution(get):108 - Docker volume job log path: /tmp/workspace/2964/0/logs.log
2022-06-15 07:37:15 INFO i.a.w.t.TemporalAttemptExecution(get):113 - Executing worker wrapper. Airbyte version: 0.39.17-alpha
2022-06-15 07:37:15 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/destination-bigquery:1.1.8 exists...
2022-06-15 07:37:16 INFO i.a.w.p.DockerProcessFactory(create):108 - Creating docker job ID: 2964
2022-06-15 07:37:16 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/destination-bigquery:1.1.8 was found locally.
2022-06-15 07:37:16 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/2964/0 --log-driver none --name destination-bigquery-check-2964-0-jlreq --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/destination-bigquery:1.1.8 -e AIRBYTE_ROLE= -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_VERSION=0.39.17-alpha -e WORKER_JOB_ID=2964 airbyte/destination-bigquery:1.1.8 check --config source_config.json
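
For reference, a sketch of the GCS staging block the destination spec appears to expect, with field names taken from the validation errors above. The bucket names and keys are placeholders, and the exact `method`/`credential_type` constants may differ by connector version; the point is that the validator wants `hmac_key_access_id` and `hmac_key_secret` as plain strings, while the failing config supplied objects:

```json
{
  "method": "GCS Staging",
  "gcs_bucket_name": "my-staging-bucket",
  "gcs_bucket_path": "airbyte/staging",
  "credential": {
    "credential_type": "HMAC_KEY",
    "hmac_key_access_id": "GOOG1EXAMPLEACCESSID",
    "hmac_key_secret": "example-secret-value"
  }
}
```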
2022-06-15 07:37:17 ERROR i.a.c.i.LineGobbler(voidCall):82 - SLF4J: Class path contains multiple SLF4J bindings.
2022-06-15 07:37:17 ERROR i.a.c.i.LineGobbler(voidCall):82 - SLF4J: Found binding in [jar:file:/airbyte/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2022-06-15 07:37:17 ERROR i.a.c.i.LineGobbler(voidCall):82 - SLF4J: Found binding in [jar:file:/airbyte/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2022-06-15 07:37:17 ERROR i.a.c.i.LineGobbler(voidCall):82 - SLF4J: Found binding in [jar:file:/airbyte/lib/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2022-06-15 07:37:17 ERROR i.a.c.i.LineGobbler(voidCall):82 - SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2022-06-15 07:37:17 ERROR i.a.c.i.LineGobbler(voidCall):82 - SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2022-06-15 07:37:18 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:18 INFO i.a.i.b.IntegrationCliParser(parseOptions):118 - integration args: {check=null, config=source_config.json}
2022-06-15 07:37:18 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:18 INFO i.a.i.b.IntegrationRunner(runInternal):123 - Running integration: io.airbyte.integrations.destination.bigquery.BigQueryDestination
2022-06-15 07:37:18 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:18 INFO i.a.i.b.IntegrationRunner(runInternal):124 - Command: CHECK
2022-06-15 07:37:18 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:18 INFO i.a.i.b.IntegrationRunner(runInternal):125 - Integration config: IntegrationConfig{command=CHECK, configPath='source_config.json', catalogPath='null', statePath='null'}
2022-06-15 07:37:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:19 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-06-15 07:37:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:19 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-06-15 07:37:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:19 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword examples - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-06-15 07:37:20 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:20 INFO i.a.i.d.b.BigQueryUtils(getLoadingMethod):303 - Selected loading method is set to: GCS
2022-06-15 07:37:27 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:27 INFO i.a.i.d.s.S3FormatConfigs(getS3FormatConfig):22 - S3 format config: {"format_type":"CSV","flattening":"No flattening","part_size_mb":"5"}
2022-06-15 07:37:29 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:29 INFO i.a.i.d.s.S3Destination(testSingleUpload):81 - Started testing if all required credentials assigned to user for single file uploading
2022-06-15 07:37:30 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:30 INFO i.a.i.d.s.S3Destination(testSingleUpload):91 - Finished checking for normal upload mode
2022-06-15 07:37:30 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:30 INFO i.a.i.d.s.S3Destination(testMultipartUpload):95 - Started testing if all required credentials assigned to user for multipart upload
2022-06-15 07:37:30 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:30 INFO a.m.s.StreamTransferManager(getMultiPartOutputStreams):329 - Initiated multipart upload to sf_airbyte_sync/sf_airbyte_sync/data_synch/typeform/test_1655278650456 with full ID ABPnzm64p8gGnnyq6MjT6I5aVytfgIr-Yg9_Wzap7pzMMWH00zdRexvFWQ42quh01ToLfd0
2022-06-15 07:37:31 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:31 INFO a.m.s.MultiPartOutputStream(close):158 - Called close() on [MultipartOutputStream for parts 1 - 10000]
2022-06-15 07:37:31 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:31 INFO a.m.s.MultiPartOutputStream(close):158 - Called close() on [MultipartOutputStream for parts 1 - 10000]
2022-06-15 07:37:31 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:31 WARN a.m.s.MultiPartOutputStream(close):160 - [MultipartOutputStream for parts 1 - 10000] is already closed
2022-06-15 07:37:31 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:31 INFO a.m.s.StreamTransferManager(complete):367 - [Manager uploading to sf_airbyte_sync/sf_airbyte_sync/data_synch/typeform/test_1655278650456 with id ABPnzm64p...h01ToLfd0]: Uploading leftover stream [Part number 1 containing 3.34 MB]
2022-06-15 07:37:33 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:33 INFO a.m.s.StreamTransferManager(uploadStreamPart):558 - [Manager uploading to sf_airbyte_sync/sf_airbyte_sync/data_synch/typeform/test_1655278650456 with id ABPnzm64p...h01ToLfd0]: Finished uploading [Part number 1 containing 3.34 MB]
2022-06-15 07:37:33 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:33 INFO a.m.s.StreamTransferManager(complete):395 - [Manager uploading to sf_airbyte_sync/sf_airbyte_sync/data_synch/typeform/test_1655278650456 with id ABPnzm64p...h01ToLfd0]: Completed
2022-06-15 07:37:33 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-06-15 07:37:33 INFO i.a.i.d.s.S3Destination(testMultipartUpload):119 - Finished verification for multipart upload mode
2022-06-15 07:37:35 INFO i.a.w.t.TemporalAttemptExecution(get):134 - Stopping cancellation check scheduling...
2022-06-15 07:37:36 INFO i.a.w.t.TemporalAttemptExecution(get):108 - Docker volume job log path: /tmp/workspace/2964/0/logs.log
2022-06-15 07:37:36 INFO i.a.w.t.TemporalAttemptExecution(get):113 - Executing worker wrapper. Airbyte version: 0.39.17-alpha
2022-06-15 07:37:36 INFO i.a.w.g.DefaultReplicationWorker(run):115 - start sync worker. job id: 2964 attempt id: 0
2022-06-15 07:37:36 INFO i.a.w.g.DefaultReplicationWorker(run):127 - configured sync modes: {null.responses=incremental - append_dedup, null.forms=full_refresh - overwrite}
2022-06-15 07:37:36 INFO i.a.w.i.DefaultAirbyteDestination(start):69 - Running destination...
2022-06-15 07:37:36 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/destination-bigquery:1.1.8 exists...
2022-06-15 07:37:36 INFO i.a.w.p.DockerProcessFactory(create):108 - Creating docker job ID: 2964
2022-06-15 07:37:36 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/2964/0 --log-driver none --name destination-bigquery-write-2964-0-xponb --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/destination-bigquery:1.1.8 -e AIRBYTE_ROLE= -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_VERSION=0.39.17-alpha -e WORKER_JOB_ID=2964 airbyte/destination-bigquery:1.1.8 write --config destination_config.json --catalog destination_catalog.json
2022-06-15 07:37:36 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/destination-bigquery:1.1.8 was found locally.
2022-06-15 07:37:37 INFO i.a.c.i.LineGobbler(voidCall):82 - Checking if airbyte/source-typeform:0.1.6 exists...
2022-06-15 07:37:37 INFO i.a.c.i.LineGobbler(voidCall):82 - airbyte/source-typeform:0.1.6 was found locally.
2022-06-15 07:37:37 INFO i.a.w.p.DockerProcessFactory(create):108 - Creating docker job ID: 2964
2022-06-15 07:37:37 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/2964/0 --log-driver none --name source-typeform-read-2964-0-oamvc --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/source-typeform:0.1.6 -e AIRBYTE_ROLE= -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_VERSION=0.39.17-alpha -e WORKER_JOB_ID=2964 airbyte/source-typeform:0.1.6 read --config source_config.json --catalog source_catalog.json
2022-06-15 07:37:37 INFO i.a.w.g.DefaultReplicationWorker(lambda$getDestinationOutputRunnable$7):395 - Destination output thread started.
2022-06-15 07:37:37 INFO i.a.w.g.DefaultReplicationWorker(run):169 - Waiting for source and destination threads to complete.
2022-06-15 07:37:37 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):298 - Replication thread started.
2022-06-15 07:37:40 destination > SLF4J: Class path contains multiple SLF4J bindings.
2022-06-15 07:37:40 destination > SLF4J: Found binding in [jar:file:/airbyte/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2022-06-15 07:37:40 destination > SLF4J: Found binding in [jar:file:/airbyte/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2022-06-15 07:37:40 destination > SLF4J: Found binding in [jar:file:/airbyte/lib/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2022-06-15 07:37:40 destination > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2022-06-15 07:37:40 source > Starting syncing SourceTypeform
2022-06-15 07:37:40 source > Syncing stream: forms
2022-06-15 07:37:41 destination > SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

I'm not sure whether this issue is actually related to BigQuery GCS staging. I see this error in the console output:

airbyte-worker      | 2022-06-15 10:02:04 WARN i.t.i.r.GrpcSyncRetryer(retry):56 - Retrying after failure
airbyte-worker      | io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: grpc: received message larger than max (5311106 vs. 4194304)
airbyte-worker      | 	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262) ~[grpc-stub-1.44.1.jar:1.44.1]
airbyte-worker      | 	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243) ~[grpc-stub-1.44.1.jar:1.44.1]
airbyte-worker      | 	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156) ~[grpc-stub-1.44.1.jar:1.44.1]
airbyte-worker      | 	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.respondActivityTaskCompleted(WorkflowServiceGrpc.java:2746) ~[temporal-serviceclient-1.8.1.jar:?]
airbyte-worker      | 	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.lambda$sendReply$0(ActivityWorker.java:278) ~[temporal-sdk-1.8.1.jar:?]
airbyte-worker      | 	at io.temporal.internal.retryer.GrpcRetryer.lambda$retry$0(GrpcRetryer.java:44) ~[temporal-serviceclient-1.8.1.jar:?]
airbyte-worker      | 	at io.temporal.internal.retryer.GrpcSyncRetryer.retry(GrpcSyncRetryer.java:61) ~[temporal-serviceclient-1.8.1.jar:?]
airbyte-worker      | 	at io.temporal.internal.retryer.GrpcRetryer.retryWithResult(GrpcRetryer.java:51) ~[temporal-serviceclient-1.8.1.jar:?]
airbyte-worker      | 	at io.temporal.internal.retryer.GrpcRetryer.retry(GrpcRetryer.java:41) ~[temporal-serviceclient-1.8.1.jar:?]
airbyte-worker      | 	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.sendReply(ActivityWorker.java:272) ~[temporal-sdk-1.8.1.jar:?]
airbyte-worker      | 	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:208) ~[temporal-sdk-1.8.1.jar:?]
airbyte-worker      | 	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:164) ~[temporal-sdk-1.8.1.jar:?]
airbyte-worker      | 	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93) ~[temporal-sdk-1.8.1.jar:?]
airbyte-worker      | 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
airbyte-worker      | 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
airbyte-worker      | 	at java.lang.Thread.run(Thread.java:833) [?:?]
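
The RESOURCE_EXHAUSTED line above is Temporal's gRPC client rejecting a reply that exceeds gRPC's default maximum inbound message size of 4 MiB (4194304 bytes), which an oversized activity payload can trip. A quick sanity check of the numbers in the message (pure arithmetic, nothing Airbyte-specific):

```python
# gRPC's default max inbound message size is 4 MiB = 4194304 bytes,
# which matches the limit quoted in the error above.
GRPC_DEFAULT_MAX_BYTES = 4 * 1024 * 1024
reply_bytes = 5_311_106  # size reported in the RESOURCE_EXHAUSTED message

print(GRPC_DEFAULT_MAX_BYTES)                # 4194304
print(reply_bytes - GRPC_DEFAULT_MAX_BYTES)  # 1116802 bytes over the limit
```

So the reply was about 1.1 MB over the default cap, which is why the worker keeps retrying and failing.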

Please upload the complete log file. The error you pointed out is only a warning, and we need the complete log file to identify the root cause correctly.

I had the same problem and solved it by pasting the JSON key into the Service Account field of the destination settings again. It seems the key was dropped after the last BigQuery connector update.

The issue in my case: apparently too much data was being extracted when pulling everything from our Typeform account. After limiting the extraction to specific Typeform form IDs, it works.
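
For anyone hitting the same thing: assuming the Typeform source spec exposes a `form_ids` list (check the spec for your connector version; the field names and IDs below are placeholders), limiting the sync to specific forms looks roughly like this in the source config:

```json
{
  "token": "<typeform-personal-access-token>",
  "start_date": "2021-01-01T00:00:00Z",
  "form_ids": ["abc123", "def456"]
}
```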

Here is the full server log if you want to investigate this issue further.
server-logs.txt (2.2 MB)