Destination BigQuery - Normalization Error with CDC

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: Ubuntu 22.04.1 LTS
  • Memory / Disk: 8 GB / 100 GB
  • Deployment: Docker
  • Airbyte Version: 0.40.4
  • Source name/version: RDS-Postgres
  • Destination name/version: GCP-BigQuery
  • Step: Normalization
  • Description: I am following the tutorial Incremental data synchronization between Postgres databases to implement CDC with BigQuery as the destination. I got this error message:

770.75 KB | 3,750 emitted records | 3,750 committed records | 1m 27s
Failure Origin: normalization, Message: Normalization failed during the dbt run. This may indicate a problem with the data itself.

with the following log:

2022-10-07 16:12:15 INFO i.a.v.j.JsonSchemaValidator(test):71 - JSON schema validation failed. 
errors: $.method: does not have a value in the enumeration [Standard], $.method: must be a constant value Standard
2022-10-07 16:12:15 INFO i.a.v.j.JsonSchemaValidator(test):71 - JSON schema validation failed. 
errors: $.method: must be a constant value Standard
2022-10-07 16:12:15 INFO i.a.v.j.JsonSchemaValidator(test):71 - JSON schema validation failed. 
errors: $.credential.hmac_key_access_id: object found, string expected, $.credential.hmac_key_secret: object found, string expected
2022-10-07 16:12:15 INFO i.a.w.t.TemporalAttemptExecution(get):106 - Docker volume job log path: /tmp/workspace/538/0/logs.log
2022-10-07 16:12:15 INFO i.a.w.t.TemporalAttemptExecution(get):111 - Executing worker wrapper. Airbyte version: 0.40.4
2022-10-07 16:12:15 INFO i.a.c.i.LineGobbler(voidCall):83 - Checking if airbyte/source-postgres:1.0.7 exists...
2022-10-07 16:12:15 INFO i.a.c.i.LineGobbler(voidCall):83 - airbyte/source-postgres:1.0.7 was found locally.
2022-10-07 16:12:15 INFO i.a.w.p.DockerProcessFactory(create):119 - Creating docker container = source-postgres-check-538-0-fyrod with resources io.airbyte.config.ResourceRequirements@4cc6837[cpuRequest=<null>,cpuLimit=<null>,memoryRequest=<null>,memoryLimit=<null>]
2022-10-07 16:12:15 INFO i.a.w.p.DockerProcessFactory(create):163 - Preparing command: docker run --rm --init -i -w /data/538/0 --log-driver none --name source-postgres-check-538-0-fyrod --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e USE_STREAM_CAPABLE_STATE=true -e AIRBYTE_ROLE= -e WORKER_ENVIRONMENT=DOCKER -e WORKER_JOB_ATTEMPT=0 -e WORKER_CONNECTOR_IMAGE=airbyte/source-postgres:1.0.7 -e AIRBYTE_VERSION=0.40.4 -e WORKER_JOB_ID=538 airbyte/source-postgres:1.0.7 check --config source_config.json
2022-10-07 16:12:18 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-10-07 16:12:18 INFO i.a.i.b.a.AdaptiveSourceRunner$Runner(getSource):73 - Running source under deployment mode: OSS
2022-10-07 16:12:18 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-10-07 16:12:18 INFO i.a.i.b.a.AdaptiveSourceRunner$Runner(run):85 - Starting source: io.airbyte.integrations.base.ssh.SshWrappedSource
2022-10-07 16:12:18 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-10-07 16:12:18 INFO i.a.i.b.IntegrationCliParser(parseOptions):118 - integration args: {check=null, config=source_config.json}
2022-10-07 16:12:18 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-10-07 16:12:18 INFO i.a.i.b.IntegrationRunner(runInternal):104 - Running integration: io.airbyte.integrations.base.ssh.SshWrappedSource
2022-10-07 16:12:18 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-10-07 16:12:18 INFO i.a.i.b.IntegrationRunner(runInternal):105 - Command: CHECK
2022-10-07 16:12:18 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-10-07 16:12:18 INFO i.a.i.b.IntegrationRunner(runInternal):106 - Integration config: IntegrationConfig{command=CHECK, configPath='source_config.json', catalogPath='null', statePath='null'}
2022-10-07 16:12:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-10-07 16:12:19 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword order - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-10-07 16:12:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-10-07 16:12:19 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword examples - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-10-07 16:12:19 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-10-07 16:12:19 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword airbyte_secret - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword

The following screenshots are attached:

  1. Replication page: the sync mode Incremental | Deduped + history was used, as instructed in the tutorial.
  2. Transformation page: Normalized tabular data was used, as instructed in the tutorial.
  3. Content of BigQuery is attached (data made it to the raw table); only two tables were created. A sketch for verifying which tables exist follows below.
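
For anyone else debugging this, here is a minimal sketch (assuming the google-cloud-bigquery package; the project and dataset names are hypothetical placeholders) to list which tables normalization actually created in the destination dataset:

from google.cloud import bigquery

# List every table in the Airbyte destination dataset to see which
# normalization artifacts exist. Project and dataset IDs are placeholders.
client = bigquery.Client(project="my-gcp-project")
for table in client.list_tables("my_airbyte_dataset"):
    print(table.table_id)

# After a successful normalized sync with Incremental | Deduped + history,
# you would expect _airbyte_raw_<stream>, <stream>_scd, and <stream>.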


Hi! Thanks again for your patience!

As the error reads “This may indicate a problem with the data itself.”, I think this may be a case of types not matching up or not being converted correctly. Could you try to sync without CDC and see if that works? Also, could you please give me the exact connector versions?

I tried the setup further with the following configuration:

Data remained the same in the source Postgres database (on AWS).
Destination was set to Postgres.
Every other setting from the tutorial linked above was left at its default.

Outcome:
The sync was successful and the expected data was generated in the right table.
In addition, all intermediary tables such as *_raw and *_scd, as well as the normalized data table, were created and populated with the expected data.

So, the data was not the problem.

If you check the log, BigQuery complains that it is not able to create some tables with names ending in “_stg”.
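
One thing that might be worth ruling out here (an assumption on my part, not a confirmed diagnosis): failures to create the _stg staging tables sometimes come down to the dataset location configured in the BigQuery destination not matching the actual dataset. A quick check, with hypothetical project/dataset names:

from google.cloud import bigquery

# Compare the dataset's actual location against the "Dataset Location"
# set in the Airbyte BigQuery destination config. Names are placeholders.
client = bigquery.Client(project="my-gcp-project")
dataset = client.get_dataset("my_airbyte_dataset")
print(dataset.dataset_id, dataset.location)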

PS:
I noticed Airbyte creates another schema to store the staging tables it uses in its operations.
See the attached file for the Postgres-to-Postgres output.
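
To see that extra schema, you can list the schemas in the destination database; a minimal sketch using psycopg2 (connection details are placeholders):

import psycopg2

# List all schemas in the destination Postgres database; the extra schema
# Airbyte creates for its staging tables should appear here alongside the
# configured destination schema. Connection details are placeholders.
conn = psycopg2.connect(host="localhost", dbname="destination_db",
                        user="airbyte", password="secret")
with conn, conn.cursor() as cur:
    cur.execute("SELECT schema_name FROM information_schema.schemata ORDER BY schema_name;")
    for (schema_name,) in cur.fetchall():
        print(schema_name)
conn.close()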

Thanks

Hi,

I am having issues syncing data from HubSpot to BigQuery through Airbyte when I use normalization (Normalized tabular data); I keep getting this error: “Pickling client objects is explicitly not supported.” I have had success uploading data as raw JSON, but I need the data stored in tabular format. Has anyone had this issue? Thank you for your help in advance.
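
For what it's worth, “Pickling client objects is explicitly not supported” is the error google-cloud client objects raise whenever something attempts to pickle them, which suggests the failure is in the normalization tooling rather than in your data. A minimal reproduction (the project name is a placeholder; AnonymousCredentials just avoids needing real credentials):

import pickle
from google.auth.credentials import AnonymousCredentials
from google.cloud import bigquery

# google-cloud clients refuse to be pickled; pickling one reproduces the
# exact error message seen in the sync logs.
client = bigquery.Client(project="demo-project", credentials=AnonymousCredentials())
pickle.dumps(client)  # raises pickle.PicklingError: "Pickling client objects is explicitly not supported."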

Hey @blueband,
I know it’s been a while, but were you able to get incremental sync working with BigQuery?

@novice_analyst are you using the denormalized BigQuery connector or the default BigQuery destination connector? Can you post the logs?

Hi @sajarin, I am using the default BigQuery destination connector and the default tabular normalization available in Airbyte. This is the log:
huspot_bq_errorlog (2).txt (1.2 MB)

@novice_analyst @blueband are you still having issues? Can you test the latest versions of Airbyte and the BigQuery connector?