Normalized error when syncing large amount of data

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Ubuntu
  • Memory / Disk: 16Gb / 200GB
  • Deployment: Docker deployment?
  • Airbyte Version: 0.39.10-alpha
  • Source name/version: airbyte/source-postgres/0.4.25
  • Destination name/version: airbyte/destination-redshift/ 0.3.39
  • Step: The issue is happening during sync
  • Description: There is a large table 70 GB in Postgresql database intended to move to Redshift through Airbyte. To keep it simple I’m trying only using “Full refresh (overwrite)” mode to do the work. I realized to move 70 GB will takes a lots of time so before I create another table with same schema as the huge table but only insert a small portion of data and it works well however now I’m trying to move the originate table it failed because of the normalization process.
Failure Origin: normalization, Message: Something went wrong during normalization

**2022-06-17 07:15:33 e[42mnormalizatione[0m > 07:15:33  1 of 1 ERROR creating table model source.manual_dp_indicator_daily...................................................... [ERROR in 18.42s]**
2022-06-17 07:15:33 e[42mnormalizatione[0m > 07:15:33  
2022-06-17 07:15:33 e[42mnormalizatione[0m > 07:15:33  Finished running 1 table model in 21.90s.
2022-06-17 07:15:33 e[42mnormalizatione[0m > 07:15:33  
2022-06-17 07:15:33 e[42mnormalizatione[0m > 07:15:33  Completed with 1 error and 0 warnings:
2022-06-17 07:15:33 e[42mnormalizatione[0m > 07:15:33  
2022-06-17 07:15:33 e[42mnormalizatione[0m > 07:15:33  Database Error in model manual_dp_indicator_daily (models/generated/airbyte_tables/source/manual_dp_indicator_daily.sql)
2022-06-17 07:15:33 e[42mnormalizatione[0m > 07:15:33    Error converting text to date
2022-06-17 07:15:33 e[42mnormalizatione[0m > 07:15:33    compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_tables/source/manual_dp_indicator_daily.sql
2022-06-17 07:15:33 e[42mnormalizatione[0m > 07:15:33  
2022-06-17 07:15:33 e[42mnormalizatione[0m > 07:15:33  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
2022-06-17 07:15:33 e[42mnormalizatione[0m > 
2022-06-17 07:15:33 e[42mnormalizatione[0m > Diagnosing dbt debug to check if destination is available for dbt and well configured (1):
2022-06-17 07:15:33 e[42mnormalizatione[0m > 
2022-06-17 07:15:35 e[42mnormalizatione[0m > 07:15:35  Running with dbt=1.0.0
2022-06-17 07:15:35 e[42mnormalizatione[0m > dbt version: 1.0.0
2022-06-17 07:15:35 e[42mnormalizatione[0m > python version: 3.9.9
2022-06-17 07:15:35 e[42mnormalizatione[0m > python path: /usr/local/bin/python
2022-06-17 07:15:35 e[42mnormalizatione[0m > os info: Linux-5.13.0-1021-aws-x86_64-with-glibc2.31
2022-06-17 07:15:35 e[42mnormalizatione[0m > Using profiles.yml file at /data/30/0/normalize/profiles.yml
2022-06-17 07:15:35 e[42mnormalizatione[0m > Using dbt_project.yml file at /data/30/0/normalize/dbt_project.yml
2022-06-17 07:15:35 e[42mnormalizatione[0m > 
2022-06-17 07:15:35 e[42mnormalizatione[0m > Configuration:
2022-06-17 07:15:35 e[42mnormalizatione[0m >   profiles.yml file [OK found and valid]
2022-06-17 07:15:35 e[42mnormalizatione[0m >   dbt_project.yml file [OK found and valid]
2022-06-17 07:15:35 e[42mnormalizatione[0m > 
2022-06-17 07:15:35 e[42mnormalizatione[0m > Required dependencies:
2022-06-17 07:15:35 e[42mnormalizatione[0m >  - git [OK found]
2022-06-17 07:15:35 e[42mnormalizatione[0m > 
2022-06-17 07:15:35 e[42mnormalizatione[0m > Connection:
2022-06-17 07:15:35 e[42mnormalizatione[0m >   host: xxx
2022-06-17 07:15:35 e[42mnormalizatione[0m >   port: 5439
2022-06-17 07:15:35 e[42mnormalizatione[0m >   user: xxx
2022-06-17 07:15:35 e[42mnormalizatione[0m >   database: defaultdb
2022-06-17 07:15:35 e[42mnormalizatione[0m >   schema: source
2022-06-17 07:15:35 e[42mnormalizatione[0m >   search_path: None
2022-06-17 07:15:35 e[42mnormalizatione[0m >   keepalives_idle: 240
2022-06-17 07:15:35 e[42mnormalizatione[0m >   sslmode: None
2022-06-17 07:15:35 e[42mnormalizatione[0m >   method: database
2022-06-17 07:15:35 e[42mnormalizatione[0m >   cluster_id: None
2022-06-17 07:15:35 e[42mnormalizatione[0m >   iam_profile: None
2022-06-17 07:15:35 e[42mnormalizatione[0m >   iam_duration_seconds: 900
2022-06-17 07:15:35 e[42mnormalizatione[0m >   Connection test: [OK connection ok]
2022-06-17 07:15:35 e[42mnormalizatione[0m > 
2022-06-17 07:15:35 e[42mnormalizatione[0m > All checks passed!
2022-06-17 07:15:36 e[42mnormalizatione[0m > 
2022-06-17 07:15:36 e[42mnormalizatione[0m > Forward dbt output logs to diagnose/debug errors (0):
2022-06-17 07:15:36 e[42mnormalizatione[0m > 
2022-06-17 07:15:36 e[42mnormalizatione[0m > 
2022-06-17 07:15:36 e[42mnormalizatione[0m > 
2022-06-17 07:15:36 e[42mnormalizatione[0m > ============================== 2022-06-17 07:15:10.532744 | d3d2d4ca-46f5-4fa4-b477-83a2bf87c561 ==============================
.....
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.617573 [debug] [Thread-4  ]: Postgres adapter: Postgres error: Error converting text to date
2022-06-17 07:15:36 e[42mnormalizatione[0m > 
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.617913 [debug] [Thread-4  ]: On model.airbyte_utils.manual_dp_indicator_daily: ROLLBACK
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.621346 [debug] [Thread-4  ]: finished collecting timing info
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.621515 [debug] [Thread-4  ]: On model.airbyte_utils.manual_dp_indicator_daily: Close
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.621974 [debug] [Thread-4  ]: Database Error in model manual_dp_indicator_daily (models/generated/airbyte_tables/source/manual_dp_indicator_daily.sql)
2022-06-17 07:15:36 e[42mnormalizatione[0m >   Error converting text to date
2022-06-17 07:15:36 e[42mnormalizatione[0m >   compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_tables/source/manual_dp_indicator_daily.sql
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.622370 [error] [Thread-4  ]: 1 of 1 ERROR creating table model source.manual_dp_indicator_daily...................................................... [e[31mERRORe[0m in 18.42s]
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.622739 [debug] [Thread-4  ]: Finished running node model.airbyte_utils.manual_dp_indicator_daily
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.623979 [debug] [MainThread]: Acquiring new redshift connection "master"
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.624148 [debug] [MainThread]: Using redshift connection "master"
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.624240 [debug] [MainThread]: On master: BEGIN
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.624322 [debug] [MainThread]: Opening a new connection, currently in state closed
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.624416 [debug] [MainThread]: Redshift adapter: Connecting to Redshift using 'database' credentials
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.655205 [debug] [MainThread]: SQL status: BEGIN in 0.03 seconds
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.655371 [debug] [MainThread]: On master: COMMIT
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.655464 [debug] [MainThread]: Using redshift connection "master"
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.655547 [debug] [MainThread]: On master: COMMIT
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.658908 [debug] [MainThread]: SQL status: COMMIT in 0.0 seconds
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.659023 [debug] [MainThread]: On master: Close
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.659409 [info ] [MainThread]: 
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.659626 [info ] [MainThread]: Finished running 1 table model in 21.90s.
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.659866 [debug] [MainThread]: Connection 'master' was properly closed.
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.660049 [debug] [MainThread]: Connection 'model.airbyte_utils.manual_dp_indicator_daily_ab1' was properly closed.
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.660166 [debug] [MainThread]: Connection 'model.airbyte_utils.manual_dp_indicator_daily_ab2' was properly closed.
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.660245 [debug] [MainThread]: Connection 'model.airbyte_utils.manual_dp_indicator_daily_ab3' was properly closed.
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.660331 [debug] [MainThread]: Connection 'model.airbyte_utils.manual_dp_indicator_daily' was properly closed.
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.669598 [info ] [MainThread]: 
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.669857 [info ] [MainThread]: e[31mCompleted with 1 error and 0 warnings:e[0m
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.670154 [info ] [MainThread]: 
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.670350 [error] [MainThread]: e[33mDatabase Error in model manual_dp_indicator_daily (models/generated/airbyte_tables/source/manual_dp_indicator_daily.sql)e[0m
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.670569 [error] [MainThread]:   Error converting text to date
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.670753 [error] [MainThread]:   compiled SQL at ../build/run/airbyte_utils/models/generated/airbyte_tables/source/manual_dp_indicator_daily.sql
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.670968 [info ] [MainThread]: 
2022-06-17 07:15:36 e[42mnormalizatione[0m > 07:15:33.671153 [info ] [MainThread]: Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
2022-06-17 07:15:36 e[32mINFOe[m i.a.w.t.TemporalAttemptExecution(lambda$getWorkerThread$2):161 - Completing future exceptionally...
io.airbyte.workers.exception.WorkerException: Normalization Failed.
	at io.airbyte.workers.general.DefaultNormalizationWorker.run(DefaultNormalizationWorker.java:63) ~[io.airbyte-airbyte-workers-0.39.10-alpha.jar:?]
	at io.airbyte.workers.general.DefaultNormalizationWorker.run(DefaultNormalizationWorker.java:21) ~[io.airbyte-airbyte-workers-0.39.10-alpha.jar:?]
	at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:158) ~[io.airbyte-airbyte-workers-0.39.10-alpha.jar:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: io.airbyte.workers.exception.WorkerException: Normalization Failed.
	at io.airbyte.workers.general.DefaultNormalizationWorker.run(DefaultNormalizationWorker.java:60) ~[io.airbyte-airbyte-workers-0.39.10-alpha.jar:?]
	... 3 more
	Suppressed: io.airbyte.workers.exception.WorkerException: Normalization process wasn't successful
		at io.airbyte.workers.normalization.DefaultNormalizationRunner.close(DefaultNormalizationRunner.java:162) ~[io.airbyte-airbyte-workers-0.39.10-alpha.jar:?]
		at io.airbyte.workers.general.DefaultNormalizationWorker.run(DefaultNormalizationWorker.java:48) ~[io.airbyte-airbyte-workers-0.39.10-alpha.jar:?]
		at io.airbyte.workers.general.DefaultNormalizationWorker.run(DefaultNormalizationWorker.java:21) ~[io.airbyte-airbyte-workers-0.39.10-alpha.jar:?]
		at io.airbyte.workers.temporal.TemporalAttemptExecution.lambda$getWorkerThread$2(TemporalAttemptExecution.java:158) ~[io.airbyte-airbyte-workers-0.39.10-alpha.jar:?]
		at java.lang.Thread.run(Thread.java:833) [?:?]
2022-06-17 07:15:36 e[32mINFOe[m i.a.w.t.TemporalAttemptExecution(get):134 - Stopping cancellation check scheduling...
2022-06-17 07:15:36 e[32mINFOe[m i.a.w.t.TemporalUtils(withBackgroundHeartbeat):236 - Stopping temporal heartbeating...

I found there are date values which Redshift does not support (ex: datetime > 10000-01-01), I think its the root cause for this issue, need to clean up the source table first.

Hey it would be great to share the complete log for the sync?

Here is the complete log

error_due_to_10000-01-01.txt (853.4 KB)

Yeah got it. Thanks and your are right looks like there needs a cleanup