Postgres REPLICA IDENTITY FULL tables are not found in source stream

filipe.gananca · August 10, 2022, 10:19am

Is this your first time deploying Airbyte?: Yes
OS Version / Instance: AWS linux
Memory / Disk: 8 GB /250 GB
Deployment: Docker
Airbyte Version: 0.39.41-alpha
Source name/version: Postgres 0.4.43 and 1.0.0
Destination name/version: Postgres 0.3.21
Step: Synch
Description:
Our replication configuration works well in CDC mode for all the source Postgres tables that have REPLICA IDENTITY DEFAULT;
However, two source tables that are log tables don’t have primary key and we set REPLICA IDENTITY FULL;
For these two tables Airbyte GUI only shows Full Refresh synch modes, and when synching is performed, no data is retrieved for these two tables.
Log says Skipping stream xxxx because it is not in the source for both tables.

[my logs|attachment](upload://nqA89VnpX0llxelgC2KdSqkmAnb.txt) (138.7 KB)

marcosmarxm · August 10, 2022, 12:54pm

@filipe.gananca if you create another connection without CDC enabled and sync only this two tables are you able to sync data?

filipe.gananca · August 11, 2022, 4:11pm

Hi @marcosmarxm, I finally have something systematic to share, as I was getting some confusing results. So the full story was:

I was trying to synch tables from 2 different schemas;
Changed the Source (that has two schemas defined) from Logical to Standard;
On the connection Replication, refreshed the schema, selected only the two problematic tables that are on the same schema, and and got error
logs-NO CDC.txt (117.5 KB)
Started from scratch, created a Source with only one schema and Standard Replication.
Created a new connection with only the two problematic tables: success!
Changed the source to Logical. On replication refreshed the schema and: success!
Changed the source to two schemas, refreshed schema on connection, and selected only the tables that I had succeeded: error.
Changed the source again to only one schema, refreshed the schema on connection, selected the two tables: same error.
Created again new source with one schema, new connection, and was able to do CDC and standard.

So apparently:

Two schemas as a source is problematic;
Changing back after having two schemas probably leaves something in metadata that causes problem.

In the meantime, while creating a new connection again I understood that Normalising has problems with timestamp fields (json stores a T between date and time on target) and can only make this work in Raw (json mode) . This is both for CDC and Standard.
logs-24 NO CDC one table with normalization.txt (240.4 KB)

Thanks!

marcosmarxm · August 12, 2022, 2:15pm

The source fields that is causing the issue in normalization is a datetime/timestamp in postgres or string? Both postgres are the same version?

filipe.gananca · August 27, 2022, 9:20am

Hi @marcosmarxm,

Source fields are timestamp in Postgres with no TZ. Source and target Postgres is actually the same DB.

However, as this schema has thousands of partitions, which would cause the timeout when reading schema, I created the connection on a dev environment and then migrated with octavia to QUA environment. DEV has PG 14, QUA is 13.4.

marcosmarxm · August 31, 2022, 8:59pm

Maybe one solution would be edit the Normalization to run the command described here https://stackoverflow.com/questions/29374472/error-date-time-field-value-out-of-range-30-12-2014-235804-postgresql and use as a custom transformation. Also you can export the normalization project and try it locally first to see if works.

Topic		Replies	Views
Source Postgres - Sync completes without failure but not all streams get synced Connector Questions & Issues source-postgres , destination-s3 , data-loading	6	534	February 16, 2023
Stack Syncs and Automated Full Resyncs on CDC Connector Questions & Issues source-postgres , destination-snowflake , connectors	13	1245	October 17, 2022
Stream is ignored after a connector update Connector Questions & Issues source-postgres	10	669	September 6, 2022
How to reflect schema changes in destination for Postgres to Postgres CDC? Connector Questions & Issues source-postgres , destination-postgres , data-loading , schema , cdc	4	2212	July 14, 2022
Failure to discover streams from Postgres source Connector Questions & Issues source-postgres , schema	9	1245	July 14, 2022

Postgres REPLICA IDENTITY FULL tables are not found in source stream

Related topics