Postgres REPLICA IDENTITY FULL tables are not found in source stream

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: AWS linux
  • Memory / Disk: 8 GB /250 GB
  • Deployment: Docker
  • Airbyte Version: 0.39.41-alpha
  • Source name/version: Postgres 0.4.43 and 1.0.0
  • Destination name/version: Postgres 0.3.21
  • Step: Synch
  • Description:
    Our replication configuration works well in CDC mode for all the source Postgres tables that have REPLICA IDENTITY DEFAULT;
    However, two source tables that are log tables don’t have primary key and we set REPLICA IDENTITY FULL;
    For these two tables Airbyte GUI only shows Full Refresh synch modes, and when synching is performed, no data is retrieved for these two tables.
    Log says Skipping stream xxxx because it is not in the source for both tables.
[my logs|attachment](upload://nqA89VnpX0llxelgC2KdSqkmAnb.txt) (138.7 KB)

@filipe.gananca if you create another connection without CDC enabled and sync only this two tables are you able to sync data?

Hi @marcosmarxm, I finally have something systematic to share, as I was getting some confusing results. So the full story was:

  1. I was trying to synch tables from 2 different schemas;
  2. Changed the Source (that has two schemas defined) from Logical to Standard;
  3. On the connection Replication, refreshed the schema, selected only the two problematic tables that are on the same schema, and and got error
    logs-NO CDC.txt (117.5 KB)
  4. Started from scratch, created a Source with only one schema and Standard Replication.
  5. Created a new connection with only the two problematic tables: success!
  6. Changed the source to Logical. On replication refreshed the schema and: success!
  7. Changed the source to two schemas, refreshed schema on connection, and selected only the tables that I had succeeded: error.
  8. Changed the source again to only one schema, refreshed the schema on connection, selected the two tables: same error.
  9. Created again new source with one schema, new connection, and was able to do CDC and standard.

So apparently:

  1. Two schemas as a source is problematic;
  2. Changing back after having two schemas probably leaves something in metadata that causes problem.

In the meantime, while creating a new connection again I understood that Normalising has problems with timestamp fields (json stores a T between date and time on target) and can only make this work in Raw (json mode) . This is both for CDC and Standard.
logs-24 NO CDC one table with normalization.txt (240.4 KB)

Thanks!

The source fields that is causing the issue in normalization is a datetime/timestamp in postgres or string? Both postgres are the same version?

Hi @marcosmarxm,

Source fields are timestamp in Postgres with no TZ. Source and target Postgres is actually the same DB.

However, as this schema has thousands of partitions, which would cause the timeout when reading schema, I created the connection on a dev environment and then migrated with octavia to QUA environment. DEV has PG 14, QUA is 13.4.

Maybe one solution would be edit the Normalization to run the command described here https://stackoverflow.com/questions/29374472/error-date-time-field-value-out-of-range-30-12-2014-235804-postgresql and use as a custom transformation. Also you can export the normalization project and try it locally first to see if works.