Stream is ignored after a connector update

lucienfregosi · August 16, 2022, 3:46pm

Is this your first time deploying Airbyte?: No
OS Version / Instance: Docker image
Memory / Disk: Enough
Deployment: Kubernetes
Airbyte Version: 0.39.32-alpha
Source name/version: Postgres (1.0.2)
Destination name/version: Google Cloud Storage (GCS) (0.2.9)
Step: Sync
Description:
I see odd behaviour in Airbyte whenever I upgrade the Postgres connector.
Most recently I upgraded from 0.4.0 to 1.0.2, but it happens on every Postgres connector upgrade.
After the upgrade, some tables (in full refresh mode) are ignored and no longer synced.
From the log file, I can see that Airbyte does detect the table:

2022-08-16 13:20:42 source > 2022-08-16 13:20:42 INFO i.a.i.s.r.s.CursorManager(createCursorInfoForStream):151 - Found matching cursor in state. Stream: AirbyteStreamNameNamespacePair{name='messages_v2', namespace='public'}. Cursor Field: null Value: null

But at the end I get this message in the log:

2022-08-16 13:20:48 source > 2022-08-16 13:20:47 INFO i.a.i.s.r.AbstractDbSource(getSelectedIterators):195 - Skipping stream public.messages_v2 because it is not in the source

Do you have any idea? The only workaround I have found is to drop the Airbyte Postgres database to force a full reset.
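
To check which streams a sync dropped, the log can be scanned for the "Skipping stream" message quoted above. This is only a minimal sketch, assuming the message wording stays as in that log line; `skipped_streams` is a hypothetical helper name, not part of Airbyte.

```python
import re

# Pattern for the AbstractDbSource message quoted in the post above.
# Assumption: the wording is unchanged in the connector version you run.
SKIP_RE = re.compile(r"Skipping stream (\S+) because it is not in the source")


def skipped_streams(log_text: str) -> list[str]:
    """Return the names of skipped streams, in order of appearance, without duplicates."""
    seen: list[str] = []
    for line in log_text.splitlines():
        match = SKIP_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Sample line copied from the log excerpt above (colour codes removed).
sample = (
    "2022-08-16 13:20:48 source > 2022-08-16 13:20:47 INFO "
    "i.a.i.s.r.AbstractDbSource(getSelectedIterators):195 - "
    "Skipping stream public.messages_v2 because it is not in the source"
)
print(skipped_streams(sample))
```

Running it over the full sync log after each connector upgrade would show whether the same tables are skipped every time.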

marcosmarxm · August 16, 2022, 6:58pm

@lucienfregosi do you mind sharing the complete log file?

lucienfregosi · August 17, 2022, 8:23am

Sure, here it is

https://gist.github.com/lucienfregosibodyguard/ded5eb94a28b79597a7e337115c9fc75

gistfile1.txt

2022-08-16 13:20:06 INFO i.a.v.j.JsonSchemaValidator(test):71 - JSON schema validation failed.
errors: $.method: does not have a value in the enumeration [Standard], $.method: must be a constant value Standard
2022-08-16 13:20:37 destination > SLF4J: Class path contains multiple SLF4J bindings.
2022-08-16 13:20:37 destination > SLF4J: Found binding in [jar:file:/airbyte/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2022-08-16 13:20:06 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword max - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-08-16 13:20:06 WARN c.n.s.JsonMetaSchema(newValidator):338 - Unknown keyword min - you should define your own Meta Schema. If the keyword is irrelevant for validation, just use a NonValidationKeyword
2022-08-16 13:20:06 INFO i.a.v.j.JsonSchemaValidator(test):71 - JSON schema validation failed.
errors: $.format_type: does not have a value in the enumeration [Avro], $.compression_codec: string found, object expected, $.compression_codec: should be valid to one and only one of the schemas
2022-08-16 13:20:06 INFO i.a.v.j.JsonSchemaValidator(test):71 - JSON schema validation failed.
errors: $.format_type: does not have a value in the enumeration [CSV]

Let me know if you find something.

I should add that the tables that are not replicated are partitioned tables in full refresh mode, because I can’t replicate them incrementally (see “Postgres Source CDC with partitioned tables”, Issue #13442, airbytehq/airbyte on GitHub).

marcosmarxm · August 17, 2022, 4:46pm

If you create another connection without using CDC in the source, are you able to sync the tables? My guess is that CDC may not sync tables unless they are in incremental + dedup mode.

lucienfregosi · August 18, 2022, 2:29pm

Indeed @marcosmarxm, if I create a new connection without CDC I am able to sync these tables.

I’m not sure I understand why CDC can’t sync a table that isn’t incremental only after the connector update (it would make more sense to me if it had never worked).

So maybe I should split everything into 2 connections:

One connection for all the incremental tables with CDC
One connection for the partitioned tables in full refresh without CDC

BTW, I’m still looking for a way to sync partitioned tables incrementally …

marcosmarxm · August 22, 2022, 5:58am

This can be a workaround for now. I’ll raise a GitHub issue to check with the team about this.

Can you share the desc/summary of a partitioned table you’re trying to sync? Did it fail to sync, or does it not show you the incremental sync mode?

lucienfregosi · August 22, 2022, 9:10am

For now the workaround is enough for us.

By desc, do you mean the DDL of the table? Basically the table is partitioned by day; each day we append a new partition that contains around 2-3 million rows.

Table definition:

create table messages_v2
(
    uuid                   uuid                                   not null,
    published_at           timestamp(3)                           not null,
    ....
    text                   text                                   not null
)
    partition by RANGE (published_at);

And for every day:

create table messages_v2_20201020
    partition of messages_v2
        FOR VALUES FROM ('2020-10-20 00:00:00') TO ('2020-10-21 00:00:00');
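
Since a new partition is appended every day, the statement above can be generated rather than written by hand. A minimal sketch, assuming the `messages_v2_YYYYMMDD` naming and the midnight-to-midnight bounds shown above; `daily_partition_ddl` is a hypothetical helper, not something from this setup.

```python
from datetime import date, timedelta


def daily_partition_ddl(day: date, parent: str = "messages_v2") -> str:
    """Build the CREATE TABLE ... PARTITION OF statement for a single day."""
    next_day = day + timedelta(days=1)
    # The lower bound is inclusive and the upper bound exclusive,
    # as with any Postgres range partition.
    return (
        f"create table {parent}_{day:%Y%m%d}\n"
        f"    partition of {parent}\n"
        f"        FOR VALUES FROM ('{day:%Y-%m-%d} 00:00:00') "
        f"TO ('{next_day:%Y-%m-%d} 00:00:00');"
    )


print(daily_partition_ddl(date(2020, 10, 20)))
```

For 2020-10-20 this reproduces the statement shown above.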

With Postgres 13+ I can add the partitioned table to the publication. It then shows up with the incremental sync mode and the sync doesn’t fail, but at the end 0 rows and 0 bytes are inserted.

Thanks for your help

marcosmarxm · August 25, 2022, 3:16am

Sorry for the delay here, Lucien. I didn’t have time to investigate further. I’ll do it next week.

lucienfregosi · September 6, 2022, 8:22am

Any update, @marcosmarxm?

marcosmarxm · September 13, 2022, 5:53pm

Sorry Lucien, not yet

marcosmarxm · September 27, 2022, 4:46pm

Lucien, sorry for the long delay in helping you. Do you still have the issue?
