Stream is ignored after a connector update

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance: Docker image
  • Memory / Disk: Enough
  • Deployment: Kubernetes
  • Airbyte Version: 0.39.32-alpha
  • Source name/version: Postgres (1.0.2)
  • Destination name/version: Google Cloud Storage (GCS) (0.2.9)
  • Step: Sync
  • Description:
    I see strange behaviour in Airbyte whenever I upgrade the Postgres connector.
    I upgraded from 0.4.0 to 1.0.2 (but it happens every time I upgrade the Postgres connector).
    After the upgrade, some tables (in full refresh mode) are ignored and no longer synced.
    From the log file, I can see that the table is correctly detected by Airbyte:
2022-08-16 13:20:42 source > 2022-08-16 13:20:42 INFO i.a.i.s.r.s.CursorManager(createCursorInfoForStream):151 - Found matching cursor in state. Stream: AirbyteStreamNameNamespacePair{name='messages_v2', namespace='public'}. Cursor Field: null Value: null

But at the end, I get this message in the log:

2022-08-16 13:20:48 source > 2022-08-16 13:20:47 INFO i.a.i.s.r.AbstractDbSource(getSelectedIterators):195 - Skipping stream public.messages_v2 because it is not in the source

Do you have any idea? The only workaround I found is to drop the Airbyte Postgres database to force a full reset…
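To check quickly which streams a sync ignored, the log can be scanned for the "Skipping stream" message quoted above. This is a minimal sketch, assuming the message format stays exactly as in that log line; `skipped_streams` is a hypothetical helper, not part of Airbyte.

```python
import re

# Pattern taken from the "Skipping stream ..." log line quoted in this thread.
# Assumption: the wording of the message is stable across connector versions.
SKIPPED = re.compile(r"Skipping stream (\S+) because it is not in the source")


def skipped_streams(log_text: str) -> list[str]:
    """Return distinct skipped stream names, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in SKIPPED.finditer(log_text):
        seen.setdefault(match.group(1), None)
    return list(seen)


sample = (
    "2022-08-16 13:20:48 source > 2022-08-16 13:20:47 INFO "
    "i.a.i.s.r.AbstractDbSource(getSelectedIterators):195 - "
    "Skipping stream public.messages_v2 because it is not in the source\n"
)
print(skipped_streams(sample))
```

Running this over the full sync log after each connector upgrade would show whether the same tables are skipped every time.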

@lucienfregosi do you mind sharing the complete log file?

Sure, here it is

Let me know if you find something.

I should add that the tables that are not replicated are partitioned tables in full refresh mode (because I can’t replicate them incrementally; see "Postgres Source CDC with partitioned tables", Issue #13442, airbytehq/airbyte on GitHub).

If you create another connection without using CDC in the Source, are you able to sync the tables? My guess is that CDC may not sync tables unless they are in incremental + dedup mode.

Indeed @marcosmarxm, if I create a new connection without CDC I am able to sync these tables.

I’m not sure I understand why CDC can’t sync a table without it being incremental after the connector update (it would make more sense to me if it had never worked).

So maybe I should split everything into 2 connections:

  • One connection for all the incremental tables, with CDC
  • One connection for the partitioned tables in full refresh, without CDC

BTW, I’m still looking for a way to sync partitioned tables incrementally…

This can be a workaround for now. I’ll raise a GitHub issue to check with the team about this.

Can you share the desc/summary of a partitioned table you’re trying to sync? Does it fail to sync, or is the incremental sync mode not shown to you?

For now the workaround is enough for us :+1:

By desc, do you mean the DDL of the table? Basically, the table is partitioned by day; each day we append a new partition that contains around 2-3 million rows.

Table definition:

create table messages_v2
(
    uuid                   uuid                                   not null,
    published_at           timestamp(3)                           not null,
    ....
    text                   text                                   not null
)
    partition by RANGE (published_at);

And for every day:

create table messages_v2_20201020
    partition of messages_v2
        FOR VALUES FROM ('2020-10-20 00:00:00') TO ('2020-10-21 00:00:00');
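The daily partition statement above follows a fixed pattern, so it can be generated for any date. The following is only a sketch: how the partitions are actually created is not stated in this thread, and `daily_partition_ddl` is a hypothetical helper that mirrors the DDL shown above.

```python
from datetime import date, timedelta


def daily_partition_ddl(day: date, parent: str = "messages_v2") -> str:
    """Build the DDL for one daily partition covering [day, day + 1)."""
    next_day = day + timedelta(days=1)
    return (
        f"create table {parent}_{day:%Y%m%d}\n"
        f"    partition of {parent}\n"
        f"        FOR VALUES FROM ('{day:%Y-%m-%d} 00:00:00') "
        f"TO ('{next_day:%Y-%m-%d} 00:00:00');"
    )


print(daily_partition_ddl(date(2020, 10, 20)))
```

The upper bound is computed with `timedelta`, so month and year boundaries roll over correctly.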

With Postgres 13+ I can add the partitioned table to the publication. It is shown in incremental mode and the sync doesn’t fail, but in the end 0 rows and 0 bytes are inserted.

Thanks for your help

Sorry for the delay here, Lucien. I didn’t have time to investigate further. I’ll do so next week.

Any update, @marcosmarxm?

Sorry Lucien, not yet :frowning: