Postgres Source: CDC Replication issues

Creating this ticket on behalf of Orfium, as they are experiencing issues using the Postgres Source connector. Below is the detailed breakdown of the issue that Giannis from Orfium shared, as well as the logs.

@gianniskod, can you fill out the additional details below?

  • Is this your first time deploying Airbyte?: No
  • OS Version / Instance:
  • Memory / Disk:
  • Deployment:
  • Airbyte Version: What version are you using now?
  • Source name/version: Postgres 0.44.4 / 1.0.5
  • Destination name/version: Snowflake?
  • Step: The issue is happening during sync.
  • Description:
    Problem 1

Practically, there is one source Postgres Aurora database to which we have already connected Fivetran, and now we are trying to connect Airbyte as well. The Fivetran connector has been set up using a replication slot with the test_decoding plugin. The important note here is that with this plugin you don't have to set up a publication for specific tables; instead, it reads all changes published to the WAL, and Fivetran chooses which tables to replicate based on each connector's schema configuration (for more information check here and here).
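For context, a minimal sketch of how a test_decoding slot works without any publication (the slot name below is a placeholder, not the one Fivetran actually uses):

```sql
-- A logical replication slot on the test_decoding plugin needs no publication;
-- it decodes every change written to the WAL for the database.
SELECT pg_create_logical_replication_slot('fivetran_slot', 'test_decoding');

-- Peek at the decoded changes without consuming them from the slot.
SELECT * FROM pg_logical_slot_peek_changes('fivetran_slot', NULL, NULL);
```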

Now for Airbyte, we have created a new replication slot using the **pgoutput** plugin, connected to a publication that is registered for 2 tables. One of these tables is also configured in Fivetran.
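As a rough sketch of that setup (the table, publication, and slot names below are placeholders, not the ones actually used):

```sql
-- Publication covering only the two tables that Airbyte should replicate.
CREATE PUBLICATION airbyte_publication FOR TABLE public.table_a, public.table_b;

-- Replication slot on the pgoutput plugin, which Airbyte's CDC mode consumes.
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');
```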

Our problem is that after the initial sync completed on Airbyte, none of the subsequent syncs have fetched any data from the replication slot, even though there were thousands of changes. From our investigation, the replication slot appears to have been stuck on the same LSN pointer for at least the last 15-20 days.
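A query along these lines (again with the placeholder slot name) can show whether the slot's confirmed flush position is advancing between syncs and how far it lags behind the current WAL position:

```sql
-- If confirmed_flush_lsn never moves between syncs, the slot is stuck and
-- the server keeps retaining WAL for it, so the lag keeps growing.
SELECT slot_name,
       plugin,
       active,
       restart_lsn,
       confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS replication_lag
FROM pg_replication_slots
WHERE slot_name = 'airbyte_slot';
```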

Of course, we have tried increasing the initial timeout, but we didn't see any change. Our last resort is to drop and recreate the replication slot along with the publication and see if that fixes the issue (below you can find logs from one sync to check if there are any issues).
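A sketch of that last-resort cleanup, again with the placeholder names from above; note that dropping the slot discards the changes it was retaining, so a full resync of the connection is needed afterwards:

```sql
-- Run only while no connector is actively consuming the slot.
SELECT pg_drop_replication_slot('airbyte_slot');
DROP PUBLICATION IF EXISTS airbyte_publication;

-- Recreate both before triggering a new full sync in Airbyte.
CREATE PUBLICATION airbyte_publication FOR TABLE public.table_a, public.table_b;
SELECT pg_create_logical_replication_slot('airbyte_slot', 'pgoutput');
```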

Problem 2

This problem emerged after upgrading the Postgres connector from version 0.44.4 to 1.0.5. From the provided logs you can see that the first problematic sync was related to missing connector configuration, since there were new settings that needed to be configured after the upgrade. After applying the changes to the connector, we got one successful sync, and from that point on, all subsequent syncs started behaving unexpectedly.

In practice, instead of picking up the changes since the last sync, the connector started scanning all the table rows as if it were a reset operation. We let one such sync run to completion and then noticed that all subsequent syncs had the same behavior.

As a result, we:

  1. Rolled back the connector's version to 0.44.4.
  2. Reset the connection and ran the next sync to fully resync the tables.
  3. Ran one regular sync after the full resync, and again it read all the rows of the tables.
  4. Attempted more syncs, and they worked as expected.

I know this scenario is somewhat complex to replicate, and there may well have been steps that weren't carried out as expected, but our main questions are:

  1. In order to upgrade our Postgres connectors, do we have to create new connections? Or should it work with the already configured ones?

  2. If we have to create a new connection, should we do it with the same connector, or should we create a new one?

Hello there! You are receiving this message because none of your fellow community members has stepped in to respond to your topic post. (If you are a community member and you are reading this response, feel free to jump in if you have the answer!) As a result, the Community Assistance Team has been made aware of this topic and will be investigating and responding as quickly as possible.
Some important considerations that will help you get your issue solved faster:

  • It is best to use our topic creation template; if you haven't yet, we recommend posting a follow-up with the requested information. With that information the team will be able to search for similar issues with connectors and the platform, and troubleshoot your specific question or problem more quickly.
  • Make sure to upload the complete log file; a common investigation roadblock is that the error often occurs well before the problem is surfaced to the user, so having only the tail of the log is less useful than having the whole log to scan through.
  • Be as descriptive and specific as possible; when investigating, it is extremely valuable to know what steps were taken to encounter the issue, what version of the connector / platform / Java / Python / Docker / k8s was used, etc. The more context supplied, the sooner the investigation can start on your topic and the faster we can drive towards an answer.
  • We in the Community Assistance Team are glad you’ve made yourself part of our community, and we’ll do our best to answer your questions and resolve the problems as quickly as possible. Expect to hear from a specific team member as soon as possible.

Thank you for your time and attention.
Best,
The Community Assistance Team

Hey @gianniskod, I’m currently looking into this. Can you post the logs that you mentioned in your post?