Postgres Source replication slot safe_wal_size only reset when a change occurs

  • Is this your first time deploying Airbyte?: Yes
  • OS Version / Instance: Oracle Linux 8 Ec2 Instance
  • Memory / Disk: 2Gb / 100GB
  • Deployment: Docker
  • Airbyte Version: 0.40.18
  • Source name/version: Postgres
  • Destination name/version: S3
  • Step: The issue is happening during sync
  • Description:

When Airbyte syncs and successfully pulls and records a change, the safe_wal_size metric in postgresql’s replication slot is reset, and any extra disk space used for recording the wal is free’d, however, When no change in the data source is made the sync completes but the safe_wal_size metric continues to decrease.
Should a sync that pulls no changes still reset the wal as it has been read?
This could cause a disk usage issue where the replication slot is not reset unless data has changed at the source.

Postgres version 13
Parameter group wal options;

  • max_slot_wal_keep_size 1024
  • max_wal_size 250
  • wal_keep_size 4096

Here are the logs for the sync with data which successfully resets the wal replication slot
airbyte-sync-data.txt (4.3 MB)

Here are the logs for the sync with no data which does not reset the wal replication slot
airbyte-sync-nodata.txt (4.2 MB)

Hello there! You are receiving this message because none of your fellow community members has stepped in to respond to your topic post. (If you are a community member and you are reading this response, feel free to jump in if you have the answer!) As a result, the Community Assistance Team has been made aware of this topic and will be investigating and responding as quickly as possible.
Some important considerations that will help your to get your issue solved faster:

  • It is best to use our topic creation template; if you haven’t yet, we recommend posting a followup with the requested information. With that information the team will be able to more quickly search for similar issues with connectors and the platform and troubleshoot more quickly your specific question or problem.
  • Make sure to upload the complete log file; a common investigation roadblock is that sometimes the error for the issue happens well before the problem is surfaced to the user, and so having the tail of the log is less useful than having the whole log to scan through.
  • Be as descriptive and specific as possible; when investigating it is extremely valuable to know what steps were taken to encounter the issue, what version of connector / platform / Java / Python / docker / k8s was used, etc. The more context supplied, the quicker the investigation can start on your topic and the faster we can drive towards an answer.
  • We in the Community Assistance Team are glad you’ve made yourself part of our community, and we’ll do our best to answer your questions and resolve the problems as quickly as possible. Expect to hear from a specific team member as soon as possible.

Thank you for your time and attention.
Best,
The Community Assistance Team

I’ve had this issue with postgres source connectors versions;
1.0.22 and 1.0.25

From poking around release notes, I’m going to try upgrading airbyte version from 0.40.18 to the latest 0.40.22, as the following fix in release 0.40.19 may resolve the issue;

Investigate usage of heartbeat property in debezium for CDC · Issue #15040 · airbytehq/airbyte · GitHub
https://github.com/airbytehq/airbyte/pull/19004

Upgrading did not resolve the issue.

Hello Billy, it’s been a while without an update from us. Are you still having problems or did you find a solution?

Hey, This is still a problem.

We’ve now got our production Airbyte instance running and have created a few connections with production databases.

Our replication slots are configured via RDS parameter group with the following parameters.

  • max_slot_wal_keep_size 2048 (MB) Sets the maximum WAL size that can be reserved by replication slots.
  • max_wal_senders 20 Sets the maximum number of simultaneously running WAL sender processes.
  • max_wal_size 2048 (MB) Sets the WAL size that triggers a checkpoint.
  • min_wal_size 192 (MB) Sets the minimum size to shrink the WAL to.
  • wal_compression 1 Compresses full-page writes written in WAL file.
  • wal_keep_size 2048 (MB) Sets the size of WAL files held for standby servers.

We are only syncing 9 tables of data via the publication which are not updated overnight.
Airbyte is syncing once an hour.
Airbyte appears to be only resetting a replication slot if there are changes to the data, which means overnight the replication slot will eventually fill and then become lost, causing us to have to reset the replication slot and then perform the initial sync of all data again.

We’re at the point where creating a table which we update every 30 minutes and adding this into the publication should hopefully cause airbyte to clear the replication slot as a workaround.

Along with increasing the RDS parameter max_slot_wal_keep_size from 2048 to 3072 I’ve implemented a table into the publication called airbyteworkaround for each database which gets updated once an hour.

This ensures that there is at least 1 data change in every hourly sync and the replication slot is now being cleared.