Source Postgres: "Sync worker failed. Source cannot be stopped!"

  • Is this your first time deploying Airbyte?: No
  • Deployment: Kubernetes
  • Airbyte Version: v0.39.36-alpha
  • Source name/version: Postgres 11.16
  • Destination name/version: BigQuery

We are trying to set up a connection from Postgres to BigQuery using logical replication. It works on a small table, but on a larger dataset it fails after all the data has been uploaded to GCS for staging, which can take a couple of hours.

Here is the log file from the first failed run: https://storage.googleapis.com/zapper-fi-assets/misc/logs-15692.txt (I had to upload it to our own bucket because it exceeds the 8 MB limit)

A few things stand out:

  1. JSON schema validator errors
  2. org.postgresql.util.PSQLException: ERROR: snapshot too old
  3. Sync worker failed, Source cannot be stopped!
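To check whether these symptoms recur across attempts, a quick count per log file can help. The sketch below is a hypothetical triage helper, not part of Airbyte; the regular expressions are assumptions based on the messages quoted above, and the JSON schema validator pattern is deliberately loose because the exact log wording may differ.

```python
import re

# Hypothetical patterns mirroring the three symptoms listed above.
# The first is loose on purpose: it matches both "JSON schema validator"
# and "JsonSchemaValidator", since the exact log wording is an assumption.
PATTERNS = {
    "json_schema_validator": re.compile(r"json\s*schema\s*validator", re.IGNORECASE),
    "snapshot_too_old": re.compile(r"ERROR: snapshot too old"),
    "source_cannot_be_stopped": re.compile(r"Source cannot be stopped"),
}


def count_symptoms(lines):
    """Count how many lines match each symptom pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Usage: open the downloaded log with `open("logs-15692.txt", encoding="utf-8", errors="replace")` and pass the file object to `count_symptoms`, which iterates it line by line without loading the whole file into memory.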

At the end, the log states that the run was successful, but it actually failed, and the second attempt restarted from scratch.

I would appreciate any pointers you can provide :pray:

Hi @felix-d, thanks for your post. At first glance, I think this is related to this GitHub issue: https://github.com/airbytehq/airbyte/issues/5870.

The behavior that you described seems very similar to what has been described by some of our other users. Could you share some more details about how big your dataset is and how you’ve deployed Airbyte on Kubernetes?

Our dataset is quite large, hundreds of GBs. Airbyte is deployed on GKE (Google Cloud) and the source is Postgres on Cloud SQL. I’ll keep investigating. Let me know if you find anything.