All Airbyte Syncs stopped for several days

On the 26th of April we hit an error where all of our syncs stopped working. They stayed stuck in either a “pending” or a “running” state for about 8 days.

Any attempt to restart a sync through the web UI left it stuck in the “pending” state until it was cancelled.

Our DevOps team managed to fix the problem with a docker-compose restart, and it hasn’t happened since.

What bothers us is that there was no indication that everything had been stopped for that long. One of our data scientists noticed that a specific set of data hadn’t been updated in over a week, and that’s what prompted me to check the web UI.

Is there any kind of mechanism available to alert us when it’s stuck like that? My boss suggested asking for a webhook that fires on sync completion, but anything else that gets the same result is welcome.

Hey, which Airbyte version are you using?

Sorry, forgot to include that in the initial post: 0.35.15-alpha

Can you try upgrading Airbyte to the latest version and try again?

Yeah, sure, I can look into updating.

Just to be clear, a restart on the current version got everything running again. We’re more concerned about being able to spot this if it happens again. Is there anything we can do?
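
For illustration only, here is a rough sketch of the kind of check being asked about: an external watchdog that flags any connection with no successful sync inside a time window, so jobs stuck in “pending” or “running” get noticed. Everything in it is an assumption rather than an Airbyte feature. The SyncJob record, its field names, the status strings and find_stale_connections are hypothetical, and fetching the job history and sending the alert are deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class SyncJob:
    # Hypothetical record shape; adapt it to however job history is exported.
    connection: str
    status: str  # assumed values: "pending", "running", "succeeded", "failed"
    updated_at: datetime


def find_stale_connections(
    jobs: Iterable[SyncJob],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return connections with no successful sync within max_age."""
    now = now or datetime.now(timezone.utc)
    latest_success = {}
    seen = set()
    for job in jobs:
        seen.add(job.connection)
        if job.status == "succeeded":
            previous = latest_success.get(job.connection)
            if previous is None or job.updated_at > previous:
                latest_success[job.connection] = job.updated_at
    stale = []
    for connection in sorted(seen):
        last = latest_success.get(connection)
        # A connection that never succeeded counts as stale too.
        if last is None or now - last > max_age:
            stale.append(connection)
    return stale


if __name__ == "__main__":
    now = datetime(2022, 5, 4, tzinfo=timezone.utc)
    jobs = [
        SyncJob("billing", "succeeded", now - timedelta(hours=2)),
        SyncJob("crm", "succeeded", now - timedelta(days=8)),
        SyncJob("crm", "pending", now - timedelta(hours=1)),
        SyncJob("orders", "running", now - timedelta(days=8)),
    ]
    print(find_stale_connections(jobs, timedelta(days=1), now=now))
    # prints ['crm', 'orders']
```

Run on a schedule outside the Airbyte containers (cron, for example), a check like this would still fire when the scheduler itself is the thing that is stuck, which a completion webhook alone would not.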