Summary
Workers stop functioning intermittently in Airbyte deployment on GCP CE, requiring container cleanup and recreation. Seeking guidance on resolving the issue and implementing a health check service for monitoring service availability.
Question
Hi Team,
We’ve encountered an issue with our Airbyte deployment on GCP CE. Randomly, our workers seem to stop functioning while the application UI remains operational. Syncing and connection setups halt intermittently. We’ve observed that a workaround involves cleaning all containers and recreating them, after which functionality resumes as expected.
Could you provide guidance on resolving this issue? The error details can be found https://docs.google.com/document/d/1mg067RhSh5gZ46sytjVasEbD13qKFLXO-K2Q4Imopf0/edit|here.
Additionally, we’re in need of a health check service to monitor service availability. We’d like confirmation that the health check API (https://reference.airbyte.com/reference/get_health|link) will alert us to any service outages. Alternatively, do you have suggestions on how we can detect if a worker or specific container is running but unresponsive?
This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. Click here if you want
to access the original thread.