Repeated heartbeat failed WARN

Hello team,

We are seeing the following error in almost every sync that runs. The syncs usually succeed eventually, but sometimes they keep running forever and I have to manually cancel and rerun them.

2022-07-07 10:51:29 WARN ActivityExecutionContextImpl(doHeartBeat):165 - Heartbeat failed
io.grpc.StatusRuntimeException: UNKNOWN: maximum attempts exceeded to update history
	at io.grpc.stub.ClientCalls.toStatusRuntimeException( ~[grpc-stub-1.42.1.jar:1.42.1]

It is deployed on EKS with 4 m5.xlarge nodes, backed by an external DB (m5.2xlarge); usage on the DB never exceeds 30%. On reviewing your architecture, I don’t see a separate DB for Temporal.

  • What is causing this issue?
  • Which parameters would I have to fine-tune to avoid this?
  • What are the implications if this goes unresolved? (A higher failure rate?)

Thank you

What version of Airbyte are you running?

The version we are running is 0.39.17

This is the first time I have seen this error, Jagannath. If you deploy Airbyte with the local database, does the same error happen?

I haven’t tried that. It would require a lot of testing, but I can try it. Do you think the number of connection pools has something to do with it? We have 10-15 active sync jobs, which isn’t a lot, I suppose.

No, that isn’t a lot of connections. Is the external database in the same region/VPC as the Kubernetes cluster?