Repeated heartbeat failed WARN

jagannathsrs · July 7, 2022, 6:40pm

Hello team,

We are seeing the following warning in almost every sync that runs. The syncs usually succeed eventually, but sometimes they keep running forever and I have to manually cancel and rerun them.

2022-07-07 10:51:29 WARN ActivityExecutionContextImpl(doHeartBeat):165 - Heartbeat failed
io.grpc.StatusRuntimeException: UNKNOWN: maximum attempts exceeded to update history
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262) ~[grpc-stub-1.42.1.jar:1.42.1]

Airbyte is deployed on EKS with 4 m5.xlarge nodes, backed by an external DB (m5.2xlarge). Usage on the DB never exceeds 30%. On reviewing your architecture I don’t see a separate DB for Temporal.

What is causing this issue?
Which parameters would I have to fine tune to avoid this?
What are the implications if this goes unresolved? (more failure rate?)

thank you

marcosmarxm · July 8, 2022, 1:02pm

What version of Airbyte are you running?

jagannathsrs · July 8, 2022, 4:19pm

The version we are running is 0.39.17

marcosmarxm · July 12, 2022, 8:44pm

This is the first time I have seen this error, Jagannath. Does the same error happen if you deploy Airbyte with the local database?

jagannathsrs · July 13, 2022, 4:14pm

I haven’t tried that. It would require a lot of testing, but I can try it. Do you think the number of connection pools has something to do with it? We have 10-15 active sync jobs, which isn’t a lot I suppose.

marcosmarxm · July 14, 2022, 9:05pm

No, that isn’t a lot of connections. Is the external database in the same region/VPC as the Kubernetes cluster?
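
One rough way to check the network path is to time TCP connections to the database endpoint from a pod inside the cluster. This is only a minimal sketch, assuming Python is available in that pod. The function names are made up for illustration, and the host and port in the usage comment are placeholders, not values from this thread. It measures connection setup time only, not query or Temporal history-update latency.

```python
import socket
import statistics
import time


def measure_connect_latency(host, port, attempts=5, timeout=3.0):
    """Open `attempts` TCP connections and return each connect time in ms."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # create_connection raises OSError if the endpoint is unreachable
        # or the connection does not complete within `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Reduce the raw samples to min / median / max in milliseconds."""
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


# Example usage (placeholder endpoint, replace with your own):
#   samples = measure_connect_latency("my-db.example.internal", 5432)
#   print(summarize(samples))
```

Connect times that are consistently high or vary a lot would point towards the database sitting in a different region or VPC, or towards a slow network hop in between.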
