Error while checking connection for multiple sources in Airbyte

Summary

The user is facing an issue where the check connection pod errors out when trying to connect multiple sources in Airbyte. The logs show a NullPointerException and an error message indicating no output for checking connection status.


Question

Hello Airbyte Community,

We are facing an odd but interesting issue while using Airbyte. When we set up a source (Mailchimp), its check pod is created and the connection check succeeds. But when I then connect another source or destination, e.g. Redshift or BigQuery, the check connection pod errors out.

The airbyte-worker logs are:

```Log4j2Appender says: 
2024-05-22 16:34:42 INFO i.a.c.i.LineGobbler(voidCall):149 - 
Log4j2Appender says: ----- END CHECK -----
2024-05-22 16:34:42 INFO i.a.c.i.LineGobbler(voidCall):149 - ----- END CHECK -----
2024-05-22 16:34:42 INFO i.a.c.t.TemporalUtils(withBackgroundHeartbeat):330 - Temporal heartbeating stopped.
Log4j2Appender says: 
2024-05-22 16:34:42 INFO i.a.c.i.LineGobbler(voidCall):149 - 
2024-05-22 16:34:42 WARN i.t.i.a.ActivityTaskExecutors$BaseActivityTaskExecutor(execute):114 - Activity failure. ActivityId=10c6cf05-ef8e-31fe-8796-3aa961395c4c, activityType=RunWithJobOutput, attempt=1
java.lang.RuntimeException: io.temporal.serviceclient.CheckedExceptionWrapper: io.airbyte.workers.exception.WorkerException: Unexpected error while getting checking connection.
	at io.airbyte.commons.temporal.TemporalUtils.withBackgroundHeartbeat(TemporalUtils.java:319) ~[io.airbyte-airbyte-commons-temporal-0.50.33.jar:?]
	at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.runWithJobOutput(CheckConnectionActivityImpl.java:121) ~[io.airbyte-airbyte-workers-0.50.33.jar:?]
	at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:578) ~[?:?]
	at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.17.0.jar:?]
	at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.17.0.jar:?]
	at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:95) ~[temporal-sdk-1.17.0.jar:?]
	at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:92) ~[temporal-sdk-1.17.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:241) ~[temporal-sdk-1.17.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:206) ~[temporal-sdk-1.17.0.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:179) ~[temporal-sdk-1.17.0.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:93) ~[temporal-sdk-1.17.0.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1589) ~[?:?]```
```Log4j2Appender says: (pod: dev / ion-redshift-check-4438e16d-6b8f-485b-9986-34668445e7fe-0-ujvfq) - Closed all resources for pod
2024-05-22 15:40:48 INFO i.a.w.p.KubePodProcess(close):809 - (pod: dev / ion-redshift-check-4438e16d-6b8f-485b-9986-34668445e7fe-0-ujvfq) - Closed all resources for pod
Log4j2Appender says: Check connection job subprocess finished with exit code 3
2024-05-22 15:40:48 WARN i.a.w.g.DefaultCheckConnectionWorker(run):110 - Check connection job subprocess finished with exit code 3
Log4j2Appender says: Unexpected error while checking connection: 
2024-05-22 15:40:48 ERROR i.a.w.g.DefaultCheckConnectionWorker(run):133 - Unexpected error while checking connection: 
io.airbyte.workers.exception.WorkerException: Error checking connection status: no status nor failure reason were outputted
	at io.airbyte.workers.WorkerUtils.throwWorkerException(WorkerUtils.java:268) ~[io.airbyte-airbyte-commons-worker-0.50.33.jar:?]
	at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:120) ~[io.airbyte-airbyte-commons-worker-0.50.33.jar:?]
	at io.airbyte.workers.general.DefaultCheckConnectionWorker.run(DefaultCheckConnectionWorker.java:44) ~[io.airbyte-airbyte-commons-worker-0.50.33.jar:?]
	at io.airbyte.workers.temporal.TemporalAttemptExecution.get(TemporalAttemptExecution.java:135) ~[io.airbyte-airbyte-workers-0.50.33.jar:?]
	at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.lambda$runWithJobOutput$1(CheckConnectionActivityImpl.java:136) ~[io.airbyte-airbyte-workers-0.50.33.jar:?]
	at io.airbyte.commons.temporal.TemporalUtils.withBackgroundHeartbeat(TemporalUtils.java:314) ~[io.airbyte-airbyte-commons-temporal-0.50.33.jar:?]
	at io.airbyte.workers.temporal.check.connection.CheckConnectionActivityImpl.runWithJobOutput(CheckConnectionActivityImpl.java:121) ~[io.airbyte-airbyte-workers-0.50.33.jar:?]
	at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) ~[?:?]
...
2024-05-22 15:40:48 INFO i.a.c.i.LineGobbler(voidCall):149 - 
Log4j2Appender says: ----- END CHECK -----
2024-05-22 15:40:48 INFO i.a.c.t.TemporalUtils(withBackgroundHeartbeat):330 - Temporal heartbeating stopped.```
At the same time, logs of airbyte-server pod:
```2024-05-22 15:40:48 ERROR i.a.c.l.Exceptions(swallow):65 - Swallowed error.
java.lang.NullPointerException: Cannot invoke "io.airbyte.config.Metadata.getAdditionalProperties()" because the return value of "io.airbyte.config.FailureReason.getMetadata()" is null
	at io.airbyte.persistence.job.errorreporter.JobErrorReporter.getFailureReasonMetadata(JobErrorReporter.java:283) ~[io.airbyte.airbyte-persistence-job-persistence-0.50.33.jar:?]
	at io.airbyte.persistence.job.errorreporter.JobErrorReporter.reportJobFailureReason(JobErrorReporter.java:328) ~[io.airbyte.airbyte-persistence-job-persistence-0.50.33.jar:?]
	at io.airbyte.persistence.job.errorreporter.JobErrorReporter.reportDestinationCheckJobFailure(JobErrorReporter.java:195) ~[io.airbyte.airbyte-persistence-job-persistence-0.50.33.jar:?]
	at io.airbyte.commons.server.scheduler.DefaultSynchronousSchedulerClient.lambda$reportError$4(DefaultSynchronousSchedulerClient.java:293) ~[io.airbyte-airbyte-commons-server-0.50.33.jar:?]
	at io.airbyte.commons.lang.Exceptions.swallow(Exceptions.java:63) ~[io.airbyte-airbyte-commons-0.50.33.jar:?]
	at io.airbyte.commons.server.scheduler.DefaultSynchronousSchedulerClient.reportError(DefaultSynchronousSchedulerClient.java:286) ~[io.airbyte-airbyte-commons-server-0.50.33.jar:?]
	at io.airbyte.commons.server.scheduler.DefaultSynchronousSchedulerClient.execute(DefaultSynchronousSchedulerClient.java:228) ~[io.airbyte-airbyte-commons-server-0.50.33.jar:?]
	at io.airbyte.commons.server.scheduler.DefaultSynchronousSchedulerClient.createDestinationCheckConnectionJob(DefaultSynchronousSchedulerClient.java:143) ~[io.airbyte-airbyte-commons-server-0.50.33.jar:?]
	at io.airbyte.commons.server.handlers.SchedulerHandler.checkDestinationConnectionFromDestinationCreate(SchedulerHandler.java:341) ~[io.airbyte-airbyte-commons-server-0.50.33.jar:?]
	at io.airbyte.server.apis.SchedulerApiController.lambda$executeDestinationCheckConnection$0(SchedulerApiController.java:39) ~[io.airbyte-airbyte-server-0.50.33.jar:?]
	at io.airbyte.server.apis.ApiHelper.execute(ApiHelper.java:27) ~[io.airbyte-airbyte-server-0.50.33.jar:?]
	at io.airbyte.server.apis.SchedulerApiController.executeDestinationCheckConnection(SchedulerApiController.java:39) ~[io.airbyte-airbyte-server-0.50.33.jar:?]
	at io.airbyte.server.apis.$SchedulerApiController$Definition$Exec.dispatch(Unknown Source) ~[io.airbyte-airbyte-server-0.50.33.jar:?]
	at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:371) ~[micronaut-inject-3.10.1.jar:3.10.1]
	at io.micronaut.context.DefaultBeanContext$4.invoke(DefaultBeanContext.java:594) ~[micronaut-inject-3.10.1.jar:3.10.1]
	at io.micronaut.web.router.AbstractRouteMatch.execute(AbstractRouteMatch.java:303) ~[micronaut-router-3.10.1.jar:3.10.1]
	at io.micronaut.web.router.RouteMatch.execute(RouteMatch.java:111) ~[micronaut-router-3.10.1.jar:3.10.1]
	at io.micronaut.http.context.ServerRequestContext.with(ServerRequestContext.java:103) ~[micronaut-http-3.10.1.jar:3.10.1]
	at io.micronaut.http.server.RouteExecutor.lambda$executeRoute$14(RouteExecutor.java:659) ~[micronaut-http-server-3.10.1.jar:3.10.1]
	at reactor.core.publisher.FluxDeferContextual.subscribe(FluxDeferContextual.java:49) ~[reactor-core-3.5.5.jar:3.5.5]
	at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62) ~[reactor-core-3.5.5.jar:3.5.5]
	at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194) ~[reactor-core-3.5.5.jar:3.5.5]
	at io.micronaut.reactive.reactor.instrument.ReactorInstrumentation.lambda$init$0(ReactorInstrumentation.java:62) ~[micronaut-runtime-3.10.1.jar:3.10.1]
	at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84) ~[reactor-core-3.5.5.jar:3.5.5]
	at reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37) ~[reactor-core-3.5.5.jar:3.5.5]
	at io.micronaut.scheduling.instrument.InvocationInstrumenterWrappedCallable.call(InvocationInstrumenterWrappedCallable.java:53) ~[micronaut-context-3.10.1.jar:3.10.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.lang.Thread.run(Thread.java:1589) ~[?:?]
2024-05-22 15:40:48 INFO i.m.h.s.n.h.a.e.AccessLog(log):125 - 16-0-101-0.airbyte-airbyte-webapp-svc.dev.svc.cluster.local - - [22/May/2024:15:40:43 +0000] "POST /api/v1/scheduler/destinations/check_connection HTTP/1.0" 200 597```
The airbyte-temporal logs (no errors at that time):
```{"level":"info","ts":"2024-05-22T15:38:30.264Z","msg":"none","service":"matching","component":"matching-engine","wf-task-queue-name":"1@airbyte-worker-79744d44ff-jzkqq:926328e8-4d8b-4426-a6ed-b389af190af9","wf-task-queue-type":"Workflow","wf-namespace":"default","lifecycle":"Started","logging-call-at":"taskQueueManager.go:292"}
{"level":"info","ts":"2024-05-22T15:38:30.286Z","msg":"none","service":"matching","component":"matching-engine","wf-task-queue-name":"1@airbyte-worker-79744d44ff-jzkqq:73310b14-0d08-4dc8-a4b0-6ddb59d02b4f","wf-task-queue-type":"Workflow","wf-namespace":"default","lifecycle":"Started","logging-call-at":"taskQueueManager.go:292"}```
---
To highlight one error log line: it does not say why the check connection is failing:
`Caused by: io.airbyte.workers.exception.WorkerException: Error checking connection status: no ./ outputted` 
Please help me out here: how can I debug this further? Or, if you have a resolution for this, please share it.
Thanks in advance!!!

---
*Deployed On*: Kubernetes v1.28 (EKS)
*Airbyte Version: _v0.50.33_*
*Helm Chart Version: _0.49.6_*
Deployed Via ArgoCD

FYI: <@U0333LTETKQ> <@U05BMQPJMGE> <@U03C20E10G6> <@U038BF2DY72>  Team please add if I missed anything.

---

This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1716396543014009) if you want to access the original thread.


As always, it depends :wink: on how much data and how many streams you have for the connector.
In my project, due to the huge amount of data and more than 2000 streams (one stream per sharded table) in the BigQuery connector, we use the following configuration for the jobs' resources, and it has been stable for us:

```    resources:
      requests:
        cpu: 2048m
        memory: 4Gi
      limits:
        cpu: 4096m
        memory: 8Gi```
It might be a bit of overprovisioning; adjust to your needs.
Keep in mind that connectors written in Java and running on the JVM will quite often require more memory than those written in Python, but it also depends on what the connector is doing.

There’s a bit of advice on memory [here](https://docs.airbyte.com/operator-guides/scaling-airbyte#memory); not sure if you’ve already seen it :slightly_smiling_face:

We have identified the issue: the sync pods were actually failing with exit code 137, which points to memory pressure (although the pods were not reported in the OOMKilled state).
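For readers unsure how exit code 137 relates to memory, here is a minimal Python sketch of the usual convention that a process killed by signal N exits with 128 + N. The helper name `describe_exit_code` and the small signal table are illustrative assumptions, not part of Airbyte or Kubernetes. Note that SIGKILL is what the kernel OOM killer sends, but it can come from other sources too, so 137 is a strong hint rather than proof.

```python
# Common POSIX signal numbers; a container killed by signal N usually exits with 128 + N.
KNOWN_SIGNALS = {2: "SIGINT", 9: "SIGKILL", 15: "SIGTERM"}


def describe_exit_code(code: int) -> str:
    """Return a short, human-readable reading of a container exit code (hypothetical helper)."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        name = KNOWN_SIGNALS.get(signum, f"signal {signum}")
        return f"terminated by {name}"
    # Codes from 1 to 128 are chosen by the application itself.
    return f"application error (exit code {code})"


print(describe_exit_code(137))  # the sync pods in this thread
print(describe_exit_code(3))    # the check connection subprocess in the worker log
```

Under this reading, the sync pods (137) were killed from outside the process, while the check connection subprocess (exit code 3) exited on its own with an application-level error.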
Initially we had applied this resource quota on airbyte jobs:

```    resources:
      requests:
        cpu: 100m
        memory: 25Mi
      limits:
        cpu: 200m
        memory: 50Mi```
We updated the resource quota to the following and our sync pods started working fine.
```    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: '200m'
        memory: 1Gi ```
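To put the two memory limits side by side, here is a rough Python sketch that converts Kubernetes-style binary memory quantities to bytes. The function `memory_to_bytes` is a hypothetical helper written for this comparison; it assumes whole-number values with the binary suffixes `Ki`, `Mi`, `Gi`, `Ti` only, and does not cover decimal suffixes or fractional quantities.

```python
# Binary (power-of-two) suffixes used in Kubernetes memory quantities.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def memory_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '50Mi' or '1Gi' to bytes (hypothetical helper)."""
    quantity = quantity.strip()
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    # No recognized suffix: treat the value as plain bytes.
    return int(quantity)


old_limit = memory_to_bytes("50Mi")  # original limit from this thread
new_limit = memory_to_bytes("1Gi")   # updated limit from this thread
print(f"old limit: {old_limit} bytes, new limit: {new_limit} bytes")
print(f"the new limit is about {new_limit / old_limit:.1f}x the old one")
```

A limit of 50Mi is very small for a JVM-based connector, which is consistent with the pods being killed under memory pressure before the change.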

If anyone can suggest recommended resource quota values for a production environment, please share your insights :bulb:

