502 Timeout When Creating Connection

  • Is this your first time deploying Airbyte?: Yes!

  • Deployment: Kubernetes (AWS EKS v1.21) Helm Chart v0.3.2

  • Airbyte Version: 0.35.12-alpha

  • Source name/version: Gitlab v0.1.5

  • Step: Connection Creation

I’ve deployed Airbyte successfully to our K8s cluster via the Helm chart. I suspect this is an issue similar to #594, where Airbyte is unable to communicate with Temporal.

When I attempt to validate the GitLab connection, I see a pod created for GitLab; however, the webapp receives a 502 timeout.

NAME                                                              READY   STATUS      RESTARTS   AGE
airbyte-bootloader                                                0/1     Completed   0          5h41m
airbyte-minio-5599f47df9-6s2cv                                    1/1     Running     7          5h40m
airbyte-pod-sweeper-6d48fbbdc6-9cl2c                              1/1     Running     0          5h40m
airbyte-postgresql-0                                              1/1     Running     0          5h41m
airbyte-scheduler-6cbdbc8d96-sjfsd                                1/1     Running     0          5h40m
airbyte-server-85859cf846-skrqd                                   1/1     Running     0          5h40m
airbyte-temporal-fd66cbf59-rdm8d                                  1/1     Running     0          5h40m
airbyte-webapp-dcdcdcf74-74mcn                                    1/1     Running     0          5h40m
airbyte-worker-76dd6c68f4-9mdrq                                   1/1     Running     0          5h40m
source-gitlab-sync-5e54fde4-7122-4a6d-a504-7a44a056b8f3-0-hbesx   4/4     Running     0          8m37s

I see that Temporal exposes a port. Does that port need to be reachable over HTTP, or is it only used for pod-to-pod network calls?
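For what it’s worth, Temporal’s frontend port (7233 by default) is a gRPC endpoint that only needs to be reachable pod-to-pod inside the cluster, not over HTTP. A quick connectivity sanity check, assuming the service is named `airbyte-temporal` and using the worker pod name from the listing above (adjust names and namespace to your deployment):

```shell
# Check that the worker pod can open a TCP connection to Temporal's gRPC port.
kubectl exec -it airbyte-worker-76dd6c68f4-9mdrq -- \
  nc -zv airbyte-temporal 7233

# If nc is not available in the image, a plain bash TCP connect test works too:
kubectl exec -it airbyte-worker-76dd6c68f4-9mdrq -- \
  bash -c 'timeout 3 bash -c "</dev/tcp/airbyte-temporal/7233" && echo open || echo closed'
```

If the port reports closed, check the `airbyte-temporal` Service and any NetworkPolicies before digging further into the workflow errors.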

I have attached the server logs. The scheduler logs downloaded from the UI were empty, however, so I pulled them from the pod manually.

scheduler.log (6.9 KB)
server-logs.txt (692.9 KB)

Here are the worker logs as well, which also seem to point to a connection issue.

2022-05-24 20:12:25 ERROR i.t.i.s.WorkflowExecuteRunnable(logWorkflowExecutionException):125 - Workflow execution failure WorkflowId=14f1807a-5034-40df-8eb2-cde93b740a76, RunId=c70e1b91-6c8a-48a9-85cd-8968cec07ad6, WorkflowType=CheckConnectionWorkflow
io.temporal.failure.ActivityFailure: scheduledEventId=5, startedEventId=6, activityType='Run', activityId='b6fc9175-c295-3a7d-bee4-5e0166bef996', identity='', retryState=RETRY_STATE_NON_RETRYABLE_FAILURE
	at java.lang.Thread.getStackTrace(Thread.java:1610) ~[?:?]
	at io.temporal.internal.sync.ActivityStubBase.execute(ActivityStubBase.java:48) ~[temporal-sdk-1.6.0.jar:?]
	at io.temporal.internal.sync.ActivityInvocationHandler.lambda$getActivityFunc$0(ActivityInvocationHandler.java:77) ~[temporal-sdk-1.6.0.jar:?]
	at io.temporal.internal.sync.ActivityInvocationHandlerBase.invoke(ActivityInvocationHandlerBase.java:70) ~[temporal-sdk-1.6.0.jar:?]
	at jdk.proxy2.$Proxy43.run(Unknown Source) ~[?:?]
	at io.airbyte.workers.temporal.check.connection.CheckConnectionWorkflowImpl.run(CheckConnectionWorkflowImpl.java:28) ~[io.airbyte-airbyte-workers-0.35.12-alpha.jar:?]
	at jdk.internal.reflect.GeneratedMethodAccessor338.invoke(Unknown Source) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
	at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation$RootWorkflowInboundCallsInterceptor.execute(POJOWorkflowImplementationFactory.java:317) ~[temporal-sdk-1.6.0.jar:?]
	at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation.execute(POJOWorkflowImplementationFactory.java:292) ~[temporal-sdk-1.6.0.jar:?]
	at io.temporal.internal.sync.WorkflowExecuteRunnable.run(WorkflowExecuteRunnable.java:72) ~[temporal-sdk-1.6.0.jar:?]
	at io.temporal.internal.sync.SyncWorkflow.lambda$start$0(SyncWorkflow.java:137) ~[temporal-sdk-1.6.0.jar:?]
	at io.temporal.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:101) [temporal-sdk-1.6.0.jar:?]
	at io.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:111) [temporal-sdk-1.6.0.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: io.temporal.failure.TimeoutFailure: message='activity timeout', timeoutType=TIMEOUT_TYPE_SCHEDULE_TO_CLOSE

worker.txt (598.2 KB)

Hey, I can see an error in the scheduler log about failing to connect to MinIO. Can you share the logs of the MinIO pod?

minio.txt (1.3 KB)

So, taking a hint from #594, I took our ingress out of the picture, which removed the 504 timeout. However, it now spins at “Testing connection…” and the POST /api/v1/scheduler/sources/check_connection never resolves, even though the access logs show airbyte-server returning a 200.

2022-05-26 16:33:44 INFO i.a.s.RequestLogger(filter):95 - REQ 10.172.104.5 POST 200 /api/v1/scheduler/sources/check_connection - {"sourceDefinitionId":"5e6175e5-68e1-4c17-bff9-56103bbb0d80","connectionConfiguration":"REDACTED"}

Hey, do you see any errors in the web console? If not, can you try hitting the same API through Postman with the same body and see if that works? Is that doable?
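For reference, the call can be reproduced outside the browser with curl along these lines. The path and `sourceDefinitionId` are taken from the server log above; the host/port and the connection configuration body are placeholders you would need to fill in for your deployment:

```shell
# Reproduce the check_connection call directly against the Airbyte API.
# Replace <airbyte-host>:<port> and the connectionConfiguration placeholder.
curl -v -X POST "http://<airbyte-host>:<port>/api/v1/scheduler/sources/check_connection" \
  -H "Content-Type: application/json" \
  -d '{
        "sourceDefinitionId": "5e6175e5-68e1-4c17-bff9-56103bbb0d80",
        "connectionConfiguration": { "placeholder": "your GitLab source config here" }
      }'
```

Running it with `-v` also shows whether the hang is in the TCP connection, the TLS/proxy layer, or the response body itself.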

Unfortunately, the request never completes in the browser or via cURL; it just spins.

When I terminate the request in cURL, I see the following log line in the webapp pod:

127.0.0.1 - - [26/May/2022:18:10:06 +0000] "POST /api/v1/scheduler/sources/check_connection HTTP/1.1" 499 0 "http://localhost:8000/workspaces/dbaad36f-427b-4f4c-b9a7-8a211c101a63/connections/new-connection" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:100.0) Gecko/20100101 Firefox/100.0" "-"

However, when I send the request, a new worker pod is scheduled:

NAME                                                              READY   STATUS    RESTARTS   AGE
airbyte-minio-5599f47df9-d9sj6                                    1/1     Running   0          5h20m
airbyte-pod-sweeper-6d48fbbdc6-85vc6                              1/1     Running   0          5h20m
airbyte-postgresql-0                                              1/1     Running   0          5h21m
airbyte-scheduler-6cbdbc8d96-pc579                                1/1     Running   0          5h20m
airbyte-server-85859cf846-cmrqc                                   1/1     Running   2          5h20m
airbyte-temporal-fd66cbf59-bkskh                                  1/1     Running   0          5h20m
airbyte-webapp-dcdcdcf74-2r72d                                    1/1     Running   0          5h20m
airbyte-worker-76dd6c68f4-h6p7l                                   1/1     Running   0          5h20m
...
source-gitlab-sync-5d58be9f-e843-437a-985d-56e7792b6113-0-umtyj   4/4     Running   0          4m41s
source-gitlab-sync-72291a43-3366-49de-a776-e39ca6b6481f-0-cekpc   4/4     Running   0          2m16s
source-gitlab-sync-89395c08-5396-4b68-8d33-ece1fe205ffe-0-nvmsn   4/4     Running   0          7m28s
source-gitlab-sync-b29363dc-ab9a-42b1-b327-54bb5a1ba5c6-0-pymqr   4/4     Running   0          8m1s
source-gitlab-sync-fae8da68-5d0a-4e83-8c8a-799eee7681a1-0-nhswd   4/4     Running   0          7m21s

Here are the latest logs, if it helps:

server-logs(1).txt (5.0 MB)
scheduler-logs(1).txt (74.5 KB)
worker.txt (208.5 KB)

Hey, it looks like the connection is failing per the logs. To help us narrow this down, can you try another source, such as GitHub?

I’m also testing Airbyte out with Mike, and we may have been getting the initial timeout from our Gloo service. After bypassing it, though, we’re hitting a one-hour timeout when trying to pull all groups from GitLab: it errors when attempting to test the connection. I tried limiting the start date to a date in the future, in case the test was somehow running over the full date range anyway. Is there a one-hour timeout/limitation within the app?

Hey, what response do you see for that failed call?

It looks like it was a 504 Gateway Timeout on check_connection_for_update, right at the one-hour mark.

Hey, it would be great to test with a smaller source like GitHub, CSV, or a local Postgres, to understand whether the error is in the platform or specific to GitLab. Could you help with this information?

Same thing here. We have a local GitLab deployment with hundreds of projects. Airbyte 0.36.7-alpha and 0.39.23-alpha both show the same behaviour. It must be the connector, which is still in alpha.

It won’t get past the source test when trying to create the GitLab source. Maybe it works with small GitLab deployments, but not with ours. I can see it trying to fetch info for every project/group/etc.

Yeah, it could be that the check is taking longer than the server allows, and the server is timing out. You could check the server timeout and try increasing it.
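If the 504 is actually coming from the proxy layer rather than Airbyte itself, the timeout can usually be raised there. As a sketch, with an NGINX ingress in front of the webapp the relevant annotations look like this (your Gloo setup will have its own equivalent settings; the resource name here is illustrative):

```yaml
# Raise the NGINX ingress proxy timeouts above the default 60s,
# so long-running check_connection calls are not cut off by the proxy.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: airbyte-webapp
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
```

That only moves the proxy limit, of course; if the connector genuinely needs more than an hour to enumerate all groups, the check itself may still need to be made cheaper on the connector side.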