Kubernetes check connection issues

  • Is this your first time deploying Airbyte?: Yes
  • Deployment: Kubernetes
  • Airbyte Version: 0.35.61-alpha
  • Source name/version: BigQuery / 0.1.6
  • Step: The issue is happening during adding BigQuery as a connection.
  • Description:

I am super new to Airbyte and am having issues setting up BigQuery right out of the gate, after deploying to GKE in the same project.
On the first screen after adding the Project ID and Credentials JSON I get:

The connection tests failed.
non-json response

In addition, the logs buttons through the UI do nothing.

Hi @jpcassil,
It looks like your GKE cluster is having a hard time communicating with BigQuery.
Could you please download and share the server logs so that we can try to identify the root cause?

Like I mentioned, the “download server logs” button doesn’t do anything, so that’s a bug that should probably be looked into separately.

I can grab the logs of the pods, but I am taking the next few days off. Will report back later!

@alafanechere, here are the complete server logs. I don’t see any relevant errors from the times when I get the error in the UI. I have even tried with the File connector, and it fails too.
server.log (1.7 MB)


I get this in the browser console when it fails… a 502

Thank you for sharing the logs. From what you said above, you are not able to set up any source, destination, or connection properly, which probably means the Airbyte GKE deployment is unhealthy.
I have several additional questions to help you troubleshoot this:

  • Could you please share how you deployed Airbyte to GKE? By using our Helm Chart or via Kustomize?
  • Could you send the 502 response payload by using your browser network inspector?
  • Is your cluster connected to the internet? I found this concerning log line:
    Unable to retrieve latest Source list from Github. Using the list bundled with Airbyte. This warning is expected if this Airbyte cluster does not have internet access.
  • It looks like the server can’t connect to the temporal service:
Caused by: io.grpc.netty.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: airbyte-temporal-svc/10.44.2.87:7233
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
  • Could you please share the log of the temporal pod? If it’s not too cumbersome for you, adding the scheduler and worker pod could help.

Thanks!

  • I deployed to GKE following your tutorial, via Kustomize.
  • Response payload:

<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>502 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
<h2></h2>
</body></html>
  • Is it connected to the internet? I tried ensuring that this was the case earlier. I believe we have opened all outgoing network traffic, and I tested it with curl on both the server and temporal pods:
 kubectl exec --stdin --tty airbyte-temporal-895c445d7-rn88r   -- /bin/bash 
bash-5.0# curl https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv
John,Doe,120 jefferson st.,Riverside, NJ, 08075
Jack,McGinnis,220 hobo Av.,Phila, PA,09119
"John ""Da Man""",Repici,120 Jefferson St.,Riverside, NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, 91234
,Blankman,,SomeTown, SD, 00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123
kubectl exec --stdin --tty airbyte-server-587c49495f-vhzns   -- /bin/bash
root@airbyte-server-587c49495f-vhzns:/app# apt-get update
root@airbyte-server-587c49495f-vhzns:/app# apt-get -y install curl
root@airbyte-server-587c49495f-vhzns:/app#  curl https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv
John,Doe,120 jefferson st.,Riverside, NJ, 08075
Jack,McGinnis,220 hobo Av.,Phila, PA,09119
"John ""Da Man""",Repici,120 Jefferson St.,Riverside, NJ,08075
Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, 91234
,Blankman,,SomeTown, SD, 00298
"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123
  • Is there something else I should be doing to ensure that I have internet access? Can I manually attempt to refresh the source list?
  • Temporal logs here: temporal.log (112.7 KB)

Also, what is the best way to wipe airbyte from the cluster before reinstalling airbyte from scratch?

I have done this a couple of times by deleting all of the services, deployments, secrets, persistentVolumeClaims, persistentVolumes, configmaps and pods, and then applying again. It’s possible that this reinstall workflow causes downtime in a way that is not normal. I’m a bit new to Kubernetes.

Additional logs:
scheduler.log (418.7 KB)
worker.log (251.4 KB)

I think you performed the right check. This warning can be transient: it may appear only on startup and resolve afterward. If you are able to get the list of connectors, it means the list was properly retrieved from GitHub, so internet access is probably not the issue.

Also, what is the best way to wipe Airbyte from the cluster before reinstalling airbyte from scratch?

I think what you suggested is the right approach. If you are using an external database for Airbyte, please also wipe this database. Another approach would be to delete all the Kubernetes resources in the Airbyte namespace.

I found in the worker logs that the jobs to check your source connection succeeded on the back end:
2022-04-26 13:08:21 DEBUG i.a.w.DefaultCheckConnectionWorker(run):77 - Check connection job received output: io.airbyte.config.StandardCheckConnectionOutput@7791267d[status=succeeded,message=<null>]

So the problem is probably a communication issue between the server and Temporal. I’m going to reach out to our platform team for additional support. Could you please upgrade to the latest Airbyte version beforehand?

Thank you for the help!

I am currently using 0.36.3-alpha which I think is the latest version? I wiped everything yesterday and reinstalled with a git pull from the master branch of the repo.

Do you still encounter this problem with 0.36.3-alpha? I’m going to reach out to our platform team if you do.

Yes I do.

After further research, this problem does not happen when I use kubectl port-forward svc/airbyte-webapp-svc 8000:80 to directly access the service, but does happen when I’m accessing the airbyte instance(s) through the subdomain(s).

I think it might have to do with the ingress I am using, but I don’t know what the requirements are for making it work correctly.

I was using just a backend spec to forward traffic to the airbyte-webapp-svc

resource "kubernetes_ingress" "airbyte_ingress" {
  wait_for_load_balancer = "true"
  metadata {
    name = "${var.env}-airbyte-ingress"
    annotations = {
      "ingress.gcp.kubernetes.io/pre-shared-cert"   = google_compute_managed_ssl_certificate.airbyte_ssl.name
      "kubernetes.io/ingress.global-static-ip-name" = "${var.env}-airbyte-ip"
      "kubernetes.io/ingress.allow-http"            = "false"
    }
  }
  spec {
    backend {
      service_name = "airbyte-webapp-svc"
      service_port = "80"
    }
  }
}

Is there some wizardry with TLS that I need to take into account here?

@jpcassil could you first try to set it up without HTTPS / TLS?
Is the ingress in the same namespace as the service?
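
An Ingress can only route to Services in its own namespace, so this is worth checking explicitly. Below is a minimal sketch of a plain-HTTP Ingress pinned to a namespace. The namespace `airbyte` and the Ingress name are assumptions for illustration, not values taken from this thread:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: airbyte-webapp-ingress-http   # hypothetical name
  namespace: airbyte                  # assumption: must match the namespace of airbyte-webapp-svc
spec:
  # No TLS block and no certificate annotations, so traffic is served over HTTP only
  defaultBackend:
    service:
      name: airbyte-webapp-svc
      port:
        number: 80
```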

I am also stuck with the same issue on the latest Airbyte on GKE. I still cannot figure out why the server and Temporal are unable to connect. I have also mapped external DNS to the webapp ingress.

@saxenashivang could you also provide the logs? If you’re getting the same errors, I will escalate this to a GitHub issue.

Hi.
I am also stuck with the same issue on the latest Airbyte on GKE.
I have also set up an ingress and mapped it to airbyte-webapp.
I added the Ingress resource to webapp.yaml as follows.

...
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: airbyte-webapp-ingress
spec:
  defaultBackend:
    service:
      name: airbyte-webapp-svc
      port:
        number: 80

The same non-json response is displayed as in @jpcassil's case.
I am attaching the job logs, and they don’t seem to be particularly problematic.
job.log (3.8 KB)

I have solved this problem.
The cause was the timeout setting of the load balancer created by the Ingress.

The default timeout is 30 seconds if not specifically set, and I solved the problem by setting this to a longer time.
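
For readers on GKE, one way to raise this timeout is a BackendConfig attached to the Service behind the Ingress. The following is only a sketch under assumptions: the BackendConfig name and the 600-second value are hypothetical, and the thread does not confirm that this is the exact method used here.

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: airbyte-webapp-backendconfig   # hypothetical name
spec:
  timeoutSec: 600                      # assumed value; the load balancer default is 30 seconds
---
# Fragment: add only this annotation to the existing airbyte-webapp-svc Service,
# leaving its spec (ports, selector, type) unchanged.
apiVersion: v1
kind: Service
metadata:
  name: airbyte-webapp-svc
  annotations:
    cloud.google.com/backend-config: '{"default": "airbyte-webapp-backendconfig"}'
```

The BackendConfig must live in the same namespace as the Service that references it, and the load balancer can take a few minutes to pick up the new timeout.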

That’s great to hear, thank you for your input @Nakachi-S!

Mine is set to 3600 seconds and it still fails. Please advise on what else I should try.

I’m not able to add custom connectors or make changes to existing connectors.
405a5793_ab56_434d_ae7c_e548273227d9_logs_1051_txt.txt (62.6 KB)

I have checked for internet access and it works fine.
I have Airbyte version 0.40.26 deployed using the Helm chart.
I have attached the logs from the UI.