Summary
The user is facing an issue when trying to create a source or destination on a GKE Autopilot private cluster behind a Shared VPC with IAP. The error received is `Server temporarily unavailable (http.502.my4YrLdeHmndZBTKh9j1Kr)`. The user has made the expected changes (NEG annotation, Ingress, IAP, host-project firewall rules), and the connection-check logs show successful socat connections, yet the operation still fails. They are looking for insights from others with a similar setup.
Question
Hello, I’m using a GKE Autopilot private cluster on a new (but not our first) Airbyte deployment (current app version 0.57.2, deployed via Helm with chart version 0.64.81). This is the first setup where we’ve been behind a Shared VPC (so the Airbyte project isn’t the VPC host project, just a service project). We’re using Cloud NAT for a stable outbound IP and Identity-Aware Proxy (IAP) for auth.
Overall, things deployed pretty smoothly by making only the following changes (a rough sketch of the manifests follows the list):

- Added the `cloud.google.com/neg: '{"ingress": true}'` annotation to the `airbyte-webapp-svc` Service
- Created an Ingress (external LB) for the `airbyte-webapp-svc` Service with HTTPS termination set up
- Enabled IAP for `airbyte-webapp-svc`
- Created a firewall rule in the VPC host project to allow traffic through to the Ingress LB
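For concreteness, here’s roughly what the first three changes look like as manifests. The hostnames, secret names, ports, and selectors are placeholders from my setup (and the host-project firewall rule was created outside Kubernetes), so treat this as a sketch rather than exact config:

```yaml
# Sketch only: names, host, secrets, ports, and selectors are placeholders.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: airbyte-webapp-backendconfig
spec:
  iap:
    enabled: true
    oauthclientCredentials:
      secretName: iap-oauth-client      # OAuth client ID/secret for IAP
  timeoutSec: 300                       # raised LB backend timeout (the suggestion below; no effect for me)
---
apiVersion: v1
kind: Service
metadata:
  name: airbyte-webapp-svc
  annotations:
    cloud.google.com/neg: '{"ingress": true}'   # container-native LB via NEGs
    cloud.google.com/backend-config: '{"default": "airbyte-webapp-backendconfig"}'
spec:
  selector:
    app.kubernetes.io/name: webapp      # match your chart's webapp labels
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: airbyte-webapp-ingress
spec:
  tls:
    - hosts:
        - airbyte.example.com
      secretName: airbyte-tls           # HTTPS termination at the LB
  rules:
    - host: airbyte.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: airbyte-webapp-svc
                port:
                  number: 80
```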
From there, everything seemed to work fine: IAP forced auth, Airbyte detected that it was secured, and the webapp loads and interacts correctly, including updating connector versions (which confirms that outbound Cloud NAT is working).
The only thing that doesn’t seem to work is that when I try to create a source OR destination, regardless of type, I get the following error:
`Server temporarily unavailable (http.502.my4YrLdeHmndZBTKh9j1Kr)`
After about 50 tries my BigQuery destination worked once, but I haven’t been able to get it to work since. I saw some notes like [this thread](https://discuss.airbyte.io/t/kubernetes-check-connection-issues/594/18) suggesting increasing the timeout of the created LB (the `timeoutSec` in the sketch above), which has no effect for me (and in theory pod-to-pod communication shouldn’t be going through the Ingress LB anyway). The logs on the connection-check workload always look the same, like this (in this case a Mailchimp API key connector, to reduce variables, but it happens with all of them):
```
Using existing AIRBYTE_ENTRYPOINT: python /airbyte/integration_code/main.py
Waiting on CHILD_PID 7
PARENT_PID: 1
2024/04/11 20:00:54 socat[8] N reading from and writing to stdio
2024/04/11 20:00:54 socat[8] N opening connection to AF=2 10.1.0.74:9032
2024/04/11 20:00:54 socat[8] N successfully connected from local address AF=2 10.1.1.146:45958
2024/04/11 20:00:54 socat[8] N starting data transfer loop with FDs [0,1] and [5,5]
2024/04/11 20:00:54 socat[7] N reading from and writing to stdio
2024/04/11 20:00:54 socat[7] N opening connection to AF=2 10.1.0.74:9033
2024/04/11 20:00:54 socat[7] N successfully connected from local address AF=2 10.1.1.146:43238
2024/04/11 20:00:54 socat[7] N starting data transfer loop with FDs [0,1] and [5,5]
EXIT_STATUS: 0
2024/04/11 20:01:34 socat[7] N socket 1 (fd 0) is at EOF
2024/04/11 20:01:34 socat[8] N socket 1 (fd 0) is at EOF
2024/04/11 20:01:34 socat[7] N socket 2 (fd 5) is at EOF
2024/04/11 20:01:34 socat[7] N exiting with status 0
2024/04/11 20:01:34 socat[8] N socket 2 (fd 5) is at EOF
2024/04/11 20:01:34 socat[8] N exiting with status 0
Terminated
```
I'm not seeing any logs indicating resource constraints on the pods (though they're also very short-lived for connection checks).
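In case it is resources despite the quiet logs: my understanding is that the check-job pods’ requests/limits can be pinned through the worker’s environment. The variable names below come from Airbyte’s EnvConfigs, so this is an assumption to verify against chart 0.64.81, not something I’ve confirmed fixes anything:

```yaml
# values.yaml sketch -- env var names assumed from Airbyte's EnvConfigs;
# verify they are honored by chart 0.64.81 before relying on this.
worker:
  extraEnv:
    - name: CHECK_JOB_MAIN_CONTAINER_CPU_REQUEST
      value: "500m"
    - name: CHECK_JOB_MAIN_CONTAINER_CPU_LIMIT
      value: "1"
    - name: CHECK_JOB_MAIN_CONTAINER_MEMORY_REQUEST
      value: "512Mi"
    - name: CHECK_JOB_MAIN_CONTAINER_MEMORY_LIMIT
      value: "1Gi"
```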
Is anyone else using a similar GKE Autopilot setup with any insights?
---
This topic has been created from a Slack thread to give it more visibility.
It will be on Read-Only mode here. [Click here](https://airbytehq.slack.com/archives/C021JANJ6TY/p1712872234158139) if you want to access the original thread.
[Join the conversation on Slack](https://slack.airbyte.com)
<sub>
["gke-autopilot", "private-cluster", "shared-vpc", "iap", "source-destination", "http-502-error", "ingress-lb", "firewall-rules", "connection-check", "pod-communication"]
</sub>