Troubleshooting custom connector deployment on Airbyte running on a GKE cluster

Summary

User is facing an HTTP Internal Server Error when trying to use a custom connector on Airbyte running on a GKE cluster. The user has ensured that the cluster has access to Google Artifact Registry.


Question

Hello everyone - my Airbyte is running on a GKE cluster and I’m trying to use a custom connector. For this I put the Docker image on Google Artifact Registry and tried to add the connector in the Airbyte UI, passing my repo and image tag information; however, Airbyte returns an http.internalservererror. I already ensured that the cluster has access to Artifact Registry, but the error continues.
Has anyone already done this or tried it?



This topic has been created from a Slack thread to give it more visibility.
It is in read-only mode here.


["airbyte", "gke-cluster", "custom-connector", "docker-image", "google-artifact-registry", "http-internalservererror"]

Have you checked the logs from the pods in GKE?

How are you authenticating to Artifact Registry? And is the image in the same project?

Also, make sure that whatever service account GKE is running as (either the default or a custom one you supplied when the cluster was created) has the following IAM Role (or its component permissions):
roles/artifactregistry.reader

Generally that’s all it should take, but there are of course exceptions.
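For reference, a minimal sketch of granting that role (the project ID and service account email below are placeholders, not values from this thread):

```
# Grant Artifact Registry read access to the service account the GKE nodes run as
gcloud projects add-iam-policy-binding MY_PROJECT_ID \
  --member="serviceAccount:my-gke-sa@MY_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/artifactregistry.reader"
```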

<@U05JENRCF7C> Yes, but can’t find any information there yet.

<@U035912NS77> About the authentication: I created a service account, gave it the Artifact Registry permissions, and also applied it on the cluster. The artifact is in the same project; actually, I already use this setup on Airflow, with the same authentication.
This is the error that I’m receiving:

Which Airbyte version are you using? In some older versions there was an issue where the pod couldn’t start because of length limitations in labels or annotations (I don’t remember exactly).
Have you checked kubectl get events?

Still, I’d give the logs another try.
Sometimes I connect to the cluster, ensure that kubectl works, and use the stern tool:
https://github.com/stern/stern
I run `stern --tail 0 .`, then click in the user interface, and Ctrl+C to stop capturing more logs.
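A minimal sketch of that workflow (the `airbyte` namespace is an assumption; adjust it to your install):

```
# Look for scheduling or image-pull failures surfaced as cluster events
kubectl get events -n airbyte --sort-by=.lastTimestamp | tail -n 20

# Stream only new log lines from every pod in the namespace, then trigger
# the connector in the Airbyte UI and press Ctrl+C to stop capturing
stern --namespace airbyte --tail 0 .
```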

<@U05JENRCF7C> I’m using version 0.63.4 right now. I haven’t checked the events, but I will do that now and also use this tool that you shared. Thanks a lot.

You may also want to check whether the service account set up on the GKE cluster (under the Security section) is the same as the one set in the serviceAccountName field of your values.yaml (if not, you may not be granting permissions on the right account)
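A quick way to see which account the worker pods actually run as (the `airbyte-worker` deployment name and `airbyte` namespace are assumptions):

```
# Print the service account attached to the worker pod template
kubectl -n airbyte get deployment airbyte-worker \
  -o jsonpath='{.spec.template.spec.serviceAccountName}{"\n"}'
```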

If you don’t see log entries related to the container image not being found or such, I would look specifically for an auth error in Cloud Logging—it’s possible that it either isn’t using the service account you’re expecting it to, or that there are additional grants that are missing.

<@U035912NS77> I already have a credential that is used to access the Artifact Registry and I’m trying to use the same one, because my image is in the same repo.
I attached the log from the worker pod from when I try to add the image in the UI.

if you look in Cloud Logging (https://console.cloud.google.com/logs) around that time, do you see any auth/permission errors listed?

I checked the logs and can’t find any log with an authentication or permission error related to this, even looking at the worker pod. But I don’t know if I did it correctly. Do you have a query example for this?
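For anyone reading along, a filter like the sketch below can surface permission errors; the exact payload strings and time window are assumptions:

```
# Search recent GKE container logs for auth/permission failures
gcloud logging read \
  'resource.type="k8s_container" AND severity>=ERROR AND (textPayload:"PERMISSION_DENIED" OR textPayload:"403" OR textPayload:"denied")' \
  --freshness=1h --limit=50
```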

This is the message shown on the worker pod when running from the UI:

```
Using existing AIRBYTE_ENTRYPOINT: python /airbyte/integration_code/main.py
2024-07-08T22:29:14.203911847Z Waiting on CHILD_PID 7
2024-07-08T22:29:14.204103586Z PARENT_PID: 1
2024-07-08T22:29:16.117158405Z EXIT_STATUS: 139
```

We’ve set the var JOB_KUBE_MAIN_CONTAINER_IMAGE_PULL_SECRET as suggested here https://docs.airbyte.com/operator-guides/using-custom-connectors/.
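For context, the value of that variable names a Kubernetes image pull secret in the Airbyte namespace. A sketch of creating one for Artifact Registry from a service-account key, where the registry host, key file, namespace, and secret name are all assumptions:

```
# Create a docker-registry secret the job pods can use to pull from Artifact Registry
kubectl -n airbyte create secret docker-registry gcp-service-account \
  --docker-server=us-central1-docker.pkg.dev \
  --docker-username=_json_key \
  --docker-password="$(cat sa-key.json)"
```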

However, after setting it we got this error:

```
[map[name:JOB_KUBE_MAIN_CONTAINER_IMAGE_PULL_SECRET value:gcp-service-account] map[name:JOB_KUBE_MAIN_CONTAINER_IMAGE_PULL_SECRET valueFrom:map[configMapKeyRef:map[key:JOB_KUBE_MAIN_CONTAINER_IMAGE_PULL_SECRET name:airbyte-0-1720015847-airbyte-env]]] map[name:SECRET_PERSISTENCE value:<nil>]]
 doesn't match $setElementOrder list:
```

<@U06SV3WK399> helm has issues updating resources sometimes. The fastest way for me was to delete the worker deployment (kubectl delete deployment ...) and repeat the helm install/upgrade. I suggest not having active synchronizations when doing that.
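A sketch of that sequence (release name, namespace, and deployment name are assumptions):

```
# Remove the stuck deployment, then let helm recreate it
# (assumes the chart repo was added with: helm repo add airbyte https://airbytehq.github.io/helm-charts)
kubectl -n airbyte delete deployment airbyte-worker
helm upgrade --install airbyte airbyte/airbyte -n airbyte -f values.yaml
```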

<@U05JENRCF7C> in which part of the helm chart did you add this var, JOB_KUBE_MAIN_CONTAINER_IMAGE_PULL_SECRET? We are trying to add it under the worker part, in extraEnv.

You may also want to check the environment variables listed on the deployment; most of the time that I see the `The order in patch list . . . doesn't match $setElementOrder list` error, it’s actually because the value is duplicated (i.e. already being merged in the templates, and doesn’t need to be passed in extraEnv). Not sure if that’s the case on this one, but worth checking
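One way to spot such a duplicate (deployment name and namespace assumed, as above):

```
# A count greater than 1 means the variable is defined twice on the deployment
kubectl -n airbyte get deployment airbyte-worker -o yaml \
  | grep -c "JOB_KUBE_MAIN_CONTAINER_IMAGE_PULL_SECRET"
```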

Cool, we were able to set the var; at least that part is working. But we’re still getting this message from the pod when we try to pull the custom connector from Artifact Registry.

```
2024-07-08T22:29:14.203911847Z Waiting on CHILD_PID 7
2024-07-08T22:29:14.204103586Z PARENT_PID: 1
2024-07-08T22:29:16.117158405Z EXIT_STATUS: 139
```
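For what it’s worth, an exit status above 128 means the container’s main process was killed by a signal (status minus 128), which any POSIX shell can decode:

```
# 139 - 128 = 11, i.e. SIGSEGV: the connector process crashed with a segfault
kill -l $((139 - 128))
```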