Using Airflow to trigger an Airbyte connection

Summary

Using Airflow to trigger an Airbyte connection and determining the appropriate connector to use.


Question

Hello,

I deployed Airbyte and Airflow on 2 k8s pods (within the same network).
I want to use Airflow to trigger one of my Airbyte connections.

Which connector do I have to use in Airflow?
Should I use the Airbyte or the HTTP connector?

I tried HTTP (with the host set to my k8s private endpoint), without success…



This topic has been created from a Slack thread to give it more visibility.
It is in read-only mode here.


["airflow", "airbyte", "connector", "http-connector", "kubernetes", "deployment"]

Here is the error when I try http:

```
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='172.16.0.2', port=8001): Max retries exceeded with url: /api/v1/connections/sync (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7838d7851000>, 'Connection to 172.16.0.2 timed out. (connect timeout=None)'))
During handling of the above exception, another exception occurred:
```
I tried different ports (8080, 8000, 8001) with the same result.

What do you get from `kubectl get services` in the namespace where Airbyte is deployed?

```
NAME                                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
airbyte-airbyte-connector-builder-server-svc   NodePort    34.118.226.19    <none>        80:31967/TCP   3d20h
airbyte-airbyte-server-svc                     ClusterIP   34.118.227.218   <none>        8001/TCP       3d20h
airbyte-airbyte-webapp-svc                     ClusterIP   34.118.232.102   <none>        80/TCP         3d20h
airbyte-db-svc                                 ClusterIP   34.118.239.210   <none>        5432/TCP       3d20h
airbyte-minio-svc                              ClusterIP   34.118.239.182   <none>        9000/TCP       3d20h
airbyte-temporal                               ClusterIP   34.118.227.59    <none>        7233/TCP       3d20h
airbyte-workload-api-server-svc                ClusterIP   34.118.239.105   <none>        8007/TCP       3d20h
```
I also tried the cluster IP `34.118.227.218` afterwards, without success either.

You can run a quick connectivity test from the Airflow namespace:
```
kubectl run -it ubuntu --image=ubuntu:latest --restart=Never --rm -- /bin/bash
```
Then, in bash:
```
apt-get update && apt-get install -y curl dnsutils
nslookup airbyte-airbyte-webapp-svc
curl airbyte-airbyte-webapp-svc
```

Avoid IP addresses when you can: a DNS name is safer, and DNS resolution should do the trick inside a k8s cluster.
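The `nslookup` + `curl` check above can also be run from Python, for example inside an Airflow pod. The minimal sketch below assumes the usual default cluster domain `cluster.local` and a namespace named `airbyte`. Both are assumptions, and `service_fqdn` and `check_reachable` are hypothetical helper names. It reports a DNS failure separately from a TCP connect failure and always sets an explicit timeout, unlike the `connect timeout=None` in the traceback earlier in this thread.

```python
import socket
from typing import Tuple


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster DNS name of a Kubernetes Service.

    "cluster.local" is the usual default cluster domain, but it is configurable,
    so treat it as an assumption for your cluster.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


def check_reachable(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, str]:
    """Resolve host via DNS, then attempt a TCP connect with an explicit timeout.

    Returns (ok, message). A DNS failure usually means a wrong name or namespace;
    a connect failure usually means a network path problem (firewall, separate
    clusters, wrong port).
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"DNS lookup failed for {host}: {exc}"
    addresses = sorted({info[4][0] for info in infos})
    try:
        # create_connection tries each resolved address until one accepts.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:  # covers timeouts and refused connections
        return False, f"{host} resolved to {addresses}, but TCP connect on port {port} failed: {exc}"
    return True, f"{host} ({', '.join(addresses)}) accepts TCP connections on port {port}"
```

For example, `check_reachable(service_fqdn("airbyte-airbyte-server-svc", "airbyte"), 8001)` would test the server Service from the `kubectl get services` output, assuming Airbyte lives in the `airbyte` namespace.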

I had also run some connectivity tests from the pod, and they had failed… It seems I am indeed facing a networking problem. I think I will simplify the architecture and deploy Airflow in the same cluster as Airbyte. I'm not sure splitting them was a good idea (I am not a DevOps person, and my first reflex was to split them, but I have read about a few setups where Airbyte and Airflow run in the same cluster).

Thanks a lot for your guidance anyway :wink:

> I deployed Airbyte and Airflow on 2 k8s pods (within the same network).
Ah, this part misled me, because words matter. Two separate CLUSTERS is a completely different story from two PODS.

Yeah, maintaining one Kubernetes cluster will be easier and cheaper.

Yes, sorry: they are on two different clusters :smile: (my fault…)
I will try again and reach out if I need more help :wink:
Thanks again for your assistance :pray:

Hi <@U05JENRCF7C>,

Reopening the discussion here :slightly_smiling_face:
I deployed Airflow on the same node (and same cluster…) as Airbyte.
I can now reach the airbyte-airbyte-webapp-svc service from within an Airflow pod (`curl http://airbyte-airbyte-webapp-svc.airbyte.svc.cluster.local` responded successfully).

Back to my initial question: should I use the Airbyte or the HTTP connection type in Airflow? (Or should both work?)

I don’t have much Airflow experience, but I’d start with the Airbyte connection type (hopefully the dedicated one works better and is easier to configure). If you get stuck on any issue, switching to HTTP is the next option.

And which service should I use? Is it airbyte-airbyte-server-svc?

I think so. I managed to trigger a sync via `http://airbyte-airbyte-server-svc.airbyte.svc.cluster.local:8001/api/v1/connections/sync`, but only with a curl command so far.
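A minimal standard-library sketch of the same sync call, for example to run from a `PythonOperator` task. Assumptions: the endpoint takes a JSON body `{"connectionId": "<uuid>"}`, the connection UUID is a placeholder you copy from the Airbyte UI, and the internal endpoint needs no authentication header (add one if your deployment enforces auth). `build_sync_request` and `trigger_sync` are hypothetical names. The explicit `timeout` keeps the call from hanging the way the earlier `connect timeout=None` traceback did.

```python
import json
import urllib.request

# Assumption: Airbyte runs in the "airbyte" namespace and its server Service
# listens on port 8001, matching the `kubectl get services` output in this thread.
AIRBYTE_SERVER = "http://airbyte-airbyte-server-svc.airbyte.svc.cluster.local:8001"


def build_sync_request(connection_id: str, base_url: str = AIRBYTE_SERVER) -> urllib.request.Request:
    """Build the POST request that asks the Airbyte server to sync one connection."""
    body = json.dumps({"connectionId": connection_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/connections/sync",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def trigger_sync(connection_id: str, timeout: float = 30.0) -> dict:
    """Send the sync request with an explicit timeout and return the decoded JSON reply."""
    request = build_sync_request(connection_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the URL and body easy to inspect before anything touches the network.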

From what I’ve seen in the code, the webapp proxies some endpoints to the server.

Can you check `http://airbyte-airbyte-webapp-svc.airbyte.svc.cluster.local/api/public/v1/connections/sync`?

I tried it, without success (with client_id and client_secret).
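One possible reason the public URL failed: Airbyte's public API reference documents starting a sync as `POST /v1/jobs` with `{"connectionId": ..., "jobType": "sync"}`, authenticated with a bearer token obtained by exchanging client_id and client_secret at an `applications/token` endpoint, rather than a `connections/sync` path. Treat this as an assumption to verify against the API reference for your Airbyte version. The hedged sketch below builds both requests against the webapp-proxied base URL from this thread. All function and constant names are hypothetical.

```python
import json
import urllib.request

# Assumptions (verify against the Airbyte API docs for your version):
# - the webapp Service proxies the public API under /api/public/v1;
# - a bearer token is obtained from an applications/token endpoint
#   (some versions document it under /api/v1/applications/token instead).
WEBAPP = "http://airbyte-airbyte-webapp-svc.airbyte.svc.cluster.local"
TOKEN_URL = f"{WEBAPP}/api/public/v1/applications/token"
JOBS_URL = f"{WEBAPP}/api/public/v1/jobs"


def _json_post(url: str, payload: dict, token: str = "") -> urllib.request.Request:
    """Build a JSON POST request, adding a bearer token when one is given."""
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST")


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Exchange client credentials for a short-lived access token."""
    return _json_post(TOKEN_URL, {"client_id": client_id, "client_secret": client_secret})


def build_job_request(connection_id: str, access_token: str) -> urllib.request.Request:
    """Ask the public API to start a sync job for one connection."""
    return _json_post(JOBS_URL, {"connectionId": connection_id, "jobType": "sync"}, access_token)


def trigger_sync_public(connection_id: str, client_id: str, client_secret: str, timeout: float = 30.0) -> dict:
    """Fetch a token, then start the sync; returns the decoded JSON reply."""
    with urllib.request.urlopen(build_token_request(client_id, client_secret), timeout=timeout) as resp:
        access_token = json.load(resp)["access_token"]
    with urllib.request.urlopen(build_job_request(connection_id, access_token), timeout=timeout) as resp:
        return json.load(resp)
```

The client_id and client_secret are not sent with the sync request itself, only with the token request. Passing them directly to a sync endpoint would likely be rejected.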

So I guess HTTP is the way to go.
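If you go with the HTTP connection type, one way to define the Airflow connection without the UI is an environment variable named `AIRFLOW_CONN_<CONN_ID>` that holds a connection URI. In that URI, the scheme becomes the connection type. The small sketch below assumes a hypothetical connection id `airbyte_http` and the server Service on port 8001. `airflow_conn_env` and `export_line` are hypothetical helpers.

```python
import shlex
import urllib.parse
from typing import Optional, Tuple


def airflow_conn_env(conn_id: str, conn_type: str, host: str, port: Optional[int] = None) -> Tuple[str, str]:
    """Return (variable name, URI) for an Airflow connection defined via environment variable.

    Airflow looks up AIRFLOW_CONN_<CONN_ID> with the conn id upper-cased.
    """
    netloc = host if port is None else f"{host}:{port}"
    uri = urllib.parse.urlunsplit((conn_type, netloc, "", "", ""))
    return f"AIRFLOW_CONN_{conn_id.upper()}", uri


def export_line(conn_id: str, conn_type: str, host: str, port: Optional[int] = None) -> str:
    """Render the variable as a shell export line, quoting the URI if needed."""
    name, uri = airflow_conn_env(conn_id, conn_type, host, port)
    return f"export {name}={shlex.quote(uri)}"
```

On Kubernetes you would more likely set this variable in the Airflow pods' environment than export it by hand. An HTTP operator task with `http_conn_id="airbyte_http"` could then POST to `api/v1/connections/sync` with a JSON body such as `{"connectionId": "<your-connection-uuid>"}`. Check your HTTP provider version for the operator name (`HttpOperator` in recent releases, `SimpleHttpOperator` in older ones).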

I’ll check it later on my machine