Summary
An Airflow DAG is throwing a network error when trying to trigger the Airbyte sync job. The error says the connection to Airbyte (host.docker.internal:8000) is unreachable (Max retries exceeded). Looking for ideas on what might be causing the issue or things to check.
Question
Hey all,
I’ve been setting up Airbyte and Airflow, and I have both running smoothly in Docker. I’ve also successfully created a connection in Airbyte to sync data and set up a DAG in Airflow to trigger the sync - sadly the DAG is not working.
However, I’m now facing an issue I can’t quite figure out. My Airflow DAG is throwing a network error when trying to trigger the Airbyte sync job (the first task in my DAG). The error says that the connection to Airbyte (host.docker.internal:8000) is unreachable (Max retries exceeded), but I’m not sure why this is happening, as everything is set up similarly to how it is explained in the tutorial "Airflow & Airbyte: Better Together": https://airbyte.com/tutorials/how-to-use-airflow-and-airbyte-together
Any ideas on what might be going wrong or things I could check? Appreciate any pointers!
["airflow", "airbyte", "docker", "network-error", "sync-job", "dag", "host.docker.internal"]
Is Airbyte installed with abctl? If so, have you connected Airflow to the kind network?
No, it’s still the standard setup without abctl: basically just one ridiculously big compose file. One docker compose up -d and everything just starts up fine.
I believe the problem is in the Airflow Connection (here I use host.docker.internal as the host with port 8000). It did work in the past, but today for some reason it stopped working.
Do all containers belong to a single network?
If so, you should be able to use DNS names.
What output do you get for:
docker inspect <airflow container name> | jq ".[0].NetworkSettings.Networks"
docker inspect <airbyte-proxy container name> | jq ".[0].NetworkSettings.Networks"
(jq needs to be installed: https://jqlang.github.io/jq/)
Yes, all belong to the same single network.
I also checked that (just to make sure) using docker network inspect <network name>: all my services are there.
For the airflow-worker:
```{
"b_b": {
"IPAMConfig": null,
"Links": null,
"Aliases": [
"b-airflow-worker-1",
"airflow-worker"
],
"MacAddress": "02:42:ac:12:00:15",
"DriverOpts": null,
"NetworkID": "36a6d64029a6667187f080aaa51df530576cd9ef2ae80a8df31d957bc522cbf6",
"EndpointID": "ab40b5552230a3a16207d0899638fec34721339c94e834640c4eff71e0e42e5c",
"Gateway": "172.18.0.1",
"IPAddress": "172.18.0.21",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"DNSNames": [
"b-airflow-worker-1",
"airflow-worker",
"cc0d02a4bc2a"
]
}
}```
For airbyte-worker:
```{
"b_b": {
"IPAMConfig": null,
"Links": null,
"Aliases": [
"b-airbyte-worker-1",
"airbyte-worker"
],
"MacAddress": "02:42:ac:12:00:1a",
"DriverOpts": null,
"NetworkID": "36a6d64029a6667187f080aaa51df530576cd9ef2ae80a8df31d957bc522cbf6",
"EndpointID": "bd1e5f06aa1b1ff72864daca5faa01cae48948965d1c1dff2143419899f7dd29",
"Gateway": "172.18.0.1",
"IPAddress": "172.18.0.26",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"DNSNames": [
"b-airbyte-worker-1",
"airbyte-worker",
"71ed37fda80e"
]
}
}```
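One way to read the two outputs above is to compare them programmatically. The following is only a rough sketch: the helper names `shared_networks` and `names_on` are made up for illustration, and the JSON is a trimmed copy of the posted outputs (only the `NetworkID` and `DNSNames` fields). It reports which networks both containers share and which DNS names the second container answers to on that network.

```python
import json

# Trimmed copies of the two inspect outputs posted above (only the fields used here).
AIRFLOW_WORKER = json.loads("""
{"b_b": {"NetworkID": "36a6d64029a6667187f080aaa51df530576cd9ef2ae80a8df31d957bc522cbf6",
         "DNSNames": ["b-airflow-worker-1", "airflow-worker", "cc0d02a4bc2a"]}}
""")
AIRBYTE_WORKER = json.loads("""
{"b_b": {"NetworkID": "36a6d64029a6667187f080aaa51df530576cd9ef2ae80a8df31d957bc522cbf6",
         "DNSNames": ["b-airbyte-worker-1", "airbyte-worker", "71ed37fda80e"]}}
""")


def shared_networks(networks_a, networks_b):
    """Return the names of networks that both containers are attached to.

    A network counts as shared only if the name and the NetworkID both match.
    """
    return sorted(
        name
        for name, settings in networks_a.items()
        if name in networks_b
        and settings.get("NetworkID") == networks_b[name].get("NetworkID")
    )


def names_on(networks, network_name):
    """Return the DNS names a container is registered under on one network."""
    return networks[network_name].get("DNSNames", [])


if __name__ == "__main__":
    for name in shared_networks(AIRFLOW_WORKER, AIRBYTE_WORKER):
        print(name, names_on(AIRBYTE_WORKER, name))
```

For the outputs in this thread, both containers sit on `b_b` with the same `NetworkID`, so the names listed under `DNSNames` should resolve from the Airflow container. The same comparison can be repeated with the output for the airbyte-proxy or airbyte-server container.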
So instead of using host.docker.internal, you mean I should use one of the service names?
How do I know which one to use? My best guess would be airbyte-server.
Yes. You can use airbyte-server … or airbyte-proxy.
Instead of asking, you could just run a few experiments to check whether it works.
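A minimal sketch of such an experiment, using only the Python standard library: it tries to open a TCP connection to each `host:port` given on the command line and reports whether it succeeded. The function names `can_connect` and `parse_target` are hypothetical, and the ports in the usage example are assumptions that should be checked against your own compose file.

```python
import socket
import sys


def parse_target(target):
    """Split a 'host:port' string into (host, port)."""
    host, _, port = target.rpartition(":")
    return host, int(port)


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Targets are passed as arguments, e.g. airbyte-server:8001
    for target in sys.argv[1:]:
        host, port = parse_target(target)
        status = "reachable" if can_connect(host, port) else "unreachable"
        print(f"{host}:{port} -> {status}")
```

Saved as, say, `check.py` and run from inside the Airflow worker container (for example `python check.py airbyte-server:8001 airbyte-proxy:8000 host.docker.internal:8000`), it shows which of the candidate hosts the container can actually reach. A successful TCP connection only proves the port is open; it does not confirm that the Airbyte API endpoint the provider calls is available.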
Keep in mind that depending on which versions of Airbyte and Airflow you are using, there might be some issues with endpoints
Recently this pull request was merged https://github.com/apache/airflow/pull/41122, but I think it’s not released yet.