How can I monitor Airbyte OSS on Kubernetes using Prometheus?

We use the prom stack and are deploying Airbyte on our Kubernetes cluster. It looks like https://github.com/airbytehq/airbyte/pull/6529/files added a framework for emitting events. Are there any metrics currently being emitted to prometheus? If there are, what metrics are there and how can we enable them?

Thanks!


Hi @miguel-firebolt,
Airbyte does not yet expose Prometheus metrics. Internally, we use Datadog for Airbyte Cloud. If you are using Datadog too, you can set the following environment variables on your Airbyte pods:

  • PUBLISH_METRICS: true
  • DD_AGENT_HOST: your Datadog agent host
  • DD_DOGSTATSD_PORT: your Datadog DogStatsD port
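On Kubernetes, those variables go in the `env` list of each Airbyte container. The fragment below is a minimal sketch under two assumptions: the Datadog agent runs as a DaemonSet that exposes DogStatsD on each node's host port (so the node IP is the agent host), and it listens on DogStatsD's default port 8125. Adjust both to match your agent setup.

```yaml
# Fragment of a container spec (e.g. under spec.template.spec.containers[]
# of an Airbyte Deployment); not a complete manifest.
env:
  - name: PUBLISH_METRICS
    value: "true"                # Kubernetes env values must be strings, so quote it
  - name: DD_AGENT_HOST
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP # assumes the agent is reachable on the node's IP
  - name: DD_DOGSTATSD_PORT
    value: "8125"                # DogStatsD default; change if your agent differs
```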

Thanks @alafanechere! Unfortunately, we are using Prometheus instead of Datadog.

We know Prometheus is an industry standard, and we will eventually expose Prometheus metrics, but I can’t share an ETA for this at the moment.


@alafanechere, do I have to deploy the airbyte-metrics reporter for this, or is setting those env values on the scheduler enough?

Hey @Bikram Dhoju,

do I have to deploy the airbyte-metrics reporter for this, or is setting those env values on the scheduler enough?

Do you mean you want to report metrics with Datadog? If so, please open a new topic in the Guide section and we will try to explain how to set this up.

This thread contains more details about how to set up Airbyte monitoring with Datadog.

Hi, we are very interested in monitoring Airbyte with Prometheus, as we don’t have Datadog.
Do you have a workaround or hack in mind that would give us at least the job success/failure metrics before you release the Prometheus exporter? 🙂
Thanks

Hey @lucienfregosi,
I think I have good news! We recently added OpenTelemetry support, which can help you ship the metrics to Prometheus and other monitoring solutions. You will find more details in this documentation.
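With Docker Compose, the OpenTelemetry Collector runs as one more service next to Airbyte: Airbyte pushes metrics to it over OTLP, and Prometheus scrapes it. The fragment below is a sketch, not Airbyte's official setup. The service name `otel-collector`, the mounted file name `otel-collector-config.yaml`, and port 8889 for the collector's Prometheus exporter are all illustrative choices.

```yaml
services:
  otel-collector:
    # Contrib build of the collector; pin a specific tag instead of latest in practice.
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC receiver: where Airbyte sends metrics
      - "8889:8889"   # Prometheus exporter: where Prometheus scrapes them
```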


Awesome @alafanechere, I will have a look.

Thanks a lot! “Ça régale” (roughly “what a treat”), as we say in French.


Great! Let me know if it works! If it does, “ça roule” (“we’re all set”), as we say in French.

@alafanechere I tried locally with Docker and followed the tutorial steps, but nothing happened…
http://localhost:9090/api/v1/label/__name__/values doesn’t list any of the metrics defined in airbyte/OssMetricsRegistry.java (airbytehq/airbyte on GitHub, master branch).
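Prometheus only lists metric names it has actually scraped, so one thing worth checking is whether Prometheus scrapes the collector's Prometheus exporter at all. Here is a minimal `prometheus.yml` fragment. It assumes the collector is reachable as `otel-collector` and exposes its exporter on port 8889. Both values are assumptions, so use whatever your collector config sets, or `localhost:8889` if Prometheus runs outside the Compose network.

```yaml
scrape_configs:
  - job_name: otel-collector
    scrape_interval: 15s
    static_configs:
      - targets: ["otel-collector:8889"]   # collector's Prometheus exporter endpoint
```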

Hey @lucienfregosi,
Which version of Airbyte are you running? I think a missing part of the puzzle can be found here.
You need to use Airbyte > 0.39.19 and deploy an airbyte-metrics service along with the other services declared in the docker-compose file.
Something like:

  airbyte-metrics:
    image: airbyte/metrics-reporter:${VERSION}
    container_name: airbyte-metrics
    environment:
      - METRIC_CLIENT=${METRIC_CLIENT}
      - OTEL_COLLECTOR_ENDPOINT=${OTEL_COLLECTOR_ENDPOINT}
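The `${...}` variables in that service are normally filled from the `.env` file. As I understand Airbyte's OpenTelemetry docs from this period, they would resolve roughly as shown below, written as literal values in a Compose override. Treat `otel` as the `METRIC_CLIENT` value and `otel-collector` as the collector host name as assumptions to verify against the docs for your Airbyte version.

```yaml
services:
  airbyte-metrics:
    environment:
      - METRIC_CLIENT=otel                                  # assumed value selecting the OpenTelemetry client
      - OTEL_COLLECTOR_ENDPOINT=http://otel-collector:4317  # OTLP gRPC endpoint of the collector
```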

This feature is quite fresh and still not really well documented, sorry about that!

Yeah @alafanechere, it works much better with the metrics-reporter container 🙂 Thanks!
One “weird” thing is that in Prometheus the metric is called promexample_job_succeeded_by_release_stage instead of job_succeeded_by_release_stage (the same prefix appears on all the metrics), but it’s not a big issue…
Now I will try to get the same result in my production Kubernetes Airbyte environment.

Cool! I think the promexample_ prefix is not configured on the Airbyte side but is rather something to fine-tune in OpenTelemetry. I’m not familiar enough with this solution to be 100% sure, though.
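That prefix matches how the collector's `prometheus` exporter behaves: its optional `namespace` setting is prepended, with an underscore, to every exported metric name. If the collector config was copied from an OpenTelemetry example that sets `namespace: promexample` (an assumption about where the prefix came from), removing that line should give the bare metric names. Here is a sketch of the exporter section:

```yaml
exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    # namespace: promexample   # if present, each metric is exported as promexample_<name>
```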

Hello @alafanechere

I’ve taken the time to try collecting metrics in the Kubernetes cluster, and I’m kind of stuck.

I added the airbyte-metrics deployment to the Helm chart, as I did for the docker-compose test.
Then I deployed the OpenTelemetry Collector as described in the documentation (Getting Started | OpenTelemetry).
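A Kubernetes equivalent of the Compose `airbyte-metrics` service could look like the manifest below. It is a hypothetical hand-written sketch, not the Helm chart's own template. The `VERSION` placeholder, the `app: airbyte-metrics` labels, and the env values are assumptions. The reporter likely also needs the database settings your other Airbyte pods use, which are not shown here.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airbyte-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: airbyte-metrics        # hypothetical label
  template:
    metadata:
      labels:
        app: airbyte-metrics
    spec:
      containers:
        - name: airbyte-metrics
          # Replace VERSION with the tag of your other Airbyte images (> 0.39.19).
          image: airbyte/metrics-reporter:VERSION
          env:
            - name: METRIC_CLIENT
              value: "otel"       # assumed value for the OpenTelemetry client
            - name: OTEL_COLLECTOR_ENDPOINT
              value: "http://otel-collector:4317"
            # Assumption: add the database connection env vars from your chart here.
```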

I replaced the collector endpoint values in the YAML:

  • line 18: endpoint: "otel-collector.default:4317" with endpoint: "otel-collector:4317"
  • line 122: endpoint: "http://someotlp.target.com:4317" with endpoint: "http://otel-collector:4317"
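In Kubernetes, a bare host name such as `otel-collector` resolves only if a Service with that name exists in the same namespace as the calling pod. From another namespace you need the `otel-collector.<namespace>` form, which the original `otel-collector.default` value used. Below is a sketch of such a Service. It assumes the collector pods carry the hypothetical label `app: otel-collector`, so match it to your collector Deployment's labels.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
spec:
  selector:
    app: otel-collector       # hypothetical; must match the collector pods' labels
  ports:
    - name: otlp-grpc
      port: 4317              # OTLP gRPC receiver
      targetPort: 4317
    - name: prom-exporter
      port: 8889              # Prometheus exporter, if the collector exposes one
      targetPort: 8889
```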

But I got an error message on the OTel collector pod (the agent seems to work):

 grpc: addrConn.createTransport failed to connect to {
   "Addr": "otel-collector:4317",
   "ServerName": "otel-collector:4317",
   "Attributes": null,
   "BalancerAttributes": null,
   "Type": 0,
   "Metadata": null
 }. Err: connection error: desc = "transport: Error while dialing dial tcp 10.1.71.127:4317: connect: connection refused"    {"grpc_log": true}

Did I miss something?
Many thanks for your help
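One plausible reading of that error, which this thread does not confirm, is that line 122 of the Getting Started example is the collector's own outbound OTLP exporter. Pointing it at `otel-collector:4317` would make the collector try to forward data to itself. When Prometheus is the destination, that exporter can instead be a `prometheus` exporter that Prometheus scrapes. Here is a minimal collector config sketch. It assumes a collector distribution that includes the `prometheus` exporter (such as the contrib build) and uses port 8889 as an illustrative choice.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"   # listen on all interfaces so other pods can connect

processors:
  batch: {}

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"       # Prometheus scrapes metrics from this address

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```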

@alafanechere

I did manage to get the metrics into Prometheus 🎉

It was not that easy, as the documentation is rather light 🤨

I would be glad to help improve the documentation. Would that interest you?
If so, how can I contribute?

@lucienfregosi you can submit a PR correcting the documentation in the Airbyte GitHub project.
