Context
Hello!
We are using Airbyte 0.39.42-alpha with Docker Compose and are setting it up to send metrics using OpenTelemetry, based on the following documentation and threads:
- Collecting Metrics | Airbyte Documentation
- How can I monitor Airbyte OSS on Kubernetes using Prometheus?
- Setup Datadog monitoring
According to the documentation, we have updated the Docker Compose stack to:
- set up the airbyte-metrics-reporter service for OpenTelemetry
- set up the airbyte-worker service for OpenTelemetry
- set up the opentelemetry-collector service to handle OTEL gRPC calls and expose metrics using the Prometheus exporter
Additionally, we have set up:
- Prometheus to scrape data from opentelemetry-collector
- Grafana to display Prometheus metrics
Issue
When the airbyte-metrics-reporter service emits metrics using the OpenTelemetry SDK, the following warnings are logged:
airbyte-metrics-reporter | Aug 08, 2022 3:21:29 PM io.opentelemetry.sdk.internal.ThrottlingLogger doLog
airbyte-metrics-reporter | WARNING: Instrument oldest_running_job_age_secs has recorded multiple values for the same attributes.
airbyte-metrics-reporter | Aug 08, 2022 3:21:29 PM io.opentelemetry.sdk.internal.ThrottlingLogger doLog
airbyte-metrics-reporter | WARNING: Instrument num_running_jobs has recorded multiple values for the same attributes.
airbyte-metrics-reporter | Aug 08, 2022 3:21:29 PM io.opentelemetry.sdk.internal.ThrottlingLogger doLog
airbyte-metrics-reporter | WARNING: Instrument oldest_pending_job_age_secs has recorded multiple values for the same attributes.
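Our reading of the opentelemetry-java SDK (not verified for every version) is that this warning is logged when an asynchronous, callback-based instrument reports more than one value for the same attribute set during a single collection, in which case the SDK keeps the first value and discards the later ones. The following Python sketch is a simplified model of that rule, not SDK code; the `collect` function and its types are hypothetical names used only for illustration:

```python
import logging
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical model types: an attribute set is a frozenset of (key, value) pairs.
Attributes = FrozenSet[Tuple[str, str]]
Record = Callable[[float, Attributes], None]
Callback = Callable[[Record], None]

log = logging.getLogger("otel-model")


def collect(instrument: str, callbacks: List[Callback]) -> Dict[Attributes, float]:
    """Run each callback once; keep only the first value recorded per attribute set."""
    points: Dict[Attributes, float] = {}

    def record(value: float, attributes: Attributes) -> None:
        if attributes in points:
            # Same wording as the SDK warning; the newer value is discarded.
            log.warning("Instrument %s has recorded multiple values for the same attributes.", instrument)
            return
        points[attributes] = value

    for callback in callbacks:
        callback(record)
    return points


no_attributes: Attributes = frozenset()
# Prints {frozenset(): 0} and logs the warning once: the value 2 is dropped.
print(collect("num_running_jobs", [lambda rec: rec(0, no_attributes), lambda rec: rec(2, no_attributes)]))
```

Under this model, values recorded with distinct attribute sets are all kept; only repeated observations of the same attribute set are dropped.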
When sync jobs are running, the gauges corresponding to the number of pending and running jobs do not seem to be updated accordingly, e.g. with two sync jobs running:
$ curl --silent http://localhost:8889/metrics | rg 'num_running'
# HELP airbyte_num_running_jobs number of running jobs
# TYPE airbyte_num_running_jobs gauge
airbyte_num_running_jobs{job="metrics-reporter"} 0
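To check whether the gauges ever move while jobs start and finish, the exporter endpoint can be sampled repeatedly rather than with a single `curl`. A minimal standard-library Python sketch, assuming the collector's Prometheus exporter is reachable at `http://localhost:8889/metrics` as published in the Compose file; `gauge_values` and `poll` are our own hypothetical helpers and only handle simple sample lines:

```python
import time
import urllib.request
from typing import Dict

METRICS_URL = "http://localhost:8889/metrics"  # Prometheus exporter port published by otel-collector


def gauge_values(exposition: str, prefix: str = "airbyte_num_") -> Dict[str, float]:
    """Extract 'name{labels} value [timestamp]' samples whose name starts with prefix.

    # HELP / # TYPE lines are skipped because they do not start with the prefix.
    Assumes label values contain no spaces.
    """
    values: Dict[str, float] = {}
    for line in exposition.splitlines():
        if not line.startswith(prefix):
            continue
        series, value = line.split()[:2]  # ignore the optional trailing timestamp
        values[series] = float(value)
    return values


def poll(interval_s: float = 15.0, samples: int = 8) -> None:
    """Print the airbyte_num_* gauges every interval_s seconds while sync jobs run."""
    for _ in range(samples):
        with urllib.request.urlopen(METRICS_URL, timeout=5) as response:
            text = response.read().decode("utf-8")
        print(time.strftime("%H:%M:%S"), gauge_values(text))
        time.sleep(interval_s)
```

Calling `poll()` while two syncs are running should show whether `airbyte_num_running_jobs` ever leaves 0.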
This issue seems to be limited to gauges, as counters are incremented correctly.
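One plausible explanation, which we have not confirmed against the Airbyte source: if the metric client registers a new gauge callback every time a value is emitted, each collection runs all accumulated callbacks in registration order, so the oldest value (0 at startup) always wins and every newer value triggers the warning above. The sketch below contrasts that pattern with registering a single callback that reads the latest value from shared state. It reuses a first-value-wins model of the SDK, and `LeakyGauge` and `SingleCallbackGauge` are hypothetical names used only for illustration:

```python
from typing import Callable, List, Optional

Record = Callable[[float], None]  # the "record a measurement" function handed to callbacks
Callback = Callable[[Record], None]


def collect(callbacks: List[Callback]) -> Optional[float]:
    """Simplified first-value-wins collection for a gauge without attributes."""
    seen: List[float] = []

    def record(value: float) -> None:
        if not seen:  # later values for the same (empty) attribute set are dropped
            seen.append(value)

    for callback in callbacks:  # callbacks run in registration order
        callback(record)
    return seen[0] if seen else None


class LeakyGauge:
    """Hypothetical anti-pattern: a new callback is registered for every emitted value."""

    def __init__(self) -> None:
        self.callbacks: List[Callback] = []

    def set(self, value: float) -> None:
        # The default argument freezes the value at registration time.
        self.callbacks.append(lambda record, v=value: record(v))


class SingleCallbackGauge:
    """One callback registered once; it reports whatever value was set last."""

    def __init__(self) -> None:
        self.latest = 0.0
        self.callbacks: List[Callback] = [lambda record: record(self.latest)]

    def set(self, value: float) -> None:
        self.latest = value


for gauge in (LeakyGauge(), SingleCallbackGauge()):
    gauge.set(0)  # reporter starts while no jobs are running
    gauge.set(2)  # two sync jobs are now running
    print(type(gauge).__name__, collect(gauge.callbacks))
# LeakyGauge 0
# SingleCallbackGauge 2
```

In the OpenTelemetry Java API, the second pattern would roughly correspond to calling `buildWithCallback` once per gauge at startup and having the callback read a value that the reporter updates on each polling cycle.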
Configuration details
Please find the (curated) configuration related to OpenTelemetry that we used for the different services:
.env
VERSION=0.39.42-alpha
PUBLISH_METRICS="true"
METRIC_CLIENT=otel
OTEL_COLLECTOR_ENDPOINT="http://otel-collector:4317"
docker-compose.yml
services:
  worker:
    environment:
      - PUBLISH_METRICS=${PUBLISH_METRICS}
      - METRIC_CLIENT=${METRIC_CLIENT}
      - OTEL_COLLECTOR_ENDPOINT=${OTEL_COLLECTOR_ENDPOINT}
  airbyte-metrics-reporter:
    image: airbyte/metrics-reporter:${VERSION}
    logging: *default-logging
    container_name: airbyte-metrics-reporter
    environment:
      - CONFIG_DATABASE_PASSWORD=${CONFIG_DATABASE_PASSWORD:-}
      - CONFIG_DATABASE_URL=${CONFIG_DATABASE_URL:-}
      - CONFIG_DATABASE_USER=${CONFIG_DATABASE_USER:-}
      - CONFIGS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION=${CONFIGS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION:-}
      - CONFIG_ROOT=${CONFIG_ROOT}
      - DATABASE_PASSWORD=${DATABASE_PASSWORD}
      - DATABASE_URL=jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB}
      - DATABASE_USER=${DATABASE_USER}
      - PUBLISH_METRICS=${PUBLISH_METRICS}
      - METRIC_CLIENT=${METRIC_CLIENT}
      - OTEL_COLLECTOR_ENDPOINT=${OTEL_COLLECTOR_ENDPOINT}
  otel-collector:
    image: otel/opentelemetry-collector:0.57.2
    command: ["--config=/etc/otel-collector-config.yaml"]
    ports:
      - "8888:8888" # Prometheus metrics exposed by the collector
      - "8889:8889" # Prometheus exporter metrics
    volumes:
      - ./otel-collector/otel-collector-config.yaml:/etc/otel-collector-config.yaml
otel-collector/otel-collector-config.yaml
---
receivers:
  otlp:
    protocols:
      grpc: {}
processors:
  batch: {}
exporters:
  logging: {}
  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: airbyte
    send_timestamps: true
    metric_expiration: 60m
extensions:
  health_check:
  pprof:
  zpages:
service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, prometheus]
Attempts
After seeing that the following change had been merged into the OpenTelemetry Java SDK:
- feat: `otel.metrics.exporter` setting support multiple values by wallezhang · Pull Request #4466 · open-telemetry/opentelemetry-java · GitHub (available from Release Version 1.15.0 · open-telemetry/opentelemetry-java · GitHub)
I tried bumping the SDK version to 1.16.0 in Airbyte's deps.toml and rebuilding the Docker image for airbyte-metrics-reporter:
$ git clone https://github.com/airbytehq/airbyte
$ cd airbyte
$ vim deps.toml # set OTEL SDK version to 1.16.0
$ cd airbyte-metrics/reporter
$ ../../gradlew build
but observed the same behaviour: the warning messages are still logged and the gauges remain stuck at 0.
The following discussion may provide more insight into why the latest value is not emitted for Airbyte gauges:
Please let me know if you need more information to reproduce the issue; I'll also be happy to contribute fixes.
Thanks,
Aurélien