Hello!
I am trying to get job-related metrics sent to Datadog.
With `airbyte-metrics-reporter` configured in my `docker-compose.yaml`, I am now able to get the following metrics in my Datadog dashboard:
- est_num_metrics_emitted_by_reporter
- num_pending_jobs
- num_running_jobs
- num_active_conn_per_workspace
- oldest_pending_job_age_secs
- oldest_running_job_age_secs
- overall_job_runtime_in_last_hour_by_terminal_state_secs
But I’m still trying to get these ones:
- attempt_failed_by_failure_origin
- job_cancelled_by_release_stage
- job_failed_by_release_stage
- job_succeeded_by_release_stage
etc., which, from what I can see, are related to the `airbyte-worker` container. So I guess I need a specific configuration for `airbyte-worker` in my `docker-compose.yaml` file to get these metrics published and sent to Datadog, right? If so, what should I add? If not, what should I do to get these metrics?
Thanks in advance for your help!
Rachel
Hi @rachelr,
This is great! You are testing out a feature that is quite hidden at the moment. We plan to document it better in the future.
Did you try to add the `DD_DOGSTATSD_PORT`, `DD_AGENT_HOST` and `PUBLISH_METRICS` env vars to the `airbyte-worker` service in your `docker-compose.yaml` file too?
Do you mind sharing the snippet from your `docker-compose.yaml` that declares the `airbyte-metrics-reporter` service?
I’m sure @davinchia will have more insights to share than I do on this topic.
@rachelr Great! You are right: if you pass the same env vars to the worker deployment, you should start getting the metrics.
Hey everyone,
Thanks a lot for being so quick to help!
Regarding this:
> Did you try to add the `DD_DOGSTATSD_PORT`, `DD_AGENT_HOST` and `PUBLISH_METRICS` env vars to the `airbyte-worker` service in your `docker-compose.yaml` file too?
I did try to add the env vars to the `airbyte-worker` service, but I can't see any metrics like I do for the metrics reporter. I now suspect this comes from how we set up the Datadog agent (installed directly on the instance rather than containerised). I'm now retrying with the agent as a container. I'll keep you posted! (I'll also post the snippets if this can help anyone trying to get these metrics.)
You are right to try this, Rachel, as the `DD_AGENT_HOST` must be reachable by the worker. Having a dockerized Datadog agent will make it reachable by the worker without extra network configuration (like using `network_mode: "host"` on the worker). Keep us posted!
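For the case where the agent stays installed directly on the instance, the `network_mode: "host"` alternative mentioned above might look roughly like the untested fragment below. It assumes the host agent's DogStatsD server listens on its default UDP port 8125 on localhost. One caveat: with host networking the worker no longer joins the Compose network, so the other Airbyte services (database, Temporal, etc.) would have to be reached through ports published on the host instead of by service name.

```yaml
worker:
  image: airbyte/worker:${VERSION}
  container_name: airbyte-worker
  # Share the host's network stack, so "localhost" inside the container is the host itself.
  network_mode: "host"
  environment:
    # ... other worker env vars ...
    - PUBLISH_METRICS=true
    # Per this thread, also required since 0.39.19-alpha.
    - METRIC_CLIENT=datadog
    # Assumption: the host-installed agent's DogStatsD listens on localhost:8125.
    - DD_AGENT_HOST=localhost
    - DD_DOGSTATSD_PORT=8125
```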
Hi again!
> Having a dockerized Datadog agent will make it reachable by the worker without extra network configuration (like using `network_mode: "host"` on the worker)
You were absolutely right: I had to set `network_mode` to host, otherwise it did not work.
So now I’ve dockerised the agent with this configuration:
```yaml
datadog:
  image: gcr.io/datadoghq/agent:7
  container_name: dd-agent
  environment:
    - DD_API_KEY=${DD_API_KEY}
    - DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - /proc/:/host/proc/:ro
    - /sys/fs/cgroup:/host/sys/fs/cgroup:ro
airbyte-metrics:
  image: airbyte/metrics-reporter:${VERSION}
  container_name: airbyte-metrics
  environment:
    - PUBLISH_METRICS=true
    - DD_AGENT_HOST=dd-agent
    - DD_DOGSTATSD_PORT=8125
    - DATABASE_USER=${DATABASE_USER}
    - DATABASE_URL=${DATABASE_URL}
    - DATABASE_PASSWORD=${DATABASE_PASSWORD}
```
and it works perfectly for `airbyte-metrics-reporter` (I'm able to see the metrics in my DD dashboards, under `metrics_reporter.name_of_metric`). However, adding the three env vars to the worker container:
```yaml
worker:
  image: airbyte/worker:${VERSION}
  logging: *default-logging
  container_name: airbyte-worker
  restart: unless-stopped
  environment:
    - AIRBYTE_VERSION=${VERSION}
    - AUTO_DISABLE_FAILING_CONNECTIONS=${AUTO_DISABLE_FAILING_CONNECTIONS}
    # etc.
    - PUBLISH_METRICS=true
    - DD_AGENT_HOST=dd-agent
    - DD_DOGSTATSD_PORT=8125
```
does not seem to produce any worker metrics on my side… Am I missing something?
I've also tried adding the following to the Datadog container, with no results either:
```yaml
datadog:
  image: gcr.io/datadoghq/agent:7
  container_name: dd-agent
  environment:
    - DD_API_KEY=${DD_API_KEY}
    - DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true
    - DD_LOGS_ENABLED=true
    - DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true
    - DD_SITE=datadoghq.com
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    - /proc/:/host/proc/:ro
    - /sys/fs/cgroup:/host/sys/fs/cgroup:ro
    - /var/lib/docker/containers:/var/lib/docker/containers:ro
```
Note: I'm on version v0.38.4-alpha.
Thanks again for your help!
Thank you for sharing your setup!
@davinchia, do you think additional setup is required to collect metrics from the worker? @rachelr, are you running jobs? I think the worker only emits metrics when a job runs.
@alafanechere Yes, since it comes from `airbyte-worker`, I assumed it needed a job to send metrics. I tried with two connections:
- E2E testing (as source & destination)
- File downloaded via HTTPS as source + Local file as destination
Each time, I waited a while to see whether a worker-related metric showed up among the available metrics (`airbyte-metrics-reporter` did detect the running jobs).
I also tried adding specific Datadog labels/tags to retrieve the metrics, but it did not change anything for the worker.
Update: I've upgraded to v0.39.17-alpha, but I'm still not able to get the metrics.
If anyone has an idea, that would be awesome!
Your settings look correct to me.
Do you see logs like these coming up in the worker container?
```
airbyte-worker-54db8c4c76-wlspw worker 2022-06-23 22:04:35 INFO i.a.m.l.DogStatsDMetricClient(count):76 - publishing count, name: ATTEMPT_CREATED_BY_RELEASE_STAGE, value: 1, tags: [release_stage:alpha]
airbyte-worker-54db8c4c76-wlspw worker 2022-06-23 22:04:35 INFO i.a.m.l.DogStatsDMetricClient(count):76 - publishing count, name: ATTEMPT_CREATED_BY_RELEASE_STAGE, value: 1, tags: [release_stage:alpha]
airbyte-worker-54db8c4c76-wlspw worker 2022-06-23 22:05:32 INFO i.a.w.t.TemporalAttemptExecution(get):110 - Cloud storage job log path: /workspace/260540/0/logs.log
airbyte-worker-54db8c4c76-z9xml worker 2022-06-24 00:12:29 INFO i.a.m.l.DogStatsDMetricClient(count):76 - publishing count, name: JOB_CREATED_BY_RELEASE_STAGE, value: 1, tags: [release_stage:alpha]
airbyte-worker-54db8c4c76-wlspw worker Using cache monitor: TimePeriodBasedBufferMonitor(periodInSeconds: 60)
airbyte-worker-54db8c4c76-z9xml worker 2022-06-24 00:12:29 INFO i.a.m.l.DogStatsDMetricClient(count):76 - publishing count, name: JOB_CREATED_BY_RELEASE_STAGE, value: 1, tags: [release_stage:alpha]
```
Can you try upgrading to the latest version? We've been making improvements to our metrics implementation. Since 0.39.19-alpha, in addition to setting `PUBLISH_METRICS` to `true`, you also need to set the `METRIC_CLIENT` variable to `datadog`.
If both are set correctly, you should see log lines like the ones above in the applications emitting metrics.
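Putting these settings together, a sketch of the relevant services for 0.39.19-alpha and later might look like the fragment below. It reuses the `dd-agent` hostname and port 8125 from Rachel's setup in this thread and elides the worker's other env vars. Adding `METRIC_CLIENT` to the metrics reporter as well is an assumption based on "the applications emitting metrics" above, not something confirmed here.

```yaml
worker:
  image: airbyte/worker:${VERSION}
  container_name: airbyte-worker
  environment:
    # ... other worker env vars ...
    - PUBLISH_METRICS=true
    # Since 0.39.19-alpha: selects the Datadog client (DogStatsDMetricClient in the logs).
    - METRIC_CLIENT=datadog
    # Hostname of the dockerized agent on the Compose network.
    - DD_AGENT_HOST=dd-agent
    - DD_DOGSTATSD_PORT=8125
airbyte-metrics:
  image: airbyte/metrics-reporter:${VERSION}
  container_name: airbyte-metrics
  environment:
    - PUBLISH_METRICS=true
    # Assumption: the reporter also needs the client selected explicitly.
    - METRIC_CLIENT=datadog
    - DD_AGENT_HOST=dd-agent
    - DD_DOGSTATSD_PORT=8125
    - DATABASE_USER=${DATABASE_USER}
    - DATABASE_URL=${DATABASE_URL}
    - DATABASE_PASSWORD=${DATABASE_PASSWORD}
```

Once a job runs, `DogStatsDMetricClient` lines in the worker container's logs are a quick way to confirm that metrics are actually being published.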
Thanks a lot @davinchia, upgrading and adding `METRIC_CLIENT` did the trick!