
Service instrumentation, monitoring, and alerting with Prometheus

Julius Volz, Björn "Beorn" Rabenstein


Production Engineers, SoundCloud Ltd.
Velocity New York, 2015-10-12
Velocity Amsterdam, 2015-10-28
Architecture
Resources
Project homepage: http://prometheus.io

These slides: https://goo.gl/qTs1BI

Instructions and examples:


https://github.com/juliusv/prometheus_workshop

If you didn't download the files from the pre-work, go to


http://10.10.32.101
If I had to tell you only four things...

1. Multi-dimensional data model (like OpenTSDB).


2. Operational simplicity (unlike OpenTSDB).
3. Scalable data collection (yes, it's pull, not push).
4. Powerful query language (the same for exploring, graphing, alerting).

SOUNDCLOUD
Operational simplicity

$ go build
$ ./prometheus

Hands on!

Work through the following sections in the instructions:


Getting Prometheus (hopefully already done...)
Configuring Prometheus to monitor itself
Starting Prometheus
Using the expression browser

Architecture
Multi-dimensional data model

api_http_requests_total{method="GET", endpoint="/api/tracks", status="200"} 2034834

(like OpenTSDB)
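A minimal sketch of the data model, assuming nothing beyond the sample line above: a time series is identified by its metric name plus its full set of label pairs. The helper name formatSample is hypothetical and is not part of any Prometheus library. Labels are sorted here only so the same label set always renders the same way, which is why the order differs from the slide.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// formatSample renders one sample as: name{key="value",...} value.
// Label keys are sorted so that equal label sets yield equal strings.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(pairs, ",") + "} " +
		strconv.FormatFloat(value, 'f', -1, 64)
}

func main() {
	line := formatSample("api_http_requests_total", map[string]string{
		"method":   "GET",
		"endpoint": "/api/tracks",
		"status":   "200",
	}, 2034834)
	// Prints: api_http_requests_total{endpoint="/api/tracks",method="GET",status="200"} 2034834
	fmt.Println(line)
}
```

Changing any single label value (for example status="500") names a different time series.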

Powerful query language

topk(3, sum(rate(bazooka_instance_cpu_time_seconds_total[5m])) by (app, proc))

sort_desc(sum(bazooka_instance_memory_limit_bytes -
bazooka_instance_memory_usage_bytes) by (app, proc))

Scalable data collection

Thousands of targets.
Hundreds of thousands of samples per second.
Millions of time series.
On a single monitoring server.
Running many servers is easy, too.
Pull, not push.

Expression browser

Built-in graphing

Hands on!

Work through the following sections in the instructions:


Start the node exporter
Configure Prometheus to monitor node exporter
Use the node exporter to export the contents of a text file
Configuring targets with service discovery

Architecture
Example: Request Duration
http_request_duration_seconds_total
http_requests_total
http_request_duration_seconds_total / http_requests_total

http_request_duration_seconds

http_request_duration_seconds_sum
http_request_duration_seconds_count
http_request_duration_seconds_sum / http_request_duration_seconds_count
Request Duration Average
...and how to aggregate it.

http_request_duration_seconds_sum / http_request_duration_seconds_count

sum(http_request_duration_seconds_sum)
/
sum(http_request_duration_seconds_count)

sum(http_request_duration_seconds_sum) by (job)
/
sum(http_request_duration_seconds_count) by (job)
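Why sum the numerator and denominator separately instead of averaging the per-instance averages? A small sketch with made-up numbers (two hypothetical instances, not taken from the workshop) shows the difference: the ratio of sums weights every request equally, while the mean of ratios weights every instance equally.

```go
package main

import "fmt"

// instance holds the two series one target exposes.
type instance struct {
	durationSum float64 // http_request_duration_seconds_sum
	count       float64 // http_request_duration_seconds_count
}

// ratioOfSums mirrors sum(..._sum) / sum(..._count).
func ratioOfSums(instances []instance) float64 {
	var sum, count float64
	for _, in := range instances {
		sum += in.durationSum
		count += in.count
	}
	return sum / count
}

// meanOfRatios averages the per-instance averages (usually not what you want).
func meanOfRatios(instances []instance) float64 {
	var total float64
	for _, in := range instances {
		total += in.durationSum / in.count
	}
	return total / float64(len(instances))
}

func main() {
	instances := []instance{
		{durationSum: 10, count: 100}, // busy instance, 0.1s average
		{durationSum: 5, count: 5},    // nearly idle instance, 1.0s average
	}
	fmt.Printf("%.4f\n", ratioOfSums(instances))  // 0.1429
	fmt.Printf("%.4f\n", meanOfRatios(instances)) // 0.5500
}
```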
Request Duration Average
How to specify the time range.

rate(http_request_duration_seconds_sum[10m])
/
rate(http_request_duration_seconds_count[10m])

sum(rate(http_request_duration_seconds_sum[10m])) by (job)
/
sum(rate(http_request_duration_seconds_count[10m])) by (job)
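What the rate() ratio computes can be sketched as follows, as a deliberate simplification: it uses only the first and last sample in the window and ignores the counter resets and extrapolation that the real rate() function handles. The type and function names are hypothetical, and the sample values are made up.

```go
package main

import "fmt"

// sample is one scraped counter value at a timestamp (in seconds).
type sample struct {
	t float64
	v float64
}

// simpleRate returns the per-second increase between two samples.
// Simplified: no counter-reset handling, no extrapolation.
func simpleRate(first, last sample) float64 {
	return (last.v - first.v) / (last.t - first.t)
}

func main() {
	// A 10-minute window (600s) of the _sum and _count series.
	sumFirst, sumLast := sample{t: 0, v: 100}, sample{t: 600, v: 160}
	countFirst, countLast := sample{t: 0, v: 1000}, sample{t: 600, v: 1600}

	sumRate := simpleRate(sumFirst, sumLast)       // seconds spent per second
	countRate := simpleRate(countFirst, countLast) // requests per second

	// Average request duration over the window only, not since process start.
	fmt.Printf("%.3f\n", sumRate/countRate) // 0.100
}
```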
Prometheus Summary
Ruby, Go, legacy Java client only...

// Summary: quantiles are calculated on the client side.
durations := prometheus.NewSummary(prometheus.SummaryOpts{
	Name: "http_request_duration_seconds",
	Help: "Summary for the duration of all HTTP requests.",
	// Maps each quantile to its allowed absolute error.
	Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01},
})

durations.Observe(0.083)
durations.Observe(0.119)

http_request_duration_seconds{quantile="0.5"}
http_request_duration_seconds{quantile="0.9"}
http_request_duration_seconds_count
http_request_duration_seconds_sum
Hands on!

Work through the whole chapter "The expression language".
(End before "Instrument code: Go".)

Prometheus Histogram
Let's do the bucketing ourselves.

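// Histogram: observations are counted into buckets with fixed upper bounds.
durations := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "http_request_duration_seconds",
	Help:    "Histogram for the duration of all HTTP requests.",
	Buckets: []float64{0.02, 0.05, 0.1}, // upper bounds in seconds; +Inf is implicit
})

durations.Observe(0.153)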

http_request_duration_seconds_bucket{le="0.02"}
http_request_duration_seconds_bucket{le="0.05"}
http_request_duration_seconds_bucket{le="0.1"}
http_request_duration_seconds_bucket{le="+Inf"}
http_request_duration_seconds_count
http_request_duration_seconds_sum
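How the exposed series relate to each other can be sketched with a toy histogram. This is an illustration of the cumulative "le" (less than or equal) semantics only, and does not reflect how the Go client stores buckets internally. All names are hypothetical.

```go
package main

import "fmt"

// histogram is a toy model of the exposed series.
type histogram struct {
	upperBounds []float64 // "le" bounds, sorted ascending, +Inf excluded
	buckets     []uint64  // cumulative counts, one per upper bound
	count       uint64    // total observations; equals the le="+Inf" bucket
	sum         float64   // sum of all observed values
}

func newHistogram(upperBounds []float64) *histogram {
	return &histogram{
		upperBounds: upperBounds,
		buckets:     make([]uint64, len(upperBounds)),
	}
}

// observe counts v in every bucket whose upper bound is >= v.
func (h *histogram) observe(v float64) {
	for i, bound := range h.upperBounds {
		if v <= bound {
			h.buckets[i]++
		}
	}
	h.count++
	h.sum += v
}

func main() {
	h := newHistogram([]float64{0.02, 0.05, 0.1})

	h.observe(0.153)                // larger than every finite bound
	fmt.Println(h.buckets, h.count) // [0 0 0] 1

	h.observe(0.03)                 // falls into le="0.05" and le="0.1"
	fmt.Println(h.buckets, h.count) // [0 1 1] 2
}
```

An observation above the largest finite bound only shows up in the le="+Inf" bucket, so the highest bound should sit above the durations you care about.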
Bucketing utilities
// LinearBuckets(start, width, count): 5 buckets, 20, 25, 30, 35, 40.
linear := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "http_request_duration",
	Help:    "Histogram for the duration of all HTTP requests.",
	Buckets: prometheus.LinearBuckets(20, 5, 5),
})

// ExponentialBuckets(start, factor, count): 10 buckets, 10, 15, 22.5, ...
exponential := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "http_request_duration",
	Help:    "Histogram for the duration of all HTTP requests.",
	Buckets: prometheus.ExponentialBuckets(10, 1.5, 10),
})
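To see which bounds the two utilities yield, here is a standard-library re-implementation of the same arithmetic. It is a sketch for illustration, not the client library's code, and the lowercase function names are hypothetical stand-ins.

```go
package main

import "fmt"

// linearBuckets returns count bounds: start, start+width, start+2*width, ...
func linearBuckets(start, width float64, count int) []float64 {
	bounds := make([]float64, count)
	for i := range bounds {
		bounds[i] = start
		start += width
	}
	return bounds
}

// exponentialBuckets returns count bounds: start, start*factor, start*factor^2, ...
func exponentialBuckets(start, factor float64, count int) []float64 {
	bounds := make([]float64, count)
	for i := range bounds {
		bounds[i] = start
		start *= factor
	}
	return bounds
}

func main() {
	fmt.Println(linearBuckets(20, 5, 5)) // [20 25 30 35 40]

	exp := exponentialBuckets(10, 1.5, 10)
	fmt.Println(exp[:4]) // [10 15 22.5 33.75]
	fmt.Println(len(exp)) // 10
}
```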
Am I within SLA?
Serve 95% of requests within 300ms.

sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
/
sum(rate(http_request_duration_seconds_count[5m])) by (job)
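The query yields a fraction per job, which still has to be compared against the 95% target. A sketch of that comparison with made-up rates follows. The function names are hypothetical, and it assumes the histogram actually has a bucket boundary at 0.3s.

```go
package main

import "fmt"

// fractionWithin returns the share of requests that finished within the
// threshold, given the rate of the matching "le" bucket and the total rate.
func fractionWithin(bucketRate, totalRate float64) float64 {
	return bucketRate / totalRate
}

// meetsSLA reports whether that share reaches the target (0.95 for 95%).
func meetsSLA(bucketRate, totalRate, target float64) bool {
	return fractionWithin(bucketRate, totalRate) >= target
}

func main() {
	fmt.Println(meetsSLA(96, 100, 0.95)) // true
	fmt.Println(meetsSLA(90, 100, 0.95)) // false
}
```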
Apdex score
Target request duration 300ms, tolerable request duration 1.2s.

(
sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
+
sum(rate(http_request_duration_seconds_bucket{le="1.2"}[5m])) by (job)
) / 2 / sum(rate(http_request_duration_seconds_count[5m])) by (job)
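The expression adds the two buckets and halves the result because buckets are cumulative: the le="1.2" bucket already contains every request counted in le="0.3". A sketch with made-up rates (hypothetical function names) shows that this matches the usual Apdex definition of satisfied plus half of tolerating, divided by total.

```go
package main

import "fmt"

// apdexFromBuckets mirrors the query: (le_target + le_tolerable) / 2 / total.
// Both bucket rates are cumulative.
func apdexFromBuckets(leTarget, leTolerable, total float64) float64 {
	return (leTarget + leTolerable) / 2 / total
}

// apdexClassic is the textbook form: (satisfied + tolerating/2) / total,
// where tolerating counts only requests between target and tolerable.
func apdexClassic(satisfied, tolerating, total float64) float64 {
	return (satisfied + tolerating/2) / total
}

func main() {
	// Per 100 requests: 80 within 0.3s, 95 within 1.2s.
	fmt.Println(apdexFromBuckets(80, 95, 100)) // 0.875
	fmt.Println(apdexClassic(80, 95-80, 100))  // 0.875
}
```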
Finally aggregatable quantiles...
Plus: pick φ-quantile and time window at evaluation time.

histogram_quantile(0.9, http_request_duration_seconds_bucket)

histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[5m]))

histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

histogram_quantile(0.9, sum(rate(http_request_duration_seconds_bucket[5m])) by (le,job))


Integrations
Official exporters:
Node/system metrics exporter
JMX exporter
MySQL server exporter
SNMP exporter
Graphite exporter
Collectd exporter
HAProxy exporter
StatsD bridge
AWS CloudWatch exporter
Hystrix metrics publisher
Mesos task exporter
Consul exporter

3rd party exporters and probers:
Bind exporter
CouchDB exporter
Django exporter
Google's mtail log data extractor
HTTP(s)/TCP/ICMP blackbox prober
Memcached exporter
Meteor JS web framework exporter
Minecraft exporter module
MongoDB exporter
Munin exporter
New Relic exporter
RabbitMQ exporter
Redis exporter
RethinkDB exporter
Rsyslog exporter
scollector exporter
SMTP/Maildir MDA blackbox prober
SQL query result set metrics exporter

Direct instrumentation:
cAdvisor
Kubernetes
Kubernetes-Mesos
Etcd
gokit
go-metrics instrumentation library
RobustIRC
Client libraries
Official:
Go
Java (JVM)
Ruby
Python

Unofficial:
.NET / C#
Node.js
Haskell
Bash

(more to come...)
Hands on!

Now instrument your code. Pick the Go chapter or the Python chapter, whichever you prefer.
Point Prometheus to your instrumented code.
Use the expression browser to explore.

PromDash

Hands on!

Work through the following chapters in the instructions:


Dashboard Building: Console Templates
Dashboard Building: PromDash

Architecture
Alertmanager
Hands on!

Work through the Alerting chapter in the instructions.

Architecture
Hands on!

Work through the Pushing Metrics chapter in the instructions.

Architecture
Done!

Tour de force over.


Touched all the boxes.
Hope you have enjoyed the ride.
