SOUNDCLOUD
Operational simplicity
$ go build
$ ./prometheus
Hands on!
Architecture
Multi-dimensional data model
(like OpenTSDB)
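In this model a time series is identified by its metric name plus a set of key/value labels. The Go sketch below (standard library only; the `seriesKey` helper is hypothetical, not Prometheus code) builds a canonical identity string so that the same label set always maps to the same series, whatever order the labels were supplied in.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey returns a canonical identity for a series: the metric name followed
// by its labels sorted by name, e.g. http_requests_total{job="api",method="GET"}.
// Hypothetical helper for illustration only.
func seriesKey(name string, labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for k := range labels {
		names = append(names, k)
	}
	sort.Strings(names)

	pairs := make([]string, 0, len(names))
	for _, k := range names {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	// Two label sets that differ only in insertion order identify the same series.
	a := seriesKey("http_requests_total", map[string]string{"method": "GET", "job": "api"})
	b := seriesKey("http_requests_total", map[string]string{"job": "api", "method": "GET"})
	fmt.Println(a)
	fmt.Println(a == b)
}
```

Each additional label dimension simply becomes another sorted key/value pair, which is what lets queries later select and aggregate by any label.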
Powerful query language
sort_desc(sum(bazooka_instance_memory_limit_bytes -
bazooka_instance_memory_usage_bytes) by (app, proc))
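The query subtracts usage from limit for every matching instance series, sums the result per (app, proc), and sorts the groups by free memory, largest first. A plain-Go sketch of the same computation over invented in-memory samples (the `sample` and `group` types and all values are hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one hypothetical instance-level measurement; in Prometheus each
// field pair would be two separate series sharing the same labels.
type sample struct {
	app, proc, instance    string
	limitBytes, usageBytes float64
}

type group struct {
	app, proc string
	free      float64
}

// freeMemoryByAppProc mirrors what the query computes: per-instance
// (limit - usage), summed by (app, proc), sorted in descending order.
func freeMemoryByAppProc(samples []sample) []group {
	sums := map[[2]string]float64{}
	for _, s := range samples {
		sums[[2]string{s.app, s.proc}] += s.limitBytes - s.usageBytes
	}
	out := make([]group, 0, len(sums))
	for k, v := range sums {
		out = append(out, group{app: k[0], proc: k[1], free: v})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].free > out[j].free })
	return out
}

func main() {
	samples := []sample{
		{"api", "web", "i-1", 1000, 600},
		{"api", "web", "i-2", 1000, 700},
		{"search", "worker", "i-3", 2000, 1200},
	}
	for _, g := range freeMemoryByAppProc(samples) {
		fmt.Printf("%s/%s: %.0f bytes free\n", g.app, g.proc, g.free)
	}
}
```

The `instance` label disappears in the result because `by (app, proc)` keeps only the listed labels.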
Scalable data collection
Thousands of targets.
Hundreds of thousands of samples per second.
Millions of time series.
On a single monitoring server.
Running many servers is easy, too
Pull, not push.
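Pull means each monitored process only exposes its current values over HTTP, and the monitoring server fetches them on its own schedule. A standard-library-only sketch: the handler writes one counter in the text exposition format by hand (the official client libraries normally generate this output), and `main` performs a single simulated scrape against an in-process test server. The counter and helper names are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// requestsTotal is a hypothetical counter the application increments itself.
var requestsTotal int64

// renderMetrics formats one counter in the Prometheus text exposition format.
func renderMetrics(n int64) string {
	return "# HELP http_requests_total Total number of HTTP requests.\n" +
		"# TYPE http_requests_total counter\n" +
		fmt.Sprintf("http_requests_total %d\n", n)
}

// metricsHandler serves the current values; the monitoring server pulls this
// page on every scrape, so the application never needs to know where it lives.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, renderMetrics(atomic.LoadInt64(&requestsTotal)))
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", metricsHandler)
	srv := httptest.NewServer(mux) // stands in for the long-running service
	defer srv.Close()

	atomic.AddInt64(&requestsTotal, 3) // pretend three requests were served

	// One simulated scrape: the monitoring side initiates the request.
	resp, err := http.Get(srv.URL + "/metrics")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(body))
}
```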
Expression browser
Built-in graphing
Hands on!
Architecture
Example: Request Duration
http_request_duration_seconds_total
http_requests_total
http_request_duration_seconds_total / http_requests_total
http_request_duration_seconds
http_request_duration_seconds_sum
http_request_duration_seconds_count
http_request_duration_seconds_sum / http_request_duration_seconds_count
Request Duration Average
...and how to aggregate it.
http_request_duration_seconds_sum / http_request_duration_seconds_count
sum(http_request_duration_seconds_sum)
/
sum(http_request_duration_seconds_count)
sum(http_request_duration_seconds_sum) by (job)
/
sum(http_request_duration_seconds_count) by (job)
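Summing `_sum` and `_count` separately before dividing weights each instance by its traffic. A small Go sketch with invented per-instance values (the `instanceStats` type and `averageByJob` helper are hypothetical) shows why this differs from averaging the per-instance averages:

```go
package main

import "fmt"

// instanceStats holds the two counters exposed for one instance:
// _sum (total seconds observed) and _count (number of observations).
type instanceStats struct {
	job        string
	sumSeconds float64
	count      float64
}

// averageByJob mirrors
//   sum(http_request_duration_seconds_sum) by (job)
//   / sum(http_request_duration_seconds_count) by (job)
// by summing numerators and denominators separately before dividing.
func averageByJob(stats []instanceStats) map[string]float64 {
	sums := map[string]float64{}
	counts := map[string]float64{}
	for _, s := range stats {
		sums[s.job] += s.sumSeconds
		counts[s.job] += s.count
	}
	avg := map[string]float64{}
	for job, c := range counts {
		if c > 0 {
			avg[job] = sums[job] / c
		}
	}
	return avg
}

func main() {
	stats := []instanceStats{
		{"api", 10, 100}, // busy instance: 0.1s average
		{"api", 5, 10},   // quiet instance: 0.5s average
	}
	fmt.Println(averageByJob(stats)["api"]) // 15 / 110, about 0.136s
	// Averaging the per-instance averages would give (0.1+0.5)/2 = 0.3s,
	// over-weighting the quiet instance.
}
```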
Request Duration Average
How to specify the time range.
rate(http_request_duration_seconds_sum[10m])
/
rate(http_request_duration_seconds_count[10m])
sum(rate(http_request_duration_seconds_sum[10m])) by (job)
/
sum(rate(http_request_duration_seconds_count[10m])) by (job)
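Without `rate`, `_sum / _count` is the average over each process's whole lifetime; the ratio of the two rates restricts it to the last 10 minutes. The sketch below approximates `rate()` in plain Go. It handles counter resets but, unlike Prometheus, does not extrapolate to the window edges, so it illustrates the idea rather than the server's exact algorithm. The `point` type and sample values are invented.

```go
package main

import "fmt"

// point is one scraped counter value at a Unix timestamp in seconds.
type point struct {
	t, v float64
}

// simpleRate approximates rate(counter[window]) for the points inside the
// window: the per-second increase between the first and last sample.
// A drop in value is treated as a counter reset (e.g. a process restart).
// Unlike Prometheus, it does not extrapolate to the window boundaries.
func simpleRate(pts []point) float64 {
	if len(pts) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(pts); i++ {
		delta := pts[i].v - pts[i-1].v
		if delta < 0 { // reset: the counter restarted from zero
			delta = pts[i].v
		}
		increase += delta
	}
	return increase / (pts[len(pts)-1].t - pts[0].t)
}

func main() {
	// Hypothetical scrapes five minutes apart within a 10m window.
	durSum := []point{{0, 100}, {300, 130}, {600, 160}}      // seconds spent
	durCount := []point{{0, 1000}, {300, 1150}, {600, 1300}} // requests served
	// Average request duration over the window: ratio of the two rates.
	fmt.Println(simpleRate(durSum) / simpleRate(durCount)) // 0.2
}
```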
Prometheus Summary
Quantiles supported by the Ruby, Go, and legacy Java clients only...
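// Objectives map each φ-quantile to its allowed rank error: 0.5±0.05, 0.9±0.01.
temps := prometheus.NewSummary(prometheus.SummaryOpts{
	Name:       "http_request_duration_seconds",
	Help:       "Summary for the duration of all HTTP requests.",
	Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01},
})
prometheus.MustRegister(temps) // expose the summary on the metrics endpoint
temps.Observe(0.083)           // request durations in seconds
temps.Observe(0.119)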
http_request_duration_seconds{quantile="0.5"}
http_request_duration_seconds{quantile="0.9"}
http_request_duration_seconds_count
http_request_duration_seconds_sum
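Summary quantiles are computed inside each client process, and a quantile of the combined traffic cannot be recovered from per-instance quantiles. A Go sketch with invented data makes this concrete; it uses an exact nearest-rank quantile as a hypothetical stand-in for the client's streaming estimator.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the phi-quantile (0 < phi <= 1) of obs using the
// nearest-rank method on a sorted copy. A real Summary estimates this on the
// client with a streaming algorithm within the configured error; exact
// computation is used here only to keep the sketch short.
func quantile(phi float64, obs []float64) float64 {
	if len(obs) == 0 {
		return math.NaN()
	}
	s := append([]float64(nil), obs...)
	sort.Float64s(s)
	idx := int(math.Ceil(phi*float64(len(s)))) - 1
	if idx < 0 {
		idx = 0
	}
	if idx >= len(s) {
		idx = len(s) - 1
	}
	return s[idx]
}

func main() {
	a := []float64{0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1} // busy instance
	b := []float64{2.0}                                           // quiet instance
	avgOfMedians := (quantile(0.5, a) + quantile(0.5, b)) / 2
	trueMedian := quantile(0.5, append(append([]float64(nil), a...), b...))
	fmt.Println(avgOfMedians, trueMedian) // prints 1.05 0.1
}
```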
Hands on!
Prometheus Histogram
Let's do the bucketing ourselves.
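// Bucket upper bounds in seconds; an le="+Inf" bucket is always exposed too.
temps := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "http_request_duration_seconds",
	Help:    "Histogram for the duration of all HTTP requests.",
	Buckets: []float64{0.02, 0.05, 0.1},
})
prometheus.MustRegister(temps) // expose the histogram on the metrics endpoint
temps.Observe(0.153)           // lands only in the le="+Inf" bucket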
http_request_duration_seconds_bucket{le="0.02"}
http_request_duration_seconds_bucket{le="0.05"}
http_request_duration_seconds_bucket{le="0.1"}
http_request_duration_seconds_bucket{le="+Inf"}
http_request_duration_seconds_count
http_request_duration_seconds_sum
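Histogram buckets are cumulative: each `le` bucket counts every observation less than or equal to its upper bound, so `le="+Inf"` always equals `_count`. A minimal stand-in in plain Go (the `histogram` type is hypothetical, not the client library's implementation) prints the series above with their values after the single observation of 0.153 s:

```go
package main

import "fmt"

// histogram is a hypothetical minimal stand-in for a client-side Histogram:
// cumulative bucket counters plus _sum and _count.
type histogram struct {
	upperBounds []float64 // sorted, without +Inf
	buckets     []uint64  // len(upperBounds)+1; last entry is le="+Inf"
	sum         float64
	count       uint64
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{upperBounds: bounds, buckets: make([]uint64, len(bounds)+1)}
}

// observe increments every bucket whose upper bound is >= v, so the
// le="+Inf" bucket always equals count.
func (h *histogram) observe(v float64) {
	for i, ub := range h.upperBounds {
		if v <= ub {
			h.buckets[i]++
		}
	}
	h.buckets[len(h.upperBounds)]++
	h.sum += v
	h.count++
}

func main() {
	h := newHistogram([]float64{0.02, 0.05, 0.1})
	h.observe(0.153) // larger than every finite bound
	for i, ub := range h.upperBounds {
		fmt.Printf("http_request_duration_seconds_bucket{le=\"%g\"} %d\n", ub, h.buckets[i])
	}
	fmt.Printf("http_request_duration_seconds_bucket{le=\"+Inf\"} %d\n", h.buckets[len(h.upperBounds)])
	fmt.Printf("http_request_duration_seconds_count %d\n", h.count)
	fmt.Printf("http_request_duration_seconds_sum %g\n", h.sum)
}
```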
Bucketing utilities
// Five linear buckets: upper bounds 20, 25, 30, 35, 40.
temps := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "http_request_duration",
	Help:    "Histogram for the duration of all HTTP requests.",
	Buckets: prometheus.LinearBuckets(20, 5, 5),
})

// Instead of the above: ten exponential buckets 10, 15, 22.5, ... (factor 1.5).
temps := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "http_request_duration",
	Help:    "Histogram for the duration of all HTTP requests.",
	Buckets: prometheus.ExponentialBuckets(10, 1.5, 10),
})
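`LinearBuckets(start, width, count)` and `ExponentialBuckets(start, factor, count)` return `count` upper bounds that grow by a fixed step or a fixed factor. The following standard-library re-implementations (hypothetical lowercase names; they skip the argument validation the real client functions perform) show which bounds the two calls above produce:

```go
package main

import "fmt"

// linearBuckets mimics LinearBuckets(start, width, count): count upper
// bounds, the first at start, each width larger than the previous.
func linearBuckets(start, width float64, count int) []float64 {
	b := make([]float64, count)
	for i := range b {
		b[i] = start + float64(i)*width
	}
	return b
}

// exponentialBuckets mimics ExponentialBuckets(start, factor, count): count
// upper bounds, the first at start, each factor times the previous.
func exponentialBuckets(start, factor float64, count int) []float64 {
	b := make([]float64, count)
	for i := range b {
		b[i] = start
		start *= factor
	}
	return b
}

func main() {
	fmt.Println(linearBuckets(20, 5, 5)) // [20 25 30 35 40]
	fmt.Println(exponentialBuckets(10, 1.5, 10))
	// [10 15 22.5 33.75 50.625 75.9375 113.90625 170.859375 256.2890625 384.43359375]
}
```

Exponential buckets suit latencies well: resolution is fine for fast requests and coarse for the long tail, without needing many buckets.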
Am I within SLA?
Serve 95% of requests within 300ms.
sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
/
sum(rate(http_request_duration_seconds_count[5m])) by (job)
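The ratio above is the fraction of requests served within 300 ms. It only works if 0.3 is one of the configured bucket bounds (the earlier example buckets 0.02, 0.05, 0.1 would not suffice). A Go sketch of the check against the 95% target, taking hypothetical pre-computed per-second rates for one job as input:

```go
package main

import "fmt"

// withinSLA mirrors the query for one job: the rate of requests that landed in
// the le="0.3" bucket divided by the rate of all requests, checked against a
// target fraction. With no traffic the ratio is NaN (as 0/0 is in PromQL) and
// the check reports false.
func withinSLA(rateLE300ms, rateTotal, target float64) (float64, bool) {
	fraction := rateLE300ms / rateTotal
	return fraction, fraction >= target
}

func main() {
	// Hypothetical per-second rates for one job.
	fraction, ok := withinSLA(96, 100, 0.95)
	fmt.Printf("%.2f of requests within 300ms, SLA met: %v\n", fraction, ok)
}
```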
Apdex score
Target request duration 300ms, tolerable request duration 1.2s.
(
sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
+
sum(rate(http_request_duration_seconds_bucket{le="1.2"}[5m])) by (job)
) / 2 / sum(rate(http_request_duration_seconds_count[5m])) by (job)
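Apdex is defined as (satisfied + tolerating / 2) / total. Because buckets are cumulative, the `le="1.2"` bucket already contains the satisfied requests, so averaging the two bucket rates yields exactly satisfied + tolerating / 2. A short Go check of that identity with invented rates (both helper names are hypothetical):

```go
package main

import "fmt"

// apdex mirrors the query: (rate(le="0.3") + rate(le="1.2")) / 2 / rate(count).
// leTarget counts satisfied requests (<= 0.3s); leTolerable counts satisfied
// plus tolerating requests (<= 1.2s), because buckets are cumulative.
func apdex(leTarget, leTolerable, total float64) float64 {
	return (leTarget + leTolerable) / 2 / total
}

// apdexClassic is the textbook definition, for comparison.
func apdexClassic(satisfied, tolerating, total float64) float64 {
	return (satisfied + tolerating/2) / total
}

func main() {
	// Hypothetical per-second rates: 80 within 0.3s, 95 within 1.2s, 100 total.
	fmt.Println(apdex(80, 95, 100))        // 0.875
	fmt.Println(apdexClassic(80, 15, 100)) // 0.875
}
```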
Finally aggregatable quantiles...
Plus: pick φ-quantile and time window at evaluation time.
histogram_quantile(0.9, http_request_duration_seconds_bucket)
histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[5m]))
Client libraries
Go
Java (JVM)
Ruby
Python
.NET / C#
Node.js
Haskell
Bash
(more to come...)
Hands on!
PromDash
Hands on!
Architecture
Alertmanager
Hands on!
Architecture
Hands on!
Architecture
Done!