Prometheus Course
Prometheus
Course Introduction
Prometheus
• Thank you for taking this Prometheus Course
Course Overview
Introduction, Concepts, Monitoring, Alerting, Querying, Internals, Use cases
Architecture
• Prometheus
• DevOps advocate
Prometheus
• In Prometheus we talk about Dimensional Data: time series are identified by a metric
name and a set of key/value pairs
Metric name | Label | Sample
Temperature | location=outside | 90
• It stores metrics in memory and on local disk in its own custom, efficient format
• It is written in Go
How does Prometheus work?
• Prometheus collects metrics from monitored targets (e.g. a database server or a
Windows server) by scraping a metrics HTTP endpoint
• This is fundamentally different from most other monitoring and alerting systems
(except Google's Borgmon, which also works this way)
Prometheus
Installation
Prometheus Installation
• I will install Prometheus using scripts from our GitHub repository (https://github.com/
in4it/prometheus-course)
• Feel free to use the scripts with any Cloud Provider, Virtual Machine, or Docker
image, as long as it’s a recent Linux distribution
• To get a free $100 coupon on DigitalOcean, valid for 60 days with a valid payment
method added, use the following link:
https://m.do.co/c/b71b388ab76f
• It’s best to use the scripts we provided so that your environment is the
same as ours when you follow the demos
• metric: go_memstats_alloc_bytes
• instance=localhost:9090
• job=prometheus
• Samples are a float64 value and, optionally, a millisecond-precision timestamp
• For example:
• node_boot_time{instance="localhost:9100",job="node_exporter"}
# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"
• For example, to scrape metrics from Prometheus itself, the following code
block is added by default:
static_configs:
- targets: ['localhost:9090']
• The node exporter will expose machine metrics of Linux / *nix machines
• The node exporter can be used to monitor machines, and later on, you
can create alerts based on these ingested metrics
Monitor nodes
(Diagram: Prometheus scraping a Linux machine and a Windows machine)
• Pushing Metrics
• Querying
• Service Discovery
• Exporters
• Libraries
• Unofficial: Bash, C++, Common Lisp, Elixir, Erlang, Haskell, Lua for
Nginx, Lua for Tarantool, .NET / C#, Node.js, PHP, Rust
• Protocol-buffer format (Prometheus 2.0 removed support for the protocol-buffer format)
metric_name [
"{" label_name "=" `"` label_value `"` { "," label_name "=" `"` label_value `"` } [ "," ] "}"
] value [ timestamp ]
node_filesystem_avail_bytes{device="/dev/vda1",fstype="ext4",mountpoint="/"} 4.9386491904e+10
node_filesystem_avail_bytes{device="/dev/vda15",fstype="vfat",mountpoint="/boot/efi"} 1.05903104e+08
node_filesystem_avail_bytes{device="lxcfs",fstype="fuse.lxcfs",mountpoint="/var/lib/lxcfs"} 0
node_filesystem_avail_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/run"} 2.01273344e+08
node_filesystem_avail_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/run/lock"} 5.24288e+06
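As a sketch of how these exposition-format lines are structured, the sample lines above can be parsed with a small, simplified helper (hypothetical code, not part of any Prometheus client library; it ignores HELP/TYPE comment lines and assumes label values contain no commas or escaped quotes):

```python
import re

# One sample line: metric_name{label="value",...} sample_value
LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
                     r'(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')

def parse_sample(line):
    m = LINE_RE.match(line)
    if not m:
        raise ValueError("not a sample line: %r" % line)
    labels = {}
    if m.group('labels'):
        # Simplified: assumes no commas or escaped quotes inside label values
        for pair in m.group('labels').split(','):
            k, v = pair.split('=', 1)
            labels[k] = v.strip('"')
    return m.group('name'), labels, float(m.group('value'))

name, labels, value = parse_sample(
    'node_filesystem_avail_bytes{device="/dev/vda1",fstype="ext4",mountpoint="/"} 4.9386491904e+10')
```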
• Counter
• Gauge
• Histogram
• Summary
Gauge: a single numeric value that can go up and down (e.g. CPU load,
temperature)
Prometheus
Instrumentation- Python
Client Libraries - Python Example
• https://github.com/prometheus/client_python
from flask import Flask, render_template_string
from prometheus_client import Counter, Gauge, Summary, generate_latest

# Metric definitions (reconstructed to match the handlers below)
REQUESTS = Counter('http_requests_total', 'Total HTTP requests',
                   ['method', 'endpoint', 'status_code'])
IN_PROGRESS = Gauge('http_requests_in_progress', 'In-progress HTTP requests')
TIMINGS = Summary('http_request_duration_seconds', 'HTTP request latency')

app = Flask(__name__)

@app.route('/')
@TIMINGS.time()
@IN_PROGRESS.track_inprogress()
def hello_world():
    REQUESTS.labels(method='GET', endpoint="/", status_code=200).inc()  # Increment the counter
    return 'Hello, World!'

@app.route('/prometheus-course/<name>')
@IN_PROGRESS.track_inprogress()
@TIMINGS.time()
def index(name):
    REQUESTS.labels(method='GET', endpoint="/prometheus-course/<name>", status_code=200).inc()
    return render_template_string('<b>Hello {{name}}, welcome!</b>', name=name)

@app.route('/metrics')
@IN_PROGRESS.track_inprogress()
def metrics():
    return generate_latest()

if __name__ == "__main__":
    app.run(host='0.0.0.0')
• Easy to implement:
package main
import (
"github.com/prometheus/client_golang/prometheus/promhttp"
"net/http"
)
func main() {
http.Handle("/metrics", promhttp.Handler())
panic(http.ListenAndServe(":8080", nil))
}
func init() {
	prometheus.MustRegister(jobsQueued)
}

func runNextJob() {
	job := queue.Dequeue()
	jobsQueued.WithLabelValues(job.Type()).Dec()
	job.Run()
}
start := time.Now()
job.Run()
duration := time.Since(start)
jobsDurationHistogram.WithLabelValues(job.Type()).Observe(duration.Seconds())
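The pattern above (time a job, then observe the duration into a histogram) can be sketched in Python with only the standard library. This is an illustrative mini-histogram, not the real client-library type; the bucket bounds are assumed, and the counts here are per-bucket (Prometheus exposes them cumulatively):

```python
import bisect
import time

class MiniHistogram:
    """Toy histogram: count observations into buckets and track their sum."""
    def __init__(self, buckets=(0.005, 0.05, 0.5, 5.0)):
        self.buckets = sorted(buckets)
        self.counts = [0] * (len(self.buckets) + 1)  # last slot acts as +Inf
        self.sum = 0.0

    def observe(self, value):
        # Find the first bucket whose upper bound is >= value
        self.counts[bisect.bisect_left(self.buckets, value)] += 1
        self.sum += value

hist = MiniHistogram()
start = time.perf_counter()
sum(range(1000))                       # the "job" being timed
hist.observe(time.perf_counter() - start)
```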
• Diagram: the app pushes metrics to the Pushgateway; Prometheus pulls metrics from the Pushgateway
• Pitfall:
• The Pushgateway never forgets metrics unless they are deleted via the API,
for example:
curl -X DELETE http://localhost:9091/metrics/job/prom_course/instance/localhost
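The URL path in the curl example above is built from the job name plus the grouping labels. A small sketch of that construction (a hypothetical helper; note the real Pushgateway additionally supports a base64 encoding for label values that contain `/`):

```python
from urllib.parse import quote

def pushgateway_path(job, **grouping):
    """Build /metrics/job/<job>/<label>/<value>/... for a metric group."""
    parts = ['/metrics', 'job', quote(job, safe='')]
    for k, v in sorted(grouping.items()):
        parts += [k, quote(v, safe='')]
    return '/'.join(parts)

path = pushgateway_path('prom_course', instance='localhost')
# Sending a DELETE request to http://localhost:9091 + path removes that group
```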
• If NAT and/or a firewall is blocking you from using the pull mechanism
registry = CollectorRegistry()
g = Gauge('job_last_success_unixtime', 'Last time the course batch job has finished', registry=registry)
g.set_to_current_time()
push_to_gateway('localhost:9091', job='batchA', registry=registry)
• pushadd_to_gateway only replaces metrics with the same name and grouping key
• delete_from_gateway deletes metrics with the given job and grouping key.
gatewayUrl := "http://localhost:9091/"
throughputGauge := prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "throughput",
	Help: "Throughput in Mbps",
})
throughputGauge.Set(800)
if err := push.Collectors(
	"throughput_job", push.HostnameGroupingKey(),
	gatewayUrl, throughputGauge,
); err != nil {
	fmt.Println("Could not push completion time to Pushgateway:", err)
}
• PromQL is read-only
• Example:
100 - (avg by (instance) (irate(node_cpu_seconds_total{job='node_exporter',mode="idle"}[5m])) * 100)
• Range vector - a set of time series containing a range of data points over
time for each time series
Example: node_cpu_seconds_total[5m]
• Aggregation operators
Examples: sum (calculate sum over dimensions), min (select minimum over dimensions), max
(select maximum over dimensions), avg (calculate the average over dimensions), stddev
(calculate population standard deviation over dimensions), stdvar (calculate population standard
variance over dimensions), count (count number of elements in the vector), count_values (count
number of elements with the same value), bottomk (smallest k elements by sample value), topk
(largest k elements by sample value), quantile (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)
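To make the "over dimensions" idea concrete, here is a simplified model of how an aggregation like `avg by (instance)` groups an instant vector: samples sharing the `by` label are averaged together (this sketch drops all other labels, which real PromQL also does for labels not listed in `by`):

```python
from collections import defaultdict

def avg_by(samples, by):
    """Average sample values grouped by one label, like `avg by (<label>)`."""
    groups = defaultdict(list)
    for labels, value in samples:
        groups[labels[by]].append(value)
    return {k: sum(v) / len(v) for k, v in groups.items()}

samples = [
    ({'instance': 'host1', 'mode': 'idle'},   0.9),
    ({'instance': 'host1', 'mode': 'system'}, 0.1),
    ({'instance': 'host2', 'mode': 'idle'},   0.5),
]
result = avg_by(samples, 'instance')   # {'host1': 0.5, 'host2': 0.5}
```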
Prometheus course: Edward Viaene & Jorn Jambers
Demo
Querying
Prometheus
Service Discovery
Service Discovery - Introduction
• Definition:
Service discovery is the automatic detection of devices and services
offered by these devices on a computer network.
scrape_configs:
- job_name: 'node'
ec2_sd_configs:
- region: eu-west-1
access_key: PUT_THE_ACCESS_KEY_HERE
secret_key: PUT_THE_SECRET_KEY_HERE
port: 9100
• Make sure the user has the AmazonEC2ReadOnlyAccess IAM policy attached
• Make sure your security groups allow access to ports 9100 and 9090
scrape_configs:
- job_name: 'node'
ec2_sd_configs:
- region: eu-west-1
access_key: PUT_THE_ACCESS_KEY_HERE
secret_key: PUT_THE_SECRET_KEY_HERE
port: 9100
relabel_configs:
# Only monitor instances with a tag Name starting with "PROD"
- source_labels: [__meta_ec2_tag_Name]
regex: PROD.*
action: keep
# Use the instance ID as the instance label
- source_labels: [__meta_ec2_instance_id]
target_label: instance
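The `keep` action in the relabel config above can be modelled in a few lines: any target whose source label does not match the (fully anchored, as in Prometheus) regex is dropped. This is an illustrative sketch, not Prometheus code:

```python
import re

def relabel_keep(targets, source_label, regex):
    """Keep only targets whose source label fully matches the regex."""
    pattern = re.compile('^(?:%s)$' % regex)   # Prometheus anchors regexes
    return [t for t in targets if pattern.match(t.get(source_label, ''))]

targets = [
    {'__meta_ec2_tag_Name': 'PROD-web-1'},
    {'__meta_ec2_tag_Name': 'DEV-web-1'},
]
kept = relabel_keep(targets, '__meta_ec2_tag_Name', 'PROD.*')
# kept contains only the PROD-web-1 target
```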
kubernetes_sd_configs:
-
api_servers:
- https://kubernetes.default.svc
in_cluster: true
basic_auth:
username: prometheus
password: secret
retry_interval: 5s
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
-
api_servers:
- https://kube-master.prometheuscourse.com
in_cluster: true
• Format target.json
[
{
"targets": [ "myslave1:9104", "myslave2:9104" ],
"labels": {
"env": "prod",
"job": "mysql_slave"
}
},
{
"targets": [ "mymaster:9104" ],
"labels": {
"env": "prod",
"job": "mysql_master"
}
}
]
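Since file-based service discovery just watches JSON (or YAML) files, a target file like the one above can be generated from code. A minimal sketch (the file path Prometheus reads from is whatever your `file_sd_configs` entry points at — assumed here):

```python
import json

# Target groups matching the target.json example above
target_groups = [
    {"targets": ["myslave1:9104", "myslave2:9104"],
     "labels": {"env": "prod", "job": "mysql_slave"}},
    {"targets": ["mymaster:9104"],
     "labels": {"env": "prod", "job": "mysql_master"}},
]

doc = json.dumps(target_groups, indent=2)   # write this to your file_sd path
parsed = json.loads(doc)                    # round-trip to verify the structure
```

Prometheus re-reads such files automatically when they change, so no reload is needed after updating the targets.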
• When Prometheus is not able to pull metrics directly (Linux system stats, HAProxy, …)
• Examples:
MySQL server exporter
Memcached exporter
Consul exporter
Node/system metrics exporter
MongoDB
Redis
Many more….
• https://prometheus.io/docs/instrumenting/exporters/
• Diagram: Prometheus pushes alerts to the Alertmanager, which applies routes and sends notifications to receivers such as email and Slack
Alerting - Alerting rules
Alerting Rules
• Rules live in the Prometheus server config
groups:
- name: example
rules:
• Alert example:
- alert: cpuUsage
  expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{job='node_exporter',mode="idle"}[5m])) * 100) > 95
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: Machine under heavy load
• Example:
groups:
- name: Important instance
rules:
templates:
- '/etc/alertmanager/template/*.tmpl'
route:
repeat_interval: 1h
receiver: operations-team
receivers:
- name: 'operations-team'
email_configs:
- to: '[email protected]'
slack_configs:
- api_url: https://hooks.slack.com/services/XXXXXX/XXXXXX/XXXXXX
channel: '#prometheus-course'
send_resolved: true
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
- localhost:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"
static_configs:
- targets: ['localhost:9090']
• You can create a highly available Alertmanager cluster using the mesh config
• Slack
• Set up an alert
From: https://github.com/prometheus/prometheus
Remote Storage
• Remote storage is primarily focused on long-term storage
Source: https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage
• Those 2h samples are stored in separate directories (in the data directory of
Prometheus)
• Writes are batched and written to disk in chunks, containing multiple data points
Local Storage
• Every directory also has an index file (index) and a metadata file (meta.json)
• It stores the metric names and the labels, and provides an index from the
metric names and labels to the series in the chunk files
(Diagram: each block directory contains chunks/000001, chunks/000002, …, meta.json and index)
Local Storage
• The most recent data is kept in memory
• You don’t want to lose the in-memory data during a crash, so the data also
needs to be persisted to disk. This is done using a write-ahead log (WAL)
(Diagram: the block directories with chunks, meta.json and index, plus a wal/ directory containing segments 000001, 000002, …)
• If there’s a server crash and the data from memory is lost, then the WAL
will be replayed
• This is more efficient than immediately deleting the data from the chunk files, as
the actual delete can happen at a later time (e.g. when there’s not a lot of load)
Local Storage
• Block characteristics:
• When querying, the blocks not in the time range can be skipped
• Too many blocks could cause too much merging overhead, so blocks
are compacted
• 2 blocks are merged and form a newly created (often larger) block
• The index will contain an inverted index for the labels, for example
for label env=production, it’ll have 1 and 3 as IDs if those series
contain the label env=production
Local Storage
• What about Disk size?
• You can use the following formula to calculate the disk space needed:
• You can increase the scrape interval, which will get you less data
• Or you can reduce the retention (how long you keep the data)
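The sizing rule of thumb from the Prometheus storage documentation is: needed disk space ≈ retention time × ingested samples per second × bytes per sample, with roughly 1–2 bytes per sample after compression. A quick back-of-the-envelope calculator (the 2-bytes-per-sample default is an assumption on the conservative end):

```python
def estimated_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2):
    """Rough TSDB disk-space estimate: retention * ingestion rate * bytes/sample."""
    return retention_days * 24 * 3600 * samples_per_second * bytes_per_sample

# e.g. 15-day retention at 10,000 samples/s -> ~26 GB
estimate = estimated_disk_bytes(15, 10_000)
```

Both knobs in the bullets above show up directly in the formula: a longer scrape interval lowers samples_per_second, and a shorter retention lowers retention_days.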
• You can still enable authentication and TLS, using a reverse proxy
• This is only valid for server components; Prometheus can scrape TLS- and
authentication-enabled targets
• See tls_config in the Prometheus configuration to configure a CA certificate,
user certificate and user key
• You’d still need to set up a reverse proxy for the targets themselves
Demo
Prometheus TLS and authentication
Demo
Prometheus mutual TLS for targets
Prometheus Use Cases
Monitoring a web app
Prometheus with Python Flask and MySQL
Monitoring a web app
• I’m going to integrate Prometheus monitoring with a web application
based on Python
• It will create an HTTP server and I’ll be able to configure routes (e.g. /query)
• I’ll include one normal query and one “badly behaving” query that will
take between 0 and 10 seconds to execute
• A Counter to capture the number of times an HTTP endpoint is hit and the
number of times a MySQL query is executed
• The value of a Counter must always increase; that’s why you should use
the Counter type for these kinds of data
• A Histogram to capture the latency of the HTTP requests and the MySQL Queries
• The default buckets are intended to cover a typical web/rpc request from
milliseconds to seconds
start_time = time.time()
# ... execute the query ...
query_latency = time.time() - start_time
MYSQL_REQUEST_LATENCY.labels(sql[:50]).observe(query_latency)
MYSQL_REQUEST_COUNT.labels(sql[:50]).inc()
• Spring Boot
• Micrometer
• Protected by default
• Adjustable in application.properties
Monitoring a web app
• Micrometer
• pom.xml example
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
• Code example
…
import io.micrometer.core.instrument.Metrics;
…
private Counter runCounter = Metrics.counter("runCounter");
…
@GetMapping("/api/demo")
@Timed
public String apiUse() throws InterruptedException {
runCounter.increment();
log.info("Hello world app accessed on /api/demo");
return "Hello world";
}
• Rather than using the UI, you can also use YAML and JSON files to provision
Grafana with datasources and dashboards
• This is a much more powerful way of using Grafana, as you can test new
dashboards first on a dev / test server, then import the newly created
dashboards to production
• You can do the import manually through the UI, or using YAML and JSON
files
• When using files, you can keep them within version control to keep
changes, revisions and backups
Grafana Provisioning
• The configuration of Grafana is all kept in /etc/grafana:
/etc/grafana/:
-rw-r----- 1 root grafana 14K Jul 17 12:30 grafana.ini
-rw-r----- 1 root grafana 3.4K Jul 17 12:30 ldap.toml
drwxr-xr-x 4 root grafana 4.0K Jul 17 13:15 provisioning/
/etc/grafana/provisioning/
drwxr-xr-x 2 root grafana 4.0K Jul 17 14:56 dashboards/
drwxr-xr-x 2 root grafana 4.0K Jul 17 15:34 datasources/
Grafana Provisioning
• You can change the database & paths in /etc/grafana/grafana.ini
[paths]
# Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
;data = /var/lib/grafana
# Directory where grafana will automatically scan and look for plugins
;plugins = /var/lib/grafana/plugins
# folder that contains provisioning config files that grafana will apply on startup and while running.
;provisioning = conf/provisioning
…
[database]
# Either "mysql", "postgres" or "sqlite3", it's your choice
;type = sqlite3
;host = 127.0.0.1:3306
;name = grafana
;user = root
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
;password =
• Installation
• Querying metrics
• A Service Mesh
• Service Discovery
• A Key-Value store
• Multi-datacenter support
• 1) Prometheus can scrape Consul’s metrics and provide you with all
sorts of information about your running services
Consul integration
• In the next demo I’ll focus on the Prometheus integration with Consul, not on
implementing Consul itself
• I’ll show you the installation of Consul, but not how to integrate Consul with your
infrastructure (it’s out of scope for this Prometheus course)
• https://github.com/in4it/prometheus-course/blob/master/use-cases/ec2-
auto-discovery/lab.txt
Prometheus on Kubernetes
Getting Kubernetes metrics