MONITORING AND LOGGING IN WONDERLAND
HELP, WHAT IS HAPPENING?
PAUL SEIFFERT
Team Leader at Jimdo, Traveller, Foodie, Runner
@seiffertp

paul.seiffert@gmail.com
Monitoring and Logging in Wonderland
WONDERLAND
• Jimdo’s internal PaaS that runs 250 services
• 2500 Docker containers at a time
• 600 deployments per day
WONDERLAND
[Diagram labels: AWS, OTHER SERVICE PROVIDERS, WONDERLAND (INFRASTRUCTURE AUTOMATION, APIS, MONITORING, LOGGING, CLI TOOLS), OTHER TOOLING]
WONDERLAND
[Diagram labels: WONDERLAND API, AWS ECS, ECS AGENT, LOGGING DAEMON, METRIC DAEMON, EC2 Instance]
IMAGINE …
• Your team is responsible for the software component that delivers the websites of 20m customers
• You are on call tonight
4:00 AM
4:01 AM
Partial outage of web delivery component
PAGERDUTY CALLS
• either because a health check failed
• or because a metric exceeded a configured threshold
[Diagram labels: HEALTH CHECKS, ALERT MANAGER, PROMETHEUS]
API HEALTH CHECKS
• All services on Wonderland: Route53 health checks
• Infrastructure components: Pingdom checks
GET /health
HTTP/1.1 200 OK
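A minimal sketch of the service side of such a check, using only the Python standard library. The names HealthHandler, start_server, and check_health, and the "OK" body, are assumptions for illustration; the slides only show the GET /health request and the 200 response.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200; every other path gets 404."""

    def do_GET(self):
        if self.path == "/health":
            status, body = 200, b"OK"  # body text is an assumption
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server():
    """Start the handler on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def check_health(host, port, path="/health", timeout=2.0):
    """Return True only if the endpoint answers with HTTP 200 in time."""
    try:
        connection = http.client.HTTPConnection(host, port, timeout=timeout)
        connection.request("GET", path)
        status = connection.getresponse().status
        connection.close()
        return status == 200
    except (OSError, http.client.HTTPException):
        return False
```

An external checker (Route53 or Pingdom in the slides) plays the role of check_health: anything other than a timely 200 counts as unhealthy.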
WORKER HEALTH CHECKS
• Workers notify a health check service after each execution
  • Prometheus Pushgateway
  • cronitor.io
  • healthchecks.io
• If the service is not notified for a certain time, an alert is created
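The idea behind worker health checks (alert when a ping does not arrive) can be sketched in a few lines. HeartbeatMonitor and its method names are hypothetical; the services named on the slide each have their own APIs, which this does not try to reproduce.

```python
import time


class HeartbeatMonitor:
    """Tracks the last ping per worker and reports workers that went silent."""

    def __init__(self, max_silence_seconds, clock=time.monotonic):
        self.max_silence_seconds = max_silence_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.last_ping = {}

    def ping(self, worker):
        """Called by a worker after each execution."""
        self.last_ping[worker] = self.clock()

    def overdue(self):
        """Return the sorted names of workers whose last ping is too old."""
        now = self.clock()
        return sorted(
            worker
            for worker, seen in self.last_ping.items()
            if now - seen > self.max_silence_seconds
        )
```

A worker that crashes, hangs, or is never scheduled simply stops pinging, so all three failure modes show up the same way: its name appears in overdue().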
SEMANTIC MONITORING
SYNTHETIC MONITORING
Run tests against production periodically, monitor results, and alert on issues
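One round of such a monitor might look like the sketch below. run_checks and monitor_once are hypothetical names; a check is assumed to be any callable that raises when the production behaviour it tests is wrong, and alert is whatever notifies the person on call.

```python
def run_checks(checks):
    """Run every named check; return {name: reason} for the ones that failed."""
    failures = {}
    for name, check in checks.items():
        try:
            check()
        except Exception as error:  # any exception counts as a failed check
            failures[name] = str(error)
    return failures


def monitor_once(checks, alert):
    """Run all checks once and alert on each failure; True if all passed."""
    failures = run_checks(checks)
    for name, reason in failures.items():
        alert(name, reason)
    return not failures
```

Calling monitor_once from a scheduler at a fixed interval gives the periodic part; the checks themselves would be real requests against production.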
4:10 AM
Service still running
SERVICE DASHBOARD
GRAFANA
• Each service running on Wonderland automatically has a dashboard showing key metrics for debugging
• Developers can create custom dashboards for more detailed analysis
• Grafana pulls data from Prometheus instances
PROMETHEUS
• Semi-centralized metric system
• Pull-based metric retrieval
• On-the-fly calculation of derived metrics
METRICS
INFRASTRUCTURE METRICS
SYSTEM METRICS
APPLICATION METRICS
INFRASTRUCTURE METRICS
[Diagram labels: PROMETHEUS, CLOUDWATCH EXPORTER, AWS, CUSTOM EXPORTERS, WONDERLAND APIS]
EXAMPLES
aws_autoscaling_group_desired_capacity_average{
  auto_scaling_group_name="crims",
  job="cloudwatch_exporter"
}
aws_elb_request_count_sum{
  cluster="crims",
  job="wonderland_elb_exporter",
  service_name="web-prod"
}
SYSTEM METRICS
[Diagram labels: PROMETHEUS, COLLECTD, CADVISOR]
EXAMPLES
container_memory_rss{
  container_label_cluster="crims",
  container_label_container_name="web-prod--web",
  image="web-prod:abc123",
  instance="10.8.4.91:9104",
  job="crims_cadvisor_metrics"
}
collectd_memory{
  instance="10.8.4.42:9103",
  job="crims_collectd_metrics",
  memory="free"
}
APPLICATION METRICS
[Diagram: PROMETHEUS sends GET /metrics to CONTAINER A, CONTAINER B, …]
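What a container returns for GET /metrics is plain text in the Prometheus exposition format. The sketch below renders one counter by hand to show the shape of that text; render_counter and escape_label_value are hypothetical helpers, and a real service would more likely use a Prometheus client library than format this itself.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format.

    samples maps a tuple of (label, value) pairs to the counter value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        if labels:
            rendered = ",".join(
                f'{key}="{escape_label_value(val)}"' for key, val in labels
            )
            lines.append(f"{name}{{{rendered}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

Labels such as instance and job are not part of this output; Prometheus attaches them to each sample when it scrapes the target.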
[Diagram labels: PROMETHEUS, CONTAINER A, CONTAINER B, …, WONDERLAND SERVICE DISCOVERY, WONDERLAND API, SERVICE DISCOVERY DOWNLOADER; arrow labels: locate containers, get scrape targets, update config and reload, scrape metrics]
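A sketch of the step that turns located containers into scrape targets. It assumes the target list is written in the JSON shape that Prometheus's file-based service discovery reads (a list of groups with "targets" and "labels"); the slides only say the config is updated and reloaded, so the real mechanism may differ. The container fields service, ip, and port are hypothetical.

```python
import json


def build_target_groups(containers):
    """Group container addresses by service into one target group per job."""
    groups = {}
    for container in containers:
        address = f'{container["ip"]}:{container["port"]}'
        groups.setdefault(container["service"], []).append(address)
    return [
        {"targets": sorted(targets), "labels": {"job": service}}
        for service, targets in sorted(groups.items())
    ]


def render_targets(containers):
    """Serialise the target groups as JSON, ready to be written to a file."""
    return json.dumps(build_target_groups(containers), indent=2)
```

Sorting both the groups and the addresses keeps the output stable, so the file only changes when the set of containers changes.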
METRIC RETENTION
http_requests_total{instance="10.8.3.101:80"} = 53
http_requests_total{instance="10.8.3.102:80"} = 81
http_requests_total{instance="10.8.3.103:80"} = 2
...
Automatically generated recording rules:
job:http_requests_total:sum = sum(http_requests_total) without (instance)
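What the recording rule computes can be sketched in plain Python: drop the instance label and add up the series that become identical. sum_without is a hypothetical helper, and the job label in the usage below is an assumption, since the slide shows only the instance label.

```python
from collections import defaultdict


def sum_without(series, dropped_labels):
    """Mimic sum(...) without (...): drop the given labels, sum what collides.

    series is a list of (labels_dict, value) pairs; the result maps a sorted
    tuple of the remaining (label, value) pairs to the summed value.
    """
    totals = defaultdict(float)
    for labels, value in series:
        key = tuple(
            sorted((k, v) for k, v in labels.items() if k not in dropped_labels)
        )
        totals[key] += value
    return dict(totals)
```

With the three instances from the slide, the per-instance values 53, 81, and 2 collapse into a single series per job with the value 136.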
FEDERATION
[Diagram: LONG-TERM PROMETHEUS (32 DAYS) scrapes filtered metrics from SHORT-TERM PROMETHEUS (30 MIN)]
'match[]':
  - '{job="application_metrics", instance=""}'
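The match[] selector ends up as a query parameter on the /federate endpoint of the Prometheus being scraped. A sketch of the resulting request URL; federation_url is a hypothetical helper and the host name in the usage note is made up.

```python
from urllib.parse import urlencode


def federation_url(base_url, matchers):
    """Build the /federate URL with one match[] parameter per selector."""
    query = urlencode([("match[]", matcher) for matcher in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"
```

For example, federation_url("http://short-term-prometheus:9090", ['{job="application_metrics", instance=""}']) selects only series that have no instance label, which is what the aggregated recording-rule series look like.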
[Diagram: LONG-TERM PROMETHEUS scrapes filtered metrics from SHORT-TERM PROMETHEUS]
SHORT-TERM PROMETHEUS:
http_requests_total{instance="10.8.3.101:80"}
http_requests_total{instance="10.8.3.102:80"}
http_requests_total{instance="10.8.3.103:80"}
...
job:http_requests_total:sum{}
LONG-TERM PROMETHEUS:
job:http_requests_total:sum{}
SERVICE DASHBOARD
4:12 AM
Auto-scaling broken
LET’S TAKE A LOOK AT THE LOGS
CENTRALISED LOGGING
• Centralised logging is a must-have in a distributed system
• It should be very easy to gather all information that concerns a service
CENTRALISED LOGGING
• Output of all services running on Wonderland is stored centrally
• Optionally, logs are parsed with configurable formats
$ cat wonderland.yaml
---
components:
- name
  image: my-nginx-image
  logging:
    types:
    - access_log
    - error_log_nginx
CENTRALISED LOGGING
[Diagram: DOCKER sends logs to LOGBEAT (fluentd protocol); LOGBEAT sends them to LOGZ.IO (lumberjack protocol)]
Wonderland Logbeat
• receives logs via fluent protocol,
• parses them,
• adds metadata,
• and streams them to our logging provider logz.io
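The parse and add-metadata steps can be sketched as below. This is not the actual Logbeat: the regular expression assumes nginx's default combined access log layout, and the function name process and all field names are hypothetical.

```python
import re

# Leading fields of an nginx access log line in the default combined layout.
ACCESS_LOG = re.compile(
    r'(?P<remote_addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def process(line, log_type, metadata):
    """Parse a raw log line if its type is known, then attach metadata."""
    event = {"message": line, **metadata}
    if log_type == "access_log":
        match = ACCESS_LOG.match(line)
        if match:
            event.update(match.groupdict())
            event["status"] = int(event["status"])
    return event
```

Lines that do not match the configured format are passed through unparsed, so nothing is lost when a service logs something unexpected.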
THE TRUTH
[Diagram labels: DOCKER, LOGBEAT, LOGZ.IO, DOCKER LOG-STREAM, PAPERTRAIL.COM; protocol labels: fluentd, lumberjack, syslog]
We are in a migration right now.
4:17 AM
You find this log message of the service autoscaler:
Unable to scale-out service “web-delivery”. Configured maximum number of instances reached.
4:17 AM
You increase the maximum number of instances:
$ cat wonderland.yaml
[…]
auto-scaling:
  min-instances: 60
  max-instances: 150
4:20 AM
Back to bed
2:00 PM
In the PMA for this night’s incident, you create the action item to
Monitor the number of instances of web-delivery to detect potential breaches of auto-scaling limits before they affect the system’s health
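The action item boils down to comparing the running instance count with the configured maximum and warning before the limit is hit. A sketch of that rule; scaling_alert and the 80% warning ratio are assumptions, not something the slides specify.

```python
def scaling_alert(running, max_instances, warn_ratio=0.8):
    """Return an alert message when instances approach the limit, else None."""
    if max_instances <= 0:
        raise ValueError("max_instances must be positive")
    if running >= max_instances:
        return f"critical: {running}/{max_instances} instances, limit reached"
    if running >= warn_ratio * max_instances:
        return f"warning: {running}/{max_instances} instances, approaching limit"
    return None
```

With max-instances set to 150 as above, the warning would fire well before the autoscaler runs out of room, during the day rather than at 4:00 AM.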
QUESTIONS?
THANK YOU
Open positions:
• Senior Infrastructure Engineer
• Senior Backend Engineer
• Senior Frontend Engineer
jobs@jimdo.com
FURTHER READING / SOURCES
• Beyer, Jones, Petoff & Murphy
  Site Reliability Engineering
• Susan Fowler
  Production-Ready Microservices
• Sam Newman
  Building Microservices
• Stripe / Increment
  On-Call (https://increment.com/on-call/)
• Mathias Lafeldt & Paul Seiffert
  A Journey Through Wonderland
  (https://speakerdeck.com/mlafeldt/a-journey-through-wonderland)
PHOTOS
• Marcel Stockmann
  https://www.flickr.com/photos/marcelstockmann/33068471286
• Michael Theis
  https://www.flickr.com/photos/huskyte/6931056896