Distributed tracing
- Jaeger 101 -
About me
● Worked at eBay
● Worked at Forter as a backend engineer.
● Joined Rookout as the first developer and production engineer
● @itielshwartz on both GitHub and Twitter
● Also have a personal blog at: https://etlsh.com
Agenda
Intro:
1. State of mind for this Meetup (Super important!)
2. What is distributed tracing, and do I need it?
3. What is OpenTracing?
4. What is Jaeger?
Zero to hero using Jaeger:
1. hello-world example
2. Jaeger terminology
3. Full blown distributed app
Wrap up
1. Demo wrap up
2. Jaeger architecture
3. OpenTracing's secret ability
Before we begin (State of mind)
● The system will fail
● Your code is not perfect
● Other people's code is even less perfect
● Practice new tools in the daytime; don't start using them in crisis mode
● The system will fail
● Each minute you spend adding logs and metrics can reduce your Mean Time to Resolve (MTTR)
● Keep in mind that the developer who gets paged isn't the one who wrote the code
● Try to be nice to them - they are going to need it
● The system will fail
As you can probably see, I tried to emphasize the fact that your system is going to fail. This DOESN'T mean I think you
write bad code, only that we usually have much more trust in our code/infra than we should :)
What is distributed tracing?
With distributed tracing, we can track requests as they pass through multiple services, emitting timing and other metadata
throughout, and this information can then be reassembled to provide a complete picture of the application’s behavior at
runtime. - Buoyant
Mental model of distributed tracing - OpenTracing
Do I need distributed tracing?
As companies move from monolithic to multi-service architectures, existing techniques for debugging and profiling begin to
break down.
Previously, troubleshooting could be accomplished by isolating a single instance of the monolith and reproducing the
problem.
With microservices, this approach is no longer feasible, because no single service provides a complete picture of the
performance or correctness of the application as a whole.
We need new tools to help us manage the real complexity of operating distributed systems at scale. - Buoyant
What is OpenTracing?
The problem is that distributed tracing has long harbored a dirty secret: the necessary source code instrumentation has
been complex, fragile, and difficult to maintain.
This is the problem that OpenTracing solves.
Through standard, consistent APIs in many languages (Java, Javascript, Go, Python, C#, others), the OpenTracing project
gives developers clean, declarative, testable, and vendor-neutral instrumentation.
OpenTracing has focused on standards for explicit software instrumentation.
Distributed tracing 101
What is Jaeger?
Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing system released as open source by
Uber Technologies.
It can be used for monitoring microservices-based distributed systems:
● Distributed context propagation
● Distributed transaction monitoring
● Root cause analysis
● Service dependency analysis
● Performance / latency optimization
Getting started - The Monolith
https://github.com/itielshwartz/jaeger-hello-world/tree/step-1-the-monolith
Getting started - Monolith going wild
https://github.com/itielshwartz/jaeger-hello-world/tree/step-2-the-monolith-going-wild
Jaeger terminology - Span/ Trace
Span
A span represents a logical unit of work in Jaeger that has an operation name, the start time of the operation, and the
duration. Spans may be nested and ordered to model causal relationships.
Trace
A trace is a data/execution path through the system, and can be thought of as a directed acyclic graph of spans.
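To make the two definitions concrete, here is a minimal sketch in plain Python. The `Span` fields and the `render_trace` helper are made up for illustration (they are not Jaeger's data model or API); the point is only that each span carries an operation name, a start time, and a duration, and that the parent links turn a flat list of spans into one trace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Toy span record; field names are illustrative, not Jaeger's schema."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    operation_name: str
    start_ms: int             # offset from the start of the trace
    duration_ms: int


def render_trace(spans):
    """Return an indented outline: children under their parent, ordered by start time."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines = []

    def walk(parent_id, depth):
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.operation_name} ({span.duration_ms} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Spans can arrive in any order; the parent links restore the structure.
trace = [
    Span("t1", "a", None, "GET /contributors", 0, 500),
    Span("t1", "c", "a", "clean_github_data", 350, 100),
    Span("t1", "b", "a", "get_repo_contributors", 50, 300),
]
outline = render_trace(trace)
print("\n".join(outline))
```

This is roughly the nesting that the Jaeger UI draws as a timeline: one root span per request, with the nested calls indented beneath it.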
Jaeger terminology - Span/ Trace
Getting started - Adding Jaeger
https://github.com/itielshwartz/jaeger-hello-world/tree/step-3-adding-jaeger
Config Jaeger part II - Multiple spans
https://github.com/itielshwartz/jaeger-hello-world/tree/step-4-multiple-spans
Jaeger architecture - Tag/Log
To record details beyond the operation name and timing, the recommended solution is to annotate spans with tags or logs.
Tag:
A tag is a key-value pair that provides certain metadata about the span.
Log:
A log is similar to a regular log statement: it contains a timestamp and some data, but it is associated with the span from
which it was logged.
When and why?
When should we use tags vs. logs? The tags are meant to describe attributes of the span that apply to the whole duration of
the span. For example, if a span represents an HTTP request, then the URL of the request should be recorded as a tag
because it does not make sense to think of the URL as something that's only relevant at different points in time on the span.
On the other hand, if the server responded with a redirect URL, logging it would make more sense since there is a clear
timestamp associated with such an event. The OpenTracing Specification provides guidelines called Semantic Conventions for
recommended tags and log fields.
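The tag/log split can be sketched with a toy span class. `set_tag` and `log_kv` mirror the method names of the OpenTracing Python API, and `http.url`, `http.method` and `event` come from the Semantic Conventions, but `SketchSpan` itself and the URLs are assumptions made up for this example, not the real tracer.

```python
import time


class SketchSpan:
    """Toy span for illustration only; not the OpenTracing implementation."""

    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.tags = {}  # one value per key, describing the whole span
        self.logs = []  # (timestamp, fields) events at specific moments

    def set_tag(self, key, value):
        self.tags[key] = value
        return self

    def log_kv(self, key_values, timestamp=None):
        when = time.time() if timestamp is None else timestamp
        self.logs.append((when, dict(key_values)))
        return self


span = SketchSpan("http_request")

# The URL and method hold for the full duration of the request: tags.
span.set_tag("http.url", "https://example.com/repos/contributors")
span.set_tag("http.method", "GET")

# The redirect happened at one moment during the request: a log.
span.log_kv({"event": "redirect", "location": "https://example.com/moved"})
```

A quick rule of thumb that follows from this: if asking "when did it happen?" makes sense, it is probably a log; if not, it is probably a tag.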
https://github.com/yurishkuro/opentracing-tutorial/tree/master/python/lesson01#annotate-the-trace-with-tags-and-logs
Config Jaeger part III - Tags and Logs
https://github.com/itielshwartz/jaeger-hello-world/tree/step-5-tags-and-logs
Going distributed
Until now we had a single server (which kind of defeats the purpose of distributed tracing).
Now let's split our monolith into smaller parts: we will still have a main server (customer facing), but now we will split
get_repo_contributors and clean_github_data into two separate services.
get_repo_contributors - will be a Flask server (same as our main)
clean_github_data - will consume data from Redis (pushed to it by the main server)
So basically it's going to look like this:
[Diagram: Main calls get_repo_contributors; Main writes to Redis ("Write"); clean_github_data polls Redis ("Polling")]
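The Redis hop is the less obvious one, because there is no HTTP request to attach headers to. One way to keep the trace connected is to put the trace context inside the message itself. The sketch below assumes that approach; the message layout, the helper names, and the use of `queue.Queue` as a stand-in for Redis are all made up for this example and are not taken from the demo repository.

```python
import json
import queue
import uuid

work_queue = queue.Queue()  # stand-in for the Redis list, so the sketch runs anywhere


def new_id():
    """Hypothetical ID generator for this sketch."""
    return uuid.uuid4().hex[:16]


def main_push(repo_data, trace_id, span_id):
    """Main server side: wrap the payload with the caller's trace context, then write."""
    message = {
        "trace_context": {"trace_id": trace_id, "parent_span_id": span_id},
        "payload": repo_data,
    }
    work_queue.put(json.dumps(message))


def worker_poll():
    """clean_github_data side: read one message and start a child span from its context."""
    message = json.loads(work_queue.get_nowait())
    ctx = message["trace_context"]
    child_span = {
        "trace_id": ctx["trace_id"],              # same trace as the main server
        "span_id": new_id(),                      # new span for this unit of work
        "parent_span_id": ctx["parent_span_id"],  # links back to the main server's span
        "operation_name": "clean_github_data",
    }
    return child_span, message["payload"]


trace_id, main_span_id = new_id(), new_id()
main_push({"contributors": ["alice", "bob"]}, trace_id, main_span_id)
child_span, payload = worker_poll()
```

Because the worker reuses the trace ID and records the main server's span as its parent, both services end up in the same trace even though they never talk to each other directly.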
Going distributed - Single span
https://github.com/itielshwartz/jaeger-hello-world/tree/step-6-distribute-single-span
Going distributed - Multiple spans
https://github.com/itielshwartz/jaeger-hello-world/tree/step-7-distribute-multiple-spans
Demo wrap up
We have now successfully transformed a monolithic beast into a set of small microservices, without losing visibility.
The nice thing about OpenTracing is that it allows us to move from Jaeger to Datadog or to another solution without (almost)
needing to rewrite our code.
The other cool thing about it is that you don't need to do everything I just did in this demo!
There are official wrappers for most of the common frameworks. These tools let you integrate with OpenTracing and
Jaeger without needing to think about "how do I pass the headers inside the request?" or "how do I read the headers to start
a new span?"
Examples:
● urllib2
● requests
● SQLAlchemy
● MySQLdb
● Tornado HTTP client
● redis
● Flask
● Django
● More
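For a feel of what those wrappers do on your behalf, here is a minimal sketch of the two header questions. It uses Jaeger's default propagation header, `uber-trace-id`, whose value has the form `{trace-id}:{span-id}:{parent-span-id}:{flags}` in hex. The `inject` and `extract` functions below are simplified stand-ins written for this example; the real client libraries handle encoding, baggage headers, and error cases that are skipped here.

```python
HEADER = "uber-trace-id"  # Jaeger's default HTTP header for trace context


def inject(trace_id, span_id, parent_id, sampled, headers):
    """Client side: write the span context into the outgoing request headers."""
    flags = 1 if sampled else 0
    headers[HEADER] = f"{trace_id:x}:{span_id:x}:{parent_id:x}:{flags:x}"
    return headers


def extract(headers):
    """Server side: read the context back, or return None when the header is absent."""
    value = headers.get(HEADER)
    if value is None:
        return None
    trace_id, span_id, parent_id, flags = (int(part, 16) for part in value.split(":"))
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_id": parent_id,
        "sampled": bool(flags & 1),
    }


outgoing = inject(trace_id=0xABC123, span_id=0x1F, parent_id=0, sampled=True, headers={})
incoming = extract(outgoing)
print(outgoing)
```

With a wrapper installed, this pair of calls happens inside the HTTP client and the web framework, so application code never touches the header.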
Jaeger Architecture
Agent
The Jaeger agent is a network daemon that listens for spans sent over UDP, which it batches and sends to the collector. It
is designed to be deployed to all hosts as an infrastructure component. The agent abstracts the routing and discovery of the
collectors away from the client.
Collector
The Jaeger collector receives traces from Jaeger agents and runs them through a processing pipeline. Currently our
pipeline validates traces, indexes them, performs any transformations, and finally stores them.
Jaeger’s storage is a pluggable component which currently supports Cassandra and Elasticsearch.
Query
Query is a service that retrieves traces from storage and hosts a UI to display them.
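The flow between the three components can be sketched as a toy pipeline. Everything here is a stand-in made up for illustration: the class names, the batch size, the validation rule, and the in-memory dict that plays the role of Cassandra or Elasticsearch. The real agent receives spans over UDP and the real collector does far more processing.

```python
class SketchCollector:
    """Toy collector: validate each span, then store it."""

    def __init__(self):
        self.storage = {}  # trace_id -> list of spans (stand-in for the storage backend)
        self.rejected = 0

    def submit_batch(self, batch):
        for span in batch:
            if "trace_id" not in span or "operation_name" not in span:
                self.rejected += 1  # fails validation, never stored
                continue
            self.storage.setdefault(span["trace_id"], []).append(span)


class SketchAgent:
    """Toy agent: buffer spans from the application and forward them in batches."""

    def __init__(self, collector, batch_size=2):
        self.collector = collector
        self.batch_size = batch_size
        self.buffer = []

    def receive(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.collector.submit_batch(self.buffer)
            self.buffer = []


def query(collector, trace_id):
    """Toy query service: read a trace back from storage."""
    return collector.storage.get(trace_id, [])


collector = SketchCollector()
agent = SketchAgent(collector, batch_size=2)
agent.receive({"trace_id": "t1", "operation_name": "main"})
agent.receive({"trace_id": "t1", "operation_name": "get_repo_contributors"})  # batch is full, flushed
agent.receive({"operation_name": "span without a trace id"})                   # stays in the buffer
agent.flush()                                                                  # forwarded, then rejected
```

The application only ever talks to the local agent, which is why the collectors can be moved or scaled without touching client configuration.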
OpenTracing's secret ability
Context propagation
With OpenTracing instrumentation in place, we can support general purpose distributed context propagation where we
associate some metadata with the transaction and make that metadata available anywhere in the distributed call graph. In
OpenTracing this metadata is called baggage, to highlight the fact that it is carried over in-band with all RPC requests, just
like baggage. - opentracing-tutorial
The client may use the Baggage to pass additional data to the server and any other downstream server it might call.
# client side: in the OpenTracing Python API, baggage is set on the span itself
span.set_baggage_item('auth-token', '.....')
# server side (one or more levels down from the client)
token = span.get_baggage_item('auth-token')
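To see why the value is still available "one or more levels down", here is a toy model of the propagation. `SketchContext` and `start_child` are hypothetical names for this example, and the token value is a placeholder; in a real tracer the baggage is serialized into the request headers at every hop rather than copied in memory.

```python
class SketchContext:
    """Toy span context: a trace ID plus baggage that travels with every downstream call."""

    def __init__(self, trace_id, baggage=None):
        self.trace_id = trace_id
        self.baggage = dict(baggage or {})


def start_child(parent_ctx):
    """A child context inherits a copy of the parent's baggage."""
    return SketchContext(parent_ctx.trace_id, parent_ctx.baggage)


client_ctx = SketchContext("t1")
client_ctx.baggage["auth-token"] = "hypothetical-token"  # set once, on the client side

service_a_ctx = start_child(client_ctx)     # first hop
service_b_ctx = start_child(service_a_ctx)  # second hop, never saw the client directly
token = service_b_ctx.baggage.get("auth-token")
```

Since baggage rides along with every request in the call graph, it is best kept small: each item adds to the size of every downstream call.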
Questions?
