PREDICTIVE DATACENTER ANALYTICS WITH STRYMON
Vasia Kalavri
kalavriv@inf.ethz.ch
QCon San Francisco
14 November 2017
ABOUT ME
▸ Postdoc at ETH Zürich
  ▸ Systems Group: https://www.systems.ethz.ch/
▸ PMC member of Apache Flink
▸ Research interests
  ▸ Large-scale graph processing
  ▸ Streaming dataflow engines
▸ Current project
  ▸ Predictive datacenter analytics and management
@vkalavri
DATACENTER MANAGEMENT
https://code.facebook.com/posts/1499322996995183/solving-the-mystery-of-link-imbalance-a-metastable-failure-state-at-scale/
https://www.theregister.co.uk/2017/09/28/amadeus_booking_software_outages_lead_to_global_delayed_flights/
DATACENTER MANAGEMENT
Can we predict the effect of changes?
Can we prevent catastrophic faults?
▸ Workload fluctuations
▸ Network failures
▸ Configuration updates
▸ Software updates
▸ Resource scaling
▸ Service deployment
What-if Analysis: Predicting outcomes under hypothetical conditions
‣ How will response time change if we migrate a large service?
‣ What will happen to link utilization if we change the routing protocol costs?
‣ Will SLOs be violated if a certain switch fails?
‣ Which services will be affected if we change the load balancing strategy?
Test deployment?
A physical small-scale cluster to try out configuration changes and what-ifs
[Diagram: Data Center, Test Cluster]
▸ expensive to operate and maintain
▸ some errors only occur at large scale!
Analytical model?
Infer workload distribution from samples and analytically model system components (e.g. disk failure rate)
▸ hard to develop, large design space to explore
▸ often inaccurate
MODERN ENTERPRISE DATACENTERS ARE ALREADY HEAVILY INSTRUMENTED
Trace-driven online simulation
Use existing instrumentation to build a datacenter model
▸ construct DC state from real events
▸ simulate the state we cannot directly observe
[Diagram: Data Center, current DC state, three forked what-if DC states]
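The forking idea on this slide can be sketched in plain Rust. This is only an illustration under strong simplifying assumptions, not Strymon's implementation: `DcState`, `Event`, `replay`, and `what_if` are hypothetical names, and the "state" is reduced to the set of links that are currently up.

```rust
use std::collections::BTreeSet;

/// Hypothetical events that could be reconstructed from datacenter traces.
#[derive(Clone, Debug)]
enum Event {
    LinkUp(String),
    LinkDown(String),
}

/// Hypothetical, heavily simplified datacenter state: the set of live links.
#[derive(Clone, Debug, Default, PartialEq)]
struct DcState {
    live_links: BTreeSet<String>,
}

impl DcState {
    /// Fold one observed (or hypothetical) event into the state.
    fn apply(&mut self, event: &Event) {
        match event {
            Event::LinkUp(l) => { self.live_links.insert(l.clone()); }
            Event::LinkDown(l) => { self.live_links.remove(l); }
        }
    }

    /// Fork the current state and apply hypothetical events to the copy only;
    /// the state built from real events is left untouched.
    fn what_if(&self, hypothetical: &[Event]) -> DcState {
        let mut fork = self.clone();
        for e in hypothetical {
            fork.apply(e);
        }
        fork
    }
}

/// Build the current state by replaying a trace of real events.
fn replay(trace: &[Event]) -> DcState {
    let mut state = DcState::default();
    for e in trace {
        state.apply(e);
    }
    state
}
```

Calling `replay` on the observed trace and then `what_if` with a hypothetical `Event::LinkDown` yields a forked state that can be analyzed while the current state keeps following real events.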
STRYMON: ONLINE DATACENTER ANALYTICS AND MANAGEMENT
[Diagram: Datacenter, event streams, Strymon, datacenter state]
▸ traces, configuration, topology updates, …
▸ queries, complex analytics, simulations, …
▸ policy enforcement, what-if scenarios, …
strymon.systems.ethz.ch
TIMELY DATAFLOW
[Diagram: dataflow loop with ingress, egress, and feedback]
STRYMON’S OPERATIONAL REQUIREMENTS
‣ Low latency: react quickly to network failures
‣ High throughput: keep up with high stream rates
‣ Iterative computation: complex graph analytics on the network topology
‣ Incremental computation: reuse already computed results when possible
  ‣ e.g. do not recompute all forwarding rules after a link update
TIMELY DATAFLOW: STREAM PROCESSING IN RUST
▸ Data-parallel computations
▸ Arbitrary cyclic dataflows
▸ Logical timestamps (epochs)
▸ Asynchronous execution
▸ Low latency
D. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, M. Abadi. Naiad: A Timely Dataflow System. In SOSP, 2013.
https://github.com/frankmcsherry/timely-dataflow
WORDCOUNT IN TIMELY
// The imports and the hash_str helper are not on the slide; they are added
// here as assumptions (module paths as in 2017-era timely) to complete the example.
extern crate timely;

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use timely::dataflow::{InputHandle, ProbeHandle};
use timely::dataflow::operators::{Inspect, Map, Probe};
use timely::dataflow::operators::aggregation::Aggregate;

// Assumed helper: hash a word so that equal keys are routed to the same worker.
fn hash_str(key: &String) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // initialize and run a timely job
    timely::execute_from_args(std::env::args(), |worker| {

        // create input and progress handles
        let mut input = InputHandle::new();
        let mut probe = ProbeHandle::new();
        let index = worker.index();

        // define the dataflow and its operators
        worker.dataflow(|scope| {
            input.to_stream(scope)
                // split each line into (word, 1) pairs
                .flat_map(|text: String|
                    text.split_whitespace()
                        .map(move |word| (word.to_owned(), 1))
                        .collect::<Vec<_>>()
                )
                // fold counts per key, emit (key, count), partition by hash
                .aggregate(
                    |_key, val, agg| { *agg += val; },
                    |key, agg: i64| (key, agg),
                    |key| hash_str(key)
                )
                .inspect(|data| println!("seen {:?}", data))
                // watch for progress
                .probe_with(&mut probe);
        });

        //feed data
        …
    }).unwrap();
}
A few Rust peculiarities to get used to :-)
PROGRESS TRACKING
A distributed protocol that allows operators to reason about the possibility of receiving data
▸ All tuples bear a logical timestamp (think event time)
▸ To send a timestamped tuple, an operator must hold a capability for it
▸ Workers broadcast progress changes to other workers
▸ Each worker independently determines progress
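The bookkeeping behind these bullets can be sketched in a few lines of standard-library Rust. This is a simplified illustration, not timely's actual protocol: `ProgressTracker` and its methods are hypothetical names, timestamps are plain totally ordered `u64` epochs (timely also supports partially ordered timestamps), and broadcast updates are assumed to arrive as simple `(time, delta)` pairs.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-worker view of outstanding work: for each logical
/// timestamp, the net number of capabilities or in-flight messages.
#[derive(Default)]
struct ProgressTracker {
    outstanding: BTreeMap<u64, i64>,
}

impl ProgressTracker {
    /// Apply one progress change broadcast by some worker. `delta` is positive
    /// when a capability or message at `time` is created, and negative when it
    /// is dropped or consumed.
    fn update(&mut self, time: u64, delta: i64) {
        let count = self.outstanding.entry(time).or_insert(0);
        *count += delta;
        let now_zero = *count == 0;
        if now_zero {
            self.outstanding.remove(&time);
        }
    }

    /// The earliest timestamp that may still appear; `None` means no data
    /// can arrive any more. Only positive counts hold the frontier back.
    fn frontier(&self) -> Option<u64> {
        self.outstanding
            .iter()
            .find(|(_, count)| **count > 0)
            .map(|(time, _)| *time)
    }

    /// True if data with timestamp `time` can no longer arrive.
    fn is_complete(&self, time: u64) -> bool {
        self.frontier().map_or(true, |f| f > time)
    }
}
```

Because every worker applies the same stream of updates to its own tracker, each one can decide independently when an epoch is complete.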
MAKING PROGRESS
fn main() {
    timely::execute_from_args(std::env::args(), |worker| {
        …
        …
        …
        for round in 0..10 {
            // push data to the input stream
            input.send(("round".to_owned(), 1));
            // advance the input epoch
            input.advance_to(round + 1);
            // do work while there’s still data for this epoch
            while probe.less_than(input.time()) {
                worker.step();
            }
        }
    }).unwrap();
}
TIMELY ITERATIONS
// Not on the slide: imports assumed for the 2017-era timely API.
use timely::dataflow::operators::{LoopVariable, ConnectLoop, ToStream, Concat, Inspect};

timely::example(|scope| {
    // loop 100 times at most; advance timestamps by 1 in each iteration
    let (handle, stream) = scope.loop_variable(100, 1);
    (0..10).to_stream(scope)
        .concat(&stream)
        .inspect(|x| println!("seen: {:?}", x))
        // create the feedback loop
        .connect_loop(handle);
});
Timestamps at increasing loop nesting depth: t, (t, l1), (t, (l1, l2))
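The nested timestamps shown on this slide can be illustrated without timely itself. The sketch below is an assumption-laden model of a timestamp inside a single loop, `(t, l1)`: the type `Nested` and its methods are hypothetical, and whether timely treats the iteration limit as inclusive or exclusive is not checked here.

```rust
/// Hypothetical timestamp for a dataflow nested inside one loop:
/// an outer epoch `t` paired with a loop counter `l1`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Nested {
    epoch: u64,
    iteration: u64,
}

impl Nested {
    /// Entering the loop (ingress) attaches a zero loop counter to the epoch.
    fn enter(epoch: u64) -> Nested {
        Nested { epoch, iteration: 0 }
    }

    /// Going around the feedback edge advances the counter by `step`.
    /// Returns `None` once the counter would reach `limit` (assumed exclusive).
    fn feedback(self, step: u64, limit: u64) -> Option<Nested> {
        let next = self.iteration + step;
        if next < limit {
            Some(Nested { iteration: next, ..self })
        } else {
            None
        }
    }

    /// Leaving the loop (egress) strips the loop counter again.
    fn leave(self) -> u64 {
        self.epoch
    }

    /// Product partial order: earlier or equal in both coordinates.
    fn less_equal(&self, other: &Nested) -> bool {
        self.epoch <= other.epoch && self.iteration <= other.iteration
    }
}
```

Under this order, `(3, 1)` and `(4, 0)` are incomparable, which is why progress in such a system is described by a frontier rather than by a single watermark.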
TIMELY & I
Relationship status: it’s complicated
▸ APIs & libraries
▸ performance
▸ ecosystem
▸ fault-tolerance
▸ deployment
▸ debugging
▸ expressiveness
▸ incremental computation
▸ performance
STRYMON USE-CASES
▸ What-if analysis
▸ Real-time datacenter analytics
▸ Incremental network routing
▸ Online critical path analysis
Strymon:
‣ Query and analyze state online
‣ Control and enforce configuration
‣ Simulate what-if scenarios
‣ Understand performance
RECONSTRUCTING USER SESSIONS
[Diagram: Application A (A.1, A.2, A.3), Application B (B.1, B.2, B.3, B.4)]
Sample log event:
Time: 2015/09/01 10:03:38.599859
Session ID: XKSHSKCBA53U088FXGE7LD8
Transaction ID: 26-3-11-5-1
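A batch-style sketch of the reconstruction step, using only the Rust standard library, may help make the record above concrete. It is not the streaming algorithm from the cited work: `LogEvent`, `parent_of`, and `reconstruct` are hypothetical names, and the code assumes (the slides do not state this) that each "-"-separated component of a transaction ID adds one level of nesting.

```rust
use std::collections::BTreeMap;

/// Hypothetical parsed log event; field names mirror the sample record.
#[derive(Clone, Debug)]
struct LogEvent {
    session_id: String,
    transaction_id: String, // hierarchical, e.g. "26-3-11-5-1"
}

/// Parent of a hierarchical transaction ID, under the assumption that the
/// last "-"-separated component identifies the child within its parent.
fn parent_of(transaction_id: &str) -> Option<&str> {
    transaction_id.rfind('-').map(|i| &transaction_id[..i])
}

/// Group events by session and record, per session, each transaction's parent.
fn reconstruct(events: &[LogEvent]) -> BTreeMap<String, BTreeMap<String, Option<String>>> {
    let mut sessions: BTreeMap<String, BTreeMap<String, Option<String>>> = BTreeMap::new();
    for e in events {
        sessions
            .entry(e.session_id.clone())
            .or_default()
            .insert(
                e.transaction_id.clone(),
                parent_of(&e.transaction_id).map(str::to_owned),
            );
    }
    sessions
}
```

With this layout, the transaction tree of a session can be walked by following the recorded parents from any transaction up to a root, which has no parent.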
PERFORMANCE RESULTS
[Diagram: log events for A and B along a client/time axis, labelled with transaction IDs (1, 1-1, 1-2, 1-3, 2) and an inactivity gap]
‣ Logs from 1263 streams and 42 servers
‣ 1.3 million events/s at 424.3 MB/s
‣ 26 ms per epoch vs. 2.1 s per epoch with Flink
More results: Zaheer Chothia, John Liagouris, Desislava Dimitrova, and Timothy Roscoe. Online Reconstruction of Structural Information from Datacenter Logs. EuroSys '17.
ROUTING AS A STREAMING COMPUTATION
[Diagram: network changes, G, R, delta-join, iteration, updated rules, forwarding rules]
‣ Compute all-pairs shortest paths (APSP) and keep forwarding rules as operator state
‣ Flow requests translate to lookups on that state
‣ Network updates cause re-computation of affected rules only
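The last bullet can be illustrated with a small standard-library sketch. It is not the delta-join dataflow from the talk: `Routing`, `route`, and `fail_link` are hypothetical names, paths are unweighted and found by breadth-first search, rules are installed per requested pair rather than for all pairs, and only link removals are handled.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

type Node = u32;
type Link = (Node, Node); // undirected, stored with the smaller endpoint first

fn link(a: Node, b: Node) -> Link {
    if a <= b { (a, b) } else { (b, a) }
}

/// Hypothetical routing state: live links plus one shortest path per (src, dst).
struct Routing {
    links: BTreeSet<Link>,
    paths: BTreeMap<(Node, Node), Vec<Node>>,
}

impl Routing {
    /// Unweighted shortest path by breadth-first search over the live links.
    fn shortest_path(&self, src: Node, dst: Node) -> Option<Vec<Node>> {
        let mut parent: BTreeMap<Node, Node> = BTreeMap::new();
        let mut queue = VecDeque::from(vec![src]);
        parent.insert(src, src);
        while let Some(u) = queue.pop_front() {
            if u == dst {
                // Walk the parent pointers back to the source.
                let mut path = vec![dst];
                let mut cur = dst;
                while cur != src {
                    cur = parent[&cur];
                    path.push(cur);
                }
                path.reverse();
                return Some(path);
            }
            for &(a, b) in &self.links {
                let v = if a == u { b } else if b == u { a } else { continue };
                if !parent.contains_key(&v) {
                    parent.insert(v, u);
                    queue.push_back(v);
                }
            }
        }
        None
    }

    /// Install or refresh the rule for one (src, dst) pair.
    fn route(&mut self, src: Node, dst: Node) {
        match self.shortest_path(src, dst) {
            Some(p) => { self.paths.insert((src, dst), p); }
            None => { self.paths.remove(&(src, dst)); }
        }
    }

    /// Remove a link and recompute only the rules whose path used it.
    /// Returns the number of recomputed rules.
    fn fail_link(&mut self, a: Node, b: Node) -> usize {
        let failed = link(a, b);
        self.links.remove(&failed);
        let affected: Vec<(Node, Node)> = self.paths.iter()
            .filter(|(_, path)| path.windows(2).any(|w| link(w[0], w[1]) == failed))
            .map(|(pair, _)| *pair)
            .collect();
        for &(src, dst) in &affected {
            self.route(src, dst);
        }
        affected.len()
    }
}
```

Rules whose stored path does not traverse the failed link are never touched, which is the property the slide describes; a link addition would need a different check, since it can shorten paths that did not use any changed link.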
REACTION TO LINK FAILURES
▸ Fail a random link
▸ 500 individual runs
▸ 32 threads
More results: Desislava C. Dimitrova et al. Quick Incremental Routing Logic for Dynamic Network Graphs. SIGCOMM Posters and Demos '17.
ONLINE CRITICAL PATH ANALYSIS
[Diagram: input stream, output stream, trace, periodic snapshot, snapshot stream, analyzer, performance summaries]
Apache Flink, Apache Spark, TensorFlow, Heron, Timely Dataflow
TASK SCHEDULING IN APACHE SPARK
[Diagram: DRIVER and workers W1, W2, W3]
Venkataraman, Shivaram, et al. "Drizzle: Fast and adaptable stream processing at scale." Spark Summit (2016).
SCHEDULING BOTTLENECK IN APACHE SPARK
[Figure: two panels, "Conventional Profiling" and "Strymon Profiling"; x-axis: Snapshot (0 to 15); y-axes: CP and % weight; series: Processing, Scheduling]
Apache Spark: Yahoo! Streaming Benchmark, 16 workers, 8s snapshots
EVALUATING LOAD BALANCING STRATEGIES
Traffic matrix simulation under topology and load balancing changes
▸ 13K services
▸ ~100K user requests/s
▸ OSPF routing
▸ Weighted Round-Robin load balancing
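To make the load balancing part of this scenario concrete, here is a minimal weighted round-robin sketch in standard-library Rust. It is an illustration only, not the simulator used in the talk: `WeightedRoundRobin` and `simulate` are hypothetical names, and the picker uses the simplest possible scheme of repeating each backend `weight` times in a fixed schedule.

```rust
use std::collections::BTreeMap;

/// Hypothetical weighted round-robin picker over backend names.
struct WeightedRoundRobin {
    schedule: Vec<String>, // each backend repeated `weight` times
    next: usize,
}

impl WeightedRoundRobin {
    /// Returns `None` if all weights are zero, since nothing could be picked.
    fn new(backends: &[(&str, usize)]) -> Option<WeightedRoundRobin> {
        let schedule: Vec<String> = backends
            .iter()
            .flat_map(|&(name, weight)| std::iter::repeat(name.to_string()).take(weight))
            .collect();
        if schedule.is_empty() {
            None
        } else {
            Some(WeightedRoundRobin { schedule, next: 0 })
        }
    }

    /// Pick the backend for the next request and advance the cursor.
    fn pick(&mut self) -> &str {
        let chosen = self.next;
        self.next = (self.next + 1) % self.schedule.len();
        &self.schedule[chosen]
    }
}

/// Simulated number of requests each backend would receive.
fn simulate(backends: &[(&str, usize)], requests: usize) -> BTreeMap<String, usize> {
    let mut load: BTreeMap<String, usize> = BTreeMap::new();
    if let Some(mut lb) = WeightedRoundRobin::new(backends) {
        for _ in 0..requests {
            *load.entry(lb.pick().to_string()).or_insert(0) += 1;
        }
    }
    load
}
```

Running `simulate` twice with different weight vectors gives two per-backend load maps that can be compared, which is the kind of question a what-if analysis of a load balancing change asks.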
WHAT-IF DATAFLOW
Inputs:
▸ Logging Servers: RPC logs (text)
▸ Topology Discovery: topology changes (graph)
▸ Configuration DBs: device specs, application placement (relational)
Stages: ETL, Load Balancing, Routing
DC Model construction:
▸ LB View Construction
▸ Routing View Construction
▸ Traffic Matrix Construction
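One piece of this dataflow, traffic matrix construction, can be sketched as a join between RPC logs and application placement. The sketch below is a batch approximation under stated assumptions, not the Strymon operator: `Rpc` and `traffic_matrix` are hypothetical names, RPC records are assumed to carry a byte count after ETL, and placement is a plain service-to-host map.

```rust
use std::collections::BTreeMap;

/// Hypothetical RPC log record after ETL: caller service, callee service, bytes sent.
struct Rpc {
    from: String,
    to: String,
    bytes: u64,
}

/// Aggregate service-level RPCs into a host-level traffic matrix using the
/// application placement (service -> host). RPCs with an endpoint that has
/// no known placement are skipped.
fn traffic_matrix(
    rpcs: &[Rpc],
    placement: &BTreeMap<String, String>,
) -> BTreeMap<(String, String), u64> {
    let mut matrix: BTreeMap<(String, String), u64> = BTreeMap::new();
    for rpc in rpcs {
        if let (Some(src), Some(dst)) = (placement.get(&rpc.from), placement.get(&rpc.to)) {
            *matrix.entry((src.clone(), dst.clone())).or_insert(0) += rpc.bytes;
        }
    }
    matrix
}
```

A what-if question such as migrating a service then amounts to calling `traffic_matrix` again with a modified placement map and comparing the two results.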
STRYMON DEMO
Strymon 0.1.0 has been released!
Try it out:
https://strymon-system.github.io
https://github.com/strymon-system
Send us feedback:
strymon-users@lists.inf.ethz.ch
THE STRYMON TEAM & FRIENDS
Vasiliki Kalavri
Zaheer Chothia
Andrea Lattuada
Prof. Timothy Roscoe
Sebastian Wicki
Moritz Hoffmann
Desislava Dimitrova
John Liagouris
Frank McSherry
? ?
IT COULD BE YOU!
strymon.systems.ethz.ch
Predictive Datacenter Analytics with Strymon

  • 1. PREDICTIVE DATACENTER ANALYTICS WITH STRYMON Vasia Kalavri [email protected] Support: QCon San Francisco 14 November 2017
  • 2. ABOUT ME ▸ Postdoc at ETH Zürich ▸ Systems Group: https://www.systems.ethz.ch/ ▸ PMC member of Apache Flink ▸ Research interests ▸ Large-scale graph processing ▸ Streaming dataflow engines ▸ Current project ▸ Predictive datacenter analytics and management 2 @vkalavri
  • 7. DATACENTER MANAGEMENT 6 Can we predict the effect of changes? 6 Workloadfluctuations Network failures Configuration updates Software updatesResource scaling Service deployment
  • 8. DATACENTER MANAGEMENT 6 Can we predict the effect of changes? Can we prevent catastrophic faults? 6 Workloadfluctuations Network failures Configuration updates Software updatesResource scaling Service deployment
  • 9. Predicting outcomes under hypothetical conditions What-if Analysis: 7
  • 10. Predicting outcomes under hypothetical conditions ‣ How will response time change if we migrate a large service? ‣ What will happen to link utilization if we change the routing protocol costs? ‣ Will SLOs will be violated if a certain switch fails? ‣ Which services will be affected if we change load balancing strategy? What-if Analysis: 7
  • 11. Test deployment? Data Center Test Cluster ▸ expensive to operate and maintain ▸ some errors only occur in a large scale! a physical small-scale cluster to try out configuration changes and what-ifs 8
  • 12. Analytical model? Data Center infer workload distribution from samples and analytically model system components (e.g. disk failure rate) ▸ hard to develop, large design space to explore ▸ often inaccurate 9
  • 13. MODERN ENTERPRISE DATACENTERS ARE ALREADY HEAVILY INSTRUMENTED 10
  • 14. Trace-driven online simulation Use existing instrumentation to build a datacenter model ▸ construct DC state from real events ▸ simulate the state we cannot directly observe 11 Data Center current DC state forked what-if DC state forked what-if DC state forked what-if DC state
  • 15. 1212 traces, configuration, topology updates, … Datacenter Strymon queries, complex analytics, simulations, … policy enforcement, what-if scenarios, … STRYMON: ONLINE DATACENTER ANALYTICS AND MANAGEMENT Datacenter state event streams strymon.systems.ethz.ch
  • 17. STRYMON’S OPERATIONAL REQUIREMENTS ‣ Low latency: react quickly to network failures ‣ High throughput: keep up with high stream rates ‣ Iterative computation: complex graph analytics on the network topology ‣ Incremental computation: reuse already computed results when possible ‣ e.g. do not recompute forwarding rules after a link update 14
  • 18. TIMELY DATAFLOW: STREAM PROCESSING IN RUST 15 ▸ Data-parallel computations ▸ Arbitrary cyclic dataflows ▸ Logical timestamps (epochs) ▸ Asynchronous execution ▸ Low latency D. Murray, F. McSherry, M. Isard, R. Isaacs, P. Barham, M. Abadi. Naiad: A Timely Dataflow System. In SOSP, 2013. 15 https://github.com/frankmcsherry/timely-dataflow
  • 19. WORDCOUNT IN TIMELY fn main() {
 timely::execute_from_args(std::env::args(), |worker| {
 
 let mut input = InputHandle::new();
 let mut probe = ProbeHandle::new();
 let index = worker.index();
 
 worker.dataflow(|scope| {
 input.to_stream(scope)
 .flat_map(|text: String|
 text.split_whitespace()
 .map(move |word| (word.to_owned(), 1))
 .collect::<Vec<_>>()
 )
 .aggregate(
 |_key, val, agg| { *agg += val; },
 |key, agg: i64| (key, agg),
 |key| hash_str(key)
 )
 .inspect(|data| println!("seen {:?}", data))
 .probe_with(&mut probe);
 });
 //feed data …
 }).unwrap();
 } 16
  • 20. WORDCOUNT IN TIMELY fn main() {
 timely::execute_from_args(std::env::args(), |worker| {
 
 let mut input = InputHandle::new();
 let mut probe = ProbeHandle::new();
 let index = worker.index();
 
 worker.dataflow(|scope| {
 input.to_stream(scope)
 .flat_map(|text: String|
 text.split_whitespace()
 .map(move |word| (word.to_owned(), 1))
 .collect::<Vec<_>>()
 )
 .aggregate(
 |_key, val, agg| { *agg += val; },
 |key, agg: i64| (key, agg),
 |key| hash_str(key)
 )
 .inspect(|data| println!("seen {:?}", data))
 .probe_with(&mut probe);
 });
 //feed data …
 }).unwrap();
 } 16 initialize and run a timely job
  • 21. WORDCOUNT IN TIMELY fn main() {
 timely::execute_from_args(std::env::args(), |worker| {
 
 let mut input = InputHandle::new();
 let mut probe = ProbeHandle::new();
 let index = worker.index();
 
 worker.dataflow(|scope| {
 input.to_stream(scope)
 .flat_map(|text: String|
 text.split_whitespace()
 .map(move |word| (word.to_owned(), 1))
 .collect::<Vec<_>>()
 )
 .aggregate(
 |_key, val, agg| { *agg += val; },
 |key, agg: i64| (key, agg),
 |key| hash_str(key)
 )
 .inspect(|data| println!("seen {:?}", data))
 .probe_with(&mut probe);
 });
 //feed data …
 }).unwrap();
 } 16 create input and progress handles
  • 22. WORDCOUNT IN TIMELY fn main() {
 timely::execute_from_args(std::env::args(), |worker| {
 
 let mut input = InputHandle::new();
 let mut probe = ProbeHandle::new();
 let index = worker.index();
 
 worker.dataflow(|scope| {
 input.to_stream(scope)
 .flat_map(|text: String|
 text.split_whitespace()
 .map(move |word| (word.to_owned(), 1))
 .collect::<Vec<_>>()
 )
 .aggregate(
 |_key, val, agg| { *agg += val; },
 |key, agg: i64| (key, agg),
 |key| hash_str(key)
 )
 .inspect(|data| println!("seen {:?}", data))
 .probe_with(&mut probe);
 });
 //feed data …
 }).unwrap();
 } 16 define the dataflow and its operators
  • 23. WORDCOUNT IN TIMELY fn main() {
 timely::execute_from_args(std::env::args(), |worker| {
 
 let mut input = InputHandle::new();
 let mut probe = ProbeHandle::new();
 let index = worker.index();
 
 worker.dataflow(|scope| {
 input.to_stream(scope)
 .flat_map(|text: String|
 text.split_whitespace()
 .map(move |word| (word.to_owned(), 1))
 .collect::<Vec<_>>()
 )
 .aggregate(
 |_key, val, agg| { *agg += val; },
 |key, agg: i64| (key, agg),
 |key| hash_str(key)
 )
 .inspect(|data| println!("seen {:?}", data))
 .probe_with(&mut probe);
 });
 //feed data …
 }).unwrap();
 } 16 watch for progress
  • 24. WORDCOUNT IN TIMELY fn main() {
 timely::execute_from_args(std::env::args(), |worker| {
 
 let mut input = InputHandle::new();
 let mut probe = ProbeHandle::new();
 let index = worker.index();
 
 worker.dataflow(|scope| {
 input.to_stream(scope)
 .flat_map(|text: String|
 text.split_whitespace()
 .map(move |word| (word.to_owned(), 1))
 .collect::<Vec<_>>()
 )
 .aggregate(
 |_key, val, agg| { *agg += val; },
 |key, agg: i64| (key, agg),
 |key| hash_str(key)
 )
 .inspect(|data| println!("seen {:?}", data))
 .probe_with(&mut probe);
 });
 //feed data …
 }).unwrap();
 } 16 a few Rust peculiarities to get used to :-)
  • 25. PROGRESS TRACKING ▸ All tuples bear a logical timestamp (think event time) ▸ To send a timestamped tuple, an operator must hold a capability for it ▸ Workers broadcast progress changes to other workers ▸ Each worker independently determines progress 17 A distributed protocol that allows operators reason about the possibility of receiving data
 • 26. MAKING PROGRESS
 fn main() {
     timely::execute_from_args(std::env::args(), |worker| {
         …
         … …
         for round in 0..10 {
             // push data to the input stream
             input.send(("round".to_owned(), 1));
             // advance the input epoch
             input.advance_to(round + 1);
             // do work while there’s still data for this epoch
             while probe.less_than(input.time()) {
                 worker.step();
             }
         }
     }).unwrap();
 }
 • 30. TIMELY ITERATIONS
 timely::example(|scope| {

     // loop 100 times at most; advance timestamps by 1 in each iteration
     let (handle, stream) = scope.loop_variable(100, 1);
     (0..10).to_stream(scope)
         .concat(&stream)
         .inspect(|x| println!("seen: {:?}", x))
         // create the feedback loop
         .connect_loop(handle);
 });
 Timestamps: t, (t, l1), (t, (l1, l2))
  • 35. TIMELY & I 20 Relationship status: it’s complicated APIs & libraries performance ecosystem fault-tolerance deployment debugging expressiveness incremental computation performance
 • 37. STRYMON Components: what-if analysis, real-time datacenter analytics, incremental network routing, online critical path analysis ‣ Query and analyze state online ‣ Control and enforce configuration ‣ Simulate what-if scenarios ‣ Understand performance
 • 39. RECONSTRUCTING USER SESSIONS [Figure: Application A (A.1, A.2, A.3) and Application B (B.1, B.2, B.3, B.4)] Example log record: Time: 2015/09/01 10:03:38.599859 Session ID: XKSHSKCBA53U088FXGE7LD8 Transaction ID: 26-3-11-5-1
 • 40. PERFORMANCE RESULTS [Figure: log events per client over time; labels: Inactivity, Transaction ID, Transaction] ‣ Logs from 1263 streams and 42 servers ‣ 1.3 million events/s at 424.3 MB/s ‣ 26 ms per epoch vs. 2.1 s per epoch with Flink More results: Zaheer Chothia, John Liagouris, Desislava Dimitrova, and Timothy Roscoe. Online Reconstruction of Structural Information from Datacenter Logs. (EuroSys '17).
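The two ingredients named on these slides, hyphen-separated transaction IDs and splitting on inactivity, can be sketched in a few lines of plain Rust. This is an illustration, not the method of the cited paper. It assumes that an ID such as `26-3-11-5-1` encodes a path in a transaction tree, so dropping the last component gives the parent, and that a session ends once the gap between two events exceeds a threshold. All function names are hypothetical.

```rust
/// Parses a transaction ID such as "26-3-11-5-1" into its numeric path.
/// Returns None if any component is not a number.
fn parse_txn_id(id: &str) -> Option<Vec<u32>> {
    id.split('-').map(|part| part.parse().ok()).collect()
}

/// Parent of a transaction path, or None for a root transaction
/// (assumes the ID encodes a path in a tree).
fn parent(path: &[u32]) -> Option<&[u32]> {
    if path.len() > 1 {
        Some(&path[..path.len() - 1])
    } else {
        None
    }
}

/// Splits sorted event times into sessions wherever the gap between two
/// consecutive events exceeds `inactivity`.
fn split_by_inactivity(times: &[u64], inactivity: u64) -> Vec<Vec<u64>> {
    let mut sessions: Vec<Vec<u64>> = Vec::new();
    for &t in times {
        let extends = sessions
            .last()
            .and_then(|s| s.last())
            .map_or(false, |&prev| t.saturating_sub(prev) <= inactivity);
        if extends {
            sessions.last_mut().unwrap().push(t);
        } else {
            sessions.push(vec![t]);
        }
    }
    sessions
}

fn main() {
    let path = parse_txn_id("26-3-11-5-1").expect("numeric components");
    println!("path {:?}, parent {:?}", path, parent(&path));
    // Event times in milliseconds; a gap above 30 ms starts a new session.
    println!("{:?}", split_by_inactivity(&[0, 10, 20, 100, 105], 30));
}
```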
 • 42. ROUTING AS A STREAMING COMPUTATION [Figure: network changes feed the graph G and the rules R into a delta-join iteration, which emits updated forwarding rules] ‣ Compute all-pairs shortest paths (APSP) and keep forwarding rules as operator state ‣ Flow requests translate to lookups on that state ‣ Network updates cause re-computation of affected rules only
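The idea of recomputing only the affected rules can be sketched without any dataflow engine. The example below is a simplified stand-in, not the delta-join iteration from the slide. It assumes an undirected graph with hop-count distances, runs one breadth-first search per destination, and on a link failure recomputes only the destinations whose shortest-path tree used the failed link. `Network`, `fail_link`, `next_hop`, and `bfs_rules` are hypothetical names.

```rust
use std::collections::VecDeque;

/// rules[dst][node] = next hop from `node` toward `dst` (None if unreachable
/// or if node == dst).
type Rules = Vec<Vec<Option<usize>>>;

struct Network {
    adj: Vec<Vec<usize>>,
    rules: Rules,
}

impl Network {
    /// Builds the graph and computes forwarding rules for every destination.
    fn new(n: usize, links: &[(usize, usize)]) -> Self {
        let mut adj = vec![Vec::new(); n];
        for &(a, b) in links {
            adj[a].push(b);
            adj[b].push(a);
        }
        let rules = (0..n).map(|dst| bfs_rules(&adj, dst)).collect();
        Network { adj, rules }
    }

    /// Removes link (a, b) and recomputes only the destinations whose
    /// shortest-path tree used it. Returns how many were recomputed.
    fn fail_link(&mut self, a: usize, b: usize) -> usize {
        self.adj[a].retain(|&x| x != b);
        self.adj[b].retain(|&x| x != a);
        let mut recomputed = 0;
        for dst in 0..self.rules.len() {
            let affected =
                self.rules[dst][a] == Some(b) || self.rules[dst][b] == Some(a);
            if affected {
                self.rules[dst] = bfs_rules(&self.adj, dst);
                recomputed += 1;
            }
        }
        recomputed
    }

    /// A flow request is a lookup on the stored rules.
    fn next_hop(&self, from: usize, dst: usize) -> Option<usize> {
        self.rules[dst][from]
    }
}

/// BFS from `dst`; each reached node records the neighbor it was reached
/// from, which is its next hop on a shortest path toward `dst`.
fn bfs_rules(adj: &[Vec<usize>], dst: usize) -> Vec<Option<usize>> {
    let mut next = vec![None; adj.len()];
    let mut seen = vec![false; adj.len()];
    let mut queue = VecDeque::new();
    seen[dst] = true;
    queue.push_back(dst);
    while let Some(u) = queue.pop_front() {
        for &v in &adj[u] {
            if !seen[v] {
                seen[v] = true;
                next[v] = Some(u);
                queue.push_back(v);
            }
        }
    }
    next
}

fn main() {
    // Hypothetical 4-node ring: 0-1-2-3-0.
    let mut net = Network::new(4, &[(0, 1), (1, 2), (2, 3), (3, 0)]);
    println!("3 -> 2 via {:?}", net.next_hop(3, 2));
    let recomputed = net.fail_link(2, 3);
    println!("recomputed {} of 4 destinations", recomputed);
    println!("3 -> 2 via {:?}", net.next_hop(3, 2));
}
```

Rules for a destination whose tree does not contain the failed link stay valid, because every stored path still exists and removing a link cannot make any path shorter.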
 • 43. REACTION TO LINK FAILURES ▸ Fail random link ▸ 500 individual runs ▸ 32 threads More results: Desislava C. Dimitrova et al. Quick Incremental Routing Logic for Dynamic Network Graphs. SIGCOMM Posters and Demos '17
 • 45. ONLINE CRITICAL PATH ANALYSIS [Figure: input stream, output stream, periodic snapshot, trace, snapshot stream, analyzer, performance summaries] Systems: Apache Flink, Apache Spark, TensorFlow, Heron, Timely Dataflow
  • 46. TASK SCHEDULING IN APACHE SPARK 31 DRIVER W1 W2 W3 Venkataraman, Shivaram, et al. "Drizzle: Fast and adaptable stream processing at scale." Spark Summit (2016).
 • 47. SCHEDULING BOTTLENECK IN APACHE SPARK [Figure: Processing vs. Scheduling per snapshot (0 to 15), shown as % weight and as CP (0.0 to 0.8); panels: Conventional Profiling, Strymon Profiling] Apache Spark: Yahoo! Streaming Benchmark, 16 workers, 8s snapshots
  • 50. EVALUATING LOAD BALANCING STRATEGIES ▸ 13k services ▸ ~100K user requests/s ▸ OSPF routing ▸ Weighted Round-Robin load balancing 34 Traffic matrix simulation under topology and load balancing changes
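A what-if question about load balancing can be reduced to a small calculation: how would the request rate toward each host change under different weights? The sketch below assumes that weighted round-robin spreads traffic in proportion to the weights over the long run. The `Backend` type, the host names, and the weights are made up for illustration; only the request rate of roughly 100K requests/s comes from the slide.

```rust
use std::collections::BTreeMap;

/// A backend replica on some host, with its weighted round-robin weight.
struct Backend {
    host: &'static str,
    weight: u32,
}

/// Expected request rate per destination host when `rate` requests/s are
/// spread across `backends` in proportion to their weights.
fn split_rate(rate: f64, backends: &[Backend]) -> BTreeMap<&'static str, f64> {
    let total: u32 = backends.iter().map(|b| b.weight).sum();
    let mut out = BTreeMap::new();
    if total == 0 {
        return out; // no backend can take traffic
    }
    for b in backends {
        *out.entry(b.host).or_insert(0.0) +=
            rate * f64::from(b.weight) / f64::from(total);
    }
    out
}

fn main() {
    let rate = 100_000.0; // requests/s, the order of magnitude on the slide
    // Hypothetical current configuration and a what-if alternative.
    let current = [
        Backend { host: "h1", weight: 3 },
        Backend { host: "h2", weight: 1 },
    ];
    let what_if = [
        Backend { host: "h1", weight: 1 },
        Backend { host: "h2", weight: 1 },
    ];
    println!("current: {:?}", split_rate(rate, &current));
    println!("what-if: {:?}", split_rate(rate, &what_if));
}
```

Repeating this per source and service, and mapping the resulting flows onto routes, would give one row of a traffic matrix for each configuration under comparison.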
 • 51. WHAT-IF DATAFLOW Inputs: ‣ Logging Servers: RPC logs (text) ‣ Topology Discovery: topology changes (graph) ‣ Configuration DBs: device specs, application placement (relational) Stages: ETL, Load Balancing, Routing, LB View Construction, Routing View Construction, Traffic Matrix Construction, DC Model Construction
 • 53. Strymon 0.1.0 has been released! Try it out: https://strymon-system.github.io https://github.com/strymon-system Send us feedback: [email protected]
 • 54. THE STRYMON TEAM & FRIENDS Vasiliki Kalavri, Zaheer Chothia, Andrea Lattuada, Prof. Timothy Roscoe, Sebastian Wicki, Moritz Hoffmann, Desislava Dimitrova, John Liagouris, Frank McSherry ? ? IT COULD BE YOU! strymon.systems.ethz.ch