© Copyright 2016 EMC Corporation. All rights reserved.
Improved Reliable Streaming Processing:
Apache Storm as example
Frank Zhao, EMC CTO Office,
Fenghao Zhang*, Microsoft Bing,
Yusong Lv*, Peking University
Special thanks to EMC Ken Taylor, John Cardente and Lincourt Robert
*Zhang and Lv contributed to the research when they worked at EMC China COE
The technology concepts being discussed and demonstrated are
the result of research conducted by the Advanced Research &
Development (ARD) team from the EMC Office of the CTO. Any
demonstrated capability is only for research purposes and is at a
prototype phase; therefore: THERE ARE NO IMMEDIATE PLANS
NOR INDICATION OF SUCH PLANS FOR PRODUCTIZATION OF
THESE CAPABILITIES AT THE TIME OF PRESENTATION. THINGS
MAY OR MAY NOT CHANGE IN THE FUTURE.
DISCLAIMER
• Distributed Streaming System
• Reliable Processing
• Apache Storm’s Solution, the Challenge
• New Proposed Approaches
– Fingerprint, and share-split
• Prototyping with Apache Storm and Benchmark
• Summary and Outlook
Agenda
• As a service, continuously process data (a.k.a. messages or tuples)
in a scalable, reliable and high-performance way (millisecond level)
– Open-source: Storm, Flink, Spark Streaming, Samza
Streaming processing
Aspect | Streaming processing (Storm, Spark Streaming) | Batch processing (Hadoop MR)
Type | Continuous (never stops), real-time (ms level) | Batch/periodic
Model | DAG/graph | MapReduce-like jobs
Workload | CPU/memory intensive | CPU/memory and I/O intensive
State | Stateless, may checkpoint periodically | Stateful
Cluster | Master-slave w/ Zookeeper (Storm) | Master-slave or job-task
Fault-tolerance | Fault-tolerance/HA | Fault-tolerance/HA
Streaming vs. batch processing
Aspect | Storm | Flink | Spark Streaming
Built since | 2011 (Apache, Trident); 2016 (Twitter Heron) | 2014 (Apache) | ~2013
Streaming | Native (micro-batch, Trident) | Native | Micro-batch
Guarantee | At least once (exactly-once w/ Trident) | Exactly-once | Exactly-once
Fault-tolerance | Ack per message | Checkpoint | Checkpoint
Latency | 5 | 4 | 3
Throughput | 4 | 5 | 5
Ecosystem | 5 | 3 | 3
Storm, Flink, Spark Streaming*
*Personal observations for reference only
• Every message is guaranteed to be processed, with one of these semantics:
– At-most once
– At-least once
– Exactly once
Reliable processing
[Figure: Topology (DAG). Labels: Data source, Spout, root message R, Bolts (worker, task, op) 0-9, messages B-M, "May save result".]
• Scalable
• Fault-tolerant
• Guaranteed message processing
– At least once (default)
• Fast: ms level
– Pure in-memory computing, no checkpointing
• Simple programming model
– Topology - Spouts – Bolts
– Clojure, Java, Ruby, Python …
Apache Storm
Storm: designs for fault-tolerance
Nimbus (Master)
– Deploy topology
– Dispatch tasks
– Monitor cluster
Zookeeper cluster
– Coordination
– States of Nimbus
– State of supervisor
– …
Workers: Supervisor, Executor, Task, Task
These fault-tolerance designs are about thread/task/job or node, NOT message
• The critical granularity is the message (NOT thread/task/job/node)
• Need an efficient method, considering
– Any component may fail
– Large topologies with continuously flooding messages
– Network temporarily unavailable, traffic out of order, …
– Minimized resource usage (network, CPU, memory)
Track processing status in DAG
[Figure: Topology (DAG). Labels: Data source, Spout, root message R, Bolts 0-9, messages B-M.]
History of Apache Storm and lessons learned
– Nathan Marz, creator of Storm
Tough problem and Storm’s answer!
Storm reliability track algorithm
[Figure: bolts 0-4 and the Acker. Labels: Status; Acker entry "srcNodeID: R, R"; messages R, A-F.]
Acks sent to the Acker:
Bolt 0: R ⊕ A ⊕ B ⊕ C
Bolt 1: A ⊕ D
Bolt 2: B ⊕ E
Bolt 3: C ⊕ F
Bolt 4: D ⊕ E ⊕ F
Status = R ⊕ (R ⊕ A ⊕ B ⊕ C) ⊕ (A ⊕ D) ⊕ (B ⊕ E) ⊕ (C ⊕ F) ⊕ (D ⊕ E ⊕ F) = 0
1. Each msg has an ID (8-byte random number)
2. Each bolt runs XOR (inMsgID, outMsgID[]) per inMsg
3. Each bolt sends the XOR result (per inMsg) to the Acker
4. The Acker XORs each result into its status: always 8 bytes (regardless of topology size)
5. Finally, within the timeout, Acker.status shall be 0, which means OK;
otherwise something failed (may false-alarm, but never miss)
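The five steps above can be sketched in a few lines of Python. This is only an illustration under assumptions, not Storm's actual code: the names `bolt_ack`, `Acker` and `run` are hypothetical, message IDs are drawn with `random.getrandbits(64)`, and the topology is the 5-bolt example from this slide, with bolt 4 acking each of its three inputs separately (their XOR equals the D ⊕ E ⊕ F shown on the slide).

```python
import random
from functools import reduce
from operator import xor


def bolt_ack(in_id, out_ids):
    """Value a bolt reports for one input message: inMsgID XOR all outMsgIDs."""
    return reduce(xor, out_ids, in_id)


class Acker:
    """Hypothetical acker: one 8-byte status per root message."""

    def __init__(self, root_id):
        self.status = root_id  # initialised with the root message ID R

    def ack(self, value):
        self.status ^= value  # XOR is commutative and associative


def run(root_id, acks):
    """Feed acks (in the given order) to a fresh Acker and return its status."""
    acker = Acker(root_id)
    for value in acks:
        acker.ack(value)
    return acker.status


rng = random.Random(2016)
R, A, B, C, D, E, F = (rng.getrandbits(64) for _ in range(7))

acks = [
    bolt_ack(R, [A, B, C]),  # bolt 0
    bolt_ack(A, [D]),        # bolt 1
    bolt_ack(B, [E]),        # bolt 2
    bolt_ack(C, [F]),        # bolt 3
    bolt_ack(D, []),         # bolt 4, input D
    bolt_ack(E, []),         # bolt 4, input E
    bolt_ack(F, []),         # bolt 4, input F
]

shuffled = acks[:]
rng.shuffle(shuffled)                      # ack order does not matter
status_ok = run(R, shuffled)               # 0: every message was processed
status_lost = run(R, acks[:1] + acks[2:])  # bolt 1's ack never arrives

print(status_ok == 0, status_lost == A ^ D)
```

When bolt 1's ack is missing, the status is left at A ⊕ D, which is non-zero unless the two random IDs collide.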
• Random number + XOR based: the key foundation of Storm, which has been
running for 5+ years
– Smart, simple and pretty good!
– Least memory footprint at the Acker, regardless of topology
– Reliable*, regardless of Ack traffic order
– XOR is commutative and associative
• Easy to handle any out-of-order arrival
Ingenious!
*: in theory, random IDs may collide
• Network traffic, CPU overhead → latency & throughput impact
– Possibility of random number collision
Limitations
Non-reliable processing: 25000 msg/sec
Reliable processing: 9300 msg/sec
*3rd-party benchmark from 2012; things may have changed since
IS IT POSSIBLE?
Ack only at leaf?
[Figure: Topology (DAG). Labels: Data source, root message R, Bolts 0-9, messages B-M.]
Current algorithm is fantastic, however
• Same level of guaranteed reliable processing
• More scalable, efficient and fast
– Much less Ack traffic; usually only at leaf nodes
– Same memory footprint, less CPU usage
– Eventually better latency/throughput
2 new proposed approaches
Currently in research & quick validation phase
• An evolution based on Random Num + XOR
Approach-1: fingerprint based
Currently, each ID is XORed in a pair (send, recv), so the result is 0
Further, XOR the ID in multiple pairs (2, 4, 6, … times) and the result is still 0
• Fingerprint (FP): a digest (e.g., 8 bytes) of {in msgs, out msgs and
parent.fp} that encodes and represents the context and is then recursively
passed down, so that each downstream node inherits "genes" from all its ancestors
– Still uses XOR of IDs; redundant in a scalable way
– 3 rules: Embedded, Recursively inherited and Append-only update
Approach-1: fingerprint idea
[Figure: node Ni receives iMsg <Mj, FPj> and passes the FP down to nodes Ni+1, Ni+2, Ni+3 in Msg <Mj+1, FPj:i>, Msg <Mj+2, FPj:i>, Msg <Mj+2, FPj:i>, Msg <…>. The append update is InMsgID XOR [outMsgIDs].]
• Embedded: as part of metadata
• Recursive-inherit: pass-down
• Append-update: via XOR
Fingerprint example
[Figure: bolts 0-4 and the Acker (entry "srcNodeID: RootMsgID, R", Init: R); root message R; messages <A, FP0>, <B, FP0>, <C, FP0>, <D, FP1>, <E, FP2>, <F, FP3>; leaf Acks FP4-D, FP4-E, FP4-F (may batch).]
Calculate FP:
FP0 = R ⊕ A ⊕ B ⊕ C
FP1 = FP0 ⊕ A ⊕ D
FP2 = FP0 ⊕ B ⊕ E
FP3 = FP0 ⊕ C ⊕ F
The leaf sends 3 Acks:
FP4-D = FP1 ⊕ D
FP4-E = FP2 ⊕ E
FP4-F = FP3 ⊕ F
→ Acker.status = R ⊕ (FP0 ⊕ A ⊕ D) ⊕ D ⊕ (FP0 ⊕ B ⊕ E) ⊕ E ⊕ (FP0 ⊕ C ⊕ F) ⊕ F = 0
Approach-1: failure example
[Figure: same topology as the fingerprint example: bolts 0-4 and the Acker (entry "srcNodeID: RootMsgID, R", Init = R); messages <A, FP0>, <B, FP0>, <C, FP0>, <D, FP1>, <E, FP2>, <F, FP3>; leaf Acks FP4-D, FP4-E, FP4-F.]
If msg D failed, then node 4 only Acks FP4-E and FP4-F, and finally
Acker.status = R ⊕ FP4-E ⊕ FP4-F
= R ⊕ FP2 ⊕ E ⊕ FP3 ⊕ F
= R ⊕ (FP0 ⊕ B ⊕ E ⊕ E) ⊕ (FP0 ⊕ C ⊕ F ⊕ F)
= R ⊕ B ⊕ C != 0
→ Info about the A/D path is missing, due to the failure!
Another example: if all messages failed, Acker.status is R != 0
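The fingerprint example and the failure case can be checked with a short Python sketch. It is an illustration only: the helper `acker_status` is a hypothetical name, the IDs are random 64-bit integers, and the fingerprints follow the FP0-FP3 formulas of the fingerprint example slide.

```python
import random

rng = random.Random(7)
R, A, B, C, D, E, F = (rng.getrandbits(64) for _ in range(7))

# Fingerprints computed by the intermediate bolts (slide notation).
FP0 = R ^ A ^ B ^ C   # bolt 0: input R, outputs A, B, C
FP1 = FP0 ^ A ^ D     # bolt 1: input <A, FP0>, output D
FP2 = FP0 ^ B ^ E     # bolt 2: input <B, FP0>, output E
FP3 = FP0 ^ C ^ F     # bolt 3: input <C, FP0>, output F

# Only the leaf (bolt 4) talks to the Acker: one value per received message.
leaf_acks = {"D": FP1 ^ D, "E": FP2 ^ E, "F": FP3 ^ F}


def acker_status(root_id, acks):
    """XOR all received acks into the initial status R."""
    status = root_id
    for value in acks:
        status ^= value
    return status


all_ok = acker_status(R, leaf_acks.values())                    # 0
d_lost = acker_status(R, [leaf_acks["E"], leaf_acks["F"]])      # R ^ B ^ C
all_lost = acker_status(R, [])                                  # R

print(all_ok == 0, d_lost == R ^ B ^ C, all_lost == R)
```

The three FP0 copies inherited by D, E and F are what make the leaf acks cancel against R; losing one branch leaves an odd remainder behind.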
Approach-1: a complex example
[Figure: bolts 1-8 and the Acker; root message R; messages A, B, C, D, X, E, F, G, H, I; FP5 and FP8 are sent to the Acker.]
Initial: R
FP1 = R ⊕ A ⊕ B ⊕ C
FP2 = FP1 ⊕ A ⊕ D
FP3 = FP1 ⊕ B ⊕ X
FP4 = FP1 ⊕ C ⊕ E
// FP5 is sent to the Acker since there is an even number of downstreams (2)
FP5 = FP2 ⊕ D ⊕ FP3 ⊕ X ⊕ FP4 ⊕ E ⊕ (F ⊕ G)
FP6 = FP5 ⊕ F ⊕ H
FP7 = FP5 ⊕ G ⊕ I
// bolt 8 sends FP8 to the Acker
FP8 = FP6 ⊕ H ⊕ FP7 ⊕ I
Final Status = R ⊕ FP5 ⊕ FP8
= R ⊕ FP5 ⊕ (FP5 ⊕ F) ⊕ (FP5 ⊕ G)
= R ⊕ FP5 ⊕ (F ⊕ G)
= R ⊕ FP2 ⊕ D ⊕ FP3 ⊕ X ⊕ FP4 ⊕ E
= R ⊕ (FP1 ⊕ A ⊕ B ⊕ C)
= 0
Limitations and notes: 1) the number of downstream msgs shall be odd (1, 3, 5, …); otherwise, the bolt must send
the new FP to the Acker, which XORs it into its status; 2) to implement this approach, the bolt ideally needs
to know the total number of downstream msgs so it can generate the FP before emitting.
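The odd/even limitation can be made concrete with a small Python check of the complex example. This is a sketch with random 64-bit IDs; the variable names mirror the slide, and nothing here is taken from the prototype code.

```python
import random

rng = random.Random(11)
R, A, B, C, D, X, E, F, G, H, I = (rng.getrandbits(64) for _ in range(11))

FP1 = R ^ A ^ B ^ C
FP2 = FP1 ^ A ^ D
FP3 = FP1 ^ B ^ X
FP4 = FP1 ^ C ^ E
FP5 = (FP2 ^ D) ^ (FP3 ^ X) ^ (FP4 ^ E) ^ (F ^ G)   # two downstream msgs: F, G
FP6 = FP5 ^ F ^ H
FP7 = FP5 ^ G ^ I
FP8 = (FP6 ^ H) ^ (FP7 ^ I)                         # leaf fingerprint

# With an even fan-out, the two inherited copies of FP5 cancel inside FP8,
# so the leaf alone no longer carries the upstream context.
fp5_cancelled = (FP8 == F ^ G)

status_with_fp5 = R ^ FP5 ^ FP8     # bolt 5 also reports FP5 to the Acker
status_without_fp5 = R ^ FP8        # only the leaf reports

print(fp5_cancelled, status_with_fp5 == 0, status_without_fp5 == R ^ F ^ G)
```

Without the extra FP5 report the status stays at R ⊕ F ⊕ G even though nothing failed, which is why a bolt with an even number of downstream messages has to send its new FP to the Acker.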
• For each input rootMsg, INIT a BIG SHARE (8 bytes), EMBED it as metadata and pass it down
• At each bolt, Storm SPLITs the attached share and EMBEDs the parts; repeat this until the leaf ...
• Only the leaf ACKs to the Acker, reporting the share it received
• Acker REDO: decrease by the reported share; finally 0 means OK, otherwise failure
– No random numbers (no collision), no XOR; embedded inline; the split is transparent to the app
– +/- (mod) is commutative and associative, which resolves the out-of-order issue
Approach-2: share split
[Figure: bolts 0-9 and the Acker (entry "srcNodeID: rootMsgID, BIG-Share"). Messages and attached shares: A; B, 50; C, 50; D, 25; E, 25; F, 17; G, 17; H, 16; I, 25; J, 25; K, 17; L, 17; M, 16. Acker entries shown: "A, 1, 100"; "A, 0, 16"; "A, 0, 84".]
Like IPO/stock shares: split, increase shares
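A minimal Python sketch of the share-split idea follows. It is not the prototype: `split_share`, `leaf_shares` and the small tuple tree are hypothetical, BIG_SHARE is set to 100 as on the slide, and the split policy (remainder spread over the first parts) is an assumption chosen so that 50 splits into 17, 17, 16 as in the figure.

```python
BIG_SHARE = 100  # the slide uses 100 for readability; the talk mentions an 8-byte share


def split_share(share, n):
    """Split `share` into n positive integer parts that sum to `share`."""
    if n < 1 or share < n:
        raise ValueError("share is insufficient to split")
    base, extra = divmod(share, n)
    return [base + 1] * extra + [base] * (n - extra)


def leaf_shares(tree, msg, share):
    """Pass the share down the tuple tree and return {leaf message: share}."""
    children = tree.get(msg, [])
    if not children:
        return {msg: share}
    leaves = {}
    for child, part in zip(children, split_share(share, len(children))):
        leaves.update(leaf_shares(tree, child, part))
    return leaves


# Hypothetical tuple tree: A fans out to B and C, and so on.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G", "H"]}
leaves = leaf_shares(tree, "A", BIG_SHARE)

remaining = BIG_SHARE
for share in leaves.values():   # only leaves ack; the order is irrelevant
    remaining -= share

remaining_if_h_lost = BIG_SHARE - sum(s for m, s in leaves.items() if m != "H")
print(leaves, remaining, remaining_if_h_lost)
```

The Acker keeps a single number per root message; a lost leaf simply leaves its share outstanding.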
• Rare case: INCREASE the share if it is insufficient to split (and sync up the Acker)
• The Acker then ADDs the newly increased share (NOT decrease)
Approach-2: share split (cont'd)
[Figure: bolts 0-9, with labels "Acker" and "DAG"; Acker entry "srcNodeID, RootMsgID, Share". Messages and attached shares: A; B, 99; C, 1; F, 33; G, 33; H, 34. Acker entries: "A, 100", then "A, +99" (increase share; sync up the Acker).]
If S - S1 - S2 - … = Sn, then S - S1 - S2 - … - Sn = 0
(Ack may batch)
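The increase-share case on this slide can be replayed in Python. This is a sketch under assumptions: `Acker` and `split_or_increase` are hypothetical names, the top-up amount of 99 is taken from the figure, and the "last part takes the remainder" policy is chosen so that 100 splits into 33, 33, 34 as shown.

```python
class Acker:
    """Hypothetical acker entry for one root message."""

    def __init__(self, share):
        self.remaining = share

    def increase(self, extra):
        self.remaining += extra   # ADD: a bolt increased its share

    def ack(self, share):
        self.remaining -= share   # decrease: a leaf reported its share


def split_or_increase(share, n, acker, top_up=99):
    """Split `share` into n parts; top it up first if it is too small."""
    while share < n:
        share += top_up
        acker.increase(top_up)    # sync up the Acker
    base = share // n
    return [base] * (n - 1) + [share - base * (n - 1)]


acker = Acker(100)                            # A, 100
b_share, c_share = 99, 1                      # shares attached to B and C on the slide
f_g_h = split_or_increase(c_share, 3, acker)  # C must feed F, G, H

for share in [b_share] + f_g_h:               # leaves ack their shares
    acker.ack(share)

print(f_g_h, acker.remaining)
```

Because the Acker only adds and subtracts, the increase and the leaf acks may reach it in any order and still sum to 0.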
• Implemented Approach-2 (share-split)
• Integrated with Storm 1.0.1 (released in May 2016)
– Storm core (~200 LOC in Clojure, a LISP-like language) and Java APIs (~200 LOC
including some traces/tests)
• Implementation notes:
– Supports BasicBolt, removes the random number, and re-uses some existing
structures/APIs, e.g., Anchors-to-ids (RootID:shareAttached) and Ack sending
– Globally pre-defined split of the share at all bolts (equal split)
• Next: configurable split approach per bolt
– To split the share exactly, build a 1-step delayed emit
• Pre-split the input share
• Once a new tuple is generated, emit queues it internally until the next tuple comes out
• Finally, explicitly call emitDone(), so that the last tuple takes over all the remaining share and is emitted
Prototyping
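The 1-step delayed emit can be sketched as follows. This is a Python illustration, not the Clojure/Java prototype: the class `DelayedEmitter`, its `expected_outputs` parameter and the `send` callback are assumptions, and `emit_done` stands in for the `emitDone()` call named on the slide.

```python
class DelayedEmitter:
    """Hypothetical per-input emitter that splits the input share exactly."""

    def __init__(self, input_share, expected_outputs, send):
        self.remaining = input_share
        self.portion = max(1, input_share // expected_outputs)  # pre-split
        self.send = send          # callback taking (tuple, share)
        self.held = None          # tuple waiting for its successor

    def emit(self, tup):
        if self.held is not None:
            if self.remaining - self.portion < 1:
                raise RuntimeError("share exhausted; it would have to be increased")
            self._send_held(self.portion)
        self.held = tup           # delay this tuple by one step

    def emit_done(self):
        if self.held is not None:
            self._send_held(self.remaining)  # last tuple takes all that is left

    def _send_held(self, share):
        self.send(self.held, share)
        self.remaining -= share
        self.held = None


sent = []
emitter = DelayedEmitter(50, 3, lambda tup, share: sent.append((tup, share)))
for tup in ["F", "G", "H"]:
    emitter.emit(tup)
emitter.emit_done()
print(sent)
```

Holding each tuple until its successor appears is what lets the bolt hand the exact remainder to the last tuple without knowing the fan-out in advance.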
• Function & performance
– network traffic, CPU, latency/throughput
• Reference: IBM whitepaper (Storm vs. IBM InfoSphere): 7 layers
– We use Wikipedia as the data source; word processing
Benchmark
[Figure: test bed of 3 hosts connected by a 1000 Mbps network; each host: Ubuntu 15.10 (4.2.0), Storm 1.0.1, E5-2643 @ 3.40GHz, 24 cores, 256GB DRAM.]
• Function: inject errors and validate reliability detection: Pass
– Same level of reliability as the existing approach
• Performance: same HW/SW config and processing logic
– 16KB tuple, 100 pending, 48 parallelism per bolt
– 4 workers & 12 Ackers per host
Result: function & performance
• 1/3 Ack traffic, 18% faster, 9% less CPU
Test1: 3 layers
Metric | Current | New
Ack traffic (Mil) | 3903 | 1301
End-end latency (ms) | 241 | 197
CPU (per Java worker) | 350% | 320%
• 1/5 Ack traffic, 23% faster, 14% less CPU
Test2: 7 layers
Metric | Current | New
Ack traffic (Mil) | 2685 | 537
End-end latency (ms) | 197 | 151
CPU (per Java worker) | 250% | 215%
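The headline figures of Test1 and Test2 follow directly from the chart values. The short Python check below assumes that "faster" means the relative reduction in end-to-end latency and that "less CPU" means the relative reduction in the per-worker CPU percentage; the dictionary layout and function name are illustrative only.

```python
results = {
    "Test1 (3 layers)": {"ack_mil": (3903, 1301), "latency_ms": (241, 197), "cpu_pct": (350, 320)},
    "Test2 (7 layers)": {"ack_mil": (2685, 537), "latency_ms": (197, 151), "cpu_pct": (250, 215)},
}


def reduction_pct(current, new):
    """Relative reduction from `current` to `new`, rounded to a whole percent."""
    return round(100 * (current - new) / current)


summary = {}
for name, m in results.items():
    summary[name] = {
        "ack_ratio": m["ack_mil"][0] / m["ack_mil"][1],       # current / new
        "latency_reduction_pct": reduction_pct(*m["latency_ms"]),
        "cpu_reduction_pct": reduction_pct(*m["cpu_pct"]),
    }

for name, s in summary.items():
    print(name, s)
```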
• Larger topology? Quick test of 11 layers:
– 1/9 traffic
• Presumably, the larger the topology, the greater the gains
• Next
– Refine multi-Acker
– Implement “Increase Share” operation
– Configurable split method per bolt
• So developers can specify the desired split method rather than a fixed/global one
• May integrate with Twitter Heron? Or apply to other areas?
– e.g., function call graph? performance trace? (more…)
MORE
End-end IoT landscape
Continuous, scalable,
Real-time processing
• Lambda architecture: fusing “historical” + “new” data
– Proposed by Nathan Marz (5 years ago): batch + streaming
– Widely adopted by many Internet companies
Unified data processing
• 2 innovative & inspiring streaming reliability algorithms
– Guaranteed with minimized mem footprint
– More scalable, efficient & fast, and even beautiful
• Demonstrated in Storm
– 1/N Ack traffic, only needed at leaf nodes
• N is the topology depth. Usually there are only a few leaves, for aggregation, DB saving, etc.
• Meanwhile, 23% faster, 14% less CPU
– Transparent to the app except for the last explicit emitDone() call
• Applying to other interesting areas...
– Distributed replication, transactions, exact-state tracking, …
SUMMARY
• Feedback or comments? Talk with us!
– Any flaws, constraints, or room to improve?
– Then discuss with the Storm community; code can be shared if needed
Junping.Zhao@emc.com ZhaoJP@gmail.com
THANK YOU!
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 

Improved Reliable Streaming Processing: Apache Storm as example

  • 1. Improved Reliable Streaming Processing: Apache Storm as example. Frank Zhao, EMC CTO Office; Fenghao Zhang*, Microsoft Bing; Yusong Lv*, Peking University. Special thanks to EMC's Ken Taylor, John Cardente and Lincourt Robert. *Zhang and Lv contributed to the research while they worked at EMC China COE. © Copyright 2016 EMC Corporation. All rights reserved.
  • 2. DISCLAIMER: The technology concepts being discussed and demonstrated are the result of research conducted by the Advanced Research & Development (ARD) team from the EMC Office of the CTO. Any demonstrated capability is for research purposes only and at a prototype phase; therefore, THERE ARE NO IMMEDIATE PLANS NOR INDICATION OF SUCH PLANS FOR PRODUCTIZATION OF THESE CAPABILITIES AT THE TIME OF PRESENTATION. THINGS MAY OR MAY NOT CHANGE IN THE FUTURE.
  • 3. Agenda
      • Distributed streaming system
      • Reliable processing
      • Apache Storm’s solution and the challenge
      • New proposed approaches
          – Fingerprint and share-split
      • Prototyping with Apache Storm and benchmark
      • Summary and outlook
  • 4. Streaming processing: runs as a service that continuously processes data (a.k.a. messages or tuples) in a scalable, reliable and high-performance (millisecond-level) way. Open-source examples: Storm, Flink, Spark Streaming, Samza.
  • 5. Streaming vs. batch processing
                       | Streaming processing (Storm, Spark Streaming)  | Batch processing (Hadoop MR)
      Type             | Continuous (never stops), real-time (ms level) | Batch/periodic
      Model            | DAG/graph                                      | MapReduce-like jobs
      Workload         | CPU/memory intensive                           | CPU/memory and IO intensive
      State            | Stateless, may checkpoint periodically         | Stateful
      Cluster          | Master-slave with Zookeeper (Storm)            | Master-slave or job-task
      Fault-tolerance  | Fault-tolerance/HA                             | Fault-tolerance/HA
  • 6. Storm, Flink, Spark Streaming*
                       | Storm                                         | Flink         | Spark Streaming
      Built since      | 2011 (Apache, Trident), 2016 (Twitter Heron)  | 2014 (Apache) | ~2013
      Streaming        | Native (micro-batch with Trident)             | Native        | Micro-batch
      Guarantee        | At least once (exactly-once with Trident)     | Exactly-once  | Exactly-once
      Fault-tolerance  | Ack per message                               | Checkpoint    | Checkpoint
      Latency          | 5                                             | 4             | 3
      Throughput       | 4                                             | 5             | 5
      Ecosystem        | 5                                             | 3             | 3
      *Personal observations, for reference only
  • 7. Reliable processing
      • Every message shall be guaranteed to be processed
          – At most once
          – At least once
          – Exactly once
      [Figure: topology (DAG) with a data source, spout R, bolts 0-9 (worker, task, op) and messages B-M; the result may be saved.]
  • 8. Apache Storm
      • Scalable
      • Fault-tolerant
      • Guaranteed message processing
          – At least once (default)
      • Fast: ms level
          – Pure in-memory computing, no checkpoint
      • Simple programming model
          – Topology, spouts, bolts
          – Clojure, Java, Ruby, Python, …
  • 9. Storm: designs for fault-tolerance
      • Nimbus (master): deploys topologies, dispatches tasks, monitors the cluster
      • Zookeeper cluster: coordination, state of Nimbus, state of supervisors, …
      • Supervisor (workers): runs executors, which in turn run tasks
      • These fault-tolerance mechanisms cover threads, tasks, jobs or nodes, NOT messages
  • 10. Track processing status in DAG
      • The critical granularity is the message (NOT thread/task/job/node)
      • An efficient method is needed, considering:
          – Any component may fail
          – Large topologies with a continuous flood of messages
          – Temporary network unavailability, out-of-order traffic, …
          – Minimal resource usage (network, CPU, memory)
      [Figure: topology (DAG) with a data source, spout R, bolts 0-9 and messages B-M.]
  • 11. Tough problem and Storm’s answer! Reference: "History of Apache Storm and lessons learned" by Nathan Marz, creator of Storm.
  • 12. Storm reliability tracking algorithm
      The Acker keeps one status entry per root message (srcNodeID: R, status), initialized to R. Messages in the example: R, A, B, C, D, E, F over nodes 0-4.
      Values reported to the Acker:
          node 0: R ⊕ A ⊕ B ⊕ C
          node 1: A ⊕ D
          node 2: B ⊕ E
          node 3: C ⊕ F
          node 4: D ⊕ E ⊕ F
      Status = R ⊕ (R ⊕ A ⊕ B ⊕ C) ⊕ (A ⊕ D) ⊕ (B ⊕ E) ⊕ (C ⊕ F) ⊕ (D ⊕ E ⊕ F) = 0
      1. Each message has an ID (an 8-byte random number)
      2. Each bolt runs XOR(inMsgID, outMsgID[]) per input message
      3. Each bolt sends the XOR result (per input message) to the Acker
      4. The Acker XORs the reports into its status, which is always 8 bytes regardless of topology size
      5. When the timeout expires, an Acker status of 0 means OK; otherwise something failed (false alarms are possible, but a failure is never missed)
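The five steps above can be sketched as a small simulation. This is a minimal illustration, not Storm's actual code: the helper names (`new_id`, `bolt_report`, `acker_status`) are hypothetical, the topology is the one read off the slide (node 0 turns R into A, B, C; nodes 1-3 turn them into D, E, F; node 4 consumes D, E, F), and node 4 is assumed to report once per input message as rule 2 states.

```python
import random
from functools import reduce
from operator import xor

rng = random.Random(2016)  # fixed seed so the run is repeatable


def new_id() -> int:
    """Random 64-bit (8-byte) message ID."""
    return rng.getrandbits(64)


def bolt_report(in_id: int, out_ids: list) -> int:
    """Value a bolt sends to the Acker for one input: inMsgID XOR all outMsgIDs."""
    return reduce(xor, out_ids, in_id)


def acker_status(root_id: int, reports: list) -> int:
    """Acker state: starts at the root ID and XORs in every report."""
    return reduce(xor, reports, root_id)


R, A, B, C, D, E, F = (new_id() for _ in range(7))

reports = [
    bolt_report(R, [A, B, C]),  # node 0
    bolt_report(A, [D]),        # node 1
    bolt_report(B, [E]),        # node 2
    bolt_report(C, [F]),        # node 3
    bolt_report(D, []),         # node 4, one report per input message
    bolt_report(E, []),
    bolt_report(F, []),
]

in_order = acker_status(R, reports)

shuffled = reports[:]
rng.shuffle(shuffled)           # Ack traffic may arrive in any order
out_of_order = acker_status(R, shuffled)

# Node 1's report never arrives (for example, the bolt crashed).
lost_one = acker_status(R, reports[:1] + reports[2:])

print(in_order, out_of_order, lost_one != 0)
```

Because XOR is commutative and associative, the shuffled run ends at the same status as the ordered one, while the missing report leaves a non-zero residue (here A ⊕ D) that the Acker would flag at timeout.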
  • 13. Ingenious!
      • Random number + XOR: the key foundation of Storm, which has been running for 5+ years
          – Smart, simple and pretty good!
          – Minimal memory footprint at the Acker, regardless of topology
          – Reliable*, regardless of the order of Ack traffic
          – XOR is commutative and associative
              • Easy to handle any out-of-order delivery
      *In theory, random IDs may collide
  • 14. Limitations
      • Network traffic and CPU overhead → impact on latency and throughput
          – Possibility of random number collision
      • Benchmark*: 25,000 msg/sec with non-reliable processing vs. 9,300 msg/sec with reliable processing
      *Third-party benchmark from 2012; things may have changed since
  • 15. The current algorithm is fantastic; however, IS IT POSSIBLE to Ack only at the leaf? [Figure: topology (DAG) with a data source, spout R, bolts 0-9 and messages B-M.]
  • 16. 2 new proposed approaches
      • Same level of guaranteed reliable processing
      • More scalable, efficient and fast
          – Much less Ack traffic; usually only at leaf nodes
          – Same memory footprint, less CPU usage
          – Eventually better latency/throughput
      Currently in research and quick validation phase
  • 17. Approach-1: fingerprint based
      • An evolution of random number + XOR
      • Currently, each ID is XORed as a pair (once on send, once on receive), which yields 0
      • Further, XORing over multiple pairs (2, 4, 6, …) still yields 0
  • 18. Approach-1: fingerprint idea
      • Fingerprint (FP): a digest (e.g., 8 bytes) of {in msgs, out msgs and parent.fp} that encodes and represents the context and is then recursively passed down, so that each downstream message inherits "genes" from all its ancestors
          – Still uses XOR of IDs; redundant in a scalable way
          – 3 rules: embedded, recursively inherited, and append-only update
              • Embedded: carried as part of the message metadata
              • Recursively inherited: passed down to downstream messages
              • Append-only update: via XOR of InMsgID with [outMsgIDs]
      [Figure: node Ni receives <Mj, FPj> and emits <Mj+1, FPj:i>, <Mj+2, FPj:i>, … to nodes Ni+1, Ni+2, Ni+3; the FP is passed down and updated by appending.]
  • 19. Fingerprint example
      Acker entry (srcNodeID: RootMsgID, status), initialized to R. Nodes 0-4; messages carry (ID, FP): (A, FP0), (B, FP0), (C, FP0), (D, FP1), (E, FP2), (F, FP3).
      Calculate FP:
          FP0 = R ⊕ A ⊕ B ⊕ C
          FP1 = FP0 ⊕ A ⊕ D
          FP2 = FP0 ⊕ B ⊕ E
          FP3 = FP0 ⊕ C ⊕ F
      The leaf sends 3 Acks (may be batched):
          FP4-D = FP1 ⊕ D
          FP4-E = FP2 ⊕ E
          FP4-F = FP3 ⊕ F
      Acker.status = R ⊕ (FP0 ⊕ A ⊕ D) ⊕ D ⊕ (FP0 ⊕ B ⊕ E) ⊕ E ⊕ (FP0 ⊕ C ⊕ F) ⊕ F = 0
  • 20. Approach-1: failure example
      Same topology and fingerprints as the previous example; Acker status initialized to R.
      If message D fails, node 4 only Acks FP4-E and FP4-F, so finally:
          Acker.status = R ⊕ FP4-E ⊕ FP4-F
                       = R ⊕ FP2 ⊕ E ⊕ FP3 ⊕ F
                       = R ⊕ (FP0 ⊕ B ⊕ E ⊕ E) ⊕ (FP0 ⊕ C ⊕ F ⊕ F)
                       = R ⊕ B ⊕ C != 0
      The information about the A/D path is missing because of the failure.
      Another example: if all messages fail, the Acker status stays at R != 0.
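The fingerprint arithmetic of the success case and of this failure case can be checked numerically. The sketch below is an illustration under assumptions: IDs are drawn from a seeded generator, the variable names mirror the slide's symbols, and `acker_status` is a hypothetical helper rather than part of Storm.

```python
import random

rng = random.Random(7)  # fixed seed, illustrative only
R, A, B, C, D, E, F = (rng.getrandbits(64) for _ in range(7))

# Fingerprints: parent FP XOR input ID XOR output ID(s).
FP0 = R ^ A ^ B ^ C   # computed at node 0, embedded in A, B and C
FP1 = FP0 ^ A ^ D     # node 1
FP2 = FP0 ^ B ^ E     # node 2
FP3 = FP0 ^ C ^ F     # node 3

# Only the leaf (node 4) reports, one value per message it received.
leaf_acks = {"D": FP1 ^ D, "E": FP2 ^ E, "F": FP3 ^ F}


def acker_status(root_id, reported):
    """XOR every reported fingerprint into the initial status R."""
    status = root_id
    for value in reported:
        status ^= value
    return status


all_arrived = acker_status(R, leaf_acks.values())
d_failed = acker_status(R, [leaf_acks["E"], leaf_acks["F"]])
nothing_arrived = acker_status(R, [])

print(all_arrived == 0, d_failed == R ^ B ^ C, nothing_arrived == R)
```

Three leaf reports replace the five bolt reports of the per-bolt XOR scheme, and a lost D still leaves the non-zero residue R ⊕ B ⊕ C.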
  • 21. Approach-1: a complex example
      Nodes 1-8; messages R, A, B, C, D, E, F, G, H, I, X. Acker status initialized to R.
          FP1 = R ⊕ A ⊕ B ⊕ C
          FP2 = FP1 ⊕ A ⊕ D
          FP3 = FP1 ⊕ B ⊕ X
          FP4 = FP1 ⊕ C ⊕ E
          // FP5 is also sent to the Acker because node 5 has an even number of downstream messages (2)
          FP5 = FP2 ⊕ D ⊕ FP3 ⊕ X ⊕ FP4 ⊕ E ⊕ (F ⊕ G)
          FP6 = FP5 ⊕ F ⊕ H
          FP7 = FP5 ⊕ G ⊕ I
          // bolt 8 sends FP8 to the Acker
          FP8 = FP6 ⊕ H ⊕ FP7 ⊕ I
      Final status = R ⊕ FP5 ⊕ FP8
                   = R ⊕ FP5 ⊕ (FP5 ⊕ F) ⊕ (FP5 ⊕ G)
                   = R ⊕ FP5 ⊕ (F ⊕ G)
                   = R ⊕ FP2 ⊕ D ⊕ FP3 ⊕ X ⊕ FP4 ⊕ E
                   = R ⊕ (FP1 ⊕ A ⊕ B ⊕ C)
                   = 0
      Limits and notes:
      1) The number of downstream messages should be odd (1, 3, 5, …); otherwise the bolt must send the new FP to the Acker, which XORs it into its status.
      2) To implement this approach, a bolt ideally needs to know the total number of downstream messages before emitting, in order to generate the FP.
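The even-downstream rule in note 1 can be seen by evaluating the slide's expressions with concrete IDs. This is only a numeric check of the formulas above, with seeded random IDs as an assumption; it does not model message passing.

```python
import random

rng = random.Random(21)  # fixed seed, illustrative only
R, A, B, C, D, E, F, G, H, I, X = (rng.getrandbits(64) for _ in range(11))

FP1 = R ^ A ^ B ^ C
FP2 = FP1 ^ A ^ D
FP3 = FP1 ^ B ^ X
FP4 = FP1 ^ C ^ E

# Node 5 joins three inputs (D, X, E) and emits two outputs (F, G).
FP5 = (FP2 ^ D) ^ (FP3 ^ X) ^ (FP4 ^ E) ^ (F ^ G)
FP6 = FP5 ^ F ^ H
FP7 = FP5 ^ G ^ I

# Node 8 joins H and I and is the leaf.
FP8 = (FP6 ^ H) ^ (FP7 ^ I)

# FP5 reaches node 8 along two paths, so it cancels itself inside FP8.
with_fp5_report = R ^ FP5 ^ FP8
without_fp5_report = R ^ FP8

print(FP8 == F ^ G, with_fp5_report == 0, without_fp5_report != 0)
```

With an even fan-out the inherited fingerprint is XORed an even number of times and vanishes downstream, which is why node 5 has to report FP5 itself; without that report the Acker would be left with R ⊕ F ⊕ G even though nothing failed.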
  • 22. Approach-2: share split
      • For an input rootMsg, INIT a BIG SHARE (8 bytes), EMBED it as metadata, and pass it down
      • At each bolt, Storm SPLITs the attached share among the output messages and EMBEDs the parts; this repeats until the leaves
      • Only leaves ACK to the Acker, reporting the share they hold
      • The Acker REDOes the split: it subtracts each reported share; a final value of 0 means OK, otherwise failure
          – No random numbers (no collision), no XOR; embedded inline; the split is transparent to the application
          – +/- (mod) is commutative and associative, which resolves the out-of-order issue
      • Analogy: IPO/stock shares, which can be split and increased
      [Figure: Acker entry (srcNodeID: rootMsgID, BIG-Share) over nodes 0-9. Root message A starts with share 100, which is split along the DAG: B 50, C 50; D 25, E 25; F 17, G 17, H 16; I 25, J 25; K 17, L 17; M 16. Acker records: (A, 1, 100), then (A, 0, 16) and (A, 0, 84).]
  • 23. Approach-2: share split (cont'd)
      • Rare case: INCREASE the share if it is insufficient to split (and sync up with the Acker)
      • The Acker then ADDs the newly increased share (it does NOT decrease)
      • If S - S1 - S2 - … = Sn, then S - S1 - S2 - … - Sn = 0 (Acks may be batched)
      [Figure: Acker entry (srcNodeID, RootMsgID, Share) over nodes 0-9. Root message A has share 100, split as B 99 and C 1; the share is then increased by 99 and the Acker is synced (A, +99), after which it is split as F 33, G 33, H 34.]
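The split, increase and Ack bookkeeping of the two share-split slides can be sketched as follows. This is a minimal model under assumptions: `split_share` and `Acker` are hypothetical names, the equal-split rule is one possible policy, and the 99/1 split is simply taken from the figure.

```python
def split_share(share: int, parts: int) -> list:
    """Split `share` into `parts` positive integers that sum to `share`."""
    if parts < 1:
        raise ValueError("need at least one part")
    if share < parts:
        raise ValueError("insufficient share; increase it first")
    base, extra = divmod(share, parts)
    return [base + 1] * extra + [base] * (parts - extra)


class Acker:
    """Tracks the outstanding share for one root message."""

    def __init__(self, big_share: int):
        self.pending = big_share

    def increase(self, extra: int) -> None:
        self.pending += extra   # a bolt enlarged its share and synced up

    def ack(self, share: int) -> None:
        self.pending -= share   # a leaf reported the share it holds


# Equal split as in the first figure: 50 split three ways.
three_way = split_share(50, 3)

# Second figure: root share 100 arrives as B = 99 and C = 1.
acker = Acker(100)
b_share, c_share = 99, 1

# C must emit three messages but holds only 1, so it increases its share by 99.
extra = 99
acker.increase(extra)
f_share, g_share, h_share = split_share(c_share + extra, 3)

# Leaves report in arbitrary order; subtraction order does not matter.
for leaf_share in (h_share, b_share, f_share, g_share):
    acker.ack(leaf_share)
all_reported = acker.pending

# Same run, but the leaf holding h_share never reports.
failed = Acker(100)
failed.increase(extra)
for leaf_share in (b_share, f_share, g_share):
    failed.ack(leaf_share)
one_missing = failed.pending

print(three_way, all_reported, one_missing)
```

The Acker's state is a single integer per root message, so the memory footprint matches the XOR scheme, and a non-zero remainder at timeout equals exactly the share that was lost.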
  • 24. Prototyping
      • Implemented Approach-2 (share-split)
      • Integrated with Storm 1.0.1 (released in May 2016)
          – Storm core (~200 LOC in Clojure, a LISP-like language) and Java APIs (~200 LOC including some traces/tests)
      • Implementation notes:
          – Supports BasicBolt, removes the random number, and re-uses some existing structures/APIs, e.g., anchors-to-ids (RootID:shareAttached) and Ack sending
          – Globally pre-defined split share at all bolts (equal split)
              • Next: configurable split approach per bolt
          – To split the share exactly, build a 1-step delayed emit
              • Pre-split the input share
              • Once a new tuple is generated, queue it internally instead of emitting it, until the next tuple comes out
              • Finally, explicitly call emitDone(), so that the last tuple takes over all the remaining share and is emitted
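The 1-step delayed emit can be illustrated with a small collector. This is a sketch of the idea only, not the prototype's Clojure/Java code: the class `ShareSplitCollector`, its constructor arguments and the `send` callback are hypothetical, and the exhausted-share case raises an error because the slides list "increase share" as not yet implemented.

```python
class ShareSplitCollector:
    """Holds back the most recent tuple so the last one can take the remaining share."""

    def __init__(self, in_share: int, split_count: int, send):
        self.remaining = in_share
        self.sub_share = max(1, in_share // split_count)  # pre-split the input share
        self.pending = None
        self.send = send  # callback(tuple, share) that actually emits downstream

    def _flush(self, share: int) -> None:
        self.send(self.pending, share)
        self.remaining -= share
        self.pending = None

    def emit(self, tup) -> None:
        """Queue `tup`; the previously queued tuple goes out with one sub-share."""
        if self.pending is not None:
            if self.remaining <= self.sub_share:
                raise RuntimeError("share exhausted; would need the increase-share step")
            self._flush(self.sub_share)
        self.pending = tup

    def emit_done(self) -> int:
        """Emit the last tuple with all remaining share; return the share left to Ack."""
        if self.pending is not None:
            self._flush(self.remaining)
        return self.remaining  # non-zero only if nothing was emitted (a leaf)


sent = []
collector = ShareSplitCollector(100, 4, lambda tup, share: sent.append((tup, share)))
for word in ("a", "b", "c"):
    collector.emit(word)
left_to_ack = collector.emit_done()

# A bolt that emits nothing keeps the whole share and reports it to the Acker.
leaf = ShareSplitCollector(16, 4, lambda tup, share: None)
leaf_ack = leaf.emit_done()

print(sent, left_to_ack, leaf_ack)
```

The bolt does not need to know its output count in advance: every tuple except the last leaves with a fixed sub-share, and the explicit final call hands whatever remains to the last tuple, so the shares always sum to the input share.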
  • 25. Benchmark
      • Function and performance: network traffic, CPU, latency/throughput
      • Reference: IBM whitepaper (Storm vs. IBM InfoSphere), 7 layers
          – We use Wikipedia as the data source; word processing
      • Test bed: 3 hosts on a 1000 Mbps network, each with Ubuntu 15.10 (4.2.0), Storm 1.0.1, E5-2643 @ 3.40GHz, 24 cores, 256GB DRAM
  • 26. Result: function & performance
      • Function: inject errors and validate reliability detection: Pass
          – Same level of reliability as the existing approach
      • Performance: same HW/SW configuration and processing logic
          – 16KB tuples, 100 pending, parallelism of 48 per bolt
          – 4 workers and 12 Ackers per host
  • 27. Test1: 3 layers
      • 1/3 Ack traffic, 18% faster, 9% less CPU
                                 | Current | New
      Ack traffic (Mil)          | 3903    | 1301
      End-to-end latency (ms)    | 241     | 197
      CPU (per Java worker)      | 350%    | 320%
  • 28. Test2: 7 layers
      • 1/5 Ack traffic, 23% faster, 14% less CPU
                                 | Current | New
      Ack traffic (Mil)          | 2685    | 537
      End-to-end latency (ms)    | 197     | 151
      CPU (per Java worker)      | 250%    | 215%
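The headline figures of the two tests follow directly from the raw numbers on the slides. The short check below recomputes them; the dictionary layout and the `reduction` helper are illustrative, and "faster" is assumed to mean the relative drop in end-to-end latency.

```python
def reduction(current: float, new: float) -> float:
    """Relative decrease from `current` to `new`, as a fraction."""
    return (current - new) / current


results = {
    "3 layers": {"ack_mil": (3903, 1301), "latency_ms": (241, 197), "cpu_pct": (350, 320)},
    "7 layers": {"ack_mil": (2685, 537), "latency_ms": (197, 151), "cpu_pct": (250, 215)},
}

summary = {}
for name, metrics in results.items():
    current_ack, new_ack = metrics["ack_mil"]
    summary[name] = {
        "ack_ratio": current_ack / new_ack,                             # traffic shrinks by this factor
        "latency_drop_pct": round(100 * reduction(*metrics["latency_ms"])),
        "cpu_drop_pct": round(100 * reduction(*metrics["cpu_pct"])),
    }

for name, row in summary.items():
    print(name, row)
```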
  • 29. MORE
      • Larger topology? A quick test with 11 layers:
          – 1/9 traffic
      • Presumably, the larger the topology, the greater the gain
      • Next
          – Refine multi-Acker
          – Implement the "Increase Share" operation
          – Configurable split method per bolt
              • So developers can specify the desired split rather than a fixed/global one
      • May integrate with Twitter Heron? Or apply to other areas?
          – e.g., function call graphs? performance traces? (more…)
  • 30. End-to-end IoT landscape: continuous, scalable, real-time processing
  • 31. Unified data processing
      • Lambda architecture: fuses "historical" and "new" data
          – Proposed by Nathan Marz (5 years ago): batch + streaming
          – Widely adopted by many Internet companies
  • 32. SUMMARY
      • 2 innovative and inspiring streaming reliability algorithms
          – Guaranteed, with minimized memory footprint
          – More scalable, efficient and fast, and even beautiful
      • Demonstrated in Storm
          – 1/N Ack traffic, needed only at leaf nodes
              • N is the topology depth; usually there are only a few leaves, for aggregation, DB saving, etc.
              • Meanwhile, 23% faster and 14% less CPU
          – Transparent to the application except for the final explicit emitDone() call
      • Applicable to other interesting areas...
          – Distributed replication, transactions, exact-state tracking, …
  • 33. THANK YOU!
      • Feedback or comments? Talk with us!
          – Any flaws, constraints, or room to improve?
          – We can then discuss with the Storm community; code can be shared if needed

Editor's Notes

  • #3: Any official disclaimer?
  • #5: May also be known as Complex Event Processing (CEP)
  • #7: Trident: an abstraction on top of Storm. Besides providing higher-level constructs "a la Cascading", it batches groups of tuples to 1) make reasoning about processing easier and 2) encourage efficient data persistence, with the help of an API that can provide exactly-once semantics in some cases.
      Heron: built since 2014, paper in 2015, open-sourced in May 2016 (http://twitter.github.io/heron/). It is API-compatible with Apache Storm, hence no code change is needed. "One of our primary requirements for Heron was ease of debugging and profiling"; also scheduling and optimal resource utilization (IPC layer, simplification).
      Flink: based on distributed checkpoints; see "Lightweight Asynchronous Snapshots for Distributed Dataflows" (ABS: Asynchronous Barrier Snapshotting), http://arxiv.org/abs/1506.08603, a variation of the Chandy-Lamport algorithm (1985). It periodically draws state snapshots of a running stream topology and stores these snapshots to durable storage. This is similar to the micro-batching approach, in which all computations between two checkpoints either succeed or fail atomically as a whole. However, the similarities stop there. One great feature of Chandy-Lamport is that we never have to press the "pause" button in stream processing to schedule the next micro-batch. Instead, regular data processing always keeps going, processing events as they come, while checkpoints happen in the background.
  • #8: If a failure is detected, Storm can redo from the beginning (Storm doesn’t checkpoint), which is usually fast, at the ms level. Spark can redo from the most recent checkpoint (with a performance impact).
  • #10: Task failure: restarted by the supervisor daemon. Supervisor/worker node failure: restarted/rescheduled via ZK. Master failure: handled via ZK; new tasks can’t be submitted, but existing tasks should be OK. Redo (re-compute): no log/replica, for high performance or real-time processing.
  • #12: http://nathanmarz.com/blog/history-of-apache-storm-and-lessons-learned.html
  • #13: It doesn’t matter which component failed. Once a failure is detected after a timeout (30 sec), the application should not commit the message to the data source (such as Kafka), so Kafka never removes that data. The application can then re-send the message and re-run the topology.
  • #14: Random ID. Every bolt must send an Ack message.
  • #15: Another benchmark is IBM InfoSphere vs. Storm: https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2014/04/Streams-and-Storm-April-2014-Final.pdf
  • #20-#22: In practice, this approach is challenging to implement. Ideally, a bolt would 1) know how many downstream messages will be generated, then allocate enough random IDs and calculate the FP, and 2) for each downstream message, embed the FP and emit it downstream. However, for many (maybe not all) kinds of processing logic, the total downstream message count needed in step 1 is probably not known until the logic has executed.
  • #23: For example, the initial share is 100 at the Acker. The share is embedded into the message and passed downstream: 1) A source message (root message) is ingested at the root node (spout), which initializes the BIG SHARE as the initial status and embeds the SHARE as part of the metadata. 2) The topology runs and each node executes its pre-defined logic; meanwhile, it also extracts the share and splits it among its downstream output messages. 3) Finally, the leaf nodes extract the received share and report it to the Acker. 4) The Acker decreases the share: 100 - 16 - 84 = 0, and 0 means OK.
  • #24: Some rule about the increase may be pre-defined, e.g., always increase by 7B, so that the Acker could use one bit to indicate one increase.
      Comparison with Huang’s algorithm, which is similar but different: both use a number as a weight or share and involve a split operation, but the problem area, prerequisites and algorithm steps look very different to me. Huang’s target is the state of processes (tasks/bolts), whereas my target is the continuously flowing messages running through the tasks. A few points, feel free to comment:
      Problem area: in Huang’s context, the distributed task consists of different processes, each either active (it may become idle at any time) or idle (the transition from idle to active is only triggered by a message). Huang’s goal is to detect that *all processes* in the system have become idle. Our goal is to track the status of each message running through the tasks, usually in relation to partial failure (we don’t care which task failed or is unavailable).
      Prerequisite: importantly, the idle state (the state Huang monitors) *is explicitly known* to the process itself; hence his step "Upon becoming idle, a process sends a message…". In our case, a message failure/exception is hard for the message itself to know about, typically being caused by network partition, timeout, etc., so it must be detected by other components or by a specially designed state, which adds extra challenges.
      Algorithm: the steps are different. Our method always splits the number while flowing through the DAG; the Acker then essentially redoes the split operation based on the received shares and makes sure the result is 0.
      In general, Huang’s research target is processes (tens or hundreds) rather than continuously flowing messages (billions, or never-ending). In practice, distributed process state is currently managed by Zookeeper (or Raft, etc.), based on the Paxos family of algorithms, which dates from around 1990 but was widely understood and adopted only after 2001 (following Lamport’s second paper explaining Paxos, and Google’s validation).
  • #25: A few important points in the implementation:
      1. Re-use the existing anchors-to-ids map to embed the share on emit (so no extra traffic); previously it was [RootId -> tupleID], now it is [RootID -> shareAssigned].
      2. To split the passed-down share, one needs to know beforehand how many downstream outMsgs will be generated (usually hard to predict). To resolve that, we worked out 1-step deferred processing: 1) statically split the input share into sub-shares; 2) assign and embed a sub-share and prepare to emit; 3) internally queue the current outMsg and send the previous one; 4) emit the last message with the new API, so that the last outMsg takes over all the remaining share. With this implementation we introduce a small delay, but it is acceptable.
      3. How to split the share is also important. Right now it is a simple pre-defined split method, i.e., all bolts use a pre-defined split count (which could be 1 ~ 4096 or larger); in the future, it should be configurable per bolt by the developer (who presumably knows more about the topology), e.g., bolt1 may split up to 128 ways and bolt2 up to 256. An improper split may exhaust the share, in which case increasing the share is required; this still depends on the topology size.
  • #26: IBM InfoSphere vs. Storm: https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2014/04/Streams-and-Storm-April-2014-Final.pdf
  • #27: Various topologies, such as top-down, bolts with multiple inputs, multiple spouts, …
  • #34: CLJ source files:
      – storm-core/src/clj/org/apache/storm/daemon/: acker.clj, executor.clj
      – storm-core/src/clj/org/apache/storm/util.clj
      Java source files:
      – storm-core/src/jvm/org/apache/storm/topology/: BasicOutputCollector.java, BasicBoltExecutor.java
      – storm-core/src/jvm/org/apache/storm/coordination/CoordinatedBolt.java
      – storm-core/src/jvm/org/apache/storm/task/: IOutputCollector.java, OutputCollector.java
      – storm-core/src/jvm/org/apache/storm/trident/topology/TridentBoltExecutor.java