Storm
The Real-Time Layer Your Big Data’s Been Missing


        Dan Lynn
      dan@fullcontact.com
          @danklynn
Keeps Contact Information Current and Complete


  Based in Denver, Colorado




                              CTO & Co-Founder
Turn Partial Contacts
 Into Full Contacts
Your data is old
Old data is confusing
First, there was the spreadsheet
Next, there was SQL
Then, there wasn’t SQL
Batch Processing = Stale Data
Streaming Computation
Queues / Workers
[Diagram: Messages -> Queue -> Workers]
Message routing can be complex
Brittle
Hard to scale
Routing must be reconfigured when scaling out
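To make the scaling problem concrete, here is a minimal sketch, assuming a producer that picks a worker queue by hashing the message key modulo the number of workers. The class and method names (`StaticRoutingSketch`, `workerFor`, `countMoved`) are hypothetical and the scheme is only one common way to wire queues and workers by hand; it is not taken from the talk.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Storm code: a producer that chooses a
// worker queue by hashing the message key modulo the number of workers.
final class StaticRoutingSketch {

    // Returns the index of the worker queue that receives this key.
    static int workerFor(String key, int workerCount) {
        return Math.floorMod(key.hashCode(), workerCount);
    }

    // Counts how many keys land on a different worker after resizing.
    static int countMoved(List<String> keys, int before, int after) {
        int moved = 0;
        for (String key : keys) {
            if (workerFor(key, before) != workerFor(key, after)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h");
        System.out.println("keys rerouted when growing from 4 to 5 workers: "
                + countMoved(keys, 4, 5));
    }
}
```

Adding a single worker changes the destination of some keys, so every producer has to learn the new worker count at the same moment. That is the reconfiguration cost the slide refers to.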
Storm
Distributed and fault-tolerant real-time computation
Key Concepts
Tuples
Ordered list of elements
("search-01384", "e:dan@fullcontact.com")
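As a rough sketch of the idea, a tuple can be modelled as an ordered list of values with a parallel list of field names. The class `TupleSketch` and the field names `searchId` and `query` are hypothetical, chosen only to label the two elements of the example above; this is not Storm's own `Tuple` class.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tuple: an ordered list of values whose
// positions are named by a parallel list of field names.
final class TupleSketch {
    private final List<String> fields;
    private final List<Object> values;

    TupleSketch(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("one value per field is required");
        }
        this.fields = fields;
        this.values = values;
    }

    // Positional access: elements keep the order they were emitted in.
    Object get(int index) {
        return values.get(index);
    }

    // Named access: look up the position of the field, then read the value.
    Object getByField(String name) {
        int index = fields.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return values.get(index);
    }

    public static void main(String[] args) {
        TupleSketch tuple = new TupleSketch(
                Arrays.asList("searchId", "query"),
                Arrays.<Object>asList("search-01384", "e:dan@fullcontact.com"));
        System.out.println(tuple.get(0) + " -> " + tuple.getByField("query"));
    }
}
```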
Streams
Unbounded sequence of tuples
[Diagram: Tuple, Tuple, Tuple, Tuple, Tuple, Tuple, ...]
Spouts
Source of streams
[Diagram: a spout emitting a stream of tuples]
Spouts can talk with

•Queues
•Web logs
•API calls
•Event data

some images from http://commons.wikimedia.org
Bolts
Process tuples and create new streams
[Diagram: one incoming stream of tuples, two outgoing streams]

•Apply functions / transforms
•Filter
•Aggregation
•Streaming joins
•Access DBs, APIs, etc...
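The first three roles in the list can be sketched in plain Java, without any Storm classes. Everything here is an assumption made for illustration: the class `BoltStagesSketch`, the choice of word splitting as the transform, and the minimum-length rule as the filter. In a real topology each stage would be its own bolt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of three bolt roles chained together.
final class BoltStagesSketch {

    // Transform: one sentence in, zero or more lower-cased words out.
    static List<String> split(String sentence) {
        List<String> words = new ArrayList<String>();
        for (String word : sentence.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Filter: keep only words of at least minLength characters.
    static boolean keep(String word, int minLength) {
        return word.length() >= minLength;
    }

    // Aggregation: a running count per surviving word.
    static Map<String, Integer> run(List<String> sentences, int minLength) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String sentence : sentences) {
            for (String word : split(sentence)) {
                if (keep(word, minLength)) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("the cow jumped", "over the moon");
        System.out.println(run(sentences, 4));
    }
}
```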
Topologies
A directed graph of Spouts and Bolts
This is a topology [diagram]
This is also a topology [diagram]
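One way to picture "a directed graph of Spouts and Bolts" is an adjacency list keyed by component id. The sketch below is plain Java with hypothetical names (`TopologyGraphSketch`, `connect`, `reachableFrom`, and the component ids); it does not use Storm's `TopologyBuilder`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: a topology as a directed graph of component ids.
final class TopologyGraphSketch {
    private final Map<String, List<String>> downstream =
            new LinkedHashMap<String, List<String>>();

    // Adds a directed edge: tuples emitted by "from" are consumed by "to".
    void connect(String from, String to) {
        if (!downstream.containsKey(from)) {
            downstream.put(from, new ArrayList<String>());
        }
        if (!downstream.containsKey(to)) {
            downstream.put(to, new ArrayList<String>());
        }
        downstream.get(from).add(to);
    }

    // Breadth-first walk: the source plus every component its tuples can reach.
    Set<String> reachableFrom(String source) {
        Set<String> seen = new LinkedHashSet<String>();
        Deque<String> pending = new ArrayDeque<String>();
        pending.add(source);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            if (!seen.add(current)) {
                continue; // already visited
            }
            List<String> next = downstream.get(current);
            if (next != null) {
                pending.addAll(next);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        TopologyGraphSketch topology = new TopologyGraphSketch();
        topology.connect("spout", "boltA");
        topology.connect("spout", "boltB");
        topology.connect("boltA", "boltC");
        System.out.println(topology.reachableFrom("spout"));
    }
}
```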
Tasks
Processes which execute Spouts or Bolts
Running a Topology




$ storm jar my-code.jar com.example.MyTopology arg1 arg2
Storm Cluster
[Cluster diagram by Nathan Marz, with these callouts:]
If this were Hadoop...
Job Tracker
Task Trackers
But it’s not Hadoop
Coordinates everything
Example:
Streaming Word Count

// Imports assume the Storm 0.x package layout (backtype.storm.*)
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();

// The third argument is the parallelism hint for each component
builder.setSpout("sentences", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
        .shuffleGrouping("sentences");                 // any task may split any sentence
builder.setBolt("count", new WordCount(), 12)
        .fieldsGrouping("split", new Fields("word"));  // same word, same task
Streaming Word Count
// Imports assume the Storm 0.x package layout (backtype.storm.*)
import backtype.storm.task.ShellBolt;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import java.util.Map;

// Written as a nested class (hence "static"), e.g. inside the topology class
public static class SplitSentence extends ShellBolt implements IRichBolt {

    // Delegate the splitting to a Python script run as a subprocess
    public SplitSentence() {
        super("python", "splitsentence.py");
    }

    // Each emitted tuple has a single field named "word"
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    // No component-specific configuration
    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

                                                                   SplitSentence.java
[splitsentence.py: listing not captured in this transcript]
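Streaming Word Count
// Imports assume the Storm 0.x package layout (backtype.storm.*)
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import java.util.HashMap;
import java.util.Map;

public static class WordCount extends BaseBasicBolt {
    // Running totals held in memory by this task only
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null) count = 0;
        count++;
        counts.put(word, count);
        // Emit the updated total every time the word is seen
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
                                                                    WordCount.java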
Groupings control how tuples are routed
Shuffle grouping
Tuples are randomly distributed across all of the tasks running the bolt
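A minimal sketch of the idea, assuming each tuple is sent to a task index drawn uniformly at random. The class `ShuffleGroupingSketch` and its methods are hypothetical, and Storm's own implementation may balance load differently; the seed parameter exists only to make runs repeatable.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration of shuffle grouping, not Storm's implementation.
final class ShuffleGroupingSketch {
    private final Random random;
    private final int taskCount;

    ShuffleGroupingSketch(int taskCount, long seed) {
        this.taskCount = taskCount;
        this.random = new Random(seed);
    }

    // Picks a task index without looking at the tuple's contents.
    int chooseTask() {
        return random.nextInt(taskCount);
    }

    // Sends tupleCount tuples and reports how many each task received.
    static int[] distribute(int tupleCount, int taskCount, long seed) {
        ShuffleGroupingSketch grouping = new ShuffleGroupingSketch(taskCount, seed);
        int[] perTask = new int[taskCount];
        for (int i = 0; i < tupleCount; i++) {
            perTask[grouping.chooseTask()]++;
        }
        return perTask;
    }

    public static void main(String[] args) {
        // 8 tasks, matching the parallelism hint of the "split" bolt
        System.out.println(Arrays.toString(distribute(8000, 8, 42L)));
    }
}
```

Because the choice ignores tuple contents, this suits stateless work such as splitting sentences, where any task can handle any tuple.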
Fields grouping
Groups tuples by specific named fields and routes them to the same task
Analogous to Hadoop’s partitioning behavior
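A minimal sketch of why this matters for the word count example, assuming the task is chosen by hashing the grouping field modulo the number of tasks. The class `FieldsGroupingSketch` and its methods are hypothetical, and the hash-modulo rule is an assumption made for illustration rather than a statement of how Storm implements it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of fields grouping, not Storm's implementation.
final class FieldsGroupingSketch {

    // The same field value always maps to the same task index.
    static int taskFor(String fieldValue, int taskCount) {
        return Math.floorMod(fieldValue.hashCode(), taskCount);
    }

    // Routes each word to a task; every task keeps only its own local counts.
    static List<Map<String, Integer>> countPerTask(List<String> words, int taskCount) {
        List<Map<String, Integer>> perTask = new ArrayList<Map<String, Integer>>();
        for (int i = 0; i < taskCount; i++) {
            perTask.add(new HashMap<String, Integer>());
        }
        for (String word : words) {
            perTask.get(taskFor(word, taskCount)).merge(word, 1, Integer::sum);
        }
        return perTask;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("storm", "hadoop", "storm", "storm");
        // 12 tasks, matching the parallelism hint of the "count" bolt
        List<Map<String, Integer>> perTask = countPerTask(words, 12);
        int task = taskFor("storm", 12);
        System.out.println("task " + task + " saw storm "
                + perTask.get(task).get("storm") + " times");
    }
}
```

Since every occurrence of a word reaches the same task, that task's in-memory map holds the complete count for the word, and no merging across tasks is needed.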
Distributed RPC
Before Distributed RPC, time-sensitive queries relied on a pre-computed index
What if you didn’t need an index?
Distributed RPC
Try it out!

Huge thanks to Nathan Marz - @nathanmarz

http://github.com/nathanmarz/storm
https://github.com/nathanmarz/storm-starter
             @stormprocessor
Questions?
 dan@fullcontact.com

Storm: The Real-Time Layer - GlueCon 2012