Real-Time Predictions
H2O // Storm
H2O.ai
Spencer Aiello
spencer@h2o.ai
Jan 15, 2015
Overview:
● Introductions
● Real Time Analytics
● The Speed of Information
● The Analytics Workflow
● H2O // Storm
● Demo
Real Time Analytics: Then & Now
1930 - 1940s
Kerrison Predictor
ENIAC - Weather Modeling (pseudo real time)
1950s
Real Time Analytics to Fight Fraud
1990s
Traffic Management
Dynamic Pricing
Shopping & Movie Recommendations
1970s
Real Time Roulette Wheel Prediction With A Computer In A Shoe
The Speed of Information
Factors to consider:
● Speed of Light
○ 3×10^8 m/s
● Infrastructure
○ Line-of-sight relays
○ Submarine Cables
○ Where is the information coming from?
○ Where is it going?
○ Lossless?
● Power Consumption
○ Efficiency
● Amount of Information
○ Bandwidth considerations (impacts infrastructure)
○ How quickly can you schlepp around 1TB? 1PB?
■ How quickly do you _need_ to do that?
■ I.e., are you making efficient use of resources?
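The "1TB? 1PB?" question can be made concrete with a little arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and an ideal link with no latency or protocol overhead; the link speeds are hypothetical values picked for illustration, not figures from the talk.

```python
def transfer_time_seconds(num_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to push num_bytes through a link, ignoring latency and overhead."""
    return num_bytes * 8 / link_bits_per_second


TB = 10**12  # decimal terabyte, in bytes (assumption)
PB = 10**15  # decimal petabyte, in bytes (assumption)

# Hypothetical link speeds, chosen only for illustration.
LINKS = {"1 Gbit/s": 1e9, "10 Gbit/s": 1e10, "100 Gbit/s": 1e11}

for name, rate in LINKS.items():
    hours_for_tb = transfer_time_seconds(TB, rate) / 3600
    days_for_pb = transfer_time_seconds(PB, rate) / 86400
    print(f"{name}: 1 TB in {hours_for_tb:.2f} h, 1 PB in {days_for_pb:.1f} days")
```

Real transfers will be slower than these lower bounds, which is one reason to ask whether the data needs to move at all.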
The Speed of Information
The Shannon Limit:
Sup({ Bounds on bits/s })
$C = B \log_2(1 + S/N)$
- C = Channel Capacity (bits/s)
- B = Bandwidth (Hz)
- S = Signal in Joules/s (Watts)
- N = Noise in Joules/s (Watts)
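As a rough numerical illustration of the Shannon limit, the sketch below evaluates the standard Shannon-Hartley relation C = B log2(1 + S/N) using the symbols defined on this slide. The 1 MHz channel and the 30 dB signal-to-noise ratio are made-up inputs, not values from the talk.

```python
import math


def shannon_capacity(bandwidth_hz: float, signal_watts: float, noise_watts: float) -> float:
    """Upper bound on error-free throughput in bits/s: C = B * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + signal_watts / noise_watts)


# Hypothetical channel: 1 MHz of bandwidth, S/N = 1000 (30 dB).
capacity = shannon_capacity(bandwidth_hz=1e6, signal_watts=1.0, noise_watts=1e-3)
print(f"Capacity: {capacity / 1e6:.2f} Mbit/s")
```

Doubling the bandwidth doubles the bound, while doubling the signal power only adds roughly B bits/s once S/N is large.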
The Speed of Information
Consider: The Warning Beacons of Gondor
7 beacons (13 in the movie)
Probably 1 cord of wood (~3.6 m^3)
1 bit of information (@ Shannon Limit)
optical transmission
Compare to the current World Record:
1 Petabit / second Fiber Transmission over 50-km
(~5,000 HDTV Videos/Second over single fiber)
About 25 orders of magnitude difference!
(source: http://www.ntt.co.jp/news2012/1209e/120920a.html)
The Speed of Information
AT&T “Long Lines”:
● 838 mile route connecting Chicago to New York
● 4GHz microwave line-of-sight radio relays
● ~25 miles separation (due to curvature of the Earth)
● 34 hops in all
High Frequency Trading (HFT):
● Light propagation delays between distant points are relevant
sources:
- Relativistic Statistical Arbitrage (http://www.alexwg.org/publications/PhysRevE_82-056104.pdf)
- Information Transmission Between Financial Markets in Chicago and New York (http://arxiv.org/pdf/1302.5966v1.pdf)
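To see why propagation delay matters on the Chicago to New York route, the sketch below works through the slide's numbers: 838 miles over 34 hops. The refractive index of 1.47 for optical fiber is an assumed typical value, and applying it to the same 838-mile path length is a simplification, since a real fiber route would follow a different path.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
METERS_PER_MILE = 1609.344


def one_way_delay_ms(distance_miles: float, refractive_index: float = 1.0) -> float:
    """One-way propagation delay in milliseconds through a medium of the given index."""
    distance_m = distance_miles * METERS_PER_MILE
    return distance_m * refractive_index / SPEED_OF_LIGHT_M_PER_S * 1000


ROUTE_MILES = 838  # from the slide
HOPS = 34          # from the slide

print(f"Average hop length: {ROUTE_MILES / HOPS:.1f} miles")
print(f"Through air (n ~ 1.0): {one_way_delay_ms(ROUTE_MILES):.2f} ms one way")
# Assumption: n = 1.47 is a typical value for optical fiber.
print(f"Through fiber (n = 1.47): {one_way_delay_ms(ROUTE_MILES, 1.47):.2f} ms one way")
```

The average hop length comes out just under the ~25 miles quoted on the slide, and the gap between the air and fiber figures is the kind of margin that HFT firms compete over.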
The Speed of Information
Observations:
● Moving bits around is a big deal!
● ∃ insurmountable physical and theoretical limitations
○ Shannon Limit
○ Speed of Light
○ Landauer’s Principle
○ Relativistic Effects
○ Curvature of the Earth
● Other limitations or complications?
○ Hairpinning: Non-optimal routing to far flung nodes
■ Geographic locality ≠ Internet locality
○ Bad hardware
○ Bad software
The Speed of Information
(n.d.). Retrieved from http://www.us.ntt.net/support/looking-glass/
(n.d.). Retrieved from http://www.submarinecablemap.com/
The Analytics Workflow
The Analytics Process:
1. Define your problem
2. Gather data and explore
3. Prepare your data for modeling
4. Modeling
5. Model Validation
6. Implementation & Tracking
Here’s where H2O fits into the analytics process:
http://learn.h2o.ai/content/
The Analytics Workflow
:::Prep:::
Data Preparation:
● A sequence of transformations applied to your data
● This step will define your Storm topology
● Take raw information and give it structure
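One way to picture "a sequence of transformations" is as a chain of small functions, each roughly playing the role a bolt would play in a Storm topology. The sketch below is plain Python and does not use the Storm API; the record schema (id, amount, country) and all function names are hypothetical.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[Record], Record]


def parse_line(raw: str) -> Record:
    """Give a raw comma-separated line structure (hypothetical schema)."""
    tx_id, amount, country = raw.strip().split(",")
    return {"id": tx_id, "amount": amount, "country": country}


def cast_amount(rec: Record) -> Record:
    """Convert the amount field from text to a float."""
    return {**rec, "amount": float(rec["amount"])}


def normalize_country(rec: Record) -> Record:
    """Trim whitespace and upper-case the country code."""
    return {**rec, "country": str(rec["country"]).strip().upper()}


def run_pipeline(raw: str, steps: List[Step]) -> Record:
    """Parse the raw line, then apply each transformation in order."""
    return reduce(lambda rec, step: step(rec), steps, parse_line(raw))


STEPS: List[Step] = [cast_amount, normalize_country]
print(run_pipeline("tx42, 19.99, us\n", STEPS))
```

Keeping each step small and order-explicit makes it easier to apply the same transformations at training time and at scoring time.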
The Analytics Workflow
:::Modeling:::
Questions to ask yourself:
● How fast must a scoring engine classify incoming tuples?
● How do I optimize between scoring latency and predictive power?
● E.g., what are the trade-offs between a GLM and a GBM?
Science!
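The latency side of the GLM versus GBM question can be measured directly. The sketch below uses toy stand-ins, a logistic scorer for the GLM and a sum of decision stumps for the GBM; neither is an H2O model, and the weights, stump count, and input row are invented. It only illustrates that per-tuple scoring cost grows with model size.

```python
import math
import time
from typing import Callable, List, Sequence, Tuple

# A stump is (feature_index, threshold, value_if_below, value_if_at_or_above).
Stump = Tuple[int, float, float, float]


def glm_score(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Toy GLM: one dot product followed by a logistic link."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def gbm_score(x: Sequence[float], stumps: List[Stump]) -> float:
    """Toy GBM: sum the output of every stump, then apply a logistic link."""
    z = sum(below if x[i] < threshold else above for i, threshold, below, above in stumps)
    return 1.0 / (1.0 + math.exp(-z))


def mean_latency_us(score: Callable[[], float], repeats: int = 10_000) -> float:
    """Average wall-clock time per call, in microseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        score()
    return (time.perf_counter() - start) / repeats * 1e6


x = [0.5, -1.2, 3.0]                                      # invented input row
weights, bias = [0.8, -0.4, 0.1], -0.2                    # invented GLM coefficients
stumps = [(i % 3, 0.0, -0.01, 0.01) for i in range(500)]  # invented 500-stump ensemble

print(f"GLM: {mean_latency_us(lambda: glm_score(x, weights, bias)):.2f} us per tuple")
print(f"GBM: {mean_latency_us(lambda: gbm_score(x, stumps)):.2f} us per tuple")
```

Whether the extra latency is worth it depends on how much predictive power the larger model buys, which has to be measured on your own data.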
The Analytics Workflow
:::Validation:::
Types of Validation:
● N-fold cross validation
● Train/Validate/Test -- What Features are Important?
● Model Comparison -- Does your model optimize all needs?
○ Business needs
○ Resource needs
● Repeat steps 3 - 5 until satisfied
WRONG: You should never be satisfied!
Your model will go out of date (if it hasn’t already)!
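For readers unfamiliar with N-fold cross validation, the sketch below shows the index bookkeeping: each row lands in exactly one holdout fold, and the model is trained on the remaining rows. This is a minimal illustration in plain Python, not H2O's implementation; it splits rows in order and does no shuffling or stratification.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_rows: int, n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, holdout_indices) for each of n_folds contiguous folds."""
    if not 2 <= n_folds <= n_rows:
        raise ValueError("n_folds must be between 2 and n_rows")
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0) for i in range(n_folds)]
    start = 0
    for size in sizes:
        holdout = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, holdout
        start += size


for fold, (train, holdout) in enumerate(k_fold_indices(n_rows=10, n_folds=3)):
    print(f"fold {fold}: train on {len(train)} rows, validate on rows {holdout}")
```

For streaming data that drifts over time, a time-ordered split may be more honest than random folds, since the model will always be scored on data newer than what it was trained on.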
The Analytics Workflow
:::Tracking:::
An Extension of Validation:
● Do not open the fire-hose and blast your model with 100% of your data
○ Expect the unexpected
○ Your topology will break (oops, forgot about Unicode… derp)
○ Start off with 10% and ramp up; course-correct along the way
● Perform batch modeling in off-peak hours (Jenkins never sleeps)
● Models should be replaced “gradually”
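The "start off with 10% and ramp up" advice can be sketched as a deterministic traffic split: hash each tuple's key into one of 100 buckets and send only the lowest buckets to the new model. The key format and the function names below are hypothetical, and this is one possible approach rather than the mechanism used in the talk's demo.

```python
import hashlib


def bucket(key: str, buckets: int = 100) -> int:
    """Map a key to a stable bucket in [0, buckets) using SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()  # UTF-8 handles non-ASCII keys
    return int.from_bytes(digest[:8], "big") % buckets


def use_new_model(key: str, rollout_percent: int) -> bool:
    """True if this key falls inside the current rollout percentage."""
    return bucket(key) < rollout_percent


keys = [f"tuple-{i}" for i in range(10_000)]  # hypothetical tuple identifiers
for percent in (10, 25, 50, 100):
    share = sum(use_new_model(k, percent) for k in keys) / len(keys)
    print(f"rollout {percent:3d}% -> {share:.1%} of tuples scored by the new model")
```

Because the bucket depends only on the key, a tuple that reached the new model at 10% keeps reaching it at 25%, which makes before-and-after comparisons cleaner while ramping up.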
H2O // Storm
For a complete tutorial please visit:
http://learn.h2o.ai/content/demos/streaming_data.html
Use H2O
Awesome
Thank you!
DEMO
http://learn.h2o.ai/content/demos/streaming_data.html
