NoSQL Matter 2014 - A real-time (Lambda) Architecture using Hadoop & Storm - #nosql14
A real-time Lambda Architecture using Hadoop & Storm
NoSQL Matters Cologne 2014 by Nathan Bijnens
Speaker
Nathan Bijnens
Big Data Engineer @ Virdata
@nathan_gs
Computing Trends
Past
● Computation (CPUs): Expensive
● Disk Storage: Expensive
● DRAM: Expensive
● Coordination: Easy (Latches Don’t Often Hit)
Current
● Computation: Cheap (Many Core Computers)
● Disk Storage: Cheap (Cheap Commodity Disks)
● DRAM / SSD: Getting Cheap
● Coordination: Hard (Latches Stall a Lot, etc.)
Source: Immutability Changes Everything - Pat Helland, RICON2012
Credits
Nathan Marz
● Ex-Backtype & Twitter
● Startup in stealth mode
Creator of
● Storm
● Cascalog
● ElephantDB
Coined the term Lambda Architecture.
manning.com/marz
a Data System
Not all information is equal.
Some information is derived from other pieces of information.
Data is more than Information
Eventually you will reach the most ‘raw’ form of information.
This is the information you hold true, simply because it exists.
Let’s call this ‘data’, very similar to ‘event’.
Data is more than Information
Events were used to manipulate the master data.
Events: Before
Today, events are the master data.
Events: After
Let’s store everything.
Data System
Data is Immutable.
Data System
Data is Time Based.
Data System
Capturing change
INSERT INTO contact (name, city) VALUES ('Nathan', 'Antwerp');
UPDATE contact SET city = 'Cologne' WHERE name = 'Nathan';
Traditionally
Capturing change
INSERT INTO contact (name, city, timestamp) VALUES ('Nathan', 'Antwerp', '2008-10-11 20:00Z');
INSERT INTO contact (name, city, timestamp) VALUES ('Nathan', 'Cologne', '2014-04-29 10:00Z');
in a Data System
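A minimal sketch of the append-only approach, assuming an in-memory SQLite database accessed through Python's sqlite3 module. The choice of SQLite and the ISO 8601 timestamp strings are assumptions for illustration, not part of the talk. The current city is derived at query time from the latest row per name, while the full history remains available.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (name TEXT, city TEXT, timestamp TEXT)")

# Changes are captured as new rows; nothing is ever updated in place.
conn.executemany(
    "INSERT INTO contact (name, city, timestamp) VALUES (?, ?, ?)",
    [
        ("Nathan", "Antwerp", "2008-10-11T20:00Z"),
        ("Nathan", "Cologne", "2014-04-29T10:00Z"),
    ],
)

# The current state is the most recent row per name.
current = conn.execute(
    """
    SELECT c.name, c.city
    FROM contact c
    JOIN (SELECT name, MAX(timestamp) AS ts FROM contact GROUP BY name) latest
      ON latest.name = c.name AND latest.ts = c.timestamp
    """
).fetchall()

# The history is still there, which an UPDATE would have destroyed.
history = conn.execute(
    "SELECT city FROM contact WHERE name = ? ORDER BY timestamp", ("Nathan",)
).fetchall()
```

ISO 8601 strings in a single fixed format sort in chronological order, which is why MAX(timestamp) works on a TEXT column here.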
The data you query is often transformed, aggregated, ...
Rarely used in its original form.
Query
Query = function ( all data )
Query
Query: Number of people living in each city
Person  City     Timestamp
Nathan  Antwerp  2008-10-11
John    Cologne  2010-01-23
Dirk    Antwerp  2012-09-12
Nathan  Cologne  2014-04-29
Result:
City     Count
Antwerp  1
Cologne  2
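The query above can be read as a plain function over all data. The following is a small sketch under that reading; the function name people_per_city and the tuple layout are assumptions made for this example.

```python
from collections import Counter

# The immutable, time-based records from the table above.
EVENTS = [
    ("Nathan", "Antwerp", "2008-10-11"),
    ("John", "Cologne", "2010-01-23"),
    ("Dirk", "Antwerp", "2012-09-12"),
    ("Nathan", "Cologne", "2014-04-29"),
]


def people_per_city(events):
    """Query = function(all data): count people by their most recent city."""
    latest_city = {}
    # Replay the events in time order; later events win.
    for person, city, _timestamp in sorted(events, key=lambda e: e[2]):
        latest_city[person] = city
    return dict(Counter(latest_city.values()))
```

Nathan is counted once, in Cologne, because only his most recent record determines where he lives now.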
Query
All Data → Precomputed View → Query
Layered Architecture
Batch Layer
Speed Layer
Serving Layer
Layered Architecture
[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra, Query]
Batch Layer
Batch Layer
[Diagram: Incoming Data, Hadoop, ElephantDB]
Batch Layer
The batch layer can calculate anything, given enough time...
Unrestrained computation.
No need to De-Normalize.
The batch layer stores the data normalized; the generated views are often, if not always, denormalized.
Batch Layer
Horizontally scalable.
Batch Layer
High Latency.
For now, let’s pretend the update latency doesn’t matter.
Batch Layer
Functional computation, based on immutable inputs, is idempotent.
Batch Layer
Stores a master copy of the data set … append only
Batch Layer
Batch Layer
Batch: view generation
[Diagram: Master Dataset → MapReduce → View #1, View #2, View #3 (one MapReduce job per view)]
MapReduce
1. Take a large data set and divide it into subsets
2. Perform the same function on all subsets
3. Combine the output from all subsets
[Diagram: the data set is divided into subsets; DoWork() runs on each subset; the results are combined into a single Output]
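The three steps can be sketched in plain Python on a single machine. This is only an illustration of the idea, not Hadoop's API; the helper names split, do_work, combine and map_reduce are assumptions for this example.

```python
from collections import Counter
from functools import reduce


def split(records, n_subsets):
    """Step 1: divide the data set into subsets (round-robin here)."""
    return [records[i::n_subsets] for i in range(n_subsets)]


def do_work(subset):
    """Step 2: the same function runs on every subset (count records per city)."""
    return Counter(city for _person, city in subset)


def combine(left, right):
    """Step 3: merge two partial outputs into one."""
    return left + right


def map_reduce(records, n_subsets=3):
    partial_outputs = [do_work(subset) for subset in split(records, n_subsets)]
    return reduce(combine, partial_outputs, Counter())
```

Because do_work only looks at its own subset and combine is associative, the subsets could be processed on different machines in any order, which is the property a real MapReduce framework relies on.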
MapReduce
Serialization & Schema
Catch errors as quickly as they happen.
Validate on write vs on read.
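A minimal sketch of validating on write, assuming a hypothetical ContactEvent record and append_event helper. In practice a schema format such as Thrift, Avro or Protocol Buffers would play this role; the dataclass below only stands in for one.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ContactEvent:
    """Hypothetical schema for one immutable record."""
    name: str
    city: str
    timestamp: datetime

    def __post_init__(self):
        # Reject malformed records at construction time.
        if not isinstance(self.name, str) or not self.name:
            raise ValueError("name must be a non-empty string")
        if not isinstance(self.city, str) or not self.city:
            raise ValueError("city must be a non-empty string")
        if not isinstance(self.timestamp, datetime):
            raise ValueError("timestamp must be a datetime")


def append_event(log, raw):
    """Validate on write: a bad record fails here, not later inside a batch job."""
    event = ContactEvent(
        name=raw["name"],
        city=raw["city"],
        timestamp=datetime.fromisoformat(raw["timestamp"]),
    )
    log.append(event)
    return event
```

With validation on read, the same error would only surface when a batch job or query touches the record, possibly long after the producer that wrote it is gone.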
CSV is actually a serialization language that is just poorly defined.
Serialization & Schema
Use a format with a schema
● Thrift
● Avro
● Protocol Buffers
Could be combined with Parquet.
Added bonus: it’s faster and uses less space.
Serialization & Schema
Batch View Database
No random writes required.
Read-only database
Every iteration produces the views from scratch.
Batch View Database
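A sketch of what producing the views from scratch could look like. BatchViewDatabase, events_per_person and run_batch_iteration are hypothetical names for this example; a real batch view database such as ElephantDB would load files generated by the batch job rather than a Python dict.

```python
class BatchViewDatabase:
    """Hypothetical read-only store: a view is replaced wholesale, never updated in place."""

    def __init__(self):
        self._view = {}

    def load(self, new_view):
        # Swap in a complete, freshly computed view; no random writes needed.
        self._view = dict(new_view)

    def get(self, key, default=0):
        return self._view.get(key, default)


def events_per_person(master_dataset):
    """Example view function: number of records per person."""
    counts = {}
    for person, _city, _timestamp in master_dataset:
        counts[person] = counts.get(person, 0) + 1
    return counts


def run_batch_iteration(master_dataset, view_fn, database):
    """Recompute the view over the whole master dataset and replace the old one."""
    database.load(view_fn(master_dataset))
```

Running the same iteration twice over unchanged input yields the same view, which is the idempotence that functional computation over immutable inputs provides.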
Pure Lambda databases
● ElephantDB
● SploutSQL
Databases with a batch load & read-only views
● Voldemort
Other databases that could be used
● Elasticsearch/Solr: generate the Lucene indexes using MapReduce
● Cassandra: generate SSTables
● ...
Batch View Databases
Batch Layer
Eventually consistent, without the associated complexities.
Batch Layer
We are not done yet…
[Timeline, from Time to Now: data absorbed into Batch Views, followed by the part not yet absorbed, just a few hours of data.]
Speed Layer
Speed Layer
[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra]
Stream processing.
Speed Layer
Continuous computation.
Speed Layer
Storing a limited window of data.
Compensating for the last few hours of data.
Speed Layer
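One way to picture a limited window is a view bucketed by hour, where buckets are dropped once the batch layer has absorbed them. The class RealtimeView and its hourly bucketing are assumptions for this sketch, not something the talk prescribes.

```python
from collections import defaultdict


class RealtimeView:
    """Hypothetical speed layer view: incremental counts, bucketed by hour."""

    def __init__(self):
        # hour -> key -> count
        self._buckets = defaultdict(lambda: defaultdict(int))

    def update(self, key, hour):
        # Incremental update for each incoming event.
        self._buckets[hour][key] += 1

    def expire_through(self, absorbed_hour):
        # Drop every bucket that the batch views now cover.
        for hour in [h for h in self._buckets if h <= absorbed_hour]:
            del self._buckets[hour]

    def count(self, key):
        # Only the hours not yet absorbed by the batch layer contribute.
        return sum(bucket.get(key, 0) for bucket in self._buckets.values())
```

After each batch run, calling expire_through with the last absorbed hour keeps the speed layer small, so it only ever compensates for the most recent data.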
All the complexity is isolated in the Speed Layer.
If anything goes wrong, it’s auto-corrected.
Speed Layer
You have a choice between:
● Availability
○ Queries are eventually consistent
● Consistency
○ Queries are consistent
CAP
Eventual accuracy
Some algorithms are hard to implement in real-time.
For those cases we could estimate the results.
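As an illustration of estimating a result, counting distinct items exactly needs memory proportional to the number of items, while an estimate can work in fixed memory. The sketch below uses a K-minimum-values estimator; the talk does not name a specific technique, so this choice and the class name KMVDistinctCounter are assumptions.

```python
import hashlib
import heapq


class KMVDistinctCounter:
    """Approximate distinct count that keeps only the k smallest hash values."""

    def __init__(self, k=256):
        self.k = k
        self._heap = []        # max-heap via negated values: the k smallest hashes
        self._members = set()  # hashes currently held in the heap

    @staticmethod
    def _hash01(item):
        # Map the item to a pseudo-uniform number in [0, 1].
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item):
        h = self._hash01(item)
        if h in self._members:
            return  # already tracked, duplicates do not change the sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest of the k smallest hashes.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct items: exact
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest
```

The speed layer can serve such an estimate for the recent window, while the batch layer later computes the exact value, which is one reading of eventual accuracy.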
Storm
Speed Layer
Message passing
Storm
Distributed processing
Storm
Horizontally scalable.
Storm
Incremental algorithms
Storm
Fast.
Storm
Storm
Storm
Tuple
Stream
Storm
Spout
Bolt
Storm
Grouping
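The concepts of tuple, stream, spout, bolt and grouping can be mimicked in a few lines of plain Python. This is a single-process simulation for illustration only and does not use Storm's API; city_spout, CountBolt, fields_grouping and run_topology are made-up names.

```python
import zlib
from collections import Counter


def city_spout(events):
    """Spout: the source of a stream, emitting one tuple at a time."""
    for person, city in events:
        yield (person, city)


class CountBolt:
    """Bolt: consumes tuples and keeps an incremental count per city."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        _person, city = tup
        self.counts[city] += 1


def fields_grouping(key, n_tasks):
    """Grouping on a field: tuples with the same key always reach the same task."""
    return zlib.crc32(key.encode("utf-8")) % n_tasks


def run_topology(events, n_tasks=2):
    bolts = [CountBolt() for _ in range(n_tasks)]
    for tup in city_spout(events):
        # Route on the city field so each city is counted by exactly one bolt task.
        bolts[fields_grouping(tup[1], n_tasks)].execute(tup)
    return bolts
```

Grouping by the city field is what makes the per-task counts correct without coordination between tasks: no two tasks ever hold partial counts for the same city.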
Data Ingestion
Queues & Pub/Sub models are a natural fit.
● Kafka
● Flume
● Scribe
● *MQ
● …
Data Ingestion
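A toy in-process publish/subscribe sketch of why these models fit: every incoming event is delivered both to the master dataset (batch layer) and to the real-time view (speed layer). The Topic class and handler names are hypothetical; a real deployment would use one of the systems listed above.

```python
class Topic:
    """Minimal in-process pub/sub topic (illustration only)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, event):
        # Every subscriber receives every event.
        for handler in self._subscribers:
            handler(event)


master_dataset = []   # batch layer input: append only
realtime_counts = {}  # speed layer view: updated incrementally


def append_to_master(event):
    master_dataset.append(event)


def update_realtime_view(event):
    _person, city = event
    realtime_counts[city] = realtime_counts.get(city, 0) + 1


incoming = Topic()
incoming.subscribe(append_to_master)
incoming.subscribe(update_realtime_view)
incoming.publish(("Nathan", "Cologne"))
```

The producer only knows the topic, so the batch and speed layers can be added, replaced or replayed independently.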
Speed Layer Views
The views need to be stored in a database that supports random writes.
The logic behind an R/W database is much more complex than that of a read-only view.
Speed Layer Views
The views are stored in a read/write database.
● Cassandra
● HBase
● Redis
● SQL
● Elasticsearch
● ...
Speed Layer Views
Serving Layer
Serving Layer
[Diagram: Incoming Data, Hadoop, ElephantDB, Cassandra, Query]
Serving Layer
Random reads.
This layer queries the batch & real-time views and merges them.
Serving layer
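For an additive metric, such as a count of events per key, the merge can be as simple as adding the two partial results. A sketch, assuming both views are exposed as dictionaries and that the real-time view only covers data the batch view has not absorbed yet; query_count is a hypothetical helper.

```python
def query_count(key, batch_view, realtime_view):
    """Merge the batch result with the real-time result for one key."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)


# Example: the batch view covers everything up to the last batch run,
# the real-time view only the events that arrived since then.
batch_view = {"Cologne": 2, "Antwerp": 1}
realtime_view = {"Cologne": 1}
cologne_total = query_count("Cologne", batch_view, realtime_view)
```

If the two views overlapped in time, events would be counted twice, so the boundary between absorbed and not yet absorbed data has to be well defined.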
How to query an Average?
Serving Layer
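One common answer, sketched here as an assumption rather than as the speaker's solution, is to store a (sum, count) pair in both views instead of the average itself, because two averages cannot be merged correctly without their weights. The function merged_average and the sensor key are made up for the example.

```python
def merged_average(key, batch_view, realtime_view):
    """Average over batch and real-time data, from (sum, count) pairs."""
    batch_sum, batch_count = batch_view.get(key, (0.0, 0))
    rt_sum, rt_count = realtime_view.get(key, (0.0, 0))
    total_count = batch_count + rt_count
    if total_count == 0:
        return None  # no data for this key in either view
    return (batch_sum + rt_sum) / total_count


# Batch: three readings summing to 60 (average 20).
# Real-time: one reading of 40 (average 40).
batch_view = {"sensor-1": (60.0, 3)}
realtime_view = {"sensor-1": (40.0, 1)}
average = merged_average("sensor-1", batch_view, realtime_view)  # 25.0
```

Averaging the two averages would give 30, which over-weights the single real-time reading; merging sums and counts gives the correct 25.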
Side note: CQRS
CQRS
Source: martinfowler.com/bliki/CQRS.html - Martin Fowler
CQRS & Event Sourcing
Event Sourcing
● Every command is a new event.
● The event store keeps all events; new events are appended.
● Any query loops through all related events, even to produce an aggregate.
source: CQRS Journey - Microsoft Patterns & Practices
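The three points above can be sketched as a tiny in-memory event store. EventStore, the event type names and current_city are assumptions for this example and are not taken from the CQRS Journey guide.

```python
class EventStore:
    """Append-only store: events are added, never changed or removed."""

    def __init__(self):
        self._events = []

    def append(self, aggregate_id, event_type, payload):
        # Every command results in a new event.
        self._events.append((aggregate_id, event_type, payload))

    def events_for(self, aggregate_id):
        return [e for e in self._events if e[0] == aggregate_id]


def current_city(store, name):
    """A query loops through all related events to derive the current state."""
    city = None
    for _aggregate_id, event_type, payload in store.events_for(name):
        if event_type in ("ContactCreated", "ContactMoved"):
            city = payload["city"]
    return city
```

The Lambda Architecture differs mainly in where this replay happens: views are precomputed by the batch and speed layers instead of looping over events at query time.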
Lambda Architecture
Lambda Architecture
The Lambda Architecture can discard any view, batch and real-time, and just recreate everything from the master data.
Mistakes are corrected via recomputation.
Wrote bad data? Remove the data & recompute.
Bug in view generation? Just recompute the view.
Lambda Architecture
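Both correction paths can be sketched with one helper, assuming views are pure functions of the master dataset. The names recompute_view, records_per_city and is_bad are hypothetical and only serve this illustration.

```python
def records_per_city(master_dataset):
    """Example view function: number of records per city."""
    counts = {}
    for _person, city in master_dataset:
        counts[city] = counts.get(city, 0) + 1
    return counts


def recompute_view(master_dataset, view_fn, is_bad=lambda record: False):
    """Drop records flagged as bad, then rebuild the view from what remains."""
    clean = [record for record in master_dataset if not is_bad(record)]
    return clean, view_fn(clean)


master = [("Nathan", "Cologne"), ("John", "Cologne"), ("Dirk", "")]

# Wrote bad data? Remove it and recompute.
master, view = recompute_view(master, records_per_city, is_bad=lambda r: r[1] == "")

# Bug in view generation? Fix the function and call recompute_view again
# with the corrected view_fn; the master data itself is untouched.
```

Nothing has to be patched in place: the old view is simply discarded and replaced by the recomputed one.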
Data storage is highly optimized.
Lambda Architecture
Immutability changes everything.
Lambda Architecture
Questions?
@nathan_gs #nosql14
nathan@nathan.gs / slideshare.net/nathan_gs
lambda-architecture.net / @LambdaArch / #LambdaArch
Virdata is the cross-industry cloud service/platform for the Internet of Things. Designed to elastically scale to monitor and manage an unprecedented number of devices and applications using concurrent persistent connections, Virdata opens the door to numerous new business opportunities.
Virdata combines Publish-Subscribe based Distributed Messaging, Complex Event Processing and state-of-the-art Big Data paradigms to enable both historical & real-time monitoring and near real-time analytics at the scale required for the Internet of Things.
Acknowledgements
I would like to thank Nathan Marz for writing a very insightful book, where most of the ideas in this presentation come from.
Parts of this presentation have been created while working for datacrunchers.eu; I thank them for the opportunities to speak about the Lambda Architecture both at clients and at conferences. DataCrunchers is the first Big Data agency in Belgium.
Schemas & Pictures:
Computing Trends: Immutability Changes Everything - Pat Helland, RICON2012
MapReduce #1: PolybasePass2012.pptx - David J. DeWitt, Microsoft Gray Systems Lab
MapReduce #2: Introduction to MapReduce and Hadoop - Shivnath Babu, Duke
CQRS: martinfowler.com/bliki/CQRS.html - Martin Fowler
CQRS & Event Sourcing: CQRS Journey - Adam Dymitruk, Josh Elster & Mark Seemann, Microsoft Patterns & Practices
Thank you
@nathan_gs
nathan@nathan.gs
Ad

More Related Content

What's hot (20)

Lambda architecture
Lambda architectureLambda architecture
Lambda architecture
Szilveszter Molnár
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
Hari Shreedharan
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache Spark
DataWorks Summit
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Helena Edelson
 
Spark Intro @ analytics big data summit
Spark  Intro @ analytics big data summitSpark  Intro @ analytics big data summit
Spark Intro @ analytics big data summit
Sujee Maniyam
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.js
Ben Laird
 
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDsApache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
Timothy Spann
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
Patrick McFadin
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Natalino Busa
 
Kafka spark cassandra webinar feb 16 2016
Kafka spark cassandra   webinar feb 16 2016 Kafka spark cassandra   webinar feb 16 2016
Kafka spark cassandra webinar feb 16 2016
Hiromitsu Komatsu
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
Knoldus Inc.
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
DataStax Academy
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Brian O'Neill
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connector
Duyhai Doan
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and Kafka
DataStax Academy
 
Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data
prajods
 
Sa introduction to big data pipelining with cassandra & spark west mins...
Sa introduction to big data pipelining with cassandra & spark   west mins...Sa introduction to big data pipelining with cassandra & spark   west mins...
Sa introduction to big data pipelining with cassandra & spark west mins...
Simon Ambridge
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
Guido Schmutz
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Codemotion
 
Architectural Patterns for Streaming Applications
Architectural Patterns for Streaming ApplicationsArchitectural Patterns for Streaming Applications
Architectural Patterns for Streaming Applications
hadooparchbook
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
Hari Shreedharan
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache Spark
DataWorks Summit
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Helena Edelson
 
Spark Intro @ analytics big data summit
Spark  Intro @ analytics big data summitSpark  Intro @ analytics big data summit
Spark Intro @ analytics big data summit
Sujee Maniyam
 
Real time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.jsReal time data viz with Spark Streaming, Kafka and D3.js
Real time data viz with Spark Streaming, Kafka and D3.js
Ben Laird
 
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDsApache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
Apache Spark 1.6 with Zeppelin - Transformations and Actions on RDDs
Timothy Spann
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
Patrick McFadin
 
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Natalino Busa
 
Kafka spark cassandra webinar feb 16 2016
Kafka spark cassandra   webinar feb 16 2016 Kafka spark cassandra   webinar feb 16 2016
Kafka spark cassandra webinar feb 16 2016
Hiromitsu Komatsu
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
Knoldus Inc.
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
DataStax Academy
 
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics ...
Brian O'Neill
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connector
Duyhai Doan
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and Kafka
DataStax Academy
 
Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data
prajods
 
Sa introduction to big data pipelining with cassandra & spark west mins...
Sa introduction to big data pipelining with cassandra & spark   west mins...Sa introduction to big data pipelining with cassandra & spark   west mins...
Sa introduction to big data pipelining with cassandra & spark west mins...
Simon Ambridge
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
Guido Schmutz
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Codemotion
 
Architectural Patterns for Streaming Applications
Architectural Patterns for Streaming ApplicationsArchitectural Patterns for Streaming Applications
Architectural Patterns for Streaming Applications
hadooparchbook
 

Viewers also liked (20)

Getting more out of your big data
Getting more out of your big dataGetting more out of your big data
Getting more out of your big data
Nathan Bijnens
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
Nathan Bijnens
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
Trieu Nguyen
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in Action
Guido Schmutz
 
Lambda architecture: from zero to One
Lambda architecture: from zero to OneLambda architecture: from zero to One
Lambda architecture: from zero to One
Serg Masyutin
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Helena Edelson
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
DataWorks Summit
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtime
Guido Schmutz
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce intro
Geoff Hendrey
 
COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013
COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013
COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013
Gigaom
 
Url Shortening Services
Url Shortening ServicesUrl Shortening Services
Url Shortening Services
Altan Khendup
 
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Altan Khendup
 
Storm
StormStorm
Storm
Antonio Calvo Morata
 
Lambda architecture with Spark
Lambda architecture with SparkLambda architecture with Spark
Lambda architecture with Spark
Vincent GALOPIN
 
Lambda Architecture in Practice
Lambda Architecture in PracticeLambda Architecture in Practice
Lambda Architecture in Practice
Navneet kumar
 
Lambda Architecture Using SQL
Lambda Architecture Using SQLLambda Architecture Using SQL
Lambda Architecture Using SQL
SATOSHI TAGOMORI
 
National Weather Service Storm Spotter Training
National Weather Service Storm Spotter TrainingNational Weather Service Storm Spotter Training
National Weather Service Storm Spotter Training
chowd
 
[USI] Lambda-Architecture : comment réconcilier BigData et temps-réel
[USI] Lambda-Architecture : comment réconcilier BigData et temps-réel[USI] Lambda-Architecture : comment réconcilier BigData et temps-réel
[USI] Lambda-Architecture : comment réconcilier BigData et temps-réel
Mathieu DESPRIEE
 
Lambda Architectures in Practice
Lambda Architectures in PracticeLambda Architectures in Practice
Lambda Architectures in Practice
C4Media
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016
Hortonworks
 
Getting more out of your big data
Getting more out of your big dataGetting more out of your big data
Getting more out of your big data
Nathan Bijnens
 
Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
Nathan Bijnens
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
Trieu Nguyen
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in Action
Guido Schmutz
 
Lambda architecture: from zero to One
Lambda architecture: from zero to OneLambda architecture: from zero to One
Lambda architecture: from zero to One
Serg Masyutin
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Helena Edelson
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
DataWorks Summit
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtime
Guido Schmutz
 
Real time hadoop + mapreduce intro
Real time hadoop + mapreduce introReal time hadoop + mapreduce intro
Real time hadoop + mapreduce intro
Geoff Hendrey
 
COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013
COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013
COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013
Gigaom
 
Url Shortening Services
Url Shortening ServicesUrl Shortening Services
Url Shortening Services
Altan Khendup
 
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Data Apps with the Lambda Architecture - with Real Work Examples on Merging B...
Altan Khendup
 
Lambda architecture with Spark
Lambda architecture with SparkLambda architecture with Spark
Lambda architecture with Spark
Vincent GALOPIN
 
Lambda Architecture in Practice
Lambda Architecture in PracticeLambda Architecture in Practice
Lambda Architecture in Practice
Navneet kumar
 
Lambda Architecture Using SQL
Lambda Architecture Using SQLLambda Architecture Using SQL
Lambda Architecture Using SQL
SATOSHI TAGOMORI
 
National Weather Service Storm Spotter Training
National Weather Service Storm Spotter TrainingNational Weather Service Storm Spotter Training
National Weather Service Storm Spotter Training
chowd
 
[USI] Lambda-Architecture : comment réconcilier BigData et temps-réel
[USI] Lambda-Architecture : comment réconcilier BigData et temps-réel[USI] Lambda-Architecture : comment réconcilier BigData et temps-réel
[USI] Lambda-Architecture : comment réconcilier BigData et temps-réel
Mathieu DESPRIEE
 
Lambda Architectures in Practice
Lambda Architectures in PracticeLambda Architectures in Practice
Lambda Architectures in Practice
C4Media
 
Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016Attunity Hortonworks Webinar- Sept 22, 2016
Attunity Hortonworks Webinar- Sept 22, 2016
Hortonworks
 
Ad

A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne '14)

  • 1. NoSQL Matter 2014 - A real-time (Lambda) Architecture using Hadoop & Storm - #nosql14 A real-time Lambda Architecture using Hadoop & Storm NoSQL Matters Cologne 2014 by Nathan Bijnens
  • 2. Speaker: Nathan Bijnens, Big Data Engineer @ Virdata, @nathan_gs
  • 3. Computing Trends. Past: Computation (CPUs) Expensive; Disk Storage Expensive; Coordination Easy (Latches Don’t Often Hit); DRAM Expensive. Current: Computation Cheap (Many Core Computers); Disk Storage Cheap (Cheap Commodity Disks); Coordination Hard (Latches Stall a Lot, etc); DRAM / SSD Getting Cheap. Source: Immutability Changes Everything - Pat Helland, RICON2012
  • 4. Credits: Nathan Marz. ● Ex-Backtype & Twitter ● Startup in Stealthmode. Creator of ● Storm ● Cascalog ● ElephantDB. Coined the term Lambda Architecture. manning.com/marz
  • 5. a Data System
  • 6. Data is more than Information: Not all information is equal. Some information is derived from other pieces of information.
  • 7. Data is more than Information: Eventually you will reach the most ‘raw’ form of information. This is the information you hold true, simply because it exists. Let’s call this ‘data’, very similar to ‘event’.
  • 8. Events: Before. Events used to manipulate the master data.
  • 9. Events: After. Today, events are the master data.
  • 10. Data System: Let’s store everything.
  • 11. Data System: Data is Immutable.
  • 12. Data System: Data is Time Based.
  • 13. Capturing change, traditionally: INSERT INTO contact (name, city) VALUES ('Nathan', 'Antwerp'); UPDATE contact SET city = 'Cologne' WHERE name = 'Nathan';
  • 14. Capturing change in a Data System: INSERT INTO contact (name, city, timestamp) VALUES ('Nathan', 'Antwerp', '2008-10-11 20:00Z'); INSERT INTO contact (name, city, timestamp) VALUES ('Nathan', 'Cologne', '2014-04-29 10:00Z');
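The difference between the two "Capturing change" slides can be sketched in a few lines of Python. This is only an illustration under assumptions: the `contact_log` list and the `record` and `city_at` helpers are hypothetical names, not part of the talk. The point is that a move is recorded as a new timestamped fact, so the earlier fact remains available.

```python
from datetime import datetime, timezone

# Append-only log of facts (hypothetical in-memory stand-in for the data store).
contact_log = []


def record(name, city, timestamp):
    """Append a new timestamped fact instead of overwriting the old one."""
    contact_log.append({"name": name, "city": city, "timestamp": timestamp})


def city_at(name, when):
    """Return the city known for `name` at time `when`, or None if unknown."""
    facts = [
        f for f in contact_log
        if f["name"] == name and f["timestamp"] <= when
    ]
    if not facts:
        return None
    # The most recent fact at or before `when` wins.
    return max(facts, key=lambda f: f["timestamp"])["city"]


# The two inserts from the slide.
record("Nathan", "Antwerp", datetime(2008, 10, 11, 20, 0, tzinfo=timezone.utc))
record("Nathan", "Cologne", datetime(2014, 4, 29, 10, 0, tzinfo=timezone.utc))
```

With an in-place UPDATE, the question "where did Nathan live in 2010?" can no longer be answered; here `city_at` can still answer it because nothing was overwritten.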
  • 15. Query: The data you query is often transformed, aggregated, ... It is rarely used in its original form.
  • 16. Query: Query = function ( all data )
  • 17. Query: Number of people living in each city. Input (Person, City, Timestamp): (Nathan, Antwerp, 2008-10-11); (John, Cologne, 2010-01-23); (Dirk, Antwerp, 2012-09-12); (Nathan, Cologne, 2014-04-29). Result (City, Count): (Antwerp, 1); (Cologne, 2).
  • 18. Query (diagram): All Data, Precomputed View, Query
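As a rough sketch of "Query = function ( all data )", the per-city count from the slide can be written as a pure function over every stored row. The names `ALL_DATA`, `people_per_city` and `precomputed_view` are illustrative assumptions, and "latest row per person wins" is one plausible reading of how the slide arrives at its result.

```python
from collections import Counter

# Rows from the slide: (person, city, timestamp as an ISO date string).
ALL_DATA = [
    ("Nathan", "Antwerp", "2008-10-11"),
    ("John", "Cologne", "2010-01-23"),
    ("Dirk", "Antwerp", "2012-09-12"),
    ("Nathan", "Cologne", "2014-04-29"),
]


def people_per_city(all_data):
    """Count people per city, using only the latest row for each person."""
    latest = {}
    for person, city, ts in all_data:
        # ISO-formatted dates sort correctly as plain strings.
        if person not in latest or ts > latest[person][0]:
            latest[person] = (ts, city)
    return dict(Counter(city for _ts, city in latest.values()))


# Computing the function ahead of time gives a precomputed view
# that a query can read instead of scanning all data.
precomputed_view = people_per_city(ALL_DATA)
```

The resulting view holds one person in Antwerp and two in Cologne, matching the table on the slide.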
  • 19. Layered Architecture: Batch Layer, Speed Layer, Serving Layer
  • 20. Layered Architecture (diagram): Hadoop, ElephantDB, Incoming Data, Cassandra, Query
  • 21. Batch Layer
  • 22. Batch Layer (diagram): Hadoop, ElephantDB, Incoming Data
  • 23. Batch Layer: The batch layer can calculate anything, given enough time... Unrestrained computation.
  • 24. Batch Layer: No need to De-Normalize. The batch layer stores the data normalized; the generated views are often, if not always, denormalized.
  • 25. Batch Layer: Horizontally scalable.
  • 26. Batch Layer: High Latency. For now, let’s pretend the update latency doesn’t matter.
  • 27. Batch Layer: Functional computation, based on immutable inputs, is idempotent.
  • 28. Batch Layer: Stores a master copy of the data set … append only
  • 29. Batch Layer
  • 30. Batch: view generation (diagram): Master Dataset; MapReduce (×3); View #1, View #2, View #3
  • 31. MapReduce: 1. Take a large data set and divide it into subsets. 2. Perform the same function on all subsets. 3. Combine the output from all subsets. (Diagram: DoWork() applied to each subset, then Output.)
  • 32. MapReduce
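The three MapReduce steps listed on the slide can be sketched in plain Python, without Hadoop. This is a single-process illustration under assumptions: `split`, `do_work` (named after the slide's DoWork()), `combine` and `map_reduce` are hypothetical helpers, and counting people per city is just an example workload.

```python
from collections import Counter
from functools import reduce


def split(records, n_subsets):
    """Step 1: divide the data set into roughly equal subsets."""
    return [records[i::n_subsets] for i in range(n_subsets)]


def do_work(subset):
    """Step 2: the same function runs on every subset (here: count cities)."""
    return Counter(city for _person, city in subset)


def combine(partials):
    """Step 3: merge the per-subset outputs into one result."""
    return reduce(lambda a, b: a + b, partials, Counter())


def map_reduce(records, n_subsets=3):
    """Run the three steps in sequence; a real cluster runs step 2 in parallel."""
    return combine(do_work(subset) for subset in split(records, n_subsets))
```

Because `do_work` only sees its own subset and `combine` is a plain sum, the result does not depend on how the data is split, which is what allows the work to be spread across machines.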
  • 33. Serialization & Schema: Catch errors as quickly as they happen. Validate on write vs on read.
  • 34. Serialization & Schema: CSV is actually a serialization language that is just poorly defined.
  • 35. Serialization & Schema: Use a format with a schema ● Thrift ● Avro ● Protocol Buffers. Could be combined with Parquet. Added bonus: it’s faster and uses less space.
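"Validate on write" can be illustrated with a small typed record that refuses malformed input before it reaches the master data set. This sketch uses a plain Python dataclass as a stand-in for a real schema format such as Thrift, Avro or Protocol Buffers; `ContactEvent` and `append_event` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ContactEvent:
    """A schema for one event; frozen, so a stored event cannot be mutated."""
    name: str
    city: str
    timestamp: datetime

    def __post_init__(self):
        # Reject malformed records at construction time.
        if not isinstance(self.name, str) or not self.name:
            raise ValueError("name must be a non-empty string")
        if not isinstance(self.city, str) or not self.city:
            raise ValueError("city must be a non-empty string")
        if not isinstance(self.timestamp, datetime):
            raise ValueError("timestamp must be a datetime")


def append_event(master_dataset, **fields):
    """Validate on write: a bad record fails here, not in a later batch run."""
    master_dataset.append(ContactEvent(**fields))
```

The design choice is that the error surfaces at the writer, close to its cause, instead of hours later inside a batch job that reads the record.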
  • 36. Batch View Database: No random writes required. Read-only database.
  • 37. Batch View Database: Every iteration produces the views from scratch.
  • 38. Batch View Databases: Pure Lambda databases ● ElephantDB ● SploutSQL. Databases with a batch load & read-only views ● Voldemort. Other databases that could be used ● ElasticSearch/Solr: generate the Lucene indexes using MapReduce ● Cassandra: generate SSTables ● ...
  • 39. Batch Layer: Eventually consistent, without the associated complexities.
  • 40. Batch Layer: We are not done yet… (Timeline up to Now: data absorbed into Batch Views, followed by data not yet absorbed, just a few hours of data.)
  • 41. Speed Layer
  • 42. Speed Layer (diagram): Hadoop, ElephantDB, Incoming Data, Cassandra
  • 43. Speed Layer: Stream processing.
  • 44. Speed Layer: Continuous computation.
  • 45. Speed Layer: Storing a limited window of data. Compensating for the last few hours of data.
  • 46. Speed Layer: All the complexity is isolated in the Speed Layer. If anything goes wrong, it’s auto-corrected.
  • 47. CAP: You have a choice between: ● Availability ○ Queries are eventually consistent ● Consistency ○ Queries are consistent
  • 48. Eventual accuracy: Some algorithms are hard to implement in real-time. For those cases we could estimate the results.
  • 49. Speed Layer: Storm
  • 50. Storm: Message passing
  • 51. Storm: Distributed processing
  • 52. Storm: Horizontally scalable.
  • 53. Storm: Incremental algorithms
  • 54. Storm: Fast.
  • 55. Storm
  • 56. Storm: Tuple, Stream
  • 57. Storm: Spout, Bolt
  • 58. Storm: Grouping
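The Storm vocabulary on these slides (tuple, stream, spout, bolt, grouping) can be mimicked in a few lines of single-process Python. This is a conceptual sketch only and does not use the Storm API: `city_spout`, `CountBolt`, `fields_grouping` and `run_topology` are hypothetical names, and a real topology would run the bolt tasks on different machines.

```python
import zlib
from collections import Counter


def city_spout(events):
    """Spout: the source of a stream; emits one tuple at a time."""
    for person, city in events:
        yield (person, city)


class CountBolt:
    """Bolt: consumes tuples and updates its state incrementally."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        _person, city = tup
        self.counts[city] += 1  # incremental update, no recomputation


def fields_grouping(key, n_tasks):
    """Grouping: route equal keys to the same bolt task via a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % n_tasks


def run_topology(events, n_tasks=2):
    """Wire the spout to `n_tasks` bolt instances, grouped by city."""
    bolts = [CountBolt() for _ in range(n_tasks)]
    for tup in city_spout(events):
        bolts[fields_grouping(tup[1], n_tasks)].execute(tup)
    return bolts
```

Grouping by the city field means every tuple for a given city reaches the same bolt instance, so each instance can keep a correct running count without coordinating with the others.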
  • 59. NoSQL Matter 2014 - A real-time (Lambda) Architecture using Hadoop & Storm - #nosql14 Data Ingestion Queues & Pub/Sub models are a natural fit.
  • 60. NoSQL Matter 2014 - A real-time (Lambda) Architecture using Hadoop & Storm - #nosql14 ● Kafka ● Flume ● Scribe ● *MQ ● … Data Ingestion
  • 61. NoSQL Matter 2014 - A real-time (Lambda) Architecture using Hadoop & Storm - #nosql14 Speed Layer Views The views need to be stored in a random writable database.
  • 62. NoSQL Matter 2014 - A real-time (Lambda) Architecture using Hadoop & Storm - #nosql14 The logic behind a R/W database is much more complex than a read-only view. Speed Layer Views
  • 63. NoSQL Matter 2014 - A real-time (Lambda) Architecture using Hadoop & Storm - #nosql14 The views are stored in a Read & Write database. ● Cassandra ● Hbase ● Redis ● SQL ● ElasticSearch ● ... Speed Layer Views
  • 64. Serving Layer
  • 65. Serving Layer (diagram labels: Hadoop, ElephantDB, Incoming Data, Cassandra, Query)
  • 66. Serving Layer: Random reads.
  • 67. Serving Layer: This layer queries the batch & real-time views and merges them.
  • 68. Serving Layer: How to query an average?
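One common answer to the average question, sketched here under assumptions not stated on the slides: an average of two averages is generally wrong, so each view stores a mergeable pair (sum, count) per key and the serving layer divides only after merging. The names `AvgPartial` and `query_average`, and the use of plain dicts as stand-ins for the batch and real-time views, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvgPartial:
    """Mergeable partial aggregate: keep sum and count, not the average."""
    total: float
    count: int

    def merge(self, other):
        return AvgPartial(self.total + other.total, self.count + other.count)

    def value(self):
        if self.count == 0:
            raise ValueError("no data for this key")
        return self.total / self.count


def query_average(batch_view, realtime_view, key):
    """Serving-layer query: read both views, merge, then divide once."""
    empty = AvgPartial(0.0, 0)
    merged = batch_view.get(key, empty).merge(realtime_view.get(key, empty))
    return merged.value()


# Hypothetical view contents: 100 older readings averaging 10.0,
# plus 4 recent readings averaging 20.0.
batch_view = {"sensor-1": AvgPartial(total=1000.0, count=100)}
realtime_view = {"sensor-1": AvgPartial(total=80.0, count=4)}

merged_average = query_average(batch_view, realtime_view, "sensor-1")
naive_average = (10.0 + 20.0) / 2  # averaging the averages ignores the counts
```

Here the merged result is 1080 / 104, while the naive average of averages gives 15.0, which overweights the four recent readings.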
  • 69. Side note: CQRS
  • 70. CQRS (diagram). Source: martinfowler.com/bliki/CQRS.html - Martin Fowler
  • 71. CQRS & Event Sourcing. Event Sourcing: ● Every command is a new event. ● The event store keeps all events; new events are appended. ● Any query loops through all related events, even to produce an aggregate. Source: CQRS Journey - Microsoft Patterns & Practices
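The three event-sourcing points above can be sketched in a few lines. This is an illustrative toy, not code from the CQRS Journey guide: the `Event` and `EventStore` names, the bank-account domain, and the in-memory list are all assumptions. Commands only append events, nothing is updated in place, and the aggregate (a balance) is produced by looping over every related event.

```python
from collections import namedtuple

# Hypothetical event shape for an account domain.
Event = namedtuple("Event", ["account", "kind", "amount"])


class EventStore:
    """Append-only store: events are added, never modified or deleted."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def events_for(self, account):
        return [e for e in self._events if e.account == account]


def balance(store, account):
    """Query side: fold over all related events to produce the aggregate."""
    total = 0
    for event in store.events_for(account):
        if event.kind == "deposited":
            total += event.amount
        elif event.kind == "withdrawn":
            total -= event.amount
    return total


store = EventStore()
store.append(Event("acc-1", "deposited", 100))
store.append(Event("acc-1", "withdrawn", 30))
store.append(Event("acc-2", "deposited", 7))
store.append(Event("acc-1", "deposited", 5))
```

Replaying every event on each query is simple but grows more expensive as the history grows, which is the cost that precomputed views are meant to avoid.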
  • 72. Lambda Architecture
  • 73. Lambda Architecture: Any view, batch or real-time, can be discarded and recreated from the master data.
  • 74. Lambda Architecture: Mistakes are corrected via recomputation. Write bad data? Remove the data & recompute. Bug in view generation? Just recompute the view.
  • 75. Lambda Architecture: Data storage is highly optimized.
  • 76. Lambda Architecture: Immutability changes everything.
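Both recovery paths on the slide above follow from treating a view as a pure function of the master data. The sketch below assumes a hypothetical page-view dataset and hypothetical function names (`buggy_view`, `fixed_view`); it is not taken from the talk. Bad data is handled by removing the offending record and recomputing, and a bug in view generation is handled by fixing the function and recomputing.

```python
from collections import Counter


def buggy_view(events):
    """First attempt: counts URLs verbatim, so '/Home/' and '/home' differ."""
    return Counter(e["url"] for e in events)


def fixed_view(events):
    """Corrected view logic: normalize case and trailing slash before counting."""
    return Counter(e["url"].lower().rstrip("/") for e in events)


# Hypothetical master data; assume record 4 was written by a faulty client.
master_data = [
    {"id": 1, "url": "/home"},
    {"id": 2, "url": "/Home/"},
    {"id": 3, "url": "/about"},
    {"id": 4, "url": "/home"},
]

# Write bad data? Remove the data & recompute.
clean_data = [e for e in master_data if e["id"] != 4]

# Bug in view generation? Discard the old view and recompute with the fix.
old_view = buggy_view(clean_data)
new_view = fixed_view(clean_data)
```

No view is ever patched in place: the old one is simply thrown away and a new one is derived from the same master data.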
  • 77. Questions? @nathan_gs / #nosql14 / slideshare.net/nathan_gs / lambda-architecture.net / @LambdaArch / #LambdaArch
  • 78. Virdata is the cross-industry cloud service/platform for the Internet of Things. Designed to scale elastically to monitor and manage an unprecedented number of devices and applications using concurrent persistent connections, Virdata opens the door to numerous new business opportunities. Virdata combines publish-subscribe based distributed messaging, complex event processing and state-of-the-art Big Data paradigms to enable both historical & real-time monitoring and near real-time analytics at the scale required for the Internet of Things.
  • 79. Acknowledgements: I would like to thank Nathan Marz for writing a very insightful book, from which most of the ideas in this presentation come. Parts of this presentation were created while working for datacrunchers.eu; I thank them for the opportunities to speak about the Lambda Architecture both at clients and at conferences. DataCrunchers is the first Big Data agency in Belgium. Schemas & Pictures: Computing Trends: Immutability Changes Everything - Pat Helland, RICON2012; MapReduce #1: PolybasePass2012.pptx - David J. DeWitt, Microsoft Gray Systems Lab; MapReduce #2: Introduction to MapReduce and Hadoop - Shivnath Babu, Duke; CQRS: martinfowler.com/bliki/CQRS.html - Martin Fowler; CQRS & Event Sourcing: CQRS Journey - Adam Dymitruk, Josh Elster & Mark Seemann, Microsoft Patterns & Practices
  • 80. Thank you. @nathan_gs