Architectural evolution starting from
Hadoop
Monica Franceschini
Solution Architecture Manager
Big Data Competency Center
Engineering Group
Experiences
ENERGY
Predictive analysis using
geo-spatial sensor data
FINANCE
Big Data architecture for
advanced CRM
Measure of energy
consumption for 15M
users
P.A.
Energy
HDFS
Kafka
HBase
Spark
Flume
Phoenix
Hadoop Technologies
Architecture diagram: External systems, JMS, FS, RDBMS, Flume, Sqoop, Kafka, HDFS, HBase, Spark, Spark Streaming, Phoenix, Web apps
Finance
NFS
HBase
Spark
Phoenix
Hadoop Technologies
Architecture diagram: External systems, NFS, HDFS, HBase, Spark, Phoenix, Web apps
P.A.
HDFS
HBase
Spark
Spark MLlib
Flume
Phoenix
Hadoop Technologies
Architecture diagram: External systems, JMS, Flume, HDFS, HBase, Spark, Spark MLlib, Phoenix, Web apps
Data:
Lots of small files (data coming from sensors, or structured/semi-structured data)
Ingestion:
Fast data
Event driven
Near real-time
Storage:
Update single records
Considerations
Similar scenarios:
Flume, HBase & Spark
Online
performance
HBase instead of
HDFS
Similar data
High
throughput
Moreover…
• Adoption of a well-established solution
• Availability of support services
• Community, open source or … free version!
Hadoop storage
HBase / HDFS
Large data sets
Unstructured data
Write-once-read-many access
Append-only file system
Hive HQL access
High-speed writes
and scans
Fault-tolerant
Replication
Many rows/columns
Compaction
Random read-writes
Updates
Rowkey access
Data modeling
NoSQL
Untyped data
Sparse schema
High throughput
Variable columns
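The contrast above between an append-only file system and a store offering updates and compaction can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not HBase's actual storage engine: the `ToyStore` class and its method names are hypothetical, and real HBase adds write-ahead logs, timestamps, and delete markers. The idea shown is that an update is written as a newer value, flushed files are never modified in place, reads return the newest value, and compaction merges files.

```python
class ToyStore:
    """Toy model: updates become new immutable files; reads take the newest value."""

    def __init__(self):
        self.memstore = {}  # in-memory buffer of recent writes
        self.files = []     # immutable flushed files, oldest first

    def put(self, rowkey, value):
        # An update is just a newer write for the same rowkey.
        self.memstore[rowkey] = value

    def flush(self):
        # Persist the buffer as a new immutable file (append-only friendly).
        if self.memstore:
            self.files.append(dict(self.memstore))
            self.memstore = {}

    def get(self, rowkey):
        if rowkey in self.memstore:
            return self.memstore[rowkey]
        for flushed in reversed(self.files):  # newest file wins
            if rowkey in flushed:
                return flushed[rowkey]
        return None

    def compact(self):
        # Merge all files into one, keeping only the newest value per rowkey.
        merged = {}
        for flushed in self.files:  # oldest first, so later files overwrite
            merged.update(flushed)
        self.files = [merged] if merged else []


store = ToyStore()
store.put("user1", "10 kWh")
store.flush()
store.put("user1", "12 kWh")  # update of a single record
store.put("user2", "7 kWh")
store.flush()
files_before = len(store.files)
store.compact()
```

After `compact()`, the two flushed files collapse into one and `store.get("user1")` returns the updated value.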
The solution:
HBase
Random read-writes
Updates
Compaction
Granular data
STORAGE
Problem:
Some HBase features:
• Just one index or primary key
• Rowkey composed of other fields
• Big denormalized tables
• Rowkey-based horizontal partitioning
• Focus on the rowkey design and table schema (data modeling)
• The ACCESS PATTERN must be known in advance!
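The points above about a rowkey composed of other fields and a known access pattern can be sketched as follows. This is a minimal illustration, not the author's schema: the field names (`meter_id`, `ts_millis`), the 10-character padding, and the reversed-timestamp trick are all assumptions chosen for the example, and the "table" is a plain dictionary scanned in sorted key order to mimic a rowkey prefix scan. The assumed access pattern is "latest readings for one meter".

```python
import struct

MAX_TS = 2**63 - 1  # used to reverse timestamps so newest rows sort first


def make_rowkey(meter_id: str, ts_millis: int) -> bytes:
    """Build a rowkey from other fields: fixed-width id + reversed timestamp."""
    # Fixed-width, zero-padded id keeps all keys of one meter contiguous.
    prefix = meter_id.rjust(10, "0").encode("ascii")
    # Big-endian encoding makes byte order match numeric order.
    reversed_ts = struct.pack(">q", MAX_TS - ts_millis)
    return prefix + reversed_ts


def scan_prefix(table: dict, meter_id: str) -> list:
    """Emulate a rowkey prefix scan over a table kept in sorted key order."""
    prefix = meter_id.rjust(10, "0").encode("ascii")
    return [table[key] for key in sorted(table) if key.startswith(prefix)]


table = {
    make_rowkey("42", 1000): "reading at t=1000",
    make_rowkey("42", 2000): "reading at t=2000",
    make_rowkey("7", 1500): "reading at t=1500",
}
latest_first = scan_prefix(table, "42")
```

Because the key design encodes the access pattern, a different query (for example, all meters in a time window) would need a different rowkey layout or a secondary index.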
Warning!!!
Using HBase as an RDBMS
doesn’t work at all!!!
What’s missing?
• SQL language
• Analytic queries
• Secondary index
Performance
for online
applications
Solutions:
• Phoenix is fast: a full table scan of 100M rows usually executes in 20 seconds (narrow table on a medium-sized cluster). This drops to a few milliseconds if the query contains a filter on key columns.
• Phoenix follows the philosophy of bringing the computation to the data by using:
• coprocessors to perform operations on the server side, thus minimizing client/server data transfer
• custom filters to prune data as close to the source as possible. In addition, Phoenix uses native HBase APIs to minimize any startup costs.
• Query chunks: Phoenix chunks up your query using the region boundaries and runs the chunks in parallel on the client using a configurable number of threads. The aggregation is done in a coprocessor on the server side.
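The query-chunking idea described above can be illustrated with a small simulation. This is a sketch only, not Phoenix code: the function names, the integer keys, and the region boundaries are hypothetical, and the per-chunk count merely stands in for the server-side aggregation. It shows a key range being split at region boundaries, each chunk being counted on its own thread, and the partial counts being summed on the client.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def split_by_regions(start, stop, boundaries):
    """Split the key range [start, stop) at each region boundary inside it."""
    cuts = [b for b in sorted(boundaries) if start < b < stop]
    edges = [start] + cuts + [stop]
    return list(zip(edges[:-1], edges[1:]))


def count_in_chunk(sorted_keys, chunk):
    """Stand-in for server-side aggregation: count the keys within one chunk."""
    low, high = chunk
    return bisect_left(sorted_keys, high) - bisect_left(sorted_keys, low)


def parallel_count(sorted_keys, start, stop, boundaries, threads=4):
    """Run one count per chunk in parallel, then sum the partial results."""
    chunks = split_by_regions(start, stop, boundaries)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = list(pool.map(lambda c: count_in_chunk(sorted_keys, c), chunks))
    return chunks, sum(partials)


keys = list(range(0, 100, 3))  # sorted keys 0, 3, ..., 99
chunks, total = parallel_count(keys, 10, 80, boundaries=[25, 50, 75])
```

Only small partial counts travel back to the client, which mirrors the point about minimizing client/server data transfer.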
• OLTP
• Analytic queries
• HBase specific
• A lightweight solution
• Who else is going to use it?
• Query engine + metadata store + JDBC driver
• Database over HDFS (for bulk loads and full-table scan queries)
• HBase APIs (not accessing HFiles directly)
• …what about performance?…
Query: select count(1) from table over 1M and 5M
rows. Data is 3 narrow columns. Number of Region
Servers: 1 (Virtual Machine, HBase heap: 2GB,
Processor: 2 cores @ 3.3GHz Xeon)
• Query engine + metadata store + JDBC driver
• DWH over HDFS
• Runs MapReduce jobs to query HBase
• StorageHandler to read HBase
• …what about performance?…
Query: select count(1) from table over 10M and
100M rows. Data is 5 narrow columns. Number
of Region Servers: 4 (HBase heap: 10GB,
Processor: 6 cores @ 3.3GHz Xeon)
• Cassandra + Spark as a lightweight solution (replacing HBase + Spark)
• SQL-like language (CQL) + secondary indexes
• …what about the other Hadoop tools?...
• Converged data platform: batch+NoSQL+streaming
• MapR-FS: great for throughput and files of every size + single-record updates
• Apache Drill as SQL layer on MapR-FS
• …proprietary solution…
• Developed by Cloudera and open source (integrated with the Hadoop ecosystem)
• Low-latency random access
• Super-fast Columnar Storage
• Designed for Next-Generation Hardware (storage based on IO
of solid state drives + experimental cache implementation)
• …beta version…
With Kudu, Cloudera promises to solve Hadoop's infamous
storage problem
InfoWorld | Sep 28, 2015
HBase / HDFS
Hadoop storage
Highly scalable in-memory
database for MPP workloads
Fast writes, fast updates,
fast reads, fast everything
Structured data
SQL+scan use cases
Unstructured data
Deep storage
Fixed column schema
SQL+scan use cases
Any type column
schema
Gets/puts/micro
scans
Conclusions
• One size doesn’t fit all the different
requirements
• The choice between different Open
Source solutions is driven by the
context
• Technology evolves
• So what?
• REQUIREMENTS
• NO LOCK-IN
• PEER-REVIEWS
Thank you!
Monica Franceschini
Twitter  @twittmonique
Linkedin  mfranceschini
Skype  monica_franceschini
Email  monica.franceschini@eng.it
