Big Data in Practice: A Pragmatic Approach to Adoption and Value Creation
Raj Nair 
Data Management Practitioner and Consultant 
Content NOT FOR DISTRIBUTION: Property of Raj Nair
1. Mainstream Big Data
2. Real World Use Cases and Applications
3. Practical Adoption: Opportunity Identification
4. Big Data 2.0 – What’s on the Horizon?
5. Conclusion
Every Day Big Data 
Reaching scale-up limits on your server 
Represents tools, technologies, frameworks for storage and processing at scale 
Represents Opportunity 
Big Data 1.0 – The Hadoop Ecosystem 
Software library 
Framework for large scale distributed processing 
Ability to scale to thousands of computers 
Design Principles
Classic Hadoop MapReduce – Batch Processing
- Large data sets
- Moving computation is cheaper than moving data
- Hardware failure, redundancy
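The batch model named on this slide can be sketched in a few lines. The example below is a single-process illustration of the map, shuffle, and reduce steps using word count; it does not use the Hadoop API, and the function names and sample lines are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input; on a cluster each line would come from a file split.
sample_lines = ["big data big value", "data in native format"]
counts = word_count(sample_lines)
print(counts)
```

On a real cluster the map step runs on the nodes that already hold the input blocks, which is the point of "moving computation is cheaper than moving data"; here everything runs in one process for clarity.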
What is this you call data? 
Unlearn current notion of “Data” 
Native Data Source 
HDFS: Storage and Archival
MapReduce: Programming Library
Crunch: Data Pipeline processing
HBase: Real time access (low latency)
Pig: M/R Abstraction
Hive: Data Warehouse (high latency)
Sqoop: Data Transfer
Flume: Data Streaming
Categories: Data Processing, Workload Management, Data Movement
Tool | Purpose | Use it for
HDFS | Distributed Storage | Raw data storage and archival
Flume | Data Movement | Continuous streaming into HDFS
Sqoop | Data Movement | Data transfer from RDBMS to HDFS/HBase
HBase | Workload Mgmt | Near real-time read/write access to large data sets
Hive | Workload Mgmt | Analytical queries; data warehouse
MapReduce | Data Processing | Low-level custom code for data processing
Crunch | Data Processing (Java) | Coding M/R pipelines, aggregations
Pig | Data Processing | Scripting language; similar to Crunch
A Powerful Paradigm
Hadoop – Separate Layers: Storage Layer, Query Engine, Processing Engine, Metadata
Multiple Query Engines
Data in Native format
Oracle, SQL Server, DB2: Storage and Query bundled together in each product
Tightly integrated proprietary stacks; cannot free your data
Opportunity… 
Transform Data Processing 
Exploration 
Information Enrichment 
Data Archival 
Data Processing Pipeline 
Several sources 
Varying Frequencies 
Varying Formats 
Quality check 
Validations, Scrubbing 
Transformations/Rules 
Prune app data sources 
Discard/Archive 
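The pipeline stages listed on this slide can be sketched as a small program: scrub each record, run a quality check, transform the records that pass, and set aside the rest for archival. The record layout, field names, and rules below are hypothetical and stand in for whatever an actual feed would contain.

```python
# Hypothetical raw records, as they might arrive from several sources.
RAW_RECORDS = [
    {"id": "1", "amount": " 19.99 ", "country": "us"},
    {"id": "2", "amount": "n/a", "country": "DE"},
    {"id": "", "amount": "5.00", "country": "fr"},
]


def scrub(record):
    """Scrubbing: strip stray whitespace from every field."""
    return {key: value.strip() for key, value in record.items()}


def is_valid(record):
    """Quality check: require a non-empty id and a numeric amount."""
    if not record["id"]:
        return False
    try:
        float(record["amount"])
    except ValueError:
        return False
    return True


def transform(record):
    """Transformation rules: typed fields and upper-case country codes."""
    return {
        "id": int(record["id"]),
        "amount": float(record["amount"]),
        "country": record["country"].upper(),
    }


def run_pipeline(records):
    """Return (processed, archived); failed records are kept in raw form."""
    processed, archived = [], []
    for record in records:
        cleaned = scrub(record)
        if is_valid(cleaned):
            processed.append(transform(cleaned))
        else:
            archived.append(record)
    return processed, archived


processed, archived = run_pipeline(RAW_RECORDS)
print(processed)
print(len(archived), "records archived")
```

Keeping rejected records in their raw form, rather than dropping them, matches the deck's later advice to keep all raw data.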
ETL Engine 
Data Warehouse 
Data Storage 
From Source to Business Value 
Shoe-horning 
Relational fit 
Loading 
Archiving / Purging 
Biz Rules 
Validations 
Scrubbing 
Mapping 
Transforms 
Staging 
Distribution 
Prep Tuning 
Data stores 
Minutes/Hours 
Subset of Data 
Hours 
Reliability 
Sourcing 
Missed SLAs = Biz Frustration 
From Source to Business Value 
Significantly more data sources 
Highly scalable, significantly performant data processing 
New business value, 
Faster time to value 
Data Exploration 
Large reservoir of data 
Descriptive Statistics 
Central Tendencies 
Dispersion 
Visualization 
Surprise Me! 
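A minimal sketch of the exploration steps named on this slide, using only Python's standard library: central tendency (mean, median), dispersion (standard deviation, range), and a simple "surprise me" check that flags values far from the mean. The sample values and the two-standard-deviation threshold are assumptions chosen for illustration.

```python
import statistics

# Hypothetical measurements, e.g. response times in milliseconds.
response_times_ms = [120, 135, 128, 110, 142, 131, 900]

mean = statistics.mean(response_times_ms)
stdev = statistics.stdev(response_times_ms)  # sample standard deviation

summary = {
    "mean": mean,
    "median": statistics.median(response_times_ms),
    "stdev": stdev,
    "range": max(response_times_ms) - min(response_times_ms),
}

# "Surprise me": flag values more than two standard deviations from the mean.
outliers = [value for value in response_times_ms if abs(value - mean) > 2 * stdev]

print(summary)
print("outliers:", outliers)
```

The gap between the mean and the median in this sample is itself a hint that one value is pulling the average, which is the kind of observation exploratory work is meant to surface.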
Data Exploration 
Courtesy: Data Science Central 
http://www.datasciencecentral.com/profiles/blogs/r-hadoop-data-analytics-heaven
Information Enrichment 
Data Archival 
Recycle Policy 
Storage in Native Format 
Redundancy , Replication 
Easily accessible, inexpensive 
Practical Adoption 
Big Data Technologies don’t solve all problems 
Leveraging existing investments 
Complexities of existing systems 
Proof of Concept 
Use your own data – realistic results 
Focus on very specific pain points 
Know what you are going to measure 
Data Processing 
Engine 
Data Warehouse 
Data Storage 
Keep all your raw data 
Cheaper Hardware 
Low cost per byte $$ 
High value per byte 
Offload from RDBMS 
Improve scale, performance 
Leverage existing tools 
Hardware on a budget
Master ($5000):
- 12 cores
- 32 GB RAM
- 2 TB SATA Drives, 7.2K RPM
Workers ($5000 each):
- 4 Nodes
- 12 cores
- 16 GB RAM
- 4 TB SATA Drives each, 7.2K RPM
4-Port 10 Gig Switch - $1500
Grand Total < $30,000
Software costs? - 0
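The arithmetic behind the grand total can be checked with a short calculation. It assumes the slide's "$5000" applies to the single master and "$5000 each" to each of the four workers, which is how the figures appear to be laid out; the variable names are illustrative.

```python
# Figures from the slide; the mapping of prices to nodes is an assumption.
MASTER_COST = 5000
WORKER_COST = 5000      # per worker node
WORKER_COUNT = 4
SWITCH_COST = 1500
SOFTWARE_COST = 0       # open source stack
BUDGET_CEILING = 30000

hardware_total = MASTER_COST + WORKER_COUNT * WORKER_COST + SWITCH_COST
grand_total = hardware_total + SOFTWARE_COST
headroom = BUDGET_CEILING - grand_total

print(f"Grand total: ${grand_total:,}")
print(f"Headroom under ${BUDGET_CEILING:,}: ${headroom:,}")
```

Under these assumptions the cluster comes to $26,500, which leaves room under the $30,000 ceiling for items the slide does not itemize.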
Exploratory BI / Analysis 
Data Storage 
Makes Data exploration practically cheaper and faster 
Use existing visualization tools (Tableau or other) 
Check for integration with R 
Data Architecture 
• Single most important factor
• Don’t miss technology trends
But…
It’s more about the battle plan
SQL on Hadoop
Impala
• Cloudera
• MPP Engine
Tez
• Hortonworks
• SQL on Hive
Phoenix
• Apache
• SQL on HBase
In Memory and Real Time
Spark
• Up to 100x faster than M/R for in-memory workloads
Storm
• Event processing
Apache Drill
• Low latency ad hoc queries
• Interactive queries at scale
Where can I get Hadoop? 
Distributors 
Open Source Apache Project 
And these guys… 
Cloud 
Conclusion 
The Power & Paradigm of Distributed Computing 
“Nativity” of Data – Unlearn old notions 
Identify, understand your data processing pipeline 
POC with a measurable, specific use case 
Data Architecture – key to sustainable scalability 
Stay informed 
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxDevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptx
Justin Reock
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 

The practice of big data - making big data approachable

  • 1. Big Data in Practice: A Pragmatic approach to Adoption and Value creation Raj Nair Data Management Practitioner and Consultant Content NOT FOR DISTRIBUTION: Property of Raj Nair
  • 2. (1) Mainstream Big Data; (2) Real World Use Cases and Applications; (3) Practical Adoption: Opportunity Identification; (4) Big Data 2.0 – What’s on the Horizon?; (5) Conclusion
  • 3. Every Day Big Data: Reaching scale-up limits on your server; Represents tools, technologies, frameworks for storage and processing at scale; Represents Opportunity
  • 6. Big Data 1.0 – The Hadoop Ecosystem: Software library; Framework for large scale distributed processing; Ability to scale to thousands of computers
  • 7. Design Principles: Large Data Sets; Classic Hadoop MapReduce – Batch Processing; Moving computation is cheaper than moving data; Hardware Failure, redundancy
  • 8. What is this you call data? Unlearn current notion of “Data”; Native Data Source
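
The batch model named on slide 7 can be illustrated with a minimal single-process sketch. This is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for illustration, and a real cluster would run the map and reduce steps in parallel on the nodes that hold the data.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence over an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

counts = word_count(["big data in practice", "big data at scale"])
```

Because each map call only sees one line and each reduce call only sees one key, the work can be split across machines, which is the property the design principles rely on.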
  • 9. HDFS: Storage and Archival; MapReduce: Programming Library; Crunch: Data Pipeline processing; HBase: Real time access (low latency); Pig: M/R Abstraction; Hive: Data Warehouse; Sqoop: Data Transfer; Flume: Data Streaming (High Latency). Grouped as: Data Processing; Workload Management; Data Movement
  • 10. Component | Purpose | Use it for
      HDFS | Distributed Storage | Raw data storage and archival
      Flume | Data Movement | Continuous Streaming into HDFS
      Sqoop | Data Movement | Data transfer from RDBMS to HDFS/HBase
      HBase | Workload Mgmt | Near real-time read/write access to large data sets
      Hive | Workload Mgmt | Analytical queries; data warehouse
      Map Reduce | Data Processing | Low level custom code for data processing
      Crunch | Data Processing | (Java) Coding M/R pipelines, aggregations
      Pig | Data Processing | Scripting language; similar to Crunch
  • 11. A Powerful Paradigm. Hadoop – Separate Layers: Storage Layer, Query Engine, Processing Engine, Metadata; Multiple Query Engines; Data in Native format. Oracle, SQL Server, DB2 (each: Storage + Query): Tightly integrated Proprietary Stacks, cannot free your data
  • 12. (1) Mainstream Big Data; (2) Real World Use Cases and Applications; (3) Practical Adoption: Opportunity Identification; (4) Big Data 2.0 – What’s on the Horizon?; (5) Conclusion
  • 13. Opportunity… Transform Data Processing; Exploration; Information Enrichment; Data Archival
  • 14. Data Processing Pipeline: Several sources; Varying Frequencies; Varying Formats; Quality check; Validations, Scrubbing; Transformations/Rules; Prune app data sources; Discard/Archive
  • 15. ETL Engine; Data Warehouse; Data Storage
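
The pipeline stages listed on slide 14 (scrubbing, validation, transformation rules, discard/archive) can be sketched as plain functions. Everything specific here is an assumption for illustration: the record fields (`id`, `country`, `amount`), the validation rule, and the cents conversion are hypothetical and do not come from the deck.

```python
def scrub(record):
    """Scrubbing: trim whitespace and upper-case the hypothetical 'country' field."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    if isinstance(cleaned.get("country"), str):
        cleaned["country"] = cleaned["country"].upper()
    return cleaned

def is_valid(record):
    """Validation: require a non-empty id and a non-negative numeric amount."""
    amount = record.get("amount")
    return bool(record.get("id")) and isinstance(amount, (int, float)) and amount >= 0

def transform(record):
    """Transformation rule (assumed): derive the amount in cents."""
    return {**record, "amount_cents": round(record["amount"] * 100)}

def run_pipeline(records):
    """Scrub each record, keep the valid ones, and set rejects aside for archival."""
    accepted, archived = [], []
    for raw in records:
        record = scrub(raw)
        if is_valid(record):
            accepted.append(transform(record))
        else:
            archived.append(raw)  # keep the raw record, untouched, for later inspection
    return accepted, archived

records = [
    {"id": "a1", "country": " us ", "amount": 12.5},
    {"id": "", "country": "de", "amount": 3.0},
    {"id": "b2", "country": "fr", "amount": -1},
]
accepted, archived = run_pipeline(records)
```

Rejected records are archived in their raw form rather than dropped, in line with the deck's later point about keeping all raw data.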
  • 16. From Source to Business Value: Shoe-horning Relational fit Loading Archiving / Purging Biz Rules Validations Scrubbing Mapping Transforms Staging Distribution Prep Tuning Data stores Minutes/Hours Subset of Data Hours Reliability Sourcing Missed SLAs = Biz Frustration
  • 17. From Source to Business Value: Significantly more data sources; Highly scalable, significantly performant data processing; New business value, Faster time to value
  • 18. Data Exploration: Large reservoir of data; Descriptive Statistics: Central Tendencies, Dispersion; Visualization; Surprise Me!
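
The descriptive statistics named on slide 18 (central tendencies and dispersion) can be computed with Python's standard `statistics` module. This is a small in-memory sketch; the `describe` helper and the sample values are assumptions for illustration, and data at Hadoop scale would need a distributed equivalent.

```python
import statistics

def describe(values):
    """Summarise a numeric sample: central tendency and dispersion."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),       # central tendency
        "median": statistics.median(values),   # central tendency, robust to outliers
        "pstdev": statistics.pstdev(values),   # dispersion (population standard deviation)
        "range": max(values) - min(values),    # dispersion (spread between extremes)
    }

summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
```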
  • 19. Data Exploration. Courtesy: Data Science Central http://www.datasciencecentral.com/profiles/blogs/r-hadoop-data-analytics-heaven
  • 20. Information Enrichment
  • 21. Information Enrichment
  • 22. Data Archival: Recycle Policy
  • 23. Data Archival: Storage in Native Format; Redundancy, Replication; Easily accessible, inexpensive
  • 24. (1) Mainstream Big Data; (2) Real World Use Cases and Applications; (3) Practical Adoption: Opportunity Identification; (4) Big Data 2.0 – What’s on the Horizon?; (5) Conclusion
  • 25. Practical Adoption: Big Data Technologies don’t solve all problems; Leveraging existing investments; Complexities of existing systems
  • 26. Proof of Concept: Use your own data – realistic results; Focus on very specific pain points; Know what you are going to measure
  • 27. Data Processing Engine; Data Warehouse; Data Storage
  • 28. Data Processing Engine; Data Warehouse; Data Storage. Keep all your raw data; Cheaper Hardware; Low cost per byte $$ High value per byte; Offload from RDBMS; Improve scale, performance; Leverage existing tools
  • 29. Hardware on a budget. Master ($5000): 12 cores, 32 GB RAM, 2 TB SATA Drives, 7.2K RPM. Workers (4 Nodes, $5000 each): 12 cores, 16 GB RAM, 4 TB SATA Drives each, 7.2K RPM. 4-Port 10 Gig Switch: $1500. Grand Total < $30,000. Software costs? 0
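
The "Grand Total < $30,000" figure on slide 29 can be checked with simple arithmetic. The sketch below assumes the two $5000 figures on the slide mean $5000 for the master and $5000 for each of the four workers; the constant names are illustrative.

```python
# Prices as read from slide 29 (assumed: $5000 for the master, $5000 per worker).
MASTER_COST = 5000
WORKER_COST = 5000
WORKER_COUNT = 4
SWITCH_COST = 1500
SOFTWARE_COST = 0   # the slide lists software costs as 0
BUDGET_CEILING = 30000

total = MASTER_COST + WORKER_COUNT * WORKER_COST + SWITCH_COST + SOFTWARE_COST
within_budget = total < BUDGET_CEILING

print(total)          # 26500
print(within_budget)  # True
```

Under that reading the cluster comes to $26,500, which leaves some headroom below the $30,000 ceiling stated on the slide.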
  • 30. Exploratory BI / Analysis; Data Storage. Makes Data exploration practically cheaper and faster; Use existing visualization tools (Tableau or other); Check for integration with R
  • 31. Data Architecture: Single Important factor; Don’t miss technology trends. But… It’s more about the battle plan
  • 32. (1) Mainstream Big Data; (2) Real World Use Cases and Applications; (3) Practical Adoption: Opportunity Identification; (4) Big Data 2.0 – What’s on the Horizon?; (5) Conclusion
  • 33. SQL on Hadoop. Impala: Cloudera, MPP Engine; Tez: HortonWorks, SQL on Hive; Phoenix: Apache, SQL on HBase
  • 34. In memory and Real Time. Spark: 100x faster than M/R; Storm: Event processing; Apache Drill: Low latency ad hoc queries, Interactive queries at scale
  • 35. (1) Mainstream Big Data; (2) Real World Use Cases and Applications; (3) Practical Adoption: Opportunity Identification; (4) Big Data 2.0 – What’s on the Horizon?; (5) Conclusion
  • 36. Where can I get Hadoop? Distributors; Open Source Apache Project; And these guys…; Cloud
  • 37. Conclusion: The Power & Paradigm of Distributed Computing; “Nativity” of Data – Unlearn old notions; Identify, understand your data processing pipeline; POC with a measurable, specific use case; Data Architecture – key to sustainable scalability; Stay informed