Big Data 
Overview – Part 1 
Wm. Barrett Simms 
barrett@wbsimms.com 
@wbsimms
Opening remarks 
• Sponsors 
• Pluralsight 
• Free one-month gift card giveaway. Enter your name in the pot!
• DevExpress 
• $250 in developer JustCode tools. 
• O’Reilly 
• Book giveaway. Enter your name in the pot!
• Boston Code Camp 22 (November 22nd) 
• http://www.bostoncodecamp.com/
• Thanks to 3thought for the space
About Me 
Software 
Developer 
Agile Team 
Member 
Team Lead 
Agile 
Advocate 
SDLC 
Implementer
SDLC
Big Data 
“Big data is an all-encompassing term for any collection of data sets so 
large and complex that it becomes difficult to process using traditional 
data processing applications.” 
- Wikipedia
The 3 Vs 
• Volume 
• From a few gigabytes to petabytes
• Velocity 
• Arrives quickly 
• Variety 
• Multiple Sources
Volume 
• Traditional SQL architectures don’t scale to very large data sets
• Maybe this isn’t so true
…but the MPP (massively parallel processing) systems are expensive
An example problem (Volume) 
• You own a chain of stores 
• … with 25,000 stores and 100,000 POS systems 
• Need information on inventory changes 
• By region 
• By store
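On one machine, the roll-up this example asks for is a small aggregation; the hard part is feeding it from 100,000 POS systems. A minimal sketch, assuming a made-up event layout of (region, store id, item, quantity change) that is not from the talk:

```python
from collections import defaultdict

# Hypothetical POS events: (region, store_id, item, quantity_change).
events = [
    ("east", 13, "widget", -2),
    ("east", 13, "gadget", -1),
    ("east", 21, "widget", -5),
    ("west", 43, "widget", 10),
]

def inventory_changes(events):
    """Sum quantity changes per region and per (region, store)."""
    by_region = defaultdict(int)
    by_store = defaultdict(int)
    for region, store_id, _item, change in events:
        by_region[region] += change
        by_store[(region, store_id)] += change
    return dict(by_region), dict(by_store)

by_region, by_store = inventory_changes(events)
```

The same per-key summing is what a cluster would run in parallel once the events no longer fit on one server.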
Velocity 
• Traditional solutions don’t handle fast inbound data 
• Maybe this isn’t so true 
…but you lose data.
Another example (Velocity) 
• You host a website 
• … on 10,000 servers 
• Monitor logs for errors
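As a rough illustration of the monitoring task, the sketch below counts error lines per server. The log line layout ("server level message") and the server names are assumptions for the example, not something the talk specifies:

```python
import re
from collections import Counter

# Assumed log format: "<server> <LEVEL> <message>".
ERROR_PATTERN = re.compile(r"\bERROR\b")

def count_errors(lines):
    """Count lines containing ERROR, grouped by the leading server name."""
    counts = Counter()
    for line in lines:
        server, _, rest = line.partition(" ")
        if ERROR_PATTERN.search(rest):
            counts[server] += 1
    return counts

sample = [
    "web-0001 INFO request served",
    "web-0002 ERROR upstream timeout",
    "web-0002 ERROR disk full",
    "web-0003 WARN slow response",
]
error_counts = count_errors(sample)
```

With 10,000 servers writing at once, the scan itself stays this simple; the velocity problem is collecting the lines fast enough without dropping any.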
Variety 
• Most traditional solutions don’t handle a variety of data types well 
• Maybe this isn’t so true 
…But you need to write a custom importer for every type.
A final example (Variety) 
• You own a business 
• With sales and marketing teams
• … in different regions around the world 
• Correlate sales numbers against marketing expenses
The First Problem : Computing Power 
[Figure: rows of tasks labeled First, Second, Third]
Limited by cores (scaling up)
Solution: Scale out (not up!) 
[Figure: a Coordinator in the middle of Server 1, Server 2, Server 3, and Server 4]
Coordination 
[Figure: a Job Coordinator with three Runners]
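The coordinator and runner split can be sketched on a single machine. Here threads stand in for the separate servers, which is an assumption for illustration only; `coordinate` and `runner` are hypothetical names:

```python
from concurrent.futures import ThreadPoolExecutor

def runner(chunk):
    """Work done by one runner: here, sum its chunk of numbers."""
    return sum(chunk)

def coordinate(data, runners=3):
    """Split data into chunks, hand each to a runner, combine the partial results."""
    size = max(1, -(-len(data) // runners))  # ceiling division, at least 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=runners) as pool:
        partials = list(pool.map(runner, chunks))
    return sum(partials)

total = coordinate(list(range(1, 101)))
```

A real cluster adds what this sketch leaves out: shipping code and data to other machines, and retrying work when a runner fails.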
MapReduce 
• A programming model and an associated implementation for 
processing and generating large data sets with a parallel, distributed 
algorithm on a cluster. – Wikipedia 
WHAT?
Map and Reduce 
• Map
• Process data, returning key-value pairs
• Reduce
• Aggregate or filter key-value pairs into a result
[Figure: Data -> Map, Data -> Map; both Maps -> Reduce -> Result]
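The two phases above can be sketched in a few lines of single-process Python. This is not Hadoop's API; `map_reduce`, `word_mapper`, and `count_reducer` are hypothetical names, and word counting is used only as a familiar example:

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run mapper over each record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    for word in line.split():
        yield word.lower(), 1

def count_reducer(_key, values):
    """Reduce: add up the counts collected for one word."""
    return sum(values)

counts = map_reduce(["big data", "Big clusters"], word_mapper, count_reducer)
```

Because each mapper call sees only one record and each reducer call sees only one key, both phases can be spread across many machines.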
Mapping 
• Easy example 
• Store Sales 
• Find most sales per store in 2010 
Year  Month  Store Id  SalesTotal
2010  1      13         1,000
2010  3      43        12,000
2010  3      21        21,000
2010  4      13         3,000
2010  2      56         4,000
2010  6      32        12,000
2010  7      1          4,000
2010  2      23         2,000
Solution – Map 
1. Mapper feeds document rows to your program 
2. You return key-value pairs
StoreId Sales 
21 2,000 
23 3,000 
2 1,000 
21 23,000
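A possible mapper for this step, run here over the rows from the Mapping slide's table. The whitespace-separated row format and the name `map_row` are assumptions for the sketch; a real Hadoop mapper would read its rows from the framework instead of a string:

```python
# Rows copied from the Mapping slide: year, month, store id, sales total.
ROWS = """\
2010 1 13 1,000
2010 3 43 12,000
2010 3 21 21,000
2010 4 13 3,000
2010 2 56 4,000
2010 6 32 12,000
2010 7 1 4,000
2010 2 23 2,000
"""

def map_row(line):
    """Emit one (store_id, sales_total) pair for each 2010 row."""
    year, _month, store_id, sales_total = line.split()
    if year == "2010":
        yield int(store_id), int(sales_total.replace(",", ""))

pairs = [pair for line in ROWS.splitlines() for pair in map_row(line)]
```

Store 13 appears twice in the output, once per input row; merging those values is left to the reduce step.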
Solution - Reduce 
• Pairs with the same key are merged
• Each key maps to the list of its values:
{21, [2,000, 23,000]} 
{23, [3,000]} 
{2, [1,000]} 
• You process each row
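The merge and the per-row processing might look like the sketch below, using the pairs shown on the Map slide. Reading "most sales per store" as the largest single value per store is an assumption; swapping `max` for `sum` would give a yearly total instead:

```python
from collections import defaultdict

# Key-value pairs as listed on the Map slide: (store_id, sales).
pairs = [(21, 2000), (23, 3000), (2, 1000), (21, 23000)]

def shuffle(pairs):
    """Merge pairs so each key maps to the list of its values."""
    merged = defaultdict(list)
    for key, value in pairs:
        merged[key].append(value)
    return dict(merged)

def reduce_store(store_id, sales):
    """Process one merged row: keep the largest sales value for the store."""
    return store_id, max(sales)

merged = shuffle(pairs)
result = dict(reduce_store(store_id, sales) for store_id, sales in merged.items())
```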
Data Access 
• Each process needs access to data 
[Figure: typical vs. desired data access]
HDFS 
• Hadoop Distributed File System
• Open-source implementation of the Google File System (GFS) 
Hard drives last about 1,000 days. So, 
if you have 1K hard drives, you’ll lose 
one per day.
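The arithmetic behind that estimate, as a small sketch. It assumes failures are spread evenly over each drive's lifetime, which is a simplification; `expected_failures_per_day` is a hypothetical helper name:

```python
def expected_failures_per_day(drive_count, lifetime_days):
    """Average drives lost per day if failures are spread evenly over the lifetime."""
    return drive_count / lifetime_days

# 1,000 drives that each last about 1,000 days.
rate = expected_failures_per_day(1000, 1000)
```

At that rate, losing a drive is a daily event instead of an emergency, which is why the file system has to tolerate it.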
The ecosystem 
• Hive 
• SQL-like query language 
• Define and enforce schema 
• Pig 
• Data-flow scripting language (Pig Latin)
• Sqoop 
• SQL/Hadoop integration 
• Oozie 
• Scheduling 
• Mahout 
• Machine Learning interface 
• Storm 
• Real-time stream processing
… and Many Others
Vendors 
• Hortonworks 
• Single click install of Sandbox 
• Cloudera 
• Downloadable VM 
• Syncfusion 
• Single click install of Syncfusion Big Data 
• Amazon AWS 
• Elastic MapReduce 
• Microsoft Azure 
• HDInsight
Contact Me 
Barrett Simms 
barrett@wbsimms.com 
http://wbsimms.com
Twitter: @wbsimms 
Phone: 781.405.4686
Big Data Overview Part 1

  • 1. Big Data Overview – Part 1 Wm. Barrett Simms barrett@wbsimms.com @wbsimms
  • 2. Opening remarks • Sponsors • Pluralsight • Free month gift card giveaway. Enter your name in the pot! • DevExpress • $250 in developer JustCode tools. • O’Reilly • Book giveaway. Enter your name in the pot! • Boston Code Camp 22 (November 22nd) • http://www.bostoncodecamp.com/ • Thanks to 3thought for the space
  • 3. About Me Software Developer Agile Team Member Team Lead Agile Advocate SDLC Implementer
  • 5. Big Data “Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.” - Wikipedia
  • 6. The 3 Vs • Volume • A few Gigabytes -> Petabyte • Velocity • Arrives quickly • Variety • Multiple Sources
  • 7. Volume • Traditional SQL architectures don’t scale to very large data sets • Maybe this isn’t so true …but the MPP (massively parallel processing) systems are expensive
  • 8. An example problem (Volume) • You own a chain of stores • … with 25,000 stores and 100,000 POS systems • Need information on inventory changes • By region • By store
  • 9. Velocity • Traditional solutions don’t handle fast inbound data • Maybe this isn’t so true …but you lose data.
  • 10. Another example (Velocity) • You host a website • … on 10,000 servers • Monitor logs for errors
  • 11. Variety • Most traditional solutions don’t handle a variety of data types well • Maybe this isn’t so true …But you need to write a custom importer for every type.
  • 12. A final example (Variety) • You own a business • With sales and marketing teams • … in different regions around the world • Correlate sales numbers against marketing expenses
  • 13. The First Problem : Computing Power First Second Third First Second Third First Second Third First Second Third First Second Third Limited by cores (Scaling up)
  • 14. Solution: Scale out (not up!) Server 1 Server 2 Coordinator Server 3 Server 4
  • 15. Coordination Job Coordinator Runner Runner Runner
  • 16. MapReduce • A programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. – Wikipedia WHAT?
  • 17. Map and Reduce • Map • Process data returning key value pairs • Reduce • Aggregate/Filter key value pairs into result Map Map Data Data Reduce Result
  • 18. Mapping • Easy example • Store Sales • Find most sales per store in 2010
      Year  Month  Store Id  SalesTotal
      2010  1      13        1,000
      2010  3      43        12,000
      2010  3      21        21,000
      2010  4      13        3,000
      2010  2      56        4,000
      2010  6      32        12,000
      2010  7      1         4,000
      2010  2      23        2,000
  • 19. Solution – Map 1. Mapper feeds document rows to your program 2. You return key value pairs
      StoreId  Sales
      21       2,000
      23       3,000
      2        1,000
      21       23,000
  • 20. Solution - Reduce • Data is merged • Merged into Key/Values: {21, [2,000, 23,000]} {23, [3,000]} {2, [1,000]} • You process each row
  • 21. Data Access • Each process needs access to data Typical Desired
  • 22. HDFS • Hadoop Distributed File System • Open-source implementation of the Google File System (GFS) Hard drives last about 1,000 days. So, if you have 1K hard drives, you’ll lose about one per day.
  • 23. The ecosystem • Hive • SQL-like query language • Define and enforce schema • Pig • Data-flow scripting language (Pig Latin) • Sqoop • SQL/Hadoop integration • Oozie • Scheduling • Mahout • Machine Learning interface • Storm • Stream-based MapReduce … and Many Others
  • 24. Vendors • Hortonworks • Single click install of Sandbox • Cloudera • Downloadable VM • Syncfusion • Single click install of Syncfusion Big Data • Amazon AWS • Elastic MapReduce • Microsoft Azure • HDInsight
  • 25. Contact Me Barrett Simms barrett@wbsimms.com http://wbsimms.com Twitter: @wbsimms Phone: 781.405.4686
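
The map, merge, and reduce steps from slides 17 to 20 can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop code: the rows are the ones from the "Mapping" slide, while the function names (`map_row`, `shuffle`, `reduce_store`, `run_job`) are hypothetical, and "most sales per store" is assumed to mean the largest single SalesTotal recorded for each store.

```python
from collections import defaultdict

# Rows from the "Mapping" slide: (year, month, store_id, sales_total).
ROWS = [
    (2010, 1, 13, 1000),
    (2010, 3, 43, 12000),
    (2010, 3, 21, 21000),
    (2010, 4, 13, 3000),
    (2010, 2, 56, 4000),
    (2010, 6, 32, 12000),
    (2010, 7, 1, 4000),
    (2010, 2, 23, 2000),
]


def map_row(row):
    """Map step: emit one (store_id, sales_total) pair per 2010 row."""
    year, _month, store_id, sales_total = row
    if year == 2010:
        yield store_id, sales_total


def shuffle(pairs):
    """Merge step: group values by key, as the framework does before reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_store(store_id, sales):
    """Reduce step (assumed meaning): keep the largest total for one store."""
    return store_id, max(sales)


def run_job(rows):
    """Run map, merge, and reduce in sequence on a single machine."""
    pairs = (pair for row in rows for pair in map_row(row))
    grouped = shuffle(pairs)
    return dict(reduce_store(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(ROWS))
```

On a real cluster the map calls would run on many machines and the framework would perform the merge; here everything runs in one process so the data flow is easy to follow.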

Editor's Notes

  • #2: Welcome!
  • #4: Focus on technical product delivery
  • #14: Each inbound request spawns three processes. Spawning multiple processes isn’t scalable
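
The scale-out layout from slides 14 and 15 (one coordinator handing work to several runners) can be sketched with a worker pool, using the log-monitoring example from slide 10 as the job. This is only an illustration under stated assumptions: threads stand in for the separate servers, the names `runner` and `coordinator` are hypothetical, and an "error" is assumed to be any line containing the text `ERROR`.

```python
from concurrent.futures import ThreadPoolExecutor


def runner(chunk):
    """One runner: count the error lines in its share of the log."""
    return sum(1 for line in chunk if "ERROR" in line)


def coordinator(lines, runners=4):
    """Split the job into chunks, give one to each runner, add up the results."""
    size = max(1, -(-len(lines) // runners))  # ceiling division, at least 1
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Threads are an assumption standing in for Server 1 to Server 4.
    with ThreadPoolExecutor(max_workers=runners) as pool:
        return sum(pool.map(runner, chunks))


if __name__ == "__main__":
    log = ["ok", "ERROR disk", "ok", "ERROR net", "ERROR cpu"]
    print(coordinator(log))
```

The pool size is fixed up front, so the number of workers does not grow with the number of inbound requests, which is the problem the note on slide 13 describes.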