Making Sense out of Big Data
Peter Morgan - July 2013
Table of Contents
1. Definition and Overview
2. Data Sources
3. Databases
4. Big Data Analytics
Glossary
References
1. Definition and Overview
What is big data?
More and more data is being collected and stored each day
Four main components
• Data
– Structured and unstructured
• Databases
– Proprietary and open source
• Query language
– Querying the database
• Analytics
– Analysing the data
How big is big?
• Large data sets
– Greater than 1,000 Terabytes? (1 Petabyte)
– 1,000,000 Terabytes? (1 Exabyte)
• Excel 2013 can have 1,048,576 rows by 16,384 columns
– Roughly 17 billion cells, on the order of 10 gigabytes of data
• Only going to get bigger
– 90% of all data was produced in the past two years
– The rate is increasing
• Recall
– Giga = 10⁹
– Tera = 10¹²
– Peta = 10¹⁵
– Exa = 10¹⁸
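The sizes above can be checked with a little arithmetic. This is a minimal sketch: the worksheet limits and prefixes come from the slide, while the one-byte-per-cell figure is an assumption made purely for illustration (real cell sizes vary with content).

```python
# SI prefixes from the "Recall" list above.
PREFIXES = {"giga": 10**9, "tera": 10**12, "peta": 10**15, "exa": 10**18}

# Excel 2013 worksheet limits quoted on the slide.
ROWS = 1_048_576
COLS = 16_384

cells = ROWS * COLS  # 2**20 * 2**14 = 2**34 cells

# Assumption for illustration only: one byte per cell.
BYTES_PER_CELL = 1
sheet_bytes = cells * BYTES_PER_CELL

print(f"cells per worksheet: {cells:,}")
print(f"size at {BYTES_PER_CELL} byte/cell: {sheet_bytes / PREFIXES['giga']:.1f} GB")
print(f"full worksheets per petabyte: {PREFIXES['peta'] // sheet_bytes:,}")
print(f"terabytes per petabyte: {PREFIXES['peta'] // PREFIXES['tera']:,}")
print(f"terabytes per exabyte: {PREFIXES['exa'] // PREFIXES['tera']:,}")
```

Even under this generous assumption, a petabyte corresponds to tens of thousands of completely full worksheets, which is why spreadsheet tools stop being an option long before "big" begins.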
Big Data Evolution
2. Data Sources
Where does the data come from?
• Science – particle, astrophysics
• Industry – oil, finance, telecom
– Actually all verticals
• Social – Facebook, LinkedIn, Twitter
• Medicine – genome, neuroscience
• Government – census, education, police
• Sports – statistics
• Environment – weather, sensors
Unstructured Data
• 80% of data is unstructured
• NoSQL
• Document based
– Documents
– Texts, tweets
– Emails
– Machine logs
– Blogs
– Web pages
– Photos
– Videos (YouTube)
• Graph based
– Social media sites
– Facebook has 1.1 billion users (MicroStrategy, July 27, 2013)
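The difference between document-based and graph-based data can be sketched in a few lines. This is an illustration only: the record fields, names and friendships are hypothetical, and plain Python structures stand in for a real document or graph store.

```python
import json

# A document-style record: self-describing, nested, no fixed schema.
# All field names and values here are hypothetical.
tweet = {
    "user": "alice",
    "text": "Big data is getting bigger",
    "tags": ["bigdata", "nosql"],
}
serialized = json.dumps(tweet)     # roughly how a document store might persist it
restored = json.loads(serialized)

# A graph-style data set: people as nodes, friendships as edges,
# held here as an adjacency list.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "dave": {"bob"},
}

def friends_of_friends(graph, person):
    """Return people two hops away who are not already direct friends."""
    direct = graph[person]
    two_hops = set()
    for friend in direct:
        two_hops |= graph[friend]
    return two_hops - direct - {person}

print(restored["tags"])
print(sorted(friends_of_friends(friends, "alice")))
```

Queries such as "friends of friends" are natural on a graph but awkward on flat tables, which is one reason social sites lean towards graph-based storage.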
Why do we need to use big data?
Use in public and private sector to:
• Make faster and more accurate business decisions
• Make accurate predictions
• Gain competitive advantage
• Implement smarter marketing – CRM
• Discover new opportunities
• Enhance Business Intelligence
• Enable fraud detection
• Reduce crime
• Improve scientific research
• Quicken analysis (up to real time)
– From weeks and days to minutes and seconds
Big Data Startup - Case Study
• Rocket Fuel
• No. 4 on Forbes' 2013 Most Promising Companies in America list
• Digital advertising startup
• Screens over 26 billion ads per day
• “Advertising that learns” big data platform
• Distributed planet-scale computing engine
• Hadoop implementation
• Founders from Yahoo!, Salesforce.com, DoubleClick
• Targeting algorithms use lifestyle, purchase intent and social data
Some big statistics
3. Databases
Database Timeline
Relational databases – SQL
Proprietary
• Oracle DB
• IBM DB2
• Microsoft SQL Server
• SAP
• EMC
Open Source
• MySQL
• PostgreSQL
• Drizzle
• Firebird
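A relational database brings together three of the four components from the overview: structured data, a database and a query language. The sketch below uses SQLite, which is not on the list above and is chosen only because it ships with Python; the table and column names are hypothetical. It walks through the CRUD operations defined in the glossary.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Create
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)
# Update
conn.execute("UPDATE sales SET amount = 100.0 WHERE region = 'south'")
# Delete
conn.execute("DELETE FROM sales WHERE amount < 50.0")

# Read: a simple aggregate query, the starting point for analytics
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
conn.close()
```

The same SQL statements would run, with minor dialect differences, on the proprietary and open source systems listed above.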
Non-relational databases – NoSQL
• BigTable – Google
• Cassandra – Facebook
• DynamoDB – Amazon
• HBase – Hadoop
• MongoDB – 10gen
• Neo4j – Neo Technology
• CouchDB – Apache
• Couchbase
• Riak – Basho
• Redis – Pivotal
4. Big Data Analytics
Big Data Analytics – Incumbents
• Oracle – Exadata, Exalytics
• Microsoft – HDInsight, xVelocity
• IBM – Netezza, Cognos, BigInsights
• SAP – HANA, Business Objects
• EMC – Pivotal (Greenplum)
• HP – Vertica, HAVEn
• All integrate with Hadoop
Big Data Analytics – Pure Plays
• Pure plays – definition:
– Have been around for more than 20 years
– Purely data analytics companies
• Teradata – Aster
• SAS
• MicroStrategy
Big Data Analytics – New Entrants
• Hortonworks
• Cloudera
• MapR
• Acunu
• Pentaho
• Tableau
• Talend
• Splunk
(Some of) IBM’s Big Data Acquisitions
• Algorithmics
– Oct 2011, $400 million
• OpenPages
– Oct 2010, ?
• Netezza
– Sept 2010, $1.7 billion
• SPSS
– Jan 2010, $1.2 billion
• Cognos
– Jan 2008, $4.9 billion
• About $10 billion in four years
http://en.wikipedia.org/wiki/List_of_mergers_and_acquisitions_by_IBM
Big Data Science Tools
• Hadoop
• NoSQL
• MapReduce
• R
• Matlab
• Python
• Statistics
Big Data Hadoop Stack
• Hadoop is the de facto big data operating system
• Based on ideas published by Google and developed largely at Yahoo! (2005)
• It is distributed, open source and managed by the Apache Software Foundation
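The MapReduce pattern at the heart of Hadoop can be sketched in a single process. This is not the Hadoop API: the function names and input lines are hypothetical, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

lines = ["big data is big", "data is everywhere"]  # hypothetical input

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each line is mapped independently and each key is reduced independently, the work splits cleanly across nodes, which is what makes the approach suit very large data sets.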
Analytic Technologies
• A/B testing
• Genetic algorithms
• Machine learning
• Natural language processing
• Neural networks
• Pattern recognition
• Anomaly detection
• Decision trees
• Predictive modeling
• Regression analysis
• Sentiment analysis
• Signal processing
• Simulations
• Time series analysis
• Visualization
• Multivariate analysis
• Text analytics
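As one concrete instance from the list above, anomaly detection can be sketched with a simple z-score rule. This is only one of many possible approaches; the threshold of 2.0 and the sample data are assumptions for illustration.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily transaction counts with one obvious outlier.
daily_counts = [102, 98, 101, 97, 103, 99, 100, 250]
print(find_anomalies(daily_counts))
```

A rule like this is a common first pass for fraud detection or sensor monitoring, though a single extreme value inflates the standard deviation, so more robust methods are often preferred in practice.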
Glossary
• OLTP = Online Transaction Processing
• OLAP = Online Analytical Processing
• ODBC = Open Database Connectivity
• IMDB = In-Memory Database
• CRUD = Create, Read, Update, Delete
• ETL = Extract, Transform and Load
• CDO = Chief Data Officer
• NLP = Natural Language Processing
• GQL = Graph Query Language
• AaaS = Analytics as a Service
• EDW = Enterprise Data Warehouse
References
• MicroStrategy website, 27 July 2013, Michael Saylor presentation at MicroStrategy World 2013, http://www.microstrategy.com/
• Teradata website, www.teradata.com
• Wikipedia, http://en.wikipedia.org/wiki/
• Google Images, www.google.co.uk
• IBM website, www.ibm.com
• YouTube, www.youtube.com
• Hadoop, www.hortonworks.com
Any Questions?