Big Data Defined: Andrew J. Brust

This document defines big data and describes how it is analyzed. It explains that big data involves hundreds of terabytes to petabytes of information that is too large for traditional databases. Hadoop is used to perform distributed and parallel processing on big data using MapReduce. MapReduce splits and processes the data in the map step, aggregates results in the reduce step, and is commonly employed by Hadoop but also used by other systems. The document provides an example of how MapReduce could be used to count data by floor and platform in a building. It also discusses data scientists and their skills in statistics, subject matter expertise, and asking the right questions to analyze big data.

Uploaded by

Favio90


Big Data Defined

Andrew J. Brust
http://www.bluebadgeinsights.com
Big Data Defined

100s of TB – x PB

Uses Hadoop

Three Vs: volume, velocity, variety

Too big for OLTP

Uses distributed/parallel processing

MapReduce

• Map step: split the data and pre-process it

• Reduce step: aggregate the results

• Most typical of Hadoop but employed by others, to various extents
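
The two steps can be sketched in a few lines of plain Python. This is a minimal single-process illustration under stated assumptions, not Hadoop's API: the function names `map_step` and `reduce_step` and the sample records are hypothetical, chosen only for this example.

```python
from collections import defaultdict

def map_step(chunk):
    """Pre-process one split of the data: emit a (key, 1) pair per record."""
    return [(record.strip().lower(), 1) for record in chunk]

def reduce_step(pairs):
    """Aggregate the mapped pairs: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Hypothetical input, already split into two chunks (one per worker).
chunks = [["ERROR", "info ", "error"], ["INFO", "warn"]]

# Map each chunk independently; on a real cluster these calls would run in parallel.
mapped = [pair for chunk in chunks for pair in map_step(chunk)]

# Reduce the combined pairs into one total per key.
result = reduce_step(mapped)
print(result)
```

Because each chunk is mapped without reference to any other chunk, the map work can be spread across machines; only the reduce step needs to see all pairs for a given key.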
A MapReduce Example

• Count by suite, on each floor

• Send per-suite, per platform totals to lobby

• Sort totals by platform

• Send two platform packets to 10th, 20th, 30th floor

• Tally up each platform

• Collect the tallies

• Merge tallies into one spreadsheet
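
The building walk-through above can be sketched as a small Python simulation. The slide does not say what is being counted, so the sketch assumes device platforms observed in each suite; the survey data, the six platform names, and all variable names are hypothetical. Assigning two platforms to each of three reducer floors follows the slide, and the code assumes exactly six platforms.

```python
from collections import Counter, defaultdict

# Hypothetical survey data: (floor, suite) -> platforms observed in that suite.
building = {
    (1, "A"): ["android", "ios", "ios"],
    (1, "B"): ["windows", "android"],
    (2, "A"): ["blackberry", "ios", "symbian"],
    (2, "B"): ["webos", "android", "windows"],
}

# Map: count by suite, on each floor, producing per-suite, per-platform totals.
suite_totals = [
    (platform, count)
    for devices in building.values()
    for platform, count in Counter(devices).items()
]

# In the "lobby": sort the per-suite totals by platform.
suite_totals.sort(key=lambda pair: pair[0])

# Partition: two platform packets go to each of the three reducer floors.
# Assumes exactly six distinct platforms, as in the sample data.
reducer_floors = [10, 20, 30]
platforms = sorted({platform for platform, _ in suite_totals})
assignment = {p: reducer_floors[i // 2] for i, p in enumerate(platforms)}

packets = defaultdict(list)
for platform, count in suite_totals:
    packets[assignment[platform]].append((platform, count))

# Reduce: each reducer floor tallies up its own platforms.
tallies = {}
for floor, packet in packets.items():
    floor_tally = Counter()
    for platform, count in packet:
        floor_tally[platform] += count
    tallies[floor] = dict(floor_tally)

# Collect the tallies and merge them into one "spreadsheet" (a sorted list of rows).
spreadsheet = sorted(
    (platform, total)
    for floor_tally in tallies.values()
    for platform, total in floor_tally.items()
)
for row in spreadsheet:
    print(row)
```

The sort and partition in the middle correspond to what is usually called the shuffle: they guarantee that every count for a given platform reaches the same reducer floor, so each tally is complete without the floors talking to each other.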

Data Scientists

Near synonyms:
• Statisticians
• “Quants”

But with:
• Subject matter expertise
• Good at knowing the right questions to ask

Abuse of term:
• Hadoop experts
• “R” developers (although that does overlap)
