What's Big Data? How Does It Relate To Data Science?
Marco Brambilla, Emanuele Della Valle - Data Science for Business Innovation 1
From data to analysis and execution
[Chart: Analytical Capability, Data Availability, and Execution Capability plotted as intensity over time, 1940s to 2030s]
The appearance of “Big Data”
[Chart: Analytical Capability, Data Availability, and Execution Capability plotted as intensity over time, 1940s to 2030s]
What's Big Data?
Volume
Data at scale
Terabytes to exabytes of data accumulated on
cheaper and cheaper storage
What's Big Data?
Variety
What's Big Data?
Velocity
Data in motion
Analysis of streaming data to enable decisions
within fractions of a second
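One way to picture analysis of data in motion is a sliding-window aggregate that is updated as each reading arrives. The sketch below is an illustration only, not taken from the slides: the names `sliding_mean` and `first_alert`, the window size, the threshold, and the sample readings are all assumptions.

```python
from collections import deque
from typing import Iterable, Iterator, Optional, Tuple


def sliding_mean(stream: Iterable[float], window: int) -> Iterator[Tuple[float, float]]:
    """Yield (latest value, mean of the last `window` values) for each reading."""
    recent = deque(maxlen=window)  # the oldest reading is evicted automatically
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]  # subtract the reading that is about to be evicted
        recent.append(value)
        total += value
        yield value, total / len(recent)


def first_alert(stream: Iterable[float], window: int, threshold: float) -> Optional[int]:
    """Return the index of the first reading whose windowed mean exceeds `threshold`."""
    for i, (_, mean) in enumerate(sliding_mean(stream, window)):
        if mean > threshold:
            return i  # a decision is made as soon as the data supports it
    return None


# Hypothetical sensor readings; in a real system these would arrive continuously.
readings = [10.0, 11.0, 9.0, 30.0, 32.0, 31.0]
print(first_alert(readings, window=3, threshold=20.0))
```

Because the running total is updated incrementally, each new reading costs constant work regardless of the window size, which is what makes sub-second decisions plausible on a fast stream.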
What's Big Data?
Veracity
Data Uncertainty
Managing the reliability and predictability
of inherently imprecise data types
“The one certainty about uncertainty is that it is not likely to go away”
What's Big Data? (cont.)
• Volume: Data at scale
• Velocity: Data in motion
• Variety: Data in many forms
• Veracity: Data uncertainty
More Vs
• Value
• Volatility
• Validity
• …
Big Data techs are like "crude oil"
• … that we have to
• Extract
• Transport in mega-tankers
• Ship through pipelines
• Store in massive silos
• …
Data Science is "refining crude oil"
Big Data and Data Science are Closing the Gap
[Chart: Analytical Capability, Data Availability, and Execution Capability plotted as intensity over time, 1940s to 2030s]
What's Data Science?
• The Science [and Art] of…
• Discovering what we don’t know from data
• Obtaining predictive, actionable insight from data
• Creating Data Products that have business impact now
• Communicating relevant business stories from data
• Building confidence in decisions that drive business value
Who's a Data Scientist?
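• Drew Conway, 2010
[Venn diagram of three overlapping circles: Computer Science, Math & Statistics Knowledge, Substantive Expertise]
• Computer Science + Math & Statistics Knowledge: Machine Learning
• Math & Statistics Knowledge + Substantive Expertise: Traditional Research
• Computer Science + Substantive Expertise: Danger Zone!
• All three: Data Science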
Statistical Modeling: The Two Cultures
Data modeling (a.k.a. statistical analysis)
• y = F(x, random noise, parameters)
• y ← [Linear regression, Logistic regression, …] ← x
• Validation: yes/no using goodness-of-fit
Algorithmic modeling (a.k.a. Machine Learning)
• y = algorithm(x)
• y ← [unknown] ← x; Decision Tree, Neural Nets, …
• Validation: predictive accuracy
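The contrast between the two validation styles can be sketched on toy data. This is an illustrative example under stated assumptions, not the slide authors' method: the data points are made up, the helper names are hypothetical, a least-squares line stands in for the data-modeling culture, a 1-nearest-neighbor rule stands in for the algorithmic culture, and held-out mean absolute error is used as a simple proxy for predictive accuracy.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Data modeling: assume y = a + b*x + noise and estimate a, b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(xs: Sequence[float], ys: Sequence[float], a: float, b: float) -> float:
    """Goodness-of-fit: share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def nearest_neighbor_predict(train_x: Sequence[float], train_y: Sequence[float], x: float) -> float:
    """Algorithmic modeling: no assumed form, reuse the y of the closest training x."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


def mean_abs_error(predictions: Sequence[float], truths: Sequence[float]) -> float:
    """Average absolute gap between predictions and true values."""
    return sum(abs(p - t) for p, t in zip(predictions, truths)) / len(truths)


# Made-up data in which y is roughly 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0]

# Culture 1: fit the assumed model on all the data, then check how well it fits.
a, b = fit_line(xs, ys)
print("goodness-of-fit R^2:", r_squared(xs, ys, a, b))

# Culture 2: hold some data out and measure error on points the model never saw.
train_x, train_y = xs[::2], ys[::2]
test_x, test_y = xs[1::2], ys[1::2]
preds = [nearest_neighbor_predict(train_x, train_y, x) for x in test_x]
print("held-out mean absolute error:", mean_abs_error(preds, test_y))
```

The first check asks whether the assumed model describes the data it was fitted on; the second asks only whether predictions hold up on unseen data, whatever the mechanism inside.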
Statistical Modeling: The Two Cultures
• Starting with data
[Diagram: independent variable x → nature → response variable y]
Decision
Action
Data
Predictive
What will happen?
Decision Support
Prescriptive
What should I do?
Decision Automation
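The step from "What will happen?" to "What should I do?" can be illustrated with a toy example. Everything here is hypothetical and chosen only for illustration: the demand history, the naive mean forecast, the candidate order sizes, and the margin and waste costs.

```python
from typing import Sequence


def forecast_demand(history: Sequence[float]) -> float:
    """Predictive step (what will happen?): naive forecast = mean of past demand."""
    return sum(history) / len(history)


def choose_order(forecast: float, options: Sequence[int],
                 unit_margin: float, unit_waste_cost: float) -> int:
    """Prescriptive step (what should I do?): pick the order size with the best payoff."""
    def payoff(order: int) -> float:
        sold = min(order, forecast)         # cannot sell more than the forecast demand
        wasted = max(order - forecast, 0)   # unsold units are assumed to be wasted
        return sold * unit_margin - wasted * unit_waste_cost
    return max(options, key=payoff)


history = [90, 110, 100]                    # made-up past demand
forecast = forecast_demand(history)         # supports a human decision
order = choose_order(forecast, options=[50, 100, 150],
                     unit_margin=2.0, unit_waste_cost=1.0)  # automates the decision
print(forecast, order)
```

Stopping after `forecast_demand` leaves the choice to a person (decision support); calling `choose_order` as well lets the system select the action itself (decision automation).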
Credits
• Big Data [sorry] & Data Science: What Does a Data Scientist Do? Carlos Somohano, 2013
• https://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do-world
• 2017 Planning Guide for Data and Analytics. John Hagerty (Gartner), 2016
• https://www.gartner.com/binaries/content/assets/events/keywords/catalyst/catus8/2017_planning_guide_for_data_analytics.pdf
• Big Data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute. May, 2011.
• http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
• Analytics: The real-world use of big data. IBM Institute for Business Value in collaboration with Saïd Business School at the University of Oxford. 2012
• http://www-03.ibm.com/systems/hu/resources/the_real_word_use_of_big_data.pd
• Big Data & Analytics: Next Generation Architecture and Capabilities. Marc Andrews, 2014
• https://www.ibm.com/partnerworld/wps/servlet/RedirectServlet?cmsId=isv_ast_smp_ecosystem-webcasts&attachmentName=Data_Warehouse_deck.pdf