01 Introduction
Davison
Lehigh University
DSCI 398
Introduction to Data Science
Outline for Today
• Introductions
Proponents of big data solutions would also describe big data as a large
volume of unstructured data which cannot be handled by standard
databases.
Source: http://www.ibmbigdatahub.com/infographic/four-vs-big-data
Why is Big Data a big deal now?
https://www.domo.com/learn/infographic/data-never-sleeps-9
• Over 90% of people access the Internet via mobile devices
• Amount of data consumed last year was 79 ZB!
Answer #1:
Source: http://www.networkworld.com/article/2358531/data-center/internet-guru-mary-meeker-says-enterprise-technology-is-getting-much-much-cheaper.html
Why now?
Answer #2:
Source: http://www.networkworld.com/article/2358531/data-center/internet-guru-mary-meeker-says-enterprise-technology-is-getting-much-much-cheaper.html
Why now?
Answer #3:
http://www.mkomo.com/cost-per-gigabyte-update
Why Data Science Now?
• As a result of the factors (computing, storage, and communication costs) just mentioned for big data, we are now drowning in data.
• A bigger shift in business itself: “information is power” and organizations need to think about what data to collect and what information to extract and how to use it optimally.
• Sensors, e.g., transaction processing systems are critical (ATMs, point-of-sale scanners, web servers, IoT), as the eyes and ears of an org
• Data warehouses provide access to historical data
• Data mining provides the analytical/modeling toolkits
• Together they provide organizations an effective “sense and respond” mechanism
Big data is not
about the data.
–Gary King, Harvard University, making the point that while data may
be plentiful, the real value is in the analytics
Hiding within those mounds
of data is knowledge that
could change the life of a
patient, or change the world.
–Atul Butte, Stanford School of Medicine
Data Science
Is a set of principles, concepts, and techniques that structure
thinking and analysis of data
Changes the way you think about data and its role in business
See http://www.datasciencecentral.com/profiles/blogs/17-analytic-disciplines-compared
Some other (useful) definitions of Data Science
will cover
• The data mining process (which we will see on another day)
[Figure: Venn diagram placing Data Science at the intersection of Domain Expertise, Social Sciences, Computer Science, and Math and Statistics. Legible overlap labels: Traditional Research, Software Development, High Value Engineering, Domain-unaware Data Science, Socially-unaware Data Science.]
Data Science
“Data Scientist” (Geek?)
• Can do the actual modeling, building or extending tools when needed
• Applied statistician + computer scientist
• Uses data science techniques to extract knowledge of value
• Can understand the potential, evaluate a proposal and execution, and interface with a broad variety of people
Strategist, Investor, …
• Envision opportunities, come up with novel ideas, evaluate the promise of new ideas, design data science projects / companies conceptually
Jobs in the Data Science / Big Data Ecosystem
http://101.datascience.community/wp-content/uploads/2015/11/datasciencejobs.png