Unit I Introduction To Data Science
DATA SCIENCE
Dr Purnima Gandhi
LEARNING OBJECTIVES
Source: https://bit.ly/31HBHuQ
WHAT IS DATA SCIENCE?
Hector Garcia-Molina
Professor, Departments of
Computer Science and
Electrical Engineering,
Stanford University
A multi-disciplinary and
emerging field that uses
scientific methods, processes,
algorithms, and systems to
extract knowledge and insights
from structured and
unstructured data.
Source: https://bit.ly/30dekJB
Data Science: Applying advanced statistical tools to existing data to generate new insights
Service Change: Converting new data insights into (often small) changes to business processes
Smarter Work: More efficient and effective use of staff and resources
• A "concept to unify statistics, data analysis, machine
learning, and their related methods" in order to
"understand and analyze actual phenomena" with data.
Data science is the science that uses computer science, statistics, machine
learning, visualization, and human-computer interaction to collect, clean,
integrate, analyze, visualize, and interact with data in order to create data products.
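The verbs in this definition (collect, clean, integrate, analyze, visualize) can be read as stages of a small pipeline. The following is a minimal sketch using only the Python standard library. The clinic data, field names, and function names are hypothetical and are not taken from the slides.

```python
import csv
import io
import statistics

# Hypothetical raw data: two small CSV sources with messy whitespace and a missing value.
VISITS_CSV = """clinic,visits
North, 120
South,
East,95
"""
STAFF_CSV = """clinic,staff
North,10
South,8
East,5
"""

def collect(text):
    """Collect: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows, field):
    """Clean: strip whitespace and drop rows whose numeric field is missing."""
    cleaned = []
    for row in rows:
        value = (row[field] or "").strip()
        if value:
            cleaned.append({"clinic": row["clinic"].strip(), field: float(value)})
    return cleaned

def integrate(left, right):
    """Integrate: join two cleaned tables on the shared 'clinic' key."""
    lookup = {row["clinic"]: row for row in right}
    return [{**row, **lookup[row["clinic"]]} for row in left if row["clinic"] in lookup]

def analyze(rows):
    """Analyze: add visits per staff member to each row and return the mean."""
    for row in rows:
        row["visits_per_staff"] = row["visits"] / row["staff"]
    return statistics.mean(row["visits_per_staff"] for row in rows)

def visualize(rows):
    """Visualize: build a crude text bar chart, one '#' per visit per staff member."""
    return [f"{row['clinic']:<6}{'#' * round(row['visits_per_staff'])}" for row in rows]

visits = clean(collect(VISITS_CSV), "visits")
staff = clean(collect(STAFF_CSV), "staff")
table = integrate(visits, staff)
mean_ratio = analyze(table)
chart = visualize(table)
print("\n".join(chart))
print(f"mean visits per staff: {mean_ratio:.1f}")
```

In practice each stage would likely use dedicated libraries, but the order of the steps is the point of the sketch.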
Goal of Data Science
Major overhauls / service disruptions vs. small changes
Collecting new data vs. use existing data (mostly ;)
Data Science Roles
Roles Required in a Data Science Project
Source: https://bit.ly/2z5sYqf
How to become a data scientist?
• Data Scientists need to know how to “CODE”
• Other languages, tools, platforms and visualization
Learning Data Science with Python - Libraries
Learning Data Science with Python - Tools
How to become a data scientist?
• Learn to code
What’s in the Data Science SF Toolkit?
Statistical Methods Tools User Experience Research
• Multilevel modeling
• Missing data imputations
• Classification and clustering
• Survival analysis
• Pattern recognition
• Principal component and factor analysis
• A/B testing
• Machine learning
• Forecasting
• Propensity score matching
• Logistic, multinomial and multiple linear regression techniques
• Network analysis
What’s in the Data Science SF Toolkit?
Statistical Methods Tools User Experience Research
• Iterative prototyping
• Photo journaling and documenting
• Service blueprinting
• Journey mapping
• Ride-alongs
• Process mapping
• Ethnographic field research and user observation
• Usability testing
Data scientists need to be comfortable with:
Data scientists need to learn machine learning and software
engineering
Who are the data scientists?
APPLICATIONS OF DATA SCIENCE
• Security
• Sports
• Internet Search
• Digital Advertisements
• Recommender Systems
• Image Processing
• Speech Recognition
• Gaming
• Delivery Logistics
• Health Care
• Augmented Reality
• Self-Driving Cars
• Robots
IMPACT OF DATA SCIENCE ON SOCIETY
• Saving Energy
• Data-Driven Hospitals
• A Cleaner Environment
• Volunteer with a socially-oriented data science program/organization
• Contribute via competitions
• Consider solutions to real-world problems that you encounter
• Be thoughtful in professional work
Real Life Examples
e.g., Google Flu Trends: detecting outbreaks two weeks ahead of CDC data
Data and Election 2012 (cont.)
…that was just one of several ways that Mr. Obama’s campaign
operations, some unnoticed by Mr. Romney’s aides in Boston, helped
save the president’s candidacy. In Chicago, the campaign recruited a
team of behavioral scientists to build an extraordinarily sophisticated
database
…that allowed the Obama campaign not only to alter the very nature of
the electorate, making it younger and less white, but also to create a
portrait of shifting voter allegiances. The power of this operation
stunned Mr. Romney’s aides on election night, as they saw voters they
never even knew existed turn out in places like Osceola County, Fla.
-- New York Times, Wed Nov 7, 2012
The White House Names Dr. DJ Patil as the First U.S. Chief Data
Scientist, Feb. 18th 2015
A history of the (Business)
Internet: 1997
PageRank: The web as a behavioral dataset
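To make the idea of ranking pages from link behavior concrete, here is a minimal power-iteration sketch of PageRank. The four-page toy web, the damping factor of 0.85, and the iteration count are illustrative assumptions. This is not a description of how any production search engine is implemented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web: each key links to the pages in its list.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

In this toy graph, page C collects links from three other pages and ends up with the highest score, while page D, which nobody links to, keeps only the baseline share.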
Sponsored search
Google's revenue from marketing is around $50 bn/year,
97% of the company's revenue.
Sponsored search uses an auction – a pure competition
for marketers trying to win access to consumers.
In other words, a competition for models of consumers –
their likelihood of responding to the ad – and of
determining the right bid for the item.
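The link between a consumer model and a bid can be sketched in a few lines. The example below assumes a simplified single-slot, sealed-bid second-price auction in which each marketer bids the expected value of showing the ad. The marketer names, probabilities, and values are hypothetical, and real sponsored-search auctions are more elaborate than this.

```python
def expected_value_bid(p_response, value_per_response):
    """Bid the expected value of showing the ad: P(response) * value of one response."""
    return p_response * value_per_response

def run_second_price_auction(bids):
    """Single-slot sealed-bid auction: highest bidder wins and pays the second-highest bid."""
    ordered = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, _ = ordered[0]
    price = ordered[1][1] if len(ordered) > 1 else 0.0
    return winner, price

# Hypothetical marketers: (modelled probability of a response, value of one response in dollars).
marketers = {
    "shoes_co": (0.05, 40.0),
    "boots_inc": (0.02, 120.0),
    "socks_ltd": (0.10, 10.0),
}
bids = {name: expected_value_bid(p, v) for name, (p, v) in marketers.items()}
winner, price = run_second_price_auction(bids)
print(winner, round(price, 2))
```

A marketer with a better model of response likelihood can bid closer to the true value of the impression, which is why the slide describes the auction as a competition between models of consumers.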
There are around 30 billion search requests a month.
Perhaps a trillion events of history between search
providers.
Google Adwords and Adsense
What can you do with the data?
Traffic Prediction and Earthquake Warning
to produce: