
Unit I Introduction To Data Science

Here are a few examples of how data science impacts society in positive ways:
- Data-driven hospitals can use patient data and advanced analytics to improve care, develop more effective treatments, and save lives. Precise data allows doctors to diagnose illnesses better and identify high-risk patients.
- Data science solutions help make cities more sustainable by optimizing public transit routing, reducing traffic, decreasing energy consumption, and lowering emissions through smart infrastructure and electric vehicles.
- Analyzing social media and other online data allows nonprofit organizations and governments to target relief efforts more efficiently, identify needs, and aid vulnerable groups during crises or disasters. Data helps ensure resources reach those who need them most.

Uploaded by Jaydeep Dodiya
Copyright © All Rights Reserved

INTRODUCTION TO DATA SCIENCE
Dr Purnima Gandhi
LEARNING OBJECTIVES
• Data Science – Goal and Importance
• Applications of Data Science
• Data Science and Related Fields
• Different Computing Environments
CONTENTS
• Why Should We Study Data Science?
• How Does Data Science Impact Organizations?
• Applications and Competitive Advantage of Data Science in Organizations
• Importance of Data Science to Society
• Road to Become a Data Scientist
WHY ARE WE TALKING ABOUT DATA SCIENCE?

Source: https://bit.ly/31HBHuQ
WHAT IS DATA SCIENCE?

• “Data Science is a new term, but in the same sense as Columbus discovered a NEW Continent 1000 years ago.”

Hector Garcia-Molina
Professor, Departments of Computer Science and Electrical Engineering, Stanford University

Data science is a multi-disciplinary and emerging field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.

Source: https://bit.ly/30dekJB
What is data science?
• Data Science: applying advanced statistical tools to existing data to generate new insights
• Service Change: converting new data insights into (often small) changes to business processes
• Smarter Work: more efficient and effective use of staff and resources
• Data science is a “concept to unify statistics, data analysis, machine learning, and their related methods” in order to “understand and analyze actual phenomena” with data.
• It employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science.

Source: https://bit.ly/2YTRQ3w
Data Science – A Definition

Data Science is the science that uses computer science, statistics, machine learning, visualization and human-computer interaction to collect, clean, integrate, analyze, visualize and interact with data in order to create data products.
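To make the verbs in this definition concrete, here is a minimal, standard-library Python sketch of collect, clean, integrate and analyze on a tiny dataset. The records, field names and helper functions (`clean`, `integrate`, `analyze`) are hypothetical and made up for illustration; in practice these steps are usually done with libraries such as Pandas.

```python
from statistics import mean

# Collect: hypothetical raw records from two sources (toy data, not from the slides).
patients = [
    {"id": 1, "age": "34"},
    {"id": 2, "age": ""},  # missing value
    {"id": 3, "age": "51"},
]
visits = [
    {"id": 1, "visits": 2},
    {"id": 3, "visits": 5},
    {"id": 3, "visits": 1},
]

def clean(records):
    """Drop records with a missing age and convert age from text to int."""
    return [{"id": r["id"], "age": int(r["age"])} for r in records if r["age"].strip()]

def integrate(people, events):
    """Join each cleaned patient with their total number of visits, matched by id."""
    totals = {}
    for e in events:
        totals[e["id"]] = totals.get(e["id"], 0) + e["visits"]
    return [{**p, "visits": totals.get(p["id"], 0)} for p in people]

def analyze(rows):
    """Summarize the integrated table with simple averages."""
    return {
        "mean_age": mean(r["age"] for r in rows),
        "mean_visits": mean(r["visits"] for r in rows),
    }

summary = analyze(integrate(clean(patients), visits))
print(summary)
```

Each step is a plain function, so a real project could swap in a Pandas or database implementation for one stage without changing the others.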

Goal of Data Science

Turn data (data science principles apply to all data, big and small) into data products and create business value.
Fourth Paradigm of Science
• Thousands of years
- Empirical
• A few hundred years
- Theoretical
• Last fifty years
- Computational
- “Query the world”
• Last twenty years
- eScience (Data Science)
- “Download the world”
Data Science and Related Fields
• Statistics
• Big Data Analytics
• Business Analytics
• Business Intelligence
• Data(base) Management
• Visualization
• Machine Learning
• Data Mining
• Artificial Intelligence
• Predictive Modelling
Big Data Science Tasks
• Facebook
• Amazon
• Google
• LinkedIn
• Netflix
• Rozetka
• Microsoft
Regular Data Science
• Data Analysis
• Modelling Statistics
• Engineering / Prototyping
What do people look for in a data scientist?
What is NOT data science?
This                           | Not that
Service change                 | Academic research
Small changes                  | Major overhauls / service disruptions
Use existing data (mostly ;))  | Collecting new data
Data Science Roles
Roles Required in a Data Science Project

Source: https://bit.ly/2z5sYqf
How to become a data scientist?
• Data scientists need to know how to “CODE”
• Other languages, tools, platforms and visualization
Learning Data Science with Python - Libraries
Learning Data Science with Python - Tools
How to become a data scientist?
• Learn to code
What’s in the Data Science SF Toolkit? Statistical Methods
• Sentiment analysis
• Time series analysis
• Data mining
• Multilevel modeling
• Missing data imputation
• Classification and clustering
• Survival analysis
• Pattern recognition
• Principal component and factor analysis
• A/B testing
• Machine learning
• Forecasting
• Propensity score matching
• Logistic, multinomial and multiple linear regression techniques
• Network analysis
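A/B testing is one of the listed methods. One common way to evaluate a simple A/B test on conversion rates is a two-proportion z-test, sketched below under the assumption of large samples (so the normal approximation holds). The function name and the conversion counts are hypothetical.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: variant A and B convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate, assuming the null hypothesis is true.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function, then a two-sided p-value.
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value

# Hypothetical experiment: 200 of 5000 users converted on A, 260 of 5000 on B.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be chance alone; the test says nothing about whether the difference is large enough to matter for the business.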
What’s in the Data Science SF Toolkit? Tools
• Languages: Python, R, SQL, JavaScript, NodeJS
• Libraries: SciPy, Pandas, Scikit-learn, GPText, OpenNLP, Mahout
• Data engineering: profiling, ETL, job notices, APIs, optimized data pipelines, optimized data storage/access
• Visualization: D3.js, Gephi, R, Leaflet, PowerBI, ggplot2, shiny
• + many others
What’s in the Data Science SF Toolkit? User Experience Research
• Iterative prototyping and documenting
• Photo journaling
• Service blueprinting
• Journey mapping
• Ride-alongs
• Process mapping
• Ethnographic field research and user observation
• Usability testing
Data scientists need to be comfortable with:
Data scientists need to learn machine learning and software engineering
Who are the data scientists?
APPLICATIONS OF DATA SCIENCE

• Security
• Sports
• Banking and Finance
• Internet Search
• Digital Advertisements
• Recommender Systems
• Image Processing
• Speech Recognition
• Gaming
• Price Comparison Websites
• Airline Route Planning
• Fraud and Risk Detection
• Delivery Logistics
• Internet of Things (IoT)
• Health Care
• Augmented Reality
• Self-Driving Cars
• Robots
IMPACT OF DATA SCIENCE ON SOCIETY
• Saving Energy
• Data-Driven Hospitals
• A Cleaner Environment
• Volunteer with a socially oriented data science program/organization
• Contribute via competitions
• Consider solutions to real-world problems that you encounter
• Be thoughtful in professional work
Real-Life Examples

Companies learn your secrets, shopping patterns, and preferences.
For example, can we know if a woman is pregnant, even if she doesn’t want us to know? (Target case study)

Data Science and Elections (2008, 2012)

1 million people installed the Obama Facebook app, which gave access to information on their “friends”.
Some recent ML competitions at https://www.kaggle.com/

NIST Pre-Pilot Data Science Evaluation: likely to be incorporated into the labs/final project
Data Science: Why All the Excitement?
Exciting new, effective applications of data analytics, e.g.:
• Google Flu Trends: detecting outbreaks two weeks ahead of CDC data
• New models are estimating which cities are most at risk for spread of the Ebola virus. The prediction model is built on various data sources, types and analyses.
Why All the Excitement?
Predicting political campaigns and election outcomes:
Data and Election 2012 (cont.)
…that was just one of several ways that Mr. Obama’s campaign operations, some unnoticed by Mr. Romney’s aides in Boston, helped save the president’s candidacy. In Chicago, the campaign recruited a team of behavioral scientists to build an extraordinarily sophisticated database

…that allowed the Obama campaign not only to alter the very nature of the electorate, making it younger and less white, but also to create a portrait of shifting voter allegiances. The power of this operation stunned Mr. Romney’s aides on election night, as they saw voters they never even knew existed turn out in places like Osceola County, Fla.
-- New York Times, Wed Nov 7, 2012
The White House Names Dr. DJ Patil as the First U.S. Chief Data Scientist, Feb. 18, 2015
A history of the (Business) Internet: 1997
PageRank: the web as a behavioral dataset
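PageRank treats links between pages as behavioral data: a page is important if important pages link to it. Below is a minimal power-iteration sketch of the textbook formulation, assuming the commonly cited damping factor of 0.85 and a made-up four-page web; it is not Google's production system.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank over a dict mapping page -> list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Every page gets the "random jump" share (1 - damping) / n.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:
            break
    return rank

# Hypothetical four-page web: each key links to the pages in its list.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C ranks highest because three pages link to it, while D, which nobody links to, keeps only the random-jump share.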
Sponsored search
Google revenue is around $50 bn/year from marketing, 97% of the company’s revenue.
Sponsored search uses an auction – a pure competition
for marketers trying to win access to consumers.
In other words, a competition for models of consumers –
their likelihood of responding to the ad – and of
determining the right bid for the item.
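The slides do not give the auction rules. As an assumption, the sketch below uses a generalized second-price (GSP) auction, a textbook model of sponsored search in which ads are ranked by bid × predicted click probability and each winner pays just enough per click to hold its position. Advertiser names, bids and click probabilities are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: str
    bid: float      # maximum the advertiser will pay per click
    p_click: float  # a model's predicted probability that a user clicks this ad

def gsp_auction(ads, slots):
    """Rank ads by bid * p_click and charge each winner a generalized second price.

    A winner pays the smallest per-click price that still keeps its rank:
    (next ad's bid * next ad's p_click) / own p_click. If no ad ranks below,
    the price here is 0 (a real system would apply a reserve price instead).
    """
    ranked = sorted(ads, key=lambda a: a.bid * a.p_click, reverse=True)
    results = []
    for i, ad in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            price = nxt.bid * nxt.p_click / ad.p_click
        else:
            price = 0.0
        results.append((ad.advertiser, round(price, 2)))
    return results

# Hypothetical bids and click-probability predictions for two ad slots.
ads = [
    Ad("shoes-r-us", 2.00, 0.05),
    Ad("runfast", 3.00, 0.02),
    Ad("kicks", 1.00, 0.04),
]
print(gsp_auction(ads, slots=2))
```

Because ranking multiplies the bid by the predicted click probability, a better model of consumers changes both who wins and what they pay, which is the "competition for models of consumers" the slide describes.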
There are around 30 billion search requests a month, and perhaps a trillion events of history between search providers.
Google AdWords and AdSense
What can you do with the data?
Traffic Prediction and Earthquake Warning

Crowdsourcing + physical modeling + sensing + data assimilation to produce:

From Alex Bayen, UCB, Director, Institute for Transportation Studies
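The slide names data assimilation without specifying a method. One common technique is the Kalman filter; the sketch below shows only its one-dimensional update step, fusing a physical model's speed forecast with noisy crowdsourced reports. All numbers are hypothetical, and a full system would also propagate the estimate forward with the physical model between reports.

```python
def assimilate(forecast, forecast_var, observation, obs_var):
    """Blend a model forecast with a noisy observation (scalar Kalman update step).

    The gain weights the observation by how uncertain the forecast is relative
    to the observation. Returns the updated estimate and its variance.
    """
    gain = forecast_var / (forecast_var + obs_var)
    estimate = forecast + gain * (observation - forecast)
    variance = (1 - gain) * forecast_var
    return estimate, variance

# Hypothetical: the traffic model forecasts 60 km/h on a road segment (variance 25),
# then three crowdsourced phone reports arrive, each with variance 25.
estimate, variance = 60.0, 25.0
for report in [40.0, 45.0, 42.0]:
    estimate, variance = assimilate(estimate, variance, report, 25.0)
    print(f"report {report:.0f} km/h -> estimate {estimate:.2f} km/h (variance {variance:.2f})")
```

With equal variances the first update lands halfway between model and sensor, and each further report shrinks the variance, which is how combining modeling with sensing can beat either source alone.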
Other Data Science Applications
Transaction Databases → Recommender Systems (Netflix), Fraud Detection (Security and Privacy)
Wireless Sensor Data → Smart Home, Real-time Monitoring, Internet of Things
Text Data, Social Media Data → Product Review and Consumer Satisfaction (Facebook, Twitter, LinkedIn), E-discovery
Software Log Data → Automatic Troubleshooting (Splunk)
Genotype and Phenotype Data → Epic, 23andMe, Patient-Centered Care, Personalized Medicine
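The first pairing, transaction data feeding a recommender system, can be illustrated with item-based collaborative filtering. This sketch assumes cosine similarity between item rating vectors (an unrated item counts as 0) and uses made-up ratings; it is a textbook technique, not Netflix's actual system.

```python
from math import sqrt

# Hypothetical ratings (made up for illustration): user -> {title: rating 1-5}.
ratings = {
    "ann": {"Alien": 5, "Heat": 3, "Up": 1},
    "bob": {"Alien": 4, "Heat": 4, "Up": 1, "Jaws": 5},
    "cara": {"Up": 5, "Coco": 4, "Heat": 1},
    "dev": {"Alien": 5, "Jaws": 4, "Coco": 1},
}
users = sorted(ratings)

def item_vector(item):
    """Ratings of one item across all users, with 0 meaning 'not rated'."""
    return [ratings[u].get(item, 0) for u in users]

def cosine(x, y):
    """Cosine similarity between two rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def recommend(user, top_n=2):
    """Score each unseen item by a similarity-weighted average of the user's own ratings."""
    seen = ratings[user]
    all_items = {item for r in ratings.values() for item in r}
    scores = {}
    for candidate in all_items - set(seen):
        cand_vec = item_vector(candidate)
        sims = [(cosine(cand_vec, item_vector(i)), r) for i, r in seen.items()]
        total = sum(s for s, _ in sims)
        if total > 0:
            scores[candidate] = sum(s * r for s, r in sims) / total
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("ann"))
```

Ann rated Alien highly, and the users who liked Alien also liked Jaws, so Jaws is recommended ahead of Coco.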
IMPORTANCE OF DATA SCIENCE

1. Data science helps brands understand their customers in much greater depth.
2. It allows brands to communicate their story in an engaging and powerful way.
3. Big Data is a new field that is constantly growing and evolving.
4. Its findings and results can be applied to almost any sector, such as travel, healthcare and education.
5. Data science is accessible to almost all sectors.
Road to Become a Data Scientist
REFERENCES
• https://slideplayer.com/slide/10398517/
• https://www.slideshare.net/ryanorban/how-to-become-a-data-scientist
• Dhar, V. (2013). "Data science and prediction". Communications of the ACM. 56 (12): 64–73. doi:10.1145/2500499.
• Hayashi, Chikio (1 January 1998). "What is Data Science? Fundamental Concepts and a Heuristic Example". In Hayashi, Chikio; Yajima, Keiji; Bock, Hans-Hermann; Ohsumi, Noboru; Tanaka, Yutaka; Baba, Yasumasa (eds.). Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer Japan. pp. 40–51. doi:10.1007/978-4-431-65950-1_3. ISBN 9784431702085.
• Davenport, Thomas H.; Patil, DJ (October 2012). "Data Scientist: The Sexiest Job of the 21st Century". Harvard Business Review.
• Leek, Jeff (12 December 2013). "The key word in 'Data Science' is not Data, it is Science". Simply Statistics.
• https://www.analyticsvidhya.com/blog/2015/09/applications-data-science/
• https://www.edureka.co/blog/data-science-applications/
• https://dutchdatascienceweek.nl/2018/04/05/the-impact-of-data-science-on-society/
• https://www.educba.com/data-science-and-its-growing-importance/
