A Crash Course in Data Science Review

This document provides an overview of key concepts in data science, including definitions of data science, areas of statistics, machine learning techniques, software engineering practices, the structure of data science projects, outputs, and hallmarks of successful experiments. It discusses descriptive, inferential, and predictive statistics, as well as supervised and unsupervised machine learning. The document also outlines tools used in data science.

Uploaded by

huka


A Crash Course in Data Science - Review
Kartikeya Bolar

What is Data Science?
• Applied statistics?
• Applied machine learning?
• Database management?
• Answering specific questions with data?
• Deep learning?

Broad Areas of Statistics
• Descriptive - Involves basic summary tables and exploratory data analysis
(Focus is on profiling and exploring potential relationships)
• Inferential - Process of drawing conclusions about a population from a sample
(Focus is on parameter estimates and relationships between parameters)
• Predictive - Process of obtaining predictions irrespective of the statistical significance of the relationship
(Focus is on sampling variance)
• Experimental Design - Balancing observed and unobserved covariates that may contaminate our results
(Focus is on cause and effect)
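
The difference between the descriptive and inferential areas can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the sample values, the helper names `describe` and `mean_confidence_interval`, and the use of a normal approximation (z = 1.96) are all assumptions made for the example.

```python
import math
import statistics

# Hypothetical sample of 10 measurements (illustrative values only).
sample = [4.1, 5.0, 4.8, 5.6, 4.3, 5.2, 4.9, 5.1, 4.7, 5.3]


def describe(values):
    """Descriptive statistics: summarize the sample itself."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def mean_confidence_interval(values, z=1.96):
    """Inferential statistics: approximate 95% interval for the population mean.

    Uses a normal approximation (z = 1.96); a t-based interval would be
    somewhat wider for a sample this small.
    """
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * standard_error, mean + z * standard_error


summary = describe(sample)
low, high = mean_confidence_interval(sample)
print(summary)
print(f"approx. 95% CI for the population mean: ({low:.2f}, {high:.2f})")
```

The summary describes only the ten observed values, while the interval is a statement about the wider population the sample is assumed to come from.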

Machine Learning
• Generalizability is established by testing on novel datasets
Types
• Supervised (focus on prediction, judged by prediction performance)
• Unsupervised (clustering, association, principal components)
• Traditional statistical approaches often differ from ML approaches by placing a higher priority on parameter interpretability and simplicity (model specification) than on prediction performance
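
Testing on data the model has not seen can be sketched as a simple hold-out split. The example below is an assumption-laden illustration rather than anything taken from the slides: the data are synthetic (y = 2x + 1 plus noise), the 80/20 split is arbitrary, and the helpers `fit_line` and `mean_squared_error` are hypothetical names written for this sketch.

```python
import random

# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.5)) for x in xs]

# Hold out 20% of the rows as a "novel" dataset never used for fitting.
rng.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(points, slope, intercept):
    """Average squared prediction error on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)


slope, intercept = fit_line(train)
train_mse = mean_squared_error(train, slope, intercept)
test_mse = mean_squared_error(test, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f}")
print(f"train MSE={train_mse:.3f} test MSE={test_mse:.3f}")
```

The test error, not the training error, is the figure that speaks to prediction performance on new data; the fitted slope and intercept are what a more traditional, interpretation-focused analysis would emphasize.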

Software engineering for Data Science
• Software engineering is used to generalize data analyses into software so that they can be applied in different situations
• Software packages provide a well-defined interface that abstracts away the low-level technical details of data analysis routines
• Whether to develop a function or a package depends on how often the procedure or steps are repeated
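
As a rough sketch of this idea, a repeated analysis step can be wrapped in a function with a small, well-defined interface. The function name `summarize_by_group`, the column names, and both datasets below are made up for illustration.

```python
import statistics


def summarize_by_group(rows, group_key, value_key):
    """Return the mean of `value_key` for each distinct `group_key`.

    `rows` is a list of dicts; the column names are supplied by the caller,
    so the same routine works on any table with that shape.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {name: statistics.mean(values) for name, values in groups.items()}


# Two unrelated, made-up datasets analyzed with the same function.
sales = [
    {"region": "north", "revenue": 120},
    {"region": "south", "revenue": 80},
    {"region": "north", "revenue": 100},
]
trials = [
    {"arm": "control", "score": 3.0},
    {"arm": "treated", "score": 4.5},
    {"arm": "treated", "score": 5.5},
]

print(summarize_by_group(sales, "region", "revenue"))
print(summarize_by_group(trials, "arm", "score"))
```

Callers only need to know the three arguments; the grouping and averaging details stay hidden behind the interface. Once several such functions are reused across projects, bundling them into a package becomes worthwhile.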

Structure of Data Science Project
• A data science project might start with exploratory data analysis or with defining/stating the question
• Decision making is not part of the data analysis process

Output of Data Science Experiment - 1
Output Types
• Reports
• Presentations
Characteristics
• Clearly written
• Narrative
• Concise conclusions
• Omit the unnecessary
• Reproducible
Tools
• R Markdown, knitr, Presenter

Output of Data Science Experiment -2
Output Types
• Interactive web pages (dashboards)
• Apps
Characteristics
• Easy to use
• Documentation
• Code commented
• Version control
Tools
• R Markdown with Shiny, Shiny web apps, Tableau, Power BI

Hallmarks of a Successful Data Science Experiment
• New knowledge is created.
• Decisions or policies are made based on the outcome of the experiment.
• A report, presentation or app with impact is created.
• It is learned that the data can't answer the question being asked of it.

Data scientist’s toolbox
• Data programming languages (e.g. R, Python)
• Scalable computing frameworks (e.g. Apache Spark, Hadoop MapReduce)
• Web servers (e.g. Amazon Web Services, RStudio Cloud, Azure)
• Help websites (e.g. Stack Overflow)
• Databases (e.g. SQL Server, Excel)
• Chat tools (e.g. Slack)
• Reproducibility tools (e.g. R Markdown)
• Data product development tools (e.g. Shiny, Tableau, Power BI)

Separating Hype from Value
• What is the question you are trying to answer with the data?
• Do you have the data to answer that question?
• If you could answer the question, could you use the answer?
