1.Statistics and Probability (1)

This document outlines a 1.5-hour course focused on statistics and probability, specifically tailored for Data Science interviews. It covers key topics such as experimental design, theoretical distributions, and statistical tests, along with practical examples and questions. The course aims to provide an overview of common themes and practice problems rather than an exhaustive list of statistics concepts.

Uploaded by

lakshmisai1190

Statistics and Probability

In this 1.5-hour course, we'll cover everything from experimental design to the distribution
questions you'll need to know, focusing on the kinds of questions that come up most often in
Data Science interviews.
What this won't be:

• An exhaustive list of every stats concept out there

• A passive stats lecture
• A document containing everything you could ever see in a stats-related interview
What this will be:

• An overview of the most common topics and themes across full-time Data Science
interviews

• A set of practice problems curated to test those common themes and let you know what
you know well, and what you don't
How to best sit for this session:
1. Take notes on topics you don't know for self study later
2. Ask plenty of questions; please interrupt whenever you like

Experimental Design:
• A/B testing design
◦ Sample Size
◦ Test Length
◦ Applications for recommendation algorithms
◦ Ability to implement an A/B test, including the Python/R skills to analyze the results if handed the results of an A/B test
◦ Communication around A/B testing is vital
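Here is a rough sketch of the "handed the results" case: a two-sided two-proportion z-test in plain Python, using the standard library only. It assumes the metric is a per-user conversion rate, users are independent, and both groups are large enough for the normal approximation. The function name and the counts in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in conversion rates between groups A and B."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both groups share one conversion rate
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: 200/2000 conversions in control, 250/2000 in treatment
z, p = two_proportion_ztest(200, 2000, 250, 2000)
print(f"z = {z:.3f}, p = {p:.4f}")
```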

Experiment interpretation:
• p-values (how to interpret them, and how not to)
• Confidence intervals

◦ CI Generation, interpretation
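One common way to generate a CI is the normal approximation: mean ± z × standard error. The sketch below assumes a reasonably large sample; for small samples a t critical value is the usual substitute for z. The helper name is hypothetical, and the data is synthetic (normal draws with a fixed seed) purely for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    n = len(values)
    center = mean(values)
    # Standard error of the mean, estimated with the sample standard deviation
    se = stdev(values) / sqrt(n)
    # Two-sided critical value, about 1.96 when confidence = 0.95
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * se, center + z * se


# Synthetic example: 500 hypothetical session lengths in minutes
rng = random.Random(0)
session_minutes = [rng.gauss(10, 3) for _ in range(500)]
low, high = mean_confidence_interval(session_minutes)
print(f"95% CI for mean session length: ({low:.2f}, {high:.2f})")
```

On interpretation: the 95% describes the procedure. If the experiment were repeated many times, about 95% of intervals built this way would contain the true mean. It is not a 95% probability that the true mean lies inside this particular interval.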

Example A: Let's say your company has run an experiment and found a p-value of .04 between
the two variants. How do you figure out whether the test is valid?

Example B: Find all the errors in this A/B testing design

Airbnb wants to figure out how changing their logo affects how often people return to their
website. In order to do this, they run a test where 1% of users see logo A (new) and 99% see
logo B and measure the return rates for their users. Every week, they run a t-test on the
results for the experiment. This week, they got a p-value of .02 and stopped the experiment,
and are going to switch their logo.

Example C: How would you explain to a non-technical coworker what Confidence Intervals
are?
Example D: When does bootstrapping fail?
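One well-known failure case is estimating extremes such as the sample maximum. The sketch below uses the standard library only; the function name is hypothetical and the data is synthetic. It counts how often a bootstrap resample has exactly the same maximum as the original sample. Each resample contains the original maximum with probability 1 - (1 - 1/n)^n, which is about 0.63 for large n. As a result, the bootstrap distribution piles up on a single value instead of resembling the true, continuous sampling distribution of the maximum.

```python
import random


def bootstrap_max_tie_rate(data, n_boot=2000, seed=0):
    """Fraction of bootstrap resamples whose maximum equals the original sample maximum."""
    rng = random.Random(seed)
    sample_max = max(data)
    hits = 0
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample size
        resample = rng.choices(data, k=len(data))
        if max(resample) == sample_max:
            hits += 1
    return hits / n_boot


# Synthetic data: 100 draws from Uniform(0, 1), whose true maximum is 1
data_rng = random.Random(42)
data = [data_rng.uniform(0, 1) for _ in range(100)]
print("share of resamples reusing the sample max:", bootstrap_max_tie_rate(data))
print("theoretical value 1 - (1 - 1/n)^n:", 1 - (1 - 1 / len(data)) ** len(data))
```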

Theoretical Distributions:
• Should know:
◦ Mean/variance formulae
◦ How to recognize each distribution when it is described in words
◦ Necessary conditions (e.g. Poisson requires events to be independent, and the number of
events must be unbounded)

• Normal/Gaussian
◦ symmetric, unimodal, with tails that approach zero asymptotically
◦ The limiting distribution of sample means in the Central Limit Theorem
• Binomial:
◦ number of successes in n independent trials, each succeeding with the same probability p
• Bernoulli:
◦ binomial when n = 1
◦ only possibilities are success and failure (coin flip)
• Geometric:
◦ number of trials until the first Bernoulli success (some texts count the failures before it instead)
• Poisson:
◦ number of independent events observed over a fixed time period
◦ The key parameter is the mean number of events over the fixed time period, often denoted
λ (lambda)
◦ e.g. number of fish caught in an hour
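The mean/variance formulae for the distributions above can be kept in one place and spot-checked by simulation. The sketch below uses the standard library only, and the dictionary and helper names are hypothetical. Geometric follows the "trials up to and including the first success" convention. Under the "failures before the first success" convention, the mean is (1 - p)/p and the variance is unchanged.

```python
import random
from statistics import mean, pvariance

# (mean, variance) for each distribution; Geometric counts trials up to and
# including the first success.
FORMULAS = {
    "bernoulli": lambda p: (p, p * (1 - p)),
    "binomial": lambda n, p: (n * p, n * p * (1 - p)),
    "geometric": lambda p: (1 / p, (1 - p) / p ** 2),
    "poisson": lambda lam: (lam, lam),
    "normal": lambda mu, sigma: (mu, sigma ** 2),
}


def draw_binomial(n, p, rng):
    # Number of successes in n independent Bernoulli(p) trials
    return sum(rng.random() < p for _ in range(n))


def draw_geometric(p, rng):
    # Number of trials until (and including) the first success
    trials = 1
    while rng.random() >= p:
        trials += 1
    return trials


rng = random.Random(0)
binom_draws = [draw_binomial(10, 0.1, rng) for _ in range(50_000)]
geom_draws = [draw_geometric(0.1, rng) for _ in range(50_000)]

print("Binomial(10, 0.1) formula:", FORMULAS["binomial"](10, 0.1),
      " simulated:", (mean(binom_draws), pvariance(binom_draws)))
print("Geometric(0.1) formula:", FORMULAS["geometric"](0.1),
      " simulated:", (mean(geom_draws), pvariance(geom_draws)))
```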

Example A:
Say you have two options for how to deliver an ad in a newsletter. For the first option, you
will put an ad at the top and bottom of every newsletter. For the second, at every paragraph,
you have a ten percent chance of placing an ad there. What type of distribution are each of
these?

Example B: continuing from above, what is the variance and expected value for each
option?

Example C: continuing from above, how often would you expect ads to be shown right
next to each other? If we have 10 paragraphs, what is the expected number of adjacent
ads?
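A quick simulation is one way to check answers to Examples B and C for the second option. It assumes 10 paragraphs, an ad placed at each paragraph independently with probability 0.1, and "adjacent" meaning two consecutive paragraphs that both carry an ad. The function name is hypothetical. Under these assumptions the number of ads is Binomial(n, p). Each of the n - 1 consecutive pairs has both ads with probability p², so by linearity of expectation the expected number of adjacent pairs is (n - 1)p². The first option is deterministic: always 2 ads, so its variance is 0.

```python
import random
from statistics import mean, pvariance


def simulate_option_two(n_paragraphs=10, p_ad=0.1, n_newsletters=50_000, seed=0):
    """Each paragraph independently gets an ad with probability p_ad."""
    rng = random.Random(seed)
    ad_counts, adjacent_pairs = [], []
    for _ in range(n_newsletters):
        ads = [rng.random() < p_ad for _ in range(n_paragraphs)]
        ad_counts.append(sum(ads))
        # Count consecutive paragraph pairs that both carry an ad
        adjacent_pairs.append(sum(a and b for a, b in zip(ads, ads[1:])))
    return ad_counts, adjacent_pairs


ad_counts, adjacent_pairs = simulate_option_two()
n, p = 10, 0.1
print("ads per newsletter: mean", mean(ad_counts), "vs n*p =", n * p)
print("ads per newsletter: variance", pvariance(ad_counts), "vs n*p*(1-p) =", n * p * (1 - p))
print("adjacent ad pairs: mean", mean(adjacent_pairs), "vs (n-1)*p**2 =", (n - 1) * p ** 2)
```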

Example D: continuing from above, what about your answers for A-C would change if in
scenario two you were not allowed to run two ads in a row?

Example E:

Say you were a data scientist for McDonalds, and you had data about cars coming through
the drive-through line at a specific location over a time period. What theoretical distribution
could you fit this data to? How would you use this data to figure out how many people to
keep running the drive-through?
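If arrivals are modeled as Poisson, the rate λ can be estimated by the average count per time window, which is the maximum-likelihood estimate. Tail probabilities then suggest how much capacity is needed. In the sketch below, the arrival counts, the 15-minute window, and the "one employee serves about 5 cars per window" staffing rule are all hypothetical placeholders. Only the Poisson formulas are standard.

```python
from math import exp, factorial
from statistics import mean, pvariance


def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return lam ** k * exp(-lam) / factorial(k)


def prob_more_than(capacity, lam):
    """P(N > capacity) for N ~ Poisson(lam)."""
    return 1 - sum(poisson_pmf(k, lam) for k in range(capacity + 1))


# Hypothetical counts of cars arriving in each 15-minute window during one shift
arrivals = [12, 9, 14, 11, 10, 13, 8, 12, 15, 11]
lam_hat = mean(arrivals)  # maximum-likelihood estimate of the Poisson rate
print("estimated rate per window:", lam_hat, " sample variance:", pvariance(arrivals))

# Hypothetical staffing rule: each employee handles about 5 cars per window
for staff in range(1, 5):
    capacity = 5 * staff
    print(f"{staff} staff: P(more than {capacity} cars) = {prob_more_than(capacity, lam_hat):.3f}")
```

Before trusting the fit, check that the sample variance of the counts is close to their mean, as the Poisson model requires.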
Law of Large Numbers:
• As the number of trials grows, the average of the observed results (for example, values
drawn from a distribution) converges to the expected value
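Here is a minimal illustration, assuming a fair six-sided die with expected value 3.5: the running average of the rolls settles toward 3.5 as the number of rolls grows. It uses the standard library only, and the function name is hypothetical.

```python
import random


def running_means(n_rolls, seed=0):
    """Running average of fair six-sided die rolls after each roll."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means


means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    # The expected value of a fair die roll is 3.5
    print(f"after {n:>6} rolls: running mean = {means[n - 1]:.3f}")
```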

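Central Limit Theorem:

• The means of repeated independent samples of equal size N, drawn from any distribution
with finite variance, are approximately normally distributed, and the approximation improves
as N grows

• standard deviation of the sample mean (standard error) = σ / N^.5, where σ is the
population standard deviation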
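The sketch below checks both bullets by simulation. It assumes a deliberately skewed Exponential(rate = 1) population, which has mean 1 and standard deviation 1, and samples of size N = 50. It compares the empirical spread of the sample means with σ / N^.5. It also counts how many means land within 1.96 standard errors of the true mean; if the means are approximately normal, that share should be close to 95%. Function names and parameter values are illustrative choices.

```python
import random
from math import sqrt
from statistics import stdev


def exponential_sample_means(sample_size, n_samples, seed=0):
    """Means of repeated samples from Exponential(rate=1), which has mean 1 and sd 1."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(n_samples)
    ]


N = 50
means = exponential_sample_means(N, 5_000)
predicted_se = 1.0 / sqrt(N)  # sigma / N^.5 with sigma = 1
print("empirical sd of means:", round(stdev(means), 4), " predicted:", round(predicted_se, 4))

# Under the CLT, roughly 95% of sample means fall within 1.96 standard errors of the true mean 1
share = sum(abs(m - 1.0) <= 1.96 * predicted_se for m in means) / len(means)
print("share within 1.96 SE:", round(share, 3))
```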

If we viewed the number of ads shown in example B from above over time, what distribution
would the resulting values be most similar to? What would the mean be? How would you find
the expected 5th percentile of the samples?

Power Analysis:
• Sample size calculations for setting up experiments
• How to interpret statistical power
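For a conversion-style metric, one standard approximation gives the sample size per group for a two-sided test comparing two proportions:

n = (z_{1-α/2} + z_{1-β})^2 × (p1(1 - p1) + p2(1 - p2)) / (p1 - p2)^2

Here 1 - β is the desired power. The sketch below implements that formula with the standard library. The function name and the baseline and target rates are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_baseline, p_target, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect p_baseline -> p_target (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance_sum = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    effect = p_target - p_baseline
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / effect ** 2)


# Hypothetical goal: detect a lift from a 10% to an 11% return rate
n = sample_size_per_group(0.10, 0.11)
print(n, "users per group")
```

Dividing n by the expected number of eligible users per group per day gives a rough test length. In practice, teams often round up to whole weeks to cover day-of-week effects.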

Statistical Tests:
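• T-test: comparing means when the variance is estimated from the sample
• Z-test: comparing means (or proportions) when the test statistic is approximately normal,
i.e. the population variance is known or the sample is large
• Chi-square test: categorical data (goodness of fit, independence)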
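As one example of the chi-square test on categorical data, the sketch below computes Pearson's statistic for a 2x2 table of counts, such as variant × returned/did not return, without a continuity correction. A chi-square variable with 1 degree of freedom is the square of a standard normal, so the p-value needs only the standard library. Larger tables would need a chi-square CDF, for example from SciPy. The function name and table counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > stat) = P(|Z| > sqrt(stat))
    p_value = 2 * (1 - NormalDist().cdf(sqrt(stat)))
    return stat, p_value


# Hypothetical counts: rows = variant A / variant B, columns = returned / did not return
table = [[120, 880], [150, 850]]
print(chi_square_2x2(table))
```

For a 2x2 table, this statistic equals the square of the pooled two-proportion z statistic.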

Bayes Rule:
• How to apply to common situations
• P(A|B) = (P(A) * P(B|A)) / P(B)

Questions if we have time:

Let's say we pay people to watch soccer games and rate how good the soccer game is.
For 80% of raters, there is a 60% chance of rating the game as good (high quality) and a 40% chance of rating it as low quality
For 20% of raters, there is a 100% chance of rating a game as good

What is the probability that a random rater rates a game as good?

What if we have 100 raters for the PSG vs Bayern game the other day, and they all rate it
independently. What is the expected number of good ratings?

What if we have 100 games that are all rated by the same person independently. What is the
expected number of good ratings?

If 3 games were all rated by the same person, and all 3 were rated as good, what is the
probability that the rater was in the 80% group versus the 20% group?
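The arithmetic behind these questions can be checked with the law of total probability and Bayes' rule. The sketch below encodes the two rater groups from the problem; the group labels and function names are made up. It computes P(good) = 0.8 × 0.6 + 0.2 × 1.0 and the posterior group probabilities after three good ratings from the same rater.

```python
# Rater mix from the problem statement (group labels are hypothetical)
P_GROUP = {"mixed": 0.8, "always_good": 0.2}            # share of raters in each group
P_GOOD_GIVEN_GROUP = {"mixed": 0.6, "always_good": 1.0}


def p_good():
    """Law of total probability: sum over groups of P(group) * P(good | group)."""
    return sum(P_GROUP[g] * P_GOOD_GIVEN_GROUP[g] for g in P_GROUP)


def posterior_given_all_good(n_games):
    """Bayes' rule: P(group | the same rater rated all n_games as good)."""
    likelihood = {g: P_GOOD_GIVEN_GROUP[g] ** n_games for g in P_GROUP}
    evidence = sum(P_GROUP[g] * likelihood[g] for g in P_GROUP)
    return {g: P_GROUP[g] * likelihood[g] / evidence for g in P_GROUP}


print("P(good) =", p_good())
print("Expected good ratings out of 100 =", 100 * p_good())
print("Posterior after 3 good ratings =", posterior_given_all_good(3))
```

By linearity of expectation, the expected number of good ratings is 100 × P(good) in both cases: 100 independent raters, or 100 games rated by one person. What differs is the spread. A single rater's ratings all depend on which group that rater is in, so those counts vary much more.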

Best Places to Learn about Stats:

Khan Academy
Interview Query
Naked Statistics

For help with statistical intuition

Chris Albon's Machine Learning Flashcards
TidyTuesday Data
for data cleaning/analysis practice, great for take-home practice
ISLR
