
CS 229 – Machine Learning
https://stanford.edu/~shervine
VIP Refresher: Probabilities and Statistics

Afshine Amidi and Shervine Amidi

August 6, 2018
Introduction to Probability and Combinatorics

• Sample space – The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by $S$.

• Event – Any subset $E$ of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in $E$, then we say that $E$ has occurred.

• Axioms of probability – For each event $E$, we denote $P(E)$ as the probability of event $E$ occurring. By noting $E_1, \ldots, E_n$ mutually exclusive events, we have the three following axioms:

$$(1)\ 0 \leq P(E) \leq 1 \qquad (2)\ P(S) = 1 \qquad (3)\ P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i)$$

• Permutation – A permutation is an arrangement of $r$ objects from a pool of $n$ objects, in a given order. The number of such arrangements is given by $P(n,r)$, defined as:

$$P(n,r) = \frac{n!}{(n-r)!}$$

• Combination – A combination is an arrangement of $r$ objects from a pool of $n$ objects, where the order does not matter. The number of such arrangements is given by $C(n,r)$, defined as:

$$C(n,r) = \frac{P(n,r)}{r!} = \frac{n!}{r!(n-r)!}$$

Remark: we note that for $0 \leq r \leq n$, we have $P(n,r) \geq C(n,r)$.
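These counts can be cross-checked numerically; a minimal Python sketch using the standard library's math.perm and math.comb (available in Python 3.8+), with illustrative values of $n$ and $r$:

```python
from math import comb, factorial, perm   # Python 3.8+

n, r = 5, 3

p = perm(n, r)   # P(n, r) = n! / (n - r)!      -> 60 ordered arrangements
c = comb(n, r)   # C(n, r) = n! / (r!(n - r)!)  -> 10 unordered selections

assert p == factorial(n) // factorial(n - r)
assert c == p // factorial(r)
assert p >= c    # P(n, r) >= C(n, r) for 0 <= r <= n
print(p, c)
```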
Conditional Probability

• Bayes' rule – For events $A$ and $B$ such that $P(B) > 0$, we have:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$

Remark: we have $P(A \cap B) = P(A)P(B|A) = P(A|B)P(B)$.

• Partition – Let $\{A_i, i \in [\![1,n]\!]\}$ be such that for all $i$, $A_i \neq \emptyset$. We say that $\{A_i\}$ is a partition if we have:

$$\forall i \neq j,\ A_i \cap A_j = \emptyset \quad \textrm{and} \quad \bigcup_{i=1}^{n} A_i = S$$

Remark: for any event $B$ in the sample space, we have $P(B) = \sum_{i=1}^{n} P(B|A_i)P(A_i)$.

• Extended form of Bayes' rule – Let $\{A_i, i \in [\![1,n]\!]\}$ be a partition of the sample space. We have:

$$P(A_k|B) = \frac{P(B|A_k)P(A_k)}{\sum_{i=1}^{n} P(B|A_i)P(A_i)}$$

• Independence – Two events $A$ and $B$ are independent if and only if we have:

$$P(A \cap B) = P(A)P(B)$$
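The extended form is straightforward to compute; below is a minimal Python sketch on a made-up two-event partition (all probabilities are illustrative):

```python
# Extended Bayes' rule on a made-up two-event partition {A1, A2};
# all probabilities below are illustrative.
P_A = [0.3, 0.7]           # P(A_i); a partition, so the values sum to 1
P_B_given_A = [0.9, 0.2]   # P(B | A_i)

# Law of total probability: P(B) = sum_i P(B | A_i) P(A_i)
P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))

# Extended Bayes' rule: P(A_k | B) = P(B | A_k) P(A_k) / P(B)
posterior = [pb * pa / P_B for pb, pa in zip(P_B_given_A, P_A)]
print(P_B, posterior, sum(posterior))   # the posteriors sum to 1
```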

Random Variables

• Random variable – A random variable, often noted $X$, is a function that maps every element in a sample space to a real line.

• Cumulative distribution function (CDF) – The cumulative distribution function $F$, which is monotonically non-decreasing and is such that $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$, is defined as:

$$F(x) = P(X \leq x)$$

Remark: we have $P(a < X \leq b) = F(b) - F(a)$.

• Probability density function (PDF) – The probability density function $f$ is the probability that $X$ takes on values between two adjacent realizations of the random variable.

• Relationships involving the PDF and CDF – Here are the important properties to know in the discrete (D) and the continuous (C) cases.

Case | CDF $F$ | PDF $f$ | Properties of PDF
(D) | $F(x) = \sum_{x_i \leq x} P(X = x_i)$ | $f(x_j) = P(X = x_j)$ | $0 \leq f(x_j) \leq 1$ and $\sum_j f(x_j) = 1$
(C) | $F(x) = \int_{-\infty}^{x} f(y)\,dy$ | $f(x) = \frac{dF}{dx}$ | $f(x) \geq 0$ and $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
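As a numerical illustration of the continuous-case relationships, the sketch below differentiates a CDF and compares it against the PDF, assuming NumPy and SciPy are available (the choice of $\textrm{Exp}(1)$ is illustrative):

```python
import numpy as np
from scipy import stats    # assumes SciPy is available

rv = stats.expon(scale=1.0)            # continuous case: Exp(1)
x = np.linspace(0.01, 5.0, 200)

# Continuous case: f(x) = dF/dx, so differentiating the CDF
# numerically should recover the PDF.
dF_dx = np.gradient(rv.cdf(x), x)
assert np.allclose(dF_dx, rv.pdf(x), atol=1e-2)

# Remark above: P(a < X <= b) = F(b) - F(a)
a, b = 0.5, 2.0
print(rv.cdf(b) - rv.cdf(a))           # ~ 0.471
```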

• Variance – The variance of a random variable, often noted $\textrm{Var}(X)$ or $\sigma^2$, is a measure of the spread of its distribution function. It is determined as follows:

$$\textrm{Var}(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2$$

• Standard deviation – The standard deviation of a random variable, often noted $\sigma$, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:

$$\sigma = \sqrt{\textrm{Var}(X)}$$
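Both expressions for the variance can be checked by simulation; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # true variance is 4

var_def = np.mean((x - x.mean()) ** 2)         # E[(X - E[X])^2]
var_alt = np.mean(x ** 2) - x.mean() ** 2      # E[X^2] - E[X]^2
sigma = np.sqrt(var_def)                       # standard deviation

assert np.isclose(var_def, var_alt)            # the two forms agree
print(var_def, var_alt, sigma)
```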



• Expectation and Moments of the Distribution – Here are the expressions of the expected value $E[X]$, generalized expected value $E[g(X)]$, $k$th moment $E[X^k]$ and characteristic function $\psi(\omega)$ for the discrete and continuous cases:

Case | $E[X]$ | $E[g(X)]$ | $E[X^k]$ | $\psi(\omega)$
(D) | $\sum_{i=1}^{n} x_i f(x_i)$ | $\sum_{i=1}^{n} g(x_i) f(x_i)$ | $\sum_{i=1}^{n} x_i^k f(x_i)$ | $\sum_{i=1}^{n} f(x_i) e^{i\omega x_i}$
(C) | $\int_{-\infty}^{+\infty} x f(x)\,dx$ | $\int_{-\infty}^{+\infty} g(x) f(x)\,dx$ | $\int_{-\infty}^{+\infty} x^k f(x)\,dx$ | $\int_{-\infty}^{+\infty} f(x) e^{i\omega x}\,dx$

Remark: we have $e^{i\omega x} = \cos(\omega x) + i \sin(\omega x)$.
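These quantities are easy to evaluate numerically in the discrete case; below is a minimal sketch for a fair six-sided die, assuming NumPy is available (the choice of $g$ and $\omega$ is illustrative):

```python
import numpy as np

# Discrete case: a fair six-sided die, f(x_i) = 1/6
x = np.arange(1, 7)
f = np.full(6, 1.0 / 6.0)

E_X  = np.sum(x * f)                        # expected value: 3.5
E_gX = np.sum(x ** 2 * f)                   # E[g(X)] with g(x) = x^2
E_X3 = np.sum(x ** 3 * f)                   # 3rd moment
omega = 0.7
psi = np.sum(f * np.exp(1j * omega * x))    # characteristic function

print(E_X, E_gX, E_X3, psi)
```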
• Revisiting the $k$th moment – The $k$th moment can also be computed with the characteristic function as follows:

$$E[X^k] = \frac{1}{i^k} \left[ \frac{\partial^k \psi}{\partial \omega^k} \right]_{\omega=0}$$

• Transformation of random variables – Let the variables $X$ and $Y$ be linked by some function. By noting $f_X$ and $f_Y$ the distribution function of $X$ and $Y$ respectively, we have:

$$f_Y(y) = f_X(x) \left| \frac{dx}{dy} \right|$$
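As an illustration of this change-of-variables formula, the sketch below takes $Y = X^2$ with $X$ uniform on $(0,1)$, so that $f_Y(y) = \frac{1}{2\sqrt{y}}$, and compares a histogram of $Y$ against this prediction (assuming NumPy is available; the transform is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 500_000)
y = x ** 2                             # monotone transform on (0, 1)

# Change of variables: x = sqrt(y) and |dx/dy| = 1 / (2 sqrt(y)),
# hence f_Y(y) = f_X(sqrt(y)) * |dx/dy| = 1 / (2 sqrt(y)).
hist, edges = np.histogram(y, bins=50, range=(0.0, 1.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
predicted = 1.0 / (2.0 * np.sqrt(centers))

# Skip the first few bins, where f_Y blows up and a histogram is coarse.
print(np.abs(hist[5:] - predicted[5:]).max())   # small deviation
```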
• Leibniz integral rule – Let $g$ be a function of $x$ and potentially $c$, and $a, b$ boundaries that may depend on $c$. We have:

$$\frac{\partial}{\partial c} \left( \int_a^b g(x)\,dx \right) = \frac{\partial b}{\partial c} \cdot g(b) - \frac{\partial a}{\partial c} \cdot g(a) + \int_a^b \frac{\partial g}{\partial c}(x)\,dx$$

• Chebyshev's inequality – Let $X$ be a random variable with expected value $\mu$ and standard deviation $\sigma$. For $k, \sigma > 0$, we have the following inequality:

$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$$
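Chebyshev's inequality holds for any distribution with finite variance; a minimal simulation check on an exponential distribution, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
# Exponential with scale 1: mu = 1 and sigma = 1
x = rng.exponential(scale=1.0, size=1_000_000)
mu, sigma = 1.0, 1.0

for k in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    bound = 1.0 / k ** 2
    assert empirical <= bound          # the bound holds (it is loose)
    print(k, empirical, bound)
```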
Jointly Distributed Random Variables

• Conditional density – The conditional density of $X$ with respect to $Y$, often noted $f_{X|Y}$, is defined as follows:

$$f_{X|Y}(x) = \frac{f_{XY}(x,y)}{f_Y(y)}$$

• Independence – Two random variables $X$ and $Y$ are said to be independent if we have:

$$f_{XY}(x,y) = f_X(x) f_Y(y)$$

• Marginal density and cumulative distribution – From the joint density probability function $f_{XY}$, we have:

Case | Marginal density | Cumulative function
(D) | $f_X(x_i) = \sum_j f_{XY}(x_i, y_j)$ | $F_{XY}(x,y) = \sum_{x_i \leq x} \sum_{y_j \leq y} f_{XY}(x_i, y_j)$
(C) | $f_X(x) = \int_{-\infty}^{+\infty} f_{XY}(x,y)\,dy$ | $F_{XY}(x,y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(x', y')\,dx'\,dy'$

• Distribution of a sum of independent random variables – Let $Y = X_1 + \ldots + X_n$ with $X_1, \ldots, X_n$ independent. We have:

$$\psi_Y(\omega) = \prod_{k=1}^{n} \psi_{X_k}(\omega)$$

• Covariance – We define the covariance of two random variables $X$ and $Y$, that we note $\sigma_{XY}^2$ or more commonly $\textrm{Cov}(X,Y)$, as follows:

$$\textrm{Cov}(X,Y) \triangleq \sigma_{XY}^2 = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$$

• Correlation – By noting $\sigma_X, \sigma_Y$ the standard deviations of $X$ and $Y$, we define the correlation between the random variables $X$ and $Y$, noted $\rho_{XY}$, as follows:

$$\rho_{XY} = \frac{\sigma_{XY}^2}{\sigma_X \sigma_Y}$$

Remarks: For any $X, Y$, we have $\rho_{XY} \in [-1,1]$. If $X$ and $Y$ are independent, then $\rho_{XY} = 0$.
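Covariance and correlation can be estimated directly from samples; a minimal sketch, assuming NumPy is available (the linear link between x and y is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.8 * x + 0.6 * rng.normal(size=100_000)   # made-up linear link

# Cov(X, Y) = E[XY] - mu_X mu_Y
cov = np.mean(x * y) - x.mean() * y.mean()
rho = cov / (x.std() * y.std())                # should lie in [-1, 1]

print(cov, rho)                 # ~ 0.8, ~ 0.8 for this construction
print(np.corrcoef(x, y)[0, 1])  # NumPy's estimate agrees
```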
• Main distributions – Here are the main distributions to have in mind:

Type | Distribution | PDF | $\psi(\omega)$ | $E[X]$ | $\textrm{Var}(X)$
(D) | Binomial: $X \sim B(n, p)$, $x \in [\![0,n]\!]$ | $P(X = x) = \binom{n}{x} p^x q^{n-x}$ | $(pe^{i\omega} + q)^n$ | $np$ | $npq$
(D) | Poisson: $X \sim \textrm{Po}(\mu)$, $x \in \mathbb{N}$ | $P(X = x) = \frac{\mu^x}{x!} e^{-\mu}$ | $e^{\mu(e^{i\omega} - 1)}$ | $\mu$ | $\mu$
(C) | Uniform: $X \sim U(a, b)$, $x \in [a,b]$ | $f(x) = \frac{1}{b-a}$ | $\frac{e^{i\omega b} - e^{i\omega a}}{(b-a)i\omega}$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$
(C) | Gaussian: $X \sim N(\mu, \sigma)$, $x \in \mathbb{R}$ | $f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$ | $e^{i\omega\mu - \frac{1}{2}\omega^2\sigma^2}$ | $\mu$ | $\sigma^2$
(C) | Exponential: $X \sim \textrm{Exp}(\lambda)$, $x \in \mathbb{R}_+$ | $f(x) = \lambda e^{-\lambda x}$ | $\frac{1}{1 - \frac{i\omega}{\lambda}}$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$
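The means and variances in this table can be cross-checked against SciPy's distribution objects; a minimal sketch (assuming SciPy is available; the parameter values are illustrative):

```python
from scipy import stats   # assumes SciPy is available

n, p = 10, 0.3            # illustrative parameters
mu, lam = 2.0, 1.5
a, b = 0.0, 4.0

# (frozen distribution, E[X] from the table, Var(X) from the table);
# note scipy's uniform(loc, scale) is U(loc, loc + scale)
checks = [
    (stats.binom(n, p),          n * p,       n * p * (1 - p)),
    (stats.poisson(mu),          mu,          mu),
    (stats.uniform(a, b - a),    (a + b) / 2, (b - a) ** 2 / 12),
    (stats.norm(1.0, 2.0),       1.0,         2.0 ** 2),
    (stats.expon(scale=1 / lam), 1 / lam,     1 / lam ** 2),
]
for rv, mean, var in checks:
    m, v = rv.stats(moments="mv")
    assert abs(m - mean) < 1e-8 and abs(v - var) < 1e-8
    print(rv.dist.name, float(m), float(v))
```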



Parameter estimation

• Random sample – A random sample is a collection of $n$ random variables $X_1, \ldots, X_n$ that are independent and identically distributed with $X$.

• Estimator – An estimator $\hat{\theta}$ is a function of the data that is used to infer the value of an unknown parameter $\theta$ in a statistical model.

• Bias – The bias of an estimator $\hat{\theta}$ is defined as the difference between the expected value of the distribution of $\hat{\theta}$ and the true value, i.e.:

$$\textrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$

Remark: an estimator is said to be unbiased when we have $E[\hat{\theta}] = \theta$.


• Sample mean and variance – The sample mean and the sample variance of a random sample are used to estimate the true mean $\mu$ and the true variance $\sigma^2$ of a distribution, are noted $\overline{X}$ and $s^2$ respectively, and are such that:

$$\overline{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \textrm{and} \quad s^2 = \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \overline{X})^2$$
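In NumPy, the $\frac{1}{n-1}$ factor corresponds to ddof=1; a minimal sketch, with illustrative sample parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=1_000)  # mu=5, sigma^2=4

x_bar = sample.mean()            # sample mean, estimates mu
s2 = sample.var(ddof=1)          # ddof=1 gives the 1/(n-1) factor

assert np.isclose(s2, ((sample - x_bar) ** 2).sum() / (len(sample) - 1))
print(x_bar, s2)                 # close to 5 and 4
```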

• Central Limit Theorem – Let us have a random sample $X_1, \ldots, X_n$ following a given distribution with mean $\mu$ and variance $\sigma^2$, then we have:

$$\overline{X} \underset{n \to +\infty}{\sim} N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
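The theorem is easy to see by simulation: even for a skewed distribution, the sample mean concentrates around $\mu$ with spread $\sigma/\sqrt{n}$. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 10_000
# Exp(1) is heavily skewed, with mu = 1 and sigma = 1
means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)

print(means.mean())   # ~ 1.0, i.e. mu
print(means.std())    # ~ 1 / sqrt(200) ~ 0.0707, i.e. sigma / sqrt(n)
```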
