Unit 2
Probability Distributions
Sample Space
• The sample space Ω is the set of possible outcomes of an experiment.
Points ω in Ω are called sample outcomes, realizations, or elements.
Subsets of Ω are called events.
• Example. If we toss a coin twice then Ω = {HH, HT, TH, TT}. The event
that the first toss is heads is A = {HH, HT}.
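A quick sketch of this example in code (the assumption that all four outcomes are equally likely is mine):

```python
from itertools import product

# Sample space for two coin tosses and the event "first toss is heads"
omega = set(product("HT", repeat=2))      # Ω = {HH, HT, TH, TT} as tuples
A = {o for o in omega if o[0] == "H"}     # event A = {HH, HT}
print(len(A) / len(omega))                # P(A) = 0.5 if outcomes are equally likely
```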
• Conditional Probability
• P(X|Y)
• Probability of X given Y
Independent and Conditional Probabilities
• Assuming that P(B) > 0, the conditional probability of A given B:
• P(A|B)=P(AB)/P(B)
• Product Rule: P(AB) = P(A|B)P(B) = P(B|A)P(A)
Example
• P(T=1|D=1) = .95 (true positive rate)
• P(T=1|D=0) = .10 (false positive rate)
• P(D=1) = .01 (prior)
• P(D=1|T=1) = ?
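A minimal sketch of how the product rule and Bayes' rule answer the question above, using the numbers on this slide (variable names are mine):

```python
# Diagnostic-test example: compute P(D=1 | T=1) from the given quantities
p_t1_given_d1 = 0.95   # P(T=1 | D=1), true positive rate
p_t1_given_d0 = 0.10   # P(T=1 | D=0), false positive rate
p_d1 = 0.01            # P(D=1), prior probability of disease

# Marginalize: P(T=1) = P(T=1|D=1)P(D=1) + P(T=1|D=0)P(D=0)
p_t1 = p_t1_given_d1 * p_d1 + p_t1_given_d0 * (1 - p_d1)

# Bayes' rule: P(D=1|T=1) = P(T=1|D=1)P(D=1) / P(T=1)
p_d1_given_t1 = p_t1_given_d1 * p_d1 / p_t1
print(p_d1_given_t1)   # ≈ 0.0876: even after a positive test, disease is unlikely
```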
Random Variables
• A random variable X assigns a number X(ω) to each outcome ω in the sample space.
• Example: Flip a coin ten times. Let X(ω) be the number of heads in the
sequence ω. If ω = HHTHHTHHTT, then X(ω) = 6.
Discrete vs Continuous Random Variables
• Discrete: can only take a countable number of values
• Example: number of heads
• Distribution defined by probability mass function (pmf)
• Marginalization: P(X = x) = Σ_y P(X = x, Y = y)
• Variance: Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²
• Nth moment = E[X^n]
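As a sketch of these definitions (my own illustration, not from the slides), the pmf of the number of heads in ten fair flips can be built by enumerating the sample space, and the mean, variance, and second moment follow directly:

```python
from itertools import product

# pmf of X = number of heads in 10 fair coin flips, built by enumeration
outcomes = list(product("HT", repeat=10))        # the sample space Ω
pmf = {}
for omega in outcomes:
    x = omega.count("H")                         # X(ω) = number of heads
    pmf[x] = pmf.get(x, 0) + 1 / len(outcomes)

mean = sum(x * p for x, p in pmf.items())                    # E[X]
var = sum((x - mean) ** 2 * p for x, p in pmf.items())       # Var(X) = E[(X − E[X])²]
second_moment = sum(x ** 2 * p for x, p in pmf.items())      # E[X²], the 2nd moment
print(mean, var, second_moment)                              # 5.0, 2.5, 27.5
```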
Discrete Distribution
Bernoulli Distribution
• Input: x ∈ {0, 1}
• Parameter: μ = probability that x = 1
• Mean = E[x] = μ
• Variance = μ(1 − μ)
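A minimal sketch of the Bernoulli pmf and the mean/variance formulas above (the function name and the choice μ = 0.3 are mine):

```python
def bernoulli_pmf(x, mu):
    """Bern(x | mu) = mu**x * (1 - mu)**(1 - x) for x in {0, 1}."""
    return mu ** x * (1 - mu) ** (1 - x)

mu = 0.3
mean = sum(x * bernoulli_pmf(x, mu) for x in (0, 1))               # E[x] = μ
var = sum((x - mean) ** 2 * bernoulli_pmf(x, mu) for x in (0, 1))  # μ(1 − μ)
print(mean, var)  # 0.3, 0.21
```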
Discrete Distribution
Binomial Distribution
• Input: m = number of successes
• Parameters: N = number of trials
μ = probability of success
• Example: Probability of flipping heads m times out of N independent
flips with success probability μ
• Mean = E[x] = Nμ
• Variance = Nμ(1 − μ)
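As a sketch (function name is mine), the binomial pmf for m successes in N trials and the mean Nμ can be checked directly:

```python
from math import comb

def binomial_pmf(m, N, mu):
    """P(m successes in N trials) = C(N, m) * mu**m * (1 - mu)**(N - m)."""
    return comb(N, m) * mu ** m * (1 - mu) ** (N - m)

N, mu = 10, 0.5
print(binomial_pmf(6, N, mu))                                  # P(6 heads in 10 flips) ≈ 0.205
print(sum(m * binomial_pmf(m, N, mu) for m in range(N + 1)))   # mean = N*mu = 5.0
```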
Discrete Distribution
Multinomial Distribution
• The multinomial distribution is a generalization of the binomial
distribution to k categories instead of just binary (success/fail)
• For n independent trials each of which leads to a success for exactly
one of k categories, the multinomial distribution gives the probability
of any particular combination of numbers of successes for the various
categories
• Example: Rolling a die N times
• Input: m1 … mK (counts)
• Parameters: N = number of trials
μ = μ1 … μK, the probability of success for each category, with Σ_k μk = 1
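A small sketch of the multinomial pmf for the die example (function name is mine):

```python
from math import factorial

def multinomial_pmf(counts, mus):
    """P(m1, …, mK | N, μ) = N! / (m1! ⋯ mK!) * Π_k μk**mk."""
    N = sum(counts)
    coef = factorial(N)
    for m in counts:
        coef //= factorial(m)          # multinomial coefficient
    prob = 1.0
    for m, mu in zip(counts, mus):
        prob *= mu ** m
    return coef * prob

# Probability of seeing each face exactly once in N = 6 rolls of a fair die
print(multinomial_pmf([1, 1, 1, 1, 1, 1], [1 / 6] * 6))   # ≈ 0.0154
```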
Gaussian Distribution
• Aka the normal distribution
• Widely used model for the distribution of continuous variables
• In the case of a single variable x, the Gaussian distribution can be
written in the form
N(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))
where μ is the mean and σ² is the variance
See ‘Gibbs Sampling for the Uninitiated’ for a straightforward introduction to parameter
estimation: http://www.umiacs.umd.edu/~resnik/pubs/LAMP-TR-153.pdf
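A minimal sketch (function name is mine) of evaluating the univariate Gaussian density written above:

```python
from math import sqrt, pi, exp

def gaussian_pdf(x, mu, sigma2):
    """N(x | mu, sigma^2) = exp(-(x - mu)**2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)."""
    return exp(-(x - mu) ** 2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

print(gaussian_pdf(0.0, 0.0, 1.0))   # ≈ 0.3989, the standard normal density at its mean
```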
I.I.D.
• Random variables are independent and identically distributed (i.i.d.) if
each has the same probability distribution as the others and all are
mutually independent.
• Likelihood = p(x1, …, xN | θ) = Π_n p(xn | θ) when x1, …, xN are i.i.d.
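As a sketch of how the i.i.d. assumption makes the likelihood a product over observations (the Bernoulli model and the made-up data are mine):

```python
# Likelihood of i.i.d. Bernoulli data factors as L(mu) = Π_n p(x_n | mu)
data = [1, 0, 1, 1, 0, 1, 1, 1]             # e.g. coin flips, 1 = heads

def likelihood(mu, xs):
    L = 1.0
    for x in xs:
        L *= mu ** x * (1 - mu) ** (1 - x)  # Bernoulli term for one observation
    return L

print(likelihood(0.5, data), likelihood(0.75, data))  # μ = 0.75 explains this sample better
```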
MLE for parameter estimation
• Likelihood = p(x | μ, σ²) = Π_n N(xn | μ, σ²) for i.i.d. Gaussian observations x = (x1, …, xN)
• Log likelihood = ln p(x | μ, σ²) = −(1/(2σ²)) Σ_n (xn − μ)² − (N/2) ln σ² − (N/2) ln(2π)
For a good discussion of Maximum likelihood estimators and least squares see
http://people.math.gatech.edu/~ecroot/3225/maximum_likelihood.pdf
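The reference above works through the estimators in detail; as a small sketch (the data here is made up), the Gaussian maximum likelihood estimates are the sample mean and the biased sample variance:

```python
# ML estimates for a univariate Gaussian from i.i.d. data:
#   mu_ML     = (1/N) * Σ_n x_n
#   sigma2_ML = (1/N) * Σ_n (x_n - mu_ML)**2   (note: divides by N, not N - 1)
data = [2.1, 1.9, 2.4, 2.0, 1.6, 2.2]
N = len(data)
mu_ml = sum(data) / N
sigma2_ml = sum((x - mu_ml) ** 2 for x in data) / N
print(mu_ml, sigma2_ml)
```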
Maximum Likelihood and Least Squares
• y(x, w) is estimating the target t
[Figure: red line = the fitted model y(x, w); green lines = the errors between the predictions and the targets t]
• Log likelihood: ln p(t | x, w, β) = −(β/2) Σ_n (y(xn, w) − tn)² + (N/2) ln β − (N/2) ln(2π)
• Least squares: maximizing the log likelihood with respect to w is equivalent to minimizing the sum-of-squares error
• Minimize: E(w) = (1/2) Σ_n (y(xn, w) − tn)²
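A toy sketch (my own example, fitting a straight line y(x, w) = w0 + w1·x) of the point above: minimizing the sum-of-squares error E(w) gives the same w as maximizing the Gaussian log likelihood:

```python
# Fit y(x, w) = w0 + w1 * x by minimizing E(w) = 1/2 * Σ_n (y(x_n, w) - t_n)**2,
# which is the maximum likelihood solution under Gaussian noise on t.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ts = [1.1, 2.9, 5.2, 6.8, 9.1]      # roughly t = 1 + 2x plus noise

# Closed-form least-squares solution for a straight line
N = len(xs)
x_mean, t_mean = sum(xs) / N, sum(ts) / N
w1 = sum((x - x_mean) * (t - t_mean) for x, t in zip(xs, ts)) / sum((x - x_mean) ** 2 for x in xs)
w0 = t_mean - w1 * x_mean

sse = 0.5 * sum((w0 + w1 * x - t) ** 2 for x, t in zip(xs, ts))  # the quantity being minimized
print(w0, w1, sse)
```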