Unit 2
Probability Distributions
Sample Space
• The sample space Ω is the set of possible outcomes of an experiment.
Points ω in Ω are called sample outcomes, realizations, or elements.
Subsets of Ω are called events.
• Example. If we toss a coin twice then Ω = {HH, HT, TH, TT}. The event
that the first toss is heads is A = {HH, HT}.
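A quick sketch of this example in code (the assumption that all four outcomes are equally likely is mine):

```python
from itertools import product

# Sample space for two coin tosses and the event "first toss is heads"
omega = set(product("HT", repeat=2))      # Ω = {HH, HT, TH, TT} as tuples
A = {o for o in omega if o[0] == "H"}     # event A = {HH, HT}
print(len(A) / len(omega))                # P(A) = 0.5 if outcomes are equally likely
```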
• Conditional Probability
• P(X|Y)
• Probability of X given Y
Independent and Conditional Probabilities
• Assuming that P(B) > 0, the conditional probability of A given B:
• P(A|B)=P(AB)/P(B)
• Product Rule: P(AB) = P(A|B)P(B) = P(B|A)P(A)
Example
• P(T=1|D=1) = .95 (true positive rate)
• P(T=1|D=0) = .10 (false positive rate)
• P(D=1) = .01 (prior)
• P(D=1|T=1) = ?
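A minimal sketch of how the product rule and Bayes' rule answer the question above, using the numbers on this slide (variable names are mine):

```python
# Diagnostic-test example: compute P(D=1 | T=1) from the given quantities
p_t1_given_d1 = 0.95   # P(T=1 | D=1), true positive rate
p_t1_given_d0 = 0.10   # P(T=1 | D=0), false positive rate
p_d1 = 0.01            # P(D=1), prior probability of disease

# Marginalize: P(T=1) = P(T=1|D=1)P(D=1) + P(T=1|D=0)P(D=0)
p_t1 = p_t1_given_d1 * p_d1 + p_t1_given_d0 * (1 - p_d1)

# Bayes' rule: P(D=1|T=1) = P(T=1|D=1)P(D=1) / P(T=1)
p_d1_given_t1 = p_t1_given_d1 * p_d1 / p_t1
print(p_d1_given_t1)   # ≈ 0.0876: even after a positive test, disease is unlikely
```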
Random Variables
• A random variable X assigns a number X(ω) to each outcome ω in the sample space.
• Example: Flip a coin ten times. Let X(ω) be the number of heads in the
sequence ω. If ω = HHTHHTHHTT, then X(ω) = 6.
Discrete vs Continuous Random Variables
• Discrete: can only take a countable number of values
• Example: number of heads
• Distribution defined by probability mass function (pmf)
• Marginalization: P(X = x) = Σ_y P(X = x, Y = y)
• Variance: Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²
• Nth moment = E[X^n]
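As a sketch of these definitions (my own illustration, not from the slides), the pmf of the number of heads in ten fair flips can be built by enumerating the sample space, and the mean, variance, and second moment follow directly:

```python
from itertools import product

# pmf of X = number of heads in 10 fair coin flips, built by enumeration
outcomes = list(product("HT", repeat=10))        # the sample space Ω
pmf = {}
for omega in outcomes:
    x = omega.count("H")                         # X(ω) = number of heads
    pmf[x] = pmf.get(x, 0) + 1 / len(outcomes)

mean = sum(x * p for x, p in pmf.items())                    # E[X]
var = sum((x - mean) ** 2 * p for x, p in pmf.items())       # Var(X) = E[(X − E[X])²]
second_moment = sum(x ** 2 * p for x, p in pmf.items())      # E[X²], the 2nd moment
print(mean, var, second_moment)                              # 5.0, 2.5, 27.5
```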
Discrete Distribution
Bernoulli Distribution
• Input: x ∈ {0, 1}
• Parameter: μ = probability that x = 1
• Mean = E[x] = μ
• Variance = μ(1 − μ)
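A minimal sketch of the Bernoulli pmf and the mean/variance formulas above (the function name and the choice μ = 0.3 are mine):

```python
def bernoulli_pmf(x, mu):
    """Bern(x | mu) = mu**x * (1 - mu)**(1 - x) for x in {0, 1}."""
    return mu ** x * (1 - mu) ** (1 - x)

mu = 0.3
mean = sum(x * bernoulli_pmf(x, mu) for x in (0, 1))               # E[x] = μ
var = sum((x - mean) ** 2 * bernoulli_pmf(x, mu) for x in (0, 1))  # μ(1 − μ)
print(mean, var)  # 0.3, 0.21
```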
Discrete Distribution
Binomial Distribution
• Input: m = number of successes
• Parameters: N = number of trials
μ = probability of success
• Example: Probability of flipping heads m times out of N independent
flips with success probability μ
• Mean = E[x] = Nμ
• Variance = Nμ(1 − μ)
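As a sketch (function name is mine), the binomial pmf for m successes in N trials and the mean Nμ can be checked directly:

```python
from math import comb

def binomial_pmf(m, N, mu):
    """P(m successes in N trials) = C(N, m) * mu**m * (1 - mu)**(N - m)."""
    return comb(N, m) * mu ** m * (1 - mu) ** (N - m)

N, mu = 10, 0.5
print(binomial_pmf(6, N, mu))                                  # P(6 heads in 10 flips) ≈ 0.205
print(sum(m * binomial_pmf(m, N, mu) for m in range(N + 1)))   # mean = N*mu = 5.0
```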
Discrete Distribution
Multinomial Distribution
• The multinomial distribution is a generalization of the binomial
distribution to k categories instead of just binary (success/fail)
• For n independent trials each of which leads to a success for exactly
one of k categories, the multinomial distribution gives the probability
of any particular combination of numbers of successes for the various
categories
• Example: Rolling a die N times
• Input: m1 … mK (counts)
• Parameters: N = number of trials
μ = μ1 … μK, the probability of success for each category, with Σ_k μk = 1
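A small sketch of the multinomial pmf for the die example (function name is mine):

```python
from math import factorial

def multinomial_pmf(counts, mus):
    """P(m1, …, mK | N, μ) = N! / (m1! ⋯ mK!) * Π_k μk**mk."""
    N = sum(counts)
    coef = factorial(N)
    for m in counts:
        coef //= factorial(m)          # multinomial coefficient
    prob = 1.0
    for m, mu in zip(counts, mus):
        prob *= mu ** m
    return coef * prob

# Probability of seeing each face exactly once in N = 6 rolls of a fair die
print(multinomial_pmf([1, 1, 1, 1, 1, 1], [1 / 6] * 6))   # ≈ 0.0154
```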
Gaussian Distribution
• Aka the normal distribution
• Widely used model for the distribution of continuous variables
• In the case of a single variable x, the Gaussian distribution can be
written in the form
N(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))
where μ is the mean and σ² is the variance
See ‘Gibbs Sampling for the Uninitiated’ for a straightforward introduction to parameter
estimation: http://www.umiacs.umd.edu/~resnik/pubs/LAMP-TR-153.pdf
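A minimal sketch (function name is mine) of evaluating the univariate Gaussian density written above:

```python
from math import sqrt, pi, exp

def gaussian_pdf(x, mu, sigma2):
    """N(x | mu, sigma^2) = exp(-(x - mu)**2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)."""
    return exp(-(x - mu) ** 2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

print(gaussian_pdf(0.0, 0.0, 1.0))   # ≈ 0.3989, the standard normal density at its mean
```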
I.I.D.
• Random variables are independent and identically distributed (i.i.d.) if
each has the same probability distribution as the others and all are
mutually independent.
• Likelihood = p(x1, …, xN | θ) = Π_n p(xn | θ) when x1, …, xN are i.i.d.
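As a sketch of how the i.i.d. assumption makes the likelihood a product over observations (the Bernoulli model and the made-up data are mine):

```python
# Likelihood of i.i.d. Bernoulli data factors as L(mu) = Π_n p(x_n | mu)
data = [1, 0, 1, 1, 0, 1, 1, 1]             # e.g. coin flips, 1 = heads

def likelihood(mu, xs):
    L = 1.0
    for x in xs:
        L *= mu ** x * (1 - mu) ** (1 - x)  # Bernoulli term for one observation
    return L

print(likelihood(0.5, data), likelihood(0.75, data))  # μ = 0.75 explains this sample better
```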
MLE for parameter estimation
• Likelihood = p(x | μ, σ²) = Π_n N(xn | μ, σ²) for i.i.d. Gaussian observations x = (x1, …, xN)
• Log likelihood = ln p(x | μ, σ²) = −(1/(2σ²)) Σ_n (xn − μ)² − (N/2) ln σ² − (N/2) ln(2π)
For a good discussion of Maximum likelihood estimators and least squares see
http://people.math.gatech.edu/~ecroot/3225/maximum_likelihood.pdf
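The reference above works through the estimators in detail; as a small sketch (the data here is made up), the Gaussian maximum likelihood estimates are the sample mean and the biased sample variance:

```python
# ML estimates for a univariate Gaussian from i.i.d. data:
#   mu_ML     = (1/N) * Σ_n x_n
#   sigma2_ML = (1/N) * Σ_n (x_n - mu_ML)**2   (note: divides by N, not N - 1)
data = [2.1, 1.9, 2.4, 2.0, 1.6, 2.2]
N = len(data)
mu_ml = sum(data) / N
sigma2_ml = sum((x - mu_ml) ** 2 for x in data) / N
print(mu_ml, sigma2_ml)
```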
Maximum Likelihood and Least Squares
• y(x, w) is estimating the target t
[Figure: red line = the fitted model y(x, w); green lines = the errors between the predictions and the targets t]
• Log likelihood: ln p(t | x, w, β) = −(β/2) Σ_n (y(xn, w) − tn)² + (N/2) ln β − (N/2) ln(2π)
• Least squares: maximizing the log likelihood with respect to w is equivalent to minimizing the sum-of-squares error
• Minimize: E(w) = (1/2) Σ_n (y(xn, w) − tn)²
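A toy sketch (my own example, fitting a straight line y(x, w) = w0 + w1·x) of the point above: minimizing the sum-of-squares error E(w) gives the same w as maximizing the Gaussian log likelihood:

```python
# Fit y(x, w) = w0 + w1 * x by minimizing E(w) = 1/2 * Σ_n (y(x_n, w) - t_n)**2,
# which is the maximum likelihood solution under Gaussian noise on t.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ts = [1.1, 2.9, 5.2, 6.8, 9.1]      # roughly t = 1 + 2x plus noise

# Closed-form least-squares solution for a straight line
N = len(xs)
x_mean, t_mean = sum(xs) / N, sum(ts) / N
w1 = sum((x - x_mean) * (t - t_mean) for x, t in zip(xs, ts)) / sum((x - x_mean) ** 2 for x in xs)
w0 = t_mean - w1 * x_mean

sse = 0.5 * sum((w0 + w1 * x - t) ** 2 for x, t in zip(xs, ts))  # the quantity being minimized
print(w0, w1, sse)
```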