
Probabilistic models

Haixu Tang
School of Informatics
Probability
• Experiment: a procedure involving chance that leads to different results;
• Outcome: the result of a single trial of an experiment;
• Event: one or more outcomes of an experiment;
• Probability: the measure of how likely an event is.
Example: a fair 6-sided dice
• Outcome: the possible outcomes of this experiment are 1, 2, 3, 4, 5 and 6;
• Events: 1; 6; even;
• Probability: outcomes are equally likely to occur.
– P(A) = (number of ways event A can occur) / (total number of possible outcomes)
– P(1) = P(6) = 1/6; P(even) = 3/6 = 1/2
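A minimal Python sketch of this counting definition of probability; the outcomes list and the prob helper are illustrative names, not part of the lecture:

from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # a fair 6-sided dice

def prob(event):
    # P(A) = number of ways event A can occur / total number of outcomes
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

print(prob(lambda o: o == 1))      # 1/6
print(prob(lambda o: o % 2 == 0))  # P(even) = 1/2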
Probability distribution
• Probability distribution: the assignment of a probability P(x) to each outcome x.
• A fair dice: outcomes are equally likely to occur → the probability distribution over all six outcomes is P(x) = 1/6, x = 1, 2, …, 6.
• A loaded dice: outcomes are unequally likely to occur → the probability distribution over all six outcomes is P(x) = f(x), x = 1, 2, …, 6, with Σ_x f(x) = 1.
Example: DNA sequences
• Event: observing a DNA sequence S = s1s2…sn, si ∈ {A, C, G, T};
• Random sequence model (or independent and identically distributed, i.i.d., model): si occurs at random with probability P(si), independent of all other residues in the sequence;
• P(S) = ∏_{i=1}^{n} P(si)
• This model will be used as a background model (also called the null hypothesis).
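A minimal sketch of the i.i.d. background model, assuming hypothetical uniform base frequencies (base_prob and null_model_prob are illustrative names):

from math import log, prod

base_prob = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform frequencies

def null_model_prob(seq):
    # P(S) = product over i of P(s_i) under the i.i.d. model
    return prod(base_prob[s] for s in seq)

def null_model_log_prob(seq):
    # log-space version, avoiding underflow for long sequences
    return sum(log(base_prob[s]) for s in seq)

print(null_model_prob("ACGT"))  # 0.25**4 = 0.00390625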
Conditional probability
• P(i | D): the measure of how likely an event i happens under the condition D;
– Example: two dices D1, D2
• P(i | D1): probability of picking i using dice D1
• P(i | D2): probability of picking i using dice D2
Joint probability
• Two experiments X and Y
– P(X, Y) → joint probability (distribution) of experiments X and Y
– P(X, Y) = P(X | Y) P(Y) = P(Y | X) P(X)
– If P(X | Y) = P(X), X and Y are independent
• Example: experiment 1 (selecting a dice), experiment 2 (rolling the selected dice)
– P(y): y = D1 or D2
– P(i, D1) = P(i | D1) P(D1)
– If P(i | D1) = P(i | D2), the experiments are independent
Marginal probability
• P(X) = Σ_Y P(X | Y) P(Y)
• Example: experiment 1 (selecting a dice), experiment 2 (rolling the selected dice)
– P(y): y = D1 or D2
– P(i) = P(i | D1) P(D1) + P(i | D2) P(D2)
– If P(i | D1) = P(i | D2) (independent events):
• P(i) = P(i | D1) (P(D1) + P(D2)) = P(i | D1)
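The two-dice example, sketched in Python; the distributions below are hypothetical stand-ins for D1 and D2:

p_dice = {"D1": 0.5, "D2": 0.5}                       # P(y): which dice is selected
p_roll = {
    "D1": {i: 1 / 6 for i in range(1, 7)},            # a fair dice
    "D2": {6: 0.5, **{i: 0.1 for i in range(1, 6)}},  # a loaded dice
}

def joint(i, y):
    # P(i, y) = P(i | y) P(y)
    return p_roll[y][i] * p_dice[y]

def marginal(i):
    # P(i) = sum over y of P(i | y) P(y)
    return sum(joint(i, y) for y in p_dice)

print(joint(6, "D2"))  # 0.5 * 0.5 = 0.25
print(marginal(6))     # (1/6)(0.5) + (0.5)(0.5) ≈ 0.333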
Probability models
• A system that produces different outcomes with different probabilities.
• It can simulate a class of objects (events), assigning each an associated probability.
• Simple objects (processes) → probability distributions
Example: continuous variable
• The whole set of outcomes X (x ∈ X) can be infinite.
• Continuous variable x ∈ [x0, x1]
– As the interval shrinks, P(x0 ≤ x ≤ x1) → 0
– P(x − Δx/2 ≤ x ≤ x + Δx/2) = f(x)Δx; ∫ f(x) dx = 1
– f(x): probability density function (density, pdf)
– P(x ≤ y) = ∫_{x0}^{y} f(x) dx: cumulative distribution function (cdf)
Mean and variance
• Mean
– m = Σ_x x P(x)
• Variance
– σ^2 = Σ_k (k − m)^2 P(k)
– σ: standard deviation
Typical probability distributions
• Binomial distribution
• Gaussian distribution
• Multinomial distribution
• Dirichlet distribution
• Extreme value distribution (EVD)
Binomial distribution
• An experiment with binary outcomes: 0 or 1;
• Probability distribution of a single experiment: P('1') = p and P('0') = 1 − p;
• Probability distribution of N tries of the same experiment:
– Bi(k '1's out of N tries) = (N choose k) p^k (1 − p)^{N − k}
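A sketch of the binomial probability using only the standard library (binomial_pmf is an illustrative name):

from math import comb

def binomial_pmf(k, n, p):
    # Bi(k '1's out of n tries) = C(n, k) * p**k * (1 - p)**(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# e.g. exactly two '1's in twelve tries with p = 1/6
print(binomial_pmf(2, 12, 1 / 6))  # ≈ 0.296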
Gaussian distribution
• N → ∞: Bi → Gaussian distribution
• Define the new variable u = (k − m)/σ
– f(u) = (1/√(2π)) exp(−u^2 / 2)
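A sketch of this limit, comparing an exact binomial probability with the Gaussian density of the standardized variable u; the parameter values below are arbitrary:

from math import comb, exp, pi, sqrt

def gaussian_pdf(u):
    # standard normal density f(u) = exp(-u**2 / 2) / sqrt(2*pi)
    return exp(-u * u / 2) / sqrt(2 * pi)

N, p, k = 1000, 0.5, 520
m, sd = N * p, sqrt(N * p * (1 - p))          # binomial mean and standard deviation
exact = comb(N, k) * p**k * (1 - p)**(N - k)  # exact binomial probability
approx = gaussian_pdf((k - m) / sd) / sd      # Gaussian approximation
print(exact, approx)                          # both ≈ 0.0113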
Multinomial distribution
• An experiment with K independent outcomes with probabilities θi, i = 1, …, K, Σ_i θi = 1.
• Probability distribution of N tries of the same experiment, getting ni occurrences of outcome i, Σ_i ni = N:
– M(n | θ) = (N! / ∏_i ni!) ∏_{i=1}^{K} θi^{ni}
Example: a fair dice
• Probability: outcomes (1, 2, …, 6) are equally likely to occur
• Probability of rolling one dozen times (12) and getting each outcome twice:
– (12! / (2!)^6) (1/6)^12 ≈ 3.4 × 10^−3
Example: a loaded dice
• Probability: outcomes (1, 2, …, 6) are unequally likely to occur: P(6) = 0.5, P(1) = P(2) = … = P(5) = 0.1
• Probability of rolling one dozen times (12) and getting each outcome twice:
– (12! / (2!)^6) × 0.5^2 × 0.1^10 ≈ 1.87 × 10^−4
Dirichlet distribution
• Outcomes: θ = (θ1, θ2, …, θK)
• Density: D(θ | α) ~ ∏_{i=1}^{K} θi^{αi − 1}, with Σ_{i=1}^{K} θi = 1
• α = (α1, α2, …, αK) are constants → a different α gives a different probability distribution over θ.
• K = 2 → Beta distribution
Example: dice factories
• Dice factories produce all kinds of dices: θ = (θ(1), θ(2), …, θ(6))
• A dice factory distinguishes itself from the others by its parameters α
• The probability of producing a dice θ in the factory α is determined by D(θ | α)
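A sketch of drawing a dice θ from a factory with parameters α, using the standard gamma-normalization construction of the Dirichlet (sample_dirichlet is an illustrative name):

import random

def sample_dirichlet(alpha):
    # normalize independent gamma draws to get theta ~ D(theta | alpha)
    draws = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

theta = sample_dirichlet([1.0] * 6)  # a factory with uniform alpha
print(theta, sum(theta))             # one random dice; probabilities sum to 1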
Extreme value distribution
• Outcome: the largest number among N samples from a density g(x) is larger than x;
• For a variety of densities g(x), the distribution of the maximum approaches the same extreme value (Gumbel) form, with location μ and scale λ:
– pdf: f(x) = λ exp(−λ(x − μ)) exp(−exp(−λ(x − μ)))
– cdf: F(x) = P(max ≤ x) = exp(−exp(−λ(x − μ)))
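A sketch of the Gumbel pdf and cdf above; the default parameter values here are arbitrary:

from math import exp

def gumbel_cdf(x, mu=0.0, lam=1.0):
    # P(max <= x) = exp(-exp(-lam * (x - mu)))
    return exp(-exp(-lam * (x - mu)))

def gumbel_pdf(x, mu=0.0, lam=1.0):
    # f(x) = lam * exp(-lam*(x - mu)) * exp(-exp(-lam*(x - mu)))
    z = -lam * (x - mu)
    return lam * exp(z) * exp(-exp(z))

print(gumbel_cdf(3.0))  # ≈ 0.951: most maxima fall below 3
print(gumbel_pdf(0.0))  # ≈ 0.368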
Probabilistic model
• Selecting a model
– Probability distributions
– Machine learning methods
• Neural nets
• Support Vector Machines (SVMs)
– Probabilistic graphical models
• Markov models
• Hidden Markov models
• Bayesian models
• Stochastic grammars
• Model → data (sampling)
• Data → model (inference)
Sampling
• Probabilistic model with parameters θ → P(x | θ) for event x;
• Sampling: generate a large set of events xi with probability P(xi | θ);
• Random number generator: the function rand() picks a number randomly from the interval [0, 1) with uniform density;
• Sampling from a probabilistic model → transforming P(xi | θ) to a uniform distribution
– For a finite set X (xi ∈ X), find i such that P(x1) + … + P(xi−1) < rand(0, 1) ≤ P(x1) + … + P(xi−1) + P(xi)
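A sketch of this transformation for a finite outcome set, scanning the cumulative sums until they exceed a uniform draw (sample is an illustrative helper):

import random

def sample(dist):
    # find i such that P(x1)+...+P(x_{i-1}) <= u < P(x1)+...+P(x_i)
    u = random.random()  # uniform on [0, 1)
    cumulative = 0.0
    for outcome, p in dist.items():
        cumulative += p
        if u < cumulative:
            return outcome
    return outcome  # guard against floating-point rounding at the boundary

loaded = {6: 0.5, 1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1}
rolls = [sample(loaded) for _ in range(100_000)]
print(rolls.count(6) / len(rolls))  # ≈ 0.5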
Inference (ML)
• Estimating the model parameters (inference): from large sets of trusted examples
• Given a set of data D (training set), find the model parameters θ with maximal likelihood P(D | θ);
Example: a loaded dice
• Loaded dice: estimate the parameters θ1, θ2, …, θ6 from N observations D = d1, d2, …, dN
• θi = ni / N, where ni is the number of observations of outcome i, is the maximum likelihood solution (11.5)
• Inference from counts
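A sketch of maximum likelihood inference from counts (ml_estimate and the data are illustrative):

from collections import Counter

def ml_estimate(observations, outcomes=range(1, 7)):
    # maximum likelihood: theta_i = n_i / N, the observed frequency of outcome i
    counts = Counter(observations)
    n = len(observations)
    return {o: counts[o] / n for o in outcomes}

data = [6, 6, 1, 3, 6, 2, 6, 5, 6, 6]  # hypothetical observed rolls
print(ml_estimate(data))               # theta_6 = 6/10 = 0.6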


Bayesian statistics
• P(X | Y) = P(Y | X) P(X) / P(Y)
• P(θ | D) = P(θ) [P(D | θ) / P(D)] = P(θ) P(D | θ) / Σ_θ′ P(D | θ′) P(θ′)
• P(θ) → prior probability; P(θ | D) → posterior probability
Example: two dices
• Fair dice: prior 0.99; loaded dice: prior 0.01, with P(6) = 0.5, P(1) = … = P(5) = 0.1
• 3 consecutive '6's:
– P(loaded | 3 '6's) = P(loaded) × [P(3 '6's | loaded) / P(3 '6's)] = 0.01 × (0.5^3 / C)
– P(fair | 3 '6's) = P(fair) × [P(3 '6's | fair) / P(3 '6's)] = 0.99 × ((1/6)^3 / C)
– Posterior ratio: P(loaded | 3 '6's) / P(fair | 3 '6's) < 1
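The same computation as a sketch in Python, with C computed explicitly as the marginal P(3 '6's):

prior = {"fair": 0.99, "loaded": 0.01}
p6 = {"fair": 1 / 6, "loaded": 0.5}

likelihood = {m: p6[m] ** 3 for m in prior}              # P(3 '6's | model)
evidence = sum(prior[m] * likelihood[m] for m in prior)  # C = P(3 '6's)
posterior = {m: prior[m] * likelihood[m] / evidence for m in prior}

print(posterior)                                # the fair dice is still more probable
print(posterior["loaded"] / posterior["fair"])  # ≈ 0.27 < 1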
Inference from counts: including prior knowledge
• Prior knowledge is important when the data is scarce
• Use the Dirichlet distribution as the prior:
– P(θ | n) = D(θ | α) [P(n | θ) / P(n)]
– Equivalent to adding αi as pseudo-counts to the observations ni (11.5)
– We can forget about statistics and use pseudo-counts in the parameter estimation!
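A sketch of estimation with Dirichlet pseudo-counts: the posterior mean adds αi to each observed count (the names and data below are illustrative):

from collections import Counter

def pseudo_count_estimate(observations, alpha, outcomes=range(1, 7)):
    # posterior mean under a Dirichlet prior:
    # theta_i = (n_i + alpha_i) / (N + sum of alpha)
    counts = Counter(observations)
    total = len(observations) + sum(alpha.values())
    return {o: (counts[o] + alpha[o]) / total for o in outcomes}

data = [6, 6, 6]                       # scarce data: three rolls, all '6's
alpha = {o: 1.0 for o in range(1, 7)}  # assumed uniform pseudo-counts
print(pseudo_count_estimate(data, alpha))  # theta_6 = 4/9; each other theta_i = 1/9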
Entropy
• Probability distribution P(xi) over K events
• H(x) = −Σ_i P(xi) log P(xi)
– Maximized for the uniform distribution P(xi) = 1/K
– A measure of average uncertainty
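A sketch of the entropy computation (in bits), comparing the fair and loaded dice from earlier:

from math import log2

def entropy(dist):
    # H = -sum of p * log2(p) over outcomes with nonzero probability
    return -sum(p * log2(p) for p in dist.values() if p > 0)

fair = {i: 1 / 6 for i in range(1, 7)}
loaded = {6: 0.5, **{i: 0.1 for i in range(1, 6)}}
print(entropy(fair))    # log2(6) ≈ 2.585 bits, the maximum for K = 6
print(entropy(loaded))  # ≈ 2.161 bits: less average uncertainty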
Mutual information
• Measure of the independence of two random variables X and Y
• P(X | Y) = P(X), X and Y are independent → P(X, Y) / (P(X) P(Y)) = 1
• M(X; Y) = Σ_{x,y} P(x, y) log [P(x, y) / (P(x) P(y))]
– M(X; Y) = 0 → independent
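A sketch computing M(X; Y) from a joint distribution given as a dict keyed by (x, y) pairs (mutual_information is an illustrative helper):

from math import log2

def mutual_information(joint):
    # M(X;Y) = sum over (x, y) of P(x,y) * log2(P(x,y) / (P(x) P(y)))
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p  # marginal P(x)
        py[y] = py.get(y, 0.0) + p  # marginal P(y)
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

independent = {(x, y): 0.25 for x in "HT" for y in "HT"}
correlated = {("H", "H"): 0.5, ("T", "T"): 0.5}
print(mutual_information(independent))  # 0.0: independent
print(mutual_information(correlated))   # 1.0 bit: fully dependent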
