Data Mining - Utrecht University - 10. Slides
Ad Feelders
Universiteit Utrecht
Do you like noodles?

                  Answer
Race    Gender    Yes    No
Black   Male       10    40
        Female     30    20
White   Male      100   100
        Female    120    80

[Figure: graph on the nodes G (Gender), R (Race), A (Answer)]

G ⊥⊥ R | A
Strange: Gender and Race are prior to Answer, but this model says they
are independent given Answer!
Ad Feelders ( Universiteit Utrecht ) Data Mining 3 / 49
Do you like noodles?
           Race
Gender    Black   White
Male        50     200
Female      50     200

From this table we conclude that Race and Gender are independent in the
data.

[Figure: graph on the nodes G, R, A]

G ⊥⊥ R,  but  G ⊥̸⊥ R | A
Answer = Yes:
           Race
Gender    Black   White
Male        10     100
Female      30     120

Answer = No:
           Race
Gender    Black   White
Male        40     100
Female      20      80
From these tables we conclude that Race and Gender are dependent given
Answer.
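The independence claims can be checked numerically on the counts above with odds ratios (a quick sketch; the odds_ratio helper is my own illustration, not part of the slides):

```python
# Check (conditional) independence in the noodle-survey counts from the tables.

def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]]; 1.0 means independence."""
    return (a * d) / (b * c)

# Marginal Gender x Race table: Male (50, 200), Female (50, 200).
print(odds_ratio(50, 200, 50, 200))   # 1.0 -> G and R independent in the data

# Conditional on Answer = Yes: Male (10, 100), Female (30, 120).
print(odds_ratio(10, 100, 30, 120))   # 0.4 -> dependent given A

# Conditional on Answer = No: Male (40, 100), Female (20, 80).
print(odds_ratio(40, 100, 20, 80))    # 1.6 -> dependent given A
```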
Corresponding to this ordering we can use the product rule to factorize the
joint distribution of X1, X2, …, Xk as

P(X1, …, Xk) = ∏_{j=1}^{k} P(Xj | X1, …, X_{j−1}).

In this factorization, Xi can be dropped from the conditioning set of Xj when

j ⊥⊥ i | {1, …, j} \ {i, j}

More loosely:

j ⊥⊥ i | prior variables

Compare this to pairwise independence:

j ⊥⊥ i | rest
[Figures: seven graphs on the nodes 1, 2, 3, 4]
P(X1, …, Xk) = ∏_{i=1}^{k} P(Xi | Xpa(i))
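The factorization over parent sets can be made concrete in a few lines of code. This is a minimal sketch: the tiny DAG, the CPT values, and the dict encoding are all made up for illustration, not taken from the slides.

```python
from itertools import product

# parents[i] lists the parent indices of node i (0-based):
# X1 -> X3 <- X2, X3 -> X4 (an invented example DAG).
parents = {0: [], 1: [], 2: [0, 1], 3: [2]}

# cpt[i] maps (x_i, x_pa(i)) -> P(x_i | x_pa(i)); all variables binary here.
cpt = {
    0: {(0, ()): 0.5, (1, ()): 0.5},
    1: {(0, ()): 0.6, (1, ()): 0.4},
    2: {(1, (0, 0)): 0.9, (0, (0, 0)): 0.1,
        (1, (0, 1)): 0.7, (0, (0, 1)): 0.3,
        (1, (1, 0)): 0.8, (0, (1, 0)): 0.2,
        (1, (1, 1)): 0.2, (0, (1, 1)): 0.8},
    3: {(1, (0,)): 0.3, (0, (0,)): 0.7,
        (1, (1,)): 0.6, (0, (1,)): 0.4},
}

def joint(x):
    """P(X1=x[0], ..., Xk=x[k-1]) as the product of the local CPTs."""
    p = 1.0
    for i, pa in parents.items():
        p *= cpt[i][(x[i], tuple(x[j] for j in pa))]
    return p

# Because every CPT is normalized, the joint sums to 1 over all configurations.
print(sum(joint(x) for x in product([0, 1], repeat=4)))
```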
[Figures: two graphs on the nodes X1, X2, X3, X4, X5]
To verify

i ⊥⊥ j | S

construct the moral graph of the smallest ancestral set containing {i, j} ∪ S,
and check whether S separates i from j in this undirected graph.

[Figure: graphs on the nodes X1, X2, X3, X4, X5]
[Figures: three graphs on the nodes 1, 2, 3]
Learning Bayesian Networks
L = p(1)^{n(1)} (1 − p(1))^{n − n(1)}

Take the derivative of the log-likelihood with respect to p(1), equate it to
zero, and solve for p(1):

d log L / d p(1) = n(1)/p(1) − (n − n(1))/(1 − p(1)) = 0,

since d log x / dx = 1/x (where log is the natural logarithm). Solving gives
p̂(1) = n(1)/n. More generally, for J categories,

p̂(j) = n(j)/n,   j = 1, 2, …, J

This relative frequency is also the maximum likelihood estimate.
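The estimate p̂(j) = n(j)/n is just a relative frequency, which is easy to compute directly. A minimal sketch (the helper name and the sample data are mine, not from the slides):

```python
from collections import Counter

def mle_categorical(data):
    """ML estimate of a categorical distribution: p_hat(j) = n(j) / n."""
    n = len(data)
    return {j: c / n for j, c in Counter(data).items()}

# Made-up sample with n = 8 observations over categories {1, 2, 3}.
sample = [1, 1, 2, 1, 3, 2, 1, 1]
print(mle_categorical(sample))   # {1: 0.625, 2: 0.25, 3: 0.125}
```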
For a Bayesian network we must estimate the conditional probabilities

p(Xi | Xpa(i)),   i = 1, 2, …, k

The log-likelihood is

L = ∑_{i=1}^{k} ∑_{xi, xpa(i)} n(xi, xpa(i)) log p(xi | xpa(i)),

which is maximized by

p̂(xi | xpa(i)) = n(xi, xpa(i)) / n(xpa(i)),
where
n(xi , xpa(i) ) is the number of records in the data with
Xi = xi and Xpa(i) = xpa(i) , and
n(xpa(i) ) is the number of records in the data with Xpa(i) = xpa(i) .
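These counts-and-divide estimates can be computed generically. A sketch, assuming records are encoded as dicts mapping variable name to value (the encoding and the helper name mle_cpt are my own, not from the slides):

```python
from collections import Counter

def mle_cpt(data, child, parents):
    """ML estimate of a CPT: p_hat(x_i | x_pa) = n(x_i, x_pa) / n(x_pa)."""
    joint = Counter((row[child], tuple(row[p] for p in parents)) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    return {(xc, pa): c / marg[pa] for (xc, pa), c in joint.items()}

# Tiny made-up example: estimate P(B | A) from three records.
rows = [{"A": 1, "B": 1}, {"A": 1, "B": 2}, {"A": 2, "B": 1}]
print(mle_cpt(rows, "B", ["A"]))   # {(1, (1,)): 0.5, (2, (1,)): 0.5, (1, (2,)): 1.0}
```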
[Figure: DAG with arcs X1 → X3, X2 → X3, X3 → X4]
P(X1, X2, X3, X4) = p1(X1) p2(X2) p3|12(X3 | X1, X2) p4|3(X4 | X3)
Now we have to estimate the following parameters (X4 ternary, the rest binary):

p4|3(1|1)   p4|3(2|1)   p4|3(3|1) = 1 − p4|3(1|1) − p4|3(2|1)
p4|3(1|2)   p4|3(2|2)   p4|3(3|2) = 1 − p4|3(1|2) − p4|3(2|2)
obs X1 X2 X3 X4
1 1 1 1 1
2 1 1 1 1
3 1 1 2 1
4 1 2 2 1
5 1 2 2 2
6 2 1 1 2
7 2 1 2 3
8 2 1 2 3
9 2 2 2 3
10 2 2 1 3
p̂1(1) = n(x1 = 1) / n = 5/10 = 1/2
Maximum Likelihood Estimation
p̂2(1) = n(x2 = 1) / n = 6/10
p̂3|1,2(1|1, 1) = n(x1 = 1, x2 = 1, x3 = 1) / n(x1 = 1, x2 = 1) = 2/3
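The estimates in this worked example can be reproduced directly from the ten observations in the data table (the variable names below are mine):

```python
# The 10 observations from the slides, as tuples (X1, X2, X3, X4).
data = [(1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 2, 1), (1, 2, 2, 1), (1, 2, 2, 2),
        (2, 1, 1, 2), (2, 1, 2, 3), (2, 1, 2, 3), (2, 2, 2, 3), (2, 2, 1, 3)]

n = len(data)
p1_1 = sum(1 for r in data if r[0] == 1) / n          # p̂1(1) = 5/10
p2_1 = sum(1 for r in data if r[1] == 1) / n          # p̂2(1) = 6/10
num = sum(1 for r in data if r[:3] == (1, 1, 1))      # n(x1=1, x2=1, x3=1)
den = sum(1 for r in data if r[:2] == (1, 1))         # n(x1=1, x2=1)
print(p1_1, p2_1, num / den)                          # -> 0.5, 0.6, 2/3
```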