Lecture 22: Introduction To Log-Linear Models: Dipankar Bandyopadhyay, PH.D
[Figure: a cube, depicting the cells of a three-way contingency table]
Also recall that Bayes' Law states that, for any two events A and B,
$$P(A \mid B) = \frac{P(AB)}{P(B)}$$
and that, when A and B are independent,
$$P(A \mid B) = \frac{P(A)P(B)}{P(B)} = P(A)$$
Definitions:
Suppose that a single multinomial applies to the entire three-way table, with cell probabilities
equal to
$$\pi_{ijk} = P(X = i, Y = j, Z = k)$$
Let
$$\pi_{\cdot jk} = \sum_i P(X = i, Y = j, Z = k) = P(Y = j, Z = k)$$
Then,
$$\pi_{ijk} = P(X = i, Z = k)\,P(Y = j \mid X = i, Z = k)$$
Or,
$$\log \mu_{ij} = \lambda + \lambda^X_i + \lambda^Y_j$$
• Testing $\lambda^{XY}_{ij} = 0$ is a test of independence
Cold incidence among French skiers (Pauling, Proceedings of the National Academy of
Sciences, 1971).
OUTCOME
NO
|COLD | COLD | Total
T ---------+--------+--------+
R VITAMIN | | |
E C | 17 | 122 | 139
A | | |
T ---------+--------+--------+
M NO | | |
E VITAMIN | 31 | 109 | 140
N C | | |
T ---------+--------+--------+
Total 48 231 279
Regardless of how these data were actually collected, we have shown that the estimate of
the odds ratio is the same for all designs, as is the likelihood ratio test and Pearson’s
chi-square for independence.
Lecture 22: Introduction to Log-linear Models – p. 13/59
Using SAS Proc Freq
data one;
input vitc cold count;
cards;
1 1 17
1 2 122
2 1 31
2 2 109
;
proc freq;
table vitc*cold / chisq measures;
weight count;
run;
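As a cross-check on the Proc Freq output, here is a minimal Python sketch (an illustration, not part of the original SAS analysis) that computes the sample odds ratio, Pearson's chi-square, and the likelihood-ratio statistic for this table by hand:

```python
import math

# Observed 2x2 table: rows = vitamin C (yes/no), columns = cold (yes/no)
y = [[17, 122],
     [31, 109]]

n = sum(sum(r) for r in y)
row = [sum(r) for r in y]                    # row totals: 139, 140
col = [y[0][j] + y[1][j] for j in range(2)]  # column totals: 48, 231

# Expected counts under independence: (row total * column total) / n
e = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]

# Pearson chi-square and likelihood-ratio (G^2) statistics, each on 1 df
X2 = sum((y[i][j] - e[i][j]) ** 2 / e[i][j] for i in range(2) for j in range(2))
G2 = 2 * sum(y[i][j] * math.log(y[i][j] / e[i][j]) for i in range(2) for j in range(2))

# Sample odds ratio
OR = (y[0][0] * y[1][1]) / (y[0][1] * y[1][0])

print(round(OR, 2), round(X2, 2), round(G2, 2))
```

Both statistics are referred to a chi-square distribution with 1 degree of freedom.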
                               Y = 1              Y = 2
X = 1  Poisson                 µ11                µ12                      (Poisson mean)
       Double Dichotomy        n·p11              n·p12                    (table probs sum to 1)
       Prospective             n1·p1              n1(1 − p1)               (row probs sum to 1)
       Case-Control            n1·π1              n2·π2                    (col probs sum to 1)
X = 2  Poisson                 µ21                µ22
       Double Dichotomy        n·p21              n(1 − p11 − p12 − p21)
       Prospective             n2·p2              n2(1 − p2)
       Case-Control            n1(1 − π1)         n2(1 − π2)
• Often, when you are not really sure how you want to model the data (conditional on the
total, conditional on the rows or conditional on the columns), you can treat the data as
if they are Poisson (the most general model) and use log-linear models to explore
relationships between the row and column variables.
• The most general model for a (2 × 2) table is a Poisson model (4 non-redundant
expected cell counts).
• Since the expected cell counts are always positive, we model µjk as an exponential
function of row and column effects:
$$\mu_{jk} = \exp(\mu + \lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk})$$
where
$$\lambda^X_j = j\text{th row effect}$$
$$\lambda^Y_k = k\text{th column effect}$$
$$\lambda^{XY}_{jk} = \text{interaction effect in the } j\text{th row, } k\text{th column}$$
so that
$$\log(\mu_{jk}) = \mu + \lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}$$
• Treating the 4 expected cell counts as non-redundant, we can write the model for µjk
as a function of at most 4 parameters. However, in this model, there are 9 parameters,
$$\mu,\ \lambda^X_1, \lambda^X_2,\ \lambda^Y_1, \lambda^Y_2,\ \lambda^{XY}_{11}, \lambda^{XY}_{12}, \lambda^{XY}_{21}, \lambda^{XY}_{22},$$
but only four expected cell counts µ11, µ12, µ21, µ22.
To identify the model, we impose the reference-cell constraints
$$\lambda^X_2 = \lambda^Y_2 = \lambda^{XY}_{12} = \lambda^{XY}_{21} = \lambda^{XY}_{22} = 0,$$
leaving the four free parameters
$$\mu,\ \lambda^X_1,\ \lambda^Y_1,\ \lambda^{XY}_{11}.$$
Then, from
$$\mu_{jk} = \exp(\mu + \lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}),$$
we obtain
$$\mu_{11} = \exp(\mu + \lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11})$$
$$\mu_{12} = \exp(\mu + \lambda^X_1)$$
$$\mu_{21} = \exp(\mu + \lambda^Y_1)$$
$$\mu_{22} = \exp(\mu)$$
$$\begin{bmatrix} \log(\mu_{11}) \\ \log(\mu_{12}) \\ \log(\mu_{21}) \\ \log(\mu_{22}) \end{bmatrix}
= \begin{bmatrix} \mu + \lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11} \\ \mu + \lambda^X_1 \\ \mu + \lambda^Y_1 \\ \mu \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} \mu \\ \lambda^X_1 \\ \lambda^Y_1 \\ \lambda^{XY}_{11} \end{bmatrix}$$
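Because the design matrix above is triangular in structure, the parameters can be recovered from the log fitted means by simple back-substitution. A small Python sketch (illustrative, not the course's SAS code), using the fact that the saturated-model fitted means equal the observed counts from the vitamin C table:

```python
import math

# Saturated model: fitted means equal the observed counts from the vitamin C table
y11, y12, y21, y22 = 17.0, 122.0, 31.0, 109.0

# Back-solve the system implied by the 0/1 design matrix above
mu      = math.log(y22)                       # log(mu22) = mu
lamX1   = math.log(y12) - mu                  # log(mu12) = mu + lambdaX_1
lamY1   = math.log(y21) - mu                  # log(mu21) = mu + lambdaY_1
lamXY11 = math.log(y11) - mu - lamX1 - lamY1  # log(mu11) = mu + all three lambdas

print(round(lamX1, 4), round(lamY1, 4), round(lamXY11, 4))
# prints: 0.1127 -1.2574 -0.7134
```

Note that these match the SAS estimates reported later in the lecture.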
• i.e., you create dummy or indicator variables for the different categories.
where
$$I(A) = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{if } A \text{ is not true} \end{cases}$$
$$\log(\mu_{21}) = \mu + 0\cdot\lambda^X_1 + 1\cdot\lambda^Y_1 + 0\cdot\lambda^{XY}_{11} = \mu + \lambda^Y_1$$
$$\log(\mu_{22}) = \mu$$
$$\log(\mu_{12}) - \log(\mu_{22}) = (\mu + \lambda^X_1) - \mu = \lambda^X_1$$
$$\log(\mu_{21}) - \log(\mu_{22}) = (\mu + \lambda^Y_1) - \mu = \lambda^Y_1$$
$$\log(OR) = \log\left(\frac{\mu_{11}\mu_{22}}{\mu_{21}\mu_{12}}\right)
= \log(\mu_{11}) + \log(\mu_{22}) - \log(\mu_{21}) - \log(\mu_{12})$$
$$= (\mu + \lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11}) + \mu - (\mu + \lambda^Y_1) - (\mu + \lambda^X_1)
= \lambda^{XY}_{11}$$
Important: the main parameter of interest is the log odds ratio, which equals $\lambda^{XY}_{11}$ in this
model.

The model with parameters
$$\mu,\ \lambda^X_1,\ \lambda^Y_1,\ \lambda^{XY}_{11}$$
is called the 'saturated model', since it has as many free parameters as possible for a
(2 × 2) table, which has the four expected cell counts µ11, µ12, µ21, µ22.
Alternatively, one can impose the sum-to-zero constraints
$$\sum_{k=1}^{2} \lambda^X_k = 0,$$
and
$$\sum_{j=1}^{2} \lambda^{XY}_{jk} = 0 \text{ for } k = 1, 2$$
and
$$\sum_{k=1}^{2} \lambda^{XY}_{jk} = 0 \text{ for } j = 1, 2$$
$$H_0: OR = 1 \quad\text{is equivalent to}\quad H_0: \lambda^{XY}_{11} = \log(OR) = 0.$$
• Depending on the design, some of the parameters of the log-linear model are actually
fixed by the design.
• However, for all designs, we can estimate the parameters (that are not fixed by the
design) with a Poisson likelihood, and get the MLE’s of the parameters for all designs.
• This is because the kernel of the log-likelihood for any of these designs is the same.
Random Counts
$$P(Y_{jk} = n_{jk} \mid \text{Poisson}) = \frac{e^{-\mu_{jk}} \mu_{jk}^{n_{jk}}}{n_{jk}!}$$
• Or,
$$\ell = \sum_j \sum_k -\mu_{jk} + \sum_j \sum_k n_{jk} \log \mu_{jk} + K$$
Substituting
$$\mu_{jk} = \exp[\mu + \lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}],$$
we obtain
$$\log[L(\mu, \lambda^X_1, \lambda^Y_1, \lambda^{XY}_{11})]
= -\mu_{++} + \sum_{j=1}^{2}\sum_{k=1}^{2} y_{jk}\,[\mu + \lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}]$$
$$= -\mu_{++} + \mu y_{++} + \sum_{j=1}^{2} \lambda^X_j y_{j+} + \sum_{k=1}^{2} \lambda^Y_k y_{+k} + \sum_{j=1}^{2}\sum_{k=1}^{2} y_{jk}\lambda^{XY}_{jk}$$
$$= -\mu_{++} + \mu y_{++} + \lambda^X_1 y_{1+} + \lambda^Y_1 y_{+1} + \lambda^{XY}_{11} y_{11}$$
• The statistics multiplying the parameters $(\mu, \lambda^X_1, \lambda^Y_1, \lambda^{XY}_{11})$ in this log-likelihood,
namely $(y_{++}, y_{1+}, y_{+1}, y_{11})$, are called sufficient statistics, i.e., all the information from
the data in the likelihood is contained in the sufficient statistics.
• In particular, when taking derivatives of the log-likelihood to find the MLE, we will be
solving for the estimate of $(\mu, \lambda^X_1, \lambda^Y_1, \lambda^{XY}_{11})$ as a function of the sufficient statistics
$(y_{++}, y_{1+}, y_{+1}, y_{11})$.
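The reduction of the log-likelihood kernel to sufficient statistics can be verified numerically. The Python sketch below (illustrative, with arbitrary parameter values) evaluates the kernel cell by cell and via $(y_{++}, y_{1+}, y_{+1}, y_{11})$, and confirms the two forms agree:

```python
import math

# Cell counts y_jk from the vitamin C table
y = {(1, 1): 17, (1, 2): 122, (2, 1): 31, (2, 2): 109}

# Arbitrary illustrative parameter values (any values satisfy the identity below)
mu, lamX1, lamY1, lamXY11 = 4.0, 0.5, -1.0, 0.3

def lin(j, k):
    """Linear predictor mu + lambdaX_j + lambdaY_k + lambdaXY_jk under reference-cell constraints."""
    return (mu + (lamX1 if j == 1 else 0.0) + (lamY1 if k == 1 else 0.0)
            + (lamXY11 if (j, k) == (1, 1) else 0.0))

mu_pp = sum(math.exp(lin(j, k)) for (j, k) in y)  # mu_++

# Kernel written cell by cell ...
l_full = -mu_pp + sum(y[jk] * lin(*jk) for jk in y)

# ... and via the sufficient statistics (y_++, y_1+, y_+1, y_11)
y_pp = sum(y.values())
y_1p = y[(1, 1)] + y[(1, 2)]
y_p1 = y[(1, 1)] + y[(2, 1)]
l_suff = -mu_pp + mu * y_pp + lamX1 * y_1p + lamY1 * y_p1 + lamXY11 * y[(1, 1)]

print(abs(l_full - l_suff) < 1e-9)
# prints: True
```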
Cold incidence among French skiers (Pauling, Proceedings of the National Academy of
Sciences, 1971).
OUTCOME
NO
|COLD | COLD | Total
T ---------+--------+--------+
R VITAMIN | | |
E C | 17 | 122 | 139
A | | |
T ---------+--------+--------+
M NO | | |
E VITAMIN | 31 | 109 | 140
N C | | |
T ---------+--------+--------+
Total 48 231 279
• For the Poisson likelihood, we write the log-linear model for the expected cell counts
as:
$$\begin{bmatrix} \log(\mu_{11}) \\ \log(\mu_{12}) \\ \log(\mu_{21}) \\ \log(\mu_{22}) \end{bmatrix}
= \begin{bmatrix} \mu + \lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11} \\ \mu + \lambda^X_1 \\ \mu + \lambda^Y_1 \\ \mu \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} \mu \\ \lambda^X_1 \\ \lambda^Y_1 \\ \lambda^{XY}_{11} \end{bmatrix}$$
data one;
input vitc cold count;
cards;
1 1 17
1 2 122
2 1 31
2 2 109
;
run;
$$\hat\lambda^{VITC}_1 = 0.1127$$
$$\hat\lambda^{COLD}_1 = -1.2574$$
$$\hat\lambda^{VITC,COLD}_{11} = \log(OR) = -0.7134$$
• The OR computed the "regular" way is
$$\log(OR) = \log\left(\frac{17 \cdot 109}{31 \cdot 122}\right) = \log(0.49) = -0.7134$$
• For the double dichotomy in which the data follow a multinomial, we first rewrite the
log-likelihood
$$\mu_{++} = \sum_{j=1}^{2}\sum_{k=1}^{2} \mu_{jk} = n$$
(fixed by design), so that the first term in the log-likelihood, −µ++ = −n is not a
function of the unknown parameters for the multinomial.
$$p_{jk} = \frac{\mu_{jk}}{n} = \frac{\mu_{jk}}{\mu_{++}}$$
where
$$\mu_{++} = \sum_{j=1}^{2}\sum_{k=1}^{2} \mu_{jk}
= \sum_{j=1}^{2}\sum_{k=1}^{2} \exp[\mu + \lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}]
= \exp[\mu] \sum_{j=1}^{2}\sum_{k=1}^{2} \exp[\lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}]$$
Then
$$p_{jk} = \frac{\mu_{jk}}{\mu_{++}}
= \frac{\exp[\mu + \lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}]}{\sum_{j=1}^{2}\sum_{k=1}^{2} \exp[\mu + \lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}]}
= \frac{\exp[\mu]\exp[\lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}]}{\exp[\mu]\sum_{j=1}^{2}\sum_{k=1}^{2} \exp[\lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}]},$$
so that
$$p_{jk} = \frac{\exp[\lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}]}{\sum_{j=1}^{2}\sum_{k=1}^{2} \exp[\lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}]},$$
• If the data are from a double dichotomy, the multinomial likelihood is not a function of
µ. Thus, if you use a Poisson likelihood to estimate the log-linear model when the data
are multinomial, the estimate of µ really is not of interest.
• We will use this in SAS Proc Catmod to obtain the estimates using the multinomial
likelihood.
• For the Multinomial likelihood in SAS Proc Catmod, we write the log-linear model for
the three probabilities (p11 , p12 , p21 ) as:
$$p_{11} = \frac{\exp(\lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11})}{\exp(\lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11}) + \exp(\lambda^X_1) + \exp(\lambda^Y_1) + 1}$$
$$p_{12} = \frac{\exp(\lambda^X_1)}{\exp(\lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11}) + \exp(\lambda^X_1) + \exp(\lambda^Y_1) + 1}$$
$$p_{21} = \frac{\exp(\lambda^Y_1)}{\exp(\lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11}) + \exp(\lambda^X_1) + \exp(\lambda^Y_1) + 1}$$
since, in the denominator
$$\sum_{j=1}^{2}\sum_{k=1}^{2} \exp[\lambda^X_j + \lambda^Y_k + \lambda^{XY}_{jk}],$$
the (2, 2) term is
$$\exp[\lambda^X_2 + \lambda^Y_2 + \lambda^{XY}_{22}] = e^0 = 1.$$
• Using SAS Proc Catmod, we make the design matrix equal to the combinations of
$(\lambda^X_1, \lambda^Y_1, \lambda^{XY}_{11})$ found in the exponential function in the numerators:
$$\begin{bmatrix} \lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11} \\ \lambda^X_1 \\ \lambda^Y_1 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} \lambda^X_1 \\ \lambda^Y_1 \\ \lambda^{XY}_{11} \end{bmatrix}$$
data one;
input vitc cold count;
cards;
1 1 17
1 2 122
2 1 31
2 2 109
;
run;
Response Profiles
Standard Chi-
Effect Parameter Estimate Error Square Pr > ChiSq
---------------------------------------------------------------------
Model 1 0.1127 0.1318 0.73 0.3926
2 -1.2574 0.2035 38.16 <.0001
3 -0.7134 0.3293 4.69 0.0303
$$\hat\lambda^{COLD}_1 = -1.2574$$
$$\hat\lambda^{VITC,COLD}_{11} = \log(OR) = -0.7134$$
$$e^{-0.7134} = 0.49$$
• Now, suppose the data are from a prospective study, or, equivalently, we condition on
the row totals of the (2 × 2) table. We know that, conditional on the row totals
n1 = Y1+ and n2 = Y2+ are fixed, and the total sample size is n++ = n1 + n2 .
• Further, we are left with a likelihood that is a product of two independent row binomials.
$$(Y_{11} \mid Y_{1+} = y_{1+}) \sim \text{Bin}(y_{1+}, p_1)$$
where
$$p_1 = P[Y = 1 \mid X = 1] = \frac{\mu_{11}}{\mu_{1+}} = \frac{\mu_{11}}{\mu_{11} + \mu_{12}};$$
and
$$(Y_{21} \mid Y_{2+} = y_{2+}) \sim \text{Bin}(y_{2+}, p_2)$$
where
$$p_2 = P[Y = 1 \mid X = 2] = \frac{\mu_{21}}{\mu_{2+}} = \frac{\mu_{21}}{\mu_{21} + \mu_{22}}$$
$$\ell^* = -(n_1 + n_2) + y_{11}\log(n_1 p_1) + y_{12}\log(n_1(1 - p_1)) + y_{21}\log(n_2 p_2) + y_{22}\log(n_2(1 - p_2))$$
$$p_1 = \frac{\mu_{11}}{\mu_{11} + \mu_{12}}
= \frac{\exp(\mu + \lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11})}{\exp(\mu + \lambda^X_1 + \lambda^Y_1 + \lambda^{XY}_{11}) + \exp(\mu + \lambda^X_1)}$$
$$= \frac{\exp(\mu + \lambda^X_1)\exp(\lambda^Y_1 + \lambda^{XY}_{11})}{\exp(\mu + \lambda^X_1)[\exp(\lambda^Y_1 + \lambda^{XY}_{11}) + 1]}
= \frac{\exp(\lambda^Y_1 + \lambda^{XY}_{11})}{1 + \exp(\lambda^Y_1 + \lambda^{XY}_{11})}$$
$$p_2 = \frac{\mu_{21}}{\mu_{21} + \mu_{22}}
= \frac{\exp(\mu + \lambda^Y_1)}{\exp(\mu + \lambda^Y_1) + \exp(\mu)}$$
$$= \frac{\exp(\mu)\exp(\lambda^Y_1)}{\exp(\mu)[\exp(\lambda^Y_1) + 1]}
= \frac{\exp(\lambda^Y_1)}{1 + \exp(\lambda^Y_1)}$$
• Now, conditional on the row totals (as in a prospective study), we are left with two free
probabilities $(p_1, p_2)$, and the conditional likelihood is a function of two free parameters
$(\lambda^Y_1, \lambda^{XY}_{11})$.
• Looking at the previous pages, the conditional probabilities of Y given X from the
log-linear model follow a logistic regression model:
$$p_x = P[Y = 1 \mid X^* = x^*]
= \frac{e^{\lambda^Y_1 + \lambda^{XY}_{11} x^*}}{1 + e^{\lambda^Y_1 + \lambda^{XY}_{11} x^*}}
= \frac{e^{\beta_0 + \beta_1 x^*}}{1 + e^{\beta_0 + \beta_1 x^*}}$$
where
$$x^* = \begin{cases} 1 & \text{if } x = 1 \\ 0 & \text{if } x = 2 \end{cases}$$
with
$$\beta_0 = \lambda^Y_1 = \lambda^{COLD}_1$$
and
$$\beta_1 = \lambda^{XY}_{11} = \lambda^{VITC,COLD}_{11}$$
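This mapping can be checked directly from the table margins. The Python sketch below (an illustration, not the lecture's SAS code) computes $\beta_0$ and $\beta_1$ as logits of the observed row proportions and recovers the estimates reported earlier:

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

# Row proportions of catching a cold: x* = 1 (vitamin C), x* = 0 (no vitamin C)
p1_hat = 17 / 139   # vitamin C group
p2_hat = 31 / 140   # no-vitamin-C group

beta0 = logit(p2_hat)            # = lambdaY_1 (the COLD effect)
beta1 = logit(p1_hat) - beta0    # = lambdaXY_11 = log(OR)

print(round(beta0, 4), round(beta1, 4))
# prints: -1.2574 -0.7134
```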
data one;
input vitc cold count;
if vitc=2 then vitc=0;
if cold=2 then cold=0;
cards;
1 1 17
1 2 122
2 1 31
2 2 109
;
run;
Estimates

                                    Standard        Wald
Parameter    DF     Estimate        Error       Chi-Square    Pr > ChiSq
$$\hat\beta_1 = \hat\lambda^{VITC,COLD}_{11} = \log(OR) = -0.7134$$
• These are the same as for the Poisson and multinomial log-linear models.
Recap
• Except for combinatorial terms that are not functions of any unknown parameters, using
µjk from the previous table, the kernel of the log-likelihood for any of these designs can
be written as
$$\ell = \sum_j \sum_k -\mu_{jk} + \sum_j \sum_k n_{jk} \log \mu_{jk}$$
• In this likelihood, the table total $\mu_{++} = E(Y_{++})$ is actually known for all designs:

Design               µ++
Double Dichotomy     n
Prospective          n1 + n2
Case Control         n1 + n2
Key Points:
• We have introduced Log-linear models
• We have defined a parameter in the model to represent the OR
• We do not have an “outcome” per se
• If you can designate an outcome, you minimize the number of parameters estimated
• You should feel comfortable writing likelihoods. If not, you have 3 weeks to gain the
comfort
• Expect the final exam to have at least one likelihood problem