Logistic Regression
Let $x_{i1}, \dots, x_{it}$ denote values of $t$ explanatory variables for the $i$th observation. The systematic component of the generalized linear model (GLM) relates parameters $\{\eta_i\}$ to the explanatory variables using a linear predictor

$$\eta_i = \sum_j \beta_j x_{ij}; \quad i = 1, 2, \dots, N.$$

In matrix form, $\eta = X\beta$. The link function $g$ relates the expected value of the response to the explanatory variables through the equation

$$g(\mu_i) = \sum_j \beta_j x_{ij}.$$
The function $g$ for which $g(\mu_i) = \theta_i$, the natural parameter, is called the canonical link. Writing the density in the exponential family form $f(y_i; \theta_i) = a(\theta_i)\, b(y_i)\, \exp[y_i Q(\theta_i)]$ displays the natural parameter. For the Bernoulli distribution,

$$f(y_i; \pi_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i} = (1 - \pi_i)\exp\left(y_i \log\frac{\pi_i}{1 - \pi_i}\right) \quad \text{for } y_i = 0 \text{ and } 1,$$

so the natural parameter is the logit of $\pi_i$. For the Poisson distribution, with $y_i = n_i$ and $\theta_i = m_i$,

$$f(n_i; m_i) = \frac{e^{-m_i} m_i^{n_i}}{n_i!},$$

with $a(m_i) = e^{-m_i}$, $b(n_i) = 1/n_i!$, and $Q(m_i) = \log m_i$.
For the Poisson distribution, a GLM links a monotone function of $m_i$ to the explanatory variables through a linear model. Since the natural parameter is $\log m_i$, the canonical link function is the log link, $\eta_i = \log m_i$. The model using this link is

$$\log m_i = \sum_j \beta_j x_{ij}; \quad i = 1, 2, \dots, N. \quad (1)$$
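As a concrete illustration of model (1), the following sketch fits a Poisson log-link GLM with statsmodels; the data are simulated here purely for the example, not taken from the text.

```python
# A minimal sketch of fitting the Poisson log-link model (1) with
# statsmodels; the simulated data below are an assumption of the example.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=100)          # one explanatory variable
mu = np.exp(0.5 + 0.8 * x)               # true means under a log link
y = rng.poisson(mu)                      # Poisson counts

X = sm.add_constant(x)                   # design matrix with an intercept
model = sm.GLM(y, X, family=sm.families.Poisson())  # log link is the canonical default
fit = model.fit()
print(fit.params)                        # estimates of (beta_0, beta_1)
```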
Let $M_1$ denote the fitted model and $M_2$ denote the simpler model with $\beta = 0$. Large-sample tests can use Wilks's (1938) likelihood-ratio approach, with a test statistic based on twice the log of the ratio of maximized likelihoods for $M_1$ and $M_2$. Let $L_1$ denote the maximized log-likelihood for $M_1$ and $L_2$ denote the maximized log-likelihood for $M_2$ under $H_0$; the statistic

$$-2(L_2 - L_1)$$

has a large-sample chi-squared distribution with degrees of freedom $q$, the difference in the numbers of parameters in the two models.
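The following sketch, with invented data, carries out this likelihood-ratio test for a logistic fit: $M_1$ contains one explanatory variable and $M_2$ is the intercept-only model, so $q = 1$.

```python
# A hedged sketch of the likelihood-ratio test comparing a fitted logistic
# model M1 against the intercept-only model M2; data are invented.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(0.3 + 1.2 * x)))
y = rng.binomial(1, p)

L1 = sm.Logit(y, sm.add_constant(x)).fit(disp=0).llf   # maximized log-likelihood of M1
L2 = sm.Logit(y, np.ones_like(x)).fit(disp=0).llf      # intercept-only model M2

lr = -2 * (L2 - L1)                  # likelihood-ratio statistic
q = 1                                # difference in number of parameters
print(lr, stats.chi2.sf(lr, df=q))   # statistic and its chi-squared p-value
```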
Thus the estimated logistic regression coefficient also provides an estimate of the odds ratio, i.e. $\widehat{OR} = e^{\hat\beta_1}$. If the confidence interval for $\beta$ is $(\beta_L, \beta_U)$, then the confidence interval for the odds ratio $e^{\beta}$ is $(e^{\beta_L}, e^{\beta_U})$.
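A brief sketch of this back-transformation, reusing a statsmodels logistic fit; the simulated data and the 95% Wald interval are assumptions of the example.

```python
# Illustrative only: exponentiate the slope and its Wald confidence limits
# to get the odds-ratio estimate and interval.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 1.2 * x))))

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
beta_hat = fit.params[1]                 # slope estimate beta_hat_1
lo, hi = fit.conf_int()[1]               # 95% Wald interval for beta_1
print(np.exp(beta_hat), np.exp(lo), np.exp(hi))  # OR estimate and its interval
```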
Let $n_{ij}$ denote the number of times response $j$ occurs when the factor is at level $i$. It is usual to treat as fixed the total counts $n_i = n_{i1} + n_{i2}$ at the $I$ factor levels. When binary responses are independent Bernoulli random variables, the $n_{i1}$ are independent binomial random variables with parameters $\pi_{1|i}$.
For any set $\{\pi_{1|i} > 0\}$, there exist $\{\beta_i\}$ such that model (i) holds. That model has as many parameters as binomial observations, and it is said to be saturated. When the factor has no effect on the response variable, the simpler model

$$\log\frac{\pi_{1|i}}{\pi_{2|i}} = \alpha \quad (ii)$$

holds. This is the special case of (i) in which $\beta_1 = \beta_2 = \dots = \beta_I$. Since it is equivalent to $\pi_{1|1} = \pi_{1|2} = \dots = \pi_{1|I}$, (ii) is the model of statistical independence of the response and factor.
Goodness of Fit as a Likelihood Ratio Test
For a given logit model, we can use the model parameter estimates to calculate predicted logits, and hence predicted probabilities and estimated expected frequencies

$$\hat m_{ij} = n_i \hat\pi_{j|i}.$$

When expected frequencies are relatively large, we can test goodness of fit with a Pearson or likelihood-ratio chi-squared statistic. For a model symbolized by $M$, we denote these statistics by $\chi^2(M)$ and $G^2(M)$. For instance,

$$G^2(M) = 2 \sum_i \sum_j n_{ij} \log\frac{n_{ij}}{\hat m_{ij}}.$$
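A minimal sketch of computing $G^2(M)$ from observed counts and fitted probabilities; the numbers are placeholders, not data from the text.

```python
# Sketch of the likelihood-ratio goodness-of-fit statistic G^2(M), given
# observed counts n_ij and fitted expected frequencies m_hat_ij.
import numpy as np

n = np.array([[30.0, 20.0],     # observed counts n_ij: rows are factor levels,
              [25.0, 25.0],     # columns are the two response categories
              [10.0, 40.0]])
pi_hat = np.array([0.55, 0.45, 0.25])    # fitted P(response 1 | level i)
m_hat = np.column_stack([n.sum(axis=1) * pi_hat,
                         n.sum(axis=1) * (1 - pi_hat)])  # m_hat_ij = n_i * pi_hat_{j|i}

G2 = 2 * np.sum(n * np.log(n / m_hat))   # G^2(M) = 2 sum n_ij log(n_ij / m_hat_ij)
print(G2)
```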
The degrees of freedom equal the number of logits minus the number of linearly independent parameters in the model.

A likelihood-ratio statistic tests whether certain model parameters are zero by comparing the fitted model $M_1$ with a simpler model $M_2$. When the explanatory variables are categorical, we denote this statistic for testing $M_2$, given that $M_1$ holds, by $G^2(M_2 \mid M_1)$. Let $L_s$ denote the maximized log-likelihood for the saturated model. The likelihood-ratio statistic for comparing models $M_1$ and $M_2$ is

$$G^2(M_2 \mid M_1) = -2(L_2 - L_1) = -2(L_2 - L_s) + 2(L_1 - L_s) = G^2(M_2) - G^2(M_1).$$

That is, the test statistic for comparing two models is identical to the difference in the $G^2$ goodness-of-fit statistics for the two models.
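The same identity in code form: with hypothetical $G^2$ and degrees-of-freedom values for two nested models, the comparison statistic is simply the difference.

```python
# Sketch: the comparison statistic as a difference of goodness-of-fit
# statistics, G^2(M2 | M1) = G^2(M2) - G^2(M1). Values are placeholders.
from scipy import stats

G2_M1, df_M1 = 3.2, 2       # hypothetical fit statistics for the larger model M1
G2_M2, df_M2 = 11.7, 4      # and for the simpler nested model M2

G2_diff = G2_M2 - G2_M1     # equals -2(L2 - L1)
df_diff = df_M2 - df_M1     # difference in residual degrees of freedom
print(G2_diff, stats.chi2.sf(G2_diff, df=df_diff))
```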
Model Diagnostics
Residuals
Let $y_i$ denote the number of successes in $n_i$ trials at the $i$th of the $I$ settings of the explanatory variables. For a binary response model, residuals for the fits provided by the $I$ binomial distributions are

$$e_i = \frac{y_i - n_i \hat\pi_{1|i}}{\sqrt{n_i \hat\pi_{1|i}\left(1 - \hat\pi_{1|i}\right)}}; \quad i = 1, 2, \dots, I.$$

If $\hat\pi_{1|i}$ were replaced by the true value $\pi_{1|i}$, $e_i$ would be the difference between a binomial random variable and its expectation, divided by its estimated standard deviation; if $n_i$ were large, $e_i$ would have an approximate standard normal distribution.
If $\chi^2$ has d.f. $\nu$, it follows that the sum of squared residuals is asymptotically comparable to the sum of squares of $\nu$ (rather than $I$) standard normal random variables. Despite this, residuals are often treated like standard normal deviates, with absolute values larger than 2 indicating possible lack of fit.
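A short sketch of these residuals with invented counts and fitted probabilities:

```python
# Sketch of the binomial (Pearson-type) residuals e_i for a grouped
# binary-response fit; counts and fitted probabilities are illustrative.
import numpy as np

n = np.array([40, 35, 50, 45])               # trials n_i at each of I settings
y = np.array([12, 14, 30, 35])               # observed success counts y_i
pi_hat = np.array([0.28, 0.42, 0.58, 0.75])  # fitted probabilities pi_hat_{1|i}

e = (y - n * pi_hat) / np.sqrt(n * pi_hat * (1 - pi_hat))
print(e)                                     # |e_i| > 2 suggests possible lack of fit
```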
When more than one observation on $Y$ occurs at a fixed $x_i$ value, it is sufficient to record the number of observations $n_i$ and the number of "1" outcomes; thus we let $y_i$ refer to this "success" count rather than to individual binary responses. The likelihood function is then

$$\ell(\beta) = \prod_{i=1}^{n} \binom{n_i}{y_i} \pi_i^{y_i} \left(1 - \pi_i\right)^{n_i - y_i}. \quad (iii)$$

Taking logs of both sides of this likelihood function, we easily obtain

$$L = \log \ell(\beta) = \sum_{i=1}^{n} \log\binom{n_i}{y_i} + \sum_{i=1}^{n} y_i \log \pi_i + \sum_{i=1}^{n} \left(n_i - y_i\right) \log\left(1 - \pi_i\right).$$

Substituting $\pi_i = \exp(X_i\beta)/\left[1 + \exp(X_i\beta)\right]$ gives

$$L = \sum_{i=1}^{n} y_i \log\frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)} - \sum_{i=1}^{n} \left(n_i - y_i\right) \log\left[1 + \exp(X_i\beta)\right] + \text{const} = \sum_{i=1}^{n} y_i X_i\beta - \sum_{i=1}^{n} n_i \log\left[1 + \exp(X_i\beta)\right] + \text{const}. \quad (iv)$$
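A sketch evaluating the log-likelihood (iv), dropping the binomial-coefficient constant; the design matrix, counts, and $\beta$ are illustrative placeholders.

```python
# Sketch of the grouped-binomial log-likelihood (iv), up to the constant
# binomial-coefficient term.
import numpy as np

X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])  # design matrix with intercept
n = np.array([40, 35, 50, 45])        # trials per covariate setting
y = np.array([12, 14, 30, 35])        # success counts
beta = np.array([-1.0, 0.8])          # coefficient vector

eta = X @ beta                        # linear predictor X_i beta
L = np.sum(y * eta) - np.sum(n * np.log1p(np.exp(eta)))  # equation (iv)
print(L)
```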
Again, differentiating equation (iv) with respect to the elements of $\beta$, we obtain

$$\frac{\partial L}{\partial \beta_a} = \sum_{i=1}^{n} y_i X_{ia} - \sum_{i=1}^{n} n_i X_{ia} \frac{\exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)}{1 + \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)} = 0. \quad (v)$$
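In code, the score equations (v) reduce to $X'(y - n\pi) = 0$; the same illustrative data as above are assumed.

```python
# Sketch of the score vector in (v): the gradient of L with respect to
# each beta_a.
import numpy as np

X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
n = np.array([40, 35, 50, 45])
y = np.array([12, 14, 30, 35])
beta = np.array([-1.0, 0.8])

pi = 1 / (1 + np.exp(-(X @ beta)))    # exp(eta_i) / (1 + exp(eta_i))
grad = X.T @ (y - n * pi)             # dL/dbeta_a = sum_i (y_i - n_i pi_i) X_ia
print(grad)                           # the MLE solves grad = 0
```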
$$\frac{\partial^2 L}{\partial \beta_a \,\partial \beta_b} = -\sum_{i=1}^{n} n_i X_{ia} \frac{\left[1 + \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)\right] \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right) X_{ib} - \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right) \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right) X_{ib}}{\left[1 + \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)\right]^2}$$

$$\frac{\partial^2 L}{\partial \beta_a \,\partial \beta_b} = -\sum_{i=1}^{n} n_i X_{ia} X_{ib} \frac{\exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)}{1 + \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)} \cdot \frac{1}{1 + \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)} = -\sum_{i=1}^{n} n_i X_{ia} X_{ib}\, \pi_i \left(1 - \pi_i\right). \quad (vi)$$
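A sketch of (vi) in matrix form, computing the Hessian as $-X'\,\mathrm{Diag}\left[n_i \pi_i(1-\pi_i)\right] X$ with illustrative inputs.

```python
# Sketch of the Hessian (vi): the second derivatives of L, the negative
# of the information matrix X' Diag[n_i pi_i (1 - pi_i)] X.
import numpy as np

X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
n = np.array([40, 35, 50, 45])
beta = np.array([-1.0, 0.8])

pi = 1 / (1 + np.exp(-(X @ beta)))
W = np.diag(n * pi * (1 - pi))        # Diag[n_i pi_i (1 - pi_i)]
H = -X.T @ W @ X                      # H_ab = -sum_i n_i X_ia X_ib pi_i (1 - pi_i)
print(H)
```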
We estimate the variance-covariance matrix by substituting $\hat\beta$ into the matrix having elements equal to the negative of (vi) and inverting. The estimated variance-covariance matrix has the form

$$\widehat{\mathrm{Var}}\left(\hat\beta\right) = \widehat{\mathrm{Cov}}\left(\hat\beta\right) = \left\{X'\, \mathrm{Diag}\left[n_i \hat\pi_i \left(1 - \hat\pi_i\right)\right] X\right\}^{-1},$$

where $\mathrm{Diag}\left[n_i \hat\pi_i(1 - \hat\pi_i)\right]$ denotes the $n \times n$ diagonal matrix having elements $n_i \hat\pi_i(1 - \hat\pi_i)$ on the main diagonal. The square roots of its main-diagonal elements are the estimated standard errors of the model parameter estimators.
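A sketch of this inversion step with a stand-in $\hat\beta$; the square roots of the diagonal give the standard errors just described.

```python
# Sketch: invert the information matrix evaluated at beta_hat to estimate
# the variance-covariance matrix; beta_hat here is a placeholder value.
import numpy as np

X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
n = np.array([40, 35, 50, 45])
beta_hat = np.array([-1.0, 0.8])      # stand-in for the fitted estimates

pi_hat = 1 / (1 + np.exp(-(X @ beta_hat)))
info = X.T @ np.diag(n * pi_hat * (1 - pi_hat)) @ X   # X' Diag[...] X
cov = np.linalg.inv(info)             # estimated Var-Cov matrix of beta_hat
print(np.sqrt(np.diag(cov)))          # estimated standard errors
```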
From (v) and (vi), let

$$q_j^{(t)} = \left.\frac{\partial L}{\partial \beta_j}\right|_{\beta^{(t)}} = \sum_{i=1}^{n} y_i X_{ij} - \sum_{i=1}^{n} n_i X_{ij}\, \pi_i^{(t)} = \sum_{i=1}^{n} \left(y_i - n_i \pi_i^{(t)}\right) X_{ij},$$

so that in matrix form $q^{(t)} = X'\left(y - m^{(t)}\right)$ with $m_i^{(t)} = n_i \pi_i^{(t)}$, and let

$$h_{ab}^{(t)} = \left.\frac{\partial^2 L}{\partial \beta_a \,\partial \beta_b}\right|_{\beta^{(t)}} = -\sum_{i=1}^{n} n_i X_{ia} X_{ib}\, \pi_i^{(t)}\left(1 - \pi_i^{(t)}\right), \quad \text{i.e. } H^{(t)} = -X'\, \mathrm{Diag}\left[n_i \pi_i^{(t)}\left(1 - \pi_i^{(t)}\right)\right] X. \quad (vii)$$

Each Newton-Raphson step then updates the estimate by

$$\beta^{(t+1)} = \beta^{(t)} - \left(H^{(t)}\right)^{-1} q^{(t)} = \beta^{(t)} + \left\{X'\, \mathrm{Diag}\left[n_i \pi_i^{(t)}\left(1 - \pi_i^{(t)}\right)\right] X\right\}^{-1} X'\left(y - m^{(t)}\right). \quad (viii)$$
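Putting (v)-(viii) together, a compact sketch of the full Newton-Raphson iteration on illustrative grouped data:

```python
# Sketch of the Newton-Raphson iteration (viii) for the grouped logistic
# model; data are illustrative and the convergence check is minimal.
import numpy as np

X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
n = np.array([40, 35, 50, 45])
y = np.array([12, 14, 30, 35])

beta = np.zeros(X.shape[1])           # starting value beta^(0)
for t in range(25):
    pi = 1 / (1 + np.exp(-(X @ beta)))            # pi_i^(t)
    q = X.T @ (y - n * pi)                        # q^(t) = X'(y - m^(t))
    info = X.T @ np.diag(n * pi * (1 - pi)) @ X   # -H^(t)
    step = np.linalg.solve(info, q)               # {X' Diag[...] X}^(-1) q^(t)
    beta = beta + step                            # update (viii)
    if np.max(np.abs(step)) < 1e-8:               # stop when the step is tiny
        break
print(beta)                           # maximum likelihood estimates
```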