
Models for Binary Response Variables

Components of a Generalized Linear Model


Suppose the $N$ observations on $Y$ are independent, and denote their values by $y_1, y_2, \ldots, y_N$. We assume that each $Y_i$ has probability density or mass function of the form
$$f(y_i; \theta_i) = a(\theta_i)\, b(y_i)\, \exp\{ y_i Q(\theta_i) \}, \qquad i = 1, 2, \ldots, N. \qquad (1)$$
The term $Q(\theta)$ is called the natural parameter of the distribution. A more general representation of (1) is given by
$$f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\}.$$
The function $a(\phi)$ often has the form $a(\phi) = \phi / w_i$ for a known weight $w_i$, and $\phi$ is called the dispersion parameter.

Let $x_{i1}, \ldots, x_{it}$ denote values of $t$ explanatory variables for the $i$th observation. The systematic component of the GLM relates parameters $\{\eta_i\}$ to the explanatory variables through the linear predictor
$$\eta_i = \sum_j \beta_j x_{ij}, \qquad i = 1, 2, \ldots, N.$$
In matrix form, $\eta = X\beta$, where $\eta = (\eta_1, \ldots, \eta_N)'$ and $\beta = (\beta_1, \ldots, \beta_t)'$ are model parameters and $X$ is the $N \times t$ model matrix.
The link function, the third component of a GLM, connects the expectation $\mu_i$ of $Y_i$ to the linear predictor by $\eta_i = g(\mu_i)$, where $g$ is a monotone, differentiable function. Thus, a GLM links the expected value of the response to the explanatory variables through the equation
$$g(\mu_i) = \sum_j \beta_j x_{ij}.$$
The function $g$ for which $g(\mu_i) = \theta_i$ is called the canonical link, so that the relationship between the natural parameter and the linear predictor is
$$\theta_i = \sum_j \beta_j x_{ij}.$$
Logit Models
Many categorical response variables have only two categories. The observation for each subject
might be classified as a “Success” or a “Failure”. Represent these outcomes by 1 and 0. The
Bernoulli distribution for binary random variables specifies probabilities
$P(Y = 1) = \pi$ and $P(Y = 0) = 1 - \pi$ for the two outcomes, for which $\pi = E(Y)$. When $Y_i$ has a Bernoulli distribution with parameter $\pi_i$, the probability mass function is
$$f(y_i; \pi_i) = \pi_i^{y_i}(1 - \pi_i)^{1 - y_i} = (1 - \pi_i)\left(\frac{\pi_i}{1 - \pi_i}\right)^{y_i} = (1 - \pi_i)\exp\left\{ y_i \log\left(\frac{\pi_i}{1 - \pi_i}\right) \right\}, \qquad y_i = 0, 1.$$

This distribution is in the natural exponential family. The natural parameter $Q(\pi) = \log\left(\frac{\pi}{1 - \pi}\right)$, the log odds of response 1, is called the logit of $\pi$. GLMs that use the logit link are called logit models.
Log-linear Models
Let $n_i$ denote the count in the $i$th cell and let $m_i = E(n_i)$ denote its expected value, $i = 1, 2, \ldots, N$. The Poisson probability mass function of $n_i$ is
$$f(n_i; m_i) = \frac{e^{-m_i} m_i^{n_i}}{n_i!} = e^{-m_i}\left(\frac{1}{n_i!}\right) e^{n_i \log m_i}$$
for non-negative integer values of $n_i$. This has natural exponential form
$$f(y_i; \theta_i) = a(\theta_i)\, b(y_i)\, \exp\{ y_i Q(\theta_i) \}$$
where $y_i = n_i$, $\theta_i = m_i$, $a(\theta_i) = e^{-m_i}$, $b(n_i) = \frac{1}{n_i!}$, and $Q(m_i) = \log(m_i)$.

For the Poisson distribution, a GLM links a monotone function of $m_i$ to explanatory variables through a linear model. Since the natural parameter is $\log(m_i)$, the canonical link function is the log link, $\eta_i = \log(m_i)$. The model using this link is
$$\log(m_i) = \sum_j \beta_j x_{ij}, \qquad i = 1, 2, \ldots, N. \qquad (1)$$
Model (1) is called a loglinear model for a contingency table.
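To make the loglinear model concrete, here is a minimal sketch (not from the original notes; the counts and covariate are invented) of fitting a Poisson GLM with the canonical log link in Python using statsmodels:

```python
import numpy as np
import statsmodels.api as sm

counts = np.array([18, 35, 54, 71])      # hypothetical cell counts n_i
x = np.array([0.0, 1.0, 2.0, 3.0])       # one explanatory variable
X = sm.add_constant(x)                   # model matrix with intercept column

# log(m_i) = beta_0 + beta_1 * x_i; the log link is the canonical (default) link
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
print(fit.params)                        # estimates of beta_0 and beta_1
```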


Linear Probability Model
For a binary response, the regression model
$$E(Y) = \pi(x) = \alpha + \beta x$$
is called a linear probability model. The linear probability model has a major structural defect: probabilities must fall between 0 and 1, whereas linear functions take values over the entire real line.
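A small numeric illustration of this defect, using made-up binary data: an ordinary least-squares fit of $E(Y) = \alpha + \beta x$ readily produces fitted "probabilities" outside $[0, 1]$:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])          # binary responses

# ordinary least-squares estimates of (alpha, beta)
A = np.column_stack([np.ones_like(x), x])
alpha, beta = np.linalg.lstsq(A, y, rcond=None)[0]

print(alpha + beta * 0.0)   # fitted "probability" at x = 0: falls below 0
print(alpha + beta * 9.0)   # fitted "probability" at x = 9: exceeds 1
```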
Logistic Regression Model
Because of the structural problems with the linear probability model, it is more fruitful to study models implying a curvilinear relationship between $x$ and $\pi(x)$. When we expect a monotonic relationship, S-shaped curves are natural shapes for regression curves. A function having this shape,
$$\pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}, \qquad (1)$$
is called the logistic regression function. When the model holds with $\beta = 0$, the binary response is independent of $X$.

The logistic regression curve (1) has slope
$$\frac{\partial \pi(x)}{\partial x} = \beta\, \pi(x)\left[1 - \pi(x)\right].$$



For model (1), the odds of making response 1 are
$$\frac{\pi(x)}{1 - \pi(x)} = \exp(\alpha + \beta x) = e^{\alpha}\left(e^{\beta}\right)^{x}.$$
This formula provides a basic interpretation for $\beta$: the odds increase multiplicatively by $e^{\beta}$ for every unit increase in $x$. The log odds have the linear relationship
$$\log\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \alpha + \beta x.$$

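The multiplicative-odds interpretation is easy to verify numerically. In this sketch the values of $\alpha$ and $\beta$ are arbitrary choices for illustration:

```python
import numpy as np

alpha, beta = -1.5, 0.8                   # arbitrary illustrative coefficients

def pi(x):
    # logistic regression function pi(x) = exp(a + b x) / (1 + exp(a + b x))
    return np.exp(alpha + beta * x) / (1 + np.exp(alpha + beta * x))

def odds(x):
    return pi(x) / (1 - pi(x))

# the odds ratio for a one-unit increase in x equals e^beta at any x
print(odds(2.0) / odds(1.0), np.exp(beta))
```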
Inference for Logistic Regression


From Wald's (1943) general asymptotic results for ML estimators, it follows that parameter estimators in logistic models have large-sample normal distributions.

Let $\beta = (\beta_1, \beta_2, \ldots, \beta_q)'$ denote a subset of model parameters, and suppose we want to test $H_0 : \beta = 0$. Let $M_1$ denote the fitted model and $M_2$ the simpler model with $\beta = 0$. Large-sample tests can use Wilks' (1938) likelihood-ratio approach, with test statistic based on twice the log of the ratio of maximized likelihoods for $M_1$ and $M_2$. Let $L_1$ denote the maximized log-likelihood for $M_1$ and $L_2$ the maximized log-likelihood for $M_2$ under $H_0$; the statistic
$$-2(L_2 - L_1)$$
has a large-sample chi-squared distribution with $q$ degrees of freedom.

Alternatively, by the large-sample normality of parameter estimators, the statistic
$$\hat{\beta}'\left[\widehat{\operatorname{cov}}(\hat{\beta})\right]^{-1}\hat{\beta}$$
has the same limiting null distribution (Wald 1943). This is called a Wald statistic.
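The following sketch (with simulated data; the model and sample size are arbitrary) computes both statistics for a single-parameter hypothesis, so $q = 1$. It assumes the statsmodels and scipy packages:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(-0.5 + 1.0 * x)))          # true probabilities (illustrative)
y = rng.binomial(1, p)

m1 = sm.Logit(y, sm.add_constant(x)).fit(disp=0)   # fitted model M1
m2 = sm.Logit(y, np.ones_like(x)).fit(disp=0)      # simpler model M2 (beta = 0)

lr = -2 * (m2.llf - m1.llf)                # -2(L2 - L1)
print(lr, chi2.sf(lr, df=1))               # LR statistic and p-value

wald = (m1.params[1] / m1.bse[1]) ** 2     # Wald statistic for H0: beta = 0
print(wald, chi2.sf(wald, df=1))
```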

Odds Ratio and Coefficient of the Linear Logistic Regression


When the independent variables are dichotomous or polychotomous, the logistic regression coefficients can be linked with odds ratios. In the linear logistic model, the dependence of the probability of success on the independent variables is assumed to be
$$\pi_i = \frac{\exp\left(\sum_{j=0}^{p} \beta_j x_{ij}\right)}{1 + \exp\left(\sum_{j=0}^{p} \beta_j x_{ij}\right)} \qquad (i)$$
and
$$1 - \pi_i = \frac{1}{1 + \exp\left(\sum_{j=0}^{p} \beta_j x_{ij}\right)}. \qquad (ii)$$
Consider the simplest case, where there is one independent variable, $x_1$, which is either 0 or 1. The logistic model in (i) and (ii) becomes
$$P(y = 1 \mid x_1) = \frac{e^{\beta_0 + \beta_1 x_1}}{1 + e^{\beta_0 + \beta_1 x_1}} \quad \text{and} \quad P(y = 0 \mid x_1) = \frac{1}{1 + e^{\beta_0 + \beta_1 x_1}}.$$



Values of the model when $x_1 = 0$ and $x_1 = 1$ are
$$P(y = 1 \mid x_1 = 0) = \frac{e^{\beta_0}}{1 + e^{\beta_0}}, \qquad P(y = 1 \mid x_1 = 1) = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}},$$
$$P(y = 0 \mid x_1 = 0) = \frac{1}{1 + e^{\beta_0}}, \qquad P(y = 0 \mid x_1 = 1) = \frac{1}{1 + e^{\beta_0 + \beta_1}}.$$
Thus we get the odds ratio as
$$OR = \frac{P(y = 1 \mid x = 1)\,/\,P(y = 0 \mid x = 1)}{P(y = 1 \mid x = 0)\,/\,P(y = 0 \mid x = 0)} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$$
$$\Rightarrow \widehat{OR} = e^{\hat{\beta}_1}$$
and the log odds ratio is $\log(OR) = \beta_1$.

Thus the estimated logistic regression coefficient also provides an estimate of the odds ratio, i.e. $\widehat{OR} = e^{\hat{\beta}_1}$. If a confidence interval for $\beta_1$ has endpoints $(l, u)$, then the corresponding confidence interval for $e^{\beta_1}$ is $(e^{l}, e^{u})$.
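A short sketch with hypothetical 2 × 2 counts confirms that $e^{\hat{\beta}_1}$ reproduces the sample odds ratio, and that exponentiating the endpoints of the confidence interval for $\beta_1$ gives the interval for the odds ratio:

```python
import numpy as np
import statsmodels.api as sm

# hypothetical counts: 40/100 successes at x1 = 1, 20/100 successes at x1 = 0
x = np.repeat([1.0, 0.0], 100)
y = np.concatenate([np.ones(40), np.zeros(60), np.ones(20), np.zeros(80)])

fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

print(np.exp(fit.params[1]))        # e^(beta_1 hat), the estimated odds ratio
print((40 / 60) / (20 / 80))        # sample odds ratio, also 2.6667

lo, hi = np.exp(fit.conf_int()[1])  # CI for OR: exponentiate CI endpoints of beta_1
print(lo, hi)
```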

Logit Model for Categorical Data

Logit Model for an I × 2 Table


Suppose there is a single explanatory factor having $I$ categories. In row $i$ of the $I \times 2$ table, the two response probabilities are $\pi_{1|i}$ and $\pi_{2|i}$, with $\pi_{1|i} + \pi_{2|i} = 1$. In the logit model,
$$\log\left(\frac{\pi_{1|i}}{\pi_{2|i}}\right) = \alpha + \tau_i \qquad (i)$$
where $\{\tau_i\}$ describes the effects of the factor on the response.

Let $n_{ij}$ denote the number of times response $j$ occurs when the factor is at level $i$. It is usual to treat as fixed the total counts $n_i = n_{i1} + n_{i2}$ at the $I$ factor levels. When binary responses are independent Bernoulli random variables, the $\{n_{i1}\}$ are independent binomial random variables with parameters $\{\pi_{1|i}\}$.

For any set $\{\pi_{1|i} > 0\}$, there exist $\{\tau_i\}$ such that model (i) holds. That model has as many parameters as binomial observations, and it is said to be saturated. When the factor has no effect on the response variable, the simpler model
$$\log\left(\frac{\pi_{1|i}}{\pi_{2|i}}\right) = \alpha \qquad (ii)$$
holds. This is the special case of (i) in which $\tau_1 = \tau_2 = \cdots = \tau_I$. Since it is equivalent to $\pi_{1|1} = \pi_{1|2} = \cdots = \pi_{1|I}$, (ii) is the model of statistical independence of the response and factor.
Goodness of fit as a Likelihood Ratio Test
For a given logit model, we can use model parameter estimates to calculate predicted logits, and hence predicted probabilities and estimated expected frequencies
$$\hat{m}_{ij} = n_i\, \hat{\pi}_{j|i}.$$
When expected frequencies are relatively large, we can test goodness of fit with a Pearson or likelihood-ratio chi-squared statistic. For a model symbolized by $M$, we denote these statistics by $\chi^2(M)$ and $G^2(M)$. For instance,
$$G^2(M) = 2 \sum_i \sum_j n_{ij} \log\left(\frac{n_{ij}}{\hat{m}_{ij}}\right).$$
The degrees of freedom equal the number of logits minus the number of linearly independent parameters in the model.

We used the likelihood-ratio principle to construct the statistic
$$-2(L_2 - L_1)$$
that tests whether certain model parameters are zero by comparing the fitted model $M_1$ with a simpler model $M_2$. When explanatory variables are categorical, we denote this statistic for testing $M_2$, given that $M_1$ holds, by $G^2(M_2 \mid M_1)$. Let $L_s$ denote the maximized log-likelihood for the saturated model. The likelihood-ratio statistic for comparing models $M_1$ and $M_2$ is
$$G^2(M_2 \mid M_1) = -2(L_2 - L_1) = -2(L_2 - L_s) - \left[-2(L_1 - L_s)\right] = G^2(M_2) - G^2(M_1).$$
That is, the test statistic for comparing two models is identical to the difference in $G^2$ goodness-of-fit statistics for the two models.
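As an illustration, the sketch below (hypothetical grouped counts) uses the fact that a binomial GLM's deviance equals its $G^2$ statistic against the saturated model, so the difference in deviances gives $G^2(M_2 \mid M_1)$:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

y = np.array([10, 17, 24, 38])            # successes y_i (hypothetical)
n = np.array([50, 50, 50, 50])            # trials n_i
endog = np.column_stack([y, n - y])       # [successes, failures] per setting

x = np.array([0.0, 1.0, 2.0, 3.0])
M1 = sm.GLM(endog, sm.add_constant(x), family=sm.families.Binomial()).fit()
M2 = sm.GLM(endog, np.ones_like(x), family=sm.families.Binomial()).fit()

# deviance = G^2 against the saturated model, so the difference tests M2 vs M1
g2_diff = M2.deviance - M1.deviance
print(g2_diff, chi2.sf(g2_diff, df=1))
```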

Model Diagnostics

Residuals
Let $y_i$ denote the number of successes in $n_i$ trials at the $i$th of $I$ settings of the explanatory variables. For a binary response model, residuals for the fits provided by the $I$ binomial distributions are
$$e_i = \frac{y_i - n_i \hat{\pi}_{1|i}}{\left[ n_i \hat{\pi}_{1|i}\left(1 - \hat{\pi}_{1|i}\right) \right]^{1/2}}, \qquad i = 1, 2, \ldots, I. \qquad (i)$$
If $\hat{\pi}_{1|i}$ were replaced by the true value $\pi_{1|i}$ in (i), $e_i$ would be the difference between a binomial random variable and its expectation, divided by its estimated standard deviation; if $n_i$ were large, $e_i$ would have an approximate standard normal distribution.



The $\{\pi_{1|i}\}$ are unknown, however, so (i) replaces them by their estimates for the model. Because the estimates depend on $\{y_i\}$, the deviations $\{y_i - n_i\hat{\pi}_{1|i}\}$ tend to be smaller than $\{y_i - n_i\pi_{1|i}\}$. Thus, the $\{e_i\}$ tend to show less variation than standard normal random variables. In fact, the Pearson statistic for testing the fit of the model is related to the $\{e_i\}$ by $\chi^2 = \sum e_i^2$.

If $\chi^2$ has d.f. $v$, it follows that the sum of squared residuals is asymptotically comparable to the sum of squares of $v$ (rather than $I$) standard normal random variables. Despite this, residuals are often treated like standard normal deviates, with absolute values larger than 2 indicating possible lack of fit.
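A brief sketch (again with hypothetical grouped counts) computes the residuals $e_i$ of (i) directly and checks that their squares sum to the Pearson statistic:

```python
import numpy as np
import statsmodels.api as sm

y = np.array([10, 17, 24, 38])            # successes out of n_i trials (hypothetical)
n = np.array([50, 50, 50, 50])
x = np.array([0.0, 1.0, 2.0, 3.0])

fit = sm.GLM(np.column_stack([y, n - y]), sm.add_constant(x),
             family=sm.families.Binomial()).fit()

pi_hat = fit.fittedvalues                 # estimated success probabilities
e = (y - n * pi_hat) / np.sqrt(n * pi_hat * (1 - pi_hat))
print(e)                                  # |e_i| > 2 suggests possible lack of fit
print(np.sum(e ** 2), fit.pearson_chi2)   # both equal the Pearson chi-squared
```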

Estimation of Logistic Regression Parameters


Let $x_i = (x_{i0}, x_{i1}, \ldots, x_{ik})$ denote the $i$th setting of values of $k$ explanatory variables, $i = 1, 2, \ldots, n$, where $x_{i0} = 1$. We can express the logistic regression model as
$$\pi_i = \frac{\exp(X\beta)}{1 + \exp(X\beta)}$$
$$1 - \pi_i = 1 - \frac{\exp(X\beta)}{1 + \exp(X\beta)} = \frac{1}{1 + \exp(X\beta)} = \left[1 + \exp(X\beta)\right]^{-1} \qquad (i)$$
where
$$X\beta = \sum_{j=0}^{k} \beta_j x_{ij}, \qquad i = 1, 2, \ldots, n. \qquad (*)$$

When more than one observation on $Y$ occurs at a fixed $x_i$ value, it is sufficient to record the number of observations $n_i$ and the number of “1” outcomes; thus we let $Y_i$ refer to this “success” count rather than to individual binary responses.

Hence $Y_i \sim b(n_i, \pi_i)$, $i = 1, 2, \ldots, n$, are independent binomial random variables, and we can write the probability mass function as
$$f_i(y_i) = \binom{n_i}{y_i}\, \pi_i^{y_i} (1 - \pi_i)^{n_i - y_i}. \qquad (ii)$$
The joint probability mass function of $(Y_1, \ldots, Y_n)$ is proportional to the product of $n$ binomial functions, so the likelihood function can be written as
$$\ell = \prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \binom{n_i}{y_i}\, \pi_i^{y_i} (1 - \pi_i)^{n_i - y_i}.$$
Taking logs of both sides gives the log-likelihood
$$L = \log \ell = \sum_{i=1}^{n} \ln\binom{n_i}{y_i} + \sum_{i=1}^{n} y_i \ln \pi_i + \sum_{i=1}^{n} (n_i - y_i) \ln(1 - \pi_i).$$
Substituting $\pi_i$ and $1 - \pi_i$ from (i), and dropping the term $\sum_i \ln\binom{n_i}{y_i}$ since it does not involve $\beta$,
$$L = \sum_{i=1}^{n} y_i \ln\left(\frac{\exp(X\beta)}{1 + \exp(X\beta)}\right) + \sum_{i=1}^{n} (n_i - y_i) \ln\left(\frac{1}{1 + \exp(X\beta)}\right)$$
$$= \sum_{i=1}^{n} y_i \ln\left(\exp(X\beta)\right) - \sum_{i=1}^{n} y_i \ln\left(1 + \exp(X\beta)\right) - \sum_{i=1}^{n} (n_i - y_i) \ln\left(1 + \exp(X\beta)\right)$$
$$= \sum_{i=1}^{n} y_i X\beta - \sum_{i=1}^{n} n_i \ln\left(1 + \exp(X\beta)\right)$$
$$= \sum_{i=1}^{n} y_i \sum_{j=0}^{k} X_{ij}\beta_j - \sum_{i=1}^{n} n_i \ln\left(1 + \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)\right) \qquad \text{(using (*))} \qquad (iii)$$
Differentiating the log-likelihood (iii) with respect to the elements of $\beta$ and setting the results equal to zero gives the likelihood equations
$$\frac{\partial L}{\partial \beta_a} = \sum_{i=1}^{n} y_i X_{ia} - \sum_{i=1}^{n} n_i\, \frac{\exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)}{1 + \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)}\, X_{ia} = 0, \qquad a = 0, 1, 2, \ldots, k. \qquad (iv)$$
Recognizing the ratio in the second term as $\pi_i$,
$$\Rightarrow \sum_{i=1}^{n} y_i X_{ia} - \sum_{i=1}^{n} n_i X_{ia}\, \pi_i = 0 \qquad (v)$$
$$\Rightarrow y_1 X_{1a} + y_2 X_{2a} + \cdots + y_n X_{na} - \left(n_1 X_{1a}\pi_1 + n_2 X_{2a}\pi_2 + \cdots + n_n X_{na}\pi_n\right) = 0$$
$$\Rightarrow X_a' Y - X_a' \mu = 0 \;\Rightarrow\; X_a'\left(Y - \mu\right) = 0$$
where $X_a = (X_{1a}, \ldots, X_{na})'$, $Y = (y_1, \ldots, y_n)'$, and $\mu = (n_1\pi_1, \ldots, n_n\pi_n)'$.
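As a numerical check (hypothetical data), the likelihood equations $X_a'(Y - \mu) = 0$ should hold, up to convergence tolerance, at the fitted ML estimate:

```python
import numpy as np
import statsmodels.api as sm

y = np.array([10, 17, 24, 38])                       # hypothetical successes
n = np.array([50, 50, 50, 50])                       # trials
X = sm.add_constant(np.array([0.0, 1.0, 2.0, 3.0]))  # model matrix

fit = sm.GLM(np.column_stack([y, n - y]), X,
             family=sm.families.Binomial()).fit()

mu_hat = n * fit.fittedvalues             # mu_i = n_i * pi_i at the MLE
print(X.T @ (y - mu_hat))                 # approximately the zero vector
```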

Differentiating (iv) again, now with respect to the elements $\beta_b$, the quotient rule gives
$$\frac{\partial^2 L}{\partial \beta_a \partial \beta_b} = -\sum_{i=1}^{n} n_i X_{ia}\, \frac{\left[1 + \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)\right] \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right) X_{ib} - \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right) \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right) X_{ib}}{\left[1 + \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)\right]^{2}}$$
$$= -\sum_{i=1}^{n} n_i X_{ia} X_{ib} \left[ \frac{\exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)}{1 + \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)} - \left( \frac{\exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)}{1 + \exp\left(\sum_{j=0}^{k} X_{ij}\beta_j\right)} \right)^{2} \right]$$


$$\frac{\partial^2 L}{\partial \beta_a \partial \beta_b} = -\sum_{i=1}^{n} n_i X_{ia} X_{ib}\left(\pi_i - \pi_i^2\right), \qquad a = 0, 1, \ldots, k; \quad b = 0, 1, \ldots, k \qquad (vi)$$
$$= -\left[ n_1 X_{1a} X_{1b}\, \pi_1(1 - \pi_1) + \cdots + n_n X_{na} X_{nb}\, \pi_n(1 - \pi_n) \right]$$
$$\Rightarrow \frac{\partial^2 L}{\partial \beta_a \partial \beta_b} = -X_a'\, \mathrm{Diag}\left[ n\pi(1 - \pi) \right] X_b \qquad (vii)$$
where $X_a = (X_{1a}, \ldots, X_{na})'$, $X_b = (X_{1b}, \ldots, X_{nb})'$, and
$$\mathrm{Diag}\left[ n\pi(1 - \pi) \right] = \begin{pmatrix} n_1\pi_1(1 - \pi_1) & 0 & \cdots & 0 \\ 0 & n_2\pi_2(1 - \pi_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & n_n\pi_n(1 - \pi_n) \end{pmatrix}.$$

We estimate the variance-covariance matrix by substituting $\hat{\beta}$ into the matrix whose elements are the negative of (vii) and inverting; the estimated variance-covariance matrix has the form
$$\widehat{\operatorname{Var-Cov}}\left(\hat{\beta}\right) = \left\{ X'\, \mathrm{Diag}\left[ n_i\hat{\pi}_i\left(1 - \hat{\pi}_i\right) \right] X \right\}^{-1}$$
where $\mathrm{Diag}\left[ n_i\hat{\pi}_i(1 - \hat{\pi}_i) \right]$ denotes the $n \times n$ diagonal matrix with elements $n_i\hat{\pi}_i(1 - \hat{\pi}_i)$ on the main diagonal. The square roots of its diagonal elements are the estimated standard errors of the model parameter estimators.
From (v) and (vi), let
$$q_j^{(t)} = \frac{\partial L}{\partial \beta_j}\bigg|_{\beta^{(t)}} = \sum_{i=1}^{n} y_i X_{ij} - \sum_{i=1}^{n} n_i X_{ij}\, \pi_i^{(t)} = \sum_{i=1}^{n} \left(y_i - n_i \pi_i^{(t)}\right) X_{ij} = X_j'\left(Y - \mu^{(t)}\right)$$
and
$$h_{ab}^{(t)} = \frac{\partial^2 L}{\partial \beta_a \partial \beta_b}\bigg|_{\beta^{(t)}} = -\sum_{i=1}^{n} n_i X_{ia} X_{ib}\, \pi_i^{(t)}\left(1 - \pi_i^{(t)}\right) = -X_a'\, \mathrm{Diag}\left[ n\pi^{(t)}\left(1 - \pi^{(t)}\right) \right] X_b.$$

Here $\pi_i^{(t)}$, the $t$th approximation for $\hat{\pi}_i$, is obtained from $\beta^{(t)}$ through
$$\pi_i^{(t)} = \frac{\exp\left(\sum_{j=0}^{k} \beta_j^{(t)} x_{ij}\right)}{1 + \exp\left(\sum_{j=0}^{k} \beta_j^{(t)} x_{ij}\right)}.$$
We use $q^{(t)}$ and $h^{(t)}$ in the formula
$$\beta^{(t+1)} = \beta^{(t)} - \left(h^{(t)}\right)^{-1} q^{(t)}$$
to obtain the next value $\beta^{(t+1)}$, which here is
$$\beta^{(t+1)} = \beta^{(t)} + \left\{ X'\, \mathrm{Diag}\left[ n_i \pi_i^{(t)}\left(1 - \pi_i^{(t)}\right) \right] X \right\}^{-1} X'\left(y - \mu^{(t)}\right) \qquad (viii)$$
where $\mu_i^{(t)} = n_i \pi_i^{(t)}$.

This in turn is used to obtain $\mu^{(t+1)}$, and so forth.
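The iteration (viii) is straightforward to implement directly. The following from-scratch sketch (hypothetical grouped data, fixed iteration cap, simple convergence check) is illustrative rather than production code:

```python
import numpy as np

y = np.array([10.0, 17.0, 24.0, 38.0])              # successes y_i (hypothetical)
n = np.array([50.0, 50.0, 50.0, 50.0])              # trials n_i
X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])

beta = np.zeros(X.shape[1])                         # starting value beta^(0)
for t in range(25):
    eta = X @ beta                                  # linear predictor
    pi = np.exp(eta) / (1 + np.exp(eta))            # pi_i^(t) from beta^(t)
    mu = n * pi                                     # mu_i^(t) = n_i * pi_i^(t)
    W = np.diag(n * pi * (1 - pi))                  # Diag[n_i pi_i (1 - pi_i)]
    step = np.linalg.solve(X.T @ W @ X, X.T @ (y - mu))
    beta = beta + step                              # the update (viii)
    if np.max(np.abs(step)) < 1e-10:                # simple convergence check
        break

print(beta)                               # ML estimates beta-hat
print(np.linalg.inv(X.T @ W @ X))         # estimated variance-covariance matrix
```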
