Binary Logistic Regression With PASW: Karl L. Wuensch Dept of Psychology East Carolina University

- The document describes using binary logistic regression to predict whether research subjects will vote to continue or stop animal research based on their gender.
- The model found that gender significantly predicted decisions, with men's odds of voting to continue the research 3.4 times those of women.
- When classifying cases with a 50% probability cutoff, the logistic regression model correctly predicted the decision for 66% of subjects, outperforming the 59.4% obtained by predicting the more common decision ("stop") for everyone.
Copyright
© Attribution Non-Commercial (BY-NC)

Binary Logistic Regression with PASW
Karl L. Wuensch
Dept of Psychology
East Carolina University
Download the Instructional Document
• http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-MV.htm
• Click on Binary Logistic Regression.
• Save to desktop.
• Open in Word.
When to Use Binary Logistic Regression

• The criterion variable is dichotomous.


• Predictor variables may be categorical or
continuous.
• If predictors are all continuous and nicely
distributed, may use discriminant function
analysis.
• If predictors are all categorical, may use
logit analysis.
Wuensch & Poteat, 1998
• Cats being used as research subjects.
• Stereotaxic surgery.
• Subjects pretend they are on university
research committee.
• Complaint filed by animal rights group.
• Vote to stop or continue the research.
Purpose of the Research
• Cosmetic
• Theory Testing
• Meat Production
• Veterinary
• Medical
Predictor Variables
• Gender
• Ethical Idealism (9-point Likert)
• Ethical Relativism (9-point Likert)
• Purpose of the Research
Model 1: Decision = Gender
• Decision 0 = stop, 1 = continue
• Gender 0 = female, 1 = male
• The model is the logit:

$$\ln(\text{ODDS}) = \ln\left(\frac{\hat{Y}}{1-\hat{Y}}\right) = a + bX$$

• Ŷ is the predicted probability of the event, which is coded 1 (continue the research) rather than 0 (stop the research).
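The logit link and its inverse can be written in a few lines. This is a minimal Python sketch of the formula above. The function names `logit` and `inv_logit` are illustrative and are not PASW features.

```python
import math


def logit(p):
    """ln(ODDS) = ln(p / (1 - p)); p must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Convert a logit z = a + bX back into a predicted probability Y-hat."""
    return 1.0 / (1.0 + math.exp(-z))


print(logit(0.5))             # even odds give a logit of 0.0
print(inv_logit(logit(0.3)))  # round trip recovers 0.3, up to floating point
```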
Iterative Maximum Likelihood Procedure
• PASW starts with arbitrary regression coefficients.
• Tinkers with the regression coefficients to find those that maximize the likelihood of the observed data (that is, minimize -2 log likelihood).
• Converges on the final model.
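Newton-Raphson, a standard maximum likelihood algorithm for logistic regression, can illustrate the iterative procedure. This is a sketch and not PASW's actual code. The starting values (0, 0), the .001 convergence tolerance, and the helper name `fit_logistic` are all assumptions. The data are the grouped gender-by-decision counts: 60 of 200 women and 68 of 115 men voted to continue.

```python
import math

# (gender code, number voting continue, group size) from the gender * decision crosstab
GROUPS = [(0, 60, 200), (1, 68, 115)]


def fit_logistic(groups, tol=1e-3, max_iter=25):
    """Newton-Raphson maximum likelihood for ln(odds) = a + b*x on grouped binary data."""
    a, b = 0.0, 0.0  # assumed starting values
    for iteration in range(1, max_iter + 1):
        grad_a = grad_b = 0.0      # score vector
        i_aa = i_ab = i_bb = 0.0   # information matrix entries
        for x, n_event, n_total in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = n_event - n_total * p
            grad_a += resid
            grad_b += x * resid
            w = n_total * p * (1.0 - p)
            i_aa += w
            i_ab += w * x
            i_bb += w * x * x
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: solve the 2 x 2 system (information) * step = score
        step_a = (i_bb * grad_a - i_ab * grad_b) / det
        step_b = (i_aa * grad_b - i_ab * grad_a) / det
        a += step_a
        b += step_b
        if max(abs(step_a), abs(step_b)) < tol:  # estimates changed by less than tol
            return a, b, iteration
    raise RuntimeError("did not converge")


a, b, iterations = fit_logistic(GROUPS)
print(f"Constant = {a:.3f}, gender B = {b:.3f} after {iterations} iterations")
```

The estimates should settle near the constant (-.847) and gender slope (1.217) reported in the output. The iteration count may not match PASW's.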
PASW
• Bring the data into PASW:
• http://core.ecu.edu/psyc/wuenschk/SPSS/Logistic.sav
• Analyze, Regression, Binary Logistic
• Decision → Dependent
• Gender → Covariate(s), OK
Look at the Output
Case Processing Summary
Unweighted Cases(a)                         N      Percent
Selected Cases     Included in Analysis     315    100.0
                   Missing Cases            0      .0
                   Total                    315    100.0
Unselected Cases                            0      .0
Total                                       315    100.0
a. If weight is in effect, see classification table for the total number of cases.

• We have 315 cases.


Block 0 Model, Odds
• Look at Variables in the Equation.
• The model contains only the intercept
(constant, B0), a function of the marginal
distribution of the decisions.

Variables in the Equation
                  B       S.E.   Wald     df   Sig.   Exp(B)
Step 0  Constant  -.379   .115   10.919   1    .001   .684

$$\ln(\text{ODDS}) = \ln\left(\frac{\hat{Y}}{1-\hat{Y}}\right) = -.379$$
Exponentiate Both Sides
• Exponentiate both sides of the equation:
• e^(-.379) = .684 = Exp(B0) = odds of deciding to continue the research.

$$\frac{\hat{Y}}{1-\hat{Y}} = \exp(-.379) = .684 = \frac{128}{187}$$

• 128 voted to continue the research, 187 to stop it.
Probabilities
• Randomly select one participant.
• P(votes continue) = 128/315 = 40.6%
• P(votes stop) = 187/315 = 59.4%
• Odds = 40.6/59.4 = .684
• Repeatedly sample one participant and
guess how e will vote.
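The Block 0 arithmetic can be reproduced from the two vote counts alone. The Python sketch below shows that the intercept-only logit equals the log of the observed odds.

```python
import math

n_continue, n_stop = 128, 187
n = n_continue + n_stop                        # 315 participants
p_continue = n_continue / n                    # P(votes continue)
odds_continue = p_continue / (1 - p_continue)  # identical to 128 / 187
b0 = math.log(odds_continue)                   # intercept of the Block 0 model
print(f"P(continue) = {p_continue:.3f}, odds = {odds_continue:.3f}, B0 = {b0:.3f}")
# prints: P(continue) = 0.406, odds = 0.684, B0 = -0.379
```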
Humans vs. Goldfish
• Humans Match Probabilities
– (suppose p = .7, q = .3)
– .7(.7) + .3(.3) = .49 + .09 = .58
• Goldfish Maximize Probabilities
– .7(1) = .70
• The goldfish win!
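The goldfish arithmetic can be checked with a short Python sketch. The function names are illustrative. Applied to the decision data (p = 128/315), probability matching would be right about 52% of the time, while always guessing "stop" would be right about 59% of the time.

```python
def matching_accuracy(p):
    """Expected hit rate when guessing 'event' with probability p (probability matching)."""
    return p * p + (1 - p) * (1 - p)


def maximizing_accuracy(p):
    """Expected hit rate when always guessing the more common outcome."""
    return max(p, 1 - p)


for p in (0.7, 128 / 315):
    print(f"p = {p:.3f}: matching {matching_accuracy(p):.3f}, "
          f"maximizing {maximizing_accuracy(p):.3f}")
```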
PASW Model 0 vs. Goldfish
• Look at the Classification Table for Block 0.
Classification Table(a,b)
                               Predicted decision
Observed                       stop    continue   Percentage Correct
Step 0  decision  stop          187     0           100.0
                  continue      128     0           .0
        Overall Percentage                          59.4
a. Constant is included in the model.
b. The cut value is .500

• PASW predicts “STOP” for every participant.
• PASW is as smart as a Goldfish here.
Block 1 Model
• Gender has now been added to the model.
• Model Summary: -2 Log Likelihood = how
poorly model fits the data.

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      399.913(a)          .078                   .106
a. Estimation terminated at iteration number 3 because parameter estimates changed by less than .001.
Block 1 Model
• For intercept only, -2LL = 425.566.
• Add gender and -2LL = 399.913.
• Omnibus Tests: Drop in -2LL = 25.653 = Model χ².
• df = 1, p < .001.

Omnibus Tests of Model Coefficients
                Chi-square   df   Sig.
Step 1  Step    25.653       1    .000
        Block   25.653       1    .000
        Model   25.653       1    .000
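The gender model has one parameter per group, so its maximum likelihood estimates reproduce each group's observed proportion. That means the -2LL values and the omnibus chi-square can be recomputed from the crosstab counts alone. The Python sketch below relies on that assumption, which does not hold for models with continuous predictors. It also assumes PASW uses the standard Cox & Snell and Nagelkerke formulas.

```python
import math

# (n continue, n stop) for each gender, from the gender * decision crosstabulation
WOMEN = (60, 140)
MEN = (68, 47)


def minus_2ll(groups):
    """-2 log likelihood when each group's predicted P(continue) is its observed proportion."""
    log_lik = 0.0
    for n_continue, n_stop in groups:
        p = n_continue / (n_continue + n_stop)
        log_lik += n_continue * math.log(p) + n_stop * math.log(1 - p)
    return -2 * log_lik


n = sum(WOMEN) + sum(MEN)
m2ll_intercept = minus_2ll([(WOMEN[0] + MEN[0], WOMEN[1] + MEN[1])])  # Block 0
m2ll_gender = minus_2ll([WOMEN, MEN])                                 # Block 1
model_chisq = m2ll_intercept - m2ll_gender                            # omnibus test, df = 1
cox_snell = 1 - math.exp(-model_chisq / n)
nagelkerke = cox_snell / (1 - math.exp(-m2ll_intercept / n))
print(f"-2LL: {m2ll_intercept:.3f} -> {m2ll_gender:.3f}; chi-square = {model_chisq:.3f}")
print(f"Cox & Snell R2 = {cox_snell:.3f}, Nagelkerke R2 = {nagelkerke:.3f}")
```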
Variables in the Equation
• ln(odds) = -.847 + 1.217Gender

$$\text{ODDS} = e^{a + b\,\text{Gender}}$$

Variables in the Equation
                    B       S.E.   Wald     df   Sig.   Exp(B)
Step 1(a) gender    1.217   .245   24.757   1    .000   3.376
          Constant  -.847   .154   30.152   1    .000   .429
a. Variable(s) entered on step 1: gender.
Odds, Women

$$\text{ODDS} = e^{-.847 + 1.217(0)} = e^{-.847} = 0.429$$

• A woman is only .429 as likely to decide to continue the research as she is to decide to stop it.
Odds, Men

$$\text{ODDS} = e^{-.847 + 1.217(1)} = e^{.37} = 1.448$$

• A man is 1.448 times as likely to vote to continue the research as to stop the research.
Odds Ratio

$$\frac{\text{male odds}}{\text{female odds}} = \frac{1.448}{.429} = 3.376 = e^{1.217}$$

• 1.217 was the B (slope) for Gender; 3.376 is the Exp(B), that is, the exponentiated slope, the odds ratio.
• Men's odds of voting to continue the research are 3.376 times those of women.
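A short Python sketch can confirm that exponentiating the slope gives the odds ratio. It uses the rounded B values printed in the output, so exp(1.217) comes out at 3.377 instead of the 3.376 PASW reports from the unrounded slope. Computing the odds ratio directly from the crosstab counts gives 3.376.

```python
import math

A, B = -0.847, 1.217  # Constant and gender B from Variables in the Equation

odds_women = math.exp(A + B * 0)    # about .429
odds_men = math.exp(A + B * 1)      # about 1.448
odds_ratio = odds_men / odds_women  # equals math.exp(B), about 3.377 with the rounded B

# The same odds ratio straight from the crosstab counts: (68/47) / (60/140)
odds_ratio_counts = (68 / 47) / (60 / 140)
print(f"{odds_women:.3f} {odds_men:.3f} {odds_ratio:.3f} {odds_ratio_counts:.3f}")
```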
Convert Odds to Probabilities
• For our women,

$$\hat{Y} = \frac{\text{ODDS}}{1 + \text{ODDS}} = \frac{0.429}{1.429} = 0.30$$

• For our men,

$$\hat{Y} = \frac{\text{ODDS}}{1 + \text{ODDS}} = \frac{1.448}{2.448} = 0.59$$
Classification
• Decision Rule: If Prob(event) ≥ Cutoff, then predict the event will take place.
• By default, PASW uses .5 as the Cutoff.
• For every man, Prob(continue) = .59, so predict he will vote to continue.
• For every woman, Prob(continue) = .30, so predict she will vote to stop it.
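The conversion from logit to probability and the cutoff rule are sketched below in Python. The function names are illustrative. Treating a probability exactly equal to the cutoff as a predicted event is an assumption about tie handling.

```python
import math

A, B = -0.847, 1.217  # Constant and gender slope from Variables in the Equation


def prob_continue(gender):
    """Predicted P(continue) for gender coded 0 = female, 1 = male."""
    odds = math.exp(A + B * gender)
    return odds / (1 + odds)


def predict(prob, cutoff=0.5):
    """Classify as 'continue' when prob >= cutoff (assumed tie rule), else 'stop'."""
    return "continue" if prob >= cutoff else "stop"


for gender, label in ((0, "woman"), (1, "man")):
    p = prob_continue(gender)
    print(f"{label}: P(continue) = {p:.2f} -> predict {predict(p)}")
```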
Overall Success Rate
• Look at the Classification Table
Classification Table(a)
                               Predicted decision
Observed                       stop    continue   Percentage Correct
Step 1  decision  stop          140     47          74.9
                  continue      60      68          53.1
        Overall Percentage                          66.0
a. The cut value is .500

$$\frac{140 + 68}{315} = \frac{208}{315} = 66\%$$
• PASW beat the Goldfish!
Sensitivity
• P (correct prediction | event did occur)
• P (predict Continue | subject voted to Continue)
• Of all those who voted to continue the research, for how many did we correctly predict that?

$$\frac{68}{68 + 60} = \frac{68}{128} = 53\%$$
Specificity
• P (correct prediction | event did not occur)
• P (predict Stop | subject voted to Stop)
• Of all those who voted to stop the research, for how many did we correctly predict that?

$$\frac{140}{140 + 47} = \frac{140}{187} = 75\%$$
False Positive Rate
• P (incorrect prediction | predicted occurrence)
• P (subject voted to Stop | we predicted Continue)
• Of all those for whom we predicted a vote to Continue the research, how often were we wrong?

$$\frac{47}{47 + 68} = \frac{47}{115} = 41\%$$
False Negative Rate
• P (incorrect prediction | predicted nonoccurrence)
• P (subject voted to Continue | we predicted Stop)
• Of all those for whom we predicted a vote to Stop the research, how often were we wrong?

$$\frac{60}{140 + 60} = \frac{60}{200} = 30\%$$
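All four rates can be computed from the cells of a classification table. The Python sketch below follows the definitions on these slides, with "continue" as the event. Note that the slides' false positive and false negative rates condition on the prediction. Some other texts instead define the false positive rate as fp / (fp + tn) and call the quantity used here the false discovery rate.

```python
def classification_rates(tp, fn, fp, tn):
    """Rates as defined on these slides, with 'continue' as the event.

    tp: voted continue, predicted continue    fn: voted continue, predicted stop
    fp: voted stop, predicted continue        tn: voted stop, predicted stop
    """
    return {
        "sensitivity": tp / (tp + fn),          # P(predict event | event occurred)
        "specificity": tn / (tn + fp),          # P(predict nonevent | event did not occur)
        "false_positive_rate": fp / (fp + tp),  # P(wrong | predicted event)
        "false_negative_rate": fn / (fn + tn),  # P(wrong | predicted nonevent)
        "overall_correct": (tp + tn) / (tp + fn + fp + tn),
    }


# Block 1 classification table for the gender-only model (cut value .500)
rates = classification_rates(tp=68, fn=60, fp=47, tn=140)
for name, value in rates.items():
    print(f"{name}: {value:.0%}")
```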
Pearson χ²
• Analyze, Descriptive Statistics, Crosstabs
• Gender → Rows; Decision → Columns
Crosstabs Statistics
• Statistics, Chi-Square, Continue
Crosstabs Cells
• Cells, Observed Counts, Row Percentages
Crosstabs Output
• Continue, OK
• The 59% and 30% match logistic’s predictions.
gender * decision Crosstabulation
                                   decision
                                   stop     continue   Total
gender  Female  Count              140      60         200
                % within gender    70.0%    30.0%      100.0%
        Male    Count              47       68         115
                % within gender    40.9%    59.1%      100.0%
Total           Count              187      128        315
                % within gender    59.4%    40.6%      100.0%
Crosstabs Output
• Likelihood Ratio χ² = 25.653, as with logistic.

Chi-Square Tests
                       Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square     25.685(b)   1    .000
Likelihood Ratio       25.653      1    .000
N of Valid Cases       315
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 46.73.
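The Pearson chi-square and the minimum expected count in this output can be recomputed from the observed counts. The Python sketch below works for any r x c table. The helper name `pearson_chisq` is illustrative, and the function applies no continuity correction.

```python
def pearson_chisq(table):
    """Pearson chi-square and smallest expected count for an r x c table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chisq = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chisq += (observed - expected) ** 2 / expected
    return chisq, min_expected


# Rows: female, male; columns: stop, continue
chisq, min_expected = pearson_chisq([[140, 60], [47, 68]])
print(f"Pearson chi-square = {chisq:.3f}, minimum expected count = {min_expected:.2f}")
```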
Model 2: Decision =
Idealism, Relativism, Gender
• Analyze, Regression, Binary Logistic
• Decision → Dependent
• Gender, Idealism, Relatvsm → Covariate(s)
• Click Options and check “Hosmer-
Lemeshow goodness of fit” and “CI for
exp(B) 95%.”

• Continue, OK.
Comparing Nested Models
• With only intercept and gender,
-2LL = 399.913.
• Adding idealism and relativism dropped -2LL to 346.503, a drop of 53.41.
χ²(2) = 399.913 – 346.503 = 53.41, p = ?

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      346.503(a)          .222                   .300
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.
Obtain p
• Transform, Compute
• Target Variable = p
• Numeric Expression = 1 - CDF.CHISQ(53.41,2)
• OK
• Data Editor, Variable View
• Set Decimal Points to 5 for p
p < .0001
• Data Editor, Data View
• p = .00000
• Adding the ethical ideology variables
significantly improved the model.
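PASW's CDF.CHISQ does this computation inside PASW. For a cross-check outside PASW, the Python sketch below uses only the standard library and exploits closed forms of the chi-square upper tail. For df = 1 the tail is the complementary error function, and for even df it is a finite sum. The sketch is not a general replacement for CDF.CHISQ, because it does not handle odd df above 1. The same function also gives p for the χ²(4) = 8.443 test of Purpose.

```python
import math


def chisq_sf(x, df):
    """Upper-tail chi-square probability (1 - CDF) for df = 1 or any even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))  # P(|Z| > sqrt(x)) for standard normal Z
    if df % 2 == 0:
        half = x / 2
        term = total = 1.0
        for i in range(1, df // 2):  # finite Poisson-type sum, exact for even df
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch handles only df = 1 or even df")


print(f"{chisq_sf(53.41, 2):.2e}")  # Model 2 vs. Model 1, far below .0001
print(f"{chisq_sf(8.443, 4):.4f}")  # adding Purpose, df = 4
```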
Hosmer-Lemeshow
• H₀: the weighted combination of predictors is related to the outcome log odds in a linear fashion.
• Cases are arranged in order by their
predicted probability on the criterion.
• Then divided into ten groups (lowest decile
to highest decile)
• This gives ten rows in the table.
• The two columns are, for each row, how
many cases were the event, how many the
nonevent.
Contingency Table for Hosmer and Lemeshow Test
                decision = stop        decision = continue
Step 1  Group   Observed   Expected    Observed   Expected    Total
        1       29         29.331      3          2.669       32
        2       30         27.673      2          4.327       32
        3       28         25.669      4          6.331       32
        4       20         23.265      12         8.735       32
        5       22         20.693      10         11.307      32
        6       15         18.058      17         13.942      32
        7       15         15.830      17         16.170      32
        8       10         12.920      22         19.080      32
        9       12         9.319       20         22.681      32
        10      6          4.241       21         22.759      27
• Note that the expected frequencies decline in the first column and rise in the second.
• The nonsignificant chi-square indicates adequate fit of the data to the linear model.

Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      8.810        8    .359
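The Hosmer-Lemeshow statistic is a Pearson-type sum over the ten groups, with df equal to the number of groups minus 2. The Python sketch below recomputes it from the contingency table above. The result agrees with 8.810 to within the rounding of the printed expected counts. The even-df tail helper is an assumption-free closed form for the chi-square p value and is not a PASW function.

```python
import math

# Per decile group: (observed stop, expected stop, observed continue, expected continue)
HL_TABLE = [
    (29, 29.331, 3, 2.669),
    (30, 27.673, 2, 4.327),
    (28, 25.669, 4, 6.331),
    (20, 23.265, 12, 8.735),
    (22, 20.693, 10, 11.307),
    (15, 18.058, 17, 13.942),
    (15, 15.830, 17, 16.170),
    (10, 12.920, 22, 19.080),
    (12, 9.319, 20, 22.681),
    (6, 4.241, 21, 22.759),
]


def hosmer_lemeshow(table):
    """Sum of (O - E)^2 / E over both outcome columns; df = number of groups - 2."""
    chisq = sum((o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2 for o1, e1, o2, e2 in table)
    return chisq, len(table) - 2


def chisq_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df % 2:
        raise ValueError("df must be even")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


hl, df = hosmer_lemeshow(HL_TABLE)
p_value = chisq_sf_even_df(hl, df)
print(f"Hosmer-Lemeshow chi-square = {hl:.3f}, df = {df}, p = {p_value:.3f}")
```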
Model 3: Decision =
Idealism, Relativism, Gender, Purpose
• Need 4 dummy variables to code the five
purposes.
• Consider the Medical group a reference
group.
• Dummy variables are: Cosmetic, Theory,
Meat, Veterin.
• 0 = not in this group, 1 = in this group.
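Below is a minimal Python sketch of this dummy coding, with Medical as the reference group. The lowercase purpose labels are hypothetical, since the data file may store purpose as numeric codes. The four dummy names are the ones used on this slide.

```python
# Hypothetical purpose labels mapped to the dummy variable names used on the slide
DUMMY_FOR = {
    "cosmetic": "Cosmetic",
    "theory": "Theory",
    "meat": "Meat",
    "veterinary": "Veterin",
}
REFERENCE = "medical"  # Medical research: all four dummies are 0


def dummy_code(purpose):
    """Return the four 0/1 dummy variables for one participant's purpose condition."""
    if purpose != REFERENCE and purpose not in DUMMY_FOR:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return {name: int(DUMMY_FOR.get(purpose) == name) for name in DUMMY_FOR.values()}


print(dummy_code("theory"))   # {'Cosmetic': 0, 'Theory': 1, 'Meat': 0, 'Veterin': 0}
print(dummy_code("medical"))  # {'Cosmetic': 0, 'Theory': 0, 'Meat': 0, 'Veterin': 0}
```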
Add the Dummy Variables
• Analyze, Regression, Binary Logistic
• Add to the Covariates: Cosmetic, Theory,
Meat, Veterin.
• OK
Block 0
• Look at “Variables not in the Equation.”
• “Score” is approximately how much -2LL would drop if that single variable were added to the model with intercept only.
Variables not in the Equation
                               Score    df   Sig.
Step 0  Variables  gender      25.685   1    .000
                   idealism    47.679   1    .000
                   relatvsm    7.239    1    .007
                   cosmetic    .003     1    .955
                   theory      2.933    1    .087
                   meat        .556     1    .456
                   veterin     .013     1    .909
        Overall Statistics     77.665   7    .000
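The score statistic for one candidate predictor can be computed without fitting the larger model. It is the squared score (the gradient of the log likelihood for the new slope at zero) divided by its information. The Python sketch below applies this to gender, expanding the crosstab counts into one row per participant. For gender it reproduces the 25.685 in the table, the same value as the Pearson chi-square, which is close to, but not identical to, the actual drop in -2LL of 25.653.

```python
def score_statistic(x, y):
    """Score test for adding predictor x to an intercept-only logistic model (y coded 0/1)."""
    n = len(y)
    y_bar = sum(y) / n  # intercept-only predicted probability
    x_bar = sum(x) / n
    # Score (gradient) for the new slope, evaluated at slope = 0
    u = sum(xi * (yi - y_bar) for xi, yi in zip(x, y))
    # Information for the slope, adjusted for the intercept
    info = y_bar * (1 - y_bar) * sum((xi - x_bar) ** 2 for xi in x)
    return u * u / info


# Expand the gender-by-decision counts into one row per participant
gender = [0] * 200 + [1] * 115
decision = [1] * 60 + [0] * 140 + [1] * 68 + [0] * 47
score = score_statistic(gender, decision)
print(f"Score for gender = {score:.3f}")
```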
Effect of Adding Purpose
• Our previous model had -2LL = 346.503.
• Adding Purpose dropped -2LL to 338.060.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      338.060(a)          .243                   .327
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

χ²(4) = 346.503 – 338.060 = 8.443, p = .0766.


• But I make planned comparisons (with medical
reference group) anyhow!
Classification Table
• YOU calculate the sensitivity, specificity,
false positive rate, and false negative rate.

Classification Table(a)
                               Predicted decision
Observed                       stop    continue   Percentage Correct
Step 1  decision  stop          152     35          81.3
                  continue      54      74          57.8
        Overall Percentage                          71.7
a. The cut value is .500
Answer Key
• Sensitivity = 74/128 = 58%
• Specificity = 152/187 = 81%
• False Positive Rate = 35/109 = 32%
• False Negative Rate = 54/206 = 26%
Wald Chi-Square
• A conservative test of the unique
contribution of each predictor.
• Presented in Variables in the Equation.
• Alternative: drop one predictor from the model, observe the increase in -2LL, and test it via χ².
Variables in the Equation
                                                      95.0% C.I. for EXP(B)
                     B        Wald     df   Sig.   Exp(B)   Lower   Upper
Step 1(a)  gender    1.255    20.586   1    .000   3.508    2.040   6.033
           idealism  -.701    37.891   1    .000   .496     .397    .620
           relatvsm  .326     6.634    1    .010   1.386    1.081   1.777
           cosmetic  -.709    2.850    1    .091   .492     .216    1.121
           theory    -1.160   7.346    1    .007   .314     .136    .725
           meat      -.866    4.164    1    .041   .421     .183    .966
           veterin   -.542    1.751    1    .186   .581     .260    1.298
           Constant  2.279    4.867    1    .027   9.766
a. Variable(s) entered on step 1: gender, idealism, relatvsm, cosmetic, theory, meat, veterin.
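The Wald statistic is (B / S.E.)², so S.E. can be recovered from the printed B and Wald values. The 95% interval for Exp(B) is then exp(B ± 1.96 S.E.). The Python sketch below checks the gender and idealism rows. Because B and Wald are printed to only three decimals, the limits agree with the table only to within rounding.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal critical value


def odds_ratio_ci(b, wald, z=Z_95):
    """Exp(B) and its Wald confidence interval, recovering S.E. from Wald = (B / S.E.)**2."""
    se = abs(b) / math.sqrt(wald)
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


for name, b, wald in (("gender", 1.255, 20.586), ("idealism", -0.701, 37.891)):
    odds_ratio, lower, upper = odds_ratio_ci(b, wald)
    print(f"{name}: Exp(B) = {odds_ratio:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```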
Odds Ratios – Exp(B)
• Odds of approval more than cut in half (.496) for
each one point increase in Idealism.
• Odds of approval multiplied by 1.39 for each one
point increase in Relativism.
• Odds of approval if purpose is Theory Testing are only .314 times what they are for Medical Research.
• Odds of approval if purpose is Agricultural Research are only .421 times what they are for Medical Research.
Inverted Odds Ratios
• Some folks have problems with odds
ratios less than 1.
• Just invert the odds ratio.
• For example, 1/.421 = 2.38.
• That is, the odds that respondents would approve the medical research were more than twice the odds that they would approve the research designed to feed the poor in the third world.
Classification Decision Rule
• Consider a screening test for Cancer.
• Which is the more serious error?
– False Positive – test says you have cancer,
but you do not
– False Negative – test says you do not have
cancer but you do
• Want to reduce the False Negative rate?
Classification Decision Rule
• Analyze, Regression, Binary Logistic
• Options
• Classification Cutoff = .4, Continue, OK
Effect of Lowering Cutoff
• YOU calculate the Sensitivity, Specificity,
False Positive Rate, and False Negative
Rate for the model with the cutoff at .4.
• Fill in the table on page 15 of the handout.
Answer Key

Value                  Cutoff = .5   Cutoff = .4
Sensitivity            58%           75%
Specificity            81%           72%
False Positive Rate    32%           36%
False Negative Rate    26%           19%
Overall % Correct      72%           73%
SAS Rules
• See, on page 16 of the handout, how easy
SAS makes it to see the effect of changing
the cutoff.
• SAS classification tables remove bias
(using a jackknifed classification
procedure), PASW does not have this
feature.
Presenting the Results
• See the handout.
Interaction Terms
• Center continuous variables
• Compute the interactions terms or
• Let Logistic compute them.
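The "compute the interaction terms yourself" option amounts to centering the continuous predictor and multiplying it by the other predictor. The Python sketch below uses hypothetical idealism scores and gender codes, and the helper names are illustrative. It centers only the continuous predictor, as the slide suggests.

```python
def center(values):
    """Subtract the mean so that 0 corresponds to an average score."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def interaction_term(continuous, dichotomous):
    """Product of a mean-centered continuous predictor and a 0/1 predictor."""
    return [c * d for c, d in zip(center(continuous), dichotomous)]


# Hypothetical scores: idealism (continuous) and gender (0 = female, 1 = male)
idealism = [3.0, 5.0, 7.0, 9.0]
gender = [0, 1, 0, 1]
print(center(idealism))  # [-3.0, -1.0, 1.0, 3.0]
print(interaction_term(idealism, gender))
```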
Deliberation and Physical
Attractiveness in a Mock Trial
• Subjects are mock jurors in a criminal trial.
• For half, the defendant is plain; for the other half, she is physically attractive.
• Half recommend a verdict with no
deliberation, half deliberate first.
Get the Data
• Bring Logistic2x2x2.sav into PASW.
• Each row is one cell in 2x2x2 contingency
table.
• Could do a logit analysis, but will do
logistic regression instead.
• Tell PASW to weight cases by Freq: Data, Weight Cases.
• Analyze, Regression, Binary Logistic.
• Dependent = Guilty.
• Covariates = Delib, Plain.
• In left pane highlight Delib and Plain.
• Then click >a*b> to create the interaction
term.
• Under Options, ask for the Hosmer-
Lemeshow test and confidence intervals
on the odds ratios.
Significant Interaction
• The interaction is large and significant
(odds ratio of .030), so we shall ignore the
main effects.
Variables in the Equation
                                                   95.0% C.I. for EXP(B)
                           Wald    df   Sig.   Exp(B)   Lower   Upper
Step 1(a)  Delib           3.697   1    .054   .338     .112    1.021
           Plain           4.204   1    .040   3.134    1.052   9.339
           Delib by Plain  8.075   1    .004   .030     .003    .338
           Constant        .037    1    .847   1.077
a. Variable(s) entered on step 1: Delib, Plain, Delib * Plain.
• Use Crosstabs to test the conditional
effects of Plain at each level of Delib.
• Split file by Delib.
• Analyze, Crosstabs.
• Rows = Plain, Columns = Guilty.
• Statistics, Chi-square, Continue.
• Cells, Observed Counts and Row Percentages.
• Continue, OK.
Rows = Plain, Columns = Guilty
• For those who did deliberate, the odds of a guilty verdict are 1/29 when the defendant was plain and 8/22 when she was attractive, yielding a conditional odds ratio of 0.09483.

Plain * Guilty Crosstabulation(a)
                                     Guilty
                                     No       Yes      Total
Plain  Attractive  Count             22       8        30
                   % within Plain    73.3%    26.7%    100.0%
       Plain       Count             29       1        30
                   % within Plain    96.7%    3.3%     100.0%
Total              Count             51       9        60
                   % within Plain    85.0%    15.0%    100.0%
a. Delib = Yes
• For those who did not deliberate, the odds of a guilty verdict are 27/8 when the defendant was plain and 14/13 when she was attractive, yielding a conditional odds ratio of 3.1339.

Plain * Guilty Crosstabulation(a)
                                     Guilty
                                     No       Yes      Total
Plain  Attractive  Count             13       14       27
                   % within Plain    48.1%    51.9%    100.0%
       Plain       Count             8        27       35
                   % within Plain    22.9%    77.1%    100.0%
Total              Count             21       41       62
                   % within Plain    33.9%    66.1%    100.0%
a. Delib = No
Interaction Odds Ratio
• The interaction odds ratio is simply the ratio of these conditional odds ratios, that is, .09483/3.1339 = 0.030.
• Among those who did not deliberate, the plain defendant was found guilty significantly more often than the attractive defendant, χ²(1, N = 62) = 4.353, p = .037.
• Among those who did deliberate, the attractive defendant was found guilty significantly more often than the plain defendant, χ²(1, N = 60) = 6.405, p = .011.
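The conditional odds ratios, their ratio, and the two conditional chi-square tests can all be recomputed from the two crosstabulations. The Python sketch below does so. The dictionary layout and helper names are illustrative. The p value uses the df = 1 identity P(χ² > x) = erfc(√(x/2)), and no continuity correction is applied.

```python
import math

# (guilty, not guilty) counts from the two Plain * Guilty crosstabulations
DELIB_YES = {"plain": (1, 29), "attractive": (8, 22)}
DELIB_NO = {"plain": (27, 8), "attractive": (14, 13)}


def conditional_odds_ratio(cells):
    """Odds of a guilty verdict for the plain defendant divided by those for the attractive one."""
    guilty_p, not_p = cells["plain"]
    guilty_a, not_a = cells["attractive"]
    return (guilty_p / not_p) / (guilty_a / not_a)


def pearson_2x2(cells):
    """Pearson chi-square (no continuity correction) and its df = 1 p value."""
    table = [cells["plain"], cells["attractive"]]
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    chisq = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
                for i in range(2) for j in range(2))
    return chisq, math.erfc(math.sqrt(chisq / 2))


or_yes = conditional_odds_ratio(DELIB_YES)
or_no = conditional_odds_ratio(DELIB_NO)
print(f"Deliberated: OR = {or_yes:.5f}; did not: OR = {or_no:.4f}; "
      f"interaction OR = {or_yes / or_no:.3f}")
for label, cells in (("Delib = Yes", DELIB_YES), ("Delib = No", DELIB_NO)):
    chisq, p = pearson_2x2(cells)
    print(f"{label}: chi-square(1) = {chisq:.3f}, p = {p:.3f}")
```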
Interaction Between Continuous
and Dichotomous Predictor
Interaction Falls Short of
Significance
Standardizing Predictors
• Most helpful with continuous predictors.
• Especially when you want to compare the relative contributions of predictors in the model.
• Also useful when the predictor is
measured in units that are not intrinsically
meaningful.
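Below is a minimal Python sketch of standardizing a continuous predictor. A linear rescaling of X changes only the units of its slope (and the intercept), so the B for a z-scored predictor equals the raw-score B multiplied by the predictor's standard deviation. The input values and the SD of 2 used with the idealism B are hypothetical.

```python
import statistics


def standardize(values):
    """Convert raw scores to z scores using the sample mean and sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def standardized_slope(b, sd_x):
    """Slope for a z-scored predictor, given the raw-score slope b and the SD of X."""
    return b * sd_x


print(standardize([2, 4, 6]))           # [-1.0, 0.0, 1.0]
print(standardized_slope(-0.701, 2.0))  # idealism B with a hypothetical SD of 2
```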
Predicting Retention in ECU’s
Engineering Program
Practice Your New Skills
• Try the exercises in the handout.
