Analysis of Categorical Data
by Ken Black
Chapter 12
Copyright 2011 John Wiley & Sons, Inc.
Learning Objectives
Explain the purpose of regression analysis and the meaning
of independent versus dependent variables.
Compute the equation of a simple regression line from a
sample of data, and interpret the slope and intercept of the
equation.
Estimate values of Y to forecast outcomes using the
regression model.
Understand residual analysis in testing the assumptions
and in examining the fit underlying the regression line.
Compute a standard error of the estimate and interpret
its meaning.
Compute a coefficient of determination and interpret it.
Test hypotheses about the slope of the regression model
and interpret the results.
Correlation
Correlation is a measure of the degree of relatedness of
variables.
Coefficient of Correlation (r) - applicable only if both
variables being analyzed have at least an interval level
of data.
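$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}$$
$$-1 \le r \le 1$$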
[Figure: scatter plots illustrating r < 0, r > 0, and r = 0]
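$$r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}} = \frac{21{,}115.07 - \dfrac{(92.93)(2725)}{12}}{\sqrt{\left[720.22 - \dfrac{(92.93)^2}{12}\right]\left[619{,}207 - \dfrac{(2725)^2}{12}\right]}} = .815$$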
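The computation of r above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `pearson_r_from_sums` is an illustrative choice, and the inputs are the sums that appear in the economics example (n = 12).

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r from raw sums, using the computational formula."""
    ss_xy = sum_xy - sum_x * sum_y / n   # sum of cross-products about the means
    ss_xx = sum_x2 - sum_x ** 2 / n      # sum of squares of X about its mean
    ss_yy = sum_y2 - sum_y ** 2 / n      # sum of squares of Y about its mean
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Sums taken from the economics example (n = 12).
r = pearson_r_from_sums(n=12, sum_x=92.93, sum_y=2725,
                        sum_x2=720.22, sum_y2=619_207, sum_xy=21_115.07)
print(round(r, 3))
```

The result should agree with the .815 shown in the worked computation, up to rounding.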
Computation of r
Economics Example (Part 2)
Is r = 0.815 high or low?
What can we conclude about the variables of
interest?
$$\hat{y} = b_0 + b_1 x$$
where: $b_0$ = the sample intercept
$b_1$ = the sample slope
$\hat{y}$ = the predicted value of $y$
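$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}$$
$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$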
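$$SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$
$$SS_{XX} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$
$$b_1 = \frac{SS_{XY}}{SS_{XX}}$$
$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$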
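$\sum X = 930$, $\sum Y = 56.69$, $\sum X^2 = 73{,}764$, $\sum XY = 4{,}462.22$
$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n} = 73{,}764 - \frac{(930)^2}{12} = 1689$$
$$b_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{68.745}{1689} = .0407$$
$$b_0 = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} = \frac{56.69}{12} - (.0407)\frac{930}{12} = 1.57$$
$$\hat{Y} = 1.57 + .0407X$$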
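The slope and intercept above follow directly from the four sums. The sketch below reproduces the arithmetic; the function name `fit_line_from_sums` is hypothetical, and only the summary sums quoted in the airline cost example are used, since the individual observations are not listed here.

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Least-squares intercept and slope from raw sums."""
    ss_xy = sum_xy - sum_x * sum_y / n   # SSXY
    ss_xx = sum_x2 - sum_x ** 2 / n      # SSXX
    b1 = ss_xy / ss_xx                   # sample slope
    b0 = sum_y / n - b1 * sum_x / n      # sample intercept
    return b0, b1, ss_xy, ss_xx

# Sums from the airline cost example (n = 12).
b0, b1, ss_xy, ss_xx = fit_line_from_sums(
    n=12, sum_x=930, sum_y=56.69, sum_x2=73_764, sum_xy=4_462.22)
print(f"SSXY = {ss_xy:.3f}, SSXX = {ss_xx:.0f}")
print(f"Y-hat = {b0:.2f} + {b1:.4f} X")
```

Keeping b1 unrounded until the final print avoids carrying rounding error into b0.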
R Square 0.89908386
Observations 12
ANOVA
         df   SS        MS   F   Significance F
Total    11   3.11209

Analysis of Variance
Source           DF   SS       MS       F       P
Regression        1   2.7980   2.7980   89.09   0.000
Residual Error   10   0.3141   0.0314
Total            11   3.1121
Number of Passengers (X) | Cost ($1,000) (Y) | Predicted Value ($\hat{Y}$) | Residual ($Y - \hat{Y}$)
$\sum (Y - \hat{Y}) = -.001$
Sum of Squares Error
$$SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY$$
Standard Error of the Estimate
$$S_e = \sqrt{\frac{SSE}{n - 2}}$$
Number of Passengers (X) | Cost ($1,000) (Y) | Residual ($Y - \hat{Y}$) | $(Y - \hat{Y})^2$
$\sum (Y - \hat{Y}) = -.001$, $\sum (Y - \hat{Y})^2 = .31434$
SSE = 0.3141
Sum of Squares Error
$$SSE = \sum (Y - \hat{Y})^2 = 0.31434$$
Standard Error of the Estimate
$$S_e = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{0.31434}{10}} = 0.1773$$
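As a cross-check, SSE can also be obtained from the computational formula using only the sums. The sketch below is illustrative rather than the author's procedure. It keeps the slope and intercept unrounded, which gives a value near the 0.3141 in the Analysis of Variance table; the .31434 used on the slide appears to come from summing squared residuals built on rounded predicted values (that reading is an assumption, since the individual rows are not shown here).

```python
import math

# Summary sums from the airline cost example (n = 12).
n = 12
sum_x, sum_y = 930, 56.69
sum_x2, sum_y2, sum_xy = 73_764, 270.9251, 4_462.22

# Unrounded slope and intercept, so rounding does not leak into SSE.
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# Computational formula: SSE = sum(Y^2) - b0 * sum(Y) - b1 * sum(XY).
sse = sum_y2 - b0 * sum_y - b1 * sum_xy

# Standard error of the estimate with n - 2 degrees of freedom.
se = math.sqrt(sse / (n - 2))
print(f"SSE = {sse:.4f}, Se = {se:.4f}")
```

The computational formula subtracts large, nearly equal quantities, so plugging in the rounded coefficients 1.57 and .0407 would noticeably distort SSE.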
Standard Error of the Estimate for
the Airline Cost Example
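$$SS_{YY} = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}$$
$$SS_{YY} = \text{explained variation} + \text{unexplained variation}$$
$$SS_{YY} = SSR + SSE$$
$$1 = \frac{SSR}{SS_{YY}} + \frac{SSE}{SS_{YY}}$$
$$r^2 = \frac{SSR}{SS_{YY}} = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{SSE}{\sum Y^2 - \dfrac{(\sum Y)^2}{n}}$$
$$0 \le r^2 \le 1$$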
$$SSE = 0.31434$$
$$SS_{YY} = \sum Y^2 - \frac{(\sum Y)^2}{n} = 270.9251 - \frac{(56.69)^2}{12} = 3.11209$$
$$r^2 = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{.31434}{3.11209} = .899$$
89.9% of the variability of the cost of flying a Boeing 737 is accounted for by the number of passengers.
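The coefficient of determination for the airline cost example can be reproduced in a short Python sketch. The function name `coefficient_of_determination` is an illustrative choice; the inputs are the SSE and the sums of Y quoted above.

```python
def coefficient_of_determination(sse, sum_y2, sum_y, n):
    """Return (r squared, SSYY) from SSE and the raw sums of Y."""
    ss_yy = sum_y2 - sum_y ** 2 / n   # total variation in Y
    r2 = 1 - sse / ss_yy              # share of variation explained by X
    return r2, ss_yy

r2, ss_yy = coefficient_of_determination(
    sse=0.31434, sum_y2=270.9251, sum_y=56.69, n=12)
print(f"SSYY = {ss_yy:.5f}, r^2 = {r2:.3f}")
```

Because SSYY = SSR + SSE, the same value follows from SSR / SSYY once SSR is known.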
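$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \ne 0$$
One-tailed forms replace $\ne$ with $>$ or $<$ in $H_1$.
$$t = \frac{b_1 - \beta_1}{s_b}, \qquad s_b = \frac{S_e}{\sqrt{SS_{XX}}}$$
$$S_e = \sqrt{\frac{SSE}{n - 2}}, \qquad SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$
where: $\beta_1$ = the hypothesized slope
$df = n - 2$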
Hypothesis Test: Airline Cost Example
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \ne 0$$
$\alpha = .05$, $df = n - 2 = 12 - 2 = 10$
$t_{.025,10} = 2.228$
If $|t| > 2.228$, reject $H_0$.
If $-2.228 \le t \le 2.228$, do not reject $H_0$.
Note:
P-value = 0.000
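The observed t statistic for this test can be computed from values already given in the example. The sketch below assumes the usual slope test, t = (b1 − β1) / (Se / √SSXX); the function name `slope_t_statistic` is hypothetical, and the critical value 2.228 is taken from the slide rather than looked up in code.

```python
import math

def slope_t_statistic(b1, se, ss_xx, beta1_hyp=0.0):
    """t statistic for testing the slope against a hypothesized value."""
    s_b = se / math.sqrt(ss_xx)       # standard error of the slope
    return (b1 - beta1_hyp) / s_b

# Values from the airline cost example.
t = slope_t_statistic(b1=0.0407, se=0.1773, ss_xx=1689)
t_crit = 2.228                        # t(.025, 10), as stated in the example
reject_h0 = abs(t) > t_crit
print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

The statistic is far beyond the critical value, which is consistent with the reported P-value of 0.000.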
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \ne 0$$
$\alpha = .05$
$df_{reg} = k = 1$
$df_{err} = n - k - 1 = 12 - 1 - 1 = 10$
$F_{.05,1,10} = 4.96$
If $F > 4.96$, reject $H_0$.
If $F \le 4.96$, do not reject $H_0$.
Note:
P-value = 0.000
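The observed F value can be rebuilt from the sums of squares in the Analysis of Variance table. This is a minimal sketch: the function name `f_statistic` is an illustrative choice, and the critical value 4.96 is the one quoted above.

```python
def f_statistic(ss_reg, ss_err, n, k=1):
    """F = MS regression / MS error for a model with k predictors."""
    ms_reg = ss_reg / k                # df_reg = k
    ms_err = ss_err / (n - k - 1)      # df_err = n - k - 1
    return ms_reg / ms_err

# Sums of squares from the Analysis of Variance table (n = 12, k = 1).
f = f_statistic(ss_reg=2.7980, ss_err=0.3141, n=12)
f_crit = 4.96                          # F(.05, 1, 10), as stated in the example
reject_h0 = f > f_crit
print(f"F = {f:.2f}, reject H0: {reject_h0}")
```

The result differs from the tabled 89.09 only in the second decimal place because the table entries are themselves rounded. In simple regression, F equals the square of the t statistic for the slope, so the two tests lead to the same decision.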
$$\hat{Y} = 1.57 + 0.0407X$$
For $X = 73$,
$$\hat{Y} = 1.57 + 0.0407(73) = 4.5411 \text{ or } \$4{,}541.10$$
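$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{XX}}}$$
where: $x_0$ = a particular value of $x$
$$SS_{XX} = \sum x^2 - \frac{(\sum x)^2}{n}$$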
For $x_0 = 73$ and a 95% confidence level,
$$4.5411 \pm (2.228)(0.1773)\sqrt{\frac{1}{12} + \frac{(73 - 77.5)^2}{73{,}764 - \dfrac{(930)^2}{12}}} = 4.5411 \pm .1220$$
$$4.4191 \le E(Y_{73}) \le 4.6631$$
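The interval above can be verified numerically. The following sketch uses the rounded coefficients, standard error, and critical t value from the example; the function name `mean_response_interval` is hypothetical.

```python
import math

def mean_response_interval(x0, b0, b1, se, n, sum_x, ss_xx, t_crit):
    """Confidence interval for the average value of Y at x0."""
    y_hat = b0 + b1 * x0               # point estimate
    x_bar = sum_x / n
    half_width = t_crit * se * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width

# Airline cost example: x0 = 73 passengers, 95% confidence.
lower, upper = mean_response_interval(
    x0=73, b0=1.57, b1=0.0407, se=0.1773,
    n=12, sum_x=930, ss_xx=1689, t_crit=2.228)
print(f"{lower:.4f} <= E(Y | 73) <= {upper:.4f}")
```

The interval widens as x0 moves away from the mean of X (77.5 here), because the (x0 − x̄)² term grows.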
[Table: X | Confidence Interval]
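$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{XX}}}$$
where: $x_0$ = a particular value of $x$
$$SS_{XX} = \sum x^2 - \frac{(\sum x)^2}{n}$$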
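The extra 1 under the square root is what separates this interval, for a single value of Y, from the interval for the average value of Y. As an illustration only, the sketch below applies the formula to x0 = 73 in the airline cost example, reusing the rounded Se = 0.1773 and t = 2.228 quoted earlier; the function name `prediction_interval` is hypothetical, and this particular numeric result is not stated in the slides.

```python
import math

def prediction_interval(x0, b0, b1, se, n, sum_x, ss_xx, t_crit):
    """Prediction interval for a single value of Y at x0."""
    y_hat = b0 + b1 * x0               # point estimate
    x_bar = sum_x / n
    # The leading 1 accounts for the variability of an individual observation.
    half_width = t_crit * se * math.sqrt(
        1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width

# Airline cost example: x0 = 73 passengers, 95% level.
lower, upper = prediction_interval(
    x0=73, b0=1.57, b1=0.0407, se=0.1773,
    n=12, sum_x=930, ss_xx=1689, t_crit=2.228)
print(f"{lower:.4f} <= Y <= {upper:.4f}")
```

For the same x0, this interval is always wider than the confidence interval for the mean response.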