Contingency Tables (Chi Square)
Learning Objectives
Contingency tables are helpful to:
•Perform statistical significance testing on count or attribute data.
•Allow comparison of more than one subset of data to help localize KPIV factors.
IMPROVEMENT ROADMAP
Uses of Contingency Tables
Common Uses
Phase 1: Measurement
Characterization
•Conduct “ad hoc” training for your team prior to using the tool.
•Gather data, use the tool to test and then move on.
•Keep it simple.
A contingency table is just another way of hypothesis testing. Just like the hypothesis testing we have learned so far, we will obtain a “critical value” from a table (χ² in this case) and use it as a tripwire for significance. We then use the sample data to calculate a χ²CALC value. Comparing this calculated value to our critical value tells us whether the data groups exhibit no significant difference (null hypothesis or Ho) or whether one or more significant differences exist (alternate hypothesis or H1).
Ho: P1=P2=P3…
H1: One or more populations (P) are significantly different
[Figure: distribution with the Ho and H1 regions separated by χ²CRIT; χ²CALC is compared against χ²CRIT]
So how do you build a Contingency Table?
•Define the hypothesis you want to test. In this example we have 3 vendors from
which we have historically purchased parts. We want to eliminate one and would
like to see if there is a significant difference in delivery performance. This is stated
mathematically as Ho: Pa=Pb=Pc. We want to be 95% confident in this result.
•We then construct a table which looks like this example. In this table we have
order performance in rows and vendors in columns.
                 Vendor A  Vendor B  Vendor C
Orders On Time
Orders Late
•We ensure that we include both good and bad situations so that the sum of each column is the total opportunities which have been grouped into good and bad. We then gather enough data to ensure that each cell will have a count of at least 5.
Collapse the table
•Suppose we were placing side bets in a number of bars and wondered if there were any nonrandom factors at play. We gather data and construct the following table:
Bar #1 Bar #2 Bar #3
Won Money 5 7 2
Lost Money 7 9 4
•Since Bar #3 does not meet the “5 or more” criterion (its counts are 2 and 4), we do not have enough data to evaluate the cells for Bar #3. This means that we must combine its data with that of another bar so that every cell has a sufficient count. This is referred to as “collapsing” the table. With Bar #3 collapsed into Bar #2, the resulting table looks like the following:
             Bar #1  Bar #3 Collapsed into Bar #2
Won Money    5       9
Lost Money   7       13
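The collapsing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `sparse_columns` and `collapse` are hypothetical, and the choice to merge Bar #3 into Bar #2 follows the slide rather than any rule computed by the code.

```python
# Observed counts from the bar example: rows are outcomes, columns are bars.
observed = {
    "Won Money": {"Bar #1": 5, "Bar #2": 7, "Bar #3": 2},
    "Lost Money": {"Bar #1": 7, "Bar #2": 9, "Bar #3": 4},
}

MIN_COUNT = 5  # the "5 or more" rule from the text


def sparse_columns(table, min_count=MIN_COUNT):
    """Return the column names that have at least one cell below min_count."""
    columns = next(iter(table.values())).keys()
    return [c for c in columns
            if any(row[c] < min_count for row in table.values())]


def collapse(table, source, target):
    """Return a new table with column `source` added into column `target`."""
    merged = {}
    for label, row in table.items():
        new_row = {c: n for c, n in row.items() if c != source}
        new_row[target] = row[target] + row[source]
        merged[label] = new_row
    return merged


too_small = sparse_columns(observed)                 # columns that break the rule
collapsed = collapse(observed, "Bar #3", "Bar #2")   # merge chosen by the analyst
```

After the merge, calling `sparse_columns(collapsed)` again is a quick way to confirm that every remaining cell meets the minimum count.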
•Calculate the percentage of the total contained in each row by dividing the row
total by the total for all the categories. For example, the Orders on Time row has a
total of 95. The overall total for all the categories is 116. The percentage for this
row will be row/total or 95/116 = .82.
Vendor A Vendor B Vendor C Total Portion
Orders On Time 25 58 12 95 0.82
Orders Late 7 9 5 21 0.18
Total 32 67 17 116 1.00
•The row percentage times the column total gives us the expected occurrence for each cell based on the percentages. For example, for the Orders On Time row, .82 x 32 = 26 for the first cell and .82 x 67 = 55 for the second cell.
Actual Occurrences
Vendor A Vendor B Vendor C Total Portion
Orders On Time 25 58 12 95 0.82
Orders Late 7 9 5 21 0.18
Total 32 67 17 116 1.00
Expected Occurrences
Vendor A Vendor B Vendor C
Orders On Time 26 55
Orders Late
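The expected-occurrence arithmetic for the vendor example can be sketched as follows. The variable names are illustrative, and this sketch assumes the row portion is kept unrounded (95/116 rather than .82) before multiplying, with rounding applied only at the end.

```python
# Actual occurrences: rows are delivery outcomes, columns are Vendors A, B, C.
observed = [
    [25, 58, 12],  # Orders On Time
    [7, 9, 5],     # Orders Late
]

row_totals = [sum(row) for row in observed]          # total per row
col_totals = [sum(col) for col in zip(*observed)]    # total per vendor
grand_total = sum(row_totals)                        # all opportunities

# Portion of the grand total that falls in each row.
portions = [t / grand_total for t in row_totals]

# Expected occurrence for each cell = row portion x column total.
expected = [[p * c for c in col_totals] for p in portions]

# Rounded to whole counts for display, as on the slide.
expected_rounded = [[round(e) for e in row] for row in expected]
```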
•Now we need to calculate the χ² value for the data. This is done using the formula (a-e)²/e (where a is the actual count and e is the expected count) for each cell. So, the χ² value for the first cell would be (25-26)²/26=.04. Filling in the remaining χ² values we get:
Actual Occurrences
                 Vendor A  Vendor B  Vendor C  Total  Portion
Orders On Time   25        58        12        95     0.82
Orders Late      7         9         5         21     0.18
Column Total     32        67        17        116    1.00

Expected Occurrences
                 Vendor A  Vendor B  Vendor C
Orders On Time   26        55        14
Orders Late      6         12        3

Calculations (χ² = (a-e)²/e)
                 Vendor A     Vendor B      Vendor C
Orders On Time   (25-26)²/26  (58-55)²/55   (12-14)²/14
Orders Late      (7-6)²/6     (9-12)²/12    (5-3)²/3

Calculated χ² Values
                 Vendor A  Vendor B  Vendor C
Orders On Time   0.04      0.18      0.27
Orders Late      0.25      0.81      1.20

Note: the calculated values other than the first are based on unrounded expected counts (for example 54.87 rather than 55 for Vendor B on time), so they differ slightly from what the rounded figures in the formulas give.
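The per-cell calculation can be sketched in Python as below. The function names are hypothetical. This sketch assumes unrounded expected counts throughout, so the first cell comes out a little higher than the .04 worked by hand with the rounded value 26, and the sum lands at about 2.76 rather than exactly matching the hand total.

```python
# Actual occurrences: rows are delivery outcomes, columns are Vendors A, B, C.
observed = [
    [25, 58, 12],  # Orders On Time
    [7, 9, 5],     # Orders Late
]


def expected_counts(table):
    """Expected count per cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_cells(table):
    """Apply (a - e)^2 / e to every cell, where a is actual and e is expected."""
    expected = expected_counts(table)
    return [[(a - e) ** 2 / e for a, e in zip(a_row, e_row)]
            for a_row, e_row in zip(table, expected)]


cells = chi_square_cells(observed)

# The calculated chi-square statistic is the sum over all cells.
chi2_calc = sum(sum(row) for row in cells)
```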
Now what do I do?
Performing the Analysis
•Determine the critical value from the χ² table. To get the value you need 3 pieces of data. The degrees of freedom are obtained by the following equation: DF=(r-1)x(c-1). In our case, we have 3 columns (c) and 2 rows (r) so our DF = (2-1)x(3-1)=1x2=2.
•The second piece of data is the risk. Since we are looking for .95 (95%) confidence (and α risk = 1 - confidence) we know the α risk will be .05.
•In the χ² table, we find that the critical value for α = .05 and 2 DF is 5.99. Therefore, our χ²CRIT = 5.99.
•Our calculated χ² value is the sum of the individual cell χ² values. For our example this is .04+.18+.27+.25+.81+1.20=2.75. Therefore, our χ²CALC = 2.75.
•We now have all the pieces to perform our test. We fail to reject Ho if χ²CALC < χ²CRIT. Is this true? Our data shows 2.75<5.99, therefore we fail to reject the null hypothesis that there is no significant difference between the vendors' performance in this area.
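The decision steps above can be sketched with the Python standard library alone. The function names are hypothetical. One assumption deserves emphasis: the critical-value shortcut below works only for 2 degrees of freedom, where the χ² upper-tail probability is exp(-x/2); for any other DF, use the printed χ² table or a statistics library instead.

```python
import math


def degrees_of_freedom(rows, cols):
    """DF = (r - 1) x (c - 1)."""
    return (rows - 1) * (cols - 1)


def chi2_critical_df2(alpha):
    """Critical value for 2 DF ONLY.

    With 2 DF the upper-tail probability is exp(-x / 2), so solving
    exp(-x / 2) = alpha gives x = -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)


df = degrees_of_freedom(rows=2, cols=3)   # 2 rows, 3 vendors
alpha = 1 - 0.95                          # alpha risk = 1 - confidence
chi2_crit = chi2_critical_df2(alpha)      # close to the table value 5.99
chi2_calc = 2.75                          # sum of cell values from the slide

# Reject Ho only when the calculated value exceeds the critical value.
reject_null = chi2_calc > chi2_crit
```

Here `reject_null` is false, which matches the conclusion that we fail to reject the null hypothesis for the vendor data.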
Contingency Table Exercise
We have a part which is experiencing high scrap. Your team thinks that, since it is manufactured over 3 shifts and on 3 different machines, the scrap could be caused (Y=f(x)) by an off-shift workmanship issue or machine capability. Verify with 95% confidence whether either of these hypotheses is supported by the data.
Construct a contingency table of the data and interpret the results for each data set.
Actual Occurrences
             Machine 1  Machine 2  Machine 3
Good Parts   100        350        900
Scrap        15         18         23
Actual Occurrences
             Shift 1  Shift 2  Shift 3
Good Parts   500      400      450
Scrap        20       19       17