Lesson-2 1

Uploaded by

albao.elaine21
Copyright
© © All Rights Reserved
Polytechnic University of the Philippines

College of Science
Department of Mathematics and Statistics

CATEGORICAL DATA
ANALYSIS

Prepared by:
Ms. Katrina D. Elizon
Contingency Tables
A two-way contingency table (a.k.a. cross-tabulation)
is a two-way array containing the joint
distribution of two categorical random variables.

Contingency tables can be used to display either
the joint frequency distribution or the joint
probability distribution.

A two-way table with I rows and J columns is called
an I × J (read I-by-J) table.
Probabilities for contingency tables can be of three
types:

1. Joint Distribution - the probabilities {πij} = {P(X = i, Y = j)}
form the joint distribution of X and Y. They satisfy

$$\sum_{i}\sum_{j} \pi_{ij} = 1$$

2. Marginal Distribution - the row and column
totals of the joint probabilities.

3. Conditional Distribution - the probability
distribution of Y at a fixed level of X.
Contingency Tables
                      Heart Attack
Group          Yes         No          Total
Placebo        179         10,843      11,022
Aspirin        105         10,932      11,037
Total          284         21,775      22,059

                      Heart Attack
Group          Yes         No          Total
Placebo
Aspirin
Total
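As a minimal sketch (not part of the original slides), the three kinds of distributions can be computed from the aspirin counts above with plain Python. The variable names are illustrative assumptions.

```python
# Observed counts from the aspirin table: rows = group, columns = heart attack.
counts = {
    "Placebo": {"Yes": 179, "No": 10843},
    "Aspirin": {"Yes": 105, "No": 10932},
}

n = sum(sum(row.values()) for row in counts.values())  # grand total

# Joint distribution: each cell count divided by the grand total.
joint = {g: {h: c / n for h, c in row.items()} for g, row in counts.items()}

# Marginal distributions: row totals and column totals of the joint probabilities.
row_marginal = {g: sum(row.values()) for g, row in joint.items()}
col_marginal = {h: sum(joint[g][h] for g in joint) for h in ("Yes", "No")}

# Conditional distribution of heart attack within each group (each row sums to 1).
conditional = {
    g: {h: c / sum(row.values()) for h, c in row.items()}
    for g, row in counts.items()
}

for g in counts:
    print(g, {h: round(p, 4) for h, p in conditional[g].items()})
```

The joint probabilities sum to 1 over the whole table, while the conditional probabilities sum to 1 within each row.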
Independence

X and Y are statistically independent if the true
conditional distributions of Y are identical at each
level of X.
When two variables are independent, the
probability of any particular column outcome j is
the same in each row.

Statistical independence is, equivalently, the
property that all joint probabilities equal the
product of their marginal probabilities:

$$P(X = i, Y = j) = P(X = i)\,P(Y = j) \quad \text{for all } i, j$$
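The product property can be checked numerically. The sketch below is illustrative only: the function name `is_independent`, the tolerance, and both example tables are assumptions, not material from the slides.

```python
def is_independent(joint, tol=1e-9):
    """Return True if every joint probability equals the product of its marginals."""
    row_tot = [sum(row) for row in joint]
    col_tot = [sum(col) for col in zip(*joint)]
    return all(
        abs(joint[i][j] - row_tot[i] * col_tot[j]) <= tol
        for i in range(len(joint))
        for j in range(len(joint[0]))
    )

# Hypothetical joint distributions, chosen only for illustration.
independent_table = [[0.12, 0.28], [0.18, 0.42]]  # row totals 0.4, 0.6; column totals 0.3, 0.7
dependent_table = [[0.30, 0.10], [0.10, 0.50]]

print(is_independent(independent_table))  # True
print(is_independent(dependent_table))    # False
```

With sample proportions the equality rarely holds exactly, so in practice a formal test of independence is used rather than a fixed tolerance.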
Several ways to compare probabilities in a
table:
1. Risk Difference
2. Relative Risk
3. Odds ratio
Several ways to compare probabilities in a table:
1. Risk Difference
The difference of proportions π1 − π2 compares
the success probabilities in the two rows.
It equals zero when π1 = π2, that is, when the
response is independent of the group
classification.
p1 and p2 denote the sample proportions of
successes.
The sample difference p1 − p2 estimates π1 − π2.
The estimated standard error of p1 − p2:

$$\mathrm{SE} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

The standard error decreases, and hence the
estimate of π1 − π2 improves, as the sample sizes
increase.
A 100(1 − α)% (Wald) confidence interval for π1 − π2 is:

$$(p_1 - p_2) \pm z_{\alpha/2}\,(\mathrm{SE})$$
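A minimal sketch of these formulas, applied to the aspirin counts from the contingency table (placebo as row 1, aspirin as row 2). The function name `risk_difference_ci` is an assumption made for illustration, and the normal quantile comes from Python's standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def risk_difference_ci(x1, n1, x2, n2, alpha=0.05):
    """Wald confidence interval for pi1 - pi2 from two independent binomial samples."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 when alpha = 0.05
    return diff, se, (diff - z * se, diff + z * se)

# Heart attacks: 179 of 11,022 on placebo, 105 of 11,037 on aspirin.
diff, se, (lower, upper) = risk_difference_ci(179, 11022, 105, 11037)
print(f"difference = {diff:.4f}, SE = {se:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

If the resulting interval excludes zero, the data suggest the two success probabilities differ.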
Question???
What are the possible values
obtained by taking the difference of
two probabilities?
Several ways to compare probabilities in a table:
2. Relative Risk
A difference between two proportions of a certain
fixed size usually is more important when both
proportions are near 0 or 1 than when they are
near the middle of the range.
For 2 × 2 tables, the relative risk is the ratio:

$$\text{relative risk} = \frac{\pi_1}{\pi_2}$$

A relative risk value of 1.0 corresponds to
independence.
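As an illustration, the sample relative risk for the aspirin counts can be computed by replacing π1 and π2 with the sample proportions. The helper name `relative_risk` is hypothetical.

```python
def relative_risk(x1, n1, x2, n2):
    """Sample relative risk p1 / p2 for two groups."""
    return (x1 / n1) / (x2 / n2)

# Heart attacks: 179 of 11,022 on placebo (row 1), 105 of 11,037 on aspirin (row 2).
rr = relative_risk(179, 11022, 105, 11037)
print(f"relative risk = {rr:.2f}")
```

Swapping the two rows gives the reciprocal, so the direction of the comparison should always be stated.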
Question???
What are the possible values if we
divide two probabilities?
Question???
Do you think it is necessary to
identify one classification as a
response variable in order to
estimate the relative risk?
Several ways to compare probabilities in a table:
3. Odds Ratio
For a probability of success π, the odds of success
are defined to be

$$\text{odds} = \frac{\pi}{1 - \pi}$$

The odds of an event are simply the probability
of the event occurring divided by the probability
that the event does not occur.
The odds are nonnegative, with value greater
than 1.0 when a success is more likely than a
failure.
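A short sketch of the odds definition. The function name `odds` and the example probabilities are chosen only for illustration.

```python
def odds(pi):
    """Odds of success for a success probability 0 <= pi < 1."""
    if not 0 <= pi < 1:
        raise ValueError("pi must satisfy 0 <= pi < 1")
    return pi / (1 - pi)

print(odds(0.5))   # 1.0: success and failure are equally likely
print(odds(0.75))  # 3.0: a success is three times as likely as a failure
```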
Question???
If the probability of success is 0.80,
compute the odds and
interpret the value.
The success probability itself is a function of the
odds,

$$\pi = \frac{\text{odds}}{\text{odds} + 1}$$

In 2 × 2 tables, the ratio of the two odds from the
two rows,

$$\theta = \frac{\text{odds}_1}{\text{odds}_2} = \frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)}$$

is the odds ratio.
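The two relations above can be sketched as follows. All function names and the probabilities 0.75 and 0.5 are hypothetical, used only to show the arithmetic.

```python
def odds(pi):
    """Odds of success: pi / (1 - pi)."""
    return pi / (1 - pi)

def prob_from_odds(o):
    """Invert the odds: pi = odds / (odds + 1)."""
    return o / (o + 1)

def odds_ratio(pi1, pi2):
    """theta = odds1 / odds2 for the success probabilities of rows 1 and 2."""
    return odds(pi1) / odds(pi2)

# Hypothetical success probabilities, chosen only for illustration.
pi1, pi2 = 0.75, 0.5
print(odds_ratio(pi1, pi2))       # 3.0
print(prob_from_odds(odds(pi1)))  # 0.75
```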
Question???
What are the possible values if we
divide two odds?
Question???
If X and Y are independent, what is
the value of the odds ratio?
The independence value θ = 1 is a baseline for
comparison.
When θ > 1, the odds of success are higher in row
1 than in row 2 (π1 > π2).
When θ < 1, the odds of success are lower in
row 1 than in row 2 (π1 < π2).
Values of θ farther from 1.0 in a given direction
represent stronger association.
Question???
What does it mean if the computed
odds ratio for a 2 × 2 table is θ = 3?
Question???
Calculate the odds ratios for tables 1 and 2. What
did you notice?
Table 1

                      Heart Attack
Group          Yes         No          Total
Placebo        179         10,843      11,022
Aspirin        105         10,932      11,037
Total          284         21,775      22,059

Table 2

                         Group
Heart Attack   Placebo     Aspirin     Total
Yes            179         105         284
No             10,843      10,932      21,775
Total          11,022      11,037      22,059
Question???
Calculate the odds ratio for Table 1, then
reverse the order of the rows or the order of the
columns and recalculate. What do you notice when you
compare the new odds ratio with the previous result?

Table 1

                      Heart Attack
Group          Yes         No          Total
Placebo        179         10,843      11,022
Aspirin        105         10,932      11,037
Total          284         21,775      22,059
When both variables are response variables, the
odds ratio can be defined using joint probabilities as

$$\theta = \frac{\pi_{11}/\pi_{12}}{\pi_{21}/\pi_{22}} = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}}$$

The odds ratio is also called the cross-product
ratio, because it equals the ratio of the products
π11π22 and π12π21 of cell probabilities from
diagonally opposite cells.
The sample version of θ replaces the πij's by pij's, or,
equivalently, by nij's:

$$\hat{\theta} = \frac{p_{11}\,p_{22}}{p_{12}\,p_{21}} = \frac{n_{11}\,n_{22}}{n_{12}\,n_{21}}$$
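A minimal sketch of the sample cross-product ratio, using the cell counts of Table 1. The function name `sample_odds_ratio` and the nested-list layout of the table are assumptions made for illustration.

```python
def sample_odds_ratio(table):
    """Cross-product ratio n11*n22 / (n12*n21) for a 2 x 2 table of counts."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)

# Counts from Table 1: rows = (Placebo, Aspirin), columns = heart attack (Yes, No).
table1 = [[179, 10843], [105, 10932]]
theta_hat = sample_odds_ratio(table1)
print(f"sample odds ratio = {theta_hat:.2f}")
```

The same function can be applied to a transposed or reordered table by rearranging the nested list.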
Take Note!!!
When p1 and p2 are both close to
zero, the odds ratio and relative
risk take similar values.
For rare events (small risks):
odds ratio ≈ relative risk.
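The reason for this approximation can be seen by rewriting the sample odds ratio in terms of the relative risk. The short derivation below uses only the definitions already given, with p1 and p2 as the sample proportions of successes in rows 1 and 2.

```latex
\hat{\theta}
  = \frac{p_1/(1 - p_1)}{p_2/(1 - p_2)}
  = \frac{p_1}{p_2} \cdot \frac{1 - p_2}{1 - p_1}
  = \text{relative risk} \times \frac{1 - p_2}{1 - p_1}
```

When p1 and p2 are both close to zero, the factor (1 − p2)/(1 − p1) is close to 1, so the two measures nearly coincide.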
Exercise 1:
Consider the following two reported studies:
a. A study reported (January 3, 1990) that, of smokers who
get lung cancer, “women were 1.4 times more
vulnerable than men to get small-cell lung cancer.”
Is 1.4 an odds ratio or a relative risk?

b. A National Cancer Institute study about tamoxifen and
breast cancer reported (June 23, 1995) that the women
taking the drug were 37% less likely to experience
invasive breast cancer compared with the women taking
placebo. Find the relative risk for:
(i) those taking the drug compared to those taking placebo,
(ii) those taking placebo compared to those taking the drug.
Exercise 2:
In the Philippines, the estimated annual probability
that a woman over the age of 32 dies of lung cancer
equals 0.001204 for current smokers and 0.000211
for nonsmokers.

a. Calculate and interpret the difference of
proportions and the relative risk. Which is more
informative for these data? Why?

b. Calculate and interpret the odds ratio. Explain why
the relative risk and odds ratio take similar values.
Exercise 3:
A 20-year study of Filipino male physicians noted that the
proportion who died from lung cancer was 0.00151 per year
for cigarette smokers and 0.00009 per year for nonsmokers.
The proportion who died from heart disease was 0.00670 for
smokers and 0.00320 for nonsmokers.

a. Describe the association of smoking with lung cancer and
with heart disease, using the difference of proportions, the
relative risk, and the odds ratio. Interpret.

b. Which response (lung cancer or heart disease) is more
strongly related to cigarette smoking, in terms of the
reduction in deaths that could occur with an absence of
smoking?
