
Lecture 21 - Sensitivity, Specificity, and Decisions

Statistics 102

Colin Rundel

April 17, 2013


Odds Ratios

Example - Birdkeeping and Lung Cancer - Interpretation


Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.9374 1.8043 -1.07 0.2829
FMFemale 0.5613 0.5312 1.06 0.2907
SSHigh 0.1054 0.4688 0.22 0.8221
BKBird 1.3626 0.4113 3.31 0.0009
AG -0.0398 0.0355 -1.12 0.2625
YR 0.0729 0.0265 2.75 0.0059
CD 0.0260 0.0255 1.02 0.3081

Keeping all other predictors constant, then:


The odds ratio of getting lung cancer for bird keepers vs non-bird
keepers is exp(1.3626) = 3.91.
The odds ratio of getting lung cancer for an additional year of
smoking is exp(0.0729) = 1.08.

What do these numbers mean in practice?
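The odds ratios above come straight from exponentiating the model coefficients. A quick check (the slides' analysis is in R; this is the same arithmetic in Python):

```python
import math

# Exponentiating a logistic regression coefficient gives an odds ratio
or_bird = math.exp(1.3626)  # bird keepers vs. non-bird keepers
or_year = math.exp(0.0729)  # one additional year of smoking

print(round(or_bird, 2))  # 3.91
print(round(or_year, 2))  # 1.08
```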

Statistics 102 (Colin Rundel) Lec 21 April 17, 2013 2 / 28


Odds Ratios

What do the numbers not mean ...

The most common mistake made when interpreting logistic regression is to


treat an odds ratio as a ratio of probabilities.

Bird keepers are not 4x more likely to develop lung cancer than non-bird
keepers.

This is the difference between relative risk and an odds ratio.

RR = P(disease|exposed) / P(disease|unexposed)

OR = [P(disease|exposed) / (1 − P(disease|exposed))] / [P(disease|unexposed) / (1 − P(disease|unexposed))]
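A small numeric illustration of the difference (in Python, with made-up probabilities): RR and OR nearly agree when the outcome is rare, but diverge when it is common.

```python
def relative_risk(p_exposed, p_unexposed):
    return p_exposed / p_unexposed

def odds_ratio(p_exposed, p_unexposed):
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed

# Rare outcome: RR = 2.0, OR ~ 2.02 (nearly identical)
print(relative_risk(0.02, 0.01), odds_ratio(0.02, 0.01))
# Common outcome: RR = 2.0 but OR = 3.5 (OR overstates the RR)
print(relative_risk(0.6, 0.3), odds_ratio(0.6, 0.3))
```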



Odds Ratios

Back to the birds

What is the probability of lung cancer in a bird keeper if we knew that P(lung cancer|no birds) = 0.05?

OR = [P(lung cancer|birds) / (1 − P(lung cancer|birds))] / [P(lung cancer|no birds) / (1 − P(lung cancer|no birds))]

   = [P(lung cancer|birds) / (1 − P(lung cancer|birds))] / (0.05 / 0.95) = 3.91

P(lung cancer|birds) = (3.91 × 0.05/0.95) / (1 + 3.91 × 0.05/0.95) = 0.171

RR = P(lung cancer|birds)/P(lung cancer|no birds) = 0.171/0.05 = 3.41
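The algebra above (solving the odds ratio equation for P(lung cancer|birds)) can be wrapped in a small helper; a sketch in Python:

```python
def prob_from_or(odds_ratio, p_baseline):
    """Convert an odds ratio plus a baseline probability into the
    exposed-group probability: odds = OR * baseline odds, then
    p = odds / (1 + odds)."""
    odds = odds_ratio * p_baseline / (1 - p_baseline)
    return odds / (1 + odds)

p_birds = prob_from_or(3.91, 0.05)
print(round(p_birds, 3))         # 0.171
print(round(p_birds / 0.05, 2))  # RR = 3.41
```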



Odds Ratios

Bird OR Curve

[Figure: P(lung cancer | birds) plotted against P(lung cancer | no birds) for OR = 3.91; both axes run from 0.0 to 1.0.]
Odds Ratios

OR Curves

[Figure: curves of P(outcome | exposed) vs. P(outcome | unexposed) for several odds ratio values.]


Sensitivity and Specificity

(An old) Example - House

If you’ve ever watched the TV show House on Fox, you know that Dr.
House regularly states, “It’s never lupus.”

Lupus is a medical phenomenon where antibodies that are supposed to attack foreign cells to prevent infections instead see plasma proteins as foreign bodies, leading to a high risk of blood clotting. It is believed that 2% of the population suffer from this disease.

The test for lupus is very accurate if the person actually has lupus, however it is very inaccurate if the person does not. More specifically, the test is 98% accurate if a person actually has the disease. The test is 74% accurate if a person does not have the disease.

Is Dr. House correct even if someone tests positive for Lupus?



Sensitivity and Specificity

(An old) Example - House

Lupus?         Result
yes, 0.02   →  positive, 0.98:  0.02 × 0.98 = 0.0196
            →  negative, 0.02:  0.02 × 0.02 = 0.0004
no, 0.98    →  positive, 0.26:  0.98 × 0.26 = 0.2548
            →  negative, 0.74:  0.98 × 0.74 = 0.7252

P(Lupus|+) = P(+, Lupus) / [P(+, Lupus) + P(+, No Lupus)]
           = 0.0196 / (0.0196 + 0.2548) = 0.0714
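The tree calculation is just Bayes' rule; a direct check in Python:

```python
p_lupus = 0.02
sens = 0.98  # P(+ | lupus)
spec = 0.74  # P(- | no lupus)

p_pos_and_lupus = p_lupus * sens                 # 0.0196
p_pos_and_no_lupus = (1 - p_lupus) * (1 - spec)  # 0.2548

p_lupus_given_pos = p_pos_and_lupus / (p_pos_and_lupus + p_pos_and_no_lupus)
print(round(p_lupus_given_pos, 4))  # 0.0714
```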
Sensitivity and Specificity

Testing for lupus

It turns out that testing for lupus is actually quite complicated; a diagnosis usually relies on the outcome of multiple tests, often including: a complete blood count, an erythrocyte sedimentation rate, a kidney and liver assessment, a urinalysis, and/or an antinuclear antibody (ANA) test.

It is important to think about what is involved in each of these tests (e.g. deciding if a complete blood count is high or low) and how each of the individual tests and related decisions plays a role in the overall decision of diagnosing a patient with lupus.



Sensitivity and Specificity

Testing for lupus

At some level we can view a diagnosis as a binary decision (lupus or no lupus) that involves the complex integration of various explanatory variables.

The example does not give us any information about how a diagnosis is made, but what it does give us is just as important - the sensitivity and the specificity of the test. These values are critical for our understanding of what a positive or negative test result actually means.



Sensitivity and Specificity

Sensitivity and Specificity

Sensitivity - measures a test's ability to correctly identify positive results.

P(Test + | Condition +) = P(+|lupus) = 0.98

Specificity - measures a test's ability to correctly identify negative results.

P(Test − | Condition −) = P(−|no lupus) = 0.74

It is illustrative to think about the extreme cases - what is the sensitivity and specificity of a test that always returns a positive result? What about a test that always returns a negative result?
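The extreme cases fall out immediately once the definitions are written down; a small sketch in Python (the labels here are hypothetical):

```python
def sens_spec(predictions, conditions):
    """Sensitivity and specificity from parallel lists of predicted
    (True = test positive) and actual (True = condition positive) values."""
    tp = sum(p and c for p, c in zip(predictions, conditions))
    tn = sum(not p and not c for p, c in zip(predictions, conditions))
    pos = sum(conditions)
    neg = len(conditions) - pos
    return tp / pos, tn / neg

actual = [True, True, False, False, False]
print(sens_spec([True] * 5, actual))   # always positive: (1.0, 0.0)
print(sens_spec([False] * 5, actual))  # always negative: (0.0, 1.0)
```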



Sensitivity and Specificity

Sensitivity and Specificity (cont.)

                 Condition Positive               Condition Negative
Test Positive    True Positive                    False Positive (Type I error)
Test Negative    False Negative (Type II error)   True Negative

Sensitivity = P(Test + | Condition +) = TP/(TP + FN)


Specificity = P(Test − | Condition −) = TN/(FP + TN)
False negative rate (β) = P(Test − | Condition +) = FN/(TP + FN)
False positive rate (α) = P(Test + | Condition −) = FP/(FP + TN)

Sensitivity = 1 − False negative rate


Specificity = 1 − False positive rate
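These definitions and the two complements can be checked against the lupus numbers, scaling to a hypothetical population of 10,000 people at 2% prevalence:

```python
n = 10_000
pos = int(n * 0.02)  # 200 with lupus
neg = n - pos        # 9800 without

tp = pos * 0.98      # sensitivity 0.98
fn = pos - tp
tn = neg * 0.74      # specificity 0.74
fp = neg - tn

sensitivity = tp / (tp + fn)
specificity = tn / (fp + tn)
fnr = fn / (tp + fn)  # false negative rate (beta)
fpr = fp / (fp + tn)  # false positive rate (alpha)

# Sensitivity = 1 - FNR, Specificity = 1 - FPR
assert abs(sensitivity - (1 - fnr)) < 1e-12
assert abs(specificity - (1 - fpr)) < 1e-12
```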
Sensitivity and Specificity

So what?

Clearly it is important to know the sensitivity and specificity of a test (and/or the false positive and false negative rates). These, along with the incidence of the disease (e.g. P(lupus)), allow us to calculate useful quantities like P(lupus|+).

Additionally, our brief foray into power analysis before the first midterm should also give you an idea of the trade-offs inherent in minimizing false positive and false negative rates (increasing power requires increasing either α or n).

How should we use this information when we are trying to come up with a
decision?



ROC curves

Back to Spam

In lab this week, we examined a data set of emails where we were interested in identifying the spam messages. We examined different logistic regression models to evaluate how different predictors influenced the probability of a message being spam.

These models can also be used to assign probabilities to incoming messages (which is equivalent to prediction in the case of SLR / MLR). However, if we were designing a spam filter this would only be half of the battle; we would also need to use these probabilities to make a decision about which emails get flagged as spam.

While not the only possible solution, we will consider a simple approach where we choose a threshold probability and flag any email whose predicted probability exceeds it.
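The threshold rule itself is one line; a sketch in Python with made-up predicted probabilities:

```python
def flag_spam(probabilities, threshold=0.75):
    """Flag any message whose predicted probability exceeds the threshold."""
    return [p > threshold for p in probabilities]

predicted = [0.10, 0.80, 0.40, 0.95]  # hypothetical model output
print(flag_spam(predicted))           # [False, True, False, True]
```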



ROC curves

Picking a threshold

[Figure: predicted probability (0.0 to 1.0) for each message, split by class: 1 (spam) and 0 (not spam). At a threshold of 0.75: True Pos (n=27), False Neg (n=340), True Neg (n=3545), False Pos (n=9).]

Let's see what happens if we pick our threshold to be 0.75.


ROC curves

Consequences of picking a threshold

For our data set picking a threshold of 0.75 gives us the following results:
FN = 340 TP = 27
TN = 3545 FP = 9

What are the sensitivity and specificity for this particular decision rule?
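Plugging the counts into the definitions from earlier gives the answer directly; a quick check in Python:

```python
tp, fn, tn, fp = 27, 340, 3545, 9

sensitivity = tp / (tp + fn)  # 27/367
specificity = tn / (tn + fp)  # 3545/3554
print(round(sensitivity, 3))  # 0.074
print(round(specificity, 3))  # 0.997
```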



ROC curves

Trying other thresholds


[Figure: the same plot of predicted probabilities by class, shown again for trying other thresholds.]

Threshold 0.75 0.625 0.5 0.375 0.25


Sensitivity 0.074 0.106 0.136 0.305 0.510
Specificity 0.997 0.995 0.995 0.963 0.936
ROC curves

Relationship between Sensitivity and Specificity

Threshold 0.75 0.625 0.5 0.375 0.25


Sensitivity 0.074 0.106 0.136 0.305 0.510
Specificity 0.997 0.995 0.995 0.963 0.936
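Each threshold contributes one point in ROC space, with coordinates (1 − specificity, sensitivity); computing them from the table:

```python
thresholds = [0.75, 0.625, 0.5, 0.375, 0.25]
sens = [0.074, 0.106, 0.136, 0.305, 0.510]
spec = [0.997, 0.995, 0.995, 0.963, 0.936]

# Points in ROC space: (false positive rate, true positive rate)
roc_points = [(round(1 - sp, 3), se) for sp, se in zip(spec, sens)]
print(roc_points)
```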

[Figure: Sensitivity vs. 1 − Specificity for the five thresholds above; the points climb from the lower left (high threshold) toward the upper right (low threshold).]


ROC curves

Receiver operating characteristic (ROC) curve

[Figure: the ROC curve - true positive rate vs. false positive rate - traced over all possible thresholds.]
ROC curves

Receiver operating characteristic (ROC) curve (cont.)

Why do we care about ROC curves?

Shows the trade-off in sensitivity and specificity for all possible thresholds.
Straightforward to compare performance vs. chance.
Can use the area under the curve (AUC) as an assessment of the predictive ability of a model.
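AUC is just the area under this curve; a minimal trapezoidal sketch in Python (the (FPR, TPR) points below are illustrative, with the usual (0,0) and (1,1) endpoints):

```python
def auc(fpr, tpr):
    """Trapezoidal area under a ROC curve given FPR/TPR arrays
    sorted by increasing FPR."""
    area = 0.0
    for i in range(1, len(fpr)):
        area += (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
    return area

# A perfect classifier traces (0,0) -> (0,1) -> (1,1): AUC = 1
print(auc([0, 0, 1], [0, 1, 1]))  # 1.0
# Chance performance is the diagonal: AUC = 0.5
print(auc([0, 1], [0, 1]))        # 0.5
```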



ROC curves

Refining the Spam model


g_refined = glm(spam ~ to_multiple+cc+image+attach+winner
+password+line_breaks+format+re_subj
+urgent_subj+exclaim_mess,
data=email, family=binomial)
summary(g_refined)

Estimate Std. Error z value Pr(>|z|)

(Intercept)    -1.7594 0.1177 -14.94 0.0000
to_multipleyes -2.7368 0.3156  -8.67 0.0000
ccyes          -0.5358 0.3143  -1.71 0.0882
imageyes       -1.8585 0.7701  -2.41 0.0158
attachyes       1.2002 0.2391   5.02 0.0000
winneryes       2.0433 0.3528   5.79 0.0000
passwordyes    -1.5618 0.5354  -2.92 0.0035
line_breaks    -0.0031 0.0005  -6.33 0.0000
formatPlain     1.0130 0.1380   7.34 0.0000
re_subjyes     -2.9935 0.3778  -7.92 0.0000
urgent_subjyes  3.8830 1.0054   3.86 0.0001
exclaim_mess    0.0093 0.0016   5.71 0.0000



ROC curves

Comparing models

[Figure: ROC curves (true positive rate vs. false positive rate) for both models. Full model (AUC: 0.891); Refined model (AUC: 0.855).]
Utility Functions

Utility Functions

There are many other reasonable quantitative approaches we can use to decide on the "best" threshold.

If you've taken an economics course you have probably heard of the idea of utility functions: we can assign costs and benefits to each of the possible outcomes and use those to calculate a utility for each circumstance.



Utility Functions

Utility function for our spam filter

To write down a utility function for a spam filter we need to consider the costs / benefits of each outcome.

Outcome Utility
True Positive 1
True Negative 1
False Positive -50
False Negative -5

U(p) = TP(p) + TN(p) − 50 × FP(p) − 5 × FN(p)



Utility Functions

Utility for the 0.75 threshold

For the email data set picking a threshold of 0.75 gives us the following
results:
FN = 340 TP = 27
TN = 3545 FP = 9

U(p) = TP(p) + TN(p) − 50 × FP(p) − 5 × FN(p)


= 27 + 3545 − 50 × 9 − 5 × 340 = 1422

Not useful by itself, but allows us to compare with other thresholds.
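The same calculation in code, so counts from other thresholds can be dropped in for comparison:

```python
def utility(tp, tn, fp, fn):
    """Utility of a decision rule given its confusion-matrix counts,
    using the cost/benefit table above."""
    return 1 * tp + 1 * tn - 50 * fp - 5 * fn

print(utility(tp=27, tn=3545, fp=9, fn=340))  # 1422
```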



Utility Functions

Utility curve


[Figure: utility U as a function of the threshold p on [0, 1]; utility falls off steeply for small p, where nearly every message is flagged.]
Utility Functions

Utility curve (zoom)


[Figure: the utility curve zoomed to p > 0.6, with U between roughly 1200 and 1600; the maximum sits in this range.]
Utility Functions

Maximum Utility
[Figure: predicted probabilities by class - 1 (spam) and 0 (not spam) - with the utility-maximizing threshold marked.]
