Lecture Slides Lecture 4
Lecture 4
Section A: The Sample Proportion as a Summary Statistic
Binary Data
Learning Objectives
Summarize a binary outcome across a group of individual observations via the sample proportion.
Explain why, with binary data, the sample proportion is the only summary statistic (besides the sample size n) needed to describe the characteristics of the sample.
Compute the sample proportion based on the results of a study.
Example 1¹
Response to therapy in a random sample of 1,000 HIV-positive patients from a citywide clinical population: 206 patients responded.
¹ http://inclass.kaggle.com/
Example 1
The sample proportion, $\hat{p}$ (pronounced "p-hat"), is given by $\hat{p} = \frac{206}{1000} = 0.206$.
Why the hat? To distinguish $\hat{p}$, the sample estimate, from the underlying true (population) proportion $p$, which can only be estimated.
$\hat{p}$ may be called the estimated proportion, estimated probability, or estimated risk of responding to treatment.
Lecture 4: Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Example 1
The sample proportion is just a sample mean of 0/1 data.
Generally, binary data are coded as x = 1 for observations that have the outcome and x = 0 for observations that do not have the outcome.
With 206 of the 1,000 patients responding, we have $x_i = 1$ for 206 observations and $x_i = 0$ for 794 observations, so
$\hat{p} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{206(1) + 794(0)}{1000} = 0.206$
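As a quick check of the 0/1 coding idea, the minimal Python sketch below codes the 206 responders as 1 and the 794 non-responders as 0 and confirms that the mean of these values equals the sample proportion. The variable names are illustrative and not part of the lecture.

```python
# Binary outcomes for the 1,000 patients: 1 = responded, 0 = did not respond.
n_responders = 206
n_total = 1000
x = [1] * n_responders + [0] * (n_total - n_responders)

# The sample proportion is the sample mean of the 0/1 values.
p_hat = sum(x) / len(x)
print(p_hat)
```

Because the data contain only 0s and 1s, the sum of the values is simply the count of observations with the outcome.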
Example 1
Quantifying variability: There is a formula for the standard deviation of binary data
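The formula itself did not survive in these notes. A commonly used form, assumed here rather than confirmed by the slide text, is $s \approx \sqrt{\hat{p}(1-\hat{p})}$, which differs from the usual sample standard deviation of the 0/1 values only by a factor of $\sqrt{n/(n-1)}$. The sketch below compares the two for the HIV therapy sample.

```python
import math
import statistics

n_total = 1000
n_responders = 206
x = [1] * n_responders + [0] * (n_total - n_responders)
p_hat = n_responders / n_total

# Shortcut for binary data (assumed form of the slide's formula).
sd_shortcut = math.sqrt(p_hat * (1 - p_hat))

# Usual sample standard deviation of the 0/1 values (divides by n - 1).
sd_sample = statistics.stdev(x)

print(round(sd_shortcut, 4), round(sd_sample, 4))
```

With n = 1,000 the two values agree to about three decimal places, which is why the sample proportion (together with n) carries the variability information for binary data.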
Example 1
Visual Displays?
Example 2
Randomized Trial: HIV-positive pregnant women randomized to receive AZT or placebo²
² Connor E, et al. Reduction of Maternal-Infant Transmission of Human Immunodeficiency Virus Type 1 with Zidovudine Treatment. New England Journal of Medicine (1994). 331(18); 1173-1180
Results
Of the 363 births whose HIV status was assessed (up to 18 months after birth), 53 infants were HIV infected, so the sample proportion is $\hat{p} = 53/363 \approx 0.15$.
From Abstract
³ Green B, et al. An Automated Intervention With Stepped Increases in Support to Increase Uptake of Colorectal Cancer Screening: A Randomized Trial. Annals of Internal Medicine (2013). 158(5); 301-307
Summary
For quantifying the distribution of binary outcomes in a sample (and hence estimating the distribution in the population from which the sample was taken), the sample proportion $\hat{p}$ is paramount.
$\hat{p}$ not only summarizes the percentage (probability, risk) of outcomes in a sample; it also gives information about the variability of the individual sample observations and about the sample percentiles.
$\hat{p}$ is the sample mean of observations that take the value 1 for observations with the outcome and 0 for observations without the outcome.
Section B: Comparing Binary Outcomes Between Two (or More) Populations Using Sample Results: Risk Difference and Relative Risk
Learning Objectives
Compute the risk difference and relative risk for comparing binary outcomes between two samples.
Interpret the risk difference and relative risk in a public health/personal health context.
Understand that the risk difference and relative risk will always agree in terms of direction, but can differ greatly in magnitude.
Understand that neither the risk difference alone nor the relative risk alone is sufficient to quantify the association of interest.
Example 1
Response to therapy in a random sample of 1,000 HIV-positive patients from a citywide clinical population: 206 patients responded.
Among the 1,000 subjects in the sample:
503 had CD4 counts < 250 at the start of therapy, and 127 of them responded to therapy.
497 had CD4 counts ≥ 250 at the start of therapy, and 79 of them responded to therapy.
2 x 2 Table Representation
              Respond   Not Respond   Total
CD4 < 250       127         376         503
CD4 ≥ 250        79         418         497
Total           206         794       1,000
Example 1
Summary Measure 1: the difference in proportions (also called the risk difference or attributable risk)
$\hat{p}_1 - \hat{p}_2 = \frac{127}{503} - \frac{79}{497} \approx 0.25 - 0.16 = 0.09$
Interpretation(s):
9% greater (absolute) response to therapy in the CD4 < 250 group as compared to the CD4 ≥ 250 group.
9% greater absolute risk of response to therapy in the CD4 < 250 group as compared to the CD4 ≥ 250 group.
Summary Measure 2: the ratio of proportions (also called the relative risk or risk ratio)
$\widehat{RR} = \frac{\hat{p}_1}{\hat{p}_2} \approx \frac{0.25}{0.16} = 1.56$
Interpretation(s):
Those in the CD4 < 250 group have 1.56 times the chance (risk) of responding to therapy as compared to the CD4 ≥ 250 group.
56% greater relative risk of response to therapy in the CD4 < 250 group as compared to the CD4 ≥ 250 group.
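The two summary measures for the CD4 example can be computed directly from the counts. The following is a minimal sketch; the function name is illustrative. Note that it works with unrounded proportions, so the relative risk comes out near 1.59 rather than the 1.56 quoted in the interpretation, which appears to be based on proportions rounded to 0.25 and 0.16.

```python
def risk_difference_and_ratio(events_1, n_1, events_2, n_2):
    """Return (risk difference, relative risk) for group 1 versus group 2."""
    p1 = events_1 / n_1
    p2 = events_2 / n_2
    return p1 - p2, p1 / p2


# CD4 < 250: 127 of 503 responded; CD4 >= 250: 79 of 497 responded.
rd, rr = risk_difference_and_ratio(127, 503, 79, 497)
print(f"risk difference = {rd:.3f}, relative risk = {rr:.2f}")
```

Both measures use the same two proportions; only the arithmetic (subtraction versus division) differs.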
Example 2
Randomized Trial: HIV-positive pregnant women randomized to receive AZT or placebo²
Results
HIV status of infants (at 18 months):
            HIV+   HIV-   Total
AZT          13    167     180
Placebo      40    143     183
Total        53    310     363
Summary Measure 1: the difference in proportions (also called the risk difference or attributable risk)
$\hat{p}_{AZT} - \hat{p}_{placebo} = \frac{13}{180} - \frac{40}{183} \approx 0.07 - 0.22 = -0.15$
Interpretation(s):
15% (absolute) reduction in HIV transmission to children born to mothers given AZT as compared to children born to mothers given placebo.
15% lower absolute risk of HIV transmission to children born to mothers given AZT.
Example 2
Summary Measure 2: the ratio of proportions (also called the relative risk or risk ratio)
$\widehat{RR} = \frac{\hat{p}_{AZT}}{\hat{p}_{placebo}} \approx \frac{0.07}{0.22} = 0.32$
Interpretation(s):
The risk of mother/child HIV transmission for mothers given AZT is 0.32 times the risk of mother/child HIV transmission for mothers given placebo.
68% lower relative risk of mother/child HIV transmission for mothers given AZT.
Risk Difference Versus Relative Risk: Substantive Interpretations
Both measures use exactly the same information but give seemingly different results:
(risk difference) 15% reduction in HIV transmission
(relative risk) 68% reduction in HIV transmission
Notice that both agree in terms of the direction of association.
Example 2
Risk Difference: Substantive Interpretation
Can be interpreted as impact (assuming causation) at the population level.
For example, with this risk difference of -15%:
In a population of 1,000 HIV-positive pregnant women, we'd expect to see 150 (15%) fewer mother/child transmissions if the 1,000 women were given AZT during pregnancy.
In a population of 50,000 HIV-positive pregnant women, we'd expect to see 7,500 (15%) fewer mother/child transmissions if the 50,000 women were given AZT during pregnancy.
Relative Risk: Substantive Interpretation
Can be interpreted as impact (assuming causation) at the individual level.
For example, with this relative risk of 0.32:
The risk that an HIV+ mother who takes AZT during pregnancy transmits HIV to her child is 0.32 times her risk if she did not take AZT.
The risk that an HIV+ mother transmits HIV to her child is 68% lower if she takes AZT during pregnancy (as compared to if she were not taking AZT).
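The population-level reading of the risk difference is a single multiplication: risk difference times population size. A small sketch, assuming causation as the slide does and using an illustrative function name:

```python
def expected_change_in_cases(risk_difference, population_size):
    """Expected change in the number of cases if everyone received the treatment."""
    return risk_difference * population_size


rd = -0.15  # risk difference, AZT versus placebo
for n in (1_000, 50_000):
    change = expected_change_in_cases(rd, n)
    print(f"{n:,} women: about {abs(change):,.0f} fewer transmissions")
```

The same risk difference translates into very different case counts depending on the size of the population, which is why it is the natural measure for public health impact.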
Example 3
From Abstract
Ridker P, et al. A Randomized Trial of Low-Dose Aspirin in the Primary Prevention of Cardiovascular Disease in Women. New England Journal of Medicine (2005). 352(13); 1293-1304
Example 3
Risk Difference
0.2% (absolute) reduction in (10-year) risk of CVD for women on low-dose aspirin therapy compared to women not on low-dose therapy.
In a population of 100,000 women, we would expect to see 0.002 × 100,000 = 200 fewer cases of CVD (developing within 10 years) if the women were given low-dose aspirin therapy.
Relative Risk
The 10-year risk of CVD for women on a low-dose aspirin regimen is 0.92 times the risk for women given placebo.
A woman can reduce her personal risk of CVD (developing within 10 years) by 8% if she takes a low dose of aspirin daily.
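The risk difference and relative risk will always agree in terms of the direction of the estimated association:
If $\hat{p}_1 > \hat{p}_2$, then $\hat{p}_1 - \hat{p}_2 > 0$ and $\hat{p}_1/\hat{p}_2 > 1$.
If $\hat{p}_1 < \hat{p}_2$, then $\hat{p}_1 - \hat{p}_2 < 0$ and $\hat{p}_1/\hat{p}_2 < 1$.
If $\hat{p}_1 = \hat{p}_2$, then $\hat{p}_1 - \hat{p}_2 = 0$ and $\hat{p}_1/\hat{p}_2 = 1$.
It is possible to see a large effect with one measure and a small effect with the other. For example, when the risks in both groups are small, a relative risk well below 1 can correspond to an absolute decrease of only 0.2%.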
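The numbers used on this slide were lost in extraction, so the sketch below uses hypothetical risks (0.1% and 0.3%, not taken from the lecture) chosen only so that the absolute difference is 0.2%. It shows how a large relative effect and a small absolute effect can coexist.

```python
# Hypothetical risks (not from the lecture), chosen so the absolute difference is 0.2%.
p_treated = 0.001
p_control = 0.003

rd = p_treated - p_control   # absolute (risk difference) scale
rr = p_treated / p_control   # relative (relative risk) scale

print(f"risk difference = {rd:.3%}, relative risk = {rr:.2f}")
```

Here the treated group has roughly one third the risk of the control group, yet the absolute reduction is only 2 cases per 1,000 people.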
From Abstract
With more than 2 categories, common practice is to designate one of the categories as the reference group and present comparisons of all other categories to this reference.
While the choice of reference group is arbitrary, in many cases it is purposely chosen to highlight the substantive emphasis.
For example, for this article, written in a US-published journal, the primary question of interest may be how the other 11 countries compared to the United States (with secondary interest in how these countries compared to each other).
Hirth R, et al. Out-Of-Pocket Spending And Medication Adherence Among Dialysis Patients In Twelve Countries. Health Affairs (2008). 27(1); 89-101
Summary
The risk difference ($\hat{p}_1 - \hat{p}_2$) and relative risk ($\hat{p}_1/\hat{p}_2$) are two different estimates of the magnitude and direction of association for binary outcomes between groups.
These two estimates are based on exactly the same inputs and will always agree in terms of the direction of association, but not necessarily in magnitude.
The risk difference helps to quantify the potential impact of a treatment or exposure for a group of individuals.
The relative risk helps to quantify the potential impact of a treatment or exposure for an individual.
Neither estimate alone is sufficient to tell the whole story.
Section C: Comparing Binary Outcomes Between Two (or More) Populations Using Sample Results: The Odds Ratio
Example 1
Response to therapy in a random sample of 1,000 HIV-positive patients from a citywide clinical population
              Respond   Not Respond   Total
CD4 < 250       127         376         503
CD4 ≥ 250        79         418         497
Total           206         794       1,000
Summary Measure 1: the difference in proportions (also called the risk difference or attributable risk): $\hat{p}_1 - \hat{p}_2 \approx 0.25 - 0.16 = 0.09$
Summary Measure 2: the ratio of proportions (also called the relative risk or risk ratio): $\widehat{RR} \approx 0.25/0.16 = 1.56$
Odds
What are odds? The (estimated) odds of an event is the (estimated) probability of the event occurring divided by the (estimated) probability of it not occurring: $\text{odds} = \frac{\hat{p}}{1 - \hat{p}}$
As risk (
Example 1
Odds Ratio: the ratio of the odds of an event for two groups
$\widehat{OR} = \frac{\hat{p}_1/(1 - \hat{p}_1)}{\hat{p}_2/(1 - \hat{p}_2)}$
So for our data: $\widehat{OR} \approx \frac{0.25/0.75}{0.16/0.84} = 1.75$
Example 1
Odds Ratio: Interpretation
The CD4 < 250 group has 1.75 times the odds of responding to therapy as the CD4 ≥ 250 group.
The CD4 < 250 group has 75% greater odds of responding to therapy than the CD4 ≥ 250 group.
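The odds and odds ratio for the CD4 example can be sketched in a few lines of Python. The function names are illustrative. Because the code uses unrounded proportions, the odds ratio comes out near 1.79 rather than the 1.75 quoted above, which appears to be based on proportions rounded to 0.25 and 0.16.

```python
def odds(p):
    """Convert a risk (probability) into odds."""
    return p / (1 - p)


def odds_ratio(events_1, n_1, events_2, n_2):
    """Odds ratio for group 1 versus group 2, from event counts and group sizes."""
    return odds(events_1 / n_1) / odds(events_2 / n_2)


# CD4 < 250: 127 of 503 responded; CD4 >= 250: 79 of 497 responded.
or_cd4 = odds_ratio(127, 503, 79, 497)
print(f"odds ratio = {or_cd4:.2f}")
```

Equivalently, the odds ratio is the cross-product of the 2 x 2 table: (127 × 418) / (376 × 79).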
Odds Ratio: not a direct comparison of risks, but a comparison of a function of risks.
The relative risk and odds ratio will always agree in terms of direction, but will not always have the same value. In this example, $\widehat{RR} = 1.56$ and $\widehat{OR} = 1.75$.
Example 2
Randomized Trial: HIV-positive pregnant women randomized to receive AZT or placebo²
Results
Example 2
Summary Measure 1: the difference in proportions (also called the risk difference or attributable risk): $\hat{p}_{AZT} - \hat{p}_{placebo} \approx 0.07 - 0.22 = -0.15$
Summary Measure 2: the ratio of proportions (also called the relative risk or risk ratio): $\widehat{RR} \approx 0.07/0.22 = 0.32$
Odds Ratio: Interpretation
The AZT group has 0.27 times the odds (of mother-to-child HIV transmission) of the placebo group.
The AZT group has 73% lower odds of mother-to-child HIV transmission than the placebo group.
Relative Risk versus Odds Ratio: in this example the relative risk and odds ratio are $\widehat{RR} = 0.32$ and $\widehat{OR} = 0.27$.
Both measures use exactly the same information but can give numerically different results. Both will always agree in terms of the direction of association, i.e.:
If $\widehat{RR} > 1$ then $\widehat{OR} > 1$; if $\widehat{RR} < 1$ then $\widehat{OR} < 1$; if $\widehat{RR} = 1$ then $\widehat{OR} = 1$.
As with the relative risk, the odds ratio can be interpreted as impact (assuming causation) at the individual level.
The odds ratio does not directly compare the probabilities (risks, proportions) of an outcome, but instead compares a function of risk: the odds.
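The smaller the risks in the groups being compared, the more similar $\widehat{RR}$ and $\widehat{OR}$ are in value.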
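To illustrate how the odds ratio approaches the relative risk as risks shrink, the sketch below holds the relative risk fixed at 0.5 and lowers the baseline risk. All risk values here are hypothetical and not from the lecture.

```python
def odds(p):
    """Convert a risk (probability) into odds."""
    return p / (1 - p)


def odds_ratio_for(baseline_risk, relative_risk):
    """Odds ratio implied by a baseline risk and a relative risk."""
    comparison_risk = relative_risk * baseline_risk
    return odds(comparison_risk) / odds(baseline_risk)


# Hypothetical baseline risks (not from the lecture); relative risk held at 0.5.
relative_risk = 0.5
baseline_risks = [0.4, 0.1, 0.01, 0.001]
gaps = []
for baseline in baseline_risks:
    or_value = odds_ratio_for(baseline, relative_risk)
    gaps.append(abs(or_value - relative_risk))
    print(f"baseline risk {baseline}: RR = {relative_risk}, OR = {or_value:.4f}")
```

The gap between the two measures narrows at each step, because when p is small, 1 - p is close to 1 and the odds p/(1 - p) are close to p itself.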
From Abstract
Example 3
Risk Difference
Relative Risk
Odds Ratio
Example 3
Odds Ratio: Interpretation
The aspirin group has 0.92 times the odds (of developing CVD) of the placebo group.
The aspirin group has 8% lower odds of developing CVD than the placebo group.
Relative Risk versus Odds Ratio: in this example the relative risk and odds ratio estimates are identical in value, unlike in the previous two examples.
In many ways, the odds ratio is a less intuitive and less direct measure of association than the relative risk. However:
In some types of studies (case-control; more details coming in term 2), the odds ratio is the only measure of association that can be estimated.
In logistic regression (also coming in term 2), the results are initially presented as odds ratios, and hence are frequently presented as odds ratios in publications.
With more than 2 categories, common practice is to designate one of the categories as the reference group and present comparisons of all other categories to this reference.
While the choice of reference group is arbitrary, in many cases it is purposely chosen to highlight the substantive emphasis.
From Abstract
Table 4
Onyike C, et al. Is Obesity Associated with Major Depression? Results from the Third National Health and Nutrition Examination Survey. American Journal of Epidemiology (2003). 158(11); 1138-1304
Summary
The odds ratio, $\widehat{OR}$, provides an alternative to the relative risk, $\widehat{RR}$, for quantifying the association of a binary outcome between groups.
The odds ratio is the ratio of odds between two groups; odds are related to risk (probability, proportion).
The odds ratio and relative risk both estimate the association of a binary outcome between groups at the individual level.
These two measures will agree in terms of direction, but not always in magnitude.
The smaller the risks in the groups being compared, the more similar $\widehat{OR}$ and $\widehat{RR}$ are in value.
Learning Objectives
Understand that the scaling of ratios is not symmetric around the value of 1 (which would indicate equal values in the numerator and denominator).
Consider the implications of the previous point when interpreting the size of an association.
Understand that on the log scale (we'll use the natural log, ln) the values of ln(ratios) are symmetric about the value 0.
HIV status of infants (at 18 months):
            HIV+   HIV-   Total
AZT          13    167     180
Placebo      40    143     183
Total        53    310     363
Example 1
Summary Measures (AZT as compared to placebo):
(risk difference) 15% (absolute) decrease in HIV transmission risk
(relative risk) 68% (relative) decrease in HIV transmission risk
(odds ratio) 73% (relative) reduction in HIV transmission odds
Example 1
Summary Measures (placebo as compared to AZT):
(risk difference) 15% (absolute) increase in HIV transmission risk
(relative risk) 210% (relative) increase in HIV transmission risk
(odds ratio) 270% (relative) increase in HIV transmission odds
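Reversing the direction of comparison flips the sign of the risk difference and turns each ratio into its reciprocal. The sketch below, with illustrative function names, computes all three measures in both directions from the trial counts (13 of 180 infected with AZT, 40 of 183 with placebo). Because it uses unrounded proportions, the percentages differ slightly from the figures quoted in the summary lists here, which appear to be based on proportions rounded to 0.07 and 0.22.

```python
# Infected infants / births assessed, from the AZT trial table.
p_azt = 13 / 180
p_placebo = 40 / 183


def odds(p):
    """Convert a risk (probability) into odds."""
    return p / (1 - p)


def compare(p_numerator, p_denominator):
    """Return (risk difference, relative risk, odds ratio), numerator vs denominator group."""
    return (
        p_numerator - p_denominator,
        p_numerator / p_denominator,
        odds(p_numerator) / odds(p_denominator),
    )


azt_vs_placebo = compare(p_azt, p_placebo)
placebo_vs_azt = compare(p_placebo, p_azt)

for label, (rd, rr, or_) in [("AZT vs placebo", azt_vs_placebo),
                             ("placebo vs AZT", placebo_vs_azt)]:
    print(f"{label}: RD = {rd:+.3f}, RR = {rr:.2f} ({rr - 1:+.0%}), "
          f"OR = {or_:.2f} ({or_ - 1:+.0%})")
```

The risk difference keeps the same size in both directions, while the percentage change implied by each ratio is bounded at 100% for decreases but unbounded for increases.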
Example 1
Recap
So why do these associations seem to differ in magnitude when the direction of comparison is reversed? The ranges of possible values for positive and negative associations are very different.
Summary
On the ratio scale (relative risk or odds ratio), the range of possible values is:
0 ≤ ratio < 1 for negative associations, i.e. where the group in the numerator has lower risk (and hence odds) than the group in the denominator.
1 < ratio < ∞ for positive associations, i.e. where the group in the numerator has higher risk (and hence odds) than the group in the denominator.
On the ln(ratio) scale, the range of possible values is:
-∞ < ln(ratio) < 0 for negative associations, i.e. where the group in the numerator has lower risk (and hence odds) than the group in the denominator.
0 < ln(ratio) < ∞ for positive associations, i.e. where the group in the numerator has higher risk (and hence odds) than the group in the denominator.
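The symmetry on the ln scale can be checked numerically. This minimal sketch takes the relative risk of 0.32 from the AZT example and its reciprocal (the comparison reversed) and compares their distances from the null value on each scale. The variable names are illustrative.

```python
import math

# Relative risk from the AZT example and its reciprocal (comparison reversed).
rr_azt_vs_placebo = 0.32
rr_placebo_vs_azt = 1 / rr_azt_vs_placebo

# On the ratio scale the two values sit at very different distances from 1.
print(1 - rr_azt_vs_placebo, rr_placebo_vs_azt - 1)

# On the ln scale they are equal in size and opposite in sign.
log_forward = math.log(rr_azt_vs_placebo)
log_reverse = math.log(rr_placebo_vs_azt)
print(round(log_forward, 3), round(log_reverse, 3))
```

This follows from ln(1/r) = -ln(r), so reversing the comparison only changes the sign on the log scale.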
Summary
These properties of ratios and ln(ratios) have potential implications for:
Displaying associations for different group comparisons
Performing statistical inference on ratios