Unit 03 - Producing Data - 4 Per Page
A study shows that there is a positive correlation between hospital size (number of beds, X) and median number of days, Y, patients stay in hospital. Does this mean that you can shorten a hospital stay by choosing a small hospital?

(a) Association between X and Y (partially) due to “X causes Y”
(b) Association between X and Y (partially) explained by a “lurking variable” (Z)
(c) Association between X and Y is mixed up with and cannot be distinguished from the effect of an additional variable (Z)
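
Case (b) can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the lurking variable Z (imagine something like how sick the patients are), the coefficients, and the function names `pearson` and `simulate_hospitals` are all hypothetical and do not come from the study mentioned above. In the simulation, Y depends on Z only, never on X, yet X and Y still come out positively correlated.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_hospitals(n=5000, seed=1):
    """Simulate (beds, stay, z) where stay depends on Z only, not on beds."""
    rng = random.Random(seed)
    beds, stay, lurking = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)                   # hypothetical lurking variable Z
        x = 300 + 100 * z + rng.gauss(0, 50)  # X: number of beds, partly driven by Z
        y = 6 + 2 * z + rng.gauss(0, 1)       # Y: days in hospital, driven by Z, NOT by X
        beds.append(x)
        stay.append(y)
        lurking.append(z)
    return beds, stay, lurking

beds, stay, lurking = simulate_hospitals()
r_xy = pearson(beds, stay)

# Hold Z roughly fixed: keep only cases where Z is close to 0.
band = [(x, y) for x, y, z in zip(beds, stay, lurking) if abs(z) < 0.1]
r_band = pearson([x for x, _ in band], [y for _, y in band])

print(f"corr(X, Y) overall:            {r_xy:.2f}")
print(f"corr(X, Y) with Z nearly fixed: {r_band:.2f}")
```

Under these assumptions the overall correlation is clearly positive, while the correlation among cases with nearly the same Z is close to zero, so choosing a smaller X would not change Y.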
7/2/2012
Control group
• Control - 1st principle of experimental design
• In a “controlled experiment”, two or more groups of individuals (subjects, experimental units) are compared
• Treatment group: subjects receive a specific intervention
• Control group (aka, comparison group): subjects do not receive the specific intervention and are compared to the treatment group
• Controlled comparisons allow us to eliminate (or reduce) effects of selection of subjects, placebo effects and potential biases (systematic favoring of a certain outcome)
• If studies are uncontrolled, results may be meaningless

Earliest controlled medical experiment - 1747
• James Lind studied six strategies for treating scurvy on HMS Salisbury attempting to control for other factors
• “I took 12 patients in the scurvy on board the Salisbury at sea. The cases were as similar as I could have them… They lay together in one place and had one diet common to them all… To two of them was given a quart of cider a day, to two an elixir of vitriol, to two vinegar, to two sea water, to two oranges and lemons, and to the remaining two an electuary recommended by a ship’s surgeon. The most sudden and visible good effects were perceived from the use of oranges and lemons, one of those who had taken them being at the end of six days fit for duty… The other was appointed nurse for the sick.”
45 46 77 17 09 77 55 80 00 95 32 86 32 94 85 82 22 69 00 56
[Figure: wear plotted by material and by boy. Panels: “Wear differences computed between groups” and “Wear differences computed within boys”]
Average difference same in two designs, but variance is much smaller in matched-pairs design: more precision

• By confining treatment comparisons to within such blocks, greater precision can be obtained (i.e. smaller σ for comparisons within each block)
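
The precision gain from comparing within blocks can be checked with a rough simulation. This is a sketch, not the data behind the figure: the wear values, the effect size, the standard deviations, and the function names are all hypothetical. It compares a two-group design (different boys wear each material) with a matched-pairs design (each boy wears both materials) over many repeated experiments.

```python
import random
import statistics

def two_group_estimate(rng, n, effect, boy_sd, noise_sd):
    """Different boys wear A and B; estimate = mean(B) - mean(A)."""
    a = [rng.gauss(10, boy_sd) + rng.gauss(0, noise_sd) for _ in range(n)]
    b = [rng.gauss(10, boy_sd) + effect + rng.gauss(0, noise_sd) for _ in range(n)]
    return statistics.fmean(b) - statistics.fmean(a)

def matched_pairs_estimate(rng, n, effect, boy_sd, noise_sd):
    """Each boy wears both A and B; estimate = mean of within-boy differences."""
    diffs = []
    for _ in range(n):
        boy = rng.gauss(10, boy_sd)            # boy-to-boy variation (the block effect)
        wear_a = boy + rng.gauss(0, noise_sd)
        wear_b = boy + effect + rng.gauss(0, noise_sd)
        diffs.append(wear_b - wear_a)          # the boy effect cancels here
    return statistics.fmean(diffs)

def compare_designs(reps=2000, n=10, effect=0.4, boy_sd=2.5, noise_sd=0.3, seed=7):
    """Repeat both designs `reps` times; return (mean, SD) of the estimates."""
    rng = random.Random(seed)
    two = [two_group_estimate(rng, n, effect, boy_sd, noise_sd) for _ in range(reps)]
    pair = [matched_pairs_estimate(rng, n, effect, boy_sd, noise_sd) for _ in range(reps)]
    return {
        "two_group": (statistics.fmean(two), statistics.stdev(two)),
        "matched_pairs": (statistics.fmean(pair), statistics.stdev(pair)),
    }

results = compare_designs()
for design, (mean_est, sd_est) in results.items():
    print(f"{design:>13}: mean estimate = {mean_est:.2f}, SD of estimate = {sd_est:.2f}")
```

With these assumed numbers both designs estimate the same average difference, but the matched-pairs estimates vary far less, because the large boy-to-boy variation cancels out of each within-boy difference.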
Blinding
• Blinding: comparison of treatments can be distorted if
subjects or persons administering or evaluating
treatment know which treatment is being allocated –
especially for subjective endpoints
• Doctors want new treatments to work
• Patients want to please their doctors
• Blinding avoids many sources of unconscious biases
Multistage Sampling
• In multistage sampling, units are randomly selected at different levels, ending at what you define as an individual.
• Example: in national surveys, individuals are often chosen by:
– first randomly selecting states,
– then randomly selecting counties within those states,
– then randomly selecting neighborhoods within those counties
• Reason: it's much easier to survey groups of people than single persons/units that could be very far apart.
• Works best if the higher levels that are sampled (states, counties, etc.) are themselves good representations of the entire population

Bias: cautions for sample surveys
1) Selection bias: some groups in population are over- or under-represented in sample (the sampling frame is limited)
2) Non-response bias: non-respondents may differ in important ways from respondents (individuals choose not to respond)
3) Response bias: e.g., wording of questions, telescoping in the recall of events
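
The multistage selection of states, then counties, then neighborhoods can be sketched in a few lines. The sampling frame below is invented purely for illustration (the names `state_0`, `county_0_1`, and so on are placeholders), and the helper names `build_frame` and `multistage_sample` are hypothetical. A real survey would typically also weight units by size, which this sketch ignores.

```python
import random

def build_frame(n_states=6, n_counties=5, n_neighborhoods=8):
    """Build a made-up nested sampling frame: state -> county -> neighborhoods."""
    return {
        f"state_{s}": {
            f"county_{s}_{c}": [f"nbhd_{s}_{c}_{k}" for k in range(n_neighborhoods)]
            for c in range(n_counties)
        }
        for s in range(n_states)
    }

def multistage_sample(frame, n_states, n_counties, n_neighborhoods, seed=None):
    """Randomly select states, then counties within them, then neighborhoods."""
    rng = random.Random(seed)
    chosen = []
    for state in rng.sample(sorted(frame), n_states):                    # stage 1
        counties = frame[state]
        for county in rng.sample(sorted(counties), n_counties):          # stage 2
            for nbhd in rng.sample(counties[county], n_neighborhoods):   # stage 3
                chosen.append((state, county, nbhd))
    return chosen

frame = build_frame()
sample = multistage_sample(frame, n_states=2, n_counties=2, n_neighborhoods=3, seed=42)
for unit in sample:
    print(unit)
```

Every selected neighborhood lies inside a selected county of a selected state, which is what keeps the fieldwork geographically clustered.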
1936 Literary Digest Poll
• Literary Digest had predicted the winner of every US presidential election since 1916
• In 1936, Literary Digest mailed questionnaires to 10 million people
• 2.4 million people responded - the largest number of people ever replying to a poll
• When publishing the 1936 results, the Digest wrote: “We make no claim to infallibility. We did not coin the phrase “uncanny accuracy” which has been so freely applied to our polls.”
• Prediction: Roosevelt 43%, Landon 57%
• Actual result: Roosevelt 62%, Landon 38%

What went wrong with the Digest’s Poll?
Selection bias and non-response bias
• Selection bias: people surveyed came from telephone books, club memberships, mail order lists, automobile ownership lists (more affluent households during depression year)
• Non-response bias: 76% did not respond
• The Gallup Poll correctly predicted Roosevelt's victory with a sample of 50,000 people (1/50th size of Digest’s Poll)
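
The point that a huge sample from a biased frame can lose to a small random sample can be illustrated with a toy simulation. Only the 62% overall support for Roosevelt is taken from the figures above. The share of affluent voters, the support levels within each group, the make-up of the biased frame, and the function names are invented for illustration, and non-response is not modeled at all.

```python
import random

def draw_voter(rng, p_affluent, support_affluent, support_other):
    """Return (is_affluent, favors_roosevelt) for one hypothetical voter."""
    affluent = rng.random() < p_affluent
    p = support_affluent if affluent else support_other
    return affluent, rng.random() < p

def poll(rng, n, p_affluent_in_frame, support_affluent=0.40, support_other=0.6933):
    """Share favoring Roosevelt among n voters drawn from a given frame."""
    votes = sum(
        draw_voter(rng, p_affluent_in_frame, support_affluent, support_other)[1]
        for _ in range(n)
    )
    return votes / n

rng = random.Random(1936)

# Assumed population: 25% affluent, so overall support is
# 0.25 * 0.40 + 0.75 * 0.6933, which is about 0.62.
# Assumed biased frame: 90% of the people reached are affluent.
big_biased = poll(rng, n=100_000, p_affluent_in_frame=0.90)
small_random = poll(rng, n=2_000, p_affluent_in_frame=0.25)

print(f"large sample, biased frame:  {big_biased:.3f}")
print(f"small sample, random frame:  {small_random:.3f}")
```

Under these assumptions the large biased sample lands well below 50% and stays there no matter how large n becomes, while the much smaller random sample lands near the true 62%.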
• Do you think smokers have the right to impose their filthy habits on the rest of us, polluting our precious air?

• Principles of experimental design
• Control, Randomization, Replication
• Blocking
• Placebos
• Sample surveys
Parameters and statistics
Parameter: number that describes the population
Statistic: number that describes a sample
Statistical inference: use information from a sample (a statistic) to make an inference about the larger population (a population parameter)
[Diagram: Sample (partial information) and Population]

Sampling distribution
• What would happen if a sample (or an experiment) were repeated many times? (a “thought experiment”)
• Take repeated samples of the same size from the same population:
– 1st sample, calculate the statistic of interest
– 2nd sample, calculate the statistic of interest
– and so on . . .
• The statistic will vary from sample to sample
• The theoretical sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population
• The sampling distribution often has a predictable pattern
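
The thought experiment of repeated sampling can be approximated on a computer. The sketch below assumes a hypothetical population in which 60% of individuals have some attribute, and uses the sample proportion as the statistic of interest. The parameter value, the sample size, the number of repetitions, and the function names are illustrative choices only.

```python
import random
import statistics

def sample_proportion(rng, n, p):
    """Draw one sample of size n and return the statistic (sample proportion)."""
    return sum(rng.random() < p for _ in range(n)) / n

def simulate_sampling_distribution(n, p, reps, seed=0):
    """Approximate the sampling distribution with `reps` repeated samples."""
    rng = random.Random(seed)
    return [sample_proportion(rng, n, p) for _ in range(reps)]

p_true = 0.60  # hypothetical population parameter
p_hats = simulate_sampling_distribution(n=100, p=p_true, reps=1000)

print("first five statistics:", p_hats[:5])
print("mean of statistics:   ", round(statistics.fmean(p_hats), 3))
print("SD of statistics:     ", round(statistics.stdev(p_hats), 3))
```

Each entry of `p_hats` is one sample's statistic. The values differ from sample to sample, but they cluster around the parameter, which is the predictable pattern referred to above.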
Repeat the process for surveys of size n = 2,500
Note: the variability decreases as the sample size increases
[Figure: Entire population]

The major concept of statistical inference
• A sampling distribution characterizes the behavior of a statistic
• A sampling distribution is inherently unobservable, because there will (in almost all cases) be only one survey, or one experiment, or one observational study
• Probability theory provides tools for calculating the theoretical form of a sampling distribution
• Understanding the behavior of a statistic under (hypothetical) repeated samplings (the sampling distribution) helps understand the precision and reliability of the statistic
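
The note that variability decreases as the sample size increases can be made concrete for one common statistic. For a sample proportion under simple random sampling, probability theory gives a standard deviation of sqrt(p(1 - p)/n). The sketch below compares that formula with simulated surveys. The value p = 0.60 and the smaller comparison size n = 100 are assumptions made for illustration; only n = 2,500 appears in the text above.

```python
import math
import random
import statistics

def theoretical_sd(p, n):
    """Standard deviation of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

def simulated_sd(p, n, reps, rng):
    """SD of the sample proportion across `reps` simulated surveys of size n."""
    p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]
    return statistics.stdev(p_hats)

rng = random.Random(2012)
p = 0.60  # hypothetical population proportion

rows = []
for n in (100, 2500):
    rows.append((n, theoretical_sd(p, n), simulated_sd(p, n, reps=400, rng=rng)))

for n, theo, sim in rows:
    print(f"n = {n:>5}: theoretical SD = {theo:.4f}, simulated SD = {sim:.4f}")
```

Because the standard deviation scales with 1/sqrt(n), a survey 25 times larger has one fifth of the variability.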