Statistics - Lying Without Sinning?: - "Lies, Damned Lies, and Statistics"
1954
Statistics - Lying without sinning?
In South Dakota, 54 Million Beer Bottles by the Side of the Road
April 01 2002
South Dakota's Pierre Capital Journal reports (Mar. 1) that "an average of 650
beer cans and bottles are tossed per mile of road annually." The statistic is
attributed to Dennis W. Brezina, an activist against drunk-driving.
But how did he come up with his data? According to the Journal, Brezina traveled
"highways across the nation to determine whether the problem he perceived was
widespread. He made two trips to South Dakota, one in 1998 and another in
2000." He counted "cans and bottles in ditches in May of both years" and claimed
to have found an average of "one beer can or bottle every 16 feet when walking
randomly selected stretches of ditch."
But the math appears a little blurry. The web site of the South Dakota Department
of Transportation claims that the state "has 83,472 miles of highways, roads and
streets." Assuming Brezina's estimate is correct, South Dakotans appear to be
world-class litterbugs, tossing aside approximately 54,256,800 bottles or cans
every year. According to the Census Bureau there are 754,844 people in South
Dakota. So, according to Brezina, the average resident throws at least 71 beer
bottles or cans on the side of the road every year.
For more, check out www.STATS.org
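The arithmetic behind the headline can be checked in a few lines. This is a minimal sketch using only the figures quoted in the article. The last step assumes a road has two ditches, which the article does not state.

```python
# Figures quoted in the article above.
CANS_PER_MILE_PER_YEAR = 650
ROAD_MILES = 83_472
POPULATION = 754_844

total_per_year = CANS_PER_MILE_PER_YEAR * ROAD_MILES
per_resident = total_per_year / POPULATION

print(f"Total per year: {total_per_year:,}")
print(f"Per resident:   {per_resident:.1f}")

# "One can or bottle every 16 feet" of ditch, converted to a per-mile count.
FEET_PER_MILE = 5280
cans_per_mile_of_ditch = FEET_PER_MILE / 16

# ASSUMPTION (not in the article): each mile of road has two ditches.
cans_per_mile_of_road = 2 * cans_per_mile_of_ditch
print(f"Per mile of road, two ditches: {cans_per_mile_of_road:.0f}")
```

The per-mile figure only scales up to a statewide total if every mile of highway, road and street is littered like the sampled ditches, which is the extrapolation the article questions.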
Statistics for Quantitative Analysis
Mean = $\bar{x} = \dfrac{\sum_i x_i}{n}$
Normal distribution
• Degree of scatter (measure of dispersion)
of population is quantified by calculating the
standard deviation
$s = \sqrt{\dfrac{\sum_i (x_i - \bar{x})^2}{n - 1}}$
• Characterize sample by calculating $\bar{x} \pm s$
Standard deviation and the
normal distribution
• Standard deviation defines
the shape of the normal
distribution (particularly
width)
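[Figure: normal distribution curve. x-axis: standard deviations; y-axis: amount of data.]
Total % of the data covered by distribution: 68 % within ±1 standard deviation, 95 % within ±2, 99.7 % within ±3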
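The three coverage percentages follow from the normal distribution itself. A minimal sketch, assuming an ideal normal distribution and using the standard library's error function (the helper name `fraction_within` is made up for illustration):

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/-{k} standard deviations: {100 * fraction_within(k):.2f} %")
```

Real data sets only approximate these fractions, and the agreement is usually poorer when n is small.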
Example of mean and standard
deviation calculation
$\bar{x}$ = 5.826 nM → 5.82 nM
s = 0.368 nM → 0.36 nM
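The same calculation can be done without a calculator. This sketch assumes the data set is the five values (in nM) that appear in the deviation calculation elsewhere in this section. It should agree with the slide's values to within rounding.

```python
import math
import statistics

# ASSUMPTION: these five measurements (nM) are the data behind the example.
data_nM = [5.23, 5.79, 6.21, 5.88, 6.02]

# Library route: statistics.stdev uses the n - 1 denominator.
mean = statistics.mean(data_nM)
s = statistics.stdev(data_nM)

# Same result written out from the formula for s.
n = len(data_nM)
s_by_hand = math.sqrt(sum((x - mean) ** 2 for x in data_nM) / (n - 1))

print(f"mean = {mean:.3f} nM, s = {s:.3f} nM")
```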
http://www.willamette.edu/~mjaneba/help/TI-85-stats.htm
http://www2.ohlone.edu/people2/joconnell/ti/
Relative standard deviation (rsd)
or coefficient of variation (CV)
rsd or CV = $\dfrac{s}{\bar{x}} \times 100$
• Standard error = $s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
• Take twice as many measurements, and $s_{\bar{x}}$ decreases by a factor of $\sqrt{2} \approx 1.4$
Variance = $s^2$
Average deviation:
$\bar{d} = \dfrac{|5.23 - 5.82| + |5.79 - 5.82| + |6.21 - 5.82| + |5.88 - 5.82| + |6.02 - 5.82|}{5}$
$\bar{d}$ = 0.254 nM → 0.25 or 0.2 nM
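The other descriptive quantities on these slides (rsd, standard error, variance, average deviation) come from the same five values. A minimal sketch follows. It uses the unrounded mean, so the average deviation may differ slightly from a hand calculation made with 5.82.

```python
import math
import statistics

data_nM = [5.23, 5.79, 6.21, 5.88, 6.02]  # values from the calculation above
n = len(data_nM)
mean = statistics.mean(data_nM)
s = statistics.stdev(data_nM)

rsd_percent = 100 * s / mean          # relative standard deviation (CV)
standard_error = s / math.sqrt(n)     # standard error of the mean
variance = s ** 2
average_deviation = sum(abs(x - mean) for x in data_nM) / n

# If s stayed the same, doubling n would shrink the standard error by sqrt(2).
standard_error_2n = s / math.sqrt(2 * n)

print(f"rsd = {rsd_percent:.1f} %, standard error = {standard_error:.3f} nM")
print(f"variance = {variance:.3f} nM^2, average deviation = {average_deviation:.3f} nM")
```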
Confidence interval
• Quantifies how far the true mean (μ) lies from the
measured mean, $\bar{x}$. Uses the mean and standard
deviation of the sample.
$\mu = \bar{x} \pm \dfrac{t s}{\sqrt{n}}$
where t is from the t-table and n = number of
measurements.
Degrees of freedom (df) = n - 1 for the CI.
Example of calculating a
confidence interval
Consider measurement of dissolved Ti
in a standard seawater (NASS-3):
Data: 1.34, 1.15, 1.28, 1.18, 1.33,
1.65, 1.48 nM
DF = n – 1 = 7 – 1 = 6
$\bar{x}$ = 1.34 nM or 1.3 nM
s = 0.17 or 0.2 nM
$\mu = \bar{x} \pm \dfrac{t s}{\sqrt{n}}$
95% confidence interval
t(df=6,95%) = 2.447
CI95 = 1.3 ± 0.16 or 1.3 ± 0.2 nM
50% confidence interval
t(df=6,50%) = 0.718
CI50 = 1.3 ± 0.05 nM
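The Ti example can be reproduced directly from the data and the two t values quoted above. A minimal sketch, in which the function name `confidence_interval` is made up for illustration and t is supplied by hand from the table rather than computed:

```python
import math
import statistics

def confidence_interval(data, t_value):
    """Return (mean, half_width) for mu = mean +/- t * s / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return mean, t_value * s / math.sqrt(n)

ti_nM = [1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48]  # NASS-3 data from the slide

mean, half_width_95 = confidence_interval(ti_nM, t_value=2.447)  # df = 6, 95 %
_, half_width_50 = confidence_interval(ti_nM, t_value=0.718)     # df = 6, 50 %

print(f"95 % CI: {mean:.2f} +/- {half_width_95:.2f} nM")
print(f"50 % CI: {mean:.2f} +/- {half_width_50:.2f} nM")
```

A lower confidence level gives a narrower interval, because less certainty is being claimed that μ lies inside it.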
Interpreting the confidence interval
• For a 95% CI, there is a 95% probability that the true
mean (μ) lies within the range 1.3 ± 0.2 nM, that is,
between 1.1 and 1.5 nM
average  0.4998
stdev    0.01647

mg/mL   frequency
0.53    3
0.52    5
0.51    13
0.50    10
0.49    10
0.48    5
0.47    3
0.46    1

Let’s Graph the Data!
[Figure: histogram of nitrate concentration. x-axis: concentration, 0.44 to 0.54; y-axis: frequency, 0 to 14. Annotations mark ±1 and ±2 standard deviations and one outlier.]
Confidence Interval Exercise
$\mu = \bar{x} \pm t\, s_m = \bar{x} \pm \dfrac{t s}{\sqrt{n}}$
Calculate the 95, 98 and 99 % confidence intervals
95 % 0.500 ± 0.005
98 % 0.500 ± 0.006
99 % 0.500 ± 0.006
50 % 0.500 ± 0.002
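The exercise can be worked from the nitrate frequency table. This sketch rebuilds the mean and standard deviation from the table, then applies the confidence interval formula. The t values for df = 49 do not appear on the slides; they are assumed here from a standard two-tailed t-table.

```python
import math

# Frequency table from the nitrate data above: concentration -> count.
freq = {0.53: 3, 0.52: 5, 0.51: 13, 0.50: 10, 0.49: 10, 0.48: 5, 0.47: 3, 0.46: 1}

n = sum(freq.values())
mean = sum(x * count for x, count in freq.items()) / n
s = math.sqrt(sum(count * (x - mean) ** 2 for x, count in freq.items()) / (n - 1))
s_m = s / math.sqrt(n)  # standard error of the mean

# ASSUMPTION: two-tailed Student's t for df = n - 1 = 49, taken from a t-table.
T_TABLE_DF49 = {50: 0.680, 95: 2.010, 98: 2.405, 99: 2.680}

half_widths = {cl: t * s_m for cl, t in T_TABLE_DF49.items()}
for cl in sorted(half_widths):
    print(f"{cl} %: {mean:.3f} +/- {half_widths[cl]:.3f}")
```

The 98 % and 99 % intervals only look identical after rounding to three decimal places.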
Is the difference due to a systematic error (bias) in the method, or simply to random error?
$t_{calc} = \dfrac{\text{known value} - \bar{x}}{s} \sqrt{n}$
Compare tcalc to the tabulated value of t at the appropriate
df and CL.
If |tcalc| < ttable, results are not significantly different at the 95% CL.
If |tcalc| ≥ ttable, results are significantly different at the 95% CL.
For this example, |tcalc| < ttable, so experimental results are not significantly
different at the 95% CL. THE NULL HYPOTHESIS IS MAINTAINED and there is no evidence of BIAS
at the 95 % confidence level.
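The comparison against a known value can be sketched as follows. The data are the Ti measurements from the earlier example and t(df=6, 95%) = 2.447 comes from that slide. The known value of 1.25 nM is hypothetical, chosen only to illustrate the test.

```python
import math
import statistics

def t_calc_vs_known(data, known_value):
    """t_calc = (known value - mean) * sqrt(n) / s."""
    n = len(data)
    return (known_value - statistics.mean(data)) * math.sqrt(n) / statistics.stdev(data)

ti_nM = [1.34, 1.15, 1.28, 1.18, 1.33, 1.65, 1.48]
KNOWN_VALUE_nM = 1.25   # HYPOTHETICAL reference value, not from the slides
T_TABLE = 2.447         # df = 6, 95 % CL

t_calc = t_calc_vs_known(ti_nM, KNOWN_VALUE_nM)
bias_detected = abs(t_calc) >= T_TABLE

print(f"t_calc = {t_calc:.3f}, significantly different: {bias_detected}")
```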
Comparing replicate measurements or
comparing means of two sets of data
• Another application of the t statistic
• Example: Given the same sample analyzed by two
different methods, do the two methods give the “same”
result?
$t_{calc} = \dfrac{\bar{x}_1 - \bar{x}_2}{s_{pooled}} \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}$
$s_{pooled} = \sqrt{\dfrac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} = \sqrt{\dfrac{(0.07_3)^2 (4 - 1) + (0.12)^2 (4 - 1)}{4 + 4 - 2}} = 0.0993$
If |tcalc| < ttable, results are not significantly different at the 95% CL.
If |tcalc| ≥ ttable, results are significantly different at the 95% CL.
Since |tcalc| (5.056) > ttable (2.447), results from the two methods are
significantly different at the 95% CL.
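The pooled calculation can be sketched in code. The standard deviations (0.073 and 0.12) and n = 4 per method are taken from the slide. The two means are hypothetical, because the slide's means did not survive extraction.

```python
import math

def pooled_std(s1, n1, s2, n2):
    """Pooled standard deviation of two data sets."""
    return math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))

def t_calc_pooled(mean1, mean2, s_pooled, n1, n2):
    """t_calc for comparing two means with a pooled standard deviation."""
    return (mean1 - mean2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))

s_pooled = pooled_std(0.073, 4, 0.12, 4)

# HYPOTHETICAL means, for illustration only.
t_calc = t_calc_pooled(10.00, 10.20, s_pooled, 4, 4)
T_TABLE = 2.447  # df = n1 + n2 - 2 = 6, 95 % CL
significantly_different = abs(t_calc) >= T_TABLE

print(f"s_pooled = {s_pooled:.4f}, t_calc = {t_calc:.3f}")
```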
Comparing replicate measurements or
comparing means of two sets of data
• Uses F distribution
F-test to compare standard deviations
$F_{calc} = \dfrac{s_1^2}{s_2^2}$ where $s_1 \ge s_2$
$F_{calc} = \dfrac{s_1^2}{s_2^2} = \dfrac{(0.12)^2}{(0.07_3)^2} = 2.70$
Note: Keep 2 or 3 decimal places to compare with F table.
Compare Fcalc to Ftable at df = (n1 -1, n2 -1) = 3,3 and 95% CL.
If Fcalc < Ftable, std. devs. are not significantly different at 95% CL.
If Fcalc ≥ Ftable, std. devs. are significantly different at 95% CL.
Ftable(df=3,3;95% CL) = 9.28
Since Fcalc (2.70) < Ftable (9.28), std. devs. of the two sets of data are
not significantly different at the 95% CL. (Precisions are similar.)
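The F-test takes only a few lines of code. This sketch uses the two standard deviations and the tabulated F value from the slide, and puts the larger standard deviation in the numerator regardless of argument order.

```python
def f_calc(s_a, s_b):
    """F = (larger s)^2 / (smaller s)^2, so F is never below 1."""
    larger, smaller = max(s_a, s_b), min(s_a, s_b)
    return larger**2 / smaller**2

F_TABLE = 9.28  # df = (3, 3), 95 % CL, from the slide

f = f_calc(0.073, 0.12)
std_devs_differ = f >= F_TABLE

print(f"F_calc = {f:.2f}, significantly different: {std_devs_differ}")
```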
Comparing replicate measurements or
comparing means of two sets of data-
revisited
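$t_{calc} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}}$
$DF = \left\{ \dfrac{(s_1^2 / n_1 + s_2^2 / n_2)^2}{\dfrac{(s_1^2 / n_1)^2}{n_1 + 1} + \dfrac{(s_2^2 / n_2)^2}{n_2 + 1}} \right\} - 2$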
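The unpooled comparison can be sketched as below. The DF expression is read from the slide as having (n + 1) denominators and a trailing minus 2. That reading is an assumption, since the operators were lost in extraction. The means are hypothetical.

```python
import math

def t_calc_unpooled(mean1, s1, n1, mean2, s2, n2):
    """t_calc without pooling the two standard deviations."""
    return (mean1 - mean2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

def degrees_of_freedom(s1, n1, s2, n2):
    """Approximate df; ASSUMES the (n + 1) and '- 2' form of the formula."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 + 1) + v2**2 / (n2 + 1)) - 2

df = degrees_of_freedom(0.073, 4, 0.12, 4)

# HYPOTHETICAL means, for illustration only.
t_calc = t_calc_unpooled(10.00, 0.073, 4, 10.20, 0.12, 4)

print(f"DF = {df:.2f} (round to the nearest integer for the t-table)")
print(f"t_calc = {t_calc:.3f}")
```

This form is typically used when the F-test finds the two standard deviations significantly different.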
Flowchart for comparing means of two
sets of data or replicate measurements
Use F-test to see if std.
devs. of the 2 sets of
data are significantly
different or not
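Grubbs test
$G_{calc} = \dfrac{|\text{questionable value} - \bar{x}|}{s}$
G table   Number of observations
1.822     6
1.938     7
2.032     8
2.110     9
2.176     10
2.234     11
2.285     12
2.409     15
2.557     20
reject if Gcalc > G table
Q test
Qcalc = gap/range
Q table   Number of observations
0.56      6
0.51      7
0.47      8
0.44      9
0.41      10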
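Both outlier statistics are simple to compute. This sketch uses the tabulated values listed above. The six measurements are hypothetical and chosen only to illustrate the calculation. "Gap" is taken to mean the distance from the questionable value to its nearest neighbor.

```python
import statistics

# Critical values from the tables above, keyed by number of observations.
G_TABLE = {6: 1.822, 7: 1.938, 8: 2.032, 9: 2.110, 10: 2.176,
           11: 2.234, 12: 2.285, 15: 2.409, 20: 2.557}
Q_TABLE = {6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def g_calc(data, questionable):
    """G = |questionable value - mean| / s, with the suspect value included."""
    return abs(questionable - statistics.mean(data)) / statistics.stdev(data)

def q_calc(data, questionable):
    """Q = gap / range."""
    others = list(data)
    others.remove(questionable)  # drop one occurrence of the suspect value
    gap = min(abs(questionable - x) for x in others)
    return gap / (max(data) - min(data))

# HYPOTHETICAL data set with one suspiciously high value.
data = [10.1, 10.2, 10.0, 10.3, 10.2, 11.5]
suspect = 11.5

g = g_calc(data, suspect)
q = q_calc(data, suspect)
print(f"G_calc = {g:.3f} vs G_table = {G_TABLE[len(data)]}")
print(f"Q_calc = {q:.2f} vs Q_table = {Q_TABLE[len(data)]}")
```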
But note that 5 times in 100 it will be wrong to reject this suspect value!
Also note that if 0.380 is retained, s = 0.011 mg/l, but if it is rejected,
s = 0.0056 mg/l, i.e. precision appears to be twice as good, just by
rejecting one value.