Lecture 2.2 - Statistics - Desc Stat and Distrib
Lecture notes
Applicability
• Mean: used for normal distributions.
• Median: generally used for skewed distributions.
Robustness
• Mean: not a robust tool, since it is largely influenced by outliers.
• Median: better suited to skewed distributions for estimating central tendency, since it is much more robust.
Five-number summary
3 7 9 12 14 15 17 18 40
• 25% of all the numbers in the set are smaller than Q1 (here Q1 = 9)
• 25% of all the numbers in the set are larger than Q3 (here Q3 = 17)
• What percent of all the numbers are between Q1 and Q3? 50% of all the numbers are between Q1 and Q3.
• Interquartile range: Q3 - Q1 = 17 - 9 = 8
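The quartiles for this data set can be checked with a short Python sketch. The choice of `method="inclusive"` is an assumption: it is one common quartile convention, and it happens to give Q1 = 9 and Q3 = 17 for these nine values. Other conventions can give slightly different quartiles for small data sets.

```python
import statistics

data = [3, 7, 9, 12, 14, 15, 17, 18, 40]

# "inclusive" interpolates between the smallest and largest value,
# treating the data as the complete set rather than a sample.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

print("min, Q1, median, Q3, max:", min(data), q1, q2, q3, max(data))
print("IQR:", iqr)
```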
Find the mean and median of
the following set of numbers,
with and without the outlier (40):
3 12 7 40 9 14 18 15 17
With the outlier, the median is 14. Without the outlier, the median is 13.
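A minimal sketch of this exercise in Python, assuming the outlier in question is the value 40. It computes the mean and median of the full set and of the set with 40 removed.

```python
import statistics

data = [3, 12, 7, 40, 9, 14, 18, 15, 17]
without_outlier = [x for x in data if x != 40]  # assumption: 40 is the outlier

mean_all = statistics.mean(data)
median_all = statistics.median(data)
mean_trimmed = statistics.mean(without_outlier)
median_trimmed = statistics.median(without_outlier)

print("with outlier:   ", mean_all, median_all)
print("without outlier:", mean_trimmed, median_trimmed)
```

Removing the single value 40 moves the mean from 15 to 11.875, while the median only moves from 14 to 13, which illustrates why the median is the more robust measure.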
Standard deviation of…
• Population: actual variation in the population
• Sample (part of the population): estimates the variation in the population
  - The sample may not be representative
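Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
Sample standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$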
Why divide by N-1 when calculating “s”?
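One way to see the answer is to enumerate every possible sample from a tiny population and average the resulting variance estimates. The sketch below uses a hypothetical population of four values and samples of size 2 drawn with replacement; both choices are assumptions made only to keep the enumeration small.

```python
from itertools import product
from statistics import mean, pvariance

population = [1, 2, 3, 4]  # hypothetical population
n = 2                      # sample size

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

# All 16 ordered samples of size n, drawn with replacement
samples = list(product(population, repeat=n))

avg_div_n = mean(sum_sq_dev(s) / n for s in samples)
avg_div_n_minus_1 = mean(sum_sq_dev(s) / (n - 1) for s in samples)
true_variance = pvariance(population)

print("true population variance:", true_variance)
print("average estimate, divide by N:    ", avg_div_n)
print("average estimate, divide by N - 1:", avg_div_n_minus_1)
```

Deviations are measured from the sample mean, which sits closer to the sample points than the true mean does, so dividing by N underestimates the population variance on average. Dividing by N - 1 compensates for that.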
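Grand mean of the k group means:
$$\bar{X} = \frac{1}{k} \sum_{j=1}^{k} \bar{x}_j$$
$$MS_{within} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n} (x_{ji} - \bar{x}_j)^2}{k(n-1)}$$
$$MS_{between} = \frac{n \sum_{j=1}^{k} (\bar{x}_j - \bar{X})^2}{k-1}$$
$$S_R^2 = MS_{within} + MS_{between}$$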
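As an illustration of the within-group and between-group mean squares, here is a sketch using the usual one-way layout of k groups with n replicates each. The data values are hypothetical, and equal group sizes are assumed.

```python
from statistics import mean

# Hypothetical data: k = 3 groups (e.g. days), n = 4 replicates each
groups = [
    [10.1, 10.3, 9.9, 10.1],
    [10.4, 10.6, 10.5, 10.5],
    [9.8, 10.0, 9.9, 9.9],
]
k = len(groups)
n = len(groups[0])

group_means = [mean(g) for g in groups]
grand_mean = mean(group_means)  # valid because every group has the same n

# Within groups: squared deviations from each group's own mean
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
ms_within = ss_within / (k * (n - 1))

# Between groups: squared deviations of group means from the grand mean
ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)

print(f"MS_within  = {ms_within:.4f}")
print(f"MS_between = {ms_between:.4f}")
```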
Other ways of expressing the precision
of the data:
• Range: $R = x_{max} - x_{min}$
• Median is the "middle number"
(in a sorted list of numbers).
• Variance: $s^2$
• Relative standard deviation: $RSD = s / \bar{x}$
• Percent RSD / coefficient of variation: $\%RSD = (s / \bar{x}) \times 100$
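These measures of precision can be computed directly. The sketch below reuses the nine-value data set from earlier in the lecture and assumes the sample forms (division by N - 1) for the variance and standard deviation.

```python
import statistics

x = [3, 7, 9, 12, 14, 15, 17, 18, 40]

data_range = max(x) - min(x)
variance = statistics.variance(x)  # sample variance s^2
s = statistics.stdev(x)            # sample standard deviation s
rsd = s / statistics.mean(x)       # relative standard deviation
percent_rsd = 100 * rsd            # coefficient of variation in percent

print("range    =", data_range)
print("variance =", variance)
print(f"s        = {s:.3f}")
print(f"RSD      = {rsd:.3f}")
print(f"%RSD     = {percent_rsd:.1f}")
```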
Measure of Asymmetry
(skewness)
• Skewness is the measure of how asymmetric a
distribution is.
• Skewed to the Right: mean and the median
are both greater than the mode
• Skewed to the Left: the mean will be less
than the median
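A small numerical check of the right-skewed case, using the nine-value data set from earlier in the lecture. The moment-based coefficient g1 = m3 / m2^1.5 is an assumption here: it is one common definition of skewness, and the slides do not specify a formula.

```python
import statistics

data = [3, 7, 9, 12, 14, 15, 17, 18, 40]
n = len(data)
mean = statistics.mean(data)
median = statistics.median(data)

# Second and third central moments
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
g1 = m3 / m2 ** 1.5  # positive means skewed to the right

print("mean =", mean, " median =", median)
print(f"skewness g1 = {g1:.3f}")
```

The large value 40 pulls the mean above the median and makes g1 positive, consistent with a distribution skewed to the right.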
To express accuracy and precision
The center of the target is the true value.
Accuracy:
• Mean (average)
• Percent error
Precision:
• Range
• Deviation
• Standard deviation
• Percent coefficient of variation
Expressing data graphically: plots, box plots.
Calculating Statistical
Uncertainty
• Mean and standard deviation of a set of independent
measurements (unknown errors, assumed uniform):
$$\bar{x} = \frac{1}{N} \sum_i x_i; \qquad \sigma^2 = \frac{1}{N-1} \sum_i (x_i - \bar{x})^2$$
• Standard deviation estimates the likely error of
any one measurement
• Uncertainty in the mean is what is quoted:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} = \left[ \frac{1}{N(N-1)} \sum_i (x_i - \bar{x})^2 \right]^{1/2}.$$
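The calculation on this slide can be sketched as follows. The five measurement values are hypothetical; the code computes the mean, the standard deviation of a single measurement, and the uncertainty in the mean in both of its equivalent forms.

```python
import math
import statistics

measurements = [9.8, 10.1, 10.0, 9.9, 10.2]  # hypothetical repeated readings
N = len(measurements)

mean = statistics.mean(measurements)
sigma = statistics.stdev(measurements)  # likely error of any one measurement
sigma_mean = sigma / math.sqrt(N)       # uncertainty in the mean

# Same quantity written as a single expression
sigma_mean_direct = math.sqrt(
    sum((x - mean) ** 2 for x in measurements) / (N * (N - 1))
)

print(f"result: {mean:.2f} +/- {sigma_mean:.2f}")
```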
Propagating Uncertainties
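• Functions of one variable (general formula):
$$\sigma_F = \frac{df}{dx} \sigma_x$$
• Specific cases:
$$\sigma_{x^2} = 2x \, \sigma_x \quad \text{or} \quad \frac{\sigma_{x^2}}{x^2} = 2 \frac{\sigma_x}{x}$$
$$\sigma_{x^n} = n x^{n-1} \sigma_x \quad \text{or} \quad \frac{\sigma_{x^n}}{x^n} = n \frac{\sigma_x}{x}$$
$$\sigma_{\sin x} = \cos x \, \sigma_x$$
$$\sigma_{\ln x} = \frac{1}{x} \sigma_x$$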
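The one-variable rule can be checked numerically. In this sketch the helper `propagate` is a hypothetical name, the derivative is approximated by a central difference, and the values of x and its uncertainty are made up. The absolute value is taken so that the resulting uncertainty is never negative.

```python
import math

def propagate(f, x, sigma_x, h=1e-6):
    """Approximate sigma_F = |df/dx| * sigma_x with a central difference."""
    dfdx = (f(x + h) - f(x - h)) / (2 * h)
    return abs(dfdx) * sigma_x

x, sigma_x = 2.0, 0.1  # hypothetical measurement and its uncertainty

cases = {
    "x^2":   (lambda v: v ** 2, 2 * x * sigma_x),
    "x^3":   (lambda v: v ** 3, 3 * x ** 2 * sigma_x),
    "sin x": (math.sin,         abs(math.cos(x)) * sigma_x),
    "ln x":  (math.log,         sigma_x / x),
}

results = {}
for name, (f, analytic) in cases.items():
    numeric = propagate(f, x, sigma_x)
    results[name] = (numeric, analytic)
    print(f"{name:6s} numeric = {numeric:.6f}  analytic = {analytic:.6f}")
```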
Propagating Uncertainties
• Functions of >1 variable (general formula):
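$$\sigma_f^2 = \left( \frac{\partial f}{\partial x} \right)^2 \sigma_x^2 + \left( \frac{\partial f}{\partial y} \right)^2 \sigma_y^2.$$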
• Specific cases:
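Weighted mean of measurements $x_i$ with uncertainties $\sigma_i$:
$$\bar{x} = \frac{\sum_i x_i / \sigma_i^2}{\sum_i 1 / \sigma_i^2}, \qquad \sigma_{\bar{x}}^2 = \frac{1}{\sum_i 1 / \sigma_i^2}$$
• Remember we are using the uncertainty in the
mean here:
$$\frac{\sigma_i}{\sqrt{N}}$$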
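The general formula for functions of more than one variable can be sketched for a product f = x·y. The helper name `propagate2` and the input values are hypothetical, and the two uncertainties are assumed to be independent (no covariance term), as the general formula implies.

```python
import math

def propagate2(f, x, y, sigma_x, sigma_y, h=1e-6):
    """sigma_f from partial derivatives estimated by central differences."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return math.sqrt((dfdx * sigma_x) ** 2 + (dfdy * sigma_y) ** 2)

x, sigma_x = 4.0, 0.2  # hypothetical values
y, sigma_y = 3.0, 0.1

sigma_f = propagate2(lambda a, b: a * b, x, y, sigma_x, sigma_y)

# For a product, relative uncertainties add in quadrature
expected = x * y * math.sqrt((sigma_x / x) ** 2 + (sigma_y / y) ** 2)

print(f"f = {x * y:.2f} +/- {sigma_f:.2f}")
```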
2.5. Distribution of repeated measurements
• The pattern of variation of a variable is called its
distribution, which can be described both
mathematically and graphically.
• In essence, the distribution records all possible
numerical values of a variable and how often each
value occurs (its frequency).
• Can be either discrete or continuous
• Which statistical test is appropriate will depend
upon the distribution of your data.
Types of Distributions
Note that distributions can be either discrete or continuous
Characterised by:
$$z = \frac{x - \mu}{\sigma} \approx \frac{x - \bar{x}}{s}$$
where z = deviation of a data point from the mean, stated in units of standard deviation.
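As a quick illustration, z can be computed for every point of the nine-value data set used earlier in the lecture. Using the sample mean and sample standard deviation in place of μ and σ is an assumption that matches the approximate form of the formula.

```python
import statistics

data = [3, 7, 9, 12, 14, 15, 17, 18, 40]
mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation

# z-score: distance from the mean in units of standard deviation
z_scores = [(x - mean) / s for x in data]

for x, z in zip(data, z_scores):
    print(f"x = {x:2d}  z = {z:+.2f}")
```

The value 40 lies more than two standard deviations above the mean, while 15 sits exactly at the mean and has z = 0.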
Normal distribution
The standard deviation measures the width of the
Gaussian curve.
(The larger the value of σ, the broader the curve)
The more times you measure, the more confident you are that your
average value is approaching the “true” value.
The uncertainty decreases in proportion to $1/\sqrt{n}$.
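A short sketch of this scaling, assuming a hypothetical single-measurement standard deviation of 2.0: quadrupling the number of measurements halves the uncertainty in the mean.

```python
import math

s = 2.0  # hypothetical standard deviation of a single measurement

# Uncertainty in the mean for increasing numbers of measurements
standard_errors = {n: s / math.sqrt(n) for n in (4, 16, 64)}

for n, se in standard_errors.items():
    print(f"n = {n:2d}  uncertainty in mean = {se}")
```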
t-distributions
Normal distribution
Required number of replicate analyses:
$$\mu = \bar{x} \pm \frac{ts}{\sqrt{n}}, \qquad e = \frac{ts}{\sqrt{n}}, \qquad n = \frac{t^2 s^2}{e^2}$$
µ = true population mean
$\bar{x}$ = measured mean
n = number of samples needed
$s^2$ = variance of the sampling operation
e = sought-for uncertainty
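The formula for n can be applied as in the sketch below. The function name `required_replicates` and the input values are hypothetical. The value t = 1.96 is an assumption (the large-sample 95% value); in practice t depends on n - 1 degrees of freedom, so the calculation is usually repeated with an updated t until n stops changing.

```python
import math

def required_replicates(t, s, e):
    """n = t^2 s^2 / e^2, rounded up to a whole number of analyses."""
    return math.ceil((t * s / e) ** 2)

# Hypothetical inputs: s = 0.5 and a sought-for uncertainty e = 0.2
n = required_replicates(t=1.96, s=0.5, e=0.2)
print("replicates needed:", n)
```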