Statistical Analysis Data Treatment and Evaluation
Confidence Interval
In most quantitative chemical analyses, the true value of the mean, µ, cannot be
determined because a huge number of measurements (approaching infinity) would be
required.
However, we can determine an interval surrounding the experimentally determined mean, x̄,
within which the population mean µ is expected to lie with a certain degree of
probability. This interval is known as the confidence interval. The limits of the interval
are called confidence limits.
The probability that a result is outside the confidence interval is often called the
significance level.
If we make a single measurement x from a distribution of known σ, we can say that the true
mean should lie in the interval x ± zσ with a probability dependent on z.
However, we rarely estimate the true mean from a single measurement. Instead, we use
the experimental mean of N measurements as a better estimate of µ.
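If the population standard deviation (σ) is unknown, but the sample standard deviation (s)
is given and the sample size (n) is greater than 30, use the following formula:
$$ \text{CI for } \mu = \bar{x} \pm z \left( \frac{s}{\sqrt{n}} \right) $$
When σ is known, it is used in place of s.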
Values of z at various confidence levels:
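These critical values come from the standard normal distribution. As a minimal sketch, assuming a two-sided interval and Python's standard-library statistics.NormalDist, they can be computed rather than looked up. The function name z_for_confidence is illustrative and not part of these notes.

```python
from statistics import NormalDist


def z_for_confidence(level: float) -> float:
    """Two-sided critical z for a confidence level such as 0.95 (assumed helper)."""
    alpha = 1.0 - level  # significance level
    # Half of alpha sits in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence level: z = {z_for_confidence(level):.3f}")
```

Rounded to two decimals, the 95% and 98% values agree with the 1.96 and 2.33 used in the worked examples.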
Examples (follow the format of these solutions so that answers are uniform; express final answers in
the unit of the mean):
1. I randomly select 25 students’ Math SAT scores and find x̄ = 600. I know that σ for this
population is 50. Find a 95% confidence interval and interpret it.
$$ 95\%\ \text{CI} = 600 \pm \frac{1.96\,(50)}{\sqrt{25}} = 600 \pm 19.6 $$
Upper limit:
95% CI = 600 + 19.6 = 619.6
Lower limit:
95% CI = 600 − 19.6 = 580.4
580.4 < μ < 619.6 (Math SAT score)
Interpretation:
Therefore, from the experimental mean, we conclude with 95% confidence that μ lies
between 580.4 and 619.6 (Math SAT score).
2. You sample 12 bugs and find the sample mean is 2.40 cm. You are told that σ=0.2 cm. Find a
95% Confidence Interval and interpret.
$$ 95\%\ \text{CI} = 2.40 \pm \frac{1.96\,(0.2)}{\sqrt{12}} = 2.40 \pm 0.11\ \text{cm} $$
Upper limit:
95% CI = 2.40 + 0.11 = 2.51 cm
Lower limit:
95% CI = 2.40 − 0.11 = 2.29 cm
2.29 < μ < 2.51 cm
Interpretation:
Therefore, from the experimental mean, we conclude with 95% confidence that μ lies
between 2.29 and 2.51 cm.
3. The National Center for Education Statistics surveyed 4400 college graduates about the lengths
of time required to earn their bachelor’s degrees. The mean was 5.15 years and the standard
deviation was 1.68 years. Based on the above information, construct a 98% confidence interval for
the mean time required to earn a bachelor’s degree by all college students.
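Given: n = 4400, x̄ = 5.15, s = 1.68, z value for 98% CI = 2.33
$$ \text{CI for } \mu = \bar{x} \pm z \left( \frac{s}{\sqrt{n}} \right) $$
$$ 98\%\ \text{CI} = 5.15 \pm 2.33 \left( \frac{1.68}{\sqrt{4400}} \right) = 5.15 \pm 0.06 $$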
Upper limit:
98% CI = 5.15 + 0.06 = 5.21
Lower limit:
98% CI = 5.15 − 0.06 = 5.09
5.09 < μ < 5.21 years
Interpretation:
Therefore, from the survey mean, we conclude with 98% confidence that µ lies between
5.09 and 5.21 years.
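The three worked examples all apply the same formula, x̄ ± z(s/√n). The sketch below recomputes them, assuming the z-values quoted in the solutions (1.96 and 2.33). The function name z_confidence_interval and the example labels are illustrative only.

```python
import math


def z_confidence_interval(mean, sd, n, z):
    """Return (lower, upper) confidence limits for mean +/- z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)  # half-width of the interval
    return mean - margin, mean + margin


# (label, mean, standard deviation, sample size, z) taken from the examples above
examples = [
    ("Math SAT score", 600, 50, 25, 1.96),
    ("bug length, cm", 2.40, 0.2, 12, 1.96),
    ("years to degree", 5.15, 1.68, 4400, 2.33),
]

for label, mean, sd, n, z in examples:
    lower, upper = z_confidence_interval(mean, sd, n, z)
    print(f"{label}: {lower:.2f} < mu < {upper:.2f}")
```

Rounded to two decimals, the printed limits match the hand calculations: 580.40 to 619.60, 2.29 to 2.51, and 5.09 to 5.21.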
Analysis of Variance (ANOVA)
Analysis of Variance (ANOVA) is a method for testing the hypothesis that there is no
difference between two or more population means.
The ANOVA technique enables us to test all of the means simultaneously and as such is
considered to be an important tool of analysis in the hands of a researcher.
The significance of the difference between the means of two samples can be judged through
either a z-test or a t-test.
The z-test is applied to find the degree of reliability of a statistic in the case of a large sample.
Z-test is based on the normal probability distribution and is used for judging the
significance of several statistical measures, particularly means.
The t-test is used to test the null hypothesis that the population means of two groups are the same.
The two-sample t-test is commonly used with small sample sizes, testing the difference
between the samples when the variances of the two normal distributions are not known.
T-test is also used for judging the significance of the coefficients of simple and partial
correlations.
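To make the two-sample comparison concrete, here is a minimal sketch of the pooled (equal-variance) t statistic. The pooled form is an assumption, since the notes do not say which variant of the t-test is intended. The data and the function name pooled_t_statistic are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Return (t, degrees of freedom) for two independent samples,
    assuming both populations share the same variance."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical measurements, for illustration only
group_a = [5.1, 4.9, 5.0, 5.2, 4.8]
group_b = [5.4, 5.6, 5.5, 5.3, 5.7]

t, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The null hypothesis of equal means is rejected when |t| exceeds the tabulated critical t value for the chosen confidence level and degrees of freedom.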
ANOVA Concepts
In ANOVA procedures, differences among several population means are assessed by comparing
variances. To compare I population means μ1, μ2, …, μI, the null hypothesis H0 is of the form
H0: μ1 = μ2 = μ3 = … = μI
The alternative hypothesis is Ha: at least two of the μi’s are different.
Assumptions in ANOVA
The experimental errors of the data are normally distributed.
Equal variances between treatments (i.e., homogeneity of variances).
Independence of samples (i.e., each sample is randomly selected and independent).
ANOVA Techniques:
One-way ANOVA
Is the simplest type of ANOVA, in which only one source of variation, or factor, is
investigated.
It is an extension to three or more samples of the t-test procedure for use with two
independent samples.
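As a rough illustration of how one-way ANOVA compares means through variances, the sketch below computes the F statistic as the ratio of the between-group mean square to the within-group mean square. The data and the function name one_way_anova_f are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual results around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical results for three treatments, for illustration only
treatments = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]

f, df_b, df_w = one_way_anova_f(treatments)
print(f"F = {f:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F relative to the tabulated critical value for (df_between, df_within) degrees of freedom indicates that at least two of the means differ.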
Two-way ANOVA
Is used when the data are classified on the basis of two factors. For example, the
agricultural output may be classified on the basis of different varieties of seeds and also on
the basis of different varieties of fertilizers used.
A statistical test used to determine the effect of two nominal predictor variables on a
continuous outcome variable.
The two-way ANOVA test analyzes the effect of the independent variables on the expected
outcome along with their relationship to the outcome itself.
Detection of gross errors
An outlier is a result that is quite different from the others in the data set.
It is important to develop a criterion to decide whether to retain or reject the outlying data
point.
The choice of criterion for the rejection of a suspected result has its perils. If the standard is
too strict so that it is quite difficult to reject a questionable result, there is a risk of retaining a
spurious value that has an inordinate effect on the mean.
If we set a lenient limit and make the rejection of a result easy, we are likely to discard a
value that rightfully belongs in the set, thus introducing bias to the data.
While there is no universal rule to settle the question of retention or rejection, the Q
test is generally acknowledged to be an appropriate method for making the decision.
Q-test
The Q test is a simple, widely used statistical test for deciding whether a suspected result
should be retained or rejected.
In this test, the absolute value of the difference between the questionable result xq and its
nearest neighbor xn is divided by the spread w of the entire set to give the quantity Q:
$$ Q = \frac{|x_q - x_n|}{w} $$
Example:
The analysis of a city’s drinking water for arsenic yielded values of 5.60, 5.64, 5.70, 5.69, and 5.81 ppm. The
last value appears anomalous; should it be rejected at the 95% confidence level?
First, arrange the given values: 5.60, 5.64, 5.69, 5.70, 5.81 ppm
To solve for Q, take the absolute value of the difference between the questionable result (xq, here the
last value) and its nearest neighbor (xn), and divide it by the spread (w = maximum value − minimum value) of
the entire set:
5.60, 5.64, 5.69, 5.70 (nearest neighbor, xn), 5.81 (questionable value, xq) ppm
$$ Q = \frac{|x_q - x_n|}{w} = \frac{5.81 - 5.70}{5.81 - 5.60} = \frac{0.11}{0.21} = 0.52 $$
Interpretation:
For 5 measurements, Qcrit at the 95% confidence level is 0.710. Because 0.52 < 0.710, the questionable
value must be retained at the 95% confidence level.
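The same calculation can be sketched in code. This is a minimal illustration, not a general-purpose routine. The function name q_statistic is hypothetical, it assumes the suspect value is whichever extreme lies farther from its nearest neighbor, and Qcrit is supplied by hand from the value quoted above (0.710 for 5 measurements at the 95% confidence level).

```python
def q_statistic(values):
    """Return (Q, suspect value) for the more isolated end of the data set."""
    if len(values) < 3:
        raise ValueError("the Q test needs at least three results")
    data = sorted(values)
    spread = data[-1] - data[0]  # w = maximum value - minimum value
    if spread == 0:
        raise ValueError("all results are identical; Q is undefined")

    gap_low = data[1] - data[0]     # lowest value to its nearest neighbor
    gap_high = data[-1] - data[-2]  # highest value to its nearest neighbor
    if gap_high >= gap_low:
        return gap_high / spread, data[-1]
    return gap_low / spread, data[0]


arsenic_ppm = [5.60, 5.64, 5.70, 5.69, 5.81]
q_crit = 0.710  # 5 measurements, 95% confidence level (from the example above)

q, suspect = q_statistic(arsenic_ppm)
decision = "reject" if q > q_crit else "retain"
print(f"Q = {q:.2f} for {suspect} ppm: {decision}")
```

For the arsenic data this gives Q = 0.52 for the 5.81 ppm result, which is below Qcrit, so the value is retained.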