
Chapter 4

This document discusses statistics concepts for analytical chemistry. It begins with an overview of the Gaussian distribution and how experimental measurements contain variability. The document then outlines topics to be covered, including the Gaussian distribution, confidence intervals using Student's t-test, comparing means and standard deviations, outlier tests, calibration curves, and least squares regression. It provides examples of calculating the average, standard deviation, and coefficient of variation from a data set. It also explains how to calculate confidence intervals and compares two methods for measuring sulfur content based on their standard deviations.

Uploaded by

alex tomson
Copyright © All Rights Reserved

Analytical Chemistry I

(CHEM 231)

Fall 2016
Dr. Marwa Elazazy
CH4: STATISTICS

Experimental measurements always contain some variability, so no conclusion can be drawn with certainty. Statistics gives us tools to accept conclusions that have a high probability of being correct and to reject conclusions that do not.

Chapter Outline
4-1 Gaussian Distribution
4-2 Confidence Intervals
4-3 Comparison of Means with Student’s t Test
4-4 Comparison of Standard Deviations with the F Test
4-6 Grubbs Test for an Outlier
4-7 The Method of Least Squares
4-8 Calibration Curves

4-1: Gaussian Distribution
• The results of many measurements of an experimental quantity follow a
Gaussian distribution.
• The measured mean, x̄, approaches the true mean, μ, as the number of
measurements becomes very large.
Gaussian distribution: the theoretical bell-shaped distribution of measurements when all error is random.
• The center of the curve is the mean, μ, and the width is characterized by the standard deviation, σ.
• Variation in experimental data is normally distributed when replicate measurements exhibit the bell-shaped distribution in Figure 4-1.
• It is equally probable that a measurement will be higher or lower than the mean. The probability of observing any value decreases as its distance from the mean increases.
FIGURE 4-1 Bar graph and Gaussian curve describing the lifetimes of a hypothetical set of incandescent light bulbs. The smooth curve has the same mean, standard deviation, and area as the bar graph. Any finite set of data, however, will differ from the bell-shaped curve. The more measurements we make, the closer the results will come to the smooth curve.

Light bulb lifetimes, and the corresponding Gaussian curve, are characterized by two parameters.
• The arithmetic mean, also called the average, is the sum of the measured values divided by n, the number of measurements. The mean gives the center of the distribution:

$$\bar{x} = \frac{\sum_i x_i}{n} \tag{4-1}$$

where xi is the lifetime of an individual bulb. The Greek capital sigma, Σ, means summation: Σi xi = x1 + x2 + x3 + … + xn. In Figure 4-1, the mean value is 845.2 h.

• The standard deviation, s, measures how closely the data are clustered about the mean. The smaller the standard deviation, the more closely the data are clustered about the mean; s therefore measures the width of the distribution:

$$s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}} \tag{4-2}$$

For an infinite set of data, the mean is designated by the lowercase Greek letter mu, μ (the population mean), and the standard deviation is written as a lowercase Greek sigma, σ (the population standard deviation). We can never measure μ and σ, but the values of x̄ and s approach μ and σ as the number of measurements increases.

The quantity n − 1 in Equation 4-2 is called the degrees of freedom.
The square of the standard deviation is called the variance.
The standard deviation expressed as a percentage of the mean value
is called the relative standard deviation or the coefficient of
variation.
$$\text{Coefficient of variation (RSD)} = 100 \times \frac{s}{\bar{x}}$$

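Question: Find the average, standard deviation, and coefficient of variation (RSD) for 821, 783, 834, and 855.
The average is

$$\bar{x} = \frac{821 + 783 + 834 + 855}{4} = 823.2$$

To avoid accumulating round-off errors, retain one more digit than was present in the original data. The standard deviation is:

$$s = \sqrt{\frac{(821 - 823.2)^2 + (783 - 823.2)^2 + (834 - 823.2)^2 + (855 - 823.2)^2}{4 - 1}} = 30.3$$

The average and the standard deviation should both end at the same decimal place. For x̄ = 823.2, we will write s = 30.3. The coefficient of variation is the percent relative uncertainty:

$$\text{Coefficient of variation} = 100 \times \frac{s}{\bar{x}} = 100 \times \frac{30.3}{823.2} = 3.7\%$$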

Spreadsheets have built-in functions for the average and
standard deviation. In the adjacent spreadsheet, data points
are entered in cells B1 through B4. The average in cell B5 is
computed with the statement “= AVERAGE(B1:B4)”. B1:B4
means cells B1, B2, B3, and B4. The standard deviation in cell
B6 is computed with “= STDEV(B1:B4)”.
For ease of reading, cells B5 and B6 were set to display two
decimal places. A heavy line was placed beneath cell B4 in
Excel 2007 by highlighting the cell, going to Home, Font, and
selecting the Border icon. In earlier versions of Excel,
highlight cell B4, go to Format, Cells, and select Border.
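Outside a spreadsheet, the same three numbers can be computed in a few lines. This is a minimal sketch using Python's standard `statistics` module; `statistics.stdev` divides by n − 1, like the spreadsheet's STDEV function and Equation 4-2, so it returns the sample standard deviation s.

```python
import statistics

# Replicate results from the question above (cells B1:B4 in the spreadsheet)
data = [821, 783, 834, 855]

mean = statistics.mean(data)   # equivalent of =AVERAGE(B1:B4)
s = statistics.stdev(data)     # sample standard deviation, n - 1 denominator, like =STDEV(B1:B4)
rsd = 100 * s / mean           # coefficient of variation, in percent

print(f"mean = {mean:.2f}, s = {s:.2f}, RSD = {rsd:.1f}%")
```

The unrounded values (823.25 and 30.27) round to the x̄ = 823.2 and s = 30.3 reported above, and the RSD rounds to 3.7%.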
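The formula for a Gaussian curve is:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2} \tag{4-3}$$

• It is useful to express deviations from the mean value in multiples, z, of the standard deviation. That is, we transform x into z, given by:

$$z = \frac{x - \mu}{\sigma}$$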
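As a sketch of the two formulas above, the functions below evaluate the Gaussian curve and the z transformation. Using the light-bulb mean (845.2 h) and standard deviation (94.2 h) as μ and σ is an illustrative assumption; the numerical check shows that the normalization factor makes the total area come out to 1.

```python
import math

def gaussian(x, mu, sigma):
    """Height of the Gaussian curve (Equation 4-3) at x."""
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))   # normalization factor
    return norm * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def z_score(x, mu, sigma):
    """Deviation of x from the mean, in multiples of the standard deviation."""
    return (x - mu) / sigma

mu, sigma = 845.2, 94.2   # light-bulb values from Figure 4-1, used as stand-ins

# Midpoint-rule estimate of the area from mu - 10 sigma to mu + 10 sigma
steps = 20_000
lo = mu - 10 * sigma
dx = 20 * sigma / steps
area = sum(gaussian(lo + (i + 0.5) * dx, mu, sigma) for i in range(steps)) * dx

print(f"z for 600 h = {z_score(600, mu, sigma):.2f}, total area = {area:.6f}")
```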

The probability of measuring z in a certain range is equal to the area under the curve over that range.
For example, the probability of observing z between −2 and −1 is 0.136. This probability
corresponds to the shaded area in Figure 4-3. The area under each portion of the
Gaussian curve is given in Table 4-1. Because the sum of the probabilities of all the
measurements must be unity, the area under the whole curve from z = −∞ to z = +∞
must be unity.
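Instead of reading Table 4-1, the area can be computed from the error function. This sketch assumes the standard relation between the normal cumulative area and `math.erf`; it reproduces the 0.136 probability for −2 < z < −1 and confirms that the whole curve has unit area.

```python
import math

def normal_cdf(z):
    """Area under the normal error curve from -infinity to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def area_between(z1, z2):
    """Probability of observing z between z1 and z2."""
    return normal_cdf(z2) - normal_cdf(z1)

p = area_between(-2, -1)
print(round(p, 3))
```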

The factor 1/(σ√(2π)) in Equation 4-3 is called the normalization factor. It guarantees that the area under the entire curve is unity. A Gaussian curve with unit area is called a normal error curve.

Area Under a Gaussian Curve

Example: Suppose the manufacturer of the bulbs used for Figure 4-1 offers to
replace free of charge any bulb that burns out in less than 600 hours. If she
plans to sell a million bulbs, how many extra bulbs should she keep available
as replacements?

Solution:
1. We need to express the desired interval in multiples of the standard deviation and then find the area of the interval in Table 4-1.
2. Using the mean (845.2 h) and standard deviation (94.2 h) of Figure 4-1, z = (600 − 845.2)/94.2 = −2.60.
3. The area under the curve between the mean value and z = −2.60 is 0.495 3 in Table 4-1.
4. The entire area from −∞ to the mean value is 0.500 0, so the area from −∞ to −2.60 must be 0.500 0 − 0.495 3 = 0.004 7. The area to the left of 600 hours in Figure 4-1 is only 0.47% of the entire area under the curve, so only 0.47% of the bulbs are expected to fail in fewer than 600 h. If the manufacturer sells 1 million bulbs a year, she should make 4 700 extra bulbs to meet the replacement demand.
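The same calculation can be sketched in code, again assuming the error-function form of the normal cumulative area in place of Table 4-1. Because the table rounds the area to four decimal places, the direct calculation gives slightly fewer than 4 700 bulbs (roughly 4 600); the conclusion is unchanged.

```python
import math

def normal_cdf(z):
    """Area under the normal error curve from -infinity to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean_life, s_life = 845.2, 94.2   # hours, from Figure 4-1
cutoff = 600.0                    # bulbs failing before this are replaced free
bulbs_sold = 1_000_000

z = (cutoff - mean_life) / s_life
fraction_early = normal_cdf(z)    # area to the left of the cutoff
replacements = bulbs_sold * fraction_early

print(f"z = {z:.2f}, fraction = {fraction_early:.4f}, replacements = {replacements:.0f}")
```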

• The standard deviation measures the width of the Gaussian curve. The larger
the value of σ, the broader the curve.
• In any Gaussian curve, 68.3% of the area is in the range from μ − 1σ to μ + 1σ.
That is, more than two-thirds of the measurements are expected to lie within
one standard deviation of the mean.
• Also, 95.5% of the area lies within μ ± 2σ, and 99.7% of the area lies within μ ±
3σ.
• Suppose that you use two different techniques to measure sulfur in coal:
Method A has a standard deviation of 0.4%, and method B has a standard
deviation of 1.1%. You can expect that approximately two-thirds of
measurements from method A will lie within 0.4% of the mean. For method B,
two-thirds will lie within 1.1% of the mean.
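The fractions quoted above follow from the area under the normal error curve between μ − kσ and μ + kσ, which equals erf(k/√2). This is a minimal sketch of that check; it gives 68.27%, 95.45%, and 99.73%, which round to the percentages in the bullets. The second loop applies the ±1σ and ±2σ bands to the two sulfur methods.

```python
import math

def within_k_sigma(k):
    """Fraction of a Gaussian population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within ±{k} sigma: {100 * within_k_sigma(k):.2f}%")

# Sulfur-in-coal example: width of the bands for each method (% sulfur)
for method, sigma in (("A", 0.4), ("B", 1.1)):
    print(f"method {method}: ~68% within ±{sigma:.1f}%, ~95% within ±{2 * sigma:.1f}%")
```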

4.2 Confidence Intervals
Student’s t is a statistical tool used most frequently to express confidence
intervals and to compare results from different experiments.
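From a limited number of measurements (n), we cannot find the true population mean, μ, or the true standard deviation, σ. What we determine are x̄ and s, the sample mean and the sample standard deviation. The confidence interval is computed from the equation:

$$\text{Confidence interval for } \mu = \bar{x} \pm \frac{ts}{\sqrt{n}}$$

where
t is Student’s t, taken from Table 4-2 for n − 1 degrees of freedom at the chosen confidence level
s is the standard deviation
x̄ is the mean
n is the number of measurements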

Question: The carbohydrate content of a glycoprotein (a protein with
sugars attached to it) is found to be 12.6, 11.9, 13.0, 12.7, and 12.5 wt% (g
carbohydrate/100 g glycoprotein) in replicate analyses. Find the 50% and
90% confidence intervals for the carbohydrate content.

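Solution: First calculate x̄ (= 12.54) and s (= 0.40) for the five measurements. For the 50% confidence interval, look up t in Table 4-2 under 50 and across from four degrees of freedom (degrees of freedom = n − 1). The value of t is 0.741, so the 50% confidence interval is

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{n}} = 12.54 \pm \frac{(0.741)(0.40)}{\sqrt{5}} = 12.54 \pm 0.13 \text{ wt\%}$$

For the 90% confidence interval, t = 2.132 for four degrees of freedom, giving 12.54 ± (2.132)(0.40)/√5 = 12.54 ± 0.38 wt%.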
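A short sketch of both intervals, assuming the t values are copied by hand from a standard Student's t table (0.741 for 50% and 2.132 for 90%, both at 4 degrees of freedom). It works from the unrounded s (0.4037), which still rounds to the half-widths given above.

```python
import math
import statistics

data = [12.6, 11.9, 13.0, 12.7, 12.5]   # wt% carbohydrate, replicate analyses
n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)               # n - 1 in the denominator

# Student's t for n - 1 = 4 degrees of freedom, taken from a t table
t_table = {50: 0.741, 90: 2.132}

# Half-width t*s/sqrt(n) of each confidence interval
intervals = {level: t * s / math.sqrt(n) for level, t in t_table.items()}

for level, half_width in intervals.items():
    print(f"{level}% confidence interval: {x_bar:.2f} ± {half_width:.2f} wt%")
```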

4.3 Comparison of Means with Student’s t
• If you make two sets of measurements of the same quantity, the mean value from one set will
generally not be equal to the mean value from the other set because of small, random variations in
the measurements.

• We use a t test to compare one mean value with another to decide whether there is a statistically
significant difference between the two. That is, do the two means agree “within experimental error”?

• The null hypothesis in statistics states that “the mean values from two sets of measurements are not
different”. Statistics gives us a probability that the observed difference between two means arises
from random measurement error.

• We customarily reject the null hypothesis if there is less than a 5% chance that the observed difference arises from random variations. With this criterion, we have a 95% chance that our conclusion is correct. One time out of 20 when we conclude that two means are different, we will be wrong.

Case 1. Comparing a Measured Result with a “Known”
Value
We measure a quantity several times, obtaining an average value and
standard deviation. We need to compare our answer with an accepted
answer. The average is not exactly the same as the accepted answer.
Does our measured answer agree with the accepted answer “within
experimental error”?

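Ex. 1. You purchased a Standard Reference Material coal sample certified by the National Institute of Standards and Technology to contain 3.19 wt% sulfur. You are testing a new analytical method to see whether it can reproduce the known value. The measured values are 3.29, 3.22, 3.30, and 3.23 wt% sulfur, giving a mean of x̄ = 3.26 and a standard deviation of s = 0.04. Does your answer agree with the known answer?

Answer:
For four measurements, there are 3 degrees of freedom and t95% = 3.182 in Table 4-2. The 95% confidence interval is

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{n}} = 3.26 \pm \frac{(3.182)(0.04)}{\sqrt{4}} = 3.26 \pm 0.06 \text{ wt\%}$$

The interval extends from 3.20 to 3.32 wt%, so the certified value, 3.19 wt%, lies just outside it. There is less than a 5% chance that the difference between the measured mean and the certified value arises from random error alone.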
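The comparison with a known value can be sketched as a check of whether the certified value falls inside the 95% confidence interval. The t value 3.182 is the Table 4-2 entry quoted in the answer; the sketch uses the unrounded s (about 0.041), which gives an interval of roughly 3.195 to 3.325 wt% and the same conclusion.

```python
import math
import statistics

measured = [3.29, 3.22, 3.30, 3.23]   # wt% sulfur, new method
certified = 3.19                       # NIST certified value, wt% sulfur
t95 = 3.182                            # Student's t, 95%, 3 degrees of freedom

n = len(measured)
x_bar = statistics.mean(measured)
s = statistics.stdev(measured)
half_width = t95 * s / math.sqrt(n)
low, high = x_bar - half_width, x_bar + half_width

# The result "agrees within experimental error" only if the certified value is inside
agrees = low <= certified <= high
print(f"95% CI: {low:.3f} to {high:.3f} wt%; certified value inside: {agrees}")
```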

Case 2. Comparing Replicate Measurements
Do the results of two different sets of measurements agree “within
experimental error”?

Ex. 2 Is Lord Rayleigh’s Gas from Air Denser Than N2 from Chemicals?

The average mass of gas from air in Table 4-3 is x̄1, with a standard deviation of s1 = 0.000 143 (for n1 = 7 measurements). The average mass of gas from chemical sources is x̄2, with s2 = 0.001 38 (for n2 = 8 measurements).

If tcalculated > ttable (95%), the difference is significant.

For 7 + 8 − 2 = 13 degrees of freedom in Table 4-2, ttable lies between 2.228 and 2.131 for 95% confidence. Because
tcalculated > ttable, the difference is significant. In fact, ttable for 99.9% confidence is ~4.3. The difference is significant
beyond the 99.9% confidence level.
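The t formulas for Case 2 are not reproduced above. A commonly used form, assumed here because it matches the 7 + 8 − 2 degrees of freedom in the text, pools the two standard deviations: s_pooled = √[(s1²(n1 − 1) + s2²(n2 − 1))/(n1 + n2 − 2)] and t_calculated = (|x̄1 − x̄2|/s_pooled)·√[n1n2/(n1 + n2)]. This pooled form assumes both data sets share one population standard deviation; when the F test (Section 4.4) shows the standard deviations differ greatly, as it does for these data, some texts use a modified (Welch) version instead. The means from Table 4-3 are not listed here, so `t_calculated` takes them as arguments.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two data sets."""
    return math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))

def t_calculated(x1, s1, n1, x2, s2, n2):
    """t statistic for comparing two means with a pooled standard deviation."""
    sp = pooled_sd(s1, n1, s2, n2)
    return abs(x1 - x2) / sp * math.sqrt(n1 * n2 / (n1 + n2))

# Standard deviations and counts for Rayleigh's data, as quoted in the text
sp = pooled_sd(0.000143, 7, 0.00138, 8)
dof = 7 + 8 - 2
print(f"s_pooled = {sp:.5f}, degrees of freedom = {dof}")
```

Compare `t_calculated(...)`, evaluated with the two Table 4-3 means, against t_table for 13 degrees of freedom.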

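4.4 Comparison of Standard Deviations with the F Test
• The F test tells us whether two standard deviations are “significantly” different from each other. F is the quotient of the squares of the standard deviations, with the larger standard deviation in the numerator (s1 ≥ s2), so that F ≥ 1:

$$F_{\text{calculated}} = \frac{s_1^2}{s_2^2} \tag{4-12}$$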

In Table 4-3, the standard deviation from chemical decomposition is s1 = 0.001 38 (n1 = 8
measurements) and the standard deviation from air is s2 = 0.000 143 (n2 = 7 measurements).

Is the Standard Deviation from Chemical Decomposition Significantly Greater Than the
Standard Deviation from Air in Rayleigh’s Data?

Solution:
1. To answer the question, find F with Equation 4-12: Fcalculated = s1²/s2² = (0.001 38)²/(0.000 143)² = 93.1.
2. In Table 4-4, look for Ftable in the column with 7 degrees of freedom for s1 (because degrees
of freedom = n − 1) and the row with 6 degrees of freedom for s2. Because Fcalculated (= 93.1) >
Ftable (= 4.21), we accept the hypothesis that s1 > s2 above the 95% confidence level. The
obvious difference in scatter of the two data sets in Figure 4-7 is highly significant.
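The F test above reduces to one ratio and one comparison. In this sketch, Ftable = 4.21 is the Table 4-4 value quoted in the solution (7 and 6 degrees of freedom); it is not computed here.

```python
s_chem, n_chem = 0.00138, 8   # chemical decomposition (larger s, numerator)
s_air, n_air = 0.000143, 7    # gas from air
F_table = 4.21                # Table 4-4: 7 and 6 degrees of freedom, 95% confidence

F_calc = s_chem**2 / s_air**2
significant = F_calc > F_table

print(f"F = {F_calc:.1f} with ({n_chem - 1}, {n_air - 1}) degrees of freedom; "
      f"significant: {significant}")
```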

4.6 Grubbs Test for an Outlier

Example: Students dissolved zinc from a galvanized nail and measured the mass lost by
the nail to tell how much of the nail was zinc. Here are 12 results: Mass loss (%): 10.2,
10.8, 11.6, 9.9, 9.4, 7.8, 10.0, 9.2, 11.3, 9.5, 10.6, 11.6
The value 7.8 appears out of line. A datum that is far from other points is called an
outlier. Should 7.8 be discarded before averaging the rest of the data or should 7.8 be
retained?
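The Grubbs test compares G = |questionable value − x̄|/s with a tabulated critical value; if Gcalculated > Gtable, the point is discarded. The sketch below applies this to the nail data. The critical value 2.285 is the commonly tabulated 95% value for 12 observations and is treated here as an assumption to check against your own table. With it, G ≈ 2.12 is below the critical value, so 7.8 would be retained.

```python
import statistics

mass_loss = [10.2, 10.8, 11.6, 9.9, 9.4, 7.8, 10.0, 9.2, 11.3, 9.5, 10.6, 11.6]   # %

x_bar = statistics.mean(mass_loss)
s = statistics.stdev(mass_loss)

# Questionable value: the point farthest from the mean
suspect = max(mass_loss, key=lambda x: abs(x - x_bar))
G_calc = abs(suspect - x_bar) / s

G_table = 2.285   # assumed 95% critical value for 12 observations
discard = G_calc > G_table

print(f"suspect = {suspect}, G = {G_calc:.2f}, discard: {discard}")
```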

4.7 The Method of Least Squares
4.7.1. Finding the Equation of the Line
• Calibration Curve: A graph showing the value of some property versus
concentration of analyte. When the corresponding property of an
unknown is measured, its concentration can be determined from the
graph.
• Method of least squares: Process of fitting a mathematical function to a set of measured points by minimizing the sum of the squares of the vertical distances from the points to the curve.
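For a straight line y = mx + b, minimizing the sum of squared vertical deviations leads to closed-form expressions for m and b built from the sums Σxi, Σyi, Σxiyi, and Σxi². The sketch below uses the standard determinant form D = nΣxi² − (Σxi)²; the function name and the demonstration points (which lie exactly on y = 2x + 1) are hypothetical.

```python
def least_squares_line(xs, ys):
    """Slope m and intercept b that minimize the sum of squared vertical deviations."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    D = n * sum_x2 - sum_x ** 2           # determinant of the normal equations
    m = (n * sum_xy - sum_x * sum_y) / D
    b = (sum_x2 * sum_y - sum_xy * sum_x) / D
    return m, b

# Hypothetical points lying exactly on y = 2x + 1
m_demo, b_demo = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m_demo, b_demo)
```

With real calibration data the points scatter about the line, and the same function returns the best-fit slope and intercept.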

4.8. Calibration curves
A calibration curve shows the response of an analytical method to known quantities
of analyte.8
Table 4-7 gives real data from a protein analysis that produces a colored product. A
spectrophotometer measures the absorbance of light, which is proportional to the
quantity of protein analyzed.

Solutions containing known concentrations of analyte are called standard solutions.

Solutions containing all reagents and solvents used in the analysis, but no deliberately added analyte, are called blank solutions.

Blanks measure the response of the analytical procedure to impurities or interfering species in the reagents.

Example: An unknown protein sample gave an absorbance of 0.406
and a blank had an absorbance of 0.104. How many micrograms of
protein are in the unknown?
Solution
The corrected absorbance is 0.406 − 0.104 = 0.302, which lies on the
linear portion of the calibration curve in Figure 4-13.
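Once the corrected absorbance is known, the amount of protein follows by solving the calibration line y = mx + b for x. The slope and intercept come from the least-squares fit of the standards in Figure 4-13, which are not reproduced here, so this sketch takes them as arguments; the function name is hypothetical.

```python
def analyte_from_calibration(abs_sample, abs_blank, m, b):
    """Solve y = m*x + b for x, using the blank-corrected absorbance as y."""
    corrected = abs_sample - abs_blank   # subtract the blank's response
    return (corrected - b) / m

corrected = 0.406 - 0.104   # corrected absorbance from the example (0.302)
print(f"corrected absorbance = {corrected:.3f}")
```

Call `analyte_from_calibration(0.406, 0.104, m, b)` with the slope and intercept of the fitted calibration line to obtain micrograms of protein.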

The equation of a straight line is y = mx + b, in which m is the slope and b is the y-intercept. The vertical deviation for the point (xi, yi) in Figure 4-11 is yi − y, where y is the ordinate of the straight line when x = xi.

$$\begin{vmatrix} e & f \\ g & h \end{vmatrix} = eh - fg$$

END OF CHAPTER 4

Terms to Understand page 91


Summary page 91
Exercises page 92
Problems page 93

Ch6:
Chemical Equilibrium
