Normal or Gaussian Curve of Errors

[Figure: Normal probability curve. The ordinate y = \frac{h}{\sqrt{\pi}} e^{-h^{2}x^{2}} = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right) is plotted against the deviation x, with the points -r and +r marked symmetrically about the mean.]
Normal or Gaussian curve of errors:
The normal or Gaussian law of errors is the basis for the major part of the
study of random effects. This type of distribution is most frequently met in
practice.
The law of probability states that the normal occurrence of deviations from
the average value of an infinite number of measurements or observations can
be expressed by
y = \frac{h}{\sqrt{\pi}} e^{-h^{2}x^{2}}

where x is the magnitude of the deviation from the mean, y is the number of
readings at any deviation x (i.e. the probability of occurrence of deviation
x), and h is a constant known as the precision index.
The fraction of the total number of readings whose deviation lies between 0
and x is

n_{0x} = \frac{h}{\sqrt{\pi}} \int_{0}^{x} \exp(-h^{2}x^{2})\,dx
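As a quick numerical illustration (not from the text), this fraction can be evaluated with the Python standard library, since \frac{h}{\sqrt{\pi}}\int_0^x \exp(-h^2 t^2)\,dt = \tfrac{1}{2}\,\mathrm{erf}(hx); the value of h below is an arbitrary choice.

```python
import math

def fraction_between_0_and_x(h, x):
    # (h / sqrt(pi)) * integral_0^x exp(-h^2 t^2) dt = erf(h*x) / 2
    return 0.5 * math.erf(h * x)

h = 1.0                      # illustrative precision index
r = 0.4769 / h               # the probable error (derived later in the text)
print(fraction_between_0_and_x(h, r))   # ~0.25: a quarter of the readings lie between 0 and +r
```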
Precision index:
y = \frac{h}{\sqrt{\pi}} e^{-h^{2}x^{2}}

From the above equation, at x = 0,

y_{max} = \frac{h}{\sqrt{\pi}}
It is clear from this relation that the maximum value of y depends upon h.
The larger the value of h, the sharper is the curve. Thus the value of h
determines the sharpness of the curve, since the curve drops off sharply
owing to the term -h^{2}x^{2} in the exponent. A sharp curve evidently
indicates that the deviations are closely grouped together around the
deviation x = 0.
[Figure: Two normal curves with different values of the precision index h; curve 1 (large h) is tall and sharp, curve 2 (small h) is low and spread out.]
The figure above shows two curves having different values of h.
Curve 1 has a large value of h while curve 2 has a smaller value of
h. Therefore curve 1, wherein the deviations are closely clustered
together, is indicative of greater precision than curve 2, wherein the
deviations are spread over a much wider range.
It is clear that the probability that a variate (random variable) lies in a
range of given width becomes smaller as that range is taken at larger
deviations from the mean. For a given deviation x, the larger the value of h,
the smaller the probability, and vice versa. Thus the name precision index
for h is reasonable.
A large value of h represents high precision of the data, because the
probability of occurrence of variates in a given range falls off rapidly as
the deviation increases; the variates tend to cluster into a narrow range.
On the other hand, a small value of h represents low precision, because the
probability of occurrence of variates in a given range falls off only
gradually as the deviation increases; the variates are spread over a wide
range.
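To make the effect of h concrete, the fraction of readings expected to deviate by more than a fixed amount can be compared for a small and a large precision index; the two h values below are chosen purely for illustration.

```python
import math

def fraction_beyond(h, x):
    # For y = (h/sqrt(pi)) exp(-h^2 x^2), the fraction with |deviation| > x is 1 - erf(h*x)
    return 1.0 - math.erf(h * x)

x = 1.0                       # a fixed deviation
for h in (0.5, 2.0):          # curve 2 (small h) versus curve 1 (large h)
    print(h, round(fraction_beyond(h, x), 4))
# h = 0.5 -> ~0.48 of readings deviate by more than x
# h = 2.0 -> ~0.0047, i.e. far fewer large deviations: higher precision
```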
Probable error:
Let us consider the two points -r and +r marked in the figure. The area
bounded by the curve, the x-axis, and the ordinates erected at x = -r and
x = +r is equal to half of the total area under the curve; that is, half of
the deviations lie between x = ±r.
A convenient measure of precision is the quantity r. It is called the
probable error, or simply P.E. The reason for this name is the fact,
mentioned above, that half the observed values lie between the limits ±r. If
we determine r as the result of n measurements and then make an additional
measurement, the chances are 50-50 that the new value will lie between -r
and +r; that is, the chances are even that any one reading will have an
error no greater than r.
The location of the point r can be found as follows. The fraction of the
total number of readings whose deviation lies between x_1 and x_2 is

n_{12} = \frac{h}{\sqrt{\pi}} \int_{x_1}^{x_2} \exp(-h^{2}x^{2})\,dx

Putting x_1 = -r and x_2 = +r and equating this fraction to one half,

\frac{h}{\sqrt{\pi}} \int_{-r}^{+r} \exp(-h^{2}x^{2})\,dx = \frac{1}{2}

which gives hr = 0.4769. Hence the probable error is

P.E. = r = \frac{0.4769}{h}
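The constant 0.4769 can be recovered numerically: since the integral above equals erf(hr), the condition is erf(hr) = 1/2. A minimal sketch using bisection with the standard library (the bracketing interval [0, 1] is an arbitrary choice):

```python
import math

# Solve erf(u) = 0.5 for u = h*r by bisection on [0, 1].
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if math.erf(mid) < 0.5:
        lo = mid
    else:
        hi = mid

print(round(0.5 * (lo + hi), 4))   # ~0.4769, hence r = 0.4769 / h
```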
Average deviation for the normal curve: The average deviation is computed,
when more than one reading is present at a given deviation, by multiplying
the amount of the deviation by the number of points at that deviation.
These products are then added together (without regard to sign) until all
readings are taken into account, and the sum is divided by the number of
readings.
For the normal curve the average deviation is

\bar{D} = \int_{-\infty}^{+\infty} |x|\,y\,dx
        = \frac{2h}{\sqrt{\pi}} \int_{0}^{\infty} x \exp(-h^{2}x^{2})\,dx
        = \frac{1}{h\sqrt{\pi}}

Substituting the value of h from the equation for the probable error
(h = 0.4769/r),

\bar{D} = \frac{r}{0.4769\sqrt{\pi}} = \frac{r}{0.8453}
Standard deviation for the normal curve:
For a set of readings the standard deviation is \sigma = \sqrt{\sum d^{2}/n};
for the normal curve,

\sigma^{2} = \int_{-\infty}^{+\infty} x^{2}\,y\,dx
           = \frac{2h}{\sqrt{\pi}} \int_{0}^{\infty} x^{2} \exp(-h^{2}x^{2})\,dx
           = \frac{1}{2h^{2}}

so the standard deviation for the normal curve is

\sigma = \frac{1}{\sqrt{2}\,h} = \frac{r}{\sqrt{2}\times 0.4769} = \frac{r}{0.6745}

We therefore have

P.E. = r = 0.6745\,\sigma = 0.8453\,\bar{D}
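These relations between r, σ and the average deviation D̄ can be verified numerically for any value of h; h = 1 below is an arbitrary choice.

```python
import math

h = 1.0                                   # arbitrary precision index
r = 0.476936 / h                          # probable error (hr = 0.4769 to four figures)
sigma = 1.0 / (h * math.sqrt(2.0))        # standard deviation of the normal curve
D = 1.0 / (h * math.sqrt(math.pi))        # average deviation of the normal curve

print(round(r / sigma, 4))   # 0.6745 -> r = 0.6745 sigma
print(round(r / D, 4))       # 0.8453 -> r = 0.8453 D
```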
Probable error of a finite number of readings:
For a finite number of readings n with deviations d_1, d_2, \ldots, d_n from
the mean, the probable error of one reading is

r_1 = 0.6745 \sqrt{\frac{d_1^{2} + d_2^{2} + \cdots + d_n^{2}}{n-1}}

This in fact means that, for a computed probable error r_1 obtained from n
readings, one more reading would have an even chance of deviating by more or
less than r_1.
With a finite number of readings, the average (mean) reading has a probable
error (the probable error of the mean) of

r_m = \frac{r_1}{\sqrt{n}} = 0.6745 \sqrt{\frac{\sum d^{2}}{n(n-1)}}
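A minimal sketch of these two formulas applied to a small set of readings (the sample values are invented for illustration):

```python
import math

readings = [101.2, 101.7, 101.3, 101.0, 101.5, 101.3]    # hypothetical data
n = len(readings)
mean = sum(readings) / n
dev_sq = [(x - mean) ** 2 for x in readings]

r1 = 0.6745 * math.sqrt(sum(dev_sq) / (n - 1))    # probable error of one reading
rm = r1 / math.sqrt(n)                            # probable error of the mean
print(round(mean, 3), round(r1, 3), round(rm, 3))
```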
Specifying odds:
The probability of occurrence can be stated in terms of odds. Odds are the
number of chances that a particular reading will occur when the error limit
is specified. For example, if the error limits are specified as ±0.6745σ,
the chances are that 50% of the observations will lie within these limits;
in other words, the odds are 1 to 1. The odds can be calculated from

\text{Probability of occurrence} = \frac{\text{odds}}{\text{odds} + 1}
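Rearranging this relation gives odds = P/(1 − P); a quick check against the probabilities tabulated below:

```python
# odds = P / (1 - P), since probability of occurrence = odds / (odds + 1)
for p in (0.5000, 0.6826, 0.9546, 0.9974):
    print(p, round(p / (1.0 - p), 1), "to 1")
# 0.5    -> 1.0 to 1
# 0.6826 -> 2.2 to 1   (about 2.15 to 1)
# 0.9546 -> 21.0 to 1
# 0.9974 -> 383.6 to 1 (the commonly quoted 369 to 1 corresponds to the
#                       more precise 3-sigma probability of 0.9973)
```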
Probability and odds

Deviation      Probability   Odds
±0.6745σ       0.5000        1 to 1
±σ             0.6826        2.15 to 1
±2σ            0.9546        21 to 1
±3σ            0.9974        369 to 1
Specifying measurement data
After carrying out statistical analysis of multi-sample data, the results of
the measurements must be specified. The results are expressed as deviations
about a mean value. The deviations are expressed as:
(i) Standard deviation: The result is expressed as X̄ ± σ. The error limit in
this case is the standard deviation. This means that 0.6826 (about 68%) of
the readings lie within these limits, and the odds are 2.15 to 1. Thus there
is approximately a 2 to 1 possibility that a new reading will fall within
this limit.
(ii) Probable error: The result is expressed as X̄ ± 0.6745σ. This means that
50% of the readings lie within this limit and the odds are 1 to 1; that is,
there is an even possibility that a new reading will lie within this limit.
(iii) ±2σ limit: The result is expressed as X̄ ± 2σ. In this case the
probability range is increased: 0.9546 (about 95%) of the readings fall
within this limit. The odds in this case are 21 to 1.
(iv) ±3σ limit: The result is expressed as X̄ ± 3σ. The maximum or boundary
error limit is ±3σ. The probability in this case is 0.9974, so 99.74% of the
readings will fall within this limit. In other words, only about 26 readings
out of 10,000 are expected to fall beyond this limit; thus practically all
the readings are included. The odds of any reading falling within this limit
are 369 to 1.
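A short sketch showing how a measured result might be reported with each of these error limits (the readings are invented; the multipliers come from the four cases above):

```python
import math

readings = [9.98, 10.02, 10.01, 9.97, 10.03, 10.00, 9.99]   # hypothetical data
n = len(readings)
mean = sum(readings) / n
sigma = math.sqrt(sum((x - mean) ** 2 for x in readings) / (n - 1))

for label, k in [("standard deviation", 1.0),
                 ("probable error    ", 0.6745),
                 ("2-sigma limit     ", 2.0),
                 ("3-sigma limit     ", 3.0)]:
    print(f"{label}: {mean:.3f} +/- {k * sigma:.3f}")
```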
Confidence interval and confidence level
Through statistical analysis of the data it is possible to state a range of
deviation from the mean value within which a certain fraction of all the
values is expected to lie. This range is called the confidence interval.
The probability that the value of a randomly selected observation will lie
in this range is called the confidence level.
If the number of observations is large and their errors are random and
follow the normal (Gaussian) distribution, the various confidence intervals
about the mean value X̄ are given below.
Confidence level   Confidence interval   Values lying outside confidence interval
0.500              X̄ ± 0.674σ            1 in 2
0.800              X̄ ± 1.282σ            1 in 5
0.900              X̄ ± 1.645σ            1 in 10
0.950              X̄ ± 1.960σ            1 in 20
0.990              X̄ ± 2.576σ            1 in 100
0.999              X̄ ± 3.291σ            1 in 1000
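Assuming SciPy is available, the interval multipliers in the table can be reproduced from the inverse of the normal distribution:

```python
from scipy.stats import norm

for level in (0.500, 0.800, 0.900, 0.950, 0.990, 0.999):
    # Two-sided interval X_bar +/- z*sigma requires z = norm.ppf((1 + level) / 2)
    z = norm.ppf((1.0 + level) / 2.0)
    print(f"{level:.3f}  X +/- {z:.3f} sigma")
# 0.500 -> 0.674, 0.800 -> 1.282, 0.900 -> 1.645,
# 0.950 -> 1.960, 0.990 -> 2.576, 0.999 -> 3.291
```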
Rejection of data
In most experiments, the experimenter finds that some of the data points are
noticeably different from the majority of the data. If these data points were
obtained under abnormal conditions involving gross blunders, and the
experimenter is sure of their dubious nature, they can be discarded straight
away.
However, the experimenter cannot reject a data point simply because it is
different from the others; he must rely on certain standard mathematical
methods for rejecting experimental data. Many methods are available for
assessing whether data should be rejected or retained. Three commonly used
methods are (i) Chauvenet's criterion, (ii) the use of confidence intervals,
and (iii) ±3σ limits.
Chauvenet's Criterion:
Suppose n observations are made for the measurement of a quantity. We assume
that n is large enough that the results follow a Gaussian distribution. This
distribution may be used to compute the probability that a given reading will
deviate by a certain amount from the mean. Chauvenet's criterion specifies
that a reading may be rejected if the probability of obtaining the particular
deviation from the mean is less than 1/(2n).
When applying Chauvenet's criterion, in order to eliminate any dubious data,
the mean value and the standard deviation are first calculated using all the
data points. The deviations of the individual readings are then compared with
the standard deviation. If the ratio of the deviation of a reading to the
standard deviation exceeds the limit given in the table, that reading is
rejected. The mean value and the standard deviation are then recalculated
with the rejected reading excluded from the data.
The table gives the values of the ratio of deviation to standard deviation
for various values of n according to this criterion.
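A minimal sketch of Chauvenet's criterion, assuming SciPy for the normal distribution; the readings are invented, and the rejection threshold d/σ is the value whose two-sided tail probability equals 1/(2n):

```python
import math
from scipy.stats import norm

readings = [9.8, 10.1, 10.0, 10.2, 9.9, 10.1, 10.0, 11.4]   # hypothetical; 11.4 looks dubious
n = len(readings)
mean = sum(readings) / n
sigma = math.sqrt(sum((x - mean) ** 2 for x in readings) / (n - 1))

# Reject a reading if the probability of so large a deviation is less than 1/(2n),
# i.e. if |d|/sigma exceeds the two-sided threshold below.
threshold = norm.ppf(1.0 - 1.0 / (4.0 * n))

rejected = [x for x in readings if abs(x - mean) / sigma > threshold]
kept = [x for x in readings if abs(x - mean) / sigma <= threshold]
print("threshold d/sigma =", round(threshold, 3))   # ~1.86 for n = 8
print("rejected:", rejected)                         # [11.4]
# The mean and standard deviation are then recalculated from the kept readings.
```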
Rejection of data based upon confidence intervals:
One criterion used for discarding a data point is that its deviation from the
mean exceeds four times the probable error of a single reading. This amounts
to discarding a data point that lies outside the confidence interval for a
single reading at a confidence level of 0.993.
A better criterion, which does not involve evaluating the probable error when
the set of data points is small and the standard deviation is not accurately
known, is to discard a reading that lies outside the interval corresponding
to a confidence level of 0.99 for a single observation (see the confidence
interval table given earlier). On this basis not more than 1 reading in 100
would lie outside this range.
A still better method is to use the confidence interval corresponding to a
confidence level of 0.95 in order to scrutinize the measurement procedure
adopted.
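A sketch of the 0.99 confidence-level rule described above (readings invented; 2.576σ is the 0.99 two-sided interval from the table given earlier):

```python
import math

readings = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 7.1]   # hypothetical; 7.1 looks dubious
n = len(readings)
mean = sum(readings) / n
sigma = math.sqrt(sum((x - mean) ** 2 for x in readings) / (n - 1))

limit = 2.576 * sigma     # 0.99 confidence interval for a single observation
rejected = [x for x in readings if abs(x - mean) > limit]
kept = [x for x in readings if abs(x - mean) <= limit]
print("rejected:", rejected)   # [7.1]
```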
Confidence intervals
(When the number of observations is small)