Chapter 5- Estimation
_____________________________________________________________________
STATISTICAL ESTIMATION AND STATISTICAL INFERENCE
____________________________________________________________________
5.0 INTRODUCTION
Do you remember why we need to take samples? Yes, a census is costly and sometimes
impossible. Therefore, we need to take part of the entire population (a sample) and infer the
characteristics of the population from the sample we have drawn.
Statistical inference is the process of using limited information, a sample, for the purpose of
reaching a conclusion about a larger set of data, the population.
Estimation refers to any procedure where sample information is used to estimate or predict the
numerical value of some population measure (called a parameter) such as the population mean μ.
Unbiasedness
A statistic is an unbiased estimator of a parameter if the average value of the statistic is the same
as the parameter value. Thus, on average, the estimator will be correct. Formally, a statistic is an
unbiased estimator of a parameter if the expected value of the statistic equals the parameter, i.e. if
E(Statistic) = Parameter
Efficiency
Even though the average value of an unbiased estimator equals the parameter, an estimator may
yield estimates that are not particularly close to the parameter value. The efficiency of an
estimator is measured by the variance of the estimator. The efficient estimator is the unbiased
estimator with the smallest variance.
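The idea of efficiency can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal (so that both the sample mean and the sample median are unbiased for its centre), and the values of MU, SIGMA, N and TRIALS are hypothetical, chosen purely for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical normal population and sampling plan (assumptions, not from the text)
MU, SIGMA = 50.0, 10.0
N, TRIALS = 25, 2000

means, medians = [], []
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.mean(sample))      # estimator 1: sample mean
    medians.append(statistics.median(sample))  # estimator 2: sample median

# Spread of each estimator over the repeated samples
var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)

print(f"variance of the sample mean:   {var_mean:.2f}")
print(f"variance of the sample median: {var_median:.2f}")
```

In this setting both estimators centre on the population mean, but the sample mean varies less from sample to sample, so it is the more efficient of the two.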
Consistency
Another desirable property is that an estimator should produce estimates that have a high
probability of being close to the true value as the sample size increases. An estimator that has
this property is called a consistent estimator. The variance of a consistent estimator becomes
smaller as larger sample sizes are taken. Thus, consistency indicates that the estimation error,
including any bias, becomes smaller as the sample size increases.
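Unbiasedness and consistency of the sample mean can be checked informally by simulation. The sketch below assumes a hypothetical normal population (MU = 50, SIGMA = 10) and draws many samples of increasing size; none of these numbers come from the text.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 10.0   # hypothetical population mean and standard deviation
TRIALS = 1000            # number of samples drawn for each sample size

def variance(values):
    """Plain (divide-by-count) variance of a list of numbers."""
    centre = sum(values) / len(values)
    return sum((v - centre) ** 2 for v in values) / len(values)

results = {}
for n in (5, 50, 500):
    sample_means = []
    for _ in range(TRIALS):
        sample = [random.gauss(MU, SIGMA) for _ in range(n)]
        sample_means.append(sum(sample) / n)
    average_of_means = sum(sample_means) / TRIALS
    results[n] = (average_of_means, variance(sample_means))
    print(f"n={n:3d}  average of sample means={average_of_means:.2f}  "
          f"variance of sample means={results[n][1]:.3f}")
```

The average of the sample means stays close to the population mean for every n (unbiasedness), while the variance of the sample means shrinks as n grows (consistency).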
The estimator is $\bar{X}$, and 5 is the point estimate of the unknown population mean.
5.1.3 POINT ESTIMATE OF THE UNKNOWN POPULATION STANDARD
DEVIATION
We will use the symbol $S_x$ to mean an estimate of the unknown population standard deviation $\sigma_x$.
The estimator, called the sample standard deviation, is defined by the formula
$$S_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
where $\bar{X}$ = sample mean
n = sample size
Recall that earlier we used the divisor N, instead of n − 1, when computing a population
standard deviation σx.
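Example: For the random sample 1, 2, 4, 5, 7, 11 (sample mean $\bar{X} = 5$) the sample standard
deviation is computed as follows:
$$S_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(1-5)^2 + (2-5)^2 + (4-5)^2 + (5-5)^2 + (7-5)^2 + (11-5)^2}{6 - 1}} = \sqrt{\frac{66}{5}} = 3.633$$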
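The calculation above can be reproduced with a few lines of Python. This is a minimal sketch; the function name sample_std is chosen here for illustration and is not part of the text.

```python
import math

def sample_std(values):
    """Sample standard deviation using the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

sample = [1, 2, 4, 5, 7, 11]
s_x = sample_std(sample)
print(round(s_x, 3))  # 3.633
```

Python's standard library function statistics.stdev uses the same n − 1 divisor and should give the same value.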
5.1.4 POINT ESTIMATOR OF STANDARD ERROR OF THE MEAN
The standard error of the mean is computed by the formula $\sigma_{\bar{X}} = \frac{\sigma_x}{\sqrt{n}}$ when the sample size is less
than 5% of the population size. In our case, the total size of the population is unknown; therefore
it is safer to assume that the sample is less than 5% of the entire population. Hence, we will use
the estimator $\frac{S_x}{\sqrt{n}}$ to estimate the standard error $\sigma_{\bar{X}}$. The symbol $S_{\bar{X}}$ is called the sample
standard error of the mean. The formula for $S_{\bar{X}}$ is
$$S_{\bar{X}} = \frac{S_x}{\sqrt{n}}$$
where $S_x$ = sample standard deviation
n = sample size
Note that we have calculated $S_x = 3.633$ for the random sample 1, 2, 4, 5, 7, 11. The sample
standard error can be obtained using the formula
$$S_{\bar{X}} = \frac{S_x}{\sqrt{n}} = \frac{3.633}{\sqrt{6}} = 1.483$$
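As a quick check of this arithmetic, the sample standard error can be computed directly from the data. The sketch below uses statistics.stdev from the Python standard library, which applies the n − 1 divisor; the helper name standard_error is an illustrative choice.

```python
import math
import statistics

def standard_error(values):
    """Sample standard error of the mean: S_x divided by sqrt(n)."""
    s_x = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return s_x / math.sqrt(len(values))

sample = [1, 2, 4, 5, 7, 11]
se = standard_error(sample)
print(round(se, 3))  # 1.483
```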
Note: The following table shows some population parameters and their estimators.

Measure               Population parameter    Sample statistic (estimator)
Mean                  $\mu$                   $\bar{X}$
Standard deviation    $\sigma_x$              $S_x$
Variance              $\sigma_x^2$            $S_x^2$
1. Value of the mean $\bar{X}$ of the simple random sample selected from the population, and
2. Known facts about sampling distributions of the mean
The confidence level attached to the interval shows how certain we are that the interval is correct.
The choice of method used in constructing a confidence interval for μ depends upon whether or not
the population is normal and whether the population standard deviation $\sigma_x$ is known or unknown.
Suppose we have a normal population whose mean and standard deviation are $\mu$ and $\sigma_x$. The
sampling distribution of the mean is normal with mean $\mu$ and standard error $\sigma_{\bar{X}} = \frac{\sigma_x}{\sqrt{n}}$.
For the sampling distribution of the mean, the standard normal variable is
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
If we want to be 95% confident that the population mean μ falls within the interval estimate, we
can calculate the range as follows.
1. Find the Z value for the 95% confidence level.
2. Use the obtained Z value to calculate the interval for the unknown population parameter.
For example, the Z value for a 95% confidence interval is 1.96. Therefore, if we want to be 95% sure
that the true population mean falls within the estimate, we can rearrange the above formula and
get:
$$\bar{X} - 1.96\,\sigma_{\bar{X}} \le \mu \le \bar{X} + 1.96\,\sigma_{\bar{X}}$$
The proportion of correct estimates (0.95 in our illustration) is called the confidence coefficient
C. The number 100C (95% in our illustration) is called the confidence level. The proportion of
incorrect statements is symbolized by the Greek letter α (alpha). The sum of the proportions of
correct and incorrect statements is 1; so
C + α = 1 or α = 1 − C
We can describe C as the chance that the confidence interval is correct, and α as the chance that
the interval is incorrect.
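The link between C, α and the Z value can be sketched in Python. The example assumes the usual two-sided interval, so that α/2 is left in each tail; the function name z_critical is illustrative, while NormalDist and its inv_cdf method come from the Python standard library (version 3.8 or later).

```python
from statistics import NormalDist

def z_critical(confidence):
    """Z value that leaves alpha/2 in each tail of the standard normal curve."""
    alpha = 1 - confidence                      # alpha = 1 - C
    return NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point

for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.2f}  alpha = {1 - c:.2f}  Z = {z_critical(c):.3f}")
```

For C = 0.95 this reproduces the familiar table value of about 1.96.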
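Example: A normal population has a standard deviation of 10; a random sample of size 25 has a
mean of 50. Construct a 95% confidence interval estimate of the population mean.
The estimate is given by $\bar{X} - 1.96\,\sigma_{\bar{X}} \le \mu \le \bar{X} + 1.96\,\sigma_{\bar{X}}$, with $\sigma_{\bar{X}} = \frac{10}{\sqrt{25}} = 2$. That is:
$$50 - 1.96\left(\frac{10}{\sqrt{25}}\right) \le \mu \le 50 + 1.96\left(\frac{10}{\sqrt{25}}\right)$$
$$50 - 3.9 \le \mu \le 50 + 3.9$$
$$46.1 \le \mu \le 53.9$$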
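The same interval can be obtained programmatically. The following is a minimal sketch for the case treated here (normal population, known standard deviation); the function name z_confidence_interval and its parameter names are illustrative assumptions.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)                   # Z times the standard error
    return sample_mean - margin, sample_mean + margin

lower, upper = z_confidence_interval(sample_mean=50, sigma=10, n=25)
print(round(lower, 1), round(upper, 1))  # 46.1 53.9
```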
5.2.1.2 PRECISION, CONFIDENCE AND SAMPLE SIZE
The narrower the confidence interval is, the more precise it is; the wider the interval, the
less precise it is. The end points of a confidence interval for µ are:
$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma_x}{\sqrt{n}}$$
The smaller the value of $Z_{\alpha/2}\,\frac{\sigma_x}{\sqrt{n}}$, the more precise (narrower) is the confidence interval.
Consequently, the smaller $Z_{\alpha/2}$ and $\sigma_x$ are, and the larger n is, the more precise will be the
interval. We conclude that the larger the sample size, the more precise is an interval estimate. It
can also be concluded that the smaller the variability, the more precise the estimate. The final
conclusion that can be drawn from the above relationship is that the lower the confidence level, the
more precise is the interval estimate.
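These three conclusions can be checked numerically. In the sketch below the baseline values (standard deviation 10, n = 25, 95% confidence) are borrowed from the earlier example, and the alternative values are hypothetical; margin_of_error is an illustrative helper name.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """Half-width of the interval: Z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

base = margin_of_error(sigma=10, n=25, confidence=0.95)
larger_n = margin_of_error(sigma=10, n=100, confidence=0.95)       # bigger sample
smaller_sigma = margin_of_error(sigma=5, n=25, confidence=0.95)    # less variability
lower_conf = margin_of_error(sigma=10, n=25, confidence=0.90)      # lower confidence

for label, value in [("baseline", base), ("n = 100", larger_n),
                     ("sigma = 5", smaller_sigma), ("C = 0.90", lower_conf)]:
    print(f"{label:10s} half-width = {value:.2f}")
```

Each change narrows the interval relative to the baseline. Note that quadrupling the sample size only halves the half-width, because n enters through its square root.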
5.2.1.3 CONFIDENCE ESTIMATE OF µ, NORMAL POPULATION, STANDARD
DEVIATION UNKNOWN
In the previous case the population was normally distributed and the population standard
deviation was known. In that case we look up the Z value for α/2 and use the formula
$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma_x}{\sqrt{n}}$ to estimate the interval within which the population mean lies with
confidence coefficient C. However, most of the time the population mean µ is unknown, and so is
the population standard deviation $\sigma_x$. Therefore, $\sigma_x$ must be estimated by the sample standard
deviation.
$$S_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
After calculating the standard deviation, the standard error must be computed using the formula
$$S_{\bar{X}} = \frac{S_x}{\sqrt{n}}$$
When the population standard deviation is known, the interval estimate is based on the standard
normal variable
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
However, if the population standard deviation is unknown, we need to estimate it with the sample
standard deviation, and the resulting statistic no longer follows the normal distribution. It
follows instead Student's t-distribution. There is a different t-distribution for each sample
size. The t-distribution is discussed in greater detail under hypothesis testing. In this chapter
we will only illustrate how to make an interval estimate using the t-distribution, without giving
much emphasis to the distribution's characteristics.
Tail areas for the t-distribution are tabulated according to a parameter called degrees of freedom.
We shall use the symbol ν for degrees of freedom. The degrees of freedom for the t-distribution
can be calculated as ν = n − 1.
As ν increases, the tail area beyond a given value decreases, and so does the t-value for a given
tail area. As the degrees of freedom increase, the t-distribution approaches the standard normal
distribution. When the degrees of freedom reach about 30, the t-distribution is approximately the
same as the normal distribution.
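The approach of the t-values to the normal value 1.96 can be seen numerically. The sketch below is not how a statistical table is produced in practice; it simply integrates the t density with Simpson's rule and searches for the t-value by bisection, using only the Python standard library. The function names are illustrative, and a statistical package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # half the area lies above zero by symmetry

def t_critical(tail_area, df):
    """t-value whose upper-tail area equals tail_area, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_area:
            lo = mid  # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

for df in (5, 9, 30, 120):
    print(f"df = {df:3d}  t(0.025) = {t_critical(0.025, df):.3f}")
```

The printed values fall steadily towards 1.96 as the degrees of freedom grow, in line with the statement above.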
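To construct an interval estimate for µ in this situation, we need the value of $t_{\alpha/2,\nu}$, which
is read from a statistical table, together with the formula:
$$\bar{X} - t_{\alpha/2,\nu}\,\frac{S_x}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\,\frac{S_x}{\sqrt{n}}$$
where $t_{\alpha/2,\nu}$ is the t-value with upper-tail area α/2 and ν degrees of freedom.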
$$\bar{X} = \frac{\sum X}{n} = \frac{95}{10} = 9.5$$
$$S_x = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{94.5}{9}} = 3.24$$
The confidence level is 95%. Therefore, the significance level is α = 1 − C = 1 − 0.95 = 0.05 and
α/2 = 0.025.
Next, we have to calculate the degrees of freedom for the observations, which is given as
ν = n − 1 = 10 − 1 = 9
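We can now calculate the interval as
$$\bar{X} - t_{\alpha/2,\nu}\,\frac{S_x}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\,\frac{S_x}{\sqrt{n}}$$
In this specific situation $t_{\alpha/2,\nu}$ means $t_{0.025,\,9} = 2.26$.
Therefore, the interval can be calculated as:
$$9.5 - 2.26\left(\frac{3.24}{\sqrt{10}}\right) \le \mu \le 9.5 + 2.26\left(\frac{3.24}{\sqrt{10}}\right)$$
$$7.2 \le \mu \le 11.8$$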
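The arithmetic of this example can be retraced from the summary figures given above (n = 10, ΣX = 95, Σ(X − X̄)² = 94.5). In the sketch below the t-value 2.26 is entered as read from the table, and the variable names are illustrative.

```python
import math

n = 10               # sample size
sum_x = 95           # sum of the observations
sum_sq_dev = 94.5    # sum of squared deviations from the sample mean
t_value = 2.26       # t(0.025, 9) as read from the statistical table

mean = sum_x / n                          # 9.5
s_x = math.sqrt(sum_sq_dev / (n - 1))     # sample standard deviation, about 3.24
std_error = s_x / math.sqrt(n)            # sample standard error of the mean
margin = t_value * std_error

lower, upper = mean - margin, mean + margin
print(round(lower, 1), round(upper, 1))  # 7.2 11.8
```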