Stat ch2
Chapter two
Statistical Inference: Estimation for Single Populations
Estimating the Population Mean with Large Sample Sizes
On many occasions, estimating the population mean is useful in business research. A point estimate is a statistic taken from a sample and used to estimate a population parameter. However, a point estimate is only as good as the representativeness of its sample. If other random samples are taken from the population, the point estimates derived from those samples are likely to vary. Because of this variation in sample statistics, estimating a population parameter with an interval estimate is often preferable to using a point estimate. An interval estimate (confidence interval) is a range of values within which the analyst can declare, with some confidence, that the population parameter lies. A confidence interval can be two-sided or one-sided.
As a result of the central limit theorem, the following Z formula for means can be used
when sample sizes are large, regardless of the shape of the population distribution, or for
smaller sizes if the population is normally distributed.
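\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
Solving for \(\mu\) gives
\[ \mu = \bar{x} - Z \frac{\sigma}{\sqrt{n}} \]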
Because a sample mean can be greater than or less than the population mean, Z can be
positive or negative. Thus,
\[ \mu = \bar{x} \pm Z \frac{\sigma}{\sqrt{n}} \]
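Rewriting this expression yields the confidence interval formula for estimating \(\mu\) with large sample sizes.
\[ \bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \qquad (2.1) \]
Or
\[ \bar{x} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
Where
\(\alpha\) = The area under the normal curve outside the confidence interval area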
Statistics for management
\(\alpha/2\) = The area in one end (tail) of the distribution outside the confidence interval
Here we use \(\alpha\) to locate the Z value in constructing the confidence interval. Because the standard normal table is based on areas between a Z of 0 and \(Z_{\alpha/2}\), the table Z value is found by locating the area of \(0.5 - \alpha/2\), which is the part of the normal curve between the middle of the curve and one of the tails. Another way to locate this Z value is to change the confidence level from a percentage to a proportion, divide it in half, and go to the table with this value. The results are the same.
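The two table lookups described above can be checked numerically. The following is a minimal Python sketch, assuming a 95% confidence level and using the standard library's `statistics.NormalDist` in place of a printed table; the variable names are illustrative only.

```python
from statistics import NormalDist

confidence = 0.95          # assumed confidence level for illustration
alpha = 1 - confidence     # total area outside the confidence interval

# Route 1: area between the middle of the curve and one tail.
area_route_1 = 0.5 - alpha / 2
# Route 2: confidence level as a proportion, divided in half.
area_route_2 = confidence / 2

# A "0 to Z" table area A corresponds to a cumulative probability of 0.5 + A.
z_route_1 = NormalDist().inv_cdf(0.5 + area_route_1)
z_route_2 = NormalDist().inv_cdf(0.5 + area_route_2)

print(round(area_route_1, 3), round(area_route_2, 3))  # 0.475 0.475
print(round(z_route_1, 2), round(z_route_2, 2))        # 1.96 1.96
```

Both routes lead to the same table area and therefore to the same Z value.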
The confidence interval formula (2.1) yields a range (interval) within which we feel, with some confidence, that the population mean is located. It is not certain that the population mean is in the interval unless we have a 100% confidence interval, which is infinitely wide. However, we can assign a probability that the parameter (\(\mu\)) is located within the interval. Formula 2.1 can be presented as a probability statement:
\[ P\left( \bar{x} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha \]
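[Figure: Z scores for confidence intervals in relation to \(\alpha\). The horizontal axis runs from \(-Z_{\alpha/2}\) through 0 to \(Z_{\alpha/2}\); the central area \(1 - \alpha\) is the confidence level, and each shaded tail has area \(\alpha/2\).]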
Example
A real estate broker estimates the mean family income in an area as an indicator of expected sales. A sample of 100 families yields a mean of \(\bar{x} = 35{,}500\). Presume the population standard deviation is \(\sigma = 7{,}200\). A 95% confidence interval for \(\mu\) is estimated as follows.
The confidence interval for \(\mu\) is
\[ \bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
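The Z value for a 95% confidence interval is 1.96: \(\alpha/2 = 0.025\), so the table area is 0.5 – 0.025 = 0.475 (equivalently, 0.95/2 = 0.475), and the Z value for an area of 0.475 is 1.96.
\[ 35{,}500 \pm 1.96 \frac{7{,}200}{\sqrt{100}} = 35{,}500 \pm 1{,}411.20 \]
\[ 34{,}088.80 \le \mu \le 36{,}911.20 \]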
Interpretation
- The broker is 95% confident that the true unknown population mean is between $34,088.80 and $36,911.20.
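The calculation in this example can be sketched in Python as a check. This is an illustrative sketch only: the function name `z_confidence_interval` is hypothetical, and the Z value comes from the standard library's `statistics.NormalDist` rather than a printed table, so the endpoints differ from the hand calculation by a few cents because Z is not rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z value with alpha/2 in the upper tail
    margin = z * sigma / sqrt(n)             # error of estimation
    return sample_mean - margin, sample_mean + margin

# Figures from the real estate example: n = 100, mean 35,500, sigma 7,200.
lower, upper = z_confidence_interval(35_500, 7_200, 100, 0.95)
print(f"{lower:,.2f} to {upper:,.2f}")
```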
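Because the central limit theorem requires large samples for non-normal populations, it makes sense to modify formula (2.1) to use the sample standard deviation, S, when the population standard deviation is unknown. Beware, however, not to use this modified formula for small samples when the population standard deviation is unknown, even when the population is normally distributed.
\[ \bar{x} \pm Z_{\alpha/2} \frac{S}{\sqrt{n}} \]
Or
\[ \bar{x} - Z_{\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2} \frac{S}{\sqrt{n}} \qquad (2.3) \]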
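Example
Given n = 110, \(\bar{x} = 85.5\), and S = 19.3, compute a 99% confidence interval to estimate \(\mu\).
Solution
The confidence interval is
\[ \bar{x} - Z_{\alpha/2} \frac{S}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2} \frac{S}{\sqrt{n}}, \qquad Z_{\alpha/2} = 2.575 \]
\[ 85.5 - 2.575 \frac{19.3}{\sqrt{110}} \le \mu \le 85.5 + 2.575 \frac{19.3}{\sqrt{110}} \]
Or
\[ 80.76 \le \mu \le 90.24 \]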
Table: Values of Z for some of the more common levels of confidence

Confidence level    Z value
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.575
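The tabulated values can be reproduced, to table precision, with a short Python sketch. It assumes the standard library's `statistics.NormalDist` as a stand-in for the printed normal table; the dictionary name `z_values` is illustrative.

```python
from statistics import NormalDist

levels = (0.90, 0.95, 0.98, 0.99)

# For each confidence level, alpha/2 sits in the upper tail,
# so the Z value is the (1 - alpha/2) quantile of the standard normal.
z_values = {c: NormalDist().inv_cdf(1 - (1 - c) / 2) for c in levels}

for confidence, z in z_values.items():
    print(f"{confidence:.0%}  {z:.3f}")
```

The computed values agree with the table up to rounding. The 99% entry, 2.575, is the conventional table interpolation between 2.57 and 2.58; the computed quantile is slightly larger.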
In many real-life situations, sample sizes of less than 30 are the norm.
The t Distribution
William S. Gosset, a British statistician, developed the t distribution, which describes sample data from small samples when the population standard deviation is unknown and the population is normally distributed. The formula for the t value is
\[ t = \frac{\bar{x} - \mu}{S / \sqrt{n}} \]
The formula is essentially the same as the Z formula, but the distribution table values are
different.
The assumption underlying the use of the techniques discussed in this chapter for small sample sizes is that the population is normally distributed. If the population distribution is not normal or is unknown, nonparametric techniques should be used.
The t distribution is the appropriate distribution to use any time the population variance or
standard deviation is unknown, regardless of sample size. However, because the difference
between the table values for Z and t becomes negligible for large samples, many researchers
use the Z distribution for large-sample analysis even when the standard deviation or
variance is unknown.
The t distribution is reserved for use with small sample size problems (n < 30) because, as
n nears size 30, the t table values approach the Z table values.
Finding a value in the t distribution table requires knowing the sample size. The t distribution table is a compilation of many t distributions, with each line of the table representing a different sample size. However, the sample size must be converted to degrees of freedom (df) before a table value can be determined.
The t formula is used because the population variance or standard deviation, which is part of the Z formula, is unknown and must be estimated by a sample standard deviation or variance.
The t distribution table does not use the area between the statistic and the mean, as the Z distribution table does. Instead, the t table uses the area in the tail of the distribution. The emphasis in the t table is on \(\alpha\), and each tail of the distribution contains \(\alpha/2\) of the area under the curve when confidence intervals are constructed.
Degrees of freedom (df = n – 1): the number of observations that can be freely chosen.
Variance of t-distribution
Example
Suppose n = 4 observations must produce a mean of 10. The mean of 10 serves as a constraint: once three values are chosen freely, the fourth is determined, so there are n – 1 = 3 degrees of freedom.
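The constraint in this example can be illustrated with a few lines of Python. The three freely chosen values below are hypothetical; any three numbers would do, and the fourth is then forced by the required mean.

```python
# Hypothetical values: any three numbers may be chosen freely.
free_choices = [8, 12, 15]
n = 4
required_mean = 10

# The required mean fixes the total, so the last value is not free.
required_total = n * required_mean
last_value = required_total - sum(free_choices)

observations = free_choices + [last_value]
print(observations, sum(observations) / n)  # [8, 12, 15, 5] 10.0
```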
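The t formula
\[ t = \frac{\bar{x} - \mu}{S / \sqrt{n}} \]
can be manipulated algebraically to produce a formula for estimating the population mean using small samples when \(\sigma\) is unknown and the population is normally distributed. The result is the formula given next.
\[ \bar{x} \pm t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} \]
or
\[ \bar{x} - t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} \qquad (2.4) \]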
Example
The owner of a large equipment rental company wants to make a rather quick estimate of the average number of days a piece of equipment is rented out per person per time. The owner decides to take a random sample of rental invoices. Fourteen different rentals of the equipment are selected randomly from the files, yielding the following data. She uses these data to construct a 99% confidence interval to estimate the average number of days that equipment is rented, and assumes that the number of days per rental is normally distributed in the population.
3 1 3 2 5 1 2 1 4 2 1 3 1 1
As n = 14, df = 13. The 99% level of confidence results in \(\alpha/2 = 0.005\) of the area in each tail of the distribution. The table t value is
\[ t_{0.005,\,13} = 3.012 \]
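The sample mean is 2.14 and the sample standard deviation is 1.29. The confidence interval is
\[ \bar{x} \pm t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} = 2.14 \pm 3.012 \frac{1.29}{\sqrt{14}} = 2.14 \pm 1.04 \]
\[ 1.10 \le \mu \le 3.18 \]
\[ P(1.10 \le \mu \le 3.18) = 0.99 \]
The point estimate of the average length of time per rental is 2.14 days, with an error of ±1.04 days.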
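The rental example can be recomputed from the raw data with a short Python sketch. It assumes the table value t = 3.012 quoted in the text, because the Python standard library does not provide the t distribution; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

rental_days = [3, 1, 3, 2, 5, 1, 2, 1, 4, 2, 1, 3, 1, 1]
n = len(rental_days)

x_bar = mean(rental_days)  # sample mean
s = stdev(rental_days)     # sample standard deviation (n - 1 divisor)

# Table value for alpha/2 = 0.005 and df = 13, taken from the text.
t_value = 3.012

margin = t_value * s / sqrt(n)  # error of estimation
lower, upper = x_bar - margin, x_bar + margin

print(f"mean={x_bar:.2f}, s={s:.2f}, margin={margin:.2f}")  # mean=2.14, s=1.29, margin=1.04
print(f"{lower:.2f} <= mu <= {upper:.2f}")                  # 1.10 <= mu <= 3.18
```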
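The Z formula for sample proportions is
\[ Z = \frac{\hat{p} - P}{\sqrt{\dfrac{P \cdot Q}{n}}} \]
Where Q = 1 – P. Recall that this formula can be applied only when n·P and n·Q are greater than 5.
Algebraically manipulating this formula to estimate P involves solving for P. However, P is in both the numerator and the denominator, which complicates the resulting formula. For this reason, for confidence interval purposes only and for large sample sizes, \(\hat{p}\) is substituted for P in the denominator, yielding
\[ Z = \frac{\hat{p} - P}{\sqrt{\dfrac{\hat{p}\,\hat{q}}{n}}} \]
\[ \hat{p} - Z_{\alpha/2} \sqrt{\frac{\hat{p}\,\hat{q}}{n}} \le P \le \hat{p} + Z_{\alpha/2} \sqrt{\frac{\hat{p}\,\hat{q}}{n}} \qquad (2.5) \]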
Where
\(\hat{p}\) = Sample proportion
\(\hat{q}\) = \(1 - \hat{p}\)
P = Population proportion
n = Sample size
In this formula, \(\hat{p}\) is the point estimate and \(\pm Z_{\alpha/2} \sqrt{\hat{p}\,\hat{q}/n}\) is the error of the estimation.
Example
Solution
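- \(\hat{p} = 0.39\) is the point estimate of the population proportion, P.
- For n = 87 and \(\hat{p} = 0.39\), a 95% confidence interval can be computed to determine the interval estimate of P:
\[ 0.39 \pm 1.96 \sqrt{\frac{(0.39)(0.61)}{87}} = 0.39 \pm 0.10, \qquad 0.29 \le P \le 0.49 \]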
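The interval for this example can be computed with a minimal Python sketch of formula 2.5. It uses only the figures given in the solution (sample proportion 0.39, n = 87) and the table value Z = 1.96 for 95% confidence; the variable names are illustrative.

```python
from math import sqrt

p_hat = 0.39  # sample proportion from the example
n = 87        # sample size from the example
z = 1.96      # Z value for 95% confidence

q_hat = 1 - p_hat

# The large-sample condition: n*p_hat and n*q_hat should both exceed 5.
condition_ok = n * p_hat > 5 and n * q_hat > 5

margin = z * sqrt(p_hat * q_hat / n)  # error of the estimation
lower, upper = p_hat - margin, p_hat + margin
print(condition_ok, f"{lower:.4f} <= P <= {upper:.4f}")
```

Rounded to two decimals, the interval runs from about 0.29 to 0.49.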
\[ \chi^2 = \frac{(n-1)S^2}{\sigma^2} \qquad (2.6) \]
df = n – 1
- The chi-square distribution is not symmetrical, and its shape varies according to the degrees of freedom.
Formula 2.6 can be rearranged algebraically to produce a formula that can be used to construct confidence intervals for population variances.
\[ \frac{(n-1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}} \]
df = n – 1
The value of \(\alpha\) is equal to 1 – (level of confidence expressed as a proportion). Thus, if we are constructing a 90% confidence interval, alpha is 10% of the area and is expressed in proportion form as \(\alpha = 0.10\).
Example
Given S = 1.12 and n = 25, develop a 95% confidence interval to estimate the population variance. Assume the population is normally distributed.
Solution
\(S^2 = (1.12)^2 = 1.2544\)
df = n – 1 = 25 – 1 = 24
A 95% confidence level means that alpha (\(\alpha\)) is 1 – 0.95 = 0.05. This value is split to determine the area in each tail of the chi-square distribution: \(\alpha/2 = 0.025\). The values of chi-square obtained from the table are
\(\chi^2_{0.025,\,24} = 39.3641\)
\(\chi^2_{0.975,\,24} = 12.4011\)
From this information, the confidence interval can be determined.
\[ \frac{(24)(1.2544)}{39.3641} \le \sigma^2 \le \frac{(24)(1.2544)}{12.4011} \]
\[ 0.7648 \le \sigma^2 \le 2.4277 \]
\[ P(0.7648 \le \sigma^2 \le 2.4277) = 0.95 \]
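The arithmetic of this variance interval can be checked with a short Python sketch. It assumes the two chi-square table values quoted in the solution, since the Python standard library has no chi-square distribution; the variable names are illustrative.

```python
s = 1.12  # sample standard deviation from the example
n = 25
df = n - 1
s_squared = s ** 2

# Table values quoted in the text for df = 24.
chi2_upper_tail = 39.3641  # 0.025 of the area lies to its right
chi2_lower_tail = 12.4011  # 0.975 of the area lies to its right

# The larger chi-square value gives the lower limit, and vice versa.
lower = df * s_squared / chi2_upper_tail
upper = df * s_squared / chi2_lower_tail
print(f"{lower:.4f} <= sigma^2 <= {upper:.4f}")
```

Note that the interval is not symmetric about the point estimate 1.2544, which reflects the asymmetry of the chi-square distribution.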
Graphically
[Figure: chi-square distribution with df = 24. Each tail has an area of 0.025; the area to the right of the lower chi-square value is 0.975.]
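- When \(\mu\) is being estimated, the size of the sample can be determined by using the Z formula for sample means to solve for n.
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
- The difference \(\bar{x} - \mu\) is the error of estimation resulting from the sampling process. Let \(E = \bar{x} - \mu\), the error of estimation. Solving for n gives
\[ n = \frac{Z_{\alpha/2}^2 \, \sigma^2}{E^2} = \left( \frac{Z_{\alpha/2} \, \sigma}{E} \right)^2 \]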
Suppose you want to estimate the average age of all Boeing 727 airplanes now in active domestic U.S. service. You want to be 95% confident, and you want your estimate to be within 2 years of the actual figure. The 727 was first placed in service about 30 years ago, but you believe that no active 727 in the U.S. domestic fleet is more than 25 years old. How large a sample should you take?
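Solution
E = 2 years, the Z value for 95% is 1.96, and \(\sigma\) is unknown. It can be approximated as one-fourth of the range:
\[ \sigma \approx \tfrac{1}{4}(\text{range}) = \tfrac{1}{4}(25) = 6.25 \]
\[ n = \frac{Z^2 \sigma^2}{E^2} = \frac{(1.96)^2 (6.25)^2}{2^2} = 37.52 \]
Because a fraction of a unit cannot be sampled, n is rounded up to 38.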
If you randomly sample 38 units, you can estimate the average age of active 727s to within two years and be 95% confident of the results.
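The sample size calculation can be written as a short Python sketch. It follows the steps of the solution, including the approximation of the standard deviation as one-fourth of the range; the variable names are illustrative.

```python
from math import ceil

z = 1.96          # Z value for 95% confidence
value_range = 25  # believed range of ages, in years
error = 2         # desired error of estimation E, in years

# Rough estimate of sigma: one-fourth of the range.
sigma_estimate = value_range / 4

n_exact = (z * sigma_estimate / error) ** 2
n_required = ceil(n_exact)  # always round up to a whole unit

print(round(n_exact, 2), n_required)  # 37.52 38
```

Rounding up rather than to the nearest integer ensures that the error of estimation does not exceed E at the stated confidence level.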