
#5 Sampling Distribution

Gusti Fauza
Populations & Samples
• A population is the set (possibly infinite) of all possible observations.
– Each observation is a random variable X having some (often unknown)
probability distribution, f (x).
– If the population distribution is known, the population may be referred to
accordingly, e.g., as a normal population.
• A sample is a subset of a population.
– Our goal is to make inferences about the population based on an analysis of
the sample.
– A biased sample, usually obtained by taking convenient, rather than
representative observations, will consistently over- or under-estimate some
characteristic of the population.
– Observations in a random sample are made independently and at random.
Here, the random variables X1, X2, …, Xn in the sample all have the same
distribution as the population, X.
Sampling Distribution
• Normal distribution: sampling distribution of X̄ when σ is known, for any
population distribution.
– Also the sampling distribution for the difference of the means of two different
samples.
• Chi-square (χ²) distribution: sampling distribution of S². Population must
be normal.
• t-distribution: sampling distribution of X̄ when σ is unknown and S is
used. Population must be normal.
– Also the sampling distribution for the difference of the means of two different
samples when the σ's are unknown.
• F-distribution: the distribution of the ratio of two χ² random variables.
Sampling distribution of the ratio of the variances of two different
samples. Population must be normal.
Central Limit Theorem
• The central limit theorem is the most important theorem in statistics. It
states that:

If X̄ is the mean of a random sample of size n from a population with an
arbitrary distribution with mean μ and variance σ², then the limiting form
of the distribution of

Z = (X̄ − μ) / (σ / √n)

as n → ∞ is the standard normal distribution; that is, the sampling
distribution of X̄ approaches a normal distribution with mean μ and
standard deviation σ/√n.

• The central limit theorem holds under the following conditions:
– For any population distribution if n ≥ 30.
– For n < 30, if the population distribution is roughly normal in shape.
– For any value of n if the population distribution is normal.
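The convergence described above can be checked with a quick simulation; the sketch below (illustrative only, not from the slides) draws repeated samples of size n = 30 from a skewed exponential population whose mean and standard deviation are both 1:

```python
import random
import statistics

# Simulate the sampling distribution of the mean: draw many samples of
# size n = 30 from an exponential(1) population (mean 1, std dev 1).
random.seed(42)
n, trials = 30, 5000
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

# The sample means cluster near mu = 1 with spread near
# sigma / sqrt(n) = 1 / sqrt(30) ≈ 0.18, despite the skewed population.
print(round(statistics.fmean(means), 2))
print(round(statistics.stdev(means), 2))
```

Increasing n tightens the spread by the factor 1/√n, matching the theorem.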
Inferences About the Population Mean
We often want to test hypotheses about the population mean
(hypothesis testing will be formalized later).

• Example:
An important manufacturing process produces cylindrical component
parts for the automotive industry. It is important that the process
produce parts having a mean diameter of 5.0 millimeters. The
engineer involved conjectures that the population mean is 5.0
millimeters. An experiment is conducted in which 100 parts produced
by the process are selected randomly and the diameter measured on
each. It is known that the population standard deviation is σ = 0.1
millimeter. The experiment indicates a sample average diameter of
x̄ = 5.027 millimeters. Does this sample information appear to support
or refute the engineer's conjecture?
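The conjecture can be checked by standardizing the sample mean, as the central limit theorem justifies; a minimal sketch of the calculation:

```python
import math

# Cylinder-diameter example: mu0 = 5.0 mm, sigma = 0.1 mm, n = 100,
# observed sample mean xbar = 5.027 mm.
mu0, sigma, n, xbar = 5.0, 0.1, 100, 5.027

z = (xbar - mu0) / (sigma / math.sqrt(n))      # standardized sample mean

# Two-sided tail probability from the standard normal CDF,
# Phi(z) = (1 + erf(z / sqrt(2))) / 2.
p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

print(round(z, 2))        # 2.7
print(round(p_value, 4))  # about 0.007
```

A sample mean this far from 5.0 would occur in fewer than 1 in 100 such experiments if the population mean really were 5.0 millimeters, so the data refute the conjecture.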
Sampling Distribution of S²

• If S² is the variance of a random sample of size n taken from a normal population with
variance σ², then the statistic χ² below has a chi-squared distribution with ν = n − 1
degrees of freedom:

χ² = (n − 1)S² / σ² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / σ²

• The χ² table is the reverse of the normal table.
– It shows the χ² value that has probability α to the right of it.
– Note that the χ² distribution is not symmetric.

• Degrees of freedom:
– As before, you can think of degrees of freedom as the number of independent pieces of
information. We use ν = n − 1, since one degree of freedom is lost because the sample
mean X̄ is used to estimate μ when calculating S².
– If μ is known, use n degrees of freedom.
A manufacturer of car batteries guarantees that
the batteries will last, on average, 3 years with a
standard deviation of 1 year. If five of these
batteries have lifetimes of 1.9, 2.4, 3.0, 3.5, and
4.2 years, should the manufacturer still be
convinced that the batteries have a standard
deviation of 1 year? Assume that the battery
lifetime follows a normal distribution.
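The χ² statistic from the previous slide settles the battery question; a short sketch:

```python
import statistics

# Battery example: claimed sigma = 1 year, so sigma^2 = 1.
lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]
sigma2_claimed = 1.0

n = len(lifetimes)
s2 = statistics.variance(lifetimes)       # sample variance (divisor n - 1)
chi2 = (n - 1) * s2 / sigma2_claimed      # chi-squared statistic, nu = 4 df

print(round(s2, 3))    # 0.815
print(round(chi2, 2))  # 3.26
```

From a χ² table with 4 degrees of freedom, 3.26 lies well between χ²₀.₉₇₅ ≈ 0.484 and χ²₀.₀₂₅ ≈ 11.14, so the sample gives the manufacturer no reason to doubt the claimed standard deviation of 1 year.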
t-Distribution (when σ is Unknown)
• The problem with the central limit theorem is that it assumes that σ is
known.
– Generally, if μ is being estimated from the sample, σ must be estimated from the
sample as well.
– The t-distribution can be used if σ is unknown, but it requires that the original
population be normally distributed.
• Let X1, X2, ..., Xn be independent, normally distributed random variables
with mean μ and standard deviation σ. Then the random variable T below
has a t-distribution with ν = n − 1 degrees of freedom:

T = (X̄ − μ) / (S / √n)

– The t-distribution is like the normal, but with greater spread, since both X̄ and S
fluctuate from sample to sample.
Using the t-Distribution
• Observations on the t-distribution:
– The t-statistic is like the standard normal Z, but uses S rather than σ.
– The table value depends on the sample size (degrees of freedom).
– The t-distribution is symmetric with μ = 0, but σ² > 1. As would be
expected, σ² is largest for small n.
– It approaches the normal distribution (σ² → 1) as n gets large.
– For a given probability α, the table shows the value of t that has
probability α to the right of it.
• The t-distribution can also be used for hypotheses concerning the
difference of two means where σ1 and σ2 are unknown, as long as
the two populations are normally distributed.
• Usually if n ≥ 30, S is a good enough estimator of σ, and the normal
distribution is typically used instead.
Example:
A chemical engineer claims that the population mean yield of a
certain batch process is 500 grams per liter of raw material. To
check this claim he samples 25 batches each month. If the
computed t-value falls between −t0.05 and t0.05, he is satisfied with
the claim. What conclusion should he draw from a sample that
has a mean x̄ = 518 grams per liter and a sample standard
deviation s = 40 grams? Assume the distribution of yields to be
approximately normal.
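The engineer's decision rule can be worked through directly; a minimal sketch:

```python
import math

# Batch-yield example: mu0 = 500 g/L, n = 25, xbar = 518, s = 40.
mu0, n, xbar, s = 500.0, 25, 518.0, 40.0

t = (xbar - mu0) / (s / math.sqrt(n))   # t statistic with nu = 24 df

print(round(t, 2))   # 2.25
```

From a t table, t0.05 with 24 degrees of freedom is about 1.711, so t = 2.25 falls outside (−t0.05, t0.05): the engineer should conclude that the mean yield is higher than the claimed 500 grams per liter.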
F-Distribution: Comparing Sample Variances
• The F statistic is the ratio of two χ² random variables, each
divided by its number of degrees of freedom.
• If S1² and S2² are the variances of samples of size n1 and n2
taken from normal populations with variances σ1² and σ2²,
then

F = (S1² / σ1²) / (S2² / σ2²)

has an F-distribution with ν1 = n1 − 1 and ν2 = n2 − 1 degrees of
freedom.
• For the F-distribution table:

f₁₋α(ν1, ν2) = 1 / fα(ν2, ν1)

– Note that since the ratio is inverted, ν1 and ν2 are
reversed.
F-Distribution Usage
• The F-distribution is used in two-sample situations
to draw inferences about the population variances.
• In an area of statistics called analysis of variance,
sources of variability are considered, for example:
– Variability within each of the two samples.
– Variability between the two samples (variability between
the means).
• The overall question is whether the variability between the
sample means is large relative to the variability within the
samples; if it is not, the difference between the means is
judged not significant.
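As a concrete sketch (the data below are invented for illustration), the variance ratio for two small samples is computed as on the F-distribution slide:

```python
import statistics

# Hypothetical two-sample comparison (illustrative data, not from the
# slides). Under H0: sigma1^2 = sigma2^2, the ratio S1^2 / S2^2 follows
# an F-distribution with nu1 = n1 - 1 = 5 and nu2 = n2 - 1 = 4 df.
sample1 = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2]   # n1 = 6
sample2 = [5.0, 5.4, 4.6, 5.5, 4.5]        # n2 = 5

s1_sq = statistics.variance(sample1)
s2_sq = statistics.variance(sample2)
F = s1_sq / s2_sq

print(round(s1_sq, 3))   # 0.035
print(round(s2_sq, 3))   # 0.205
print(round(F, 3))
```

The computed ratio would then be compared with the F-table percentiles for (5, 4) degrees of freedom, using the reciprocal relation above when the lower-tail value is needed.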
