
Point-estimation-and-sampling-distribution

This document discusses point estimation and sampling in statistical inference, highlighting the importance of estimating population parameters from sample data. It covers key concepts such as point estimation, confidence intervals, hypothesis testing, and the Central Limit Theorem, emphasizing the behavior of sampling distributions. The document also explains how sample size affects the accuracy and precision of estimates, and provides examples to illustrate these concepts.

Uploaded by ZhuoJinWoo

6.1 Point Estimation and Sampling Distributions

Learning Objectives

By the end of this chapter, the student should be able to:

• Understand point estimation
• Apply and interpret the Central Limit Theorem
• Construct and interpret confidence intervals for means when the population standard deviation is known
• Understand the behavior of confidence intervals
• Carry out hypothesis tests for means when the population standard deviation is known
• Understand the probabilities of error in hypothesis tests

Statistical Inference

It is often necessary to “guess”, infer, or generalize about the outcome of an event in order to make a decision. Politicians study polls to guess their likelihood of winning an election. Teachers choose a particular course of study based on what they think students can comprehend. Doctors choose the treatments needed for various diseases based on their assessment of likely results. You may have visited a casino where people play games chosen because of the belief that the likelihood of winning is good. You may have chosen your course of study based on the probable availability of jobs.

Statistical inference uses what we know about probability to make our best “guesses”, or estimates, from samples about the population they came from. The main forms of inference are:

1. Point estimation - Using sample data to calculate a single statistic as an estimate of an unknown population parameter.
2. Confidence interval - An interval built around a point estimate for an unknown population parameter.
3. Hypothesis testing - A decision-making procedure for determining whether sample evidence supports a hypothesis.

Point Estimation

Suppose you were trying to determine the mean rent of a two-bedroom apartment in your town. You might look in the classified section of the newspaper, write down several rents listed, and average them together. You would have obtained a point estimate of the true mean. If you are trying to determine the percentage of times you make a basket when shooting a basketball, you might count the number of shots you make and divide that by the number of shots you attempted. In this case, you would have obtained a point estimate for the true proportion.

The most natural way to estimate features of the population (parameters) is to use the corresponding summary statistic calculated from the sample. Some common point estimates and their corresponding parameters are found in the following table:

Suppose the mean weight of a sample of 60 adults is 173.3 lbs; this sample mean is a point estimate of the population mean weight, µ. Remember this is one of many samples that we could have taken from the population. If a different random sample of 60 individuals were taken from the same population, the new sample mean would likely be different as a result of sampling variability (the idea that samples from the same population can yield different results). While estimates generally vary from one sample to another, the population mean is a fixed value.

Suppose a poll suggested the US President’s approval rating is 45%. We would consider 45% to be a point estimate of the approval rating we might see if we collected responses from the entire population. This entire-population response proportion is generally referred to as the parameter of interest. When the parameter is a proportion, it is often denoted by p, and we often refer to the sample proportion as $\hat{p}$ (pronounced “p-hat”). Unless we collect responses from every individual in the population, p remains unknown, and we use $\hat{p}$ as our estimate of p.

How would one estimate the difference in average weight between men and women? Suppose a sample of men yields a mean of 185.1 lbs and a sample of women yields a mean of 162.3 lbs. What is a good point estimate for the difference in these two population means? We will expand on this in following chapters.

Sampling Distributions

We have established that different samples yield different statistics due to sampling variability. Because a statistic varies from sample to sample, it is itself a random variable, and its distribution is called a sampling distribution (the probability distribution of a statistic at a given sample size). The sampling distribution of a sample statistic is the distribution of the point estimates based on samples of a fixed size, n, from a certain population. It is useful to think of a particular point estimate as being drawn from a sampling distribution.

Recall the sample mean weight of 173.3 lbs calculated from a previous sample. Another random sample of 60 participants might produce a different value of $\bar{x}$, such as 169.5 lbs. Repeated random sampling could result in additional different values, perhaps 172.1 lbs, 168.5 lbs, and so on. Each sample mean can be thought of as a single observation from a random variable $\bar{X}$. The distribution of $\bar{X}$ is called the sampling distribution of the sample mean, and it has its own mean and standard deviation like the random variables discussed previously. We will simulate the concept of a sampling distribution using technology to repeatedly sample, calculate statistics, and graph them. However, the actual sampling distribution would only be attainable if we could, in theory, take an infinite number of samples.

Each of the point estimates in the table above has its own unique sampling distribution, which we will look at in later sections.

Unbiased Estimation

Although variability in samples is present, there remains a fixed value for any population parameter. What makes a statistical estimate of this parameter of interest a “good” one? It must be both accurate and precise.

The accuracy of an estimate refers to how well it estimates the actual value of that parameter. Mathematically, an estimate is accurate (unbiased) when the expected value of the statistic equals the value of that parameter; for example, the expected value of the sample mean is the population mean, $E(\bar{X}) = \mu$. This can be visualized as the center of the sampling distribution being situated at the value of that parameter.

According to the law of large numbers (as the number of trials in a probability experiment increases, the relative frequency of an event approaches the theoretical probability), observed results converge to what we expect in the long run. Point estimates follow this rule, becoming more accurate with increasing sample size. The figure below shows the sample mean weight calculated for random samples, where the sample size increases by 1 for each draw until the sample size equals 500.

The maroon dashed horizontal line is drawn at the average weight of all adults, 169.7 lbs, which represents the population mean weight according to the CDC.

Note how a sample size around 50 may produce a sample mean that is as much as 10 lbs higher or lower than the population mean. As sample size increases, the fluctuations around the population mean decrease; in other words, as sample size increases, the sample mean becomes less variable and provides a more reliable estimate of the population mean.
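
The shrinking fluctuations described above can be reproduced with a short simulation. This is a minimal sketch under stated assumptions, not the procedure used to make the figure: it models adult weight as normal with the CDC mean of 169.7 lbs and a hypothetical standard deviation of 40 lbs (the text does not give the population standard deviation). For each n from 1 to 500 it draws a fresh sample and records the sample mean, then compares the largest miss for sample sizes near 50 with the largest miss for sample sizes near 500.

```python
import random
import statistics

# Assumptions for illustration (not given in the text): adult weight is
# normal with the CDC mean of 169.7 lbs and a hypothetical SD of 40 lbs.
POP_MEAN = 169.7
POP_SD = 40.0


def sample_means_by_size(max_n=500, seed=1):
    """For n = 1, 2, ..., max_n, draw a new sample of size n and store its mean."""
    rng = random.Random(seed)
    means = []
    for n in range(1, max_n + 1):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


means = sample_means_by_size()

# means[k] belongs to sample size n = k + 1.
miss_near_50 = max(abs(m - POP_MEAN) for m in means[30:70])     # n = 31..70
miss_near_500 = max(abs(m - POP_MEAN) for m in means[460:500])  # n = 461..500
print(f"largest miss, n = 31 to 70:   {miss_near_50:.1f} lbs")
print(f"largest miss, n = 461 to 500: {miss_near_500:.1f} lbs")
```

The exact numbers depend on the seed and on the assumed standard deviation, but the misses for the larger samples should be noticeably smaller, which is the law of large numbers at work.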

In addition to accuracy, a precise estimate is also more useful. This means that when we sample repeatedly, the values of the statistic are close together. The precision of an estimate can be visualized as the spread of the sampling distribution, usually quantified by the standard deviation. The phrase “the standard deviation of a sampling distribution” is often shortened to the standard error. A smaller standard error means a more precise estimate, and the standard error also depends on sample size: larger samples give smaller standard errors.
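
One way to see the link between sample size and standard error is to approximate the sampling distribution of $\bar{X}$ by repeated sampling and compare its standard deviation with $\sigma/\sqrt{n}$. The sketch below assumes a hypothetical normal population (mean 169.7 lbs, standard deviation 40 lbs, an assumption rather than data from the text), draws 2000 samples for each of a few sample sizes, and reports the standard deviation of the resulting sample means.

```python
import math
import random
import statistics

# Hypothetical population (an assumption, not from the text):
# normal with mean 169.7 lbs and SD 40 lbs.
POP_MEAN = 169.7
POP_SD = 40.0


def simulated_standard_error(n, reps=2000, seed=0):
    """Approximate the SD of the sampling distribution of the mean for size-n samples."""
    rng = random.Random(seed)
    sample_means = []
    for _ in range(reps):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        sample_means.append(statistics.fmean(sample))
    return statistics.stdev(sample_means)


results = {}
for n in (10, 40, 160):
    results[n] = simulated_standard_error(n)
    theory = POP_SD / math.sqrt(n)
    print(f"n = {n:3d}: simulated SE = {results[n]:5.2f}, sigma/sqrt(n) = {theory:5.2f}")
```

Quadrupling the sample size should roughly halve the simulated standard error, matching the $1/\sqrt{n}$ behavior.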

6.2 The Sampling Distribution of the Sample Mean (σ Known)

Let’s start our foray into inference by focusing on the sample mean. Why are we so concerned with means? Two reasons: they give us a middle ground for comparison, and they are easy to calculate. In this section we will see what we can deduce about the sampling distribution of the sample mean.

The Central Limit Theorem for a Sample Mean

The central limit theorem (CLT) is one of the most powerful and useful ideas in all of statistics. There are two alternative forms of the theorem, and both alternatives are concerned with drawing finite samples of size n from a population with a known mean, μ, and a known standard deviation, σ. The first alternative says that if we collect samples of size n with a “large enough n,” then the resulting distribution of sample means can be approximated by the normal distribution.

Applying the law of large numbers (as the number of trials in a probability experiment increases, the relative frequency of an event approaches the theoretical probability) here, we could say that if you take larger and larger samples from a population, then the mean of the sample tends to get closer and closer to μ. From the central limit theorem, we know that as n gets larger and larger, the sample means follow a normal distribution. The larger n gets, the smaller the standard deviation gets. (Remember that the standard deviation of $\bar{X}$ is $\sigma/\sqrt{n}$.) This means that the sample mean $\bar{x}$ tends to be close to the population mean μ. We can say that μ is the value that the sample means approach as n gets larger. The central limit theorem illustrates the law of large numbers.

The size of the sample, n, that is required in order to be “large enough” depends on the original population from which the samples are drawn (the sample size should be at least 30, or the data should come from a normal distribution). If the original population is far from normal, then more observations are needed for the sample means or sums to be normal. Sampling is done with replacement.

The following images look at sampling distributions of the sample mean built from taking 1000 samples of different sample sizes from a normal population. What pattern do you notice?

The following images look at sampling distributions of the sample mean built from taking 1000 samples of different sample sizes from a non-normal population (in this case, it happens to be exponential). What pattern do you notice?

What differences do you notice when sampling from a normal population vs. a non-normal population?
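
The comparison in those images can also be approximated numerically. A minimal sketch, assuming a normal population and an exponential population that both have mean 1 and standard deviation 1 (chosen for illustration only) and sample sizes of 2, 5, and 30 (also an assumption): for each case it builds 1000 sample means and reports their mean, standard deviation, and skewness.

```python
import random
import statistics


def skewness(values):
    """Population-style skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def sampling_distribution(draw, n, rng, reps=1000):
    """Return `reps` sample means, each from a sample of size n produced by `draw`."""
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]


# Hypothetical populations, both with mean 1 and SD 1.
populations = {
    "normal": lambda rng: rng.gauss(1.0, 1.0),
    "exponential": lambda rng: rng.expovariate(1.0),
}

rng = random.Random(42)
summary = {}
for name, draw in populations.items():
    for n in (2, 5, 30):  # sample sizes chosen for illustration
        means = sampling_distribution(draw, n, rng)
        summary[(name, n)] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
        mean, sd, skew = summary[(name, n)]
        print(f"{name:11s} n = {n:2d}: mean = {mean:.3f}, SD = {sd:.3f}, skewness = {skew:+.2f}")
```

For the normal population the sample means should stay roughly symmetric (skewness near 0) at every n, while for the exponential population they start out right-skewed and become more symmetric as n grows. In both cases the centers stay near 1 and the standard deviations shrink roughly like $1/\sqrt{n}$.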

Example

Suppose:

• eight students roll one fair die ten times
• seven roll two fair dice ten times
• nine roll five fair dice ten times
• 11 roll ten fair dice ten times.

Each time a person rolls more than one die, he or she calculates the sample mean of the faces showing. For example, one person might roll five fair dice and get 2, 2, 3, 4, 6 on one roll. The mean is $\bar{x} = \frac{2+2+3+4+6}{5} = 3.4$. The 3.4 is one mean when five fair dice are rolled. This same person would roll the five dice nine more times and calculate nine more means, for a total of ten means.

As the number of dice rolled increases from one to two to five to ten, the following would happen:

1. The mean of the sample means remains approximately the same.
2. The spread of the sample means (the standard deviation of the sample means) gets smaller.
3. The graph appears steeper and thinner.

We have just demonstrated the idea of the central limit theorem (CLT) for means: as you increase the sample size, the sampling distribution of the sample mean tends toward a normal distribution.

To summarize, the central limit theorem for sample means says that if you keep drawing larger and larger samples (such as rolling one, two, five, and finally, ten dice) and calculating their means, the sample means form their own normal distribution (the sampling distribution). This normal distribution has the same mean as the original distribution and a variance that equals the original variance divided by the sample size. Standard deviation is the square root of variance, so the standard deviation of the sampling distribution (aka standard error) is the standard deviation of the original distribution divided by the square root of n. The variable n is the number of values that are averaged together, not the number of times the experiment is done.

It would be difficult to overstate the importance of the central limit theorem in statistical theory. Knowing that sample means behave in a predictable way, even when the underlying data are not normally distributed, is a powerful tool. We can simulate this idea using technology.

Suppose X is a random variable with a distribution that may be known or unknown (it can be any distribution). Using a subscript that matches the random variable, let:

• $\mu_X$ = the mean of X
• $\sigma_X$ = the standard deviation of X

Then $\sigma_{\bar{X}} = \sigma_X/\sqrt{n}$ is the standard deviation of $\bar{X}$ and is called the standard error of the mean. Note that here we are assuming we know the population standard deviation.

If you draw random samples of size n, then as n increases, the random variable $\bar{X}$, which consists of sample means, tends to be normally distributed and

$$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n}}\right).$$

To put it more formally, if you draw random samples of size n, the distribution of the random variable $\bar{X}$, which consists of sample means, is called the sampling distribution of the sample mean. The sampling distribution of the mean approaches a normal distribution as n, the sample size, increases.

Using the CLT

It is important to understand when to use the central limit theorem:

• If you are being asked to find the probability of an individual value, do not use the CLT. Use the distribution of its random variable.
• If you are being asked to find the probability of the mean of a sample, then use the CLT for the mean.
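
The dice experiment from the example can be simulated the same way. A minimal sketch, assuming fair six-sided dice; it uses 10,000 rolls per group instead of the classroom's ten so the pattern is easy to see. A single fair die has mean 3.5 and standard deviation $\sqrt{35/12} \approx 1.708$, so the CLT predicts the standard deviation of the sample means to be about $1.708/\sqrt{n}$ when n dice are averaged.

```python
import math
import random
import statistics

DIE_MEAN = 3.5                 # mean face value of one fair die
DIE_SD = math.sqrt(35 / 12)    # SD of one fair die, about 1.708


def dice_means(num_dice, reps=10_000, seed=7):
    """Roll `num_dice` fair dice `reps` times and return the mean face of each roll."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.randint(1, 6) for _ in range(num_dice)])
        for _ in range(reps)
    ]


dice_summary = {}
for k in (1, 2, 5, 10):
    means = dice_means(k)
    dice_summary[k] = (statistics.fmean(means), statistics.stdev(means))
    predicted_sd = DIE_SD / math.sqrt(k)
    print(f"{k:2d} dice: mean of means = {dice_summary[k][0]:.3f}, "
          f"SD of means = {dice_summary[k][1]:.3f}, predicted SD = {predicted_sd:.3f}")
```

The mean of the sample means should stay near 3.5 while their spread shrinks toward the predicted values, which are the first two observations listed in the example.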

The random variable $\bar{X}$ has a different z-score formula associated with it from that of a single observation. Remember, the mean $\bar{x}$ is the mean of one sample, and $\mu_X$ is the average, or center, of both X (the original distribution) and $\bar{X}$:

$$z = \frac{\bar{x} - \mu_X}{\sigma_X/\sqrt{n}}$$

We can use our Z table and standardize just as we are already familiar with, or we can use our technology of choice.
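
The two z-score formulas can be compared with Python's standard-library `statistics.NormalDist`, which plays the role of the Z table. The numbers are hypothetical: a population assumed to be normal with μ = 100 and σ = 20 (normality is needed only for the single-observation probability), samples of size n = 16, and a cutoff of 110.

```python
from statistics import NormalDist

# Hypothetical numbers for illustration only.
mu, sigma = 100, 20   # population mean and SD (population assumed normal)
n = 16                # sample size
cutoff = 110

z_single = (cutoff - mu) / sigma           # one observation: z = (x - mu) / sigma
standard_error = sigma / n ** 0.5          # sigma / sqrt(n) = 5
z_mean = (cutoff - mu) / standard_error    # sample mean: z = (x-bar - mu) / (sigma / sqrt(n))

std_normal = NormalDist()                  # standard normal, i.e. the Z table
p_single = 1 - std_normal.cdf(z_single)    # P(X > 110)
p_mean = 1 - std_normal.cdf(z_mean)        # P(X-bar > 110)
print(f"single value: z = {z_single:.2f}, P = {p_single:.4f}")
print(f"sample mean:  z = {z_mean:.2f}, P = {p_mean:.4f}")
```

The same cutoff gives z = 0.5 for a single value but z = 2 for the mean of 16 values, so the probability drops from about 0.309 to about 0.023: averaging pulls sample means much closer to μ than individual values.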

Example

An unknown distribution has a mean of 90 and a standard deviation of 15. Samples of size n = 25 are drawn randomly from the population.

a. Find the probability that the sample mean is between 85 and 92.
b. Find the value that is two standard deviations above the expected value, 90, of the sample mean.

• Let X = one value from the original unknown population. The probability question asks you to find a probability for the sample mean.
• Find P(85 < $\bar{x}$ < 92). Draw a graph.

Exercises:

An unknown distribution has a mean of 45 and a standard deviation of eight. Samples of size n = 30 are drawn randomly from the population. Find the probability that the sample mean is between 42 and 50.
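
The calculation for the worked example (mean 90, standard deviation 15, n = 25) can be checked in a few lines. This sketch assumes the CLT approximation $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$ is adequate for n = 25; `prob_sample_mean_between` is a hypothetical helper name, not a library function.

```python
from statistics import NormalDist


def prob_sample_mean_between(mu, sigma, n, low, high):
    """P(low < X-bar < high), using the CLT approximation X-bar ~ N(mu, sigma / sqrt(n))."""
    sampling_dist = NormalDist(mu, sigma / n ** 0.5)
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


# Example values: mean 90, SD 15, samples of size 25 (standard error 15 / 5 = 3).
p_a = prob_sample_mean_between(90, 15, 25, 85, 92)   # part a
value_b = 90 + 2 * (15 / 25 ** 0.5)                  # part b: 2 standard errors above 90
print(f"a. P(85 < x-bar < 92) = {p_a:.4f}")
print(f"b. value = {value_b}")
```

With a standard error of 3, the z-scores for part a are about −1.67 and 0.67, giving a probability of roughly 0.6997; part b is 90 + 2(3) = 96. The same helper can be used to check your answer to the exercise.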
