Unit - III (P&S Notes)
1) Purposive sampling
2) Random Sampling
3) Simple sampling
4) Stratified sampling
Purposive sampling:
Purposive sampling is one in which the samples are selected with a definite purpose in view.
Random sampling:
It is one in which each member of the population has an equal chance of being included in the sample.
Simple Sampling:
Simple sampling is random sampling in which each draw is independent of the others and the probability of selection remains the same throughout.
Stratified Sampling:
In stratified sampling the population is divided into homogeneous subgroups (strata), and a sample is drawn from each stratum.
Large Sample:
If the size of the sample is n ≥ 30, the sample is said to be a large sample.
Small Sample:
If the size of the sample is n < 30, the sample is said to be a small sample or exact sample.
Formulae:
μ — Population mean
σ² — Population variance = Σᵢ₌₁ᴺ (Xᵢ − μ)² / N
X̄ — Sample mean = Σᵢ₌₁ⁿ Xᵢ / n, with μ_X̄ = E(X̄) = μ
S² — Sample variance = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1)
Standard error of the sample mean = σ/√n
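These formulae can be checked numerically; a minimal sketch in Python (the data values below are made up for illustration):

```python
import math

# Hypothetical sample data (for illustration only)
sample = [12.1, 9.8, 11.4, 10.6, 12.9, 10.2]
n = len(sample)

# Sample mean: x_bar = (1/n) * sum(x_i)
x_bar = sum(sample) / n

# Sample variance with the n - 1 divisor: s^2 = sum((x_i - x_bar)^2) / (n - 1)
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Standard error of the sample mean, using s as an estimate of sigma: s / sqrt(n)
std_error = math.sqrt(s2) / math.sqrt(n)

print(x_bar, s2, std_error)
```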
Therefore,
P(X > 8.3 × 10⁶) = P( (X − 8 × 10⁶)/(8.5381 × 10⁴) > (8.3 × 10⁶ − 8 × 10⁶)/(8.5381 × 10⁴) )
= P( (X − 8 × 10⁶)/(8.5381 × 10⁴) > (0.3 × 10⁶)/(8.5381 × 10⁴) )
= P(Z > 3.51), where Z is standard normal
≅ 0.00023
Thus, there are only 2.3 chances out of 10,000 that the total yearly claim will exceed 8.3 million.
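The tail probability above can be reproduced with the standard normal survival function; a minimal sketch using only Python's standard library (the mean 8 × 10⁶ and SD 8.5381 × 10⁴ are taken from the computation above):

```python
import math

def normal_sf(z):
    """Survival function P(Z > z) for the standard normal, via erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu = 8e6        # mean of total yearly claims (from the example)
sd = 8.5381e4   # standard deviation of total yearly claims (from the example)

z = (8.3e6 - mu) / sd   # standardized value, about 3.51
p = normal_sf(z)        # P(X > 8.3 million)

print(round(z, 2), p)
```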
Example 2:
The ideal size of a first-year class at a particular college is 150
students. The college, knowing from past experience that, on the
average, only 30 percent of those accepted for admission will actually
attend, uses a policy of approving the applications of 450 students.
Compute the probability that more than 150 first-year students attend
this college.
Let X denote the number of students that attend; then assuming that
each accepted applicant will independently attend, it follows that X is
a binomial random variable with parameters n = 450 and p = 0.3.
Since the binomial is a discrete and the normal a continuous
distribution, it is best to compute P(X > 150) as P(X ≥ 150.5) (the
continuity correction). With mean np = 135 and standard deviation
√(np(1 − p)) = √94.5 ≈ 9.72, the normal approximation gives
P(X ≥ 150.5) = P(Z ≥ (150.5 − 135)/9.72) = P(Z ≥ 1.59) ≈ 0.0559.
Hence only about 6 percent of the time do more than 150 of the 450
accepted students actually attend.
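A quick numeric check of this approximation: the sketch below compares the exact binomial tail P(X > 150) with the normal approximation using the continuity correction (standard library only):

```python
import math

n, p = 450, 0.3

# Exact binomial tail: P(X > 150) = sum of the pmf from 151 to n
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(151, n + 1))

# Normal approximation with continuity correction: P(X >= 150.5)
mu = n * p                        # 135
sd = math.sqrt(n * p * (1 - p))   # sqrt(94.5), about 9.72
z = (150.5 - mu) / sd
approx = 0.5 * math.erfc(z / math.sqrt(2))

print(exact, approx)
```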
s² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1)
How to compute the distributions of the sample mean and sample variance
1) Distribution of Sample mean
Since the sum of independent normal random variables is normally
distributed, it follows that 𝑋ത is normal with mean
E(X̄) = (1/n) Σᵢ₌₁ⁿ E(Xᵢ) = μ
Estimation
[Diagram: unknown population parameters (p, λ, μ, σ) are estimated by sample statistics; e.g., the sampling mean (X̄) estimates the population mean (μ).]
Types of Estimators
Estimate:
An estimate is a statement made about an unknown population
parameter — a quantity appearing in a distribution, such as p in the
binomial distribution or μ and σ in the normal distribution.
Estimator:
The procedure or rule used to determine an unknown population
parameter is called an estimator.
Example: The sample mean X̄ is an estimator of the population mean μ,
because the sample mean is a method of determining the population
mean. A parameter can have one, two, or many estimators.
The estimation can be done in two ways
(i) Point estimation
(ii) Interval estimation
Statistical estimation:
It is the part of statistical inference in which a population parameter
is estimated from the corresponding sample statistic.
Point Estimation:
A point estimate of a parameter is a statistical estimate in which
the parameter is estimated by a single numerical value from sample
data.
❖ A point estimate of a parameter θ is a single numerical value,
computed from a given sample, that serves as an approximation of
the unknown exact value of the parameter.
❖ A point estimator is a statistic for estimating the population
parameter θ, and it is denoted by θ̂.
❖ The value x̄ of the statistic X̄, computed from a sample of size n,
is a point estimate of the population parameter μ, i.e., μ̂ = x̄.
Properties of estimators
An estimator is not expected to estimate the population parameter
without error; it should, however, be close to the true value of the
unknown parameter.
(a) Unbiased estimator
Let θ̂ be an estimator of θ. The statistic θ̂ is said to be an unbiased
estimator (and its value an unbiased estimate) if and only if the
expected value of θ̂ equals θ, i.e., E(θ̂) = θ: the mean of the
probability distribution of θ̂ is equal to θ.
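Unbiasedness can be illustrated by simulation: averaged over many repeated samples, the sample mean and the sample variance (with the n − 1 divisor) land on the true parameter values. A sketch, with made-up population parameters:

```python
import random
import statistics

random.seed(1)

mu, sigma = 50.0, 8.0   # hypothetical population parameters
n, trials = 5, 20000    # small samples, many repetitions

means, variances = [], []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.mean(sample))
    variances.append(statistics.variance(sample))  # uses the n - 1 divisor

# Averaged over many samples, E(X_bar) is close to mu and E(S^2) is close
# to sigma^2, illustrating that both estimators are unbiased.
print(statistics.mean(means), statistics.mean(variances))
```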
Interval Estimate:
Even the most efficient unbiased estimator cannot be expected to
estimate the population parameter exactly. It is true, however, that
our accuracy increases with large samples.
• If α = 0.05 we get a 95% confidence interval
• If α = 0.01 we get a 99% confidence interval
Maximum error of estimate E for large samples
Since the sample mean very rarely equals the population mean μ, a
point estimate is generally accompanied by a statement of error which
gives the difference between the estimate and the quantity being
estimated.
Therefore, the estimation error is x̄ − μ.
For large n, the random variable (x̄ − μ)/(σ/√n) is approximately a
standard normal variate; then
P(−z_{α/2} < Z < z_{α/2}) = 1 − α, where Z = (x̄ − μ)/(σ/√n)
Hence P(−z_{α/2} < (x̄ − μ)/(σ/√n) < z_{α/2}) = 1 − α
[Figure: standard normal curve with the acceptance region between
−z_{α/2} and z_{α/2} and rejection regions in both tails]
The maximum error of estimate is E = z_{α/2} · σ/√n, so the required
sample size is
n = (z_{α/2} σ / E)²
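The maximum-error and sample-size formulas can be wrapped in small helpers; a sketch using Python's statistics.NormalDist for the critical value:

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Two-sided critical value z_{alpha/2} for the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def max_error(sigma, n, confidence=0.95):
    """Maximum error of estimate E = z_{alpha/2} * sigma / sqrt(n)."""
    return z_value(confidence) * sigma / math.sqrt(n)

def sample_size(sigma, E, confidence=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= E."""
    return math.ceil((z_value(confidence) * sigma / E) ** 2)

print(z_value(0.95))              # about 1.96
print(sample_size(48, 10, 0.90))  # Example 3 below: 63
```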
Maximum error of estimate E for small samples
When n < 30 (a small sample), we use s, the standard deviation of the
sample, to determine E. When σ is unknown, the t-distribution can be
used to construct a confidence interval for μ.
Hence, by the previous process,
P(−t_{α/2} < t < t_{α/2}) = 1 − α, where t = (x̄ − μ)/(s/√n)
Hence P(−t_{α/2} < (x̄ − μ)/(s/√n) < t_{α/2}) = 1 − α
[Figure: t-distribution curve centered at t = 0, with the acceptance
region between −t_{α/2} and t_{α/2} and rejection regions in both tails]
The maximum error is E = t_{α/2} · s/√n, so
n = (t_{α/2} s / E)²
Example 1:
What is the size of the smallest sample required to estimate an
unknown proportion to within a maximum error of 0.06 with at least
95% confidence?
Solution:
Given E = 0.06 and n = ?
We know that (1 − α)100 = 95
1 − α = 0.95
α = 1 − 0.95 = 0.05, so α/2 = 0.025
z_{α/2} = 1.96 (from the normal distribution table)
Sample size n = (z_{α/2} σ / E)² = pq (z_{α/2} / E)²,
since for a proportion σ = √(pq).
With no prior estimate of p, pq attains its maximum value 1/4 at
p = q = 1/2.
Hence n = pq (z_{α/2} / E)² = (1/4)(1.96/0.06)² = 266.77 ≈ 267
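The proportion sample-size computation can be sketched as a small helper (it reproduces Examples 1 and 2):

```python
import math
from statistics import NormalDist

def proportion_sample_size(E, confidence=0.95, p=None):
    """n = p*q*(z_{alpha/2}/E)^2; with no prior estimate of p, use p = 0.5 (worst case)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = 0.5 if p is None else p
    return math.ceil(p * (1 - p) * (z / E) ** 2)

print(proportion_sample_size(0.06))         # Example 1: 267
print(proportion_sample_size(0.05, p=0.2))  # Example 2: 246
```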
Example 2:
If we assert with 95% confidence that the maximum error is 0.05, and
the value of p is 0.2, find the sample size.
Solution:
Given maximum error E = 0.05
We know that (1 − α)100 = 95
1 − α = 0.95
α = 1 − 0.95 = 0.05, so α/2 = 0.025
z_{α/2} = 1.96 (from the normal distribution table)
Sample size n = (z_{α/2} σ / E)² = pq (z_{α/2} / E)²
Given p = 0.2 ⇒ q = 0.8
Hence n = pq (z_{α/2} / E)² = 245.86 ≈ 246
Example 3:
It is desired to estimate the mean number of hours of continuous use
until a certain computer will first require repairs. If it can be
assumed that σ = 48 hours, how large a sample is needed so that one
will be able to assert with 90% confidence that the sample mean is
off by at most 10 hours?
Solution:
Given σ = 48 hours and E = 10
We know that (1 − α)100 = 90
1 − α = 0.90
α = 1 − 0.90 = 0.10, so α/2 = 0.05
z_{α/2} = 1.645 (from the normal distribution table)
Sample size n = (z_{α/2} σ / E)² = 62.35 ≈ 63
Example 4:
The mean and SD of a population are 11795 and 14054
respectively . What one can assert with 95% confidence about the
maximum error with n=50.
(or)
If the mean and SD of a population are 11795 and 14054. if n value
is 50. Find 95% confidence interval for the mean.
Solution:
Sample mean 𝒙 ഥ = 𝟏𝟏𝟕𝟗𝟓=Population mean 𝝁
SD =14054
We know that 𝟏 − 𝜶 𝟏𝟎𝟎 = 𝟗𝟓
𝟏 − 𝜶 = 𝟎. 𝟗𝟓
𝜶 = 𝟏 − 𝟎. 𝟗𝟓 = 𝟎. 𝟎𝟓 𝒁𝜶/𝟐 =
𝟎. 𝟒𝟕𝟓
𝜶/𝟐 = 𝟎. 𝟎𝟐𝟓
𝑷 𝒁𝜶/𝟐 = 𝟏. 𝟗𝟔 (from normal distribution table) 𝒛 = 𝟎
𝝈 𝝈
ഥ − 𝒁𝜶
Confidence interval= 𝒙 , ഥ + 𝒁𝜶
𝒙
𝟐 𝒏, 𝟐 𝒏,
𝟏𝟒𝟎𝟓𝟒 𝟏𝟒𝟎𝟓𝟒
𝟏𝟏𝟕𝟗𝟓 − 𝟏. 𝟗𝟔 , 𝟏𝟏𝟕𝟗𝟓 + 𝟏. 𝟗𝟔
𝟓𝟎 𝟓𝟎
𝟕𝟖𝟗𝟗. 𝟒, 𝟏𝟓𝟔𝟗𝟎. 𝟓𝟕
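The large-sample confidence interval of Example 4 can be reproduced programmatically; a minimal sketch:

```python
import math
from statistics import NormalDist

def confidence_interval(x_bar, sd, n, confidence=0.95):
    """Large-sample CI: x_bar +/- z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    E = z * sd / math.sqrt(n)   # maximum error of estimate
    return x_bar - E, x_bar + E

low, high = confidence_interval(11795, 14054, 50)
print(round(low, 2), round(high, 2))  # close to the interval obtained with z = 1.96
```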
Practice problems
Example 5:
The efficiency expert of a computer company tested 40 engineers to
estimate the average time it takes to assemble a certain computer
component, getting a mean of 12.73 minutes and SD of 2.06 minutes
(a) If x̄ = 12.73 is used as a point estimate of the actual average time
required to perform the task, determine the maximum error with
99% confidence.
(b) Construct 98% confidence interval for the true average time it
takes to do the job.
(c) With what confidence can we assert that the sample mean does
not differ from the true mean by more than 30 seconds?
Example 6:
A sample of 10 camshafts intended for use in gasoline engines has
an average eccentricity of 1.02 inch and a standard deviation of
0.044 inch. Assuming the data may be treated as a random sample from
a normal population, determine a 95% confidence interval for the
actual mean eccentricity of the camshaft.
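Since n = 10 is a small sample, Example 6 calls for a t-interval. Python's standard library has no t-quantile function, so the critical value t_{0.025,9} = 2.262 below is hard-coded from a t-table (a sketch, not a general solution):

```python
import math

x_bar, s, n = 1.02, 0.044, 10
t_crit = 2.262                  # t_{0.025, 9} from a t-table (hard-coded)

E = t_crit * s / math.sqrt(n)   # maximum error of estimate
low, high = x_bar - E, x_bar + E
print(round(low, 3), round(high, 3))  # roughly (0.989, 1.051)
```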
Example 7:
Determine a 95% confidence interval for the mean of a normal
distribution with variance 0.25, using a sample of n=100 values
with mean 212.3.
Example 8:
A random sample of 100 teachers in a large metropolitan area
revealed a mean weekly salary of Rs. 487 with SD Rs. 48. With what
degree of confidence can we assert that the average weekly salary of
all teachers in the metropolitan area is between Rs. 472 and Rs. 502?
Maximum likelihood estimator
A particular type of estimator, known as the maximum likelihood
estimator, is widely used in statistics.
The maximum likelihood estimate θ̂ is defined to be the value of θ
maximizing f(x₁, x₂, …, xₙ | θ), where x₁, x₂, …, xₙ are the observed
values. The function f(x₁, x₂, …, xₙ | θ) is often referred to as the
likelihood function of θ.
In determining the maximizing value of θ, it is often useful to use
the fact that f(x₁, x₂, …, xₙ | θ) and log f(x₁, x₂, …, xₙ | θ) have
their maximum at the same value of θ. Hence, we may also obtain θ̂ by
maximizing log f(x₁, x₂, …, xₙ | θ).
The maximum likelihood estimator of the unknown mean of a
Bernoulli distribution is given by
p̂ = d(x₁, x₂, …, xₙ) = Σᵢ₌₁ⁿ Xᵢ / n
Since σ𝒏𝒊=𝟏 𝑿𝒊 is the number of successful trials, we see that the
maximum likelihood estimator of p is equal to the proportion of
the observed trials that result in successes.
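The closed form can be checked by maximizing the log-likelihood numerically over a grid of p values; a sketch with made-up Bernoulli data:

```python
import math

# Hypothetical observed Bernoulli trials (1 = success)
x = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
n, successes = len(x), sum(x)

def log_likelihood(p):
    """log f(x_1, ..., x_n | p) = sum of log(p^x_i * (1-p)^(1-x_i))."""
    return successes * math.log(p) + (n - successes) * math.log(1 - p)

# Grid search over p in (0, 1)
grid = [i / 1000 for i in range(1, 1000)]
p_hat_numeric = max(grid, key=log_likelihood)

p_hat_closed = successes / n   # MLE: proportion of observed successes
print(p_hat_numeric, p_hat_closed)
```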
Solution:
Use these data to estimate the proportion of non-rainy days that
had 2 or fewer accidents that year.
Since
X̄ = (1/10) Σᵢ₌₁¹⁰ Xᵢ = 2.7,
it follows that the maximum likelihood estimate of the Poisson
mean is 2.7. Since the long-run proportion of non-rainy days that
have 2 or fewer accidents is equal to 𝑷 𝑿 ≤ 𝟐
where X is the random number of accidents in a day, it follows
that the desired estimate is
P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
= e^(−2.7) (1 + 2.7 + 2.7²/2) = 0.4936
Therefore, we estimate that a little less than half of the non-rainy
days had 2 or fewer accidents.
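The Poisson tail computation can be verified directly; a minimal sketch:

```python
import math

lam = 2.7   # maximum likelihood estimate of the Poisson mean

# P(X <= 2) = sum over k = 0, 1, 2 of e^{-lam} * lam^k / k!
p = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(3))
print(round(p, 4))   # 0.4936
```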
Example 2:
Two proofreaders were given the same manuscript to read. If
proofreader 1 found 𝒏𝟏 errors, and proofreader 2 found 𝒏𝟐 errors,
with 𝒏𝟏,𝟐 of these errors being found by both proofreaders,
estimate N, the total number of errors that are in the manuscript.
Before we can estimate N we need to make some assumptions
about the underlying probability model. So let us assume that the
results of the proofreaders are independent, and that each error in
the manuscript is independently found by proofreader 𝒊 with
probability 𝒑𝒊 , 𝒊 = 𝟏, 𝟐. To estimate N, we will start by deriving an
estimator of p₁. To do so, note that each of the n₂ errors found by
reader 2 will, independently, be found by proofreader 1 with
probability p₁. Because proofreader 1 found n₁,₂ of those n₂
errors, a reasonable estimate of p₁ is given by
p̂₁ = n₁,₂ / n₂
Since proofreader 1 finds each of the N errors with probability p₁,
the expected number of errors he finds is N p₁ ≈ n₁. Equating n₁/N
with p̂₁ gives the estimate
N̂ = n₁ n₂ / n₁,₂
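This estimate can be sketched in a few lines (the error counts below are made up for illustration):

```python
# Hypothetical counts (for illustration only)
n1 = 30    # errors found by proofreader 1
n2 = 24    # errors found by proofreader 2
n12 = 18   # errors found by both proofreaders

p1_hat = n12 / n2       # estimate of p1: fraction of reader 2's errors also found by reader 1
N_hat = n1 * n2 / n12   # estimated total number of errors in the manuscript
print(p1_hat, N_hat)    # 0.75 40.0
```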