Problems Set 2 (2021)

Uploaded by vansavi patel
Problems

Point and interval estimation of population parameters

1. The weights of packs of coffee are normally distributed with standard deviation 3 grams.
It is mentioned on the label that packs of coffee weigh on average 300 grams. The
average weight of 10 randomly chosen packs of coffee is 299.5 grams.
a. Assume that the message on the label is true. What is the probability that a random
sample of 10 packs of coffee has an average weight of at most 299.5 grams?
b. Someone who is ignorant of statistics claims that the message on the label is a
lie, because the average weight of all packs of coffee is less than 300 grams, namely
299.5 grams. Explain why your result in a. does not contradict the message on the
label.
c. To substantiate the claim of this person, the most appropriate method is to
calculate (make your choice)
- a confidence interval;
- a confidence lower bound;
- a confidence upper bound.
Use an 80% confidence level in calculating your choice. What is your opinion about
the claim of this person?
The probability that you calculated in a. is called a p-value. More details will be given
when we discuss hypothesis testing.
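The following is a minimal numerical sketch of a. and c. in Python. It uses only the standard-library `statistics.NormalDist` and assumes, as the problem states, that $\sigma = 3$ g is known. The mean of $n = 10$ packs therefore has standard error $\sigma/\sqrt{n}$. The c.u.b. is chosen because the person claims that the average is *below* 300 g.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 300.0, 3.0, 10, 299.5
se = sigma / sqrt(n)          # standard error of the sample mean
z = NormalDist()              # standard normal N(0, 1)

# a. P(sample mean <= 299.5) if the label (mu = 300) is true: the p-value
p_value = z.cdf((xbar - mu0) / se)

# c. 80% confidence upper bound for mu: xbar + z_{0.20} * se
cub_80 = xbar + z.inv_cdf(0.80) * se

print(f"p-value = {p_value:.4f}, 80% c.u.b. = {cub_80:.2f} g")
```

The p-value is roughly 0.30, and the 80% upper bound lies above 300 g.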

2. Daily turnovers of a shop are normally distributed with standard deviation € 200.
a. Calculate the probability that the average turnover of 25 randomly chosen days
deviates by more than € 40 from the average turnover of all days.
b. What is the smallest sample size for which the probability in a. is reduced to at
most 5%?
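Here is a sketch of both parts, assuming the known $\sigma$ of € 200. The helper `prob_deviation` is a hypothetical name. The mean of $n$ days has standard error $200/\sqrt{n}$, and part b. solves $z_{0.025}\,\sigma/\sqrt{n} \le 40$ for $n$, rounding up.

```python
from math import ceil, sqrt
from statistics import NormalDist

sigma, delta = 200.0, 40.0
z = NormalDist()

def prob_deviation(n):
    """P(|sample mean - mu| > delta) for a sample of n days."""
    se = sigma / sqrt(n)
    return 2 * (1 - z.cdf(delta / se))

p25 = prob_deviation(25)                               # a.: 2 * (1 - Phi(1))

# b.: need delta / (sigma / sqrt(n)) >= z_{0.025}
n_min = ceil((z.inv_cdf(0.975) * sigma / delta) ** 2)
print(round(p25, 4), n_min)
```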

3. The carrying-capacities of metal cables of a particular type have a standard deviation
of 0.73 ton (730 kg). The average carrying-capacity of 60 randomly chosen cables of
this type is 11.09 ton.
a. Calculate a 95% confidence interval for the average carrying-capacity of all cables
of this type.
b. Calculate a 99% confidence lower bound for the average carrying-capacity of all
cables of this type.
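The sketch below treats $\sigma = 0.73$ ton as known, so both bounds use standard normal quantiles. The two-sided interval uses $z_{0.025}$ and the one-sided 99% bound uses $z_{0.01}$.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, xbar = 0.73, 60, 11.09      # tons
se = sigma / sqrt(n)
z = NormalDist()

half = z.inv_cdf(0.975) * se
ci_95 = (xbar - half, xbar + half)              # a.
clb_99 = xbar - z.inv_cdf(0.99) * se            # b. one-sided lower bound
print([round(v, 3) for v in ci_95], round(clb_99, 3))
```

The one-sided 99% bound puts all of $\alpha = 1\%$ in one tail. That is why it uses $z_{0.01} \approx 2.33$ rather than $z_{0.005} \approx 2.58$.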

4. A machine in a factory produces metal rods whose thicknesses are normally distributed
with standard deviation 0.2 mm. Due to a malfunction of the machine, the average
thickness of the manufactured metal rods can deviate from the normative average
thickness of 2.0 mm. In a quality control, 5 rods are randomly selected from each batch
of rods produced by the machine. It is decided to repair the machine if the normative
average does not belong to a 95% confidence interval for the average thickness of the
rods in the batch. Suppose that the 5 randomly chosen rods have an average thickness
of 1.9 mm. Is the machine going to be repaired?
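The decision rule can be sketched as follows, with $\sigma = 0.2$ mm known. Here `repair` is simply the negation of "2.0 mm lies in the 95% interval".

```python
from math import sqrt
from statistics import NormalDist

sigma, n, xbar, mu_norm = 0.2, 5, 1.9, 2.0
half = NormalDist().inv_cdf(0.975) * sigma / sqrt(n)
ci_95 = (xbar - half, xbar + half)
repair = not (ci_95[0] <= mu_norm <= ci_95[1])
print([round(v, 3) for v in ci_95], "repair" if repair else "no repair")
```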


5. There are 4 employees in a very small village. The hourly wages of these people are
€ 12, € 15, € 18, and € 21.
a. What is the (population) distribution of the hourly wages in this small population?
b. Calculate the population mean $\mu$ and the population variance $\sigma^2$, i.e. the average
and the variance of the hourly wages of all employees in this village.
c. What is the value of the population excess kurtosis $\kappa$?
Suppose that a researcher wants to draw a random sample with replacement of size $n$
out of this population of 4 employees. This researcher observes the wages
$x_1, \ldots, x_n$ of the sampled people, where $x_i$ is the hourly wage of the $i$-th chosen
person in the sample. The sample result is a realisation of $X_1, \ldots, X_n$, where $X_1, \ldots, X_n$
are independent copies of the mother variable $X$, which measures the hourly wage of
a randomly chosen employee from this village.
d. Is there any restriction on the sample size $n$?
e. What is the number of possible random samples of size $n$?
f. Consider the case $n = 2$.
- Write down all possible sample results $(x_1, x_2)$.
- Calculate, for every possible sample result $(x_1, x_2)$, the sample mean $\bar{x}$ and the
sample variance $s^2$. The pair $(\bar{x}, s^2)$ is a realisation of the random vector
$(\bar{X}, S^2)$.
- Find the probability distribution of $\bar{X}$ (sometimes called the sampling distribution)
and compare your result with the population distribution from a.
- Verify the formulas for $E(\bar{X})$, $\mathrm{Var}(\bar{X})$, $E(S^2)$, and $\mathrm{Var}(S^2)$.
- Calculate $\mathrm{Corr}(\bar{X}, S^2)$, i.e. the correlation between $\bar{X}$ and $S^2$.
- Are $\bar{X}$ and $S^2$ independent?
g. Optional exercise: Use Excel to repeat your calculations in f. for $n = 3$ and $n = 4$.
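Part f. can be checked by brute force. The sketch below enumerates all $4^2 = 16$ equally likely ordered samples with `itertools.product`. It uses exact `fractions.Fraction` arithmetic, so the formula checks hold exactly rather than up to rounding. The variance formula for $S^2$ is the one quoted in problem 26, $\mathrm{Var}(S^2) = \sigma^4[2/(n-1) + \kappa/n]$. Changing `n` to 3 or 4 reproduces the optional part.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

wages = [12, 15, 18, 21]                 # the whole population of problem 5
N = len(wages)

# b., c.: population mean, variance and excess kurtosis (exact fractions)
mu = Fraction(sum(wages), N)
sigma2 = sum((w - mu) ** 2 for w in wages) / N
kappa = (sum((w - mu) ** 4 for w in wages) / N) / sigma2 ** 2 - 3

def mean_and_var(sample):
    """Sample mean and sample variance (divisor n - 1) of one sample."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    return xbar, sum((x - xbar) ** 2 for x in sample) / (n - 1)

n = 2
samples = list(product(wages, repeat=n))  # e.: 4**n equally likely ordered samples
pairs = [mean_and_var(s) for s in samples]
p = Fraction(1, len(samples))             # probability of each sample

# f.: distribution of the sample mean, moments and covariance
dist_xbar = {v: c * p for v, c in sorted(Counter(xb for xb, _ in pairs).items())}
E_xbar = sum(p * xb for xb, _ in pairs)
Var_xbar = sum(p * (xb - E_xbar) ** 2 for xb, _ in pairs)
E_s2 = sum(p * s2 for _, s2 in pairs)
Var_s2 = sum(p * (s2 - E_s2) ** 2 for _, s2 in pairs)
cov = sum(p * (xb - E_xbar) * (s2 - E_s2) for xb, s2 in pairs)

# Formulas to verify: E = mu, Var = sigma^2 / n, E(S^2) = sigma^2,
# Var(S^2) = sigma^4 * (2 / (n - 1) + kappa / n)
print(E_xbar == mu, Var_xbar == sigma2 / n)
print(E_s2 == sigma2, Var_s2 == sigma2 ** 2 * (Fraction(2, n - 1) + kappa / n))

# Independence requires P(a, b) = P(a) * P(b) for every pair of values
joint = Counter(pairs)
dist_s2 = Counter(s2 for _, s2 in pairs)
independent = all(joint[(a, b)] * p == dist_xbar[a] * dist_s2[b] * p
                  for a in dist_xbar for b in dist_s2)
print(cov, independent)
```

The covariance comes out exactly zero, yet the pair is not independent. For instance, $S^2 = 81/2$ forces $\bar x = 33/2$.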

6. The rubbish collection department of a city wants to estimate the average weight of
refuse that households produce weekly. The amounts of waste produced during last
week by 200 randomly chosen households have an average value of 12 kg and a standard
deviation of 5.6 kg.
a. Determine a 90% confidence interval for the average weight of rubbish that
households produced last week.
b. What is the 95% confidence upper bound for the average weight of rubbish that
households produced last week?
c. If the average weight of rubbish that households produce is larger than 13 kg, then
the capacity of the rubbish collection department is exceeded. Use your answer to
b. to assess whether the capacity was exceeded last week.
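This sketch relies on a large-sample assumption. With $n = 200$, the sample standard deviation of 5.6 kg stands in for $\sigma$ and standard normal quantiles are used. A $t_{199}$ quantile would change the bounds only in the third decimal.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 200, 12.0, 5.6
se = s / sqrt(n)
z05 = NormalDist().inv_cdf(0.95)        # z_{0.05}, used by both a. and b.

ci_90 = (xbar - z05 * se, xbar + z05 * se)    # a. two-sided 90%
cub_95 = xbar + z05 * se                      # b. one-sided 95% upper bound
capacity_ok = cub_95 < 13                     # c. 95% confident that mu < 13 kg
print([round(v, 2) for v in ci_90], round(cub_95, 2), capacity_ok)
```

The 95% upper bound coincides with the right endpoint of the 90% interval, because both leave 5% in the upper tail.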


7. The manager of a fast-food restaurant wants to estimate the average amount of money
spent by his customers in a period of one month. The monthly expenses of 40
randomly chosen customers are recorded. Assume that the standard deviation of the
monthly expenses of all his customers is known to be 10 euro.
a. What is the width of a 95% confidence interval for the average monthly expenses
of all his customers?
b. Those 40 randomly chosen customers spent on average 53 euro during the
observed month. Calculate a 95% confidence interval for the average amount of
money that the customers spend in the fast-food restaurant during a period of one
month.
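Here is a sketch of both parts with the known $\sigma$ of 10 euro. The width in a. does not depend on the sample mean.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 10.0, 40
width = 2 * NormalDist().inv_cdf(0.975) * sigma / sqrt(n)   # a.

xbar = 53.0
ci_95 = (xbar - width / 2, xbar + width / 2)               # b.
print(round(width, 2), [round(v, 2) for v in ci_95])
```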

8. The standard deviation of the lifetimes of computer monitors is 100 hours. We intend
to draw a random sample of monitors with the objective of determining the average
lifetime of all monitors ($\mu$).
a. How large should the sample be for a 95% confidence interval for $\mu$ to have a
margin of error of at most 20 hours?
b. How large should the sample be for a 99% confidence interval for $\mu$ to have a
margin of error of at most 20 hours?
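The required sample size follows from solving $z_{\alpha/2}\,\sigma/\sqrt{n} \le 20$ for $n$ and rounding up. The sketch below uses a hypothetical helper `min_sample_size`.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(conf, sigma, max_error):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= max_error."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / max_error) ** 2)

n95 = min_sample_size(0.95, 100, 20)   # a.
n99 = min_sample_size(0.99, 100, 20)   # b.
print(n95, n99)
```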

9. A well-known (traditional) manufacturing process has produced millions of light bulbs
whose lifetimes have an average of 1200 hours and a standard deviation of 300 hours. The
engineering department of the company has designed a new manufacturing process.
The engineers claim that the new manufacturing process is better, because it produces
light bulbs with a longer average lifetime. They draw a random sample of 100 light
bulbs produced by the new manufacturing process, and these light bulbs have
lifetimes with an average of 1265 hours and a standard deviation of 306 hours.
a. Do they have to calculate a c.i., a c.l.b., or a c.u.b. for … to substantiate their claim?
b. Can the engineers be 95% confident that the new manufacturing process is better?
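This is a large-sample sketch. With $n = 100$, the sample standard deviation (306 h) replaces $\sigma$ and a normal quantile is used. The claim of a "longer average lifetime" calls for a one-sided lower bound, which is then compared with 1200 h. Using $t_{99,0.05} \approx 1.660$ instead would lower the bound by about half an hour.

```python
from math import sqrt
from statistics import NormalDist

mu_old, n, xbar, s = 1200.0, 100, 1265.0, 306.0
clb_95 = xbar - NormalDist().inv_cdf(0.95) * s / sqrt(n)
better = clb_95 > mu_old      # the whole 95% bound lies above the old average
print(round(clb_95, 1), better)
```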

10. The carrying-capacities of cables produced by some company are normally distributed.
The manager of the company claims that the average carrying-capacity of these cables
is at least 8000 kg. In a random sample of 6 of these cables, the carrying-capacities
have an average of 7750 kg and a standard deviation of 135 kg. Calculate, for the
average carrying-capacity of all cables that were, are, or will be produced by this
company,
- a 95% confidence interval;
- a 95% confidence lower bound;
- a 95% confidence upper bound.
Which of these three results is appropriate for the (possible) refutation of the claim
of the manager?
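With only $n = 6$ observations and $\sigma$ unknown, the bounds use Student $t$ quantiles with 5 degrees of freedom. Python's standard library has no $t$ distribution, so the sketch hard-codes the table values $t_{5,0.025} = 2.5706$ and $t_{5,0.05} = 2.0150$. With SciPy, `scipy.stats.t.ppf` would give the same values.

```python
from math import sqrt

n, xbar, s = 6, 7750.0, 135.0
se = s / sqrt(n)
t_025, t_05 = 2.5706, 2.0150          # upper t quantiles, 5 d.f. (from a t table)

ci_95 = (xbar - t_025 * se, xbar + t_025 * se)
clb_95 = xbar - t_05 * se
cub_95 = xbar + t_05 * se
# The claim "mu >= 8000 kg" is refuted if even the upper bound lies below 8000 kg
refuted = cub_95 < 8000
print([round(v, 1) for v in ci_95], round(clb_95, 1), round(cub_95, 1), refuted)
```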


11. We want to estimate the average area of apartments that are for sale in Knokke (a
town on the Belgian coast). The following dataset contains the areas (in square
metres) of 45 randomly chosen apartments that are for sale in Knokke:

 85  115  110  200  131
 80  120  116   70  135
 95  160   80  100  155
 78   90  135   80  150
 90   45  160  125  169
 85   78   75   96  135
115   50   85  153  177
 70   85   80  117  135
 78   87  130  106  191

a. Try to assess visually, by plotting some graphs, whether it is plausible that the
mother variable is normally distributed. What is your conclusion?
b. Calculate a 95% confidence interval for the average area of all apartments that are
for sale in Knokke.
c. The mayor of Knokke concludes that the area of a randomly chosen apartment that
is for sale in his town lies, with 95% certainty, in the 95% confidence interval
obtained in b. Comment on this conclusion.
d. Find a 99% confidence upper bound for the proportion of apartments with an area
of at least 100 m² in the population of all apartments that are for sale in
Knokke.
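Below is a sketch for b. and d.; the graphs in a. are left to a plotting tool. The data are copied row by row from the table above. The $t$ quantile $t_{44,0.025} \approx 2.0154$ is a hard-coded table value, and d. uses the usual normal approximation for a sample proportion.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

areas = [85, 115, 110, 200, 131, 80, 120, 116, 70, 135, 95, 160, 80, 100, 155,
         78, 90, 135, 80, 150, 90, 45, 160, 125, 169, 85, 78, 75, 96, 135,
         115, 50, 85, 153, 177, 70, 85, 80, 117, 135, 78, 87, 130, 106, 191]
n = len(areas)
xbar, s = mean(areas), stdev(areas)

t_025 = 2.0154                      # upper t quantile, 44 d.f. (from a t table)
half = t_025 * s / sqrt(n)
ci_95 = (xbar - half, xbar + half)  # b.

# d. proportion with area >= 100 m^2, normal approximation for a proportion
p_hat = sum(a >= 100 for a in areas) / n
cub_99 = p_hat + NormalDist().inv_cdf(0.99) * sqrt(p_hat * (1 - p_hat) / n)
print(round(xbar, 1), [round(v, 1) for v in ci_95], round(p_hat, 3), round(cub_99, 3))
```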

12. Functioning times of rechargeable batteries of mobile phones are normally distributed.
A student calculated a 95% confidence interval for the average functioning time of
these batteries and found (430, 470) minutes. This calculation was based on a sample
of 16 randomly chosen batteries.
a. What was the value of the sample mean?
b. What was the value of the sample standard deviation?
c. Calculate a 99% confidence interval for the average functioning time of these
batteries.
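The interval has the form $\bar x \pm t_{15,0.025}\,s/\sqrt{16}$, so its midpoint and half-width give $\bar x$ and $s$ directly. The sketch hard-codes the table values $t_{15,0.025} = 2.1314$ and $t_{15,0.005} = 2.9467$.

```python
from math import sqrt

lower, upper, n = 430.0, 470.0, 16
t_025, t_005 = 2.1314, 2.9467           # upper t quantiles, 15 d.f. (from a t table)

xbar = (lower + upper) / 2                       # a. midpoint
s = (upper - lower) / 2 * sqrt(n) / t_025        # b. from half-width = t * s / sqrt(n)
half_99 = t_005 * s / sqrt(n)
ci_99 = (xbar - half_99, xbar + half_99)         # c.
print(xbar, round(s, 2), [round(v, 1) for v in ci_99])
```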

13. A company processes oranges into juice. A random sample of size 20 is drawn from
every cargo of oranges that arrives at the factory. With each sample result, a 95%
confidence interval is calculated for the average amount of sugar in the oranges of the
cargo. Next week, twelve cargoes of oranges will arrive.
a. What is the probability that each of the twelve 95% confidence intervals will
contain the average amount of sugar in the oranges of the investigated cargo?
b. What is the probability that at least two of the twelve 95% confidence intervals
will not contain the average amount of sugar in the oranges of the investigated
cargo?
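Assume the twelve samples are independent and each interval covers its cargo's mean with probability exactly 0.95. Then the number of intervals that miss is binomially distributed with parameters 12 and 0.05. A sketch:

```python
from math import comb

k, conf = 12, 0.95
p_all = conf ** k                                         # a.
p_miss_at_most_one = sum(comb(k, j) * (1 - conf) ** j * conf ** (k - j) for j in (0, 1))
p_at_least_two = 1 - p_miss_at_most_one                   # b.
print(round(p_all, 4), round(p_at_least_two, 4))
```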


14. The carrying-capacities of cotton wires are normally distributed. The carrying-capacities
of 12 randomly chosen cotton wires have an average of 7.38 kg and a
standard deviation of 1.24 kg. Calculate a 95% confidence interval for the average
carrying-capacity of all cotton wires.
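Here is a $t$-interval sketch. The value $t_{11,0.025} = 2.2010$ is hard-coded from a $t$ table, because the standard library has no $t$ distribution.

```python
from math import sqrt

n, xbar, s = 12, 7.38, 1.24
t_025 = 2.2010                       # upper t quantile, 11 d.f. (from a t table)
half = t_025 * s / sqrt(n)
ci_95 = (xbar - half, xbar + half)
print([round(v, 2) for v in ci_95])
```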

15. In a study of the sales of bread in some region, a random sample of 60 bakeries
was drawn. The following table contains the numbers of large white loaves that were
sold last week in these 60 bakeries.

278  281  322  303  302  391
252  266  263  242  331  243
313  294  216  287  352  260
347  316  300  317  314  326
304  331  309  304  286  276
208  287  329  265  343  327
279  252  286  255  282  351
295  346  288  326  389  275
285  305  265  277  290  330
241  303  269  303  323  290

a. Try to assess visually, by plotting some graphs, whether it is plausible that the
mother variable is approximately normally distributed. What is your conclusion?
b. Calculate a 95% confidence interval for the average number of large white loaves
that were sold last week by all bakeries in this region.
c. The price of a large white loaf is 1.8 euro in all the bakeries in this region.
Determine a 95% confidence interval for the expected revenue during last week
from the sales of large white loaves in a randomly chosen bakery.
d. Find a 99% confidence interval for the proportion of bakeries in this region that
sold more than 320 large white loaves last week.
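Below is a sketch for b. to d., with the data copied row by row from the table. Part b. uses the table value $t_{59,0.025} \approx 2.0010$. Part c. simply rescales the interval by the fixed price (revenue $= 1.8 \times$ loaves), and part d. uses the normal approximation for a proportion.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

loaves = [278, 281, 322, 303, 302, 391, 252, 266, 263, 242, 331, 243,
          313, 294, 216, 287, 352, 260, 347, 316, 300, 317, 314, 326,
          304, 331, 309, 304, 286, 276, 208, 287, 329, 265, 343, 327,
          279, 252, 286, 255, 282, 351, 295, 346, 288, 326, 389, 275,
          285, 305, 265, 277, 290, 330, 241, 303, 269, 303, 323, 290]
n = len(loaves)
xbar, s = mean(loaves), stdev(loaves)

t_025 = 2.0010                               # upper t quantile, 59 d.f. (from a t table)
half = t_025 * s / sqrt(n)
ci_loaves = (xbar - half, xbar + half)       # b.

price = 1.8                                  # euro per loaf
ci_revenue = (price * ci_loaves[0], price * ci_loaves[1])   # c.

p_hat = sum(x > 320 for x in loaves) / n
half_p = NormalDist().inv_cdf(0.995) * sqrt(p_hat * (1 - p_hat) / n)
ci_prop = (p_hat - half_p, p_hat + half_p)   # d.
print(round(xbar, 1), [round(v, 1) for v in ci_loaves],
      [round(v, 1) for v in ci_revenue], [round(v, 3) for v in ci_prop])
```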

16. 500 out of 2500 randomly sampled households intend to buy a new car in
the near future. Determine a 99% confidence interval for the proportion of all
households that plan to buy a new car in the near future.
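Here is a sketch of the usual large-sample interval $\hat p \pm z_{0.005}\sqrt{\hat p(1-\hat p)/n}$:

```python
from math import sqrt
from statistics import NormalDist

n, x = 2500, 500
p_hat = x / n
half = NormalDist().inv_cdf(0.995) * sqrt(p_hat * (1 - p_hat) / n)
ci_99 = (p_hat - half, p_hat + half)
print([round(v, 4) for v in ci_99])
```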

17. Simonne and Valérie are the only candidates in the presidential election of France.
They are in a neck-and-neck race to convince the voters. An opinion poll aims to
estimate, with 99.73% confidence, the proportion of supporters of Simonne with an
accuracy of at least 0.03 (i.e. with an “error” that is at most 0.03). What is the minimal
value of the sample size to attain this goal?
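The margin of error of a proportion is largest at $p = 0.5$, which a neck-and-neck race suggests anyway. A confidence level of 99.73% corresponds to $z \approx 3$. A sketch:

```python
from math import ceil
from statistics import NormalDist

conf, max_error = 0.9973, 0.03
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 3, since P(|Z| <= 3) = 0.9973
p = 0.5                    # worst case: p(1 - p) is maximal at 0.5
n_min = ceil(z ** 2 * p * (1 - p) / max_error ** 2)
print(round(z, 3), n_min)
```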

18. We suspect that at most 20% of the households in Brussels live in poverty. We want
to estimate this proportion with 95% confidence and with an “error” that is at most
1% (i.e. 0.01). What is the minimal value of the sample size to reach this objective?
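Here the prior information $p \le 0.20$ can be used. Since $p(1-p)$ increases on $[0, 0.5]$, it follows that $p(1-p) \le 0.20 \times 0.80$. A sketch:

```python
from math import ceil
from statistics import NormalDist

conf, max_error, p_max = 0.95, 0.01, 0.20
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
# p(1 - p) increases on [0, 0.5], so p <= 0.20 implies p(1 - p) <= 0.20 * 0.80
n_min = ceil(z ** 2 * p_max * (1 - p_max) / max_error ** 2)
print(n_min)
```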


19. The soda manufacturer Loca Lola wants to know which proportion of the population
has noticed its newest advertising campaign. To solve this research problem, 450
randomly chosen people will be asked whether they noticed this newest advertising
campaign. Based on this sample, a 90% confidence interval will be determined for the
proportion of all people who have noticed the newest advertising campaign. Suppose
that we already know that this proportion lies between 40% and 70%.
a. Given this range, what is the largest possible value of the half-width of this 90%
confidence interval?
b. Given this range, what is the smallest possible value of the half-width of this 90%
confidence interval?
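This sketch assumes the half-width is computed with the true proportion, $z_{0.05}\sqrt{p(1-p)/450}$. Because $p(1-p)$ peaks at 0.5 and decreases away from it, the extremes over $[0.4, 0.7]$ occur at $p = 0.5$ and at $p = 0.7$. The name `half_width` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

n = 450
z = NormalDist().inv_cdf(0.95)     # 90% two-sided

def half_width(p):
    """Half-width z_{0.05} * sqrt(p (1 - p) / n), using the true proportion p."""
    return z * sqrt(p * (1 - p) / n)

largest = half_width(0.5)                            # 0.5 lies inside [0.4, 0.7]
smallest = min(half_width(0.4), half_width(0.7))     # endpoint farthest from 0.5
print(round(largest, 4), round(smallest, 4))
```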

20. Some quantity is normally distributed in the population with unknown mean $\mu$ and
known standard deviation $\sigma$. A random sample yields (12.2, 14.2) as a 75% confidence
interval for $\mu$. Based on the same sample, (11.2, 15.2) is found as another confidence
interval for $\mu$. What is the confidence level of the latter interval?
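Both intervals are centred at 13.2, so they differ only in the normal quantile that multiplies $\sigma/\sqrt{n}$. The sketch recovers $\sigma/\sqrt{n}$ from the 75% interval and then computes the level of the wider one.

```python
from statistics import NormalDist

z = NormalDist()
lo75, hi75 = 12.2, 14.2            # 75% interval
lo_b, hi_b = 11.2, 15.2            # interval with unknown level
se = (hi75 - lo75) / 2 / z.inv_cdf(0.875)    # half-width = z_{0.125} * sigma / sqrt(n)
z_b = (hi_b - lo_b) / 2 / se                 # quantile behind the wider interval
level = 2 * z.cdf(z_b) - 1
print(f"{level:.2%}")
```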

21. Solve the previous problem again, but assume that $\sigma$ is unknown, when the sample
size is
a. $n = 5$;
b. $n = 50$.
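With $\sigma$ unknown, both intervals have the form $\bar x \pm t\,s/\sqrt{n}$ with the same $\bar x$ and $s$. The doubled half-width therefore means the second interval uses the quantile $2\,t_{n-1,0.125}$. The standard library has no Student $t$ distribution, so this sketch integrates the $t$ density numerically (Simpson's rule) and inverts it by bisection. The names `t_pdf`, `t_cdf`, `t_quantile` and `level_of_wider_interval` are helpers introduced here, not library functions. With SciPy installed, `scipy.stats.t.cdf` and `scipy.stats.t.ppf` would replace the first three.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(t, df, steps=1000):
    """P(T <= t), integrating the density over [0, |t|] with Simpson's rule (steps even)."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_quantile(p, df):
    """Solve t_cdf(q, df) = p for 0.5 <= p < 1 by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def level_of_wider_interval(n):
    """Level of (11.2, 15.2) given that (12.2, 14.2) is a 75% t-interval from n observations."""
    q = t_quantile(0.875, n - 1)        # 75% two-sided: half-width 1 = q * s / sqrt(n)
    return 2 * t_cdf(2 * q, n - 1) - 1  # half-width 2 = (2q) * s / sqrt(n)

level_n5 = level_of_wider_interval(5)
level_n50 = level_of_wider_interval(50)
print(round(level_n5, 4), round(level_n50, 4))
```

Compared with problem 20, the level drops most for $n = 5$ (to about 94.5%). For $n = 50$ it is already close to the known-$\sigma$ answer.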

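22. An unusual (not only because it is so small) sample $(X_1, X_2)$ has the properties
$\mu_1 = E(X_1) = 2$, $\mu_2 = E(X_2) = -1$, $\sigma_1^2 = \mathrm{Var}(X_1) = 3$, $\sigma_2^2 = \mathrm{Var}(X_2) = 4$, and
$\rho = \mathrm{Corr}(X_1, X_2) = -0.9$. Let
$$T_1 = X_1 + X_2 \quad\text{and}\quad T_2 = \frac{X_1 - X_2}{3}.$$
a. Show that both $T_1$ and $T_2$ are unbiased estimators of $\theta = 1$.
b. Which of these two estimators is the more efficient estimator of $\theta = 1$?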

The following exercises are less standard, in the sense that they don’t use formulae that you
already know. If you are successful in solving them, then you demonstrate that you
understand the reasoning that was used in class. However, you won’t be questioned about
such problems on the exam.


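23. The sample correlation $R_n$ is defined by
$$R_n = \frac{\mathrm{cov}\big((X_1,Y_1),\ldots,(X_n,Y_n)\big)}{\sqrt{\mathrm{var}(X_1,\ldots,X_n)\,\mathrm{var}(Y_1,\ldots,Y_n)}} = \frac{\sum_{i=1}^n (X_i-\bar X)(Y_i-\bar Y)}{\sqrt{\sum_{i=1}^n (X_i-\bar X)^2\,\sum_{i=1}^n (Y_i-\bar Y)^2}}.$$
If $(X, Y) \sim N(\mu_X, \sigma_X^2, \mu_Y, \sigma_Y^2, \rho)$, then $\sqrt{n}\,(R_n - \rho) \approx N\big(0, (1 - \rho^2)^2\big)$ for large $n$. Since $R_n$
is a consistent estimator of the population correlation $\rho$, we can conclude that
$$\sqrt{n}\,\frac{R_n - \rho}{1 - R_n^2} \approx N(0, 1) \quad \text{if } n \text{ is sufficiently large.}$$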

a. Use this property to derive a formula for a $100(1 - \alpha)\%$ c.i. for $\rho$.
b. Suppose that $n = 100$ and $r = 0.5$ (the observed value of $R_n$). Use your result of a.
to calculate a 95% c.i. for $\rho$.
If $(X, Y) \sim N(\mu_X, \sigma_X^2, \mu_Y, \sigma_Y^2, \rho)$, then the delta method (from mathematical statistics)
shows that $\sqrt{n}\,[g(R_n) - g(\rho)]$ is asymptotically $N(0, 1)$ distributed. Here, $g$ is the
function that maps $x \in (-1, 1)$ to $g(x) = \ln[(1 + x)/(1 - x)]/2$. It is called the Fisher
transformation (also known as Fisher's $z$-transformation), and it is an example of a variance-stabilising
transformation (which is a favourable property).
c. Use the latter property to derive a formula for a $100(1 - \alpha)\%$ c.i. for $g(\rho)$.
d. Suppose that $n = 100$ and $r = 0.5$ (the observed value of $R_n$). Use your result of c.
to calculate a 95% c.i. for $g(\rho)$.
e. Apply the inverse transformation $g^{-1}(y) = (e^{2y} - 1)/(e^{2y} + 1)$, where
$-\infty < y < \infty$, and your result of d. to obtain a 95% c.i. for $\rho$.
f. Compare your results in b. and e.
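Here is a sketch of b., d. and e. using the text's large-sample approximations. Note that the text uses $\sqrt{n}$, whereas many textbooks use $\sqrt{n-3}$ for the Fisher interval. Python's `math.atanh` and `math.tanh` are exactly $g$ and $g^{-1}$.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

n, r = 100, 0.5
z = NormalDist().inv_cdf(0.975)

# b.: sqrt(n) (R - rho) / (1 - R^2) ~ N(0, 1)  =>  r -/+ z (1 - r^2) / sqrt(n)
half = z * (1 - r ** 2) / sqrt(n)
ci_direct = (r - half, r + half)

# d.: sqrt(n) (g(R) - g(rho)) ~ N(0, 1), with g = atanh  =>  g(r) -/+ z / sqrt(n)
ci_g = (atanh(r) - z / sqrt(n), atanh(r) + z / sqrt(n))

# e.: map back with g^{-1} = tanh (increasing, so endpoints map to endpoints)
ci_fisher = (tanh(ci_g[0]), tanh(ci_g[1]))
print([round(v, 3) for v in ci_direct], [round(v, 3) for v in ci_fisher])
```

The back-transformed interval is not symmetric around 0.5, and by construction it always stays inside $(-1, 1)$.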

24. If $X \sim N(\mu, \sigma^2)$, then $(n - 1)S^2/\sigma^2 \sim \chi^2_{n-1}$, the chi-square distribution with $n - 1$
degrees of freedom.
a. Use this property (and define $\chi^2_{n-1,\alpha}$ analogously to $t_{n-1,\alpha}$) to derive a $100(1 - \alpha)\%$
c.l.b. for $\sigma^2$.
b. Use this property to derive a $100(1 - \alpha)\%$ c.u.b. for $\sigma^2$.
Define a $100(1 - \alpha)\%$ c.i. for $\sigma^2$ by
$$100(1-\alpha)\%\ \text{c.i. for } \sigma^2 = \big(100(1-\alpha/2)\%\ \text{c.l.b. for } \sigma^2,\ 100(1-\alpha/2)\%\ \text{c.u.b. for } \sigma^2\big).$$

c. Suppose that $n = 10$ and $s = 5.6$ (the observed value of $S$). Use the formula above
to calculate a 95% c.i. for $\sigma^2$.
d. Use your result in c. to calculate a 95% c.i. for $\sigma$.
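Here is a sketch of c. and d. The $\chi^2_9$ quantiles are hard-coded from a standard table: $P(\chi^2_9 \le 2.700) = 0.025$ and $P(\chi^2_9 \le 19.023) = 0.975$. The 97.5% c.l.b. divides by the larger quantile and the 97.5% c.u.b. by the smaller one.

```python
from math import sqrt

n, s = 10, 5.6
# chi-square quantiles with n - 1 = 9 d.f., from a standard table:
# P(chi2_9 <= 2.700) = 0.025 and P(chi2_9 <= 19.023) = 0.975
q_low, q_high = 2.700, 19.023

ci_var = ((n - 1) * s ** 2 / q_high, (n - 1) * s ** 2 / q_low)   # c.
ci_sd = (sqrt(ci_var[0]), sqrt(ci_var[1]))                         # d.
print([round(v, 2) for v in ci_var], [round(v, 2) for v in ci_sd])
```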

25. The chi-square distribution with $k$ degrees of freedom is the probability distribution
of a sum of squared independent standard normally distributed variables $Z_1, \ldots, Z_k$, i.e.
$Z_1^2 + \cdots + Z_k^2 \sim \chi^2_k$. Show that the expectation and variance of a $\chi^2_k$-distribution are $k$
and $2k$. Hint: if $Z \sim N(0, 1)$, then $E(Z^4) = 3$ (which is the kurtosis of $Z$).
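A simulation is no proof, but it is a cheap sanity check of the result. The sketch draws sums of $k = 5$ squared standard normal variables with `random.gauss` and compares the empirical mean and variance with $k$ and $2k$. The seed and the number of repetitions are arbitrary choices.

```python
import random
from statistics import fmean

random.seed(2021)                      # fixed seed so the run is reproducible
k, reps = 5, 100_000
# each draw: sum of k squared independent standard normal variables
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(reps)]
sim_mean = fmean(draws)
sim_var = fmean((x - sim_mean) ** 2 for x in draws)
print(round(sim_mean, 2), round(sim_var, 2))   # should be close to k and 2k
```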


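26. In PowerPoint 4, the expectation and variance of $S^2$ are given:
$$E(S^2) = \sigma^2 \quad\text{and}\quad \mathrm{Var}(S^2) = \sigma^4\left(\frac{2}{n-1} + \frac{\kappa}{n}\right),$$
where $\kappa = E\{[(X - \mu)/\sigma]^4\} - 3$ is the population excess kurtosis (Greek letter kappa).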
The word “excess” refers to a value that is measured relative to the kurtosis of the
standard normal distribution (if $Z \sim N(0, 1)$, then $E(Z^4) = 3$). From the expression
for $\mathrm{Var}(S^2)$, which must be nonnegative for every $n$, it is seen that $\kappa \ge -2$ (it is easily verified that the smallest possible value $-2$
is reached by the Bernoulli distribution with parameter $p = 0.5$, i.e. $\mathcal{B}(1, 0.5)$). A
central limit theorem also exists for $S^2$, namely
$$\frac{S^2 - E(S^2)}{\sqrt{\mathrm{Var}(S^2)}} \approx N(0, 1) \quad \text{if } n \text{ is sufficiently large.}$$
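The sample excess kurtosis
$$K_n = \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^4}{\left[\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2\right]^2} - 3$$
is a consistent (but biased) estimator of $\kappa$. Therefore, and since we can replace $n - 1$
by $n$ in an asymptotic result, we obtain the property
$$\frac{S^2 - \sigma^2}{S^2\sqrt{(2 + K_n)/n}} \approx N(0, 1) \quad \text{if } n \text{ is sufficiently large.}$$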
a. Use this property to derive a formula for a $100(1 - \alpha)\%$ c.i. for $\sigma^2$.
b. Suppose that $n = 100$, $s = 5.6$, and $k = 1.2$ (the observed value of $K_n$). Calculate a
95% c.i. for $\sigma^2$.
c. Use your result of b. to find a 95% c.i. for $\sigma$.
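Let $c = \sqrt{(2 + K_n)/n}$. Solving $-z_{\alpha/2} \le (S^2 - \sigma^2)/(S^2 c) \le z_{\alpha/2}$ for $\sigma^2$ gives $S^2(1 - z_{\alpha/2}c) \le \sigma^2 \le S^2(1 + z_{\alpha/2}c)$. Here is a sketch of b. and c. under that derivation:

```python
from math import sqrt
from statistics import NormalDist

n, s, k = 100, 5.6, 1.2
z = NormalDist().inv_cdf(0.975)
c = sqrt((2 + k) / n)          # estimated relative standard error of S^2
s2 = s ** 2

ci_var = (s2 * (1 - z * c), s2 * (1 + z * c))    # b.
ci_sd = (sqrt(ci_var[0]), sqrt(ci_var[1]))       # c. (both bounds are positive here)
print([round(v, 2) for v in ci_var], [round(v, 2) for v in ci_sd])
```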

27. This exercise contains basic properties of estimators.
a. The efficiency of an estimator $T$ for $\theta$ is measured by the expected squared
deviation of this estimator with respect to the estimated population parameter $\theta$,
i.e. $E[(T - \theta)^2]$. Show that this measure can be decomposed into a variance term
and a bias term, i.e. $E[(T - \theta)^2] = \mathrm{Var}(T) + [E(T) - \theta]^2$.
b. Prove Chebyshev’s inequality: $P(|T - \theta| > \varepsilon) \le E[(T - \theta)^2]/\varepsilon^2$ for any $\varepsilon > 0$.
- By assuming that $T$ is a continuous random variable;
- In the general case.
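c. Let $\bar X$ and $S^2$ be the mean and the variance of the measurements $X_1, \ldots, X_n$ in a
random sample with replacement. Show that $\bar X$ and $S^2$ are uncorrelated if and
only if (⇔) the mother variable $X$ has a symmetric distribution.¹ Hint: the
symmetry is measured with the skewness parameter $\gamma = E(Z^3)$, with
$Z = (X - \mu)/\sigma$, and
$$\mathrm{Cov}(\bar X, S^2) = \frac{1}{n(n-1)}\sum_{i,j=1}^n \mathrm{Cov}\big[X_i, (X_j - \bar X)^2\big] = \frac{\sigma^3}{n(n-1)}\sum_{i,j=1}^n E\big[Z_i (Z_j - \bar Z)^2\big].$$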

¹ Remember that for the normal distribution a stronger property holds:
$\bar X$ and $S^2$ are independent $\Leftrightarrow$ $X \sim N(\mu, \sigma^2)$.
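The double sum in the hint of c. simplifies to $\mathrm{Cov}(\bar X, S^2) = \gamma\sigma^3/n = E[(X-\mu)^3]/n$. The sketch below checks this identity exactly by enumerating all samples with replacement. It uses the symmetric population of problem 5 and a hypothetical right-skewed population `[1, 2, 6]` chosen only for illustration. The helper names `cov_mean_var` and `third_central_moment` are introduced here.

```python
from fractions import Fraction
from itertools import product

def cov_mean_var(population, n):
    """Exact Cov(sample mean, S^2) for sampling with replacement from a finite population."""
    samples = list(product(population, repeat=n))
    p = Fraction(1, len(samples))
    stats = []
    for s in samples:
        xbar = Fraction(sum(s), n)
        stats.append((xbar, sum((x - xbar) ** 2 for x in s) / (n - 1)))
    e_xbar = sum(p * xb for xb, _ in stats)
    e_s2 = sum(p * s2 for _, s2 in stats)
    return sum(p * (xb - e_xbar) * (s2 - e_s2) for xb, s2 in stats)

def third_central_moment(population):
    """E[(X - mu)^3] for X uniform on the population values."""
    mu = Fraction(sum(population), len(population))
    return sum((x - mu) ** 3 for x in population) / len(population)

symmetric = [12, 15, 18, 21]     # population of problem 5
skewed = [1, 2, 6]               # hypothetical right-skewed population
for pop in (symmetric, skewed):
    # should print two equal numbers: Cov(Xbar, S^2) and mu_3 / n with n = 3
    print(cov_mean_var(pop, 3), third_central_moment(pop) / 3)
```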


28. In the previous exercises, it was always assumed that the random sampling took place
with replacement (or that the sample was negligibly small when compared with the
population). In that case, we know that $\mathrm{Var}(\bar X) = \sigma^2/n$ and $E(S^2) = \sigma^2$. Hence, $S^2$ is
an unbiased estimator of $\sigma^2$ and $S^2/n$ is an unbiased estimator of $\mathrm{Var}(\bar X)$. These
properties do not hold for random sampling without replacement (from a finite
population), which is the case that we study in this exercise. In problems set 1, you
were asked to prove

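$$E(\bar X) = \mu \quad\text{and}\quad \mathrm{Var}(\bar X) = E[(\bar X - \mu)^2] = \frac{\sigma^2}{n}\cdot\frac{N - n}{N - 1},$$
with $N$ the population size and $(N - n)/(N - 1)$ the finite-population correction factor.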
a. Determine $E(S^2)$.
b. Use the result of a. to construct an unbiased estimator for $\sigma^2$.
c. Use the result of a. to construct an unbiased estimator for $\mathrm{Var}(\bar X)$.
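Here is a brute-force check on a small example. It assumes the population of problem 5 ($N = 4$) and samples of size $n = 2$ drawn without replacement. It confirms the two results quoted from problems set 1 and shows that $E(S^2)$ now differs from $\sigma^2$, which is the starting point for a. to c. The choice of population is only an illustration, and $\sigma^2$ uses divisor $N$.

```python
from fractions import Fraction
from itertools import permutations

population = [12, 15, 18, 21]      # example: the population of problem 5
N, n = len(population), 2
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N    # population variance, divisor N

# All ordered samples without replacement are equally likely
samples = list(permutations(population, n))
p = Fraction(1, len(samples))
stats = []
for s in samples:
    xbar = Fraction(sum(s), n)
    stats.append((xbar, sum((x - xbar) ** 2 for x in s) / (n - 1)))

e_xbar = sum(p * xb for xb, _ in stats)
var_xbar = sum(p * (xb - mu) ** 2 for xb, _ in stats)
e_s2 = sum(p * s2 for _, s2 in stats)
fpc = Fraction(N - n, N - 1)       # finite-population correction factor

print(e_xbar == mu, var_xbar == sigma2 / n * fpc)   # the results from problems set 1
print(e_s2, sigma2)                # E(S^2) and sigma^2 no longer coincide
```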
