Set - 2 - 2023 - Review - Outline Solutions
TOPIC 1
T1_Q1: Choose the one alternative that best completes the statement or answers the question.
1) The methods involving the collection, presentation of data in tables, figures and graphs, and characterization of a set
of data in order to properly describe the various features of that set of data are called
A) the scientific method. B) statistical inference. C) sampling. D) descriptive statistics.
2) The estimation of the population average family expenditure on food based on the sample average expenditure of
1,000 families is an example of
A) a statistic. B) a parameter. C) inferential statistics. D) descriptive statistics.
3) The study of the collection, analysis, summarization, organization and interpretation of data is known as
A) Economics B) Mathematics C) Statistics. D) None of the above
4) A summary measure that is computed to describe a characteristic from only a sample of the population is called
A) a statistic. B) the scientific method. C) a census. D) a parameter.
10) According to the empirical rule, if the data form a ʺbell-shapedʺ normal distribution, _____ percent of the
observations will be contained within 2 standard deviations around the arithmetic mean.
A) 68 B) 95 C) 93.75 D) 9.99
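\[ \bar{X} = \frac{\sum X}{n} = \frac{73.7}{5} = 14.74 \text{ billion \$} \]
\[ s = \sqrt{\frac{1}{n-1}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]} = \sqrt{\frac{1}{4}\left[1132.83 - \frac{(73.7)^2}{5}\right]} = \sqrt{11.623} = 3.41 \text{ billion \$} \]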
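The shortcut formula above can be checked numerically. The following is a minimal sketch that works only from the sums quoted in the solution (n = 5, ΣX = 73.7, ΣX² = 1132.83), since the raw observations are not shown; the function name is hypothetical.

```python
import math

def mean_and_sd_from_sums(sum_x, sum_x2, n):
    """Sample mean and sample standard deviation from sum(X), sum(X^2) and n."""
    mean = sum_x / n
    # Shortcut formula: s^2 = [sum(X^2) - (sum(X))^2 / n] / (n - 1)
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, math.sqrt(variance)

# Sums taken from the worked solution above
mean, sd = mean_and_sd_from_sums(73.7, 1132.83, 5)
print(round(mean, 2), round(sd, 2))
```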
T1_Q3: A university class contained 60 female students. They were asked to report their heights (in cms), with the following
results. n=60 mean = 163.39 standard deviation = 8.68
Assume that heights follow a bell or mound-shaped distribution.
(1) Within what range would you expect 95% of the heights to lie?
By the empirical rule, we would expect 95% of the heights to lie within 2 standard deviations of the mean, hence:
\[ \bar{X} \pm 2(\text{standard deviation}) = 163.39 \pm 2(8.68) = 163.39 \pm 17.36 \]
[146.03 cm, 180.75 cm]
Note: The main purpose of this question is to apply the empirical rule. However, it could be argued that it would be more accurate to use a 95% confidence interval in part 1, and to obtain exact probabilities in parts 2 and 3. Both are acceptable.
(2) Find the z-score of Jenny, a female student whose height is 155.22 cm. Approximately how many students are shorter
than Jenny?
\[ Z = \frac{X - \bar{X}}{S} = \frac{155.22 - 163.39}{8.68} = -0.94 \]
(3) Tina has a height of 195 cm. What is her z score? Approximately how many students in the class do you think would
be taller than Tina?
\[ Z = \frac{X - \bar{X}}{S} = \frac{195 - 163.39}{8.68} = 3.64 \]
None. Tina is the tallest.
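As a rough check on parts 2 and 3, the sketch below computes both z-scores and the expected head counts. It assumes heights are exactly normal with the sample mean and standard deviation, which is only an approximation for a class of 60; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd, n = 163.39, 8.68, 60   # summary values from the question
heights = NormalDist(mean, sd)   # assumption: heights are exactly normal

def z_score(x):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

z_jenny = z_score(155.22)
z_tina = z_score(195)

# Expected head counts under the normal model (not necessarily whole numbers)
shorter_than_jenny = n * heights.cdf(155.22)
taller_than_tina = n * (1 - heights.cdf(195))

print(round(z_jenny, 2), round(shorter_than_jenny, 1))
print(round(z_tina, 2), round(taller_than_tina, 3))
```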
T2_Q1: Choose the one alternative that best completes the statement or answers the question.
1) The manager of the customer service division of a major consumer electronics company is interested in determining
whether the customers who have purchased a DVD player made by the company over the past 12 months are
satisfied with their products. Consider the possible responses to the question "How many people are there in your
household?" The data collected from this question are an example of a
A) categorical random variable. B) discrete numerical random variable.
C) parameter. D) continuous numerical random variable.
2) Which of the following is a discrete numerical variable?
A) The Dow Jones Industrial average B) The distance you drove yesterday
C) The volume of water released from a dam D) The number of employees of an insurance company
3) Which of the following is a continuous numerical variable?
A) The number of gallons of milk sold at the local grocery store yesterday
B) The color of a student's eyes
C) The amount of milk produced by a cow in one 24-hour period
D) The number of employees of an insurance company
4) If two events are collectively exhaustive, what is the probability that both occur at the same time?
A) 0.50 B) 1.00 C) 0 D) Cannot be determined from the information given.
Note: We do not know if the events are mutually exclusive.
5) If two events are mutually exclusive and collectively exhaustive, what is the probability that one or the other
occurs?
A) 1.00 B) 0.50 C) 0 D) Cannot be determined from the information given.
T2_Q2: A survey conducted by the Segal Company of New York found that in a sample of 170 large companies, 40 offered
stock options to their board members as part of their noncash compensation packages. For small- to mid-sized companies,
43 of the 180 surveyed indicated that they offer stock options as part of their noncash compensation packages to their board
members.
                 Company Size
Stock Options    Large (L)    Small to midsized (SM)    Total
Yes (Y)          40           43                        83
No (N)           130          137                       267
Total            170          180                       350
If a company is selected at random,
1) What is the probability that the company offered stock options to their board members?
P(Y) = 83/350 = 0.2371
2) What is the probability that the company is small to mid-sized and did not offer stock options to their board
members?
P(SM and N) = 137/350 = 0.3914
3) What is the probability that the company is small to mid-sized or offered stock options to their board members?
P(SM or Y) = (180+83-43)/350 = 0.6286
4) If a randomly selected company offered stock options to their board members, what is the probability that it is a large
company?
P(L | Y ) = 40/83 = 0.4819
5) Let A = the company offered stock options to their board members and B = the company is large. Are the events A and B
independent? Show your working.
A and B are independent if P(A and B) = P(A) P(B)
We have: P(A) = P(Y) = 83/350 = 0.2371; P(B) = P(L) = 170/350 = 0.4857; P(A and B) = P(Y and L) = 40/350 = 0.1143
And: P(A) P(B) = 0.2371*0.4857 = 0.1152
Thus: 0.1152 ≠ 0.1143, therefore events A and B are not independent.
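The five answers above can be reproduced directly from the cell counts. The following is a minimal sketch using exact fractions so that the independence check does not depend on rounding; the helper `prob` and the variable names are illustrative.

```python
from fractions import Fraction

# Cell counts from the table: (stock options, company size) -> number of companies
counts = {
    ("Y", "L"): 40, ("Y", "SM"): 43,
    ("N", "L"): 130, ("N", "SM"): 137,
}
total = sum(counts.values())

def prob(event):
    """Exact probability that a randomly selected company satisfies event(options, size)."""
    favourable = sum(c for (opt, size), c in counts.items() if event(opt, size))
    return Fraction(favourable, total)

p_y = prob(lambda opt, size: opt == "Y")
p_sm_and_n = prob(lambda opt, size: size == "SM" and opt == "N")
p_sm_or_y = prob(lambda opt, size: size == "SM" or opt == "Y")
p_l = prob(lambda opt, size: size == "L")
p_y_and_l = prob(lambda opt, size: opt == "Y" and size == "L")
p_l_given_y = p_y_and_l / p_y          # conditional probability P(L | Y)

# Independence holds only if P(Y and L) equals P(Y) * P(L) exactly
independent = (p_y_and_l == p_y * p_l)

for name, value in [("P(Y)", p_y), ("P(SM and N)", p_sm_and_n),
                    ("P(SM or Y)", p_sm_or_y), ("P(L | Y)", p_l_given_y)]:
    print(name, round(float(value), 4))
print("independent:", independent)
```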
T3_Q1: A lab orders 100 rats a week for each of the 52 weeks in the year for experiments that the lab conducts. Prices for 100
rats follow the following distribution:
Price:        $10.00    $12.50    $15.00
Probability:  0.35      0.40      0.25
Note: Work out the expected price first, then multiply by 52.
1) How much should the lab budget for next year's rat orders be, assuming this distribution does not change?
A) $520 B) $780 C) $650 D) $637
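The two-step calculation (expected weekly price, then 52 weeks) can be sketched as follows; the variable names are illustrative.

```python
prices = [10.00, 12.50, 15.00]   # price of one weekly order of 100 rats
probs = [0.35, 0.40, 0.25]       # probability of each price

# Expected price of one weekly order: sum of price * probability
expected_price = sum(p * x for p, x in zip(probs, prices))

# Budget for 52 weekly orders
annual_budget = 52 * expected_price
print(round(expected_price, 2), round(annual_budget, 2))
```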
T3_Q2: Suppose that past history shows that 65% of college students prefer Brand C cola. A
sample of 5 students is to be selected and the following PHStat output is generated.
X    P(X)        P(<=X)      P(<X)       P(>X)       P(>=X)
0    0.005252    0.005252    0           0.994748    1
1    0.04877     0.054023    0.005252    0.945978    0.994748
2    0.181147    0.235169    0.054022    0.764831    0.945978
3    0.336416    0.571585    0.235169    0.428415    0.764831
4    0.312386    0.883971    0.571585    0.116029    0.428415
5    0.116029    1           0.883971    0           0.116029
Note: Answers can be given to 4 decimal places.
5) Calculate the mean and standard deviation of students who would prefer brand C cola.
\[ \mu = n\pi = 5 \times 0.65 = 3.25 \]
\[ \sigma = \sqrt{n\pi(1 - \pi)} = \sqrt{5(0.65)(0.35)} = 1.07 \text{ (2 decimal places)} \]
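The P(X) column of the PHStat output and the two summary measures can be reproduced from the binomial formula. This is a small sketch with n = 5 and π = 0.65 as given in the question; it is not the PHStat implementation.

```python
from math import comb, sqrt

n, pi = 5, 0.65   # sample size and probability of preferring Brand C

# Binomial probabilities P(X = x) for x = 0..5
pmf = [comb(n, x) * pi ** x * (1 - pi) ** (n - x) for x in range(n + 1)]

mean = n * pi                     # mu = n * pi
sd = sqrt(n * pi * (1 - pi))      # sigma = sqrt(n * pi * (1 - pi))

for x, p in enumerate(pmf):
    print(x, round(p, 6))
print(round(mean, 2), round(sd, 2))
```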
T3_Q3:
1) Given the standard normal distribution (with mean 0 and standard deviation of 1), what is the probability that:
B) X < 70?
\[ P(X < 70) = P\left(Z < \frac{70 - 100}{10}\right) = P(Z < -3.00) = 0.00135 \]
\( P(X > X_0) = 0.05 \), or equivalently \( P(X < X_0) = 0.95 \), i.e. \( P\left(Z < \frac{X_0 - 100}{10}\right) = 0.95 \). But we know \( P(Z < 1.645) = 0.95 \), therefore:
\[ 1.645 = \frac{X_0 - 100}{10} \quad\Rightarrow\quad X_0 = (1.645 \times 10) + 100 = 116.45 \]
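Both calculations above (a tail probability and a percentile) can be checked with Python's standard library. The sketch assumes X is normal with mean 100 and standard deviation 10, as the worked lines imply.

```python
from statistics import NormalDist

# Assumption: X ~ Normal(mean 100, standard deviation 10), read off the working above
x_dist = NormalDist(mu=100, sigma=10)

# Tail probability P(X < 70)
p_below_70 = x_dist.cdf(70)

# Value X0 with P(X > X0) = 0.05, i.e. the 95th percentile
x0 = x_dist.inv_cdf(0.95)

print(round(p_below_70, 5), round(x0, 2))
```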
T3_Q4:
A company that sells annuities must base the annual payout on the probability distribution on the length of life of the
participants in the plan. Suppose the probability distribution of the lifetimes of the participants in the plan is approximately
a normal distribution with mean, 68 years and standard deviation, 3.5 years:
\[ P(X > 70) = P\left(Z > \frac{70 - 68}{3.5}\right) = P(Z > 0.57) = 1 - P(Z < 0.57) = 1 - 0.7157 = 0.2843 \]
28.43% of participants
c) beyond age 75?
\[ P(X > 75) = P\left(Z > \frac{75 - 68}{3.5}\right) = P(Z > 2.00) = 1 - P(Z < 2.00) = 1 - 0.9772 = 0.0228 \]
2.28% of participants
d) between 65 and 75 years old.
𝑃(65 < 𝑋 < 75) = 𝑃(−0.86 < 𝑍 < 2.00) = 𝑃(𝑍 < 2.00) − 𝑃(𝑍 < −0.86) = 0.9772 − 0.1949 = 0.7823
78.23% of participants
2) Complete the following statement: Only 10% of plan participants will receive payment beyond age _____?
\( P(X > X_0) = 0.10 \), or equivalently \( P(X < X_0) = 0.90 \), i.e. \( P\left(Z < \frac{X_0 - 68}{3.5}\right) = 0.90 \). But we know \( P(Z < 1.28) = 0.90 \), therefore:
\[ 1.28 = \frac{X_0 - 68}{3.5} \quad\Rightarrow\quad X_0 = (1.28 \times 3.5) + 68 = 72.48 \]
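The annuity answers can be cross-checked without tables. In this sketch the software keeps z unrounded, so its results differ slightly in the third or fourth decimal place from the table-based values above, which round z to two decimals.

```python
from statistics import NormalDist

# Lifetimes: normal with mean 68 years and standard deviation 3.5 years (from the question)
lifetime = NormalDist(mu=68, sigma=3.5)

p_beyond_70 = 1 - lifetime.cdf(70)
p_beyond_75 = 1 - lifetime.cdf(75)
p_65_to_75 = lifetime.cdf(75) - lifetime.cdf(65)

# Age beyond which only 10% of participants receive payment (90th percentile)
age_top_10 = lifetime.inv_cdf(0.90)

print(round(p_beyond_70, 4), round(p_beyond_75, 4), round(p_65_to_75, 4))
print(round(age_top_10, 2))
```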
T4_Q1: A random sample of 16 is drawn from a normal population with mean equal to 15 and standard deviation 2.
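1) What are the mean and the standard deviation of the sampling distribution of \( \bar{X} \)?
\[ \mu_{\bar{X}} = \mu = 15 \]
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{16}} = 0.50 \]
Note: Pay attention to the denominator in the formula for Z.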
2) Find the value of:
a) \( P(\bar{X} > 15.5) \)
\[ P(\bar{X} > 15.5) = P\left(Z > \frac{15.5 - 15}{2/\sqrt{16}}\right) = P(Z > 1.00) = 1 - P(Z < 1.00) = 1 - 0.8413 = 0.1587 \]
b) \( P(\bar{X} < 14) \)
\[ P(\bar{X} < 14) = P\left(Z < \frac{14 - 15}{2/\sqrt{16}}\right) = P(Z < -2.00) = 0.0228 \]
c) \( P(\bar{X} > 18) \)
\[ P(\bar{X} > 18) = P\left(Z > \frac{18 - 15}{2/\sqrt{16}}\right) = P(Z > 6.00) = 1 - P(Z < 6.00) = 1 - 0.99999999 \approx 0 \]
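3) Find \( \bar{X}_0 \) such that \( P(\bar{X} > \bar{X}_0) = 0.60 \).
\( P(\bar{X} > \bar{X}_0) = 0.60 \), or equivalently \( P(\bar{X} < \bar{X}_0) = 0.40 \), i.e. \( P\left(Z < \frac{\bar{X}_0 - 15}{0.50}\right) = 0.40 \). But we know \( P(Z < -0.25) = 0.40 \), therefore:
\[ -0.25 = \frac{\bar{X}_0 - 15}{0.50} \quad\Rightarrow\quad \bar{X}_0 = (-0.25 \times 0.50) + 15 = 14.875 \]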
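The key step in this question is dividing by the standard error σ/√n rather than by σ. A minimal sketch of the same calculations, with illustrative variable names, is shown below; the percentile differs slightly from 14.875 because the software does not round z to −0.25.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 15, 2, 16          # population mean, population sd, sample size
std_error = sigma / sqrt(n)       # standard deviation of the sample mean

# Sampling distribution of the sample mean
xbar = NormalDist(mu, std_error)

p_above_15_5 = 1 - xbar.cdf(15.5)
p_below_14 = xbar.cdf(14)
p_above_18 = 1 - xbar.cdf(18)

# Value exceeded by 60% of sample means, i.e. the 40th percentile
xbar_0 = xbar.inv_cdf(0.40)

print(std_error, round(p_above_15_5, 4), round(p_below_14, 4), p_above_18)
print(round(xbar_0, 3))
```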
T4_Q2: The time spent using email per session is normally distributed with mean 8 and standard deviation of 2 minutes. If
random samples of 25 sessions were selected:
(a) What is the standard error of the sample mean for 25 sessions?
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{25}} = 0.40 \]
(b) What proportion of the sample means would be more than 9 minutes?
\[ P(\bar{X} > 9) = P\left(Z > \frac{9 - 8}{2/\sqrt{25}}\right) = P(Z > 2.50) = 1 - P(Z < 2.50) = 1 - 0.9938 = 0.0062 \]
(c) What proportion of the sample means would be less than 7.5 minutes?
\[ P(\bar{X} < 7.5) = P\left(Z < \frac{7.5 - 8}{2/\sqrt{25}}\right) = P(Z < -1.25) = 0.1056 \]
(d) What proportion of the sample means would fall between 7.5 and 9 minutes?
𝑃(7.5 < 𝑋̅ < 9) = 𝑃(−1.25 < 𝑍 < 2.50) = 𝑃(𝑍 < 2.50) − 𝑃(𝑍 < −1.25) = 0.9938 − 0.1056 = 0.8882
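One way to see what "proportion of the sample means" refers to is to simulate many samples of 25 sessions and look at their means directly. The sketch below is a Monte Carlo illustration, not part of the original solution; the seed and number of repetitions are arbitrary choices, and the simulated proportions only approximate the exact answers above.

```python
import random
from statistics import fmean, stdev

random.seed(2023)                      # arbitrary seed, for repeatability only
MU, SIGMA, N, REPS = 8, 2, 25, 20_000  # REPS is an arbitrary number of simulated samples

# Draw REPS samples of N sessions each and record every sample mean
sample_means = [
    fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPS)
]

standard_error = stdev(sample_means)   # should be close to 2 / sqrt(25) = 0.40
share_above_9 = sum(m > 9 for m in sample_means) / REPS
share_below_7_5 = sum(m < 7.5 for m in sample_means) / REPS
share_between = sum(7.5 < m < 9 for m in sample_means) / REPS

print(round(standard_error, 3), share_above_9, share_below_7_5, share_between)
```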
p = 35 /50 = 0.70
(c) What is the probability that more than 65% of households will have smart phones?
\[ P(p > 0.65) = P\left(Z > \frac{0.65 - 0.60}{0.0693}\right) = P(Z > 0.72) = 1 - P(Z < 0.72) = 1 - 0.7642 = 0.2358 \]
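The standard error 0.0693 in the denominator is consistent with a population proportion of 0.60 and a sample of 50 households. Both values are inferred from the surviving working (the question stem is not shown), so the sketch below should be read under that assumption.

```python
from math import sqrt
from statistics import NormalDist

pi = 0.60   # assumed population proportion (inferred from the numerator 0.65 - 0.60)
n = 50      # assumed sample size (inferred from p = 35/50)

# Standard error of the sample proportion: sqrt(pi * (1 - pi) / n)
std_error = sqrt(pi * (1 - pi) / n)

# Normal approximation to the sampling distribution of the sample proportion
p_hat = NormalDist(pi, std_error)
p_above_65 = 1 - p_hat.cdf(0.65)

print(round(std_error, 4), round(p_above_65, 4))
```

The probability comes out marginally below 0.2358 because z is not rounded to 0.72 here.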
T5_Q1: The marketing manager of a local department store wants to know the mean amount spent per customer
during the Thursday (6-8pm) late night shopping. A sample of 64 customers is taken. The sample mean is $140.50
and the standard deviation is $17.25.
1) Construct a 95% confidence interval for the mean amount spent per customer. Interpret the results.
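σ is unknown, hence \( \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1} \), therefore the confidence limits for µ are:
\[ \bar{X} \pm t_{\alpha/2,\,n-1} \frac{S}{\sqrt{n}} \]
\[ 140.50 \pm 1.9983 \times \frac{17.25}{\sqrt{64}} = 140.50 \pm 4.31 \]
($136.19, $144.81)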
Alternatively, you could also use \( \bar{X} \pm Z_{\alpha/2} \frac{S}{\sqrt{n}} \) by invoking the CLT. The final answer is:
\[ 140.50 \pm 1.96 \times \frac{17.25}{\sqrt{64}} \]
($136.27, $144.73)
We are 95% confident that the unknown mean amount spent per customer during Thursday late night shopping is
between $136.19 and $144.81.
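Both intervals can be reproduced in a few lines. In this sketch the t critical value 1.9983 is copied from the solution rather than computed, because the Python standard library has no t distribution; the z critical value is computed from the standard normal.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 64, 140.50, 17.25    # sample size, sample mean, sample standard deviation
std_error = s / sqrt(n)

t_crit = 1.9983                           # t(0.025, 63), copied from the solution
z_crit = NormalDist().inv_cdf(0.975)      # about 1.96

def interval(critical_value):
    """Confidence limits: sample mean plus or minus critical value times standard error."""
    margin = critical_value * std_error
    return round(xbar - margin, 2), round(xbar + margin, 2)

t_interval = interval(t_crit)
z_interval = interval(z_crit)
print(t_interval, z_interval)
```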
T5_Q2: If the manager of a paint supply store wants to estimate the mean amount of paint in a 4-litre can to within 0.015
litres with 95% confidence and also assumes that the standard deviation is 0.075 litres, what sample size is needed?
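\[ n \ge \left[\frac{Z_{\alpha/2} \times \sigma}{e}\right]^2 = \left[\frac{1.96 \times 0.075}{0.015}\right]^2 = 96.04 \]
Rounding up to the next whole number, a sample of n = 97 cans is needed.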
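The rounding-up step matters: the sample size is always rounded up, never to the nearest integer. A minimal sketch with an illustrative function name:

```python
from math import ceil

def sample_size_for_mean(z, sigma, e):
    """Smallest n with margin of error at most e: n >= (z * sigma / e)^2, rounded up."""
    return ceil((z * sigma / e) ** 2)

# 95% confidence (z = 1.96), sigma = 0.075 litres, margin of error 0.015 litres
n_required = sample_size_for_mean(1.96, 0.075, 0.015)
print(n_required)
```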
3) If an economist wishes to determine whether there is evidence that mean family income in a community exceeds
$50,000
A) a one-tail test should be utilized.
B) either a one-tail or two-tail test could be used with equivalent results.
C) a two-tail test should be utilized.
D) None of the above.
5) A ______________ is a numerical quantity computed from the data of a sample and is used in reaching a decision on
whether or not to reject the null hypothesis.
A) significance level
B) critical value
C) parameter
D) test statistic
7) The value(s) obtained from either Table E2 or Table E3 are called the:
A) significance level
B) critical value
C) parameter
D) test statistic
H0: µ ≤ 30
H1: µ > 30
Justify your choice of test statistic, state any required assumptions, and carry out the calculation:
The sample size is large (n > 30), so we can invoke the CLT; hence the test statistic to use is:
\[ Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
\[ Z = \frac{30.45 - 30}{5/\sqrt{250}} = \frac{0.45}{0.316} = 1.42 \text{ (2 decimal places)} \]
Conclusion:
Since Z = 1.42 is less than the one-tail critical value of 1.645, we do not reject H0. At the 5% level of significance, we do not have sufficient evidence to conclude that the mean age is over 30 years.
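The decision can be checked with either the critical-value or the p-value approach. The sketch below uses the figures from the calculation above (sample mean 30.45, S = 5, n = 250) and treats Z as standard normal under H0 via the CLT.

```python
from math import sqrt
from statistics import NormalDist

xbar, mu_0, s, n = 30.45, 30, 5, 250   # sample mean, hypothesised mean, sample sd, sample size
alpha = 0.05

z_stat = (xbar - mu_0) / (s / sqrt(n))

std_normal = NormalDist()
z_critical = std_normal.inv_cdf(1 - alpha)   # upper-tail critical value, about 1.645
p_value = 1 - std_normal.cdf(z_stat)         # upper-tail p-value for H1: mu > 30

reject_h0 = z_stat > z_critical
print(round(z_stat, 2), round(z_critical, 3), round(p_value, 4), reject_h0)
```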
T7_Q1: A computer software developer would like to use the number of downloads (in thousands) for the trial version of
his new shareware to predict the amount of revenue (in thousands of dollars) he can make on the full version of the
new shareware. Following is the output from a simple linear regression obtained from a data set of 30 different
sharewares that he has developed:
ANOVA
7) What is the predicted revenue (in thousands of dollars) when the number of downloads is 30 thousand?
a) 16.8296
b) 111.891
c) -95.0614
d) None of the above
8) Which of the following is the correct interpretation for the coefficient of determination?
a) 74.67% of the variation in the number of downloads can be explained by the variation in revenue.
b) 75.54% of the variation in the number of downloads can be explained by the variation in revenue.
c) 74.67% of the variation in revenue can be explained by the variation in the number of downloads.
d) 75.54% of the variation in revenue can be explained by the variation in the number of downloads.
9) Which of the following is the correct alternative hypothesis for testing whether there is a linear relationship between
revenue and the number of downloads?
a) H1 : β1 ≠ 0
b) H1 : b1 ≠ 0
c) H1 : β1 = 0
d) H1 : b1 = 0
10) There is sufficient evidence that revenue and the number of downloads are linearly related at a 5% level of
significance.
a) True
b) False