ProblemSetQTII_2024


BM 24-26 (Term II)

Quantitative Technique II

PROBLEM SET
DR. PRITHA GUHA

XLRI | Jamshedpur
1. Suppose the following are the values in some population:
5, 27, 4, 17, 4.5, 19, 2, 11, 3, 6, 13, 18
A sample of size 4 is taken, and is observed to be 3, 4, 4.5, 2.
Is it most likely to be (a) a simple random sample, (b) a stratified sample or (c) a
clustered sample? Give a reason for your answer.

2. A sample of size 15 is to be chosen from a population of size 100, divided into two
strata of sizes 60 and 40, respectively. If proportional allocation is used, and samples
are selected without replacement from each stratum, how many different samples can
be selected in total?

3. Suppose 10,000 payment vouchers were generated in 2017 in XLRI. An auditor checks
the vouchers by drawing a probability sample (known as an audit sample).
a) Why might simple random sampling not be appropriate?
b) Which sampling design would you prefer?

4. A statistics student who is curious about the relationship between the amount of time
students spend on social networking sites and their performance at school decides to
conduct a survey. Various research strategies for collecting data are described below.
In each, name the sampling method proposed and any bias you might expect.
(a) He randomly samples 40 students from the study's population, gives them the
survey, asks them to fill it out and bring it back the next day.
(b) He gives out the survey only to his friends, making sure each one of them fills out
the survey.
(c) He posts a link to an online survey on Facebook and asks his friends to fill out the
survey.
(d) He randomly samples 5 classes and asks a random sample of students from those
classes to fill out the survey.

5. A university wants to determine what fraction of its undergraduate student body
supports a new Rs. 2500 annual fee to improve the student union. For each proposed
method below, indicate whether the method is reasonable or not.
(a) Survey a simple random sample of 500 students.

(b) Stratify students by their field of study, then sample 10% of students from each
stratum.
(c) Cluster students by their ages (e.g. 18 years old in one cluster, 19 years old in one
cluster, etc.), then randomly sample three clusters and survey all students in those
clusters.

6. A school in Jamshedpur is planning to conduct a sample survey of its teachers about
how many hours the teachers require to create class notes for the students using a
mobile phone. There are 20 pre-primary teachers, 25 primary teachers and 30
secondary teachers. The school decides to choose a total of 15 teachers using a
stratified sampling model by considering the three types of teachers as three different
strata. The mean and the standard deviation of the time required to create class notes
for each stratum were as follows:
              Mean (in hours)    Standard deviation (in hours)
Pre-primary   43.48              3.11
Primary       45.40              2.83
Secondary     53.05              6.80

What is the number of samples to be chosen from the strata of secondary teachers if
Neyman allocation is used (mark the closest one)?

7. TV watching time (TVWatching.csv): An advertising firm, interested in determining
how much to emphasize television advertising in a certain state, decides to conduct a
sample survey to estimate the average number of hours each week that households
within that state watch television. The state has two towns, A and B, and a rural area
C. Town A is built around a factory and most households contain factory workers with
school-aged children. Town B contains mainly retirees, and the rural area C contains
mainly farmers. There are 155 households in town A, 62 in town B and 93 in the rural area, C.
The firm decides to choose a total of 40 households from the state from Town A, Town
B and rural area C.
a) Using R, select a simple random sample with replacement (SRSWR) and without
replacement (SRSWOR) from the population.
b) Now the firm decides to use a stratified sampling model by considering Town A,
Town B and rural area C as three different strata. Find the number of samples to be
chosen if proportional allocation and Neyman allocation are used. Choose the
samples for both allocations.

8. It is known that 80% of all Brand A MP3 players work in a satisfactory manner
throughout the warranty period (are “successes”). Suppose that n = 10 players are
randomly selected. Let X = the number of successes in the sample. The statistic X/n is
the sample proportion (fraction) of successes. Obtain the sampling distribution of this
statistic. (Can you simulate this problem in R?)

9. As part of a quality control process for computer chips, an engineer at a factory
randomly samples 212 chips during a week of production to test the current rate of
chips with severe defects. She finds that 27 of the chips are defective.
a) What population is under consideration in the data set?
b) What parameter is being estimated?
c) What is the point estimate for the parameter?
d) What is the name of the statistic that we can use to measure the uncertainty of the
point estimate?
e) Compute the value from part (d) for this context.
f) The historical rate of defects is 10%. Should the engineer be surprised by the
observed rate of defects during the current week?
g) Suppose the true population value was found to be 10%. If we use this proportion
to recompute the value in part (e) using p = 0.1 instead of 𝑝̂ , does the resulting
value change much?

10. Let X1, X2, X3, X4, X5 be an independent and identically distributed (IID) sample
from a population with mean μ and variance 1. Which of the following is not an
unbiased estimator of μ?
A) $\frac{1}{4}(X_1 + X_2 + 2X_3 + 2X_4 + 2X_5)$
B) $\frac{1}{5}(X_1 + X_2 + X_3 + X_4 + X_5)$
C) $\frac{1}{15}(X_1 + 2X_2 + 3X_3 + 4X_4 + 5X_5)$
D) $2X_1 + 2X_2 - X_3 - X_4 - X_5$

11. Sample variance is unbiased for population variance in SRSWR.

12. Let X1, X2, X3, X4, X5 be an independent and identically distributed (IID) sample
from a population with mean μ and variance 1. Which of the following has the lowest
variance?
A) $\frac{1}{5}(X_1 + X_2 + X_3 + X_4 + X_5)$
B) $2X_1 + 2X_2 - X_3 - X_4 - X_5$
C) $\frac{1}{4}(X_1 + X_2 + 2X_3 + 2X_4 + 2X_5)$
D) $\frac{1}{15}(X_1 + 2X_2 + 3X_3 + 4X_4 + 5X_5)$

13. For an SRS drawn with replacement from a Poisson population, show that both the
sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and the sample variance
$\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ are unbiased estimates for the population mean λ.

14. Suppose that $X_1, X_2, \ldots, X_n$ are an IID random sample from a Bernoulli distribution
with probability of success p.
a) Show that $\bar{X} = \hat{p}$ is an unbiased estimator of p.
b) Also show that $\hat{p}$ (sample proportion) is a consistent estimator of p (population
proportion).

15. X is a discrete random variable with the following probability mass function,
where 0 ≤ θ ≤ 1:
X      0      1      2          3
P(X)   2θ/3   θ/3    2(1-θ)/3   (1-θ)/3
The following 10 samples were taken: 3, 0, 2, 1, 3, 2, 1, 0, 2, 1.
a) Find MME of θ.
b) Find MLE of θ.

16. Let $X_1, X_2, \ldots, X_n$ be an IID random sample from a discrete distribution with
probability mass function
$p(x) = \begin{cases} p, & x = 0 \\ 2p, & x = 1 \\ 1 - 3p, & x = 2 \end{cases}$
Find MME of p.

17. Let $X_1, X_2, \ldots, X_n$ be an IID random sample from a Uniform[-b, b] distribution. Find MME of b.

18. Let $X_1, X_2, \ldots, X_n$ be an IID random sample from an Exponential(λ) distribution. Find MLE of λ.

19. Let $X_1, X_2, \ldots, X_n$ be an IID random sample from a $N(\mu, \sigma^2)$ distribution. Find MLE of $\mu$ and $\sigma^2$.

20. Suppose X1, X2, …, Xn is an independently and identically distributed (IID) sample
from a normal distribution with mean 0 and variance σ².
a) What is the likelihood function for σ², using the sample?
b) Showing all relevant steps, obtain a maximum likelihood estimator for the
parameter σ².

21. Obtain a maximum likelihood estimator (MLE) and a method of moments
estimator (MME) for the parameter a of a uniform distribution on the interval [a, 2a],
where a > 0.

22. Suppose a simple random sample from a uniform distribution on the interval
(0, b) is obtained as follows: 42, 46, 44, 47, 47, 43, 62, 64. Determine a method of
moments estimate of b.

23. Suppose the number of customers X that enter a store between the hours 9AM
and 10AM follows a Poisson distribution with parameter θ. Suppose a random sample
of the number of customers that enter the store between 9AM and 10AM for 10 days
results in the values 9, 7, 9, 15, 10, 13, 11, 7, 2, 12. Determine the MLE of θ.

24. A marketing analyst wishes to obtain a sample of size 100 with replacement from
a population to estimate the population mean µ (unknown). However, due to a coding
error, her algorithm starts recording every value twice. After recording 50 values in
that manner, the coding error is discovered and corrected, although the recorded
values are left unchanged. Then 50 more observations are recorded to bring the total to
100. The final sample therefore is as follows: X1, X1, X2, X2, …, X25, X25, X26, X27, …, X75.
Let the corresponding sample mean be X*. On the other hand, let the sample mean of
X1, X2, …, X25, X26, X27, …, X75, each used once (i.e., after removing the duplicates), be X̄.
For the following, wherever necessary, assume that X1, X2, …, X75 are independent and
identically distributed with mean µ and variance σ².
a) Among X* and X̄, which is/are unbiased for µ?
b) Which estimator, among X* and X̄, is more efficient, and why?

25. The capture/recapture method is sometimes used to estimate the size of a
wildlife population. Suppose that 10 animals are captured, tagged, and released. On a
later occasion, 20 animals are captured, and it is found that 4 of them are tagged. How
large is the population?

26. The Pareto distribution is used in economics to model values exceeding a
threshold. For a fixed known threshold value of $x_0 > 0$, the density function is
$f(x \mid x_0, \theta) = \theta x_0^{\theta} x^{-\theta-1}$, $x \ge x_0$ and $\theta > 1$. (Note that the cumulative distribution
function of X is $P(X \le x) = F_X(x) = 1 - \left(\frac{x}{x_0}\right)^{-\theta}$.)
a) Find the method of moments (MME) estimate of θ.
b) Find MLE of θ.

27. Suppose that the 95% confidence interval obtained for the difference of average
weekly sales in two stores is (-3.232, 12.102) (in lakhs of rupees). From this, which of
the conclusions is NOT possible without additional information?
(a) At 10% level of significance, the null of equal average weekly sales against the
alternative of unequal average weekly sales cannot be rejected
(b) At 1% level of significance, the null of equal average weekly sales against the
alternative of unequal average weekly sales cannot be rejected
(c) At 2% level of significance, the null of equal average weekly sales against the
alternative of unequal average weekly sales cannot be rejected
(d) At 5% level of significance, the null of equal average weekly sales against the
alternative of unequal average weekly sales cannot be rejected

28. A manufacturer of light bulbs claims an average life of more than 750 hours per
bulb. A consumer group did not believe this claim and tested a sample of 35 bulbs. The
average lifetime of these 35 bulbs was 740 hours with a standard deviation of 30 hours.
The manufacturer responded that their claim was based on testing hundreds of bulbs.
a) What is the 95% confidence interval for life of light bulbs based on the data
collected by the consumers?
b) True or false: Compared to the consumer group’s 95% confidence interval, the
manufacturer’s 95% confidence interval is more likely to contain the population
mean because it is based on a larger sample. Justify your answer.

c) The volume in a set of soft drink bottles is known to follow a Normal distribution with
a standard deviation of 4 ml. You have taken a sample of the bottles and measured their
volumes. How many bottles do you have to sample to have a 90% confidence interval
for µ with width 1?

29. Suppose that against a certain opponent the number of points the BM basketball
team scores is normally distributed with unknown mean θ and unknown variance σ².
Suppose that over the course of the last 6 games between the two teams, BM scored
the following points: 59, 62, 59, 74, 70, and 61.
a) Compute a 95% t-confidence interval for θ. Does 95% confidence mean that the
probability that θ is in the interval you just found is 95%?
b) Now suppose that you learn that σ² = 25. Compute a 95% z-confidence interval for
θ. How does this compare to the interval in (a)?

30. The production line for Glow toothpaste is designed to fill tubes of toothpaste
with a mean weight of 60 grams. Periodically, a sample of 50 tubes will be selected in
order to check the filling process. Quality assurance procedures call for the
continuation of the filling process if the sample results are consistent with the
assumption that the mean filling weight for the population of toothpaste tubes is 60
grams; otherwise the filling process will be stopped and adjusted.
a) Formulate the null and the alternative hypotheses to help determine when the
filling process should continue operating and when it should be stopped and
corrected.
b) Assume that a sample of 50 toothpaste tubes provides a sample mean of 61 grams
and standard deviation of 2 grams. Based on the information:
(i) Compute the p-value of the test; hence carry out the testing of the hypotheses
formed in part (a) at a 5% level of significance.
(ii) Obtain a 95% confidence interval for the mean filling weight and then carry out
the same test as in part (i).
c) What is the power of the test if the true mean filling weight of the toothpaste tubes
is 60.5 grams, if we assume the true standard deviation of the filling weights is 2
grams?

31. A large hotel chain periodically runs various surveys on their website to
understand customer preferences. One such survey is about the preference of
smoking versus non-smoking rooms. In a random sample of 400 visitors to their
website five years ago, 166 had indicated their preference for the non-smoking rooms.
This year, 205 such visitors, in a sample of 380, preferred the non-smoking rooms.
a) What would be a 95% confidence interval for the true difference of proportions,
regarding the preference of non-smoking rooms, from five years ago to now?
b) Would you recommend that the hotel chain convert more rooms to non-smoking?
Support your recommendation, by testing the appropriate hypotheses, using the
data given above, at a 0.05 level of significance.

32. A controlled clinical trial was performed on 9 patients to investigate the effect of
a drug on the behavioural disorders of chronic schizophrenics. The following table
gives the behavioural rating scores for the patients at the beginning of the trial and
3 months after the end of the trial. High scores are good.
Patient       1     2     3     4     5     6     7     8     9
Before Drug   2.3   2.0   1.9   3.1   2.2   2.3   2.8   1.9   1.1
After Drug    3.1   2.1   2.45  3.7   2.54  3.72  4.54  1.61  1.63
Stating your assumptions, null and alternative hypotheses, test at a 5% level of
significance whether the drug improves the patients’ behavioural rating scores.

33. The screening process for detecting a rare disease is not perfect. Researchers
have developed a blood test that is considered fairly reliable. It gives a positive
reaction in 98% of the people who have that disease. However, it erroneously gives a
positive reaction in 3% of the people who do not have the disease. Suppose the null
hypothesis is “the individual does not have the disease” and the alternative hypothesis
is “the individual has the disease”.
a) What is the probability of Type I error?
b) What is the power of the test?

34. A study has been conducted to check whether average height changes from
generation to generation. The following table gives the heights (in centimetres) of a
sample of 8 fathers and their oldest adult sons.
Height of father 165.1 160 170.2 162.6 172.7 157.5 177.8 167.6
Height of son 172.7 167.6 172.7 175.3 167.6 165.1 180.3 170.2
Stating your assumptions, null and alternative hypotheses, test at a 5% level of
significance whether the average height changes from generation to generation. What
is the p-value of the test?

35. The mean lifetime of a sample of 100 light bulbs produced by a company is
computed to be 1590 hours with a standard deviation of 120 hours. The manager of
the company wants to test the null hypothesis µ = 1600 hours against the alternative
hypothesis µ ≠ 1600 hours, where µ is the mean lifetime of all the bulbs produced by
the company.
a) Compute the p-value of the test.
b) From (a), what is your conclusion at a 5% level of significance?
c) Obtain a 95% confidence interval for µ. Would you arrive at the same conclusion,
for the test of hypotheses using this confidence interval as in b)? Justify.

36. An oil company wishes to study the effects of three different fuel additives on
mean fuel mileage. The company randomly selects three groups of six automobiles
each and assigns a group of six automobiles to each additive type (A, B, and C). All
18 automobiles are of the same make and model. Each of the six automobiles assigned to
a fuel additive is test driven using the appropriate additive, and the fuel mileage (in
km/lit) for the test drive is recorded. Following are the results:

Additive A   12.5   15     14.4   11     14.9   13
Additive B   14.2   12.3   15.4   17.3   15.1   14.1
Additive C   16.4   12.6   14     17.8   11.2   13.6

Assume that the data are normally and independently distributed. Can we consider the
variances of each of the three groups to be the same? Perform a relevant test at 5%
level of significance by stating your hypotheses.

a) Suppose that we want to test at a 5% level of significance whether there is a
difference in the three types of fuel additives regarding mileage of the automobile.
Towards that, state the null and alternative hypotheses, the assumptions you make and
why they are justifiable, compute the appropriate test statistics and then perform the
test.
b) Now consider a situation when the data are not known to be normally distributed.
Suggest and perform an appropriate test at a 5% significance level to check whether
there is a difference in the three types of fuel additives regarding mileage of the
automobile. Remember to state your assumptions and hypotheses.

37. A software company develops software for a GPS navigation system. From market
area A to residential area B, there are four possible routes: Route 1, Route 2, Route 3
and Route 4. The company records the travel time along each route for one week
and obtains the following data. The time of travel from market area A to residential
area B is given in minutes.
Route 1 29.5 30.5 33 31 32
Route 2 27.5 32.5 28 30 29
Route 3 25 27 23.5 25.5 26
Route 4 24 26.5 28.5 31.5 24.5

a) Assume that the data are normally and independently distributed.
i) Can we consider the variances in time of each of the four routes to be the same?
Perform a relevant test at 1% level of significance by stating your hypotheses.
ii) Suppose that we want to test at a 1% level of significance whether there is a
difference in the four different routes. Towards that, state the null and alternative
hypotheses, the assumptions you make and why they are justifiable, compute the
appropriate test statistics and then perform the test.
b) Now consider a situation when the data are not known to be normally distributed.
Suggest and perform an appropriate test at a 1% significance level to check whether there
is a difference in the four different routes. Remember to state your assumptions and
hypotheses.

38. The Scholastic Aptitude Test (SAT) contains three areas: critical reading,
mathematics, and writing. Each area is scored on an 800-point scale. A sample of SAT
scores for six students follows.

Student   Critical Reading   Mathematics   Writing
1         526                534           530
2         594                590           586
3         465                464           445
4         561                566           553
5         436                478           430
6         430                458           420
Assume that the data are normally and independently distributed. Using a .05 level of
significance, do students perform differently on the three areas of the SAT?

39. To improve students’ performance on the GMAT, a university is considering
offering the following three GMAT preparation programs.
i) A three-hour review session, ii) A one-day program covering, iii) An intensive 10-week
course.
Scores on the GMAT range from 200 to 800, with higher scores implying higher aptitude.
The GMAT is usually taken by students from three colleges: the College of Business, the
College of Engineering, and the College of Arts and Sciences. Let us assume that the
randomly selected students participated in the preparation programs and then took the
GMAT. The scores obtained are reported in the following table:
                       College
Preparation Program    Business    Engineering    Arts and Sciences
3-hr review            500, 580    540, 460       480, 400
1-day program          460, 540    560, 620       420, 480
10-week course         560, 600    600, 580       480, 410
Assume that the data are normally and independently distributed. Test the following at
5% level of significance:
a) Do the preparation programs differ in terms of effect on GMAT scores?
b) Do the undergraduate colleges differ in terms of effect on GMAT scores?
c) Do students in some colleges do better on one type of preparation program
whereas others do better on a different type of preparation program?

40. Sales figures for a sample of 40 days from a store selling mobile phones are given
below:
26 29 33 30 27 29 24 30 30 34
39 24 29 31 31 32 27 26 31 28
36 30 33 25 33 31 36 35 30 24
34 36 41 33 34 29 31 31 32 26

Retracing the steps of what we did in class, perform a chi-square goodness of fit test
at a 5% level of significance to test whether the data come from a normal distribution.
Clearly write your null and alternative hypotheses, group the data in a reasonable
number of classes, compute the value of the appropriate test statistic, perform the
test and write your final conclusion.

41. A farmer has sprayed insecticide on 6 of his apple trees to eliminate mites.
From each of the 6 apple trees, 25 leaves were selected, and the number of mites was
counted. Following are the data on 150 leaves, giving the number of mites
on each leaf:

Number per leaf   0    1    2    3    4   5   6
Observed count    70   38   17   10   9   3   3

Retracing the steps of what we did in class, perform a chi-square goodness of fit test
at a 5% level of significance to test whether the data come from a Poisson distribution.
Clearly write your null and alternative hypotheses, group the data in a reasonable
number of classes, compute the value of the appropriate test statistic, perform the
test and write your final conclusion.

42. An employment survey asked a sample of human resource executives how their
company planned to change its workforce over the next 12 months. A categorical
response variable showed three options: The company plans to hire and add to the
number of employees, the company plans no change in the number of employees, or
the company plans to lay off and reduce the number of employees. Another
categorical variable indicated if the company was private or public. Sample data for
180 companies are summarized as follows.
Employment Plan      Company
                     Private   Public
Add Employees        37        32
No Change            19        34
Lay Off Employees    16        42
Construct a test of independence to determine if the employment plan for the next 12
months is independent of the type of company. At a 5% level of significance, what is
your conclusion?

Regression

1. Consider the Fresh detergent data set shared with you. If we consider performing a
multiple regression of “Demand” on “Price”, “IndPrice” and “AdvExp” using the Fresh
detergent dataset, then the F-statistic for ANOVA requires to be tested at degrees of
freedom:
A. 3 and 26 B. 1 and 26 C. 3 and 27 D. 1 and 28

2. If we consider performing a multiple regression of “Demand” on “Price”, “IndPrice”
and “AdvExp” using only the first 25 data points of the Fresh detergent dataset, then
the t-statistic for “Price” comes out to be -3.263. The corresponding p-value is
A. 0.0037 B. 0.0033 C. 0.0031 D. 0.0028

3. If we perform simple regressions of “Demand” on “Price”, “IndPrice” and “AdvExp”
using the Fresh detergent dataset, the fitted coefficients come out to be -3.545,
2.3374 and 1.0434, respectively. Which of the following conclusions is incorrect?
A. “Demand” and “Price” have a negative correlation
B. Unit change in “Price” changes demand more than respective unit changes in
“IndPrice” or “AdvExp”
C. “Price” and “IndPrice” held constant, unit change in “AdvExp” leads to 1.0434 units
change in “Demand”

D. Performing a multiple regression of “Demand” on “Price”, “IndPrice” and “AdvExp”
will give a higher value of R2 compared to any of these three simple regressions

4. If we consider performing a multiple regression of “Demand” on “Price”, “IndPrice”
and “AdvExp” using the Fresh detergent dataset, then which of the following is an
incorrect conclusion?
A. All coefficients are significant at 5% level of significance
B. If “IndPrice” and “AdvExp” are held constant, a 2.3577 unit drop in Price will
increase demand by 1 unit
C. 89.36% of the total variation in demand is explained by the multiple regression model
D. When all independent variables are zero, the demand is 7.5891 units

5. Among the following independent variable combinations used to fit multiple linear
regression models to forecast “Demand” using the Fresh detergent dataset, the best
fit (in terms of AIC) is provided by:
A. “Price”, “IndPrice” and “AdvExp”
B. “Price” and “AdvExp”
C. “IndPrice” and “AdvExp”
D. Only “AdvExp”

6. Consider the problem of fitting a suitable linear model to forecast “Demand” using
the Fresh detergent dataset. Which of the following is correct?
A. The model with the highest adjusted R2 value can be considered to provide the best
fit
B. The model with the highest R2 value can be considered to provide the best fit
C. The model with the highest AIC value can be considered to provide the best fit
D. Only a model with all coefficients having p-values less than 5% can provide the best
fit

7. What is the MSE for a multiple linear regression of “Demand” on “Price” and
“AdvExp”?
A. 0.3373 B. 0.1138 C. 3.0727 D. 0.7717

8. A simple linear regression fit of price on sold quantity of a product, with 25 data
points, results in the fitted line: Price = 7 - 0.04 × Sold quantity, with a
multiple R-squared value of 0.70. The correlation between Price and Sold quantity is
closest to
A. -0.84 B. -0.49 C. -0.04 D. 0.49

9. A simple linear regression fit of price on sold quantity of a product, with 25 data
points, results in the fitted line: Price = 7 - 0.04 × Sold quantity, with a
multiple R-squared value of 0.70. The value of the adjusted R-squared is closest to
A. 0.687 B. 0.713 C. 0.329 D. 0.613

10. “mtcars” is a built-in dataset in R: a data frame with 32 observations
on 11 (numeric) variables. Attach the “mtcars” data in R (using attach(mtcars)) and fit
a linear regression model with mpg as the response and disp and wt as predictors using the
following code: lm.mtcars = lm(mpg~disp+wt). Also run the following code:
summary(lm.mtcars), anova(lm.mtcars). Answer the following questions based on
your output:
Q10.a) The regression equation would be:
A. 34.96055 - 0.01773 disp - 3.35082 wt + ϵ
B. 34.96055 + 0.01773 disp + 3.35082 wt
C. 34.96055 + 0.01773 disp + 3.35082 wt + ϵ
D. 34.96055 - 0.01773 disp - 3.35082 wt
Q10.b)True/False: The predicted mpg decreases by 3.35082 by 1% increase in wt when
disp is unchanged.
A) True B) False
Q10.c) The p-value for the disp variable:
A)4.91e-16 B) 0.06362 C) 0.00743 D) 2.744e-10
Q10.d)The value of the coefficient of determination (R2) for this model is:
A)0.7658 B) 0.7809 C) 51.69 D) 2.917
Q10.e) Construct the ANOVA table for the above regression model.

11. For this exercise, use the mtcars dataset available in R. View the dataset using the
command View(mtcars), and attach it using the command attach(mtcars). In the
mtcars dataset, the variable am gives the transmission type (0 = automatic, 1 =
manual) of the corresponding vehicle, the variable wt gives the weight (in 1000 lbs),
cyl gives the number of cylinders, and, hp gives the gross horsepower. Ignore the
other variables.
Q11.a)We want to predict the transmission type based only on the weight of the car.
Based on the fitted logistic regression model, we CANNOT conclude that
A) If the car weight increases by 1 lb, the probability of the car being automatic
reduces by 4.024%
B) Heavier cars are more likely to be automatic
C) Weight is a significant predictor in finding out the transmission type of a car
D) Any model with additional independent variables will have residual deviance
lesser than 19.176

Q11.b) If we perform a G-test for the logistic regression model fitted on the transmission
type based only on the weight of the car, then the p-value is closest to
A) 9×10⁻⁷ B) 1×10⁻⁵ C) 0.94564 D) 0.04504

Q11.c) Based on the logistic regression model fitted on the transmission type based only
on the weight of the car, the probability that a car weighing 3000lb will be of manual
transmission type, is closest to
A) 0.492 B) 0.508 C) 0.968 D) 0.032
Q11.d) Starting from a model with wt, cyl and hp as independent variables, if a backward
stepwise selection is performed based on AIC, the best fitted model obtained will
include the following predictor variables
A) wt and hp only B) wt only C) wt and cyl only D) all 3 variables
Q11.e) The AIC value of the model chosen above is given by
A) 16.059 B) 17.841 C) 23.176 D) 15.818

1. Solution: It is a simple random sample.

2. Solution: Size of stratum 1: 60, size of stratum 2: 40. Total sample size: 15. We are
using proportional allocation. Thus, the sample size from stratum 1 is $n_1 = 15 \times \frac{60}{100} = 9$ and
the sample size from stratum 2 is $n_2 = 6$. The total number of possible samples, if samples are
selected without replacement (unordered) from the two strata, is $\binom{60}{9} \times \binom{40}{6}$.
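As a quick check of the count in Solution 2, the sketch below evaluates the allocation and the two binomial coefficients. Python is used here purely for illustration (the course itself works in R), and the variable names are hypothetical.

```python
from math import comb

# Stratum sizes and total sample size from Problem 2
N1, N2, n = 60, 40, 15

# Proportional allocation: n_h = n * N_h / N
n1 = n * N1 // (N1 + N2)   # 15 * 60 / 100 = 9
n2 = n - n1                # 6

# Unordered samples without replacement, chosen independently in each stratum
total_samples = comb(N1, n1) * comb(N2, n2)
print(n1, n2, total_samples)
```

The product arises because any of the stratum-1 samples can be paired with any of the stratum-2 samples.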

3. Solution:
a) Some vouchers may have a very small value (e.g. for buying office supplies) and
some may have a very high value (e.g. for buying lab equipment). A simple random
sample may not give a proper representation of the population in this case, as
one may end up with vouchers of only small amounts or vouchers of only
large amounts.
b) In this case we should use stratified random sampling.

4. Solution:
(a) Simple random sample. Non-response bias: if only those people who have strong
opinions about the survey respond, his sample may not be representative of the
population.
(b) Convenience sample. Undercoverage bias: his sample may not be representative
of the population since it consists only of his friends. It is also possible that the study
will have non-response bias if some choose not to bring back the survey.
(c) Convenience sample. This will have a similar issue to handing out surveys to friends.
(d) Multi-stage sampling. If the classes are similar to each other with respect to student
composition, this approach should not introduce bias, other than potential
non-response bias.

5. Solution: (a) Simple random sampling is okay. In fact, it's rare for simple random
sampling to not be a reasonable sampling method!
(b) Student opinions may vary by field of study, so stratifying by this variable
makes sense and would be reasonable.
(c) Students of similar ages are probably going to have more similar opinions, and we
want clusters to be diverse with respect to the outcome of interest, so this would not
be a good approach. (Additional thought: the clusters in this case may also have very
different numbers of people, which can also create unexpected sample sizes.)
6. Solution: 9
7. This is an R exercise.
8. Solution:
X/n      0     0.1   0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
P(X/n)   0.0   0.0   0.0001  0.0008  0.0055  0.0264  0.0881  0.2013  0.3020  0.2684  0.1074
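The tabulated probabilities agree, to four decimals, with a Binomial(10, 0.8) distribution for X. The values n = 10 and p = 0.8 are inferred from the table and are an assumption here, since the question text is not reproduced in this section. Under that assumption, a sketch that regenerates the table:

```r
# Assumption (inferred from the table): X ~ Binomial(n = 10, p = 0.8).
n <- 10
p <- 0.8
x <- 0:n

# Sampling distribution of the sample proportion X/n, rounded as in the table.
sampling_dist <- data.frame(prop = x / n, prob = round(dbinom(x, n, p), 4))
print(sampling_dist)
```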
9. Solution: (b) The fraction of computer chips manufactured at the factory during the
week of production that had defects. (c) $\hat{p} = \frac{27}{212} = 0.127$; (d) Standard error; (e)
$SE(\hat{p}) = 0.023$; (g) 0.021
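Parts (c) and (e) follow from the usual plug-in formula for the standard error of a sample proportion. A minimal sketch, with illustrative variable names:

```r
x <- 27     # defective chips in the sample
n <- 212    # sample size

p_hat <- x / n                            # sample proportion, part (c)
se_hat <- sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error, part (e)

print(round(c(p_hat = p_hat, se_hat = se_hat), 3))
```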
10. Solution: (A)
12. Solution: (A)
15. Solution: a) 5/12, b) 1/2
16. Solution: MME of p is $\frac{1}{4}(2 - \bar{X})$
17. Solution: Discussed in class
18. Solution: MLE of λ = 1/𝑋̅
19. Solution: MLE of µ is $\bar{X}$; MLE of $\sigma^2$ is $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$
20. Solution: a) Likelihood function $L(\sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n} X_i^2}$,
b) MLE $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2$
21. Solution: MME: $\hat{a} = \frac{2}{3}\bar{X}$, MLE: $\hat{a} = \max(X_i)/2$
22. Solution: 98.75
23. Solution: 9.5
24. Solution: a) Both are unbiased for µ. b) $\bar{X}$, as its MSE is least.
25. Solution: 50
26. Solution: a) $\hat{\theta} = \frac{\bar{x}}{\bar{x} - x_0}$; b) $\hat{\theta} = \left[\frac{1}{n}\sum_{i=1}^{n}\ln\left(\frac{x_i}{x_0}\right)\right]^{-1}$
27. Solution: (a)
28. Solution: a) [730.0612, 749.9388] if the normal distribution is used, [729.6946,
750.3054] if the t distribution is used. b) False. c) 174
29. Solution: a) (57.549, 70.784), b) (60.166, 68.167)
30. Solution: a) To test H0: µ = 60 against H1: µ ≠ 60, i.e., whether the mean weight of the
toothpaste tube is significantly different from 60 gms. b) i) A sample of 50 toothpaste
tubes provided a sample mean of 61 gms with sd 2 gms. As the sample size (n = 50) is
large, we can use a z-test here.
We have $\bar{X} = 61$, $S_X = 2$, $\alpha = 0.05$.
Under H0, the test statistic is $Z = \frac{\bar{X} - \mu}{S_X/\sqrt{n}} \sim N(0,1)$.
As this is a two-sided test, $|Z_{obs}| = \left|\frac{61 - 60}{2/\sqrt{50}}\right| = 3.535534$

p-value: P(Z < -3.535534) + P(Z > 3.535534) = 2 × P(Z < -3.535534) = 2 × 0.0002034759
= 0.0004069519
We are using R to find the value of P(Z< -3.535534).
R Code: pnorm(-3.535534) = 0.0002034759
As we have p-value (0.0004069519) < α (0.05), we reject H0, i.e., from the given sample,
at 5% level of significance, we conclude that we need to stop the filling process and
correct it as the filling weight is significantly different from 60 gms.
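The whole two-sided z-test can be reproduced from the summary statistics in a few lines. This is a sketch with illustrative variable names; it uses only pnorm, as in the solution.

```r
# Summary statistics from the problem.
x_bar <- 61; s <- 2; n <- 50
mu0 <- 60; alpha <- 0.05

z_obs <- (x_bar - mu0) / (s / sqrt(n))   # observed test statistic
p_value <- 2 * pnorm(-abs(z_obs))        # two-sided p-value
reject_h0 <- p_value < alpha             # decision at level alpha

print(c(z_obs = z_obs, p_value = p_value))
print(reject_h0)
```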
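Solution: b) ii) The 95% confidence interval for the population mean µ is:
$\left[\bar{X} \pm Z_{\alpha/2}\frac{S_X}{\sqrt{n}}\right] = \left[61 \pm Z_{0.025}\frac{2}{\sqrt{50}}\right] = \left[61 \pm 1.96 \times \frac{2}{\sqrt{50}}\right]$
= [60.44563, 61.55437]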

As µ = 60 does not belong to the 95% confidence interval [60.44563, 61.55437], we reject
H0, i.e., from the given sample, at 5% level of significance, we conclude that we need to
stop the filling process and correct it as the filling weight is significantly different from 60
gms.
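The interval and the decision rule based on it can be sketched in R as follows. The rounded critical value 1.96 is used to match the figures in the solution; qnorm(0.975) would give the unrounded value. Variable names are illustrative.

```r
x_bar <- 61; s <- 2; n <- 50; mu0 <- 60
z <- 1.96   # rounded critical value, as used in the solution

# Confidence interval: point estimate plus/minus margin of error.
ci <- x_bar + c(-1, 1) * z * s / sqrt(n)

# H0 is rejected when the hypothesised mean falls outside the interval.
reject_h0 <- mu0 < ci[1] || mu0 > ci[2]

print(round(ci, 5))
print(reject_h0)
```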
Solution: c) The true mean filling weight of the toothpaste tubes is 60.5 gms with sd of
2 gms.
As this is a two-sided test and α = 0.05, we reject H0 if
$\frac{\bar{X} - 60}{2/\sqrt{50}} < -Z_{0.025}$ or $\frac{\bar{X} - 60}{2/\sqrt{50}} > Z_{0.025}$
$\Rightarrow \bar{X} < 60 - Z_{0.025} \times \frac{2}{\sqrt{50}}$ or $\bar{X} > 60 + Z_{0.025} \times \frac{2}{\sqrt{50}}$
$\Rightarrow \bar{X} < 59.44563$ or $\bar{X} > 60.55437$
When µ = 60.5 and σ = 2, $\bar{X} \sim N\left(60.5, \frac{2^2}{50}\right)$.
Thus, power:
$P(\bar{X} < 59.44563 \mid \mu = 60.5, \sigma = 2) + P(\bar{X} > 60.55437 \mid \mu = 60.5, \sigma = 2)$
$= P(Z < -3.727761) + P(Z > 0.192227) \approx 0.4239$
We have used R to compute P(Z<-3.727761) and P(Z>0.192227) using the R codes
pnorm(-3.727761) and 1-pnorm(0.192227) respectively.
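A sketch of the power calculation, working directly on the scale of the sample mean so the standardisation step is handled by pnorm. The rounded critical value 1.96 and the variable names are choices made here. Evaluating the two tail probabilities this way gives a power of roughly 0.4239.

```r
mu0 <- 60; mu1 <- 60.5; sigma <- 2; n <- 50
se <- sigma / sqrt(n)
z_crit <- 1.96   # rounded two-sided 5% critical value

# Rejection region for the sample mean under H0.
lower <- mu0 - z_crit * se
upper <- mu0 + z_crit * se

# Power: probability of landing in the rejection region when the true mean is mu1.
power <- pnorm(lower, mean = mu1, sd = se) +
  pnorm(upper, mean = mu1, sd = se, lower.tail = FALSE)

print(c(lower = lower, upper = upper, power = power))
```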
31. Solution: a) Five years ago, in a random sample of 400 visitors to their website,
166 indicated their preference for non-smoking rooms.
$n_1 = 400$. Thus, $\hat{p}_1 = \frac{166}{400} = 0.415$

This year, in a random sample of 380 visitors, 205 indicated their preference for non-smoking rooms.
$n_2 = 380$. Thus, $\hat{p}_2 = \frac{205}{380} = 0.5394737$

As α = 0.05, a 95% confidence interval for the true difference of proportions regarding
the preference for non-smoking rooms is:
$\left[(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\right]$
= [−0.194067, −0.05488039]
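Solution: b) To test H0: p1 = p2 against H1: p1 < p2.
We have α = 0.05.
Pooled estimate of the common population proportion:
$\hat{p} = \frac{166 + 205}{400 + 380} = 0.475641$
$S_{\hat{p}_1 - \hat{p}_2} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 0.03577499$
Under H0, the test statistic is $Z = \frac{\hat{p}_1 - \hat{p}_2}{S_{\hat{p}_1 - \hat{p}_2}} \sim N(0,1)$.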

Now, $Z_{obs} = \frac{-0.1244737}{0.03577499} = -3.47935$
This is a one sided test and we would reject H0 if Zobs < -Zα.
Now -Zα = -1.644854 (from R, R Code: qnorm(0.05), Output: -1.644854)
As Zobs < -Zα, we reject H0 for α = 0.05, i.e., we can recommend the hotel chain to convert
more rooms to non-smoking.
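Both parts of this solution can be reproduced from the four counts. The sketch below uses the unpooled standard error for the confidence interval and the pooled one for the test, as in the solution. The rounded value 1.96 and the variable names are choices made here.

```r
x1 <- 166; n1 <- 400   # five years ago
x2 <- 205; n2 <- 380   # this year
p1 <- x1 / n1
p2 <- x2 / n2

# a) 95% CI for p1 - p2, using the unpooled standard error.
se_unpooled <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci <- (p1 - p2) + c(-1, 1) * 1.96 * se_unpooled

# b) One-sided test of H0: p1 = p2 against H1: p1 < p2, using the pooled standard error.
p_pool <- (x1 + x2) / (n1 + n2)
se_pooled <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_obs <- (p1 - p2) / se_pooled
reject_h0 <- z_obs < qnorm(0.05)

print(round(ci, 6))
print(c(z_obs = z_obs))
print(reject_h0)
```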

32. Solution: Using parametric test:
Assumptions:
1. The population from which the samples are collected follows a normal
distribution.
2. Xi and Yi are related for the same i, but Xi and Yj are independent whenever $i \neq j$.
We would be doing a paired t-test.
As $t_{obs}$ (−3.094344) < $t_{8;0.05}$ (−1.859548), we reject H0, i.e., there is some positive
effect of the drug on the patients at 5% level of significance.
Solution: Using non-parametric test:
Let Xi = Before drug scores, Yi = After drug scores
(X1, Y1), …, (X9, Y9), are matched pairs.
Assumption:
The distribution of the population from which the sample is chosen is not known to be
normal.
Suppose X1, …, X9 is from distribution F and Y1, …, Y9 is from distribution G.
Di = Xi - Yi
To test whether the drug improves the patients' scores:
H0: F and G are identical probability distributions, i.e., there is no change in the scores
H1: F is shifted to the left of G, i.e., the drug improves the scores
We would be performing a Wilcoxon Signed Rank Test.
As our H1: F is shifted to the left of G , the test statistic:
T = T+ = sum of the ranks corresponding to positive values of Di
Patient   Before (Xi)   After (Yi)   Di
1         2.3           3.1          -0.8
2         2.0           2.1          -0.1
3         1.9           2.45         -0.55
4         3.1           3.7          -0.6
5         2.2           2.54         -0.34
6         2.3           3.72         -1.42
7         2.8           4.54         -1.74
8         1.9           1.61         0.29
9         1.1           1.63         -0.53
We would be using R.
R Code: wilcox.test(X, Y, alternative = "less", paired = TRUE)
R outputs:
Test statistic value = 2
p-value = 0.005859
As α = 0.05 and p-value (0.005859) < α (0.05), we reject H0, i.e., there is some positive
effect of the drug on the patients at 5% level of significance.
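Both the paired t-test and the Wilcoxon signed rank test can be run on the nine before/after pairs from the table. A sketch, with the data typed in by hand and illustrative object names:

```r
# Before (X) and after (Y) scores for the nine patients.
X <- c(2.3, 2.0, 1.9, 3.1, 2.2, 2.3, 2.8, 1.9, 1.1)
Y <- c(3.1, 2.1, 2.45, 3.7, 2.54, 3.72, 4.54, 1.61, 1.63)

# Parametric: paired t-test of H1: mean(X - Y) < 0.
tt <- t.test(X, Y, paired = TRUE, alternative = "less")
t_crit <- qt(0.05, df = length(X) - 1)   # lower 5% point of t with 8 df

# Non-parametric: Wilcoxon signed rank test with the same one-sided alternative.
# The reported statistic V is the sum of ranks of the positive differences X - Y.
ww <- wilcox.test(X, Y, paired = TRUE, alternative = "less")

print(c(t_obs = unname(tt$statistic), t_crit = t_crit))
print(c(V = unname(ww$statistic), p_value = ww$p.value))
```

Only patient 8 has a positive difference, and its absolute value has rank 2, which is why the signed rank statistic equals 2.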
33. Solution: a) 0.03, b) 0.98
34. Solution: From two-sided paired t-test, p-value: 0.03995
35. Solution: a) 0.4066, b) Cannot reject null. c) [1566.48, 1613.52]
36. Solution: There are 3 different additives: Additive A, Additive B and Additive C.
To test whether the variances of the three groups are the same, i.e., to test
H0: $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$ against H1: not all $\sigma_j^2$ are equal,
where $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$ are the variances corresponding to the three populations for Additive A,
Additive B and Additive C respectively.
Assumptions:
1. The population from which the samples are collected follows a normal distribution
2. The data are independently distributed
We would be performing Bartlett's test for homogeneity of variance.
Under H0, the test statistic H ~ $\chi^2$ with 2 degrees of freedom.
We reject H0 for α = 0.05 if p-value < α.
We are using R to perform the test.
R Code: bartlett.test(Mileage~Additive)
Relevant R outputs:
p-value: 0.577
As p-value (0.577) > α (0.05), we cannot reject H0.
Thus, we conclude that the variances of the three groups are not
significantly different.
37. a) Solution: There are 4 different routes: Route 1, Route 2, Route 3, Route 4.
i) To test whether the variances of the four routes are the same, i.e., to test
H0: $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2$ against H1: not all $\sigma_j^2$ are equal,
where $\sigma_1^2$, $\sigma_2^2$, $\sigma_3^2$, $\sigma_4^2$ are the variances corresponding to the four populations for Route 1,
Route 2, Route 3, Route 4 respectively.
Assumptions:
1. The population from which the samples are collected follows a normal distribution
2. The data are independently distributed
We would be performing Bartlett's test for homogeneity of variance.
Under H0, the test statistic H ~ $\chi^2$ with 3 degrees of freedom.
We reject H0 for α = 0.1 if p-value < α.
We are using R to perform the test.
R Code: bartlett.test(Time~Route)
Relevant R outputs:
data: Time by Route
Bartlett's K-squared = 2.3838, df = 3, p-value = 0.4967
As p-value (0.4967) > α (0.1), we cannot reject H0.
Thus, we conclude that the variances of the four groups are not
significantly different.
ii) Solution: We would like to test at the 10% level (α = 0.1) whether there is a difference in
travel time along the four routes from market area A to residential area B.
To test
H0: µ1 = µ2 = µ3 = µ4 against H1: at least one µi is different,
where µ1, µ2, µ3 and µ4 are the means corresponding to the four populations for Route 1,
Route 2, Route 3, Route 4 respectively.
Assumptions:
1. The population from which the samples are collected follows a normal distribution
with the same variance (i.e. $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2$, using part (i))
2. The data are independently distributed
With the above assumptions we can perform one-way ANOVA for comparing the means
of the four populations. We are using R to perform the test.

R Code: Anova(lm(Route$Time~Route$Route), type = "II")   (Anova() is provided by the car package)
We are filling up the ANOVA table from the outputs obtained from R.
ANOVA Table
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Sum of Squares   F-Statistic (Fobs)
Between Groups        3                    98.55            32.85                 7.763663
Error                 16                   67.70            4.23125
Total                 19                   166.25
Here, k = 4, n = 20, α = 0.1.
Under H0, the test statistic F ~ $F_{3,16}$, and from the ANOVA table we get Fobs = 7.763663.
Using R/F-distribution table to find the critical value $F_{3,16;0.1}$.
R code: qf(0.90, 3, 16)
From R, $F_{3,16;0.1}$ = 2.461811.
As Fobs (7.763663) > $F_{3,16;0.1}$ (2.461811), we reject H0, i.e., there is a significant difference in
travel time along the four routes from market area A to residential area B.
Other approach using p-value:
From R output we get p-value = 0.002015458.
Our α = 0.10. As p-value < α, we reject H0.
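Since the raw travel times are not reproduced in this solution, the sketch below rebuilds the F statistic, the critical value and the p-value from the sums of squares in the ANOVA table alone. Variable names are illustrative.

```r
# Sums of squares taken from the ANOVA table.
ss_between <- 98.55
ss_error <- 67.70
k <- 4    # number of routes
n <- 20   # total number of observations

df_between <- k - 1
df_error <- n - k

ms_between <- ss_between / df_between
ms_error <- ss_error / df_error
f_obs <- ms_between / ms_error

f_crit <- qf(0.90, df_between, df_error)                         # 10% critical value
p_value <- pf(f_obs, df_between, df_error, lower.tail = FALSE)   # upper-tail p-value

print(c(f_obs = f_obs, f_crit = f_crit, p_value = p_value))
```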
b) Solution: Now suppose we do not know whether the data are normally distributed.
To test H0: all four populations are identical against H1: at least two of the populations are
different.
Assumption:
1. The samples are independent
We would be performing the Kruskal-Wallis test (non-parametric test) to test the
hypothesis.
As there are no ties, the test statistic is
$H = \frac{12}{n(n+1)}\sum_{i=1}^{k}\frac{T_i^2}{n_i} - 3(n+1)$
where n1 = 5, n2 = 5, n3 = 5, n4 = 5, n = n1 + n2 + n3 + n4 = 20, k = 4,
Ti = sum of ranks for the i-th group.
We also have α = 0.1.
Under H0, H ~ $\chi^2$ with 3 degrees of freedom.

Ranks of the observations by route:
Route 1   Route 2   Route 3   Route 4
13        9         4         2
15        19        8         7
20        10        1         11
16        14        5         17
18        12        6         3
T1 = 82   T2 = 64   T3 = 24   T4 = 40
As there are no ties,
$H_{obs} = \frac{12}{20 \times 21}\left(\frac{82^2}{5} + \frac{64^2}{5} + \frac{24^2}{5} + \frac{40^2}{5}\right) - 3 \times 21 = 11.263$
From the Chi-square table, for α = 0.1, the critical value of $\chi^2$ with 3 degrees of freedom = 6.251389.
As Hobs > χ2 with 3 degrees of freedom, we reject H0, i.e., there is significant difference in
travelling along the four routes from market area A to residential area B.
Using R: R Code: kruskal.test(Time~Route)
Output: Hobs = 11.263
As α = 0.1 and under H0, H ~ χ2 with 3 degrees of freedom, we find the critical value using
R.
R code: qchisq(0.9, 3)
R output: for α = 0.1, χ2 with 3 degrees of freedom = 6.251389
As Hobs > χ2 with 3 degrees of freedom, we reject H0, i.e., there is significant difference in
time travelling along the four routes from market area A to residential area B.
Other approach using p-value:
From R output we get, p-value = 0.01039
Our α = 0.1. As p-value < α, thus we reject H0.
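The hand calculation of H can be checked from the rank table. The sketch below computes H from the rank sums and then, as a cross-check, passes the ranks themselves to kruskal.test. Because ranking is order-preserving, this gives the same statistic as running the test on the raw times. Object names are illustrative.

```r
# Ranks of the travel times, grouped by route (from the rank table).
ranks <- list(
  route1 = c(13, 15, 20, 16, 18),
  route2 = c(9, 19, 10, 14, 12),
  route3 = c(4, 8, 1, 5, 6),
  route4 = c(2, 7, 11, 17, 3)
)

T_i <- sapply(ranks, sum)      # rank sums per route
n_i <- sapply(ranks, length)   # group sizes
n <- sum(n_i)
k <- length(ranks)

# Kruskal-Wallis statistic (no ties, so no correction is needed).
h_obs <- 12 / (n * (n + 1)) * sum(T_i^2 / n_i) - 3 * (n + 1)

chi_crit <- qchisq(0.9, df = k - 1)
p_value <- pchisq(h_obs, df = k - 1, lower.tail = FALSE)

# Cross-check with the built-in test, which accepts a list of numeric vectors.
kw <- kruskal.test(ranks)

print(c(h_obs = h_obs, chi_crit = chi_crit, p_value = p_value))
print(kw)
```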
38. Solution: $H_0: \mu_{CR} = \mu_M = \mu_W$, $H_1$: at least one $\mu_i$ is different
Two-way ANOVA Table (without interaction)
Source of Variation   DF   SS      MS      F-stat
Subject               2    1348    674     5.6167
Student               5    63250   12650   105.4167
Error                 10   1200    120
Total                 17   65798
R Code: Anova(lm(Marks~Subject+Student), type = "II") and α = 0.05
p-value for Subject: 0.0232 < α = 0.05, so we reject the null $H_0$
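The entries of the two-way ANOVA table can be rebuilt from the sums of squares and degrees of freedom. A sketch using only the tabulated values, with illustrative names:

```r
# Sums of squares and degrees of freedom from the two-way ANOVA table.
ss <- c(Subject = 1348, Student = 63250, Error = 1200)
df <- c(Subject = 2, Student = 5, Error = 10)

ms <- ss / df                                   # mean squares
f_stat <- ms[c("Subject", "Student")] / ms["Error"]

# Upper-tail p-values of the two F statistics.
p_val <- pf(f_stat, df[c("Subject", "Student")], df["Error"], lower.tail = FALSE)

print(round(ms, 4))
print(round(f_stat, 4))
print(signif(p_val, 3))
```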
39. Solve by using R.
40. Solution: H0: The data is from a normally distributed family
H1: The data is not from a normally distributed family
The total number of observations, n = 40.
Finding the number of classes: As $2^5 < 40 < 2^6$, we can divide the data into 5 groups with
equal probabilities.
Thus the expected frequency for each group = $40 \times \frac{1}{5} = 8$.
From the sample, mean = 30.75, variance = 15.833
As we are dividing the data into 5 groups with equal probabilities, we would be looking
for the 20th, 40th, 60th and 80th percentiles of N(30.75, 15.833).
Obtaining the percentiles using R.
R code (after reading and attaching the file):
break.MS = qnorm(c(0.001, 0.2, 0.4, 0.6, 0.8, 0.999), mean(Mobile), sd(Mobile))
The percentiles (from R output):
18.45362, 27.40109, 29.74190, 31.75810, 34.09891, 43.04638
Dividing into 5 groups using the above cut-offs and computing the group frequencies:
Class         Oi   Ei
18.5 – 27.4   9    8
27.4 – 29.7   5    8
29.7 – 31.8   11   8
31.8 – 34.1   9    8
34.1 – 43     6    8
Under H0, the test statistic $\chi^2 \sim \chi^2_d$, where d = k − m − 1.
Also, α = 0.05.
Here we have d = k − m − 1 = 5 − 2 − 1 = 2.
$\chi^2_{obs} = \sum_{i=1}^{5}\frac{(O_i - E_i)^2}{E_i} = 3$
We reject H0 if $\chi^2_{obs} > \chi^2_{2;0.05}$.
Using R to obtain $\chi^2_{2;0.05}$ using the R code qchisq(0.95, 2).
From R output we get $\chi^2_{2;0.05}$ = 5.991465.
As $\chi^2_{obs} < \chi^2_{2;0.05}$, we cannot reject H0, i.e., we can say at 5% level of significance that the
data is from a normally distributed family.
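The goodness-of-fit statistic and critical value can be recomputed from the observed class frequencies. This sketch starts from the counts in the table, since the raw Mobile data are not reproduced here. Variable names are illustrative.

```r
observed <- c(9, 5, 11, 9, 6)   # class frequencies from the table
n <- sum(observed)
k <- length(observed)
expected <- rep(n / k, k)       # equal-probability classes
m <- 2                          # parameters estimated from the data (mean and variance)

chi_obs <- sum((observed - expected)^2 / expected)
df <- k - m - 1
chi_crit <- qchisq(0.95, df)

print(c(chi_obs = chi_obs, df = df, chi_crit = chi_crit))
print(chi_obs > chi_crit)       # TRUE would mean rejecting H0
```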

41. Solution: H0: The data is from a Poisson distribution family
H1: The data is not from a Poisson distribution family
$\hat{\lambda}$ = 1.14
Number per leaf   O    E
0                 70   47.97285
1                 38   54.68905
2                 17   31.17276
3                 10   11.84565
4                 9    3.37601
5                 3    0.7697303
6                 3    0.1739

After pooling the classes with 3 or more per leaf:
Number per leaf   O    E          (O − E)²/E
0                 70   47.97285   10.11396
1                 38   54.68905   5.092873
2                 17   31.17276   6.443675
≥ 3               25   16.16529   4.828376

Under H0, the test statistic $\chi^2 \sim \chi^2_d$, where d = k − m − 1.
Also, α = 0.05.
Here we have k = 4 classes and m = 1 estimated parameter (λ), so d = k − m − 1 = 4 − 2 = 2.
$\chi^2_{obs} = \sum_{i=1}^{4}\frac{(O_i - E_i)^2}{E_i} = 26.47888$
We reject H0 if $\chi^2_{obs} > \chi^2_{2;0.05}$.
Using R to obtain $\chi^2_{2;0.05}$ using the R code qchisq(0.95, 2).
From R output we get $\chi^2_{2;0.05}$ = 5.991465.
As $\chi^2_{obs} > \chi^2_{2;0.05}$, we reject H0, i.e., we can say at 5% level of significance that the data is not
from a Poisson distribution family.
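A sketch of the Poisson goodness-of-fit calculation, starting from the observed frequencies in the table. The classes with 3 or more per leaf are pooled and the last expected count is taken from the upper tail, which is how the tabulated value for the pooled class appears to have been obtained. With k = 4 classes and one estimated parameter, the degrees of freedom are 4 − 1 − 1 = 2. Variable names are illustrative.

```r
counts <- 0:6
observed <- c(70, 38, 17, 10, 9, 3, 3)
n <- sum(observed)

# Estimate of the Poisson mean: the sample mean of the counts.
lambda_hat <- sum(counts * observed) / n

# Pool the classes 3, 4, 5, 6 into ">= 3" so expected counts are not too small.
obs_pooled <- c(observed[1:3], sum(observed[4:7]))
p_pooled <- c(dpois(0:2, lambda_hat), ppois(2, lambda_hat, lower.tail = FALSE))
exp_pooled <- n * p_pooled

contrib <- (obs_pooled - exp_pooled)^2 / exp_pooled
chi_obs <- sum(contrib)

df <- length(obs_pooled) - 1 - 1   # k - m - 1 with m = 1 estimated parameter
chi_crit <- qchisq(0.95, df)

print(round(cbind(O = obs_pooled, E = exp_pooled, contrib = contrib), 4))
print(c(lambda_hat = lambda_hat, chi_obs = chi_obs, chi_crit = chi_crit))
```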

42. Solution: $H_0$: Plan is independent of type of company, $H_1$: Plan is not independent
of type of company
$\chi^2$ = 9.44, df = 2
p-value is less than 0.01
Reject the null $H_0$; thus plan is not independent of type of company
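The statement about the p-value can be checked from the reported statistic and degrees of freedom alone. This is a minimal sketch, since the contingency table itself is not reproduced in this solution.

```r
chi_obs <- 9.44   # reported test statistic
df <- 2           # reported degrees of freedom

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(chi_obs, df, lower.tail = FALSE)

print(p_value)
print(p_value < 0.01)
```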
