ProblemSetQTII_2024
Quantitative Technique II
PROBLEM SET
DR. PRITHA GUHA
XLRI | Jamshedpur
1. Suppose the following are the values in some population:
5, 27, 4, 17, 4.5, 19, 2, 11, 3, 6, 13, 18
A sample of size 4 is taken, and is observed to be 3, 4, 4.5, 2.
Is it most likely to be (a) a simple random sample, (b) a stratified sample or (c) a
clustered sample? Give a reason for your answer.
2. A sample of size 15 is to be chosen from a population of size 100, divided into two
strata of sizes 60 and 40, respectively. If proportional allocation is used, and samples
are selected without replacement from each stratum, how many different samples can
be selected in total?
3. Suppose 10,000 payment vouchers were generated in 2017 at XLRI. An auditor checks
the vouchers by drawing a probability sample (known as an audit sample).
a) Why might simple random sampling not be appropriate?
b) Which sampling design would you prefer?
4. A statistics student who is curious about the relationship between the amount of time
students spend on social networking sites and their performance at school decides to
conduct a survey. Various research strategies for collecting data are described below.
In each, name the sampling method proposed and any bias you might expect.
(a) He randomly samples 40 students from the study's population, gives them the
survey, asks them to fill it out and bring it back the next day.
(b) He gives out the survey only to his friends, making sure each one of them fills out
the survey.
(c) He posts a link to an online survey on Facebook and asks his friends to fill out the
survey.
(d) He randomly samples 5 classes and asks a random sample of students from those
classes to fill out the survey.
(b) Stratify students by their field of study, then sample 10% of students from each
stratum.
(c) Cluster students by their ages (e.g. 18 years old in one cluster, 19 years old in one
cluster, etc.), then randomly sample three clusters and survey all students in those
clusters.
6. A school in Jamshedpur is planning to conduct a sample survey of its teachers about
how many hours the teachers require to create class notes for the students using a
mobile phone. There are 20 pre-primary teachers, 25 primary teachers and 30
secondary teachers. The school decides to choose a total of 15 teachers using a
stratified sampling model by considering the three types of teachers as three different
strata. The mean and the standard deviation of the time required to create class notes
for each stratum were as follows:
              Mean (in hours)   Standard deviation / SD (in hours)
Pre-primary   43.48             3.11
Primary       45.40             2.83
Secondary     53.05             6.80
What is the number of samples to be chosen from the strata of secondary teachers if
Neyman allocation is used (mark the closest one)?
chosen if proportional allocation and Neyman allocation is used. Choose the
samples for both allocations.
8. It is known that 80% of all Brand A MP3 players work in a satisfactory manner
throughout the warranty period (are “successes”). Suppose that n = 10 players are
randomly selected. Let X = the number of successes in the sample. The statistic X/n is
the sample proportion (fraction) of successes. Obtain the sampling distribution of this
statistic. (Can you simulate this problem in R?)
10. Let X1, X2, X3, X4, X5 be an independent and identically distributed (IID) sample
from a population with mean μ and variance 1. Which of the following is not an
unbiased estimator of μ?
A) (1/4)(X1 + X2 + 2X3 + 2X4 + 2X5)
B) (1/5)(X1 + X2 + X3 + X4 + X5)
C) (1/15)(X1 + 2X2 + 3X3 + 4X4 + 5X5)
D) 2X1 + 2X2 − X3 − X4 − X5
12. Let X1, X2, X3, X4, X5 be an independent and identically distributed (IID) sample
from a population with mean μ and variance 1. Which of the following has the lowest
variance?
A) (1/5)(X1 + X2 + X3 + X4 + X5)
B) 2X1 + 2X2 − X3 − X4 − X5
C) (1/4)(X1 + X2 + 2X3 + 2X4 + 2X5)
D) (1/15)(X1 + 2X2 + 3X3 + 4X4 + 5X5)
13. For an SRS drawn with replacement (WR) from a Poisson population, show that both the
sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and the sample variance
$\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$ are unbiased estimators of the population mean λ.
14. Suppose that X1, X2, …, Xn are an IID random sample from a Bernoulli distribution
with probability of success p.
a) Show that X̄ = p̂ is an unbiased estimator of p.
b) Also show that p̂ (the sample proportion) is a consistent estimator of p (the population
proportion).
15. X is a discrete random variable with the following probability mass function:
X      0      1      2          3
P(X)   2θ/3   θ/3    2(1-θ)/3   (1-θ)/3
where 0 ≤ θ ≤ 1. The following 10 samples were taken: 3, 0, 2, 1, 3, 2, 1, 0, 2, 1.
a) Find the MME of θ.
b) Find the MLE of θ.
17. Let X1, X2, …, Xn be an IID random sample from Uniform[-b, b]. Find the MME of b.
18. Let X1, X2, …, Xn be an IID random sample from Exponential(λ). Find the MLE of λ.
19. Let X1, X2, …, Xn be an IID random sample from N(µ, σ²). Find the MLE of µ and σ².
20. Suppose X1, X2, …, Xn is an independently and identically distributed (IID) sample
from a normal distribution with mean 0 and variance σ².
a) What is the likelihood function for σ², using the sample?
b) Showing all relevant steps, obtain a maximum likelihood estimator for the
parameter σ².
22. Suppose a simple random sample from a uniform distribution on the interval
(0,b) is obtained as follows: 42, 46, 44, 47, 47, 43, 62, 64. Determine a method of
moments estimate of b.
23. Suppose the number of customers X that enter a store between the hours 9AM
and 10AM follows a Poisson distribution with parameter θ. Suppose a random sample
of the number of customers that enter the store between 9AM and 10AM for 10 days
results in the values 9, 7, 9, 15, 10, 13, 11, 7, 2, 12. Determine the MLE of θ.
24. A marketing analyst wishes to obtain a sample of size 100 with replacement from
a population to estimate the population mean µ (unknown). However, due to a coding
error, her algorithm starts recording every value twice. After recording 50 values in
that manner, the coding error is discovered, and corrected, although the recorded
values are left unchanged. Then 50 more observations are recorded to make it to a
100. The final sample therefore is as follows: X1, X1, X2, X2, …, X25, X25, X26, X27, …, X75.
Let the corresponding sample mean be X*. On the other hand, let the sample mean of
X1, X2, …, X25, X26, X27, …, X75, each used once (i.e., after removing the duplicates), be X̄.
For the following, wherever necessary, assume that X1, X2, …, X75 are independent and
identically distributed with mean µ and variance σ².
a) Among X* and X̄, which is/are unbiased for µ?
b) Which estimator, among X* and X̄, is more efficient, and why?
25. The capture/recapture method is sometimes used to estimate the size of a
wildlife population. Suppose that 10 animals are captured, tagged, and released. On a
later occasion, 20 animals are captured, and it is found that 4 of them are tagged. How
large is the population?
27. Suppose that the 95% confidence interval obtained for the difference of average
weekly sales in two stores is (-3.232, 12.102) (in lakhs of rupees). From this, which of
the conclusions is NOT possible without additional information?
(a) At 10% level of significance, the null of equal average weekly sales against the
alternative of unequal average weekly sales cannot be rejected
(b) At 1% level of significance, the null of equal average weekly sales against the
alternative of unequal average weekly sales cannot be rejected
(c) At 2% level of significance, the null of equal average weekly sales against the
alternative of unequal average weekly sales cannot be rejected
(d) At 5% level of significance, the null of equal average weekly sales against the
alternative of unequal average weekly sales cannot be rejected
28. A manufacturer of light bulbs claims an average life of more than 750 hours per
bulb. A consumer group did not believe this claim and tested a sample of 35 bulbs. The
average lifetime of these 35 bulbs was 740 hours with a standard deviation of 30 hours.
The manufacturer responded that their claim was based on testing hundreds of bulbs.
a) What is the 95% confidence interval for life of light bulbs based on the data
collected by the consumers?
b) True or false: Compared to the consumer group’s 95% confidence interval, the
manufacturer’s 95% confidence interval is more likely to contain the population
mean because it is based on a larger sample. Justify your answer.
c) The volume in a set of soft drink bottles is known to follow a Normal distribution with
standard deviation of 4 ml. You have taken a sample of the bottles and measured their
volumes. How many bottles do you have to sample to have a 90% confidence interval
for µ with width 1?
29. Suppose that against a certain opponent the number of points the BM basketball
team scores is normally distributed with unknown mean θ and unknown variance σ².
Suppose that over the course of the last 6 games between the two teams, BM scored
the following points: 59, 62, 59, 74, 70, and 61.
a) Compute a 95% t–confidence interval for θ. Does 95% confidence mean that the
probability θ is in the interval you just found is 95%?
b) Now suppose that you learn that σ² = 25. Compute a 95% z–confidence interval for
θ. How does this compare to the interval in (a)?
30. The production line for Glow toothpaste is designed to fill tubes of toothpaste
with a mean weight of 60 grams. Periodically, a sample of 50 tubes will be selected in
order to check the filling process. Quality assurance procedures call for the
continuation of the filling process if the sample results are consistent with the
assumption that the mean filling weight for the population of toothpaste tubes is 60
grams; otherwise the filling process will be stopped and adjusted.
a) Formulate the null and the alternative hypotheses to help determine when the
filling process should continue operating and when it should be stopped and
corrected.
b) Assume that a sample of 50 toothpaste tubes provides a sample mean of 61 grams
and standard deviation of 2 grams. Based on the information:
(i) Compute the p-value of the test; hence carry out the testing of the hypotheses
formed in part (a) at a 5% level of significance.
(ii) Obtain a 95% confidence interval for the mean filling weight and then carry out
the same test as in part (i).
c) What is the power of the test if the true mean filling weight of the toothpaste tubes
is 60.5 grams, if we assume the true standard deviation of the filling weights is 2
grams?
31. A large hotel chain periodically runs various surveys on their website to
understand customer preferences. One such survey is about the preference of
smoking versus non-smoking rooms. In a random sample of 400 visitors to their
website five years ago, 166 had indicated their preference for the non-smoking rooms.
This year, 205 such visitors, in a sample of 380, preferred the non-smoking rooms.
a) What would be a 95% confidence interval for the true difference of proportions,
regarding the preference of non-smoking rooms, from five years ago to now?
b) Would you recommend that the hotel chain convert more rooms to non-smoking?
Support your recommendation, by testing the appropriate hypotheses, using the
data given above, at a 0.05 level of significance.
32. A controlled clinical trial was performed on 9 patients to investigate the effect of
a drug on the behavioural disorders of chronic schizophrenics. The following table
gives the behavioural rating scores for the patients at the beginning of the trial and
after 3 months of the end of the trial. High scores are good.
Patient 1 2 3 4 5 6 7 8 9
Before Drug 2.3 2.0 1.9 3.1 2.2 2.3 2.8 1.9 1.1
After Drug 3.1 2.1 2.45 3.7 2.54 3.72 4.54 1.61 1.63
Stating your assumptions, null and alternative hypotheses, test at a 5% level of
significance whether the drug improves the patients’ behavioural rating scores.
33. The screening process for detecting a rare disease is not perfect. Researchers
have developed a blood test that is considered fairly reliable. It gives a positive
reaction in 98% of the people who have that disease. However, it erroneously gives a
positive reaction in 3% of the people who do not have the disease. Suppose the null
hypothesis is “the individual does not have the disease” and the alternative hypothesis
is “the individual has the disease”.
a) What is the probability of Type I error?
b) What is the power of the test?
34. A study has been conducted to check whether average height changes from
generation to generation. The following table gives the heights (in centimetres) of a
sample of 8 fathers and their oldest adult sons.
Height of father 165.1 160 170.2 162.6 172.7 157.5 177.8 167.6
Height of son 172.7 167.6 172.7 175.3 167.6 165.1 180.3 170.2
Stating your assumptions, null and alternative hypotheses, test at a 5% level of
significance whether the average height changes from generation to generation. What
is the p-value of the test?
35. The mean lifetime of a sample of 100 light bulbs produced by a company is
computed to be 1590 hours with a standard deviation of 120 hours. The manager of
the company wants to test the null hypothesis µ = 1600 hours against the alternative
hypothesis µ ≠ 1600 hours, where µ is the mean lifetime of all the bulbs produced by
the company.
a) Compute the p-value of the test.
b) From (a), what is your conclusion at a 5% level of significance?
c) Obtain a 95% confidence interval for µ. Would you arrive at the same conclusion,
for the test of hypotheses using this confidence interval as in b)? Justify.
36. An oil company wishes to study the effects of three different fuel additives on
mean fuel mileage. The company randomly selects three groups of six automobiles
each and assigns a group of six automobiles to each additive type (A, B, and C). All the
18 automobiles are of same make and model. Each of the six automobiles assigned to
fuel additive test is driven using the appropriate additive and the fuel mileage (in
km/lit) for the test drive is recorded. Following are the results:
a) Suppose that we want to test at a 5% level of significance whether there is a
difference in the three types of fuel additives regarding mileage of the automobile.
Towards that, state the null and alternative hypotheses, the assumptions you make and
why they are justifiable, compute the appropriate test statistics and then perform the
test.
b) Now consider a situation when the data are not known to be normally distributed.
Suggest and perform an appropriate test at a 5% significance level to check whether
there is a difference in the three types of fuel additives regarding mileage of the
automobile. Remember to state your assumptions and hypotheses.
37. A software company develops software for GPS navigational system. From market
area A to residential area B, there are four routes possible, Route 1, Route 2, Route 3
and Route 4. The company obtains data for travelling along each route for one week
and obtains the following data. The time of travel from market area A to residential
area B is given in minutes.
Route 1 29.5 30.5 33 31 32
Route 2 27.5 32.5 28 30 29
Route 3 25 27 23.5 25.5 26
Route 4 24 26.5 28.5 31.5 24.5
38. The Scholastic Aptitude Test (SAT) contains three areas: critical reading,
mathematics, and writing. Each area is scored on an 800-point scale. A sample of SAT
scores for six students follows.
40. Sales figures for a sample of 40 days from a store selling mobile phones is given
below:
26 29 33 30 27 29 24 30 30 34
39 24 29 31 31 32 27 26 31 28
36 30 33 25 33 31 36 35 30 24
34 36 41 33 34 29 31 31 32 26
Retracing the steps of what we did in class, perform a chi-square goodness of fit test
at a 5% level of significance to test whether the data come from a normal distribution.
Clearly write your null and alternative hypotheses, group the data in a reasonable
number of classes, compute the value of the appropriate test statistic, perform the
test and write your final conclusion.
41. A farmer has sprayed insecticide on 6 of his apple trees to eliminate mites.
From each of the 6 apple trees, 25 leaves were selected, and the number of mites was
counted. Following is the data on 150 leaves, which gives the count of mites
on each leaf:
Retracing the steps of what we did in class, perform a chi-square goodness of fit test
at a 5% level of significance to test whether the data come from a Poisson distribution.
Clearly write your null and alternative hypotheses, group the data in a reasonable
number of classes, compute the value of the appropriate test statistic, perform the
test and write your final conclusion.
42. An employment survey asked a sample of human resource executives how their
company planned to change its workforce over the next 12 months. A categorical
response variable showed three options: The company plans to hire and add to the
number of employees, the company plans no change in the number of employees, or
the company plans to lay off and reduce the number of employees. Another
categorical variable indicated if the company was private or public. Sample data for
180 companies are summarized as follows.
Employment Plan       Company
                      Private   Public
Add Employees         37        32
No Change             19        34
Lay Off Employees     16        42
Construct a test of independence to determine if the employment plan for the next 12
months is independent of the type of company. At a 5% level of significance, what is
your conclusion?
Regression
1. Consider the Fresh detergent data set shared with you. If we perform a
multiple regression of “Demand” on “Price”, “IndPrice” and “AdvExp” using the Fresh
detergent dataset, then the F-statistic for ANOVA should be tested with degrees of
freedom:
A. 3 and 26 B. 1 and 26 C. 3 and 27 D. 1 and 28
D. Performing a multiple regression of “Demand” on “Price”, “IndPrice” and “AdvExp”
will give a higher value of R² compared to any of these three simple regressions
5. Among the following independent variable combinations used to fit multiple linear
regression models to forecast “Demand” using the Fresh detergent dataset, the best
fit (in terms of AIC) is provided by:
A. “Price”, “IndPrice” and “AdvExp”
B. “Price” and “AdvExp”
C. “IndPrice” and “AdvExp”
D. Only “AdvExp”
6. Consider the problem of fitting a suitable linear model to forecast “Demand” using
the Fresh detergent dataset. Which of the following is correct?
A. The model with the highest adjusted R² value can be considered to provide the best
fit
B. The model with the highest R² value can be considered to provide the best fit
C. The model with the highest AIC value can be considered to provide the best fit
D. Only a model with all coefficients having p-values less than 5% can provide the best
fit
7. What is the MSE for a multiple linear regression of “Demand” on “Price” and
“AdvExp”?
A. 0.3373 B. 0.1138 C. 3.0727 D. 0.7717
8. A simple linear regression fit of price on sold quantity of a product, with 25 data
points, results in the fitted line: Price = 7 - 0.04 × Sold quantity, with a
multiple R-squared value of 0.70. The correlation between Price and Sold quantity is
closest to
A. -0.84 B. -0.49 C. -0.04 D. 0.49
9. A simple linear regression fit of price on sold quantity of a product, with 25 data
points, results in the fitted line: Price = 7 - 0.04 × Sold quantity, with a
multiple R-squared value of 0.70. The value of the adjusted R-squared is closest to
A. 0.687 B. 0.713 C. 0.329 D. 0.613
10. “mtcars” is a built-in dataset in R: a data frame with 32 observations
on 11 (numeric) variables. Attach the “mtcars” data in R (using attach(mtcars)) and fit
a linear regression model with mpg as the response and disp and wt as predictors using the
following code: lm.mtcars = lm(mpg~disp+wt). Also run the following code:
summary(lm.mtcars), anova(lm.mtcars). Answer the following questions based on
your output:
Q10.a) The regression equation would be:
A. 34.96055 - 0.01773 disp - 3.35082 wt + ϵ
B. 34.96055 + 0.01773 disp + 3.35082 wt
C. 34.96055 + 0.01773 disp + 3.35082 wt + ϵ
D. 34.96055 - 0.01773 disp - 3.35082 wt
Q10.b) True/False: The predicted mpg decreases by 3.35082 by 1% increase in wt when
disp is unchanged.
A) True B) False
Q10.c) The p-value for the disp variable:
A) 4.91e-16 B) 0.06362 C) 0.00743 D) 2.744e-10
Q10.d) The value of the coefficient of determination (R²) for this model is:
A) 0.7658 B) 0.7809 C) 51.69 D) 2.917
Q10.e) Construct the ANOVA table for the above regression model.
11. For this exercise, use the mtcars dataset available in R. View the dataset using the
command View(mtcars), and attach it using the command attach(mtcars). In the
mtcars dataset, the variable am gives the transmission type (0 = automatic, 1 =
manual) of the corresponding vehicle, the variable wt gives the weight (in 1000 lbs),
cyl gives the number of cylinders, and, hp gives the gross horsepower. Ignore the
other variables.
Q11.a) We want to predict the transmission type based only on the weight of the car.
Based on the fitted logistic regression model, we CANNOT conclude that
A) If the car weight increases by 1 lb, the probability of the car being automatic
reduces by 4.024%
B) Heavier cars are more likely to be automatic
C) Weight is a significant predictor in finding out the transmission type of a car
D) Any model with additional independent variables will have residual deviance
less than 19.176
Q11.b) If we perform a G-test for the logistic regression model fitted on the transmission
type based only on the weight of the car, then the p-value is closest to
A) 9×10⁻⁷ B) 1×10⁻⁵ C) 0.94564 D) 0.04504
Q11.c) Based on the logistic regression model fitted on the transmission type based only
on the weight of the car, the probability that a car weighing 3000lb will be of manual
transmission type, is closest to
A) 0.492 B) 0.508 C) 0.968 D) 0.032
Q11.d) Starting from a model with wt, cyl and hp as independent variables, if a backward
stepwise selection is performed based on AIC, the best fitted model obtained will
include the following predictor variables
A) wt and hp only B) wt only C) wt and cyl only D) all 3 variables
Q11.e) The AIC value of the model chosen above is given by
A) 16.059 B) 17.841 C) 23.176 D) 15.818
1. Solution: It is a simple random sample.
2. Solution: Size of stratum 1: 60; size of stratum 2: 40. Total sample size: 15. We are
using proportional allocation. Thus, the sample size from stratum 1 is
$n_1 = 15 \times \frac{60}{100} = 9$ and the sample size from stratum 2 is $n_2 = 6$.
Total number of possible samples if samples are selected without replacement
(unordered) from the two strata: $\binom{60}{9} \times \binom{40}{6}$.
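The count above can be checked numerically. The following is a minimal Python sketch (the course itself works in R); the variable names are illustrative only.

```python
from math import comb

N1, N2 = 60, 40                  # stratum sizes
n = 15                           # total sample size
n1 = n * N1 // (N1 + N2)         # proportional allocation to stratum 1
n2 = n - n1                      # remainder goes to stratum 2

# Unordered samples without replacement, chosen independently in each stratum
total_samples = comb(N1, n1) * comb(N2, n2)
print(n1, n2, total_samples)
```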
3. Solution:
a) Some vouchers may have very small value (e.g. for buying office supply etc.) and
some may have very high value (e.g. for buying lab equipment). A simple random
sample may not give the proper representation of the population in this case as
one may end up having vouchers with only a small amount or vouchers with only
larger amount.
b) In this case we should do a stratified random sampling.
4. Solution:
(a) Simple random sample. Non-response bias: if only those people who have strong
opinions about the survey respond, his sample may not be representative of the
population.
(b) Convenience sample. Undercoverage bias: his sample may not be representative
of the population since it consists only of his friends. It is also possible that the study
will have non-response bias if some choose not to bring back the survey.
(c) Convenience sample. This will have a similar issue to handing out surveys to friends.
(d) Multi-stage sampling. If the classes are similar to each other with respect to student
composition this approach should not introduce bias, other than potential non-
response bias.
5. Solution: (a) Simple random sampling is okay. In fact, it's rare for simple random
sampling to not be a reasonable sampling method!
(b) The student opinions may vary by field of study, so the stratifying by this variable
makes sense and would be reasonable.
(c) Students of similar ages are probably going to have more similar opinions, and we
want clusters to be diverse with respect to the outcome of interest, so this would not
be a good approach. (Additional thought: the clusters in this case may also have very
different numbers of people, which can also create unexpected sample sizes.)
6. Solution: 9
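Under Neyman allocation the sample size for stratum h is proportional to N_h × σ_h. A minimal Python sketch of that calculation for the three teacher strata follows; it assumes the reported standard deviations are used as the stratum σ_h, and the dictionary names are illustrative.

```python
# Stratum sizes and standard deviations from the question
sizes = {"pre-primary": 20, "primary": 25, "secondary": 30}
sds = {"pre-primary": 3.11, "primary": 2.83, "secondary": 6.80}
n = 15  # total sample size

# Neyman allocation: n_h = n * N_h * sigma_h / sum_j(N_j * sigma_j)
weights = {k: sizes[k] * sds[k] for k in sizes}
total = sum(weights.values())
allocation = {k: n * w / total for k, w in weights.items()}

n_secondary = round(allocation["secondary"])
print(allocation, n_secondary)
```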
7. This is an R exercise.
8. Solution:
X/n 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
P(X/n) 0.0 0.0 0.0001 0.0008 0.0055 0.0264 0.0881 0.2013 0.3020 0.2684 0.1074
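Since X follows a Binomial(10, 0.8) distribution, the table can be reproduced directly from the binomial probability mass function. A minimal Python sketch (an exact calculation rather than the simulation the question suggests in R):

```python
from math import comb

n, p = 10, 0.8

# P(X = k) for k = 0..n; the sample proportion X/n takes the value k/n
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

for k, prob in pmf.items():
    print(f"X/n = {k / n:.1f}  P = {prob:.4f}")
```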
9. Solution: (b) The fraction of computer chips manufactured at the factory during the
week of production that had defects. (c) $\hat{p} = \frac{27}{212} = 0.127$; (d) Standard error; (e)
$SE(\hat{p}) = 0.023$; (g) 0.021
10. Solution: (A)
12. Solution: (A)
15. Solution: a) 5/12, b) 1/2
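Both estimates for question 15 can be verified with exact fractions. The sketch below (Python, illustrative names) uses E[X] = 7/3 − 2θ for the method of moments, and the fact that the likelihood is proportional to θ^(number of 0s and 1s) × (1 − θ)^(number of 2s and 3s) for the MLE.

```python
from fractions import Fraction

sample = [3, 0, 2, 1, 3, 2, 1, 0, 2, 1]
n = len(sample)
xbar = Fraction(sum(sample), n)

# MME: E[X] = 1*(θ/3) + 2*2(1-θ)/3 + 3*(1-θ)/3 = 7/3 - 2θ, set equal to xbar
theta_mme = (Fraction(7, 3) - xbar) / 2

# MLE: likelihood ∝ θ^a (1-θ)^(n-a), a = count of observations equal to 0 or 1,
# which is maximised at θ = a/n
a = sum(1 for x in sample if x in (0, 1))
theta_mle = Fraction(a, n)

print(theta_mme, theta_mle)
```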
16. Solution: MME of p is $\frac{1}{4}(2 - \bar{X})$
17. Solution: Discussed in class
18. Solution: MLE of λ = 1/𝑋̅
19. Solution: MLE of µ is $\bar{X}$; MLE of $\sigma^2$ is $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$
20. Solution: a) Likelihood function
$L(\sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n} X_i^2}$,
b) MLE $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2$
21. Solution: MME: $\hat{a} = \frac{2}{3}\bar{X}$, MLE: $\hat{a} = \max(X_i)/2$
22. Solution: 98.75
23. Solution: 9.5
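Both of these answers reduce to a sample mean: for Uniform(0, b) the first moment is b/2, so the method-of-moments estimate is twice the sample mean, and for a Poisson sample the MLE of θ is the sample mean. A short Python sketch with illustrative variable names:

```python
from statistics import mean

# Question 22: E[X] = b/2 for Uniform(0, b), so b_hat = 2 * sample mean
uniform_sample = [42, 46, 44, 47, 47, 43, 62, 64]
b_mme = 2 * mean(uniform_sample)

# Question 23: the Poisson MLE of θ is the sample mean
customers = [9, 7, 9, 15, 10, 13, 11, 7, 2, 12]
theta_mle = mean(customers)

print(b_mme, theta_mle)
```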
24. Solution: a) Both are unbiased for µ. b) X̄ is more efficient, since its MSE is the smaller of the two.
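The comparison in question 24 can be made explicit by writing each estimator as a weighted sum of X1, …, X75. With IID observations, the weights summing to 1 gives unbiasedness, and the variance is σ² times the sum of squared weights. A Python sketch under those assumptions:

```python
from fractions import Fraction

# X*: the first 25 values are counted twice among 100 records, the other 50 once
w_star = [Fraction(2, 100)] * 25 + [Fraction(1, 100)] * 50
# X-bar: each of the 75 distinct values is used once
w_bar = [Fraction(1, 75)] * 75

# Unbiased if the weights sum to 1
sum_star, sum_bar = sum(w_star), sum(w_bar)

# Variance in units of σ²: sum of squared weights
var_star = sum(w * w for w in w_star)
var_bar = sum(w * w for w in w_bar)

print(sum_star, sum_bar, var_star, var_bar)
```

Because both estimators are unbiased, comparing MSE is the same as comparing these variances.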
25. Solution: 50
26. Solution: a) $\hat{\theta} = \frac{\bar{x}}{\bar{x} - x_0}$; b) $\hat{\theta} = \left[\frac{1}{n}\sum_{i=1}^{n} \ln\left(\frac{x_i}{x_0}\right)\right]^{-1}$
27. Solution: (a)
28. Solution: a) [730.0612, 749.9388] if the normal distribution is used, [729.6946,
750.3054] if the t distribution is used. b) False. c) 174
29. Solution: a) (57.549, 70.784), b) (60.166, 68.167)
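The normal-based answers for questions 28 and 29 can be reproduced with the standard library. This Python sketch covers only the z-based results, because the t quantile needed for the t intervals is not available in the standard library; the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()
z975 = std_normal.inv_cdf(0.975)   # two-sided 95% quantile
z95 = std_normal.inv_cdf(0.95)     # two-sided 90% quantile

# 28(a): 95% z-interval, n = 35, mean 740, SD 30
half_28 = z975 * 30 / sqrt(35)
ci_28a = (740 - half_28, 740 + half_28)

# 28(c): smallest n with 90% interval width 2 * z * sigma / sqrt(n) <= 1
sigma, width = 4, 1
n_bottles = ceil((2 * z95 * sigma / width) ** 2)

# 29(b): 95% z-interval with known variance 25
points = [59, 62, 59, 74, 70, 61]
centre = sum(points) / len(points)
half_29 = z975 * 5 / sqrt(len(points))
ci_29b = (centre - half_29, centre + half_29)

print(ci_28a, n_bottles, ci_29b)
```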
30. Solution: a) To test H0: µ = 60 against H1: µ ≠ 60, i.e., whether the mean weight of the
toothpaste tubes is significantly different from 60 g. b) i) A sample of 50 toothpaste
tubes provided a sample mean of 61 g with SD 2 g. As the sample size (n = 50) is
large, we can use a z-test here.
We have $\bar{X} = 61$, $S_X = 2$, $\alpha = 0.05$.
Under H0, the test statistic is $Z = \frac{\bar{X} - \mu}{S_X/\sqrt{n}} \sim N(0,1)$.
As this is a two-sided test, $|Z_{obs}| = \left|\frac{61 - 60}{2/\sqrt{50}}\right| = 3.535534$
=> $\bar{X} < 60 - Z_{0.025} \times \frac{2}{\sqrt{50}}$ or $\bar{X} > 60 + Z_{0.025} \times \frac{2}{\sqrt{50}}$,
i.e., $\bar{X} < 59.44563$ or $\bar{X} > 60.55437$.
If µ = 60.5 and σ = 2, then $\bar{X} \sim N\left(60.5, \frac{2^2}{50}\right)$.
Thus, power:
$P(\bar{X} < 59.44563 \mid \mu = 60.5, \sigma = 2) + P(\bar{X} > 60.55437 \mid \mu = 60.5, \sigma = 2)$
$= P(Z < -3.727761) + P(Z > 0.192227) \approx 0.4239$
We have used R to compute P(Z < -3.727761) and P(Z > 0.192227) using the R codes
pnorm(-3.727761) and 1-pnorm(0.192227) respectively.
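The power calculation can be cross-checked outside R. A minimal Python sketch, assuming the two-sided z-test at α = 0.05 with σ = 2 and n = 50 as above; `NormalDist` from the standard library plays the role of pnorm and qnorm.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_true, sigma, n, alpha = 60, 60.5, 2, 50, 0.05
se = sigma / sqrt(n)

# Rejection region for the sample mean under H0: mu = 60
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lower, upper = mu0 - z_crit * se, mu0 + z_crit * se

# Power = probability of landing in the rejection region when mu = 60.5
sampling_dist = NormalDist(mu_true, se)
power = sampling_dist.cdf(lower) + (1 - sampling_dist.cdf(upper))

print(lower, upper, power)
```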
31. Solution: a) Five years ago, in a random sample of 400 visitors to their website,
166 indicated their preference for non-smoking rooms.
$n_1 = 400$. Thus, $\hat{p}_1 = \frac{166}{400} = 0.415$.
This year, in a random sample of 380 visitors, 205 indicated their preference for
non-smoking rooms.
$n_2 = 380$. Thus, $\hat{p}_2 = \frac{205}{380} = 0.5394737$.
As α = 0.05, a 95% confidence interval for the true difference of proportions, regarding
the preference of non-smoking rooms, would be:
$\left[(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\right]$
= [−0.194067, −0.05488039]
Solution: b) To test H0: p1 = p2 against H1: p1 < p2.
We have α = 0.05.
Common population proportion:
$\hat{p} = \frac{166 + 205}{400 + 380} = 0.475641$
$S_{\hat{p}_1 - \hat{p}_2} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 0.03577499$
Under H0, the test statistic is $Z = \frac{\hat{p}_1 - \hat{p}_2}{S_{\hat{p}_1 - \hat{p}_2}} \sim N(0,1)$.
Now, $Z_{obs} = \frac{-0.1244737}{0.03577499} = -3.47935$
This is a one sided test and we would reject H0 if Zobs < -Zα.
Now -Zα = -1.644854 (from R, R Code: qnorm(0.05), Output: -1.644854)
As Zobs < -Zα, we reject H0 for α = 0.05, i.e., we can recommend the hotel chain to convert
more rooms to non-smoking.
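Both parts of question 31 follow the same pattern: an unpooled standard error for the confidence interval and a pooled one for the test. A minimal Python sketch of the two calculations (illustrative names; the normal approximation is assumed throughout):

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 166, 400   # five years ago
x2, n2 = 205, 380   # this year
p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# (a) 95% CI using the unpooled standard error
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z975 = NormalDist().inv_cdf(0.975)
ci = (diff - z975 * se_unpooled, diff + z975 * se_unpooled)

# (b) one-sided test of H0: p1 = p2 vs H1: p1 < p2 using the pooled proportion
pooled = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z_obs = diff / se_pooled
p_value = NormalDist().cdf(z_obs)   # lower-tail probability

print(ci, z_obs, p_value)
```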
R Code: Anova(lm(Route$Time~Route$Route), type = "II")
We are filling up the ANOVA table from the outputs obtained from R.
ANOVA Table
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Sum of Squares   F-Statistic (Fobs)
Between Groups        3                    98.55            32.85                 7.763663
Error                 16                   67.70            4.23125
Total                 19                   166.25
Here, k = 4, n = 20, α = 0.10.
Under H0, the test statistic F follows an F distribution with (3, 16) degrees of freedom,
and from the ANOVA table we get Fobs = 7.763663.
Using R or the F-distribution table to find the critical value F3,16;0.10.
R code: qf(0.90, 3, 16).
From R, F3,16;0.10 = 2.461811.
As Fobs (7.763663) > F3,16;0.10 (2.461811), we reject H0, i.e., there is a significant difference in
travel time along the four routes from market area A to residential area B.
Other approach using p-value:
From R output we get, p-value = 0.002015458
Our α = 0.10. As p-value < α, we reject H0.
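The entries of the ANOVA table can be rebuilt from the raw route times. The sketch below is plain Python rather than the R call used above, and it computes only the sums of squares and the F statistic; the critical value and p-value still need an F-distribution table or software.

```python
routes = {
    "Route 1": [29.5, 30.5, 33, 31, 32],
    "Route 2": [27.5, 32.5, 28, 30, 29],
    "Route 3": [25, 27, 23.5, 25.5, 26],
    "Route 4": [24, 26.5, 28.5, 31.5, 24.5],
}

k = len(routes)
n = sum(len(v) for v in routes.values())
grand_mean = sum(sum(v) for v in routes.values()) / n

# Between-group and within-group (error) sums of squares
ssb = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in routes.values())
sse = sum((x - sum(v) / len(v)) ** 2 for v in routes.values() for x in v)

msb = ssb / (k - 1)
mse = sse / (n - k)
f_obs = msb / mse
print(ssb, sse, f_obs)
```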
b) Solution: Now suppose we do not know whether the data are normally distributed.
To test H0: all four populations are identical, against H1: at least two of the populations are
different.
Assumption:
1. The samples are independent.
We perform the Kruskal-Wallis test (a non-parametric test) to test the
hypothesis.
As there are no ties, the test statistic is
$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{T_i^2}{n_i} - 3(n+1)$
where n1 = 5, n2 = 5, n3 = 5, n4 = 5, n = n1 + n2 + n3 + n4 = 20, k = 4, and
Ti = sum of ranks for the i-th group.
We also have α = 0.1.
Under H0, H ~ χ² with 3 degrees of freedom.
Ranks of the observations (pooled over all four routes):
Route 1   Route 2   Route 3   Route 4
13        9         4         2
15        19        8         7
20        10        1         11
16        14        5         17
18        12        6         3
T1 = 82   T2 = 64   T3 = 24   T4 = 40
As there are no ties,
$H_{obs} = \frac{12}{20 \times 21}\left(\frac{82^2}{5} + \frac{64^2}{5} + \frac{24^2}{5} + \frac{40^2}{5}\right) - 3 \times 21 = 11.263$
From the Chi-square table, for α = 0.1, the critical value of χ² with 3 degrees of freedom = 6.251389.
As Hobs exceeds this critical value, we reject H0, i.e., there is a significant difference in
travel time along the four routes from market area A to residential area B.
Using R: R Code: kruskal.test(Time~Route)
Output: Hobs = 11.263
As α = 0.1 and under H0, H ~ χ² with 3 degrees of freedom, we find the critical value using
R.
R code: qchisq(0.9, 3)
R output: for α = 0.1, χ² with 3 degrees of freedom = 6.251389
As Hobs exceeds this critical value, we reject H0, i.e., there is a significant difference in
travel time along the four routes from market area A to residential area B.
Other approach using p-value:
From R output we get, p-value = 0.01039
Our α = 0.1. As p-value < α, thus we reject H0.
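The rank sums and the H statistic can also be computed by hand from the route times. This Python sketch relies on the fact that the 20 observations contain no ties, so a simple sort assigns the ranks; with ties, mid-ranks and a tie correction would be needed instead.

```python
routes = {
    "Route 1": [29.5, 30.5, 33, 31, 32],
    "Route 2": [27.5, 32.5, 28, 30, 29],
    "Route 3": [25, 27, 23.5, 25.5, 26],
    "Route 4": [24, 26.5, 28.5, 31.5, 24.5],
}

# Rank the pooled observations (valid only because all values are distinct)
pooled = sorted(x for v in routes.values() for x in v)
rank = {x: i + 1 for i, x in enumerate(pooled)}
n = len(pooled)

# T_i = sum of ranks in group i
rank_sums = {name: sum(rank[x] for x in v) for name, v in routes.items()}

# H = 12 / (n(n+1)) * sum(T_i^2 / n_i) - 3(n+1)
h_obs = 12 / (n * (n + 1)) * sum(
    t**2 / len(routes[name]) for name, t in rank_sums.items()
) - 3 * (n + 1)
print(rank_sums, h_obs)
```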
38. Solution: $H_0: \mu_{CR} = \mu_M = \mu_W$, $H_1$: at least one $\mu_i$ is different
Two-way ANOVA Table (without interaction)
Source of Variation   DF   SS      MS      F-stat
Subject               2    1348    674     5.6167
Student               5    63250   12650   105.4167
Error                 10   1200    120
Total                 17   65798
R Code: Anova(lm(Marks~Subject+Student), type = "II") and α = 0.05
p-value: 0.0232 < α = 0.05, so we reject the null H0.
39. Solve by using R.
40. Solution: H0: The data is from a normally distributed family
H1: The data is not from a normally distributed family
The total number of observations, n = 40.
Finding the number of classes: As 2^5 < 40 < 2^6, we can divide the data into 5 groups with
equal probabilities.
Thus the expected frequency for each group = 40 × 1/5 = 8.
From the sample, mean = 30.75, variance = 15.833
As we are dividing the data into 5 groups with equal probabilities, we would be looking
for 20th, 40th, 60th and 80th percentile of N(30.75, 15.833).
Obtaining the percentiles using R.
R code (after reading and attaching the file):
break.MS = qnorm(c(0.001, 0.2, 0.4, 0.6, 0.8, 0.999), mean(Mobile), sd(Mobile))
The percentiles (from R output):
18.45362, 27.40109, 29.74190, 31.75810, 34.09891, 43.04638
Dividing the data into 5 groups using the above cut-offs and computing the group
frequencies:
Class          Oi   Ei
18.5 to 27.4   9    8
27.4 to 29.7   5    8
29.7 to 31.8   11   8
31.8 to 34.1   9    8
34.1 to 43.0   6    8
Under H0, the test statistic $\chi^2 \sim \chi^2_d$, where d = k − m − 1. Also, α = 0.05.
Here we have d = k − m − 1 = 5 − 2 − 1 = 2.
$\chi^2_{obs} = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = 3$
We reject H0 if $\chi^2_{obs} > \chi^2_{2;0.05}$.
Using R to obtain $\chi^2_{2;0.05}$ with the R code qchisq(0.95, 2).
From the R output we get $\chi^2_{2;0.05} = 5.991465$.
As $\chi^2_{obs} < \chi^2_{2;0.05}$, we cannot reject H0, i.e., at the 5% level of significance the
data are consistent with a normal distribution.
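The grouping and the test statistic for question 40 can be reproduced from the 40 sales figures. The following Python sketch assumes the same choices as the solution (5 equal-probability classes, mean and sample standard deviation estimated from the data); the chi-square critical value is not in the standard library and is taken from the text.

```python
from statistics import NormalDist, mean, stdev

sales = [
    26, 29, 33, 30, 27, 29, 24, 30, 30, 34,
    39, 24, 29, 31, 31, 32, 27, 26, 31, 28,
    36, 30, 33, 25, 33, 31, 36, 35, 30, 24,
    34, 36, 41, 33, 34, 29, 31, 31, 32, 26,
]
k = 5  # number of equal-probability classes

# Fitted normal and its 20th, 40th, 60th, 80th percentiles
fitted = NormalDist(mean(sales), stdev(sales))
cuts = [fitted.inv_cdf(i / k) for i in range(1, k)]

# Observed count per class: class index = number of cut-offs below the value
observed = [0] * k
for x in sales:
    observed[sum(x > c for c in cuts)] += 1

expected = len(sales) / k
chi2_obs = sum((o - expected) ** 2 / expected for o in observed)
print(cuts, observed, chi2_obs)
```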