
Rajiv Ranjan 11 Dec 2022


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/369660278

Advanced Statistics (AS) A Project Report

Method · December 2022


DOI: 10.13140/RG.2.2.30399.59044


1 author:

Rajiv Ranjan
Liverpool John Moores University

All content following this page was uploaded by Rajiv Ranjan on 31 March 2023.



Advanced Statistics (AS)

A Project Report

Submitted to

by

Dr. Rajiv Ranjan

in Partial Fulfillment of

PGP-DSBA
Table of Contents

Problem 1
A physiotherapist with a male football team is interested in studying the relationship between foot injuries and the positions at which the players play from the data collected
1.1 What is the probability that a randomly chosen player would suffer an injury?
1.2 What is the probability that a player is a forward or a winger?
1.3 What is the probability that a randomly chosen player plays in a striker position and has a foot injury?
1.4 What is the probability that a randomly chosen injured player is a striker?
1.5 What is the probability that a randomly chosen injured player is either a forward or an attacking midfielder?

Problem 2
An independent research organization is trying to estimate the probability that an accident at a nuclear power plant will result in radiation leakage. The types of accidents possible at the plant are fire hazards, mechanical failure, or human error. The research organization also knows that two or more types of accidents cannot occur simultaneously.
2.1 What are the probabilities of a fire, a mechanical failure, and a human error respectively?
2.2 What is the probability of a radiation leak?
2.3 Suppose there has been a radiation leak in the reactor for which the definite cause is not known. What is the probability that it has been caused by:
i. A Fire.
ii. A Mechanical Failure.
iii. A Human Error.

Problem 3
The breaking strength of gunny bags used for packaging cement is normally distributed with a mean of 5 kg per sq. centimeter and a standard deviation of 1.5 kg per sq. centimeter. The quality team of the cement company wants to know the following about the packaging material to better understand wastage or pilferage within the supply chain. Answer the questions below based on the given information.
3.1 What proportion of the gunny bags have a breaking strength less than 3.17 kg per sq cm?
3.2 What proportion of the gunny bags have a breaking strength at least 3.6 kg per sq cm?
3.3 What proportion of the gunny bags have a breaking strength between 5 and 5.5 kg per sq cm?
3.4 What proportion of the gunny bags have a breaking strength NOT between 3 and 7.5 kg per sq cm?

Problem 4
Grades of the final examination in a training course are found to be normally distributed, with a mean of 77 and a standard deviation of 8.5. Based on the given information answer the questions below.
4.1 What is the probability that a randomly chosen student gets a grade below 85 on this exam?
4.2 What is the probability that a randomly selected student scores between 65 and 87?
4.3 What should be the passing cut-off so that 75% of the students clear the exam?

Problem 5
Zingaro stone printing is a company that specializes in printing images or patterns on polished or unpolished stones. However, for the optimum level of printing of the image the stone surface has to have a Brinell's hardness index of at least 150. Recently, Zingaro has received a batch of polished and unpolished stones from its clients. Use the data provided to answer the following (assuming a 5% significance level).
5.1 Earlier experience of Zingaro with this particular client is favorable as the stone surface was found to be of adequate hardness. However, Zingaro has reason to believe now that the unpolished stones may not be suitable for printing. Do you think Zingaro is justified in thinking so?
5.2 Is the mean hardness of the polished and unpolished stones the same?

Problem 6
Aquarius health club, one of the largest and most popular cross-fit gyms in the country has been advertising a rigorous program for body conditioning. The program is considered successful if the candidate is able to do more than 5 push-ups, as compared to when he/she enrolled in the program. Using the sample data provided can you conclude whether the program is successful? (Consider the level of Significance as 5%)
Note that this is a problem of the paired-t-test. Since the claim is that the training will make a difference of more than 5, the null and alternative hypotheses must be formed accordingly.

Problem 7
Dental implant data: The hardness of metal implant in dental cavities depends on multiple factors, such as the method of implant, the temperature at which the metal is treated, the alloy used as well as on the dentists who may favour one method above another and may work better in his/her favourite method. The response is the variable of interest.
7.1 Test whether there is any difference among the dentists on the implant hardness. State the null and alternative hypotheses. Note that both types of alloys cannot be considered together. You must state the null and alternative hypotheses separately for the two types of alloys.
7.2 Before the hypotheses may be tested, state the required assumptions. Are the assumptions fulfilled? Comment separately on both alloy types.
7.3 Irrespective of your conclusion in 2, we will continue with the testing procedure. What do you conclude regarding whether implant hardness depends on dentists? Clearly state your conclusion. If the null hypothesis is rejected, is it possible to identify which pairs of dentists differ?
7.4 Now test whether there is any difference among the methods on the hardness of dental implant, separately for the two types of alloys. What are your conclusions? If the null hypothesis is rejected, is it possible to identify which pairs of methods differ?
7.5 Now test whether there is any difference among the temperature levels on the hardness of dental implant, separately for the two types of alloys. What are your conclusions? If the null hypothesis is rejected, is it possible to identify which levels of temperatures differ?
7.6 Consider the interaction effect of dentist and method and comment on the interaction plot, separately for the two types of alloys?
7.7 Now consider the effect of both factors, dentist, and method, separately on each alloy. What do you conclude? Is it possible to identify which dentists are different, which methods are different, and which interaction levels are different?

Problem 1

A physiotherapist with a male football team is interested in studying the relationship between foot injuries and the positions at which the players play from the data collected

                      Striker   Forward   Attacking Midfielder   Winger   Total
Players Injured       45        56        24                     20       145
Players Not Injured   32        38        11                     9        90
Total                 77        94        35                     29       235

1.1 What is the probability that a randomly chosen player would suffer an injury?
Solution: Probability that a randomly chosen player would suffer an injury

= Total Number of Players Injured/ Total Number of Players

= 145/235

= 0.62.

1.2 What is the probability that a player is a forward or a winger?


Solution: Probability that a player is a forward or a winger

= (Number of Forwards + Number of Wingers)* / Total Number of Players

= (94+29)/235

= 0.52.

*NOTE1: The counts are added because a player is either a forward or a winger and cannot be both at the same time, so the two events do not intersect.

1.3 What is the probability that a randomly chosen player plays in a striker position and
has a foot injury?
Solution: Probability that a randomly chosen player plays in a striker position and has
a foot injury

= Number of Injured Strikers/ Total Number of Players

= 45/235

= 0.19.

1.4 What is the probability that a randomly chosen injured player is a striker?
Solution: Probability that a randomly chosen injured player is a striker

= Number of Injured Strikers/ Total Number of Injured Players

= 45/145

= 0.31.

1.5 What is the probability that a randomly chosen injured player is either a forward or
an attacking midfielder?
Solution: Probability that a randomly chosen injured player is either a forward or an
attacking midfielder

= (Number of Injured Forwards + Number of Injured Attacking Midfielders)** / Total Number of Injured Players

= (56+24)/145

= 80/145

= 0.55.

**NOTE2: The counts are added because a player is either a forward or an attacking midfielder and cannot be both at the same time, so the two events do not intersect.
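The five answers to Problem 1 can be cross-checked with a short script. This is only a sketch: the counts are copied from the table in the problem statement, and the variable and function names are chosen here for illustration.

```python
# Player counts by position, copied from the Problem 1 table.
injured = {"Striker": 45, "Forward": 56, "Attacking Midfielder": 24, "Winger": 20}
not_injured = {"Striker": 32, "Forward": 38, "Attacking Midfielder": 11, "Winger": 9}

total_injured = sum(injured.values())
total_players = total_injured + sum(not_injured.values())


def position_total(position):
    """Number of players (injured or not) who play at the given position."""
    return injured[position] + not_injured[position]


# 1.1 Marginal probability of an injury.
p_injured = total_injured / total_players
# 1.2 Forward or winger: the two positions are mutually exclusive, so counts add.
p_forward_or_winger = (position_total("Forward") + position_total("Winger")) / total_players
# 1.3 Joint probability: striker AND injured.
p_striker_and_injured = injured["Striker"] / total_players
# 1.4 Conditional probability: striker GIVEN injured.
p_striker_given_injured = injured["Striker"] / total_injured
# 1.5 Conditional probability: forward or attacking midfielder GIVEN injured.
p_fwd_or_am_given_injured = (
    injured["Forward"] + injured["Attacking Midfielder"]
) / total_injured

for label, value in [
    ("1.1", p_injured),
    ("1.2", p_forward_or_winger),
    ("1.3", p_striker_and_injured),
    ("1.4", p_striker_given_injured),
    ("1.5", p_fwd_or_am_given_injured),
]:
    print(f"{label}: {value:.2f}")
```

Note that 1.3 divides by all 235 players (a joint probability), whereas 1.4 and 1.5 divide by the 145 injured players (conditional probabilities).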

Problem 2

An independent research organization is trying to estimate the probability that an accident at a nuclear power plant will result in radiation leakage. The types of accidents possible at the plant are fire hazards, mechanical failure, or human error. The research organization also knows that two or more types of accidents cannot occur simultaneously.

According to the studies carried out by the organization, the probability of a radiation leak in case of a fire is 20%, the probability of a radiation leak in case of a mechanical failure is 50%, and the probability of a radiation leak in case of a human error is 10%. The studies also showed the following:
• The probability of a radiation leak occurring simultaneously with a fire is 0.1%.
• The probability of a radiation leak occurring simultaneously with a mechanical
failure is 0.15%.
• The probability of a radiation leak occurring simultaneously with a human error
is 0.12%.
On the basis of the information available, answer the questions below:

2.1 What are the probabilities of a fire, a mechanical failure, and a human error
respectively?
Solution: Following are the events:

1. RL = Radiation Leak
2. F = Fire
3. MF = Mechanical Failure
4. HE = Human Error
5. NA = No Accident***

***NOTE3: Events 2, 3, and 4 are the mutually exclusive accident types, and a radiation leak (event 1) can occur only together with one of them. Event 5 is disjoint from events 2, 3, and 4 and completes the universal set of all accidental and non-accidental possibilities.

Let P(RL) denote the probability of a radiation leak, and similarly for the other events. The given probabilities are the following:

1. P(RL|F) = 20% = 0.2
2. P(RL|MF) = 50% = 0.5
3. P(RL|HE) = 10% = 0.1
4. P(RL⋂F) = 0.1% = 0.001
5. P(RL⋂MF) = 0.15% = 0.0015
6. P(RL⋂HE) = 0.12% = 0.0012

Probability of Fire, P(F)

= P(RL⋂F)/P(RL|F) = 0.001/0.2 = 0.005.

Probability of Mechanical Failure, P(MF)

= P(RL⋂MF)/P(RL|MF) = 0.0015/0.5 = 0.003.

Probability of Human Error, P(HE)

= P(RL⋂HE)/P(RL|HE) = 0.0012/0.1 = 0.012.

2.2 What is the probability of a radiation leak?


Solution: A radiation leak can result from three types of accidents (1. fire hazards, 2. mechanical failure, and 3. human error), which cannot occur simultaneously. Hence,

P(RL) = P(RL⋂F) + P(RL⋂MF) + P(RL⋂HE)

= 0.001 + 0.0015 + 0.0012

= 0.0037. [Total Probability = Sum of Individual Non-intersecting probabilities]

2.3 Suppose there has been a radiation leak in the reactor for which the definite cause is
not known. What is the probability that it has been caused by:
i. A Fire.
P(F|RL) = P(RL⋂F)/P(RL) = 0.001/0.0037 = 0.27.

ii. A Mechanical Failure.


P(MF|RL) = P(RL⋂MF)/P(RL) = 0.0015/0.0037 = 0.41.

iii. A Human Error.


P(HE|RL) = P(RL⋂HE)/P(RL) = 0.0012/0.0037 = 0.32.
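The calculations for 2.1 to 2.3 follow one pattern: divide each joint probability by the conditional probability to get the accident probability, sum the joint probabilities to get P(RL), and then apply Bayes' rule. A minimal sketch, using only the six given probabilities (the dictionary names are illustrative):

```python
# Given in the problem: P(RL | cause) and P(RL and cause) for each accident type.
p_leak_given_cause = {"F": 0.2, "MF": 0.5, "HE": 0.1}
p_leak_and_cause = {"F": 0.001, "MF": 0.0015, "HE": 0.0012}

# 2.1: P(cause) = P(RL and cause) / P(RL | cause).
p_cause = {c: p_leak_and_cause[c] / p_leak_given_cause[c] for c in p_leak_and_cause}

# 2.2: the accident types are mutually exclusive, so the joint probabilities add.
p_leak = sum(p_leak_and_cause.values())

# 2.3: Bayes' rule, P(cause | RL) = P(RL and cause) / P(RL).
p_cause_given_leak = {c: p_leak_and_cause[c] / p_leak for c in p_leak_and_cause}

print("P(cause):", {c: round(p, 4) for c, p in p_cause.items()})
print("P(RL):", round(p_leak, 4))
print("P(cause | RL):", {c: round(p, 2) for c, p in p_cause_given_leak.items()})
```

The three posterior probabilities sum to 1, which is a quick consistency check on the answers to 2.3.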

Problem 3:

The breaking strength of gunny bags used for packaging cement is normally distributed
with a mean of 5 kg per sq. centimeter and a standard deviation of 1.5 kg per sq.
centimeter. The quality team of the cement company wants to know the following about
the packaging material to better understand wastage or pilferage within the supply
chain; Answer the questions below based on the given information;
(Provide an appropriate visual representation of your answers, without
which marks will be deducted)

Solution: The breaking strength X of the gunny bags follows a normal distribution, X ~ N(µ = 5, σ = 1.5).

3.1 What proportion of the gunny bags have a breaking strength less than 3.17 kg per sq
cm?
P (X < 3.17) = 0.11.

3.2 What proportion of the gunny bags have a breaking strength at least 3.6 kg per sq
cm.?
P (X >= 3.6) = 1 - P (X<3.6) = 0.82.

3.3 What proportion of the gunny bags have a breaking strength between 5 and 5.5 kg
per sq cm.?
P (5< X <5.5) = P (X<5.5) – P (X<5) = 0.13.

3.4 What proportion of the gunny bags have a breaking strength NOT between 3 and 7.5
kg per sq cm.?
P (X < 3 or X > 7.5) = P (X<3) + 1 - P (X<7.5) = 0.14.
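The four proportions in Problem 3 are all differences of normal cumulative probabilities. The sketch below uses `statistics.NormalDist` from Python's standard library; that choice is an assumption made here for self-containment and is not necessarily what the attached notebook uses.

```python
from statistics import NormalDist

# Breaking strength X ~ N(mu = 5, sigma = 1.5), as stated in the problem.
strength = NormalDist(mu=5, sigma=1.5)

# 3.1: P(X < 3.17)
p_below_3_17 = strength.cdf(3.17)
# 3.2: P(X >= 3.6) = 1 - P(X < 3.6)
p_at_least_3_6 = 1 - strength.cdf(3.6)
# 3.3: P(5 < X < 5.5) = P(X < 5.5) - P(X < 5)
p_between_5_and_5_5 = strength.cdf(5.5) - strength.cdf(5)
# 3.4: P(X < 3 or X > 7.5) = P(X < 3) + 1 - P(X < 7.5)
p_outside_3_to_7_5 = strength.cdf(3) + (1 - strength.cdf(7.5))

print(f"3.1: {p_below_3_17:.2f}")
print(f"3.2: {p_at_least_3_6:.2f}")
print(f"3.3: {p_between_5_and_5_5:.2f}")
print(f"3.4: {p_outside_3_to_7_5:.2f}")
```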

[Please refer to the attached RAJIV_RANJAN_11Dec_2022.ipynb Python programming file for details of the calculations for Problem 3 and later.]

Problem 4:

Grades of the final examination in a training course are found to be normally distributed, with a mean of 77 and a standard deviation of 8.5. Based on the given information answer the questions below.
Solution: The grade X in the final examination follows a normal distribution, X ~ N(µ = 77, σ = 8.5).
4.1 What is the probability that a randomly chosen student gets a grade below 85 on this
exam?
P (X < 85) = 0.83.

4.2 What is the probability that a randomly selected student scores between 65 and 87?
P (65< X <87) = P (X<87) – P (X<65) = 0.80.

4.3 What should be the passing cut-off so that 75% of the students clear the exam?

P(X > x) = 0.75, i.e. P(X < x) = 0.25, and z = (x - µ)/σ

Hence, (x - 77)/8.5 = -0.6745, the z-value for a cumulative probability of 0.25 (please refer to the Python code)
=> x = 77 - 5.73325
=> x = 71.27.
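The Problem 4 answers, including the cut-off obtained from the inverse of the normal CDF, can be sketched as follows. As before, `statistics.NormalDist` is an assumed stand-in for whatever the attached notebook uses.

```python
from statistics import NormalDist

# Grades X ~ N(mu = 77, sigma = 8.5), as stated in the problem.
grades = NormalDist(mu=77, sigma=8.5)

# 4.1: P(X < 85)
p_below_85 = grades.cdf(85)
# 4.2: P(65 < X < 87) = P(X < 87) - P(X < 65)
p_between_65_and_87 = grades.cdf(87) - grades.cdf(65)
# 4.3: 75% of students must score above the cut-off,
# so the cut-off is the 25th percentile of the grade distribution.
cut_off = grades.inv_cdf(0.25)

print(f"4.1: {p_below_85:.2f}")
print(f"4.2: {p_between_65_and_87:.2f}")
print(f"4.3: {cut_off:.2f}")
```

Using the 25th percentile rather than the 75th is the key step in 4.3: "75% clear the exam" means 75% of the distribution lies above the cut-off.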

Problem 5:

Dataset - Link

Zingaro stone printing is a company that specializes in printing images or patterns on polished or unpolished stones. However, for the optimum level of printing of the image the stone surface has to have a Brinell's hardness index of at least 150. Recently, Zingaro has received a batch of polished and unpolished stones from its clients. Use the data provided to answer the following (assuming a 5% significance level):
5.1 Earlier experience of Zingaro with this particular client is favorable as the stone
surface was found to be of adequate hardness. However, Zingaro has reason to believe
now that the unpolished stones may not be suitable for printing. Do you think Zingaro is
justified in thinking so?
Solution: First, stating the hypotheses for the unpolished stones, we have:

Step 1: H0 (Null Hypothesis): µ >= 150

H1 (Alternate Hypothesis): µ < 150

where µ is the population mean Brinell hardness index of the stones.

Step 2: Given, α = 0.05

Step 3: A one-sample, one-tailed z-test is used, since the sample size n = 75 exceeds 30.

Step 4: Hypothesized mean = 150 and n = 75.

Also, X̄ unpolished= 134.11, S unpolished = 33.04, X̄ polished= 147.79, S polished = 15.59.

Type of Stone    Unpolished    Treated and Polished
mean             134.11053     147.7881
stddev           33.041804     15.58736
1 - p-value      0.9999844     0.890447
p-value          1.559E-05     0.109553

Two-tailed t-test p-value (polished vs unpolished): 0.0007328

Step 5: The p-value for unpolished stones (about 0.00002) is less than α = 0.05, so the null hypothesis is rejected in that case: the mean Brinell's hardness index of unpolished stones is below 150. For treated and polished stones, the p-value is 0.11 > α = 0.05, so the null hypothesis cannot be rejected: there is no significant evidence that their mean hardness index falls below 150.

INFERENCE 1: Hence, Zingaro is justified in thinking that unpolished stones are not suitable for printing, unlike the treated and polished stones, which meet the hardness requirement.
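The one-sided p-values in 5.1 can be approximated from the summary statistics alone. The sketch below is an illustration, not the notebook's code: it assumes a z-test with the sample standard deviation standing in for the population value, and the helper name `one_sample_z_lower_tail` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_lower_tail(sample_mean, sample_sd, n, mu0):
    """Return (z, p) for H0: mu >= mu0 versus H1: mu < mu0."""
    standard_error = sample_sd / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Lower-tail probability under the standard normal distribution.
    p_value = NormalDist().cdf(z)
    return z, p_value


# Summary statistics from the table above; n = 75 for each type of stone.
z_unpolished, p_unpolished = one_sample_z_lower_tail(134.11053, 33.041804, 75, 150)
z_polished, p_polished = one_sample_z_lower_tail(147.7881, 15.58736, 75, 150)

alpha = 0.05
print(f"Unpolished: z = {z_unpolished:.3f}, p = {p_unpolished:.3e}, reject H0: {p_unpolished < alpha}")
print(f"Polished:   z = {z_polished:.3f}, p = {p_polished:.4f}, reject H0: {p_polished < alpha}")
```

Under these assumptions the computed p-values should be close to the 1.559E-05 and 0.109553 shown in the table.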

5.2 Is the mean hardness of the polished and unpolished stones the same?

Solution: Hypotheses - Null: mean polished = mean unpolished | Alternate: mean polished ≠ mean unpolished
Test: Two-tailed t-test, with X̄ unpolished = 134.11, S unpolished = 33.04, X̄ polished = 147.79, S polished = 15.59, and n = 75 for each sample.

The p-value of the two-tailed test is 0.0007328, which is well below α = 0.05, so the null hypothesis is rejected in favour of the alternate hypothesis.

INFERENCE 2: Hence, the mean hardness of polished and unpolished stones is significantly different.

Problem 6:

Aquarius health club, one of the largest and most popular cross-fit gyms in the country
has been advertising a rigorous program for body conditioning. The program is
considered successful if the candidate is able to do more than 5 push-ups, as compared
to when he/she enrolled in the program. Using the sample data provided can you
conclude whether the program is successful? (Consider the level of Significance as 5%)

Note that this is a problem of the paired-t-test. Since the claim is that the training will
make a difference of more than 5, the null and alternative hypotheses must be formed
accordingly.
Dataset - Link

Solution:
Step1: Hypotheses - Null: mean difference <= 5
Alternate: mean difference > 5
Step 2: Given, α = 0.05

Step 3: The two samples are paired and of the same size, and the population standard deviation is unknown. The sample is not very large (n = 100). So, I use the t distribution and the t-statistic for a paired two-sample test.

Step 4: Calculating the t-statistic and the p-value (please refer to the attached .ipynb file for details): the t-statistic is -19.323 and the paired two-sample t-test p-value is 7.768158524368873e-12. Since the p-value is less than α = 0.05, we have enough evidence to reject the null hypothesis in favour of the alternate hypothesis.

INFERENCE 3: Hence, the claim that the Aquarius gym training makes a difference of more than 5 push-ups is supported by data.
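The paired test statistic in Step 4 is built from the per-candidate differences, shifted by the claimed gain of 5. The sketch below shows only that mechanic. The push-up counts are hypothetical values invented for illustration and are NOT the report's dataset, so the resulting statistic is unrelated to the figures reported above.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical push-up counts, for illustration only (not the report's data).
before = [20, 18, 25, 22, 30, 15, 27, 21]
after = [27, 25, 31, 30, 36, 22, 33, 29]

claimed_gain = 5  # H0: mean difference <= 5, H1: mean difference > 5

# Paired design: work with the difference for each candidate.
differences = [a - b for a, b in zip(after, before)]

n = len(differences)
d_bar = mean(differences)   # sample mean of the differences
s_d = stdev(differences)    # sample standard deviation (n - 1 denominator)

# t-statistic for testing whether the mean difference exceeds the claimed gain.
t_stat = (d_bar - claimed_gain) / (s_d / sqrt(n))
degrees_of_freedom = n - 1

print(f"mean difference = {d_bar:.3f}, t = {t_stat:.3f}, df = {degrees_of_freedom}")
```

The one-sided p-value is the upper-tail probability of the t distribution with n - 1 degrees of freedom at `t_stat`; with SciPy installed, `scipy.stats.t.sf(t_stat, degrees_of_freedom)` gives it.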

Problem 7:

Dental implant data: The hardness of metal implant in dental cavities depends on
multiple factors, such as the method of implant, the temperature at which the metal is
treated, the alloy used as well as on the dentists who may favour one method above
another and may work better in his/her favourite method. The response is the variable
of interest.
Dataset - Link

7.1 Test whether there is any difference among the dentists on the implant hardness.
State the null and alternative hypotheses. Note that both types of alloys cannot be
considered together. You must state the null and alternative hypotheses separately for
the two types of alloys.

Solution:
Hypotheses

For each of the two alloys (alloy 1 and alloy 2) separately, the hypotheses for the dentists' implant hardness are as follows:

Null: mean dentist1 = mean dentist2 = … = mean dentist5

Alternate: at least one dentist's mean implant hardness differs from the others

7.2 Before the hypotheses may be tested, state the required assumptions. Are the
assumptions fulfilled? Comment separately on both alloy types.

Solution:
For each of the two alloys (alloy 1 and alloy 2), the independent variables (dentist, method, and temperature) are assumed to be independent of one another, and this needs to be tested to avoid a multicollinearity problem.

Hence, Interaction(V1i V2j) = 0, where V1 and V2 can be any of Dentist, Method, Alloy, and Temp, which have 5, 3, 2, and 3 levels respectively in the dataset.

The ANOVA F-test itself also assumes independent observations, approximately normal residuals within each group, and equal group variances.

7.3 Irrespective of your conclusion in 2, we will continue with the testing procedure.
What do you conclude regarding whether implant hardness depends on dentists?
Clearly state your conclusion. If the null hypothesis is rejected, is it possible to identify
which pairs of dentists differ?
             df     sum_sq         mean_sq        F          PR(>F)
C(Dentist)   4.0    1.577946e+05   39448.638889   1.934537   0.112066
Residual     85.0   1.733301e+06   20391.776471   NaN        NaN

Since p-value = 0.11 > α = 0.05, we fail to reject the null hypothesis: there is no significant evidence that implant hardness differs among the dentists.

7.4 Now test whether there is any difference among the methods on the hardness of
dental implant, separately for the two types of alloys. What are your conclusions? If the
null hypothesis is rejected, is it possible to identify which pairs of methods differ?
            df     sum_sq         mean_sq         F          PR(>F)
C(Method)   2.0    5.934275e+05   296713.744444   19.89268   7.683892e-08
Residual    87.0   1.297668e+06   14915.724904    NaN        NaN

Since the p-value is far below α = 0.05, there is sufficient evidence against the null hypothesis that all methods have an equal impact on hardness. We therefore reject it in favour of the alternate hypothesis that at least one method differs significantly from the others.
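The F column in the ANOVA tables for dentists (7.3) and methods (7.4) is the ratio of the factor mean square to the residual mean square, where each mean square is sum_sq divided by df. A small arithmetic check, using the tabulated values (the helper name `f_statistic` is illustrative):

```python
def f_statistic(sum_sq_factor, df_factor, sum_sq_residual, df_residual):
    """F = (factor sum of squares / df) / (residual sum of squares / df)."""
    mean_sq_factor = sum_sq_factor / df_factor
    mean_sq_residual = sum_sq_residual / df_residual
    return mean_sq_factor / mean_sq_residual


# Values copied from the C(Dentist) and C(Method) tables above.
f_dentist = f_statistic(1.577946e5, 4, 1.733301e6, 85)
f_method = f_statistic(5.934275e5, 2, 1.297668e6, 87)

print(f"F (dentist) = {f_dentist:.4f}")
print(f"F (method)  = {f_method:.4f}")
```

A larger F means the variation between group means is large relative to the variation within groups, which is why the method effect (F near 19.9) is significant while the dentist effect (F near 1.9) is not.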

7.5 Now test whether there is any difference among the temperature levels on the
hardness of dental implant, separately for the two types of alloys. What are your
conclusions? If the null hypothesis is rejected, is it possible to identify which levels of
temperatures differ?
           df     sum_sq         mean_sq         F          PR(>F)
C(Temp)    2.0    8.217802e+04   41089.011111    2.074835   0.131818
C(Alloy)   1.0    1.058155e+05   105815.511111   5.343270   0.023194
Residual   86.0   1.703102e+06   19803.511886    NaN        NaN

           df     sum_sq         mean_sq        F          PR(>F)
C(Temp)    2.0    8.217802e+04   41089.011111   1.976179   0.144771
Residual   87.0   1.808918e+06   20792.155556   NaN        NaN

                   df     sum_sq         mean_sq        F          PR(>F)
C(Temp):C(Alloy)   5.0    2.097189e+05   41943.777778   2.095472   0.073914
Residual           84.0   1.681377e+06   20016.388889   NaN        NaN

Since the p-value for temperature exceeds α = 0.05 in each of the tables above, we fail to reject the null hypothesis. So, there is no significant evidence of a difference among the temperature levels on the hardness of dental implant, separately for the two types of alloys.

7.6 Consider the interaction effect of dentist and method and comment on the
interaction plot, separately for the two types of alloys?

Figure 7.1. Interaction Plot of Dentist and Method

Figure 7.2. Interaction Plot of Alloy and Method

7.7 Now consider the effect of both factors, dentist, and method, separately on each
alloy. What do you conclude? Is it possible to identify which dentists are different, which
methods are different, and which interaction levels are different?
index                                  df     sum_sq      mean_sq     F     PR(>F)
C(Dentist)                             4.0    157794.56   39448.64    0.0   NaN
C(Method)                              2.0    593427.49   296713.74   0.0   NaN
C(Alloy)                               1.0    105815.51   105815.51   0.0   NaN
C(Temp)                                2.0    82178.02    41089.01    0.0   NaN
C(Dentist):C(Method)                   8.0    306471.84   38308.98    0.0   NaN
C(Dentist):C(Alloy)                    4.0    5687.04     1421.76     0.0   NaN
C(Dentist):C(Temp)                     8.0    134424.31   16803.04    0.0   NaN
C(Alloy):C(Method)                     2.0    54685.09    27342.54    0.0   NaN
C(Temp):C(Method)                      4.0    30652.44    7663.11     0.0   NaN
C(Alloy):C(Temp)                       2.0    21725.36    10862.68    0.0   NaN
C(Dentist):C(Method):C(Alloy)          8.0    76929.36    9616.17     0.0   NaN
C(Dentist):C(Method):C(Temp)           16.0   181517.89   11344.87    0.0   NaN
C(Method):C(Alloy):C(Temp)             4.0    16571.24    4142.81     0.0   NaN
C(Temp):C(Alloy):C(Dentist)            8.0    47443.42    5930.43     0.0   NaN
C(Dentist):C(Method):C(Alloy):C(Temp)  16.0   75771.98    4735.75     0.0   NaN
Residual                               0.0    2.49e-22    Infinity    NaN   NaN

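The above table shows how the variation in hardness is partitioned across the factors and their interactions. However, because this full four-factor model leaves zero residual degrees of freedom, the F statistics and p-values are undefined (0.0 and NaN), so identifying exactly which dentists, methods, or interaction levels differ is not possible from this table.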
