SDSM - Cold Storage Case Project: August 30, 2019 Pg-Babi Authored By: Saloni Sachdeva

1. The document describes a case study analyzing temperature data from a cold storage facility. It includes two problems to solve related to analyzing the temperature data.
2. For the first problem, the document asks to calculate temperature statistics like mean and standard deviation, and use these to determine probabilities and penalties related to temperature thresholds.
3. The second problem asks to test hypotheses about whether the average temperature exceeds an acceptable limit, using z-tests and t-tests on more recent temperature data.
4. The objective is to explore the temperature data sets, calculate relevant statistics, and use these to solve the problems by determining probabilities, penalties, and testing hypotheses.


SDSM – Cold Storage

Case Project

August 30, 2019

PG-BABI
Authored by: Saloni Sachdeva

S.No. Content
1 Problem Statements
2 Project Objectives
3 Assumptions
4 Exploratory Data Analysis
5 Problem 1 - Solution
5.1 Problem 1 - Q1 – Solution
5.2 Problem 1 - Q2 – Solution
5.3 Problem 1 - Q3 – Solution
5.4 Problem 1 - Q4 – Solution
5.5 Problem 1 - Q5 – Solution
5.6 Problem 1 - Q6 – Solution
6 Problem 2 - Solution
6.1 Problem 2 - Z test Solution
6.2 Problem 2 - T test Solution
6.3 Problem 2 - Inference of tests
7 Appendix
7.1 Problem 1 - R code
7.2 Problem 2 - R code

1. Problem Statement
Problem 1

Cold Storage started its operations in Jan 2016. They are in the business of storing Pasteurized Fresh Whole or
Skimmed Milk, Sweet Cream and Flavoured Milk Drinks. To ensure that there is no change in texture or body appearance
and no separation of fats, the optimal temperature to be maintained is between 2 deg C and 4 deg C.

In the first year of business they outsourced the plant maintenance work to a professional company with stiff penalty
clauses. It was agreed that if it was statistically proven that the probability of the temperature going outside the
2 deg C to 4 deg C range during the one-year contract was above 2.5% and less than 5%, then the penalty would be 10% of
the AMC (annual maintenance contract) fee. In case it exceeded 5%, the penalty would be 25% of the AMC fee. The average
temperature data at date level is given in the file “Cold_Storage_Temp_Data.csv”.

1. Find mean cold storage temperature for Summer, Winter and Rainy Season
2. Find overall mean for the full year
3. Find Standard Deviation for the full year
4. Assume Normal distribution, what is the probability of temperature having fallen below 2 deg C?
5. Assume Normal distribution, what is the probability of temperature having gone above 4 deg C?
6. What will be the penalty for the AMC Company?

Problem 2

In Mar 2018, Cold Storage started getting complaints from its clients that end consumers were finding the dairy
products sour and often smelling. On getting these complaints, the supervisor pulled out the temperature data for the
last 35 days. As a safety measure, the supervisor has been vigilant to maintain the temperature below 3.9 deg C.

Assume 3.9 deg C as the upper acceptable temperature limit. At alpha = 0.1, is there a need for some corrective action
in the Cold Storage Plant, or is the problem on the procurement side from where Cold Storage is getting the dairy
products? The data of the last 35 days is in “Cold_Storage_Mar2018.csv”.

[Use the same standard deviation that you have calculated from the first problem wherever you
think is necessary]

1. State the Hypothesis, do the calculation using z test


2. State the Hypothesis, do the calculation using t-test
3. Give your inference after doing both the tests

2. Project Objective
The objective of the project report is to explore the Cold Storage data set (“Cold_Storage_Temp_Data”) in R and
generate insights about the data set. We will try to solve the problem statements given in the above section through
these insights. This exploration report will consist of the following:
a. Importing the dataset in R
b. Understanding the structure of dataset
c. Graphical exploration
d. Descriptive statistics
e. Hypothesis testing
f. Insights from the dataset

3. Assumptions
Problem 1
For Problem 1, we assume the temperature data to be normally distributed so that the required probabilities can be calculated.

Problem 2
For Problem 2, we assume the temperature data to be normally distributed for hypothesis testing.

4. Exploratory Data Analysis


We will set up the environment by installing the necessary packages and loading the associated libraries. As a next step, we
will set the working directory in R to access the “Cold Storage” data set and analyze it for insights.
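
A minimal sketch of the import-and-inspect step is given here. The original CSV is not reproduced in this report, so the snippet first writes a tiny made-up file to a temporary path. The column names Season, Month and Date are assumptions about the file layout; only Temperature is confirmed by the code used later in the report.

```r
# Illustrative only: made-up rows standing in for Cold_Storage_Temp_Data.csv.
# Column names other than Temperature are assumed, not taken from the real file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Season,Month,Date,Temperature",
             "Winter,Jan,1,2.4",
             "Winter,Jan,2,2.3",
             "Summer,Apr,1,3.1",
             "Rainy,Jul,1,3.0"), tmp)

A <- read.csv(tmp)        # import the data set
str(A)                    # structure: column types and first values
summary(A$Temperature)    # quartiles, minimum, maximum and mean
anyNA(A$Temperature)      # TRUE if any reading is missing
```

With the real file, the temporary path would be replaced by the CSV name after setting the working directory.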

5. Solution for Problem 1

5.1. Mean of cold storage temperature for Summer, Winter and Rainy Season
After environment setup, we will assign the dataset to a variable named “A” and check its structure. To calculate
the mean temperature separately for each season, we use a pivot table.
We load the rpivotTable library, pass the variable “A” to rpivotTable, and calculate the mean for each season
as below:

The mean temperatures for the seasons are as follows:

1. Rainy season = 3.04 deg C
2. Summer season = 3.15 deg C
3. Winter season = 2.70 deg C
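
The pivot table is interactive, so the same per-season means can also be obtained non-interactively. The sketch below uses base R's aggregate() on made-up readings; the column name Season is an assumption about the data file, and the numbers are not the real data.

```r
# Made-up readings; the real file has one row per day of the year.
# 'Season' is an assumed column name.
A <- data.frame(
  Season      = c("Rainy", "Rainy", "Summer", "Summer", "Winter", "Winter"),
  Temperature = c(3.0, 3.2, 3.1, 3.3, 2.6, 2.8)
)

# Mean temperature per season, without the interactive pivot widget
season_means <- aggregate(Temperature ~ Season, data = A, FUN = mean)
print(season_means)
```

tapply(A$Temperature, A$Season, mean) would return the same means as a named vector.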

5.2. Overall mean for the full year
To calculate the overall mean of temperature for the full year we will use the mean function. We will assign mean to
a vector named “T_mean1” as below:

T_mean1=mean(A$Temperature)

Overall mean temperature of the data: 2.96 Degrees

5.3. Standard deviation for the full year


To calculate the standard deviation of temperature for the full year we will use the sd function. We will assign standard
deviation to a vector named “T_SD” as below:

T_SD=sd(A$Temperature, na.rm = TRUE)

Standard deviation of the data: 0.51 Degrees
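
Note that sd() returns the sample standard deviation, which divides by n - 1 rather than n. The sketch below checks this on a few made-up readings; the vector x and the name manual_sd are illustrative only.

```r
# Made-up readings, for illustration only
x <- c(2.5, 3.0, 3.5, 2.8, 3.2)

n <- length(x)
manual_sd <- sqrt(sum((x - mean(x))^2) / (n - 1))  # sample SD, n - 1 denominator

# sd() gives the same value; na.rm = TRUE only matters if readings are missing
all.equal(manual_sd, sd(x, na.rm = TRUE))
```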

5.4. Probability of temperature having fallen below 2 deg C


We will first assume the data to be normally distributed. To calculate the probability of temperature having fallen
below 2 deg C we will use the pnorm function. We will assign probability to a vector named “Prob_less2” as below:

Prob_less2= pnorm(2, mean = T_mean1, sd=T_SD)

where 2 is the quantile,
mean is the overall mean, and
sd is the standard deviation for the full year.
So, there is a 2.92% probability of the temperature having fallen below 2 deg C.
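
The same probability can be reached by standardizing first. The sketch below uses the mean and standard deviation as printed in Appendix 7.1; z_low is an illustrative name, and the rounded inputs mean the result matches the report only to about five decimal places.

```r
# Summary values as printed in Appendix 7.1
T_mean1 <- 2.96274
T_SD    <- 0.508589

z_low <- (2 - T_mean1) / T_SD    # about 1.89 standard deviations below the mean
Prob_less2 <- pnorm(z_low)       # standard normal lower tail

# Same probability as the direct call used in the report
all.equal(Prob_less2, pnorm(2, mean = T_mean1, sd = T_SD))
round(100 * Prob_less2, 2)       # percentage, about 2.92
```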

5.5. Probability of temperature having gone above 4 deg C


We will first assume the data to be normally distributed. To calculate the probability of temperature having gone
above 4 deg C we will use the pnorm function. We will assign probability to a vector named “Prob_more4” as below:

Prob_more4=1-pnorm(4, mean = T_mean1, sd=T_SD)

where 4 is the quantile,
mean is the overall mean, and
sd is the standard deviation for the full year.
So, there is a 2.07% probability of the temperature having gone above 4 deg C.
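
As an alternative to subtracting from 1, pnorm can return the upper tail directly through its lower.tail argument. A short sketch, again using the summary values printed in Appendix 7.1:

```r
# Summary values as printed in Appendix 7.1
T_mean1 <- 2.96274
T_SD    <- 0.508589

# Upper-tail probability P(X > 4) requested directly
Prob_more4 <- pnorm(4, mean = T_mean1, sd = T_SD, lower.tail = FALSE)

# Agrees with the complement form 1 - pnorm(4, ...)
all.equal(Prob_more4, 1 - pnorm(4, mean = T_mean1, sd = T_SD))
```

The two forms agree here; lower.tail = FALSE is mainly useful for very small tail probabilities, where subtracting from 1 loses precision.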

5.6. What will be the penalty for the AMC Company?
As per the problem statement, if it is statistically proven that the probability of the temperature going outside the
2 deg C to 4 deg C range during the one-year contract is above 2.5% and less than 5%, the penalty is 10% of the AMC
(annual maintenance contract) fee; if it exceeds 5%, the penalty is 25%. Thus, we will add the probabilities calculated in 5.4 and 5.5:

Prob_2x4= Prob_more4+ Prob_less2

We will convert the probability into a percentage by assigning it to a vector named “Prob_Per” as below:

Prob_Per= Prob_2x4*100

The result is 4.99%, which lies between 2.5% and 5%. This means that the penalty for the AMC company is 10% of the AMC fee.
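
The penalty rule can be written as a small function. This is a sketch only: penalty_rate is a hypothetical helper, and its treatment of the exact boundaries (2.5% and 5%) is an assumption, since the contract wording does not say which band they fall in.

```r
# Hypothetical helper: maps the out-of-range probability to the penalty rate.
# Boundary handling at exactly 2.5% and 5% is an assumption.
penalty_rate <- function(p_out) {
  if (p_out > 0.05) {
    0.25
  } else if (p_out > 0.025) {
    0.10
  } else {
    0
  }
}

Prob_2x4 <- 0.02918146 + 0.02070077   # values printed in Appendix 7.1
penalty_rate(Prob_2x4)                # 0.1, i.e. 10% of the AMC fee
```

The combined probability sits very close to the 5% threshold, so the outcome is sensitive to small changes in the estimated mean and standard deviation.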

6. Solution for Problem 2

6.1 State the Hypothesis, do the calculation using z test


The Z test is generally used for large samples. It is used when the standard deviation of the population is known.
The Z statistic is the ratio of the sampling error to the standard error.
• After environment setup, we will assign the dataset to a variable named “B”.
• The sample size, n = 35, which is sufficiently large for a Z test.
• The population standard deviation, sd = 0.5085, which is calculated in Problem 1.
• The sample mean, SM=mean(B$Temperature)
• The hypothesized mean, Mu = 3.9, which is provided in Problem 2.

The purpose of the test is to check whether the average temperature of the storage is within the upper acceptable limit
required for the quality of the dairy products. Now, we will state our hypothesis based on the given conditions of the cold storage plant.

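Hypothesis Formulation
The question is whether the mean temperature has risen above the acceptable limit, so the test is one-sided in the upper tail.
Ho: 𝜇≤3.9 (Mean temperature is at or below 3.9 deg C)
Ha: 𝜇>3.9 (Mean temperature is greater than 3.9 deg C)

The level of significance (Alpha) = 0.1 is given in the problem statement. This corresponds to a 90% level of confidence
for the test (1 - alpha).

Z Test Equation:

Z = (sample mean - hypothesized mean) / (SD / square root of sample size),

i.e., Z=(SM-Mu)/(sd/n^.5)

With SM = 3.974286, this gives Z = 0.8643.

Now we will calculate the P value as the upper-tail probability:
Pvalue = 1 - pnorm(Z)

The value 0.8063 recorded in the appendix is the lower-tail probability pnorm(Z), so the P value is 1 - 0.8063 = 0.1937.
Since the P value is greater than alpha, we fail to reject Ho: with the Problem 1 standard deviation, the Z test does not
show that the mean temperature exceeds 3.9 deg C.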
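
Because the question is whether the mean exceeds 3.9 deg C, the Z test can be sketched in the upper tail as follows, using the summary values printed in Appendix 7.2. The names sd_pop and p_upper are illustrative; sd_pop is used instead of sd so that R's sd() function is not masked.

```r
# Values from Appendix 7.2
SM     <- 3.974286   # sample mean of the 35 readings
Mu     <- 3.9        # upper acceptable temperature
sd_pop <- 0.5085     # standard deviation carried over from Problem 1
n      <- 35
alpha  <- 0.1

Z <- (SM - Mu) / (sd_pop / sqrt(n))
p_upper <- pnorm(Z, lower.tail = FALSE)  # P(Z >= observed value) under Ho

c(Z = Z, p_value = p_upper)
p_upper < alpha   # FALSE: the Z test does not reject Ho at alpha = 0.1
```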

6.2 State the Hypothesis, do the calculation using t test

The t test is generally used for small samples. It is used when the standard deviation of the population is unknown and
has to be estimated from the sample.
After environment setup, we will assign the dataset to a variable named “B”.
• The level of significance (Alpha) = 0.1 is given in the problem statement. This corresponds to a 90% level of
confidence for the test (1 - alpha).
• The sample size, n = 35, which gives n - 1 = 34 degrees of freedom.
• The sample standard deviation, s=sd(B$Temperature), which is used in place of the Problem 1 value.
• The sample mean, SM=mean(B$Temperature)
• The hypothesized mean, Mu = 3.9, which is provided in Problem 2.
Now, we will state our hypothesis based on the given conditions of the cold storage plant.

Hypothesis Formulation
We will perform the one-sample t test, one-sided in the upper tail:
Ho: 𝜇≤3.9 (Mean temperature is at or below 3.9 deg C)
Ha: 𝜇>3.9 (Mean temperature is greater than 3.9 deg C)

T Test Equation:

t = (sample mean - hypothesized mean) / (sample SD / square root of sample size),

i.e., t=(SM-Mu)/(s/n^.5)

Now we will calculate the P value from the t distribution with n - 1 degrees of freedom:

Pvalue = 1 - pt(t, df = n-1)

OR,

t.test(B$Temperature,y=NULL, alternative = "greater", mu=3.9, conf.level = 0.90)

The run recorded in the appendix uses alternative = "less" and reports t = 2.7524, df = 34 and a P value of 0.9953. The t
statistic is the same for either alternative, and the upper-tail P value is the complement, 1 - 0.9953 = 0.0047. The
hand-computed value 0.8518 in the appendix uses the Problem 1 standard deviation with n - 1 under the square root, so it
is not the t statistic; the t.test output should be used instead.

Since the P value is less than alpha, we reject Ho: the sample mean of 3.974286 is significantly above 3.9 deg C.
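
The relationship between the hand calculation and t.test() can be checked on a small example. The sketch below uses made-up readings, not the 35 values from “Cold_Storage_Mar2018.csv”, and tests in the upper tail because the question is whether the mean exceeds 3.9 deg C. The names t_manual, p_manual and res are illustrative.

```r
# Made-up readings standing in for the 35 values in Cold_Storage_Mar2018.csv
x  <- c(3.8, 4.0, 3.9, 4.1, 4.0, 3.9, 4.2, 3.8, 4.0, 4.1)
Mu <- 3.9

n <- length(x)
t_manual <- (mean(x) - Mu) / (sd(x) / sqrt(n))            # sample SD, sqrt(n)
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)  # upper tail

res <- t.test(x, mu = Mu, alternative = "greater")

# The hand calculation reproduces t.test()
all.equal(t_manual, unname(res$statistic))
all.equal(p_manual, res$p.value)
```

The key differences from the Z test are the use of the sample standard deviation and of the t distribution with n - 1 degrees of freedom instead of the normal distribution.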

6.3 Give your inference after doing both the tests

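From the given data and tests conducted above, it may be concluded that:
• With Ho: 𝜇≤3.9 against Ha: 𝜇>3.9 at alpha = 0.1, the Z test based on the Problem 1 standard deviation does not reject Ho (P value about 0.19), while the t test based on the standard deviation of the 35 recent readings rejects it (P value about 0.005). The t test is arguably the more relevant of the two here because it reflects the spread of the current data
• Statistically, the temperature of the cold storage facility is not being maintained at an adequate level
• There should be a schedule for periodic checks in the Cold Storage to ensure an adequate temperature level
• In the current situation there is a need for immediate corrective action in the Cold Storage Plant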

7. Appendix A – Source Code

7.1 R Code: Problem 1


#==========================================================================================
# # Exploratory Data Analysis – Cold Storage # #
#==========================================================================================

> #Environment Set up and Data Import


> # Setup Working Directory
> setwd("C:/Users/salon794/Documents/Great Lakes/Projects/Project 1")
> getwd()
[1] "C:/Users/salon794/Documents/Great Lakes/Projects/Project 1"
> # Read Input File
> A=read.csv("Cold_Storage_Temp_Data.csv")
> library(rpivotTable)
> rpivotTable(A)
> ##Find out over all mean
> T_mean1=mean(A$Temperature)
> T_mean1
[1] 2.96274
> ##Find out over all SD
> T_SD=sd(A$Temperature, na.rm = TRUE)
> T_SD
[1] 0.508589
> ##Find out probability of temperature having fallen below 2 deg C
> Prob_less2= pnorm(2, mean = T_mean1, sd=T_SD)
> Prob_less2
[1] 0.02918146
> ##Find out probability of temperature having gone above 4 deg C
> Prob_more4=1-pnorm(4, mean = T_mean1, sd=T_SD)
> Prob_more4
[1] 0.02070077
> ##Find out probability of temperature having fallen below 2 deg C and gone above 4 deg C
> Prob_2x4= Prob_more4+ Prob_less2
> Prob_2x4
[1] 0.04988223
> ##convert the probability into percentage
> Prob_Per=Prob_2x4*100
> Prob_Per
[1] 4.988223

7.2 R Code: Problem 2
#==========================================================================================
# # Exploratory Data Analysis – Cold Storage # #
#==========================================================================================
> #Environment Set up and Data Import
> # Setup Working Directory
> setwd("C:/Users/salon794/Documents/Great Lakes/Projects/Project 1")
> getwd()
[1] "C:/Users/salon794/Documents/Great Lakes/Projects/Project 1"
> B=read.csv("Cold_Storage_Mar2018.csv")
> ##Z test
> ##we are assuming that Ho: temp is equal to 3.9
> ##Ha=temp is less than 3.9
> ##z=(Xbar-actual mean)/(Population SD/Sq.root of "sample space")
> sd=0.5085
> SM=mean(B$Temperature)
> SM
[1] 3.974286
> Mu=3.9
> n=35
> alpha=0.1
> Z=(SM-Mu)/(sd/n^.5)
> Z
[1] 0.8642679
> Pvalue = pnorm(abs(Z))
> ##Since the P value is greater than alpha, Ho is not rejected, i.e., when P is high, null will fly.
> Pvalue
[1] 0.8062796
> t=(SM-Mu)/(sd/(n-1)^.5)
> t
[1] 0.8518317
> Pvalue = pnorm(abs(t))
> ##Since the P value is greater than alpha, Ho is not rejected, i.e., when P is high, null will fly.
> Pvalue
[1] 0.8028462
> ##OR
> t.test(B$Temperature,y=NULL, alternative = "less", mu=3.9, conf.level = 0.90)

One Sample t-test

data: B$Temperature
t = 2.7524, df = 34, p-value = 0.9953
alternative hypothesis: true mean is less than 3.9
90 percent confidence interval:
-Inf 4.00956
sample estimates:
mean of x
3.974286
