SDSM - Cold Storage Case Project: August 30, 2019 Pg-Babi Authored By: Saloni Sachdeva
Case Project
PG-BABI
Authored by: Saloni Sachdeva
S.No. Content
1 Problem Statement
2 Project Objective
3 Assumptions
4 Exploratory Data Analysis
5 Problem 1 - Solution
5.1 Problem 1 - Q1 - Solution
5.2 Problem 1 - Q2 - Solution
5.3 Problem 1 - Q3 - Solution
5.4 Problem 1 - Q4 - Solution
5.5 Problem 1 - Q5 - Solution
5.6 Problem 1 - Q6 - Solution
6 Problem 2 - Solution
6.1 Problem 2 - Z test Solution
6.2 Problem 2 - T test Solution
6.3 Problem 2 - Inference of tests
7 Appendix
7.1 Problem 1 - R code
7.2 Problem 2 - R code
1. Problem Statement
Problem 1
Cold Storage started its operations in January 2016. It is in the business of storing pasteurized fresh whole or skimmed milk, sweet cream and flavoured milk drinks. To ensure that there is no change in texture, body or appearance, and no separation of fats, the optimal temperature to be maintained is between 2 °C and 4 °C.
In the first year of business, they outsourced the plant maintenance work to a professional company with stiff penalty clauses. It was agreed that if it were statistically proven that the probability of the temperature going outside the 2 °C to 4 °C range during the one-year contract was above 2.5% and below 5%, the penalty would be 10% of the AMC (annual maintenance contract) fee. If it exceeded 5%, the penalty would be 25% of the AMC fee. The daily average temperature data is given in the file "Cold_Storage_Temp_Data.csv".
1. Find the mean cold storage temperature for the Summer, Winter and Rainy seasons.
2. Find the overall mean for the full year.
3. Find the standard deviation for the full year.
4. Assuming a Normal distribution, what is the probability of the temperature having fallen below 2 °C?
5. Assuming a Normal distribution, what is the probability of the temperature having gone above 4 °C?
6. What will be the penalty for the AMC company?
Problem 2
In March 2018, Cold Storage started receiving complaints from its clients, who in turn had been receiving complaints from end consumers that the dairy products were going sour and often smelled. On receiving these complaints, the supervisor pulled out the temperature data for the last 35 days. As a safety measure, the supervisor has been vigilant in keeping the temperature below 3.9 °C.
Assuming 3.9 °C as the upper limit of the acceptable temperature range, and at alpha = 0.1, is there a need for corrective action in the Cold Storage plant, or does the problem lie on the procurement side from which Cold Storage receives the dairy products? The data for the last 35 days is in "Cold_Storage_Mar2018.csv".
[Use the same standard deviation calculated in the first problem wherever you think it is necessary]
2. Project Objective
The objective of the project report is to explore the Cold Storage data set (“Cold_Storage_Temp_Data”) in R and
generate insights about the data set. We will try to solve the problem statements given in the above section through
these insights. This exploration report will consist of the following:
a. Importing the dataset in R
b. Understanding the structure of dataset
c. Graphical exploration
d. Descriptive statistics
e. Hypothesis testing
f. Insights from the dataset
3. Assumptions
Problem 1
We assume the temperature data to be normally distributed in order to calculate the probabilities required in Problem 1.
Problem 2
We assume the temperature data to be normally distributed in order to carry out the hypothesis tests in Problem 2.
5. Problem 1 - Solution
5.1. Mean of cold storage temperature for Summer, Winter and Rainy Season
After setting up the environment, we assign the dataset to a variable named "A" and check its structure. To calculate the mean temperature separately for each season, we use a pivot table: we load the rpivotTable library, pass the data frame "A" to the pivot table, and calculate the mean for each season.
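The pivot-table step can also be reproduced in base R without the interactive rpivotTable widget. This is a minimal sketch, not the report's own code: it assumes the temperature sits in a column named `Temperature` (as used elsewhere in this report) and that the season label sits in a column named `Season`, which is an assumption about the CSV header. The helper name `season_means` is hypothetical.

```r
# Hypothetical helper (not from the report): mean temperature per season.
# Assumes columns named `Season` and `Temperature`; `Season` is an assumed name.
season_means <- function(df) {
  # tapply() groups Temperature by Season and applies mean() to each group
  tapply(df$Temperature, df$Season, mean)
}

# Usage with the real file (not run here):
# A <- read.csv("Cold_Storage_Temp_Data.csv")
# str(A)            # check the structure, as in the report
# season_means(A)
```

The result is a named vector with one mean per season, which is the same summary the pivot table displays.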
5.2. Overall mean for the full year
To calculate the overall mean temperature for the full year, we use the mean() function and assign the result to a variable named "T_mean1" as below:
T_mean1=mean(A$Temperature)
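The full-year mean (Q2) and standard deviation (Q3) can be computed together. A minimal sketch, assuming the data frame `A` from above; the helper name `year_summary` is hypothetical. Note that R's `sd()` uses the sample (n - 1) denominator, so the sketch also shows the population (n) version for comparison; which of the two the report used is not stated here.

```r
# Hypothetical helper: full-year mean plus sample and population SD.
year_summary <- function(x) {
  n <- length(x)
  s <- sd(x)                                  # sample SD, denominator n - 1
  c(mean          = mean(x),
    sd_sample     = s,
    sd_population = s * sqrt((n - 1) / n))    # rescaled to denominator n
}

# Usage (not run here): year_summary(A$Temperature)
```

For a full year of daily readings the two SDs differ only slightly, but the choice should be stated when the SD is reused in Problem 2.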
5.6. What will be the penalty for the AMC Company?
As per the problem statement, if it is statistically proven that the probability of the temperature going outside the 2 °C to 4 °C range during the one-year contract is above 2.5% and below 5%, the penalty is 10% of the AMC (annual maintenance contract) fee. Thus, we add the probabilities calculated in Q4 and Q5 (Sections 5.4 and 5.5) to obtain Prob_2x4, the probability of the temperature lying outside the 2 °C to 4 °C range.
We then convert this probability into a percentage by assigning it to a variable named "Prob_Per" as below:
Prob_Per= Prob_2x4*100
The result is 4.98%. Since this lies above 2.5% and below 5%, the penalty for the AMC company is 10% of the AMC fee.
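The calculations behind Q4 to Q6 can be sketched in one function using base R's `pnorm()`. This is an illustrative sketch under the report's Normal-distribution assumption, not the author's code; the function name `amc_penalty` and the variable `T_sd1` in the usage line are hypothetical. The contract clause does not say what happens at exactly 5%, so the sketch treats that boundary case as 10%, which is an assumption.

```r
# Hypothetical helper: probability of leaving the 2-4 deg C band under a
# Normal(mu, sigma) model, plus the penalty tier from the AMC clause.
amc_penalty <- function(mu, sigma, lower = 2, upper = 4) {
  p_below <- pnorm(lower, mean = mu, sd = sigma)                      # Q4
  p_above <- pnorm(upper, mean = mu, sd = sigma, lower.tail = FALSE)  # Q5
  p_out <- p_below + p_above
  # Clause: above 2.5% and below 5% -> 10% of AMC fee; above 5% -> 25%.
  # Exactly 5% is not covered by the clause; treated as 10% here (assumption).
  penalty_pct <- if (p_out > 0.05) 25 else if (p_out > 0.025) 10 else 0
  list(p_below = p_below, p_above = p_above,
       p_out_pct = 100 * p_out, penalty_pct = penalty_pct)
}

# Usage (not run here); T_sd1 is a hypothetical name for the Q3 standard deviation:
# amc_penalty(T_mean1, T_sd1)
```

Using `lower.tail = FALSE` for the upper tail avoids computing `1 - pnorm(...)` by hand.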
6. Problem 2 - Solution
6.1. Problem 2 - Z test Solution
The purpose of the test is to check whether the average temperature of the storage is at the optimum level required for the quality of the dairy products. We now state our hypotheses based on the given conditions of the cold storage plant.
Hypothesis Formulation
Ho: 𝜇 = 3.9 (mean temperature is equal to 3.9 °C)
Ha: 𝜇 < 3.9 (mean temperature is less than 3.9 °C)
The level of significance, alpha = 0.1, is given in the problem statement. This corresponds to a 90% confidence level (1 - alpha) for the test.
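With alpha = 0.1 and a lower-tailed alternative, the same decision can also be framed with a critical value instead of a p-value: reject Ho only if the test statistic falls below the alpha quantile of its reference distribution. A small sketch using base R's `qnorm()` and `qt()`; the 34 degrees of freedom assume n = 35 as in this problem.

```r
alpha <- 0.1
n <- 35
z_crit <- qnorm(alpha)            # lower-tail critical value, standard Normal
t_crit <- qt(alpha, df = n - 1)   # lower-tail critical value, t with n - 1 df
print(c(z_crit = z_crit, t_crit = t_crit))
# Lower-tailed decision rule: reject Ho only if the statistic is below the
# matching critical value; otherwise fail to reject Ho.
```

Both statistics reported in the appendix (Z = 0.8642679 and t = 2.7524) are positive, so they lie well above these negative critical values, which agrees with the p-value conclusion.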
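Z Test Equation:
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
where x̄ is the sample mean (SM), 𝜇 = 3.9 is the hypothesised mean, σ = 0.5085 is the population standard deviation from Problem 1, and n = 35 is the sample size.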
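Next, we calculate the p-value. Because the alternative hypothesis is lower-tailed (𝜇 < 3.9), the p-value is the area to the left of Z:
Pvalue = pnorm(Z)
The p-value (0.8062796) is greater than alpha (0.1), so we fail to reject Ho: there is no statistical evidence that the mean temperature is below 3.9 °C.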
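The Z test steps above can be wrapped in a small reusable function. This is a sketch, not the report's code; the name `z_test_less` is hypothetical, and it assumes the population standard deviation is known and supplied, as the problem statement allows.

```r
# Hypothetical helper: one-sample Z test, known population SD, lower-tailed.
z_test_less <- function(x, mu, sigma) {
  n <- length(x)
  z <- (mean(x) - mu) / (sigma / sqrt(n))
  # Ha: mean < mu, so the p-value is the area to the left of z.
  # (For Ha: mean > mu it would be pnorm(z, lower.tail = FALSE).)
  c(z = z, p_value = pnorm(z))
}

# Usage (not run here):
# z_test_less(B$Temperature, mu = 3.9, sigma = 0.5085)
```

Applied to the March 2018 data, this uses the same formula as the appendix code, so it should reproduce Z = 0.8642679 and a p-value of 0.8062796.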
6.2. Problem 2 - T test Solution
The t-test is generally used for small samples and when the population standard deviation is unknown, so that it must be estimated from the sample.
After setting up the environment, we assign the dataset to a variable named "B".
• The level of significance, alpha = 0.1, is given in the problem statement. This corresponds to a 90% confidence level (1 - alpha) for the test.
• Hypothesised mean, Mu = 3.9, as provided in Problem 2.
• The sample size, N = 35, which is sufficiently large for a Z-test.
• The population standard deviation, sd = 0.5085, as calculated in Problem 1.
• The sample mean, SM = mean(B$Temperature) = 3.974286.
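We now state our hypotheses based on the given conditions of the cold storage plant.
Hypothesis Formulation
We will perform a one-sample t-test:
Ho: 𝜇 = 3.9 (mean temperature is equal to 3.9 °C)
Ha: 𝜇 < 3.9 (mean temperature is less than 3.9 °C)
T Test Equation:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad \mathrm{df} = n - 1$$
where s is the sample standard deviation of the March 2018 temperatures, giving 34 degrees of freedom for n = 35.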
Since the p-value is greater than alpha, we fail to reject Ho.
Alternatively, the same test can be run directly with R's built-in t.test() function (code and output in Appendix 7.2).
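The t.test() output gives t = 2.7524 with df = 34 and a p-value of 0.9953, which is greater than alpha, so we fail to reject Ho. The sample mean of 3.974286 lies above 3.9 °C, and the upper bound of the 90% confidence interval is 4.00956, so the data give no support to the claim that the mean temperature is below 3.9 °C.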
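For transparency, the one-sample t statistic can also be computed by hand. This is a minimal sketch, with `t_test_less` as a hypothetical name: the standard error uses the sample SD divided by sqrt(n), and the p-value comes from the t distribution via `pt()` with n - 1 degrees of freedom, not from `pnorm()`. This is the same calculation that `t.test(..., alternative = "less")` performs.

```r
# Hypothetical helper: one-sample t-test computed by hand, lower-tailed.
t_test_less <- function(x, mu) {
  n <- length(x)
  s <- sd(x)                                  # sample SD (population SD unknown)
  t_stat <- (mean(x) - mu) / (s / sqrt(n))    # standard error is s / sqrt(n)
  df <- n - 1
  c(t = t_stat, df = df, p_value = pt(t_stat, df))  # lower-tail area
}

# Usage (not run here): t_test_less(B$Temperature, mu = 3.9)
```

On the March 2018 data this should reproduce the t.test() output in the appendix (t = 2.7524, df = 34, p-value = 0.9953).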
6.3. Problem 2 - Inference of tests
From the given data and the tests conducted above, it may be concluded that:
• Statistically, there is no evidence that the temperature of the cold storage facility is being maintained below the 3.9 °C limit
• There should be a schedule of periodic checks in the Cold Storage to ensure an adequate temperature level
• In the current situation, corrective action is needed in the Cold Storage plant immediately
7. Appendix A – Source Code
7.2 R Code: Problem 2
#==========================================================================================
# # Exploratory Data Analysis – Cold Storage # #
#==========================================================================================
# Environment setup and data import
# Set the working directory (adjust the path to your own machine)
setwd("C:/Users/salon794/Documents/Great Lakes/Projects/Project 1")
getwd()
# [1] "C:/Users/salon794/Documents/Great Lakes/Projects/Project 1"
B <- read.csv("Cold_Storage_Mar2018.csv")

## Z test
## Ho: mean temperature is equal to 3.9
## Ha: mean temperature is less than 3.9
## Z = (Xbar - hypothesised mean) / (population SD / sqrt(sample size))
sd_pop <- 0.5085               # population SD calculated in Problem 1
SM <- mean(B$Temperature)
SM
# [1] 3.974286
Mu <- 3.9
n <- 35
alpha <- 0.1
Z <- (SM - Mu) / (sd_pop / sqrt(n))
Z
# [1] 0.8642679
Pvalue <- pnorm(Z)             # lower-tailed test: area to the left of Z
Pvalue
# [1] 0.8062796
## The p-value is greater than alpha, so we fail to reject Ho.

## T test by hand: sample SD, standard error s / sqrt(n), t distribution with n - 1 df
s <- sd(B$Temperature)
t_stat <- (SM - Mu) / (s / sqrt(n))
Pvalue_t <- pt(t_stat, df = n - 1)   # lower-tailed p-value
t_stat
Pvalue_t
## These should match the t.test() result below (t = 2.7524, p-value = 0.9953).
## The p-value is greater than alpha, so we fail to reject Ho.

## OR, using the built-in one-sample t-test
t.test(B$Temperature, alternative = "less", mu = 3.9, conf.level = 0.90)
#   data: B$Temperature
#   t = 2.7524, df = 34, p-value = 0.9953
#   alternative hypothesis: true mean is less than 3.9
#   90 percent confidence interval:
#    -Inf 4.00956
#   sample estimates:
#   mean of x
#    3.974286