SMDM Assignment PDF
SMDM PROJECT
COLD STORAGE
1. PROJECT OBJECTIVE
This report aims to perform basic data analysis of a Cold Storage operation business for the
year 2016, from the “Cold_Storage_Temp_Data.csv” dataset. An attempt is also made to
understand the operational inefficiencies reported in March 2018, from the
“Cold_Storage_Mar2018.csv” dataset through Hypothesis Testing. Finally, an inference is
drawn on the potential reason behind the customer complaints in March 2018.
2. ASSUMPTIONS
● # 3: Standard Deviation of 2018 sample data is the same as that of 2016 population data
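Assumption #3 is what allows a z-test to be applied to the March 2018 sample: with the standard deviation treated as known, the test statistic is (sample mean - hypothesised mean) / (sigma / sqrt(n)). The following is a minimal sketch of that calculation. The helper name `z.test.right.tailed` and the sample mean of 4.0 are illustrative assumptions and do not come from the dataset; the values 3.9, 0.508589, 35 and alpha = 0.1 are the ones used in Appendix A.

```r
# Right-tailed one-sample z-test with a known (assumed) standard deviation.
# z.test.right.tailed is a hypothetical helper written for this sketch.
z.test.right.tailed <- function(sample.mean, mu0, sigma, n, alpha = 0.1) {
  z.stat  <- (sample.mean - mu0) / (sigma / sqrt(n))  # standardised distance from mu0
  z.crit  <- qnorm(1 - alpha)                         # critical value for a right-tailed test
  p.value <- pnorm(z.stat, lower.tail = FALSE)        # P(Z > z.stat)
  list(z.stat = z.stat, z.crit = z.crit, p.value = p.value,
       reject.h0 = z.stat > z.crit)
}

# 4.0 is an illustrative sample mean, NOT a value taken from the 2018 data
result <- z.test.right.tailed(sample.mean = 4.0, mu0 = 3.9,
                              sigma = 0.508589, n = 35, alpha = 0.1)
print(result)
```

Comparing the statistic with the critical value and comparing the p-value with alpha lead to the same decision; the sketch returns both so that either rule can be checked.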
3.1 ENVIRONMENT SET UP AND DATA IMPORT
3.1.2 INSTALL PACKAGES & INVOKE SYSTEM LIBRARIES
● 4 packages are invoked from the system library for exploratory data analysis. Details on the
packages used & their corresponding purpose are given below:
● Average temperature data at date level, for the year 2016, is read from the
"Cold_Storage_Temp_Data.csv" file and stored in the object “Avg.Temp.Data.2016”
● Average temperature data for the last 35 days from March 2018 is read from the
"Cold_Storage_Mar2018.csv" file and stored in the object “Avg.Temp.Data.2018”
This dataset contains 35 observations (Rows) each of 4 variables (Columns)
● 4 variables are stored in both the datasets. Details on the variable names & types are given
below:
DATASET 1: "Cold_Storage_Temp_Data.csv"
Variable Name    Variable Type             Value Range              Total no. of values
Season           Categorical & Nominal     Summer, Winter, Rainy    365
Month            Categorical & Nominal     Jan, Feb, …, Nov, Dec    365
Date             Numerical & Discrete      1, 2, …, 30, 31          365
Temperature      Numerical & Continuous    1.7 ~ 5                  365
DATASET 2: "Cold_Storage_Mar2018.csv"
Variable Name    Variable Type             Value Range              Total no. of values
Season           Categorical & Nominal     Summer                   35
Month            Categorical & Nominal     Feb, Mar                 35
INFERENCE: Target variable is ‘Temperature’ and the rest of the variables can be assumed
to be Input variables.
● Following functions are used for data import, manipulation & aggregation:
levels()       To view all the levels for each categorical variable in both the datasets
summary()      To identify missing values, if any (denoted by NA) & also view the 5
               number summary for each numeric variable in both the datasets
as.factor()    To change data type of variables ‘Season’ & ‘Month’ (Character to Factor)
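As a small illustration of how these three functions behave together, the sketch below applies them to a toy data frame. The data frame `toy` and its values are made up for this example and are not rows from either dataset.

```r
# Toy data frame; the values are made up for illustration only
toy <- data.frame(
  Season = c("Winter", "Winter", "Summer", "Rainy"),
  Temperature = c(2.4, 2.3, 3.1, 2.9),
  stringsAsFactors = FALSE
)

toy$Season <- as.factor(toy$Season)   # character -> factor
season.levels <- levels(toy$Season)   # factor levels are sorted alphabetically
print(season.levels)

# For a factor column summary() reports counts per level;
# for a numeric column it reports the min, quartiles, mean and max
print(summary(toy))
```

The alphabetical ordering of levels is why the factor output in Appendix A lists "Rainy" before "Summer".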
Variable 1 – ‘Temperature’ vs Variable 2 – ‘Season’
Class                 Numerical vs Categorical
Data Visualization    X-Axis: Temperature; Y-Axis: Percentage of Total
Inference             Temperature Distribution:
                      Rainy & Winter Seasons – Right Skewed (Towards the higher range)
                      Summer Season – Symmetrical
Variable 1 – ‘Season’ vs Variable 2 – ‘Temperature’
Class                 Categorical vs Numerical
Data Visualization    X-Axis: Season; Y-Axis: Temperature
Inference             1) Temperature variability is maximum in Rainy season, followed by Summer
                      season & the least in Winter season
                      2) Median temperature varies across the seasons & is maximum in Summer,
                      followed by Rainy season and the least in Winter season
Variable 1 – ‘Month’ vs Variable 2 – ‘Temperature’
Class                 Categorical vs Numerical
Data Visualization    X-Axis: Month; Y-Axis: Temperature
Inference             1) Temperature variability is maximum in Jun/Jul/Aug/Sep months (Rainy
                      season) & minimum in Jan/Feb months (Winter season)
                      2) Median temperature varies across the months & is maximum from Feb ~
                      May months, while it is minimum during Nov ~ Jan months
Variable 1 – ‘Season’ vs Variable 2 – ‘Month’
Class                 Categorical vs Categorical
Data Visualization    X-Axis: Season
● Presence of missing values in both data sets was checked using the summary() function
● No missing values found
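Besides reading the NA counts off the summary() output, missing values can be counted directly. The sketch below is one possible way to do so on a toy data frame; `toy.na` and its values are made up for illustration, and the deliberate NA is there only to show what a positive finding looks like.

```r
# Toy data frame with one deliberately missing temperature (illustrative only)
toy.na <- data.frame(
  Season = c("Winter", "Summer", "Rainy"),
  Temperature = c(2.4, NA, 3.0)
)

missing.per.column <- colSums(is.na(toy.na))  # number of NA values in each column
any.missing <- anyNA(toy.na)                  # TRUE if any NA exists anywhere

print(missing.per.column)
print(any.missing)
```

For the two cold storage datasets the report found no missing values, so every count would be expected to be zero.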
Rainy Season 5
Winter Season 3.9, 3.8 & 3.7
Summer Season -
Feb/ Mar/ Apr/ May/ Jun/ Jul/ Aug/ Nov/ Dec Months -
Sep Month 5
Year 2016 5
4 CONCLUSION
● The probability of the cold storage temperature going outside the optimal range of 2 ~ 4
degrees Celsius in the year 2016 was calculated as 4.99%.
● Since this probability lies between 2.5% and 5%, the penalty imposed on the AMC company
for the year 2016 would be 10% of AMC fees
● At the 90% confidence level (alpha = 0.1), the hypothesis that the maximum cold storage
temperature is 3.9 degrees Celsius could not be rejected, so the potential reason behind the
customer complaints can be attributed to the procurement of dairy products
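The first two conclusions can be reproduced from the yearly mean and standard deviation printed in Appendix A. The sketch below assumes, as the appendix code does, that daily temperature is normally distributed; the short variable names and the helper `penalty.for` are introduced here for illustration only.

```r
yearly.mean <- 2.96274    # 2016 mean temperature, as printed in Appendix A
yearly.sd   <- 0.508589   # 2016 standard deviation, as printed in Appendix A

# Tail probabilities under the normality assumption
p.below <- pnorm(2, mean = yearly.mean, sd = yearly.sd)                      # P(T < 2)
p.above <- pnorm(4, mean = yearly.mean, sd = yearly.sd, lower.tail = FALSE)  # P(T > 4)
p.outside <- p.below + p.above   # the two events are disjoint, so probabilities add

# Hypothetical helper mirroring the penalty thresholds used in Appendix A
penalty.for <- function(p) {
  if (p < 0.025) {
    "NO PENALTY"
  } else if (p <= 0.05) {
    "PENALTY - 10% OF AMC FEES"
  } else {
    "PENALTY - 25% OF AMC FEES"
  }
}

print(round(p.outside, 4))
print(penalty.for(p.outside))
```

Because the inputs are the rounded values printed in the appendix, the result may differ from the appendix output in the last few decimal places.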
5 APPENDIX A – SOURCE CODE
# IMPORTING 2016 AVERAGE COLD STORAGE TEMPERATURE DATA FOR 365 DAYS (.csv FILE)
library(readr) # PROVIDES read_csv()
Avg.Temp.Data.2016 <- read_csv("Cold_Storage_Temp_Data.csv")
str(Avg.Temp.Data.2016) # VIEW DATA TYPE
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 365 obs. of 4 variables:
## $ Season : chr "Winter" "Winter" "Winter" "Winter" ...
## $ Month : chr "Jan" "Jan" "Jan" "Jan" ...
## $ Date : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Temperature: num 2.4 2.3 2.4 2.8 2.5 2.4 2.8 2.3 2.4 2.8 ...
## - attr(*, "spec")=
## .. cols(
## .. Season = col_character(),
## .. Month = col_character(),
## .. Date = col_double(),
## .. Temperature = col_double()
## .. )
summary(Avg.Temp.Data.2016) # VIEW MISSING DATA, IF ANY
# CHANGE DATA TYPE (CHARACTER TO FACTOR) FOR COLUMNS - 'SEASON' & 'MONTH'
Avg.Temp.Data.2016$Season <- as.factor(Avg.Temp.Data.2016$Season)
Avg.Temp.Data.2016$Month <- as.factor(Avg.Temp.Data.2016$Month)
str(Avg.Temp.Data.2016) # REVIEW DATA TYPE
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 365 obs. of 4 variables:
## $ Season : Factor w/ 3 levels "Rainy","Summer",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ Month : Factor w/ 12 levels "Apr","Aug","Dec",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ Date : num 1 2 3 4 5 6 7 8 9 10 ...
## $ Temperature: num 2.4 2.3 2.4 2.8 2.5 2.4 2.8 2.3 2.4 2.8 ...
## - attr(*, "spec")=
## .. cols(
## .. Season = col_character(),
## .. Month = col_character(),
## .. Date = col_double(),
## .. Temperature = col_double()
## .. )
summary(Avg.Temp.Data.2016) # REVIEW DATA SUMMARY
levels(Avg.Temp.Data.2016$Season)
levels(Avg.Temp.Data.2016$Month)
main = "Histogram of Temperature Distribution 2016")
Histogram1.Temp.2016
Histogram2.Temp.2016 <- hist(Avg.Temp.Data.2016$Temperature, col = "RED",
                             main = "Histogram of Temperature Distribution 2016",
                             xlab = "Temperature",
                             ylab = "Frequency")
library(ggplot2) # PROVIDES qplot()
Histogram3.Temp.2016 <- qplot(Temperature, data = Avg.Temp.Data.2016,
                              main = "Histogram of Temperature Distribution 2016",
                              xlab = "Temperature",
                              ylab = "No of Days") # CHART: HISTOGRAM; VARIABLE: TEMPERATURE
Histogram3.Temp.2016
Barchart.Season.2016 <- qplot(Season, data = Avg.Temp.Data.2016,
                              main = "Barchart of Season Distribution 2016",
                              xlab = "Season",
                              ylab = "No. of Days") # CHART: BARCHART; VARIABLE: SEASON
Barchart.Season.2016
library(lattice) # PROVIDES histogram()
Histogram.Temp.Season.2016 <- histogram(~Temperature|factor(Season), data = Avg.Temp.Data.2016,
                                        main = "Histogram of Temperature Distribution Across Season 2016")
Histogram.Temp.Season.2016
Barchart.Season.Month.2016 <- qplot(Season, fill = Month, data = Avg.Temp.Data.2016, geom = "bar",
                                    main = "Barchart of Season Distribution by Month 2016",
                                    xlab = "Season",
                                    ylab = "No. of Days")
Barchart.Season.Month.2016
Boxplot.Temp.Season.2016 <- qplot(Season, Temperature, data = Avg.Temp.Data.2016, geom = "boxplot",
                                  main = "Boxplot of Temperature Distribution Across Season 2016",
                                  xlab = "Season",
                                  ylab = "Temperature")
Boxplot.Temp.Season.2016
Boxplot.Temp.Month.2016 <- qplot(Month, Temperature, data = Avg.Temp.Data.2016, geom = "boxplot",
                                 main = "Boxplot of Temperature Distribution Across Month 2016",
                                 xlab = "Month",
                                 ylab = "Temperature")
Boxplot.Temp.Month.2016
ScatterPlot.Temp.Date.2016 <- qplot(Date, Temperature, data = Avg.Temp.Data.2016,
                                    main = "Scatterplot of Temperature Distribution Across Date 2016",
                                    xlab = "Date",
                                    ylab = "Temperature")
ScatterPlot.Temp.Date.2016
# FIND MEAN COLD STORAGE TEMPERATURE FOR SUMMER, WINTER & RAINY SEASON
library(dplyr) # PROVIDES %>%, group_by() & summarise()
##
## Attaching package: 'dplyr'
Mean.Temp.By.Season.2016 <- Avg.Temp.Data.2016 %>%
  group_by(Season) %>%
  summarise(mean(Temperature)) # DEFINE VARIABLE TO STORE SEASON-WISE MEAN COLD STORAGE TEMPERATURE
class(Mean.Temp.By.Season.2016) # CHECK DATA TYPE
Mean.Temp.By.Season.2016 <- as.data.frame(Mean.Temp.By.Season.2016) # CHANGE DATA TYPE TO DATAFRAME
class(Mean.Temp.By.Season.2016) # RECHECK DATA TYPE
## [1] "data.frame"
summary(Mean.Temp.By.Season.2016) # VIEW SUMMARY OF SEASON-WISE MEAN COLD STORAGE TEMPERATURE
## Season mean(Temperature)
## Rainy :1 Min. :2.701
## Summer:1 1st Qu.:2.870
## Winter:1 Median :3.039
## Mean :2.964
## 3rd Qu.:3.096
## Max. :3.153
View(Mean.Temp.By.Season.2016) # VIEW SEASON-WISE MEAN COLD STORAGE TEMPERATURE IN A TABLE FORMAT
# FIND OVERALL MEAN COLD STORAGE TEMPERATURE FOR THE FULL YEAR
Yearly.Temp.Mean.2016 <- mean(Avg.Temp.Data.2016$Temperature)
Yearly.Temp.Mean.2016
## [1] 2.96274
# FIND STANDARD DEVIATION OF COLD STORAGE TEMPERATURE FOR THE FULL YEAR.
Yearly.Temp.Std.Dev.2016 <- sd(Avg.Temp.Data.2016$Temperature)
Yearly.Temp.Std.Dev.2016
## [1] 0.508589
Optimal.Temp.Lower.Limit <- 2
Probability.Temp.Below.Lower.Limit <- pnorm(Optimal.Temp.Lower.Limit, Yearly.Temp.Mean.2016,
                                            Yearly.Temp.Std.Dev.2016, lower.tail = TRUE)
Probability.Temp.Below.Lower.Limit
## [1] 0.02918146
Optimal.Temp.Upper.Limit <- 4
Probability.Temp.Above.Upper.Limit <- 1 - pnorm(Optimal.Temp.Upper.Limit, Yearly.Temp.Mean.2016,
                                                Yearly.Temp.Std.Dev.2016, lower.tail = TRUE)
Probability.Temp.Above.Upper.Limit
## [1] 0.02070077
Probability.Temp.Outside.Limit <- Probability.Temp.Below.Lower.Limit + Probability.Temp.Above.Upper.Limit # ADDITION RULE
Probability.Temp.Outside.Limit
## [1] 0.04988223
if (Probability.Temp.Outside.Limit < 0.025) {
  "NO PENALTY"
} else if (Probability.Temp.Outside.Limit >= 0.025 && Probability.Temp.Outside.Limit <= 0.05) {
  "PENALTY - 10% OF AMC FEES"
} else {
  "PENALTY - 25% OF AMC FEES"
}
# IMPORTING 2018 AVERAGE COLD STORAGE TEMPERATURE DATA FOR 35 DAYS (.csv FILE)
Avg.Temp.Data.2018 <- read_csv("Cold_Storage_Mar2018.csv")
summary(Avg.Temp.Data.2018) # VIEW DATA SUMMARY
# CHANGE DATA TYPE (CHARACTER TO FACTOR) FOR COLUMNS - 'SEASON' & 'MONTH'
Avg.Temp.Data.2018$Season <- as.factor(Avg.Temp.Data.2018$Season)
Avg.Temp.Data.2018$Month <- as.factor(Avg.Temp.Data.2018$Month)
summary(Avg.Temp.Data.2018) # REVIEW DATA SUMMARY
View(Avg.Temp.Data.2018)
# ASSUMPTION 1
# MEAN OF 2018 SAMPLE DATA IS NORMALLY DISTRIBUTED [CENTRAL LIMIT THEOREM IS VALID SINCE 2018 SAMPLE SIZE = 35 (>30)]
# ASSUMPTION 2
# STANDARD DEVIATION OF 2018 SAMPLE DATA IS SAME AS THAT OF 2016 POPULATION DATA
# HYPOTHESIS STATEMENT
# H0: Acceptable.Temp.Upper.Limit.2018 = 3.9 DEGREES CELSIUS [PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT]
# H1: Acceptable.Temp.Upper.Limit.2018 > 3.9 DEGREES CELSIUS [PROBLEM IDENTIFIED AT COLD STORAGE PLANT]
# Alpha = 0.1
Acceptable.Temp.Upper.Limit.2018 <- 3.9
Sample.Temp.Data.Count.2018 <- 35
Sample.Temp.Mean.2018 <- mean(Avg.Temp.Data.2018$Temperature)
Z.Stat.Computed <- (Sample.Temp.Mean.2018 - Acceptable.Temp.Upper.Limit.2018)/(Yearly.Temp.Std.Dev.2016/sqrt(Sample.Temp.Data.Count.2018))
Z.Stat.Computed
## [1] 0.8641166
Z.Stat.Critical <- qnorm(0.9) # RIGHT TAILED TEST
Z.Stat.Critical
## [1] 1.281552
if (Z.Stat.Computed > Z.Stat.Critical) {
  "REJECT H0 - PROBLEM IDENTIFIED AT COLD STORAGE PLANT"
} else {
  "DO NOT REJECT H0 - PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT"
}
## [1] "DO NOT REJECT H0 - PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT"
# HYPOTHESIS STATEMENT
# H0: Acceptable.Temp.Upper.Limit.2018 = 3.9 DEGREES CELSIUS [PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT]
# H1: Acceptable.Temp.Upper.Limit.2018 > 3.9 DEGREES CELSIUS [PROBLEM IDENTIFIED AT COLD STORAGE PLANT]
# Alpha = 0.1
# NOTE: T.Stat REUSES THE 2016 POPULATION STANDARD DEVIATION (ASSUMPTION 2), SO IT EQUALS Z.Stat.Computed
T.Stat <- (Sample.Temp.Mean.2018 - Acceptable.Temp.Upper.Limit.2018)/(Yearly.Temp.Std.Dev.2016/sqrt(Sample.Temp.Data.Count.2018))
T.Stat
## [1] 0.8641166
P.Value <- 1 - pt(T.Stat, (Sample.Temp.Data.Count.2018 - 1)) # RIGHT TAILED TEST
Alpha <- 0.1
if (P.Value < Alpha) {
  "REJECT H0 - PROBLEM IDENTIFIED AT COLD STORAGE PLANT"
} else {
  "DO NOT REJECT H0 - PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT"
}
## [1] "DO NOT REJECT H0 - PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT"
# HYPOTHESIS STATEMENT
# H0: Acceptable.Temp.Upper.Limit.2018 = 3.9 DEGREES CELSIUS [PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT]
# H1: Acceptable.Temp.Upper.Limit.2018 > 3.9 DEGREES CELSIUS [PROBLEM IDENTIFIED AT COLD STORAGE PLANT]