SMDM Assignment PDF

This document summarizes exploratory data analysis performed on temperature data from a cold storage facility. Descriptive statistics and visualizations are used to analyze temperature data from 2016 and March 2018. Hypothesis tests are conducted to understand operational issues reported in March 2018. The analysis aims to identify potential reasons for customer complaints that month.

Uploaded by

Eric Norman

 

GREAT LAKES PGP BABI 

SMDM PROJECT
COLD STORAGE

ANAND KRISHNAN V U (JULY BATCH ‘C’)
8/30/2019

1. PROJECT OBJECTIVE

This report performs a basic data analysis of a Cold Storage operation business for the
year 2016, using the “Cold_Storage_Temp_Data.csv” dataset. It also attempts to
understand the operational inefficiencies reported in March 2018, using the
“Cold_Storage_Mar2018.csv” dataset, through Hypothesis Testing. Finally, an inference is
drawn on the potential reason behind the customer complaints in March 2018.

This exploration report consists of the following:

● Import datasets in R
● Descriptive statistics
● Insights from the dataset

2. ASSUMPTIONS

● # 1: 2016 population data is normally distributed

● # 2: Mean of 2018 sample data is normally distributed
  [Central Limit Theorem is valid since 2018 sample size = 35 (>30)]

● # 3: Standard deviation of 2018 sample data is the same as that of 2016 population data
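
Taken together, assumptions # 2 and # 3 imply that the 2018 sample mean has a standard error of sigma / sqrt(n). The snippet below is a minimal sketch of that arithmetic, assuming the 2016 standard deviation printed in Appendix A (0.508589) and the 2018 sample size of 35. The object names are illustrative and do not appear in the source code.

```r
# Illustrative values taken from the Appendix A output
sigma.2016 <- 0.508589   # 2016 standard deviation, treated as the population SD
n.2018     <- 35         # 2018 sample size

# Standard error of the 2018 sample mean under assumptions # 2 and # 3
std.error <- sigma.2016 / sqrt(n.2018)
std.error
```

This standard error is the denominator of the test statistic computed in Appendix A.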

3. EXPLORATORY DATA ANALYSIS

3.1 ENVIRONMENT SET UP AND DATA IMPORT

3.1.1 SET WORKING DIRECTORY

Working Directory: "E:/PGP BABI/Working Directory (R)/02.SMDM/03.Project"

3.1.2 INSTALL PACKAGES & INVOKE SYSTEM LIBRARIES

● 4 packages are invoked from the system library for exploratory data analysis. Details on the
  packages used & their corresponding purposes are given below:

Package Name   Purpose
readr          For importing .csv datasets using read_csv() function
dplyr          For data manipulation using group_by() & summarise() functions
ggplot2        For data visualization using qplot() function
lattice        For data visualization using histogram() function

Please refer to Appendix A for Source Code


3.1.3 IMPORT AND READ THE DATASET

● Average temperature data at date level, for the year 2016, is read from the
  "Cold_Storage_Temp_Data.csv" file and stored in the object “Avg.Temp.Data.2016”

  This dataset contains 365 observations (Rows), each of 4 variables (Columns)

● Average temperature data for the last 35 days from March 2018 is read from the
  "Cold_Storage_Mar2018.csv" file and stored in the object “Avg.Temp.Data.2018”

  This dataset contains 35 observations (Rows), each of 4 variables (Columns)

Please refer to Appendix A for Source Code

3.2 VARIABLE IDENTIFICATION

● 4 variables are stored in both the datasets. Details on the variable names & types are given
  below:

DATASET 1: "Cold_Storage_Temp_Data.csv"

Variable Name   Variable Type            Value Range             Total no. of values
Season          Categorical & Nominal    Summer, Winter, Rainy   365
Month           Categorical & Nominal    Jan, Feb, …, Nov, Dec   365
Date            Numerical & Discrete     1, 2, …, 30, 31         365
Temperature     Numerical & Continuous   1.7 ~ 5                 365

DATASET 2: "Cold_Storage_Mar2018.csv"

Variable Name   Variable Type            Value Range             Total no. of values
Season          Categorical & Nominal    Summer                  35
Month           Categorical & Nominal    Feb, Mar                35
Date            Numerical & Discrete     1, 2, …, 27, 28         35
Temperature     Numerical & Continuous   3.8 ~ 4.6               35

INFERENCE: Target variable is ‘Temperature’ and the rest of the variables can be assumed
to be Input variables.

● The following functions are used for data import, manipulation & aggregation:

Function Name     Purpose
read_csv()        To read the “Cold_Storage_Temp_Data.csv” & “Cold_Storage_Mar2018.csv” datasets
str()             To view the data type of the 4 variables & also the number of levels for each categorical variable in both the datasets
levels()          To view all the levels for each categorical variable in both the datasets
summary()         To identify missing values, if any (denoted by NA) & also view the 5 number summary for each numeric variable in both the datasets
as.factor()       To change data type of variables ‘Season’ & ‘Month’ (Character to Factor)
group_by()        To group data by the variable ‘Season’
summarise()       To summarise the variable ‘Temperature’ against the variable ‘Season’
mean()            To compute the mean value of the variable ‘Temperature’
sd()              To compute the standard deviation of the variable ‘Temperature’
pnorm()           To compute the probability of the variable ‘Temperature’ falling at or below a particular value, under the assumption of a normal distribution
if() {} else {}   To return the penalty imposed on AMC company based on the given test conditions

3.3 UNIVARIATE ANALYSIS

● Frequency distribution of the variable ‘Temperature’ in a Histogram as well as a Box Plot:

Class                Variable 1 – Temperature
Data Visualization   [Histogram and Box Plot of Temperature]

             Histogram                              Box Plot
X-Axis       Temperature                            Temperature
Y-Axis       No. of Days (Frequency)                -
Inference    Peaks (Most Common Values): 2.5 ~ 3    Outliers: 5
             Symmetry: Right Skewed                 Symmetry: Right Skewed
             Spread: 1.5 ~ 5


● Frequency distribution of the variable ‘Season’ in a Bar Chart:

Class                Variable 2 – Season
Data Visualization   [Bar Chart of Season]
Plot Type            Bar Chart
X-Axis               Season (“Rainy”, “Summer” & “Winter”)
Y-Axis               No. of Days (Frequency)
Inference            Frequency across the seasons is almost the same

Please refer to Appendix A for Source Code

3.4 BI-VARIATE ANALYSIS

● Temperature distribution across each Season in a single Histogram:

Class                Variable 1 – ‘Temperature’ vs Variable 2 – ‘Season’
                     (Numerical vs Categorical)
Data Visualization   [Histogram of Temperature by Season]
Plot Type            Histogram
X-Axis               Temperature
Y-Axis               Percentage of Total
Inference            Temperature Distribution:
                     Rainy & Winter Seasons – Right Skewed (Towards the higher range)
                     Summer Season – Symmetrical


● Temperature distribution across each Season in a Box Plot:

Class                Variable 1 – ‘Season’ vs Variable 2 – ‘Temperature’
                     (Categorical vs Numerical)
Data Visualization   [Box Plot of Temperature by Season]
Plot Type            Box Plot
X-Axis               Season
Y-Axis               Temperature
Inference            1) Temperature variability is maximum in Rainy season, followed by Summer
                        season & the least in Winter season
                     2) Median temperature varies across the seasons & is maximum in Summer,
                        followed by Rainy season and the least in Winter season

● Temperature distribution across each Month in a Box Plot:

Class                Variable 1 – ‘Month’ vs Variable 2 – ‘Temperature’
                     (Categorical vs Numerical)
Data Visualization   [Box Plot of Temperature by Month]
Plot Type            Box Plot
X-Axis               Month
Y-Axis               Temperature
Inference            1) Temperature variability is maximum in Jun/Jul/Aug/Sep months (Rainy
                        season) & minimum in Jan/Feb months (Winter season)
                     2) Median temperature varies across the months & is maximum from Feb ~
                        May months, while it is minimum during Nov ~ Jan months


● Frequency distribution of Months across each Season in a Bar Chart:

Class                Variable 1 – ‘Season’ vs Variable 2 – ‘Month’
                     (Categorical vs Categorical)
Data Visualization   [Bar Chart of Season by Month]
Plot Type            Bar Chart
X-Axis               Season
Y-Axis               No. of Days (Frequency)
Inference            Number of months per season is the same

3.5 MISSING VALUE IDENTIFICATION

● Presence of missing values in both datasets was checked using the summary() function
● No missing values were found
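
summary() reports NA counts for numeric and factor columns, but for character columns it prints only Length, Class and Mode. A direct count per column is therefore a useful cross-check. The snippet below is a minimal sketch on a hypothetical toy data frame (not the project data); the object names are illustrative only.

```r
# Hypothetical toy data frame standing in for the imported datasets
toy <- data.frame(
  Season      = c("Winter", "Summer", NA),
  Temperature = c(2.4, NA, 3.1),
  stringsAsFactors = FALSE
)

# Number of missing values in each column, regardless of column type
na.counts <- colSums(is.na(toy))
na.counts

# TRUE only when the data frame contains no missing values at all
no.missing <- !any(is.na(toy))
no.missing
```

Applied to Avg.Temp.Data.2016 or Avg.Temp.Data.2018, the same two calls should return all zeros and TRUE if the finding above holds.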

Please refer to Appendix A for Source Code

3.6 OUTLIER IDENTIFICATION

● Presence of outliers was identified using the Box Plot of temperature
● Outlier values vary based on the sample data in consideration

Class                                                  Outlier Values (Temperature)
Rainy Season                                           5
Winter Season                                          3.9, 3.8 & 3.7
Summer Season                                          -
Feb/ Mar/ Apr/ May/ Jun/ Jul/ Aug/ Nov/ Dec Months     -
Jan Month                                              3.9, 3.5 & 3.4
Sep Month                                              5
Oct Month                                              3.8
Year 2016                                              5
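
A base R box plot flags a point as an outlier when it lies more than 1.5 times the hinge spread beyond the lower or upper hinge. This is why the flagged values change with the subset being plotted. The snippet below is a minimal sketch of that rule using boxplot.stats() on a hypothetical vector (not the project data); the object names are illustrative only.

```r
# Hypothetical temperature readings, not taken from the project datasets
temps <- c(2.4, 2.5, 2.7, 2.9, 3.0, 3.1, 3.3, 5.0)

# boxplot.stats() applies the same 1.5 x hinge-spread rule as boxplot()
stats <- boxplot.stats(temps, coef = 1.5)

# Values lying beyond the whiskers
outliers <- stats$out
outliers
```

Running the same call on the Temperature column of each season or month subset would list the values shown in the table above, assuming the plots were drawn with the default whisker setting.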
 

3.7 VARIABLE TRANSFORMATION / FEATURE CREATION

● Variables ‘Season’ & ‘Month’ were originally of Character data type  


● These variables were transformed to ‘Factor’ data types for better data handling

4 CONCLUSION

● The probability of the cold storage temperature going outside the optimal range of 2 ~ 4
  degrees Celsius in the year 2016 was calculated as 4.98%.

● The penalty imposed on the AMC company for the year 2016 would be 10% of AMC fees, since
  this probability lies between 2.5% and 5%.

● At a 90% confidence level (Alpha = 0.1), neither the Z-test nor the T-test rejects the
  hypothesis that the cold storage temperature stayed within 3.9 degrees Celsius. The potential
  reason behind the customer complaints can therefore be attributed to the procurement of
  dairy products.
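
The figures in this section can be re-derived from the summary statistics printed in Appendix A. The snippet below is a minimal sketch under the report's normality assumptions. It uses the rounded 2018 sample mean from the summary() output (3.974), so the Z statistic differs slightly from the appendix value. The object names are illustrative and do not appear in the source code.

```r
# Values printed in Appendix A
mean.2016 <- 2.96274
sd.2016   <- 0.508589

# Probability of the temperature falling outside the optimal 2 ~ 4 degree range,
# assuming a normal distribution (assumption # 1)
p.below   <- pnorm(2, mean = mean.2016, sd = sd.2016)
p.above   <- pnorm(4, mean = mean.2016, sd = sd.2016, lower.tail = FALSE)
p.outside <- p.below + p.above   # the two events are mutually exclusive

# Penalty slabs as coded in Appendix A
penalty <- if (p.outside < 0.025) {
  "NO PENALTY"
} else if (p.outside <= 0.05) {
  "PENALTY - 10% OF AMC FEES"
} else {
  "PENALTY - 25% OF AMC FEES"
}

# Right-tailed Z-test at Alpha = 0.1 with the rounded 2018 sample mean
z.stat    <- (3.974 - 3.9) / (sd.2016 / sqrt(35))
z.crit    <- qnorm(0.9)
reject.h0 <- z.stat > z.crit

p.outside
penalty
reject.h0
```

Because the computed probability sits just under the 5% boundary, the penalty slab is sensitive to small changes in the 2016 mean or standard deviation.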

5 APPENDIX A – SOURCE CODE

setwd("E:/PGP BABI/Working Directory (R)/02.SMDM/03.Project")
getwd()

## [1] "E:/PGP BABI/Working Directory (R)/02.SMDM/03.Project"

# IMPORTING 2016 AVERAGE COLD STORAGE TEMPERATURE DATA FOR 365 DAYS (.csv FILE)

library(readr) # PACKAGE TO READ .csv FILE

## Warning: package 'readr' was built under R version 3.5.3

Avg.Temp.Data.2016 <- read_csv("Cold_Storage_Temp_Data.csv")

## Parsed with column specification:
## cols(
##   Season = col_character(),
##   Month = col_character(),
##   Date = col_double(),
##   Temperature = col_double()
## )

str(Avg.Temp.Data.2016) # VIEW DATA TYPE

## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 365 obs. of 4 variables:
##  $ Season     : chr "Winter" "Winter" "Winter" "Winter" ...
##  $ Month      : chr "Jan" "Jan" "Jan" "Jan" ...
##  $ Date       : num 1 2 3 4 5 6 7 8 9 10 ...
##  $ Temperature: num 2.4 2.3 2.4 2.8 2.5 2.4 2.8 2.3 2.4 2.8 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Season = col_character(),
##   ..   Month = col_character(),
##   ..   Date = col_double(),
##   ..   Temperature = col_double()
##   .. )

summary(Avg.Temp.Data.2016) # VIEW MISSING DATA, IF ANY

##     Season             Month                Date        Temperature
##  Length:365         Length:365         Min.   : 1.00   Min.   :1.700
##  Class :character   Class :character   1st Qu.: 8.00   1st Qu.:2.500
##  Mode  :character   Mode  :character   Median :16.00   Median :2.900
##                                        Mean   :15.72   Mean   :2.963
##                                        3rd Qu.:23.00   3rd Qu.:3.300
##                                        Max.   :31.00   Max.   :5.000

# CHANGE DATA TYPE (CHARACTER TO FACTOR) FOR COLUMNS - 'SEASON' & 'MONTH'

Avg.Temp.Data.2016$Season <- as.factor(Avg.Temp.Data.2016$Season)
Avg.Temp.Data.2016$Month <- as.factor(Avg.Temp.Data.2016$Month)
str(Avg.Temp.Data.2016) # REVIEW DATA TYPE

## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 365 obs. of 4 variables:
##  $ Season     : Factor w/ 3 levels "Rainy","Summer",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ Month      : Factor w/ 12 levels "Apr","Aug","Dec",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ Date       : num 1 2 3 4 5 6 7 8 9 10 ...
##  $ Temperature: num 2.4 2.3 2.4 2.8 2.5 2.4 2.8 2.3 2.4 2.8 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Season = col_character(),
##   ..   Month = col_character(),
##   ..   Date = col_double(),
##   ..   Temperature = col_double()
##   .. )

summary(Avg.Temp.Data.2016) # REVIEW DATA SUMMARY

##     Season        Month          Date        Temperature
##  Rainy :122   Aug    : 31   Min.   : 1.00   Min.   :1.700
##  Summer:120   Dec    : 31   1st Qu.: 8.00   1st Qu.:2.500
##  Winter:123   Jan    : 31   Median :16.00   Median :2.900
##               Jul    : 31   Mean   :15.72   Mean   :2.963
##               Mar    : 31   3rd Qu.:23.00   3rd Qu.:3.300
##               May    : 31   Max.   :31.00   Max.   :5.000
##               (Other):179

levels(Avg.Temp.Data.2016$Season)

## [1] "Rainy" "Summer" "Winter"

levels(Avg.Temp.Data.2016$Month)

## [1] "Apr" "Aug" "Dec" "Feb" "Jan" "Jul" "Jun" "Mar" "May" "Nov" "Oct"
## [12] "Sep"

library(ggplot2) # PACKAGE FOR DATA VISUALIZATION
library(lattice)

Boxplot.Temp.2016 <- boxplot(Avg.Temp.Data.2016$Temperature, horizontal = TRUE,
                             col = "RED",
                             main = "Boxplot of Temperature Distribution 2016")

Histogram1.Temp.2016 <- histogram(~Temperature, data = Avg.Temp.Data.2016,
                                  main = "Histogram of Temperature Distribution 2016")
Histogram1.Temp.2016

Histogram2.Temp.2016 <- hist(Avg.Temp.Data.2016$Temperature, col = "RED",
                             main = "Histogram of Temperature Distribution 2016",
                             xlab = "Temperature",
                             ylab = "Frequency")

Histogram3.Temp.2016 <- qplot(Temperature, data = Avg.Temp.Data.2016,
                              main = "Histogram of Temperature Distribution 2016",
                              xlab = "Temperature",
                              ylab = "No of Days") # CHART: HISTOGRAM; VARIABLE: TEMPERATURE
Histogram3.Temp.2016

Barchart.Season.2016 <- qplot(Season, data = Avg.Temp.Data.2016,
                              main = "Barchart of Season Distribution 2016",
                              xlab = "Season",
                              ylab = "No. of Days") # CHART: BARCHART; VARIABLE: SEASON
Barchart.Season.2016

Histogram.Temp.Season.2016 <- histogram(~Temperature|factor(Season), data = Avg.Temp.Data.2016,
                                        main = "Histogram of Temperature Distribution Across Season 2016")
Histogram.Temp.Season.2016

Barchart.Season.Month.2016 <- qplot(Season, fill = Month, data = Avg.Temp.Data.2016, geom = "bar",
                                    main = "Barchart of Season Distribution by Month 2016",
                                    xlab = "Season",
                                    ylab = "No. of Days")
Barchart.Season.Month.2016

Boxplot.Temp.Season.2016 <- qplot(Season, Temperature, data = Avg.Temp.Data.2016, geom = "boxplot",
                                  main = "Boxplot of Temperature Distribution Across Season 2016",
                                  xlab = "Season",
                                  ylab = "Temperature")
Boxplot.Temp.Season.2016

Boxplot.Temp.Month.2016 <- qplot(Month, Temperature, data = Avg.Temp.Data.2016, geom = "boxplot",
                                 main = "Boxplot of Temperature Distribution Across Month 2016",
                                 xlab = "Month",
                                 ylab = "Temperature")
Boxplot.Temp.Month.2016

ScatterPlot.Temp.Date.2016 <- qplot(Date, Temperature, data = Avg.Temp.Data.2016,
                                    main = "Scatterplot of Temperature Distribution Across Date 2016",
                                    xlab = "Date",
                                    ylab = "Temperature")
ScatterPlot.Temp.Date.2016

 
# FIND MEAN COLD STORAGE TEMPERATURE FOR SUMMER, WINTER & RAINY SEASON

library(dplyr) # PACKAGE TO USE GROUP_BY FUNCTION

## Warning: package 'dplyr' was built under R version 3.5.3

##
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
##
##     filter, lag

## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union

Mean.Temp.By.Season.2016 <- Avg.Temp.Data.2016 %>% group_by(Season) %>% summarise(mean(Temperature)) # DEFINE VARIABLE TO STORE SEASON-WISE MEAN COLD STORAGE TEMPERATURE
class(Mean.Temp.By.Season.2016) # CHECK DATA TYPE

## [1] "tbl_df" "tbl" "data.frame"

Mean.Temp.By.Season.2016 <- as.data.frame(Mean.Temp.By.Season.2016) # CHANGE DATA TYPE TO DATAFRAME
class(Mean.Temp.By.Season.2016) # RECHECK DATA TYPE

## [1] "data.frame"

summary(Mean.Temp.By.Season.2016) # VIEW SUMMARY OF SEASON-WISE MEAN COLD STORAGE TEMPERATURE

##     Season  mean(Temperature)
##  Rainy :1   Min.   :2.701
##  Summer:1   1st Qu.:2.870
##  Winter:1   Median :3.039
##             Mean   :2.964
##             3rd Qu.:3.096
##             Max.   :3.153

View(Mean.Temp.By.Season.2016) # VIEW SEASON-WISE MEAN COLD STORAGE TEMPERATURE IN A TABLE FORMAT
 

# FIND OVERALL MEAN COLD STORAGE TEMPERATURE FOR THE FULL YEAR

Yearly.Temp.Mean.2016 <- mean(Avg.Temp.Data.2016$Temperature)
Yearly.Temp.Mean.2016

## [1] 2.96274

# FIND STANDARD DEVIATION OF COLD STORAGE TEMPERATURE FOR THE FULL YEAR.

Yearly.Temp.Std.Dev.2016 <- sd(Avg.Temp.Data.2016$Temperature)
Yearly.Temp.Std.Dev.2016

## [1] 0.508589

# ASSUMING NORMAL DISTRIBUTION, WHAT IS THE PROBABILITY OF COLD STORAGE TEMPERATURE HAVING FALLEN BELOW 2 DEGREES CELSIUS?

Optimal.Temp.Lower.Limit <- 2
Probability.Temp.Below.Lower.Limit <- pnorm(Optimal.Temp.Lower.Limit, Yearly.Temp.Mean.2016, Yearly.Temp.Std.Dev.2016, lower.tail = TRUE)
Probability.Temp.Below.Lower.Limit

## [1] 0.02918146

# ASSUMING NORMAL DISTRIBUTION, WHAT IS THE PROBABILITY OF COLD STORAGE TEMPERATURE HAVING GONE ABOVE 4 DEGREES CELSIUS?

Optimal.Temp.Upper.Limit <- 4
Probability.Temp.Above.Upper.Limit <- 1 - pnorm(Optimal.Temp.Upper.Limit, Yearly.Temp.Mean.2016, Yearly.Temp.Std.Dev.2016, lower.tail = TRUE)
Probability.Temp.Above.Upper.Limit

## [1] 0.02070077

# WHAT WILL BE THE PENALTY FOR AMC COMPANY?

Probability.Temp.Outside.Limit <- Probability.Temp.Below.Lower.Limit + Probability.Temp.Above.Upper.Limit # ADDITION RULE
Probability.Temp.Outside.Limit

## [1] 0.04988223

if(Probability.Temp.Outside.Limit < 0.025) {"NO PENALTY"} else {if(Probability.Temp.Outside.Limit >= 0.025 & Probability.Temp.Outside.Limit <= 0.05) {"PENALTY - 10% OF AMC FEES"} else {"PENALTY - 25% OF AMC FEES"}}

## [1] "PENALTY - 10% OF AMC FEES"

## MUTUALLY EXCLUSIVE EVENTS: TEMPERATURE FALLING BELOW 2 DEGREES CELSIUS & TEMPERATURE GOING ABOVE 4 DEGREES CELSIUS

# IMPORTING 2018 AVERAGE COLD STORAGE TEMPERATURE DATA FOR 35 DAYS (.csv FILE)

Avg.Temp.Data.2018 <- read_csv("Cold_Storage_Mar2018.csv")

## Parsed with column specification:
## cols(
##   Season = col_character(),
##   Month = col_character(),
##   Date = col_double(),
##   Temperature = col_double()
## )

summary(Avg.Temp.Data.2018) # VIEW DATA SUMMARY

##     Season             Month                Date       Temperature
##  Length:35          Length:35          Min.   : 1.0   Min.   :3.800
##  Class :character   Class :character   1st Qu.: 9.5   1st Qu.:3.900
##  Mode  :character   Mode  :character   Median :14.0   Median :3.900
##                                        Mean   :14.4   Mean   :3.974
##                                        3rd Qu.:19.5   3rd Qu.:4.100
##                                        Max.   :28.0   Max.   :4.600

# CHANGE DATA TYPE (CHARACTER TO FACTOR) FOR COLUMNS - 'SEASON' & 'MONTH'

Avg.Temp.Data.2018$Season <- as.factor(Avg.Temp.Data.2018$Season)
Avg.Temp.Data.2018$Month <- as.factor(Avg.Temp.Data.2018$Month)
summary(Avg.Temp.Data.2018) # REVIEW DATA SUMMARY

##     Season    Month        Date       Temperature
##  Summer:35   Feb:18   Min.   : 1.0   Min.   :3.800
##              Mar:17   1st Qu.: 9.5   1st Qu.:3.900
##                       Median :14.0   Median :3.900
##                       Mean   :14.4   Mean   :3.974
##                       3rd Qu.:19.5   3rd Qu.:4.100
##                       Max.   :28.0   Max.   :4.600

View(Avg.Temp.Data.2018)

# ASSUMPTION 1
# MEAN OF 2018 SAMPLE DATA IS NORMALLY DISTRIBUTED [CENTRAL LIMIT THEOREM IS VALID SINCE 2018 SAMPLE SIZE = 35 (>30)]

# ASSUMPTION 2
# STANDARD DEVIATION OF 2018 SAMPLE DATA IS SAME AS THAT OF 2016 POPULATION DATA

# STATE THE HYPOTHESIS AND DO THE CALCULATIONS USING Z-TEST.

# HYPOTHESIS STATEMENT
# H0: Acceptable.Temp.Upper.Limit.2018 = 3.9 DEGREES CELSIUS [PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT]
# H1: Acceptable.Temp.Upper.Limit.2018 > 3.9 DEGREES CELSIUS [PROBLEM IDENTIFIED AT COLD STORAGE PLANT]
# Alpha = 0.1

Acceptable.Temp.Upper.Limit.2018 <- 3.9
Sample.Temp.Data.Count.2018 <- 35
Sample.Temp.Mean.2018 <- mean(Avg.Temp.Data.2018$Temperature)
Z.Stat.Computed <- (Sample.Temp.Mean.2018 - Acceptable.Temp.Upper.Limit.2018)/(Yearly.Temp.Std.Dev.2016/sqrt(Sample.Temp.Data.Count.2018))
Z.Stat.Computed

## [1] 0.8641166

Z.Stat.Critical <- qnorm(0.9) # RIGHT TAILED TEST
Z.Stat.Critical

## [1] 1.281552

if(Z.Stat.Computed > Z.Stat.Critical) {"REJECT H0 - PROBLEM IDENTIFIED AT COLD STORAGE PLANT"} else {"DO NOT REJECT H0 - PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT"}

## [1] "DO NOT REJECT H0 - PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT"

# STATE THE HYPOTHESIS AND DO THE CALCULATIONS USING T-TEST.

# HYPOTHESIS STATEMENT
# H0: Acceptable.Temp.Upper.Limit.2018 = 3.9 DEGREES CELSIUS [PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT]
# H1: Acceptable.Temp.Upper.Limit.2018 > 3.9 DEGREES CELSIUS [PROBLEM IDENTIFIED AT COLD STORAGE PLANT]
# Alpha = 0.1

T.Stat <- (Sample.Temp.Mean.2018 - Acceptable.Temp.Upper.Limit.2018)/(Yearly.Temp.Std.Dev.2016/sqrt(Sample.Temp.Data.Count.2018))
T.Stat

## [1] 0.8641166

P.Value <- 1 - pt(T.Stat, (Sample.Temp.Data.Count.2018-1)) # RIGHT TAILED TEST
Alpha <- 0.1
if(P.Value < Alpha) {"REJECT H0 - PROBLEM IDENTIFIED AT COLD STORAGE PLANT"} else {"DO NOT REJECT H0 - PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT"}

## [1] "DO NOT REJECT H0 - PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT"

# PROVIDE INFERENCE AFTER DOING BOTH TESTS.

# HYPOTHESIS STATEMENT
# H0: Acceptable.Temp.Upper.Limit.2018 = 3.9 DEGREES CELSIUS [PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT]
# H1: Acceptable.Temp.Upper.Limit.2018 > 3.9 DEGREES CELSIUS [PROBLEM IDENTIFIED AT COLD STORAGE PLANT]

# Z-TEST RESULT: DO NOT REJECT H0 - PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT
# T-TEST RESULT: DO NOT REJECT H0 - PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT

Hypothesis.Testing.Inference <- "Acceptable.Temp.Upper.Limit.2018 = 3.9 DEGREES CELSIUS; PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT"
Hypothesis.Testing.Inference

## [1] "Acceptable.Temp.Upper.Limit.2018 = 3.9 DEGREES CELSIUS; PROBLEM IDENTIFIED WITH DAIRY PRODUCT PROCUREMENT"