
TIME SERIES FORECASTING

FINAL PROJECT

Balaji M P
PGP DSBA Online – March ’22
Date: 23.10.2022
TABLE OF CONTENTS

Rose Wine Analysis ................................... 8


Sparkling Wine Analysis .......................... 99

LIST OF FIGURES

Fig.1 Rose Wine Analysis 8


Fig.2 Details of the dataset columns 11
Fig.3 Time stamp of dataset columns 11
Fig.4 Details of the updated dataset columns 12
Fig.5 Details of the dataset columns after renaming 12
Fig.6 Null values in the dataset 13
Fig.7 Graph plot of the Rose wine sales dataset 13
Fig.8 Imputed values of the dataset 14
Fig.9 Null values after imputation 14
Fig.10 Descriptive Summary of Rose_Wine_Sales column 15
Fig.11.1 Yearly plot of Rose wine sales 16
Fig.11.2 Monthly plot of Rose wine sales 17
Fig.12 Line plot – Annual sales 18
Fig.13 Line plot – Quarterly sales 18
Fig.14 Monthly sales across different years 19
Fig.15 Line plot – Empirical cumulative distribution function 19
Fig.16 Line plot – Monthly time series 20
Fig.17 Line plot – Average and % Change over each month 21
Fig.18 Additive decomposition of time series 22
Fig.19 Additive Decomposition - Sample of Trend, Seasonality & Residual values 22
Fig.20.1 Multiplicative decomposition of time series 23
Fig.20.2 Multiplicative Decomposition - Sample of Trend, Seasonality & Residual values 23
Fig.21.1 First and Last few rows of Train data 25
Fig.21.2 First and Last few rows of Test data 25
Fig.22 Count summary on train and test data 26

Fig.23 Line Plot – Splitting of time series into Train & Test data 26
Fig.24 Rose Wine – Linear regression model 27
Fig.25 Linear regression on Test data 27
Fig.26 Naïve forecast on Test data 29
Fig.27 Rose Wine – Simple Average model 31
Fig.28 Simple Average model predictions on Test data 31
Fig.29 Rose Wine – Sample of Trailing Moving Averages 33
Fig.30 Moving Average on Entire data 33
Fig.31 Individual visualization of moving averages on entire data 34
Fig.32 Moving averages forecast on test data 35
Fig.33 Comparison of different models on test data (Regression, Naïve, Simple and Moving Average) 37
Fig.34 Rose Wine – Simple Exponential Smoothing Model 38
Fig.35 Sample of SES predictions 38
Fig.36 Rose Wine - SES predictions on Test data 39
Fig.37 SES prediction metrics for different alpha values 40
Fig.38 SES forecast for different Alpha values 40
Fig.39 Rose Wine – Double Exponential Smoothing Model 42
Fig.40 Sample of DES predictions 43
Fig.41 Rose Wine - DES predictions on Test data 43
Fig.42 DES prediction metrics for different alpha, beta values 44
Fig.43 DES forecast for different Alpha, Beta values 44
Fig.44 Rose Wine – Triple Exponential Smoothing Model 46
Fig.45 Sample of TES predictions 47
Fig.46 Rose Wine - TES predictions on Test data 47
Fig.47 TES prediction metrics for different alpha, beta and gamma values 48
Fig.48 TES forecast for automated model parameters 48
Fig.49 TES forecast for different model parameters 49
Fig.50 Comparison of Test RMSE values of different exponential smoothing models 50
Fig.51 Comparison of different models on test data (SES, DES and TES) 51
Fig.52 Rose Wine – ADF summary 52
Fig.53 Rose Wine – ADF summary with differencing 53
Fig.54 Time Series Plot of Entire data – With differencing 53
Fig.55.1 Time Series Plot of Train data 54
Fig.55.2 Rose Wine – ADF summary on train data 54
Fig.56 Rose Wine – ADF summary on train data with differencing 55
Fig.57 Time Series Plot of Training data with differencing 55
Fig.58 Parameter Combinations for ARIMA model 57

Fig.59 AIC values for different parameter combinations 57


Fig.60 Sorted AIC values for different parameter combinations 57
Fig.61 Rose Wine – Automated ARIMA model 58
Fig.62 Automated ARIMA – Diagnostics plot 59
Fig.63 Sample of Automated ARIMA (2,1,3) predictions 60
Fig.64 Plot of Automated ARIMA (2,1,3) predictions on Test data 60
Fig.65 ACF plot of Train data 63
Fig.66 Parameter Combinations for SARIMA model 64
Fig.67 AIC values for different parameter combinations 64
Fig.68 Sorted AIC values for different parameter combinations 65
Fig.69 Rose Wine – Automated SARIMA model 65
Fig.70 Automated SARIMA – Diagnostics plot 66
Fig.71 Sample of Automated SARIMA (3,1,1) (3,0,2,12) predictions 67
Fig.72 Plot of Automated SARIMA (3,1,1) (3,0,2,12) predictions on Test data 67
Fig.73 ACF plot on differenced train data 70
Fig.74 PACF plot on differenced train data 70
Fig.75 Rose Wine – Manual ARIMA model 71
Fig.76 Manual ARIMA – Diagnostics plot 72
Fig.77 Sample of Manual ARIMA (2,1,2) predictions 73
Fig.78 Plot of Manual ARIMA (2,1,2) predictions on Test data 73
Fig.79 ACF plot on differenced train data 76
Fig.80 PACF plot on differenced train data 76
Fig.81 Rose Wine – Manual SARIMA model 77
Fig.82 Manual SARIMA – Diagnostics plot 78
Fig.83 Sample of Manual SARIMA (4,1,2) (0,1,1,12) predictions 79
Fig.84 Plot of Manual SARIMA (4,1,2) (0,1,1,12) predictions on Test data 79
Fig.85 RMSE values of all models 81
Fig.86 Sorted RMSE values of all models 82
Fig.87 Time Series Plot 1 – Different Model predictions on test data 83
Fig.88 Time Series Plot 2 – Different Model predictions on test data 84
Fig.89 Time Series Plot 3 – Different Model predictions on test data 85
Fig.90 TES Optimum Model – Line plot of Predictions vs Actual values 86
Fig.91 TES Optimum Model – Line plot of Predictions vs Actual values on Test data 87
Fig.92 TES Optimum Model 88
Fig.93 TES Model – Forecast for next 12 months 88
Fig.94 TES Optimum Model – Time series plot forecast for next 12 months 89
Fig.95 TES Optimum Model – Future forecast with confidence intervals 89
Fig.96 TES Optimum Model – Time series plot forecast with confidence intervals 90
Fig.97 TES Optimum Model – Forecast for next 12 months with confidence intervals 90

Fig.98 Manual SARIMA Optimum Model – Line plot of Predictions vs Actual values 91
Fig.99 Manual SARIMA Optimum Model – Line plot of Predictions vs Actual values on Test data 92
Fig.100 Manual SARIMA Optimum Model 93
Fig.101 Manual SARIMA Model – Forecast for next 12 months with confidence intervals 94
Fig.102 Manual SARIMA Optimum Model – Time series plot forecast for next 12 months 94
Fig.103 Manual SARIMA Optimum Model – Time series plot forecast with confidence intervals 95
Fig.104 Manual SARIMA Optimum Model – Forecast for next 12 months with confidence interval 95
Fig.105 Sparkling Wine Analysis 99
Fig.106 Details of the dataset columns 102
Fig.107 Time stamp of dataset columns 102
Fig.108 Details of the updated dataset columns 103
Fig.109 Details of the dataset columns after renaming 103
Fig.110 Null values in the dataset 104
Fig.111 Graph plot of the Sparkling wine sales dataset 104
Fig.112 Descriptive Summary of Sparkling_Wine_Sales column 105
Fig.113 Yearly plot of Sparkling wine sales 106
Fig.114 Monthly plot of Sparkling wine sales 107
Fig.115 Line plot – Annual sales 108
Fig.116 Line plot – Quarterly sales 108
Fig.117 Monthly sales across different years 109
Fig.118 Line plot – Empirical cumulative distribution function 109
Fig.119 Time series plot – Monthly time series 110
Fig.120 Line plot – Average and % Change over each month 111
Fig.121 Additive decomposition of time series 112
Fig.122 Additive Decomposition - Sample of Trend, Seasonality & Residual values 112
Fig.123 Multiplicative decomposition of time series 113
Fig.124 Multiplicative Decomposition - Sample of Trend, Seasonality & Residual values 113
Fig.125 First and Last few rows of Train data 115
Fig.126 First and Last few rows of Test data 115
Fig.127 Count summary on train and test data 116
Fig.128 Line Plot – Splitting of time series into Train & Test data 116
Fig.129 Sparkling Wine – Linear regression model 117
Fig.130 Linear regression on Test data 117
Fig.131 Naïve forecast on Test data 119
Fig.132 Sparkling Wine – Simple Average model 121
Fig.133 Simple Average model predictions on Test data 121
Fig.134 Sparkling Wine – Sample of Trailing Moving Averages 123
Fig.135 Moving Average on Entire data 123
Fig.136 Individual visualization of moving averages on entire data 124

Fig.137 Moving averages forecast on test data 125


Fig.138 Comparison of different models on test data (Regression, Naïve, Simple and Moving Average) 127
Fig.139 Sparkling Wine – Simple Exponential Smoothing Model 128
Fig.140 Sample of SES predictions 128
Fig.141 Sparkling Wine - SES predictions on Test data 129
Fig.142 SES prediction metrics for different alpha values 130
Fig.143 SES forecast for different Alpha values 130
Fig.144 Sparkling Wine – Double Exponential Smoothing Model 132
Fig.145 Sample of DES predictions 133
Fig.146 Sparkling Wine - DES predictions on Test data 133
Fig.147 DES prediction metrics for different alpha, beta values 134
Fig.148 DES forecast for different Alpha, Beta values 134
Fig.149 Sparkling Wine – Triple Exponential Smoothing Model 136
Fig.150 Sample of TES predictions 137
Fig.151 Sparkling Wine - TES predictions on Test data 137
Fig.152 TES prediction metrics for different alpha, beta and gamma values 138
Fig.153 TES forecast for automated model parameters 138
Fig.154 TES forecast for different model parameters 139
Fig.155 Comparison of Test RMSE values of different exponential smoothing models 140
Fig.156 Comparison of different models on test data (SES, DES and TES) 141
Fig.157 Sparkling Wine – ADF summary 142
Fig.158 Sparkling Wine – ADF summary with differencing 143
Fig.159 Time Series Plot of Entire data – With differencing 143
Fig.160 Time Series Plot of Train data 144
Fig.161 Sparkling Wine – ADF summary on train data 144
Fig.162 Sparkling Wine – ADF summary on train data with differencing 145
Fig.163 Time Series Plot of Training data with differencing 145
Fig.164 Parameter Combinations for ARIMA model 147
Fig.165 AIC values for different parameter combinations 147
Fig.166 Sorted AIC values for different parameter combinations 147
Fig.167 Sparkling Wine – Automated ARIMA model 148
Fig.168 Automated ARIMA – Diagnostics plot 149
Fig.169 Sample of Automated ARIMA (4,1,4) predictions 150
Fig.170 Plot of Automated ARIMA (4,1,4) predictions on Test data 150
Fig.171 ACF plot of Train data 153
Fig.172 Parameter Combinations for SARIMA model 154
Fig.173 AIC values for different parameter combinations 154
Fig.174 Sorted AIC values for different parameter combinations 155

Fig.175 Sparkling Wine – Automated SARIMA model 155


Fig.176 Automated SARIMA – Diagnostics plot 156
Fig.177 Sample of Automated SARIMA (3,1,2) (3,0,1,12) predictions 157
Fig.178 Plot of Automated SARIMA (3,1,2) (3,0,1,12) predictions on Test data 157
Fig.179 ACF plot on differenced train data 160
Fig.180 PACF plot on differenced train data 160
Fig.181 Sparkling Wine – Manual ARIMA model 161
Fig.182 Manual ARIMA – Diagnostics plot 162
Fig.183 Sample of Manual ARIMA (2,1,1) predictions 163
Fig.184 Plot of Manual ARIMA (2,1,1) predictions on Test data 163
Fig.185 ACF plot on differenced train data 166
Fig.186 PACF plot on differenced train data 166
Fig.187 Sparkling Wine – Manual SARIMA model 167
Fig.188 Manual SARIMA – Diagnostics plot 168
Fig.189 Sample of Manual SARIMA (4,1,2) (0,1,1,12) predictions 169
Fig.190 Plot of Manual SARIMA (4,1,2) (0,1,1,12) predictions on Test data 169
Fig.191 RMSE values of all models 171
Fig.192 Sorted RMSE values of all models 172
Fig.193 Time Series Plot 1 – Different Model predictions on test data 173
Fig.194 Time Series Plot 2 – Different Model predictions on test data 174
Fig.195 Time Series Plot 3 – Different Model predictions on test data 175
Fig.196 TES Optimum Model – Line plot of Predictions vs Actual values 176
Fig.197 TES Optimum Model – Line plot of Predictions vs Actual values on Test data 177
Fig.198 TES Optimum Model 178
Fig.199 TES Model – Forecast for next 12 months 178
Fig.200 TES Optimum Model – Time series plot forecast for next 12 months 179
Fig.201 TES Optimum Model – Future forecast with confidence intervals 179
Fig.202 TES Optimum Model – Time series plot forecast with confidence intervals 180
Fig.203 TES Optimum Model – Forecast for next 12 months with confidence intervals 180
Fig.204 Manual SARIMA Optimum Model – Line plot of Predictions vs Actual values 181
Fig.205 Manual SARIMA Optimum Model – Line plot of Predictions vs Actual values on Test data 182
Fig.206 Manual SARIMA Optimum Model 183
Fig.207 Manual SARIMA Model – Forecast for next 12 months with confidence intervals 184
Fig.208 Manual SARIMA Optimum Model – Time series plot forecast for next 12 months 184
Fig.209 Manual SARIMA Optimum Model – Time series plot forecast with confidence intervals 185
Fig.210 Manual SARIMA Optimum Model – Forecast for next 12 months with confidence interval 185

LIST OF TABLES
Table 1 Sample of first 5 rows of the dataset 10
Table 2 Sample of last 5 rows of the dataset 10
Table 3 Sample of first 5 rows of the dataset 101
Table 4 Sample of last 5 rows of the dataset 101
Rose Wine Analysis

Executive Summary
ABC Estate Wines, a wine-producing firm, has made available data on its 20th-century wine sales for examination. Using the provided information, wine sales are to be forecasted.

Fig.1 Rose Wine Analysis



Introduction
The purpose of this report is to explore the dataset through exploratory data analysis, examining measures of central tendency and other descriptive parameters. The data consists of monthly sales of Rose wine from the 20th century.

Data Dictionary

Variable Name Description


YearMonth    Represents the year and month in which the sales were recorded
Rose         Denotes the number of wine units sold

Data Description
1. YearMonth: Datetime variable from 1980-01 to 1995-07
2. Rose: Continuous variable from 28 to 267

Sample of the dataset

Table 1. Sample of first 5 rows of the dataset

Table 2. Sample of last 5 rows of the dataset

The dataset has 2 columns, which capture the year and month of the recorded data and the number of units sold in the corresponding year-month, respectively.

1) Read the data as an appropriate Time Series data and plot the data.
Let us check the types of variables in the data frame and look for missing values in the dataset.

Fig.2 Details of the dataset columns

The dataset has 2 variables and 187 rows in total. The "YearMonth" column can be deleted after creating a suitable time stamp column because it is not necessary for our modelling. The column Rose is of float type. Additionally, we can observe from the data above that the Rose column has some missing values, which need to be imputed since this is a time series.

Time Stamp created from ‘YearMonth’ column

Fig.3 Time stamp of dataset columns



Resulting dataset after removing the “YearMonth” column and appending the Time_Stamp column

Fig.4 Details of the updated dataset columns

Time_Stamp column has been set as index of the dataset and column Rose has been renamed as
Rose_Wine_Sales.

Renaming the columns of the data frame


The below mentioned columns of the data frame have been renamed as shown.

Original Column Name Renamed Column Name


Rose Rose_Wine_Sales

Fig.5 Details of the dataset columns after renaming
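
Taken together, the preparation steps above might look like the following sketch in Python (the file name "Rose.csv" is an assumption; the column names follow the data dictionary):

    import pandas as pd

    # Assumed file name; the actual source file may differ.
    df = pd.read_csv("Rose.csv")

    # Build a proper monthly timestamp from the "YearMonth" column and use
    # it as the index, so pandas treats the data as a time series.
    df["Time_Stamp"] = pd.to_datetime(df["YearMonth"], format="%Y-%m")
    df = df.drop(columns=["YearMonth"]).set_index("Time_Stamp")

    # Rename the value column as described above.
    df = df.rename(columns={"Rose": "Rose_Wine_Sales"})
    print(df.info())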



Checking null values in the dataset

Fig.6 Null values in the dataset

As can be seen from the above figure, there are 2 null values present in the dataset. Since this is a time series, they cannot simply be removed and hence must be imputed.

Fig.7 Graph plot of the Rose wine sales dataset

Observation:

• The data set provided contains sales information from January 1980 to July 1995.
• We can see from the plot that sales have gradually declined over the years. The data also exhibits some seasonality, as the plot shows.
• There are 2 missing values which must be imputed.

2) Perform appropriate Exploratory Data Analysis to understand the data and also
perform decomposition.

Handling Missing Values

Fig.8 Imputed values of the dataset

As can be seen from Fig.6, values are missing for the months of July and August 1994. Since this is a time series, the missing values cannot be removed; we have imputed them using linear interpolation.
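
A minimal sketch of this imputation step, assuming the indexed DataFrame df built earlier:

    # Linear interpolation fills the two missing months from their
    # neighbours; a simple, reasonable choice for a monthly series.
    df["Rose_Wine_Sales"] = df["Rose_Wine_Sales"].interpolate(method="linear")
    print(df["Rose_Wine_Sales"].isnull().sum())  # expected to be 0 afterwards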

Fig.9 Null values after imputation



Descriptive Summary of the Dataset

Fig.10 Descriptive Summary of Rose_Wine_Sales column

Observation:

• 90 bottles of rose wine are typically sold each month.
• More than 50% of monthly rose wine sales fall between 62 and 111 units.
• The lowest monthly sales figure is 28 units, while the highest is 267 units.

Exploratory Analysis
Let us analyze the wine sales across different years and months using boxplots

Yearly Plot

Fig.11.1 Yearly plot of Rose wine sales

Observation:

• We can see from the figure above that sales of rose wine have been declining over
time.
• After 1992, the median sales have been at their lowest levels, having peaked in 1980
and 1981.
• Additionally, we can see that there are outliers in the box plots.

Monthly Plot

Fig.11.2 Monthly plot of Rose wine sales

Observation:

• The sales trajectory appears to be precisely the reverse of that seen in the yearly plot,
increasing near the end of each year.
• January has the lowest wine sales while December sees the highest. Sales grow modestly from January to August and then climb sharply after that.
• Additionally, we can see that there are outliers in the box plots.

Annual Sales

Fig.12 Line plot – Annual sales

Quarterly Sales

Fig.13 Line plot – Quarterly sales



Monthly Sales across Different Years

Fig.14 Line plot – Monthly sales across different years

Empirical Cumulative Distribution Plot

Fig.15 Line plot – Empirical cumulative distribution function



Monthly Time Series Plot

Fig.16 Line plot – Monthly time series

Observation:

• After 1981, the sales fell drastically. Sales are typically lowest in the first quarter and
highest in the fourth quarter.
• Every year, December has the highest sales, followed by November and October; January has the lowest.
• From the cumulative distribution graph, we can observe that around 70 to 75 percent
of the units sold are fewer than 100, and 90% of the units sold are less than 150. Only
15% of sales involved less than 50 items. Therefore, it is clear that the bulk of sales
were in the range of 50 to 100 units.

Average Wine sales per month & change percentage over each month

Fig.17 Line plot – Average and % Change over each month

Observation:

• We can see that there is a declining trend and seasonality from the average sales and
% change plots. Additionally, the seasonality in the percentage change appears to be
consistent throughout all the years.

Decomposition of Time Series

Additive Decomposition

Fig.18 Additive decomposition of time series

Fig.19 Additive Decomposition - Sample of Trend, Seasonality & Residual values



Multiplicative Decomposition

Fig.20.1 Multiplicative decomposition of time series

Fig.20.2 Multiplicative Decomposition - Sample of Trend, Seasonality & Residual values



Observation:

• We can see from the graphs above that the time series has a falling trend and is
seasonal.
• The residual patterns after additive decomposition of the time series appear to
represent the seasonal element and exhibit substantial variation.
• In the multiplicative decomposition of the time series, it has been observed that the
seasonal fluctuation of residuals is under control.
• The size of the seasonal variations is comparable between the two decompositions, but the residuals are much better controlled under the multiplicative decomposition. In addition, since the residuals are not independent of seasonality under the additive model, we may assume that the series is multiplicative.
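
A minimal sketch of both decompositions with statsmodels, assuming df from above and a 12-month seasonal period:

    from statsmodels.tsa.seasonal import seasonal_decompose

    # Additive: series = trend + seasonality + residual
    add_decomp = seasonal_decompose(df["Rose_Wine_Sales"], model="additive", period=12)
    # Multiplicative: series = trend * seasonality * residual
    mul_decomp = seasonal_decompose(df["Rose_Wine_Sales"], model="multiplicative", period=12)

    add_decomp.plot()
    mul_decomp.plot()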

3) Split the data into training and test. The test data should start in 1991.
Train and test data are separated from the provided dataset. Sales data up to 1991 is included in the
training data, while data from 1991 through 1995 is used for testing.
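
A minimal sketch of the chronological split, assuming df from above; a time series must be split by date, never randomly:

    import pandas as pd

    cutoff = pd.Timestamp("1991-01-01")
    train = df[df.index < cutoff]
    test = df[df.index >= cutoff]
    print(train.shape, test.shape)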

Fig.21.1 First and Last few rows of Train data Fig.21.2 First and Last few rows of Test data

Fig.22 Count summary on train and test data

Fig.23 Line Plot – Splitting of time series into Train & Test data

4) Build all the exponential smoothing models on the training data and evaluate the
model using RMSE on the test data. Other models such as regression, naïve forecast
and simple average models should also be built on the training data, and their
performance on the test data checked using RMSE.

Model 1 – Linear Regression


For this particular linear regression, we are going to regress the 'Rose_Wine_Sales' variable against
the order of the occurrence.

For the selection criteria, the below Linear Regression model is built by using default parameters.
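
A minimal sketch of this regression on time order, assuming the train/test split from above:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    # Encode time as 1, 2, 3, ... so the regression captures only the trend.
    train_time = np.arange(1, len(train) + 1).reshape(-1, 1)
    test_time = np.arange(len(train) + 1, len(train) + len(test) + 1).reshape(-1, 1)

    lr = LinearRegression().fit(train_time, train["Rose_Wine_Sales"])
    lr_pred = lr.predict(test_time)
    rmse = np.sqrt(mean_squared_error(test["Rose_Wine_Sales"], lr_pred))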

Fig.24 Rose Wine – Linear regression model

Fig.25 Linear regression on Test data



Observation:

• We can see from the graphs above that the time series has a falling trend and is seasonal.
• The linear regression model has captured the trend of the train and test data; however, it is unable to account for seasonality.
• The root mean squared error (RMSE) for the linear regression model is 15.268.

Linear Regression: Model Evaluation

Performance Metric
Test RMSE 15.268887

Model 2 – Naïve Forecast


For this particular naïve model, the prediction for tomorrow is the same as today's value, and the prediction for the day after tomorrow equals tomorrow's prediction; since tomorrow's prediction is today's value, every future point is forecast as the last observed value.
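
A minimal sketch of the naïve forecast, assuming train and test from above:

    import pandas as pd

    # Every test point is forecast as the last observed training value.
    naive_pred = pd.Series(train["Rose_Wine_Sales"].iloc[-1], index=test.index)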

Fig.26 Naïve forecast on Test data



Observation:

• We can see from the graphs above that the time series has a falling trend and is seasonal.
• The seasonality and trend of the time series data cannot be captured by the naïve forecast model.
• The root mean squared error (RMSE) for the naïve forecast model is 79.719, which is significantly higher than that of the regression model.

Naïve Forecast: Model Evaluation

Performance Metric
Test RMSE 79.718576

Model 3 – Simple Average


For this particular simple average method, we will forecast by using the average of the training
values.
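
A minimal sketch, assuming train and test from above:

    import pandas as pd

    # Every test point is forecast as the mean of the training series.
    avg_pred = pd.Series(train["Rose_Wine_Sales"].mean(), index=test.index)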

Fig.27 Rose Wine – Simple Average model

Fig.28 Simple Average model predictions on Test data



Observation:

• We can see from the graphs above that the time series has a falling trend and is seasonal.
• The seasonality and trend of the time series data cannot be captured by the simple average model.
• The root mean squared error (RMSE) for the simple average model is 53.46, which is significantly higher than that of the regression model but lower than that of the naïve forecast model.

Simple Average: Model Evaluation

Performance Metric
Test RMSE 53.460367

Model 4 – Moving Average (MA)


For the moving average model, we are going to calculate rolling means (moving averages) over different window sizes. The best window is the one with the minimum error (maximum accuracy) on the test data.
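
A minimal sketch of the trailing moving averages, assuming df from above; the window sizes follow the report:

    # Trailing moving averages over the full series for a few window sizes;
    # the test-period forecast uses the trailing window values.
    ma = df.copy()
    for window in (2, 4, 6, 9):
        ma[f"Trailing_{window}"] = df["Rose_Wine_Sales"].rolling(window).mean()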

Fig.29 Rose Wine – Sample of Trailing Moving Averages

Fig.30 Moving Average on Entire data



Fig.31 Individual visualization of moving averages on entire data



Fig.32 Moving averages forecast on test data

Observation:

• We can see from the graphs above that the time series has a falling trend and is seasonal.
• Moving average models can capture both the seasonality and the trend of the time series data.
• We can see the data smooth out as the window size increases. The 2-point TMA tracks the test data more closely than the 9-point TMA.
• The root mean squared error (RMSE) for the 2-point trailing moving average model is 11.529, the lowest of all models built so far.

Moving Average: Model Evaluation

Model Test RMSE

2 Point Trailing Moving Average 11.529278


4 Point Trailing Moving Average 14.451376
6 Point Trailing Moving Average 14.566262
9 Point Trailing Moving Average 14.727596

Let's compare the visualization of each model's predictions that we have constructed so far before
investigating exponential smoothing methods.

Fig.33 Comparison of different models on test data (Regression, Naïve, Simple and Moving Average)

Observation:

• We can see from the graphs above that the time series has a falling trend and is seasonal.
• The simple average and naïve forecast models fail to adequately describe the characteristics of the test data.
• Linear regression has captured the trend portion of the series; however, the seasonality has been missed.
• Both trend and seasonality can be accounted for by the moving average models.

Model 5 – Simple Exponential Smoothing


The simplest of the exponential smoothing methods is naturally called simple exponential smoothing (SES). This method is suitable for forecasting data with no clear trend or seasonal pattern.

In Single ES, the forecast at time (t + 1) is given by (Winters, 1960):

F_{t+1} = α·Y_t + (1 − α)·F_t

The parameter α is called the smoothing constant and its value lies between 0 and 1. Since the model uses only one smoothing constant, it is called Single Exponential Smoothing.

For the selection criteria, the below Simple Exponential Smoothing is built by using optimized
parameters.
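
A minimal sketch with statsmodels, assuming the train series from above; optimized=True lets the library pick alpha:

    from statsmodels.tsa.api import SimpleExpSmoothing

    ses = SimpleExpSmoothing(train["Rose_Wine_Sales"]).fit(optimized=True)
    ses_pred = ses.forecast(steps=len(test))
    print(ses.params["smoothing_level"])  # the fitted alpha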

Fig.34 Rose Wine – Simple Exponential Smoothing Model

Fig.35 Sample of SES predictions



Fig.36 Rose Wine - SES predictions on Test data

The higher the alpha value, the more weight is given to the most recent observations, implying that recent behaviour is expected to repeat. A loop over different alpha values is run to understand which particular value works best on the test set.

The alpha values range from 0.1 to 0.95, and the respective RMSE values for train and test data are calculated to analyze the performance metrics.

Fig.37 SES prediction metrics for different alpha values

Fig.38 SES forecast for different Alpha values



Observation:

• We can see from the graphs above that the time series has a falling trend and is seasonal.
• Simple exponential smoothing is typically used when the time series has neither a trend nor a seasonal component. For this reason, it is unable to capture the characteristics of this time series data.
• The root mean squared error (RMSE) for the simple exponential smoothing model with Alpha=0.0987 is 36.796, and for Alpha=0.1 the RMSE is 36.827.
• The Simple Exponential Smoothing model with alpha=0.0987 is taken as the better of the two as it has the lower test RMSE.

Simple Exponential Smoothing: Model Evaluation

Model Test RMSE


SES (Alpha = 0.0987) 36.796036
SES (Alpha = 0.1) 36.827827

Model 6 – Double Exponential Smoothing (Holt's Model)


This model is an extension of SES known as the Double Exponential model, which estimates two smoothing parameters. It is applicable when the data has a trend but no seasonality. Two separate components are considered: Level and Trend. Level is the local mean. One smoothing parameter α corresponds to the level series; a second smoothing parameter β corresponds to the trend series.

Double Exponential Smoothing uses two equations to forecast future values of the time series, one
for forecasting the short-term average value or level and the other for capturing the trend.

The intercept or level equation L_t is given by: L_t = α·Y_t + (1 − α)·F_t

The trend equation is given by: T_t = β·(L_t − L_{t−1}) + (1 − β)·T_{t−1}

Here, α and β are the smoothing constants for level and trend, respectively, with

0 < α < 1 and 0 < β < 1.

The forecast at time t + 1 is given by

F_{t+1} = L_t + T_t

and the forecast n steps ahead by

F_{t+n} = L_t + n·T_t

For the selection criteria, the below Double Exponential Smoothing is built by using optimized
parameters.
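
A minimal sketch of Holt's method with statsmodels, assuming train from above:

    from statsmodels.tsa.api import Holt

    # Holt's linear trend method: one constant for level, one for trend.
    des = Holt(train["Rose_Wine_Sales"]).fit(optimized=True)
    des_pred = des.forecast(steps=len(test))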

Fig.39 Rose Wine – Double Exponential Smoothing Model



Fig.40 Sample of DES predictions

Fig.41 Rose Wine - DES predictions on Test data



The higher the alpha value, the more weight is given to the most recent observations, implying that recent behaviour is expected to repeat. A loop over different alpha and beta values is run to understand which particular combination works best on the test set.

The alpha values range from 0.05 to 1.0, and the respective RMSE values for train and test data are calculated to analyze the performance metrics.

Fig.42 DES prediction metrics for different alpha, beta values

Fig.43 DES forecast for different Alpha, Beta values



Observation:

• We can see from the graphs above that the time series has a falling trend and is seasonal.
• The double exponential smoothing model performs well when there is only trend and no seasonality in the time series data. For this reason, it captures only the trend characteristics of the data, and seasonality is not accounted for.
• The root mean squared error (RMSE) for the double exponential smoothing model with Alpha=1.49e-08, Beta=7.389e-09 is 15.268, and for Alpha=0.05, Beta=0.35 (auto-tuned model) the RMSE is 16.329.
• The Double Exponential Smoothing model with Alpha=1.49e-08, Beta=7.389e-09 is taken as the better of the two as it has the lower test RMSE.
• Additionally, it should be highlighted that the double exponential smoothing model has more than halved the RMSE compared to the simple exponential smoothing model.

Double Exponential Smoothing: Model Evaluation

Model Test RMSE


DES (Alpha=1.49e-08, Beta=7.389e-09) 15.268889
DES (Alpha=0.05, Beta=0.35) 16.328994

Model 7 – Triple Exponential Smoothing (Holt-Winter’s Model)


This model is an extension of DES known as the Triple Exponential Smoothing model, which estimates three smoothing parameters. It is applicable when the data has both trend and seasonality. Three separate components are considered: Level, Trend and Seasonality.

One smoothing parameter α corresponds to the level series.

A second smoothing parameter β corresponds to the trend series.

A third smoothing parameter γ corresponds to the seasonality series

where,

0 < α <1,

0 < β <1,

0 < γ <1

For the selection criteria, the below Triple Exponential Smoothing is built by using optimized
parameters.
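
A minimal sketch of Holt-Winters smoothing with statsmodels, assuming train from above; the additive trend/seasonal configuration is an assumption, since a multiplicative seasonal variant is equally plausible given the decomposition:

    from statsmodels.tsa.api import ExponentialSmoothing

    tes = ExponentialSmoothing(
        train["Rose_Wine_Sales"], trend="add", seasonal="add", seasonal_periods=12
    ).fit(optimized=True)
    tes_pred = tes.forecast(steps=len(test))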

Fig.44 Rose Wine – Triple Exponential Smoothing Model



Fig.45 Sample of TES predictions

Fig.46 Rose Wine - TES predictions on Test data

The higher the alpha value, the more weight is given to the most recent observations, implying that recent behaviour is expected to repeat. A loop over different alpha, beta and gamma values is run to understand which particular combination works best on the test set.

The alpha values range from 0.1 to 1.0, and the respective RMSE values for train and test data are calculated to analyze the performance metrics.

Fig.47 TES prediction metrics for different alpha, beta and gamma values

Fig.48 TES forecast for automated model parameters



Fig.49 TES forecast for different model parameters

Observation:

• We can see from the graphs above that the time series has a falling trend and is seasonal.
• The triple exponential smoothing model works well when there is both trend and seasonality in the time series data. For this reason, it is able to capture both the trend and seasonal characteristics and nearly match the actual test data plot.
• The root mean squared error (RMSE) for the triple exponential smoothing model with Alpha=0.064, Beta=0.053, Gamma=0.0 is 21.154, and for Alpha=0.2, Beta=0.85, Gamma=0.15 (auto-tuned model) the RMSE is 9.121.
• The Triple Exponential Smoothing model with Alpha=0.2, Beta=0.85, Gamma=0.15 is taken as the better of the two as it has the lower test RMSE.
• Additionally, it should be highlighted that the triple exponential smoothing model has reduced the RMSE by about 40% compared to the double exponential smoothing model.

Triple Exponential Smoothing: Model Evaluation

Model Test RMSE


TES (Alpha=0.064, Beta=0.053, Gamma=0.0)    21.154527
TES (Alpha=0.2, Beta=0.85, Gamma=0.15)      9.121757

Let's compare the RMSE values of the models we have constructed so far and visualize the plot of the
best exponential smoothing models thus built.

Fig.50 Comparison of Test RMSE values of different exponential smoothing models



Fig.51 Comparison of different models on test data (SES, DES and TES)

Observation:

• We can see from the graphs above that the time series has a falling trend and is
seasonal
• Simple exponential smoothing is frequently employed when the time series doesn't
include a trend or a seasonal component. This is the reason why it is unable to
capture the time series data's features.
• The double exponential smoothing model works effectively when the time series
data just contains trend and no seasonality. This explains why seasonality is not taken
into consideration and just the trend features of the data are captured.
• The triple exponential model performs effectively when the time series data exhibit
both trend and seasonality. This is the reason why it is essentially identical to the test
data plot and is able to capture both the trend and seasonal aspects.
• The Triple exponential model is the best model we have built so far as it has the
lowest RMSE value.

5) Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If
the data is found to be non-stationary, take appropriate steps to make it stationary.
Check the new data for stationarity and comment. Note: Stationarity should be
checked at alpha = 0.05.

Checking for Stationarity of Entire Data


The Augmented Dickey-Fuller test is a unit root test which determines whether there is a unit root and hence whether the series is non-stationary.

Framing the hypothesis:

H0: The Time Series has a unit root and is thus non-stationary.

H1: The Time Series does not have a unit root and is thus stationary.

The series has to be stationary for building ARIMA/SARIMA models, and thus we would want the p-value of this test to be less than the α value.
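
A minimal sketch of the ADF test with statsmodels, assuming df from above:

    from statsmodels.tsa.stattools import adfuller

    # ADF on the raw series; p-value >= 0.05 means we cannot reject the unit root.
    stat, pvalue, *_ = adfuller(df["Rose_Wine_Sales"].dropna())
    print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.3g}")

    # One level of differencing, then re-test.
    stat_d, pvalue_d, *_ = adfuller(df["Rose_Wine_Sales"].diff().dropna())
    print(f"after differencing: p-value = {pvalue_d:.3g}")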

Fig.52 Rose Wine – ADF summary

Inference:

We see that at the 5% significance level the time series is non-stationary, as the p-value of 0.467 is greater than the alpha value (0.05); therefore we fail to reject the null hypothesis. Let us take one level of differencing to see whether the series becomes stationary.

Fig.53 Rose Wine – ADF summary with differencing

Inference:

We see that at the 5% significance level the time series becomes stationary, as the p-value of 3.015e-11 is less than the alpha value (0.05); therefore we reject the null hypothesis. The provided time series becomes stationary with differencing.

Fig.54 Time Series Plot of Entire data – With differencing



Checking for Stationarity of Training Data

Fig.55.1 Time Series Plot of Train data

Fig.55.2 Rose Wine – ADF summary on train data

Inference:

We see that at the 5% significance level the time series of the training data is non-stationary, as the p-value of 0.756 is greater than the alpha value (0.05); therefore we fail to reject the null hypothesis. Let us take one level of differencing to see whether the series becomes stationary.

Fig.56 Rose Wine – ADF summary on train data with differencing

Inference:

We see that at the 5% significance level the time series of the training data becomes stationary, as the p-value of 3.894e-08 is less than the alpha value (0.05); therefore we reject the null hypothesis. The training time series becomes stationary with differencing.

Fig.57 Time Series Plot of Training data with differencing

Observation:

• As per the Augmented Dickey-Fuller test, we observed that the time series data by itself is not stationary; however, it becomes stationary when differencing is applied.
• The same is observed with the training data. Therefore, the models can be built with order of differencing d=1.

6) Build an automated version of the ARIMA/SARIMA model in which the parameters
are selected using the lowest Akaike Information Criteria (AIC) on the training data
and evaluate this model on the test data using RMSE.

Model 8 – Auto-Regressive Integrated Moving Average (ARIMA)


Auto-regression means regression of a variable on itself. One of the fundamental assumptions of an AR model is that the time series is a stationary process. When the time series data is not stationary, it has to be converted to a stationary series before applying AR.

ARIMA models may be used to represent any "non-seasonal" time series that has patterns and isn't
just random noise.

An ARIMA model is characterized by 3 terms: p, d, q

where,

p is the order of the Auto Regressive (AR) term

q is the order of the Moving Average (MA) term

d is the number of differencing required to make the time series stationary

For the selection criteria of p, d, q, the ARIMA model below is built using automated model parameters with the lowest Akaike Information Criterion.
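
A minimal sketch of such an AIC grid search, assuming train from above; the p/q search ranges are illustrative, and d=1 follows from the stationarity check:

    import itertools
    from statsmodels.tsa.arima.model import ARIMA

    results = []
    for p, q in itertools.product(range(4), range(4)):
        fit = ARIMA(train["Rose_Wine_Sales"], order=(p, 1, q)).fit()
        results.append(((p, 1, q), fit.aic))

    results.sort(key=lambda item: item[1])  # lowest AIC first
    print(results[:5])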

Fig.58 Parameter Combinations for ARIMA model Fig.59 AIC values for different parameter combinations

Fig.60 Sorted AIC values for different parameter combinations

We can see that among all the possible given combinations, the AIC is lowest for the combination
(2,1,3). Hence, the model is built with these parameters to determine the RMSE value of test data.

Fig.61 Rose Wine – Automated ARIMA model



Fig.62 Automated ARIMA – Diagnostics plot

Observation:

• The optimal parameters are decided based on the lowest Akaike Information Criteria
(AIC) values. The AIC is lowest for the combination (2,1,3) as we see from the above
results.
• From the Standardized residual plot above, we can notice that the residuals seem to
fluctuate around the mean of zero and have uniform variance.
• The histogram plus estimated density plot suggests a roughly normal distribution with mean zero, slightly skewed to the right.
• In Normal Q-Q plot, all the dots fall more or less in line with the red line. Few
deviations are present implying minor skewed distribution.
• The correlogram plot of residuals shows that the residuals are not auto correlated.

Fig.63 Sample of Automated ARIMA (2,1,3) predictions

Fig.64 Plot of Automated ARIMA (2,1,3) predictions on Test data



Automated ARIMA: Model Evaluation


For evaluating the model’s performance, we look at the root mean squared error (RMSE) & mean absolute percentage error (MAPE).

Model Test RMSE Test MAPE


ARIMA (p=2, d=1, q=3) 36.813 75.839

Observation:

• We can see from the graphs above that the time series has a falling trend and is seasonal.
• ARIMA models perform well on non-seasonal time series; for this reason, the model is unable to capture the full characteristics of the test data.
• The root mean squared error (RMSE) of test data for the ARIMA model with (p=2, d=1, q=3) is 36.813.
• Not surprisingly, the RMSE of this ARIMA model is greater than that of the majority of previously constructed models.

Model 9 – Seasonal Auto-Regressive Integrated Moving Average (SARIMA)


SARIMA models, also known as Seasonal ARIMA, are an extension of ARIMA for time series data with defined seasonality. SARIMA models use seasonal differencing, which is similar to regular differencing.

A SARIMA model is characterized by 7 terms: p, d, q, P, Q, D and F

where,

p is the order of the Auto Regressive (AR) term

q is the order of the Moving Average (MA) term

d is the number of differencing required to make the time series stationary

P is the order of the Seasonal Auto Regressive (AR) term

Q is the order of the Seasonal Moving Average (MA) term

D is the number of seasonal differencing required to make the time series stationary

F is the seasonal frequency of the time series

To determine the "P" and "Q" values, we examine the PACF and ACF plots, respectively, at lags that are multiples of "F", and find where they cut off (relative to the appropriate confidence interval bands).

By examining the lowest AIC values, we can also estimate "p," "q," "P," and "Q" for the SARIMA
models.

By examining the ACF plot, one may determine the seasonal parameter "F": the existence of seasonality is shown by spikes in the ACF plot at multiples of "F".

Fig.65 ACF plot of Train data

From the above ACF plot we can observe that every 12th lag is significant, indicating the presence of seasonality. Hence, for our model building we will take the term F=12.

For the selection criteria of p, d, q, P, D, Q & F, the SARIMA model below is built using automated model parameters with the lowest Akaike Information Criterion.
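
A minimal sketch of the seasonal AIC grid search, assuming train from above; the search ranges are illustrative, with d=1 and F=12 from the analysis above:

    import itertools
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    results = []
    for p, q in itertools.product(range(4), range(4)):
        for P, Q in itertools.product(range(3), range(3)):
            fit = SARIMAX(
                train["Rose_Wine_Sales"],
                order=(p, 1, q),
                seasonal_order=(P, 0, Q, 12),
            ).fit(disp=False)
            results.append(((p, 1, q), (P, 0, Q, 12), fit.aic))

    results.sort(key=lambda item: item[2])  # lowest AIC first
    print(results[:5])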

Fig.66 Parameter Combinations for SARIMA model

Fig.67 AIC values for different parameter combinations



Fig.68 Sorted AIC values for different parameter combinations

We can see that among all the possible given combinations, the AIC is lowest for the combination
(3,1,1) (3,0,2,12). Hence, the model is built with these parameters to determine the RMSE value of
test data.

Fig.69 Rose Wine – Automated SARIMA model



Fig.70 Automated SARIMA – Diagnostics plot

Observation:

• The optimal parameters are decided based on the lowest Akaike Information Criteria
(AIC) values. The AIC is lowest for the combination (3,1,1) (3,0,2,12) as we see from
the above results.
• From the Standardized residual plot above, we can notice that the residuals seem to
fluctuate around the mean of zero and have uniform variance.
• The histogram plus estimated density plot suggests a roughly normal distribution with mean zero, slightly skewed to the right.
• In Normal Q-Q plot, all the dots fall more or less in line with the red line. Few
deviations are present implying minor skewed distribution.
• The correlogram plot of residuals shows that the residuals are not auto correlated.

Fig.71 Sample of Automated SARIMA (3,1,1) (3,0,2,12) predictions

Fig.72 Plot of Automated SARIMA (3,1,1) (3,0,2,12) predictions on Test data



Automated SARIMA: Model Evaluation


For evaluating the model performance, we look at the root mean squared error (RMSE) & mean absolute percentage error (MAPE).

Model Test RMSE Test MAPE


SARIMA (p=3, d=1, q=1) (P=3, D=0, Q=2, F=12)    18.881    36.375

Observation:

• We can see from the graphs above that the time series has a falling trend and is seasonal.
• SARIMA models perform well on seasonal time series; for this reason, the model is able to capture the full characteristics of the test data.
• The root mean squared error (RMSE) of test data for the SARIMA model with (p=3, d=1, q=1) (P=3, D=0, Q=2, F=12) is 18.881.
• Additionally, it should be highlighted that the SARIMA model has almost halved the RMSE compared to the ARIMA model.

7) Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the
training data and evaluate this model on the test data using RMSE.

Model 10 – Auto-Regressive Integrated Moving Average (ARIMA) - Manual

An ARIMA model is characterized by 3 terms: p, d, q

where,

p is the order of the Auto Regressive (AR) term

q is the order of the Moving Average (MA) term

d is the number of differencing required to make the time series stationary

Autocorrelation and partial autocorrelation measure the relationship between present and past series values, indicating which previous values are most useful in forecasting future ones. This information can be used to identify the orders of an ARIMA model.

The parameters p & q can be determined by looking at the PACF & ACF plots respectively.

Autocorrelation function (ACF) - At lag k, this is the correlation between series values that
are k intervals apart.

Partial autocorrelation function (PACF) - At lag k, this is the correlation between series values that
are k intervals apart, accounting for the values of the intervals between.

In the ACF & PACF plots, each bar represents the size and direction of the correlation at that lag. Bars that cross the red line are statistically significant.
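
A minimal sketch of producing these plots, assuming train from above:

    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    # First-difference the training series (d=1), then inspect the plots.
    diff_train = train["Rose_Wine_Sales"].diff().dropna()
    plot_acf(diff_train, lags=30)   # cut-off suggests the MA order q
    plot_pacf(diff_train, lags=30)  # cut-off suggests the AR order p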

ACF Plot – Training Data

Fig.73 ACF plot on differenced train data

PACF Plot – Training Data

Fig.74 PACF plot on differenced train data



Observation:

• The Auto-Regressive parameter 'p' in an ARIMA model comes from the significant lag after which the PACF plot cuts off below the confidence interval.
• The Moving-Average parameter 'q' in an ARIMA model comes from the significant lag after which the ACF plot cuts off below the confidence interval.
• Looking at the above plots, we take p=2 and q=2 respectively. The value of d=1, as the time series becomes stationary with differencing.

Fig.75 Rose Wine – Manual ARIMA model



Fig.76 Manual ARIMA – Diagnostics plot

Observation:

• The model's parameters, p and q, were identified by examining the ACF (q=2) and
PACF (p=2) graphs. Since we differenced the series to make it stationary, the
parameter d=1.
• From the Standardized residual plot above, we can notice that the residuals seem to
fluctuate around the mean of zero and have uniform variance.
• The histogram plus estimated density plot suggests a roughly normal distribution with mean zero, slightly skewed to the right.
• In Normal Q-Q plot, all the dots fall more or less in line with the red line. Few
deviations are present implying minor skewed distribution.
• The correlogram plot of residuals shows that the residuals are not auto correlated.

Fig.77 Sample of Manual ARIMA (2,1,2) predictions

Fig.78 Plot of Manual ARIMA (2,1,2) predictions on Test data



Manual ARIMA: Model Evaluation


For evaluating the model performance, we look at the root mean squared error (RMSE) & mean absolute percentage error (MAPE).

Model Test RMSE Test MAPE


ARIMA (p=2, d=1, q=2) 36.87 76.055

Observation:

• We can see from the graphs above that the time series has a falling trend and is seasonal.
• ARIMA models perform well on non-seasonal time series; for this reason, the model is unable to capture the full characteristics of the test data.
• The root mean squared error (RMSE) of test data for the ARIMA model with (p=2, d=1, q=2) is 36.87.
• Not surprisingly, the RMSE of this ARIMA model is greater than that of the majority of previously constructed models and nearly equal to that of the ARIMA (2,1,3) model.

Model 11 – Seasonal Auto-Regressive Integrated Moving Average (SARIMA) – Manual

A SARIMA model is characterized by 7 terms: p, d, q, P, Q, D and F

where,

p is the order of the Auto Regressive (AR) term

q is the order of the Moving Average (MA) term

d is the number of differencing required to make the time series stationary

P is the order of the Seasonal Auto Regressive (AR) term

Q is the order of the Seasonal Moving Average (MA) term

D is the number of seasonal differencing required to make the time series stationary

F is the seasonal frequency of the time series

To determine the "P" and "Q" values, we examine the PACF and ACF plots, respectively, at lags that are multiples of "F", and find where they cut off (relative to the appropriate confidence interval bands).

By examining the ACF plot, one may determine the seasonal parameter "F": the existence of seasonality is shown by spikes in the ACF plot at multiples of "F".

The parameters P & Q can be determined by looking at the seasonally differenced PACF & ACF plots
respectively.

Autocorrelation function (ACF) - At lag k, this is the correlation between series values that
are k intervals apart.

Partial autocorrelation function (PACF) - At lag k, this is the correlation between series values that
are k intervals apart, accounting for the values of the intervals between.

In the ACF & PACF plots, each bar represents the size and direction of the correlation at that lag. Bars that cross the red line are statistically significant.

ACF Plot – Seasonally differenced (F=12) Training Data

Fig.79 ACF plot on differenced train data

PACF Plot – Seasonally differenced (F=12) Training Data

Fig.80 PACF plot on differenced train data



Observation:

• From the PACF plot, the early lags up to lag 4 are significant before the cut-off, so the AR term p = 4 is chosen. Among the seasonal lags, the plot cuts off after the first seasonal lag of 12, so the seasonal AR term is kept at P = 0.
• From the ACF plot, lags 1 and 2 are significant among the early lags before it cuts off, so the MA term q = 2 is kept; a significant lag is apparent at the seasonal lag of 12 and none are apparent at lags 24, 36 or afterwards, so Q = 1 is kept.
• The final selected terms for the SARIMA model are (4, 1, 2) (0, 1, 1, 12), as inferred from the ACF and PACF plots.

Fig.81 Rose Wine – Manual SARIMA model



Fig.82 Manual SARIMA – Diagnostics plot

Observation:

• The model's parameters, p, q, P, Q were identified by examining the ACF (q=2, Q=1)
and PACF (p=4, P=0) graphs. Since we differenced the series to make it stationary, the
parameter d=1, D=1.
• From the Standardized residual plot above, we can notice that the residuals seem to
fluctuate around the mean of zero and have uniform variance.
• The histogram plus estimated density plot suggests a roughly normal distribution with mean zero.
• In Normal Q-Q plot, all the dots fall more or less in line with the red line. Few
deviations are present implying minor skewed distribution.
• The correlogram plot of residuals shows that the residuals are not auto correlated.

Fig.83 Sample of Manual SARIMA (4,1,2) (0,1,1,12) predictions

Fig.84 Plot of Manual SARIMA (4,1,2) (0,1,1,12) predictions on Test data



Manual SARIMA: Model Evaluation


For evaluating the model performance, we look at the root mean squared error (RMSE) & mean absolute percentage error (MAPE).

Model Test RMSE Test MAPE


SARIMA (p=4, d=1, q=2) (P=0, D=1, Q=1, F=12)    15.907    23.712

Observation:

• We can see from the graphs above that the time series has a falling trend and is seasonal.
• SARIMA models perform well on seasonal time series; for this reason, the model is able to capture the full characteristics of the test data.
• The root mean squared error (RMSE) of test data for the SARIMA model with (p=4, d=1, q=2) (P=0, D=1, Q=1, F=12) is 15.907.
• Additionally, it should be highlighted that, compared to all the ARIMA/SARIMA models built so far, this SARIMA model has the lowest RMSE value.

8) Build a table (create a data frame) with all the models built along with their
corresponding parameters and the respective RMSE values on the test data.

Fig.85 RMSE values of all models



Fig.86 Sorted RMSE values of all models

Observation:

• From the above table, we can see that Triple Exponential Smoothing model with
parameters (Alpha=0.2, Beta=0.85, Gamma=0.15) has the lowest RMSE for test data.
• The naïve forecast model has performed the worst in terms of RMSE.

9) Based on the model-building exercise, build the most optimum model(s) on the
complete data and predict 12 months into the future with appropriate confidence
intervals/bands.
From Fig.86 we observed that the Triple Exponential Smoothing model is the optimum model for the given data set, as it has the lowest RMSE value.

However, as we know SARIMA models tend to perform better with seasonal time series, we are also
considering SARIMA model for the forecast.

Let us visually see the time series plots of different models we have built so far on test data

Fig.87 Time Series Plot 1 – Different Model predictions on test data



Fig.88 Time Series Plot 2 – Different Model predictions on test data



Plotting the lowest RMSE models

Fig.89 Time Series Plot 3 – Different Model predictions on test data



Optimum Model 1:
Triple Exponential Smoothing Model (Alpha=0.2, Beta=0.85, Gamma=0.15)
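
A minimal sketch of refitting this model on the complete series and forecasting 12 months ahead, assuming df from above; the additive trend/seasonal configuration is an assumption:

    from statsmodels.tsa.api import ExponentialSmoothing

    # Refit on the complete series with the parameters reported as optimal.
    final_tes = ExponentialSmoothing(
        df["Rose_Wine_Sales"], trend="add", seasonal="add", seasonal_periods=12
    ).fit(smoothing_level=0.2, smoothing_trend=0.85, smoothing_seasonal=0.15)
    tes_forecast = final_tes.forecast(steps=12)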

Fig.90 TES Optimum Model – Line plot of Predictions vs Actual values



Fig.91 TES Optimum Model – Line plot of Predictions vs Actual values on Test data

Fig.92 TES Optimum Model

Fig.93 TES Model – Forecast for next 12 months



Fig.94 TES Optimum Model – Time series plot forecast for next 12 months

Fig.95 TES Optimum Model – Future forecast with confidence intervals



Fig.96 TES Optimum Model – Time series plot forecast with confidence intervals

Fig.97 TES Optimum Model – Time series plot forecast for next 12 months with confidence intervals

Optimum Model 2:
Manual SARIMA Model (4, 1, 2) (0, 1, 1, 12)
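
A minimal sketch of refitting this model on the complete series and producing the 12-month forecast with confidence intervals, assuming df from above:

    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Refit the manually identified model on the complete series.
    final_sarima = SARIMAX(
        df["Rose_Wine_Sales"], order=(4, 1, 2), seasonal_order=(0, 1, 1, 12)
    ).fit(disp=False)

    fc = final_sarima.get_forecast(steps=12)
    print(fc.predicted_mean)        # 12-month point forecast
    print(fc.conf_int(alpha=0.05))  # 95% confidence bands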

Fig.98 Manual SARIMA Optimum Model – Line plot of Predictions vs Actual values

Fig.99 Manual SARIMA Optimum Model – Line plot of Predictions vs Actual values on Test data

Fig.100 Manual SARIMA Optimum Model



Fig.101 Manual SARIMA Model – Forecast for next 12 months with confidence intervals

Fig.102 Manual SARIMA Optimum Model – Time series plot forecast for next 12 months

Fig.103 Manual SARIMA Optimum Model – Time series plot forecast with confidence intervals

Fig.104 Manual SARIMA Optimum Model – Forecast for next 12 months with confidence interval

10) Comment on the model thus built and report your findings and suggest the
measures that the company should be taking for future sales.
We needed to construct an optimum model to forecast the rose wine sales for the next 12 months.
The model information, insights and recommendations are as follows.

Model Insights:

• The time series in consideration exhibits a declining trend and stable seasonality. Comparing the various models, we can see that the Triple Exponential Smoothing and SARIMA models deliver the best results. This is because these models are well suited to forecasting time series that exhibit both trend and seasonality. Apart from these, the Double Exponential Smoothing and Moving Average models also perform moderately well.
• We assess each forecast model's performance using the root mean squared error (RMSE). The model with the lowest RMSE value and characteristics that match the test data is regarded as the superior model.
• We observed that the Triple Exponential Smoothing model had the lowest RMSE and the characteristics that most closely fit the test data. As a result, it is regarded as the best model for forecasting and can thus be used by the company for forecast analysis.

Historical Insights:

• Rose wine sales have declined over time. They peaked in 1980 and 1981 and fell to their lowest position in 1995 (for which we have data for only the first 7 months).
• The monthly sales trajectory appears to be exactly the opposite of the yearly plot, with a progressive increase towards the end of each year. January has the lowest wine sales, while December has the highest. From January to August, sales increase gradually, and then they climb quickly after that.
• The average monthly sales of Rose wine are 90 bottles. More than 50% of the sold units of rose wine fall between 62 and 111. The lowest monthly sales were 28 units and the highest 267 units. Only 20% of recorded monthly sales were for more than 120 units.
• Around 70 to 75 percent of the units sold are fewer than 100, and 90% of the units sold are fewer than 150. Only 15% of sales involved fewer than 50 items. Therefore, it is clear that the bulk of sales were in the range of 50 to 100 units.

Forecast Insights:

• Based on the forecast made by the Triple Exponential Smoothing model presented previously, the following insights are offered.
• The forecast calls for average sales of 44 units, down by 45 units from the historical average of 89 units. Thus, we might observe an alarming 50% decrease in average sales.
• The predicted minimum sales volume of 28 units equals the historical minimum sales volume; consequently, no percentage change is seen in the minimum quantity sold.
• The projection estimates a maximum sales volume of 70 units, which is 197 units fewer than the largest sales volume recorded in the past (267 units). Consequently, a 73% decrease in maximum sales is visible.
• The forecast's standard deviation is 10 units, 52 units lower than the historical standard deviation of 62, a decrease of 83%. This is to be expected, as a smoothed point forecast tends to be less volatile than the historical data it is built from.
• We can see from the prediction that the months of October, November, and December have increased sales. December is typically when sales are at their highest. There is a startling decline in sales in January following December. The months after January appear to witness a gradual improvement in sales until October, when sales jump sharply.

Recommendations:

• Records show that the months of September, October, November, and December
account for 40% of the total sales forecast. Many festivities take place in these months,
and many people travel during this time. One of the most premium types of wine used
during festive and event celebrations is rose wine.
• Wine sales often climb in the final two months of the year as people hurry to buy holiday
beverages. For forthcoming occasions like Thanksgiving, Christmas, and New Year's,
people typically stock up. The majority of individuals also buy in bulk for holiday
gatherings and gift-giving.
• Many individuals choose wine as their go-to gift when it comes to occasions like parties
and gift-giving. Sales of Rose wine rise just before the winter holidays as more collectors
purchase these wines as presents or look for vintages to serve at holiday gatherings.

• This blush wine works nicely with nearly anything, including spicy dishes, sushi, salads,
grilled meats, roasts, and rich sauces. It is well renowned for its outdoor-friendly
drinking style.
• The festival seasons may vary depending on geography; however, most of the
celebrations take place in the last four months of the year.
▪ In these months, promotional offers might be implemented to lower costs
and significantly boost revenue.
▪ To increase sales, we must take advantage of all holiday events and set prices
appropriately.
▪ Many individuals order in bulk to prepare for upcoming festivities, which may
result in a high shipping expenditure. Businesses may provide significant
discounts or free shipping beyond a certain threshold at these times.
▪ Giving customers gifts to improve their user experience is one of the greatest
marketing strategies to deploy. In order to attract more consumers and
increase sales, the company might provide free gifts on orders with significant
sales.
▪ To target various client demographics, the proper marketing campaigns must
be run.
▪ Numerous ecommerce campaigns and competitions may be performed to
broaden the product's audience and enhance sales.
• The period from January to June is one of the key challenges for Rose wine sales.
▪ To identify the elements affecting sales, in-depth market research must be
conducted.
▪ Since rose wines are a premium category of wine, a market-friendly version
of the existing product might be introduced by the company, helping to make
up for the drop in sales. Long-term, this may bring in additional clients.
▪ The company can rebrand its product to instill a fresh perspective towards
the product and break the declining sales trend.
• There are other key elements that might be driving the sales, despite the present model's
ability to closely track the historical sales trend.
▪ The forecast might be improved by doing in-depth market research on the
factors that influence sales and incorporating that information into the model
for projection.
Sparkling Wine Analysis

Executive Summary
Data on wine sales from the 20th century is available from ABC Estate Wines, a wine-producing
firm, and should be examined. With the provided information, an estimate of future wine sales
must be forecast.

Fig.105 Sparkling Wine Analysis



Introduction
The purpose of this report is to explore the dataset through exploratory data analysis,
examining central tendency and other parameters. The data consists of sales of sparkling wine
from the 20th century.

Data Dictionary

Variable Name Description


YearMonth Represents the year and month in which the
sales were recorded
Sparkling Denotes the number of wine units sold

Data Description
1. YearMonth: Datetime variable from 1980-01 to 1995-07
2. Sparkling: Continuous variable from 1070 to 7242

Sample of the dataset

Table 3. Sample of first 5 rows of the dataset

Table 4. Sample of last 5 rows of the dataset

The dataset has 2 columns, which capture the year and month of the recorded data and the
number of units sold in the corresponding year-month, respectively.

1) Read the data as an appropriate Time Series data and plot the data.
Let us check the types of variables in the data frame and check for missing values in the dataset

Fig.106 Details of the dataset columns

The dataset has 2 variables and 187 rows in total. The "YearMonth" column can be deleted
after creating a suitable time stamp column, because it is not necessary for our modelling.
The column Sparkling is of float type. Additionally, we can observe from the data above that
the Sparkling column has no missing values.

Time Stamp created from ‘YearMonth’ column

Fig.107 Details of the dataset columns



Resulting dataset after removing the "YearMonth" column and appending the Time_Stamp column

Fig.108 Details of the dataset columns

Time_Stamp column has been set as index of the dataset and column Sparkling has been renamed as
Sparkling_Wine_Sales.

Renaming the columns of the data frame


The below mentioned columns of the data frame have been renamed as shown.

Original Column Name Renamed Column Name


Sparkling Sparkling_Wine_Sales

Fig.109 Details of the dataset columns after renaming
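
The steps above (time-stamp creation, re-indexing and renaming) can be reproduced with a short pandas sketch; the CSV file name here is an assumption, while the column names follow the data dictionary:

```python
import pandas as pd

# Read the raw data (file name assumed)
df = pd.read_csv('Sparkling.csv')

# Build a monthly time stamp from the 'YearMonth' column
df['Time_Stamp'] = pd.to_datetime(df['YearMonth'], format='%Y-%m')

# Drop the raw column, index by the time stamp and rename the sales column
df = (df.drop(columns='YearMonth')
        .set_index('Time_Stamp')
        .rename(columns={'Sparkling': 'Sparkling_Wine_Sales'}))

print(df.info())
print(df.head())
```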



Checking null values in the dataset

Fig.110 Null values in the dataset

As can be seen from the above figure, there are no null values present in the
dataset.

Fig.111 Graph plot of the Sparkling wine sales dataset

Observation:

• The data set provided contains sales information from January 1980 to July 1995.
• We can see from the plot that sales follow a consistent pattern with clear seasonality;
over the years, the overall level of sales has remained stable.
• There are no missing values which must be imputed.

2) Perform appropriate Exploratory Data Analysis to understand the data and also
perform decomposition.

Descriptive Summary of the Dataset

Fig.112 Descriptive Summary of Sparkling_Wine_Sales column

Observation:

• 2402 bottles of sparkling wine are typically sold each month.


• More than 50% of the sold sparkling wine units fall between 1605 and 2549 per month.
• The lowest monthly sales figure is 1070 units, while the highest is 7242 units.
• Only 25% of recorded monthly sales exceed 2549 units.

Exploratory Analysis
Let us analyze the wine sales across different years and months using boxplots

Yearly Plot

Fig.113 Yearly plot of Sparkling wine sales

Observation:

• We can see from the figure above that sales of sparkling wine have remained
constant over the years.
• The median sales of sparkling wine reached their peak in 1988 and their current low
point in 1995.
• Additionally, we can see that there are outliers in the box plots.

Monthly Plot

Fig.114 Monthly plot of Sparkling wine sales

Observation:

• The sales trajectory appears to be precisely the reverse of that seen in the yearly plot,
seeing a gradual increase towards the end of each year.
• January has the lowest wine sales while December sees the greatest. The sales
modestly grow from January to August and then sharply climb after that.
• Additionally, we can see that there are a few outliers in the box plots.

Annual Sales

Fig.115 Line plot – Annual sales

Quarterly Sales

Fig.116 Line plot – Quarterly sales



Monthly Sales across Different Years

Fig.117 Line plot – Monthly sales across different years

Empirical Cumulative Distribution Plot

Fig.118 Line plot – Empirical cumulative distribution function



Monthly Time Series Plot

Fig.119 Time series plot – Monthly time series

Observation:

• Over the years, sales have stayed broadly steady. Sales climbed gradually from 1982
until 1988, then decreased until 1990, then slightly increased again until 1994.
• Every year, December has the highest sales, followed by November and October. The
first 2 months January and February have the lowest median sales.
• From the cumulative distribution graph, we can observe that around 60 to 70 percent
of the months saw fewer than 2500 units sold, and 80% saw fewer than 4000. Only 20%
of sales involved more than 3000 units. Therefore, it is clear that the bulk of sales
were in the range of 1000 to 3000 units.

Average Wine sales per month & change percentage over each month

Fig.120 Line plot – Average and % Change over each month

Observation:

• We can see that there is no trend but only seasonality in the average sales and %
change plots. Additionally, the seasonality in the percentage change appears to be
consistent across all the years.

Decomposition of Time Series

Additive Decomposition

Fig.121 Additive decomposition of time series

Fig.122 Additive Decomposition - Sample of Trend, Seasonality & Residual values



Multiplicative Decomposition

Fig.123 Multiplicative decomposition of time series

Fig.124 Multiplicative Decomposition - Sample of Trend, Seasonality & Residual values



Observation:

• The residual patterns after additive decomposition of the time series still appear to
carry the seasonal element and exhibit substantial variation.
• In the multiplicative decomposition of the time series, the seasonal fluctuation of the
residuals is under control.
• The size of the seasonal variations doesn't change on comparison, but the residuals
are far better behaved under the multiplicative decomposition. Since the additive
residuals are not independent of seasonality, we may assume that the series is
multiplicative.
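
As a rough sketch of how the two decompositions above can be produced with statsmodels (assuming the indexed monthly series prepared earlier is in `df`):

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

series = df['Sparkling_Wine_Sales']

# Additive: y(t) = Trend + Seasonality + Residual
seasonal_decompose(series, model='additive', period=12).plot()

# Multiplicative: y(t) = Trend * Seasonality * Residual
seasonal_decompose(series, model='multiplicative', period=12).plot()
plt.show()
```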

3) Split the data into training and test. The test data should start in 1991.
Train and test data are separated from the provided dataset. Sales data up to 1991 is included in the
training data, while data from 1991 through 1995 is used for testing.

Fig.125 First and Last few rows of Train data Fig.126 First and Last few rows of Test data

Fig.127 Count summary on train and test data

Fig.128 Line Plot – Splitting of time series into Train & Test data
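
A minimal sketch of the split described above, using the indexed data frame from earlier:

```python
# Training data covers Jan 1980 to Dec 1990; the test period starts in 1991
train = df[df.index < '1991-01-01']
test = df[df.index >= '1991-01-01']

print(train.shape, test.shape)  # expected: (132, 1) and (55, 1)
```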

4) Build all the exponential smoothing models on the training data and evaluate the
models using RMSE on the test data. Other models such as regression, naïve forecast
and simple average models should also be built on the training data, and their
performance checked on the test data using RMSE.

Model 1 – Linear Regression


For this particular linear regression, we are going to regress the 'Sparkling_Wine_Sales' variable
against the order of the occurrence.

For the selection criteria, the below Linear Regression model is built by using default parameters.

Fig.129 Sparkling Wine – Linear regression model

Fig.130 Linear regression on Test data



Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• The trend in the train and test data has been captured by the linear regression model;
however, it is unable to account for seasonality.
• The root mean squared error (RMSE) for the linear regression model is 1389.135.

Linear Regression: Model Evaluation

Performance Metric
Test RMSE 1389.135175
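
A sketch of this regression-on-time approach (variable names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Regress sales on the order of occurrence (1, 2, 3, ...)
train_time = np.arange(1, len(train) + 1).reshape(-1, 1)
test_time = np.arange(len(train) + 1, len(train) + len(test) + 1).reshape(-1, 1)

lr = LinearRegression().fit(train_time, train['Sparkling_Wine_Sales'])
test_pred = lr.predict(test_time)

rmse = np.sqrt(mean_squared_error(test['Sparkling_Wine_Sales'], test_pred))
print(f'Linear regression test RMSE: {rmse:.3f}')
```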

Model 2 – Naïve Forecast


For this particular naive model, the prediction for tomorrow is the same as today's value;
since the prediction for the day after tomorrow equals tomorrow's, it also equals today's
value. In effect, every future forecast is the last observed value.

Fig.131 Naïve forecast on Test data



Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• The seasonality and trend of the time series data cannot be captured by the naive
forecast model.
• The root mean squared error (RMSE) for the naïve forecast model is 3864.279, which
is significantly higher than that of the regression model.

Naïve Forecast: Model Evaluation

Performance Metric
Test RMSE 3864.279352
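
In code, the naive forecast is simply the last training observation repeated over the test horizon (a sketch):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Every test-period prediction is the last observed training value
naive_pred = np.full(len(test), train['Sparkling_Wine_Sales'].iloc[-1])

rmse = np.sqrt(mean_squared_error(test['Sparkling_Wine_Sales'], naive_pred))
print(f'Naive forecast test RMSE: {rmse:.3f}')
```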

Model 3 – Simple Average


For this particular simple average method, we will forecast by using the average of the training
values.

Fig.132 Sparkling Wine – Simple Average model

Fig.133 Simple Average model predictions on Test data



Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• The seasonality and trend of the time series data cannot be captured by the simple
average model.
• The root mean squared error (RMSE) for the simple average model is 1275.081, which
is significantly lower than that of the naïve forecast model and slightly lower than
that of the linear regression model.

Simple Average: Model Evaluation

Performance Metric
Test RMSE 1275.081804
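
A sketch of the simple average forecast, mirroring the naive model above:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Every test-period prediction is the mean of the training values
avg_pred = np.full(len(test), train['Sparkling_Wine_Sales'].mean())

rmse = np.sqrt(mean_squared_error(test['Sparkling_Wine_Sales'], avg_pred))
print(f'Simple average test RMSE: {rmse:.3f}')
```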

Model 4 – Moving Average (MA)


For the moving average model, we are going to calculate rolling means (or moving averages) for
different intervals. The best interval is the one with the minimum error (maximum accuracy)
on the test data.

Fig.134 Sparkling Wine – Sample of Trailing Moving Averages

Fig.135 Moving Average on Entire data



Fig.136 Individual visualization of moving averages on entire data



Fig.137 Moving averages forecast on test data

Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• The seasonality and trend of the time series data may both be predicted using
moving average models.
• We can see how the data smooths out as the number of observation points taken
increases. The 2-point TMA has characteristics that are more similar to the test data
than the 9-point TMA.
• The root mean squared error (RMSE) for the 2-point trailing moving average model is
813.4, which is the lowest of all models built so far.

Moving Average: Model Evaluation

Model Test RMSE

2 Point Trailing Moving Average 813.400684


4 Point Trailing Moving Average 1156.589694
6 Point Trailing Moving Average 1283.927428
9 Point Trailing Moving Average 1346.278315
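
A sketch of the trailing-moving-average evaluation above; as in the report, the rolling mean is computed on the entire series and scored on the test period:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

for window in (2, 4, 6, 9):
    # Trailing moving average over the full series, scored on the test period
    trailing_ma = df['Sparkling_Wine_Sales'].rolling(window).mean()
    pred = trailing_ma.loc[test.index]
    rmse = np.sqrt(mean_squared_error(test['Sparkling_Wine_Sales'], pred))
    print(f'{window}-point trailing MA test RMSE: {rmse:.3f}')
```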

Let's compare the visualization of each model's predictions that we have constructed so far before
investigating exponential smoothing methods.

Fig.138 Comparison of different models on test data (Regression, Naïve, Simple and Moving Average)

Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• We can see from the graph above that the simple average and naive forecast models
fail to adequately describe the characteristics of the test data.
• The trend portion of the series has been captured using linear regression; however,
the seasonality has been missed.
• Both trend and seasonality may be accounted for using moving average models.

Model 5 – Simple Exponential Smoothing


The simplest of the exponential smoothing methods is naturally called simple exponential
smoothing (SES). This method is suitable for forecasting data with no clear trend or
seasonal pattern.

In Single ES, the forecast at time (t + 1) is given by (Winters, 1960):

F_{t+1} = α Y_t + (1 − α) F_t

Parameter α is called the smoothing constant, and its value lies between 0 and 1. Since the
model uses only one smoothing constant, it is called Single Exponential Smoothing.

For the selection criteria, the below Simple Exponential Smoothing is built by using optimized
parameters.

Fig.139 Sparkling Wine – Simple Exponential Smoothing Model

Fig.140 Sample of SES predictions



Fig.141 Sparkling Wine - SES predictions on Test data

The higher the alpha value, the more weight is given to the most recent observations,
implying that recent patterns are expected to repeat. A loop with different alpha values is
run to understand which particular value works best for alpha on the test set.

The alpha values range from 0.1 to 0.95, and the respective RMSE values for the train and
test data are calculated for analyzing the performance metrics.

Fig.142 SES prediction metrics for different alpha values

Fig.143 SES forecast for different Alpha values



Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• Simple exponential smoothing is typically used when there is neither a trend nor a
seasonal component to the time series. For this reason, it is unable to capture the
characteristics of the time series data.
• The root mean squared error (RMSE) for the simple exponential smoothing model
with Alpha = 0.0496 is 1316.135, and for Alpha = 0.1 it is 1375.393.
• The Simple Exponential Smoothing model with alpha = 0.0496 is taken as the better
of the two, as it has the lowest test RMSE.

Simple Exponential Smoothing: Model Evaluation

Model Test RMSE


SES (Alpha = 0.0496) 1316.135411
SES (Alpha = 0.1) 1375.393398
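
A sketch of the two SES variants compared above, using statsmodels (the optimized fit chooses alpha by maximizing the likelihood):

```python
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# Let statsmodels optimize alpha, then try a fixed alpha for comparison
ses_auto = SimpleExpSmoothing(train['Sparkling_Wine_Sales']).fit(optimized=True)
ses_fixed = SimpleExpSmoothing(train['Sparkling_Wine_Sales']).fit(
    smoothing_level=0.1, optimized=False)

print('optimized alpha:', ses_auto.params['smoothing_level'])
pred_auto = ses_auto.forecast(len(test))
pred_fixed = ses_fixed.forecast(len(test))
```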

Model 6 – Double Exponential Smoothing (Holt's Model)


This model is an extension of SES known as Double Exponential model which estimates two
smoothing parameters. Applicable when data has Trend but no seasonality. Two separate
components are considered: Level and Trend. Level is the local mean. One smoothing parameter α
corresponds to the level series. A second smoothing parameter β corresponds to the trend series.

Double Exponential Smoothing uses two equations to forecast future values of the time series, one
for forecasting the short-term average value or level and the other for capturing the trend.

Intercept or Level equation: L_t = α Y_t + (1 − α) F_t

Trend equation: T_t = β (L_t − L_{t−1}) + (1 − β) T_{t−1}

Here, α and β are the smoothing constants for level and trend, respectively,

0 < α < 1 and 0 < β < 1.

The forecasts at times t + 1 and t + n are given by

F_{t+1} = L_t + T_t

F_{t+n} = L_t + n T_t

For the selection criteria, the below Double Exponential Smoothing is built by using optimized
parameters.

Fig.144 Sparkling Wine – Double Exponential Smoothing Model



Fig.145 Sample of DES predictions

Fig.146 Sparkling Wine - DES predictions on Test data



The higher the alpha value, the more weight is given to the most recent observations,
implying that recent patterns are expected to repeat. A loop with different alpha and beta
values is run to understand which particular combination works best on the test set.

The alpha values range from 0.05 to 1.0, and the respective RMSE values for the train and
test data are calculated for analyzing the performance metrics.

Fig.147 DES prediction metrics for different alpha, beta values

Fig.148 DES forecast for different Alpha, Beta values



Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• The double exponential smoothing model performs well when there is only trend and
no seasonality in the time series data. For this reason, it is only able to capture the
trend characteristics of the data, and seasonality is not accounted for.
• The root mean squared error (RMSE) for the double exponential smoothing model
with Alpha = 0.6885, Beta = 9.99e-05 is 2007.238, and for Alpha = 0.05, Beta = 0.05
(Auto tuned model) it is 1418.407.
• The Double Exponential Smoothing model with Alpha = 0.05, Beta = 0.05 is taken as
the better of the two, as it has the lowest test RMSE.
• Additionally, it should be highlighted that, compared to the simple exponential
smoothing model, the double exponential smoothing model has a slightly higher RMSE.

Double Exponential Smoothing: Model Evaluation

Model Test RMSE


DES (Alpha=0.6885, Beta=9.99e-05) 2007.238526
DES (Alpha=0.05, Beta=0.05) 1418.407668
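
A sketch of Holt's model in statsmodels; note that the trend parameter is called `smoothing_trend` in recent statsmodels versions (`smoothing_slope` in older ones):

```python
from statsmodels.tsa.holtwinters import Holt

# Fixed smoothing constants, matching the second variant in the table above
des = Holt(train['Sparkling_Wine_Sales']).fit(
    smoothing_level=0.05, smoothing_trend=0.05, optimized=False)

pred_des = des.forecast(len(test))
```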

Model 7 – Triple Exponential Smoothing (Holt-Winter’s Model)


This model is an extension of DES known as Triple Exponential Smoothing model which estimates
three smoothing parameters. Applicable when data has both Trend and seasonality. Three separate
components are considered: Level, Trend and Seasonality.

One smoothing parameter α corresponds to the level series.

A second smoothing parameter β corresponds to the trend series.

A third smoothing parameter γ corresponds to the seasonality series

where,

0 < α <1,

0 < β <1,

0 < γ <1

For the selection criteria, the below Triple Exponential Smoothing is built by using optimized
parameters.

Fig.149 Sparkling Wine – Triple Exponential Smoothing Model



Fig.150 Sample of TES predictions

Fig.151 Sparkling Wine - TES predictions on Test data

The higher the alpha value, the more weight is given to the most recent observations,
implying that recent patterns are expected to repeat. A loop with different alpha, beta and
gamma values is run to understand which particular combination works best on the test set.

The alpha values range from 0.1 to 1.0, and the respective RMSE values for the train and
test data are calculated for analyzing the performance metrics.

Fig.152 TES prediction metrics for different alpha, beta and gamma values

Fig.153 TES forecast for automated model parameters



Fig.154 TES forecast for different model parameters

Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• The triple exponential model works well when there is both trend and seasonality in
the time series data. For this reason, it is able to capture both the trend and seasonal
characteristics and nearly match the actual test data plot.
• The root mean squared error (RMSE) for the triple exponential smoothing model
with Alpha = 0.111, Beta = 0.0617, Gamma = 0.395 is 469.659, and for Alpha = 0.35,
Beta = 0.10, Gamma = 0.20 (Auto tuned model) it is 319.498.
• The Triple Exponential Smoothing model with Alpha = 0.35, Beta = 0.10, Gamma = 0.20
is taken as the better of the two, as it has the lowest test RMSE.
• Additionally, it should be highlighted that, compared to the double exponential
smoothing model, the triple exponential smoothing model has reduced the RMSE
value by almost 75%.

Triple Exponential Smoothing: Model Evaluation

Model Test RMSE


TES (Alpha=0.111, Beta=0.0617, Gamma=0.395) 469.659106
TES (Alpha=0.35, Beta=0.10, Gamma=0.20) 319.498680
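
A sketch of the Holt-Winters fit above; the additive trend / multiplicative seasonality choice is an assumption consistent with the multiplicative decomposition found earlier:

```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

tes = ExponentialSmoothing(train['Sparkling_Wine_Sales'],
                           trend='add', seasonal='mul',
                           seasonal_periods=12).fit(
    smoothing_level=0.35, smoothing_trend=0.10,
    smoothing_seasonal=0.20, optimized=False)

pred_tes = tes.forecast(len(test))
```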

Let's compare the RMSE values of the models we have constructed so far and visualize the plot of the
best exponential smoothing models thus built.

Fig.155 Comparison of Test RMSE values of different exponential smoothing models



Fig.156 Comparison of different models on test data (SES, DES and TES)

Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• Simple exponential smoothing is frequently employed when the time series doesn't
include a trend or a seasonal component. This is the reason why it is unable to
capture the time series data's features.
• The double exponential smoothing model works effectively when the time series
data contains just trend and no seasonality. This explains why seasonality is not taken
into consideration and just the trend features of the data are captured.
• The triple exponential model performs effectively when the time series data exhibits
both trend and seasonality. This is the reason why it is essentially identical to the test
data plot and is able to capture both the trend and seasonal aspects.
• The Triple exponential model is the best model we have built so far, as it has the
lowest RMSE value.

5) Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If
the data is found to be non-stationary, take appropriate steps to make it stationary.
Check the new data for stationarity and comment. Note: Stationarity should be
checked at alpha = 0.05.

Checking for Stationarity of Entire Data


The Augmented Dickey-Fuller test is a unit root test which determines whether a unit root is
present and consequently whether the series is non-stationary.

Framing the hypothesis:

H0: The Time Series has a unit root and is thus non-stationary.

H1: The Time Series does not have a unit root and is thus stationary.

The series has to be stationary for building ARIMA/SARIMA models, and thus we would want
the p-value of this test to be less than the α value.

Fig.157 Sparkling Wine – ADF summary

Inference:

We see that at the 5% significance level the Time Series is non-stationary, as the p-value of
0.705 is more than the alpha value (0.05); therefore we fail to reject the null hypothesis.
Let us take one level of differencing to see whether the series becomes stationary.

Fig.158 Sparkling Wine – ADF summary with differencing

Inference:

We see that at the 5% significance level the Time Series becomes stationary, as the p-value
is nearly 0, which is less than the alpha value (0.05); therefore we reject the null
hypothesis. We can see that the provided time series becomes stationary with differencing.
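
A sketch of the ADF test with statsmodels; `adfuller` returns the test statistic and p-value among other values:

```python
from statsmodels.tsa.stattools import adfuller

def adf_report(series, label):
    # H0: the series has a unit root (non-stationary)
    stat, pvalue = adfuller(series.dropna())[:2]
    print(f'{label}: ADF statistic = {stat:.4f}, p-value = {pvalue:.4f}')

adf_report(df['Sparkling_Wine_Sales'], 'original series')
adf_report(df['Sparkling_Wine_Sales'].diff(), 'after one level of differencing')
```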

Fig.159 Time Series Plot of Entire data – With differencing



Checking for Stationarity of Training Data

Fig.160 Time Series Plot of Train data

Fig.161 Sparkling Wine – ADF summary on train data

Inference:

We see that at the 5% significance level the Time Series of the training data is
non-stationary, as the p-value of 0.567 is more than the alpha value (0.05); therefore we
fail to reject the null hypothesis. Let us take one level of differencing to see whether the
series becomes stationary.

Fig.162 Sparkling Wine – ADF summary on train data with differencing

Inference:

We see that at the 5% significance level the Time Series of the training data becomes
stationary, as the p-value of 8.479e-11 is less than the alpha value (0.05); therefore we
reject the null hypothesis. We can see that the provided training time series becomes
stationary with differencing.

Fig.163 Time Series Plot of Training data with differencing

Observation:

• As per the Augmented Dickey-Fuller test, we observed that the time series data by
itself is not stationary; however, it becomes stationary when differencing is done.
• The same thing is also observed with the training data. Therefore, the models can be
built with an order of difference d=1.

6) Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.

Model 8 – Auto-Regressive Integrated Moving Average (ARIMA)


Auto-regression means regression of a variable on itself. One of the fundamental assumptions
of an AR model is that the time series is a stationary process. When the time series data is
not stationary, we have to convert it to a stationary time series before applying AR.

ARIMA models may be used to represent any "non-seasonal" time series that has patterns and isn't
just random noise.

An ARIMA model is characterized by 3 terms: p, d, q

where,

p is the order of the Auto Regressive (AR) term

q is the order of the Moving Average (MA) term

d is the number of differencing required to make the time series stationary

For the selection criteria of p,d,q the below ARIMA model is built by using automated model
parameters with lowest Akaike Information Criteria.

Fig.164 Parameter Combinations for ARIMA model Fig.165 AIC values for different parameter combinations

Fig.166 Sorted AIC values for different parameter combinations



We can see that among all the possible given combinations, the AIC is lowest for the combination
(4,1,4). Hence, the model is built with these parameters to determine the RMSE value of test data.
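
A sketch of this AIC-based grid search; the p and q ranges of 0 to 4 are assumptions consistent with the (4,1,4) winner, and d is fixed at 1 per the stationarity check:

```python
import itertools
from statsmodels.tsa.arima.model import ARIMA

results = []
for p, q in itertools.product(range(5), range(5)):
    fitted = ARIMA(train['Sparkling_Wine_Sales'], order=(p, 1, q)).fit()
    results.append(((p, 1, q), fitted.aic))

# Sort by AIC; the lowest-AIC order is used for the final model
for order, aic in sorted(results, key=lambda r: r[1])[:5]:
    print(order, round(aic, 2))
```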

Fig.167 Sparkling Wine – Automated ARIMA model



Fig.168 Automated ARIMA – Diagnostics plot

Observation:

• The optimal parameters are decided based on the lowest Akaike Information Criteria
(AIC) values. The AIC is lowest for the combination (4,1,4), as we see from the above
results.
• From the standardized residual plot above, we can notice that the residuals seem to
fluctuate around a mean of zero and have uniform variance.
• The histogram plus estimated density plot suggests a roughly normal distribution
with mean zero, slightly skewed to the right.
• In the Normal Q-Q plot, the dots fall more or less in line with the red line. A few
deviations are present, implying a mildly skewed distribution.
• The correlogram plot of the residuals shows that the residuals are not autocorrelated.

Fig.169 Sample of Automated ARIMA (4,1,4) predictions

Fig.170 Plot of Automated ARIMA (4,1,4) predictions on Test data



Automated ARIMA: Model Evaluation


For evaluating the model's performance metrics, we look at the root mean squared error (RMSE)
& mean absolute percentage error (MAPE).

Model Test RMSE Test MAPE


ARIMA (p=4, d=1, q=4) 1212.918 40.214

Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• ARIMA models perform well on non-seasonal time series. For this reason, this model
is unable to capture the full characteristics of the test data.
• The root mean squared error (RMSE) of the test data for the ARIMA model with (p=4,
d=1, q=4) is 1212.918.
• Not surprisingly, the RMSE of the aforementioned ARIMA model is lower than that of
the majority of previously constructed models, but significantly higher than that of the
triple exponential smoothing model.

Model 9 – Seasonal Auto-Regressive Integrated Moving Average (SARIMA)


SARIMA models, also known as Seasonal ARIMA models, are an extension of ARIMA for time
series data with defined seasonality. SARIMA models use seasonal differencing, which is
similar to regular differencing.

A SARIMA model is characterized by 7 terms: p, d, q, P, Q, D and F

where,

p is the order of the Auto Regressive (AR) term

q is the order of the Moving Average (MA) term

d is the number of differencing required to make the time series stationary

P is the order of the Seasonal Auto Regressive (AR) term

Q is the order of the Seasonal Moving Average (MA) term

D is the number of seasonal differencing required to make the time series stationary

F is the seasonal frequency of the time series

We must examine the PACF and ACF plots, respectively, at lags that are multiples of "F" in
order to determine the "P" and "Q" values, and determine where these cut-offs occur (for
appropriate confidence interval bands).

By examining the lowest AIC values, we can also estimate "p," "q," "P," and "Q" for the SARIMA
models.

By examining the ACF plots, one may determine the seasonal parameter "F". The existence of
seasonality is shown by spikes in the ACF plot at multiples of "F".

Fig.171 ACF plot of Train data

From the above ACF plot we can observe that every 12th lag is significant, indicating the
presence of seasonality. Hence, for our model building we will consider the term F=12.

For the selection criteria of p, d, q, P, D, Q & F the below SARIMA model is built by using automated
model parameters with lowest Akaike Information Criteria.

Fig.172 Parameter Combinations for SARIMA model

Fig.173 AIC values for different parameter combinations



Fig.174 Sorted AIC values for different parameter combinations

We can see that among all the possible given combinations, the AIC is lowest for the
combination (3,1,2)(3,0,1,12). Hence, the model is built with these parameters to determine
the RMSE value on the test data.
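
A sketch of the seasonal grid search with SARIMAX; the candidate ranges here are assumptions consistent with the (3,1,2)(3,0,1,12) winner:

```python
import itertools
import statsmodels.api as sm

best_order, best_aic = None, float('inf')
for p, q, P, Q in itertools.product(range(4), range(3), range(4), range(2)):
    fitted = sm.tsa.statespace.SARIMAX(train['Sparkling_Wine_Sales'],
                                       order=(p, 1, q),
                                       seasonal_order=(P, 0, Q, 12),
                                       enforce_stationarity=False,
                                       enforce_invertibility=False).fit(disp=False)
    if fitted.aic < best_aic:
        best_order, best_aic = ((p, 1, q), (P, 0, Q, 12)), fitted.aic

print('lowest AIC:', best_order, round(best_aic, 2))
```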

Fig.175 Sparkling Wine – Automated SARIMA model



Fig.176 Automated SARIMA – Diagnostics plot

Observation:

• The optimal parameters are decided based on the lowest Akaike Information Criteria
(AIC) values. The AIC is lowest for the combination (3,1,2)(3,0,1,12), as we see from
the above results.
• From the standardized residual plot above, we can notice that the residuals seem to
fluctuate around a mean of zero and have uniform variance.
• The histogram plus estimated density plot suggests a roughly normal distribution
with mean zero, slightly skewed to the right.
• In the Normal Q-Q plot, the dots fall more or less in line with the red line. A few
deviations are present, implying a mildly skewed distribution.
• The correlogram plot of the residuals shows that the residuals are not autocorrelated.

Fig.177 Sample of Automated SARIMA (3,1,2) (3,0,1,12) predictions

Fig.178 Plot of Automated SARIMA (3,1,2) (3,0,1,12) predictions on Test data



Automated SARIMA: Model Evaluation


For evaluating the model performance, we look at the root mean squared error (RMSE) & mean
absolute percentage error (MAPE).

Model Test RMSE Test MAPE


SARIMA (p=3, d=1, q=2) (P=3, D=0, Q=1, F=12) 579.925 25.052

Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• The SARIMA model performs well on seasonal time series. For this reason, it is able
to capture the full characteristics of the test data.
• The root mean squared error (RMSE) of the test data for the SARIMA model with (p=3,
d=1, q=2) (P=3, D=0, Q=1, F=12) is 579.925.
• Additionally, it should be highlighted that, compared to the ARIMA model, the
SARIMA model has more than halved the RMSE value.

7) Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the
training data and evaluate this model on the test data using RMSE.

Model 10 – Auto-Regressive Integrated Moving Average (ARIMA) - Manual

An ARIMA model is characterized by 3 terms: p, d, q

where,

p is the order of the Auto Regressive (AR) term

q is the order of the Moving Average (MA) term

d is the number of differencing required to make the time series stationary

Autocorrelation and partial autocorrelation measure the relationship between present and past
series values, indicating which previous values are most useful in forecasting future ones.
This information helps identify the order of the AR and MA processes in an ARIMA model.

The parameters p & q can be determined by looking at the PACF & ACF plots respectively.

Autocorrelation function (ACF) - At lag k, this is the correlation between series values that
are k intervals apart.

Partial autocorrelation function (PACF) - At lag k, this is the correlation between series values that
are k intervals apart, accounting for the values of the intervals between.

In the ACF & PACF plots, each bar represents the size and direction of the correlation. Bars
that cross the red line are statistically significant.

ACF Plot – Training Data

Fig.179 ACF plot on differenced train data

PACF Plot – Training Data

Fig.180 PACF plot on differenced train data



Observation:

• The Auto-Regressive parameter in an ARIMA model is 'p', which comes from the
significant lag after which the PACF plot cuts off below the confidence interval.
• The Moving-Average parameter in an ARIMA model is 'q', which comes from the
significant lag after which the ACF plot cuts off below the confidence interval.
• We can observe from the above plots that after lag 1 we have a few significant lags,
and hence we build this model by taking p=2 and q=1 respectively.
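
A sketch of how the two plots above can be generated on the differenced training series:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

diff_train = train['Sparkling_Wine_Sales'].diff().dropna()

plot_acf(diff_train, lags=30)    # cut-off here suggests the MA order q
plot_pacf(diff_train, lags=30)   # cut-off here suggests the AR order p
plt.show()
```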

Fig.181 Sparkling Wine – Manual ARIMA model



Fig.182 Manual ARIMA – Diagnostics plot

Observation:

• The model's parameters, p and q, were identified by examining the ACF (q=1) and
PACF (p=2) plots. Since we differenced the series to make it stationary, the
parameter d=1.
• From the standardized residual plot above, we can notice that the residuals seem to
fluctuate around a mean of zero and have uniform variance.
• The histogram plus estimated density plot suggests a roughly normal distribution
with mean zero, slightly skewed to the right.
• In the Normal Q-Q plot, the dots fall more or less in line with the red line. A few
deviations are present, implying a mildly skewed distribution.
• The correlogram plot of the residuals shows that the residuals are not autocorrelated.

Fig.183 Sample of Manual ARIMA (2,1,1) predictions

Fig.184 Plot of Manual ARIMA (2,1,1) predictions on Test data



Manual ARIMA: Model Evaluation


For evaluating the model performance, we look at the root mean squared error (RMSE) & mean
absolute percentage error (MAPE).

Model Test RMSE Test MAPE


ARIMA (p=2, d=1, q=1) 1300.721 40.225

Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• ARIMA models perform well on non-seasonal time series. For this reason, this model
is unable to capture the full characteristics of the test data.
• The root mean squared error (RMSE) of the test data for the ARIMA model with (p=2,
d=1, q=1) is 1300.721.
• Not surprisingly, the RMSE of the aforementioned ARIMA model is greater than that of
the majority of previously constructed models, and also higher than that of the
Automated ARIMA (4,1,4) model.

Model 11 – Seasonal Auto-Regressive Integrated Moving Average (SARIMA) – Manual

A SARIMA model is characterized by 7 terms: p, d, q, P, Q, D and F

where,

p is the order of the Auto Regressive (AR) term

q is the order of the Moving Average (MA) term

d is the number of differencing required to make the time series stationary

P is the order of the Seasonal Auto Regressive (AR) term

Q is the order of the Seasonal Moving Average (MA) term

D is the number of seasonal differencing required to make the time series stationary

F is the seasonal frequency of the time series

We must examine the PACF and ACF plots, respectively, at lags that are multiples of "F" in
order to determine the "P" and "Q" values, and determine where these cut-offs occur (for
appropriate confidence interval bands).

By examining the ACF plots, one may determine the seasonal parameter "F". The existence of
seasonality is shown by spikes in the ACF plot at multiples of "F".

The parameters P & Q can be determined by looking at the seasonally differenced PACF & ACF plots
respectively.

Autocorrelation function (ACF) - At lag k, this is the correlation between series values that
are k intervals apart.

Partial autocorrelation function (PACF) - At lag k, this is the correlation between series values that
are k intervals apart, accounting for the values of the intervals between.

In the ACF & PACF plots, each bar represents the size and direction of the correlation. Bars
that cross the red line are statistically significant.
ACF Plot – Seasonally differenced (F=12) Training Data

Fig.185 ACF plot on differenced train data

PACF Plot – Seasonally differenced (F=12) Training Data



Fig.186 PACF plot on differenced train data

Observation:

• From the PACF plot, it can be seen that in the early lags, lags up to 4 are significant
before the cut-off, so the AR term 'p = 4' is chosen. Among the seasonal lags, the plot
cuts off after the first seasonal lag of 12, so we keep the seasonal AR term 'P = 0'.
• From the ACF plot, it can be seen that in the early lags, lags 1 and 2 are significant
before it cuts off, so we keep the MA term 'q = 2'. At the seasonal lag of 12 a significant
lag is apparent, and no seasonal lags are apparent at lags 24, 36 or afterwards, so we
keep 'Q = 1'.
• The final selected terms for the SARIMA model are (4, 1, 2) (0, 1, 1, 12), as inferred from
the ACF and PACF plots.
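
A sketch of fitting the manually specified model with SARIMAX:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error

manual_sarima = sm.tsa.statespace.SARIMAX(train['Sparkling_Wine_Sales'],
                                          order=(4, 1, 2),
                                          seasonal_order=(0, 1, 1, 12)).fit(disp=False)

manual_pred = manual_sarima.get_forecast(steps=len(test)).predicted_mean
rmse = np.sqrt(mean_squared_error(test['Sparkling_Wine_Sales'], manual_pred))
print(f'Manual SARIMA test RMSE: {rmse:.3f}')
```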

Fig.187 Sparkling Wine – Manual SARIMA model



Fig.188 Manual SARIMA – Diagnostics plot

Observation:

• The model's parameters, p, q, P, Q, were identified by examining the ACF (q=2, Q=1)
and PACF (p=4, P=0) plots. Since we differenced the series (both regular and seasonal)
to make it stationary, the parameters are d=1, D=1.
• From the standardized residual plot above, we can notice that the residuals seem to
fluctuate around a mean of zero and have uniform variance.
• The histogram plus estimated density plot suggests a roughly normal distribution
with mean zero.
• In the Normal Q-Q plot, the dots fall more or less in line with the red line. A few
deviations are present, implying a mildly skewed distribution.
• The correlogram plot of the residuals shows that the residuals are not autocorrelated.

Fig.189 Sample of Manual SARIMA (4,1,2) (0,1,1,12) predictions

Fig.190 Plot of Manual SARIMA (4,1,2) (0,1,1,12) predictions on Test data



Manual SARIMA: Model Evaluation


For evaluating the model performance, we look at the root mean squared error (RMSE) & mean
absolute percentage error (MAPE).

Model Test RMSE Test MAPE


SARIMA (p=4, d=1, q=2) (P=0, D=1, Q=1, F=12) 468.677 19.324

Observation:

• We can see from the graphs above that the time series has a marginal upward trend
and seasonality.
• The SARIMA model performs well on seasonal time series. For this reason, it is able
to capture the full characteristics of the test data.
• The root mean squared error (RMSE) of the test data for the SARIMA model with (p=4,
d=1, q=2) (P=0, D=1, Q=1, F=12) is 468.677.
• Additionally, it should be highlighted that, compared to all the ARIMA/SARIMA
models built so far, this SARIMA model has the lowest RMSE value.

8) Build a table (create a data frame) with all the models built along with their
corresponding parameters and the respective RMSE values on the test data.

Fig.191 RMSE values of all models



Fig.192 Sorted RMSE values of all models

Observation:

• From the above table, we can see that the Triple Exponential Smoothing model with
parameters (Alpha=0.35, Beta=0.10, Gamma=0.20) has the lowest RMSE on the test
data.
• The Manual SARIMA (4,1,2) (0,1,1,12) model has the second lowest RMSE value on the
test data, after the Triple Exponential Smoothing model.
• The naïve forecast model has performed the worst in terms of RMSE.

9) Based on the model-building exercise, build the most optimum model(s) on the
complete data and predict 12 months into the future with appropriate confidence
intervals/bands.
From Fig.192 we observed that the Triple Exponential Smoothing model is the optimum model
for the given data set, as it has the lowest RMSE value.

However, as we know SARIMA models tend to perform better with seasonal time series, we are
also considering the SARIMA model for the forecast.
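
Before looking at the plots, here is a sketch of how the chosen model is refit on the complete data and used for the 12-month forecast. Since statsmodels' Holt-Winters implementation does not return prediction intervals directly, the ±1.96·RMSE band below is an approximation based on the in-sample residuals:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Refit the chosen TES specification on the complete series
full_tes = ExponentialSmoothing(df['Sparkling_Wine_Sales'],
                                trend='add', seasonal='mul',
                                seasonal_periods=12).fit()

forecast = full_tes.forecast(12)  # next 12 months

# Approximate 95% band from the in-sample residual RMSE (an assumption)
resid_rmse = np.sqrt(np.mean(full_tes.resid ** 2))
bands = pd.DataFrame({'forecast': forecast,
                      'lower': forecast - 1.96 * resid_rmse,
                      'upper': forecast + 1.96 * resid_rmse})
print(bands)
```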

Let us visually see the time series plots of different models we have built so far on test data

Fig.193 Time Series Plot 1 – Different Model predictions on test data



Fig.194 Time Series Plot 2 – Different Model predictions on test data



Plotting the lowest RMSE models

Fig.195 Time Series Plot 3 – Different Model predictions on test data



Optimum Model 1:
Triple Exponential Smoothing Model (Alpha=0.35, Beta=0.10, Gamma=0.20)

Fig.196 TES Optimum Model – Line plot of Predictions vs Actual values



Fig.197 TES Optimum Model – Line plot of Predictions vs Actual values on Test data

Fig.198 TES Optimum Model

Fig.199 TES Model – Forecast for next 12 months



Fig.200 TES Optimum Model – Time series plot forecast for next 12 months

Fig.201 TES Optimum Model – Future forecast with confidence intervals



Fig.202 TES Optimum Model – Time series plot forecast with confidence intervals

Fig.203 TES Optimum Model – Forecast for next 12 months with confidence intervals

Optimum Model 2:
Manual SARIMA Model (4, 1, 2) (0, 1, 1, 12)

Fig.204 Manual SARIMA Optimum Model – Line plot of Predictions vs Actual values

Fig.205 Manual SARIMA Optimum Model – Line plot of Predictions vs Actual values on Test data

Fig.206 Manual SARIMA Optimum Model



Fig.207 Manual SARIMA Model – Forecast for next 12 months with confidence intervals

Fig.208 Manual SARIMA Optimum Model – Time series plot forecast for next 12 months

Fig.209 Manual SARIMA Optimum Model – Time series plot forecast with confidence intervals

Fig.210 Manual SARIMA Optimum Model – Forecast for next 12 months with confidence interval

10) Comment on the model thus built and report your findings and suggest the
measures that the company should be taking for future sales.
We needed to construct an optimum model to forecast the sparkling wine sales for the next 12
months. The model information, insights and recommendations are as follows.

Model Insights:

• The time series in consideration exhibits a slight rising trend and stable seasonality. When
comparing the various models, we can see that the Triple Exponential Smoothing and SARIMA
models frequently deliver the best results. This is because these models are excellent at
predicting time series that exhibit both trend and seasonality.
• We examine the root mean squared error (RMSE) of the forecast model to assess its
performance. The model with the lowest RMSE value and characteristics that match the test
data is regarded as the superior model.
• We observed that the SARIMA and Triple Exponential Smoothing models had the lowest
RMSE and the characteristics that most closely fit the test data. As a result, they are regarded
as the best models for forecasting.
• The firm may use the aforementioned best forecasting models, since they accurately
capture the time series characteristics and allow for proactive action based on the forecast.

Historical Insights:

• Sparkling wine sales have remained stable over time. Sales peaked in 1988 and fell to
their present low in 1995 (for which we have data for only the first 7 months).
• The monthly sales trajectory appears to be exactly the opposite of the yearly plot, with a
progressive increase towards the end of each year. January has the lowest wine sales,
while December has the highest. From January to August, sales increase gradually, and
then they quickly increase after that.
• The average monthly sales of sparkling wine are 2402 bottles. More than 50% of the sold
units of sparkling wine fall between 1605 and 2549. The lowest monthly figure was 1070
units and the highest 7242 units. Only 25% of recorded monthly sales exceeded 2549 units.

• Around 60 to 70 percent of the months saw fewer than 2500 units sold, and 80% saw
fewer than 4000. Only 20% of sales involved more than 3000 units. Therefore, it is
clear that the bulk of sales were in the range of 1000 to 3000 units.

Forecast Insights:

• Based on the forecast made by the Triple Exponential Smoothing model previously
presented, the following insights are offered.
• The forecast calls for average sales of 2639 units, up 237 units from the historical
average of 2402 units. Thus, we might observe an increase in average sales of about 10%.
• The prediction is for a minimum sales volume of 1540 units, which is 470 units more than
the minimum sales volume of 1070 units in the past. Consequently, a 43% increase in
minimum sales is seen.
• The projection estimates a maximum sales volume of 6487 units, which is 755 units fewer
than the largest sales volume recorded in the past, which was 7242 units. Consequently,
a 10% decrease in maximum sales is visible.
• In comparison to the historical standard deviation of 1295 units, the forecast's standard
deviation is 1439 units, or 144 units higher, an increase of 11%. This is anticipated,
because historical data tends to have less volatility than future data.
• We can see from the prediction that the months of October, November, and December
have increased sales. December is often when the sales are at their highest. There is a
startling decline in sales in January following December. The months after January
appear to witness a gradual improvement in sales until October, when it jumps sharply.

Recommendations:

• Records show that the months of September, October, November, and December
account for 50% of the total sales forecast. Many festivities take place in these months,
and many people travel during this time. One of the most popular types of wine used
during festive and event celebrations is sparkling wine.
• Wine sales often climb in the final two months of the year as people hurry to buy holiday
beverages. For forthcoming occasions like Thanksgiving, Christmas, and New Year's,
people typically stock up. The majority of individuals also buy in bulk for holiday
gatherings and gift-giving.

• Many individuals choose wine as their go-to gift when it comes to occasions like parties
and gift-giving. Sales of sparkling wine rise just before the winter holidays as more
collectors purchase these wines as presents or look for vintages to serve at holiday
gatherings.
• The festival seasons may vary depending on geography; however, most of the
celebrations take place in the last four months of the year.
▪ In these months, promotional offers might be implemented to lower costs
and significantly boost revenue.
▪ To increase sales, we must take advantage of all holiday events and set prices
appropriately.
▪ Many individuals order in bulk to prepare for upcoming festivities, which may
result in a high shipping expenditure. Businesses may provide significant
discounts or free shipping beyond a certain threshold at these times.
▪ Giving customers gifts to improve their user experience is one of the greatest
marketing strategies to deploy. In order to attract more consumers and
increase sales, the company might provide free gifts on orders with significant
sales.
▪ To target various client demographics, the proper marketing campaigns must
be run.
▪ Numerous ecommerce campaigns and competitions may be performed to
broaden the product's audience and enhance sales.
• The period from January to June is one of the key challenges for sparkling wine sales.
▪ To identify the elements affecting sales, in-depth market research must be
conducted.
▪ Since sparkling wines are typically used while celebrating, a market-friendly
version of the existing product might be introduced by the company, helping
to make up for the drop in sales. Long-term, this may bring in additional
clients.
• There are other key elements that might be driving the sales, despite the present model's
ability to closely track the historical sales trend.
▪ The forecast might be improved by doing in-depth market research on the
factors that influence sales and incorporating that information into the model
for projection.
