
Shoe Sales

The document is a Jupyter Notebook that analyzes time series data on shoe sales. It loads necessary packages, imports a CSV file containing monthly shoe sales data from 1980-1995, and cleans the data by dropping duplicates. Descriptive statistics are calculated and plots are made to visualize the time series data, including a line plot of the raw data, boxplots of yearly and monthly sales, and a pivot table showing monthly sales across years. The analysis explores the behavior of the shoe sales data over time through statistical measures and data visualization techniques common for time series forecasting.



In [1]:  # loading packages


# basic + dates
import numpy as np
import pandas as pd
from pandas import datetime
from datetime import datetime

# data visualization
import matplotlib.pyplot as plt
import seaborn as sns # advanced vizs
%matplotlib inline

# statistics
from statsmodels.distributions.empirical_distribution import ECDF

# time series analysis
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

import warnings
warnings.filterwarnings("ignore")

<ipython-input-1-c8dcee756ab9>:5: FutureWarning: The pandas.datetime class is deprecated and will be removed from pandas in a future version. Import from datetime module instead.

from pandas import datetime
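The FutureWarning above is raised by the "from pandas import datetime" line, which is deprecated. A minimal sketch of the modern equivalent (assuming only date parsing is needed here) is to keep just the standard-library import:

# deprecated: from pandas import datetime
from datetime import datetime              # standard-library class is sufficient

# hypothetical usage: parse a YearMonth label like the ones in this dataset
example_stamp = datetime.strptime('1980-01', '%Y-%m')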


In [18]:  df = pd.read_csv('Shoe-Sales.csv', parse_dates=True, index_col='YearMonth')


df.head()

Out[18]:
Shoe_Sales

YearMonth

1980-01-01 85

1980-02-01 89

1980-03-01 109

1980-04-01 95

1980-05-01 91

In [217]:  df.head()

Out[217]: Shoe_Sales

YearMonth

1980-01-01 85

1980-02-01 89

1980-03-01 109

1980-04-01 95

1980-05-01 91

In [3]:  df.describe().T

Out[3]:
count mean std min 25% 50% 75% max

Shoe_Sales 187.0 245.636364 121.390804 85.0 143.5 220.0 315.5 662.0

In [4]:  df.shape

Out[4]: (187, 1)


In [219]:  df.index

Out[219]: DatetimeIndex(['1980-01-01', '1980-02-01', '1980-03-01', '1980-04-01',

'1980-05-01', '1980-06-01', '1980-07-01', '1980-08-01',

'1980-09-01', '1980-10-01',

...

'1994-10-01', '1994-11-01', '1994-12-01', '1995-01-01',

'1995-02-01', '1995-03-01', '1995-04-01', '1995-05-01',

'1995-06-01', '1995-07-01'],

dtype='datetime64[ns]', name='YearMonth', length=187, freq=None)

To find the duplicated Data


In [220]:  df.duplicated().sum()

Out[220]: 42

In [6]:  df.drop_duplicates(inplace=True)

In [222]:  df.duplicated().sum()

Out[222]: 0

To find the null values


In [223]:  df.isnull().sum()

Out[223]: Shoe_Sales 0

dtype: int64

In [ ]:  ​


Plot the Time Series to understand the behaviour of the data


In [7]:  df.plot(figsize=(20,8))
plt.grid()

Check the basic measures of descriptive statistics of the Time Series


In [225]:  df.describe()

Out[225]: Shoe_Sales

count 145.000000

mean 258.620690

std 123.428177

min 85.000000

25% 159.000000

50% 242.000000

75% 335.000000

max 662.000000

In [13]:  df.index.month_name()

Out[13]: Index(['January', 'February', 'March', 'April', 'May', 'July', 'August',

'September', 'October', 'November',

...

'August', 'September', 'November', 'December', 'January', 'February',

'March', 'April', 'June', 'July'],

dtype='object', name='YearMonth', length=145)

Plot a boxplot to understand the spread of sales across different years and within different months across years.

Yearly boxplot


In [15]:  _, ax = plt.subplots(figsize=(22,8))
sns.boxplot(x = df.index.year,y = df.values[:,0],ax=ax)
plt.grid()

Monthly boxplot


In [227]:  _, ax = plt.subplots(figsize=(22,8))
sns.boxplot(x = df.index.month_name(),y = df.values[:,0],ax=ax)
plt.grid();

Plot a graph of monthly sales across years.


In [228]:  monthly_sales_across_years = pd.pivot_table(df, values = 'Shoe_Sales', columns = df.index.month_name(), index = df.in


monthly_sales_across_years

Out[228]: YearMonth April August December February January July June March May November October September

YearMonth

1980 95.0 128.0 140.0 89.0 85.0 96.0 NaN 109.0 91.0 178.0 111.0 124.0

1981 NaN NaN 163.0 132.0 150.0 NaN 94.0 155.0 NaN NaN 130.0 123.0

1982 112.0 NaN 214.0 NaN 101.0 153.0 116.0 127.0 108.0 170.0 142.0 NaN

1983 156.0 165.0 197.0 122.0 134.0 NaN 169.0 NaN 145.0 NaN NaN NaN

1984 137.0 171.0 196.0 NaN NaN 136.0 NaN 139.0 NaN 147.0 110.0 NaN

1985 NaN 344.0 523.0 118.0 NaN 281.0 NaN 125.0 120.0 580.0 362.0 366.0

1986 306.0 431.0 579.0 246.0 348.0 358.0 280.0 NaN 279.0 504.0 433.0 448.0

1987 496.0 468.0 662.0 335.0 384.0 NaN 377.0 320.0 NaN 493.0 520.0 428.0

1988 328.0 355.0 454.0 308.0 304.0 483.0 338.0 313.0 354.0 352.0 290.0 439.0

1989 254.0 294.0 545.0 303.0 NaN 379.0 310.0 NaN 309.0 405.0 318.0 356.0

1990 NaN 285.0 471.0 243.0 268.0 302.0 222.0 273.0 236.0 NaN 322.0 NaN

1991 186.0 NaN 378.0 253.0 198.0 228.0 105.0 173.0 185.0 277.0 270.0 189.0

1992 179.0 211.0 347.0 182.0 NaN 250.0 168.0 258.0 NaN 305.0 234.0 260.0

1993 242.0 319.0 NaN 217.0 203.0 252.0 175.0 227.0 NaN 336.0 NaN 202.0

1994 NaN 205.0 394.0 NaN NaN 225.0 NaN 187.0 193.0 275.0 NaN 259.0

1995 195.0 NaN NaN 230.0 159.0 274.0 220.0 188.0 NaN NaN NaN NaN


In [229]:  monthly_sales_across_years.plot(figsize=(20,10))
plt.grid()
plt.legend(loc='best');

ECDF PLOT

In [5]:  ## Plot ECDF: Empirical Cumulative Distribution Function


# ECDF: for each value, the proportion of observations less than or equal to it.
sns.set(style = "ticks")  # apply seaborn's 'ticks' style
c = '#386B7F' # basic color for plots
plt.figure(figsize = (12, 6))

plt.subplot(312)
cdf = ECDF(df['Shoe_Sales'])
plt.plot(cdf.x, cdf.y, label = "statsmodels", color = c)
plt.xlabel('Shoe_Sales');

Plot a time series monthplot to understand the spread of sales across different years and within different months across years.


In [233]:  from statsmodels.graphics.tsaplots import month_plot



month_plot(df['Shoe_Sales'],ylabel='Shoe_Sales')
plt.grid();

Decompose the Time Series


In [ ]:  from statsmodels.tsa.seasonal import seasonal_decompose


In [ ]:  df.count()

Additive Decomposition
In [19]:  decomposition = seasonal_decompose(df,model='additive')
decomposition.plot();

Multiplicative Decomposition


In [235]:  decomposition = seasonal_decompose(df,model='multiplication')


decomposition.plot();

Split the data into train and test and plot the training and test data.
In [21]:  train = df[df.index.year<1991]
test = df[df.index.year>=1991]


In [237]:  train.shape

Out[237]: (132, 1)

In [238]:  test.shape

Out[238]: (55, 1)

In [22]:  train.head()

Out[22]:
Shoe_Sales

YearMonth

1980-01-01 85

1980-02-01 89

1980-03-01 109

1980-04-01 95

1980-05-01 91

In [23]:  test.head()

Out[23]:
Shoe_Sales

YearMonth

1991-01-01 198

1991-02-01 253

1991-03-01 173

1991-04-01 186

1991-05-01 185


In [239]:  train['Shoe_Sales'].plot(figsize=(13,6),fontsize=15)
test['Shoe_Sales'].plot(figsize=(13,6),fontsize=15)
plt.grid()
plt.legend(['Training Data','Test Data'])
plt.show()


In [240]:  df['Shoe_Sales'].plot(figsize=(13,6),fontsize=15)

Out[240]: <AxesSubplot:xlabel='YearMonth'>

Linear Regression


In [29]:  from sklearn.linear_model import LinearRegression


from sklearn import metrics

In [243]:  train_time = [i+1 for i in range(len(train))]


test_time = [i+133 for i in range(len(test))]
print('Training Time instance','\n',train_time)
print('Test Time instance','\n',test_time)

Training Time instance

[1, 2, 3, 4, 5, ..., 129, 130, 131, 132]

Test Time instance

[133, 134, 135, 136, ..., 184, 185, 186, 187]

In [244]:  LinearRegression_train = train.copy()


LinearRegression_test = test.copy()

In [245]:  LinearRegression_train['time'] = train_time


LinearRegression_test['time'] = test_time

In [246]:  lr = LinearRegression()

In [247]:  lr.fit(LinearRegression_train[['time']],LinearRegression_train['Shoe_Sales'].values)

Out[247]: LinearRegression()


In [248]:  train_predictions_model1 = lr.predict(LinearRegression_train[['time']])



test_predictions_model1 = lr.predict(LinearRegression_test[['time']])
LinearRegression_test['RegOnTime'] = test_predictions_model1

plt.figure(figsize=(13,6))
plt.plot( train['Shoe_Sales'], label='Train')
plt.plot(test['Shoe_Sales'], label='Test')
plt.plot(LinearRegression_test['RegOnTime'], label='Regression On Time_Test Data')
plt.legend(loc='best')
plt.grid();


Defining the accuracy metrics.


In [31]:  from sklearn import metrics

def MAPE(y, yhat):
    # note: this sums the absolute errors before dividing by the sum of actuals,
    # i.e. an aggregate percentage error rather than a mean of per-point percentage errors
    y, yhat = np.array(y), np.array(yhat)
    try:
        mape = round(np.sum(np.abs(yhat - y)) / np.sum(y) * 100, 2)
    except:
        print("Observed values are blank")
        mape = np.nan
    return mape

In [250]:  ## Train Data - RMSE


rmse_model1_train = metrics.mean_squared_error(train['Shoe_Sales'],train_predictions_model1,squared=False)
mape_model1_train = MAPE(train['Shoe_Sales'],train_predictions_model1)
print("For RegressionOnTime forecast on the Train Data, RMSE is %3.3f" %(rmse_model1_train))
print("For RegressionOnTime forecast on the Train Data, MAPE is %3.3f" %(mape_model1_train))

## Test Data - RMSE
rmse_model1_test = metrics.mean_squared_error(test['Shoe_Sales'],test_predictions_model1,squared=False)
mape_model1_test = MAPE(test['Shoe_Sales'],test_predictions_model1)
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f" %(rmse_model1_test))
print("For RegressionOnTime forecast on the Test Data, MAPE is %3.3f" %(mape_model1_test))

For RegressionOnTime forecast on the Train Data, RMSE is 97.380

For RegressionOnTime forecast on the Train Data, MAPE is 28.500

For RegressionOnTime forecast on the Test Data, RMSE is 266.276

For RegressionOnTime forecast on the Test Data, MAPE is 110.080

In [251]:  resultsDf = pd.DataFrame({'Test RMSE': [rmse_model1_test],'Test MAPE': [mape_model1_test]},index=['RegressionOnTime']


resultsDf

Out[251]: Test RMSE Test MAPE

RegressionOnTime 266.276472 110.08


NAIVE APPROACH
In [252]:  NaiveModel_train = train.copy()
NaiveModel_test = test.copy()

In [260]:  #Train set


NaiveModel_train['naive'] = np.asarray(train['Shoe_Sales'])[len(np.asarray(train['Shoe_Sales']))-1]
print(NaiveModel_train['naive'].head())

# Test set
NaiveModel_test['naive'] = np.asarray(train['Shoe_Sales'])[len(np.asarray(train['Shoe_Sales']))-1]
NaiveModel_test['naive'].head()

YearMonth

1980-01-01 471

1980-02-01 471

1980-03-01 471

1980-04-01 471

1980-05-01 471

Name: naive, dtype: int64

Out[260]: YearMonth

1991-01-01 471

1991-02-01 471

1991-03-01 471

1991-04-01 471

1991-05-01 471

Name: naive, dtype: int64
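As a side note, the expression above simply picks out the last observation of the training series; assuming the index is already sorted by date, an equivalent and more direct form would be:

# hypothetical simplification of np.asarray(train['Shoe_Sales'])[len(np.asarray(train['Shoe_Sales']))-1]
last_train_value = train['Shoe_Sales'].iloc[-1]   # 471 for this split
NaiveModel_test['naive'] = last_train_value       # flat forecast: repeat the last observed value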


In [262]:  NaiveModel_train['Shoe_Sales']

Out[262]: YearMonth

1980-01-01 85

1980-02-01 89

1980-03-01 109

1980-04-01 95

1980-05-01 91

...

1990-08-01 285

1990-09-01 309

1990-10-01 322

1990-11-01 362

1990-12-01 471

Name: Shoe_Sales, Length: 132, dtype: int64


In [263]:  plt.figure(figsize=(12,8))
plt.plot(NaiveModel_train['Shoe_Sales'], label='Train')
plt.plot(test['Shoe_Sales'], label='Test')
plt.plot(NaiveModel_test['naive'], label='Naive Forecast on Test Data')
plt.legend(loc='best')
plt.title("Naive Forecast")
plt.grid();


MODEL Evaluation
In [265]:  rmse_model2_train = metrics.mean_squared_error(train['Shoe_Sales'],NaiveModel_train['naive'])
mape_model2_train = MAPE(train['Shoe_Sales'],NaiveModel_train['naive'])
print("For RegressionOnTime forecast on the Train Data, RMSE is %3.3f" %(rmse_model2_train))
print("For RegressionOnTime forecast on the Train Data, MAPE is %3.3f" %(mape_model2_train))

## Test Data - RMSE
rmse_model2_test = metrics.mean_squared_error(test['Shoe_Sales'],NaiveModel_test['naive'],squared=False)
mape_model2_test = MAPE(test['Shoe_Sales'],NaiveModel_test['naive'])
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f" %(rmse_model2_test))
print("For RegressionOnTime forecast on the Test Data, MAPE is %3.3f" %(mape_model2_test))

For RegressionOnTime forecast on the Train Data, RMSE is 67679.545

For RegressionOnTime forecast on the Train Data, MAPE is 92.360

For RegressionOnTime forecast on the Test Data, RMSE is 245.121

For RegressionOnTime forecast on the Test Data, MAPE is 101.470


In [266]:  resultsDf_2 = pd.DataFrame({'Test RMSE': [rmse_model2_test],'Test MAPE': [mape_model2_test]},index=['NaiveModel'])



resultsDf = pd.concat([resultsDf, resultsDf_2])
resultsDf

Out[266]: Test RMSE Test MAPE

RegressionOnTime 266.276472 110.08

NaiveModel 245.121306 101.47

Method 3: Simple Average


In [25]:  SimpleAverage_train = train.copy()
SimpleAverage_test = test.copy()

In [35]:  SimpleAverage_test['mean_forecast'] = train['Shoe_Sales'].mean()


SimpleAverage_test.head()

Out[35]:
Shoe_Sales mean_forecast

YearMonth

1991-01-01 198 250.575758

1991-02-01 253 250.575758

1991-03-01 173 250.575758

1991-04-01 186 250.575758

1991-05-01 185 250.575758


In [36]:  SimpleAverage_train['mean_forecast'] = train['Shoe_Sales'].mean()


SimpleAverage_train.head()

Out[36]:
Shoe_Sales mean_forecast

YearMonth

1980-01-01 85 250.575758

1980-02-01 89 250.575758

1980-03-01 109 250.575758

1980-04-01 95 250.575758

1980-05-01 91 250.575758


In [34]:  plt.figure(figsize=(12,8))
plt.plot(SimpleAverage_train['Shoe_Sales'], label='Train')
plt.plot(SimpleAverage_test['Shoe_Sales'], label='Test')
plt.plot(SimpleAverage_test['mean_forecast'], label='Simple Average on Test Data')
plt.legend(loc='best')
plt.title("Simple Average Forecast")
plt.grid();


MODEL EVALUATION
In [32]:  ## Test Data - RMSE

rmse_model3_test = metrics.mean_squared_error(test['Shoe_Sales'],SimpleAverage_test['mean_forecast'],squared=False)
mape_model3_test = MAPE(test['Shoe_Sales'],SimpleAverage_test['mean_forecast'])
print("For Simple Average forecast on the Test Data, RMSE is %3.3f" %(rmse_model3_test))
print("For RegressionOnTime forecast on the Test Data, MAPE is %3.3f" %(mape_model3_test))

For Simple Average forecast on the Test Data, RMSE is 63.985

For RegressionOnTime forecast on the Test Data, MAPE is 21.860


In [37]:  ## TRAIN Data - RMSE



rmse_model3_train = metrics.mean_squared_error(train['Shoe_Sales'],SimpleAverage_train['mean_forecast'],squared=False)
mape_model3_train = MAPE(train['Shoe_Sales'],SimpleAverage_train['mean_forecast'])
print("For Simple Average forecast on the Train Data, RMSE is %3.3f" %(rmse_model3_train))
print("For Simple Average forecast on the Train Data, MAPE is %3.3f" %(mape_model3_train))

For Simple Average forecast on the Train Data, RMSE is 138.176

For Simple Average forecast on the Train Data, MAPE is 47.750

In [271]:  resultsDf_3 = pd.DataFrame({'Test RMSE': [rmse_model3_test],'Test MAPE': [mape_model3_test]},index=['SimpleAverageMod



resultsDf = pd.concat([resultsDf, resultsDf_3])
resultsDf

Out[271]: Test RMSE Test MAPE

RegressionOnTime 266.276472 110.08

NaiveModel 245.121306 101.47

SimpleAverageModel 63.984570 21.86

Method 4: Moving Average(MA)


In [ ]:  MovingAverage = df.copy()
MovingAverage.head()

Trailing Moving Average


In [272]:  MovingAverage['Trailing_2'] = MovingAverage['Shoe_Sales'].rolling(2).mean()


MovingAverage['Trailing_4'] = MovingAverage['Shoe_Sales'].rolling(4).mean()
MovingAverage['Trailing_6'] = MovingAverage['Shoe_Sales'].rolling(6).mean()
MovingAverage['Trailing_9'] = MovingAverage['Shoe_Sales'].rolling(9).mean()

MovingAverage.head()

Out[272]: Shoe_Sales Trailing_2 Trailing_4 Trailing_6 Trailing_9

YearMonth

1980-01-01 85 NaN NaN NaN NaN

1980-02-01 89 87.0 NaN NaN NaN

1980-03-01 109 99.0 NaN NaN NaN

1980-04-01 95 102.0 94.5 NaN NaN

1980-05-01 91 93.0 96.0 NaN NaN


In [273]:  ## Plotting on the whole data



plt.figure(figsize=(16,8))
plt.plot(MovingAverage['Shoe_Sales'], label='Original')
plt.plot(MovingAverage['Trailing_2'], label='2 Point Moving Average')
plt.plot(MovingAverage['Trailing_4'], label='4 Point Moving Average')
plt.plot(MovingAverage['Trailing_6'],label = '6 Point Moving Average')
plt.plot(MovingAverage['Trailing_9'],label = '9 Point Moving Average')

plt.legend(loc = 'best')
plt.grid();


In [276]:  ### Creating train and test set


trailing_MovingAverage_train= df[df.index.year<1991]
trailing_MovingAverage_test=df[df.index.year>=1991]

In [279]:  ​

trailing_MovingAverage_train['Trailing_2'] = MovingAverage['Shoe_Sales'].rolling(2).mean()
trailing_MovingAverage_train['Trailing_4'] = MovingAverage['Shoe_Sales'].rolling(4).mean()
trailing_MovingAverage_train['Trailing_6'] = MovingAverage['Shoe_Sales'].rolling(6).mean()
trailing_MovingAverage_train['Trailing_9'] = MovingAverage['Shoe_Sales'].rolling(9).mean()

trailing_MovingAverage_test['Trailing_2'] = MovingAverage['Shoe_Sales'].rolling(2).mean()
trailing_MovingAverage_test['Trailing_4'] = MovingAverage['Shoe_Sales'].rolling(4).mean()
trailing_MovingAverage_test['Trailing_6'] = MovingAverage['Shoe_Sales'].rolling(6).mean()
trailing_MovingAverage_test['Trailing_9'] = MovingAverage['Shoe_Sales'].rolling(9).mean()


In [280]:  ## Plotting on both the Training and Test data



plt.figure(figsize=(16,8))
plt.plot(trailing_MovingAverage_train['Shoe_Sales'], label='Train')
plt.plot(trailing_MovingAverage_test['Shoe_Sales'], label='Test')

plt.plot(trailing_MovingAverage_train['Trailing_2'], label='2 Point Trailing Moving Average on Training Set')
plt.plot(trailing_MovingAverage_train['Trailing_4'], label='4 Point Trailing Moving Average on Training Set')
plt.plot(trailing_MovingAverage_train['Trailing_6'],label = '6 Point Trailing Moving Average on Training Set')
plt.plot(trailing_MovingAverage_train['Trailing_9'],label = '9 Point Trailing Moving Average on Training Set')

plt.plot(trailing_MovingAverage_test['Trailing_2'], label='2 Point Trailing Moving Average on Test Set')
plt.plot(trailing_MovingAverage_test['Trailing_4'], label='4 Point Trailing Moving Average on Test Set')
plt.plot(trailing_MovingAverage_test['Trailing_6'],label = '6 Point Trailing Moving Average on Test Set')
plt.plot(trailing_MovingAverage_test['Trailing_9'],label = '9 Point Trailing Moving Average on Test Set')
plt.legend(loc = 'best')
plt.grid();


MODEL EVALUATION


In [281]:  ## Test Data - RMSE --> 2 point Trailing MA



rmse_model4_test_2 = metrics.mean_squared_error(test['Shoe_Sales'],trailing_MovingAverage_test['Trailing_2'],squared=False)
mape_model4_test_2 = MAPE(test['Shoe_Sales'],trailing_MovingAverage_test['Trailing_2'])
print("For 2 point Moving Average Model forecast on the Test Data, RMSE is %3.3f" %(rmse_model4_test_2))

## Test Data - RMSE --> 4 point Trailing MA

rmse_model4_test_4 = metrics.mean_squared_error(test['Shoe_Sales'],trailing_MovingAverage_test['Trailing_4'],squared=False)
mape_model4_test_4 = MAPE(test['Shoe_Sales'],trailing_MovingAverage_test['Trailing_4'])
print("For 4 point Moving Average Model forecast on the Test Data, RMSE is %3.3f" %(rmse_model4_test_4))

## Test Data - RMSE --> 6 point Trailing MA

rmse_model4_test_6 = metrics.mean_squared_error(test['Shoe_Sales'],trailing_MovingAverage_test['Trailing_6'],squared=False)
mape_model4_test_6 = MAPE(test['Shoe_Sales'],trailing_MovingAverage_test['Trailing_6'])
print("For 6 point Moving Average Model forecast on the Test Data, RMSE is %3.3f" %(rmse_model4_test_6))

## Test Data - RMSE --> 9 point Trailing MA

rmse_model4_test_9 = metrics.mean_squared_error(test['Shoe_Sales'],trailing_MovingAverage_test['Trailing_9'],squared=False)
mape_model4_test_9 = MAPE(test['Shoe_Sales'],trailing_MovingAverage_test['Trailing_9'])
print("For 9 point Moving Average Model forecast on the Test Data, RMSE is %3.3f " %(rmse_model4_test_9))

For 2 point Moving Average Model forecast on the Test Data, RMSE is 45.949

For 4 point Moving Average Model forecast on the Test Data, RMSE is 57.873

For 6 point Moving Average Model forecast on the Test Data, RMSE is 63.457

For 9 point Moving Average Model forecast on the Test Data, RMSE is 67.724


In [283]:  mape_model4_test_9 = MAPE(test['Shoe_Sales'],trailing_MovingAverage_test['Trailing_9'])


resultsDf_4 = pd.DataFrame({'Test RMSE': [rmse_model4_test_2,rmse_model4_test_4,rmse_model4_test_6,rmse_model4_test_9],
                            'Test MAPE': [mape_model4_test_2,mape_model4_test_4,mape_model4_test_6,mape_model4_test_9]},
                           index=['2pointTrailingMovingAverage','4pointTrailingMovingAverage',
                                  '6pointTrailingMovingAverage','9pointTrailingMovingAverage'])

resultsDf = pd.concat([resultsDf, resultsDf_4])
resultsDf

Out[283]: Test RMSE Test MAPE

RegressionOnTime 266.276472 110.08

NaiveModel 245.121306 101.47

SimpleAverageModel 63.984570 21.86

2pointTrailingMovingAverage 45.948736 14.32

4pointTrailingMovingAverage 57.872686 19.48

6pointTrailingMovingAverage 63.456893 22.38

9pointTrailingMovingAverage 67.723648 23.33

Before we go on to build the various Exponential Smoothing models, let us plot all the models and compare the Time Series plots.


In [284]:  ## Plotting on both Training and Test data



plt.figure(figsize=(30,12))
plt.plot(train['Shoe_Sales'], label='Train')
plt.plot(test['Shoe_Sales'], label='Test')

plt.plot(LinearRegression_test['RegOnTime'], label='Regression On Time_Test Data')

plt.plot(NaiveModel_test['naive'], label='Naive Forecast on Test Data')

plt.plot(SimpleAverage_test['mean_forecast'], label='Simple Average on Test Data')

plt.plot(trailing_MovingAverage_test['Trailing_2'], label='2 Point Trailing Moving Average on Test Set')


plt.legend(loc='best')
plt.title("Model Comparison Plots")
plt.grid();


Method 5: Simple Exponential Smoothing


In [285]:  from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
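Before fitting SimpleExpSmoothing below, it helps to keep the underlying recursion in mind. The sketch that follows is an illustrative re-implementation (not the statsmodels internals): a single parameter alpha blends each new observation into the previous smoothed level, and the forecast stays flat at the last level.

def simple_exp_smoothing(y, alpha, init_level=None):
    # level_t = alpha * y_t + (1 - alpha) * level_(t-1); forecasts repeat the last level
    level = y[0] if init_level is None else init_level
    fitted = []
    for obs in y:
        fitted.append(level)                       # one-step-ahead forecast made before seeing obs
        level = alpha * obs + (1 - alpha) * level
    return fitted, level

# hypothetical usage on the training series
fitted_vals, flat_forecast = simple_exp_smoothing(train['Shoe_Sales'].tolist(), alpha=0.3)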

In [289]:  SES_train = train.copy()


SES_test = test.copy()
SES_train.head()

Out[289]: Shoe_Sales

YearMonth

1980-01-01 85

1980-02-01 89

1980-03-01 109

1980-04-01 95

1980-05-01 91

In [291]:  model_SES = SimpleExpSmoothing(SES_train['Shoe_Sales'])

In [292]:  model_SES_autofit = model_SES.fit(optimized=True)


In [293]:  model_SES_autofit.params

Out[293]: {'smoothing_level': 0.6050489281498507,

'smoothing_trend': nan,

'smoothing_seasonal': nan,

'damping_trend': nan,

'initial_level': 88.82911441760453,

'initial_trend': nan,

'initial_seasons': array([], dtype=float64),

'use_boxcox': False,

'lamda': None,

'remove_bias': False}

In [294]:  SES_test['predict'] = model_SES_autofit.forecast(steps=len(test))


SES_test.head()

Out[294]: Shoe_Sales predict

YearMonth

1991-01-01 198 420.229811

1991-02-01 253 420.229811

1991-03-01 173 420.229811

1991-04-01 186 420.229811

1991-05-01 185 420.229811


In [296]:  ## Plotting on both the Training and Test data



plt.figure(figsize=(16,8))
plt.plot(SES_train['Shoe_Sales'], label='Train')
plt.plot(SES_test['Shoe_Sales'], label='Test')

plt.plot(SES_test['predict'], label='Alpha =0.995 Simple Exponential Smoothing predictions on Test Set')

plt.legend(loc='best')
plt.grid()
plt.title('Alpha =0.995 Predictions');


Model Evaluation for 𝛼 = 0.995 : Simple Exponential Smoothing


In [298]:  ## Test Data

rmse_model5_test_1 = metrics.mean_squared_error(SES_test['Shoe_Sales'],SES_test['predict'],squared=False)
print("For Alpha =0.995 Simple Exponential Smoothing Model forecast on the Test Data, RMSE is %3.3f" %(rmse_model5_te
mape_model5_test_1 = MAPE(SES_test['Shoe_Sales'],SES_test['predict'])

For Alpha =0.995 Simple Exponential Smoothing Model forecast on the Test Data, RMSE is 196.405


In [299]:  resultsDf_5 = pd.DataFrame({'Test RMSE': [rmse_model5_test_1], 'Test MAPE': [mape_model5_test_1]},index=['Alpha=0.995




resultsDf = pd.concat([resultsDf, resultsDf_5])
resultsDf

Out[299]: Test RMSE Test MAPE

RegressionOnTime 266.276472 110.08

NaiveModel 245.121306 101.47

SimpleAverageModel 63.984570 21.86

2pointTrailingMovingAverage 45.948736 14.32

4pointTrailingMovingAverage 57.872686 19.48

6pointTrailingMovingAverage 63.456893 22.38

9pointTrailingMovingAverage 67.723648 23.33

Alpha=0.995,SimpleExponentialSmoothing 196.404793 79.92


In [300]:  ## First we will define an empty dataframe to store our values from the loop

resultsDf_6 = pd.DataFrame({'Alpha Values':[],'Train RMSE':[],'Test RMSE': []})
resultsDf_6

SES_test

Out[300]: Shoe_Sales predict

YearMonth

1991-01-01 198 420.229811

1991-02-01 253 420.229811

1991-03-01 173 420.229811

1991-04-01 186 420.229811

1991-05-01 185 420.229811

1991-06-01 105 420.229811

1991-07-01 228 420.229811

1991-08-01 214 420.229811

1991-09-01 189 420.229811

1991-10-01 270 420.229811

1991-11-01 277 420.229811

1991-12-01 378 420.229811

1992-01-01 185 420.229811

1992-02-01 182 420.229811

1992-03-01 258 420.229811

1992-04-01 179 420.229811

1992-05-01 197 420.229811

1992-06-01 168 420.229811

1992-07-01 250 420.229811

1992-08-01 211 420.229811


1992-09-01 260 420.229811

1992-10-01 234 420.229811

1992-11-01 305 420.229811

1992-12-01 347 420.229811

1993-01-01 203 420.229811

1993-02-01 217 420.229811

1993-03-01 227 420.229811

1993-04-01 242 420.229811

1993-05-01 185 420.229811

1993-06-01 175 420.229811

1993-07-01 252 420.229811

1993-08-01 319 420.229811

1993-09-01 202 420.229811

1993-10-01 254 420.229811

1993-11-01 336 420.229811

1993-12-01 431 420.229811

1994-01-01 150 420.229811

1994-02-01 280 420.229811

1994-03-01 187 420.229811

1994-04-01 279 420.229811

1994-05-01 193 420.229811

1994-06-01 227 420.229811

1994-07-01 225 420.229811

1994-08-01 205 420.229811

1994-09-01 259 420.229811


1994-10-01 254 420.229811

1994-11-01 275 420.229811

1994-12-01 394 420.229811

1995-01-01 159 420.229811

1995-02-01 230 420.229811

1995-03-01 188 420.229811

1995-04-01 195 420.229811

1995-05-01 189 420.229811

1995-06-01 220 420.229811

1995-07-01 274 420.229811

In [301]:  for i in np.arange(0.3,1,0.1):
    model_SES_alpha_i = model_SES.fit(smoothing_level=i,optimized=False,use_brute=True)
    SES_train['predict',i] = model_SES_alpha_i.fittedvalues
    SES_test['predict',i] = model_SES_alpha_i.forecast(steps=146)

    rmse_model5_train_i = metrics.mean_squared_error(SES_train['Shoe_Sales'],SES_train['predict',i],squared=False)
    rmse_model5_test_i = metrics.mean_squared_error(SES_test['Shoe_Sales'],SES_test['predict',i],squared=False)

    resultsDf_6 = resultsDf_6.append({'Alpha Values':i,'Train RMSE':rmse_model5_train_i,
                                      'Test RMSE':rmse_model5_test_i}, ignore_index=True)

Model Evaluation


In [303]:  resultsDf_6.sort_values(by=['Test RMSE'],ascending=True)

Out[303]: Alpha Values Train RMSE Test RMSE

0 0.3 74.555356 143.400350

1 0.4 73.062722 162.553211

2 0.5 72.200617 180.072484

3 0.6 71.902349 195.663327

4 0.7 72.131707 209.658339

5 0.8 72.846955 222.417584

6 0.9 74.023429 234.188166


In [305]:  ## Plotting on both the Training and Test data



plt.figure(figsize=(18,9))
plt.plot(SES_train['Shoe_Sales'], label='Train')
plt.plot(SES_test['Shoe_Sales'], label='Test')

plt.plot(SES_test['predict'], label='Alpha =1 Simple Exponential Smoothing predictions on Test Set')

plt.plot(SES_test['predict', 0.3], label='Alpha =0.3 Simple Exponential Smoothing predictions on Test Set')



plt.legend(loc='best')
plt.grid();


In [306]:  resultsDf_6_1 = pd.DataFrame({'Test RMSE': [resultsDf_6.sort_values(by=['Test RMSE'],ascending=True).values[0][2]]}


,index=['Alpha=0.3,SimpleExponentialSmoothing'])

resultsDf = pd.concat([resultsDf, resultsDf_6_1])
resultsDf

Out[306]: Test RMSE Test MAPE

RegressionOnTime 266.276472 110.08

NaiveModel 245.121306 101.47

SimpleAverageModel 63.984570 21.86

2pointTrailingMovingAverage 45.948736 14.32

4pointTrailingMovingAverage 57.872686 19.48

6pointTrailingMovingAverage 63.456893 22.38

9pointTrailingMovingAverage 67.723648 23.33

Alpha=0.995,SimpleExponentialSmoothing 196.404793 79.92

Alpha=0.3,SimpleExponentialSmoothing 143.400350 NaN

Method 6: Simple Exponential Smoothing (Holt's Model), repeated in error: this block re-runs the SES model above and duplicates its results.
In [307]:  from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt


In [308]:  SES_train = train.copy()


SES_test = test.copy()
SES_train

Out[308]: Shoe_Sales

YearMonth

1980-01-01 85

1980-02-01 89

1980-03-01 109

1980-04-01 95

1980-05-01 91

... ...

1990-08-01 285

1990-09-01 309

1990-10-01 322

1990-11-01 362

1990-12-01 471

132 rows × 1 columns

In [317]:  model_SES = SimpleExpSmoothing(SES_train['Shoe_Sales'])

In [318]:  model_SES_autofit = model_SES.fit(optimized=True)


In [324]:  model_SES_autofit.params

Out[324]: {'smoothing_level': 0.6050489281498507,

'smoothing_trend': nan,

'smoothing_seasonal': nan,

'damping_trend': nan,

'initial_level': 88.82911441760453,

'initial_trend': nan,

'initial_seasons': array([], dtype=float64),

'use_boxcox': False,

'lamda': None,

'remove_bias': False}

In [325]:  SES_test['predict'] = model_SES_autofit.forecast(steps=len(test))


SES_test.head()

Out[325]: Shoe_Sales predict

YearMonth

1991-01-01 198 420.229811

1991-02-01 253 420.229811

1991-03-01 173 420.229811

1991-04-01 186 420.229811

1991-05-01 185 420.229811


In [326]:  ## Plotting on both the Training and Test data



plt.figure(figsize=(16,8))
plt.plot(SES_train['Shoe_Sales'], label='Train')
plt.plot(SES_test['Shoe_Sales'], label='Test')

plt.plot(SES_test['predict'], label='Alpha =0.995 Simple Exponential Smoothing predictions on Test Set')

plt.legend(loc='best')
plt.grid()
plt.title('Alpha =0.995 Predictions');


Model Evaluation for 𝛼 = 0.995 : Simple Exponential Smoothing


In [327]:  ## Test Data

rmse_model5_test_1 = metrics.mean_squared_error(SES_test['Shoe_Sales'],SES_test['predict'],squared=False)
print("For Alpha =0.995 Simple Exponential Smoothing Model forecast on the Test Data, RMSE is %3.3f" %(rmse_model5_te
mape_model5_test_1 = MAPE(SES_test['Shoe_Sales'],SES_test['predict'])

For Alpha =0.995 Simple Exponential Smoothing Model forecast on the Test Data, RMSE is 196.405


In [328]:  resultsDf_5 = pd.DataFrame({'Test RMSE': [rmse_model5_test_1], 'Test MAPE': [mape_model5_test_1]},index=['Alpha=0.995




resultsDf = pd.concat([resultsDf, resultsDf_5])
resultsDf

Out[328]: Test RMSE Test MAPE

RegressionOnTime 266.276472 110.08

NaiveModel 245.121306 101.47

SimpleAverageModel 63.984570 21.86

2pointTrailingMovingAverage 45.948736 14.32

4pointTrailingMovingAverage 57.872686 19.48

6pointTrailingMovingAverage 63.456893 22.38

9pointTrailingMovingAverage 67.723648 23.33

Alpha=0.995,SimpleExponentialSmoothing 196.404793 79.92

Alpha=0.3,SimpleExponentialSmoothing 143.400350 NaN

Alpha=0.995,SimpleExponentialSmoothing 196.404793 79.92


In [330]:  ## First we will define an empty dataframe to store our values from the loop

resultsDf_6 = pd.DataFrame({'Alpha Values':[],'Train RMSE':[],'Test RMSE': []})
resultsDf_6

SES_test

Out[330]: Shoe_Sales predict

YearMonth

1991-01-01 198 420.229811

1991-02-01 253 420.229811

1991-03-01 173 420.229811

1991-04-01 186 420.229811

1991-05-01 185 420.229811

1991-06-01 105 420.229811

1991-07-01 228 420.229811

1991-08-01 214 420.229811

1991-09-01 189 420.229811

1991-10-01 270 420.229811

1991-11-01 277 420.229811

1991-12-01 378 420.229811

1992-01-01 185 420.229811

1992-02-01 182 420.229811

1992-03-01 258 420.229811

1992-04-01 179 420.229811

1992-05-01 197 420.229811

1992-06-01 168 420.229811

1992-07-01 250 420.229811

1992-08-01 211 420.229811


1992-09-01 260 420.229811

1992-10-01 234 420.229811

1992-11-01 305 420.229811

1992-12-01 347 420.229811

1993-01-01 203 420.229811

1993-02-01 217 420.229811

1993-03-01 227 420.229811

1993-04-01 242 420.229811

1993-05-01 185 420.229811

1993-06-01 175 420.229811

1993-07-01 252 420.229811

1993-08-01 319 420.229811

1993-09-01 202 420.229811

1993-10-01 254 420.229811

1993-11-01 336 420.229811

1993-12-01 431 420.229811

1994-01-01 150 420.229811

1994-02-01 280 420.229811

1994-03-01 187 420.229811

1994-04-01 279 420.229811

1994-05-01 193 420.229811

1994-06-01 227 420.229811

1994-07-01 225 420.229811

1994-08-01 205 420.229811

1994-09-01 259 420.229811


1994-10-01 254 420.229811

1994-11-01 275 420.229811

1994-12-01 394 420.229811

1995-01-01 159 420.229811

1995-02-01 230 420.229811

1995-03-01 188 420.229811

1995-04-01 195 420.229811

1995-05-01 189 420.229811

1995-06-01 220 420.229811

1995-07-01 274 420.229811

MODEL EVALUATION
In [331]:  resultsDf_6.sort_values(by=['Test RMSE'],ascending=True)

Out[331]: Alpha Values Train RMSE Test RMSE


In [339]:  ## Plotting on both the Training and Test data



plt.figure(figsize=(18,9))
plt.plot(SES_train['Shoe_Sales'], label='Train')
plt.plot(SES_test['Shoe_Sales'], label='Test')

plt.plot(SES_test['predict'], label='Alpha =1 Simple Exponential Smoothing predictions on Test Set')
plt.plot(SES_test['predict',0.3], label='Alpha =0.3 Simple Exponential Smoothing predictions on Test Set')




plt.legend(loc='best')
plt.grid();


In [ ]:  ​

Method 6: Double Exponential Smoothing (Holt's Model)

Two parameters 𝛼 and 𝛽 are estimated in this model. Level and Trend are accounted for in this model.
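As a rough sketch (illustrative, not the statsmodels internals), Holt's method extends the SES recursion with a separate trend term updated by beta; an h-step-ahead forecast is then the last level plus h times the last trend.

def holt_update(level, trend, obs, alpha, beta):
    # new level blends the observation with the trend-adjusted previous level;
    # new trend blends the latest level change with the previous trend
    new_level = alpha * obs + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    return new_level, new_trend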
In [340]:  DES_train = train.copy()
DES_test = test.copy()

In [342]:  model_DES = Holt(DES_train['Shoe_Sales'])

In [343]:  ## First we will define an empty dataframe to store our values from the loop

resultsDf_7 = pd.DataFrame({'Alpha Values':[],'Beta Values':[],'Train RMSE':[],'Test RMSE': []})
resultsDf_7

Out[343]: Alpha Values Beta Values Train RMSE Test RMSE


In [346]:  for i in np.arange(0.3,1.1,0.1):
    for j in np.arange(0.3,1.1,0.1):
        model_DES_alpha_i_j = model_DES.fit(smoothing_level=i,smoothing_slope=j,optimized=False,use_brute=True)
        DES_train['predict',i,j] = model_DES_alpha_i_j.fittedvalues
        DES_test['predict',i,j] = model_DES_alpha_i_j.forecast(steps=146)

        rmse_model6_train = metrics.mean_squared_error(DES_train['Shoe_Sales'],DES_train['predict',i,j],squared=False)
        rmse_model6_test = metrics.mean_squared_error(DES_test['Shoe_Sales'],DES_test['predict',i,j],squared=False)

        resultsDf_7 = resultsDf_7.append({'Alpha Values':i,'Beta Values':j,'Train RMSE':rmse_model6_train,
                                          'Test RMSE':rmse_model6_test}, ignore_index=True)

In [347]:  resultsDf_7

Out[347]: Alpha Values Beta Values Train RMSE Test RMSE

0 0.3 0.3 84.736667 890.968504

1 0.3 0.4 88.649551 1270.606989

2 0.3 0.5 92.421849 1638.027559

3 0.3 0.6 95.729786 1938.967123

4 0.3 0.7 98.137776 2121.073309

... ... ... ... ...

123 1.0 0.6 97.917760 2694.069323

124 1.0 0.7 102.519502 2972.867602

125 1.0 0.8 107.552006 3235.495464

126 1.0 0.9 113.113025 3482.173468

127 1.0 1.0 119.350991 3712.530034

128 rows × 4 columns


In [348]:  resultsDf_7.sort_values(by=['Test RMSE']).head()

Out[348]: Alpha Values Beta Values Train RMSE Test RMSE

0 0.3 0.3 84.736667 890.968504

64 0.3 0.3 84.736667 890.968504

8 0.4 0.3 82.660727 1132.467007

72 0.4 0.3 82.660727 1132.467007

16 0.5 0.3 80.640171 1264.035724


In [349]:  ## Plotting on both the Training and Test data



plt.figure(figsize=(18,9))
plt.plot(DES_train['Shoe_Sales'], label='Train')
plt.plot(DES_test['Shoe_Sales'], label='Test')

plt.plot(DES_test['predict', 0.3, 0.3], label='Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing predictions on Test Set')


plt.legend(loc='best')
plt.grid();


In [350]:  resultsDf_7_1 = pd.DataFrame({'Test RMSE': [resultsDf_7.sort_values(by=['Test RMSE']).values[0][3]]}


,index=['Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing'])

resultsDf = pd.concat([resultsDf, resultsDf_7_1])
resultsDf

Out[350]: Test RMSE Test MAPE

RegressionOnTime 266.276472 110.08

NaiveModel 245.121306 101.47

SimpleAverageModel 63.984570 21.86

2pointTrailingMovingAverage 45.948736 14.32

4pointTrailingMovingAverage 57.872686 19.48

6pointTrailingMovingAverage 63.456893 22.38

9pointTrailingMovingAverage 67.723648 23.33

Alpha=0.995,SimpleExponentialSmoothing 196.404793 79.92

Alpha=0.3,SimpleExponentialSmoothing 143.400350 NaN

Alpha=0.995,SimpleExponentialSmoothing 196.404793 79.92

Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 890.968504 NaN

Method 7: Triple Exponential Smoothing (Holt - Winter's Model)

Three parameters 𝛼, 𝛽 and 𝛾 are estimated in this model. Level, Trend and Seasonality are accounted for in this model.
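The cells below fix trend='additive' and seasonal='multiplicative'. That choice is not obvious a priori, so a small hedged sketch of how the four trend/seasonal combinations could be compared on test RMSE is shown here (the loop and the explicit seasonal_periods=12 for monthly data are assumptions, not part of the original notebook):

import itertools

for trend_type, seasonal_type in itertools.product(['add', 'mul'], ['add', 'mul']):
    hw_fit = ExponentialSmoothing(train['Shoe_Sales'], trend=trend_type,
                                  seasonal=seasonal_type, seasonal_periods=12).fit()
    hw_pred = hw_fit.forecast(steps=len(test))
    hw_rmse = metrics.mean_squared_error(test['Shoe_Sales'], hw_pred, squared=False)
    print(trend_type, seasonal_type, round(hw_rmse, 3))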
In [356]:  TES_train = train.copy()
TES_test = test.copy()


In [363]:  TES_train.head()

Out[363]: Shoe_Sales

YearMonth

1980-01-01 85

1980-02-01 89

1980-03-01 109

1980-04-01 95

1980-05-01 91

In [374]:  model_TES = ExponentialSmoothing(TES_train['Shoe_Sales'],trend='additive',seasonal='multiplicative')

In [375]:  model_TES_autofit = model_TES.fit()

In [376]:  model_TES_autofit.params

Out[376]: {'smoothing_level': 0.5373361999081212,

'smoothing_trend': 0.00014086793240142598,

'smoothing_seasonal': 0.24246164909614723,

'damping_trend': nan,

'initial_level': 210.39351963064775,

'initial_trend': 0.2802724231796019,

'initial_seasons': array([0.51514538, 0.48033013, 0.60687238, 0.66697524, 0.5774032 ,

0.5200335 , 0.55827895, 0.75780619, 0.82782578, 0.6973361 ,

0.85449496, 0.86086391]),

'use_boxcox': False,

'lamda': None,

'remove_bias': False}


In [377]:  ## Prediction on the test data



TES_test['auto_predict'] = model_TES_autofit.forecast(steps=len(TES_test))
TES_test.head()

Out[377]: Shoe_Sales auto_predict

YearMonth

1991-01-01 198 260.606576

1991-02-01 253 244.323204

1991-03-01 173 259.928407

1991-04-01 186 269.997461

1991-05-01 185 267.662272


In [379]:  ## Plotting on both the Training and Test using autofit



plt.figure(figsize=(18,9))
plt.plot(TES_train['Shoe_Sales'], label='Train')
plt.plot(TES_test['Shoe_Sales'], label='Test')

plt.plot(TES_test['auto_predict'], label='Alpha=0.676,Beta=0.088,Gamma=0.323,TripleExponentialSmoothing predictions on Test Set')


plt.legend(loc='best')
plt.grid();


In [381]:  ## Test Data



rmse_model6_test_1 = metrics.mean_squared_error(TES_test['Shoe_Sales'],TES_test['auto_predict'],squared=False)
print("For Alpha=0.676,Beta=0.088,Gamma=0.323, Triple Exponential Smoothing Model forecast on the Test Data, RMSE is

For Alpha=0.676,Beta=0.088,Gamma=0.323, Triple Exponential Smoothing Model forecast on the Test Data, RMSE is 84.9
14


In [382]:  resultsDf_8_1 = pd.DataFrame({'Test RMSE': [rmse_model6_test_1]}


,index=['Alpha=0.676,Beta=0.088,Gamma=0.323,TripleExponentialSmoothing'])

resultsDf = pd.concat([resultsDf, resultsDf_8_1])
resultsDf

Out[382]: Test RMSE Test MAPE

RegressionOnTime 266.276472 110.08

NaiveModel 245.121306 101.47

SimpleAverageModel 63.984570 21.86

2pointTrailingMovingAverage 45.948736 14.32

4pointTrailingMovingAverage 57.872686 19.48

6pointTrailingMovingAverage 63.456893 22.38

9pointTrailingMovingAverage 67.723648 23.33

Alpha=0.995,SimpleExponentialSmoothing 196.404793 79.92

Alpha=0.3,SimpleExponentialSmoothing 143.400350 NaN

Alpha=0.995,SimpleExponentialSmoothing 196.404793 79.92

Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 890.968504 NaN

Alpha=0.676,Beta=0.088,Gamma=0.323,TripleExponentialSmoothing 84.913778 NaN

Section 2

In [38]:  from statsmodels.tsa.stattools import adfuller

The hypothesis in a simple form for the ADF test is:

H0 : The Time Series has a unit root and is thus non-stationary.


H1 : The Time Series does not have a unit root and is thus stationary.
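In practice the decision rule is to compare the reported p-value with a chosen significance level; the helper below is a hedged convenience wrapper (the 0.05 threshold and the 'ct' regression default are assumptions, mirroring the calls used later in this notebook):

def is_stationary(series, alpha=0.05, regression='ct'):
    # reject H0 (unit root / non-stationary) when the ADF p-value falls below alpha
    stat, pvalue = adfuller(series, regression=regression)[:2]
    return pvalue < alpha, stat, pvalue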


In [384]:  dftest = adfuller(df,regression='ct')


print('DF test statistic is %3.3f' %dftest[0])
print('DF test p-value is' ,dftest[1])
print('Number of lags used' ,dftest[2])

DF test statistic is -1.577

DF test p-value is 0.8014186234536552

Number of lags used 13

In [42]:  dftest = adfuller(df.diff().dropna(),regression='ct')


print('DF test statistic is %3.3f' %dftest[0])
print('DF test p-value is' ,dftest[1])
print('Number of lags used' ,dftest[2])

DF test statistic is -3.532

DF test p-value is 0.036117034001363256

Number of lags used 12

In [43]:  def test_stationarity(timeseries):


rolmean = timeseries.rolling(window=7).mean() #determining the rolling mean
rolstd = timeseries.rolling(window=7).std() #determining the rolling standard deviation

#Plot rolling statistics:
orig = plt.plot(timeseries, color='blue',label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label = 'Rolling Std')
plt.legend(loc='best')
plt.title('Rolling Mean & Standard Deviation')
plt.show(block=False)

#Perform Dickey-Fuller test:


print ('Results of Dickey-Fuller Test:')
dftest = adfuller(timeseries, autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
dfoutput['Critical Value (%s)'%key] = value
print (dfoutput,'\n')


In [45]:  test_stationarity(train['Shoe_Sales'])

Results of Dickey-Fuller Test:

Test Statistic -1.361129

p-value 0.600763

#Lags Used 13.000000

Number of Observations Used 118.000000

Critical Value (1%) -3.487022

Critical Value (5%) -2.886363

Critical Value (10%) -2.580009

dtype: float64


In [46]:  test_stationarity(test['Shoe_Sales'])

Results of Dickey-Fuller Test:

Test Statistic -1.414290

p-value 0.575407

#Lags Used 11.000000

Number of Observations Used 43.000000

Critical Value (1%) -3.592504

Critical Value (5%) -2.931550

Critical Value (10%) -2.604066

dtype: float64


In [386]:  df.diff().dropna().plot(grid=True);

Plot the Autocorrelation and the Partial Autocorrelation function plots on the whole data.
In [387]:  from statsmodels.graphics.tsaplots import plot_acf, plot_pacf


In [388]:  plot_acf(df,alpha=0.05);


In [389]:  plot_pacf(df,zero=False,alpha=0.05);


In [390]:  plot_pacf(df,zero=False,alpha=0.05,method='ywmle');

In [392]:  df.index.year.unique()

Out[392]: Int64Index([1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990,

1991, 1992, 1993, 1994, 1995],

dtype='int64', name='YearMonth')


In [394]:  # Train Data


train.head()

Out[394]: Shoe_Sales

YearMonth

1980-01-01 85

1980-02-01 89

1980-03-01 109

1980-04-01 95

1980-05-01 91

In [397]:  # Test Data


test.head()

Out[397]: Shoe_Sales

YearMonth

1991-01-01 198

1991-02-01 253

1991-03-01 173

1991-04-01 186

1991-05-01 185

Check for stationarity of the Training Data Time Series


In [400]:  train.plot(grid=True,figsize=(20,8));

In [401]:  dftest = adfuller(train,regression='ct')


print('DF test statistic is %3.3f' %dftest[0])
print('DF test p-value is' ,dftest[1])
print('Number of lags used' ,dftest[2])

DF test statistic is -1.749

DF test p-value is 0.7287654522797321

Number of lags used 13

Build an Automated version of an ARIMA model for which the best parameters are selected in accordance with the lowest Akaike Information Criteria (AIC).
In [49]:  ## The following loop gives us combinations of the p and q parameters in the range 0 to 3.
## We have kept the value of d as 1 as we need to take a difference of the series to make it stationary.

import itertools
p = q = range(0, 4)
d = range(1, 2)
pdq = list(itertools.product(p, d, q))
print('Examples of the parameter combinations for the Model')
for i in range(0, len(pdq)):
    print('Model: {}'.format(pdq[i]))

Examples of the parameter combinations for the Model

Model: (0, 1, 0)

Model: (0, 1, 1)

Model: (0, 1, 2)

Model: (0, 1, 3)

Model: (1, 1, 0)

Model: (1, 1, 1)

Model: (1, 1, 2)

Model: (1, 1, 3)

Model: (2, 1, 0)

Model: (2, 1, 1)

Model: (2, 1, 2)

Model: (2, 1, 3)

Model: (3, 1, 0)

Model: (3, 1, 1)

Model: (3, 1, 2)

Model: (3, 1, 3)

In [50]:  # Creating an empty Dataframe with column names only


ARIMA_AIC = pd.DataFrame(columns=['param', 'AIC'])
ARIMA_AIC

Out[50]:
param AIC


In [51]:  from statsmodels.tsa.arima.model import ARIMA

for param in pdq:  # running a loop over the pdq parameter combinations defined by itertools
    ARIMA_model = ARIMA(train['Shoe_Sales'].values, order=param).fit()  # fitting the ARIMA model
    # using the parameters from the loop
    print('ARIMA{} - AIC:{}'.format(param, ARIMA_model.aic))  # printing the parameters and the AIC
    # from the fitted models
    ARIMA_AIC = ARIMA_AIC.append({'param': param, 'AIC': ARIMA_model.aic}, ignore_index=True)
    # appending the AIC values and the model parameters to the previously created data frame
    # for easier understanding and sorting of the AIC values

ARIMA(0, 1, 0) - AIC:1508.2837722095962

ARIMA(0, 1, 1) - AIC:1497.050322418796

ARIMA(0, 1, 2) - AIC:1494.964605366341

ARIMA(0, 1, 3) - AIC:1495.1484738738857

ARIMA(1, 1, 0) - AIC:1501.6431242011804

ARIMA(1, 1, 1) - AIC:1492.4871865078985

ARIMA(1, 1, 2) - AIC:1494.423859457242

ARIMA(1, 1, 3) - AIC:1496.385878255936

ARIMA(2, 1, 0) - AIC:1498.9504830259475

ARIMA(2, 1, 1) - AIC:1494.4314983035842

ARIMA(2, 1, 2) - AIC:1496.4107391804341

ARIMA(2, 1, 3) - AIC:1480.8640496603166

ARIMA(3, 1, 0) - AIC:1498.9303094231939

ARIMA(3, 1, 1) - AIC:1496.3468641049035

ARIMA(3, 1, 2) - AIC:1495.6558545472712

ARIMA(3, 1, 3) - AIC:1482.5631378700123


In [406]:  ## Sort the above AIC values in the ascending order to get the parameters for the minimum AIC value

ARIMA_AIC.sort_values(by='AIC',ascending=True).head()

Out[406]: param AIC

11 (2, 1, 3) 1480.864050

15 (3, 1, 3) 1482.563138

5 (1, 1, 1) 1492.487187

6 (1, 1, 2) 1494.423859

9 (2, 1, 1) 1494.431498


In [52]:  auto_ARIMA = ARIMA(train, order=(2,1,3))



results_auto_ARIMA = auto_ARIMA.fit()

print(results_auto_ARIMA.summary())

SARIMAX Results

==============================================================================

Dep. Variable: Shoe_Sales No. Observations: 132

Model: ARIMA(2, 1, 3) Log Likelihood -734.432

Date: Sat, 05 Mar 2022 AIC 1480.864

Time: 21:50:41 BIC 1498.115

Sample: 01-01-1980 HQIC 1487.874

- 12-01-1990

Covariance Type: opg

==============================================================================

coef std err z P>|z| [0.025 0.975]

------------------------------------------------------------------------------

ar.L1 0.0114 0.028 0.401 0.688 -0.044 0.067

ar.L2 -0.9971 0.018 -56.143 0.000 -1.032 -0.962

ma.L1 -0.3300 0.089 -3.724 0.000 -0.504 -0.156

ma.L2 0.9785 0.086 11.344 0.000 0.809 1.148

ma.L3 -0.2831 0.081 -3.507 0.000 -0.441 -0.125

sigma2 4235.6965 527.265 8.033 0.000 3202.276 5269.117

===================================================================================

Ljung-Box (L1) (Q): 0.05 Jarque-Bera (JB): 41.97

Prob(Q): 0.82 Prob(JB): 0.00

Heteroskedasticity (H): 13.39 Skew: -0.60

Prob(H) (two-sided): 0.00 Kurtosis: 5.50

===================================================================================

Warnings:

[1] Covariance matrix calculated using the outer product of gradients (complex-step).


In [416]:  results_auto_ARIMA.plot_diagnostics(figsize=(15,5));

Predict on the Test Set using this model and evaluate the model.
In [54]:  predicted_auto_ARIMA = results_auto_ARIMA.forecast(steps=len(test))


In [55]:  ## Mean Absolute Percentage Error (MAPE) - Function Definition



def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean((np.abs(y_true - y_pred)) / (y_true)) * 100

## Importing the mean_squared_error function from sklearn to calculate the RMSE

from sklearn.metrics import mean_squared_error
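As a tiny sanity check of these two helpers (toy numbers, purely illustrative and not taken from the notebook; it assumes the mean_absolute_percentage_error function defined in the cell above):

# Sketch: RMSE and MAPE on a small made-up example.
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])
print('RMSE:', mean_squared_error(y_true, y_pred, squared=False))   # about 19.15
print('MAPE:', mean_absolute_percentage_error(y_true, y_pred))      # about 8.33 (in percent)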

In [56]:  rmse = mean_squared_error(test['Shoe_Sales'],predicted_auto_ARIMA,squared=False)


mape = mean_absolute_percentage_error(test['Shoe_Sales'],predicted_auto_ARIMA)
print('RMSE:',rmse,'\nMAPE:',mape)

RMSE: 183.8987336906776

MAPE: 85.36879105091518

In [57]:  resultsDf = pd.DataFrame({'RMSE': rmse,'MAPE':mape}


,index=['ARIMA(2,1,3)'])

resultsDf

Out[57]:
RMSE MAPE

ARIMA(2,1,3) 183.898734 85.368791

Build a version of the ARIMA model for which the best parameters are
selected by looking at the ACF and the PACF plots.


In [421]:  plot_acf(train.diff(),title='Training Data Autocorrelation',missing='drop')


plot_pacf(train.diff().dropna(),title='Training Data Partial Autocorrelation',zero=False,method='ywmle')
plt.show()


Here, we have taken alpha=0.05.

The Auto-Regressive parameter 'p' of an ARIMA model comes from the significant lag after which the PACF plot cuts off, and the Moving-Average parameter 'q' comes from the significant lag after which the ACF plot cuts off. Reading the training-data plots above (and keeping d=1 for the differencing already applied), we fit ARIMA(2,1,1) below.
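As a supporting sketch (an illustration added here, not part of the original notebook: it assumes the train data frame from above and the same 5% level), the significant lags can also be listed numerically from the confidence intervals that the plots shade:

# Sketch: list the lags whose ACF/PACF confidence interval excludes zero.
from statsmodels.tsa.stattools import acf, pacf

diffed = train['Shoe_Sales'].diff().dropna()
acf_vals, acf_ci = acf(diffed, nlags=20, alpha=0.05, fft=False)
pacf_vals, pacf_ci = pacf(diffed, nlags=20, alpha=0.05, method='ywmle')

# a lag is treated as significant when its interval does not contain zero
sig_acf = [lag for lag in range(1, 21) if not (acf_ci[lag][0] <= 0 <= acf_ci[lag][1])]
sig_pacf = [lag for lag in range(1, 21) if not (pacf_ci[lag][0] <= 0 <= pacf_ci[lag][1])]
print('Significant ACF lags :', sig_acf)
print('Significant PACF lags:', sig_pacf)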


In [63]:  manual_ARIMA = ARIMA(train['Shoe_Sales'], order=(2,1,1))



results_manual_ARIMA = manual_ARIMA.fit()

print(results_manual_ARIMA.summary())

SARIMAX Results

==============================================================================

Dep. Variable: Shoe_Sales No. Observations: 132

Model: ARIMA(2, 1, 1) Log Likelihood -743.216

Date: Sat, 05 Mar 2022 AIC 1494.431

Time: 23:31:42 BIC 1505.932

Sample: 01-01-1980 HQIC 1499.105

- 12-01-1990

Covariance Type: opg

==============================================================================

coef std err z P>|z| [0.025 0.975]

------------------------------------------------------------------------------

ar.L1 0.4698 0.112 4.212 0.000 0.251 0.688

ar.L2 0.0234 0.111 0.211 0.833 -0.194 0.241

ma.L1 -0.8430 0.089 -9.497 0.000 -1.017 -0.669

sigma2 4942.4320 429.525 11.507 0.000 4100.578 5784.286

===================================================================================

Ljung-Box (L1) (Q): 0.02 Jarque-Bera (JB): 54.91

Prob(Q): 0.90 Prob(JB): 0.00

Heteroskedasticity (H): 12.71 Skew: 0.00

Prob(H) (two-sided): 0.00 Kurtosis: 6.17

===================================================================================

Warnings:

[1] Covariance matrix calculated using the outer product of gradients (complex-step).

localhost:8888/notebooks/time_series_shoesales.ipynb#Evaluate-the-model-on-the-whole-and-predict-17-months-into-the-future-(till-the-end-of-next-year). 85/105
07/03/2022, 12:19 time_series_shoesales - Jupyter Notebook

In [64]:  results_manual_ARIMA.plot_diagnostics(figsize=(15,5));

Predict on the Test Set using this model and evaluate the model.

In [65]:  predicted_manual_ARIMA = results_manual_ARIMA.forecast(steps=len(test))


In [66]:  rmse = mean_squared_error(test['Shoe_Sales'],predicted_manual_ARIMA,squared=False)


mape = mean_absolute_percentage_error(test['Shoe_Sales'],predicted_manual_ARIMA)
print('RMSE:',rmse,'\nMAPE:',mape)

RMSE: 143.25358508400214

MAPE: 66.47287651689274

In [69]:  temp_resultsDf = pd.DataFrame({'RMSE': rmse,'MAPE':mape}


,index=['ARIMA(2,1,1)'])


resultsDf = pd.concat([resultsDf,temp_resultsDf])

resultsDf

Out[69]:
RMSE MAPE

ARIMA(2,1,3) 183.898734 85.368791

ARIMA(6,1,6) 143.373555 66.528847

ARIMA(6,1,6) 143.253585 66.472877

ARIMA(6,1,6) 143.253585 66.472877

ARIMA(2,1,1) 143.253585 66.472877

Build an Automated version of a SARIMA model for which the best parameters are selected in accordance
with the lowest Akaike Information Criteria (AIC).

Let us look at the ACF plot once more to understand the seasonal parameter for the SARIMA model.


In [433]:  plot_acf(train.diff(),title='Training Data Autocorrelation',missing='drop');


In [434]:  import itertools


p = q = range(0, 4)
d= range(1,2)
D = range(0,1)
pdq = list(itertools.product(p, d, q))
PDQ = [(x[0], x[1], x[2], 6) for x in list(itertools.product(p, D, q))]
print('Examples of the parameter combinations for the Model are')
for i in range(1, len(pdq)):
    print('Model: {}{}'.format(pdq[i], PDQ[i]))

Examples of the parameter combinations for the Model are

Model: (0, 1, 1)(0, 0, 1, 6)

Model: (0, 1, 2)(0, 0, 2, 6)

Model: (0, 1, 3)(0, 0, 3, 6)

Model: (1, 1, 0)(1, 0, 0, 6)

Model: (1, 1, 1)(1, 0, 1, 6)

Model: (1, 1, 2)(1, 0, 2, 6)

Model: (1, 1, 3)(1, 0, 3, 6)

Model: (2, 1, 0)(2, 0, 0, 6)

Model: (2, 1, 1)(2, 0, 1, 6)

Model: (2, 1, 2)(2, 0, 2, 6)

Model: (2, 1, 3)(2, 0, 3, 6)

Model: (3, 1, 0)(3, 0, 0, 6)

Model: (3, 1, 1)(3, 0, 1, 6)

Model: (3, 1, 2)(3, 0, 2, 6)

Model: (3, 1, 3)(3, 0, 3, 6)

In [435]:  SARIMA_AIC = pd.DataFrame(columns=['param','seasonal', 'AIC'])


SARIMA_AIC

Out[435]: param seasonal AIC


In [437]:  import statsmodels.api as sm

for param in pdq:
    for param_seasonal in PDQ:
        SARIMA_model = sm.tsa.statespace.SARIMAX(train['Shoe_Sales'].values,
                                                 order=param,
                                                 seasonal_order=param_seasonal,
                                                 enforce_stationarity=False,
                                                 enforce_invertibility=False)

        results_SARIMA = SARIMA_model.fit(maxiter=1000)
        print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, results_SARIMA.aic))
        SARIMA_AIC = SARIMA_AIC.append({'param': param, 'seasonal': param_seasonal, 'AIC': results_SARIMA.aic}, ignore_index=True)

SARIMA(3, 1, 0)x(2, 0, 2, 6) - AIC:1281.4835250356737

SARIMA(3, 1, 0)x(2, 0, 3, 6) - AIC:1241.9175308538108

SARIMA(3, 1, 0)x(3, 0, 0, 6) - AIC:1222.470536618146

SARIMA(3, 1, 0)x(3, 0, 1, 6) - AIC:1223.926701099537

SARIMA(3, 1, 0)x(3, 0, 2, 6) - AIC:1222.9724241984472

SARIMA(3, 1, 0)x(3, 0, 3, 6) - AIC:1223.9056176953923

SARIMA(3, 1, 1)x(0, 0, 0, 6) - AIC:1465.3971811551432

SARIMA(3, 1, 1)x(0, 0, 1, 6) - AIC:1406.3757978218312

SARIMA(3, 1, 1)x(0, 0, 2, 6) - AIC:1314.3734659122551

SARIMA(3, 1, 1)x(0, 0, 3, 6) - AIC:1254.3621621706939

SARIMA(3, 1, 1)x(1, 0, 0, 6) - AIC:1384.2293991936115

SARIMA(3, 1, 1)x(1, 0, 1, 6) - AIC:1355.3246664658004

SARIMA(3, 1, 1)x(1, 0, 2, 6) - AIC:1300.4358512274525

SARIMA(3, 1, 1)x(1, 0, 3, 6) - AIC:1233.6757446742633

SARIMA(3, 1, 1)x(2, 0, 0, 6) - AIC:1280.9990164201217

SARIMA(3, 1, 1)x(2, 0, 1, 6) - AIC:1281.3914566113174

SARIMA(3, 1, 1)x(2, 0, 2, 6) - AIC:1280.4546291216839

SARIMA(3, 1, 1)x(2, 0, 3, 6) - AIC:1230.4300973511295

SARIMA(3, 1, 1)x(3, 0, 0, 6) - AIC:1221.9018536832566

SARIMA(3, 1, 1)x(3, 0, 1, 6) - AIC:1221.2583355494353


In [438]:  SARIMA_AIC.sort_values(by=['AIC']).head()

Out[438]: param seasonal AIC

59 (0, 1, 3) (2, 0, 3, 6) 1208.149006

123 (1, 1, 3) (2, 0, 3, 6) 1209.791558

63 (0, 1, 3) (3, 0, 3, 6) 1210.147616

55 (0, 1, 3) (1, 0, 3, 6) 1211.609239

187 (2, 1, 3) (2, 0, 3, 6) 1211.763032


In [71]:  import statsmodels.api as sm



auto_SARIMA = sm.tsa.statespace.SARIMAX(train['Shoe_Sales'],
order=(1, 1, 3),
seasonal_order=(2, 0, 3, 6),
enforce_stationarity=False,
enforce_invertibility=False)
results_auto_SARIMA = auto_SARIMA.fit(maxiter=1000)
print(results_auto_SARIMA.summary())

SARIMAX Results

=========================================================================================

Dep. Variable: Shoe_Sales No. Observations: 132

Model: SARIMAX(1, 1, 3)x(2, 0, 3, 6) Log Likelihood -594.896

Date: Sat, 05 Mar 2022 AIC 1209.792

Time: 23:58:34 BIC 1236.705

Sample: 01-01-1980 HQIC 1220.706

- 12-01-1990

Covariance Type: opg

==============================================================================

coef std err z P>|z| [0.025 0.975]

------------------------------------------------------------------------------

ar.L1 0.2295 0.334 0.687 0.492 -0.425 0.885

ma.L1 -0.6394 0.334 -1.913 0.056 -1.294 0.016

ma.L2 0.1784 0.158 1.129 0.259 -0.131 0.488

ma.L3 -0.2425 0.093 -2.600 0.009 -0.425 -0.060

ar.S.L6 -0.4708 0.249 -1.891 0.059 -0.959 0.017

ar.S.L12 0.5742 0.236 2.433 0.015 0.112 1.037

ma.S.L6 0.4381 0.255 1.717 0.086 -0.062 0.938

ma.S.L12 -0.1000 0.248 -0.402 0.687 -0.587 0.387

ma.S.L18 0.2456 0.159 1.540 0.124 -0.067 0.558

sigma2 3088.8789 470.162 6.570 0.000 2167.379 4010.379

===================================================================================

Ljung-Box (L1) (Q): 0.00 Jarque-Bera (JB): 22.64

Prob(Q): 1.00 Prob(JB): 0.00

Heteroskedasticity (H): 7.91 Skew: 0.27

Prob(H) (two-sided): 0.00 Kurtosis: 5.17

===================================================================================

Warnings:

[1] Covariance matrix calculated using the outer product of gradients (complex-step).


In [72]:  results_auto_SARIMA.plot_diagnostics(figsize=(15,5));

Predict on the Test Set using this model and evaluate the model.

In [74]:  predicted_auto_SARIMA = results_auto_SARIMA.get_forecast(steps=len(test))


In [75]:  predicted_auto_SARIMA.summary_frame(alpha=0.05).head()

Out[75]:
Shoe_Sales mean mean_se mean_ci_lower mean_ci_upper

1991-01-01 232.430212 55.606758 123.442968 341.417455

1991-02-01 224.682589 64.567550 98.132516 351.232662

1991-03-01 232.924037 74.668261 86.576934 379.271140

1991-04-01 260.924921 78.767957 106.542562 415.307280

1991-05-01 219.913153 81.849201 59.491668 380.334638

In [76]:  rmse = mean_squared_error(test['Shoe_Sales'],predicted_auto_SARIMA.predicted_mean,squared=False)


mape = mean_absolute_percentage_error(test['Shoe_Sales'],predicted_auto_SARIMA.predicted_mean)
print('RMSE:',rmse,'\nMAPE:',mape)

RMSE: 60.71587632681772

MAPE: 23.58195186032845

In [77]:  temp_resultsDf = pd.DataFrame({'RMSE': rmse,'MAPE':mape}


,index=['SARIMA(1,1,3)(2,0,3,6)'])


resultsDf = pd.concat([resultsDf,temp_resultsDf])

resultsDf

Out[77]:
RMSE MAPE

ARIMA(2,1,3) 183.898734 85.368791

ARIMA(6,1,6) 143.373555 66.528847

ARIMA(6,1,6) 143.253585 66.472877

ARIMA(6,1,6) 143.253585 66.472877

ARIMA(2,1,1) 143.253585 66.472877

SARIMA(1,1,3)(2,0,3,6) 60.715876 23.581952


Build a version of the SARIMA model for which the best parameters are selected by looking at the ACF and
the PACF plots. - Seasonality at 6.

Let us look at the ACF and the PACF plots once more.


In [450]:  plot_acf(train.diff(),title='Training Data Autocorrelation',missing='drop')


plot_pacf(train.diff().dropna(),title='Training Data Partial Autocorrelation',zero=False,method='ywmle');

Here, we have taken alpha=0.05.

We take the seasonal period as 6, and for the non-seasonal part we keep p=3 and q=3, close to the best-performing ARIMA orders above.

The seasonal Auto-Regressive parameter 'P' of a SARIMA model comes from the significant seasonal lag after which the PACF plot cuts off, and the seasonal Moving-Average parameter 'Q' comes from the significant seasonal lag after which the ACF plot cuts off. Remember to check the ACF and the PACF plots only at multiples of 6 (since 6 is the seasonal period); reading them this way, we use P=0 and Q=3 below.
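A short numeric version of that check (illustrative only; it again assumes the train data frame from above) is to print the autocorrelation of the differenced series at multiples of the seasonal period:

# Sketch: ACF of the differenced training series at seasonal lags (multiples of 6).
from statsmodels.tsa.stattools import acf

diffed = train['Shoe_Sales'].diff().dropna()
acf_vals = acf(diffed, nlags=24, fft=False)
for lag in (6, 12, 18, 24):
    print('ACF at seasonal lag %2d: %+.3f' % (lag, acf_vals[lag]))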


In [82]:  import statsmodels.api as sm



manual_SARIMA = sm.tsa.statespace.SARIMAX(train['Shoe_Sales'],
order=(3,1,3),
seasonal_order=(0, 0, 3, 6),
enforce_stationarity=False,
enforce_invertibility=False)
results_manual_SARIMA = manual_SARIMA.fit(maxiter=1000)
print(results_manual_SARIMA.summary())

SARIMAX Results

=========================================================================================

Dep. Variable: Shoe_Sales No. Observations: 132

Model: SARIMAX(3, 1, 3)x(0, 0, 3, 6) Log Likelihood -608.083

Date: Sun, 06 Mar 2022 AIC 1236.166

Time: 00:35:27 BIC 1263.079

Sample: 01-01-1980 HQIC 1247.080

- 12-01-1990

Covariance Type: opg

==============================================================================

coef std err z P>|z| [0.025 0.975]

------------------------------------------------------------------------------

ar.L1 -1.1231 0.229 -4.911 0.000 -1.571 -0.675

ar.L2 -0.1300 0.408 -0.318 0.750 -0.930 0.670

ar.L3 0.4542 0.210 2.159 0.031 0.042 0.867

ma.L1 0.7671 0.214 3.591 0.000 0.348 1.186

ma.L2 -0.3398 0.263 -1.290 0.197 -0.856 0.176

ma.L3 -0.7504 0.121 -6.223 0.000 -0.987 -0.514

ma.S.L6 -0.2049 0.147 -1.394 0.163 -0.493 0.083

ma.S.L12 0.4754 0.146 3.254 0.001 0.189 0.762

ma.S.L18 -0.0313 0.150 -0.209 0.835 -0.325 0.263

sigma2 3922.3170 436.530 8.985 0.000 3066.734 4777.900

===================================================================================

Ljung-Box (L1) (Q): 0.01 Jarque-Bera (JB): 38.69

Prob(Q): 0.93 Prob(JB): 0.00

Heteroskedasticity (H): 7.19 Skew: 0.09

Prob(H) (two-sided): 0.00 Kurtosis: 5.91

===================================================================================

Warnings:

[1] Covariance matrix calculated using the outer product of gradients (complex-step).


In [454]:  results_manual_SARIMA.plot_diagnostics(figsize=(15,5))
plt.show()

Predict on the Test Set using this model and evaluate the model.

In [79]:  predicted_manual_SARIMA = results_manual_SARIMA.get_forecast(steps=len(test))

---------------------------------------------------------------------------

NameError Traceback (most recent call last)

<ipython-input-79-2782ab37909c> in <module>

----> 1 predicted_manual_SARIMA = results_manual_SARIMA.get_forecast(steps=len(test))

NameError: name 'results_manual_SARIMA' is not defined


In [457]:  rmse = mean_squared_error(test['Shoe_Sales'],predicted_manual_SARIMA.predicted_mean,squared=False)


mape = mean_absolute_percentage_error(test['Shoe_Sales'],predicted_manual_SARIMA.predicted_mean)
print('RMSE:',rmse,'\nMAPE:',mape)

RMSE: 101.21577490213676

MAPE: 46.0819835423647

In [78]:  temp_resultsDf = pd.DataFrame({'RMSE': [rmse],'MAPE':mape}


,index=['SARIMA(3,1,3)(0,0,3,6)'])


resultsDf = pd.concat([resultsDf,temp_resultsDf])

resultsDf

Out[78]:
RMSE MAPE

ARIMA(2,1,3) 183.898734 85.368791

ARIMA(6,1,6) 143.373555 66.528847

ARIMA(6,1,6) 143.253585 66.472877

ARIMA(6,1,6) 143.253585 66.472877

ARIMA(2,1,1) 143.253585 66.472877

SARIMA(1,1,3)(2,0,3,6) 60.715876 23.581952

SARIMA(3,1,3)(0,0,3,6) 60.715876 23.581952

Building the optimal model on the Full Data.


In [86]:  full_data_model = sm.tsa.statespace.SARIMAX(df['Shoe_Sales'],


order=(1,1,3),
seasonal_order=(2, 0, 3, 6),
enforce_stationarity=False,
enforce_invertibility=False)
results_full_data_model = full_data_model.fit(maxiter=1000)
print(results_full_data_model.summary())

SARIMAX Results

=========================================================================================

Dep. Variable: Shoe_Sales No. Observations: 187

Model: SARIMAX(1, 1, 3)x(2, 0, 3, 6) Log Likelihood -886.454

Date: Sun, 06 Mar 2022 AIC 1792.909

Time: 00:40:22 BIC 1823.908

Sample: 01-01-1980 HQIC 1805.493

- 07-01-1995

Covariance Type: opg

==============================================================================

coef std err z P>|z| [0.025 0.975]

------------------------------------------------------------------------------

ar.L1 0.2210 0.408 0.541 0.588 -0.579 1.021

ma.L1 -0.7288 0.423 -1.725 0.085 -1.557 0.099

ma.L2 0.2532 0.223 1.138 0.255 -0.183 0.689

ma.L3 -0.1796 0.064 -2.788 0.005 -0.306 -0.053

ar.S.L6 0.0247 0.050 0.494 0.622 -0.073 0.123

ar.S.L12 0.9904 0.038 25.810 0.000 0.915 1.066

ma.S.L6 -0.1451 0.132 -1.102 0.271 -0.403 0.113

ma.S.L12 -0.6609 0.124 -5.345 0.000 -0.903 -0.419

ma.S.L18 -0.0761 0.103 -0.737 0.461 -0.279 0.126

sigma2 2730.7962 288.822 9.455 0.000 2164.715 3296.877

===================================================================================

Ljung-Box (L1) (Q): 0.00 Jarque-Bera (JB): 25.68

Prob(Q): 0.99 Prob(JB): 0.00

Heteroskedasticity (H): 0.68 Skew: 0.26

Prob(H) (two-sided): 0.16 Kurtosis: 4.87

===================================================================================

Warnings:

[1] Covariance matrix calculated using the outer product of gradients (complex-step).


In [105]:  results_full_data_model.plot_diagnostics(figsize=(15,5));

In [ ]:  ​

Evaluate the model on the whole data and predict 12 months into the future (till the end of next year).
In [93]:  predicted_manual_SARIMA_6_full_data = results_full_data_model.get_forecast(steps=12)


In [95]:  predicted_manual_SARIMA_6_full_data.summary_frame(alpha=0.05).head()

Out[95]:
Shoe_Sales mean mean_se mean_ci_lower mean_ci_upper

1995-08-01 244.282253 52.332308 141.712815 346.851692

1995-09-01 256.964309 58.327042 142.645408 371.283211

1995-10-01 253.311652 67.082343 121.832676 384.790627

1995-11-01 311.737341 71.718613 171.171443 452.303240

1995-12-01 391.408368 75.516460 243.398825 539.417911

In [99]:  rmse = mean_squared_error(df['Shoe_Sales'],results_full_data_model.fittedvalues,squared=False)


print('RMSE of the Full Model',rmse)

RMSE of the Full Model 51.06582034880977

In [100]:  df.tail()

Out[100]:
Shoe_Sales

YearMonth

1995-03-01 188

1995-04-01 195

1995-05-01 189

1995-06-01 220

1995-07-01 274

In [102]:  pred_full_manual_SARIMA_date = predicted_manual_SARIMA_6_full_data.summary_frame(alpha=0.05).set_index(
    pd.date_range(start='1995-08-01', periods=12, freq='MS'))  # month-start stamps, matching the YearMonth index
pred_full_manual_SARIMA_date


In [104]:  # plot the forecast along with the confidence band



axis = df['Shoe_Sales'].plot(label='Observed')
pred_full_manual_SARIMA_date['mean'].plot(ax=axis, label='Forecast', alpha=0.7)
axis.fill_between(pred_full_manual_SARIMA_date.index, pred_full_manual_SARIMA_date['mean_ci_lower'],
pred_full_manual_SARIMA_date['mean_ci_upper'], color='k', alpha=.15)
axis.set_xlabel('Year-Months')
axis.set_ylabel('Shoe_Sales')
plt.legend(loc='best')
plt.show()

In [ ]:  ​

In [ ]:  ​


