Journal of Retailing and Consumer Services 48 (2019) 169–177

Contents lists available at ScienceDirect

Journal of Retailing and Consumer Services


journal homepage: www.elsevier.com/locate/jretconser

A data-driven framework for predicting weather impact on high-volume low-margin retail products
Gylian Verstraete a,c,*, El-Houssaine Aghezzaf a,d, Bram Desmet b,c

a Department of Industrial Systems Engineering and Product Design, Ghent University, Technologiepark 903, 9052 Zwijnaarde (Gent), Belgium
b Operations & Supply Chain Management, Vlerick Business School, Reep 1, 9000 Gent, Belgium
c Solventure, Sluisweg 1 bus 18, 9000 Gent, Belgium
d Flanders Make, Belgium

ARTICLE INFO

Keywords: Sales forecasting; Machine learning; Weather

ABSTRACT

Accurate demand forecasting is of critical importance to retail companies operating in high-volume low-margin industries. Inaccuracies in the forecasts lead either to stock-outs or to excess inventories, resulting in either lost sales or higher working capital, and in both cases in unnecessary extra costs. Prediction accuracy is essential to retail companies having a part of their product portfolio manufactured in low-cost countries and requiring long delivery times. It is rather vital when the demand for these goods is strongly weather dependent. The combination of long delivery times and weather dependence creates a business challenge, as the availability period of accurate weather information is much shorter than the lead time. In this paper we propose a methodology that handles the impact of both the short-term (with available weather data) and the long-term weather uncertainty on the forecast. For the former, the proposed framework is capable of automatically selecting the best prediction model. For the latter, the framework fits a distribution on simulated and aggregated sales using the short-term regression model with historical weather data. This framework has been tested on a company's sales data and is proven to satisfactorily address the challenges that the company is facing.

* Corresponding author at: Department of Industrial Systems Engineering and Product Design, Ghent University, Technologiepark 903, 9052 Zwijnaarde (Gent), Belgium.
E-mail addresses: [email protected] (G. Verstraete), [email protected] (E.-H. Aghezzaf), [email protected] (B. Desmet).

https://doi.org/10.1016/j.jretconser.2019.02.019
Received 23 August 2018; Received in revised form 25 December 2018; Accepted 19 February 2019
0969-6989/ © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Demand forecasting is fundamental in operations management (Oliva and Watson, 2009) and plays an important role, especially in retail enterprises (Xiao and Qi, 2008; Beheshti-Kashi et al., 2015). Retailers rely on sales forecasts to make business decisions on marketing, production, inventory, and finance (Ma et al., 2016). Inaccurate forecasts may lead to stock-outs (resulting in lost sales and customers) or excess inventories, inducing extra cash usage (idle inventory) and costs (the product expiring). Factors affecting the demand of retail goods are promotional activity, public holidays and events, meteorological influences and the general economic context (Thomassey, 2010).

A study carried out by the National Oceanic and Atmospheric Administration (NOAA) estimated that “weather and climate sensitive industries, both directly and indirectly, account for about one-third of the nation's GDP” (Weiher and Sen, 2006). Hurricanes, for example, may have a devastating effect on the economy (Cashell and Labonte, 2005). As global warming continues, Kjellstrom et al. (2009) estimated that in Southeast Asia, Andean and Central America, Eastern Sub-Saharan Africa and the Caribbean labor productivity will drop between 11.4% and 26.9%. Weather conditions directly influence the mood and the physique of individuals. Steele (1951) states four of those weather influences applied to retail:

1. “The weather could be of such a nature that it is for one reason or another uncomfortable to go to the store.”
2. “The weather could produce situations that would physically prevent people from going to the store, as in the case of snow drifts over roads and streets.”
3. “Weather may have psychological effects on people that may change their shopping habits.”
4. “Some kinds of merchandise may be more desirable during a period in which certain types of weather prevail.”

In this paper, we propose a framework to create both short-term and long-term sales predictions while considering weather information. The short-term method uses available weather data to predict sales in the upcoming days. The typical challenges for these methods are linked to
forecasting using weather (nonlinear effect, multicollinearity), forecasting of sales (limited training data) and the combination of both (skewed dependent variable, interaction effects between the independent variables). The proposed methodology can be used to distribute goods from the distribution center to the stores and to fill racks in stores.

The long-term model is used by purchasing departments to order goods from distant low-cost countries. As the lead time for sourcing goods from these countries is longer than the available weather predictions, this information cannot be used directly in the forecasting process.

The methods we utilize are known to improve accuracy in the forecasting of the demand for energy but have not been tested in the context of retail demand forecasting. Still, the forecasting of sales differs from energy demand forecasting for several reasons. First, the relation between the dependent and the independent variables follows a different pattern. For energy demand forecasting, the consumption is typically high when the temperature is either high or low, as cooling or heating are required. In retail demand forecasting, this is not necessarily the case. Second, the directness of influence is different. Energy demand reacts more quickly to changes in temperature than retail demand. Third, retail sales forecasting is influenced by past or future variables (such as upcoming holidays, a forecast of good weather).

The rest of the paper is organized as follows: In Section 2 we review the literature on sales forecasting and forecasting using weather information. In Section 3 we describe the typical characteristics of sales forecasting using weather information (Section 3.1). Then we present the short-term (Section 3.2) and the long-term forecasting methodology (Section 3.3). In Section 4, we validate the proposed framework using sales series of inflatable swimming pools provided by a Belgian retailer and discuss the managerial impact. Conclusions and recommended future research are proposed in Section 5.

2. Literature review

The last three decades witnessed an evolution in research on forecasting using weather. Dell et al. (2014) reported that the latest research on weather effects has been on conflict and aggression, economic growth, agriculture, labor productivity, industrial output, health, energy and political stability. Of these research areas, the literature on forecasting energy production and demand is the most related to predicting retail sales, as it has a directly measurable effect and a similar aggregation level. Therefore, we draw heavily on this literature.

In energy demand forecasting, it is common to create separate models for every fundamentally different time period, which is typically each hour of each day (Taylor and Espasa, 2008). Dordonnat et al. (2008) suggested a multivariate regression model using weather data that achieved satisfying forecasting results. Cancelo et al. (2008) proposed using a long-term daily aggregated forecast model for energy demand more than three days in the future. This forecast is disaggregated back to hourly time buckets based on the split of the short-term model (for less than three days). The authors take into account special events such as public holidays.

Weather has a nonlinear impact on the demand of energy (Dell et al., 2014). Therefore, further research investigated the nonlinear relation between weather and the demand. Mirasgedis et al. (2006) suggested fitting a linear regression model on the natural logarithm of the demand in order to predict using the growth rate. The authors suggested models that are shown to provide high accuracy forecasts. Other regression based approaches such as regularization (Hsu, 2015) and quadratic or cubic transformations of the variables (Hart and de Dear, 2004) have been proven to address the nonlinearity.

Limited by the assumption of a linear relation between independent and dependent variable, the application of a new series of learning methods was investigated. Existing literature of these machine learning methods includes neural networks (Hippert et al., 2001; Raza and Khosravi, 2015; Chae et al., 2016), support vector regression (Ahmad et al., 2014; Hu et al., 2015) and tree-based techniques, with a focus on boosting (Taieb and Hyndman, 2014; Mayrink and Hippert, 2016; Taieb et al., 2016).

Long-term forecasting in the energy demand sector has been investigated less intensively. Morita et al. (1996) computed annual load estimates using annual load data and the standard method to compute confidence intervals in linear regression modeling. McSharry et al. (2005) suggested a model to predict energy demand for up to one year in the future using daily weather information simulated by a novel technique that replicates both the distribution and autocorrelation. This allows the model to handle both the timing and magnitude of the peak demand. Al-Hamadi and Soliman (2005) suggested structuring the problem by using separate models for the hour in the day and the week in the year. The authors then estimate the energy load for the next year by extrapolating the difference of the annual growth between the two previous years and taking into account the temperature uncertainty. Hyndman and Fan (2010) extended the literature by simulating the temperature patterns using a double seasonal block bootstrap (for the period in the year as well as the year itself). This bootstrap allows the authors to build a probability distribution to model the temperature uncertainty. The authors then use the temperature distribution and known calendar effects together with demographic and economic variables in a logarithmic transformed regression model. In a later study, Fan and Hyndman (2011) use the same methodology to model price elasticity. Hong et al. (2014) provided a practical overview of the current best practices in probabilistic energy demand forecasting.

For forecasting retail sales a plethora of techniques are available. Generally, the best techniques cost/benefit-wise are automated procedures based on past observations and trends (Ramos et al., 2015). Examples of such methods are ExponenTial Smoothing using state space models (ETS) (Hyndman et al., 2002) and Auto Regressive Integrated Moving Average (ARIMA). However, these techniques simply extract patterns from historical observations of the time series itself. If external effects influence retail sales, other techniques are required that take account of these influences. For example, Tehrani and Ahrens (2016) forecast sales in the fashion retail industry by applying a probabilistic approach to identify different classes of products in terms of sale. Thereafter, the authors combine kernel machines with a probabilistic approach to empower the performance of kernel machines and use them to predict the number of sales. Silva et al. (2018) predicted the sales of smartphones using a data mining approach based on information of the seller and the product. The authors feed the extracted data to a support vector machine to forecast future sales. Di Fatta et al. (2018) predicted the conversion rates on webshops of small and medium-sized enterprises (SMEs) by analyzing a plethora of potential influences (such as free shipping, discounts, brand/no brand and the loading speed of the website).

Little attention has been paid to forecasting retail sales using weather information. Steele (1951) first improved retail sales prediction accuracy using weather data in a linear regression model. The author stated that the main difficulty with this method was having an accurate weather forecast for the following days. Starr (2000) created monthly and quarterly aggregated forecasts of retail sales using a log-linear sales transformation. The author determined that weather has a significant impact on sales, but the significance level decreased in the quarterly buckets, due to loss of information in aggregation. Murray et al. (2010) provided empirical evidence of weather influence on sales and explained the connection psychologically. Tran (2016) investigated the effect of weather on the sales of sporting goods. The author uses quadratic and cubic transformations of the independent variables, while transforming the sales using the natural logarithm. The author uses LASSO regression to select variables. Steinker et al. (2016) evaluated the relation between weather and online fashion retail sales. Using ordinary least squares regression, auto regression, interactions and log-linear sales transformation, a reduction in MAPE of 50.6% was achieved.


Fig. 1. An overview of the proposed framework for modeling weather influence on retail sales and its uses in business.

3. Framework

This study proposes a framework for predicting retail demand while considering weather effects. We use two types of prediction models: a short-term model to predict the demand for goods in the near future, and a long-term model for handling long lead times. From the short-term prediction methods, we can extract insights into the factors driving the business. Fig. 1 visualizes the framework of this study.

In this section, we first discuss the properties of retail demand that is influenced by weather effects. Then, we discuss the methods used in the short-term model, while linking the models to the properties of retail demand that is influenced by weather. Then, we elaborate on the selection methodology of the framework in Section 3.2.7. Finally, in Section 3.3, we describe the methodology used to create long-term predictions based on weather information.

3.1. Data properties

Retail sales influenced by weather effects have some typical characteristics. First, the sales data in those cases is often strongly skewed. According to Steele (1951) the skewness of the sales can be attributed to the weather as it influences people's minds to adapt their spending pattern. This effect is in most cases not linear: the temperature increasing 1 °C from 5 °C will have a different effect than increasing 1 °C from 25 °C. In the case of the retailer, the demand for inflatable swimming pools was high when the weather was good. Another example is heating, ventilation, and air conditioning (HVAC) systems, which are sold more if the temperature is either low or high.

The demand for certain retail goods can be affected by influences other than weather. Examples of other influences are calendar effects (seasonality or holidays), promotions, displays and hypes. Often there are interaction effects of these variables on the sales. In between
weather effects interactions are also possible: the sales for a certain temperature might have a different influence if it is raining. However, including multiple weather variables might lead to high multicollinearity, as temperature metrics such as average, minimum and maximum temperature are strongly correlated. Collinearity between calendar effects, promotions and weather effects is also possible.

The combination of these characteristics generally requires a large set of training data. However, in sales forecasting a long history is often unavailable.

To summarize, we identified the following characteristics for forecasting sales using weather information:

• Skewed dependent variable
• Limited training data
• Nonlinear effect on dependent variable
• Multicollinearity between the independent variables
• Interaction effects between the independent variables

3.2. Methodology for handling short-term weather influence

In this section, we describe the methodology for predicting weather influenced retail sales. Our framework consists of adapted methods that are proven to perform well when used to accurately predict the demand for energy. The methods include Poisson regression, LASSO regression, neural networks, support vector regression and gradient boosting. In Section 3.2.6, we link these methods to the characteristics described in Section 3.1.

As the best prediction method depends on how pronounced the characteristics are, we suggest carrying out a forecasting competition between the well performing methods. However, to this end, we need to choose a selection metric. Conforming to the literature, we choose to select the prediction model based on out-of-sample accuracy metrics. Section 3.2.7 offers an overview of the selection metrics.

3.2.1. Poisson regression

Poisson regression is a form of generalized linear models that is suitable for modeling count data (Nelder and Baker, 1972). The method is closely related to Log-Linear regression. The assumptions of the method are that the response variable Y has a Poisson distribution, and that the logarithm of its expected value can be explained by a linear combination of the variables. Mathematically, the model can be written as:

$$\log(E[Y \mid X]) = \beta_0 + \beta X$$

By applying the inverse of the natural logarithmic link function, this becomes:

$$E[Y \mid X] = \exp(\beta_0 + \beta X) \qquad (1)$$

Expanding X in the equation, this results in:

$$E[Y \mid x_1 \ldots x_n] = \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n) = \exp(\beta_0)\,\exp(\beta_1 x_1) \cdots \exp(\beta_n x_n) \qquad (2)$$

Some properties of the regression technique can be found in the equations above. First, Eq. (1) shows that only strictly positive dependent variables are possible. This makes Poisson regression ideal for modeling count data. Second, Eq. (2) shows that by taking the natural logarithm of the dependent variable, the equation incorporates the multiplicative effect of the n predictors. This results in synergies (interactions) between independent variables being modeled implicitly.

3.2.2. LASSO regression

LASSO is a regression technique that performs both variable selection and model fitting. LASSO complements the ordinary least squares method by penalizing the absolute value of the coefficients, which leads to the coefficient of variables with a weak influence being shrunk towards zero (and thus performing variable selection) (Tibshirani, 1996). Mathematically this is equivalent to minimizing

$$\sum_i \Big( y_i - \beta_0 - \sum_j \beta_j x_{ij} \Big)^2 + \lambda \sum_j |\beta_j|, \quad j > 0.$$

The penalty factor $\lambda$ is determined by cross validation on the training data. The regularization makes LASSO robust to multicollinearity in the independent variables. Literature suggests two values for $\lambda$:

• $\lambda_{min}$, where the cross validation error is minimized;
• $\lambda_{1se}$, where the error lies within 1 standard deviation from the minimal cross validation error.

Following the inventors' guidelines (Hastie et al., 2009), we chose the parsimonious $\lambda_{1se}$ approach, as the risk curves are estimated with error (Friedman et al., 2010) and there is a distinction between overfitting during variable selection and model fitting (Cawley and Talbot, 2010).

The LASSO method can be combined with Poisson regression. The combination of both methods resolves many of the characteristics of Section 3.2.6.

3.2.3. Artificial neural networks

An artificial neural network is a machine learning technique that is based on our own neural network, the human brain. The neural network consists of network nodes (the artificial neurons) that are connected via edges (the artificial synapses). These edges are used to send signals between neurons. The network is composed of multiple layers of nodes that connect the dependent with the independent variables. The layer closest to the former is the output layer, the layer closest to the latter is the input layer. In between those layers there are potentially hidden layers. The more nodes in the hidden layers, the more complex the network.

Neural networks process numerical signals that are sent to connected nodes. In turn, these nodes will (based on a weighted linear combination of the input signals) decide what their output signal will be. Mathematically, this becomes:

$$z_j = b_j + \sum_{i=0}^{k} w_{i,j}\, x_i \quad (k \text{ input nodes for node } j)$$

This combination is transformed using a nonlinear activation function. An often used activation function is the sigmoid function:

$$x_j = S(z_j) = \frac{1}{1 + \exp(-z_j)}$$

Finally, the neuron j puts the output $x_j$ on its output nodes. The nonlinear activation function shows that neural networks can handle nonlinear effects in the data. As complexity increases (number of nodes), the network is able to capture more effects, but it is more likely that the network is overfitting. Generally, neural networks require more training data than other methods.

3.2.4. Support vector regression

Support vector machines (Cortes and Vapnik, 1995) form an artificial intelligence technique that can be used for both classification and regression. For classification, the technique finds an optimally separating hyperplane in the finite (number of variables) space. However, the problem might not be linearly separable. Therefore, the problem is often solved in a higher dimensional space, using a kernel trick to retain easy computation. Often used kernel methods are polynomial and radial basis functions. Consequently, the technique is able to model nonlinear separations in the dimension space. In this study, we use both linear and non-linear kernels. The support vector machine technique can be used for regression problems (see Drucker et al., 1997).

3.2.5. Gradient boosting

Gradient boosting (Friedman, 2001) is a machine learning method that creates an accurate learner by combining many weak learners.
These weak learners are typically decision trees. Where single decision trees are prone to overfitting, gradient boosting combines these trees as weak learners to avoid overfitting. Gradient boosting builds the combination of learners iteratively, but shrinks the influence of individual trees with a learning factor $\nu$. This allows the individual learner to overfit, but reduces the impact of the learner and combines it with a group of other (potentially overfitted) learners, which in practice reduces the chance of overfitting in the final model.

Considering the regression case, we want to build a learning model F that predicts $\hat{y} = F(x)$ and minimizes a certain error metric. The procedure that constructs F goes as follows:

1. Start by assigning $F_0(x) = \bar{y}$
2. For each $m < M$:
(a) Calculate the residuals $r = y - F_m(x)$
(b) Fit a new weak learner $h_{m+1}(x)$ on residuals r by minimizing the error metric
(c) Update the learning model $F_{m+1}(x) = F_m(x) + \nu\, h_{m+1}(x)$

We see that $F_{m+1}(x)$ is a correction on $F_m(x)$ by a factor $\nu\, h_{m+1}(x)$. Empirical research has shown that using a small learning rate ($\nu < 0.1$) leads to the best results (Hastie et al., 2009). We note that with a smaller learning rate, we require more weak learners (M) for the algorithm to reach an optimum, thus increasing computation time.

3.2.6. Link with the data properties

Each specific machine learning technique has its advantages and disadvantages. In choosing a specific learning agent, it is therefore important to look at the characteristics of the problem. In this section, we link the characteristics of Section 3.1 to the prediction models we selected based on the performance in energy demand forecasting. An overall summary is found in Table 1. Some of the focus points are:

• Poisson regression is not able to deal with multicollinearity, as the core of the method is still ordinary least squares-like.
• LASSO regression is not able to derive interactions from the data directly. However, we can include them as variables in the model.
• The combination of Poisson regression with a LASSO penalty is able to handle the characteristics of the data.
• Neural networks and support vector regression need a lot of training data in order to predict accurately, as they are prone to overfitting on training data.
• Gradient boosting is a technique that is able to handle all the characteristics of the data.

Table 1
The applicability of each of the prediction techniques on the properties of the data of this study. A ‘+’ shows a model can handle the characteristic; a ‘–’ shows the method is not suited; a ‘±’ indicates the model is neither suited nor unsuited.

Characteristic           Poisson   LASSO   Neural network   SVR   Gradient boosting
Skewed dependent var.    +         ±       ±                +     +
Limited training data    +         +       –                –     +
Nonlinearity             +         ±       +                +     +
Multicollinearity        –         +       +                ±     +
Interactions             +         ±       +                ±     +

3.2.7. Selection method

In the prediction procedure, we used leave-one-period-out cross validation, where repeatedly a full selling period of sales is taken out as a test set, as this is the prediction window of the long term forecast (see Section 3.3).

Furthermore, for the selection method metric we chose not to use MAPE, as this metric is undefined when the actuals are zero and is heavily skewed when sales are near zero or low (Section 3.1). To select the best forecasting model, we opted for metrics that are less affected by the skewness such as:

• mean absolute error (MAE): calculated as the mean of the absolute errors. Mathematically this is written as: $\frac{\sum_i |y_i - \hat{y}_i|}{n}$. This method is chosen as it gives general information on the size of the errors.
• root mean squared error (RMSE): calculated as the square root of the mean of the squared error. Mathematically this is written as: $\sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n}}$. This method is chosen because it gives information on the size of the error while penalizing larger errors.
• mean error (ME): calculated as the mean of the errors. Mathematically this is written as: $\frac{\sum_i (y_i - \hat{y}_i)}{n}$. This method is chosen as it gives general information on the size of the bias.

For making the results interpretable, the results are scaled to a baseline. Using naïve forecasting as a baseline for error measurement (mean absolute scaled error (MASE), as suggested by Hyndman and Koehler, 2006) does not give a fair evaluation as the demand is mainly driven by external variables. Therefore, for this experiment the ordinary least squares regression model with the same set of variables is chosen as a baseline.

3.3. Methodology for handling long-term weather uncertainty

Forecasting is used in many areas of the supply chain. One of these areas is procurement. In case of retailers, procurement often makes a prediction for the entire selling season before it starts, as the goods are sourced from low-cost countries. For toy retailers, about 70% of all toys globally are manufactured in China alone (Chen et al., 2016). This low-cost sourcing comes at a cost, as the delivery lead time is often multifold compared to local sourcing (Moatti, 2008). For example, shipping goods from China to Western Europe takes between 20 and 40 days, which means by the time the retailer has to place the order, no accurate weather information is available.

An option to tackle long-term weather uncertainty is scenario forecasting. In scenario forecasting, one is interested in finding prediction intervals that are translated into scenarios such as the best case, the most likely and the worst case. When forecasting using external variables it is possible to create estimates of these cases by assigning values to these variables. For weather specific variables, this is difficult because of day-to-day variability and uncertainty of the weather.

In energy demand forecasting, historical weather patterns (often bootstrapped) are used to create statistical distributions on the temperature. Sampling from the independent variables has been proven to provide accurate confidence intervals in literature under certain assumptions. One of those assumptions is the independent and identical distribution of the variables. Considering the use of multiple weather variables that are dependent (such as average temperature and maximum temperature), we consider the independence assumption invalid. Also, weather information is autocorrelated and seasonally related, suggesting a time dependency. Another issue with bootstrapping is that it creates unrealistically high swings between the boundaries of the blocks. Given we use shifted variables to represent the influence of past and future weather in our analyses, this would lead to problems.

Therefore, we argue not to estimate the distribution of weather variables. We propose to use known historical weather data and make predictions using the short-term prediction model and these known weather data. Then, we aggregate the estimated sales for the entire selling period. The advantage of using historical weather information is that we are not limited by the available sales history, which results in a much larger set of aggregated sales estimates, while maintaining realistic weather patterns.

On these aggregated estimates, we carry out a probability distribution competition, in which we fit probability distributions for the

173
G. Verstraete, et al. Journal of Retailing and Consumer Services 48 (2019) 169–177

Fig. 2. Overview of the methodology for handling long-term weather uncertainty.

aggregated sales using maximum likelihood and select the best fitting distribution using the corrected Akaike's Information Criterion (Harrell, 2015). The corrected Akaike's Information Criterion (AICc) is written as

$$\mathrm{AICc} = -2L + 2p\left(1 + \frac{p+1}{n-p-1}\right)$$

where L is the log-likelihood of the distribution, p is the number of parameters in the distribution and n is the number of observations used in fitting the distribution. This criterion balances the accuracy of the fit against the number of parameters estimated, and thus helps to avoid overfitting.
Fig. 2 shows a schematic overview of the methodology for handling long-term weather uncertainty in sales forecasting. By using this methodology, we handle the uncertainty of the sales estimates on an aggregated basis without needing to worry about issues that unrealistic or incorrect weather variables might introduce.
Handling the long-term weather uncertainty now simply becomes a service level (type 1) decision. In this service decision, we choose a service level α, which denotes the probability that we can fulfill all demand of a certain year. As we created a probability distribution, we can sample this service level from its cumulative distribution function. The relation between the demand in a certain period, the service level α, the inventory and the order quantity is given in Eq. (3).

$$\alpha = \operatorname{Prob}\{\text{Period demand} \le \text{Inventory on hand} + \text{order quantity}\} \tag{3}$$

Eq. (3) links the uncertainty of the weather to other areas of supply chain management such as inventory management, procurement and customer service management.

4. Case study on swimming pool sales

4.1. Used data

In this study, we use the sales history of portable, inflatable swimming pools of a Belgium-based retailer. The retailer provided point of sales data on the number of units sold in daily time buckets over the period between 2012 and 2016. As the season for selling swimming pools in Belgium runs from the beginning of May until the end of August, we only consider the data in these months. There is no need to disaggregate the sales data geographically, as the weather is uniform in the region.
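As a brief illustration of the long-term method of Section 3.3 (a distribution competition scored by AICc, followed by the service-level rule of Eq. (3)), a minimal Python sketch is given here. It is not the authors' implementation: the function names, the synthetic seasonal totals and the inventory figure are hypothetical, and SciPy's default parameterizations are assumed, so the parameter count p includes the location and scale parameters that SciPy estimates.

```python
import numpy as np
from scipy import stats


def aicc(log_likelihood, n_params, n_obs):
    """Corrected AIC: -2L + 2p * (1 + (p + 1) / (n - p - 1))."""
    p, n = n_params, n_obs
    return -2.0 * log_likelihood + 2.0 * p * (1.0 + (p + 1.0) / (n - p - 1.0))


# Candidate families named in the case study (SciPy parameterizations assumed).
CANDIDATES = {
    "normal": stats.norm,
    "log-normal": stats.lognorm,
    "Cauchy": stats.cauchy,
    "logistic": stats.logistic,
    "uniform": stats.uniform,
    "Weibull": stats.weibull_min,
}


def distribution_competition(seasonal_totals):
    """Fit every candidate by maximum likelihood and keep the lowest AICc."""
    data = np.asarray(seasonal_totals, dtype=float)
    scores, fitted = {}, {}
    for name, family in CANDIDATES.items():
        params = family.fit(data)  # maximum likelihood estimates
        log_lik = np.sum(family.logpdf(data, *params))
        score = aicc(log_lik, len(params), data.size)
        if np.isfinite(score):  # skip degenerate fits
            scores[name] = score
            fitted[name] = family(*params)
    best = min(scores, key=scores.get)
    return best, fitted[best], scores


def order_quantity(demand_dist, alpha, inventory_on_hand):
    """Eq. (3): smallest order such that P(demand <= inventory + order) = alpha."""
    demand_at_alpha = float(demand_dist.ppf(alpha))
    return max(demand_at_alpha - inventory_on_hand, 0.0)


# Hypothetical data: 37 simulated seasonal totals (one per historical weather year).
rng = np.random.default_rng(0)
totals = rng.lognormal(mean=9.0, sigma=0.3, size=37)

best_name, best_dist, scores = distribution_competition(totals)
qty = order_quantity(best_dist, alpha=0.95, inventory_on_hand=2000.0)
print(best_name, round(qty))
```

Raising alpha moves the order quantity up the fitted cumulative distribution function, which is how the service-level choice translates weather uncertainty into a purchasing decision.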


Fig. 3. Leave-one-year-out cross validation predictions made by the Lasso Poisson model for small swimming pools.

The data is split between larger and smaller inflatable swimming pools. Fig. 3 shows an overview of the sales for smaller swimming pools. The retailer knows the sales are strongly weather dependent, as the sales peaks occur when there is good weather. Although employees of the retailer had a gut feeling what this good weather was, they were unable to quantify the relation between the multiple weather variables and the sales.
As independent variables, publicly available exogenous data of the National Oceanic and Atmospheric Administration (NOAA) from a centrally located weather station in Belgium are used. The extracted data (in daily buckets) includes:

• Average temperature
• Maximum temperature
• Minimum temperature
• Dew point
• Precipitation

Shifted instances of the average and maximum temperature are included. These instances represent the influence of past good weather and future good weather on the sales. Future weather is uncertain, but it is possible that a forecast of good weather triggers buying behavior. Therefore we include future weather information as variables in the framework.
Furthermore, seasonality (day in the week, month in the year) and specific events, such as days off (Ascension day, Pentecost) and holiday specific effects (first weeks of the summer vacation) are used in the analysis. The purchasers know these influence swimming pool sales, and therefore they need to be modeled.

4.2. Short-term method results

Table 2 shows the error metrics scaled relative to ordinary least squares (OLS) regression. When comparing the mean absolute error metric, we see three competitive methods: gradient boosting, LASSO Poisson and Log-Linear OLS. The root mean squared error shows that LASSO Poisson outperforms the other methods. This suggests that the LASSO Poisson method can better capture extreme values. To the retailer, the high peaks are the most interesting, as the items need to be in the store in great quantity at that time. This indicates that the root mean squared error is the metric to use. When looking at the bias metric (ME), we observe that LASSO Poisson outperforms gradient boosting and Log-Linear OLS regression. This means that LASSO Poisson not only predicts more accurately, but also over- and under-predicts more equally. Furthermore, we note that neural networks and support vector regression methods do not predict accurately in this case. This is a result of the training data being too small for those methods to converge (see Section 3.2.6).
The root mean squared error performance of Table 2 suggests that the combination of Poisson regression with LASSO penalties outperforms the other techniques in periods where the sales volume is very high. Therefore, we split the dependent variable in half, based on the sales volume. Not surprisingly (considering Fig. 3), we see that 50% of the sales volume occurs, on average, in 6.4% of the days in the reference period.
Table 3 shows the error metrics for the top 50% of the sales volume. In this case, the combination of Poisson regression with LASSO penalties outperforms the other techniques for both accuracy metrics. The second best technique is now gradient boosting. Table 3 shows that gradient boosting performs 4.8% (RMSE) to 7.6% (MAE) worse than the winning technique. These results conform to the literature study and the summary of the used methods (see Section 3.2.6).
Conveniently, the best performing methods (Poisson regression with LASSO penalties and gradient boosting) offer valuable insights into the factors influencing the sales. Fig. 4 shows the importance of variables selected by both models, relative to the most influential variable. Both models determined that the maximum temperature (Max. Temp.) on the same day is the most influential variable. Both models assigned a high importance to leading maximum temperatures (Lead max. T, 2x Lead max. T), suggesting a high importance of temperature in the two upcoming days. Both methods suggest the maximum temperature of a certain day is more important than the average or minimum temperature. Both techniques determined that the month of the season is important. More swimming pools are sold in June and fewer in August, as customers don't buy items at the end of the season. Other noted influences include average temperature (Temp.), future average temperature (Lead T, 2x Lead T), day in the week (Saturday, Thursday), precipitation, maximum temperature of the previous day (Lag. max. T), the dew point and holiday effects (Before holiday inter. for the days before Ascension day and Pentecost; July-temp inter. for the first two weeks of the summer holiday). Some of these insights were known by the purchasing team of the retailer, but they could not quantify the importance of each feature.

4.3. Long-term method results

Following the methodology of Section 3.3, we use the short-term model and the weather information of 1981–2017 to create daily estimates of the sales. Next, the estimates are aggregated for all selling seasons. A histogram containing these results can be seen in Fig. 5. On this set of observations, several statistical distributions are fitted (the normal, log-normal, Cauchy, logistic, uniform and Weibull distributions). The log-normal distribution had the best fit based on Akaike's corrected information criterion. Considering the used model is Poisson based, this selection is unsurprising, as Poisson regression uses natural logarithms to transform the variables (Section 3.2.1). The density function of this log-normal distribution is shown in Fig. 5. The corresponding cumulative distribution function allows us to extract the annual demand at a certain probability level α. From this annual demand, we subtract the current inventory levels and obtain the order quantity for the next selling period.

4.4. Practical & managerial implications

The proposed framework was implemented for predicting the swimming pool sales for the retailer. The forecasting models are used


Table 2
Average accuracy measurements for the sales.
Metric   Neural network   Gradient boosting   SVR     LASSO    LASSO Poisson   Log-Linear OLS   OLS
ME       0.243            3.037               6.817   −0.237   0.272           10.094           1
RMSE     1.078            0.642               0.878   1.004    0.591           0.707            1
MAE      0.865            0.503               0.673   0.861    0.496           0.501            1
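The scaling used in Table 2 (each metric divided by the corresponding OLS value, so that the OLS column equals 1) can be sketched as follows. This is only an illustration under stated assumptions: the error is taken as prediction minus actual, which the text does not specify, and the function names and the toy numbers are hypothetical rather than data from the study.

```python
import numpy as np


def error_metrics(actual, predicted):
    """Mean error (bias), root mean squared error and mean absolute error."""
    # Assumed sign convention: error = prediction - actual.
    err = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return {
        "ME": float(err.mean()),
        "RMSE": float(np.sqrt((err ** 2).mean())),
        "MAE": float(np.abs(err).mean()),
    }


def scale_to_baseline(model_metrics, baseline_metrics):
    """Divide each metric by the baseline (here OLS) value of the same metric."""
    return {name: model_metrics[name] / baseline_metrics[name] for name in model_metrics}


# Toy daily sales with one peak, plus two hypothetical sets of out-of-sample predictions.
actual = [0, 2, 10, 40]
ols_pred = [2, 4, 6, 20]
model_pred = [1, 2, 9, 36]

scaled = scale_to_baseline(error_metrics(actual, model_pred),
                           error_metrics(actual, ols_pred))
print(scaled)
```

A scaled value below 1 means the model improves on OLS for that metric; because RMSE squares the errors, a model that tracks the peaks better gains more on RMSE than on MAE.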

Table 3
Average accuracy measurements for the top 50% of sales volume.
Metric   Neural network   Gradient boosting   SVR     LASSO    LASSO Poisson   Log-Linear OLS   OLS
ME       1.998            1.092               1.295   1.942    0.869           1.658            1
RMSE     1.247            0.695               0.921   1.108    0.662           0.798            1
MAE      1.242            0.767               0.884   1.104    0.708           0.888            1
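One plausible reading of the split behind Table 3 is to rank the days by units sold and keep the busiest days until they account for half of the total volume. The sketch below follows that reading; the exact rule used in the study is not stated, and the function name and the toy sales series are hypothetical.

```python
import numpy as np


def top_volume_mask(daily_sales, share=0.5):
    """Boolean mask of the busiest days that together reach `share` of total volume."""
    sales = np.asarray(daily_sales, dtype=float)
    order = np.argsort(-sales, kind="stable")  # busiest days first
    cumulative = np.cumsum(sales[order])
    # Number of busiest days needed to reach the requested share of the total.
    n_days = int(np.searchsorted(cumulative, share * sales.sum())) + 1
    n_days = min(n_days, sales.size)
    mask = np.zeros(sales.size, dtype=bool)
    mask[order[:n_days]] = True
    return mask


# Toy season: one strong peak day carries half of the volume.
sales = [1, 1, 1, 1, 1, 1, 2, 2, 10, 20]
peak_days = top_volume_mask(sales)
print(peak_days.mean())  # fraction of days that make up the top 50% of volume
```

The error metrics can then be recomputed on the masked days only, which isolates how each technique behaves on the sales peaks.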

Fig. 4. Scaled variable importance graph of one model for swimming pools.

Fig. 5. Histogram of historical predictions including the log-normal distribution fitted on the estimates.

in the sales and operations planning (S&OP) system of the retailer to replenish stores from the central distribution center. Previously, the retailer was unable to predict this accurately using statistics, as the retailer had to manually adjust the forecast depending on the weather of the upcoming days. This led to the retailer overstocking the goods in the stores (to be sure they miss as few sales as possible), which in turn resulted in unnecessary inventory and transportation costs.
The retailer did not know what the specific influencing factors were. As the selected short-term model was rolled out, its scaled coefficients delivered insights into what drives the sales. Because the coefficients had intuitive directions and magnitudes, the users trusted the model.
The purchasing department places orders for the next selling period right after the selling period of each year. Typically this was done based on the sales volume of the previous periods without having any exact knowledge of the weather pattern of those periods. The distribution fitting approach of this study allows them to make strategic service level decisions based on the impact (with timing) of historical weather observations over the past 40 years. We expect this to be a major improvement for the purchasing department.

5. Conclusions and future research

In this paper we propose a data-driven framework for handling weather impact on retail sales of high-volume low-margin products. The framework adopts the methods that are proven to be effective for predicting energy demand, but have (to the best of our knowledge) not yet been tested on retail sales. Our findings are in line with the existing literature on energy demand forecasting and general literature on applicability of the learning methods.
In the case study, the LASSO Poisson regression model performed better than other competing state-of-the-art prediction techniques. We benchmarked the techniques to an ordinary least squares regression model and showed that the framework is able to achieve a considerable reduction in out-of-sample forecast error (both MAE and RMSE) and


bias (ME) using cross validation. The advantage of using the LASSO Poisson model is mainly its interpretability and its indirect way of modeling interactions due to the synergies between the multiplicative coefficients.
Using the winning technique, we predicted sales using historical weather patterns in this region and aggregated them yearly. On these yearly historical sales predictions, we fitted a probability distribution. Using this density function, the retailer can place orders based on a chosen service level, taking into account the long-term uncertainty of weather. We are currently not aware of research articles that tackle the long-term uncertainty of weather in sales forecasting by distribution fitting on predicted values.
A possible extension of this work is to relax the assumption of independence of the demand in time. When the first selling peak hits the market, many people buy the product. Other peaks will generally have a smaller magnitude as people who previously bought the product will not buy this product again until the product reaches the end of its lifetime. In Fig. 3, we see that we either under-forecast the first peak or over-forecast later peaks. Therefore, modeling the first peak in a selling season could improve accuracy in the methodology. We note that modeling this first peak is difficult as the threshold value and the timing are difficult to obtain. Another extension is taking the expected lifetime of the item into account and assuming people will only buy a new product if the current one needs replacement.

Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References

Ahmad, A., Hassan, M., Abdullah, M., Rahman, H., Hussin, F., Abdullah, H., Saidur, R., 2014. A review on applications of ann and svm for building electrical energy consumption forecasting. Renew. Sustain. Energy Rev. 33, 102–109.
Al-Hamadi, H., Soliman, S., 2005. Long-term/mid-term electric load forecasting based on short-term correlation and annual growth. Electr. Power Syst. Res. 74 (3), 353–361.
Beheshti-Kashi, S., Karimi, H.R., Thoben, K.-D., Lütjen, M., Teucke, M., 2015. A survey on retail sales forecasting and prediction in fashion markets. Syst. Sci. Control Eng. 3 (1), 154–161.
Cancelo, J.R., Espasa, A., Grafe, R., 2008. Forecasting the electricity load from one day to one week ahead for the spanish system operator. Int. J. Forecast. 24 (4), 588–602.
Cashell, B.W., Labonte, M., 2005. The macroeconomic effects of hurricane katrina. Congr. Res. Serv., Libr. Congr.
Cawley, G.C., Talbot, N.L., 2010. On over-fitting in model selection and subsequent selection bias in performance evaluation. J. Mach. Learn. Res. 11, 2079–2107.
Chae, Y.T., Horesh, R., Hwang, Y., Lee, Y.M., 2016. Artificial neural network model for forecasting sub-hourly electricity usage in commercial buildings. Energy Build. 111, 184–194.
Chen, D., Wei, W., Hu, D., Muralidharan, E., 2016. Survival strategy of oem companies: a case study of the chinese toy industry. Int. J. Oper. Prod. Manag. 36 (9), 1065–1088.
Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach. Learn. 20 (3), 273–297.
Dell, M., Jones, B.F., Olken, B.A., et al., 2014. What do we learn from the weather? The new climate-economy literature. J. Econ. Lit. 52 (3), 740–798.
Di Fatta, D., Patton, D., Viglia, G., 2018. The determinants of conversion rates in sme e-commerce websites. J. Retail. Consum. Serv. 41, 161–168.
Dordonnat, V., Koopman, S.J., Ooms, M., Dessertaine, A., Collet, J., 2008. An hourly periodic state space model for modelling french national electricity load. Int. J. Forecast. 24 (4), 566–587.
Drucker, H., Burges, C.J., Kaufman, L., Smola, A.J., Vapnik, V., 1997. Support vector regression machines. Adv. Neural Inf. Process. Syst. 155–161.
Fan, S., Hyndman, R.J., 2011. The price elasticity of electricity demand in south australia. Energy Policy 39 (6), 3709–3719.
Friedman, J., Hastie, T., Tibshirani, R., 2010. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 (1), 1.
Friedman, J.H., 2001. Greedy function approximation: a gradient boosting machine. Ann. Stat. 1189–1232.
Harrell, F., 2015. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Springer Series in Statistics. Springer International Publishing.
Hart, M., de Dear, R., 2004. Weather sensitivity in household appliance energy end-use. Energy Build. 36 (2), 161–174.
Hastie, T., Tibshirani, R., Friedman, J., 2009. The elements of statistical learning: data mining, inference and prediction, 2 ed. Springer.
Hippert, H.S., Pedreira, C.E., Souza, R.C., 2001. Neural networks for short-term load forecasting: a review and evaluation. IEEE Trans. Power Syst. 16 (1), 44–55.
Hong, T., Wilson, J., Xie, J., 2014. Long term probabilistic load forecasting and normalization with hourly information. IEEE Trans. Smart Grid 5 (1), 456–462.
Hsu, D., 2015. Identifying key variables and interactions in statistical models of building energy consumption using regularization. Energy 83 (Supplement C), 144–155.
Hu, Z., Bao, Y., Chiong, R., Xiong, T., 2015. Mid-term interval load forecasting using multi-output support vector regression with a memetic algorithm for feature selection. Energy 84, 419–431.
Hyndman, R.J., Fan, S., 2010. Density forecasting for long-term peak electricity demand. IEEE Trans. Power Syst. 25 (2), 1142–1153.
Hyndman, R.J., Koehler, A.B., 2006. Another look at measures of forecast accuracy. Int. J. Forecast. 22 (4), 679–688.
Hyndman, R.J., Koehler, A.B., Snyder, R.D., Grose, S., 2002. A state space framework for automatic forecasting using exponential smoothing methods. Int. J. Forecast. 18 (3), 439–454.
Kjellstrom, T., Kovats, R.S., Lloyd, S.J., Holt, T., Tol, R.S., 2009. The direct impact of climate change on regional labor productivity. Arch. Environ. Occup. Health 64 (4), 217–227.
Ma, S., Fildes, R., Huang, T., 2016. Demand forecasting with high dimensional data: the case of sku retail sales forecasting with intra-and inter-category promotional information. Eur. J. Oper. Res. 249 (1), 245–257.
Mayrink, V., Hippert, H.S., 2016. A hybrid method using exponential smoothing and gradient boosting for electrical short-term load forecasting. In: Proceedings of the IEEE Latin American Conference on Computational Intelligence (LA-CCI). IEEE. pp. 1–6.
McSharry, P.E., Bouwman, S., Bloemhof, G., 2005. Probabilistic forecasts of the magnitude and timing of peak electricity demand. IEEE Trans. Power Syst. 20 (2), 1166–1172.
Mirasgedis, S., Sarafidis, Y., Georgopoulou, E., Lalas, D., Moschovits, M., Karagiannis, F., Papakonstantinou, D., 2006. Models for mid-term electricity demand forecasting incorporating weather influences. Energy 31 (2), 208–227.
Moatti, V., 2008. Low cost sourcing…or high cost supplying? Actes de la XVIIème conférence de l'Association Internationale de Management Stratégique, pp. 28–31.
Morita, H., Kase, T., Tamura, Y., Iwamoto, S., 1996. Interval prediction of annual maximum demand using grey dynamic model. Int. J. Electr. Power Energy Syst. 18 (7), 409–413.
Murray, K.B., Di Muro, F., Finn, A., Leszczyc, P.P., 2010. The effect of weather on consumer spending. J. Retail. Consum. Serv. 17 (6), 512–520.
Nelder, J.A., Baker, R.J., 1972. Generalized Linear Models. Wiley Online Library.
Oliva, R., Watson, N., 2009. Managing functional biases in organizational forecasts: a case study of consensus forecasting in supply chain planning. Prod. Oper. Manag. 18 (2), 138–151.
Ramos, P., Santos, N., Rebelo, R., 2015. Performance of state space and arima models for consumer retail sales forecasting. Robot. Comput.-Integr. Manuf. 34, 151–163.
Raza, M.Q., Khosravi, A., 2015. A review on artificial intelligence based load demand forecasting techniques for smart grid and buildings. Renew. Sustain. Energy Rev. 50, 1352–1372.
Silva, A.T., Moro, S., Rita, P., Cortez, P., 2018. Unveiling the features of successful ebay smartphone sellers. J. Retail. Consum. Serv. 43, 311–324.
Starr, M., 2000. The Effects of Weather on Retail Sales.
Steele, A., 1951. Weather's effect on the sales of a department store. J. Mark. 15 (4), 436–443.
Steinker, S., Hoberg, K., Thonemann, U.W., 2016. The value of weather information for e-commerce operations. Prod. Oper. Manag.
Taieb, S.B., Huser, R., Hyndman, R.J., Genton, M.G., 2016. Forecasting uncertainty in electricity smart meter data by boosting additive quantile regression. IEEE Trans. Smart Grid 7 (5), 2448–2455.
Taieb, S.B., Hyndman, R.J., 2014. A gradient boosting approach to the kaggle load forecasting competition. Int. J. Forecast. 30 (2), 382–394.
Taylor, J.W., Espasa, A., 2008. Energy forecasting. Int. J. Forecast. 24 (4), 561–565.
Tehrani, A.F., Ahrens, D., 2016. Enhanced predictive models for purchasing in the fashion field by using kernel machine regression equipped with ordinal logistic regression. J. Retail. Consum. Serv. 32, 131–138.
Thomassey, S., 2010. Sales forecasts in clothing industry: the key success factor of the supply chain management. Int. J. Prod. Econ. 128 (2), 470–483.
Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 267–288.
Tran, B.R., 2016. Blame it on the Rain Weather Shocks and Retail Sales. Technical report, Working Paper UC, San Diego.
Weiher, R., Sen, A., 2006. Economic Statistics for Noaa. National Oceanic and Atmospheric Sciences Administration, Washington DC, pp. 1–67.
Xiao, T., Qi, X., 2008. Price competition, cost and demand disruptions and coordination of a supply chain with one manufacturer and two competing retailers. Omega 36 (5), 741–753.
