May 2020
The authors declare that they are the sole authors of this thesis and that they have not used
any sources other than those listed in the bibliography and identified as references. They further
declare that they have not submitted this thesis at any other institution to obtain a degree.
Contact Information:
Author(s):
Sai Nikhil Boyapati
E-mail: [email protected]
Ramesh Mummidi
E-mail: [email protected]
University advisor:
Suejb Memeti
Department of Computer Science (DIDA)
Background: Sales forecasting is an important field in the food sector, and it has
recently gained immense popularity for boosting market operations and productivity
thanks to new technologies. The industry has traditionally relied on conventional
statistical models, but in recent years Machine Learning techniques have received
more attention.
Objectives: This thesis identifies the critical features that influence sales, and an
experiment is performed to find the most suitable algorithm for sales forecasting.
Methods: Machine Learning algorithms such as Simple Linear Regression, Gradient
Boosting Regression, Support Vector Regression, and Random Forest Regression,
which were expected to perform well on this problem, are considered in this thesis.
An experiment is carried out to determine the efficiency of these algorithms.
Results: Simple Linear Regression, Gradient Boosting Regression, Support Vector
Regression, and Random Forest Regression are all commonly known to perform well,
but the experiment clearly shows that Random Forest Regression is the most
appropriate algorithm compared to the others.
Conclusions: After the complete study, the Random Forest Regression algorithm
performed best when compared with the other algorithms. Hence, Random Forest
Regression is considered the most suitable algorithm for forecasting product sales.
First and foremost, praises and thanks to God, the Almighty, for His showers of bless-
ings throughout our research work to complete the research successfully. We would
like to express our deep and sincere gratitude to our supervisor Suejb Memeti, for
giving us the opportunity to do research and providing invaluable guidance through-
out this research. It was a great privilege and honour to work and study under his
guidance. We are extremely appreciative of what he has offered us.
Finally, our special thanks to all the people who have helped us directly or indi-
rectly in completing the research work.
Contents

Abstract
Acknowledgments
1 Introduction
  1.0.1 Aims and Objectives
  1.0.2 Research Questions
 1.1 Background
  1.1.1 Data Mining
  1.1.2 Machine Learning
  1.1.3 Machine Learning Algorithms
  1.1.4 Selection of Machine Learning Algorithms
  1.1.5 Selection of Performance Metrics
2 Related Work
3 Method
 3.1 Experiment
 3.2 Experimentation Environment
 3.3 Data overview
 3.4 Feature Selection
  3.4.1 Data Correlation Method
 3.5 Feature Importance
 3.6 Data preprocessing
  3.6.1 Encoding Categorical Values
  3.6.2 Stratified K-fold Cross-Validation
 3.7 Performance Metrics
  3.7.1 Accuracy score
  3.7.2 Max Error
  3.7.3 Mean Absolute Error
4 Results
 4.1 Simple Linear Regressor
 4.2 Gradient Boosting Regressor
 4.3 Support Vector Regressor
 4.4 Random Forest Regressor
 4.5 Feature Importance
 4.6 Evaluation Results
5 Analysis and Discussion
 5.1 Comparative analysis of Performance Metrics
  5.1.1 Average Accuracy Score
  5.1.2 Average Mean Absolute Error
  5.1.3 Average Max Error
 5.2 Discussion
 5.3 Contributions
 5.4 Validity Threats
  5.4.1 Internal Validity
  5.4.2 External Validity
References
A Appendix
Chapter 1
Introduction
Earlier, companies used to produce goods without considering sales volumes and
demand. For any manufacturer to determine whether to increase or decrease the
production of units, data regarding the demand for products on the market is
required. Companies can face losses if they fail to consider these values while
competing in the market. Different companies choose specific criteria to determine
their demand and sales [1].
In today’s highly competitive environment and ever-changing consumer land-
scape, accurate and timely forecasting of future revenue, also known as revenue
forecasting, or sales forecasting, can offer valuable insight to companies engaged in
the manufacture, distribution or retail of goods[2]. Short-term forecasts primarily
help with production planning and stock management, while long-term forecasts can
deal with business growth and decision-making[1].
Sales forecasting is particularly important in the food industry because of the limited
shelf-life of many of the goods, which leads to a loss of income in both shortage and
surplus situations. Ordering too few products leads to shortages and missed sales
opportunities, while ordering too many leads to surplus and waste. Competition in
the food market also fluctuates continuously due to factors such as pricing,
advertisement, and increasing demand from customers[3].
Managers usually make sales predictions in an ad hoc, intuitive manner. Qualified
managers, however, are hard to find and are not always available (e.g., they can get
sick or leave). Sales predictions can be assisted by computer systems that play the
qualified manager's role when one is not available, or that help managers make the
right decision by providing potential sales predictions. One way of implementing
such a method is to try to model the professional manager's skills inside a computer
program[4].
Alternatively, the abundance of sales data and related information can be used
with Machine Learning techniques to automatically develop accurate sales prediction
models. This approach is much simpler: it is not biased by a single sales manager's
particularities, and it is flexible, meaning it can adapt to data changes. It also has
the potential to surpass the prediction accuracy of a human expert, whose knowledge
is normally incomplete.
There are several ways of forecasting sales; companies have previously relied on
statistical models such as time series analysis and linear regression, as well as
feature engineering and random forest models, to predict future sales and demand.
A time series is a collection of data points collected at sequential, evenly spaced
points over a period and used to forecast the future. The most important components
to analyze are trend patterns, seasonality, irregularity, and cyclicity.
Linear regression is a statistical tool that uses past values to forecast future ones. It
can help determine underlying trends and address cases involving overstated
values[5][6].
Feature engineering is the use of domain knowledge about the data to develop
features that make predictive Machine Learning models more accurate. It allows
deeper data analysis and a more useful perspective[7]. The decision tree is the
fundamental principle behind the random forest model. The decision tree approach
is a data mining technique used to forecast and classify data, although it does not
provide any conceptual understanding of the issue itself. Random forest is a more
sophisticated method that builds and merges many trees to make decisions,
producing more accurate forecasts by averaging the predictions of all the individual
trees.
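As a minimal illustration of this averaging, the following scikit-learn sketch (on toy
data, not the thesis data set) checks that a random forest's prediction equals the
mean of its individual trees' predictions:

```python
# A minimal sketch of random forest averaging; the toy data below is
# illustrative only and is not the thesis data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 200)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
sample = X[:1]
tree_preds = [tree.predict(sample)[0] for tree in forest.estimators_]
print(forest.predict(sample)[0])   # forest prediction
print(np.mean(tree_preds))         # average of the trees: same value
```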
The entire data set is usually divided into two parts, namely the training data and
the test data. Training data is the data used to train the model, and test data is the
data used to evaluate the trained model. A classical approach is the 80-20 split, in
which 80 percent of the data is used to train the model and the remaining 20 percent
is used to test it. Approaches like stratified k-fold cross-validation, however, are
known to provide good results, as sketched below. There are many cross-validation
variants, such as simple k-fold, leave-one-out, stratified k-fold cross-validation, and
so on[8][9].
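A minimal sketch of these two evaluation setups with scikit-learn is shown below; X
and y are placeholders for the feature matrix and sales target, and since scikit-learn's
StratifiedKFold expects discrete class labels, plain KFold stands in here for the
continuous target:

```python
# Sketch of an 80-20 split and 10-fold cross-validation; X and y are
# placeholder data, not the thesis data set.
import numpy as np
from sklearn.model_selection import train_test_split, KFold

X = np.random.rand(100, 5)   # placeholder features
y = np.random.rand(100)      # placeholder continuous target

# Classical 80-20 split: 80% to train the model, 20% to test it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 10-fold cross-validation: every sample is tested exactly once.
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
for train_idx, test_idx in kfold.split(X):
    X_tr, X_te = X[train_idx], X[test_idx]
    y_tr, y_te = y[train_idx], y[test_idx]
    # a model would be fitted on (X_tr, y_tr) and scored on (X_te, y_te)
```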
Objectives:
• Converting data into an appropriate form using various preprocessing tech-
niques for the implementation of Machine Learning algorithms.
• Finding critical features that will most influence sales of the product.
RQ1:
What are the critical features that influence product sales?
Motivation:
The motivation for this research question is to find the critical features in the data
that can be used when experimenting for RQ2 to build the Machine Learning model.
This helps reduce the required computational power and improves the quality of the
results.
RQ2:
What is the best suitable algorithm for sales and demand prediction using Machine
Learning techniques?
Motivation:
The critical features identified in RQ1 are used to develop Machine Learning models
using different algorithms. These models are compared using various metrics, such
as accuracy score, mean absolute error, and max error, to select the model that best
fits the data.
1.1 Background
There are several methods for forecasting future demand for the goods and services
a business provides. The forecasts are used for planning production and business
activities, purchasing materials, inventory management, scheduling work hours,
advertising, and more across most industries. Traditional forecasting approaches
relied primarily on experienced employees' opinions or on statistical analysis of past
data, but in recent years Machine Learning techniques have been applied with great
success in this field.
Machine Learning is commonly defined as the study of computer programs whose
performance at a task improves with experience E[14]. In general, Machine Learning
enables a program to manage various tasks by analyzing and exploring data[15].
Common Machine Learning applications include email spam detection, credit card
fraud detection, stock prediction, smart assistants, product recommendations,
self-driving cars, and sentiment analysis.
Supervised Learning:
The most popular model for performing Machine Learning processes is supervised
learning. It is commonly used for data where the mapping between input and output
is known. Supervised learning is the subset of Machine Learning that concentrates
on learning a classification or regression model, that is, learning from labeled
training data[15].
Unsupervised Learning:
In unsupervised learning the data is not explicitly labeled into different classes; there
is only unlabeled data. The model learns from the data by identifying implicit
patterns. Unsupervised learning categorizes densities, structures, related segments,
and other similar properties from the data[16].
Reinforcement Learning:
Reinforcement Learning is a sub-field of Machine Learning concerned with taking
appropriate actions in a given scenario to maximize reward. Various algorithms are
employed to determine the best possible action or path to follow in a specific
scenario. Reinforcement learning differs from supervised learning in that supervised
training data comes with an answer key, so the model is trained with the correct
response itself, whereas in reinforcement learning there is no such response: the
reinforcement agent decides how to execute the task and must learn from its own
experience in the absence of training data[15].
Simple Linear Regression:
Simple linear regression models the relationship between a dependent variable and a
single independent variable as

Y = a + bX

where Y is the predicted value of the dependent variable for a specified value of the
independent variable X, a is the intercept, and b is the regression coefficient.
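As a hedged illustration, the model above can be fitted with scikit-learn as follows;
the numbers are invented for the example and are not taken from the thesis data:

```python
# Sketch of fitting Y = a + bX with scikit-learn on illustrative data.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # independent variable
y = np.array([2.1, 4.2, 5.9, 8.1])          # dependent variable

model = LinearRegression().fit(X, y)
print("intercept a:", model.intercept_)
print("coefficient b:", model.coef_[0])
print("prediction at X=5:", model.predict([[5.0]])[0])
```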
Gradient Boosting Regression:
Gradient boosted regression trees (GBRT) combine base models as

g(x) = f_1(x) + f_2(x) + ... + f_n(x)

where the final classifier g is the sum of the individual base classifiers f_i. For
boosted trees, each base classifier is a simple decision tree. This broad approach of
using multiple models to achieve better predictive performance is called model
ensembling. Unlike Random Forest, which builds all the base classifiers
independently, each using a subsample of the data, GBRT uses a particular
ensembling technique called gradient boosting[20].
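The following sketch shows gradient boosting regression in scikit-learn; the
hyperparameter values are illustrative assumptions rather than the settings used in
the thesis, and X_train/y_train are assumed to come from a split such as the one
sketched earlier:

```python
# Sketch of gradient boosting regression; hyperparameters are assumed.
from sklearn.ensemble import GradientBoostingRegressor

gbrt = GradientBoostingRegressor(
    n_estimators=100,    # number of boosted trees f_i in the sum g
    learning_rate=0.1,   # shrinks each tree's contribution
    max_depth=3,         # depth of each simple base decision tree
    random_state=42)
gbrt.fit(X_train, y_train)
predictions = gbrt.predict(X_test)
```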
Support Vector Regression:
The prediction function of a linear Support Vector Regressor takes the form

f(x) = x^T β + b

where β is the weight vector and b is the bias term.
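A corresponding scikit-learn sketch for support vector regression is given below; the
kernel and regularization settings are assumptions, not values reported in the thesis:

```python
# Sketch of support vector regression; settings are illustrative.
from sklearn.svm import SVR

svr = SVR(kernel="rbf", C=1.0, epsilon=0.1)
svr.fit(X_train, y_train)
predictions = svr.predict(X_test)
```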
Chapter 2
Related Work
A lot of sales and demand forecasting work has previously been performed using
Machine Learning. Most of the related work discussed in this chapter concentrates
on the sales of food items.
Due to the importance of forecasting in various fields, many different types of
approaches have been taken previously, including Machine Learning models, hybrid
models, and statistical models. Statistical methods such as the autoregressive moving
average (ARMA) and the autoregressive integrated moving average (ARIMA) are
helpful for this task[4].
İrem İşlek and Şule Gündüz Öğüdücü experimented with bipartite graph clustering
to cluster different warehouses according to their sales behavior. They then applied
a Bayesian network algorithm, with which they managed to improve forecasting
performance[23].
Grigorios Tsoumakas surveyed the use of Machine Learning techniques for
forecasting food sales. The survey addressed data analyst design decisions such as
temporal granularity, the output variable, and the input variables[4].
The authors of [24] experimented with point-of-sale (POS) data as internal data and
with external data describing different environments to enhance the efficiency of
demand forecasting. They considered different Machine Learning algorithms, such
as Boosted Decision Tree Regression, Bayesian Linear Regression, and Decision
Forest Regression, for evaluation[24].
The authors of [25] researched the prediction of customers coming to restaurants
using Random Forests, k-nearest neighbors, and XGBoost. They chose two
real-world data sets from different booking sites and derived various input variables
from restaurant features. The results showed that XGBoost was the most
appropriate model for the dataset[25].
Holmberg and Halldén observed that daily restaurant sales are influenced by the
weather. They considered two Machine Learning algorithms, XGBoost and a neural
network; the results showed that the XGBoost algorithm was more accurate than the
other, and that taking weather factors into consideration improved model
performance by 2-4 percentage points. To improve accuracy, they considered
numerous variables such as date characteristics, sales history, and weather
factors[26].
Most of the recent studies focused on sales modeling without considering the
relationship between the training and testing data; they used the training data
directly. This causes many errors, which lead to a reduction in accuracy. Recent
studies have therefore suggested clustering techniques to separate the entire
forecasting data into clusters with similar sales behavior before training.
Chapter 3
Method
In this thesis, the research questions are answered using empirical research methods:
the research aspects of this work are examined through the execution of experiments.
3.1 Experiment
An experiment is chosen for the first research question, i.e., the correlation analysis.
Relevant data attributes can be selected by applying feature selection methods such
as data correlation, which makes the predicted attribute more accurate. This reduces
a lot of strain on the Machine Learning model during pre-processing and cleansing
of the data. For the second research question an experiment is chosen because
experiments provide control over factors and a deeper understanding than many
common research techniques such as a case study or survey[29]. The procedure
followed in this experiment is described in the remainder of this chapter.
3.2 Experimentation Environment
NumPy
NumPy is a library that provides multidimensional array objects and a set of
routines for array processing. NumPy is used along with the SciPy and Matplotlib
packages; this combination is widely used for technical computing. Mathematical
and logical operations on arrays are performed with the help of NumPy[32].
Pandas
Pandas is a software library designed for data manipulation and analysis in the
Python programming language. It is open source, released under the three-clause
BSD license. It is built on the NumPy package, and the DataFrame is its main data
structure[33].
Matplotlib
Matplotlib is a Python module used to plot graphs. Visual representation is a
significant step in data science: one can quickly understand how data is distributed
by using it. There are many libraries for representing data, but Matplotlib is very
widely known and makes visualization easy[34].
SKlearn
Scikit-learn is a free Python library. It features various classification, regression,
and clustering algorithms, including random forests, DBSCAN, k-means, gradient
boosting, and support vector machines, and it is designed to interoperate with the
NumPy and SciPy libraries[35].
Seaborn
Seaborn is an open-source Python library used for statistical graphics. It offers a
dataset-oriented API to analyze relationships among different variables, as well as
tools to select color palettes that faithfully reveal patterns in the data[36].
3.4 Feature Selection
The data attributes chosen for training the Machine Learning model have a major
impact on the efficiency of the model; irrelevant features in the data reduce model
performance. Feature selection provides an efficient way to remove redundant and
irrelevant data, which helps reduce computation time, improve accuracy, and
enhance understanding of the model[37].
The selection of features plays a crucial role in classification and involves selecting a
subset of features that represents the complete set of attributes. Feature selection
techniques are intended to improve classification efficiency by selecting the essential
features from the data sets according to particular algorithms.
A heat map of the correlation between the non-numerical attributes is plotted to
visualize these relationships.
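A sketch of how such a heat map can be produced with seaborn follows; df is
assumed to be the sales DataFrame with its categorical columns already encoded to
numeric codes so that corr() applies:

```python
# Sketch of a correlation heat map; df is an assumed pandas DataFrame.
import seaborn as sns
import matplotlib.pyplot as plt

corr = df.corr()  # pairwise Pearson correlations between attributes
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Attribute correlation heat map")
plt.tight_layout()
plt.show()
```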
3.7.2 Max Error
In a perfectly fitted single-output regression model, the max error would be 0 on the
test set; while this is extremely unlikely in the real world, this metric indicates the
extent of error the model had when it was fitted[45].
MaxError(y, x) = max(|y_i − x_i|)
where y_i denotes the actual values and x_i the predicted values.
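The evaluation metrics can be computed with scikit-learn as sketched below; y_test
holds the actual values, model is assumed to be any fitted regressor from this
chapter, and the thesis's "accuracy score" for a regressor is taken here to be the R²
value returned by score():

```python
# Sketch of the three evaluation metrics; "model" is an assumed fitted
# regressor and the accuracy score is assumed to be R^2.
from sklearn.metrics import max_error, mean_absolute_error

y_pred = model.predict(X_test)
print("max error:", max_error(y_test, y_pred))
print("mean absolute error:", mean_absolute_error(y_test, y_pred))
print("accuracy (R^2):", model.score(X_test, y_test))
```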
Chapter 4
Results
The Simple Linear Regressor, Gradient Boosting Regressor, Random Forest
Regressor, and Support Vector Regressor were trained on the data set using a 10-fold
stratified cross-validation approach that dynamically selected the training and testing
sets with a fixed proportion each time; efficiency was calculated using the max error,
mean absolute error, and accuracy metrics.
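A hedged sketch of this comparison loop is shown below; the estimators use default
hyperparameters, since the exact thesis settings are not specified, and plain 10-fold
cross-validation stands in for the stratified variant:

```python
# Sketch of comparing the four regressors with 10-fold cross-validation;
# X and y are the assumed feature matrix and sales target.
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.svm import SVR

models = {
    "Simple Linear Regression": LinearRegression(),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
    "Support Vector Regression": SVR(),
    "Random Forest": RandomForestRegressor(random_state=42),
}
for name, estimator in models.items():
    # negated because scikit-learn scorers are "higher is better"
    mae = -cross_val_score(estimator, X, y, cv=10,
                           scoring="neg_mean_absolute_error")
    print(f"{name}: mean MAE {mae.mean():.3f}")
```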
The box plot in Figure 4.6 shows the Max Error (ME) obtained by the Gradient
Boosting Regressor during the 10-fold stratified cross-validation test. The upper
quartile of the box plot represents the maximum ME of 0.464, the middle quartile
represents a median ME of 0.441, and the lower quartile represents a minimum ME
of 0.425.
The box plot in Figure 4.8 shows the Mean Absolute Error (MAE) obtained by the
Support Vector Regressor during the 10-fold stratified cross-validation test. The
upper quartile of the box plot represents the maximum MAE of 4.3507, the middle
quartile represents a median MAE of 3.1647, and the lower quartile represents a
minimum MAE of 2.7238.
The box plot in Figure 4.9 shows the Max Error (ME) obtained by the Support
Vector Regressor during the 10-fold stratified cross-validation test. The upper
quartile of the box plot represents the maximum ME of 0.4686, the middle quartile
represents a median ME of 0.4485, and the lower quartile represents a minimum ME
of 0.4343.
The box plot in Figure 4.10 shows the accuracy scores obtained by the Random
Forest Regressor during the 10-fold stratified cross-validation test. The middle
quartile of the box plot represents a median accuracy score of 87.72 percent and the
lower quartile represents a minimum accuracy score of 78.31 percent.
The box plot in Figure 4.11 shows the Mean Absolute Error (MAE) obtained by the
Random Forest Regressor during the 10-fold stratified cross-validation test. The
upper quartile of the box plot represents the maximum MAE of 5.8058, the middle
quartile represents a median MAE of 4.458, and the lower quartile represents a
minimum MAE of 3.4156.
The box plot in Figure 4.12 shows the Max Error (ME) obtained by the Random
Forest Regressor during the 10-fold stratified cross-validation test. The upper
quartile of the box plot represents the maximum ME of 0.6568, the middle quartile
represents a median ME of 0.6135, and the lower quartile represents a minimum ME
of 0.5964.
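The per-fold box plots described above can be drawn with matplotlib roughly as
follows; fold_mae is an assumed dictionary mapping each model name to its ten
per-fold MAE values:

```python
# Sketch of per-fold metric box plots; "fold_mae" is an assumed dict of
# model name -> list of 10 per-fold MAE values.
import matplotlib.pyplot as plt

plt.boxplot(fold_mae.values(), labels=fold_mae.keys())
plt.ylabel("Mean Absolute Error per fold")
plt.title("10-fold cross-validation MAE by model")
plt.show()
```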
Figure 4.13 shows that product sales depend primarily on the price of the products,
followed by the outlet type (grocery store); the remaining features do not even come
close to these in importance. These features will surely have a huge impact on sales
forecasting.
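A sketch of ranking features by importance from a fitted random forest is shown
below; feature_names is assumed to list the columns of X_train in order:

```python
# Sketch of feature importance ranking; "feature_names" is assumed.
from sklearn.ensemble import RandomForestRegressor

forest = RandomForestRegressor(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
ranked = sorted(zip(feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```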
Table 4.1 shows the comparison of the evaluation results in simplified tabular form:
Random Forest Regression performed best on all the metrics, namely accuracy score,
Mean Absolute Error, and Max Error. Random Forest Regression had the minimum
error in predicting sales when compared to Simple Linear Regression, Gradient
Boosting Regression, and Support Vector Regression. Simple Linear Regression
demonstrated the worst performance, with the highest error on all the metrics.
Chapter 5
Analysis and Discussion
Figure 5.1 shows the average accuracy score over the 10-fold stratified
cross-validation: the Simple Linear Regressor obtained 81.2 percent, followed by the
Gradient Boosting Regressor with 86.27 percent, then the SVR with 84.82 percent,
and finally the Random Forest Regressor with 87.72 percent. From Figure 5.1, it can
be seen that the Random Forest Regressor is the best performer, with approximately
88 percent accuracy score compared to the other methods, and the Simple Linear
Regressor is the poorest performer, with an accuracy score of 81.2 percent.
Figure 5.2 shows the average MAE over the 10-fold stratified cross-validation: the
Simple Linear Regressor obtained an error of 3.21, followed by the Gradient
Boosting Regressor with 3.19, then the SVR with 3.21, and finally the Random
Forest Regressor with 3.15. From Figure 5.2, it can be seen that the Random Forest
Regressor is the best performer, with the least error relative to the other approaches,
and the Simple Linear Regressor, with the highest error, is the poorest performer.
5.2 Discussion
RQ1: What are the critical features that influence product sales?
Answer:
It was clearly observed in Figure 4.13 that the price of the products, followed by the
outlet type (grocery store), heavily influences product sales.
RQ2:
What is the best suitable algorithm for sales and demand prediction using Machine
Learning techniques?
Answer:
Random Forest Regression is the most appropriate algorithm for forecasting product
sales. Compared with Simple Linear Regression, Gradient Boosting Regression, and
Support Vector Regression, the Random Forest Regression technique produces the
least error when predicting product sales.
The average accuracy score, mean absolute error, and max error for the Random
Forest Regressor across the 10-fold stratified cross-validation are 87.72 percent, 3.15,
and 0.44 respectively, which is quite impressive compared to the other techniques.
The Simple Linear Regressor produced very poor results compared to the other
techniques: its average accuracy score, mean absolute error, and max error across
the 10-fold stratified cross-validation are 81.2 percent, 3.21, and 0.49 respectively,
the poorest of the four. It can also be observed from Figure 4.13 that item price and
the outlet type (grocery store) are the critical features that mainly influence product
sales. If the sales forecast is carried out every day across a large number of stores,
speed will be a key aspect of this process. Another important measure is the time
required to train the model, which also plays a critical role when training different
types of algorithms.
5.3 Contributions
There are many ongoing experiments that use statistical approaches and traditional
methods to predict item sales, and most studies have experimented with a single
algorithm for prediction. In this thesis, Machine Learning algorithms such as Simple
Linear Regression, Support Vector Regression, Gradient Boosting Regression, and
Random Forest Regression are considered for prediction, and effective metrics such
as accuracy, mean absolute error, and max error are used to measure algorithm
efficiency. This approach can be very beneficial for advanced item sales forecasting
in the future.
6.1 Conclusion
Sales forecasting plays a vital role in every field of the business sector. Sales
forecasts support revenue analysis, which helps obtain the details needed to estimate
both revenue and income. Different Machine Learning techniques, namely Support
Vector Regression, Gradient Boosting Regression, Simple Linear Regression, and
Random Forest Regression, have been evaluated on food sales data to find the
critical factors that influence sales and to provide a solution for forecasting them.
After evaluating metrics such as accuracy, mean absolute error, and max error,
Random Forest Regression is found to be the most appropriate algorithm for the
collected data, thus fulfilling the aim of this thesis.
References
[1] Patrick Bajari, Denis Nekipelov, Stephen P Ryan, and Miaoyu Yang. Ma-
chine learning methods for demand estimation. American Economic Review,
105(5):481–85, 2015.
[2] Kris Johnson Ferreira, Bin Hong Alex Lee, and David Simchi-Levi. Analytics for
an online retailer: Demand forecasting and price optimization. Manufacturing
& Service Operations Management, 18(1):69–88, 2016.
[3] Ankur Jain, Manghat Nitish Menon, and Saurabh Chandra. Sales forecasting
for retail chains, 2015.
[4] Grigorios Tsoumakas. A survey of machine learning techniques for food sales
prediction. Artificial Intelligence Review, 52(1):441–447, 2019.
[5] Xiaogang Su, Xin Yan, and Chih-Ling Tsai. Linear regression. Wiley Interdis-
ciplinary Reviews: Computational Statistics, 4(3):275–294, 2012.
[6] Toby J Mitchell and John J Beauchamp. Bayesian variable selection in linear
regression. Journal of the american statistical association, 83(404):1023–1032,
1988.
[7] Zheng Li, Xianfeng Ma, and Hongliang Xin. Feature engineering of machine-
learning chemisorption models for catalyst design. Catalysis today, 280:232–238,
2017.
[10] Chris Rygielski, Jyun-Cheng Wang, and David C Yen. Data mining techniques
for customer relationship management. Technology in society, 24(4):483–502,
2002.
[11] Krzysztof J Cios, Witold Pedrycz, Roman W Swiniarski, and Lukasz Andrzej
Kurgan. Data mining: a knowledge discovery approach. Springer Science &
Business Media, 2007.
[12] Maike Krause-Traudes, Simon Scheider, Stefan Rüping, and Harald Meßner.
Spatial data mining for retail sales forecasting. In 11th AGILE International
Conference on Geographic Information Science, pages 1–11, 2008.
[13] Stephen Marsland. Machine learning: an algorithmic perspective. CRC press,
2015.
[14] ML documentation. https://www.mathworks.com/discovery/
machine-learning.html. Accessed: 2020-04-22.
[15] Ethem Alpaydin. Introduction to machine learning. MIT press, 2020.
[16] Arvin Wen Tsui, Yu-Hsiang Chuang, and Hao-Hua Chu. Unsupervised learning
for solving rss hardware variance problem in wifi localization. Mobile Networks
and Applications, 14(5):677–691, 2009.
[17] Bohdan M Pavlyshenko. Machine-learning models for sales time series forecast-
ing. Data, 4(1):15, 2019.
[18] Taiwo Oladipupo Ayodele. Types of machine learning algorithms. New advances
in machine learning, pages 19–48, 2010.
[19] Sanford Weisberg. Applied linear regression, volume 528. John Wiley & Sons,
2005.
[20] Gradient Boosting documentation. https://turi.com/learn/userguide/
supervised-learning/boosted_trees_regression.html. Accessed: 2020-05-19.
[21] JN Hu, JJ Hu, HB Lin, XP Li, CL Jiang, XH Qiu, and WS Li. State-of-charge
estimation for battery management system using optimized support vector ma-
chine for regression. Journal of Power Sources, 269:682–693, 2014.
[22] Wangchao Lou, Xiaoqing Wang, Fan Chen, Yixiao Chen, Bo Jiang, and Hua
Zhang. Sequence based prediction of dna-binding proteins based on hybrid
feature selection using random forest and gaussian naive bayes. PloS one, 9(1),
2014.
[23] İrem İşlek and Şule Gündüz Öğüdücü. A retail demand forecasting model based
on data mining techniques. In 2015 IEEE 24th International Symposium on
Industrial Electronics (ISIE), pages 55–60. IEEE, 2015.
[24] Takashi Tanizaki, Tomohiro Hoshino, Takeshi Shimmura, and Takeshi Take-
naka. Demand forecasting in restaurants using machine learning and statistical
analysis. Procedia CIRP, 79:679–683, 2019.
[25] Xu Ma, Yanshan Tian, Chu Luo, and Yuehui Zhang. Predicting future visitors
of restaurants using big data. In 2018 International Conference on Machine
Learning and Cybernetics (ICMLC), volume 1, pages 269–274. IEEE, 2018.
[26] Mikael Holmberg and Pontus Halldén. Machine learning for restaurant sales
forecast, 2018.
[27] I-Fei Chen and Chi-Jie Lu. Sales forecasting by combining clustering and
machine-learning techniques for computer retailing. Neural Computing and Ap-
plications, 28(9):2633–2647, 2017.
[28] Malek Sarhani and Abdellatif El Afia. Intelligent system based support vector
regression for supply chain demand forecasting. In 2014 Second World Confer-
ence on Complex Systems (WCCS), pages 79–83. IEEE, 2014.
[29] Jason Brownlee. Introduction to time series forecasting with python: how to
prepare data and develop models to predict the future. Machine Learning Mastery,
2017.
[31] Guido Van Rossum et al. Python programming language. In USENIX annual
technical conference, volume 41, page 36, 2007.
[32] Travis E Oliphant. A guide to NumPy, volume 1. Trelgol Publishing USA, 2006.
[33] Wes McKinney. Pandas, Python data analysis library. http://pandas.pydata.org,
2015.
[35] Raul Garreta and Guillermo Moncecchi. Learning scikit-learn: machine learning
in python. Packt Publishing Ltd, 2013.
[37] Chung-Jui Tu, Li-Yeh Chuang, Jun-Yang Chang, Cheng-Hong Yang, et al. Fea-
ture selection using pso-svm. International Journal of Computer Science, 2007.
[38] Tao Zhang, Tianqing Zhu, Ping Xiong, Huan Huo, Zahir Tari, and Wanlei
Zhou. Correlated differential privacy: Feature selection in machine learning.
IEEE Transactions on Industrial Informatics, 2019.
[41] Kedar Potdar, Taher S Pardawala, and Chinmay D Pai. A comparative study
of categorical variable encoding techniques for neural network classifiers. Inter-
national journal of computer applications, 175(4):7–9, 2017.
[43] Sebastian Raschka. Model evaluation, model selection, and algorithm selection
in machine learning. arXiv preprint arXiv:1811.12808, 2018.
Appendix A
Appendix
The graph below is a comparative analysis between the item sales and the item
weight.
The graph below is a comparative analysis between the item sales and the item
visibility.
The graph below is a comparative analysis between the item sales and the item price.
The following graph illustrates the impact of item type on item outlet sales.
The following graph illustrates the impact of outlet establishment year on item outlet
sales.
The following graph illustrates the impact of outlet sizes on item outlet sales.
The following graph illustrates the impact of outlet location type on item outlet sales.