Inventory Optimization Project
This thesis discusses the inventory management problem in the supply chain. A direct supply chain consists of a company, a supplier, and a customer involved in the upstream and/or downstream flows of products, services, finances, and/or information [1]. In reality, the flow of products between a company, a supplier, and a customer is not instant due to shipping time. Thus, the company might keep some products in its warehouse to eliminate the shipping time from the supplier and thereby minimize the overall shipping time to the final customer. Keeping the right products in the right amounts in the warehouse is crucial for the company, since long delivery times are usually a deal breaker for a customer buying a product. To do that, the company needs to know beforehand what to order from the supplier in order to keep up with future customer demand.
Such a problem has existed for decades in the supply chain, and there is a wide variety of inventory management policies and algorithms [2]–[4]. These algorithms try to calculate what to order from the supplier based on the order history of the customers. However, such algorithms are generic and do not consider the case in which the supplier applies an order line fee. An order line fee is a fixed fee which the company pays to the supplier for every order line. For example, if the order line fee is 1€, the company has to pay that fee for ordering the product regardless of whether it orders 1 or 1000 units. This fee makes the whole problem considerably more complex, because the number of supplier order lines has to be minimized.
1.1 Problem
The problem addressed by the project is inefficient supplier order and warehouse inventory management in a supply chain with a supplier applying order line fees. This problem can be divided into several subproblems, presented as research questions:
• What products and how many units should be ordered from the supplier applying order line fees?
• When should the supplier order be executed?
• Which products should be kept in stock?
The cause of the problem and its subproblems is an inefficient warehouse inventory management process. To solve the problem, a new machine learning based inventory management algorithm is created which addresses the issues of the current process.
Little research has been done on applying machine learning to warehouse inventory management, and no research was found for the case in which the supplier applies order line fees. Therefore, solving the problem would be a significant contribution to the research field as a whole.
The problem currently exists at the company ASWO Baltic. The company sells spare parts for various electronics and home appliances [5]. The company has a warehouse where it stores the most frequently bought products to make them available to customers instantly. However, the company cannot keep all the supplier's products in stock due to the limited warehouse size and storage expenses. Therefore, unpopular products are ordered directly from the supplier and have a delivery time of a few days. If popular products run out of stock, they are ordered from the supplier as well.
ASWO Baltic has a supplier, ASWO, which applies order line fees: the company pays a fixed fee of a few euros for every ordered product, independent of the ordered quantity or item value.
Optimizing supplier orders and the products kept in stock is crucial for running a profitable business. If stock and orders are managed badly, the company can lose money or even go bankrupt. This can be illustrated with a simple example. Suppose a customer wants to buy a cable from the company. The company does not have it in its warehouse, so it orders the cable from the supplier at a price of 1€ and sells it to the customer for 2€. It looks as if the company made 1€ of profit. However, that is only the case if there are no supplier fees. If the supplier order line fee is 4€, the company pays the supplier 1€ + 4€ = 5€ and receives only 2€ from the customer, so the company loses 3€ by pursuing this deal. Such a deal is only profitable if the company orders 5 units or more and stores the rest in its warehouse for future sales.
Efficient stock and order management reduces not only the fees but also the storage expenses. If the company orders products without predicting future customer demand, it could cause long inventory turnover: many products would stay in the warehouse for a long period of time. Because of long inventory turnover, the company would need a larger warehouse to store all the products. That would increase expenses on rent, utility bills, and staff, since managing a larger warehouse requires more human labor.
Moreover, inefficient order management is bad for the environment. Usually, every order is packed separately. As a result, reducing the number of orders by ordering more in a single batch could reduce the materials used for packaging. In addition, lowering the number of orders could also reduce shipping costs, fuel consumption, and carbon emissions, since the orders are delivered less often.
Most importantly, having the right items at the right time is crucial for high customer satisfaction and sales. One study [6] showed that if a product is out of stock, around 33% of customers buy the item in another store or do not buy it at all. Another study [7] found that 69% of customers are less likely to shop with a retailer for future purchases if an item is not delivered within two days of the date promised. That shows the importance of keeping items in stock as well as delivering them fast and on time.
Finally, in the company's current system, most decisions about what to order from the supplier are made manually. This is far from optimal, since the company sells a few million different products and receives hundreds of orders per day. Such a large amount of data cannot be processed efficiently by a human.
1.3 Purpose
The purpose of the thesis is to lower the fees paid by the company to the supplier and to reduce product delivery times to the final customer. As a result, this would increase overall customer satisfaction, sales, and profits. It would also lower the expenses and human resources needed for order and warehouse management. In addition, it would lower the company's environmental impact by reducing packaging waste and fuel consumption.
1.4 Goals
The main goal of this project is to create an algorithm which can predict when and which products should be ordered from the supplier based on the customer order history. To achieve the thesis purpose, the algorithm should outperform the current company's system.
1.5 Research Methodology
This section describes the high-level research methodology concepts used in the project. The concepts described in this section are implemented by applying research methods and techniques which are explained in more detail in Chapter 3.
1.5.1 Method
A quantitative experimental research method is used in the project [8]. This research method is chosen because an algorithm which optimizes the supplier orders depending on the historical order data is being developed. The performance of the various algorithms is measured using metrics extracted from the resulting data. All the metrics depend on each other, so finding an optimal solution just by looking at the problem is practically impossible. Thus, the best way to solve such a problem is to test many different algorithms and models, experiment with their various parameters, and in that way find an optimal solution.
Apart from that, the experimental research method is selected because the project involves machine learning. Many machine learning algorithms have hyperparameters whose optimal values are usually impossible to calculate analytically for specific data. Therefore, the only option is to experiment and test which hyperparameter values are optimal.
1.5.2 Approach
A deductive research approach is used in the project [8]. This approach is selected because the company has a large historical dataset containing more than 360K received orders over a period of more than 5 years. Running an algorithm on the historical data and extracting certain metrics from the results allows the system to be validated, with the expectation that it will remain valid in the future.
1.5.3 Strategy
An experimental research strategy is used in the project [8]. This research strategy is chosen because all the factors and variables are controlled during the testing of the algorithm: the results of the algorithm depend only on the historical data and the hyperparameters, which can be changed. Such an environment makes it possible to analyze the causal relations between variables and results and to draw conclusions about the observed behavior.
1.5.4 Process
In this thesis the research method, approach, and strategy are applied by following the CRISP-DM process (CRoss-Industry Standard Process for Data Mining) [9]. The CRISP-DM process consists of a cycle that comprises six stages:
1. Business understanding. This initial phase focuses on understanding the project objectives and requirements from a business perspective and converting this knowledge into a data mining problem definition.
2. Data understanding. The data understanding phase starts with the data collection and proceeds with activities to get familiar with the data, to identify data quality problems, to discover first insights into the data, or to detect interesting subsets.
3. Data preparation. The data preparation phase covers all activities needed to construct the final dataset from the initial raw data.
4. Modeling. In this phase, various modeling techniques are selected and applied, and their parameters are calibrated to optimal values.
5. Evaluation. At this stage, the models obtained are more thoroughly evaluated and the steps executed to construct them are reviewed to be certain they properly achieve the business objectives.
6. Deployment. In the final phase, the created models and the knowledge gained are organized, deployed, and presented in a way the customer can use.
This process is chosen because it is an industry standard used in projects and companies of various scales. According to a survey [11], CRISP-DM was the most popular methodology for analytics, data mining, and data science projects both in 2007 and in 2014. Using a widely known and approved research process helps to avoid common mistakes and supports the credibility and legitimacy of the research.
The whole project is conducted by following this process. This is also reflected in the thesis structure: Chapters 3 and 4 follow the steps of this process. The Business Understanding, Data Understanding, Data Preparation, and Modeling stages are described in detail in Chapter 3. Evaluation is described in Chapter 4. Deployment is out of the scope of this thesis and is left for future work.
The algorithm described in the thesis focuses only on the quantities of the items, without considering other properties such as dimensions, value, availability, shipping or storage conditions, and others. As a result, some actions proposed by the algorithm might be impossible, impractical, or unprofitable. Due to this limitation, the algorithm should work as a recommendation system rather than be fully autonomous, since the actions recommended by the algorithm should be reviewed before execution.
Challenges such as supply chain disruptions and the bullwhip effect are not specifically addressed by the algorithm. However, these phenomena might still be learnt and addressed by the machine learning algorithms used in the project.
The chosen methods and results of the project are highly dependent on the data. Although it is likely, there is no guarantee that similar results would be achieved with different data. In order to reproduce the results with different data, it is recommended to focus on the process described in the thesis rather than on the specific machine learning algorithms or hyperparameters chosen in this project.
The simulation done for the project evaluation does not consider all real-world scenarios. Supply chain disruptions, the supplier being out of stock, and the warehouse size are not considered in the simulation. Thus, if the system is used in reality, the improvement found by the simulation might be lower.
The thesis does not cover the deployment, visualization, or integration of the algorithm into the current company's infrastructure. Such work is left for the future.
Chapter 2 describes the relevant context and background in the supply chain area. At the end of the chapter, related works are reviewed.
Chapter 3 describes the process of building the algorithm, following the Business Understanding, Data Understanding, Data Preparation, and Modeling stages of the CRISP-DM process.
In Chapter 4 the project is evaluated by running a simulation and performing execution time and reliability analyses. The chapter ends with a discussion.
In Chapter 5 the conclusions are stated and future research opportunities are presented.
2 Background
This chapter provides background information about the supply chain. At the end of the chapter, works related to this thesis are analyzed.
A supply chain is a network of organizations that are involved in the different processes and activities that produce value in the form of products and services in the hands of the ultimate consumer [12].
Three degrees of supply chain complexity can be identified: a “direct supply chain,” an “extended supply chain,” and an “ultimate supply chain.” A direct supply chain consists of a company, a supplier, and a customer involved in the upstream and/or downstream flows of products, services, finances, and/or information. An extended supply chain includes suppliers of the immediate supplier and customers of the immediate customer, all involved in the upstream and/or downstream flows of products, services, finances, and/or information. An ultimate supply chain includes all the organizations involved in all the upstream and downstream flows of products, services, finances, and information from the ultimate supplier to the ultimate customer [1]. This is illustrated in Figure 2.
The work done in this thesis considers a direct supply chain only.
There are many different definitions of supply chain management [1]. Generally, it can be seen as the management of upstream and downstream relationships with suppliers and customers in order to deliver superior customer value at less cost to the supply chain as a whole [12]. Supply chain management is a wide concept which includes many areas such as logistics, inventory management, demand management, manufacturing, procurement, and more.
Another situation which largely disrupted the global supply chain was the blockage of the Suez Canal in 2021. While sailing through the canal, the massive container ship Ever Given lost control due to strong wind and became stuck sideways, blocking the entire passage. Around 12% of total global trade goes through the Suez Canal, and it was estimated that each hour of the blockage cost the global economy 400 million dollars. The ship was freed after 6 days, and during that period the blockage caused price changes and shipping delays across various industries [14].
The aforementioned examples were major supply chain disruption events of recent years. However, there are various other types of events which affect supply chains, such as strikes [15], weather [16], political decisions [17], and others. Such events are common, yet they cannot be predicted. Thus, the needed adjustments must be made as fast as possible to reduce their impact on the final customers.
Over the long history of inventory management, a wide variety of methods have been defined and used across the industry. This section describes two popular inventory management methods: the Periodic Review System and Min/Max.
• At regular time intervals (e.g., 3 days, two weeks, etc.), the inventory level is reviewed. This new inventory level is called I.
• The quantity ordered from the supplier is the difference between the restocking level RS and the current inventory level I, where the restocking level is calculated as

RS = D + SS

D – average demand during the reorder period plus the replenishment lead time (if there is a delay getting new products in);
SS – safety stock held to cover variability in demand.
The main advantage of the Periodic Review System is its simplicity. However, it has many disadvantages. Firstly, in this method the supplier orders do not react to the demand between reviews. For example, if the demand increases and an item runs out of stock, the supplier order is not executed until the next review time comes. Secondly, such a model creates many small supplier orders. This is because every item which is below the restocking level RS (even by a single unit) is ordered from the supplier, even though the current inventory would be enough until the next review. For our case this is not acceptable, since every line in the supplier order has an additional fee, and such a model would tremendously increase the total fees paid to the supplier.
The Min/Max method tracks the current total stock level, which is typically the sum of the stock-on-hand plus the stock-on-order for every single item. When the total stock reaches the Min value, a reorder is triggered. The reorder quantity targets the Max value for the new total stock level; hence the reorder quantity is the difference between Max and Min [2].
Min/Max solves the problem of unnecessary supplier orders. Orders also follow the demand, since an order is executed as soon as the Min level is reached instead of waiting for the next review cycle. However, the biggest challenge of the Min/Max method is calculating the Min and Max levels.
For any Min/Max system, the Max and Min levels should be set high enough to avoid stockouts, yet low enough not to increase the risk of expiration or damage. The Min level must be set high enough to ensure that the facility never completely runs out of stock. At the same time, the Max level must be set low enough to ensure that the space in the storeroom is adequate and that the stock does not expire before it can be used [18].
Setting the correct Min/Max levels is a very difficult task which is crucial for business success. The levels are hard to calculate correctly because the future demand needs to be predicted. For that purpose, machine learning algorithms are employed in this thesis.
One of the major topics discussed in this thesis is the supplier order line fee. This section analyzes the background of such a fee and why it is applied by the supplier.
Every supplier that operates a warehouse has certain warehouse processes. One of the main processes used in a warehouse is the picking process. During this process, a worker processes the order by picking the items, packing them, and shipping them. If the discrete picking process [19] is used, the warehouse operator has to take the order, walk to the first item's location, pick the needed number of items, and repeat this until all the order items are collected. After that, the order is packed and shipped to the customer. Such a process is illustrated in Figure 5.
A warehouse might use other picking processes [19] or add additional steps to this process, such as barcode scanning or label printing, but the general idea remains the same: most of the time spent executing the process goes into walking from one item's location to another and picking the items. For example, a regular Amazon worker can walk around 17 miles during a shift while picking items [20]. As a consequence, the amount of human labor needed to fulfil an order is proportional to the number of different items in the order, since the worker has to walk from item to item.
Human labor is an expensive and, most importantly, limited resource in a warehouse. If there is a large number of orders of different items, the warehouse management cannot simply employ an enormous number of people to fulfil them, since there is a limited amount of working space. Therefore, each warehouse has a certain limit on the orders it can process, and that threshold cannot be increased just by employing additional staff.
Some warehouses increase that threshold by using robots and automation. A good example of this is Amazon. In some warehouses walking workers were replaced by robots: instead of the worker walking to the shelf and picking the item, the robots bring the entire shelf to the worker who picks the item. That way the number of items picked by each worker increased 3-4 times, and the employees no longer needed to walk long distances during their workday [21]. However, even though the total number of orders which can be handled by the warehouse increased, it still remained limited. This is due to the limited number of robots and employees that can fit in the warehouse. Thus, the number of orders should be controlled so as not to exceed the warehouse limits.
The mechanism for controlling the number of order lines discussed in this thesis is the order line fee. As described in the Introduction, an order line fee is a fixed fee which the company pays to the supplier for every order line, independent of the order size. For instance, if the company buys 3 units of a product at a unit price of 2€ and the order line fee is 3€, the company pays the supplier a total of 3 x 2€ + 3€ = 9€. Such a fee allows the supplier to control the number of order lines by incentivizing its customers to order more units of an item in a single order.
The bullwhip effect, or whiplash effect, is a common phenomenon in which the orders to the supplier tend to have a larger variance than the sales to the buyer, and the distortion propagates upstream in the supply chain in an amplified form [22]. There are many causes of the bullwhip effect, but the major one is demand signaling, which can be mitigated by sharing stock and order information across the various supply chain members and by reducing the lead time [22] – the time period between order initiation and receipt of the items [23]. The bullwhip effect can be observed in real supply chains [24] or in the “Beer Distribution Game” [25].
The “Beer Distribution Game” is a board game simulating a supply chain. During the game, every player (or group of players) plays the role of a retailer, wholesaler, distributor, or factory. In every round, the retailer draws a card from the pile with a number representing the end customer order. If the retailer has enough bottles of beer in stock, he or she fulfils the customer order and places an order with the neighbor, the wholesaler. The wholesaler tries to fulfil the retailer's order and places an order with the distributor. The distributor does the same and creates an order for the factory. The factory tries to fulfil the distributor's order and orders the materials for future production. Between every order and delivery there is a certain delay representing the shipping time. The goal of each player is to fulfil as many orders received from the neighbor as possible while keeping as few items in stock as possible [25].
During play, the factory's orders fluctuate by a huge margin, even though the retailer's orders change little. This is an example of the bullwhip effect. The changes in orders can be seen in Figure 6.
Figure 6: Changes in order size of each role while playing the “Beer Distribution Game” [26]
However, the bullwhip effect can be eliminated if the “Beer Distribution Game” is played by computers. A study showed that, by employing a genetic machine learning algorithm, the computer managed to track demand, eliminate the bullwhip effect, and discover the optimal policies under complex scenarios [26].
This section describes research related to the future demand forecasting performed in this thesis. The described works analyze the usage and performance of various machine learning and statistical models for the future demand forecasting task.
In this thesis, only the case in which the different supply chain entities (company, supplier, customer) do not share any information about their stock levels and orders is considered, and the impact of information sharing is left for future research.
The paper “Inventory Optimization in Efficient Supply Chain Management” [28] by S.R. Singh and Tarun Kumar discusses a genetic algorithm used for the calculation of optimal excess and shortage stock levels. The algorithm is implemented using MATLAB and proved to be sufficient by comparing the results with data from the past.
The research “Demand Forecasting using Neural Network for Supply Chain Management” [29] conducted by Ashvin Kochak and Suman Sharma observes the performance of an Artificial Neural Network implemented in MATLAB. The study used 4 years of sales data from a fuel filter distributor. Different backpropagation training methods were tested: Batch Gradient Descent, Variable Learning Rate, Conjugate Gradient algorithms, and the Levenberg-Marquardt algorithm. The research found that the Levenberg-Marquardt algorithm was more effective and reliable compared to the others tested for that case.
The literature survey done by Agnieszka Lasek, Nick Cercone, and Jim Saunders in the paper “Restaurant Sales and Customer Demand Forecasting: Literature Survey and Categorization of Methods” [30] describes various methods used for restaurant sales and demand forecasting as well as revenue management. The methods of multiple regression, Poisson regression, Box-Jenkins models (AR, MA, ARIMA), exponential smoothing and Holt-Winters models, artificial neural networks, Bayesian network models, hybrid models, and association rules are clearly described, providing references and usage examples as well as the advantages and disadvantages of each.
2.7 Summary
The supplier order line fee discussed in the thesis is a fixed fee paid to the supplier for every order line, regardless of the order size or the number of units bought. The reason for this fee lies in the warehouse processes.
The inventory management methods Periodic Review and Min/Max can help to manage the inventory and solve some supply chain vulnerability problems. However, the methods by themselves do not solve the problem, since their parameters have to be chosen optimally for the methods to work efficiently.
Various research has been done in the field of future demand forecasting, which showed that the best performing algorithms for such tasks are Support Vector Machines, Artificial Neural Networks, and genetic algorithms.
3 Building an algorithm
This chapter describes the research process stages, activities, methods, and tools used to build the inventory management algorithm. Firstly, the hardware and technologies are described. Then, the Business Understanding, Data Understanding, Data Preparation, and Modeling stages of the CRISP-DM process are described in detail in the corresponding sections.
The Python [31] programming language, versions 3.8 and 3.9, is used for the project implementation. This language is selected due to the large number of libraries available for machine learning, data processing, and visualization. The following libraries are used in the project:
Jupyter Notebook [37] and JetBrains PyCharm [38] were used for code editing and execution. Pip [39] and Anaconda [40] were used for dependency management, and Git [41] was used for version control.
For the development and testing of the project, an ASUS N551VW laptop running Windows 10 Pro was used. Its specifications are listed below:
• RAM: 16 GB DDR4
During the business understanding stage, the business needs, problem, and background are analyzed. In this phase the existing company processes and systems are reviewed and their drawbacks understood. In this project, the business understanding stage was carried out by interviewing the company's employees about the problem and its background. Later, the existing company systems and data were analyzed. The exact questions asked in the interviews and the methods used for the system analysis are not presented, since they are out of the scope of this thesis. Chapters 1 and 2 provide the relevant information found during this phase, and they can be considered the results of the completed business understanding stage of the CRISP-DM process.
During the business understanding phase it was found that the company has the following data relevant for this project: the history of customer orders, the history of supplier orders, and the history of inventory changes.
Every customer order is received from the e-commerce system in the form of an automatically generated email to a specific mailbox. The order information is stored in an attached text file in CSV format [42]. In the current company system, when a new email arrives, the CSV file is automatically imported into the internal Microsoft SQL database for further processing. The internal database also stores the history of stock quantity changes for each item and the supplier order history. The data from the internal database was exported and used for the project. The data has more than 360K rows and 28 columns. However, not all data columns are relevant for the project. The relevant values and their types are described in Table 1.
Table 1: Customer order and inventory change data types used in the project

Column | Type | Description
date | date | Date and time when the order was placed
Exploratory data analysis [43] is used in order to understand the data, its structure, and its patterns. The primary aim of exploratory analysis is to examine the data for distribution, outliers, and anomalies in order to direct the specific testing of the hypothesis. It also provides tools for understanding the data, usually through graphical representation [43]. That knowledge helps to decide which machine learning algorithms and parameters to use for the project. It also allows identifying missing or corrupted data, which helps to decide which exact data preparation steps are needed.
The exploratory data analysis is performed by drawing plots and charts which allow the data to be understood from different perspectives and its patterns to be identified. The following subsections present the data distribution from various angles.
By counting the number of orders made on each day and plotting the data throughout the whole period (Figure 7), no correlation between different months or seasons is found. There is an overall increase in orders, which happens due to the company's expansion. The plot shows the usual drops, which happen during weekends or national holidays.
The order distribution by weekday is illustrated in Figure 9. A sharp decrease in orders can be seen during the weekend: Sunday has around 27 times fewer orders than Monday. A difference is also visible throughout the workdays: Monday and Thursday have more orders compared to the other workdays. This can be explained by the fact that on these days the products are shipped from the supplier. As a result, customers place more orders on these days to receive the goods as fast as possible.
Figure 10 plots the distribution of the number of items having a certain number of orders on a logarithmic scale. From the plot it can be observed that there is a high number of items with a low number of orders and a low number of items with many orders. The number of items having a certain number of orders decreases in a double exponential way.
Such behavior can be seen more clearly in Figure 11. In the figure, all items are divided into groups depending on the number of orders they have. Around half of the items have a single order, around a quarter of the items have 2 or 3 orders, and so on. Even though the group size increases exponentially, the number of items in the group decreases exponentially. Thus, the distribution of orders between items is double exponential.
Figure 11: Distribution of the number of items between item groups having a certain amount of orders
However, as Figure 12 illustrates, the total number of orders for items having a certain number of orders first decreases in a double exponential way and later changes to a linear increase. This is due to the fact that there is usually just a single item with a certain high number of orders. This is also illustrated in Figure 13; compared to Figure 11, it shows a more stable distribution.
To conclude, most items have a low number of orders; however, most of the orders come from a small number of items. For example, around 26% of all items have 4 or more orders, and these items produce 81% of all orders. This corresponds to the widely known Pareto principle [44], or 80/20 rule, which is observed in many real-world scenarios.
Figure 12: Distribution of the number of orders between items having a certain amount of orders
Figure 13: Distribution of the number of orders between item groups having a certain amount of orders
Figure 14 represents the distribution of orders between customers. There are many customers with only one or a couple of orders, but this number quickly decreases as the number of orders per customer increases and then remains almost stable.
This behavior can also be observed in Figure 15, where the customers are divided into groups depending on the number of orders they have made. Even though the group sizes change exponentially, the number of customers per group varies little.
Figure 15: Distribution of the number of customers between customer groups having a certain amount of orders
However, the total number of orders made by each group increases exponentially. This is illustrated in Figure 16. If the customers having more than 500 orders are taken into account, the Pareto principle [44] can be applied, since around 16% of the customers produce 85% of the orders.
Data preparation is divided into three main activities: missing value cleaning, feature engineering, and data normalization.
Missing value cleaning is done using one of two methods, depending on the data type: deletion or previous value imputation.
The deletion method is used for data where restoring the corrupted information is not possible. Using this method, the entire data row is deleted if it has empty values in one of the following columns: sku, customer_number, price, quantity, date. In total, 625 rows were deleted.
The previous value imputation method is used to insert the previously known value. Previous value imputation was used for the columns stock, stock_yesterday, stock_supplier, and stock_supplier_available. 233K previous values were inserted for the stock column and 274K for the stock_yesterday column. The stock_supplier and stock_supplier_available columns had no missing values.
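The two cleaning methods can be expressed in a few lines of pandas. The sketch below is an illustration under assumed data: the DataFrame, its values, and the per-item grouping of the forward fill are assumptions made here, while the column names follow Table 1.

```python
import pandas as pd

# Tiny synthetic example of the two cleaning rules described above.
# The column names come from Table 1; the values are made up for illustration.
orders = pd.DataFrame({
    "sku": ["A1", "A1", "B2", None],
    "customer_number": [101, 102, 103, 104],
    "price": [9.9, 9.9, 4.5, 4.5],
    "quantity": [1, 2, 1, 1],
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03"]),
    "stock": [5.0, None, 7.0, 2.0],
})

# Deletion: drop rows with missing values that cannot be restored.
orders = orders.dropna(subset=["sku", "customer_number", "price", "quantity", "date"])

# Previous value imputation: forward-fill the stock columns per item,
# so a missing reading inherits the last known stock level of that item.
orders = orders.sort_values(["sku", "date"])
orders["stock"] = orders.groupby("sku")["stock"].ffill()
print(orders)
```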
Feature engineering is the science of extracting more information from existing data. Feature engineering itself can be divided into two steps: variable transformation and variable (feature) creation. These two techniques are vital in data exploration and have a remarkable impact on the power of prediction [45].
Feature creation is done for the date variable. The year, month, day, day of the year, week of the year, weekday, quarter, and month start and end indicators are extracted as separate variables. After that, the categorical variables year, month, weekday, and quarter are converted into indicator columns. All the orders are grouped by product, and the additional variables total_ordered_quantity, number_of_orders, and unique_customers are generated for each item. Also, for each item and day in the observed history, the perfect Min and Max stock levels are calculated: the Min level is taken as the sum of the ordered quantity in the following 4 days and the Max level as the sum of the ordered quantity in the following 90 days.
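A minimal pandas sketch of this step follows. The DataFrame name, the ordered_qty column, and the assumption that the "following days" windows exclude the current day are illustrative choices; only the extracted calendar variables and the 4/90-day horizons come from the text above.

```python
import pandas as pd

# Illustrative sketch of the feature and target creation described above.
# "daily" is a hypothetical per-item, per-day DataFrame with a "date" column
# and the quantity ordered on that day in "ordered_qty".
def add_date_features(daily: pd.DataFrame) -> pd.DataFrame:
    d = daily["date"].dt
    daily = daily.assign(
        year=d.year, month=d.month, day=d.day, day_of_year=d.dayofyear,
        week_of_year=d.isocalendar().week, weekday=d.weekday, quarter=d.quarter,
        is_month_start=d.is_month_start.astype(int),
        is_month_end=d.is_month_end.astype(int),
    )
    # Convert the categorical calendar variables into indicator columns.
    return pd.get_dummies(daily, columns=["year", "month", "weekday", "quarter"])

def future_sum(qty: pd.Series, horizon: int) -> pd.Series:
    # Sum of the quantities ordered in the following `horizon` days.
    return qty[::-1].rolling(horizon, min_periods=1).sum()[::-1].shift(-1)

def add_targets(daily: pd.DataFrame) -> pd.DataFrame:
    # "Perfect" levels: Min = demand of the next 4 days, Max = demand of the next 90 days.
    daily["min_level"] = future_sum(daily["ordered_qty"], 4)
    daily["max_level"] = future_sum(daily["ordered_qty"], 90)
    return daily
```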
At this point the data consists of a 5-year time series for each item. However, such a data structure cannot be used directly for machine learning, because every data point in the time series represents only a single day. All the machine learning algorithms selected for this project (except ARIMA and Prophet) consider each data point separately and cannot build a relationship with previously provided data. As a workaround, additional columns are added to every data point containing the data of the previous 90 days. In this way the algorithms can build a relation with the historical data, since it is provided in the same data row. Additionally, data transformation is done by calculating the difference, sum, mean, median, standard deviation, and variance of each attribute over window sizes of 2-90 days and adding them as additional columns. The final dataset contains 6992 columns for every item and day.
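A sketch of this "flattening" for a single attribute is shown below. The function and column names are illustrative assumptions; the thesis code is not reproduced here, only the general lag-plus-window-statistics idea it describes.

```python
import pandas as pd

# Every row gets the previous 90 daily values of an attribute as extra columns,
# plus aggregate statistics over windows of 2-90 days.
def add_history_features(daily: pd.DataFrame, col: str = "ordered_qty",
                         max_window: int = 90) -> pd.DataFrame:
    feats = {}
    for lag in range(1, max_window + 1):
        feats[f"{col}_lag_{lag}"] = daily[col].shift(lag)          # value `lag` days ago
    for window in range(2, max_window + 1):
        roll = daily[col].rolling(window)
        feats[f"{col}_sum_{window}"] = roll.sum()
        feats[f"{col}_mean_{window}"] = roll.mean()
        feats[f"{col}_median_{window}"] = roll.median()
        feats[f"{col}_std_{window}"] = roll.std()
        feats[f"{col}_var_{window}"] = roll.var()
        feats[f"{col}_diff_{window}"] = daily[col].diff(window)    # change over the window
    # Concatenate once instead of growing the DataFrame column by column.
    return pd.concat([daily, pd.DataFrame(feats, index=daily.index)], axis=1)
```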
Finally, data normalization is done, which attempts to give all attributes an equal weight. Normalization is particularly useful for algorithms involving neural networks or distance measurements, such as nearest-neighbor classification and clustering. Data normalization is done using the MinMax scaling method, which performs a linear transformation on the original data so that the range of each feature is scaled to [0, 1] [46].
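A short example of MinMax scaling with scikit-learn is given below; the input matrix is synthetic and only illustrates the transformation, it is not project data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# MinMax scaling maps each feature linearly to [0, 1]: x' = (x - min) / (max - min).
X = np.array([[10.0, 200.0, 3.0],
              [15.0, 180.0, 0.0],
              [12.0, 260.0, 9.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)   # learn the per-feature min/max and transform
print(X_scaled)                      # every column now lies in [0, 1]
```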
3.5 Modeling
The algorithm applies the Min/Max inventory management method using the following decision rules (a minimal code sketch of these rules is given after the list):
• If the current stock of the item is below the Min level, a supplier order needs to be executed to fill the stock up to the Max level.
• If the predicted Min level is greater than 0 and the item is not kept in stock, the item should be put in stock.
• If the predicted Max level is 0 and the item is in stock, the item should no longer be kept in stock.
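The sketch below restates the three rules as plain functions; all names are illustrative and the functions stand in for, rather than reproduce, the project code.

```python
# A minimal sketch of the three decision rules above.
def supplier_order_quantity(current_stock: float, min_level: float, max_level: float) -> float:
    """Units to order from the supplier; 0 means no order line is created."""
    if current_stock < min_level:
        return max(max_level - current_stock, 0)   # fill the stock up to the Max level
    return 0

def keep_in_stock(min_level: float, max_level: float, currently_stocked: bool) -> bool:
    """Decide whether the item should be kept in stock at all."""
    if not currently_stocked and min_level > 0:
        return True      # predicted demand exists, start stocking the item
    if currently_stocked and max_level == 0:
        return False     # no predicted demand, stop stocking the item
    return currently_stocked
```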
As described earlier in the thesis, the biggest challenge of the Min/Max method is calculating the Min and Max levels. Currently, these levels are static in the company and are changed manually on rare occasions. Such a system is inefficient, since it does not follow trends in customer demand. The improvement proposed by this thesis is to use dynamic Min/Max levels which are predicted from the historical data using machine learning algorithms.
Having the historical data makes it possible to calculate the optimal Min and Max levels on a certain day based on the orders which happened in the following days. The calculated levels are provided to the machine learning algorithms as target values for prediction. The other data generated in the data preparation phase is provided as input.
It was decided to build a separate prediction model for every product. This decision was made because building a single model which fits all products without sacrificing performance is a much more difficult task, since every product follows different sales trends.
The performance of machine learning models strongly depends on the number of data points used for learning. Therefore, building a model for items with a low number of orders would result in low model performance. Additionally, machine learning is computationally expensive, so building and testing models for a high number of items would take a very long time. As a consequence, only the items having more than 200 orders are used for building and evaluating the models. As observed in the Data understanding section, there are 111 such items and they produce around 10% of all orders.
Min and Max stock level prediction are two different tasks, and the performance of a specific algorithm might vary depending on whether the Min or the Max level is predicted. Thus, all the algorithms are tested separately for predicting both the Min and the Max level.
During the project, the Support Vector Regression, Random Forest, Artificial Neural Network, k-nearest neighbors, Prophet, and ARIMA algorithms are tested. All the algorithms are first tested with their initial hyperparameters. The performance of the algorithms is calculated using cross-validation and the symmetric mean absolute percentage error metric [47].
The models for every item are evaluated using 3-step cross-validation on a rolling basis. The cross-validation works as follows: start with a small subset of the data for training purposes, forecast the later data points, and then check the accuracy of the forecasted data points. The same forecasted data points are then included as part of the next training dataset, and subsequent data points are forecasted [48]. The cross-validation data splits in each step are visualized in Figure 17.
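One common way to implement such an expanding-window scheme is scikit-learn's TimeSeriesSplit, sketched below. This is an assumption about a possible implementation, not the project's actual code; X and y are synthetic stand-ins for one item's feature matrix and its target stock level, ordered in time.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X, y = rng.random((200, 10)), rng.random(200)

errors = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    model = KNeighborsRegressor()                        # any of the tested regressors
    model.fit(X[train_idx], y[train_idx])                # earlier part of the history
    pred = model.predict(X[test_idx])                    # forecast the later part
    errors.append(np.mean(np.abs(pred - y[test_idx])))   # replace with sMAPE if desired
print(errors)
```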
The metric used for model evaluation is the symmetric mean absolute percentage error (sMAPE). One of the downsides of this metric is that the error increases tremendously if the true value is zero. To overcome this limitation, the scaled true Min and Max values are increased by 1 so that they lie in the range [1, 2].
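A compact implementation of the metric is sketched below, using the common sMAPE variant with a factor of 2 in the numerator; the exact variant used in [47] may differ slightly, and the function name is illustrative.

```python
import numpy as np

# Symmetric mean absolute percentage error. The +1 shift described above keeps the
# scaled true values in [1, 2], so the denominator never gets close to zero.
def smape(y_true, y_pred) -> float:
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(2.0 * np.abs(y_pred - y_true) /
                           (np.abs(y_true) + np.abs(y_pred)))

print(smape([1.0, 1.5, 2.0], [1.1, 1.4, 1.9]))   # roughly 7.2 (percent)
```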
The average Min/Max stock level prediction errors for items having more than 200 orders are presented in Table 2. Figure 18 and Figure 19 present the error ranges between different runs.
Hyperparameters are used to configure various aspects of the learning algorithm and can have wildly varying effects on the resulting model and its performance. Hyperparameter search is commonly performed manually, via rules of thumb, or by testing sets of hyperparameters on a predefined grid [49]. Hyperparameter optimization in this project was done for the k-nearest neighbors and Random Forest algorithms using the grid search method. The grid search method brute-forces all the combinations of hyperparameters in the given ranges and finds the best performing parameters.
The tested values for each algorithm are shown in Table 3 and Table 4. Every hyperparameter of the k-nearest neighbors algorithm is explained in more detail in [50], and the hyperparameters of the Random Forest algorithm are explained in [51].
Table 3: Tested hyperparameters of the k-nearest neighbors algorithm

n_neighbors | 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51
p | 1, 2

Table 4: Tested hyperparameters of the Random Forest algorithm

n_estimators | 10, 20, 40, 75, 100, 200, 400, 500, 1000
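A grid search over these values can be set up as sketched below. Note the assumptions: only the hyperparameters visible above are included (the combination counts mentioned later imply additional grid dimensions that did not survive extraction), the data is a synthetic stand-in, and mean absolute error is used in place of a custom sMAPE scorer (which could be built with sklearn.metrics.make_scorer).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X, y = rng.random((200, 10)), rng.random(200)
cv = TimeSeriesSplit(n_splits=3)                    # same expanding-window splits

knn_grid = {"n_neighbors": list(range(3, 52, 2)), "p": [1, 2]}
rf_grid = {"n_estimators": [10, 20, 40, 75, 100, 200, 400, 500, 1000]}

knn_search = GridSearchCV(KNeighborsRegressor(), knn_grid, cv=cv,
                          scoring="neg_mean_absolute_error").fit(X, y)
rf_search = GridSearchCV(RandomForestRegressor(random_state=0), rf_grid, cv=cv,
                         scoring="neg_mean_absolute_error").fit(X, y)
print(knn_search.best_params_, rf_search.best_params_)
```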
The pipeline of data preparation, model creation, and cross-validation for a single item takes around 16 seconds for the k-nearest neighbors algorithm and around 17 seconds for the Random Forest algorithm. From Table 3 and Table 4 it can be seen that there are 100 (25 x 2 x 2) hyperparameter combinations for the k-nearest neighbors algorithm and 60 (10 x 2 x 3) combinations for the Random Forest algorithm. As a result, completing hyperparameter optimization for 111 products would take around 111 x 100 x 16 = 177 600 seconds ≈ 49 hours for the k-nearest neighbors algorithm and around 111 x 60 x 17 = 113 220 seconds ≈ 31 hours for the Random Forest algorithm. As a workaround to this computationally complex problem, hyperparameter optimization was done for a single product rather than for all of them. The optimization for a single item takes around 100 x 16 = 1600 seconds ≈ 27 minutes for the k-nearest neighbors algorithm and 60 x 17 = 1020 seconds = 17 minutes for the Random Forest algorithm. The item used for hyperparameter optimization was selected as the median item in the list of products which have more than 200 orders, sorted by the total number of orders. Optimizing the algorithms for the median item in terms of the number of orders should result in near-optimal performance both for items with a high and with a low number of orders.
The best performing values found during hyperparameter optimization are marked in bold in Table 3 and Table 4. The optimization of the k-nearest neighbors algorithm improved the performance of the Min stock level prediction for the median item from 5.8809% to 5.6970% sMAPE. The average sMAPE of the Min stock level prediction for all items which have more than 200 orders decreased from 7.0079% to 6.8730%. The optimization of the Random Forest algorithm improved the performance of the Max stock level prediction for the median item from 13.5160% to 10.9788% sMAPE. The average sMAPE of the Max stock level prediction for all items which have more than 200 orders decreased from 15.0303% to 14.6813%.
4 Results and Analysis
This chapter describes the evaluation methods and techniques used in the project as well as the results. The chapter starts with a description of the simulation. Later, the execution time and reliability analyses are presented. The chapter ends with a discussion of the achieved results.
4.1 Simulation
The evaluation is done by simulating real usage of the algorithm throughout the year 2020 for the items which have more than 200 orders. The simulation goes through every Tuesday and Thursday (the days on which the company executes supplier orders), takes the customer order history known up to that day, and creates two prediction models for every item: one for the Min stock level prediction and one for the Max stock level prediction. After that, the models predict the next Min and Max stock level values. If the current stock is below the Min level, a supplier order is created to fill the stock up to the Max level. The simulation algorithm is presented in Figure 20.
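A condensed sketch of this loop is given below. The model building and stock bookkeeping are passed in as callables (predict_levels, stock_on_hand); these, and all other names, are placeholders standing in for the project code rather than the actual implementation behind Figure 20.

```python
import pandas as pd

def simulate(items, history: pd.DataFrame, predict_levels, stock_on_hand, year=2020):
    supplier_orders = []
    for day in pd.date_range(f"{year}-01-01", f"{year}-12-31", freq="D"):
        if day.weekday() not in (1, 3):              # Tuesdays (1) and Thursdays (3) only
            continue
        known = history[history["date"] < day]       # customer orders known up to that day
        for item in items:
            min_level, max_level = predict_levels(known, item)   # two models per item
            stock = stock_on_hand(item, day)
            if stock < min_level:                    # reorder up to the Max level
                supplier_orders.append({"day": day, "sku": item,
                                        "quantity": max_level - stock})
    return supplier_orders
```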
[Table row: CO2 emissions per year – 393 tons with three weekly shipments, 262 tons with two weekly shipments, a reduction of 131 tons, equivalent to ~131 trees planted.]
To show that the proposed algorithm performs better than the current company's system, the following evaluation metrics are defined, based on which the comparison is done:
• the number of backorders in the given period (the lower, the better)
• the total number of supplier orders in the given period (the lower, the better)
• the inventory turnover in the given period (the shorter, the better)
These performance metrics represent the business expenses: backorders (orders which cannot be fulfilled immediately due to a shortage in stock) reduce customer satisfaction; supplier orders cause additional fees; a long inventory turnover increases storage expenses. The selected metrics are dependent on each other. It is easy to optimize the results for a single metric, but it is very hard to do so for all of them at once. For example, to have a low number of backorders, a large number of products can be kept in the inventory. However, that would result in a very long inventory turnover. To change that, a low number of products can be kept in the warehouse and ordered from the supplier more frequently, but that would increase the number of supplier orders and the fees paid to the supplier, and so on.
The metrics of the current system are calculated from the historical supplier order and stock data. The metrics of the suggested algorithm are calculated from the stored simulation data of supplier orders and stock changes.
The simulation treats all items equally, regardless of their price or value. Therefore, the value of the products is not included in the inventory turnover calculation, and it is assumed that all items have an equal value of 1€. Due to this simplification, the inventory turnover over a one-year period is calculated by the following formula:
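The formula itself did not survive the export of the document. A standard days-based definition consistent with the 1€-per-unit simplification would be the following; this reconstruction is an assumption rather than the thesis' original expression:

Inventory turnover (days) = (average number of units in stock during the year / total number of units sold during the year) x 365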
Backorders are calculated by counting the number of orders which were not fully fulfilled due to a shortage in stock.
The first simulation was run with a model built from data where the perfect Max stock level was calculated as the sum of the sold quantity in the next 90 days. However, such a Max level calculation resulted in an average inventory turnover of 47.1 days. That is considerably larger than the current company's inventory turnover of 33.2 days for the items having more than 200 orders. Since all the metrics depend on each other, such a large difference in inventory turnover cannot prove that the proposed algorithm improves the current solution, because it is unknown by how much the supplier orders and backorders would decrease if the inventory turnover were increased in the current system. Furthermore, increasing the inventory turnover in practice is a difficult task, because it means the company needs to store more goods for a longer period of time, which might require a warehouse expansion.
To prove that the algorithm improves the current company system, its inventory turnover should match the current inventory turnover. To achieve that, a second simulation was run with a model built from data where the perfect Max stock level was calculated as the sum of the sold quantity in the next 60 days. This resulted in an inventory turnover of 33.7 days, which differs from the current system by only 1.5%. At the same time, the number of supplier orders decreased by around 36%, from 1150 to 738, and the number of backorders decreased by around 49%, from 769 to 390. The results of the simulations are also presented in Table 5.
Table 5: Simulation metrics compared to the current system

Metric | Current system | Simulation 1 (90-day Max level) | Simulation 2 (60-day Max level)
Inventory turnover | 33.2 days | 47.1 days (+41.86%) | 33.7 days (+1.5%)
Supplier orders | 1150 | 557 (-51.57%) | 738 (-35.83%)
4.2 Execution Time Analysis
To predict the next Min and Max stock levels for a single item, the algorithm prepares the historical data, builds the two prediction models, and predicts the Min and Max stock level values. The average execution time of each step for a single item is displayed in Table 6.

Table 6: Execution time of each algorithm step

Total | 12 611 ms
The data shows that the longest step is the historical data preparation, which takes more than 12 seconds. All the other steps (building the models and predicting the Min and Max stock levels) combined take around 400 milliseconds. In total, the execution of the algorithm for a single item takes around 12.6 seconds. Running the simulation for 111 items over the period of one year took around 40 hours.
Data preparation takes a relatively long time because of unoptimized feature creation operations. This leaves a lot of room for improvement, because code optimization could reduce the computational complexity of this step. The other steps of the algorithm are harder to optimize, because the project uses popular open-source implementations of the machine learning algorithms, which are already optimized.
4.3 Reliability Analysis
The reliability analysis of the algorithm is done by testing whether the algorithm manages to find patterns for items with increasing, decreasing, or fluctuating demand. To do that, three different items are tested, one representing each case.
Figure 21 presents the predicted Min level (solid blue line) and the perfect Min level (dashed orange line) of the item with increasing demand throughout the year 2020. Even though the data is noisy, the algorithm still manages to correctly predict some of the dips in demand.
Figure 22 compares the predicted and the perfect Max levels of the same item throughout the year 2020. It can easily be recognized that the demand for the item is increasing, and the algorithm correctly predicts the overall direction.
Figure 21: Min level prediction of the item with the increasing demand
Figure 22: Max level prediction of the item with the increasing demand
Figure 23 displays the perfect and the predicted Min level of the item with decreasing demand throughout the year 2020. Even though the item has a small number of orders, the algorithm predicted most of the sudden increases.
Figure 24 plots the perfect and the predicted Max level of the same item throughout the year 2020. Although the algorithm misses the correct values almost throughout the entire period, the overall direction of decreasing demand is predicted correctly.
Figure 23: Min level prediction of the item with the decreasing demand
Figure 24: Max level prediction of the item with the decreasing demand
Figure 25 shows the perfect and the predicted Min level of the item with fluctuating demand. Most of the sudden increases predicted by the algorithm are either too early or too late. However, the predicted values are in the same range as the real values.
The perfect and the predicted Max levels of the same item are presented in Figure 26. It can be seen that the algorithm correctly predicted most of the demand increases and decreases with a small relative error.
Figure 25: Min level prediction of the item with the fluctuating demand
Figure 26: Max level prediction of the item with the fluctuating demand
4.4 Discussion
Min and Max level prediction using machine learning algorithms improved on the current company's system. The simulation showed that the algorithm lowered the number of supplier orders by around 36% (from 1150 to 738) and the number of backorders by around 49% (from 769 to 390) with a minimal increase in inventory turnover. If the company had used the algorithm throughout the year 2020, it would have saved (1150 - 738) x 5€ = 2060€ on supplier order line fees, assuming a supplier order line fee of 5€.
However, the simulation was run only for the 111 top-selling items which have more than 200 orders and receive around 10% of all customer orders. This limitation was imposed due to the computational complexity of the algorithm and the simulation: simulating a single item through a single day takes around 12 seconds. As a consequence, simulating the 1120 items having more than 50 orders over the period of one year would take around 16 days. Nevertheless, if the algorithm produced the same results for the items which have more than 50 orders, the savings on supplier fees could be much higher. Throughout 2020 the company made around 32K supplier orders in total for items which have more than 50 customer orders. If the algorithm achieved the same improvement as it did for the items with more than 200 orders, the number of supplier orders could have been reduced by around 12K, which would have resulted in around 60K euros of annual savings.
Like the lower number of supplier orders, the reduced number of backorders is also important for the company's income and sales. A study [7] showed that 69% of customers are less likely to shop with a retailer for future purchases if an item is not delivered within two days of the date promised. Knowing that each backorder usually results in a late delivery, the reduction of backorders by 49% across the 111 top-selling items should result in an increase in sales of these items by 24%.
Moreover, the number of shipments from the supplier might also be reduced. The simulation assumed delivery of goods from the supplier twice a week and still outperformed the current system, where the products are delivered three times a week. If the additional weekly shipment is eliminated, it would result in large savings on shipping expenses and would lower carbon emissions. The supplier warehouse is around 1400 km away, and each roundtrip burns around 1000 liters of fuel and emits around 2520 kg of CO2 [52]. If one weekly shipment is removed, around 131 tons of CO2 would be saved annually, which is equivalent to around 131 trees planted annually and not cut for 100 years [53].
5 Conclusions and Future Work
This chapter presents the conclusions of the project and future work opportunities.
5.1 Conclusions
The project addressed the problem of inefficient inventory management in a supply chain with a supplier applying order line fees. The problem was divided into subproblems: when a supplier order should be executed, which products and how many units should be ordered, and which products should be kept in stock.
To tackle this problem, the experimental research method and the CRISP-DM process were chosen. The process fit the project needs well, and following its steps helped to avoid common mistakes and pitfalls.
During the data understanding phase it was found that the order distribution between items in the dataset is double exponential: most of the orders come from a small number of items, and the majority of items have a small number of orders. Thus, only the data of the items which have more than 200 orders was used for model creation and evaluation.
The data preparation was done by performing feature engineering and data transformation. Even though this step was completed, the number of additionally generated features could be reduced and the whole data preparation process optimized.
A simulation was run to compare the proposed algorithm with the current company's system. It showed that for the items which have more than 200 orders the algorithm decreased the number of supplier orders by 35.83% and the number of backorders by 49.29% while keeping almost the same inventory turnover. If the same results are achieved for all products, it is expected that the company would save around 60K euros per annum on supplier order line fees, and the lower number of backorders would increase sales by 24%.
The algorithm outperformed the current company's system, and the purpose as well as all the goals of the project were achieved. The system has not been deployed and is not yet used by the company's employees, but this is planned for the near future.
5.2 Future Work
The project opened many opportunities for future research. The future work possibilities are presented in the bullet points below.
• The execution time of the data preparation step could be improved. This would
lower the overall computational complexity of the algorithm and would speed up
the predictions and testing of the algorithm.
• The algorithm could be tested with the different inventory management methods
(e.g., Periodic Review System) instead of Min/Max and the results compared
between different solutions.
• Instead of using the same algorithms and hyperparameters for all items, the machine learning algorithm and its hyperparameters could be chosen for every item individually. That should increase the overall system performance.
• More item attributes such as value, dimensions, availability at the supplier, shipping
or storing conditions could be considered when predicting the supplier orders.
• The algorithm could be improved to react to supply chain disruptions and avoid the bullwhip effect. Information from social media or the news could be gathered and used by the algorithm to adjust the predictions based on supply chain disruption events happening around the globe.
• The data available at the company could be merged with similar open-source
data. A higher number of data points gathered from open data and provided to the
algorithm should lead to better overall performance.
• The system should be deployed and integrated with the company's current system
so that the employees can use it and benefit from it.
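As noted in the bullet on per-item model selection, such a selection could, for instance, reuse the scikit-learn regressors already referenced in this thesis [50], [51]. The sketch below is only an illustration under assumed inputs (a per-item feature matrix X_item and demand target y_item); it is not the algorithm implemented in this project, and the hyperparameter grids are examples.

from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor

# Candidate estimators and example hyperparameter grids.
candidates = [
    (KNeighborsRegressor(), {"n_neighbors": [3, 5, 10]}),
    (RandomForestRegressor(random_state=0), {"n_estimators": [100, 300]}),
]

def select_model_for_item(X_item, y_item):
    """Pick the best estimator and hyperparameters for a single item,
    using time-series cross-validation to respect the temporal order."""
    best_score, best_model = float("-inf"), None
    cv = TimeSeriesSplit(n_splits=3)
    for estimator, grid in candidates:
        search = GridSearchCV(estimator, grid, cv=cv,
                              scoring="neg_mean_absolute_error")
        search.fit(X_item, y_item)
        if search.best_score_ > best_score:
            best_score, best_model = search.best_score_, search.best_estimator_
    return best_model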
5.3 Reflections
The project has a large impact on the research field, the company, and the environment.
The research field is highly impacted due to the lack of research on inventory
management algorithms when the supplier applies order line fees. Since no research has been
found for such a case, this project can serve as a guideline for future researchers
solving similar problems. Even if the supplier order line fee is irrelevant, the process and
the results of the thesis will help researchers choose or develop a suitable machine learning
algorithm for inventory management or customer demand forecasting tasks.
The project also has a large impact on the company ASWO Baltic. The developed
algorithm should dramatically reduce the company's expenses on supplier order line fees,
reduce shipping times to the final customer, increase sales, and lower the human resources
needed for inventory and order management. Overall, this would increase the company's
income and profits by a large margin.
Finally, the project impacts the environment by lowering packaging waste and
carbon emissions. This is caused by the reduced number of orders, since combining many
items in a single order might require less packaging material and less fuel spent on shipping.
References
[21] J. D. Rey, “How robots are transforming Amazon warehouse jobs — for better and worse,” Vox, 11-Dec-2019. [Online]. Available: https://www.vox.com/recode/2019/12/11/20982652/robots-amazon-warehouse-jobs-automation. [Accessed: 14-May-2021]
[22] H. L. Lee, V. Padmanabhan, and S. Whang, “Information Distortion in a Supply Chain: The Bullwhip Effect,” Manag. Sci., vol. 43, no. 4, pp. 546–558, 1997. [Online]. Available: https://www.jstor.org/stable/2634565. [Accessed: 16-May-2021]
[23] W. Kenton, “What Is Lead Time?,” Investopedia. [Online]. Available: https://www.investopedia.com/terms/l/leadtime.asp. [Accessed: 16-May-2021]
[24] “Chapter 13: The Bullwhip Effect - Fundamentals of Supply Chain Theory, 2nd Edition [Book].” [Online]. Available: https://www.oreilly.com/library/view/fundamentals-of-supply/9781119024842/c13.xhtml. [Accessed: 16-May-2021]
[25] J. D. Sterman, “Modeling Managerial Behavior: Misperceptions of Feedback in a Dynamic Decision Making Experiment,” Manag. Sci., vol. 35, no. 3, pp. 321–339, 1989. [Online]. Available: https://www.jstor.org/stable/2631975. [Accessed: 10-May-2021]
[26] “Computers play the beer game: can artificial agents manage supply chains?,” Decis. Support Syst., vol. 33, no. 3, pp. 323–333, Jul. 2002, doi: 10.1016/S0167-9236(02)00019-2. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167923602000192. [Accessed: 16-May-2021]
[27] R. Carbonneau, K. Laframboise, and R. Vahidov, “Application of machine learning techniques for supply chain demand forecasting,” Eur. J. Oper. Res., vol. 184, no. 3, pp. 1140–1154, Feb. 2008, doi: 10.1016/j.ejor.2006.12.004. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0377221706012057. [Accessed: 18-May-2021]
[28] S. R. Singh and T. Kumar, “Inventory Optimization in Efficient Supply Chain Management,” May 2012.
[29] A. Kochak and S. Sharma, “Demand forecasting using neural network for supply chain management,” Int. J. Mech. Eng. Robot. Res., vol. 4, no. 1, pp. 96–104, 2015.
[30] A. Lasek, N. Cercone, and J. Saunders, “Restaurant Sales and Customer Demand Forecasting: Literature Survey and Categorization of Methods,” in Smart City 360°, Cham, 2016, pp. 479–491, doi: 10.1007/978-3-319-33681-7_40.
[31] “Welcome to Python.org,” Python.org. [Online]. Available: https://www.python.org/. [Accessed: 23-May-2021]
[32] “pandas - Python Data Analysis Library.” [Online]. Available: https://pandas.pydata.org/. [Accessed: 23-May-2021]
[33] “NumPy.” [Online]. Available: https://numpy.org/. [Accessed: 23-May-2021]
[34] “scikit-learn: machine learning in Python — scikit-learn 0.24.2 documentation.” [Online]. Available: https://scikit-learn.org/stable/. [Accessed: 23-May-2021]
[35] M. Löning et al., alan-turing-institute/sktime: v0.6.1. Zenodo, 2021. [Online]. Available: https://zenodo.org/record/3749000. [Accessed: 23-May-2021]
[36] “Matplotlib: Python plotting — Matplotlib 3.4.2 documentation.” [Online]. Available: https://matplotlib.org/. [Accessed: 23-May-2021]
[37] “Project Jupyter.” [Online]. Available: https://www.jupyter.org. [Accessed: 23-May-2021]
[38] “PyCharm: the Python IDE for Professional Developers by JetBrains,” JetBrains. [Online]. Available: https://www.jetbrains.com/pycharm/. [Accessed: 23-May-2021]
[39] The pip developers, pip: The PyPA recommended tool for installing Python packages. [Online]. Available: https://pip.pypa.io/. [Accessed: 23-May-2021]
[40] “Anaconda | The World’s Most Popular Data Science Platform,” Anaconda. [Online]. Available: https://www.anaconda.com/. [Accessed: 23-May-2021]
[41] “Git.” [Online]. Available: https://git-scm.com/. [Accessed: 23-May-2021]
[42] Y. Shafranovich, “Common Format and MIME Type for Comma-Separated Values (CSV) Files,” RFC Editor, RFC 4180, Oct. 2005. [Online]. Available: https://www.rfc-editor.org/info/rfc4180. [Accessed: 20-May-2021]
[43] “Exploratory Data Analysis | SpringerLink.” [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-319-43742-2_15. [Accessed: 20-May-2021]
[44] R. Dunford, Q. Su, and E. Tamang, “The Pareto Principle,” 2014. [Online]. Available: https://pearl.plymouth.ac.uk/handle/10026.1/14054. [Accessed: 23-May-2021]
[45] “A Complete Tutorial which teaches Data Exploration in detail,” Analytics Vidhya, 10-Jan-2016. [Online]. Available: https://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/. [Accessed: 20-May-2021]
[46] J. Han and M. Kamber, Data mining: concepts and techniques, 3rd ed. Burlington, MA: Elsevier, 2012. [Online]. Available: http://myweb.sabanciuniv.edu/rdehkharghani/files/2016/02/The-Morgan-Kaufmann-Series-in-Data-Management-Systems-Jiawei-Han-Micheline-Kamber-Jian-Pei-Data-Mining.-Concepts-and-Techniques-3rdEdition-Morgan-Kaufmann-2011.pdf
[47] C. Tofallis, “A better measure of relative prediction accuracy for model selection and model estimation,” J. Oper. Res. Soc., vol. 66, Mar. 2015, doi: 10.1057/jors.2014.124.
[48] S. Shrivastava, “Cross Validation in Time Series,” Medium, 17-Jan-2020. [Online]. Available: https://medium.com/@soumyachess1496/cross-validation-in-time-series-566ae4981ce4. [Accessed: 20-May-2021]
[49] M. Claesen and B. De Moor, “Hyperparameter Search in Machine Learning,” ArXiv150202127 Cs Stat, Apr. 2015. [Online]. Available: http://arxiv.org/abs/1502.02127. [Accessed: 24-Sep-2021]
[50] “sklearn.neighbors.KNeighborsRegressor — scikit-learn 0.24.2 documentation.” [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html. [Accessed: 05-Jun-2021]
[51] “sklearn.ensemble.RandomForestRegressor — scikit-learn 0.24.2 documentation.” [Online]. Available: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html. [Accessed: 05-Jun-2021]
[52] “2015 09 TE Briefing Truck CO2 Too big to ignore_FINAL.pdf.” [Online]. Available: https://www.transportenvironment.org/sites/te/files/publications/2015%2009%20TE%20Briefing%20Truck%20CO2%20Too%20big%20to%20ignore_FINAL.pdf. [Accessed: 08-Jun-2021]
[53] “How much CO2 does a tree absorb?” [Online]. Available: https://www.viessmann.co.uk/heating-advice/how-much-co2-does-tree-absorb. [Accessed: 09-Jun-2021]
TRITA-EECS-EX-2021:811
www.kth.se