Big Data Analytics
Jury
President: Doutor Arnaldo Manuel Guimarães Batista - FCT/UNL
Examiner: Doutor André Dionísio Bettencourt da Silva Rocha - FCT/UNL
Member: Doutor Ruben Duarte Dias da Costa - FCT/UNL
October 2020
Big data analytics for intra-logistics process planning in the automotive sector
Copyright © Luís Carlos Guimarães Lourenço, Faculty of Sciences and Technology, NOVA
University Lisbon.
The Faculty of Sciences and Technology and the NOVA University Lisbon have the right, perpetual and without geographical boundaries, to file and publish this dissertation through printed copies reproduced on paper or in digital form, or by any other means known or that may be invented, and to disseminate it through scientific repositories and to admit its copying and distribution for non-commercial, educational or research purposes, as long as credit is given to the author and editor.
This document was generated using the (pdf)LaTeX processor, based on the "novathesis" template [1], developed at the Dep. Informática of FCT-NOVA [2].
[1] https://github.com/joaomlourenco/novathesis [2] http://www.di.fct.unl.pt
Acknowledgements
The conclusion of this project represents the end of a very important chapter of my life
and many people deserve to be mentioned.
I would like to thank Professor Ricardo Gonçalves for giving me the opportunity to work
on this project.
I would like to thank Professor Ruben Costa for all the guidance provided throughout
this dissertation.
The remaining members of the research centre also deserve credit for all the work and
insights provided.
A special thanks to Diogo Graça for his mentorship during my internship, and to the
entire logistics department in VWAE for all the help provided.
To all my friends and colleagues, thanks for making this journey easier and happier.
Last but not least, a very special thanks to my family for all the love and support.
Abstract
The manufacturing sector is facing an important stage with Industry 4.0. This paradigm shift impels companies to embrace innovative technologies and to pursue near-zero faults, near real-time reactivity, better traceability and more predictability, while working to achieve cheaper product customization.
The scenario presented covers multiple intra-logistics processes of the automotive factory Volkswagen Autoeuropa, where different situations need to be addressed. The main obstacle is the absence of harmonized and integrated data flows between all stages of the intra-logistics process, which leads to inefficiencies. The existence of data silos contributes heavily to this situation and makes the planning of intra-logistics processes a challenge.
The objective of the work presented here is to integrate big data and machine learning technologies over the data generated by the several manufacturing systems present, and thus support the management and optimisation of the warehouse, parts transportation, sequencing and point-of-fit areas. This will support the creation of a digital twin of the intra-logistics processes. The end goal is to employ deep learning techniques to achieve predictive capabilities which, together with simulation, optimize process planning and equipment efficiency.
The work presented in this thesis is aligned with the European project BOOST 4.0, whose objective is to drive big data technologies in the manufacturing domain, focusing on the automotive use-case.
Keywords: Industry 4.0, Data Mining, Machine Learning, Big Data, Digital-Twin
Resumo
The manufacturing sector is facing an important stage with Industry 4.0. This paradigm shift drives companies to adopt innovative technologies in order to achieve near-zero faults, real-time reactivity, better traceability and predictability, while working to obtain cheaper product customization.
The scenario under study covers several intra-logistics processes of the Volkswagen Autoeuropa automotive factory, where different situations need improvement. The main obstacle is the absence of data flows and integration between all stages of the intra-logistics process, which leads to inefficiencies. The existence of data silos contributes heavily to these situations, which makes process planning a challenge.
The objective of the work presented here is to integrate big data and machine learning technologies over the data generated by the several production systems present and thus support the management and optimization of the warehouse, parts transportation, sequencing and point-of-fit areas. This dissertation will also support the creation of a digital twin of the intra-logistics processes; the end goal is to employ deep learning techniques to obtain predictive capabilities and, together with simulation, optimize process planning and equipment efficiency.
The work presented in this document is embedded in the European project BOOST 4.0, whose objective is to drive big data technologies in the manufacturing domain, with a focus on the automotive sector.
Contents
1 Introduction
1.1 Problem Description
1.2 Research question and hypothesis
1.3 Proposed solution
1.4 Methodology
1.5 BOOST 4.0 Contributions
1.6 Thesis outline
2 State of the Art
3 Intralogistics Data Analysis
4 Architecture
4.1 Extract Transform Load layer
4.2 Storage layer
4.3 Machine learning layer
4.4 Processing layer
4.5 Visualization layer
5 Results
Bibliography
Acronyms
ML Machine Learning
MLP Multilayer Perceptron
MSE Mean Squared Error
Chapter 1
Introduction
We live in times of innovation in all fields. Thanks to constant technological developments, globalization, increasing customer expectations and aggressive markets all over the world, companies, businesses and academics are working to apply these revolutionary innovations to our advantage.
In the last few years, major advances in technologies like the Internet of Things, big data, cloud computing and artificial intelligence have been fuelling a new industrial revolution, and as a result smart manufacturing is becoming the focus of the global manufacturing transformation.
This new revolution is called Industry 4.0 and, as in the previous industrial revolutions, technology changed the paradigm: the first revolution brought the mechanization of processes with the steam engine, the second introduced mass production thanks to electricity, and the third offered automation through the introduction of electronic devices. The industrial revolutions and their driving technologies are represented in figure 1.1.
Today, manufacturing industries are changing from the mass production of the past to customized production, in order to meet growing customer expectations [1].
Another trending topic is Big Data, due to the enormous growth of data available in the past few years. A study by IDC (International Data Corporation) titled "Data Age 2025" predicts that worldwide data creation will grow to an enormous 163 zettabytes by 2025, ten times more than in 2017 [2].
This paradigm shift brings a huge amount of associated data that needs to be properly processed in order to achieve the desired outcomes. Integrated analysis of manufacturing big data is beneficial to all aspects of manufacturing [3]. But not just manufacturing can benefit from data: all organizations, whether large or small, with a data-dependent business model or not, can benefit from a better understanding of their data [4]. Big data analytics is a trending subject that companies from all fields are working into their businesses, and with the value and quantity of data rapidly growing every year, we can expect the trend to continue, making big data an important field of research nowadays.
Within Industry 4.0, a technology that also benefits from data growth is the digital twin, which is likely to become more relevant over the next decade [5]. A digital twin is a live model useful for gathering business insights that can be implemented for a specific asset, for an entire facility or even for an individual product.
The concept of machine learning has been around for decades, and it is now more relevant than ever because of the increasing availability of data and computing power; with fast-paced developments in algorithms, the applications of machine learning in manufacturing will increase [6].
The combination of machine learning and digital twins can amplify the benefits of both technologies, as the digital twin can test the accuracy of the machine learning models and the different scenarios suggested by the machine learning layers.
The logistics sector does not escape these trends and some major changes are predicted; in fact, logistics represents an appropriate application area for Industry 4.0 and its technologies [7]. Logistics has always been a data-driven area of business and, now more than ever, with the prospect of real-time tracking of material and product flows [8], improved transport handling, risk management and other features, companies need to prepare their logistics departments for the incoming growth of data. "In fact, one could argue that industry 4.0 in its pure vision can only become reality if logistics is capable of providing production systems with the needed input factors . . . " [7].
This dissertation is embedded in the VWAE (Volkswagen Autoeuropa) pilot of the biggest European initiative in Big Data for Industry 4.0, called BOOST 4.0. This project is funded by the European Union and aims to lead the construction of the European industrial data space and to provide the industrial sector with the necessary tools to obtain the maximum benefit from Big Data.
1.1 Problem Description
The processes within intra-logistics at VWAE are the following, and they are represented in figures 1.3, 1.4 and 1.5 [9]:
• Warehousing – In the warehouse, parts are stored in either a shelf or a block storage concept. System-wise, there is one database to control the parts coming from each truck and then a separate database which registers the unloading, transportation and storing of the material in the warehouse.
• Transport (to sequencing) – An automatic line-feeding system based on real vehicle demands generates parts call-offs after interacting with real-time stock data, to replenish the points of use at commissioning areas called SUMAs, or directly at the assembly line for parts that do not require sequencing, using a pull methodology/concept. The transport is then made by tow trucks, and the record of these internal transports is stored in a different database. In this process there is an area called Bahnhof where parts are placed by the warehouse forklifts to wait for transport to the production line or to sequencing.
• Sequencing – The next step is the picking process for the correct sequencing in the SUMA. Here, the operator follows the system's electronic picking of parts according to the vehicle sequence on the production line. These operations are executed under the principles of the lean production system [10] [11].
• Transport (to point of fit) – The transport from the sequencing areas to the point of application is made either by AGVs (Automated Guided Vehicles) or, again, by tow trucks. AGVs have data stored in different databases depending on their manufacturer.
• Point of fit – Finally, the parts are manually delivered at the point of fit by the line-feeding operator.
Throughout the years, VWAE has carried out numerous optimizations in its logistics process, namely with the introduction of AGVs and the implementation of auxiliary sequencing tools. That said, there are still some constraints.
The main issue regarding the logistics processes is the absence of a "big picture": all the different parts of the process are disconnected in data and in knowledge, and there is neither an integrated data source nor a single entity with a deep understanding of the whole process.
The lack of communication and integration between the different systems creates data silos, which makes managing process flows throughout the different steps a challenging task, and the multiple generations of technologies found aggravate this issue, since recent systems are prepared for the 4.0 revolution while older ones require multiple steps to even gather data. The complexity of the logistics process in a plant of this size, and the multi-source, multi-structured, high-volume, high-variety and veracity-sensitive nature of the data, make it very hard to handle, analyse and correlate.
Most organizations have huge volumes of structured data housed in different data sources such as mainframes and databases, as well as unstructured data sets. Providing integrated data from such a variety of sources is a prerequisite for effective business intelligence [12].
Gathering data from heterogeneous sources and manipulating it in preparation for big data analysis processes is still a big problem [13]. The logistics department at VWAE suffers from the absence of any predictive and adaptive functionalities, which forces logistics planners to use conventional methods, relying on past experience and trial and error for every decision they make. This reduces the ability to optimize the system, because it takes considerable time and effort and, until deployment, there is no way of validating these changes or predicting their outcomes with an acceptable degree of confidence.
Data errors occur with some regularity, due to a lack of both data validation and awareness of the importance of data validity. This problem reduces the confidence of both the decision makers and the planners at VWAE in the data available, which leads to its lack of use and value.
One important aspect of logistics in manufacturing is warehouse management. Managing a warehouse is very complex because of the multiple variables to consider, such as physical space, internal transport times, inventory costs, security and material quality, to name a few. Warehouse management is also constantly under pressure to feed the production line, because stoppages can be very expensive.
Warehouse management at VWAE is no different and must account for all these variables for a few thousand different parts with very different characteristics and associated processes. For our selected part, car batteries, inventory management is especially important for multiple reasons: it is a valuable component, so stall money is a factor; it has to be stored at ground level and does not support the stacking of other packages above it, so it occupies premium warehouse locations; and batteries also have expiration dates. There is a need to be more efficient in warehousing space, stall money, transport costs and pollutant emissions, which leads to the need to optimize inventory levels.
1.3 Proposed solution
We intend to implement a multi-layer architecture which, for this use-case, won't necessarily be a "big data" architecture, but which will be built with future integration into a cluster computing system like Apache Spark in mind, for the purpose of scalability. The multiple layers will be described in detail throughout this dissertation. The said layers are the following:
1. ETL (extract, transform, load) – To implement a data-driven system it is necessary to extract the data from its source and transform it into a clean, structured format to load into said system. This layer consists of multiple operations of loading, cleaning, reshaping, resampling and normalization to get the data from its source into our machine learning (ML) layer, or directly to the processing or presentation layer.
2. Storage layer – This layer consists of a database with the clean, structured and integrated data, necessary to keep historical data.
3. ML layer – This layer consists of a machine learning model that receives the already prepared data and learns patterns, behaviours and trends from that data, without being explicitly programmed, in order to forecast its future states. The forecasted values are then forwarded to the next layer.
4. Processing – This layer consults the available data, including the forecasts provided by the ML layer, and calculates the suggested values for each target.
5. Presentation – This layer consists of ways to present data and insights to the planners, through the creation of graphs and data tables in intuitive ways. With this addition, decision makers are equipped with data and insights to make the best possible decision.
This architecture was chosen because the main problem encountered was the absence of a "big picture" of the data available, caused by the existence of data silos and the lack of data validation. Our ETL layer can eliminate the data silos by integrating the data, cleaning and structuring it in the process. The machine learning layer can be used to learn from data and output insights; in this use case it will learn from historical data to forecast the next 5 days of consumption of car batteries by the production line. The processing layer is necessary to interpret the output of the machine learning layer, and finally the presentation layer intends to solve the absence of the "big picture" by presenting the data from all clusters on the same platform.
1.4 Methodology
This dissertation follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) reference model. Within this model, the life cycle of a data mining project is broken down into six phases, which are shown in figure 1.6 [14].
1. Business understanding – Vital for knowing what to look for and for defining objectives.
5. Evaluation – Evaluate the results obtained in the previous step and decide new objectives and future tasks.
This methodology does not impose a strict sequence of phases, and moving back and forth between phases is always required in order to improve the outcome iteratively.
To acquire the necessary knowledge about the business, the logistics process and the general workings of the factory, an internship for the duration of the work was seen by both parties as a positive measure and the most fruitful way to proceed.
1.5 BOOST 4.0 Contributions
As previously stated, this work is integrated in the VWAE pilot of the BOOST 4.0 European project. This section provides a description of the pilot and its objectives, as well as a description of the different phases of the pilot and the contributions of this thesis to the project.
Boost 4.0 seeks to improve the competitiveness of Industry 4.0 and to guide the European manufacturing industry in the introduction of Big Data in the factory, along with the necessary tools to obtain the maximum benefit from Big Data. With respect to global standards, Boost 4.0 is committed to the international standardization of European Industrial Data Space data models and open interfaces, aligned with the European Reference Architectural Model for Industry 4.0 (RAMI 4.0).
The standardization of Industry 4.0-compliant systems, or smart manufacturing systems, includes many aspects [15]. Future smart manufacturing infrastructures must enable the exploitation of new opportunities. Even today, people are surrounded by interconnected digital environments continuously generating more synergies with connected devices and software. Such an evolution is also happening in the manufacturing domain, as at Volkswagen. Future smart manufacturing infrastructures are confronted with the digitalisation and virtualisation of (physical) objects enhanced with sensors, processors, memory and communication devices, able to communicate coactively and to exchange information independently through reactive, predictive, social, self-aware and/or autonomous behaviour [16] [17]. A common term for such intelligent physical objects is Cyber-Physical System (CPS); these systems communicate in (Industrial) Internet of Things ((I)IoT) networks.
To exploit new opportunities, specific requirements such as real-time behaviour, security or safety have to be considered. Smart manufacturing infrastructures have to be based on network technologies which enable secure (encryption, authentication, robustness, safety), vertical and horizontal, cross-domain and cross-layer communication between stationary and mobile objects (such as virtual objects, sensors, actors, devices, things or systems). Network technologies must comply with specific requirements related to, e.g., real-time operation, safety, security, data volumes, wired or wireless media, passive or active components, etc. [18]. Lower-level fields (process control or real-time statistics) require response times of seconds or even milliseconds, whereas higher levels (production planning or accounting) only require time frames of weeks or months [18]. Architectures, such as the RAMI 4.0 in figure 1.7, generally describe the ordering of components/modules and their interaction, and should provide a unified structure and wording for the terms used. An architecture should include a logical, a development, a process and a validation view, and should provide scenarios for validation, as proposed by Philippe Kruchten in his 4+1 architectural view model [19]. A smart manufacturing architecture should also provide a unified structure and wording covering mandatory aspects of smart manufacturing such as product, system or order life cycles, value streams, information flows, or hierarchical layers.
Such architectures are currently under development. (Physical) reachable objects inside a smart manufacturing network (e.g. digitalised and virtualised field-level devices, systems, material, integrated humans, or virtual concepts such as products in the design phase) have to fulfil a range of requirements. Objects should communicate using a unified communication protocol, at least at the application level, and should be based on a unified semantics to enable mutual identifiability and understanding. The object itself should provide its own features as a service (e.g. state information or functionalities) and should be able to provide its own description, alongside extended information such as manuals, specifications or wear information. All of this has to be kept next to further requirements related to security, safety or quality of service [20] [21]. Finally, various applications that use the services of deployed objects can be implemented, realising e.g. control systems, systems of systems through service orchestration, or, as is the focus of this work, Big Data analysis applications. Standards can be classified according to the role they play in the system architecture. At this stage in the Boost 4.0 project, we have the RAMI 4.0 (figure 1.7).
Figure 1.8 shows the Boost 4.0 architecture. Here we can see the Boost 4.0 horizontal layers (visualization, data analytics, data processing, data management and external data sources/infrastructure) as well as the vertical layers (development, communications and connectivity, data sharing platforms and privacy/security). This dissertation contributes mainly to the visualization, data analytics, data processing, data management and external data sources/infrastructure layers.
This pilot is structured in four different phases, as we can see in figure 1.9.
Phase 1 – The initial version of the pilot's implementation mainly comprised the overall test of the closeness of the simulation to the reality of the logistics operations at VWAE.
This phase was divided into several iterations in order to achieve the best possible fit between what is being simulated in the Visual Components "digital twin" and the reality of the logistics processes, in accordance with the main business scenarios for this pilot. The tasks of this phase are represented in figure 1.10; this dissertation contributed to the data cleaning and data transformation tasks.
Phase 2 – Real-Time Scenario. In this phase, real-time data is fed into the simulation in order to confirm that the simulation clearly depicts the real-world processes, to validate whether the real-world processes can be optimized, and also to check the as-is situation when tweaks to the actual process are performed in the simulation. The data is fed to the simulation environment through a publish-subscribe mechanism, as is the case with the OPC-UA standard or the FIWARE Orion Context Broker, meaning that when a set of data is published into the service, the simulation environment gets it through its subscription to the pub-sub service. Figure 1.11 shows the tasks of this phase; this dissertation contributed to the big data aggregation task.
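As an illustration of the publish-subscribe mechanism described above, the following is a minimal sketch of registering a subscription on a FIWARE Orion Context Broker through its NGSI v2 REST API; the broker address, entity type, attribute names and notification endpoint are illustrative assumptions, not the pilot's actual configuration.

```python
# Minimal sketch: subscribing to entity changes on a FIWARE Orion Context Broker
# (NGSI v2 API). URL, entity type and attributes are illustrative assumptions.
import requests

subscription = {
    "description": "Notify the simulation when a transport record changes",
    "subject": {
        "entities": [{"idPattern": ".*", "type": "Transport"}],
        "condition": {"attrs": ["status"]},
    },
    "notification": {
        "http": {"url": "http://simulation-host:8080/notify"},  # hypothetical consumer
        "attrs": ["status", "partNumber", "timestamp"],
    },
}

resp = requests.post("http://orion-host:1026/v2/subscriptions", json=subscription)
resp.raise_for_status()  # Orion answers 201 Created with the subscription id in Location
```

Once the subscription is in place, every publication matching the condition is pushed to the consumer, which is exactly the pattern by which the simulation environment receives its data.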
Phase 3 – Prediction. This phase is characterized by the use of data mining and machine learning algorithms, both for the prediction of future data values and for the analysis of data retrieved from the simulation environment. The first process is to predict future data depending on specific tweaks to the processes, whether they are made directly in the physical dimension of the simulation (e.g. placing the sequencing area in a different place, replacing a human operator with an AGV or robot, etc.) or on the data per se (e.g. increasing the number of jobs at the point of fit, increasing the time intervals between truck arrivals, etc.). The predicted data is then fed into the simulation environment, in order to check the impact of the tweaks on the logistics operation, i.e. whether the whole process would still meet the necessary production requirements or not. In case the process does not meet the necessary requirements, solutions for the encountered issues must be found. The tasks of this phase are in figure 1.12; this dissertation contributed to the predictive algorithms task.
Phase 4 – Future Digital Twin. This phase comprises the final version of the digital twin, in which future operations of the logistics area at VWAE will be tested prior to real implementation. This process of digital twin testing will provide solid ground to create new processes, optimize existing ones and test significant changes in the overall logistics operation without the need to perform real-world pilots, saving the money, time and human resources that would otherwise be needed for such piloting activities. This is crucial to VWAE since, up to now, the only way to perform testing activities is to couple them with everyday operations, which brings serious problems in terms of execution times, resource usage and return on investment. Furthermore, when the piloting of process optimizations or the assessment of new technologies for the logistics operations does not meet the required expectations, all of the above problems become even more critical, since the effort spent on the tests, in terms of money, time and resources, does not contribute to a substantial improvement of the operation. The tasks of this phase are in figure 1.13; this dissertation contributed to the analytics task.
Chapter 2
State of the Art
This chapter contains a review of the concepts and technologies addressed in this dissertation.
• Unsupervised learning – Used when the data provided is neither classified nor labelled; instead of figuring out the right output, it identifies common features within the data and can infer functions to describe hidden structure in the data.
Figure 2.1 shows how machine learning techniques and algorithms are structured; in all categories there is a wide range of different algorithms, each one with its advantages and disadvantages.
Machine learning has applications in multiple areas: approaches for predicting future inbound logistics processes already exist [22]; forecasting of supply chains showed improvements and increased adaptability with the use of machine learning algorithms [23]; and, in healthcare, machine learning algorithms proved successful in predicting early colorectal cancer metastasis using digital slide images [24].
In industry, supervised machine learning techniques are mostly applied, due to the data-rich but knowledge-sparse nature of the problems [25]. The general process contains several steps for handling the data and setting up the training and test datasets by the teacher, hence "supervised" [26].
In 2017, an article [27] implemented a machine-learning-based system to address the problem of optimal order placement in electronic equity markets and achieved substantial reductions in transaction costs.
Multiple machine learning techniques are being applied with success to scheduling problems, like the article [28] that proposes a framework to optimize the scheduling of processes in order to reduce power consumption in data centres; the authors utilize machine learning techniques to deal with uncertain information and use models learned from previous system behaviours to predict power consumption levels, CPU (Central Processing Unit) loads and SLA (service-level agreement) timings, and thus improve scheduling decisions.
Machine learning algorithms are becoming more and more useful with the growth of big data, since it is not possible or practical to have programmers constantly adapting code to extract useful information from data. There are multiple examples of cases where the use of big data techniques aided by machine learning produced valuable results in areas as varied as energy, logistics, agriculture, marketing or even health. A good example is search engines, which use ML algorithms to recommend advertisements related to the content searched [29].
There is a wide range of predictive techniques, falling mainly into two categories: regression techniques and machine learning ones. With regression models, the focus lies on establishing a mathematical equation as a model to represent the interactions between the different variables in consideration. Depending on the situation, there is a wide variety of models that can be applied while performing predictive analytics.
One of these models is the linear regression model, which analyses the relationship between the response or dependent variable and a set of independent or predictor variables. This relationship is expressed as an equation that predicts the response variable as a linear function of the parameters. These parameters are adjusted so that a measure of fit is optimized. Much of the effort in model fitting is focused on minimizing the size of the residuals, as well as ensuring that they are randomly distributed with respect to the model predictions. A local linear regression model was applied to short-term traffic prediction in [30], and the performance of the model was compared with previous results of nonparametric approaches based on local constant regression, such as the k-nearest neighbour and kernel methods, using 32 days of traffic-speed data collected on US-290, in Houston, Texas, at 5-minute intervals. It was found that the local linear methods consistently showed better performance than the k-nearest neighbour and kernel smoothing methods.
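To make the fitting process concrete, here is a minimal sketch, assuming synthetic data and scikit-learn purely for illustration (neither is prescribed by this work): the parameters are adjusted by least squares and the residuals are then inspected.

```python
# Minimal sketch of fitting a linear regression and checking its residuals.
# Synthetic data; scikit-learn is used only for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))             # one predictor variable
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, 200)   # linear relation plus noise

model = LinearRegression().fit(X, y)              # least-squares parameter fit
residuals = y - model.predict(X)

print(model.coef_, model.intercept_)              # should be close to 3.0 and 2.0
print(residuals.mean(), residuals.std())          # residuals should centre on zero
```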
Logistic regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable, although many more complex extensions exist. Although it is a simple model, in some cases it can outperform more advanced ones. The study in [31] uses logistic regression, moving average and BPNN (Back-Propagation Neural Network) methods for sales models designed to predict daily fresh food sales, and found the correct percentage obtained by the logistic regression to be better than that obtained by the BPNN and moving average models.
As described earlier, some machine learning techniques can also be used to conduct predictive analytics.
Neural networks are sophisticated nonlinear modelling techniques that are able to model complex functions. They can be applied to problems of prediction, classification or control in a wide spectrum of fields, and are used when the exact nature of the relationship between inputs and output is not known. A key feature of neural networks is that they learn this relationship from the data itself.
2.1.2 LSTM
Long short-term memory (LSTM) networks are a type of RNN (Recurrent Neural Network), introduced in 1997 by Hochreiter and Schmidhuber, and have set accuracy records in multiple application domains [34].
LSTMs are deep learning systems that avoid the vanishing gradient problem, meaning they prevent backpropagated errors from disappearing or growing uncontrollably. LSTMs are normally augmented by recurrent gates called "forget gates" [35], so errors can flow backwards through unlimited numbers of virtual layers unfolded in space. LSTMs can learn tasks that require memories of events that happened thousands or even millions of discrete time steps earlier [36]. LSTMs differ from other networks because they can work with long delays between events and, mainly, because they can handle high- and low-frequency events at the same time.
Multiple authors are using LSTMs to make predictions on important datasets. One paper [37] proposed an approach to forecast PM2.5 (Particulate Matter) concentration with LSTM by exploiting Keras [38], a high-level neural networks API written in Python and capable of running on top of TensorFlow, to build a neural network and run an RNN with LSTM through TensorFlow. The results showed that the proposed approach can effectively forecast the value of PM2.5.
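As an illustration of the approach described in [37], the following is a minimal sketch of an LSTM forecaster built with Keras on top of TensorFlow; the window length, layer sizes and training settings are illustrative assumptions, not the values used in the paper.

```python
# Minimal sketch of an LSTM forecaster in Keras; data and hyperparameters
# are illustrative assumptions, shown only to make the technique concrete.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Toy supervised windows: 30 past time steps with 1 feature -> next value.
X = np.random.rand(100, 30, 1)   # (samples, time steps, features)
y = np.random.rand(100, 1)

model = Sequential([
    LSTM(32, input_shape=(30, 1)),  # one recurrent layer with 32 units
    Dense(1),                       # regression head for the forecast
])
model.compile(optimizer="adam", loss="mse")  # MSE, as in the acronyms list
model.fit(X, y, epochs=10, batch_size=16, verbose=0)
next_value = model.predict(X[-1:])           # one-step-ahead forecast
```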
Another paper [39] modelled and predicted China stock returns using LSTM. The historical data of the Chinese stock market were transformed into 30-day-long sequences with 10 learning features. Compared with a random prediction method, the LSTM model improved the accuracy of stock return prediction.
LSTM models accept multiple input and output types of data. One example is a paper [40] that introduced an algorithm of text-based LSTM networks for automatic composition and reported results for generating chord progressions and rock drum tracks. The experiments show that LSTM provides a way to learn the sequence of musical events even when the data is given as text, and the authors plan to examine a more complex network capable of learning interactions within music (instruments, melody/lyrics) for a more complete automatic composition algorithm.
2.2 Big Data Analytics
Big Data is commonly characterized by a set of "V" properties:
• Volume, for the scale of the data produced, which makes it difficult to process with regular data processing techniques.
• Velocity, for the pace at which the data is produced, demanding a much higher processing capacity.
• Variety, in terms of content, format and size, which prevents a single standard method for processing all the data.
• Value, for the hidden information that can be collected by analysing such a large amount of data.
Data Analytics corresponds to the application of tools and techniques to extract insights and knowledge from data, by analysing it through statistics, data mining or machine learning techniques. Although statistical analytics is supported by well-known statistical techniques, which are more easily deployed in a Big Data context, in the case of data mining and machine learning the passage to a Big Data environment is not a trivial task, since it comprises the reconfiguration of algorithms to be deployed on Big Data execution engines.
In typical data mining systems, the mining procedures require computationally intensive computing units for data analysis and comparisons. A computing platform is therefore needed with efficient access to, at least, two types of resources: data and computing processors.
For Big Data mining, because the data scale is far beyond the capacity that a single personal computer can handle, a typical Big Data processing framework relies on cluster computers with a high-performance computing platform. The role of the software component is to make sure that a single data mining task, such as finding the best match for a query in a database with billions of records, is split into many small tasks, each of which runs on one or multiple computing nodes [41].
Big Data Analytics refers to the implementation of analytic tools and technologies within the scope of Big Data [9]. Hence, Big Data Analytics may be described by two specific concepts, Big Data + Analytics, and the interactions between the technologies supporting both concepts.
So, why merge these concepts [42]? First, Big Data provides gigantic statistical samples, which enhance the results of analytic tools. In fact, the general rule is that the larger the data sample, the more accurate the statistics and other products of the analysis. Second, analytic tools and databases can now handle big data, and can also execute big queries and parse tables in record time. Moreover, due to a precipitous drop in the cost of data storage and processing bandwidth, the economics of analytics is now more approachable than ever.
The manufacturing sector is also implementing Big Data Analytics. One paper [43] proposes a big-data-driven analytical framework to reduce the energy consumption and emissions of energy-intensive manufacturing industries. An application scenario of ball mills in a pulp workshop of a partner company is presented to demonstrate the proposed framework; the results show that energy consumption and energy costs are reduced by 3% and 4%, respectively.
According to [44], the semiconductor manufacturing industry has been taking advantage of the big data and analytics evolution by improving existing capabilities, such as fault detection, and supporting new capabilities, such as predictive maintenance. For most of these capabilities, data quality is the most important big data factor in delivering high-quality solutions, and incorporating subject-matter expertise in analytics is often required to realize effective online manufacturing solutions. In the future, an improved big data environment incorporating smart manufacturing concepts such as the digital twin will further enable analytics; however, it is anticipated that the need to incorporate subject-matter expertise in solution design will remain.
Data generated by the Internet of Things is characterized by its continuous generation, large volume and unstructured format. Existing relational database technologies are inadequate to handle such IoT-generated data because of their limited processing speed and the significant storage-expansion cost. To counter this, a paper [45] proposes a sensor-integrated radio frequency identification (RFID) data repository implementation model using MongoDB, and shows that the proposed design strategy, based on horizontal data partitioning and a compound shard key, is effective and efficient for IoT-generated RFID/sensor big data.
In [46], an overall architecture of big-data-based analytics for the product lifecycle (BDA-PL) was proposed. It integrated big data analytics and service-driven patterns, which helped to overcome the lack of complete data and valuable knowledge. Under this architecture, the availability and accessibility of data and knowledge related to the product were achieved. The work focused on the manufacturing and maintenance processes of the product lifecycle, and the key technologies to implement the big data analytics were developed. The presented architecture was demonstrated in an application scenario, and the results showed that it benefited customers, manufacturers, the environment and indeed all stages of product lifecycle management, effectively promoting the implementation of cleaner production.
Big Data in supply chain problems makes it possible to analyse data at a more advanced level than traditional tools allow, processing and combining data collected from several systems and databases in order to provide a clear picture of the situation. Through the collection and evaluation of data, it can provide information on potential interference with the supply chain, making it possible not only to protect the supply chain but also to improve its efficiency. This way, interruptions in production are avoided and operational efficiency is increased. Big Data enables the optimization of logistics processes while making the supply chain less prone to failures [47].
There are several surveys, from the early 2000s up to today, regarding Big Data Analytics. These surveys often describe the same Big Data technologies, which have been evolving throughout the years, coupled with analytics techniques. The following paragraphs present the most prevalent technologies and tools across these surveys [42] [48] [49] [50].
Regarding execution engines, the following are the most referred to in the literature. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failures. It builds on a data processing paradigm called MapReduce. The MapReduce workflow looks like this: read data from the cluster, perform an operation, write results to the cluster, read updated data from the cluster, perform the next operation, write the next results to the cluster, and so on. Apache Spark is a general-purpose cluster computing engine, very fast and reliable [51], that started as a research project at the UC Berkeley AMPLab in 2009 and was open-sourced in early 2010. Many of the ideas behind the system were presented in various research papers over the years.
Spark offers an abstraction called resilient distributed datasets (RDDs) to support these applications efficiently. RDDs can be stored in memory between queries without requiring replication. Instead, they rebuild lost data on failure using lineage: each RDD remembers how it was built from other datasets (by transformations like map, join or groupBy) and can rebuild itself. RDDs allow Spark to outperform existing models by up to 100x in multi-pass analytics. Spark showed that RDDs can support a wide variety of iterative algorithms, as well as interactive data mining and a highly efficient SQL engine called Spark SQL, which enables SQL queries to be executed on NoSQL environments. While MapReduce operates in steps, Spark operates on the whole data set in one fell swoop: it completes the full data analytics operations in memory and in near real time. Spark also works both for batch offline data processing and for online stream processing, through its real-time counterpart, Spark Streaming.
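To make the RDD abstraction concrete, the following is a minimal PySpark sketch of the transformation-and-lineage style described above; the input path and key extraction are illustrative assumptions.

```python
# Minimal PySpark sketch of RDD transformations (map / reduceByKey) with
# lineage-based fault tolerance. Paths and record layout are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sketch").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("hdfs:///data/events.log")       # hypothetical input path
pairs = lines.map(lambda line: (line.split(",")[0], 1))
counts = pairs.reduceByKey(lambda a, b: a + b)       # shuffle, like MapReduce's reduce
counts.cache()                                       # keep the RDD in memory between queries
print(counts.take(10))
```

If a node is lost, Spark recomputes only the affected partitions from the recorded lineage (textFile, then map, then reduceByKey), rather than replicating the whole dataset.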
Apache Spark also has a machine learning library called MLlib [52], which includes distributed implementations of common algorithms for classification, regression, clustering and collaborative filtering.
Beyond Big Data execution engines, storage and query systems for Big Data have also seen an enormous evolution in the past few years. MongoDB [53] and Apache Cassandra [54] are two different storage engines which do not rely on traditional RDBMS (Relational Database Management System) and SQL technologies. Instead, each uses a specific type of data storage mechanism. MongoDB is based on a document structure, relying on JSON (JavaScript Object Notation) formatted documents to store data. Cassandra is also supported by a file storage system, while HBase maintains the traditional tabular form used in RDBMS systems. Because most companies are used to SQL query tools for performing complex queries on their systems, several SQL abstractions were added on top of NoSQL technologies, in order to provide SQL query functionality to these systems.
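As an illustration of MongoDB's document model, the sketch below stores a JSON-like record with pymongo; the database, collection and field names are illustrative assumptions inspired by the transport data described later in this document.

```python
# Minimal sketch of MongoDB's document model with pymongo.
# Database, collection and field names are illustrative assumptions.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["intralogistics"]

# A JSON-like document: no fixed relational schema is required.
db.transports.insert_one({
    "transport_id": 219779693,
    "part_number": "6R0915105B",
    "unloading_area": "LOZ_5_KLT",
    "events": {"arrival": "2019-01-03T10:03", "unload_end": "2019-01-03T11:30"},
})
print(db.transports.find_one({"part_number": "6R0915105B"}))
```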
Chapter 3
Intralogistics Data Analysis
This chapter will include a description and analysis of the whole range of intralogistics
data generated at the VWAE automotive factory.
The objective of this thesis and of the BOOST 4.0 project is to contribute to the optimization of intralogistics processes by applying emerging technologies and taking advantage of the data available. To ensure this, a data assessment was carried out to shed light on how to integrate data in order to achieve a "big picture" of the intralogistics process.
One of the first tasks I set out to accomplish was an overview of the data available: a description of the data available for each cluster.
Note that some of the data is not described in detail, to prevent any issues of data protection and confidentiality. For the same reasons, some data is described but the sample presented contains less data than described.
External material transports in this context are transports that start outside the factory. Most of the incoming parts arrive by truck; the available data consisted of raw Excel files with tabular data for each truck arriving at the factory.
These files contained timestamps of multiple events for each truck, such as the arrival time and the start and end of unloading; information about the material unloaded, such as quantity and description; the licence plate of the truck; the unloading position; and the transport identification number and material order number. The licence plate, transport identification number and material order number fields are important to the data integration process, because these fields represent the same information in the receiving cluster, thus enabling a connection to be made.
This dataset contained a lot of repeated data and a substantial number of errors.
Transport guide no. ID Factory arrival Unload start Unload end Factory exit Unload area Part number
000180576 219779693 03.01.19 10:03 03/01/2019 10:54:00 03/01/2019 11:30:00 03/01/2019 12:23:00 LOZ_5_KLT 6R0915105B
016648930 219812549 04.01.19 01:00 04/01/2019 01:35:00 04/01/2019 01:50:00 04/01/2019 01:51:00 LOZ10_GLT 1S0915105A
016648931 219812549 04.01.19 01:00 04/01/2019 01:35:00 04/01/2019 01:50:00 04/01/2019 01:51:00 LOZ10_GLT 5TA915105B
016648932 219812549 04.01.19 01:00 04/01/2019 01:35:00 04/01/2019 01:50:00 04/01/2019 01:51:00 LOZ10_GLT 7P0915105
016648927 219809991 04.01.19 01:36 04/01/2019 02:04:00 04/01/2019 02:17:00 04/01/2019 02:18:00 LOZ10_GLT 1S0915105A
016648928 219809991 04.01.19 01:36 04/01/2019 02:04:00 04/01/2019 02:17:00 04/01/2019 02:18:00 LOZ10_GLT 7P0915105
016648929 219809991 04.01.19 01:36 04/01/2019 02:04:00 04/01/2019 02:17:00 04/01/2019 02:18:00 LOZ10_GLT 5TA915105B
000180616 219843212 04.01.19 05:28 04/01/2019 05:54:00 04/01/2019 06:34:00 04/01/2019 06:35:00 LOZ_5_KLT 6R0915105B
016649484 219891419 07.01.19 07:05 07/01/2019 08:04:00 07/01/2019 08:37:00 07/01/2019 10:07:00 LOZ_5_KLT 7P0915105A
Area Supplier no. Guide no. Position Guide date Entry date Part Storage gr. Packaging
FCC1 0001551600 000180576 MS05A04A03 2018-12-19 04/01/2019 00:30 6R0915105B T2 DB0011
FCC1 0001551600 000180576 MS05A08A03 2018-12-19 04/01/2019 00:30 6R0915105B T2 DB0011
FCC1 0001551600 000180576 MS05A09A01 2018-12-19 04/01/2019 00:30 6R0915105B T2 DB0011
FCC1 0002522100 016648927 INSPECAO 2018-12-27 04/01/2019 03:35 1S0915105A T2 DB0011
FCC1 0002522100 016648927 INSPECAO 2018-12-27 04/01/2019 03:35 1S0915105A T2 DB0011
FCC1 0002522100 016648927 MS05B22A02 2018-12-27 04/01/2019 03:36 1S0915105A T2 DB0011
FCC1 0002522100 016648927 MS05B22A03 2018-12-27 04/01/2019 03:36 1S0915105A T2 DB0011
FCC1 0002522100 016648927 MS05B27A03 2018-12-27 04/01/2019 03:36 1S0915105A T2 DB0011
FCC1 0002522100 016648927 MS10B25A02 2018-12-27 04/01/2019 03:36 1S0915105A T2 DB0011
3.4 Transport to Sequencing
Area Reference no. Zone Loc. Part Supplier QStatus Storage gr. Packaging Guide no. Guide date Last mov. Qty
43B1 04314028017754 PSO BN05A14D01 7M3810630A 00156324 00X B9 0015SCH 007011214 01/12/2014 13/07/2017 116
43B1 04316031877318 U20 BN06B04E01 7N0864633A 00153479 00X K3 0006PAL 026301258 01/09/2016 13/03/2018 500
43B1 04317035155166 PSO BN05A14B02 1K8827209A 00057588 280 B8 111902 000397189 03/03/2017 23/02/2018 28
43B1 04317036301736 PSO BN05A14B03 1K8827210A 00057588 280 B8 111902 000405075 05/07/2017 23/02/2018 12
43B1 04317036919162 V05 BN03B03C01 1K0809495 00016954 280 B8 111902 000379297 14/09/2017 20/10/2017 500
43B1 04317036938926 INK MAT-NOK. 1K8864629B 00051288 000 68 006280 002161428 04/09/2017 02/11/2017 200
43B1 04317036950013 ING MAT-NOK 1K0813146 00071142 280 B8 111950 040125786 05/09/2017 13/02/2018 129
43B1 04317036968762 V05 BN03B02C02 1K0809495 00016954 280 B8 111902 000377602 04/09/2017 20/10/2017 361
43B1 04317036978923 BKL BN99011A05 7N0864623A 00153479 00X K2 0001SCH 060705981 08/09/2017 25/10/2017 1080
43B1 04317036978927 BKL BN99011A05 7N0864623A 00153479 00X K2 0001SCH 060705981 08/09/2017 25/10/2017 1080
One of the AGV types present in VWAE has no connection of any kind to a database. Regarding this transportation, there was no direct data available.
To counter this issue, we prepared a Raspberry Pi with movement and position sensors and attached it to the AGV for some basic data gathering about the AGV's behaviour and workload.
This step was also taken to ensure data validity and plausibility: by doing it, we were able to compare the transport times measured with the times recorded by the sequencing cluster and by the production line. This data validation is very important for planners; if the data is correct, the KPIs (Key Performance Indicators) obtained from it can be relied upon.
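As an illustration of this kind of lightweight on-board data gathering, the sketch below logs timestamped sensor readings to a CSV file; the read_movement() helper is a hypothetical placeholder for the actual movement and position sensor interface used.

```python
# Minimal sketch of on-board AGV data logging on a Raspberry Pi.
# read_movement() is a hypothetical placeholder for the real sensor interface.
import csv
import time
from datetime import datetime

def read_movement():
    # Placeholder: would query the attached movement/position sensors.
    return {"accel": 0.0, "position": None}

with open("agv_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "accel", "position"])
    for _ in range(10):                      # bounded loop for the sketch
        reading = read_movement()
        writer.writerow([datetime.now().isoformat(),
                         reading["accel"], reading["position"]])
        time.sleep(1)                        # 1 Hz sampling, an assumption
```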
For the purposes of our use-case, where car batteries are the selected part, we consulted the log files from the receiver placed at the point of application of the batteries on the car in the production line. This data comes in Excel format (.xlsx) and in a tabular shape, with 5 columns: one with the timestamp, another with the car identification number, and the other three with info about the car; the number of entries is equal to the number of cars that passed that receiver during the time frame observed.
The totality of the dataset gathered during this study consists of the logs from the receiver corresponding to the car batteries, from the start of 2018 to the end of February 2020. This dataset will be used later in this thesis to predict the future consumption of each type of battery by the production line. Table 3.6 is a data sample of the POF data.
KNR Sequence Date - T300 FAM BAT FAM AAU FAM MOT FAM GSP Model
5240356 8602 02/01/2018 07:12 J0V E0A D60 G1A 7N2
5140145 8603 02/01/2018 07:13 J0T E0A DQ6 G1D A11
5110392 8604 02/01/2018 07:15 J0V E0A DN4 G1D A11
4930177 8605 02/01/2018 07:16 J0S E0A DS9 G0K A11
5250312 8606 02/01/2018 07:22 J0V E0A D60 G1A 7N2
4930189 8607 02/01/2018 07:24 J0S E0A DS9 G0K A11
4930337 8608 02/01/2018 07:25 J0S E0A DS9 G0K A11
4930015 8609 02/01/2018 07:26 J0S E0A DS9 G0K A11
5250314 8610 02/01/2018 07:29 J0V E0A D60 G1A 7N2
3.8 Inventory Data Analysis
The inventory data was generated for the purpose of analysing the problem under study. It is based on the difference between entries of batteries into the warehouse (receiving data) and the supply of batteries to the sequencing area (internal transport data); for the initial state of the warehouse levels, I used the warehousing data for the first day of production in 2018. This dataset covers the entire year of 2018 and the first 3 months of 2019.
The data consists of a table with 6 columns and 12861 rows, the first column being the
date and hour of the day and the following 5 the number of stored packages for each
battery type.
Since this dataset reflects the magnitude of the problem at hand, an analysis was made. The first step was to define what an optimal value would be. Car batteries were selected for this study because each and every car produced at VWAE uses one and only one battery; but there are still multiple battery types, in this particular case five of them, each with a different usage rate by the production line, which is called the take rate.
Based on the take rate, we created two categories of battery types: low runners, for types with a take rate below 10%, and high runners, for those above.
Since a stop in production is very expensive, material shortage situations cannot be allowed to happen. To ensure that, management and specialists from the logistics department at VWAE calculate a security stock for each part, based on the importance of the part and on supplier localization, and the inventory levels should never dip below that level, except during shutdown occasions. Based on feedback from the planners responsible for these components, a security stock level of two and a half days was considered for all battery types present.
To establish a baseline of overstock, a steady production of 900 cars a day was considered, so we will consider that, at the start of each day, we should have an inventory level of 3.5 days: the production of the day itself plus the security stock level of 2.5 days. For each battery type, this baseline was obtained by multiplying the daily production by the take rate and then dividing that number by the number of batteries per container. This way we obtain the daily consumption, in packages, for each battery type.
As an example, let us consider the 1S0915105A battery type: (900 * 0.4383) / 54 = 7.3 packages per day, but since we only store full packages, we need to ensure 8 packages per day in this case. Now all that is left to do is multiply that value by 3.5 to get our reference value for this part. Another reference value we considered was one week of production, so again 8 * 7 = 56 packages.
Table 3.7 presents the results of this exercise for each battery type.
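The reference calculation just described can also be expressed compactly in code. The following sketch reproduces the worked example for 1S0915105A; the function is illustrative, and the constants for the other battery types are not shown.

```python
# Reference inventory levels from take rate and package size.
# Figures for 1S0915105A come from the worked example above; the math is
# ceil(daily_cars * take_rate / units_per_package) packages per day.
import math

DAILY_CARS = 900          # assumed steady production
SECURITY_DAYS = 2.5       # security stock defined by the planners

def reference_levels(take_rate: float, units_per_package: int):
    packages_per_day = math.ceil(DAILY_CARS * take_rate / units_per_package)
    baseline = packages_per_day * (1 + SECURITY_DAYS)   # day's production + security stock
    one_week = packages_per_day * 7
    return packages_per_day, baseline, one_week

# 1S0915105A: take rate 43.83%, 54 batteries per container -> 8 packages/day.
print(reference_levels(0.4383, 54))   # (8, 28.0, 56)
```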
These values can be compared with the statistical indicators of the inventory levels, such as the mean, standard deviation, maximum, minimum and the 25th, 50th, 75th and 90th percentiles, to perceive the size of the problem.
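These indicators are straightforward to obtain with pandas, assuming the inventory dataset described above is available as a table with one column per battery type; the file and column names are illustrative.

```python
# Summary statistics of inventory levels per battery type with pandas.
# File name and column layout follow the dataset description above (assumption).
import pandas as pd

inventory = pd.read_csv("inventory_levels.csv", parse_dates=["datetime"],
                        index_col="datetime")   # 5 columns, one per battery type
stats = inventory.describe(percentiles=[0.25, 0.50, 0.75, 0.90])
print(stats)   # mean, std, min, max and the chosen percentiles per type
```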
The first indicator that situations of overstock occur frequently is that the mean value is bigger than our reference of 3.5 days for all types except one, which happens to be one of the low runners; moreover, the maximum value is bigger than the 7-day usage for all the different types.
In the case of the high runners, we observe that our 3.5-day reference is always smaller than the 75th percentile, so we can safely say that at least 25% of the time we are facing situations of overstock; still within the high runners, two of them have inventory levels superior to 7 days of consumption 10% of the time.
To illustrate this, the graphs of the inventory levels for the month of September 2018 for the high runners 1S0915105A and 5TA915105B are presented in figures 3.1 and 3.2. In both graphs we have the inventory levels, and the reference line is plotted with the value of the 3.5 days of production, as explained before and represented in table 3.7. Both graphs show a large area above the reference line, and some observations can be made.
When the inventory level goes up at a given instant, it means that a truck with new batteries was received at the factory. There are multiple occasions in both of these graphs where situations of overstock were already happening and a new batch of material was unloaded. In 2018, more than 1770 containers of the high runner 1S0915105A, corresponding to over 95 thousand batteries distributed over 100 trucks, were unloaded at VWAE. This goes to show that there is room for improvement on this process alone, with multiple possible optimizations: a reduction in the number of trucks (CO2 emissions and money), a reduction in the inventory space occupied, and a reduction in demurrage costs. If situations like these are detected in advance by our system, it can alert inventory management.
Chapter 4
Architecture
This chapter describes the architecture implemented in the proposed solution to solve the problem presented before.
The objective is to implement a data-driven system capable of improving the efficiency of the intralogistics processes at VWAE. To that end, a layered architecture was chosen to maintain flexibility and scalability: layers allow components to be tested and developed independently of each other; changes to one layer do not require changes in the others; layering helps to control and encapsulate the complexity of large applications; and, with a layered approach, multiple applications can easily reuse the components.
Since the objective is a data-driven system, the first layer is the ETL layer, which gathers, prepares and loads all the available data into the storage layer. This layer is then connected to a machine learning layer, or directly to a processing layer that processes the output of the machine learning layer and connects to our visualization layer, which presents the logistics data in a visual way to planners.
Figure 4.1 shows the different layers and the flow of data from its collection on the shop floor to the data visualization layer.
In order to use the gathered data in a fruitful way and to apply machine learning algorithms, the data needs preparation. This layer is responsible for taking the gathered data and transforming it into a format acceptable to the storage layer and to the machine learning layer. Each cluster of data has different formats and different sources, as we observed in the logistics data analysis chapter of this document; each of those clusters required different ETL operations, and some of the more important ones are described in that chapter.
The different sources have different processes to gather data; nonetheless, all of the data is gathered in Excel format. This layer cleans and prepares the data using Python scripts and then loads it into the storage layer through a Python connection.
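A minimal sketch of this step is shown below, assuming pandas for the Excel handling and a SQLAlchemy connection to the storage layer; the file name, table name, column handling and connection string are placeholders, not the actual VWAE systems.

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; the real storage layer credentials differ.
engine = create_engine("postgresql://user:password@localhost:5432/logistics")

def load_excel_to_storage(path, table):
    df = pd.read_excel(path)                 # every source arrives as Excel
    df = df.dropna(how="all")                # drop completely empty rows
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.drop_duplicates()                # remove duplicated records
    df.to_sql(table, engine, if_exists="append", index=False)

load_excel_to_storage("battery_production.xlsx", "battery_production")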
Our machine learning layer needs the battery production data in a specific format to predict the future production of cars with each battery type. The steps taken to prepare the data were implemented with the Python library pandas [65] [66] and are described here.
The raw data we receive from the production systems consists of an Excel table with the following five columns: car identification number, sequence number, date, car battery type and car model; we have records for the entire years of 2018 and 2019.
A quick analysis of this data shows that it is not prepared for machine learning input. The first step was to drop the sequence number, because it served no purpose for our machine learning objectives.
The machine learning layer requires data with a given frequency, or time step. Since we have production data with no time step defined (each entry represents a car produced at a given moment), we decided to resample the data to a daily format and created a dataset indexed by the dates from the first day of 2018 to the last day of 2019. After this operation our dataset has one entry for each day and 8 columns, one for each battery type (5) and one for each model (3) produced, as exemplified in table 4.1.
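A sketch of this resampling with pandas is shown below; the file name and column names are assumptions for illustration, not the real extract.

import pandas as pd

# Assumed column names; the raw extract has car id, sequence number,
# date, battery type and car model.
raw = pd.read_excel("production_2018_2019.xlsx",
                    names=["car_id", "seq", "date", "battery", "model"])
raw = raw.drop(columns=["seq"])              # sequence number is not useful
raw["date"] = pd.to_datetime(raw["date"], dayfirst=True)

# One row per produced car becomes one row per day, with one column per
# battery type (5) and per model (3) counting the cars produced that day.
day = raw["date"].dt.normalize()
daily = pd.crosstab(day, raw["battery"]).join(pd.crosstab(day, raw["model"]))

# Ensure every day from 2018-01-01 to 2019-12-31 appears, with 0 for
# days without production (weekends, shutdowns).
daily = daily.reindex(pd.date_range("2018-01-01", "2019-12-31"), fill_value=0)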
After this, additional operations on the data were tested in a trial-and-error fashion, guided by the performance results from the machine learning layer.
Table 4.1: Example of the resampled daily production dataset with the added calendar features.

Date        J0S    J0T    J0V    J1N   J2D   711   7N2    A11    Month  weekday  Year  Week
01/01/2018    0.0    0.0    0.0   0.0   0.0   0.0    0.0    0.0      1        0  2018     1
02/01/2018  269.0  150.0  136.0  23.0  16.0  38.0  106.0  450.0      1        1  2018     1
03/01/2018  349.0   36.0  364.0  19.0  35.0  60.0  134.0  609.0      1        2  2018     1
04/01/2018  183.0   49.0  550.0  21.0  40.0  81.0  122.0  640.0      1        3  2018     1
05/01/2018  519.0   31.0  263.0  14.0  29.0  73.0  134.0  649.0      1        4  2018     1
06/01/2018  184.0   39.0   83.0   8.0   4.0  29.0   48.0  241.0      1        5  2018     1
07/01/2018    2.0    0.0    0.0   0.0   0.0   0.0    0.0    2.0      1        6  2018     1
08/01/2018  583.0   35.0  220.0  20.0  20.0  77.0  135.0  666.0      1        0  2018     2
The model with the best results in terms of accuracy had, as additional columns, the month number of the date, the day of the week, and the calendar week number. These were added in the hope that such basic additions would make some patterns more evident to the machine learning model, and we got positive results.
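Continuing the sketch above, these calendar features can be derived directly from the date index; the column names follow table 4.1.

# Calendar features derived from the date index, matching table 4.1.
daily["Month"] = daily.index.month
daily["weekday"] = daily.index.weekday            # 0 = Monday
daily["Year"] = daily.index.year
daily["Week"] = daily.index.isocalendar().week.astype(int)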
One other test that was made, but did not improve accuracy significantly, was the introduction of lag values: each data entry would get a set of columns with the values from the n previous entries. As this operation did not prove advantageous, it was dropped.
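For reference, a minimal sketch of this discarded experiment, with a hypothetical helper name:

# Hypothetical helper: add columns with the values from the n previous days.
def add_lags(df, column, n_lags):
    for lag in range(1, n_lags + 1):
        df[f"{column}_lag{lag}"] = df[column].shift(lag)
    return df.dropna()    # the first n_lags rows have no history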
After these steps, a simple division into training and test datasets was made.
Finally, before being fed into the machine learning model, the data needs to be normalized to values between 0 and 1. To achieve this we used the MinMaxScaler from the scikit-learn library.
Scikit-learn [67] is an open-source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data pre-processing, model selection and evaluation, and many other utilities.
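A sketch of the split and normalization, assuming the chronological 90/10 division described later in the machine learning layer section:

from sklearn.preprocessing import MinMaxScaler

values = daily.values.astype("float32")

# Chronological split: the first 90% of the records for training,
# the last 10% for testing.
split = int(len(values) * 0.9)
train_raw, test_raw = values[:split], values[split:]

# Fit the scaler on the training data only, then map every column of
# both sets to the [0, 1] range.
scaler = MinMaxScaler(feature_range=(0, 1))
train = scaler.fit_transform(train_raw)
test = scaler.transform(test_raw)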
4.2 Storage Layer
This database was created to store all of the historical data described in chapter 3 of this document. The objects of this database, and the connections between them, were created in a way that integrates all of the part numbers present at VWAE. This process would not have been possible without the data analysis made before, as it was that process that identified the connections between the different data clusters and distinguished the valuable from the redundant information.
Data is now structured, clean and integrated in a single database.
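As an illustration only, the sketch below shows how two such objects and their connection might be declared with SQLAlchemy over a relational store such as PostgreSQL [68]; the table and column names are hypothetical, not the actual VWAE schema.

from sqlalchemy import Column, Date, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Part(Base):
    # One row per part number present at VWAE (e.g. 1S0915105A).
    __tablename__ = "parts"
    part_number = Column(String, primary_key=True)
    batteries_per_container = Column(Integer)

class InventoryLevel(Base):
    # Daily inventory level in packages, linked to its part number.
    __tablename__ = "inventory_levels"
    id = Column(Integer, primary_key=True)
    part_number = Column(String, ForeignKey("parts.part_number"))
    date = Column(Date)
    packages = Column(Integer)

engine = create_engine("postgresql://user:password@localhost:5432/logistics")
Base.metadata.create_all(engine)   # create both tables and their link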
4.3 Machine Learning Layer
The machine learning layer was implemented using the Keras API [38].
Since the previous layer (the ETL layer) prepared the data for machine learning input, all we need to do is create and feed a model. The objective is to input all of the production data available at a given moment for each battery type and predict the production for the next 3 days, allowing us to calculate the exact number of battery orders we need to place. To simplify, we decided to predict the production of a single battery type, the 1S0915105A (J0S); however, some tests predicting the production of the other high runners showed similar results, as we would expect.
For evaluation purposes we decided to use cross-validation, a technique to evaluate the performance of ML models. The objective of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give an insight into how the model will generalize to an independent dataset. In this case we divided the input dataset into 90% of the records for training the model and the last 10% for testing it. In addition, the model sets apart 10% of the training data, does not train on it, and evaluates the loss on this data at the end of each epoch. The loss function that we are using is the MSE (Mean Squared Error), which, as the name says, measures the average squared difference between the estimated values and the actual values.
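Written out, this loss over n predictions is

\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2

where y_i are the actual values and \hat{y}_i the model's estimates.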
The process of building a machine learning model is iterative; throughout this process various combinations of models and parameters were tested, and the choice of the implemented model and its parameters was based on the performance of the different models tested.
One of the first decisions was the number of steps ahead that our model would forecast. In this case we are using daily data, which means that each time step corresponds to one day; since we need to predict 3 days into the future, this parameter was locked at 3. This means that our model will try to predict the 3 days after the last data provided.
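To feed an LSTM with this target, the series has to be cut into supervised (input, output) windows. A minimal sketch follows, in which the input window length n_in and the target column are our assumptions; the text fixes only the 3-day horizon.

import numpy as np

def make_windows(series, n_in, n_out=3, target_col=0):
    # series: 2-D array (days x features); target_col: the battery type
    # being predicted (assumed here to be the J0S column).
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])                             # n_in past days
        y.append(series[i + n_in:i + n_in + n_out, target_col])  # next 3 days
    return np.array(X), np.array(y)

X_train, y_train = make_windows(train, n_in=14)   # 14 is an assumption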
Throughout this process it became clear that, for our data and models, we need at least 50 epochs of training, because the validation and training errors would consistently drop during the first 50 epochs, and that more than 200 epochs of training are impractical, since from that point on most of the models showed no significant improvement in performance and, in some cases, the validation error climbed, which is a sign that the model is overfitting.
Regarding optimizers, which are algorithms or methods used to change the attributes of a neural network, such as its weights and learning rate, in order to reduce the losses, we experimented with “Adam” and “RMSprop” and ended up using “Adam” because it produced better results.
For the batch size we tried multiple values and ended up choosing 64, because it was the one with the best results without compromising the training time.
Regarding the optimal number of layers, we observed that one and two LSTM layers presented similar results, but the addition of more layers would result in worse performance.
To prevent our model from overfitting we inserted a dropout layer. Dropout is a technique for addressing this problem: the idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much and improves the performance of neural networks on multiple tasks [71].
The model implemented consisted of one LSTM layer with 2000 units and a dropout of 20%; we used a batch size of 64 and trained for 200 epochs. The model was compiled using the “Adam” optimizer. The evolution of the training and validation errors across epochs is shown in the results chapter in figure 5.5, and the predictions for the last 15 days (in green), as well as the true values (in red), are shown in figure 5.6.
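A sketch of this configuration with the Keras API is shown below; the window length and feature count are assumptions carried over from the windowing sketch, from which X_train and y_train also come.

from tensorflow import keras
from tensorflow.keras import layers

# Assumptions: 14-day input window, the 12 columns of table 4.1, 3-day horizon.
n_in, n_features, n_out = 14, 12, 3

model = keras.Sequential([
    layers.LSTM(2000, input_shape=(n_in, n_features)),  # one LSTM layer, 2000 units
    layers.Dropout(0.2),                                # 20% dropout against overfitting
    layers.Dense(n_out),                                # the 3 days ahead
])
model.compile(optimizer="adam", loss="mse")

history = model.fit(X_train, y_train,
                    epochs=200, batch_size=64,
                    validation_split=0.1)   # 10% of training data held out each epoch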
4.5 Visualization Layer
Visualization makes it easier to detect patterns, trends and outliers in groups of data. Good data visualizations should put meaning into complicated datasets so that their message is clear and concise. We are an inherently visual world, where images speak louder than words. Data visualization is especially important when it comes to big data and data analysis projects.
With this in mind, and in order to get the most out of our data, all of the features previously developed were aggregated and displayed in an interactive dashboard.
To build this dashboard we used Grafana, an open-source visualization and analytics tool. It provides charts, graphs and alerts for the web when connected to supported data sources, and it is extensible through a plug-in system. End users can create complex monitoring dashboards using interactive query builders.
This dashboard contains data from all of the analysed clusters, with pre-defined views and graphs, but also allows user interaction, like the ability to adjust the selected time window and to apply filters to the data.
Another feature present is the automatic connection to the digital twin, visible in figure 4.3: planners can select a time range and press the start-simulation button, and an instance of the simulation software (Visual Components) automatically starts with the selected data, allowing the user to validate and gather insights from a simulation point of view with minimal effort.
In figures 4.4 and 4.5, the temporal selection provides an easy way to visualize trends and changes over time and across all clusters. Each cluster has a dedicated view in the dashboard, where multiple graphics and tables are presented to users: in figure 4.4 one of the graphs represents the take rate of the five car battery part numbers at VWAE, and in figure 4.5 all three graphics represent the internal movements of containers of car batteries in the warehouse. The integration of datasets is visible in these figures; with this integration and these visualization tools we eliminated all the data silos present in the logistics data available at VWAE. Planners can now intuitively consult data from multiple sources at the same time and also analyse historical data to make validations or look for patterns.
This functionality will be especially important when planners are looking to make changes or to validate patterns recognized by the machine learning functionalities. It is all about providing information in an easy-to-read format to the planners, who will make the informed decisions.
Chapter 5
Results
The objective of this chapter is to evaluate the impact of our system in optimizing the inventory levels at VWAE. Our study focused particularly on car batteries, and because of that all the results concern these parts; however, our approach can be applied to many of the car components present at VWAE. The first part of this chapter presents the results of three machine learning models predicting the production of cars with the 1S0915105A (J0S) battery, and of one model predicting the 5TA915105B (J0V) battery, to explain the decisions made throughout the process of building a machine learning model described in the architecture chapter of this document.
As stated in the architecture chapter, throughout the process of building our machine learning model we provided different input data to the model; but, to allow comparisons between the models presented, all had the same input. This input data consisted of the normalized daily production data, with the addition of the weekday number and the calendar week number, as shown in table 4.1.
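For reference, a compact sketch of the three configurations compared below; all share a 20% dropout and a batch size of 64, and the dict layout is ours.

# Parameters as described in the text for models I to III.
CONFIGS = {
    "Model I":   {"lstm_units": 2000, "epochs": 500, "optimizer": "rmsprop"},
    "Model II":  {"lstm_units": 3000, "epochs": 200, "optimizer": "adam"},
    "Model III": {"lstm_units": 2000, "epochs": 100, "optimizer": "adam"},
}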
Model I consisted of one LSTM layer with 2000 units and a dropout of 20%; we used a batch size of 64 and trained for 500 epochs. The model was compiled using the “RMSprop” optimizer. Figure 5.1 shows the evolution of the training and validation errors (Y axis) across epochs (X axis), and figure 5.2 shows the predictions of the production of cars with the 1S0915105A battery in the last 62 days of 2019 (in green), as well as the true values (in red). With the training error stabilizing at values very close to zero, very different from the high validation error, we can see that the model was overfitted.
Model II consisted of one LSTM layer with 3000 units and a dropout of 20%; we used a batch size of 64 and trained for 200 epochs. The model was compiled using the “Adam” optimizer. Figure 5.3 shows the evolution of the training and validation errors (Y axis) across epochs (X axis), and figure 5.4 shows the predictions of the production of cars with the 1S0915105A battery in the last 62 days of 2019 (in green), as well as the true values (in red).
Model III consisted of one LSTM layer with 2000 units and a dropout of 20%; we used a batch size of 64 and trained for 100 epochs. The model was compiled using the “Adam” optimizer. Figure 5.5 shows the evolution of the training and validation errors (Y axis) across epochs (X axis), and figure 5.6 shows the predictions of the production of cars with the 1S0915105A battery in the last 62 days of 2019 (in green), as well as the true values (in red). This was the model that presented the best results and was therefore the one chosen for implementation.
The model used to predict the 5TA915105B (J0V) battery follows the architecture described before and presents similar results: it consists of one LSTM layer with 2000 units and a dropout of 20%; we used a batch size of 64 and trained for 200 epochs. The model was compiled using the “Adam” optimizer. Figure 5.7 shows the evolution of the training and validation errors (Y axis) across epochs (X axis), and figure 5.8 shows the predictions of the production of cars with the 5TA915105B battery in the last 62 days of 2019 (in green), as well as the true values (in red).
This shows that the machine learning layer can predict the production with reasonable accuracy, even though we have only two years of data to train the model, and we expect the model's performance to improve significantly with the addition of more data. Although several models and parameter combinations were tested, this is still an early stage of the development phase and we expect improvements moving forward.
Regarding the results of the entire system, we cannot show actual results yet, because the system needs to be put in place to be tested, and it would then require multiple months until conclusions on the actual results could be drawn. Applying the system to historical data to evaluate its performance would be clearly biased, so it was disregarded.
However, historical data can be used to estimate the possible optimizations. In chapter 3 of this document we pointed out that in 2018 there were multiple cases of overstock of car batteries at VWAE. In particular, in table 3.7 we can see that, for every car battery at VWAE, we are in an overstock situation at least half of the time, and in severe overstock 25% of the time. To give some perspective, the maximum inventory value of the 1S0915105A component in 2018 was 75 packages; with the factory functioning at optimal pace (900 cars a day), at this battery's normal take rate (8 packages a day) it would take more than 9 days of consumption to deplete that stock. If, for example, we look at the situation in September, illustrated in chapter 3 in figure 3.2, and analyse this data in detail, we can see how our system can reduce situations of overstock.
The inventory level of the 1S0915105A battery starts the month at 26 packages and drops to 21, still above the security level, until a truck arrives with 21 packages on day 2, increasing the level to 42; on the fourth, 26 more packages arrive, and on the fifth, 10 more. During these four days 57 packages arrived and around 25 were consumed, which led to a level of 65 packages at the end of the fifth of September. Even after this peak the inventory levels drop for a few days, but never dip below the overstock threshold, and then a new mountain of overstock follows, in which the maximum value reached 61 packages. The orders for this material were placed 5 days before their arrival at VWAE. We believe that, if planners had access to this data, to our visualization capabilities (as we can see in figure 5.9), and to our machine learning predictions, situations like the one described before would occur less frequently and with less impact. With transport, inventory and production data, as well as our machine learning predictions, available in a single platform with easy access, we believe we can help logistics planners at VWAE cut the occurrence of overstock situations for car batteries in half. This will save warehouse space and reduce warehouse costs.
Chapter 6
Conclusions and Future Work
6.1 Conclusions
The main objectives of this work were to improve the efficiency of the intralogistics processes at VWAE and to present a data-driven system capable of reducing situations of overstock. We believe that both objectives were accomplished.
Regarding the machine learning model, we are satisfied with the choice of an LSTM model, as it quickly captured patterns in the time series data.
The decision to do an internship at VWAE during this project was, in my opinion, positive for both sides. I learned a lot during this internship and was able to apply some of the skills acquired during my academic path. I also believe that the work I did at VWAE may prove to be an asset, especially the work focused on the automation of data gathering, and that I contributed to a better data culture at VWAE, which in the future will bring great advantages. The toughest objective during this project was the integration of data between the multiple sources and the subsequent elimination of data silos; this challenge could have been even more difficult had it not been for the constant support that I enjoyed thanks to the internship. The fact that I could talk to people who really know the systems in place, and that I could validate data on the shop floor, proved important.
Looking at the whole scope of the BOOST 4.0 project, this system is only one way we can improve intralogistics processes, and we see the work on data gathering and data integration as the biggest contribution made for improvement in the present and in the future. The integration process was designed to be scalable and can be the groundwork for multiple future applications of data-driven solutions.
This now allows management at VWAE to select a focus problem and, with minimal effort from planners, quickly build powerful data-based solutions.
The contribution to the creation of a digital twin of the intralogistics processes at VWAE also added value to the final solution and opened paths for more innovative and interactive optimizations.
The possibility of improving intralogistics process planning with data-driven systems is demonstrated, and this system can serve as experience for building more solutions without as much of the initial resistance created by the need for data quality, quantity and integration.
Bibliography
[1] S. Vaidya, P. Ambad, and S. Bhosle. “Industry 4.0 – A Glimpse.” In: Procedia Manufacturing 20 (2018), pp. 233–238. ISSN: 2351-9789.
[2] D. Reinsel, J. Gantz, and J. Rydning. “The Digitization of the World – From Edge to Core.” In: Framingham: International Data Corporation, November (2018), US44413318. URL: https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf.
[3] Q. Qi and F. Tao. “Digital Twin and Big Data Towards Smart Manufacturing and Industry 4.0: 360 Degree Comparison.” In: IEEE Access 6 (2018), pp. 3585–3593.
[4] P. Simon. The visual organization: Data visualization, big data, and the quest for better decisions. John Wiley & Sons, 2014.
[5] M. Holler, F. Uebernickel, and W. Brenner. “Digital Twin Concepts in Manufacturing Industries – A Literature Review and Avenues for Further Research.” In: Proceedings of the 18th International Conference on Industrial Engineering (IJIE). Korean Institute of Industrial Engineers, 2016. URL: https://www.alexandria.unisg.ch/249292/.
[6] T. Wuest, D. Weimer, C. Irgens, and K.-D. Thoben. “Machine learning in manufacturing: advantages, challenges, and applications.” In: Production & Manufacturing Research 4.1 (2016), pp. 23–45. DOI: 10.1080/21693277.2016.1192517.
[7] E. Hofmann and M. Rüsch. “Industry 4.0 and the current status as well as future prospects on logistics.” In: Computers in Industry 89 (2017), pp. 23–34. ISSN: 0166-3615. DOI: 10.1016/j.compind.2017.04.002.
[8] N. Schmidtke, F. Behrendt, L. Thater, and S. Meixner. “Technical potentials and challenges within internal logistics 4.0.” In: 2018 4th International Conference on Logistics Operations Management (GOL). 2018, pp. 1–10. DOI: 10.1109/GOL.2018.8378072.
[9] D2.3 – Pilots description, adaptations and executive plans v1. 2019. URL: https://cordis.europa.eu/project/id/780732/results.
[10] J. F. Krafcik. “Triumph of the lean production system.” In: MIT Sloan Management Review 30.1 (1988), p. 41.
[11] J. P. Womack and D. T. Jones. “Lean thinking—banish waste and create wealth in your corporation.” In: Journal of the Operational Research Society 48.11 (1997), pp. 1148–1148.
[12] N. Stefanovic and D. Stefanovic. “Supply chain business intelligence: technologies, issues and trends.” In: Artificial intelligence an international perspective. Springer, 2009, pp. 217–245.
[13] A. Kadadi, R. Agrawal, C. Nyamful, and R. Atiq. “Challenges of data integration and interoperability in big data.” In: 2014 IEEE International Conference on Big Data (Big Data). 2014, pp. 38–40.
[14] R. Wirth and J. Hipp. “CRISP-DM: Towards a standard process model for data mining.” In: Proceedings of the Fourth International Conference on the Practical Application of Knowledge Discovery and Data Mining. 2000, pp. 29–39.
[15] D2.7 – Boost 4.0 standardization certification v1. 2019. URL: https://cordis.europa.eu/project/id/780732/results.
[16] A. W. Colombo, T. Bangemann, S. Karnouskos, J. Delsing, P. Stluka, R. Harrison, F. Jammes, J. L. Lastra, et al. “Industrial cloud-based cyber-physical systems.” In: The IMC-AESOP Approach 22 (2014), pp. 4–5.
[17] D. Lukač. “The fourth ICT-based industrial revolution “Industry 4.0”— HMI and the case of CAE/CAD innovation with EPLAN P8.” In: 2015 23rd Telecommunications Forum Telfor (TELFOR). 2015, pp. 835–838.
[18] G. Rathwell and P. Ing. “Design of enterprise architectures.” In: pera.net [Online]. Available: http://www.pera.net/Levels.html [Accessed: Apr. 30, 2010] (2004).
[19] P. B. Kruchten. “The 4+1 view model of architecture.” In: IEEE Software 12.6 (1995), pp. 42–50.
[20] M. Hankel and B. Rexroth. Das Referenzarchitekturmodell Industrie 4.0 (RAMI 4.0). 2015.
[21] DIN SPEC 91345:2016-04, Referenzarchitekturmodell Industrie 4.0 (RAMI4.0). DOI: 10.31030/2436156. URL: https://doi.org/10.31030/2436156.
[22] D. Knoll, M. Prüglmeier, and G. Reinhart. “Predicting Future Inbound Logistics Processes Using Machine Learning.” In: Procedia CIRP 52 (2016): The Sixth International Conference on Changeable, Agile, Reconfigurable and Virtual Production (CARV2016), pp. 145–150. ISSN: 2212-8271. DOI: 10.1016/j.procir.2016.07.078.
[43] Y. Zhang, S. Ma, H. Yang, J. Lv, and Y. Liu. “A big data driven analytical framework for energy-intensive manufacturing industries.” In: Journal of Cleaner Production 197 (2018), pp. 57–72. ISSN: 0959-6526. DOI: 10.1016/j.jclepro.2018.06.170.
[44] J. Moyne and J. Iskandar. “Big data analytics for smart manufacturing: Case studies in semiconductor manufacturing.” In: Processes 5.3 (2017), p. 39.
[45] Y. Kang, I. Park, J. Rhee, and Y. Lee. “MongoDB-Based Repository Design for IoT-Generated RFID/Sensor Big Data.” In: IEEE Sensors Journal 16.2 (2016), pp. 485–497.
[46] Y. Zhang, S. Ren, Y. Liu, and S. Si. “A big data analytics architecture for cleaner manufacturing and maintenance processes of complex products.” In: Journal of Cleaner Production 142 (2017), pp. 626–641. ISSN: 0959-6526. DOI: 10.1016/j.jclepro.2016.07.123.
[47] K. Witkowski. “Internet of Things, Big Data, Industry 4.0 – Innovative Solutions in Logistics and Supply Chains Management.” In: Procedia Engineering 182 (2017): 7th International Conference on Engineering, Project, and Production Management, pp. 763–769. ISSN: 1877-7058. DOI: 10.1016/j.proeng.2017.03.197.
[48] D. Maltby. “Big data analytics.” In: 74th Annual Meeting of the Association for Information Science and Technology (ASIST). 2011, pp. 1–6.
[49] S. Srinivasa and S. Mehta. Big Data Analytics: Third International Conference, BDA 2014, New Delhi, India, December 20-23, 2014. Proceedings. Vol. 8883. Springer, 2014.
[50] J. Zakir, T. Seymour, and K. Berg. “BIG DATA ANALYTICS.” In: Issues in Information Systems 16.2 (2015).
[51] A. G. Shoro and T. R. Soomro. “Big data analysis: Apache spark perspective.” In: Global Journal of Computer Science and Technology (2015).
[52] MLlib: Apache Spark. URL: https://spark.apache.org/mllib/.
[53] The MongoDB 4.2 Manual. URL: https://docs.mongodb.com/manual/.
[54] Apache Cassandra Documentation. URL: http://cassandra.apache.org/doc/latest/.
[55] D. Sheakh. “A Study of Inventory Management System Case Study.” In: Journal of Dynamical and Control Systems 10 (May 2018), pp. 1176–1190.
[56] J.-S. Song, G.-J. van Houtum, and J. A. Van Mieghem. “Capacity and Inventory Management: Review, Trends, and Projections.” In: Manufacturing & Service Operations Management 22.1 (2020), pp. 36–46. DOI: 10.1287/msom.2019.0798.
[57] A. Dolgui and C. Prodhon. “Supply planning under uncertainties in MRP environments: A state of the art.” In: Annual Reviews in Control 31 (Dec. 2007), pp. 269–279. DOI: 10.1016/j.arcontrol.2007.02.007.
[58] S. Koh and S. Saad. “MRP-controlled manufacturing environment disturbed by uncertainty.” In: Robotics and Computer-Integrated Manufacturing 19 (Feb. 2003), pp. 157–171. DOI: 10.1016/S0736-5845(02)00073-X.
[59] S. Axsäter. “A Capacity Constrained Production-Inventory System with Stochastic Demand and Production Times.” In: International Journal of Production Research 48 (Oct. 2010), pp. 6203–6209. DOI: 10.1080/00207540903283808.
[60] J. Gijsbrechts, R. Boute, D. Zhang, and J. Van Mieghem. “Can Deep Reinforcement Learning Improve Inventory Management? Performance on Dual Sourcing, Lost Sales and Multi-Echelon Problems.” In: SSRN Electronic Journal (July 2019). DOI: 10.2139/ssrn.3302881.
[61] N. Stefanovic, D. Stefanovic, and B. Radenkovic. “Application of Data Mining for Supply Chain Inventory Forecasting.” In: Applications and Innovations in Intelligent Systems XV. Ed. by R. Ellis, T. Allen, and M. Petridis. London: Springer London, 2008, pp. 175–188. ISBN: 978-1-84800-086-5.
[62] N. Xue, I. Triguero, G. P. Figueredo, and D. Landa-Silva. “Evolving Deep CNN-LSTMs for Inventory Time Series Prediction.” In: 2019 IEEE Congress on Evolutionary Computation (CEC). 2019, pp. 1517–1524.
[63] B. Hussein, A. Kasem, S. Omar, and N. Z. Siau. “A Data Mining Approach for Inventory Forecasting: A Case Study of a Medical Store.” In: Proceedings of the Computational Intelligence in Information Systems Conference (CIIS 2018). Jan. 2019, pp. 178–188. ISBN: 978-3-030-03301-9. DOI: 10.1007/978-3-030-03302-6_16.
[64] S. Li and X. Kuo. “The inventory management system for automobile spare parts in a central warehouse.” In: Expert Systems with Applications 34.2 (2008), pp. 1144–1153. ISSN: 0957-4174. DOI: 10.1016/j.eswa.2006.12.003.
[65] The pandas development team. pandas-dev/pandas: Pandas. Feb. 2020. DOI: 10.5281/zenodo.3509134. URL: https://doi.org/10.5281/zenodo.3509134.
[66] W. McKinney. “Data Structures for Statistical Computing in Python.” In: Proceedings of the 9th Python in Science Conference. Ed. by S. van der Walt and J. Millman. 2010, pp. 56–61. DOI: 10.25080/Majora-92bf1922-00a.
[67] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. “Scikit-learn: Machine Learning in Python.” In: Journal of Machine Learning Research 12 (2011), pp. 2825–2830.
[68] PostgreSQL: The world’s most advanced open source database. URL: https://www.postgresql.org/ (visited on 06/06/2020).
[69] W. Ali, M. U. Shafique, M. A. Majeed, and A. Raza. “Comparison between SQL and NoSQL databases and their relationship with big data analytics.” In: Asian Journal of Research in Computer Science (2019), pp. 1–10.
[70] T. N. Khasawneh, M. H. Al-Sahlee, and A. A. Safia. “SQL, NewSQL, and NOSQL Databases: A Comparative Survey.” In: 2020 11th International Conference on Information and Communication Systems (ICICS). 2020, pp. 013–021.
[71] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. “Dropout: a simple way to prevent neural networks from overfitting.” In: The Journal of Machine Learning Research 15.1 (2014), pp. 1929–1958.