Online Machine Learning
A Practical Guide with Examples in Python

Eva Bartz · Thomas Bartz-Beielstein, Editors

Machine Learning: Foundations, Methodologies, and Applications

Series Editors
Kay Chen Tan, Department of Computing, Hong Kong Polytechnic University, Hong Kong, China
Dacheng Tao, University of Technology, Sydney, Australia
Books published in this series focus on the theory and computational foundations, advanced methodologies, and practical applications of machine learning, ideally combining mathematically rigorous treatments of contemporary topics in machine learning with specific illustrations in relevant algorithm designs and demonstrations in real-world applications. The intended readership includes research students and researchers in computer science, computer engineering, electrical engineering, data science, and related areas seeking a convenient medium to track the progress made in the foundations, methodologies, and applications of machine learning.
Topics considered include all areas of machine learning, including but not limited to:
- Decision trees
- Artificial neural networks
- Kernel learning
- Bayesian learning
- Ensemble methods
- Dimension reduction and metric learning
- Reinforcement learning
- Meta learning and learning to learn
- Imitation learning
- Computational learning theory
- Probabilistic graphical models
- Transfer learning
- Multi-view and multi-task learning
- Graph neural networks
- Generative adversarial networks
- Federated learning
This series includes monographs, introductory and advanced textbooks, and state-of-the-art collections. Furthermore, it supports the Open Access publication mode.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Foreword

Do you hear the rumble of the drums? That's the world of data analytics moving towards real time. A lot of effort is being poured into turning batch data warehouses into real-time data warehouses. It seems inevitable that more advanced use cases, such as machine learning, will also move towards real time. And yet, the field of online machine learning has already existed for decades. In fact, a lot of modern deep learning is powered by online learning methods. However, online machine learning is yet to be fully appreciated. The fact that a model operates online only scratches the surface. Doing online machine learning can reap great benefits if done comprehensively and properly. But it also requires a different mental model from the one most practitioners are used to.
I've been working on online machine learning for over 5 years. Admittedly, I have not observed a great shift towards online machine learning. In spite of that, I've never been more convinced that online machine learning has enormous merits that are yet to be uncovered and held in high regard. I believe there are several ways for online machine learning to grow in popularity. First of all, although it's quite clear Big Tech companies are running online models in production, there are not enough public details of how they do it. Practitioners have to be convinced by real and concrete examples. Secondly, there are not enough tools and libraries that make it easy to do online machine learning, akin to what scikit-learn did for batch machine learning. This is something I tried to resolve by creating River, although there are other great tools out there, such as Vowpal Wabbit. Thirdly, there is a lack of educational material that explains how to do online machine learning.
This book is a wonderful attempt to address the third point. It covers all the standard
topics of machine learning, but with an online twist. It’s a great introduction to online
machine learning. I hope it will inspire more people to do online machine learning, to
appreciate the value of processing data online, and to do so properly. Once you have
understood the concepts in this book, you will be able to view the world of machine
learning through a different lens. You will be able to see the world as a stream of
data, and you will be able to process it as such. You will be able to build models that
learn from data as it arrives. You will be able to build models that adapt to change.
You will be able to build models that are always up-to-date. You will be able to build
models that are always learning. You will be able to build models that are evaluated
in real time. Trust me, it’s worth it.
Preface

This book deals with the exciting, seminal topic of Online Machine Learning (OML).
It is divided into three parts: First, we look in detail at the theoretical foundations of
OML. We describe what OML is and ask how it can be compared to Batch Machine
Learning (BML) and what criteria one should develop for a meaningful comparison.
In the second part, we provide practical considerations, and in the third part, we
substantiate them with concrete practical applications.
Why OML? Among other things, it is about the decisive time advantage. This
can be months, weeks, days, hours, or even just seconds. This time advantage can
arise if Artificial Intelligence (AI) can evaluate data continuously, i.e., online. It
does not have to wait until a complete set of data is available, but can already use a
single observation to update the model. Does OML have other advantages besides
the obvious time advantage? If so, what are they? We ask whether there are limitations of BML that OML overcomes. It must be carefully examined at what price one obtains these advantages from OML. How high is the memory requirement compared to
conventional methods? Memory requirements also mean financial costs, e.g., due to
higher energy requirements. Is OML possibly energy-saving and thus more sustain-
able, i.e., Green IT? Is it possible to obtain comparably good results? Does the quality
(performance) suffer, do the results become less accurate? In order to answer these
questions reliably, we first give an understandable introduction to OML in the theo-
retical part, which is suitable for beginners as well as for advanced users. Then we
justify the criteria we developed for the comparability of OML and BML, namely a comprehensible presentation of quality, time, and memory requirements. In the
second part, we address the question of exactly how OML can be used in practice.
We are joined by experts from the field who report on their practical experiences,
e.g., requirements for official statistics. We give reasoned recommendations for the practical use of OML.
We comprehensively present the software packages currently available for OML, especially River,¹ and offer the Sequential Parameter Optimization Toolbox for River (spotRiver), a software package we developed specifically for OML. We deal in detail with special problems that can occur with data streams. The central problem for data streams is drift. We also deal with the explainability of AI models, interpretability, and reproducibility as required in upcoming regulations for AI systems. These aspects can contribute to higher acceptance of AI.
1 https://riverml.xyz/.
In the application section, we present two detailed studies, one of which uses a large data set with one million observations. We provide evidence of when OML performs better than BML. Of particular interest is the study on hyperparameter tuning of OML. Here we show how OML can perform significantly better when its hyperparameters are optimized.
Notebook

Supplementary program code for the applications and examples from this book can be found in Jupyter notebooks in the GitHub repository https://github.com/sn-code-inside/online-machine-learning/. The notebooks are organized by chapter.
The consulting firm Bartz & Bartz GmbH² laid the foundation for this book when it was awarded a contract from a tender of the Federal Statistical Office of Germany in 2023.³ The Federal Statistical Office of Germany wanted to know whether it makes sense to use OML now for its trove of data and for evaluations on behalf of the public sector (see the comments in Chap. 7). The slightly sobering result of our expert report was: interesting perspectives for the future are opening up, but at the moment there is no immediate prospect of using it. In some cases, there are technical and organizational hurdles to adapting processes in such a way that the advantages of OML can really come into play. In some cases, OML processes and implementations are not yet mature enough.
The topic fascinated us so much that we decided to pursue it further. Prof. Dr. Thomas Bartz-Beielstein took the question of the practical relevance of OML with him to TH Köln, where he continued his research in the field, which had been ongoing for years. Under his guidance, the research group at the Institute for Data Science, Engineering, and Analytics (IDE+A)⁴ was able to develop the software to a point where, we believe, its practical suitability has advanced considerably. Thus, we have combined the expertise of Bartz & Bartz GmbH with the research at TH Köln, which resulted in this book.
Overall, the book is equally suitable as a reference manual for experts dealing with OML, as a textbook for beginners who want to get to grips with OML, and as a scientific publication for researchers dealing with OML, since it reflects the latest state of research. But it can also serve as quasi-OML consulting: decision-makers and practitioners can use our explanations to tailor OML to their needs, use it for their application, and ask whether the benefits of OML might outweigh the costs.
2 https://bartzundbartz.de.
3 https://destatis.de.
4 https://www.th-koeln.de/idea.
To name just a few examples from military and civilian practice:
- You use state-of-the-art sensor systems to predict floods. Here, faster prediction can save lives.
- You need to fend off terrorist attacks and use underwater sensors to do so. Here, it can be crucial that the AI "recognizes" more quickly whether harmless water-sports enthusiasts are involved.
- You are responsible for observing the airspace. Reconnaissance drones, for example, can be used more efficiently if they can be programmed and trained with very recent AI data evaluations.
- You must adjust the production of critical infrastructure goods, such as vaccines, protective clothing, or medical equipment, very quickly. Here, it can be useful to keep the entire production process, including the raw materials to be used, as up to date as possible. This can be achieved by real-time evaluation and translation into requirements based on hospital bed occupancy or sick notes.
- You are a payment service provider and need to detect fraud attempts virtually in real time.
In conclusion, we note: OML will soon become practical, and it is worthwhile to get involved with it now. This book already presents some tools that will facilitate the practice of OML in the future. A promising breakthrough is to be expected, because practice shows that, due to the large amounts of data that accumulate, conventional BML is no longer sufficient. OML is the solution for evaluating and processing data streams in real time and delivering results that are relevant for practice. Specifically, the book covers the following topics:
Chapter 1 describes the motivation for this book and the objective. It describes
the drawbacks and limitations of BML and the need for OML. Chapter 2 gives an
overview and evaluation of methods and algorithms with a special focus on supervised
learning (classification and regression). Chapter 3 describes procedures for drift
detection. Updateability of OML procedures is discussed in Chap. 4. Chapter 5
explains procedures for the evaluation of OML methods. Chapter 6 deals with special
requirements for OML. Possible OML applications are presented in Chap. 7 and
evaluated by experts in official statistics. The availability of the algorithms in software
packages, especially for R and Python, is presented in Chap. 8.
The computational effort required for updating the OML models, also in comparison to an algorithmically similar offline procedure (BML), is examined experimentally in Chap. 9. There, it is also discussed to what extent the model quality could be affected, especially in comparison to similar offline methods. Chapter 10 describes
hyperparameter tuning for OML. Chapter 11 presents a summary and gives important
recommendations for practitioners.
Chapter 1
Introduction: From Batch to Online Machine Learning
Thomas Bartz-Beielstein
The volume of data generated from various sources has increased tremendously in recent years. Technological advances have enabled the continuous collection of data. The "three Vs" (volume, velocity, and variety) were initially used as criteria to describe big data:¹ volume here refers to the large amount of data, velocity to the high speed at which the data is generated, and variety to the large variety of data.
The data streams (streaming data) considered in this book pose an even greater
challenge to Machine Learning (ML) algorithms than big data. In addition to the
three big data Vs, there are other challenges that arise, in particular, from volatility
and the possibility that abrupt changes (“drift”) can occur.
Definition 1.1 (Streaming Data) Streaming data is data that is generated in a continuous data stream. It is loosely structured, volatile, always "flowing", and contains unpredictable, sometimes abrupt, changes. Streaming data is a subset of big data with the following characteristics:
- Volume: Streaming data is generated in very large quantities.
1 The three Vs were expanded over time by adding veracity and value to the “five Vs”.
Example: Streaming-Data
A great deal of data is generated during various daily transactions, such as online
shopping, online banking, or online stock trading. In addition, there is sensor data,
social media data, data from operational monitoring, and data from the Internet of
Things, to name just a few examples.
Streaming data requires real-time or near real-time analysis. Since the data stream
is constantly being produced and never ends, it is not possible to store these enormous
volumes of data.
Definition 1.2 (Static Data) By static data we mean data that have been collected at a certain point in time and are no longer changed. They are used in the field of classical ML and have the following properties:
- Volume: Static data usually have a manageable volume.
- Persistence: Static data can be retrieved as often as required. They do not change their structure.
- Structure: Static data are usually structured and are available in tabular form.
The idea for this book is based on a study conducted for the German Federal
Statistical Office. The algorithms described here may also become relevant for official
statistics. One of the main objectives of the Federal Statistical Office is to publish
statistics at regular intervals. New data is continuously being accumulated, which has
to be evaluated. The publication intervals and data volumes are still manageable, but
the current trend is towards new digital data and shorter publication cycles. The
large data volumes and analysis requirements that will then exist could necessitate
novel ML algorithms. This issue is explored in Sect. 7.1.
In this book, we distinguish between algorithms and models: models are built using algorithms and data. Most ML algorithms use static data in three steps to build models.
BML problems occur when the size of the data set exceeds the amount of available Random Access Memory (RAM). Possible solutions are
- optimization of the data types ("sparse representations"),
- use of a partial data set ("out-of-core learning"), i.e., dividing the data into blocks or mini-batches, see Spark MLlib² or Dask,³ and
- use of highly simplified models.

In these solutions, the data is fitted to the model rather than the model to the data. Therefore, the full potential of online data is not used.
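To illustrate out-of-core learning, the following minimal sketch trains a linear model block by block via partial_fit in scikit-learn, so that only one mini-batch resides in RAM at a time. It assumes NumPy and a recent scikit-learn are installed; the mini_batches generator is a hypothetical stand-in for reading blocks from disk or a database.

import numpy as np
from sklearn.linear_model import SGDClassifier

def mini_batches(n_batches=100, batch_size=1_000, n_features=10, seed=0):
    """Hypothetical stand-in for reading data blocks from disk or a database."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=n_features)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = (X @ w_true > 0).astype(int)
        yield X, y

clf = SGDClassifier(loss="log_loss")  # linear model trained by SGD ("log" in older scikit-learn)
classes = np.array([0, 1])            # all classes must be declared for incremental fitting
for X_batch, y_batch in mini_batches():
    clf.partial_fit(X_batch, y_batch, classes=classes)  # only one block in memory at a time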
1.2.2 Drift
In general, structural changes ("drift") in the data cause problems for ML algorithms.⁴ For example, in energy consumption forecasting, previously known consumption levels are only one element needed for modeling. In practice, future demand is driven by a number of non-stationary forces such as climate variability, population growth, or the introduction of disruptive clean energy technologies. These may require both gradual and sudden domain adjustments.
2 https://spark.apache.org/mllib/.
3 https://examples.dask.org/machine-learning.html.
4 This section describes the different types of drift. The OML algorithms for drift detection and handling are discussed in Chap. 3.
Drift causes problems for ML models because models can become outdated: they become unreliable over time because the relationships they capture are no longer valid. This leads to a decrease in the performance of these models. Therefore, prediction, classification, regression, or anomaly detection approaches should be able to detect and respond to concept deviations in a timely manner so that the model can be updated as soon as possible.
In time series applications in many fields, such as finance, e-commerce, economics, and healthcare, the statistical properties of the time series may change, rendering forecasting models useless. Although the concept drift problem has been well studied in the literature, surprisingly little effort has been made to solve it.
We can distinguish three types of drift: feature, label, and concept drift.
In the following, (X, y) denotes a sample, where X is a set of features and y is the target variable. Features can be derived from attributes. Attributes are also referred to as independent variables, and target variables correspondingly as dependent variables. In classification problems, the target variable is a class label; in regression problems, it is the predicted value. Often y is not only determined by X but also by a set of unknown underlying conditions. This leads to the definition of the concept:

Definition 1.4 (Concept) A concept is a relationship between X and y given a set of unknown constraints (a context K).
Definition 1.5 (Feature Drift) Feature drift describes a change in the independent variable X.

A regulatory intervention is an example of feature drift: new laws can change consumer behavior (Auffarth, 2021; Castle et al., 2021).

Definition 1.6 (Label Drift) Label drift is a change in the target y.

An example of label drift is an increase in the average value of goods in retail.

Definition 1.7 (Concept Drift) Concept drift is a change in the concept, i.e., the relationship between the independent variables X and the target variable y.
ML models cannot observe the underlying conditions that determine a concept and therefore have to make an assumption about which relationship applies to each sample. This is difficult when conditions change, leading to a change in the concept, which is called concept drift. The synthetically generated Friedman-Drift data set provides a vivid example of concept drift.
Definition 1.8 (The Friedman-Drift Data Set) Each observation in the Friedman-Drift data set consists of ten features. Each feature value is drawn uniformly from the interval [0, 1]. Only the first six features, x_0 to x_5, are relevant. The dependent variable is defined by two functions, depending on whether drift is present:

$$f(x) = \begin{cases} 10\sin(\pi x_0 x_1) + 20(x_2 - 0.5)^2 + 10x_3 + 5x_4, & \text{if drift occurs,} \\ 10\sin(\pi x_3 x_5) + 20(x_1 - 0.5)^2 + 10x_0 + 5x_1, & \text{otherwise.} \end{cases}$$
Note the change in active variables, e.g., from x_0 to x_3, which implements the change in concept. The synthetically generated Friedman-Drift data set is used in Sect. 9.2.
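As a minimal sketch, the two concepts of the Friedman-Drift target function can be written directly in Python; NumPy is assumed, and the function name is ours:

import numpy as np

def friedman_drift(x, drift=False):
    """Target function of the Friedman-Drift data set; x holds ten features in [0, 1]."""
    if drift:
        return 10 * np.sin(np.pi * x[0] * x[1]) + 20 * (x[2] - 0.5) ** 2 + 10 * x[3] + 5 * x[4]
    return 10 * np.sin(np.pi * x[3] * x[5]) + 20 * (x[1] - 0.5) ** 2 + 10 * x[0] + 5 * x[1]

rng = np.random.default_rng(0)
x = rng.uniform(size=10)                                 # ten features, drawn uniformly from [0, 1]
print(friedman_drift(x), friedman_drift(x, drift=True))  # same input, two different concepts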
The example shown in Fig. 1.1 illustrates how a simple concept drift occurs by combining three data sets.⁵
The changes in data streams, or concept drift patterns, can be either gradual or abrupt. Abrupt changes in data streams, or abrupt concept drift, mean a sudden change in the properties of the data, for example a change in mean or a change in variance. It is important to recognize these changes because they have practical implications for applications in quality control, system monitoring, fault detection, and other areas.
If the changes in the distributions of the data in the data streams occur slowly but over a long period of time, this is referred to as gradual concept drift. Gradual concept drift is relatively difficult to detect. Figure 1.2 shows the difference between gradual and abrupt drift.
In recurrent concept drift, certain features of older data streams reappear after some time.
5 The example is based on the "Concept Drift" section in the River documentation, see https://riverml.xyz/dev/introduction/getting-started/concept-drift-detection/.
Fig. 1.1 Synthetically generated drift created by combining 1,000 data points each from three different distributions. For the first thousand data points, a normal distribution with mean μ_1 = 0.8 and standard deviation σ_1 = 0.05 was used. The second thousand data points use μ_2 = 0.4 and σ_2 = 0.02, and the last thousand use μ_3 = 0.6 and σ_3 = 0.1. The left panel shows the data over time; histograms of the three data sets are shown on the right
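Data of the kind underlying Fig. 1.1 can be reproduced in a few lines; this is a sketch assuming NumPy, and since the seed and the drawn samples are ours, a plot would match the figure only qualitatively:

import numpy as np

rng = np.random.default_rng(42)
stream = np.concatenate([
    rng.normal(0.8, 0.05, 1000),   # concept 1
    rng.normal(0.4, 0.02, 1000),   # abrupt drift: new mean and variance
    rng.normal(0.6, 0.10, 1000),   # second abrupt drift
])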
Fig. 1.2 Gradual and abrupt concept drift. The data were generated synthetically using the SEA synthetic data set drift generator described in Sect. 5.4.2. Concept changes occurring after 25,000, 50,000, and 75,000 steps are indicated by red vertical lines
To counteract drift, BML models must be re-trained, either at fixed intervals or based on event criteria (e.g., when new data arrives), to avoid performance degradation.⁶ Alternatively, training can be triggered on an as-needed basis, i.e., either based on performance monitoring of the models or based on change-detection methods.
Another problem for BML is that it cannot easily learn from new data that contains
unknown attributes. When unknown attributes appear in the data, the model must
learn from scratch with a new data set composed of the old data and the new data.
This is especially difficult in a situation where data with new attributes may occur
every week, day, hour, minute, or even every time a measurement is taken.
Recommender Systems
Features are generated from the attributes of the data when models are trained. Feature
generation can improve ML performance by generating new features that correlate
better with the target variable and are therefore easier to learn.
Definition 1.9 (Feature Generation) Feature generation describes the process of
creating new features from one or more attributes of the data set.
Feature Generation
In practical applications, some attributes are no longer available after some time,
e.g., because they have been overwritten or simply have been deleted. Thus, features
6 This approach is implemented in the context of "mini-batch machine learning", cf. Definition 1.11.
that were available recently may no longer be available at the current time. In general, the provision of all data at the same time and in the same place is not always possible (Auffarth, 2021). Table 1.1 summarizes the problems and solutions for BML.

Table 1.1 Problems and solutions for BML for streaming data

Problem                               BML solution                                    Disadvantages of solution
Memory requirements                   Optimization of data types, mini-batch          Performance degradation,
                                      learning, simplified models                     lower accuracy
Drift                                 Re-training                                     High effort
New, unknown data                     Re-training                                     High effort
Accessibility, availability of data   No general solution available
The challenges of processing data streams described in Sect. 1.2 led to the development of a class of methods known as incremental or online learning methods, whose development has been heavily promoted in recent years (Bifet et al., 2018; Losing et al., 2018). In particular, the development of the River⁷ framework has helped incremental learning gain popularity in recent years (Montiel et al., 2021).
7 https://riverml.xyz/.
The gradient descent method is a popular batch method for finding the minimum of a function (the so-called cost or objective function). For large data sets, a single update of the parameters takes a long time because the entire training data set is used for this purpose.
The Stochastic Gradient Descent (SGD) method is an iterative optimization method and can be considered a stochastic approximation of the gradient descent method. The gradient, which is calculated from the entire data set in the gradient descent method, is replaced by an estimate that uses only a randomly selected subset of the data set. The SGD algorithm is an example of an OML algorithm that updates the model parameters at each training observation.
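The following minimal sketch shows the single-observation SGD update for linear regression with squared error: each incoming pair (x, y) updates the weights immediately. It uses only NumPy; the variable names and the synthetic stream are ours:

import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])   # hidden ground-truth weights for the synthetic stream
w = np.zeros(3)                       # model parameters, updated one observation at a time
lr = 0.01                             # learning rate

for _ in range(10_000):               # the data stream: one observation per iteration
    x = rng.normal(size=3)
    y = w_true @ x + rng.normal(scale=0.1)
    y_hat = w @ x                     # predict first ...
    w -= lr * (y_hat - y) * x         # ... then update: gradient of (y_hat - y)^2 / 2

print(w)                              # close to w_true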
References
Auffarth, B. (2021). Machine learning for time-series with Python: Forecast, predict, and detect. Packt.
Bifet, A., et al. (2018). Machine learning for data streams with practical examples in MOA. MIT
Press.
Castle, S., Schwarzenberg, R., & Pourvali, M. (2021). Detecting covariate drift with explanations. Natural Language Processing and Chinese Computing: 10th CCF International Conference, NLPCC 2021, Qingdao, China, October 13–17, 2021, Proceedings, Part II (pp. 317–322). Springer.
Korstanje, J. (2022). Machine learning for streaming data with Python. Packt.
Losing, V., Hammer, B., & Wersing, H. (2018). Incremental on-line learning: A review and comparison of state of the art algorithms. Neurocomputing, 275, 1261–1274.
Montiel, J., et al. (2021). River: Machine learning for streaming data in Python. Journal of Machine
Learning Research, 22(1), 4945–4952. ISSN: 1532-4435.
Chapter 2
Supervised Learning: Classification and Regression
Thomas Bartz-Beielstein
2.1 Classification
In the area of OML classification, there are so-called “baseline algorithms” that are
briefly presented here, as they serve as building blocks for more complex OML
methods.
The Majority-Class classifier counts the occurrences of the individual classes and predicts the most frequent class for new instances. The No-Change classifier predicts the most recently observed class from the data stream. The Lazy Classifier stores some already observed instances together with their classes; a new instance is assigned the class of the nearest stored instance.
Examples of lazy classifiers are k-nearest neighbor algorithms (k-NN algorithms). In k-NN, the class membership is determined by a majority decision: the class that occurs most frequently in the neighborhood of the data point to be classified is chosen. k-NN is a "lazy classifier" because no training process is run; only the training data set is stored. Learning takes place only when a classification is made.
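A minimal sketch of the two simplest baselines in pure Python follows; the learn_one/predict_one naming mirrors the convention of OML libraries such as River, but the classes themselves are ours:

from collections import Counter

class MajorityClassClassifier:
    """Predicts the class observed most frequently so far."""
    def __init__(self):
        self.counts = Counter()
    def learn_one(self, y):
        self.counts[y] += 1
    def predict_one(self):
        return self.counts.most_common(1)[0][0] if self.counts else None

class NoChangeClassifier:
    """Predicts the most recently observed class."""
    def __init__(self):
        self.last = None
    def learn_one(self, y):
        self.last = y
    def predict_one(self):
        return self.last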
The Naive-Bayes classifier is based on Bayes' theorem (see Theorem A.1 in the appendix). It calculates the probabilities of the individual classes based on the
attributes and then selects the class with the highest probability. Since the Naive-
Bayes classifier is a simple and inexpensive incremental method, it is briefly presented
here. In addition, its elements play an important role in the creation of Hoeffding
trees, which are presented in Sect. 2.1.3.1.
Naive-Bayes Classifier

We assume that there are k discrete attributes x_1, x_2, ..., x_k and n_c different classes. In the following, v_j denotes the value of an attribute and c the class to which an observation belongs. The information from the training data is summarized in a table that stores a counter n_{i,j,c} for each triple (x_i, v_j, c). For example, if the observations shown in Table 2.1 are available and a new observation B with the values

(x_1 = 1, x_2 = 1, x_3 = 1, x_4 = 0)

is to be classified, the class probabilities are calculated using Bayes' theorem. For the two classes "0" and "1", we obtain Table 2.2, the table of absolute frequencies. The Laplace correction is applied to calculate the frequencies for the classes that do not occur in the training data. The Laplace correction results from n_{i,j,c} + 1, i.e., the frequency for each class c is increased by 1.
After applying the Laplace correction, we obtain the values shown in Table 2.3, with which we can calculate the probabilities P(B | Y = 0) and P(B | Y = 1). This yields P(Y = 1 | B) = 1/2 × 3/64 = 3/128. Since P(Y = 1 | B) > P(Y = 0 | B), the Naive-Bayes classifier selects the class "1" for the new observation B.
The table entries shown in Table 2.2 play an important role as statistics in trees, see Definition 2.1. They can be represented as triples (x_i, v_j, c) with values n_{i,j,c}. For the first entry (x_i = x_1, v_j = 0, c = 0) in Table 2.2b, we obtain n_{1,0,0} = 1. For the last entry (x_i = x_4, v_j = 1, c = 1) in Table 2.2b, we obtain n_{4,1,1} = 2.
Prominent in the area of OML classification are tree-based methods (so-called "trees"), such as Hoeffding Trees (HTs) and Hoeffding Adaptive Trees (HATs). In addition to the tree-based methods presented in Sect. 2.1.3, we also present more specific methods such as Support Vector Machines (SVMs) and Passive-Aggressive (PA) classifiers in Sect. 2.1.4.
Fig. 2.1 Tree. Classification of the SEA data set. The root of the tree is a node where the first test of the attribute x_1 takes place: it is tested whether x_1 is greater or less than 4.5455. The branches represent the results of the test; they lead to further nodes until the final nodes, or leaves, are reached. The leaves are the predictions for the classes Y = 0 and Y = 1. The color scale symbolizes the relative class frequencies in the nodes: from dark blue for high probability "false" through light blue and light orange to dark orange for high probability "true"
2.1.3.1 Hoeffding-Trees
A Batch Machine Learning (BML) tree uses instances multiple times to calculate
the best split attributes (“splits”). Therefore, the use of BML decision tree methods
such as Classification And Regression Tree (CART) (Breiman, 1984) is not possible
in a streaming-data context. Hoeffding trees are the OML counterpart to the BML
trees (Domingos & Hulten, 2000). However, they do not use the instances multiple
times, but work directly on the incoming instances. They thus fulfill the first axiom
for stream learning (Definition 1.12).
Hoeffding trees are better suited for OML as incremental decision tree learners.
They are based on the idea that a small sample is often sufficient to select an optimal
split attribute. This idea is supported by the statistical result known as the Hoeffding
bound, see Theorem A.2 in the appendix. This result can be simplified as shown in
Example 2.1.3.1:
An urn contains a very large number of red and black balls. We want to answer the
question of whether the urn contains more black or more red balls. To do this, we
draw a ball from the urn and observe its color, where the process can be repeated as
often as desired.
After the process has been carried out ten times, we have obtained 4 red and 6 black balls; after one hundred attempts, 47 red and 53 black balls; after one thousand attempts, 501 red and 499 black balls. We can now (with a small residual uncertainty) say that the urn contains the same number of black and red balls, without having to draw all the balls. The probability that we are wrong is very low.
The Hoeffding bound depends on the number of observations and the permitted uncertainty. This uncertainty can be determined at the beginning using a confidence bound s.
Definition 2.1 (Hoeffding Tree (HT)) The Hoeffding tree stores statistics S in each node to perform a split. For discrete attributes, this is the same information that is also used by the Naive-Bayes predictor: for each triple (x_i, v_j, c), a table with the counter n_{i,j,c} of the instances with x_i = v_j and class C = c is maintained.

The HT algorithm uses two input parameters: the data stream D with labeled examples and a confidence bound s. The following pseudocode describes the HT algorithm according to Bifet et al. (2018):
HoeffdingTree(D, s)
  let HT be a tree with a single leaf (the root)
  initialize the counts n_{i,j,c} at the root
  for each example (x, y) in D
    do HTGrow((x, y), HT, s)
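In Python, a Hoeffding tree is available in the River package; the following minimal usage sketch trains and evaluates the model one instance at a time in the usual test-then-train fashion. It assumes River is installed; exact constructor arguments vary between River versions, so defaults are used here, and the Phishing data set is simply one of River's small built-in binary streams.

from river import datasets, metrics, tree

model = tree.HoeffdingTreeClassifier()   # Hoeffding tree with default settings
metric = metrics.Accuracy()

for x, y in datasets.Phishing():         # a small built-in binary data stream
    y_pred = model.predict_one(x)        # test-then-train: predict first ...
    if y_pred is not None:               # no prediction before the first update
        metric.update(y, y_pred)
    model.learn_one(x, y)                # ... then update the tree with one instance

print(metric)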