Learning Automata and Their Applications
to Intelligent Systems
IEEE Press
445 Hoes Lane
Piscataway, NJ 08854
JunQi Zhang
Tongji University, Shanghai, China
MengChu Zhou
New Jersey Institute of Technology, New Jersey, USA
Copyright © 2024 by The Institute of Electrical and Electronics Engineers, Inc.
All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise,
except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
either the prior written permission of the Publisher, or authorization through payment of the
appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers,
MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to
the Publisher for permission should be addressed to the Permissions Department, John Wiley &
Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
http://www.wiley.com/go/permission.
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley &
Sons, Inc. and/or its affiliates in the United States and other countries and may not be used
without written permission. All other trademarks are the property of their respective owners.
John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
efforts in preparing this book, they make no representations or warranties with respect to the
accuracy or completeness of the contents of this book and specifically disclaim any implied
warranties of merchantability or fitness for a particular purpose. No warranty may be created or
extended by sales representatives or written sales materials. The advice and strategies contained
herein may not be suitable for your situation. You should consult with a professional where
appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
commercial damages, including but not limited to special, incidental, consequential, or other
damages.
For general information on our other products and services or for technical support, please
contact our Customer Care Department within the United States at (800) 762-2974, outside the
United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in
print may not be available in electronic formats. For more information about Wiley products,
visit our web site at www.wiley.com.
Contents
1 Introduction 1
1.1 Ranking and Selection in Noisy Optimization 2
1.2 Learning Automata and Ordinal Optimization 5
1.3 Exercises 7
References 7
2 Learning Automata 9
2.1 Environment and Automaton 9
2.1.1 Environment 9
2.1.2 Automaton 10
2.1.3 Deterministic and Stochastic Automata 11
2.1.4 Measured Norms 15
2.2 Fixed Structure Learning Automata 16
2.2.1 Tsetlin Learning Automaton 16
2.2.2 Krinsky Learning Automaton 18
2.2.3 Krylov Learning Automaton 19
2.2.4 IJA Learning Automaton 20
2.3 Variable Structure Learning Automata 21
2.3.1 Estimator-Free Learning Automaton 22
2.3.2 Deterministic Estimator Learning Automaton 24
2.3.3 Stochastic Estimator Learning Automaton 26
2.4 Summary 27
2.5 Exercises 28
References 29
Index 249
JunQi Zhang received the PhD degree in computing science from Fudan
University, Shanghai, China, in 2007. He became a post-doctoral research
fellow and a lecturer in the Key Laboratory of Machine Perception, Ministry
of Education in Computer Science, Peking University, Beijing, China, in 2007.
He is currently a full professor in the Department of Computer Science and
Technology, Tongji University, Shanghai. His current research interests include
intelligent and learning automata, reinforcement learning, particle swarm optimization, fireworks algorithms, big data and high-dimensional indexing, and multimedia data management. He has authored 20+ IEEE Transactions papers and 60+ conference papers in the above areas. Prof. Zhang was a recipient of the
Outstanding Post-Doctoral Award from Peking University.
MengChu Zhou received his BS degree in control engineering from the Nanjing
University of Science and Technology, China, in 1983; MS degree in automatic
control from the Beijing Institute of Technology, China, in 1986; and PhD degree
in computer and systems engineering from Rensselaer Polytechnic Institute,
USA, in 1990. He then joined the New Jersey Institute of Technology in 1990
and has been a distinguished professor since 2013. His interests are in intelligent
automation, robotics, Petri nets, and AI. He has over 1100 publications including
14 books, 600+ IEEE Transactions papers, and 31 patents. He is a recipient
of Humboldt Research Award for US Senior Scientists from Alexander von
Humboldt Foundation; Franklin V. Taylor Memorial Award and Norbert Wiener
Award from IEEE Systems, Man, and Cybernetics Society; and Edison Patent
Award from the Research & Development Council of New Jersey. He is a fellow
of IEEE, IFAC, AAAS, CAA, and NAI.
Preface
Stochastic ranking and selection aim to design statistical procedures that select
a candidate with the highest mean performance from a finite set of candidates
whose performance is uncertain but may be estimated by a learning process based
on interactions with a stochastic environment or by simulations. The number of
interactions with environments or simulations of candidates is usually limited
and needs to be minimized due to limited computational resources. Traditional
approaches taken in the literature include frequentist statistics, Bayesian statis-
tics, heuristics, and asymptotic convergence in probability. Novel and recent
approaches to stochastic ranking and selection problems are learning automata
and ordinal optimization.
Surprisingly, no existing book introduces or studies learning automata and ordinal optimization together, as if the two techniques had no relevance to each other. A learning automaton is a powerful tool for reinforcement learning and aims at learning the optimal action, i.e., the one that maximizes the probability of being rewarded out of a set of allowable actions, through interaction with a stochastic environment. An update scheme for its action probability vector is critical for a learning automaton. The action probability vector plays two roles: (i) deciding when the automaton converges, which largely determines the total computing budget it uses, and (ii) allocating the computing budget among actions to identify the optimal one, for which only ordinal optimization is required. Ordinal optimization has emerged as an efficient technique for simulation optimization. Its underlying philosophy is to obtain a good estimate of the optimal action or design, while the accuracy of that estimate need not be high. In many practical situations, selecting an outstanding action or design is much faster if the goal is to find the best one rather than an accurate performance value of the best one. Therefore, learning automata and ordinal optimization share a common objective, and the latter can provide efficient methods to serve as the update scheme of the action probability vector of learning automata.
This book introduces and combines learning automata and ordinal optimiza-
tion to solve stochastic ranking and selection problems for the first time. This
book may serve as a reference for those in the field and as a means for those new
to the field for understanding and applying the main reinforcement learning and
intelligent optimization approaches to the problems of their interest. This book is both a research monograph for researchers and practitioners and a main reference for a first-year graduate course in the fields of business, engineering, management science, operations management, stochastic control, economics, and computer science.
Only basic knowledge of probability and an undergraduate background in the
above-mentioned majors are needed for understanding this book’s materials.
JunQi Zhang
Shanghai, China
MengChu Zhou
New Jersey, USA and Hangzhou, China
Acknowledgments
This book is written for senior students, graduate students, researchers, and prac-
titioners in relevant machine learning fields. This book can be a reference book
for those studying computer science/engineering, computational intelligence,
machine learning, artificial intelligence, systems engineering, industrial engi-
neering, and any field that deals with the optimal design and operation of
complex systems with noisy disturbances, and who wish to apply learning automata to their problem solving. The readers are assumed to have a background in computer
programming and discrete mathematics including set theory and predicate logic.
If readers have knowledge of formalization, optimization, and machine
learning, it should be easy for them to understand our presented algorithms and
problem formalizations. Readers may choose to learn the ideas and concepts
of learning automata. Readers may focus on the novel ideas and their related
algorithms that are useful for their research and particular applications. Readers who are not interested in ordinal optimization may skip it and focus on the learning automata. The following chart should help readers decide what they need to read based on what they want to learn from this book.
[Reading-guide flowchart: after Chapters 1–2 (introduction and learning automata), readers choose a path according to whether they are interested in ordinal optimization and in mathematical formalizations. Readers not interested in ordinal optimization proceed to Chapters 7–8 (applications and future directions); readers interested in ordinal optimization proceed to Chapters 5–8 (ordinal optimization, applications, and future directions), either in full or without the optimality analysis, depending on their interest in mathematical formalizations.]
To overview the contents, this book first introduces the basic concept of learning
automata in Chapter 2. Two pioneering and improved variants of learning
automata from the perspectives of convergence and computational cost, respec-
tively, are presented in Chapter 3. Two application-oriented learning automata
are given in Chapter 4. One discovers and tracks spatiotemporal event patterns.
The other solves the problem of stochastic search on a line. Ordinal optimization
is another method to solve stochastic ranking and selection problems and is intro-
duced in Chapter 5 through demonstrations of two pioneering variants of optimal
computing budget allocation (OCBA). Chapter 6 incorporates learning automata
with ordinal optimization to further improve their convergence performance.
The role of ordinal optimization is separated from the action probability vector
in learning automata. Then, as a pioneering ordinal optimization method, OCBA
is introduced into learning automata to allocate the computing budget to actions
in a way that maximizes the probability of selecting the true optimal action.
The differences and relationships between learning automata and ordinal opti-
mization can be well understood through this chapter. Chapter 7 demonstrates
an example to show how both learning automata and OCBA can be used in noisy
optimization. Finally, Chapter 8 summarizes the existing applications of learning
automata and suggests their future research directions.
1 Introduction
It is an expensive process to rank and select the best one from many complex
discrete event dynamic systems that are computationally intensive to simulate.
Therefore, learning the optimal system, action, alternative, candidate, or design
is a classical problem and has many applications in the areas of intelligent sys-
tem design, statistics and stochastic simulation, machine learning, and artificial
intelligence.
Stochastic ranking and selection of given system designs can be treated as a
simulation optimization problem. Its solution requires one to design statistical
procedures that can select the one with the highest mean performance from a
finite set of systems, alternatives, candidates, or designs. Their mean values are
unknown and can be estimated only by statistical sampling, because their reward
from environments is not deterministic but stochastic due to unknown noise. It is
a classical problem in the areas of statistics and stochastic simulation.
Example 1.1 Consider finding the most effective drug or treatment from many different alternatives, where each sample taken to test the effectiveness of a drug is economically expensive and risky. Another representative example is an archery competition: how can we select the real champion while using the least number of arrows?
Noise comes mainly from three kinds of uncertainties [1]. (i) Environmental uncertainties: operating temperature, pressure, humidity, changing material properties, drift, and so on are a common kind of uncertainty. (ii) Design parameter uncertainties: the design parameters of a product can be realized only to a certain degree of accuracy because high-precision machinery is expensive. (iii) Evaluation uncertainties: uncertainty arises in the evaluation of the system output and the system performance, including measuring errors and all kinds of approximation errors when models are used instead of the real physical objects.
Example 1.2 Ranking and selection are a critical part of particle swarm optimization (PSO). First, each particle has to compare the fitness of its new position with that of its previous best and retain the better one. Second, the overall best solution found so far has to be determined as the global best solution that leads the swarm's flight and refines the solution accuracy. Since
PSO has a memory that stores the estimated personal best solution of a particle
and the estimated global best solution of a swarm, noise leads such memory to
be inaccurate over iterations and particles eventually fail to rank and select good
solutions from bad ones, which drives the particle swarm toward a wrong direc-
tion. Concretely, a noisy fitness function induces noisy fitness evaluations and
causes two types of undesirable selection behavior [3]: (i) A superior candidate
may be erroneously believed to be inferior, causing it to be eliminated; (ii) An infe-
rior candidate may be erroneously believed to be superior, causing it to survive
and reproduce. These behaviors in turn cause the following undesirable effects:
(i) The learning rate is reduced. (ii) The system does not retain what it has learnt.
(iii) Exploitation is limited. (iv) Fitness does not monotonically improve with gen-
eration even with elitism.
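To make these selection errors concrete, here is a minimal Python sketch (an illustration with hypothetical fitness values, not code from the book) that estimates how often an inferior candidate appears better than a superior one when both are evaluated once through a fitness function corrupted by additive Gaussian noise.

import random

def noisy_fitness(true_value, sigma):
    # A single evaluation corrupted by additive Gaussian noise.
    return true_value + random.gauss(0.0, sigma)

def wrong_selection_rate(f_superior, f_inferior, sigma, trials=100000):
    # Fraction of trials in which the inferior candidate (larger true cost,
    # assuming minimization) looks better than the superior one.
    wrong = 0
    for _ in range(trials):
        if noisy_fitness(f_inferior, sigma) < noisy_fitness(f_superior, sigma):
            wrong += 1
    return wrong / trials

if __name__ == "__main__":
    # Hypothetical true costs: 1.0 is genuinely better than 1.2.
    for sigma in (0.0, 0.1, 0.5, 1.0):
        rate = wrong_selection_rate(1.0, 1.2, sigma)
        print("sigma = %.1f: inferior candidate wins %.1f%% of comparisons"
              % (sigma, 100 * rate))

As 𝜎 grows, a single noisy comparison approaches a coin flip, which is exactly the memory corruption of personal and global bests described above.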
Figure 1.1 3-D map of a Sphere function with additive and multiplicative noise. (a) True. (b) Corrupted by additive noise with 𝜎 = 0.1. (c) Corrupted by multiplicative noise with 𝜎 = 0.3.
Figure 1.2 3-D map of a Different Powers function with additive and multiplicative noise. (a) True. (b) Corrupted by additive noise with 𝜎 = 0.1. (c) Corrupted by multiplicative noise with 𝜎 = 0.3.
Additive and multiplicative noise are identical in the environment where 𝜎 = 0. Multiplicative noise is much more challenging, since it can bring much larger disturbance to fitness values than additive noise.
Example 1.3 The 3-D maps of functions [5] with additive and multiplicative
noise with different 𝜎 are illustrated in Figs. 1.1 and 1.2. The figures show that
the additional challenge of multiplicative noise over additive noise is a larger
corruption of the objective values whose magnitude changes across the search
space proportionally to the objective values of the solutions. For multiplicative
noise, its impact depends on the optimization objective and the range of objective
space. Specifically, on minimization problems whose objective space is only
positive, the objective values of better solutions (whose fitness value is small) can
be less affected by multiplicative noise. Conversely, in maximization problems,
the objective values of better solutions can be more affected [2].
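As a rough illustration of the two noise models in this example (the book's exact noise equations are not reproduced above, so the forms below are common conventions and an assumption on our part), additive noise adds a zero-mean Gaussian term to the objective value, while multiplicative noise scales the objective value itself:

import random

def sphere(x):
    # True (noise-free) objective: sum of squares.
    return sum(xi * xi for xi in x)

def additive_noise(f_value, sigma):
    # f(x) + N(0, sigma): the disturbance magnitude does not depend on f(x).
    return f_value + random.gauss(0.0, sigma)

def multiplicative_noise(f_value, sigma):
    # f(x) * (1 + N(0, sigma)): the disturbance scales with f(x).
    return f_value * (1.0 + random.gauss(0.0, sigma))

if __name__ == "__main__":
    for point in ([1.0, 1.0], [50.0, 50.0]):
        f = sphere(point)
        print(point, "true:", round(f, 2),
              "additive:", round(additive_noise(f, 0.1), 2),
              "multiplicative:", round(multiplicative_noise(f, 0.3), 2))

Near the optimum of a positive minimization objective, the multiplicative disturbance shrinks with the objective value, whereas far from the optimum it can dwarf the additive disturbance, matching the behavior shown in Figs. 1.1 and 1.2.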
1.2 Learning Automata and Ordinal Optimization

Learning automata (LA) have been studied since the 1960s and were surveyed in 1974 [10]. These early models were referred to as deterministic and stochastic automata operating in random environments. Systems built with LA have been successfully employed in many difficult learning situations, and reinforcement learning represents a development closely related to the work on LA over the years. This has also led to the concept of LA being generalized in a number of directions in order to handle various learning problems.
An LA represents an important tool in the area of reinforcement learning
and aims at learning the optimal action that maximizes the probability of being
rewarded out of a set of allowable actions by the interaction with a random
environment. During a cycle, an automaton chooses an action and then receives a
stochastic response that can be either a reward or penalty from the environment.
The action probability vector of choosing the next action is then updated by
employing this response. The ability of learning how to choose the optimal action
endows LA with high adaptability to the environment. Various LAs and their
applications have been reviewed in survey papers [10], [11] and books [12–15].
Ordinal Optimization (OO) was introduced by Ho et al. [16]. There are two
basic ideas behind it: (i) Estimating the order among solutions is much easier
than estimating the absolute objective values of each solution. (ii) Softening the
optimization goal and accepting good enough solutions leads to an exponential
reduction in computational burden. The Optimal Computing Budget Allocation
(OCBA) [17–22] is a famous approach that uses an average-case analysis rather
than a worst-case bound of the indifference zone. It attempts to sequentially
maximize the probability that the best alternative can be correctly identified after
the next stage of sampling. Its procedure tends to require much less sampling
effort to achieve the same or better empirical performance for correct selection
than the procedures that are statistically more conservative.
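The following Python sketch conveys the flavor of an OCBA-style budget allocation. It uses the widely published one-step allocation ratios with made-up sample statistics; it is only an illustrative sketch, not the exact procedures developed later in Chapter 5.

import math

def ocba_allocation(means, stds, total_budget):
    # One-shot allocation in the spirit of OCBA (maximization): designs that
    # are close to the current best and noisy receive more of the budget.
    k = len(means)
    b = max(range(k), key=lambda i: means[i])            # current best design
    ref = next(i for i in range(k) if i != b)            # reference non-best design
    ratio = [0.0] * k
    for i in range(k):
        if i == b:
            continue
        # N_i / N_ref = (std_i / delta_i)^2 / (std_ref / delta_ref)^2
        ratio[i] = ((stds[i] / (means[b] - means[i])) ** 2 /
                    (stds[ref] / (means[b] - means[ref])) ** 2)
    # N_b = std_b * sqrt( sum_{i != b} (N_i / std_i)^2 )
    ratio[b] = stds[b] * math.sqrt(sum((ratio[i] / stds[i]) ** 2
                                       for i in range(k) if i != b))
    total = sum(ratio)
    return [total_budget * r / total for r in ratio]

if __name__ == "__main__":
    # Hypothetical sample statistics for five designs.
    means = [1.2, 2.5, 2.4, 0.8, 1.9]
    stds = [1.0, 1.0, 1.5, 0.8, 1.2]
    print([round(n, 1) for n in ocba_allocation(means, stds, total_budget=100)])

Designs whose sample means are close to the current best and whose observations are noisy receive most of the additional budget, while clearly inferior designs receive little.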
Example 1.4 This example shows the objective of ordinal optimization. A system works with one of 10 components, each with its own time-to-failure. The objective is to decide which component is the worst one, in order to minimize steady-state system unavailability under budget constraints. We have the budget to perform only 50 component tests, each of which reveals whether the tested component leads to a failure. The objective of ordinal optimization is to decide how to allocate this budget among the 10 components so as to identify the worst one.
The next example shows why softening the optimization goal reduces the computational burden.

Example 1.5 This example shows why softening the optimization goal and accepting good enough solutions leads to an exponential reduction in computational burden. A class consists of 50 students, and the objective is to find the tallest one. We do not have to measure every student's height accurately; a comparison-based procedure, such as the comparisons performed in bubble sort, is enough to identify the tallest student. In this way, we do not need all students' accurate heights. Furthermore, the order of the students other than the tallest one is not important and can be inaccurate. An ordinal optimization method can likewise be used to identify the tallest student.
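A minimal sketch of this idea in Python: identifying the maximum needs only pairwise comparisons, an ordinal operation, and never an accurate measurement of every student (the data below are hypothetical).

def tallest(students, taller_than):
    # Ordinal selection: keep the winner of each pairwise comparison.
    # taller_than(a, b) only answers "is a taller than b?"; no accurate
    # height value is ever required.
    best = students[0]
    for s in students[1:]:
        if taller_than(s, best):
            best = s
    return best

if __name__ == "__main__":
    heights = {"s1": 171, "s2": 183, "s3": 178}      # hypothetical data
    print(tallest(list(heights), lambda a, b: heights[a] > heights[b]))

Only n − 1 comparisons are needed, and the order among the remaining students is never resolved.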
1.3 Exercises
1 What is the objective of stochastic ranking and selection?
6 For additive and multiplicative noise, which one is more difficult to handle
and why?
7 In Example 1.5, how can one find the tallest student without accurate height data?
References
2 Learning Automata

2.1 Environment and Automaton
2.1.1 Environment
An LA consists of two parts: an automaton and an environment. The latter shown
in Fig. 2.1 is defined mathematically as a triple ⟨A, B, C⟩, which can be explained
as follows.
1) A = {𝛼1 , 𝛼2 , ..., 𝛼r } is a set of actions (r ≥ 2). The action selected at instant t is
denoted by 𝛼(t).
2) B = {𝛽1 , 𝛽2 , ..., 𝛽u } is the output set of possible environmental responses. The
environmental response at instant t is denoted by 𝛽(t). To simplify our dis-
cussions, let B = {𝛽1 , 𝛽2 } = {0, 1}. “1” and “0” denote the reward and penalty
responses, respectively.
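In the standard P-model, the third element C of the triple collects the reward (or penalty) probabilities of the actions. The following minimal Python sketch assumes that reading: action i is rewarded (𝛽 = 1) with a fixed, unknown probability and penalized (𝛽 = 0) otherwise; the probabilities used are hypothetical.

import random

class PModelEnvironment:
    # Minimal P-model environment: action i is rewarded (beta = 1) with
    # probability reward_probs[i] and penalized (beta = 0) otherwise.
    def __init__(self, reward_probs):
        self.reward_probs = reward_probs

    def respond(self, action_index):
        return 1 if random.random() < self.reward_probs[action_index] else 0

if __name__ == "__main__":
    env = PModelEnvironment([0.2, 0.7, 0.5])    # hypothetical reward probabilities
    print([env.respond(1) for _ in range(10)])  # mostly rewards for action 1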
2.1.2 Automaton
An automaton shown in Fig. 2.2 can be described by a quintuple ⟨A, B, Q, T, G⟩.
1) A = {𝛼1 , 𝛼2 , ..., 𝛼r }, 2 ≤ r < ∞ is the set of outputs or actions of an automaton.
The action selected at instant t is denoted by 𝛼(t).
2) B = {𝛽1 , 𝛽2 ..., 𝛽u }, as an environmental response, is an input set of an automa-
ton. At instant t, it is denoted as 𝛽(t). B could be infinite or finite.
3) Q = {q1 , q2 , ..., qv } is the set of states of an automaton. At instant t, the state is denoted by q(t).
4) T ∶ Q × B → Q is the state transfer function of an automaton. T determines how an automaton migrates to the state at instant t + 1 according to the output, the input, and the state at instant t.
5) G ∶ Q → A is an output function, which determines how an automaton pro-
duces output based on the state.
[Figure 2.2: an automaton characterized by its state transfer function T ∶ Q × B → Q and its output function G ∶ Q → A.]
In the above definitions, if A, B, and Q are all finite sets, the automaton is said
to be finite.
Example 2.1 Consider an automaton with two inputs, i.e., B = {0, 1}, two
outputs, i.e., A = {𝛼1 , 𝛼2 }, and four states, i.e., Q = {q1 , q2 , q3 , q4 }. A deterministic
state transition graph and deterministic output graph are shown in Figs. 2.3
and 2.4, respectively, where each hollow point represents a state, each link
represents a state transition, and the arrow on the link indicates the direction of state transition.

[Figures 2.3 and 2.4: the deterministic state transition graphs for 𝛽 = 0 and 𝛽 = 1 over states q1 –q4 , and the deterministic output graph mapping states q1 –q4 to actions 𝛼1 and 𝛼2 .]

We can also depict the corresponding deterministic state transition function T and the deterministic output function G with a matrix set and a matrix, respectively. Assume that the entries 𝜏ij^0 and 𝜏ij^1 are defined as follows:

𝜏ij^𝛽 = 1 if the automaton moves from qi to qj under input 𝛽, and 𝜏ij^𝛽 = 0 otherwise.
[Figures 2.5 and 2.6: the stochastic state transition graphs for 𝛽 = 0 and 𝛽 = 1 over states q1 –q4 with transition probabilities 0.2 and 0.8, and the stochastic output graph giving the probabilities with which each state q1 –q4 selects actions 𝛼1 and 𝛼2 .]
A stochastic state transition function T can be described by the matrix set {T(𝛽0 ), T(𝛽1 )}. Each of them is a v × v matrix associated with an input 𝛽. The elements of T are defined as follows:

𝜏ij^𝛽 = Pr{q(t + 1) = qj | q(t) = qi , 𝛽(t) = 𝛽}, i, j ∈ ℤv , 𝛽 ∈ B.

According to Figs. 2.5 and 2.6, the corresponding stochastic state transition function T and the stochastic output function G can also be depicted with a matrix set and a matrix, respectively.
The stochastic state transition function T can be identified as

T(0) =
        q1    q2    q3    q4
  q1 [ 0.8   0.2   0     0   ]
  q2 [ 0.8   0     0.2   0   ]
  q3 [ 0     0.2   0     0.8 ]
  q4 [ 0     0     0.2   0.8 ]

T(1) =
        q1    q2    q3    q4
  q1 [ 0.2   0.8   0     0   ]
  q2 [ 0.2   0     0.8   0   ]
  q3 [ 0     0.8   0     0.2 ]
  q4 [ 0     0     0.8   0.2 ]
The stochastic output function G can be identified as

G =
        𝛼1    𝛼2
  q1 [ 0.3   0.7 ]
  q2 [ 0     1   ]
  q3 [ 1     0   ]
  q4 [ 0.8   0.2 ]
Assume that we have state q1 and input 𝛽 = 0 in the present instant. We have
the transition probability to obtain the next state and may choose the next action
as follows:
𝜏11^0 = Pr{q(t + 1) = q1 | q(t) = q1 , 𝛽(t) = 0} = 0.8
𝜏12^0 = Pr{q(t + 1) = q2 | q(t) = q1 , 𝛽(t) = 0} = 0.2
𝜏13^0 = Pr{q(t + 1) = q3 | q(t) = q1 , 𝛽(t) = 0} = 0
𝜏14^0 = Pr{q(t + 1) = q4 | q(t) = q1 , 𝛽(t) = 0} = 0

and

g11 = Pr{𝛼(t) = 𝛼1 | q(t) = q1 } = 0.3
g12 = Pr{𝛼(t) = 𝛼2 | q(t) = q1 } = 0.7
Apparently, in state q1 and input 𝛽 = 0, we have a chance to access different
states and to choose different actions.
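A short Python simulation of Example 2.1 makes this concrete: at each instant the action is sampled from the row of G for the current state, and the next state is sampled from the row of T(𝛽) selected by the environmental response. The sequence of responses below is chosen arbitrarily for illustration.

import random

# Stochastic state transition matrices T(beta) and output matrix G from Example 2.1.
T = {
    0: [[0.8, 0.2, 0.0, 0.0],
        [0.8, 0.0, 0.2, 0.0],
        [0.0, 0.2, 0.0, 0.8],
        [0.0, 0.0, 0.2, 0.8]],
    1: [[0.2, 0.8, 0.0, 0.0],
        [0.2, 0.0, 0.8, 0.0],
        [0.0, 0.8, 0.0, 0.2],
        [0.0, 0.0, 0.8, 0.2]],
}
G = [[0.3, 0.7],
     [0.0, 1.0],
     [1.0, 0.0],
     [0.8, 0.2]]

def sample(probabilities):
    # Draw an index according to a discrete probability distribution.
    return random.choices(range(len(probabilities)), weights=probabilities)[0]

if __name__ == "__main__":
    state = 0                              # start in q1 (0-based index)
    for beta in (0, 1, 0, 1, 1):           # arbitrary environmental responses
        action = sample(G[state])          # output alpha(t) from the current state
        state = sample(T[beta][state])     # move to q(t+1) given the response beta(t)
        print("action alpha%d, response %d, next state q%d"
              % (action + 1, beta, state + 1))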
Thereby, for the pure chance automaton, the penalty mean is:

𝛷̄ 0 = (1∕r) Σ_{i=1}^{r} ci .    (2.4)

An LA is expedient if it meets:

lim_{t→∞} E{𝛷̄ (t)} = lim_{t→∞} E{E{𝛽(t) | P(t)}} = lim_{t→∞} E{𝛽(t)} < 𝛷̄ 0 = (1∕r) Σ_{i=1}^{r} ci .    (2.5)

It is absolutely expedient if it meets:

E{𝛷̄ (t + 1) | P(t)} < 𝛷̄ (t).    (2.6)
In fact, the optimality condition mentioned above is so strict that many LAs cannot satisfy it. Therefore, a slightly weaker definition called 𝜀-optimality is proposed. An LA is 𝜀-optimal if, for each 𝜀 > 0 and 𝛿 > 0, there exist an instant t0 < ∞ and a learning parameter 𝜆0 > 0 such that for all t ≥ t0 and 𝜆 < 𝜆0 :

Pr{| pmin (t) − 1| < 𝜀} > 1 − 𝛿.    (2.9)

In the sense of the penalty mean, this definition can also be written as: for any 𝜀 > 0,

lim_{t→∞} E{𝛷̄ (t)} < č + 𝜀.    (2.10)
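To see what expediency means operationally, the sketch below compares the simulated penalty rate of a simple learner with the pure-chance penalty mean 𝛷̄ 0 = (1∕r) Σ ci . The learner uses a continuous linear reward–inaction update, a standard scheme used here only for illustration; the book's own update schemes are introduced in Section 2.3, and the penalty probabilities are hypothetical.

import random

def pure_chance_penalty_mean(penalty_probs):
    # Equation (2.4): average penalty probability under uniform random choice.
    return sum(penalty_probs) / len(penalty_probs)

def simulate_lri(penalty_probs, steps=20000, lam=0.01):
    # Continuous linear reward-inaction (L_RI) learner, used here only to
    # illustrate expediency; detailed update schemes appear in Section 2.3.
    r = len(penalty_probs)
    p = [1.0 / r] * r
    penalties = 0
    for _ in range(steps):
        a = random.choices(range(r), weights=p)[0]
        penalized = random.random() < penalty_probs[a]
        penalties += penalized
        if not penalized:                  # reward: move p towards action a
            p = [(1 - lam) * pj for pj in p]
            p[a] += lam
    return penalties / steps

if __name__ == "__main__":
    c = [0.2, 0.6, 0.7]                    # hypothetical penalty probabilities
    print("pure chance penalty mean:", pure_chance_penalty_mean(c))
    print("L_RI simulated penalty  :", simulate_lri(c))

An expedient LA should report a long-run penalty rate below the pure-chance value.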
2.2 Fixed Structure Learning Automata

2.2.1 Tsetlin Learning Automaton

In a Tsetlin LA, there are two actions and 2ŝ states, with ŝ states associated with each action. This LA is denoted as L̄ 2ŝ,2 . It is easy to expand it to r actions with r ⋅ ŝ states.

The output function of a Tsetlin LA is relatively simple. When it is in state q(t) = qi , i ∈ ℤŝ , the output is 𝛼1 . If it is in state q(t) = qi , i ∈ ℤ2ŝ − ℤŝ , the output is 𝛼2 . The output function can be defined as:

G(qi ) = 𝛼1 if i ∈ ℤŝ , and G(qi ) = 𝛼2 if i ∈ ℤ2ŝ − ℤŝ .

At the initial moment, a Tsetlin LA is randomly in state q2ŝ or qŝ without any a priori knowledge. After obtaining a feedback from the environment, the state transition is performed according to the state transition graph shown in Fig. 2.7. For example, when q(t) = qŝ , its output action is 𝛼1 . If the Tsetlin LA then receives a reward feedback from the environment, the state changes to q(t) = qŝ−1 ; on the contrary, if the feedback obtained from the environment is a penalty, the state is transferred to q(t) = q2ŝ .
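The following Python sketch follows the transition rules just described. Since Fig. 2.7 is not reproduced here, the state numbering (states 1 to ŝ for 𝛼1 with state 1 the deepest, states ŝ + 1 to 2ŝ for 𝛼2 with state ŝ + 1 the deepest, and qŝ , q2ŝ as the boundary states) is our assumption about the figure's convention, and the environment used is hypothetical.

import random

class TsetlinTwoActionLA:
    # Fixed-structure Tsetlin automaton L_{2s,2}: states 1..s output action 1
    # (state 1 is the deepest), states s+1..2s output action 2 (state s+1 is
    # the deepest); states s and 2s are the boundary states.
    def __init__(self, s):
        self.s = s
        self.state = random.choice([s, 2 * s])     # start in a boundary state

    def action(self):
        return 1 if self.state <= self.s else 2

    def update(self, rewarded):
        s, q = self.s, self.state
        if rewarded:
            # Reward: move one step deeper into the current action's half.
            self.state = max(q - 1, 1) if q <= s else max(q - 1, s + 1)
        else:
            # Penalty: move towards the boundary, and across it at qs or q2s.
            if q == s:
                self.state = 2 * s
            elif q == 2 * s:
                self.state = s
            else:
                self.state = q + 1

if __name__ == "__main__":
    random.seed(1)
    reward_probs = {1: 0.7, 2: 0.4}                # hypothetical environment
    la = TsetlinTwoActionLA(s=4)
    picks = []
    for _ in range(5000):
        a = la.action()
        picks.append(a)
        la.update(random.random() < reward_probs[a])
    print("fraction of time action 1 is chosen:",
          sum(1 for a in picks if a == 1) / len(picks))

With a penalty probability below 0.5 for one action, the automaton spends most of its time outputting that action, in line with the 𝜀-optimality condition stated after (2.13).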
The finite state irreducible Markov chain is ergodic. So is the LA. It has been
proved that the penalty mean of the Tsetlin LA [1] is:
𝛷̄ (L̄ 2ŝ,2 ) =
[ (1∕𝛽̌1^(ŝ−1)) ⋅ (𝛽̌1^ŝ − 𝛽̂1^ŝ)∕(𝛽̌1 − 𝛽̂1 ) + (1∕𝛽̌2^(ŝ−1)) ⋅ (𝛽̌2^ŝ − 𝛽̂2^ŝ)∕(𝛽̌2 − 𝛽̂2 ) ]
∕ [ (1∕𝛽̌1^ŝ) ⋅ (𝛽̌1^ŝ − 𝛽̂1^ŝ)∕(𝛽̌1 − 𝛽̂1 ) + (1∕𝛽̌2^ŝ) ⋅ (𝛽̌2^ŝ − 𝛽̂2^ŝ)∕(𝛽̌2 − 𝛽̂2 ) ],
with 𝛽̌1 + 𝛽̂1 = 1, 𝛽̌2 + 𝛽̂2 = 1.    (2.13)
In the environment min{𝛽̌1 , 𝛽̌2 } < 0.5, the Tsetlin LA has been proven to be
𝜀-optimal [5].
The Krinsky LA has been proven to be 𝜀-optimal [5] in any deterministic envi-
ronment.
Figure 2.10 The state transition graph of the IJA LA. (a) 𝛽̂1 represents the favorable response and (b) 𝛽̌2 represents the unfavorable one.
The limit of the penalty mean is limŝ→∞ 𝛷̄ (Ī 2ŝ,2 ) = min{𝛽̌1 , 𝛽̌2 }, which means IJA LA is 𝜀-optimal [3] in any deterministic environment.
2.3 Variable Structure Learning Automata

In Section 2.2, we have introduced the fixed structure stochastic automata in stationary random environments. Their state transition probabilities and action probabilities are fixed.
In a pioneering paper, Varshavskii and Vorontsova first presented automata that update transition probabilities [6]. Fu and his associates gave an extension that updates action probabilities [7–9]. In this book, the emphasis is on schemes for updating action probabilities.
Compared with the fixed structure stochastic automata, a variable structure learning automaton can change its state with the iteration count according to the environment, and it has the advantages of faster convergence and adaptive learning ability. Its quintuple ⟨A, B, Q, T, G⟩ model can be simplified into a quadruple ⟨A, B, Q, T⟩. The concepts of A, B, and T have the same definitions as introduced before. In variable structure LAs, Q = ⟨P, 𝔼⟩. Here, P represents the action probability vector, and 𝔼 is the estimator. Because the state is composed of two parts, the update of the state is also divided into two parts: updating P and updating 𝔼. The update formulas for the estimator and the probability vector are expressed as

𝔼(t + 1) = T𝔼 (𝔼(t), 𝛼(t), 𝛽(t)) and P(t + 1) = TP (P(t), 𝛼(t), 𝛽(t), 𝔼(t + 1)).
Therefore, the state transition function should also consist of two parts:
T = ⟨T𝔼 , TP ⟩.
From the updated formula of the probability vector, we know that {P(t)}t≥0 is
a discrete time homogeneous Markov process. According to whether the Markov
process is ergodic or absorbing, the update algorithm is referred to as an ergodic algorithm or an absorbing one [10–12]. If the update formula P(t + 1) = TP (P(t), 𝛼(t), 𝛽(t), 𝔼(t + 1)) is linear, such LAs are called linear variable structure LAs; otherwise, they are called nonlinear variable structure LAs.
In the update strategies, there are two basic principles for LA algorithms: when the output behavior is punished by the environment, the probability of this behavior should be reduced; if the output behavior is rewarded by the environment, the probability of this behavior should be increased. Different algorithms may apply these principles in different combinations, such as reward–penalty, reward–inaction, and inaction–penalty schemes.

Variable structure learning automata can be further divided into three categories: estimator-free, deterministic estimator, and stochastic estimator LAs.
2.3.1 Estimator-Free Learning Automaton

The following update schemes are described for two actions and can be extended to the case of multiple behaviors A = {𝛼1 , 𝛼2 , ..., 𝛼r }. For two actions, the probability vector P is two-dimensional, i.e., P(t) = [p1 (t), p2 (t)].
The probability vector update formulas of DLRP , DLRI , and DLIP are described as follows, where at any instant t, p1 (t) + p2 (t) = 1 and the probability changes in steps of 1∕𝛥.

DLRP :
if 0 < p1 (t) < 1,
  p1 (t + 1) = p1 (t) + 1∕𝛥 if 𝛼(t) = 𝛼1 and 𝛽t = 1
  p1 (t + 1) = p1 (t) − 1∕𝛥 if 𝛼(t) = 𝛼1 and 𝛽t = 0
  p1 (t + 1) = p1 (t) + 1∕𝛥 if 𝛼(t) = 𝛼2 and 𝛽t = 0
  p1 (t + 1) = p1 (t) − 1∕𝛥 if 𝛼(t) = 𝛼2 and 𝛽t = 1    (2.25)
if p1 (t) ∈ {0, 1},
  p1 (t + 1) = p1 (t) if 𝛽t = 1
  p1 (t + 1) = 1∕𝛥 if p1 (t) = 0 and 𝛽t = 0
  p1 (t + 1) = 1 − 1∕𝛥 if p1 (t) = 1 and 𝛽t = 0

DLRI :
  p1 (t + 1) = min{p1 (t) + 1∕𝛥, 1} if 𝛼(t) = 𝛼1 and 𝛽t = 1
  p1 (t + 1) = p1 (t) if 𝛼(t) = 𝛼1 and 𝛽t = 0
  p1 (t + 1) = max{p1 (t) − 1∕𝛥, 0} if 𝛼(t) = 𝛼2 and 𝛽t = 1
  p1 (t + 1) = p1 (t) if 𝛼(t) = 𝛼2 and 𝛽t = 0    (2.26)

DLIP :
  p1 (t + 1) = max{p1 (t) − 1∕𝛥, 0} if 𝛼(t) = 𝛼1 and 𝛽t = 0
  p1 (t + 1) = p1 (t) if 𝛼(t) = 𝛼1 and 𝛽t = 1
  p1 (t + 1) = min{p1 (t) + 1∕𝛥, 1} if 𝛼(t) = 𝛼2 and 𝛽t = 0
  p1 (t + 1) = p1 (t) if 𝛼(t) = 𝛼2 and 𝛽t = 1    (2.27)
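A minimal Python sketch of the DLRI rule (2.26) for two actions follows. The resolution parameter 𝛥 is treated as the number of discretization steps, so the probability moves in increments of 1∕𝛥; the environment's reward probabilities are hypothetical.

import random

def dlri_step(p1, action, beta, resolution):
    # One update of the two-action DLRI scheme (2.26): the probability of
    # action 1 changes in steps of 1/resolution, and only when the chosen
    # action is rewarded (beta = 1).
    step = 1.0 / resolution
    if beta == 1:
        if action == 1:
            p1 = min(p1 + step, 1.0)
        else:
            p1 = max(p1 - step, 0.0)
    return p1

if __name__ == "__main__":
    random.seed(0)
    reward_probs = {1: 0.8, 2: 0.4}       # hypothetical environment
    p1 = 0.5
    for _ in range(2000):
        action = 1 if random.random() < p1 else 2
        beta = 1 if random.random() < reward_probs[action] else 0
        p1 = dlri_step(p1, action, beta, resolution=100)
    print("p1 after learning:", p1)       # expected to approach 1.0

Because DLRI only moves the probability on rewards, p1 drifts toward 1 when 𝛼1 is the more frequently rewarded action.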
3) Q = ⟨P, 𝔼⟩ is the state set. P = {p1 (t), p2 (t), ..., pr (t)} is the state of the automaton at instant t, where pi (t) = Pr{𝛼(t) = 𝛼i }. 𝔼 is the estimator, and the deterministic estimator is defined as 𝔼 = D̃ (t). D̃ (t) = {d̃ 1 (t), d̃ 2 (t), ..., d̃ r (t)} is the estimator vector at instant t. The estimated reward for each behavior is:

d̃ i (t) = ℍi (t) ∕ 𝔾i (t),

where ℍi (t) represents the number of times that the ith action has been rewarded up to instant t, i ∈ ℤr , and 𝔾i (t) is the number of times that the ith action has been selected up to instant t, i ∈ ℤr .
4) T ∶ Q × B → Q is the state transfer function of the automaton. T determines
how the automaton migrates to the state at t + 1 according to the output, input
and the state at instant t.
In the estimator-free algorithm, the probability vector is updated based on the
selected behavior and feedback from the environment. In the estimator algorithm,
the estimator is updated according to the selected behavior and environment feed-
back information, and finally the information of the estimator is used to update
the probability vector. To illustrate the steps of an estimator algorithm more intuitively, the continuous pursuit reward–penalty algorithm is given next.

  s̃ i (t) = d̃ i (t) + Zi (t);
  s̃ m (t) = maxi {s̃ i (t)};
4: Update P(t) as follows:
  if 𝛽(t) = 1,
    pj (t + 1) = max{pj (t) − 𝛥, 0}, j ≠ m,
    pm (t + 1) = 1 − Σ j≠m pj (t + 1);
  else
    pj (t + 1) = pj (t), j ∈ ℤr ;
while (max{pj (t + 1)} ≤ T̂ )
END
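The fragment above can be read as a pursuit-style estimator algorithm: maintain reward estimates d̃ i = ℍi ∕𝔾i and, on every reward, move the probability vector toward the action with the largest (possibly perturbed) estimate. The Python sketch below follows that reading in simplified form; the perturbation Zi (t) is omitted, and the step size and threshold are arbitrary illustrative values.

import random

def pursuit_estimator_la(reward_probs, delta=0.01, threshold=0.999, max_iters=100000):
    # Simplified pursuit-style estimator LA for r actions: keep reward
    # estimates d_i = H_i / G_i and, on a reward, push the probability
    # vector towards the action with the largest estimate.
    r = len(reward_probs)
    p = [1.0 / r] * r
    rewarded = [0] * r        # H_i: times action i has been rewarded
    selected = [0] * r        # G_i: times action i has been selected
    m = 0
    for _ in range(max_iters):
        a = random.choices(range(r), weights=p)[0]
        beta = 1 if random.random() < reward_probs[a] else 0
        selected[a] += 1
        rewarded[a] += beta
        estimates = [rewarded[i] / selected[i] if selected[i] else 0.0
                     for i in range(r)]
        m = max(range(r), key=lambda i: estimates[i])   # estimated best action
        if beta == 1:
            p = [max(p[j] - delta, 0.0) for j in range(r)]
            p[m] = 1.0 - sum(p[j] for j in range(r) if j != m)
        if max(p) > threshold:                          # convergence test
            break
    return m, p

if __name__ == "__main__":
    random.seed(2)
    best, probs = pursuit_estimator_la([0.3, 0.5, 0.8, 0.6])  # hypothetical rewards
    print("selected action:", best + 1)
    print("final probabilities:", [round(x, 3) for x in probs])

The loop stops once some pj (t) exceeds the convergence threshold T̂ , and the action the probability vector has converged to is reported as the learned optimal action.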
2.4 Summary
In this chapter, we first introduce an LA and its components, namely automata
and environment, and how it works. Then, mathematical terms for describing its
nature are introduced. LAs can then be classified from several perspectives. In
terms of whether the state transition and output functions of an automaton are
deterministic or stochastic, LAs can be divided into deterministic and stochastic
automata. Compared with the former, the latter can adapt to environmental con-
ditions to change its states, converge faster, and achieve the purpose of adaptive
learning. These advantages make stochastic automata more widely studied and
used [20–24]. According to whether the state transition and output functions of
an automaton change with time, the stochastic automaton can be divided into
fixed and variable structure LAs. In terms of whether there is an estimator and
whether the estimator is stochastic, LAs with a variable structure can be divided
into estimator-free, deterministic estimator, and stochastic estimator LAs. This
chapter depicts the learning automata with these different characteristics in detail.
Figure 2.11 illustrates the LA classification.
[Figure 2.11: classification of learning automata into deterministic and stochastic automata; stochastic automata into fixed and variable structure LAs; and variable structure LAs into estimator-free, deterministic estimator, and stochastic estimator LAs.]
2.5 Exercises
1 What is the mathematical model of learning automata?
2 Please list several examples where learning automata have been applied.
5 Please express how an automaton interacts with the environment and draw
an interaction diagram.
8 The probability update formulas listed in (2.22)–(2.24) are for two actions. Please extend them to the case of r actions.
9 What is the difference between the fixed structure stochastic LA and the vari-
able structure stochastic LA?
10 Please draw the state transition graphs of Tsetlin LA, Krinsky LA, Krylov LA,
and IJA LA.
References