Learning Automata and Their Applications
to Intelligent Systems
IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

IEEE Press Editorial Board


Sarah Spurgeon, Editor in Chief

Jón Atli Benediktsson, Behzad Razavi, Jeffrey Reed, Anjan Bose, Jim Lyke,
Diomidis Spinellis, James Duncan, Hai Li, Adam Drobot, Amin Moeness,
Brian Johnson, Tom Robertazzi, Desineni Subbaram Naidu, Ahmet Murat Tekalp
Learning Automata and Their Applications
to Intelligent Systems

JunQi Zhang
Tongji University, Shanghai, China

MengChu Zhou
New Jersey Institute of Technology, New Jersey, USA
Copyright © 2024 by The Institute of Electrical and Electronics Engineers, Inc.
All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.


Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise,
except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
either the prior written permission of the Publisher, or authorization through payment of the
appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers,
MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to
the Publisher for permission should be addressed to the Permissions Department, John Wiley &
Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
http://www.wiley.com/go/permission.

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley &
Sons, Inc. and/or its affiliates in the United States and other countries and may not be used
without written permission. All other trademarks are the property of their respective owners.
John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
efforts in preparing this book, they make no representations or warranties with respect to the
accuracy or completeness of the contents of this book and specifically disclaim any implied
warranties of merchantability or fitness for a particular purpose. No warranty may be created or
extended by sales representatives or written sales materials. The advice and strategies contained
herein may not be suitable for your situation. You should consult with a professional where
appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other
commercial damages, including but not limited to special, incidental, consequential, or other
damages.

For general information on our other products and services or for technical support, please
contact our Customer Care Department within the United States at (800) 762-2974, outside the
United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in
print may not be available in electronic formats. For more information about Wiley products,
visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data Applied for:

Hardback ISBN: 9781394188499

Cover Design: Wiley


Cover Image: © Vertigo3d/Getty Images

Set in 9.5/12.5pt STIXTwoText by Straive, Chennai, India



Contents

About the Authors ix


Preface xi
Acknowledgments xiii
A Guide to Reading this Book xv
Organization of the Book xvii

1 Introduction 1
1.1 Ranking and Selection in Noisy Optimization 2
1.2 Learning Automata and Ordinal Optimization 5
1.3 Exercises 7
References 7

2 Learning Automata 9
2.1 Environment and Automaton 9
2.1.1 Environment 9
2.1.2 Automaton 10
2.1.3 Deterministic and Stochastic Automata 11
2.1.4 Measured Norms 15
2.2 Fixed Structure Learning Automata 16
2.2.1 Tsetlin Learning Automaton 16
2.2.2 Krinsky Learning Automaton 18
2.2.3 Krylov Learning Automaton 19
2.2.4 IJA Learning Automaton 20
2.3 Variable Structure Learning Automata 21
2.3.1 Estimator-Free Learning Automaton 22
2.3.2 Deterministic Estimator Learning Automaton 24
2.3.3 Stochastic Estimator Learning Automaton 26
2.4 Summary 27
2.5 Exercises 28
References 29

3 Fast Learning Automata 31


3.1 Last-position Elimination-based Learning Automata 31
3.1.1 Background and Motivation 32
3.1.2 Principles and Algorithm Design 35
3.1.3 Difference Analysis 37
3.1.4 Simulation Studies 40
3.1.5 Summary 45
3.2 Fast Discretized Pursuit Learning Automata 46
3.2.1 Background and Motivation 46
3.2.2 Algorithm Design of Fast Discretized Pursuit LAs 48
3.2.3 Optimality Analysis 54
3.2.4 Simulation Studies 59
3.2.5 Summary 63
3.3 Exercises 63
References 64

4 Application-Oriented Learning Automata 67


4.1 Discovering and Tracking Spatiotemporal Event Patterns 67
4.1.1 Background and Motivation 69
4.1.2 Spatiotemporal Pattern Learning Automata 70
4.1.3 Adaptive Tunable Spatiotemporal Pattern Learning Automata 73
4.1.4 Optimality Analysis 76
4.1.5 Simulation Studies 83
4.1.6 Summary 89
4.2 Stochastic Searching on the Line 89
4.2.1 Background and Motivation 89
4.2.2 Symmetrical Hierarchical Stochastic Searching on the Line 95
4.2.3 Simulation Studies 99
4.2.4 Summary 104
4.3 Fast Adaptive Search on the Line in Dual Environments 104
4.3.1 Background and Motivation 109
4.3.2 Symmetrized ASS with Buffer 111
4.3.3 Simulation Studies 114
4.3.4 Summary 118
4.4 Exercises 118
References 119

5 Ordinal Optimization 123


5.1 Optimal Computing-Budget Allocation 123
5.2 Optimal Computing-Budget Allocation for Selection of Best and
Worst Designs 125

5.2.1 Background and Motivation 125


5.2.2 Approximate Optimal Simulation Budget Allocation 126
5.2.3 Simulation Studies 138
5.2.4 Summary 150
5.3 Optimal Computing-Budget Allocation for Subset Ranking 151
5.3.1 Background and Motivation 151
5.3.2 Approximate Optimal Simulation Budget Allocation 153
5.3.3 Simulation Studies 159
5.3.4 Summary 167
5.4 Exercises 167
References 168

6 Incorporation of Ordinal Optimization into Learning


Automata 175
6.1 Background and Motivation 175
6.2 Learning Automata with Optimal Computing Budget Allocation 178
6.3 Proof of Optimality 182
6.4 Simulation Studies 187
6.5 Summary 193
6.6 Exercises 193
References 194

7 Noisy Optimization Applications 199


7.1 Background and Motivation 200
7.2 Particle Swarm Optimization 202
7.2.1 Parameters Configurations 203
7.2.2 Topology Structures 203
7.2.3 Hybrid PSO 203
7.2.4 Multiswarm Techniques 204
7.3 Resampling for Noisy Optimization Problems 204
7.4 PSO-Based LA and OCBA 205
7.5 Simulation Studies 209
7.6 Summary 223
7.7 Exercises 224
References 224

8 Applications and Future Research Directions of Learning


Automata 231
8.1 Summary of Existing Applications 231
8.1.1 Classification 231
8.1.2 Clustering 233

8.1.3 Games 233


8.1.4 Knapsack Problems 234
8.1.5 Decision Problems in Networks 235
8.1.6 Optimization 236
8.1.7 LA Parallelization and Design Ranking 238
8.1.8 Scheduling 240
8.2 Future Research Directions 241
8.3 Exercises 243
References 243

Index 249

About the Authors

JunQi Zhang received the PhD degree in computing science from Fudan
University, Shanghai, China, in 2007. He became a post-doctoral research
fellow and a lecturer in the Key Laboratory of Machine Perception, Ministry
of Education in Computer Science, Peking University, Beijing, China, in 2007.
He is currently a full professor in the Department of Computer Science and
Technology, Tongji University, Shanghai. His current research interests include
intelligent and learning automata, reinforcement machine learning, particle
swarm optimization, firework algorithm, big data and high-dimensional index,
and multimedia data management. He has authored 20+ IEEE Transactions and
60+ conference papers in the above areas. Prof. Zhang was a recipient of the
Outstanding Post-Doctoral Award from Peking University.

MengChu Zhou received his BS degree in control engineering from the Nanjing
University of Science and Technology, China, in 1983; MS degree in automatic
control from the Beijing Institute of Technology, China, in 1986; and PhD degree
in computer and systems engineering from Rensselaer Polytechnic Institute,
USA, in 1990. He then joined the New Jersey Institute of Technology in 1990
and has been a distinguished professor since 2013. His interests are in intelligent
automation, robotics, Petri nets, and AI. He has over 1100 publications including
14 books, 600+ IEEE Transactions papers, and 31 patents. He is a recipient
of Humboldt Research Award for US Senior Scientists from Alexander von
Humboldt Foundation; Franklin V. Taylor Memorial Award and Norbert Wiener
Award from IEEE Systems, Man, and Cybernetics Society; and Edison Patent
Award from the Research & Development Council of New Jersey. He is a fellow
of IEEE, IFAC, AAAS, CAA, and NAI.

Preface

Stochastic ranking and selection aim to design statistical procedures that select
a candidate with the highest mean performance from a finite set of candidates
whose performance is uncertain but may be estimated by a learning process based
on interactions with a stochastic environment or by simulations. The number of
interactions with environments or simulations of candidates is usually limited
and needs to be minimized due to limited computational resources. Traditional
approaches taken in the literature include frequentist statistics, Bayesian statis-
tics, heuristics, and asymptotic convergence in probability. Novel and recent
approaches to stochastic ranking and selection problems are learning automata
and ordinal optimization.
Surprisingly, no existing book introduces or studies learning automata and ordinal
optimization together, as if these two techniques had no relevance to each other.
A learning automaton is a powerful tool for reinforcement learning and aims
at learning the optimal action that maximizes the probability of being rewarded
out of a set of allowable actions by the interaction with a stochastic environment.
The scheme used to update its action probability vector is critical for learning
automata. This vector plays two roles: (i) deciding when the automaton
converges, which largely determines the total computing budget it uses, and (ii) allo-
cating the computing budget among actions to identify the optimal one, where only
ordinal optimization is required. Ordinal optimization has emerged as an efficient
technique for simulation optimization. Its underlying philosophy is to obtain a good
estimate of the optimal action or design without requiring highly accurate estimates
of performance values. It is much faster to select an outstanding action or design in
many practical situations if our goal is to find the best one rather than an accu-
rate performance value of the best one. Therefore, learning automata and ordinal
optimization share a common objective, and the latter can provide efficient
methods to serve as the update scheme of the action probability vector for learning
automata.

This book introduces and combines learning automata and ordinal optimiza-
tion to solve stochastic ranking and selection problems for the first time. This
book may serve as a reference for those in the field and as a means for those new
to the field for understanding and applying the main reinforcement learning and
intelligent optimization approaches to the problems of their interest. This book is
both a research monograph for the intended audience including researchers and
practitioners, and a main reference book for a first-year graduate course for the
graduate students, in the fields of business, engineering, management science,
operation management, stochastic control, economics, and computer science.
Only basic knowledge of probability and an undergraduate background in the
above-mentioned majors are needed for understanding this book’s materials.

JunQi Zhang
Shanghai, China
MengChu Zhou
New Jersey, USA and Hangzhou, China

Acknowledgments

From the first author of this book:


I would like to thank all the people who have contributed to this book and the
research team at Tongji University for their full dedication and quality research.
In particular, I would like to acknowledge the following individuals.
First, I would like to express my great appreciation to this book’s co-author,
Professor MengChu Zhou, for his inspirational advice and insightful suggestions
to help strengthen the visions and concepts of this book.
I would like to thank the significant help from my students Drs. Huan Liu and
DuanWei Wu, Mr. Peng Zu, Mr. YeHao Lu, and Mr. YunZhe Wu for content
and material preparations, as well as the research outcomes.
I would like to appreciate the Wiley-IEEE Press for providing the opportunity
to publish this book and the esteemed editor and anonymous reviewers for
reviewing our work. Special thanks are given to Jayashree Saishankar, Managing
Editor at Wiley-IEEE, and Victoria Bradshaw, Senior Editorial Assistant, who
kindly and patiently helped us move smoothly during our book writing and
preparation period.
I would like to acknowledge the support from Innovation Program of Shanghai
Municipal Education Commission (202101070007E00098), Shanghai Industrial
Collaborative Science and Technology Innovation Project (2021-cyxt2-kj10),
Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100),
the Fundamental Research Funds for the Central Universities, the National
Natural Science Foundation of China (51775385, 61703279, 62073244, 61876218),
and the Shanghai Innovation Action Plan under grant no. 20511100500.
Finally, I truly appreciate the continuous support and endless love from my
family, especially from my wife and two children, who always stay with me and
make my life colorful and happy.

From the second author of this book:


Numerous collaborations have been behind this book and its related work.
It would be impossible to reach this status without the following collaborators,
some of whom are already mentioned in the first author’s message.
I would like to thank Professors Naiqi Wu (Fellow of IEEE, Macau Institute
of Systems Engineering, Macau University of Science and Technology, China),
Zhiwu Li (Fellow of IEEE, Macau Institute of Systems Engineering, Macau
University of Science and Technology, China), Maria Pia Fanti (Fellow of
IEEE, Dipartimento di Elettrotecnica ed Elettronica, Polytechnic of Bari, Italy),
Giancarlo Fortino (Fellow of IEEE, Department of Computer Science, Modeling,
Electronics and Systems Engineering (DIMES), University of Calabria, Italy), and
Keyi Xing (Systems Engineering Institute, Xi’an Jiaotong University, China).
I have enjoyed the great support and love from my family for long. It would
be impossible to accomplish this book and many other achievements without
their support and love. The work presented in this book was in part supported
by FDCT (Fundo para o Desenvolvimento das Ciencias e da Tecnologia) under
Grant No. 0047/2021/A1, and Lam Research Corporation through its Unlock
Ideas program.

A Guide to Reading this Book

This book is written for senior students, graduate students, researchers, and prac-
titioners in relevant machine learning fields. This book can be a reference book
for those studying computer science/engineering, computational intelligence,
machine learning, artificial intelligence, systems engineering, industrial engi-
neering, and any field that deals with the optimal design and operation of
complex systems with noisy disturbance to apply learning automata to their
problem solving. The readers are assumed to have a background of computer
programming and discrete mathematics including set theory and predicate logic.
If readers have knowledge of formalization, optimization, and machine
learning, it should be easy for them to understand our presented algorithms and
problem formalizations. Readers may choose to learn the ideas and concepts
of learning automata. Readers may focus on the novel ideas and their related
algorithms that are useful for their research and particular applications. If you
are not interested in ordinal optimization, you may choose to skip it and focus on
the learning automata. The following chart should help readers decide what to
read based on what they want to learn from this book.

Start with Chapters 1–2 (Introduction and LA). If you are interested in mathematical
formalizations, read Chapters 3–4 (fast and application-oriented LA) in full;
otherwise, read Chapters 3–4 without the optimality analysis. If you are interested
in ordinal optimization, continue with Chapters 5–8 (ordinal optimization,
applications, and future directions), again with or without the optimality analysis
depending on your interest in mathematical formalizations; otherwise, go directly
to Chapters 7–8 (applications and future directions).

Organization of the Book

To overview the contents, this book first introduces the basic concept of learning
automata in Chapter 2. Two pioneering and improved variants of learning
automata from the perspectives of convergence and computational cost, respec-
tively, are presented in Chapter 3. Two application-oriented learning automata
are given in Chapter 4. One discovers and tracks spatiotemporal event patterns.
The other solves the problem of stochastic search on a line. Ordinal optimization
is another method to solve stochastic ranking and selection problems and is intro-
duced in Chapter 5 through demonstrations of two pioneering variants of optimal
computing budget allocation (OCBA). Chapter 6 incorporates learning automata
with ordinal optimization to further improve their convergence performance.
The role of ordinal optimization is separated from the action probability vector
in learning automata. Then, as a pioneering ordinal optimization method, OCBA
is introduced into learning automata to allocate the computing budget to actions
in a way that maximizes the probability of selecting the true optimal action.
The differences and relationships between learning automata and ordinal opti-
mization can be well understood through this chapter. Chapter 7 demonstrates
an example to show how both learning automata and OCBA can be used in noisy
optimization. Finally, Chapter 8 summarizes the existing applications of learning
automata and suggests their future research directions.

Introduction

It is an expensive process to rank and select the best one from many complex
discrete event dynamic systems that are computationally intensive to simulate.
Therefore, learning the optimal system, action, alternative, candidate, or design
is a classical problem and has many applications in the areas of intelligent sys-
tem design, statistics and stochastic simulation, machine learning, and artificial
intelligence.
Stochastic ranking and selection of given system designs can be treated as a
simulation optimization problem. Its solution requires one to design statistical
procedures that can select the one with the highest mean performance from a
finite set of systems, alternatives, candidates, or designs. Their mean values are
unknown and can be estimated only by statistical sampling, because their reward
from environments is not deterministic but stochastic due to unknown noise. It is
a classical problem in the areas of statistics and stochastic simulation.

Example 1.1 Consider finding the most effective drug or treatment from many different
alternatives, where each sample for testing the effectiveness of a drug is economically
expensive and risky. Another representative example is an archery competition:
how can we select the real champion while using the least number
of arrows?

Noise comes mainly from three kinds of uncertainties [1]. (i) Environmental
uncertainties: operating temperature, pressure, humidity, changing material prop-
erties and drift, etc. are a common kind of uncertainties. (ii) Design parameter
uncertainties: the design parameters of a product can only be realized to a certain
degree of accuracy because high precision machinery is expensive. (iii) Evaluation
uncertainties: the uncertainty happens in the evaluation of the system output and
the system performance including measuring errors and all kinds of approxima-
tion errors if models instead of the real physical objects are used.


1.1 Ranking and Selection in Noisy Optimization


Real-world optimization problems are often subject to uncertainty as variables can
be affected by imprecise measurements or just corrupted by other factors such as
communication errors. In either case, uncertainty is an inherent characteristic of
many such problems and therefore needs to be considered when tailoring meta-
heuristics to find good solutions. Noise is a class of uncertainty and corrupts the
objective values of solutions at each evaluation [2]. Noise has been shown to sig-
nificantly deteriorate the performance of different metaheuristics such as Particle
Swarm Optimizer (PSO), Genetic Algorithms (GA), Evolutionary Strategies (ES),
Differential Evolution (DE), and other metaheuristics.

Example 1.2 Ranking and selection are a critical part of PSO. First, each par-
ticle has to compare the fitness of its new position with that of its previous best and retain
the better one. Second, the overall best solution found so far has to be determined
as the global best solution that guides the swarm's flight and refines the accuracy of the search. Since
PSO has a memory that stores the estimated personal best solution of a particle
and the estimated global best solution of a swarm, noise leads such memory to
be inaccurate over iterations and particles eventually fail to rank and select good
solutions from bad ones, which drives the particle swarm toward a wrong direc-
tion. Concretely, a noisy fitness function induces noisy fitness evaluations and
causes two types of undesirable selection behavior [3]: (i) A superior candidate
may be erroneously believed to be inferior, causing it to be eliminated; (ii) An infe-
rior candidate may be erroneously believed to be superior, causing it to survive
and reproduce. These behaviors in turn cause the following undesirable effects:
(i) The learning rate is reduced. (ii) The system does not retain what it has learnt.
(iii) Exploitation is limited. (iv) Fitness does not monotonically improve with gen-
eration even with elitism.

Uncertainty has to be taken into account in many real-world optimization prob-


lems. For example, in the engineering field, the signal returned from the real world
usually includes a significant amount of noise due to the measure error or various
uncertainties. It is usually modeled as sampling noise from a Gaussian distribu-
tion [4]. Therefore, the noise can be characterized by its standard deviation 𝜎 and
classified into additive and multiplicative ones. Its impact on a function $\check{F}(x)$ can be
expressed as:
$$\hat{F}_{+}(x) = \check{F}(x) + N(0, \sigma^{2}), \tag{1.1}$$
$$\hat{F}_{\times}(x) = \check{F}(x) \times N(1, \sigma^{2}), \tag{1.2}$$
where $\check{F}(x)$ represents the real fitness value of solution x, and $\hat{F}_{+}(x)$ and $\hat{F}_{\times}(x)$ are illusory
fitness values of solution x in additive and multiplicative noisy environments,
Figure 1.1 3-D map of a Sphere function with additive and multiplicative noise. (a) True.
(b) Corrupted by additive noise with 𝜎 = 0.1. (c) Corrupted by multiplicative noise with
𝜎 = 0.3.
Figure 1.2 3-D map of a Different Powers function with additive and multiplicative
noise. (a) True. (b) Corrupted by additive noise with 𝜎 = 0.1. (c) Corrupted by
multiplicative noise with 𝜎 = 0.3.

respectively. It is worth noting that they are identical in the environment where
𝜎 = 0. It is obvious that multiplicative noise is much more challenging, since it
can bring much larger disturbance to fitness values than additive noise.
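To make the noise model of (1.1) and (1.2) concrete, the following minimal Python sketch (illustrative code, not from the book; the Sphere objective and function names are assumptions) corrupts a true objective value with additive and multiplicative Gaussian noise:

```python
import numpy as np

def sphere(x):
    # True (noise-free) objective value of solution x.
    return float(np.sum(np.asarray(x) ** 2))

def additive_noise(f_true, sigma, rng):
    # Eq. (1.1): corrupted value = true value + N(0, sigma^2).
    return f_true + rng.normal(0.0, sigma)

def multiplicative_noise(f_true, sigma, rng):
    # Eq. (1.2): corrupted value = true value * N(1, sigma^2).
    return f_true * rng.normal(1.0, sigma)

rng = np.random.default_rng(0)
x = [50.0, -30.0]
f = sphere(x)
print(f, additive_noise(f, 0.1, rng), multiplicative_noise(f, 0.3, rng))
```

Running such a sketch shows why multiplicative noise is harder: its disturbance scales with the magnitude of the objective value itself.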

Example 1.3 The 3-D maps of functions [5] with additive and multiplicative
noise with different 𝜎 are illustrated in Figs. 1.1 and 1.2. The figures show that
the additional challenge of multiplicative noise over additive noise is a larger
corruption of the objective values whose magnitude changes across the search
space proportionally to the objective values of the solutions. For multiplicative
noise, its impact depends on the optimization objective and the range of objective
space. Specifically, on minimization problems whose objective space is only
positive, the objective values of better solutions (whose fitness value is small) can
be less affected by multiplicative noise. Conversely, in maximization problems,
the objective values of better solutions can be more affected [2].

Therefore, if the problem is subject to noise, the quality of the solutions


deteriorates significantly. The basic resampling method uses many re-evaluations
to estimate the fitness of a candidate solution. Thus, the efficiency of resampling
determines the accuracy of ranking and selection of the elite solutions, which are
critical to intelligent optimization methods during their evolution to the optimal
solutions. Yet resampling is computationally expensive. Under a fixed and limited
computational budget of function evaluations or environmental interactions,
resampling methods need to better estimate the objective values of the solutions
while performing fewer re-evaluations. Intelligent allocation of the resampling budget to candidate
solutions can save many evaluations and improve the estimation of the solution
fitness.
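As a rough illustration of the basic resampling idea (an illustrative sketch, not the book's procedure; the toy objective and noise level are assumptions), each candidate is re-evaluated k times and the average is used for ranking:

```python
import numpy as np

def resampled_fitness(candidate, noisy_eval, k, rng):
    # Estimate the fitness of a candidate by averaging k noisy re-evaluations.
    return np.mean([noisy_eval(candidate, rng) for _ in range(k)])

def select_best(candidates, noisy_eval, k, rng):
    # Rank candidates by resampled fitness and return the apparent best
    # (minimization); k noisy evaluations are spent on every candidate.
    estimates = [resampled_fitness(c, noisy_eval, k, rng) for c in candidates]
    return candidates[int(np.argmin(estimates))]

# Toy usage: candidates are scalars, the true fitness is x**2, noise is additive.
rng = np.random.default_rng(1)
noisy = lambda x, rng: x ** 2 + rng.normal(0.0, 5.0)
print(select_best([0.5, 1.0, 2.0], noisy, k=20, rng=rng))
```

The fixed budget of k re-evaluations per candidate is exactly what the intelligent allocation schemes discussed later try to improve on.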

1.2 Learning Automata and Ordinal Optimization


Ranking and selection procedures were developed in the 1950s for statistical
selection problems such as choosing the best treatment for a medical condi-
tion [6]. The problem of selecting the best among a finite set of alternatives needs
an efficient learning scheme given a noisy environment and limited simulation
budget, where the best is defined with respect to the highest mean performance,
and where the performance is uncertain but may be estimated via simulation.
Approaches presented in the literature include frequentist statistics, Bayesian
statistics, heuristics [7], and asymptotic convergence in probability [6]. This
book focuses on learning automata and ordinal optimization methods to solve
stochastic ranking and selection problems.
Investigation of a Learning Automaton (LA) began in the erstwhile Soviet
Union with the work of Tsetlin [8, 9] and was popularized as LA in a survey paper

in 1974 [10]. These early models were referred to as deterministic and stochastic
automata operating in random environments. Systems built with LA have been
successfully employed in many difficult learning situations and the reinforcement
learning represents a development closely related to the work on LA over the
years. This has also led to the concept of LA being generalized in a number of
directions in order to handle various learning problems.
An LA represents an important tool in the area of reinforcement learning
and aims at learning the optimal action that maximizes the probability of being
rewarded out of a set of allowable actions by the interaction with a random
environment. During a cycle, an automaton chooses an action and then receives a
stochastic response that can be either a reward or penalty from the environment.
The action probability vector of choosing the next action is then updated by
employing this response. The ability of learning how to choose the optimal action
endows LA with high adaptability to the environment. Various LAs and their
applications have been reviewed in survey papers [10], [11] and books [12–15].
Ordinal Optimization (OO) was introduced by Ho et al. [16]. There are two
basic ideas behind it: (i) Estimating the order among solutions is much easier
than estimating the absolute objective values of each solution. (ii) Softening the
optimization goal and accepting good enough solutions leads to an exponential
reduction in computational burden. The Optimal Computing Budget Allocation
(OCBA) [17–22] is a famous approach that uses an average-case analysis rather
than a worst-case bound of the indifference zone. It attempts to sequentially
maximize the probability that the best alternative can be correctly identified after
the next stage of sampling. Its procedure tends to require much less sampling
effort to achieve the same or better empirical performance for correct selection
than the procedures that are statistically more conservative.

Example 1.4 This example shows the objective of ordinal optimization. A sys-
tem works with one of 10 components, each of which has its own time-to-failure. The
objective is to decide which component is the worst one, so as to minimize the steady-state
system unavailability given budget constraints. We have the budget to perform
only 50 component tests, each of which reveals whether the tested component leads to a failure. How
to allocate this budget among the 10 components so as to identify the worst one
is the objective of ordinal optimization.
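For intuition only, the sketch below (an assumed baseline, not the OCBA procedure introduced later) spends the 50-test budget equally across the 10 components and picks the apparently worst one; ordinal optimization methods instead allocate the budget unevenly to sharpen exactly this identification:

```python
import numpy as np

rng = np.random.default_rng(2)
true_fail_prob = rng.uniform(0.05, 0.4, size=10)  # hidden failure probabilities
budget, n_components = 50, 10

# Equal allocation: each component gets budget // n_components tests.
tests_each = budget // n_components
observed_failures = rng.binomial(tests_each, true_fail_prob)
estimated_fail_prob = observed_failures / tests_each

worst_estimate = int(np.argmax(estimated_fail_prob))
worst_true = int(np.argmax(true_fail_prob))
print("identified worst:", worst_estimate, "true worst:", worst_true)
```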

The next example shows why softening the optimization goal reduces the
computational burden.

Example 1.5 This example shows why softening the optimization goal and
accepting good enough solutions lead to an exponential reduction in compu-
tational burden. A class consists of 50 students. The objective is to find the tallest
one. However, we do not have to measure every student's exact height. We can use
pairwise comparisons, as in sorting algorithms such as bubble sort, to identify the tallest student. In this
way, we do not need all students' accurate heights. Furthermore, the order of the
other students except the tallest one is not important and can be inaccurate. An
ordinal optimization method can also be used to identify the tallest student.
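A minimal sketch of the idea in Example 1.5 (the comparison function and data are hypothetical): the tallest student can be found with pairwise "who is taller" comparisons alone, without ever reading an absolute height.

```python
def tallest(students, taller_than):
    # Single pass of pairwise comparisons: keep the current winner.
    # Only ordinal information ("is a taller than b?") is ever used.
    best = students[0]
    for s in students[1:]:
        if taller_than(s, best):
            best = s
    return best

# Toy usage with hidden heights; the selector never sees the numbers directly.
heights = {"s%02d" % i: 150 + (i * 37) % 50 for i in range(50)}
print(tallest(list(heights), lambda a, b: heights[a] > heights[b]))
```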

1.3 Exercises
1 What is the objective of stochastic ranking and selection?

2 What are the difficulties caused by noise in solving optimization problems?

3 What are the advantages and constraints of resampling in noisy optimization?

4 What is the objective of a learning automaton?

5 What are the basic ideas of ordinal optimization?

6 For additive and multiplicative noise, which one is more difficult to handle
and why?

7 In Example 1.5, how can we find the tallest student without their accurate
height data?

References

1 H.-G. Beyer and B. Sendhoff, “Robust optimization–a comprehensive survey,”


Computer Methods in Applied Mechanics and Engineering, vol. 196, no. 33-34,
pp. 3190–3218, 2007.
2 J. Rada-Vilela, “Population statistics for particle swarm optimization on
problems subject to noise,” Ph.D. dissertation, 2014.
3 A. Di Pietro, L. While, and L. Barone, “Applying evolutionary algorithms to
problems with noisy, time-consuming fitness functions,” in Proceedings of
Congress on Evolutionary Computation, vol. 2. IEEE, 2004, pp. 1254–1261.
4 Y. Jin and J. Branke, “Evolutionary optimization in uncertain environments-a
survey,” IEEE Transactions on Evolutionary Computation, vol. 9, no. 3,
pp. 303–317, 2005.
5 J. Liang, B. Qu, and P. N. Suganthan, “Problem definitions and evaluation
criteria for the CEC 2013 special session on real-parameter optimization,”
Zhengzhou University, Zhengzhou, China and Nanyang Technological Univer-
sity, Singapore, Technical Report, vol. 201212, 2013.
6 M. C. Fu, “Handbook of simulation optimization,” Springer, 2015, vol. 216.

7 Y. Jin, H. Wang, and C. Su, “Data-driven evolutionary optimization: Integrating
evolutionary computation, machine learning and data science,” Springer,
Cham, Switzerland, 2021.
8 M. Tsetlin, “On the behavior of finite automata in random media,”
Automation and Remote Control, pp. 1210–1219, 1961.
9 M. L. Tsetlin, “Automaton theory and the modeling of biological systems,”
New York: Academic, 1973.
10 K. S. Narendra and M. A. L. Thathachar, “Learning automata: A survey,” IEEE
Transactions on Systems, Man, and Cybernetics, vol. 4, pp. 323–334, 1974.
11 M. A. L. Thathachar and P. S. Sastry, “Varieties of learning automata:
An overview,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 32,
pp. 711–722, 2002.
12 S. Lakshmivarahan, “Learning algorithms theory and applications,” New York:
Springer-Verlag, 1981.
13 K. S. Narendra and M. A. L. Thathachar, “Learning automata: An introduc-
tion,” Englewood Cliffs, NJ: Prentice-Hall, 1989.
14 K. Najim and A. S. Poznyak, “Learning automata: Theory and applications,”
New York: Pergamon, 1994.
15 A. S. Poznyak and K. Najim, “Learning automata and stochastic optimization,”
New York: Springer, 1997.
16 Y. C. Ho, R. S. Sreenivas, and P. Vakili, “Ordinal optimization of discrete
event dynamic systems,” Discrete Event Dynamic Systems (DEDS), vol. 2, no. 2,
pp. 61–88, 1992.
17 H.-C. Chen, C.-H. Chen, and E. Yücesan, “Computing efforts allocation for
ordinal optimization and discrete event simulation,” IEEE Transactions on
Automatic Control, vol. 45, no. 5, pp. 960–964, 2000.
18 C.-H. Chen, J. Lin, E. Yücesan, and S. E. Chick, “Simulation budget allocation
for further enhancing the efficiency of ordinal optimization,” Discrete Event
Dynamic Systems: Theory and Applications, vol. 10, pp. 251–270, 2000.
19 Y. Ho, Q. Zhao, and Q. Jia, “Ordinal optimization: Soft optimization for hard
problems,” New York: Springer, 2007.
20 J. Zhang, Z. Li, C. Wang, D. Zang, and M. Zhou, “Approximate simulation
budget allocation for subset ranking,” IEEE Transactions on Control Systems
Technology, vol. 25, no. 1, pp. 358–365, 2017.
21 J. Zhang, L. Zhang, C. Wang, and M. Zhou, “Approximately optimal com-
puting budget allocation for selection of the best and worst designs,” IEEE
Transactions on Automatic Control, vol. 62, no. 7, pp. 3249–3261, 2017.
22 J. Zhang, C. Wang, D. Zang, and M. Zhou, “Incorporation of optimal com-
puting budget allocation for ordinal optimization into learning automata,”
IEEE Transactions on Automation Science and Engineering, vol. 13, no. 2,
pp. 1008–1017, 2016.

Learning Automata

In order to simulate a biological learning process and achieve automatic learning


of a machine, Tsetlin et al. [1] first proposed a novel mathematical model called
Learning Automaton (LA). LA’s performance is optimized by constantly interact-
ing with random environments. Consequently, the optimal action can be selected
from the set of alternative ones in the current environment. The optimal action
is defined as the action with the highest environmental reward probability in the
current environment. An LA is a reinforcement learning algorithm with the
advantages of being simple and easy to implement, offering fast stochastic
optimization, strong anti-noise ability, and complete convergence. During its
development and evolution over several decades, the convergence speed and accu-
racy of the LA algorithm have been greatly improved. In addition, LAs have also
been applied to many application fields, such as graph coloring, random shortest
path, distributed computing, wireless network spectrum allocation, image pro-
cessing, and pattern recognition.

2.1 Environment and Automaton

2.1.1 Environment
An LA consists of two parts: an automaton and an environment. The latter shown
in Fig. 2.1 is defined mathematically as a triple ⟨A, B, C⟩, which can be explained
as follows.
1) A = {𝛼1 , 𝛼2 , ..., 𝛼r } is a set of actions (r ≥ 2). The action selected at instant t is
denoted by 𝛼(t).
2) B = {𝛽1 , 𝛽2 , ..., 𝛽u } is the output set of possible environmental responses. The
environmental response at instant t is denoted by 𝛽(t). To simplify our dis-
cussions, let B = {𝛽1 , 𝛽2 } = {0, 1}. “0” and “1” denote the reward and penalty
responses, respectively.

Figure 2.1 An environment, with input A = {𝛼1 , 𝛼2 , ..., 𝛼r }, internal probabilities C = {c1 , c2 , ..., cr }, and output B = {0, 1}.

3) C = {cij = Pr{𝛽(t) = 𝛽j |𝛼(t) = 𝛼i }}, i ∈ ℤr = {1, 2, ..., r}, j ∈ ℤu = {1, 2, ..., u}


is the matrix of conditional probabilities with which the environment responds to actions.
Since cij is a conditional probability, the environment is called a random envi-
ronment. According to the different values of B, the random environment can be
divided into three environmental models, i.e., P-model, Q-model, and S-model.
If B is a finite set and its elements are discretely distributed over the unit inter-
val [0, 1], the environmental model is referred to as a Q-model; if B has an infinite
number of values, and its elements are continuously distributed in the unit interval
[0, 1], the environmental model is referred to as an S-model, while an environmen-
tal model whose response B contains only output values 0 and 1 is referred to as a
P-model.
Q- and S-models are more closely related to biological learning and have more
practical application values. However, research on a P-model is the basis of
studying other more complex environments. Therefore, a P-model is usually more
attractive to many scholars.
Considering a P-model, if ci = Pr{𝛽(t) = 1|𝛼(t) = 𝛼i }, then we have 1 − ci =
Pr{𝛽(t) = 0|𝛼(t) = 𝛼i }. In this case, the penalty probability can be written as
C = {c1 , c2 , ..., cr }, where each ci , i ∈ ℤr , corresponds to the element 𝛼i of A. If cij
is not a function of time, the environment is known as a stationary environment;
otherwise, it is a non-stationary environment.
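To fix ideas, here is a minimal Python sketch (illustrative code, not the book's) of a stationary P-model environment: it holds the penalty probability vector C = {c1 , ..., cr } and returns 𝛽 = 1 (penalty) with probability ci when action 𝛼i is applied, and 𝛽 = 0 (reward) otherwise.

```python
import numpy as np

class PModelEnvironment:
    """Stationary P-model environment with penalty probabilities C."""

    def __init__(self, penalty_probs, seed=0):
        self.c = np.asarray(penalty_probs, dtype=float)  # c_i = Pr{beta = 1 | alpha_i}
        self.rng = np.random.default_rng(seed)

    def respond(self, action_index):
        # Return 1 (penalty) with probability c_i, otherwise 0 (reward).
        return int(self.rng.random() < self.c[action_index])

# Toy usage: action 2 has the smallest penalty probability, so it is the optimal action.
env = PModelEnvironment([0.7, 0.5, 0.2])
print([env.respond(2) for _ in range(10)])
```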

2.1.2 Automaton
An automaton shown in Fig. 2.2 can be described by a quintuple ⟨A, B, Q, T, G⟩.
1) A = {𝛼1 , 𝛼2 , ..., 𝛼r }, 2 ≤ r < ∞ is the set of outputs or actions of an automaton.
The action selected at instant t is denoted by 𝛼(t).
2) B = {𝛽1 , 𝛽2 ..., 𝛽u }, as an environmental response, is an input set of an automa-
ton. At instant t, it is denoted as 𝛽(t). B could be infinite or finite.
3) Q = {q1 , q2 , ..., qv } is the set of states of an automaton. At instant t, the state is denoted
by q(t).
4) T ∶ Q × B → Q is the state transfer function of an automaton. T determines how
an automaton migrates to its state at instant t + 1 according to the input and
the state at instant t.
5) G ∶ Q → A is an output function, which determines how an automaton pro-
duces output based on the state.

Figure 2.2 An automaton, with input B = {𝛽1 , 𝛽2 , ..., 𝛽u }, states Q = {q1 , q2 , ..., qv }, output A = {𝛼1 , 𝛼2 , ..., 𝛼r }, state transfer function T : Q × B → Q, and output function G : Q → A.

In the above definitions, if A, B, and Q are all finite sets, the automaton is said
to be finite.

2.1.3 Deterministic and Stochastic Automata


If the mapping relationship in functions T and G is determined, an automaton is
referred to as a deterministic one. More specifically, for a deterministic automaton,
given an input and a state, the state and output at the next instant are determined.
On the other hand, if any of T or G is random, an automaton is defined as a stochas-
tic one. Running the experiment twice, if T is a random mapping, even if the same
state and the same input are given at the same instant, the state at the next instant
can be different. Therefore, regardless of whether the output map is deterministic
or not, the output of the next instant cannot be determined in advance. Similarly,
given the same state, if G is a random mapping, the same action output cannot be
guaranteed.
Considering the case of probability, the random mapping of T can be represented
as a series of conditional probability matrices:
$$\tau_{il}^{k} = \Pr\{q(t + 1) = q_l \mid q(t) = q_i, \beta(t) = \beta_k\}, \quad i, l \in \mathbb{Z}_v, \; \beta_k \in B, \tag{2.1}$$
where 𝜏ilk is the transition probability from state qi to ql with the environment output
𝛽k at instant t. For each 𝛽k , $\sum_{l} \tau_{il}^{k} = 1$. Thereby, we can conclude that T is a Markov
matrix, since the state at instant t + 1 is only related to the state at instant t.
The random mapping of G can be expressed as a conditional probability matrix:
$$g_{ij} = \Pr\{\alpha(t) = \alpha_j \mid q(t) = q_i\}, \quad \alpha_j \in A, \; q_i \in Q, \tag{2.2}$$
where $\sum_{j} g_{ij} = 1$.
We give two examples to intuitively reflect the difference between deterministic
and stochastic automata.

Example 2.1 Consider an automaton with two inputs, i.e., B = {0, 1}, two
outputs, i.e., A = {𝛼1 , 𝛼2 }, and four states, i.e., Q = {q1 , q2 , q3 , q4 }. A deterministic
state transition graph and deterministic output graph are shown in Figs. 2.3
and 2.4, respectively, where each hollow point represents a state, each link

Figure 2.3 The deterministic state transition graph.

Figure 2.4 The deterministic output graph.

represents a state transition, and the arrow on the link indicates the direction
of state transition. We can also depict the corresponding deterministic state
transition function T and the deterministic output function G with a matrix set
and a matrix, respectively. Assume that the entries 𝜏ij0 and 𝜏ij1 have definition as
follows:
{
1, if qi → qj with an input 𝛽 = 0
𝜏ij𝛽 = .
0, otherwise

Then, the deterministic state transition function T can be identified as


q 1 q2 q3 q4 q1 q2 q3 q4
q1 ⎡1 0 0 0⎤ q1 ⎡0 1 0 0⎤
⎢ ⎥ ⎢ ⎥
T(0) = q2 ⎢1 0 0 0⎥ T(1) = q2 ⎢0 0 1 0⎥
.
q3 ⎢0 0 0 1⎥ q3 ⎢0 1 0 0⎥
q4 ⎢0 0 0 1⎥ q4 ⎢0 0 1 0 ⎥⎦
⎣ ⎦ ⎣
The sum of each row of the matrices is 1, and the elements in the matrices are
either 1 or 0. Therefore, at the present instant, if a state and input are determined,
a state at the next instant is determined. Similarly, assume that entry gij is defined
as follows:
{
1, if G(qi ) = 𝛼j
gij = .
0, otherwise

The deterministic output function G can be identified as


$$G = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix},$$
where rows are indexed by the states $q_1, \ldots, q_4$ and columns by the actions $\alpha_1, \alpha_2$.

Analogously, the sum of all components in each row of the matrix is 1, and the
elements in the matrix are either 1 or 0. Therefore, if a state at the present instant
is determined, an action at the next instant is determined.

Example 2.2 Consider an automaton with the same 𝛽, 𝛼, and q as in


Example 2.1. A stochastic state transition graph and stochastic output graph are
shown in Figs. 2.5 and 2.6. The two mappings T and G are all stochastic. For the
stochastic state transition function T, given the present state and input, various
states, but not just one state, are likely to be reached. For the stochastic output
function G, given the present state, various actions, but not just one action, are
likely to be chosen. T can be defined in terms of conditional transition matrices

Figure 2.5 The stochastic state transition graph.

Figure 2.6 The stochastic output graph.



T(𝛽0 ) and T(𝛽1 ). Each of them is a v × v matrix following an input 𝛽. The elements
in T have definition as follows:
𝜏ij𝛽 = Pr{q(t + 1) = qj |q(t) = qi , 𝛽(t) = 𝛽}. i, j ∈ ℤv , 𝛽 ∈ B.

Similarly, G can be defined in terms of a matrix with dimension v × r whose


elements are as follows:
gij = Pr{𝛼(t) = 𝛼j |q(t) = qi }. i ∈ ℤv , j ∈ ℤr .

According to Figs. 2.5 and 2.6, the corresponding stochastic state transition func-
tion T and the stochastic output function G can also be depicted with a matrix set
and a matrix, respectively.
The stochastic state transition function T can be identified as
$$T(0) = \begin{pmatrix} 0.8 & 0.2 & 0 & 0 \\ 0.8 & 0 & 0.2 & 0 \\ 0 & 0.2 & 0 & 0.8 \\ 0 & 0 & 0.2 & 0.8 \end{pmatrix}, \qquad T(1) = \begin{pmatrix} 0.2 & 0.8 & 0 & 0 \\ 0.2 & 0 & 0.8 & 0 \\ 0 & 0.8 & 0 & 0.2 \\ 0 & 0 & 0.8 & 0.2 \end{pmatrix}.$$
The stochastic output function G can be identified as
$$G = \begin{pmatrix} 0.3 & 0.7 \\ 0 & 1 \\ 1 & 0 \\ 0.8 & 0.2 \end{pmatrix},$$
where rows are indexed by the states $q_1, \ldots, q_4$ and, for G, columns by the actions $\alpha_1, \alpha_2$.

Assume that we have state q1 and input 𝛽 = 0 in the present instant. We have
the transition probability to obtain the next state and may choose the next action
as follows:
$$\begin{aligned}
\tau_{11}^{0} &= \Pr\{q(t+1) = q_1 \mid q(t) = q_1, \beta(t) = 0\} = 0.8 \\
\tau_{12}^{0} &= \Pr\{q(t+1) = q_2 \mid q(t) = q_1, \beta(t) = 0\} = 0.2 \\
\tau_{13}^{0} &= \Pr\{q(t+1) = q_3 \mid q(t) = q_1, \beta(t) = 0\} = 0 \\
\tau_{14}^{0} &= \Pr\{q(t+1) = q_4 \mid q(t) = q_1, \beta(t) = 0\} = 0
\end{aligned}$$
and
$$\begin{aligned}
g_{11} &= \Pr\{\alpha(t) = \alpha_1 \mid q(t) = q_1\} = 0.3 \\
g_{12} &= \Pr\{\alpha(t) = \alpha_2 \mid q(t) = q_1\} = 0.7.
\end{aligned}$$
Apparently, in state q1 and input 𝛽 = 0, we have a chance to access different
states and to choose different actions.
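The single step described above can be simulated directly. The sketch below (illustrative Python, not from the book) samples the next state from the row of T(𝛽) for the current state and the next action from the corresponding row of G, using the matrices of Example 2.2:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stochastic transition matrices T(0), T(1) and output matrix G from Example 2.2.
T = {
    0: np.array([[0.8, 0.2, 0.0, 0.0],
                 [0.8, 0.0, 0.2, 0.0],
                 [0.0, 0.2, 0.0, 0.8],
                 [0.0, 0.0, 0.2, 0.8]]),
    1: np.array([[0.2, 0.8, 0.0, 0.0],
                 [0.2, 0.0, 0.8, 0.0],
                 [0.0, 0.8, 0.0, 0.2],
                 [0.0, 0.0, 0.8, 0.2]]),
}
G = np.array([[0.3, 0.7],
              [0.0, 1.0],
              [1.0, 0.0],
              [0.8, 0.2]])

def step(state, beta):
    # Sample the next state from row `state` of T(beta),
    # then sample the output action from row `next_state` of G.
    next_state = rng.choice(4, p=T[beta][state])
    action = rng.choice(2, p=G[next_state])
    return next_state, action

print(step(state=0, beta=0))  # start in q1 with input beta = 0
```

With 0/1 entries instead of probabilities, the same code reduces to the deterministic automaton of Example 2.1.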

2.1.4 Measured Norms


The purpose of LAs is to interact with the environment to obtain as many rewards
as possible from environmental feedbacks. For any two LAs, the one that receives
more rewards is better than the one that receives fewer rewards. Therefore, LAs
need to choose the “optimal behavior” to obtain the minimum penalty č . To mea-
sure LAs’ performance, a series of mathematical terms are utilized to indicate the
nature of LAs including expedient, absolutely expedient, optimal, and 𝜀-optimal.
In the absence of prior information, each action follows the same distribution.
An LA with this feature is called a “pure chance automaton,” which is usually used
in various LAs. For the probability vector P(t) = [p1 (t), p2 (t), ..., pr (t)], the penalty
mean is defined as follows:
$$\begin{aligned}
\bar{\Phi} &= E[\beta(t) \mid P(t) = [p_1(t), p_2(t), ..., p_r(t)]] \\
&= \sum_{i=1}^{r} \Pr[\beta(t) = 1 \mid \alpha_i(t), P(t) = \{p_1(t), p_2(t), ..., p_r(t)\}] \cdot \Pr[\alpha(t) = \alpha_i] \\
&= \sum_{i=1}^{r} \Pr[\beta(t) = 1 \mid \alpha_i(t)] \cdot \Pr[\alpha(t) = \alpha_i] \\
&= \sum_{i=1}^{r} c_i \cdot p_i(t).
\end{aligned} \tag{2.3}$$
Thereby, for the pure chance automaton, the penalty mean is:
$$\bar{\Phi}_0 = \sum_{i=1}^{r} c_i \cdot \frac{1}{r}. \tag{2.4}$$
An LA is expedient if it meets:
$$\lim_{t \to \infty} E\{\bar{\Phi}(t)\} = \lim_{t \to \infty} E\{E\{\beta(t) \mid P(t)\}\} = \lim_{t \to \infty} E\{\beta(t)\} < \bar{\Phi}_0 = \sum_{i=1}^{r} c_i \cdot \frac{1}{r}. \tag{2.5}$$
It is absolutely expedient if it meets:
$$E\{\bar{\Phi}(t + 1) \mid P(t)\} < \bar{\Phi}(t). \tag{2.6}$$
If the expectations on both sides are separately computed, we can obtain:
$$E\{E\{\bar{\Phi}(t + 1) \mid P(t)\}\} < E\{\bar{\Phi}(t)\} \;\Leftrightarrow\; E\{\bar{\Phi}(t + 1)\} < E\{\bar{\Phi}(t)\}. \tag{2.7}$$
This means that $E\{\bar{\Phi}(t)\}$ is strictly monotonically decreasing over time.
An LA is optimal if the probability of an action with the least penalty satisfies:

$$\lim_{t \to \infty} p_{\min}(t) \to 1, \quad \text{w.p.1}, \tag{2.8}$$

where “w.p.1” means with probability 1.



In fact, the optimality condition mentioned above is so strict that many LAs
cannot satisfy it. Therefore, a slightly weaker definition called 𝜀-optimality is pro-
posed. An LA is 𝜀-optimal if, for each 𝜀 > 0 and 𝛿 > 0, there is an instant
t0 < ∞ and a learning parameter 𝜆0 > 0, such that for all t ≥ t0 and 𝜆 < 𝜆0 :
$$\Pr\{| p_{\min}(t) - 1| < \varepsilon\} > 1 - \delta. \tag{2.9}$$
In the sense of the penalty mean, this definition with any 𝜀 > 0 can also be written as:
$$\lim_{t \to \infty} E\{\bar{\Phi}(t)\} < \check{c} + \varepsilon. \tag{2.10}$$

Intuitively, 𝜀-optimality means that if given enough time and appropriate


parameters, the probability of an LA to choose the optimal action is infinitely close
to 1.
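As a quick numerical illustration of these norms (a minimal sketch under the P-model assumptions above, not the book's code; the penalty probabilities are made up), the average penalty of an action probability vector and of the pure chance automaton follow directly from (2.3) and (2.4):

```python
import numpy as np

def penalty_mean(c, p):
    # Eq. (2.3): expected penalty for penalty probabilities c and action probabilities p.
    return float(np.dot(c, p))

c = np.array([0.7, 0.5, 0.2])          # penalty probability of each action
pure_chance = np.full(3, 1.0 / 3.0)    # Eq. (2.4): uniform action probabilities
learned = np.array([0.05, 0.05, 0.9])  # a vector concentrated on the optimal action

print(penalty_mean(c, pure_chance))    # ~0.467
print(penalty_mean(c, learned))        # 0.24, below the pure chance level: expedient
```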

2.2 Fixed Structure Learning Automata


In the last section, we have introduced a deterministic and stochastic LA. For dif-
ferent automata, we can make the following summary based on their different
characteristics: In a deterministic automaton, the elements in the state transi-
tion matrices are composed of 1 and 0 only. Therefore, given an input, the state
transition process is deterministic, and it is thus called a deterministic automa-
ton. In a fixed-structure stochastic automaton, the elements in the state transition
matrices are composed of values in the interval [0, 1]. Therefore, given an input,
the state transition process is indeterminate and therefore belongs to a stochas-
tic automaton. The elements in the state transition matrices are fixed, and such
stochastic automaton is thus called a fixed structure stochastic automaton. In a
variable-structure stochastic automaton, the elements in the state transition matri-
ces are composed of values in the interval [0, 1]. Therefore, given an input, the
state transition process is indeterminate, therefore leading to a stochastic automa-
ton. The elements in the state transition matrices are variable, and such stochastic
automaton is thus called a variable-structure stochastic automaton.
There are several common fixed-structure automata, i.e., Tsetlin [1], Krylov [2],
Krinsky [2], IJA (Iraji–Jamalian Automaton) [3], and TFSLA (Tunable Fixed
Structure Learning Automaton) [4]. We introduce them next.

2.2.1 Tsetlin Learning Automaton


A Tsetlin automaton is the earliest proposed LA. Its state transition graph is shown
in Fig. 2.7.

Figure 2.7 The state transition graph of Tsetlin LA.

In Tsetlin LA, there are two actions and 2ŝ states, and each action is associated with
ŝ of the states. This LA is denoted as $\bar{L}_{2\hat{s},2}$. It is easy to expand to r actions with $r \cdot \hat{s}$
states.
The output function of a Tsetlin LA is relatively simple. When it is in state q(t) =
qi , i ∈ ℤŝ , the output is 𝛼1 . If it is in state q(t) = qi , i ∈ ℤ2ŝ − ℤŝ , the output is 𝛼2 .
The output function can be defined as:
$$G(q_i) = \begin{cases} \alpha_1, & i \in \mathbb{Z}_{\hat{s}} \\ \alpha_2, & i \in \mathbb{Z}_{2\hat{s}} - \mathbb{Z}_{\hat{s}}. \end{cases}$$

The state transition following an action can be expressed as:
$$\left.\begin{aligned} q_i &\to q_{i+1} \quad (i \in \mathbb{Z}_{2\hat{s}-1} - \mathbb{Z}_{\hat{s}}) \\ q_{2\hat{s}} &\to q_{\hat{s}} \end{aligned}\right\} \quad \alpha_2 \text{ results in an unfavorable response,}$$
$$\left.\begin{aligned} q_i &\to q_{i-1} \quad (i \in \mathbb{Z}_{2\hat{s}} - \mathbb{Z}_{\hat{s}+1}) \\ q_{\hat{s}+1} &\to q_{\hat{s}+1} \end{aligned}\right\} \quad \alpha_2 \text{ results in a favorable response.} \tag{2.11}$$
$$\left.\begin{aligned} q_i &\to q_{i+1} \quad (i \in \mathbb{Z}_{\hat{s}-1}) \\ q_{\hat{s}} &\to q_{2\hat{s}} \end{aligned}\right\} \quad \alpha_1 \text{ results in an unfavorable response,}$$
$$\left.\begin{aligned} q_i &\to q_{i-1} \quad (i \in \mathbb{Z}_{\hat{s}} - \mathbb{Z}_{1}) \\ q_{1} &\to q_{1} \end{aligned}\right\} \quad \alpha_1 \text{ results in a favorable response.} \tag{2.12}$$

At the initial moment, a Tsetlin LA is randomly in state q2ŝ or qŝ without any
prior knowledge. After obtaining a feedback from the environment, the state tran-
sition is performed according to the state transition graph shown in Fig. 2.7.
For example, when q(t) = qŝ , its output action is 𝛼1 . Then if the Tsetlin LA receives
the reward feedback from the environment, the state changes to q(t) = qŝ−1 ; on the
contrary, if the feedback obtained from the environment is a penalty, then the state
is transferred to q(t) = q2ŝ .
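The following Python sketch (an illustrative implementation, not the authors' code) realizes this two-action Tsetlin LA, using states 1..ŝ for 𝛼1 and ŝ+1..2ŝ for 𝛼2 with the state ordering of Fig. 2.7; the toy environment at the end is an assumption for demonstration.

```python
import random

class TsetlinLA:
    """Two-action Tsetlin automaton with s memory states per action."""

    def __init__(self, s, seed=0):
        self.s = s
        self.rng = random.Random(seed)
        # Start at a boundary state (q_s or q_2s), i.e. with no prior preference.
        self.state = self.rng.choice([s, 2 * s])

    def action(self):
        # States 1..s output alpha_1 (index 0); states s+1..2s output alpha_2 (index 1).
        return 0 if self.state <= self.s else 1

    def update(self, beta):
        s, q = self.s, self.state
        if beta == 0:                      # reward: move one step deeper
            self.state = max(1, q - 1) if q <= s else max(s + 1, q - 1)
        else:                              # penalty: move toward the boundary / switch
            if q == s:
                self.state = 2 * s         # switch to the boundary state of alpha_2
            elif q == 2 * s:
                self.state = s             # switch to the boundary state of alpha_1
            else:
                self.state = q + 1

# Toy usage in a two-action P-model environment with penalty probabilities (0.2, 0.6).
la, rng, c = TsetlinLA(s=5), random.Random(1), [0.2, 0.6]
for _ in range(1000):
    a = la.action()
    la.update(1 if rng.random() < c[a] else 0)
print("final action:", la.action())   # usually 0, the action with the smaller penalty
```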

The finite state irreducible Markov chain is ergodic. So is the LA. It has been
proved that the penalty mean of the Tsetlin LA [1] is:
$$\bar{\Phi}(\bar{L}_{2\hat{s},2}) = \frac{\dfrac{1}{\check{\beta}_1^{\hat{s}-1}} \cdot \dfrac{\check{\beta}_1^{\hat{s}} - \hat{\beta}_1^{\hat{s}}}{\check{\beta}_1 - \hat{\beta}_1} + \dfrac{1}{\check{\beta}_2^{\hat{s}-1}} \cdot \dfrac{\check{\beta}_2^{\hat{s}} - \hat{\beta}_2^{\hat{s}}}{\check{\beta}_2 - \hat{\beta}_2}}{\dfrac{1}{\check{\beta}_1^{\hat{s}}} \cdot \dfrac{\check{\beta}_1^{\hat{s}} - \hat{\beta}_1^{\hat{s}}}{\check{\beta}_1 - \hat{\beta}_1} + \dfrac{1}{\check{\beta}_2^{\hat{s}}} \cdot \dfrac{\check{\beta}_2^{\hat{s}} - \hat{\beta}_2^{\hat{s}}}{\check{\beta}_2 - \hat{\beta}_2}}, \quad \check{\beta}_1 + \hat{\beta}_1 = 1, \; \check{\beta}_2 + \hat{\beta}_2 = 1. \tag{2.13}$$

In an environment where min{𝛽̌1 , 𝛽̌2 } < 0.5, the Tsetlin LA has been proven to be
𝜀-optimal [5].

2.2.2 Krinsky Learning Automaton


Similar to Tsetlin LA, in Krinsky LA, there are two actions, 2̂s states and each
action can lead a state to one of ŝ states. Krinsky LA is denoted as K̄ 2̂s,2 . Its output
function is the same as Tsetlin LAs. When it is in state q(t) = qi , i ∈ ℤŝ , its output
is 𝛼1 . If it is in state q(t) = qi , i ∈ ℤ2̂s − ℤŝ , its output is 𝛼2 . Figure 2.8 shows the
state transition graph of the Krinsky LA. The state transitions following an action can be expressed as:
$$\left.\begin{aligned} q_i &\to q_{i+1} \quad (i \in \mathbb{Z}_{2\hat{s}-1} - \mathbb{Z}_{\hat{s}}) \\ q_{2\hat{s}} &\to q_{\hat{s}} \end{aligned}\right\} \quad \alpha_2 \text{ results in an unfavorable response,}$$
$$q_i \to q_{\hat{s}+1}, \quad \alpha_2 \text{ results in a favorable response.} \tag{2.14}$$
$$\left.\begin{aligned} q_i &\to q_{i+1} \quad (i \in \mathbb{Z}_{\hat{s}-1}) \\ q_{\hat{s}} &\to q_{2\hat{s}} \end{aligned}\right\} \quad \alpha_1 \text{ results in an unfavorable response,}$$
$$q_i \to q_{1}, \quad \alpha_1 \text{ results in a favorable response.} \tag{2.15}$$
At the initial moment, the Krinsky LA is randomly in state q2ŝ or qŝ without
any prior knowledge. When getting an environmental penalty, its state transition
process is the same as the Tsetlin LA's. In Krinsky LA, each state jumps to the deepest
state when the environment rewards the current action as shown in Fig. 2.8, which
differs from Tsetlin LA’s behavior. For example, when Krinsky LA is in state qŝ ,
the output action is 𝛼1 . Then if the feedback from the environment is a penalty,
the state is transferred to q2̂s , which is the same as the Tsetlin LA. On the contrary,
if the feedback obtained from the environment is a reward, the state is transferred
to q1 , while Tsetlin LA transitions its state to qŝ−1 . The penalty mean of the Krinsky
LA is:
$$\bar{\Phi}(\bar{K}_{2\hat{s},2}) = \frac{\dfrac{1}{\check{\beta}_1^{\hat{s}-1}} + \dfrac{1}{\check{\beta}_2^{\hat{s}-1}}}{\dfrac{1}{\check{\beta}_1^{\hat{s}}} + \dfrac{1}{\check{\beta}_2^{\hat{s}}}}, \quad \check{\beta}_1 + \hat{\beta}_1 = 1, \; \check{\beta}_2 + \hat{\beta}_2 = 1. \tag{2.16}$$

Figure 2.8 The state transition graph of Krinsky LA.

The Krinsky LA has been proven to be 𝜀-optimal [5] in any deterministic envi-
ronment.

2.2.3 Krylov Learning Automaton


Similar to the Tsetlin LA, the Krylov LA has two actions and 2ŝ states, and each action can lead to one of ŝ states. The Krylov LA is denoted as J̄_{2ŝ,2}. Its output function is the same as the Tsetlin LA's. When it is in state q(t) = q_i, i ∈ ℤ_ŝ, its output is α1. If it is in state q(t) = q_i, i ∈ ℤ_{2ŝ} − ℤ_ŝ, its output is α2. Figure 2.9 shows the state transition graph of the Krylov LA.
At the initial moment, the Krylov LA is randomly in state q_{2ŝ} or q_ŝ, without any prior knowledge. When it gets an environmental reward, the state transition process in the Krylov LA is the same as in the Tsetlin LA. Differently from the Tsetlin LA, however, each state of the Krylov LA transfers to one of its two adjacent states with probability 0.5 when the environment penalizes the current action. For example, when the Krylov LA is in state q_ŝ, the output action is α1. If the feedback from the environment is a reward, the state is transferred to q_{ŝ−1}, the same as in the Tsetlin LA. On the contrary, if the feedback obtained from the environment is a penalty, the state is transferred to q_{ŝ−1} or q_{2ŝ}, each with probability 0.5.

Figure 2.9 The state transition graph of the Krylov LA (upper chain: favorable response, β = 0; lower chain: unfavorable response, β = 1, with transition probabilities of 0.5).

The penalty mean of the Krylov LA is:

$$\bar{\Phi}(\bar{J}_{2\hat{s},2}) = \frac{\dfrac{1}{\tilde{\beta}_1^{\hat{s}-1}}\cdot\dfrac{\tilde{\beta}_1^{\hat{s}}-1}{\tilde{\beta}_1-1}+\dfrac{1}{\tilde{\beta}_2^{\hat{s}-1}}\cdot\dfrac{\tilde{\beta}_2^{\hat{s}}-1}{\tilde{\beta}_2-1}}{\dfrac{1}{\check{\beta}_1}\cdot\dfrac{1}{\tilde{\beta}_1^{\hat{s}-1}}\cdot\dfrac{\tilde{\beta}_1^{\hat{s}}-1}{\tilde{\beta}_1-1}+\dfrac{1}{\check{\beta}_2}\cdot\dfrac{1}{\tilde{\beta}_2^{\hat{s}-1}}\cdot\dfrac{\tilde{\beta}_2^{\hat{s}}-1}{\tilde{\beta}_2-1}},\qquad \tilde{\beta}_1=\frac{\check{\beta}_1/2}{1-\check{\beta}_1/2},\ \tilde{\beta}_2=\frac{\check{\beta}_2/2}{1-\check{\beta}_2/2} \tag{2.17}$$

The limit of the penalty mean is lim_{ŝ→∞} Φ̄(J̄_{2ŝ,2}) = min{β̌1, β̌2}, implying that the Krylov LA is ε-optimal [5] in any deterministic environment.
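The Krylov behavior can likewise be expressed as a small variation of the Tsetlin sketch given earlier (again an illustrative, assumption-laden sketch; in particular, how the deepest state reacts to a penalty is not spelled out in the text and is treated here as a 0.5 chance of staying put):

    import random

    class KrylovLA(TsetlinLA):
        # Krylov automaton: rewards are handled exactly as in the Tsetlin LA,
        # while a penalty (beta = 1) is re-interpreted as a reward or a penalty
        # with probability 0.5 each, which moves the state to one of its two
        # neighbouring states with equal probability.
        def update(self, beta):
            if beta == 1 and random.random() < 0.5:
                beta = 0
            super().update(beta)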

2.2.4 IJA Learning Automaton


In the IJA LA (Iraji–Jamalian Automaton), there are two actions and 2ŝ states. Each action can lead to one of ŝ states. The IJA LA is denoted as Ī_{2ŝ,2}. Figure 2.10 shows the state transition graph of an IJA LA, where the symbol β̂1 represents a favorable response and β̌2 represents an unfavorable one. Its output function is the same as the Tsetlin LA's.
At the initial moment, the IJA LA is randomly in state q_{2ŝ} or q_ŝ, without any prior knowledge. When receiving an environmental penalty, its state transition process is the same as the Tsetlin LA's. Differently from the Tsetlin LA, when receiving an environmental reward, its state is transferred to q_{⌊ŝ/2⌋} with a probability of 0.5.
The penalty mean of the IJA LA is:

Φ̄(Ī_{2ŝ,2}) = (β̌1⋅Ĩ1⋅Ĩ2 + β̌2⋅Ĩ3⋅Ĩ4) / (Ĩ1⋅Ĩ2 + Ĩ3⋅Ĩ4).   (2.18)

Figure 2.10 The state transition graph of the IJA LA. (a) β̂1 represents a favorable response and (b) β̌2 represents an unfavorable one.

The relevant parameters Ĩ1, Ĩ2, Ĩ3, and Ĩ4 are functions of β̌1, β̌2, and ŝ, whose detailed expressions (2.19) are given in [3].

The limit of the penalty mean is lim_{ŝ→∞} Φ̄(Ī_{2ŝ,2}) = min{β̌1, β̌2}, which means the IJA LA is ε-optimal [3] in any deterministic environment.

2.3 Variable Structure Learning Automata

In Section 2.2, we have introduced the fixed structure stochastic automata in sta-
tionary random environments. Their state transition probabilities and action prob-
abilities are fixed.
In a pioneering paper, Varshavskii and Vorontsova first presented automata that update transition probabilities [6]. Fu and his associates gave an extension that updates action probabilities [7–9]. In this book, the emphasis is on schemes for updating action probabilities.
Compared with fixed structure stochastic automata, a variable structure learning automaton can change its structure over iterations according to the environment, and thus has the advantages of faster convergence and adaptive learning. Its quintuple ⟨A, B, Q, T, G⟩ model can be simplified into a quadruple ⟨A, B, Q, T⟩. The concepts of A, B, and T have the same definitions as introduced before. In variable structure LAs, Q = ⟨P, 𝔼⟩, where P represents the action probability vector and 𝔼 is the estimator. Because the state is composed of two parts, the update of the state is also divided into two parts: updating P and updating 𝔼. The update formulas for the estimator and the probability vector are expressed as

𝔼(t + 1) = T𝔼 (𝔼(t), 𝛼(t), 𝛽(t)) (2.20)

P(t + 1) = TP (P(t), 𝛼(t), 𝛽(t), 𝔼(t + 1)). (2.21)

Therefore, the state transition function should also consist of two parts:
T = ⟨T𝔼 , TP ⟩.
From the update formula of the probability vector, we know that {P(t)}_{t≥0} is a discrete-time homogeneous Markov process. According to whether the Markov process is ergodic or absorbing, the update algorithm can be referred to as an ergodic algorithm or an absorbing one [10–12]. If the update formula P(t + 1) = T_P(P(t), α(t), β(t), 𝔼(t + 1)) is linear, such LAs are called linear variable structure LAs; otherwise, they are nonlinear variable structure LAs.

In the update strategies, there are two basic principles for LA algorithms: when
the output behavior is punished by the environment, the probability of this behav-
ior should be reduced; otherwise, if the output behavior is rewarded by the envi-
ronment, the probability of this behavior should be increased. Different specific
algorithms may use different principles. Examples of combinations of these prin-
ciples are as follows:

1) RP (Reward Penalty): The probability vector is updated both when the selected action is rewarded and when it is punished by the environment;
2) RI (Reward Inaction): The probability vector is only updated when it is
rewarded by the environment but not when receiving an environmental
penalty; and
3) IP (Inaction Penalty): The probability vector is updated only when it is pun-
ished by the environment. The probability vector does not change when receiv-
ing environmental reward.

Variable structure learning automata can be further divided into three categories.

1) The first type is an estimator-free LA, which is specifically represented by a


quadruple ⟨A, B, Q, T⟩, where Q = ⟨P, 𝔼⟩ and 𝔼 = ∅;
2) The second type is an LA with a deterministic estimator. This type of algorithm is represented by the pursuit estimator framework and is specifically represented by a quadruple ⟨A, B, Q, T⟩, where Q = ⟨P, 𝔼⟩, 𝔼 = D̃, and D̃ is a deterministic estimator; and
3) The third type is an LA with a stochastic estimator, represented by the SERI algorithm, which is specifically represented by a quadruple ⟨A, B, Q, T⟩, where Q = ⟨P, 𝔼⟩, 𝔼 = ⟨D̃, S̃⟩, and D̃ and S̃ are denoted as deterministic and stochastic estimators, respectively.

2.3.1 Estimator-Free Learning Automaton


The work [6] is the first to propose an estimator-free LA. The LA update algorithms proposed earlier all operate on a continuous probability space. To be precise, the probability of each behavior can be any value in the interval [0, 1]. Oommen et al. discretize the continuous probability space and introduce discrete-space LAs. Applying the three strategies RP, RI, and IP mentioned above to continuous and discrete linear formulas, six different linear strategies can be obtained: LRP, LRI, LIP, DLRP, DLRI, and DLIP [10, 12, 13]. In this section, we introduce the typical LAs with these linear strategies.
The algorithms introduced below are based on the assumption that there are only two behaviors, denoted as a set A = {α1, α2}; these algorithms can also be extended to the case of multiple behaviors A = {α1, α2, ..., αr}. Then the probability vector P is two-dimensional, i.e., P(t) = [p1(t), p2(t)].

1) Continuous Linear Schemes: LRP , LRI , and LIP .


LRP means that an LA directly increases the action probability linearly when
it receives a reward feedback from the environment, and decreases the action
probability when it receives a penalty feedback from the environment. LRI
means that LA increases the action probability linearly when it receives a
reward feedback from the environment, but does not change the probability
of the action when it receives a penalty feedback from the environment. LIP
means that LA decreases the action probability linearly when it receives a
penalty feedback from the environment, but does not change the probability
of the action when it receives a reward feedback from the environment.
The probability vector update formulas of LRP, LRI, and LIP are described as follows, where the parameters λ1 ∈ (0, 1) and λ2 ∈ (0, 1) are the reward and penalty parameters, respectively, and at any instant t, p1(t) + p2(t) = 1. Here β_t = 1 denotes a reward and β_t = 0 a penalty.

p1(t + 1) = p1(t) + λ1⋅(1 − p1(t))   if α(t) = α1 and β_t = 1
p1(t + 1) = (1 − λ2)⋅p1(t)           if α(t) = α1 and β_t = 0
p1(t + 1) = (1 − λ1)⋅p1(t)           if α(t) = α2 and β_t = 1
p1(t + 1) = p1(t) + λ2⋅(1 − p1(t))   if α(t) = α2 and β_t = 0   (2.22)

p1(t + 1) = p1(t) + λ1⋅(1 − p1(t))   if α(t) = α1 and β_t = 1
p1(t + 1) = p1(t)                     if α(t) = α1 and β_t = 0
p1(t + 1) = (1 − λ1)⋅p1(t)           if α(t) = α2 and β_t = 1
p1(t + 1) = p1(t)                     if α(t) = α2 and β_t = 0   (2.23)

p1(t + 1) = p1(t)                     if α(t) = α1 and β_t = 1
p1(t + 1) = (1 − λ2)⋅p1(t)           if α(t) = α1 and β_t = 0
p1(t + 1) = p1(t)                     if α(t) = α2 and β_t = 1
p1(t + 1) = p1(t) + λ2⋅(1 − p1(t))   if α(t) = α2 and β_t = 0   (2.24)
It is worth noting that the parameters satisfy 𝜆1 ∈ (0, 1) and 𝜆2 ∈ (0, 1). When
𝜆1 ∈ (0, 1) and 𝜆2 = 0, LRP becomes LRI . When 𝜆2 ∈ (0, 1) and 𝜆1 = 0, LRP
becomes LIP .
The above three types of LAs, LRP, LRI, and LIP, work in a continuous probability space. In order to accelerate their convergence speed, researchers have proposed discrete-space learning automata algorithms; a short code sketch after this list illustrates one continuous and one discrete update rule.
2) Discrete Linear Schemes: DLRP , DLRI, and DLIP .
The basic idea of discretization is to divide the interval [0, 1] into Δ parts, each of length 1/Δ, so the probability values that can be taken become a finite set {0, 1/Δ, 2/Δ, ..., (Δ − 1)/Δ, 1}. Its essence is to transform a homogeneous Markov process into a homogeneous Markov chain.

The probability vector update formulas of DLRP, DLRI, and DLIP are described as follows, where at any instant t, p1(t) + p2(t) = 1.

If 0 < p1(t) < 1:
p1(t + 1) = p1(t) + 1/Δ   if α(t) = α1 and β_t = 1
p1(t + 1) = p1(t) − 1/Δ   if α(t) = α1 and β_t = 0
p1(t + 1) = p1(t) + 1/Δ   if α(t) = α2 and β_t = 0
p1(t + 1) = p1(t) − 1/Δ   if α(t) = α2 and β_t = 1
If p1(t) ∈ {0, 1}:
p1(t + 1) = p1(t)          if β_t = 1
p1(t + 1) = 1/Δ            if p1(t) = 0 and β_t = 0
p1(t + 1) = 1 − 1/Δ        if p1(t) = 1 and β_t = 0   (2.25)

p1(t + 1) = min{p1(t) + 1/Δ, 1}   if α(t) = α1 and β_t = 1
p1(t + 1) = p1(t)                  if α(t) = α1 and β_t = 0
p1(t + 1) = max{p1(t) − 1/Δ, 0}   if α(t) = α2 and β_t = 1
p1(t + 1) = p1(t)                  if α(t) = α2 and β_t = 0   (2.26)

p1(t + 1) = max{p1(t) − 1/Δ, 0}   if α(t) = α1 and β_t = 0
p1(t + 1) = p1(t)                  if α(t) = α1 and β_t = 1
p1(t + 1) = min{p1(t) + 1/Δ, 1}   if α(t) = α2 and β_t = 0
p1(t + 1) = p1(t)                  if α(t) = α2 and β_t = 1   (2.27)
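As a concrete illustration of one continuous and one discrete scheme from the list above, the following Python sketch (an illustrative two-action implementation; the environment loop and the reward probabilities are assumptions, and β_t = 1 is treated as a reward, matching Eqs. (2.22)–(2.27)) implements single-step LRI and DLRI updates of p1(t):

    import random

    def l_ri_step(p1, action, beta, lam):
        # One step of the continuous L_RI scheme, Eq. (2.23), for two actions.
        if beta == 1:                       # reward: move p1 toward the chosen action
            p1 = p1 + lam * (1.0 - p1) if action == 1 else (1.0 - lam) * p1
        return p1                           # inaction on penalty

    def dl_ri_step(p1, action, beta, delta):
        # One step of the discrete DL_RI scheme, Eq. (2.26); the step size is 1/delta.
        if beta == 1:
            step = 1.0 / delta
            p1 = min(p1 + step, 1.0) if action == 1 else max(p1 - step, 0.0)
        return p1

    # A short run in a stationary environment with assumed reward probabilities
    # d1 = 0.8 and d2 = 0.4; p2(t) is always 1 - p1(t).
    p1 = 0.5
    for _ in range(1000):
        action = 1 if random.random() < p1 else 2
        beta = 1 if random.random() < (0.8 if action == 1 else 0.4) else 0
        p1 = dl_ri_step(p1, action, beta, delta=100)
    print("p1 after 1000 steps:", p1)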

2.3.2 Deterministic Estimator Learning Automaton


To improve the convergence speed of LA algorithms, Thathachar and Sastry [14–16] add an estimator to the original algorithm. The estimator-based algorithm introduces an estimate of the environmental reward probability, and uses this estimate to update the probability vector. Depending on whether the estimator is deterministic or random, such algorithms can be divided into deterministic and stochastic estimator algorithms. In this subsection we introduce the deterministic estimator algorithm. Moreover, according to different update methods of the probability vector, the deterministic estimator algorithm can be divided into a continuous deterministic estimator algorithm and a discrete one [17, 18]. The deterministic estimator algorithm is defined as a quadruple ⟨A, B, Q, T⟩, where
1) A = {𝛼1 , 𝛼2 , ..., 𝛼r } is a set of actions (2 ≤ r < ∞).
2) B = {0, 1}. The environmental response at instant t is denoted by 𝛽(t). “1” and
“0” denote the reward and penalty responses, respectively.

3) Q = ⟨P, 𝔼⟩ is the state set. P = {p1(t), p2(t), ..., pr(t)} is the state of the automaton at instant t, where pi(t) = Pr{α(t) = αi}. 𝔼 is the estimator, and the deterministic estimator is defined as 𝔼 = D̃(t). D̃(t) = {d̃1(t), d̃2(t), ..., d̃r(t)} is the estimator vector at instant t. The estimated reward for each behavior is:
   d̃i(t) = ℍi(t)/𝔾i(t),
where ℍi (t) represents the number of times that the ith action has been
rewarded up to instant t, i ∈ ℤr . 𝔾i (t) is the number of times that the ith action
has been selected up to instant t, i ∈ ℤr .
4) T ∶ Q × B → Q is the state transfer function of the automaton. T determines
how the automaton migrates to the state at t + 1 according to the output, input
and the state at instant t.
In the estimator-free algorithm, the probability vector is updated based on the
selected behavior and feedback from the environment. In the estimator algorithm,
the estimator is updated according to the selected behavior and environment feedback information, and finally the information of the estimator is used to update
the probability vector. In order to more intuitively reflect the steps of the estimator
algorithm, the continuous pursuit reward-penalty algorithm is given next.

CPRP: Continuous Pursuit Reward–Penalty Algorithm

Parameters and Notation
λ: learning parameter, where λ ∈ (0, 1).
m: index of the maximal component of D̃(t), d̃m(t) = max_i{d̃i(t)}.
T̂: threshold value.
ℍi(t): number of times the ith action has been rewarded up to instant t, i ∈ ℤr.
𝔾i(t): number of times the ith action has been selected up to instant t, i ∈ ℤr.
Input: Number of allowable actions r, learning parameter λ, action set A, environmental response set B, convergence threshold T̂.
Output: Estimated optimal action αm.
Initialize pi(0) = 1/r, i ∈ ℤr.
Initialize D̃(0) by picking each action a small number of times.
do
1. At instant t, pick an action α(t) according to the probability distribution P(t).
2. Obtain feedback β(t) from the environment according to action α(t).
3. Update D̃(t) for the selected action αi = α(t) as follows:
   𝔾i(t) = 𝔾i(t) + 1;
   ℍi(t) = ℍi(t) + β(t);
   d̃i(t) = ℍi(t)/𝔾i(t).
4. Update P(t) as follows:
   pj(t + 1) = (1 − λ)⋅pj(t), j ≠ m;
   pm(t + 1) = pm(t) + λ⋅(1 − pm(t)).
while(max{pj(t + 1)} ≤ T̂)
END
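A compact Python rendering of CPRP may also help; it is a sketch under the assumptions that the environment is a stationary P-model described by a list of reward probabilities, that β = 1 denotes a reward, and that each action is sampled a few times to initialize the estimates (the function and variable names are ours, not the book's):

    import random

    def cp_rp(reward_probs, lam=0.05, threshold=0.999, init_rounds=10,
              max_steps=100000, seed=0):
        # Continuous Pursuit Reward-Penalty (CPRP) sketch for r actions.
        rng = random.Random(seed)
        r = len(reward_probs)
        p = [1.0 / r] * r                    # action probability vector P(t)
        G = [0] * r                          # times each action has been selected
        H = [0] * r                          # times each action has been rewarded
        d = [0.0] * r                        # deterministic estimates d_i(t)

        def pull(i):                         # environment: beta = 1 is a reward
            return 1 if rng.random() < reward_probs[i] else 0

        for i in range(r):                   # initialize D(0)
            for _ in range(init_rounds):
                G[i] += 1
                H[i] += pull(i)
            d[i] = H[i] / G[i]

        for _ in range(max_steps):
            if max(p) > threshold:           # convergence test on max p_j(t+1)
                break
            i = rng.choices(range(r), weights=p)[0]   # 1. pick an action from P(t)
            beta = pull(i)                             # 2. environmental feedback
            G[i] += 1                                  # 3. update the estimator
            H[i] += beta
            d[i] = H[i] / G[i]
            m = max(range(r), key=lambda j: d[j])      # estimated optimal action
            for j in range(r):                         # 4. pursue alpha_m
                p[j] = p[j] + lam * (1 - p[j]) if j == m else (1 - lam) * p[j]
        return max(range(r), key=lambda j: p[j])

    print(cp_rp([0.7, 0.5, 0.3]))            # usually converges to action index 0

Note that Step 4 decreases every non-optimal component by the factor (1 − λ), so the components still sum to one after each update.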

2.3.3 Stochastic Estimator Learning Automaton


Stochastic estimator LAs, as the name implies, adopt stochastic estimators. SERI [19] is a classic stochastic estimator algorithm that has been proven to be ε-optimal. Therefore, in this section, we utilize the SERI algorithm to introduce stochastic estimator LAs. SERI is defined as a quadruple ⟨A, B, Q, T⟩.
1) A = {𝛼1 , 𝛼2 , ..., 𝛼r } is a set of actions (2 ≤ r < ∞).
2) B = {0, 1}. The environmental response at instant t is denoted by 𝛽(t).
3) Q = ⟨P, 𝔼⟩ is the state set. P = {p1(t), p2(t), ..., pr(t)} is the state of the automaton at instant t, where pi(t) = Pr{α(t) = αi}. 𝔼 is the estimator, and it is defined as 𝔼 = ⟨D̃(t), S̃(t)⟩. D̃(t) = {d̃1(t), d̃2(t), ..., d̃r(t)} is the same as the deterministic estimator's. S̃(t) = {s̃1(t), s̃2(t), ..., s̃r(t)} is a stochastic estimator vector at instant t, and it is defined as:
   s̃i(t) = d̃i(t) + Zi(t),
where Zi(t) is a random variable that is uniformly distributed within (−γ/𝔾i(t), γ/𝔾i(t)).

The SERI algorithm is given as follows.

SERI: Stochastic Estimator Reward–Inaction Algorithm

Parameters and Notation
m: index of the maximal component of S̃(t), s̃m(t) = max_i{s̃i(t)}.
γ: perturbation parameter.
T̂: threshold value.
Δ: step size.
ℍi(t): number of times the ith action has been rewarded up to instant t, i ∈ ℤr.
𝔾i(t): number of times the ith action has been selected up to instant t, i ∈ ℤr.
Input: Number of allowable actions r, learning parameter λ, action set A, environmental response set B, perturbation parameter γ, convergence threshold T̂.
Output: Estimated optimal action αm.
Initialize pi(0) = 1/r, i ∈ ℤr.
Initialize D̃(0) by picking each action a small number of times.
do
1: At instant t, pick an action α(t) according to the probability distribution P(t).
2: Get feedback β(t) from the environment according to action α(t).
3: Update D̃(t) and S̃(t) for the selected action αi = α(t) as follows:
   𝔾i(t) = 𝔾i(t) + 1;
   ℍi(t) = ℍi(t) + β(t);
   d̃i(t) = ℍi(t)/𝔾i(t);
   s̃i(t) = d̃i(t) + Zi(t);
   s̃m(t) = max_i{s̃i(t)}.
4: Update P(t) as follows:
   if β(t) = 1:  pj(t + 1) = max{pj(t) − Δ, 0}, j ≠ m;
                 pm(t + 1) = 1 − Σ_{j≠m} pj(t + 1);
   else:         pj(t + 1) = pj(t), j ∈ ℤr.
while(max{pj(t + 1)} ≤ T̂)
END
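Under the same assumptions as the CPRP sketch above (stationary P-model environment, β = 1 as reward, hypothetical function and variable names of our own), a Python sketch of SERI differs only in the perturbed estimates and in the reward–inaction update:

    import random

    def se_ri(reward_probs, gamma=10.0, delta=0.001, threshold=0.999,
              init_rounds=10, max_steps=200000, seed=0):
        # Stochastic Estimator Reward-Inaction (SERI) sketch for r actions.
        rng = random.Random(seed)
        r = len(reward_probs)
        p = [1.0 / r] * r
        G, H, d = [0] * r, [0] * r, [0.0] * r

        def pull(i):
            return 1 if rng.random() < reward_probs[i] else 0

        for i in range(r):                    # initialize D(0)
            for _ in range(init_rounds):
                G[i] += 1
                H[i] += pull(i)
            d[i] = H[i] / G[i]
        s = d[:]                              # stochastic estimates s_i(t)

        for _ in range(max_steps):
            if max(p) > threshold:
                break
            i = rng.choices(range(r), weights=p)[0]      # 1. pick an action
            beta = pull(i)                                # 2. feedback
            G[i] += 1                                     # 3. update estimators
            H[i] += beta
            d[i] = H[i] / G[i]
            s[i] = d[i] + rng.uniform(-gamma / G[i], gamma / G[i])  # Z_i ~ U(-gamma/G_i, gamma/G_i)
            m = max(range(r), key=lambda j: s[j])
            if beta == 1:                                 # 4. reward-inaction update
                for j in range(r):
                    if j != m:
                        p[j] = max(p[j] - delta, 0.0)
                p[m] = 1.0 - sum(p[j] for j in range(r) if j != m)
        return max(range(r), key=lambda j: p[j])

    print(se_ri([0.7, 0.5, 0.3]))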

2.4 Summary
In this chapter, we first introduce an LA and its components, namely the automaton and the environment, and describe how it works. Then, mathematical terms for describing its nature are introduced, and LAs are classified according to their development. In terms of whether the state transition and output functions of an automaton are deterministic or stochastic, LAs can be divided into deterministic and stochastic automata. Compared with the former, the latter can adapt to environmental conditions by changing their states, converge faster, and achieve adaptive learning. These advantages make stochastic automata more widely studied and used [20–24]. According to whether the state transition and output functions of an automaton change with time, stochastic automata can be divided into fixed and variable structure LAs. In terms of whether there is an estimator and whether the estimator is stochastic, variable structure LAs can be divided into estimator-free, deterministic estimator, and stochastic estimator LAs. This chapter depicts the learning automata with these different characteristics in detail. Figure 2.11 illustrates the LA classification.

Figure 2.11 The classification of LAs: a learning automaton is either a deterministic or a stochastic automaton; stochastic automata have either a fixed or a variable structure; and variable structure LAs are estimator-free, deterministic estimator, or stochastic estimator LAs.



2.5 Exercises
1 What is the mathematical model of learning automata?

2 Please list several examples where learning automata have been applied.

3 What are the components of an automaton?

4 What is the difference among P-model, Q-model, and S-model random environments?

5 Please express how an automaton interacts with the environment and draw
an interaction diagram.

6 What is the difference between deterministic and stochastic automata? Give an example of each of the two automata and draw their corresponding Markov state transition processes.

7 How to measure the performance of an LA?

8 The probability update formulas listed in (2.22)–(2.24) are for two actions. Please extend them to the case of r actions.

9 What is the difference between the fixed structure stochastic LA and the variable structure stochastic LA?

10 Please draw the state transition graphs of Tsetlin LA, Krinsky LA, Krylov LA,
and IJA LA.

11 According to the automaton performance listed in (2.5)–(2.10), please prove that the Krinsky LA is ε-optimal in any deterministic environment.

12 Please list the common combinations of the action probability update principles. What is the difference among them?

13 Please classify variable structure LAs. What are their estimators?

14 What is the main difference between estimator-free and deterministic estimator LAs?

15 In Section 2.3.2, the basic continuous pursuit reward–penalty algorithm (CPRP) is presented. Based on the definitions of the RP, RI, and IP mechanisms and CPRP, please write the pseudocode of the discrete reward–inaction pursuit algorithm: DPRI.

References

1 M. L. Tsetlin, “On the behavior of finite automata in random media,”


Automatika i Telemekhanika, vol. 22, pp. 1345–1354, Oct 1961.
2 M. L. Tsetlin, “Automaton theory and the modeling of biological systems,”
New York: Academic, 1973.
3 R. Iraji, M. T. Manzuri-Shalmani, A. H. Jamalian, and H. Beigy, “IJA
automaton: Expediency and ε-optimality properties,” in 5th IEEE International
Conference on Cognitive Informatics, Beijing, China, pp. 617–622, 2006.
4 A. H. Jamalian, R. Iraji, A. R. Sefidpour, and M. T. Manzuri-Shalmani,
“Examining the ε-optimality property of a tunable FSSA,” in 6th IEEE
International Conference on Cognitive Informatics, Lake Tahoe, CA, USA,
pp. 169–177, 2007.
5 K. S. Narendra and M. A. L. Thathachar, “Learning automata: An introduc-
tion,” Englewood Cliffs, NJ: Prentice-Hall, 1989.
6 V. I. Varshavskii and I. P. Vorontsova, “On the behavior of stochastic automata
with variable structure,” Automatika i Telemekhanika (USSR), vol. 24,
pp. 327–333, 1963.
7 G. McMurtry and K. Fu, “A variable structure automaton used as a multi-
modal searching technique,” IEEE Transactions on Automatic Control, vol. 11,
no. 3, pp. 379–387, 1966.
8 K. Fu and Z. Nikolic, “On some reinforcement techniques and their relation
to the stochastic approximation,” IEEE Transactions on Automatic Control, vol.
11, no. 4, pp. 756–758, 1966.
9 K. Fu, “Stochastic automata as models of learning systems,” Mathematics in Science and Engineering, vol. 66, pp. 393–431, 1970.
10 S. Lakshmivarahan, “Learning algorithms theory and applications,” New York:
Springer-Verlag, 1981.
11 K. Najim and A. S. Poznyak, “Learning automata: Theory and applications,”
New York: Pergamon, 1994.
12 B. J. Oommen and E. R. Hansen, “The asymptotic optimality of discretized
linear reward-inaction learning automata,” IEEE Transactions on Systems, Man,
and Cybernetics, vol. 14, pp. 542–545, May/June 1984.
30 2 Learning Automata

13 B. J. Oommen, “Absorbing and ergodic discretized two-action learning


automata,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 16,
pp. 282–296, May/June 1986.
14 M. A. L. Thathachar and P. S. Sastry, “A class of rapidly converging algorithms
for learning automata,” in Proceedings of the IEEE International Conference on
Cybernetics and Society, Bombay, India, 1984, pp. 602–606.
15 M. A. L. Thathachar and P. S. Sastry, “A new approach to the design of rein-
forcement schemes for learning automata,” IEEE Transactions on Systems,
Man, and Cybernetics, vol. SMC-15, pp. 168–175, Jan/Feb 1985.
16 M. A. L. Thathachar and P. S. Sastry, “Estimator algorithms for learning automata,” in Proceedings of the Platinum Jubilee Conference on System and Signal Processing, Department of Electrical Engineering, Indian Institute of Science, Bangalore, India, December 1986.
17 J. K. Lanctôt and B. J. Oommen, “Discretized estimator learning automata,”
IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, pp. 1473–1483,
Nov/Dec 1992.
18 B. J. Oommen and J. K. Lanctot, “Discretized pursuit learning automata,”
IEEE Transactions on Systems, Man, and Cybernetics, vol. 20, pp. 931–938,
July/Aug 1990.
19 G. I. Papadimitriou, M. Sklira, and A. S. Pomportsis, “A new class of ε-optimal learning automata,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 34, no. 1, pp. 246–254, 2004.
20 J. Zhang, Y. Wang, C. Wang, and M. Zhou, “Fast variable structure stochastic
automaton for discovering and tracking spatiotemporal event patterns,” IEEE
Transactions on Cybernetics, vol. 48, no. 3, pp. 890–903, 2018.
21 Z. Zhang, D. Wang, and J. Gao, “Learning automata-based multiagent rein-
forcement learning for optimization of cooperative tasks,” IEEE Transactions
on Neural Networks and Learning Systems, vol. 32, no. 10, pp. 4639–4652, 2021.
22 S. Sahoo, B. Sahoo, and A. K. Turuk, “A learning automata-based schedul-
ing for deadline sensitive task in the cloud,” IEEE Transactions on Services
Computing, vol. 14, no. 6, pp. 1662–1674, 2021.
23 A. Yazidi, I. Hassan, H. L. Hammer, and B. J. Oommen, “Achieving fair load
balancing by invoking a learning automata-based two-time-scale separation
paradigm,” IEEE Transactions on Neural Networks and Learning Systems,
vol. 32, no. 8, pp. 3444–3457, 2021.
24 X. Zhang, L. Jiao, O.-C. Granmo, and M. Goodwin, “On the convergence of
Tsetlin Machines for the IDENTITY-and NOT operators,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 44, no. 10, pp. 6345–6359,
2022.

Fast Learning Automata

An update scheme of a state probability vector of actions is critical for a learning automaton (LA). The most popular one is the pursuit scheme that pursues
the estimated optimal action and penalizes others. The scheme was continuously
improved in the past decades to accelerate the convergence of LAs. The complexity
of the update scheme of a state probability vector is also critical to LAs’ performance because it increases with the number of actions.
two LAs to accelerate the convergence and computational update of LAs. The first
one achieves significantly faster convergence and higher accuracy than the classical pursuit scheme. The other lowers the computational complexity of updating a
state probability vector to be independent of the number of actions. Together, they
provide a fast learning method for the large-scale online reinforcement learning
tasks of intelligent systems [1–3].

3.1 Last-position Elimination-based Learning Automata

This section first introduces a reverse philosophy opposed to the traditional


pursuit scheme and leads to Last-position Elimination-based Learning Automata
(LELA) [4] where the action graded last in terms of the estimated performance
is penalized by decreasing its state probability and is eliminated when its state
probability is decreased to zero. All active actions, i.e., actions with non-zero state
probability, equally share the penalized state probability from the last-position
action at each iteration. LELA is characterized by (i) “relaxed” convergence
condition for its optimal action, (ii) “accelerated” step size of its state probability
update scheme for the estimated optimal action, and (iii) “enriched” sampling
for its estimated non-optimal actions. Last-position elimination is a widespread
philosophy in the real world and is proved to be also helpful for the update
