Rapid Learning in Robotics
Jörg Walter

Die Deutsche Bibliothek – CIP Data
Walter, Jörg
Rapid Learning in Robotics / by Jörg Walter, 1st ed.
Göttingen: Cuvillier, 1996
Zugl.: Bielefeld, Univ., Diss. 1996
ISBN 3-89588-728-5
Robotics deals with the control of actuators using various types of sensors
and control schemes. The availability of precise sensorimotor mappings
– able to transform between various involved motor, joint, sensor, and
physical spaces – is a crucial issue. These mappings are often highly non-
linear and sometimes hard to derive analytically. Consequently, there is a
strong need for rapid learning algorithms which take into account that the
acquisition of training data is often a costly operation.
The present book discusses many of the issues that are important to make
learning approaches in robotics more feasible. Basis for the major part of
the discussion is a new learning algorithm, the Parameterized Self-Organizing
Map (PSOM), which is derived from a model of neural self-organization. A key
feature of the new method is the rapid construction of even highly non-
linear variable relations from rather modestly-sized training data sets by
exploiting topology information that is not utilized in more traditional ap-
proaches. In addition, the author shows how this approach can be used in
a modular fashion, leading to a learning architecture for the acquisition of
basic skills during an “investment learning” phase, and, subsequently, for
their rapid combination to adapt to new situational contexts.
Foreword
The rapid and apparently effortless adaptation of their movements to a
broad spectrum of conditions distinguishes both humans and animals in
an important way even from today's most sophisticated robots. Algo-
rithms for rapid learning will, therefore, become an important prerequisite
for future robots to achieve a more intelligent coordination of their move-
ments that is closer to the impressive level of biological performance.
The present book discusses many of the issues that are important to
make learning approaches in robotics more feasible. A new learning al-
gorithm, the Parameterized Self-Organizing Map (PSOM), is derived from a model
of neural self-organization. It has a number of benefits that make it par-
ticularly suited for applications in the field of robotics. A key feature of
the new method is the rapid construction of even highly non-linear vari-
able relations from rather modestly-sized training data sets by exploiting
topology information that is unused in the more traditional approaches.
In addition, the author shows how this approach can be used in a mod-
ular fashion, leading to a learning architecture for the acquisition of basic
skills during an “investment learning” phase, and, subsequently, for their
rapid combination to adapt to new situational contexts.
The author demonstrates the potential of these approaches with an im-
pressive number of carefully chosen and thoroughly discussed examples,
covering such central issues as learning of various kinematic transforms,
dealing with constraints, object pose estimation, sensor fusion and camera
calibration. It is a distinctive feature of the treatment that most of these
examples are discussed and investigated in the context of their actual im-
plementations on real robot hardware. This, together with the wide range
of included topics, makes the book a valuable source for both the specialist
and the non-specialist reader with a more general interest in the
fields of neural networks, machine learning and robotics.
Helge Ritter
Bielefeld
Acknowledgment
The presented work was carried out in the connectionist research group
headed by Prof. Dr. Helge Ritter at the University of Bielefeld, Germany.
First of all, I'd like to thank Helge: for introducing me to the exciting
field of learning in robotics, for his confidence when he asked me to build
up the robotics lab, for many discussions which have given me impulses,
and for his unlimited optimism which helped me to tackle a variety of
research problems. His encouragement, advice, cooperation, and support
have been very helpful to overcome small and larger hurdles.
In this context I want to mention and thank as well Prof. Dr. Gerhard
Sagerer, Bielefeld, and Prof. Dr. Sommer, Kiel, for accompanying me with
their advice during this time.
Thanks to Helge and Gerhard for refereeing this work.
Helge Ritter, Kostas Daniilidis, Ján Jokusch, Guido Menkhaus, Christof
Dücker, Dirk Schwammkrug, and Martina Hasenjäger read all or parts of
the manuscript and gave me valuable feedback. Many other colleagues
and students have contributed to this work making it an exciting and suc-
cessful time. They include Jörn Clausen, Andrea Drees, Gunther Heide-
mann, Hartmut Holzgraefe, Ján Jockusch, Stefan Jockusch, Nils Jung-
claus, Peter Koch, Rudi Kaatz, Michael Krause, Enno Littmann, Rainer
Orth, Marc Pomplun, Robert Rae, Stefan Rankers, Dirk Selle, Jochen Steil,
Petra Udelhoven, Thomas Wengereck, and Patrick Ziemeck. Thanks to all
of them.
Last but not least I owe many thanks to my Ingrid for her encouragement
and support throughout the time of this work.
Contents
Foreword . . . . . . . . . . . . ii
Acknowledgment . . . . . . . . . . . . iii
Table of Contents . . . . . . . . . . . . iv
Table of Figures . . . . . . . . . . . . vii
1 Introduction . . . . . . . . . . . . 1
6.3.3 Comparison to Splines . . . . . . . . . . . . 82
6.4 Chebyshev Spaced PSOMs . . . . . . . . . . . . 83
6.5 Comparison Examples: The Gaussian Bell . . . . . . . . . . . . 84
6.5.1 Various PSOM Architectures . . . . . . . . . . . . 85
6.5.2 LLM Based Networks . . . . . . . . . . . . 87
6.6 RLC-Circuit Example . . . . . . . . . . . . 88
6.7 Summary . . . . . . . . . . . . 91
10 Summary . . . . . . . . . . . . 139
Bibliography . . . . . . . . . . . . 146
List of Figures
Illustrations contributed by Dirk Selle [Fig. 2.5], Ján Jockusch [Figs. 2.8, 2.9], and Bernd Fritzke [Fig. 6.8].
Chapter 1
Introduction
was rung always before the dog was fed, the salivation response became
associated with the new stimulus, the acoustic signal. This fundamental
form of associative learning has become known under the name classical
conditioning. At the beginning of this century it was debated whether the
conditioning reflex in Pavlov's dogs was a stimulus–response (S-R) or a
stimulus–stimulus (S-S) association between the perceptual stimuli, here
taste and sound. Later it became apparent that at the level of the nervous
system this distinction fades away, since both cases refer to associations
between neural representations.
The fine structure of the nervous system could be investigated after
staining techniques for brain tissue had become established (Golgi and
Ramón y Cajal). They revealed that neurons are highly interconnected to
other neurons by their tree-like extremities, the dendrites and axons (com-
parable to input and output structures). D.O. Hebb (1949) postulated that
the synaptic junction from neuron A to neuron B was strengthened each
time A was activated simultaneously with, or shortly before, B. Hebb's rule
explained conditioned learning on a qualitative level and has influenced
many later, mathematically formulated learning models. The most
prominent ones are probably the perceptron, the Hopfield model and the Ko-
honen map. They are, among other neural network approaches, character-
ized in chapter 3. It discusses learning from the standpoint of an approx-
imation problem. How to find an efficient mapping which solves the de-
sired learning task? Chapter 3 explains Kohonen's “Self-Organizing Map”
procedure and techniques to improve the learning of continuous, high-
dimensional output mappings.
The appearance and the growing availability of computers became a
further major influence on the understanding of learning aspects. Several
main reasons can be identified:
First, the computer made it possible to isolate the mechanisms of learning
from the wet, biological substrate. This enabled the testing and development of
learning algorithms in simulation.
Second, the computer helped to carry out and evaluate neuro-physiological,
psychophysical, and cognitive experiments, which revealed many more
details about information processing in the biological world.
Third, the computer facilitated bringing the principles of learning to
technical applications. This attracted even more interest and
opened up important resources, which set up a broad interdisci-
istic. The solution, as seen by many researchers, is that “learning must
meet the real world”. Of course, simulation can be a helpful technique,
but needs realistic counter-checks in real-world experiments. Here, the
field of robotics plays an important role.
The word “robot” is young. It was coined in 1920 by the Czech playwright
Karel Čapek and has its roots in the Czech word for “forced labor”. The first
modern industrial robots are even younger: the “Unimates” were developed
by Joe Engelberger in the early 1960s. What is a robot? A robot is
a mechanism which is able to move in a given environment. The main
difference to an ordinary machine is that a robot is more versatile and
multi-functional, and it can be programmed, or commanded, to perform
functions normally ascribed to humans. Its mechanical structure is driven
by actuators which are governed by some controller according to an in-
tended task. Sensors deliver the required feed-back in order to adjust the
current trajectory to the commanded motion and task.
Robot tasks can be specified in various ways: e.g. with respect to a
certain reference coordinate system, or in terms of desired proximities,
or forces, etc. However, the robot is governed by its own actuator vari-
ables. This makes the availability of precise mappings from different sen-
sory variables, physical, motor, and actuator values a crucial issue. Often
these sensorimotor mappings are highly non-linear and sometimes very hard
to derive analytically. Furthermore, they may change in time, i.e. drift by
wear-and-tear or due to unintended collisions. The effective learning and
adaptation of the sensorimotor mappings is of particular importance when
a precise model is lacking or when it is difficult or costly to recalibrate the robot,
e.g. since it may be remotely deployed.
Chapter 2 describes work done for establishing a hardware infrastruc-
ture and experimental platform that is suitable for carrying out experi-
ments needed to develop and test robot learning algorithms. Such a labo-
ratory comprises many different components required for advanced, sensor-
based robotics. Our main actuated mechanical structures are an industrial
manipulator, and a hydraulically driven robot hand. The perception side
has been enlarged by various sensory equipment. In addition, a variety of
hardware and software structures are required for command and control
purposes, in order to make a robot system useful.
The time for gathering training data becomes a major issue. This
includes also the time for preparing the learning set-up. In princi-
ple, the learning solution competes with the conventional solution
developed by a human analyzing the system.
The complexity faced also draws attention to the efficient structuring
of re-usable building blocks in general, and for learning in particular.
the cost of gathering the training data is very relevant as well as the avail-
ability of adaptable, high-dimensional sensorimotor transformations.
Chapters 7 and 8 present several PSOM examples in the vision and the
robotics domain. The flexible association mechanism facilitates applications:
feature completion; dynamic sensor fusion, improving noise rejection;
generating perceptual hypotheses for other sensor systems; various
robot kinematic transformations can be directly augmented to combine
e.g. visual coordinate spaces. This even works with redundant degrees of
freedom, which can additionally comply with extra constraints.
Chapter 9 turns to the next higher level of one-shot learning. Here the
learning of prototypical mappings is used to rapidly adapt a learning sys-
tem to new context situations. This leads to a hierarchical architecture,
which is conceptually linked, but not restricted to the PSOM approach.
One learning module learns the context-dependent skill and encodes
the obtained expertise in a (more-or-less large) set of parameters or weights.
A second meta-mapping module learns the association between the rec-
ognized context stimuli and the corresponding mapping expertise. The
learning of a set of prototypical mappings may be called an investment
learning stage, since effort is invested to train the system for the second,
one-shot learning phase. Observing the context, the system can then
adapt most rapidly by “mixing” the previously obtained expertise. This
mixture-of-expertise architecture complements the mixture-of-experts archi-
tecture (as coined by Jordan) and appears advantageous in cases where
the variations of the underlying model are continuous within the chosen
mapping domain.
Chapter 10 summarizes the main points.
Of course, the full complexity of learning and of real robots remains
unsolved today. The present work attempts to contribute to a few of the
many things that still can be, and must be, improved.
Chapter 2
The Robotics Laboratory
This chapter describes the developed concept and set-up of our robotic
laboratory. It is aimed at the technically interested reader and explains
some of the hardware aspects of this work.
A real robot lab is a testbed for ideas and concepts of efficient and intel-
ligent controlling, operating, and learning. It is an important source of in-
spiration, complication, practical experience, feedback, and cross-validation
of simulations. The construction and operation of the system components are
described, as well as the ideas, difficulties, and solutions which accompanied the
development.
For a fuller account see (Walter and Ritter 1996c).
Two major classes of robots can be distinguished: robot manipulators
operate in a bounded three-dimensional workspace and have a fixed
base, whereas robot vehicles move on a two-dimensional surface – either
on wheels (mobile robots) or on articulated legs intended for walking on
rough terrain. Of course, the two can be mixed, for example manipulators
mounted on a wheeled vehicle, or several finger-like manipulators combined
into a dexterous robot hand.
Figure 2.1: The six-axis Puma robot arm with the TUM multi-fingered hand
fixating a wooden “Baufix” toy airplane. The 6D force-torque sensor (FTS) and
the end-effector-mounted camera are visible, in contrast to the built-in proprioceptive
joint encoders.
Figure 2.2: The Asymmetric Multiprocessing “Road Map”. The main hardware
“roads” connect the heterogeneous system components and lay ground for var-
ious types of communication links. The LAN Ethernet (“Local Area Network”
with TCP/IP and max. throughput 10 Mbit/s) connects the pool of Unix com-
puter workstations with the primary “robotics host” “druide” and the “active
vision host” “argus”. Each of the two Unix SparcStations is bus master of a VME bus
(max. 20 MByte/s, with a 4 MByte/s S-bus link). “argus” controls the active stereo
vision platform and the image processing system (Datacube, with pipeline ar-
chitecture). “druide” is the primary host, which controls the robot manipulator,
the robot hand, the sensory systems including the force/torque wrist sensor, the
tactile sensors, and the second image processing system. The hand sub-system
electronics is coordinated by the “manus” controller, which is a second VME bus
master and also accessible via the Ethernet link. (Boxes with rounded corners
indicate semi-autonomous sub-systems with CPUs enclosed.)
carry the required payload of about 3 kg and which can be turned into an
open, real-time robot, was found with a Puma 560 Mark II robot. It is probably
“the” classical industrial robot, with six revolute joints. Its geometry
and kinematics¹ are the subject of standard robotics textbooks (Paul 1981;
Fu, Gonzalez, and Lee 1987). It can be characterized as a medium-fast
(0.5 m/s straight line), very reliable, robust “work horse” for medium
payloads. The action radius is comparable to the human arm, but the arm is
stronger and heavier (radius 0.9 m; 63 kg arm weight). The Puma Mark II
controller comprises the power supply and the servo electronics for the
six DC motors. They are controlled by six parallel microprocessors and
coordinated by a DEC LSI-11 as central controller. Each joint micropro-
cessor (Rockwell 6503) implements a digital PD controller, correcting the
commanded joint position periodically. The decoupled joint position control
operates at 1 kHz and originally receives command updates (setpoints)
every 28 ms from the LSI-11.
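To make the decoupled joint control concrete, the following sketch shows the structure of one digital PD joint loop. It is only an illustration: the gain values, the toy plant, and all identifier names are assumptions and do not reproduce the actual firmware of the Rockwell 6503 joint processors.

```c
/* Minimal sketch of a decoupled digital PD joint controller loop
 * (illustrative only -- not the actual Unimation 6503 firmware).   */
#include <stdio.h>

typedef struct {
    double kp, kd;        /* proportional and derivative gains      */
    double prev_error;    /* position error of the previous cycle   */
    double dt;            /* control period, e.g. 0.001 s (1 kHz)   */
} pd_joint;

/* One control cycle: returns the motor command for this joint. */
double pd_step(pd_joint *j, double theta_des, double theta_meas)
{
    double error   = theta_des - theta_meas;
    double d_error = (error - j->prev_error) / j->dt;
    j->prev_error  = error;
    return j->kp * error + j->kd * d_error;
}

int main(void)
{
    pd_joint joint = { 50.0, 2.0, 0.0, 0.001 };
    double theta = 0.0, setpoint = 0.5;          /* rad, assumed values */
    for (int k = 0; k < 5; ++k) {
        double u = pd_step(&joint, setpoint, theta);
        theta += 0.001 * u;                      /* toy plant model     */
        printf("cycle %d: command %.3f, angle %.4f\n", k, u, theta);
    }
    return 0;
}
```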
In the standard application the Puma is programmed in the interpreted
language VAL II, which is considered a flexible programming language by
industrial standards. But running on the main controller (LSI-11 proces-
sor), it is not capable of handling high bandwidth sensory input itself (e.g.,
from a video camera) and furthermore, it does not support flexible control
by an auxiliary computer. To achieve a tight real-time control directly by
a Unix workstation, we installed the software package RCI/RCCL (Hay-
ward and Paul 1986; Lloyd 1988; Lloyd and Parker 1990; Lloyd and Hay-
ward 1992).
The acronym RCI/RCCL stands for Real-time Control Interface and Robot
Control C Library. Besides the reprogramming of the robot controller, the
package provides a library of commands for issuing high-level motion commands
in the C programming language. Furthermore, we patched the Sun
operating system OS 4.1 to obtain sufficient real-time capabilities for serving
a reliable control process at up to about 200 Hz. Unix is a multitasking operating
system, sequencing several processes in short time slices. Initially, Unix
was not designed for real-time control, therefore it provides a regular pro-
cess only with timing control on a coarse time scale. But real-time processing
requires that the system reliably responds within a certain time frame.
RCI succeeded here by anchoring the synchronous trajectory control task
¹ Designed by Joe Engelberger, the founder of Unimation, who is sometimes called the father
of robotics. Unimation was later sold to Westinghouse Inc., AEG, and finally to Stäubli.
Figure 2.3: A two-loop control scheme for the mixed force and position control.
The inner, fast loop runs on the joint micro controller within the Puma controller,
the outer loop involves the control task on “druide”.
Fig. 2.3 sketches the two-loop control scheme implemented for the mixed
force and position control of the Puma. The inner, fast loop runs on the
joint micro controller within the Puma controller, the outer loop involves
the control task on the Sun workstation “druide”. The desired position
$X_{des}$ and forces $F_{des}$ are given for a specified coordinate system, here written
as generalized 6D vectors comprising position and orientation in roll, pitch, and yaw
(see also Fig. 7.2 and Paul 1981): $X_{des} = (p_x,\ p_y,\ p_z,\ \ldots)^T$ and the generalized
force $F_{des} = (f_x,\ f_y,\ f_z,\ m_x,\ m_y,\ m_z)^T$. The control law transforms the force
deviation into a desired position. The diagonal selection matrix elements
in $S$ choose force control (if 1) or position control (if 0) for each axis, following
the idea of Cartesian sub-space control². The desired position is
transformed and signaled to the joint controllers, which determine appropriate
motor power commands. The result of the robot-environment interaction,
$F_{meas}$, is monitored by the force-torque sensor measurement and
transformed to the net acting force $F_{trans}$ after gravity compensation.
The guard block checks for specified sensory patterns, e.g., force-torque
ranges for each axis and whether the robot is within a safely marked
workspace volume. Depending on the desired action, a suitable controller
scheme and sets of parameters must be chosen (for example, $S$, gains, stiffness,
safe force/position patterns). Here the efficient handling of, and access to,
parameter sets suitable for run-time adaptation is an important issue.
² Examples of suitable selection matrices are: $S = \mathrm{diag}(0,0,1,0,0,0)$ for a compliant motion
with a desired force in the $z$ direction, or $S = \mathrm{diag}(0,0,1,1,1,0)$ for aligning two flat surfaces
(with surface normal in $z$). A free translation and $z$-rotational follow controller in
Cartesian space can be realized with $S = \mathrm{diag}(1,1,1,0,0,1)$. See (Mason and Salisbury 1985;
Schutter 1986; Dücker 1995).
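As an illustration of the outer loop of Fig. 2.3, the following sketch shows how the diagonal selection matrix $S$ switches each Cartesian axis between force and position control. The simple proportional force-to-position law, the gain value, and all identifier names are assumptions made for this sketch and do not reproduce the actual RCI/RCCL implementation.

```c
/* Sketch of the outer-loop hybrid force/position control law.
 * S[k] = 1 selects force control, S[k] = 0 position control per axis.
 * The proportional force law and the gain value are illustrative
 * assumptions, not the exact implementation.                         */
#define DOF 6

void hybrid_control(const int    S[DOF],       /* selection matrix diagonal   */
                    const double x_des[DOF],   /* desired pose                */
                    const double x_meas[DOF],  /* measured pose               */
                    const double f_des[DOF],   /* desired generalized force   */
                    const double f_trans[DOF], /* measured, gravity-compensated force */
                    double       x_cmd[DOF])   /* pose command to joint loop  */
{
    const double k_force = 0.002;   /* compliance gain [m/N], assumed */
    for (int k = 0; k < DOF; ++k) {
        if (S[k]) {
            /* force-controlled axis: correct pose by the force deviation */
            x_cmd[k] = x_meas[k] + k_force * (f_des[k] - f_trans[k]);
        } else {
            /* position-controlled axis: track the desired pose           */
            x_cmd[k] = x_des[k];
        }
    }
}
```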
Figure 2.4: The end-effector. (left:) Between the arm and the hydraulic hand, the
cylinder-shaped FTS device measures the current 6D force-torque values. The
three finger modules are mounted here symmetrically on the 12-sided regular
prism base. On the left side, the color video camera views the scene from an
end-effector-fixed position. Inside the flat palm, a diode laser is directed along the tool
axis, which allows depth triangulation within the viewing angle of the camera.
For the purpose of studying dexterous manipulation tasks, our robot lab is
equipped with a hydraulic robot hand with (up to) four identical 3-DOF
finger modules, see Fig. 2.4. The hand prototype was developed and built
by the mechanical engineering group of Prof. Pfeiffer at the Technical University
of Munich (“TUM hand”). We received the final hand prototype
comprising four completely actuated fingers, the sensor interface, and the motor
driver electronics. The robot finger's design and mobility resemble
those of the human index finger, but scaled up to about 110 %.
Fig. 2.5 displays the kinematics of one finger. The particular kinematic
mapping (from piston location to joint angles and Cartesian position) of
the cardanic joint configuration is very hard to invert analytically. Selle
(1995) describes an iterative numerical procedure. This sensorimotor map-
ping is a challenging task for a learning algorithm. In section 8.1 we will
take up this problem and present solutions which achieve good accuracy
with a fairly small number of training examples.
Figure 2.7: A control scheme for the mixed force and position control running on
the embedded VME-CPU “manus”. The original robot hand design allows only
indirect estimation of the finger state utilizing a model of the oil system. Certain
kinds of influences, especially friction effects, require extra information sources to
be satisfactorily accounted for – for example tactile sensors, see Sec. 2.3.
Figure 2.8: The sandwich structure of the multi-layer tactile sensor. The FSR
sensor measures normal force and contact center location. The PVDF film sensor
is covered by a thin rubber with a knob structure. The two sensitive layers are
separated by a soft foam layer transforming knob deflection into local stretching
of the PVDF film. By suitable signal conditioning, slippage induced oscillations
can be detected by characteristic spike trains. (c–d:) Intermediate steps in making
the compound sensor.
Fig. 2.8c–d shows the prototype. Since the kinematics of the finger in-
volves a moving contact spot during object manipulation, an important
requirement is the continuous force sensitivity during the rolling motion
Figure 2.9: Recordings from the raw and pre-processed signal of the dynamic
slippage sensor. A flat wooden object is pressed against the sensor and, after
a short rest, drawn away tangentially. By band-pass filtering, the slip signal of
interest can be extracted: The middle trace clearly shows the sudden contact and
the slippage phase. The lower trace shows the force values obtained from the
second sensor.
Fig. 2.9 shows first recordings from the sensor prototype. The raw sig-
nal of the PVDF sensors (upper trace) is bandpass filtered and thresholded.
The obtained spike train (middle trace) indicates the critical, characteristic
signal shapes. The first contact with a flat wood piece induces a short sig-
nal. Together with the simultaneously recorded force information (lower
trace) the interesting phases can be discriminated.
These initial results from the new tactile sensor system are very promising.
We expect to (i) fill the present gap in proprioceptive sensory information
on the oil cylinder friction state and therefore improve finger fine
control; (ii) get fast contact state information for task-oriented low-level
grasp reflexes; and (iii) obtain reliable contact state information for signaling to
higher-level semi-autonomous robot motion controllers.
Chapter 3
Artificial Neural Networks
This chapter discusses several issues that are pertinent for the PSOM algo-
rithm (which is described more fully in Chap. 4). Much of its motivation
derives from the field of neural networks. After a brief historic overview
of this rapidly expanding field we attempt to order some of the prominent
network types in a taxonomy of important characteristics. We then pro-
ceed to discuss learning from the perspective of an approximation prob-
lem and identify several problems that are crucial for rapid learning. Fi-
nally we focus on the so-called “Self-Organizing Maps”, which emphasize
the use of topology information for learning. Their discussion paves the
way for Chap. 4 in which the PSOM algorithm will be presented.
Figure 3.1: (a) The McCulloch-Pitts neuron “fires” (output $y_i = 1$, else 0) if the
weighted sum $\sum_j w_{ij} x_j$ of its inputs $x_j$ reaches or exceeds a threshold $w_i$. If this
binary threshold function is generalized to a non-linear sigmoidal transfer function
$g\!\left(\sum_j w_{ij} x_j - w_i\right)$ (also called activation or squashing function, e.g. $g(\cdot) = \tanh(\cdot)$),
the neuron becomes a suitable processing element of the standard (b) Multi-Layer
Perceptron (MLP). The input values $x_i$ are made available at the “input layer”.
The output of each neural unit is fed forward as input to all neurons of the next
layer. In contrast to the standard or single-layer perceptron, the MLP typically has
one or several so-called hidden layers of neurons between the input and the
output layer.
Fixed versus adaptable network structures As pointed out before, the suit-
able network (model) structure has significant influence on the effi-
ciency and performance of the learning system. Several methods
have been proposed for tackling the combined problem of adapt-
ing the network weights and dynamically deciding on the structural
adaptation (e.g. growth) of the network (additive models). Strategies
on selecting the network size will be discussed later in Sec. 3.6.
For a more complete overview of the field of neural networks we refer
the reader to the literature, e.g. (Anderson and Rosenfeld 1988; Hertz,
Krogh, and Palmer 1991; Ritter, Martinetz, and Schulten 1992; Arbib 1995).
Table 3.1: Creating and refining a model in order to solve a learning task
has various common names in different disciplines.
(iii) choosing the algorithm to find optimal values for the parameters W ;
The subsequent chapter 4 will present the PSOM approach with respect
to (ii)–(iv). Numerous examples for (i) are presented in the later chapters.
The following section discusses several common methods for (ii):
$$F(\mathbf{w}, \mathbf{x}) = \mathbf{w} \cdot \mathbf{x} \qquad (3.2)$$
the dot product with the input $\mathbf{x}$, usually augmented by a constant 1.
This linear regression scheme corresponds to a linear, single-layer network,
compare Fig. 3.1.
Figure 3.2: Two RBF units constitute the approximation model of a function.
The upper row displays the plain RBF approach versus the results of the normalization
step in the lower row. From left to right, three basis radii $\sigma_1 < \sigma_2 < \sigma_3$ illustrate the
smoothing impact of an increasing width-to-distance ratio.
Normalized Radial Basis Function (RBF) networks take the form of a weighted
sum over reference points $\mathbf{w}_i$ located in the input space at $\mathbf{u}_i$:
$$F(\mathbf{w}, \mathbf{x}) = \frac{\sum_i \rho(|\mathbf{x} - \mathbf{u}_i|)\, \mathbf{w}_i}{\sum_i \rho(|\mathbf{x} - \mathbf{u}_i|)} \qquad (3.6)$$
The radial basis function $\rho: \mathbb{R}^+ \to \mathbb{R}^+$ usually decays to zero
with growing argument and is often represented by the Gaussian
bell function $\rho(r) = e^{-(r/\sigma)^2}$, characterized by the width $\sigma$, which is often set
ad hoc to half the mean distance of the base centers. The output values
$\mathbf{w}_i$ are learned supervised. The RBF net can be combined with
a local linear mapping instead of a constant $\mathbf{w}_i$ (Stokbro, Umberger,
and Hertz 1990), as described below. RBF network algorithms which
generalize the constant-sized radii (spheres) to individually adaptable
tensors (ellipsoids) are called “Generalized Radial Basis Function
networks” (GRBF), or “Hyper-RBF” (see Powell 1987; Poggio and
Girosi 1990).
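The following sketch evaluates a normalized RBF network output according to Eq. 3.6 with Gaussian basis functions; the flat array layout and the function names are illustrative assumptions.

```c
/* Sketch of a normalized RBF network output (Eq. 3.6) with Gaussian
 * basis functions rho(r) = exp(-(r/sigma)^2).                        */
#include <math.h>

/* n: number of reference points, dim: input dimension,
 * u: n x dim array of centers u_i, w: n output values (1-D output). */
double nrbf_eval(int n, int dim, const double *u, const double *w,
                 double sigma, const double *x)
{
    double num = 0.0, den = 0.0;
    for (int i = 0; i < n; ++i) {
        double r2 = 0.0;
        for (int k = 0; k < dim; ++k) {
            double d = x[k] - u[i * dim + k];
            r2 += d * d;
        }
        double rho = exp(-r2 / (sigma * sigma));
        num += rho * w[i];
        den += rho;
    }
    return den > 0.0 ? num / den : 0.0;   /* normalization step */
}
```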
Figure 3.3: Distance versus topological distance. Four RBF unit center points $\mathbf{u}_i$
(denoted 1. . . 4) around a test point $\mathbf{x}$ (the circles indicate the width $\sigma$). Accounting
only for the distance $|\mathbf{x} - \mathbf{u}_i|$, the RBF output (Eq. 3.6) weights $\mathbf{u}_1$ stronger
than $\mathbf{u}_4$. Considering the triangles spanned by the points 1,2,3 versus 2,3,4 reveals
that $\mathbf{x}$ lies far outside the triangle 1,2,3, but in the middle of the triangle 2,3,4. Therefore,
with respect to their topological relation, $\mathbf{x}$ can be considered closer to point 4 than to point 1.
Topological Models and Maps are schemes which build dimension-reducing
mappings from a higher-dimensional input space to a low-dimensional
set. A very successful model is the so-called “feature
map” or “Self-Organizing Map” (SOM) introduced by Kohonen (1984)
and described below in Sec. 3.7. In the presented taxonomy the SOM
has a special role: it has a localized knowledge representation where
the location in the neural layer encodes topological information beyond
Euclidean distances in the input space (see also Fig. 3.3). This means
that input signals which have similar “features” will map to neigh-
boring neurons in the network (“feature map”). This topology-preserving
effect works also in higher dimensions (famous examples are
Kohonen's Neural Typewriter for spoken Finnish language, and the se-
mantic map, where the similarity relationships of a set of 16 animals
Figure 3.4: (Left) A meaningful fit to the given cross-marked noisy data. (Right)
Over-fitting of the same data set: It fits well to the training set, but is performing
badly on the indicated (cross-marked) position.
Smoothing and Regularization: Poggio and Girosi (1990) pointed out that
learning from a limited set of data is an ill-posed problem and needs
further assumptions to achieve meaningful generalization capabili-
ties. The most usual presumption is smoothness, which can be formal-
ized by a stabilizer term in the cost function Eq. 3.1 (regularization
theory). The roughness penalty approximations can be written as
retinotopic map in the primary visual cortex (e.g. Obermayer et al. 1990).
Fig. 3.5 shows the basic operation of the Kohonen feature map. The
map is built by an $m$ (usually two) dimensional lattice $A$ of formal neurons.
Each neuron is labeled by an index $\mathbf{a} \in A$ and has a reference vector $\mathbf{w}_a$
attached, projecting into the input space $X$ (for more details, see Kohonen
1984; Kohonen 1990; Ritter et al. 1992).
Figure 3.5: [Schematic of the SOM: the array of neurons $\mathbf{a}$, the best-match node $\mathbf{a}^*$ with reference vector $\mathbf{w}_{a^*}$, and the input $\mathbf{x}$ in the input space $X$.]
The response of a SOM to an input vector $\mathbf{x}$ is determined by the reference
vector $\mathbf{w}_{a^*}$ of the discrete “best-match” node $\mathbf{a}^*$. The “winner”
neuron $\mathbf{a}^*$ is defined as the node which has its reference vector $\mathbf{w}_{a^*}$ closest
to the given input:
$$\mathbf{a}^* = \mathop{\mathrm{argmin}}_{\mathbf{a}' \in A} \|\mathbf{w}_{a'} - \mathbf{x}\|. \qquad (3.9)$$
ticipates in the current learning step (as indicated by the gray shading in
Fig. 3.5).
The network starts with a given node grid $A$ and a random initialization
of the reference vectors. During the course of learning, the width of
the neighborhood bell function $h(\cdot)$ and the learning step size parameter $\varepsilon$
are continuously decreased in order to allow more and more specialization
and fine-tuning of the (then increasingly) individual neurons.
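A minimal sketch of one SOM training step may help to make the cooperative adaptation concrete: the winner node of Eq. 3.9 is searched, and all reference vectors are pulled towards the input, weighted by a Gaussian neighborhood bell around the winner (the standard Kohonen rule). Grid size, parameter schedule, and names are assumptions of this illustration.

```c
/* Sketch of one SOM adaptation step: winner search (Eq. 3.9) followed
 * by the cooperative Kohonen update with a Gaussian neighborhood.    */
#include <math.h>
#include <float.h>

#define GRID 10            /* 10 x 10 node lattice  */
#define DIM   3            /* input space dimension */

static double W[GRID][GRID][DIM];   /* reference vectors w_a */

void som_step(const double x[DIM], double eps, double sigma)
{
    int bi = 0, bj = 0;
    double best = DBL_MAX;

    /* winner search: a* = argmin_a ||w_a - x|| */
    for (int i = 0; i < GRID; ++i)
        for (int j = 0; j < GRID; ++j) {
            double d2 = 0.0;
            for (int k = 0; k < DIM; ++k) {
                double d = W[i][j][k] - x[k];
                d2 += d * d;
            }
            if (d2 < best) { best = d2; bi = i; bj = j; }
        }

    /* cooperative update, weighted by grid distance to the winner a* */
    for (int i = 0; i < GRID; ++i)
        for (int j = 0; j < GRID; ++j) {
            double g2 = (double)((i - bi) * (i - bi) + (j - bj) * (j - bj));
            double h  = exp(-g2 / (2.0 * sigma * sigma));
            for (int k = 0; k < DIM; ++k)
                W[i][j][k] += eps * h * (x[k] - W[i][j][k]);
        }
}
```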
This particular cooperative nature of the adaptation algorithm has im-
portant advantages:
map encoding (i.e. node location in the neural array) are advantageous
when the distribution of stochastic transmission errors is decreasing with
distance to the original data. In case of an error the reconstruction will
restore neighboring features, resulting in a more “faithful” compression.
Ritter showed the strict monotonic relationship between the stimulus
density in the $m$-dimensional input space and the density of the matching
weight vectors. Regions with high input stimulus density $P(\mathbf{x})$ will be
represented by more specialized neurons than regions with lower stimulus
density. For certain conditions the density of weight vectors could be
derived to be proportional to $P(\mathbf{x})^{\alpha}$, with the exponent $\alpha = m/(m+2)$
(Ritter 1991).
$$F(\mathbf{x}) = \mathbf{y}_{a^*}. \qquad (3.11)$$
The next important step to increase the output precision was the intro-
duction of a locally valid mapping around the reference vector. Cleve-
land (1979) introduced the idea of locally weighted linear regression for
uni-variate approximation and later for multivariate regression (Cleve-
land and Devlin 1988). Independently, Ritter and Schulten (1986) devel-
oped the similar idea in the context of neural networks, which was later
coined the Local Linear Map (“LLM”) approach.
Within each sub-region, the Voronoi cell (depicted in Fig. 3.5), the output
is defined by a tangent hyper-plane described by the additional vector (or
matrix) $\mathbf{B}_a$:
$$F(\mathbf{x}) = \mathbf{y}_{a^*} + \mathbf{B}_{a^*} (\mathbf{x} - \mathbf{w}_{a^*}). \qquad (3.12)$$
By this means, a univariate function is approximated by a set of tangents.
In general, the output $F(\mathbf{x})$ is discontinuous, since the hyper-planes do not
match at the Voronoi cell borders.
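A small sketch of the LLM response of Eq. 3.12, assuming the best-match node has already been determined; array shapes and names are illustrative.

```c
/* Sketch of a Local Linear Map (LLM) response (Eq. 3.12): the winner's
 * constant output y_a* plus the locally valid linear correction
 * B_a* (x - w_a*).                                                    */
#define DIN  3   /* input dimension  */
#define DOUT 2   /* output dimension */

void llm_output(const double w_best[DIN],        /* winner input reference */
                const double y_best[DOUT],       /* winner output vector   */
                const double B_best[DOUT][DIN],  /* winner Jacobian matrix */
                const double x[DIN],
                double       y[DOUT])
{
    for (int r = 0; r < DOUT; ++r) {
        y[r] = y_best[r];
        for (int c = 0; c < DIN; ++c)
            y[r] += B_best[r][c] * (x[c] - w_best[c]);  /* tangent plane */
    }
}
```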
Despite the improvement by the LLMs, the discrete nature of the stan-
dard SOM can be a limitation when the construction of smooth, higher-
dimensional map manifolds is desired. Here a “blending” concept is re-
quired, which is generally applicable — also to higher dimensions.
Since the number of nodes grows exponentially with the number of
map dimensions, manageably sized lattices with, say, more than three
dimensions admit only very few nodes along each axis direction. Any
discrete map can therefore not be sufficiently smooth for many purposes
where continuity is very important, as e.g. in control tasks and in robotics.
Chapter 4
The PSOM Algorithm
In this chapter we discuss the Parameterized Self-Organizing Map (“PSOM”)
algorithm. It was originally introduced as the generalization of the SOM
algorithm (Ritter 1993). The PSOM parameterizes a set of basis functions
and constructs a smooth higher-dimensional map manifold. By this means
a very small number of training points can be sufficient for learning very
rapidly and achieving good generalization capabilities.
Figure 4.1: The PSOM's starting position is very much the same as for the SOM
depicted in Fig. 3.5. The gray shading indicates that the index space $A$, which is
discrete in the SOM, has been generalized to the continuous space $S$ in the PSOM.
$S$ is referred to as the parameter space.
PSOM. This is indicated by the grey shaded area on the right side of
Fig. 4.1.
Fig. 4.2 illustrates on the left the $m = 2$ dimensional “embedded manifold”
$M$ in the $d = 3$ dimensional embedding space $X$. $M$ is spanned by the nine
(dot-marked) reference vectors $\mathbf{w}_1 \ldots \mathbf{w}_9$, which lie in a tilted plane
in this didactic example. The cube is drawn for visual guidance only. The
dashed grid is the image under the mapping $\mathbf{w}(\cdot)$ of the (right) rectangular
grid in the parameter manifold $S$.
How can the smooth manifold $\mathbf{w}(\mathbf{s})$ be constructed? We require that the
embedded manifold $M$ passes through all supporting reference vectors $\mathbf{w}_a$
and write $\mathbf{w}(\cdot): S \to M \subset X$:
$$\mathbf{w}(\mathbf{s}) = \sum_{\mathbf{a} \in A} H(\mathbf{a}, \mathbf{s})\, \mathbf{w}_{\mathbf{a}} \qquad (4.1)$$
This means that we need a “basis function” $H(\mathbf{a}, \mathbf{s})$ for each formal node,
weighting the contribution of its reference vector (= initial “training point”)
$\mathbf{w}_{\mathbf{a}}$ depending on the location $\mathbf{s}$ relative to the node position $\mathbf{a}$ and, possibly,
also on all other nodes (however, we drop the dependency
$H(\mathbf{a}, \mathbf{s}) = H(\mathbf{a}, \mathbf{s}, A)$ on the latter in our notation).
Figure 4.2: [Left: the embedded manifold $M$ in the space $X$, spanned by the reference vectors $\mathbf{w}_a$; right: the parameter manifold $S$ and the continuous mapping $\mathbf{w}(\mathbf{s})$.]
Figure 4.3: Three of the nine basis functions $H(\mathbf{a}, \mathbf{s})$ for a 3×3 PSOM with equidistant
node spacing $A = \{0, \frac{1}{2}, 1\} \times \{0, \frac{1}{2}, 1\}$. (left:) $H((0, 0), \mathbf{s})$; (middle:) $H((\frac{1}{2}, \frac{1}{2}), \mathbf{s})$;
(right:) $H((\frac{1}{2}, 1), \mathbf{s})$. The remaining six basis functions are obtained by 90° rotations
around $\mathbf{s} = (\frac{1}{2}, \frac{1}{2})$.
A simple construction of basis functions $H(\mathbf{a}, \mathbf{s})$ becomes possible when
the topology of the given points is sufficiently regular. A particularly
convenient situation arises for the case of a multidimensional rectangular
grid. In this case, the set of functions $H(\mathbf{a}, \mathbf{s})$ can be constructed from
products of one-dimensional Lagrange interpolation polynomials. Fig. 4.3
depicts three (of nine) basis functions $H(\mathbf{a}, \mathbf{s})$ for the $m = 2$ dimensional
example with a 3×3 rectangular node grid $A$ shown in Fig. 4.5. Sec. 4.5 will
give the construction details and report on implementation aspects for the
fast and efficient computation of $H(\mathbf{a}, \mathbf{s})$ and its derivatives.
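For concreteness, the following sketch evaluates the PSOM manifold $\mathbf{w}(\mathbf{s})$ of Eq. 4.1 for a two-dimensional parameter space, building each basis function $H(\mathbf{a}, \mathbf{s})$ as a product of one-dimensional Lagrange polynomials. It uses the straightforward evaluation (the more efficient scheme of Sec. 4.5 is not reproduced here); grid size and names are assumptions.

```c
/* Sketch of evaluating the PSOM manifold w(s) (Eq. 4.1) for an m = 2
 * parameter space with an N x N node grid: H(a,s) is the product of
 * one-dimensional Lagrange interpolation polynomials.                 */
#define N 3      /* nodes per axis            */
#define D 4      /* embedding space dimension */

static double lagrange(const double a[N], int i, double s)
{
    double l = 1.0;
    for (int j = 0; j < N; ++j)
        if (j != i)
            l *= (s - a[j]) / (a[i] - a[j]);
    return l;
}

/* W[i1][i2][k]: reference vectors; A1, A2: per-axis node locations. */
void psom_eval(const double W[N][N][D], const double A1[N],
               const double A2[N], const double s[2], double out[D])
{
    for (int k = 0; k < D; ++k) out[k] = 0.0;
    for (int i1 = 0; i1 < N; ++i1)
        for (int i2 = 0; i2 < N; ++i2) {
            double H = lagrange(A1, i1, s[0]) * lagrange(A2, i2, s[1]);
            for (int k = 0; k < D; ++k)
                out[k] += H * W[i1][i2][k];    /* w(s) = sum_a H(a,s) w_a */
        }
}
```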
Then (ii) use the surface point $\mathbf{w}(\mathbf{s}^*)$ as the output of the PSOM in response
to the input $\mathbf{x}$.
To build an input-output mapping, the standard SOM is often extended
by attaching a second vector $\mathbf{w}^{out}$ to each formal neuron. Here, we generalize
this and view the embedding space $X$ as the Cartesian product of the
input subspace $X^{in}$ and the output subspace $X^{out}$.
Then, $\mathbf{w}(\mathbf{s}^*)$ can be viewed as an associative completion of the input space
components of $\mathbf{x}$ if the distance function dist(·) (in Eq. 4.4) is chosen as the
Euclidean norm applied only to the input components of $\mathbf{x}$ (belonging to
$X^{in}$). Thus, the function dist(·) actually selects the input subspace $X^{in}$,
since for the determination of $\mathbf{s}^*$ (Eq. 4.4) and, as a consequence, of $\mathbf{w}(\mathbf{s}^*)$,
only those components of $\mathbf{x}$ matter that are regarded in the distance metric
dist(·). The mathematical formulation is the definition of a diagonal
projection matrix
$$\mathbf{P} = \mathrm{diag}(p_1, p_2, \ldots, p_d) \qquad (4.6)$$
with diagonal elements $p_k > 0 \;\forall\, k \in I$, and all other elements zero. The set
$I$ is the subset of components of $X$ ($\subset \{1, 2, \ldots, d\}$) belonging to the desired
$X^{in}$. Then, the distance function can be written as
$$\mathrm{dist}(\mathbf{x}, \mathbf{x}') = (\mathbf{x} - \mathbf{x}')^T \mathbf{P}\, (\mathbf{x} - \mathbf{x}') = \sum_{k=1}^{d} p_k (x_k - x'_k)^2 = \sum_{k \in I} p_k (x_k - x'_k)^2. \qquad (4.7)$$
For example, consider a $d = 5$ dimensional embedding space $X$, where
the components $I = \{1, 3, 4\}$ belong to the input space. Only those must
be specified as inputs to the PSOM:
$$\mathbf{x} = \begin{pmatrix} x_1 \\ \cdot \\ x_3 \\ x_4 \\ \cdot \end{pmatrix} \qquad \begin{array}{l}\text{missing components}\\ \text{= desired output}\end{array} \qquad (4.8)$$
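The partial distance metric of Eq. 4.7 reduces to a masked sum of squares; a minimal sketch, with the vector p playing the role of the diagonal of $\mathbf{P}$:

```c
/* Sketch of the partial distance function of Eq. 4.7: only the input
 * components selected by the diagonal projection matrix P contribute.
 * For the d = 5 example of Eq. 4.8 one would set p = {1,0,1,1,0}.     */
double psom_dist(int d, const double p[], const double x[], const double w[])
{
    double sum = 0.0;
    for (int k = 0; k < d; ++k) {
        double diff = x[k] - w[k];
        sum += p[k] * diff * diff;   /* p[k] = 0 masks output components */
    }
    return sum;
}
```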
The next step is the costly part of the PSOM operation: the iterative
“best-match” search for the parameter space location $\mathbf{s}^*$, Eq. 4.4 (see next
section). In our example Eq. 4.8, the distance metric Eq. 4.7 is specified
$$\mathbf{w}(\mathbf{s}^*) = \begin{pmatrix} w_1(\mathbf{s}^*) \\ w_2(\mathbf{s}^*) \\ w_3(\mathbf{s}^*) \\ w_4(\mathbf{s}^*) \\ w_5(\mathbf{s}^*) \end{pmatrix} \longrightarrow \begin{pmatrix} w_1(\mathbf{s}^*) \\ w_2(\mathbf{s}^*) \\ x_3 \\ x_4 \\ w_5(\mathbf{s}^*) \end{pmatrix} =: \mathbf{x}^* \qquad (4.9)$$
when $\mathbf{x}$ varies over a regular 10×10 grid in the plane $x_2 = 0$. Fig. 4.5c
displays a rendering of the associated “completions” $\mathbf{w}(\mathbf{s}^*(\mathbf{x}))$, which form
a grid in $X$.
As an important feature, the distance function dist(·) can be changed
on demand, which allows one to freely (re-)partition the embedding space $X$
into an input subspace and an output subspace. One can, for example, reverse the
mapping direction or switch to other input coordinate systems, using the
same PSOM.
Staying with the previous simple example, Fig. 4.6 illustrates an
alternative use of the PSOM of Fig. 4.5. To complete this
Figure 4.5: a–d: PSOM associative completion or recall procedure ($I = \{1, 3\}$,
$\mathbf{P} = \mathrm{diag}(1, 0, 1)$) for a rectangularly spaced set of 10×10 $(x_1, x_3)$ tuples completed
to $(x_2, x_3)$, together with the original training set of Figs. 4.1, 4.5. (a) the input space in
the $x_2 = 0$ plane, (b) the resulting (Eq. 4.4) mapping coordinates $\mathbf{s}^* \in S$, (c) the completed
data set in $X$, (d) the desired output space projection (looking down $x_1$).
Figure 4.6: [Alternative use of the PSOM of Fig. 4.5 with a different choice of the input subspace.]
0t+1 = t+1 + t : s (4.12)
s s
if the last iteration step resulted in a cost reduction E ( t ; ) the step
s s s
is accepted, t+1 = t ; , and is decreased by a significant factor of,
e.g. 10. Otherwise, if the step leads to an increase of cost, then the step
s
will be repeated with based on an increased . The iteration is stopped
when the step size drops below a desired threshold.
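The step-acceptance logic described above can be sketched as follows; cost() and lm_step() stand in for the PSOM-specific cost $E(\mathbf{s})$ and the Levenberg-Marquardt step computation, and are assumptions of this illustration, not the actual implementation.

```c
/* Sketch of the step-size adaptation used with the Levenberg-Marquardt
 * best-match search: a trial step is accepted only if it lowers the
 * cost E(s); otherwise lambda is increased and the step recomputed.   */
#define M 2                   /* parameter space dimension, assumed */

extern double cost(const double s[M]);                /* E(s)            */
extern void   lm_step(const double s[M], double lambda,
                      double delta[M]);               /* LM step Delta s */

void best_match_search(double s[M], double eps_stop)
{
    double lambda = 0.001;
    for (;;) {
        double delta[M], trial[M], step2 = 0.0;
        lm_step(s, lambda, delta);
        for (int k = 0; k < M; ++k) {
            trial[k] = s[k] - delta[k];
            step2   += delta[k] * delta[k];
        }
        if (step2 < eps_stop * eps_stop)
            break;                                 /* converged           */
        if (cost(trial) < cost(s)) {
            for (int k = 0; k < M; ++k) s[k] = trial[k];
            lambda *= 0.1;                         /* accept, be bolder   */
        } else {
            lambda *= 10.0;                        /* reject, be cautious */
        }
    }
}
```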
The Levenberg-Marquardt algorithm works very satisfactorily and finds the minimum
of $E(\mathbf{s})$ efficiently within a couple of iterations. Our experience
shows a more than one order of magnitude higher speed than the simple
steepest-descent algorithm. The general problem of finding a global
Irrespective of the way in which the initial structure of the PSOM map-
ping manifold was established, the PSOM can be continuously fine-tuned
Figure 4.8: The mapping result of a three-node PSOM before and after learning
by means of Eq. 3.10 with the “+” marked learning examples (generated by
$x_2 = x_1^2 + \eta(0, 0.15)$; normally distributed noise with mean 0 and standard deviation
0.15). The positions of the reference vectors $\mathbf{w}_a$ are indicated by the asterisks.
The same adaptation mechanism can be employed for learning the out-
put components of the reference vectors from scratch, i.e. even in the ab-
sence of initial corresponding output data. As a prerequisite, the input
sub-space $X^{in}$ of the mapping manifold must be spanned in a topological
order to allow the best-match dynamics of Eq. 4.4 to work properly.
in contrast to $\mathbf{w}_{a^*}$ alone in the “locally acting” Kohonen step of Eq. 3.10. But the
alternatively proposed learning rule (Ritter 1993)
$$\Delta\mathbf{w}_a = \varepsilon\, H(\mathbf{a}, \mathbf{s}^*)\, (\mathbf{x} - \mathbf{w}_a) \qquad (4.15)$$
Figure 4.9: In contrast to the adaptation rule Eq. 4.14, the modified rule Eq. 4.15
is unstable here; see text.
Figure 4.10: [The rectangular node grid in the parameter space $S$, with the per-axis node locations ${}^{\mu}a_{i}$ along each axis.]
$$\mathbf{a} = \mathbf{a}_{\mathbf{i}} = ({}^{1}a_{i_1},\ {}^{2}a_{i_2},\ \ldots,\ {}^{m}a_{i_m})^T \in A.$$
In Fig. 4.3 (p. 46) some basis functions of an $m = 2$ dimensional PSOM with
equidistantly chosen node spacing are rendered. Note that the PSOM algorithm
is invariant to any offset of each $S$ axis, and, together with the
iterative best-match finding procedure, it also becomes invariant to rescaling.
Comparative results using (for n > 3) a node spacing derived from the
Chebyshev polynomial are reported in Sec. 6.4.
The first derivative of (4.1) turns out to be surprisingly simple if we
write the product rule in the form
$$\frac{\partial}{\partial x}\, f_1(x) f_2(x) \cdots f_n(x) \;=\; f_1(x) f_2(x) \cdots f_n(x) \sum_{k=1}^{n} \frac{\partial f_k(x)/\partial x}{f_k(x)} \qquad \forall\, f_k(x) \neq 0. \qquad (4.19)$$
$$\frac{\partial}{\partial s}\, l_i(s, A) \;=\; l_i(s, A) \sum_{j=1,\ j \neq i}^{n} \frac{1}{s - a_j} \qquad (4.20)$$
and
$$\frac{\partial}{\partial ({}^{\mu}s)}\, \mathbf{w}(\mathbf{s}) \;=\; \sum_{\mathbf{a} \in A} \mathbf{w}_{\mathbf{a}}\, H(\mathbf{a}, \mathbf{s}) \sum_{j=1,\ j \neq i}^{n} \frac{1}{{}^{\mu}s - {}^{\mu}a_j}. \qquad (4.22)$$
This is correct if $f_k(l_i) \neq 0$, or, in other words, for all $\mathbf{s}$ staying away
from the dashed grid structure in Fig. 4.10; there, one or more of the product
functions $f_k(l_i)$ become zero. Although the vanishing denominator
is canceled by a corresponding zero in the pre-factor (the derivative of a
polynomial is well-behaved everywhere), the numerical implementation
of the algorithm requires special attention in the vicinity of a diminishing
denominator in Eq. 4.20.
One approach is to always treat the smallest term $\|s - a_j\|$ in each $S$-axis
direction separately. This is shown below.
For an implementation of the algorithm, an important point is the efficient
evaluation of the Lagrange factors and their derivatives. Below we
give some hints on how to improve their computation, requiring $O(n)$ operations
instead of $O(n^2)$ operations for the “naive” approach. For the sake
of readability we omit from here on the (upper left) $S$ component index $\mu$
($\in \{1, 2, \ldots, m\}$) and reduce the notation of running indexes $1 \leq j \leq n$ to
the short form “$j$”; extra exclusions $j \neq i$ are written as “$j{-}i$”.
We denote the index of the closest support point (per axis) with an asterisk:
$$i^* := \mathop{\mathrm{argmin}}_{j} \| s - a_j \| \qquad (4.23)$$
$$l_i(s, A) = \begin{cases} Q_i & \text{if } i = i^* \\ Q_i\, d_{i^*} & \text{else} \end{cases}$$
$$\frac{\partial l_i}{\partial s} = \begin{cases} Q_i\, S1_i & \text{if } i = i^* \\ Q_i\, (S1_i\, d_{i^*} + 1) & \text{else} \end{cases}$$
$$\frac{\partial^2 l_i}{\partial s^2} = \begin{cases} Q_i\, (S1_i^2 - S2_i) & \text{if } i = i^* \\ Q_i\, \bigl[(S1_i^2 - S2_i)\, d_{i^*} + 2\, S1_i\bigr] & \text{else} \end{cases}$$
using:
$$d_i := s - a_i, \qquad C_i := \prod_{j-i} (a_i - a_j) = \text{const}, \qquad Q_i := \frac{1}{C_i} \prod_{j-i-i^*} d_j,$$
$$S1_i := \sum_{j-i-i^*} \frac{1}{d_j}, \qquad S2_i := \sum_{j-i-i^*} \left(\frac{1}{d_j}\right)^{2}. \qquad (4.24)$$
The interim quotients $Q_i$ and sums $S1_i$ and $S2_i$ are efficiently generated
by defining the master product $Q^* := \prod_{j-i^*} d_j$ (and the corresponding master
sums $S1^*$, $S2^*$ over $j{-}i^*$) and working out the specific terms via “synthetic division”
and “synthetic subtraction” for all $i \neq i^*$:
$$Q_i = \frac{Q^*}{C_i\, d_i}, \qquad S1_i = S1^* - \frac{1}{d_i}, \qquad \text{and} \qquad S2_i = S2^* - \frac{1}{d_i^{\,2}}. \qquad (4.25)$$
Computing $H(\mathbf{a}, \mathbf{s})$ and its derivatives is now a matter of collecting the
proper pre-computed terms:
$$H(\mathbf{a}, \mathbf{s}) = \prod_{\mu=1}^{m} l_{i_\mu}({}^{\mu}s, {}^{\mu}\!A)$$
$$\frac{\partial}{\partial ({}^{\nu}s)}\, H(\mathbf{a}, \mathbf{s}) = \prod_{\mu=1}^{m} \begin{cases} \dfrac{\partial}{\partial ({}^{\nu}s)}\, l_{i_\mu}({}^{\mu}s, {}^{\mu}\!A) & \text{if } \mu = \nu \\[1ex] l_{i_\mu}({}^{\mu}s, {}^{\mu}\!A) & \text{else} \end{cases} \qquad (4.26)$$
$$\frac{\partial^2}{\partial ({}^{\nu}s)\, \partial ({}^{\nu'}s)}\, H(\mathbf{a}, \mathbf{s}) = \prod_{\mu=1}^{m} \begin{cases} \dfrac{\partial^2}{\partial ({}^{\nu}s)^2}\, l_{i_\mu}({}^{\mu}s, {}^{\mu}\!A) & \text{if } \mu = \nu = \nu' \\[1ex] l_{i_\mu}({}^{\mu}s, {}^{\mu}\!A) & \text{if } \mu \neq \nu \text{ and } \mu \neq \nu' \\[1ex] \dfrac{\partial}{\partial ({}^{\nu}s)}\, l_{i_\mu}({}^{\mu}s, {}^{\mu}\!A) & \text{else} \end{cases}$$
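A sketch of the master-product idea for one axis: all Lagrange factors $l_i(s)$ are obtained in $O(n)$ by keeping the near-zero term $d_{i^*}$ of the closest node separate, so that no division by a vanishing value occurs. Names and the precomputation of the constants $C_i$ are assumptions of this illustration.

```c
/* Sketch of the O(n) evaluation of all one-dimensional Lagrange factors
 * l_i(s) along one S-axis, following the "master product" idea of
 * Eq. 4.23-4.25: the near-zero term d_{i*} of the closest node is kept
 * separate so that no division by a vanishing value occurs.            */
#include <math.h>

#define NMAX 16   /* assumes n <= NMAX */

/* a[n]: node positions; C[n]: precomputed C_i = prod_{j!=i}(a_i - a_j) */
void lagrange_all(int n, const double a[], const double C[],
                  double s, double l[])
{
    double d[NMAX];
    int istar = 0;

    for (int i = 0; i < n; ++i) {
        d[i] = s - a[i];                              /* d_i := s - a_i  */
        if (fabs(d[i]) < fabs(d[istar])) istar = i;   /* closest node i* */
    }

    /* master product over all j != i* (contains no small factor) */
    double P = 1.0;
    for (int j = 0; j < n; ++j)
        if (j != istar) P *= d[j];

    for (int i = 0; i < n; ++i) {
        if (i == istar)
            l[i] = P / C[i];                       /* l_{i*} = Q_{i*}       */
        else
            l[i] = (P / d[i]) * d[istar] / C[i];   /* "synthetic division"  */
    }
}
```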
From here the coefficients of the linear system Eq. 4.13 can be assembled
4.6 Summary
The key points of the “Parameterized Self-Organizing Map” algorithm are:
Independently of how the initial structure was found, further on-
line fine-tuning is facilitated by an on-line error minimization proce-
dure. This adaptation is required, for example, in the case of coarsely
sampled or noise-corrupted data, as well as in cases, where the learn-
ing system has to adapt to drifting, or suddenly changing mapping
tasks.
Chapter 5
Characteristic Properties by Examples
sional embedding space $X$. The first two components span the input space
$X^{in} = X_{12}$, the last two the output space $X^{out} = X_{34}$. The subscript
indicates here the intended projection of the embedding space.
Fig. 5.1a shows the reference vectors drawn in the input sub-space $X_{12}$
and Fig. 5.1b in the output sub-space $X_{34}$ as two separate projections.
The node spacing $\mathbf{a} \in A$, which is not visible here, is equidistantly chosen in this
and the following examples. For the following graphs, the embedding space
contains an $m$-dimensional sub-space $X_s$ where the training vector components
are set equal to the internal node locations in the rectangular set $A$. The
PSOM completion in $X_s$ is then equivalent to the internal mapping manifold
location $\mathbf{s}$; here $X_s = X_{12}$ in Fig. 5.1a+c. Since the PSOM approach
makes it easy to augment the embedding space $X$, this technique is generally
applicable to visualize the embedding manifold $M$: a rectangular
test-grid in $X_s$ is given as input to the PSOM and the resulting completed
$d$-dimensional grid $\{\mathbf{w}(\mathbf{s})\}$ is drawn in the desired projection.
Fig. 5.1c displays the 15×15 rectangular test grid and Fig. 5.1d, on the right,
its image, the output space projection $X_{34}$. The graph shows how the
embedding manifold nicely picks up the curvature information found in the
connected training set of Fig. 5.1. The edges between the training data
points $\mathbf{w}_a$ are the visualization of the essential topological information
drawn from the assignment of the $\mathbf{w}_a$ to the grid locations $\mathbf{a}$.
Fig. 5.2 shows what a 2×2 PSOM can do with only four training vectors
$\mathbf{w}_a$. The embedding manifold $M$ now non-linearly interpolates between
the given support points (which are the same corner points as in the previous
figure). Please note that the mapping is already non-linear, since
parallel lines are not kept parallel. Any bending of $M$ has disappeared. To
capture the proper “curvature” of the transformation, one should have at
least one further point between the two endpoints of a mapping interval.
Figure 5.3: Isometric projection of the $d = 3$, $m = 2$ dimensional manifold $M$. The
2×2 PSOM manifold spans like a soap film over the four cornering reference vectors $\mathbf{w}_a$.
Eq. 5.1 maps the unit cube $[-1, 1]^3$ into a barrel-shaped region, shown in
Fig. 5.5. The first four plots in the upper row illustrate the mapping if the
Figure 5.5: The jittery barrel mapping (Eq. 5.1; $X_{123}$: (a, e, g, h) and $X_{456}$: (b, c, d, f)
projections). The training data set is shown asterisk-marked in the $X_{123}$ (a) and
$X_{456}$ (b) projections, and the mapping manifold $M$ ($m = 3$) as a surface grid plot in
(c). To reveal the internal structure of the mapping inside the barrel, a “filament”
picture is drawn by the vertical lines and the horizontal lines connecting only the
points of the 10×5×10 grid in the top and bottom layer (d).
(e)(f) If the samples are not taken on a regular grid in $X_{123}$ but with a certain jitter,
the PSOM is still able to perform a good approximation of the target mapping:
(g) shows the image of the data set (d) taken as input. The plot (h) draws the
difference between the result of the PSOM completion and the target value as
lines.
PSOM training samples are taken from the rectangular grid of the asterisk
markers depicted in Fig. 5.5ab.
The succeeding plots in the lower row present the situation where the
PSOM only learned the randomly shifted sampling positions of Fig. 5.5e–f.
The mapping result is shown in the rightmost two plots: the 3×3×3 PSOM
can reconstruct the goal mapping fairly well, illustrating that there is no
necessity of sampling PSOM training points on any precise grid structure.
Here, the correspondence between $X$ and $S$ is weakened by the sampling,
but the topological order is still preserved.
Figure 5.7: “Topological defects” caused by swapping two training vectors: (a–b) the 2×2
PSOM of Fig. 5.2 and (c–d) the 3×3 PSOM of Fig. 5.1.
Note that the node points are still correctly mapped, as one can expect
from Eq. 4.2, but in the inter-node areas the PSOM does not generalize
well. Furthermore, if the opposite mapping direction is chosen, the PSOM
has in certain areas more than one unique best-match solution $\mathbf{s}^*$. The
result, found by Eq. 4.4, will depend on the initial start point $\mathbf{s}_{t=0}$.
Can we algorithmically test for topological defects? Yes, to a certain
extent. Bauer and Pawelzik (1991) introduced a method to compare the
“faithfulness” of the mapping from the embedding input space to the parameter
space. The topological, or “wavering”, product gives an indication
of the presence of topological defects, as well as of a too small or too large
mapping manifold dimensionality.
As already pointed out, the PSOM draws on the curvature information
obtained from the topological order of the training data set. This information
is visualized by the connecting lines between the reference vectors $\mathbf{w}_a$ of
neighboring nodes. How important this relative order is, is emphasized
by the effect shown when the proper order is missing, as seen in Fig. 5.7.
values are “far away”, which leads to the advice: be suspicious if the best-match
$\mathbf{s}^*$ is found far outside the given node-set $A$.
Depending on the particular shape of the embedding manifold M , an
unfortunate gradient situation may occur in the vicinity of the border
training vectors. In some bad cases the local gradient may point to another
local minimum far outside, producing a misleading completion re-
sult. Here the following heuristic proved useful:
In case the initial best-match node $\mathbf{a}^*$ (Sect. 4.3) has a marginal surface
position in $A$, the minimization procedure Eq. 4.4 should be started at a
shifted position
$$\mathbf{s}_{t=0} = \mathbf{a}^* + \Delta\mathbf{a}_{\perp}. \qquad (5.2)$$
The start-point correction $\Delta\mathbf{a}_{\perp}$ is chosen to move the start location inside
the node-set hyper-box, perpendicular to the surface. If $\mathbf{a}^*$ is an edge or
corner node, each surface normal contributes to $\Delta\mathbf{a}_{\perp}$. The shift length is
uncritical: one third of the node-set interval, but at most one inter-node
distance, is reasonable. This start-point correction is computationally negligible
and helps to avoid critical border gradient situations, which could
otherwise lead to another, undesired remote minimum of Eq. 4.4.
Over-specified Input: Consider the case where the specified input sub-space
$X^{in}$ over-determines the best-match point in the parameter
manifold $S$. This happens if the dimensionality of the input space
is higher than that of the parameter manifold $S$: $\dim(X^{in}) = |I| > m$.
Fig. 5.9 illustrates this situation with an ($m = 1$) one-dimensional
PSOM and displays the two input space dimensions $X^{in}$ together
Figure 5.9: [Over-specified input: a sequence of inputs $\mathbf{x}$ (dotted line) leads to a “jump” in the resulting best-match $\mathbf{s}^*$ and the corresponding completion $\mathbf{w}(\mathbf{s}^*)$.]
with the projection of the embedding manifold $M$. Assume that
the sequence of presented input vectors $\mathbf{x}$ (2D!) varies along the indicated
dotted line from left to right. The best-match location $\mathbf{w}(\mathbf{s}^*)$,
determined as the closest point to $\mathbf{x}$, moves up the arch-shaped
embedding manifold $M$. At a certain point, it will jump to the other
branch, obviously exhibiting a discontinuity in $\mathbf{s}^*$ and the desired
association $\mathbf{w}(\mathbf{s}^*)$.
Multiple Solutions: The next example, Fig. 5.10, depicts the situation $|I| =
m = 1$. A one-dimensional four-node PSOM is employed for the
approximation of the mapping $x_1 \mapsto x_2$. The embedding manifold
$M \subset X = \mathbb{R}^2$ is drawn, together with the reference vectors $\mathbf{w}_a$.
three compatible solutions fulfilling Eq. 4.4, which is a bifurcation
with respect to the shift operation and a discontinuity with respect
to the mapping $x_1 \mapsto x_2$.
In view of the pure $x_1$ projection, the final stage could be interpreted
as a “topological defect” (see Sec. 5.4). Obviously, this consideration is
relative and depends very much on further circumstances, e.g. information
embedded in further $X$ components.
5.7 Summary
The construction of the parameterized associative map using approxima-
tion polynomials shows interesting and unusual mapping properties. The
high-dimensional multi-directional mapping can be visualized with the help
of test-grids, shown in several construction examples.
The structure of the prototypical training examples is encoded in the
topological order, i.e. the correspondence to the locations $\mathbf{a}$ in the mapping
manifold $S$. This is the source of curvature information utilized by
the PSOM to embed a smooth continuous manifold in $X$. However, in
certain cases input-output mappings are non-continuous. The particular
manifold shape, in conjunction with the associative completion and its optional
partial distance metric, allows the selection of sub-spaces which exhibit
multiple solutions. As described, the choice of approximation polynomials (Sec. 4.5)
as the PSOM basis function class bears the particular advantage
of multi-dimensional generalization. However, it limits the PSOM
approach in its extrapolation capabilities. In the case of a low-dimensional
input sub-space, further solutions may occur which are compatible with the
given input. Fortunately, they can easily be discriminated by their
remote location $\mathbf{s}^*$.
Chapter 6
From the previous examples, we clearly see that in general we have to ad-
dress the problem of multiple minima, which we combine with a solution
to the problem of local minima. This is the subject of the next section.
In the following, section 6.2 describes a way of employing the multi-
way mapping capabilities of the PSOM algorithm for additional purposes,
e.g. in order to simultaneously comply to auxiliary constraints given to
resolve redundancies.
If an increase in mapping accuracy is desired, one usually increases the
number of free parameters, which translates in the PSOM method to more
training points per parameter axis. Here we encounter two shortcomings
with the original approach:
Both aspects motivate two extensions to the standard PSOM approach: the
“Local-PSOMs” and the “Chebyshev-spaced PSOM”, which are the focus
of the Sec. 6.3 and 6.4.
W4
a) M
X_1 -> X_2
W_a
b) M
X_1 -> X_2
W_a
x2 W3 x2 W3
W2 W2
W1 s) x1 W1 x1
W(
Figure 6.1: The problem of local and multiple minima can be solved by the
multi-start technique. The solid curve shows the embedded one-dimensional
(m = 1) PSOM manifold, spanned by the four asterisks marked reference vectors
fw w w w g
1 2 3 4 in IR . The dashed line connects a set of diamont-marked PSOM
2
!
mappings x1 x2 .
(a) A pathological situation for the standard approach: depending on the starting
s
location t=0 , the best-match search can be trapped in a local minimum.
(b) The multi-start technique solves the task correctly and can be employed to
find multiple solutions.
reoccurs except for the middle region and the right side, close to the last
reference point.
First, we want to note that the problem can be detected here by mon-
itoring the residual distance dist() (respectively the cost function E ( )) s
which is here the horizontal distance of the found completion (close to 3 ) w
x
and the input .
Second, this problem can be solved by re-starting the search: suitable
restart points are the distance-ranked list of node locations found in the a
first place (e.g. the 10th points probes the start locations at node 3,2,1,4).
The procedure stops, if a satisfying solution (low residual cost function)
or a maximum number of trials is reached. Fig. 6.1b demonstrates this
multi-start procedure and depicts the correctly found solutions.
In case the task is known to involve a consecutive sequence of query
points, it is perfectly reasonable to place the previous best-match location
s
at the head position of the list of start locations.
Furthermore, the multi-start technique is also applicable to find mul-
tiple best-match solutions. However, extra effort is required to find the
complete list of compatible solutions. E.g. in the middle region of the de-
picted example Fig. 6.1, at least two of the three solutions will be found.
X
d
E (s) = 12 dist (x w(s)) = 12 pk xk ; wk (s)]2 :
k=1
x3 x3
x3
x2 x2
s2
x2 x1 s1 x2 x1
input vector
a) best matching knot b) c) d)
Figure 6.2: a–d: The Local-PSOM procedure. The example task of Fig. 4.1, but
this time using a 3 3 local PSOM of a 7 7 training set. (a–b) The input vector
a
(x2 x3) selects the closest node (in the now specified input space). The asso-
ciated 3 3 node sub-grid is indicated in (c). The minimization procedure starts
s a
at its center = and uses only the PSOM constructed from the 3 3 sub-grid.
ws
(d) displays the mapping result ( ) in X , together with the selected sub-grid
of nodes in orthonormal projection. The light dots indicate the full set of training
nodes. (For the displayed mapping task, a 2 2 PSOM would be appropriate; the
7 7 grid is for illustrative purpose only.)
Fig. 6.2a–d explains the procedure in the the context of the previous
simple cube scenario introduced in Sec. 4.1. One single input vector is
given in the (x2 x3) plane (left cube side), shown also as a projection on
w
the left Fig. 6.2a. The reference vector a that is closest to the current
x
input will serve as the center of the sub-grid. The indicated 3 3 node
grid is now used as the basis for the completion by the Local-PSOM.
Continuity is a general problem for local methods and will be dis-
cussed after presenting an example in the next section.
80 Extensions to the Standard PSOM Algorithm
a) b) c)
target train set n=5
d) e) f)
n’=2 n’=3 n’=4
Figure 6.3: The Local-PSOM approach with various sub-grid sizes. Completing
the 5 5 sample set (b) Gaussian bell function (a) with the local PSOM approach
using sub-grid sizes n0 n0 , with n0 = 2 3 and 4; see text.
Fig. 6.3c shows how the full set PSOM completes the embedding man-
w
ifold M . The marginal oscillations in between the reference vectors a are
a product of the polynomial nature of the basis functions. Fig. 6.3d-e is the
image of the 22, 33, and the 44 local PSOM.
tions
The choice of the sub-grid needs some more consideration. For example,
0
s
in the case n = 2 the best-match should be inside the interval swanned
6.3 The Local-PSOM 81
by the two selected nodes. This requires to shift the selected node window,
s
if is outside the interval. This happens e.g. when starting at the best-
a
match node , the “wrong” next neighboring node is considered first (left
instead of right).
Fig. 6.3d illustrates that the resulting mapping is continuous – also along
edge connecting reference vectors. Because of the factorization of the basis
functions the polynomials are continuous at the edges, but the derivatives
perpendicular to the edges are not, as seen by the sharp edges. An analo-
gous scheme is also applicable for all higher even numbers of nodes n . 0
What happens for odd sub-grid sizes? Here, a central node exists and
a
can be fixated at the search starting location . The price is that an input,
continuously moving from one reference vector to the next will experience
halfway that the selected sub-grid set changes. In general this results in a
discontinuous associative completion, which can be seen in Fig. 6.3e which
coincides with Fig. 6.4a).
a) b) c)
Figure 6.4: Three variants to select a 3 3 sub-grid (in the previous Problem
Fig. 6.4) lead to different approximations: (a) the standard fixed sub-grid selection
and (b)(c) the continuous but asymmetric mappings, see text.
Despite the symmetric problem stated, case Fig. 6.4b and Fig. 6.4c give
asymmetric (here mirrored) mapping results. This can be understood if
looking at the different 3 3 selections of training points in Fig. 6.3b. The
round shoulder (Fig. 6.4b) is generated by the peak-shaped data sub-set,
which is symmetric to the center. On the other hand, the “steep slope part”
bears no information on the continuation at the other side of the peak.
82 Extensions to the Standard PSOM Algorithm
tical to the multidimensional extension of the linear splines, see Fig. 6.3d.
Each patch resembles the m-dimensional “soap-film” illustrated in Fig. 5.3
and 5.4.
An alternative was suggested by Farmer and Sidorowich (1988). The
idea is to perform a linear simplex interpolation in the jI j–dimensional
input space X in by using the jI j+1 (with respect to the input vector ) x
closest support points found, which span the simplex containing Px . This
lifts the need of topological order of the training points, and the related
restriction on the number of usable points.
For n = 3 and n = 4 the local PSOM concept resembles the quadratic
0 0
and the cubical spline, respectively. In contrast to the spline concept that
uses piecewise assembled polynomials, we employ one single, dynami-
cally constructed interpolation polynomial with the benefit of usability for
multiple dimensions m.
For m = 1 and low d the spline concept compares favorably to the poly-
nomial interpolation ansatz, since the above discussed problem of asym-
metric mapping does not occur: at each point 3 (or 4, respectively) polyno-
mials will contribute, compared with one single interpolation polynomial
in a selected node sub-grid, as described.
For m = 2 the bi-cubic, so-called tensor-product spline is usually com-
puted by row-wise spline interpolation and a column spline over the row
interpolation results (Hämmerlin and Hoffmann 1991). The procedure is
computationally very expensive and has to be independently repeated for
6.4 Chebyshev Spaced PSOMs 83
each component d.
For m > 2 the tensor splines are simple to extend in theory, but in prac-
tice they are not very useful (Friedman 1991). Here, the PSOM approach
can be easily extended as shown for the case m = 6 later in Sec. 8.2.
0.5
-1 -0.5 0.5 1
0.5
-1 -1
s1
Figure 6.5: (left:) The Chebyshev polynomial T10 (Eq. 6.2). Note the increased
density of zeros towards the approximation interval bounds .
(center:) T5 below a half circle, which is cut into five equal pieces. The base pro-
jection of each arc mid point falls at a zero of T5 (Eq. 6.3.)
(right:) Placement of 5 10 nodes A2S , placed according to the zeros aj of the
Chebyshev polynomial T5 and T10.
In this section the Gaussian bell curve serves again as test function. Here
we want to compare the mapping characteristics between the PSOM ver-
sus the Local-Linear-Map (LLM) approach, introduced in chapter 3.
6.5 Comparison Examples: The Gaussian Bell 85
0.3 1
equidistant spacing 2x2
Chebyshev spacing 3x3
0.25 4x4
0.1 equidistant spacing, full set
Chebyshev spacing, full set
Deviation (nrms)
Deviation (nrms)
0.2
0.15 0.01
0.1
0.001
0.05
0 0.0001
3 4 5 6 8 10 12 3 4 5 6 8 10 12
Number of Knots ^2 Number of Training Knots per Axes
Figure 6.6: a–b; Mapping accuracy of the Gaussian bell function for the pre-
sented PSOM variants – in a linear (left) and a logarithmic plot (right) – versus n
the number of training points per axes.
To compare the local and Chebyshev spaced PSOMs we return to the Gaus-
sian bell function of Eq. 6.1 with = 0:5 chosen to obtain a “medium
sharp” curved function. Using the same nn (in x1 x2) equidistantly sam-
pled training points we compute the root mean square deviation (RMS)
between the goal mapping and the mapping of (i) a PSOM with equidis-
tantly spaced nodes, (ii) local PSOMs with sub-grid sizes n = 2, 3, 4 (sub- 0
grids use equidistant node spacing here as well), and (iii) PSOMs with
Chebyshev spaced nodes.
Fig. 6.6 compares the numerical results (obtained with a randomly cho-
sen test set of 500 points) versus n. All curves show an increasing mapping
accuracy with increasing number of training points. However, for n > 3
the Chebyshev spaced PSOM (iii) shows a significant improvement over
the equidistant PSOM. For n = 3, the PSOM and the C-PSOM coincide,
since the two node spacings are effectively the same (the Chebyshev poly-
nomials are always symmetric to zero and here equidistant as well.)
In Fig. 6.6 the graphs show at n = 5 the largest differences. Fig. 6.7
displays four surface grid plots in order to distinct the mapping perfor-
mances. It illustrates the 5 5 training node set, a standard PSOM, a C-
PSOM, and a (2-of-5)2 L-PSOM.
86 Extensions to the Standard PSOM Algorithm
1
0.5
-1 0
-0.5 -0.5
0
0.5 -1
1
Figure 6.7: a–d; PSOM manifolds with a 5 5 training set. (a) 5 5 training points
are equidistantly sampled for all PSOMs; (b) shows the resulting mapping of the
local PSOM with sub-grid size 2 2. (c) There are little overshoots in the marginal
mapping areas of the equidistant spaced PSOM (i) compared to (d) the mapping
of the Chebyshev-spaced PSOM (ii) which is for n = 5 already visually identical
to the goal map.
6.5 Comparison Examples: The Gaussian Bell 87
Training 3 units
data:
15 units 65 units
Figure 6.8: Gaussian Bell approximated by LLM units (from Fritzke 1995). A fac-
tor of 100 more training points were used - compared to the PSOM in the previous
Fig. 6.7.
Fig. 6.8 shows the approximation result if using 3, 15, and 65 units.
Since each unit is equipped with a parameterized local linear mapping
(hyper-plane), comprising 2+3 (input+output) adaptable variables, the ap-
proximations effectively involves 15, 75, and 325 parameters respectively.
For comparison, each PSOM node displayed in Fig. 6.7 uses only 2+1 non-
constant parameters times 25 nodes (total 75), which makes it comparable
to the LLM network with “15 units”.
The pictures (Fig. 6.8) were generated by and reprinted with permis-
sion from Fritzke (1995). The units were allocated and distributed in the
input space using a additive “Growing Neural Gas” algorithm (see also
Sec. 3.6). Advantageously, this algorithm does not impose any constraints
on the topology of the training data.
Since this algorithm does not need any topological information, it can-
88 Extensions to the Standard PSOM Algorithm
the required training data set for the LLM network is much larger,
here 2500 points versus 25 data for the PSOM network (factor 100);
the PSOM can employ the self-organizing adaption rule Eq. 3.10 on
a much smaller data set (25), or it can be instantly constructed if the
sampling structure is known apriori (rapid-learning);
U
I
Figure 6.9: Schematic diagram
R C L
of the alternating current series
RLC - circuit.
U = U0 sin 2ft:
The resulting current I that flows through the circuit is also a sinusoidal
with the same frequency,
2fL ; 2fC
1
= (R L C f ) = atan : (6.5)
R
Following Friedman (1991), we varied the variables in the range:
0 100 ]
R
0 1 [H]
L
1 11 F]
C
20 280[ Hz]
f
which results in a impedance range Z 2 50 8000] and the phase lag
between 90 .
The PSOM training sets are generated by active sampling (here com-
puting) the impedance Z and for combination of one out of n resistors
values R, one (of n) capacitor values C , one (of n) inductor values L, at
n different frequencies f . As the embedding space we used the d = 6
x
dimensional space X spanned by the variables = (R L C f Z ).
Table Tab. 6.1 shows the root mean square (RMS and NRMS) results
of several PSOM experiments when using Chebyshev spaced sampling
and nodes placement. Within the same parameter regime Friedman (1991)
reported results achieved with the MARS algorithm. He performs a hyper-
rectangular partitioning of the task variable space and the fit of uni- and
90 Extensions to the Standard PSOM Algorithm
Z 1500 Z
8000
7000 1000
6000
5000
4000
3000 500
2000
1000
0 0
1
250
200
0.5 0 150
50 100
100
150
f[Hz] L[H}50 50
200 0 L[H] C[uF]
250 100
1800 Z Z
1700 7000
6000
1600 5000
1500 4000
1400 3000
2000
1300 1000
1200 0
-1000
10 10
0 5 50 5
100
R 50 C[uF]
150
f[Hz] 200 C[uF]
100 250
Figure 6.10: Isometric surface plots of 2 D cuts of the RLC-circuit mapping task
of one single PSOM (contour lines projected on the base). All visualize the same
4 D continuous PSOM mapping manifold, here in the d = 6 embedding space X .
6.7 Summary 91
bi-variate splines within these regions. His best results for 400 data points
were reported as a NRMS of 0.069 for the impedance value Z .
In contrast the PSOM approach constructs (here) a 4-dimensional pa-
rameterized manifold throughout. Fig. 6.10 visualizes this underlying map-
ping manifold in 3 D isometric views of various 2 D slices. Shown are
Z0
1 F (f L), Z0:5 H 6 F (L C ), Z1 H 280 Hz (R C ), and Z0
0 H (f C ). All
views are obtained from the same PSOM, here a n = 5, n = 3 L-PSOM, in
0
6.7 Summary
We proposed the multi-start algorithm to face the problem of obtaining
several solutions to the given mapping task. The concept of cost function
modulation was introduced. It allows to dynamically add optimization
goals which are of lower priority and possibly conflict with the primary
goal.
Furthermore, we presented two extensions to the PSOM algorithm which
92 Extensions to the Standard PSOM Algorithm
0 5 0 5
L[H]0.5 C[uF] L[H]0.5 C[uF]
1 1
R:0.6, f:0.6
34 PSOM R:0.6, f:0.6 (3 3 3 3 -chebyshev)
1500
Z phase
150
1000
100
50
500 0
-50
0 -100
-150
-500 10 10
0 5 0 5
L[H]0.5 C[uF] L[H]0.5 C[uF]
1 1
R:0.6, f:0.6
(3-of-5)4 L-PSOM R:0.6, f:0.6 (5 5 5 5 -chebyshev)
Z phase
1100 100
1000
900
800 50
700
600 0
500
400
300 -50
200
100
0 -100
10 10
0 5 0 5
L[H]0.5 C[uF] L[H]0.5 C[uF]
1 1
R:0.6, f:0.6
54 PSOM R:0.6, f:0.6 (5 5 5 5 -chebyshev)
Z phase
150
1100
1000 100
900
800 50
700
600
500 0
400
300 -50
200
100 -100
0
10 10
0 5 0 5
L[H]0.5 C[uF] L[H]0.5 C[uF]
1 1
Figure 6.11: Comparison of three PSOM networks and the target mapping for
one particularly interesting 2 D cut at a given R and f , drawn as (left column)
Z (L C ) and (right column) (L C ) surface grids with contour plot projections
on the base. See text.
6.7 Summary 93
The PSOM algorithm has been explained in the previous chapters. In this
chapter a number of examples are presented which expose its applicability
in the vision domain. Vision is a sensory information source and plays an
increasingly important role as perception mechanism for robotics.
The parameterized associative map and its particular completion mech-
anism serves here for a number of interesting application possibilities. The
first example is concerned with the completion of an image feature set
found here in a 2 D image, invariant to translation and rotation of the im-
age. This idea can be generalized to a set of “virtual sensors”. A redundant
set of sensory information can be fused in order to improve recognition
confidence and measurement precision. Here the PSOM offers a natural
way of performing the fusion process in a flexible manner. As shown, this
can be useful for further tasks, e.g. for inter-sensor cooperation, and iden-
tifying the the object's 3 D spatial rotations and position. Furthermore, we
present also a more low-level vision application. By employing special-
ized feature filters, the PSOM can serve for identification of landmarks in
gray-scale images, here shown for fingertips.
and rotated freely. The goal is to determine the proper shift and twist angle
parameters when at least two image points are seen. Furthermore we de-
sire to predict the locations of the hidden – maybe occluded or concealed –
features. For example, this can be helpful to activate and direct specialized
(possibly expensive) perception schema upon the predicted region.
α
ζ
δ
ε β
η
γ
Figure 7.1: The star constellation the “Big Dipper” with its seven prominent stars
, and . (Left): The training example consists of the image position
of a seven stars
in a particular viewing situation. (Right): Three
examples of completed constellations after seeing only two stars in translated
and rotated position. The PSOM output are the image location of the missing
stars and desired set of viewing parameters (shift and rotation angle.)
Fig. 7.1 depicts the example. It shows the positions of the seven promi-
nent stars in the constellation Ursa Major to form the asterisk called the
“Big Dipper”. The positions (x y ) in the image of these seven stars ( )
are taken from e.g. a large photograph. Together with the center-of-gravity
position xc yc and the rotation angle of the major axis ( - ) they span the
x
embedding space X with the variables = fxc yc x
y
x y : : : x y g.
As soon as two stars are visible, the PSOM network can predict the
location of the missing stars and outputs the current viewing parameters –
here shift and rotation (xc yc ). Additionally other correlated parameters
of interest can be trained, e.g. azimuth, elevation angle, or time values.
While two points are principally enough to fully recover the solution
in this problem any realistic reconstruction task inevitably is faced with
noise. Here the fusion of more image features is therefore an important
problem, which can be elegantly solved by the associative completion
mechanism of the PSOM.
7.2 Sensor Fusion and 3 D Object Pose Identification 97
z φ θ ψ
0.5 0.5 0.5 0.5
Figure 7.2: The z system. (a) The cubical test object seen by the camera
when rotated and shifted in several depths z (=10, =20, =30, z =2. . . 6L, cube
size L.) (b–d) 0, 20, and 30rotations in the roll , pitch , and yaw system.
(The transformations are applied from right to left z .)
Figure 7.3: Six Reconstruction Examples. Dotted lines indicate the test cube as
seen by a camera. Asterisks mark the positions of the four corner points used as
inputs for reconstruction of the object pose by a PSOM. The full lines indicate the
reconstructed and completed object.
(inter-sensor coordination). The lower part of the table shows the results
when only four points are found and the missing locations are predicted.
P
Only the appropriate pk in the projection matrix (Eq. 4.7) are set to one,
in order to find the best-matching solution in the attractor manifold. For
several example situations, Fig. 7.3 depicts the completed cubical object on
the basis of the found four points (asterisk marked = input to the PSOM),
and for comparative reasons the true target cube with dashed lines (case
3333PSOM with ranges 150 ,2L). In Sec. 9.3.1 we will return to this
problem.
Table 7.1: Mean Euclidean deviation of the reconstructed pitch, roll, yaw angles
, the depth z, the column vectors ~n~o of the rotation matrix T , the scalar
product of the vectors ~n~o (orthogonality check), and the predicted image position
of the object locations P 5 P 6. The results are obtained for various experimental
parameters in order to give some insight into their impact on the achievable re-
construction accuracy. The PSOM training set size is indicated in the first column,
the intervals are centered around 0 , and depth z ranges from zmin = 2L,
where L denotes the cube length (focal length of the lens is also taken as = L.)
In the first row all corner locations are inputs. All remaining results are obtained
using only four (non-coplanar) points as inputs.
7.2 Sensor Fusion and 3 D Object Pose Identification 101
25
20
15
10
3
4
5 uts
Inp
10
6
5 7 r of
8 m be
Noise Nu
0
[%]
Figure 7.4: The reconstruction deviation versus the number of fused sensory
inputs and the percentage of Gaussian noise added. By increasing the number of
fused sensory inputs the performance of the reconstruction can be improved. The
significance of this feature grows with the given noise level.
Fig. 7.4 exposes the results. Drawn is the mean norm of the orientation
angle deviation for varying added noise level from 0 to 10 % of the av-
erage image size, and for 3,4,: : : and 8 fused sensory inputs, which were
taken into account. We clearly find with higher noise levels there is a grow-
ing benefit from an increasing increased number of contributing sensors.
And as one expects from a sensor fusion process, the overall precision
of the entire system is improved in the presence of noise. Remarkable
is how naturally the PSOM associative completion mechanism allows to
include available sensory information. Different feature sensors can also
be relatively weighted according to their overall accuracy as well as their
estimated confidence in the particular perceptual setting.
102 Application Examples in the Vision Domain
Figure 7.5: Left,(a): Typical input image. Upper Right,(b): after thresholding and
binarization. Lower Right,(c): position of 3 3 array of Gaussian masks (the dis-
played width is the actual width reduced by a factor of four in order to better
depict the position arrangement)
the full 1800 -range. This yields the very manageable number of 28 images
in total, for which the location of the index finger tip was identified and
marked by a human observer.
Ideally, the dependency of the x- and y -coordinates of the finger tip
should be smooth functions of the resulting 9 image features. For real
images, various sources of noise (surface inhomogeneities, small specular
reflections, noise in the imaging system, limited accuracy in the labeling
process) lead to considerable deviations from this expectation and make
the corresponding interpolation task for the network much harder than it
would be if the expectation of smoothness were fulfilled. Although the
thresholding and the subsequent binarization help to reduce the influence
of these effects, compared to computing the feature vector directly from
the raw images, the resulting mapping still turns out to be very noisy. To
give an impression of the degree of noise, Fig. 7.7 shows the dependence
of horizontal (x-) finger tip location (plotted vertically) on two elements of
the 9 D-feature vector (plotted in the horizontal xy ;plane). The resulting
mesh surface is a projection of the full 2 D-map-manifold that is embedded
in the space X , which here is of dimensionality 11 (nine dimensional input
features space X in , and a two dimensional output space Xout = (x y ) for
position.) As can be seen, the underlying “surface” does not appear very
smooth and is disrupted by considerable “wrinkles”.
To construct the PSOM, we used a subset 16 images of the image en-
semble by keeping the images seen from the two view directions at the
ends (90 ) of the full orientation range, plus the eight pictures belonging
from the remaining three view directions of 0 and 60 . I.e., both train-
ing and testing ensembles consisted of image views that were multiples of
60 apart, and the directions of the test images are midway between the
Figure 7.6: Some examples of hand images with correct (cross-mark) and pre-
dicted (plus-mark) finger tip positions. Upper left image shows average case, the
remaining three pictures show the three worst cases in the test set. The NRMS
positioning error for the marker point was 0.11 for horizontal, 0.23 for vertical
position coordinate.
Even with the very small training set of only 16 images, the resulting
PSOM achieved a NRMS-error of 0.11 for the x-coordinate, and of 0:23 for
the y -coordinate of the finger tip position (corresponding to absolute RMS-
errors of about 2.0 and 2.4 pixels in the 8080 image, respectively). To give
a visual impression of this accuracy, Fig. 7.6 shows the correct (cross mark)
and the predicted (plus mark) finger tip positions for a typical average
case (upper left image), together with the three worst cases in the test set
(remaining images).
106 Application Examples in the Vision Domain
Figure 7.7: Dependence of vertical index finger position on two of the nine input
features, illustrating the very limited degree of smoothness of the mapping from
feature to position space.
This closes here the list of presented PSOM applications homing purely
in the vision domain. In the next two chapters sensorimotor transforma-
tion will be presented, where vision will again play a role as sensory part.
Chapter 8
As pointed out before in the introduction, in the robotic domain the avail-
ability of sensorimotor transformations are a crucial issue. In particular,
the kinematic relations are of fundamental character. They usually describe
the relationship between joint, and actuator coordinates, and the position
in one, or several particular Cartesian reference frames.
Furthermore, the effort spent to obtain and adapt these mappings plays
an important role. Several thousand training steps, as required by many
former learning schemes, do impair the practical usage of learning meth-
ods in the domain of robotics. Here the wear-and-tear, but especially the
needed time to acquire the training data must be taken into account.
Here, the PSOM algorithm appears as a very suitable learning approach,
which requires only a small number of training data in order to achieve a
very high accuracy in continuous, smooth, and high-dimensional map-
pings.
(2 DOF) offers sidewards gyring of 15 and full adduction with two addi-
tional coupled joints (one further DOF). Fig. 8.1 illustrates the workspace
with a stroboscopic image.
(b) (c)
(d)
(a)
Figure 8.1: a–d: (a) stroboscopic image of one finger in a sequence of extreme
joint positions.
(b–d) Several perspectives of the workspace envelope ~r, tracing out a cubical
10 10 10 grid in the joint space ~ . The arrow marks the fully adducted posi-
tion, where one edge contracts to a tiny line.
For the kinematics in the case of our finger, there are several coordi-
nate systems of interest, e.g. the joint angles, the cylinder piston positions,
one or more finger tip coordinates, as well as further configuration depen-
dent quantities, such as the Jacobian matrices for force / moment trans-
formations. All of these quantities can be simultaneously treated in one
single common PSOM; here we demonstrate only the most difficult part,
the classical inverse kinematics. When moving the three joints on a cubical
101010 grid within their maximal configuration space, the fingertip (or
more precisely the mount point) will trace out the “banana” shaped grid
displayed in Fig. 8.1 (confirm the workspace with your finger!) Obviously,
8.1 Robot Finger Kinematics 109
The reason is the peculiar structure; e.g. in areas close to the tip a certain
angle error corresponds to a smaller Cartesian deviation than in other ar-
eas.
When measuring the mean Cartesian deviation we get an already sat-
isfying result of 1.6 mm or 1.0 % of the maximum workspace length of
160 mm. In view of the extremely small training set displayed in Fig. 8.2a–
b this appears to be a quite remarkable result.
Nevertheless, the result can be further improved by supplying more
training points as shown in the asterisk marked curve in Fig. 8.3. The
effective inverse kinematic accuracy is plotted versus the number of train-
ing nodes per axes, using a set of 500 randomly (in ~ uniformly) sampled
positions.
For comparison we employed the “plain-vanilla” MLP with one and
two hidden layers (units with tanh() squashing function) and linear units
in the output layer. The encoding was similar to the PSOM case: the
plain angles as inputs augmented by a constant bias of one (Fig. 3.1). We
found that this class of problems appears to be very hard for the standard
MLP network, at least without more sophisticated learning rules than the
standard back-propagation gradient descent. Even for larger training set
sizes, we did not succeed in training them to a performance comparable
110 Application Examples in the Robotics Domain
(a) (b)
Xθ Xr
Figure 8.2: a–b and c–e; Training data set of 27 nine-dimensional points in X for
the 3 3 3 PSOM, shown as perspective surface projections of the (a) joint angle
~ and (b) the corresponding Cartesian sub space. Following the lines connecting
the training samples allows one to verify that the “banana” really possesses a
cubical topology. (c–e) Inverse kinematic result using the grid test set displayed
in Fig. 8.1. (c) projection of the joint angle space ~ (transparent); (d) the stroke
position space ~c; (e) the Cartesian space ~r , after back-transformation.
0
8.1 Robot Finger Kinematics 111
4.5 10
2x2x2 used 2x2x2 used
4 3x3x3 used 3x3x3 used
Mean Cartesian Deviation [mm]
2.5
1.5 0.1
0.5
0 0.01
3 4 5 6 8 10 3 4 5 6 8 10
Knot Points per Axes Knot Points per Axes
Figure 8.3: a–b: Mean Cartesian inverse kinematics error (in mm) of the pre-
sented PSOM types versus number of training nodes per axes (using a test set
of 500 randomly chosen positions; (a) linear and (b) log plot). Note, the result
of Fig. 8.2c–e corresponds to the smallest training set n = 3. The maximum
workspace length is 160 mm.
to the PSOM network. Table 8.1 shows the result of two of the best MLP-
networks compared to the PSOM.
Table 8.1: Normalized root mean square error (NRMS) of the inverse kinematic
mapping task ~r 7!
~ computed as the resulting Cartesian deviation from the goal
position. For a training set of n n n points, obtained by the two best performing
standard MLP networks (out of 12 different architectures, with various (linear
decreasing) step size parameter schedules
=
i : : :
f ) 100000 steepest gradient
descent steps were performed for the MLP and one pass through the data set for
PSOM network.
Why does the PSOM perform more that an order of magnitude better
than the back-propagation algorithm? Fig. 8.4 shows the 27 training data
pairs in the Cartesian input space ~r. One can recognize some zig-zig clus-
ters, but not much more. If neighboring nodes are connected by lines, it
is easy to recognize the coarse “banana” shaped structure which was suc-
cessfully generalized to the desired workspace grid (Fig. 8.2). The PSOM
112 Application Examples in the Robotics Domain
160
150
140
130
120
110
100
90
4030
2010
x 0
-10
-20
-30 0 10 20 30
θ
-40 -40 -30 -20 -10 y
r
Figure 8.4: The 27 training data vectors for the Back-propagation networks: (left)
in the input space ~r and (right) the corresponding target output values ~
.
z
wa
160
150
140
a
130
120
110
100
90
4030
2010
0
s2
x -10
-20
-30 0 10 20 30
-40 -40 -30 -20 -10 y A∈S
r θ
s1
Figure 8.5: The same 27 training data vectors (cmp. Fig. 8.4) for the bi-directional
PSOM mapping: (left) in the Cartesian space ~r, (middle) the corresponding joint
angle space ~. (Right:) The corresponding node locations a2A
in the param-
eter manifold S . Neighboring nodes are connected by lines, which reveals now
the “banana” structure on the left.
Here, 1 : : : denote the joint angles, ~r is the Cartesian position of the end
effector of length lz in world coordinates. ~a and ~n denote the normalized
approach vector and the vector normal to the hand plane. The last nine
components vectors are part of the homogeneous coordinate transforma-
tion matrix 2 3
n x o x a
66 n o a r 77 x r x
= 66 ny oy ay ry 77 :
T (8.1)
4 z z z z5
0 0 0 1
(The missing second matrix column ~o is the cross product of the normal-
ized orientation vectors ~a and ~n and therefore bears no further informa-
tion, see Fig. 8.6 and e.g. (Fu et al. 1987; Paul 1981).)
In this space, we must construct the m = 6 dimensional embedding
manifold M that represents the configuration manifold of the robot. With
three nodes per axis direction we require 36 = 726 reference vectors a 2 w
X . The distribution of these vectors might have been found with a SOM,
however, for the present demonstration we generated the values for the
w a by augmenting 729 joint angle vectors on a rectangular 333333
grid in joint angle space ~ with the missing rx ry rz ax ay az nx ny nz –
114 Application Examples in the Robotics Domain
With
wkmax = max wka and wkmin = min wka (8.4)
8aA 2 8aA 2
Tab. 8.2 summarizes the resulting mean deviation of the desired Carte-
sian positions and orientations. While the tool length lz has only marginal
influence on the performance, the Chebyshev-spaced PSOM exhibits a sig-
nifcant advantage. As argued in Sect. 6.4, Chebyshev polynomials have ar-
guably better approximation capabilities. However, in the case n = 3 both
sampling schemes have equidistant node-spacing, but the Chebyshev-spacing
approach contracts the marginal sampling points inside the working inter-
val. Since the vicinity of each reference vector is principally approximated
with high-accuracy, this advantage is better exploited if the reference train-
ing vector is located within the given workspace, instead of located at the
border.
Table 8.3: 3 DOF inverse Puma robot kinematics accuracy using several
PSOM architectures including the equidistantly (“PSOM”), Chebyshev
spaced (“C-PSOM”), and the local PSOM (“L-PSOM”).
To set the present approach into perspective with these results, we in-
vestigate the same Puma robot problem, but with the three wrist joints
fixed. Then, we may reduce the embedding space X to the essential vari-
ables (1 2 3 px py pz ). Again using only three nodes per axis we require
w
only 27 reference vectors a to specify the PSOM. Using the same joint
ranges as in the previous case we obtain the results of Tab. 8.3 for several
PSOM network architectures and training set sizes.
118 Application Examples in the Robotics Domain
160
Mean Cartesian Deviation [mm]
140 Mean Joint Angle Deviation [deg]
120
100
80
60
40
20
0
0 100 200 300 400 500 600 700 800
Number of Training Examples
Figure 8.8: The positioning capabilities of the 3 3 3 PSOM network over the
course of learning. The graph shows the mean Cartesian ~r and angular hj ji
hj ji
~ deviation versus the number of already experienced learning examples.
After 400 training steps the last arm segment was suddenly elongated by 150 mm
( 10 % of the linear work-space dimensions.)
totically.
A very important advantage of self-learning algorithms is their abil-
ity to adapt to different and also changing environments. To demonstrate
the adaptability of the network, we interrupted the learning procedure
after 400 training steps and extended the last arm segment by 150 mm
(lz = 350 mm). The right side of Fig. 8.8 displays how the algorithm re-
0
sponded. After this drastic change of the robot's geometry only about 100
further iterations where necessary to re-adapt the network for regaining
the robot's previous positioning accuracy.
(in contrast to the cases described in Sec. 5.6.) Certainly, the standard best-
match search algorithm will find one possible solution — but it can be any
s
compatible solution and it will depend on the initial start condition t=0 .
Here, the PSOM offers a versatile way of formulating extra goals or
constraints, which can be turned on and off, depending on the situation
and the user's desire. For example, of particular interest are:
Minimal joint movement: “fast” or “lazy” robot. One practical goal can
be: reaching the next target position with minimum motor action.
This translates into finding the shortest path from the current joint
configuration ~curr to a new ~ compatible with the desired Cartesian
position ~r.
Since the PSOM is constructed on a hyper-lattice in , finding the
s
shortest route in S is equivalent to finding the shortest path in .
Thus, all we need to do is to start the best-match search at the best-
s
match position curr belonging to the current position, and the steep-
model this goal as a “discomfort” term in the cost function E (~) and demon-
strate how to incorporate extra cost terms in the standard PSOM mecha-
nism.
cj
j
mid 2
Figure 8.9: “Discomfort” cost function
cj (j ) = 2
jmax j
jmin for each joint
;
Figure 8.10: Series of intermediate steps for optimizing the remaining joint angle
mobility in the same position.
122 Application Examples in the Robotics Domain
attracts the solution to the particular single configuration with all joints in
mid range position. Any further kinematics specification is usually con-
flicting, and the result therefore a compromise (the least-square optimum;
jI j > 6). How to solve this conflict?
To avoid this mis-attraction effect, the auxiliary constraint terms pk = pk (t)
2. should decay during the gradient descent iteration. The final step
should be done with all extra terms cj weighted with factors pk zero
(here p16:::21 = 0). This assures that the final solution will be – without
compromise – within the solution space, spanned by the primary
goal, here the end-effector position.
a) b) c)
d)
8.5 Summary
The PSOM learning algorithm shows very good generalization capability
for smooth continuous mapping tasks. This property becomes highlighted
at the robot finger inverse kinematics problem with 3 inherent degrees-of-
freedom (see also 6 D kinematics). Since in many robotics learning tasks
the data set can be actively sampled, the PSOM's ability to construct the
high-dimensional manifold from a small number of training data turns out
to be here a many-sided beneficial mechanism for rapid learning.
124 Application Examples in the Robotics Domain
Context Dependent
Mixture-of-Expertise:
Investment Learning
If one wants to learn with extremely few examples, one inevitably faces a
dilemma: on the one hand, with few examples one can only determine a
rather small number of adaptable parameters and, as a consequence, the
learning system must be either very simple, or, and usually this is the rel-
evant alternative, it must have a structure that is already well-matched to
the task to be learned. On the other hand, however, having to painstak-
ingly pre-structure a system by hand is precisely what one wants to avoid
when using a learning approach.
It is possible to find a workable compromise that can cope with this
dilemma, i.e., that somehow allows the structuring of a system without
having to put in too much by hand?
Context c
ω parameters
or weights
X2
X1 T-Box
Figure 9.1: The T-B OX maps between different task variable sets within a certain
context (~c), describable by a set of parameters ! .
the context changes only from time to time, or on a much larger time
scale, than the time scale on which the task mapping T-B OX is em-
ployed.
the T-B OX ;
the M ETA -B OX , which has the responsibility for providing the map-
ping between the context information ~c to the weight or parameter
set ! .
The first, the investment learning stage may be slow and has the task
to pre-structure the system for
Prototypical
(2)
ω
Context
Meta-Box
parameters
c (2) or weights
X2
X1 T-Box
(1) (1)
weights / parameters !j determined (see Fig. 9.2, arrows (1)). It serves to-
gether with the context information ~c as a high-dimensional training data
vector for the M ETA -B OX (2). During the investment learning phase the
M ETA -B OX mapping is constructed, which can be viewed as the stage for
the collection of expertise in the suitably chosen prototypical contexts.
New
(3)
ω
Context
Meta-Box
parameters
c (3) or weights
X2
X1 T-Box
(4) (4)
After the M ETA -B OX has been trained, the task of adapting the “skill” to
a new system context is tremendously accelerated. Instead of any time-
consuming re-learning of the mapping T this adjustment now takes the
form of an immediate M ETA -B OX ! T-B OX mapping or “one-shot adapta-
tion”. As illustrated in Fig. 9.3, the M ETA -B OX maps a new (unknown)
context observation ~cnew (3) into the parameter = weight set !new for the
T-B OX . Equipped with !new , the T-B OX provides the desired mapping
Tnew (4).
Input
Context Gating
Network
Σ
T-Box
Task Variables
Expert 1
T-Box Output
Expert 2
T-Box
Expert 3
T-Box
Expert N ‘‘Mixture-of-Exper ts’’
‘‘Mixture-of-Exper tise’’
Input
Context Meta
Network Parameters
ω
Task Variables T-Box
Expert
Output
The lower part of Fig. 9.4 redraws the proposed hierarchical network
scheme and suggests to name it “mixture-of-expertise”. In contrast to the
specialized “experts” in Jordan's picture, here, one single “expert” gathers
specialized “expertise” in a number of prototypical context situations (see
investment learning phase, Sec. 9.2.1). The M ETA -B OX is responsible for the
non-linear “mixture” of this “expertise”.
With respect to networks' requirements for memory and computation,
the “mixture-of-expertise” architecture compares favorably: the “exper-
tise” (! ) is gained and implemented in a single “expert” network (T-B OX ).
Furthermore, the M ETA -B OX needs to be re-engaged only when the con-
text is changed, which is indicated by a deviating sensor observation ~c.
However, this scheme requires from the learning implementation of
the T-B OX that the parameter (or weight) set ! is represented as a con-
tinuous function of the context variables ~c. Furthermore, different “de-
generate” solutions must be avoided: e.g. a regular multilayer perceptron
allows many weight permutations ! to achieve the same mapping. Em-
ploying a MLP in the T-B OX would result in grossly inadequate interpo-
lation between prototypical “expertises” !j , denoted in different kinds of
permutations. Here, a suitable stabilizer would be additionally required.
Please note, that the new “mixture-of-expertise” scheme does not only
identify the context and retrieve a suitable parameter set (association).
Rather it achieves a high-dimensional generalization of the learned (in-
vested) situations to new, previously unknown contexts.
A “mixture-of-expertise” aggregate can serve as an expert module in
a hierarchical structure with more than two levels. Moreover, the two ar-
chitectures can be certainly combined. This is particularly advantageous
when very complex mappings are smooth in certain domains, but non-
continuous in others. Then, different types of learning experts, like PSOMs,
Meta-PSOMs, LLMs, RBF and others can be chosen. The domain weight-
ing can be controlled by a competitive scheme, e.g. RBF, LVQ, SOM, or a
“Neural-Gas” network (see Chap. 3).
9.3 Examples
The concept imposes a strong need for efficient learning algorithms: to
keep the number of required training examples manageable, those should
9.3 Examples 131
Roll-Pitch Matrix
X1
Yaw-Shift
X2 X1 Multiplier X2 X1 T-PSOM X2
(i) (ii) (iii)
Figure 9.5: Three different ways to solve the context dependent, or investment
learning task.
The first solution (i) uses the Meta-PSOM for the reconstruction of ob-
ject pose in roll-pitch-yaw-depth values from Sec. 7.2. The T-B OX is given
by the four successive homogeneous transformations (e.g. Fu et al. 1987)
on the basis of the z values obtained from the Meta-PSOM.
132 “Mixture-of-Expertise” or “Investment Learning”
Comparing the RMS results in Tab. 9.1 shows, that the PSOM approach
(iii) can fully compete with the dedicated hand-crafted, one-way mapping
solutions (i) and (ii).
Uref
Meta-PSOM ω
weights
U X
T-PSOM
ξref
~x(~u) = FTu
;
7! x
PSOM (~u ~!(~uref )) (9.1)
9.3 Examples 135
Table 9.2 shows the experimental results averaged over 100 random lo-
cations (from within the range of the training set) seen from 10 different
camera locations, from within the 3 3 roughly radial grid of the training
positions, located at a normal distance of about 65–165 cm (to work space
center, about 80 cm above table, total range of about 95–195 cm), covering
a 50 sector. For identification of the positions in image coordinates, a
tiny light source was installed at the manipulator tip and a simple proce-
dure automatized the finding of ~u with about 1 pixel accuracy. For the
achieved precision it is important that all learned Tj share the same set
of robot positions i , and that the training sets (for the T-PSOM and the
Meta-PSOM) are topologically ordered, here as two 3 3 grids. It is not
important to have an alignment of this set to any exact rectangular grid
in e.g. world coordinates, as demonstrated with the radial grid of camera
training positions (see Fig. 9.6 and also Fig. 5.5).
Table 9.2: Mean Euclidean deviation (mm or pixel) and normalized root mean
square error (NRMS) for 1000 points total in comparison of a directly trained T-
PSOM and the described hierarchical PSOM-network, in the rapid learning mode
with one observation.
These data demonstrate that the hierarchical learning scheme does not
fully achieve the accuracy of a straightforward re-training of the T-PSOM
after each camera relocation. This is not surprising, since in the hierar-
chical scheme there is necessarily some loss of accuracy as a result of the
interpolation in the weight space of the T-PSOM. As further data becomes
available, the T-PSOM can certainly be fine-tuned to improve the perfor-
mance to the level of the directly trained T-PSOM. However, the possibil-
ity to achieve the already very good accuracy of the hierarchical approach
with the first single observation per camera relocation is extremely attrac-
tive and may often by far outweigh the still moderate initial decrease that
136 “Mixture-of-Expertise” or “Investment Learning”
θ
2
54
Uref
R Meta-PSOM ωR
R
Figure 9.7: Rapid learning of the 3D visuo-motor coordination for two cameras.
The basis T-PSOM (m = 3) is capable of mapping to and from three coordinate
systems: Cartesian robot world coordinates, the robot joint angles (6-DOF), and
the location of the end-effector in coordinates of the two camera retinas. Since the
left and right camera can be relocated independently, the weight set of T-PSOM
is split, and parts !L !R are learned in two separate Meta-PSOMs (“L” and “R”).
Table 9.3: Mean Euclidean deviation (mm or pixel) and normalized root mean
square error (NRMS) for 1000 points total in comparison of a directly trained T-
PSOM and the described hierarchical Meta-PSOM network, in the rapid learning
mode after one single observation.
Table 9.3 shows experimental results averaged over 100 random lo-
cations (from within the range of the training set) seen in 10 different
138 “Mixture-of-Expertise” or “Investment Learning”
camera setups, from within the 3 3 square grid of the training positions,
located in a normal distance of about 125 cm (center to work space center,
1 m2 ), covering a disparity angle range of 25 –150 .
Summary
we should enlarge our view towards mappings which produce other mappings
as their result. Similarly, this embracing consideration received increasing
attention in the realm of functional programming languages.
To implement this approach, we used a hierarchical architecture of
mappings, called the “mixture-of-expertise” architecture. While in principle
various kinds of network types could be used for these mappings, a practi-
cally feasible solution must be based on a network type that allows to con-
struct the required basis mappings from a rather small number of training
examples. In addition, since we use interpolation in weight/parameter
space, similar mappings should give rise to similar weight sets to make
interpolation of expertise meaningful.
We illustrated three versions of this approach when the output map-
ping was a coordinate transformation between the reference frame of the
camera and the object centered frame. They differed in the choice of the
utilized T-B OX . The results showed that on the T-B OX level the learning
PSOM network can fully compete with the dedicated engineering solu-
tion, additionally offering multi-way mapping capabilities. At the M ETA -B OX
level the PSOM approach is a particularly suitable solution because, first,
it requires only a small number of prototypical training situations, and
second, the context characterization task can profit from the sensor fusion
capabilities of the same PSOM, also called Meta-PSOM.
We also demonstrated the potential of this approach with the task of 2D
and 3D visuo-motor mappings, learnable with a single observation after
changing the underlying sensorimotor transformation, here e.g. by repo-
sitioning the camera, or the pair of individual cameras. After learning by
a single observation, the achieved accuracy compares rather well with the
direct learning procedure. As more data becomes available, the T-PSOM
can be fine-tuned to improve the performance to the level of the directly
trained T-PSOM.
The presented arrangement of a basis T-PSOM and two Meta-PSOMs
further demonstrates the possibility to split the hierarchical “mixture-of-
expertise” architecture into modules for independently changing parame-
ter sets. When the number of involved free context parameters is growing,
this factorization is increasingly crucial to keep the number of pre-trained
prototype mappings manageable.
The two hierarchical architectures, the “mixture-of-expert” and the in-
troduced “mixture-of-expertise” scheme, complement each other. While
145
the PSOM as well as the T-B OX /M ETA -B OX approach are very efficient
learning modules for the continuous and smooth mapping domain, the
“mixture-of-expert” scheme is superior in managing mapping domains
which require non-continuous or non-smooth interfaces. As pointed out,
the T-B OX -concept is not restricted to a particular network type, and the
“mixture-of-expertise” can be considered as a learning module by itself.
As a result, the conceptual combination of the presented building blocks
opens many interesting possibilities and applications.
146 Summary
Bibliography
Some of the author's publications, including this book, are available on-
line via: http://www.techfak.uni-bielefeld.de/ walter/