
Appl Intell (2007) 26:99–109
DOI 10.1007/s10489-006-0007-1

Movement prediction from real-world images using a liquid state machine

Harald Burgsteiner · Mark Kröll · Alexander Leopold · Gerald Steinbauer

Published online: 13 November 2006
© Springer Science + Business Media, LLC 2007

Abstract  The prediction of time series is an important task in finance, economy, object tracking, state estimation and robotics. Prediction is in general either based on a well-known mathematical description of the system behind the time series or learned from previously collected time series. In this work we introduce a novel approach to learn predictions of real-world time series like object trajectories in robotics. In a sequence of experiments we evaluate whether a liquid state machine in combination with a supervised learning algorithm can be used to predict ball trajectories with input data coming from a video camera mounted on a robot participating in the RoboCup. The pre-processed video data is fed into a recurrent spiking neural network. Connections to some output neurons are trained by linear regression to predict the position of a ball various time steps ahead. The main advantages of this approach are that, due to the nonlinear projection of the input data to a high-dimensional space, simple learning algorithms can be used, that the liquid state machine provides temporal memory capabilities, and that this kind of computation appears biologically more plausible than conventional methods for prediction. Our results support the idea that learning with a liquid state machine is a generic powerful tool for prediction.

Keywords  Liquid state machine · Time series prediction · Recurrent spiking neural networks · Robotics

H. Burgsteiner, InfoMed/Health Care Engineering, Graz University of Applied Sciences, Eggenberger Allee 11, A-8020 Graz, Austria. e-mail: [email protected]
M. Kröll, Division of Knowledge Discovery, Know-Center, Inffeldgasse 21a, A-8010 Graz, Austria. e-mail: [email protected]
A. Leopold, Institute for Theoretical Computer Science, Graz University of Technology, Inffeldgasse 16b/I, A-8010 Graz, Austria. e-mail: [email protected]
G. Steinbauer, Institute for Software Technology, Graz University of Technology, Inffeldgasse 16b/II, A-8010 Graz, Austria. e-mail: [email protected]

1 Introduction

The prediction of time series is an important issue in many different domains, such as finance, economy, object tracking, state estimation and robotics. The aim of such predictions could be to estimate the stock exchange price for the next day or the position of an object in the next camera frame based on current and past observations. In [7] decision-theoretic planning in combination with the agent-language GOLOG was used to control robots in a dynamic environment. Decision-theoretic planning uses predictions of the outcome of selected actions to derive the optimal plan. For the task of fault diagnosis in robot systems in general, models of the nominal behavior of the system over time are required. The outcome of these models is compared to the current state of the system in order to detect and identify faults in the system. In [26] particle filter techniques were used for this purpose. In the domain of robot control such predictions are used to stabilize a controller, to do planning under uncertainty and to automatically derive diagnoses about the robot's hard- and software. Jordan and Wolpert [13] provide a survey of different approaches in motor control where prediction enhances the stability of a controller.

There are two popular approaches for this kind of prediction: (1) modeling the behavior of the system or (2) learning the prediction from collected data. The first one requires a basic understanding of the system behind it.


It is preferred if the internal structure of the system is well known and its behavior can be described sufficiently precisely by a set of equations. In general this method is applicable to electronic circuits, technical processes or mechanical systems. A well-known example for this approach is the prediction step in state estimation with Kalman filters [20]. It uses the current state and a linear system model to predict the state for the next time step. This prediction is optimal for linear systems. For non-linear systems the Extended Kalman Filter (EKF) uses a linearization of the system; hence, the EKF is no longer an optimal predictor. The second approach is to learn the prediction from previously collected data. The advantages are that knowledge of the internal structure is not necessarily needed, that arbitrary non-linear predictions can be learned, and that additionally some past observations can be integrated into the prediction. In this work we focus on the second approach.

Artificial Neural Networks (ANN) are common methods used for this type of computation. The common view of a neural network is that of a set of neurons plus a set of weighted connections (synapses in the biological context) between the neurons. Each neuron comes with a transfer function computing an output from its set of inputs. In multi-layer networks these outputs can again be used as an input to the next layer of neurons, weighted by the relevant synaptic "strength". Feed-forward networks only have connections starting from external input nodes, possibly via one or more intermediate hidden node processing layers, to output nodes. Recurrent networks may have connections feeding back to earlier layers or may have lateral connections (i.e. to neighboring neurons on the same layer). See Fig. 1 for a comparison of the direction of computation between a feed-forward and a recurrent neural network. With this recurrency, activity can be retained by the network over time. This provides a sort of memory within the network, enabling it to compute functions that are more complex than just simple reactive input-output mappings. This is a very important feature for networks that will be used for computation of time series, because the current output is not solely a function of the current sensory input, but a function of the current and previous sensory inputs and also of the current and previous internal network states. This allows a system to incorporate a much richer range of dynamic behaviors. Many approaches have been elaborated on recurrent ANNs. Some of them are: dynamic recurrent neural networks [22], radial basis function networks (when one views lateral connections also as a recurrency) [2], Elman networks [5], self-organizing maps [14], Hopfield nets [11] and the "echo state" approach from [12].

In case of autonomous agents it is rather difficult to employ strictly supervised learning algorithms for recurrent ANNs such as backpropagation, Boltzmann machines or Learning Vector Quantization (LVQ), because the correct output is not always available or computable. It is also very difficult to set the weights of a recurrent ANN directly for a given non-trivial task. Hence, other learning techniques have to be developed for ANNs that could simplify the learning process of complex tasks for autonomous robots.

Recently, networks with models of biologically more realistic neurons, e.g. spiking neurons, in combination with simple learning algorithms have been proposed as general powerful tools for computation on time series [18]. In Maass et al. [17] this new computation paradigm, the so-called Liquid State Machine (LSM), which will be introduced in the next section, was used to predict the motion of objects in visual inputs. The visual input was presented to an 8 × 8 sensor array and the prediction of the activation of these sensors, representing the position of objects at succeeding time steps, was learned. This approach appears promising, as the computation of such prediction tasks is assumed to be similar in the human brain [1]. The weakness of the experiments in [17] is that they were only conducted on artificially generated data. The question is how the approach performs with real-world data. Real data, e.g. the detected motion of an object in a video stream from a camera mounted on a moving robot, are noisy and afflicted with outliers.

In this paper we present how this approach can be extended to a real-world task. We applied the proposed approach to the RoboCup robotic-soccer domain. The task was movement prediction for a ball in the video stream of the robot's camera. Such a prediction is important for reliable tracking of the ball and for decision making during a game. The remainder of this paper is organized as follows. The next section provides an overview of the LSM. Section 3 describes the prediction approach for real data, Sections 4 and 5 cover the retrieval of this real data and the training of the prediction. Experimental results are reported in Section 6 and discussed in Section 7. Finally, in Section 8 we draw some conclusions.

Fig. 1 Comparison of the architecture of a feed-forward (left hand side) with a recurrent neural network (right hand side); the gray arrows sketch the direction of computation. The networks are drawn with k input neurons, n hidden neurons and m output neurons

2 The liquid state machine

2.1 The framework of a liquid state machine

The LSM from [18] is the theoretical framework for computations in neural microcircuits. An exemplary structure is shown in Fig. 3.


Fig. 2 Multi-tasking with any-time computing. A single neural microcircuit can be used by different readout neurons to compute various functions in parallel. In this case, based on a Poisson spike train as input to the LSM, 7 different functions were computed by readout neurons (figure from [21]): f1(t): sum of rates of inputs 1&2 in the interval [t-30 ms, t]; f2(t): sum of rates of inputs 3&4 in the interval [t-30 ms, t]; f3(t): sum of rates of inputs 1-4 in the interval [t-60 ms, t-30 ms]; f4(t): sum of rates of inputs 1-4 in the interval [t-150 ms, t]; f5(t): spike coincidences of inputs 1&3 in the interval [t-20 ms, t]; f6(t): nonlinear combination f6(t) = f1(t) · f2(t); f7(t): nonlinear combination f7(t) = 2 f1(t) − 4 f1²(t) + (3/2)(f2(t) − 0.3)²

The term "liquid state" refers to the idea of viewing the result of a computation of a neural microcircuit not as a stable state, like an attractor that is reached. Instead, a neural microcircuit is used as an online computation tool that receives a continuous input which drives the state of the neural microcircuit. The result of a computation is again a continuous output generated by readout neurons given the current state of the neural microcircuit.

Recurrent neural networks with spiking neurons represent a non-linear dynamical system with a high-dimensional internal state, which is driven by the input. The internal state vector x(t) is given as the contributions of all neurons within the neural microcircuit to the membrane potential of a readout neuron at the time t. The complete internal state is determined by the current input and all past inputs that the network has seen so far. Hence, a history of (recent) inputs is preserved in such a network and can be used for computation of the current output. For a detailed analysis of the properties of the LSM see [18]. The basic idea behind solving tasks with a LSM is that one does not try to set the weights of the connections within the neural microcircuit but instead only sets the weights of the readout neurons. This reduces learning dramatically, and much simpler supervised learning algorithms, which e.g. only have to minimize the mean square error in relation to a desired output, can be applied. In fact, [6] have already used this principle to demonstrate that a simple bucket of water can also be used as the "liquid". They used the waves that are produced on the surface of a bucket full of water as the medium for a liquid state machine. The input was fed into the liquid with motors that perturbed the surface. A camera took pictures of the waves that originated that way. These digital images were the input to a simple perceptron that could solve the XOR-problem.

In [18] the LSM is also proven to have universal computational power on inputs of varying time series under idealized conditions. Furthermore, the LSM has several interesting features in comparison to other approaches with recurrent circuits of spiking neural networks:

1. The LSM provides "any-time" computing, i.e. one does not have to wait for a computation to finish before the result is available. Results begin to emerge from the readout neurons as soon as input is fed into the liquid. Furthermore, different computations can overlap in time. That is, new input can be fed into the liquid and perturb it while the readout still gives answers to past input streams.


Fig. 3 Architecture of our experimental setup depicting the three different pools of neurons as well as sample input and output patterns with the data path overview. Example connections of a single liquid neuron are shown: input is received from the input sensor field on the left hand side and from some random connections within the liquid. The output of every liquid neuron is projected onto every output neuron (located on the rightmost side). The 8 × 6 × 3 neurons in the middle form the "liquid"

2. A single neural microcircuit can be used to compute not only one special output function via the readout neurons. Because the LSM only serves as a pool for dynamic recurrent computation, one can use many different readout neurons to extract information for several tasks in parallel. So a sort of "multi-tasking" can be incorporated. Figure 2 illustrates this and the previous property.

3. In most cases simple learning algorithms can be used to set the weights of the readout neurons. The idea is similar to kernel methods like support vector machines or kernel PCA, where one uses a kernel to project input data into a high-dimensional space. In this very high-dimensional space simpler classifiers can be used to separate the data than in the original input data space. The LSM has a similar effect as a kernel: due to the recurrency the input data is also projected into a high-dimensional space. Hence, in almost any case experienced so far simple learning rules like e.g. linear regression suffice.

4. The LSM is not only a computationally powerful model, but it is also one of the biologically most plausible ones so far. Thus, it provides a hypothesis for computation in biological neural systems.

2.2 Neural microcircuit

The model of a neural microcircuit as it is used for simulations in the LSM is based on biological evidence found in [9] and [25]. Still, it gives only a rough approximation to a real neural microcircuit since many parameters are still unknown. The neural microcircuit is the biggest computational element within the LSM, although multiple neural microcircuits could be placed within a single virtual model. In a model of a neural microcircuit, N = n_x · n_y · n_z neurons are placed on a regular grid in 3D space. The numbers of neurons along the x, y and z axes, n_x, n_y and n_z respectively, can be chosen freely. One also specifies a factor to determine how many of the N neurons should be inhibitory. Another important parameter in the definition of a neural microcircuit is the parameter λ: the number and range of the connections between the N neurons within the LSM are determined by it. The probability of a connection between two neurons i and j is given by

p(i, j) = C · exp(−D(i, j)² / λ²)

where D(i, j) is the Euclidean distance between those two neurons and C is a parameter depending on the type (excitatory or inhibitory) of each of the two connecting neurons. There exist 4 possible values of C for each connection within a neural microcircuit: C_EE, C_EI, C_IE and C_II may be used depending on whether the neurons i and j are excitatory (E) or inhibitory (I). In our experiments we used spiking neurons according to the standard leaky-integrate-and-fire (LIF) neuron model that are connected via dynamic synapses. The time course of a postsynaptic current is approximated by the equation v(t) = w · exp(−t / τ_syn), where w is a synaptic weight and τ_syn is the synaptic time constant. In case of dynamic synapses the "weight" w depends on the history of the spikes the synapse has seen so far, according to the model from [19]. Synapses transmitting analog values (such as those of the output neurons in our experimental setup) are simply modeled as static synapses with a strength defined by a constant weight w. Additionally, synapses for analog values can have delay lines, modeling the time an action potential would need to propagate along an axon.
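As an illustration of this connection rule, the following Python sketch (not from the paper; the grid size and the 20% inhibitory fraction follow the setup described in Section 3, while the single value of C, the value of λ and all other defaults are illustrative assumptions) samples random connections with the distance-dependent probability given above:

    import numpy as np

    def build_liquid_connections(shape=(8, 6, 3), lam=2.0, C=0.3,
                                 inhibitory_fraction=0.2, seed=0):
        """Sketch: sample recurrent connections for a liquid on a 3D grid.

        The connection probability decays with the Euclidean grid distance
        D(i, j) as C * exp(-D(i, j)**2 / lam**2). A single C is used here for
        simplicity; the paper distinguishes C_EE, C_EI, C_IE and C_II."""
        rng = np.random.default_rng(seed)
        coords = np.array([(x, y, z) for x in range(shape[0])
                                     for y in range(shape[1])
                                     for z in range(shape[2])], dtype=float)
        n = len(coords)                                   # 8 * 6 * 3 = 144 neurons
        inhibitory = rng.random(n) < inhibitory_fraction  # roughly 20% inhibitory
        dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        prob = C * np.exp(-(dist ** 2) / lam ** 2)
        np.fill_diagonal(prob, 0.0)                       # no self-connections
        connected = rng.random((n, n)) < prob
        return connected, inhibitory

    conn, inhib = build_liquid_connections()
    print(conn.sum(), "connections,", inhib.sum(), "inhibitory neurons")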


2.3 Training procedure

As said before, training a LSM requires only setting the weights of one or more output neurons. This output neuron can in fact be any computational model that can map the state vector x(t) to the desired binary or analog value. In our experiments, perceptrons with a linear activation function in combination with linear regression as the supervised learning algorithm sufficed. A typical training procedure for a LSM includes the following steps:

1. Define a LSM with an appropriate size for the given task.
2. Initialize the internal weights of the LSM to reasonable random values out of a given distribution.
3. Feed the LSM with the training input u(t) and record the internal state of the LSM x(t) over time.
4. Use any supervised training algorithm to compute the values of the weights of each readout neuron based on a given target vector y(t).
5. Set the weights of the readout neuron(s) in the model to these values.
6. Run the LSM with unseen test data and compare the performance.

In experiments like in [3] this procedure can be extended by switching the LSM software into real-time mode and performing tasks with robots controlled by the trained LSM.
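A minimal sketch of these six steps in Python (not the CSim code used by the authors; the spiking liquid is replaced by a generic stand-in state update so that the example is self-contained, and all names are illustrative):

    import numpy as np

    def run_liquid(u, n_liquid=144, seed=0):
        """Steps 1-3 (placeholder): drive a fixed random recurrent mapping with
        the input u[t] and record the state vectors x[t]. A leaky random
        projection stands in for the spiking microcircuit."""
        rng = np.random.default_rng(seed)
        w_in = rng.normal(size=(u.shape[1], n_liquid))
        x = np.zeros((u.shape[0], n_liquid))
        for t in range(u.shape[0]):
            prev = x[t - 1] if t > 0 else np.zeros(n_liquid)
            x[t] = 0.8 * prev + np.tanh(u[t] @ w_in)
        return x

    def train_readout(x, y):
        """Step 4: linear regression from recorded states to the target y[t]."""
        w, *_ = np.linalg.lstsq(x, y, rcond=None)
        return w  # step 5: assign these weights to the readout neuron(s)

    def evaluate(x_test, y_test, w):
        """Step 6: run on unseen data and compare predictions with the target."""
        return np.mean(np.abs(x_test @ w - y_test))

    # toy usage: random data in place of recorded sensor activations
    u = np.random.rand(200, 48)
    y = np.roll(u, -1, axis=0)          # target: input shifted one step ahead
    x = run_liquid(u)
    w = train_readout(x[:150], y[:150])
    print("mean absolute error:", evaluate(x[150:], y[150:], w))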
3 Experimental setup

In this section we introduce the general setup that was used during our experiments to solve prediction tasks with real-world data from a robot. As depicted in Fig. 3, such a network consists of three different neuron pools: (a) an input layer that is used to feed sensor data from the robot into the network, (b) a pool of neurons forming the liquid and (c) the output layer consisting of readout neurons which perform a linear combination of the membrane potentials obtained from the liquid neurons. These three parts together form the LSM.

For simulation within training and evaluation the neural circuit simulator CSim¹ was used. The parameterization of the LSM is described below. Names for neuron and synapse types all originate from terms used in the CSim environment. The letters I and E denote values for inhibitory and excitatory neurons respectively.

¹ The software simulator CSim and the appropriate documentation for the LSM can be found on the web page http://www.lsm.tugraz.at/.

To feed activation sequences into the liquid pool, we use External Input Neurons that conduct an injection current Iinject via Static Analog Synapses (parameters are shown in Table 1) into the first layer of the liquid pool. Inspired by information processing in living organisms, we set up a cognitive mapping from the input layer to the liquid pool. The value of Iinject depends on the value of the input data, in this case the activation of each single visual sensor.

Table 1 Parameters for the static analog synapses which are used to feed input data into the LSM. 'EE' or 'EI' denotes whether the source and target neurons of a connection release excitatory or inhibitory action potentials, respectively. Covariance for delaymean is 0.1

Inoise [nA]   wmean (EE)   wmean (EI)   delaymean [ms] (EE)   delaymean [ms] (EI)
0             3 · 10⁻⁸      6 · 10⁻⁸     1.5                   0.8

The liquid pool consists of Leaky Integrate And Fire Neurons, whose parameters are listed in Table 2, grouped in an 8 · 6 · 3 cuboid and randomly connected via Dynamic Spiking Synapses (parameters are listed in Table 3), as described above. The probability of a connection between every two neurons is modeled by the probability distribution depending on the parameter λ described in the previous section. Various combinations of λ (average connection distance) and wscale (mean connection weight) were used for simulation. 20% of the liquid neurons were randomly chosen to produce inhibitory potentials. Figure 3 shows an example for connections within the LSM.

The information provided by the spiking neurons in the liquid pool is processed (read out) by External Output Neurons (Vinit, Vresting and Inoise have the same values as for the liquid neurons), each of them connected to all neurons in the liquid pool via Static Spiking Synapses (parameters are listed in Table 4). The output neurons perform a simple linear combination of the inputs that are provided by the liquid pool.

We evaluate the prediction approach by carrying out several experiments with real-world data in the RoboCup Middle-Size robotic soccer scenario. The experiments were conducted using a robot of the "Mostly Harmless" RoboCup Middle-Size team [8]. The task within the experiments is to predict the position of a ball in the field of view several frames into the future.

4 Generating input data

The input data was recorded by using a prototype of the middle-size-league RoboCup robot in use (and developed) by the RoboCup team at the Graz University of Technology. The experimental setup can be described as follows: the robot was located on the field and pointed its camera across the field. This robot tracked the movements of a soccer ball and featured a directional firewire camera driven by the XVision machine vision software [10], frequently delivering steady-state images in 320 × 240 true color format. Time delays between the transmission of two images varied from 70 ms to 200 ms. Similar to [16], the input to the LSM was provided by 48 sensors arranged in a 2D array (8 × 6), so the recorded images had to be preprocessed.

For each image, the x and y coordinates as well as the radius of the ball were extracted using an existing tracking package.


Table 2 Parameters for the leaky integrate and fire neurons comprising the liquid pool. Letters 'E' and 'I' indicate whether the neurons emit excitatory or inhibitory action potentials. U(a, b) denotes a uniform distribution on the interval [a, b]

Cm = 30 nF, Rm = 1 MΩ, Vthresh = 15 mV, Vresting = 0 mV, Vreset = U(13.8, 14.5) mV, Vinit = U(13.5, 14.9) mV, Trefract = 3 ms (E) / 2 ms (I), Inoise = 0 nA, Iinject = U(13.5, 14.5) nA

Table 3 Parameters for the dynamic spiking synapses connecting the neurons within the liquid pool. 'EE', 'EI', 'IE' and 'II' denote whether the source and target neurons of a connection emit excitatory or inhibitory action potentials. Covariance for delaymean is 0.1

Con.   Umean   Dmean [s]   Fmean [s]   delaymean [ms]   τsyn [ms]   C
EE     0.5     1.1         0.05        1.5              3           0.3
EI     0.05    0.125       1.2         0.8              3           0.4
IE     0.25    0.7         0.02        0.8              6           0.2
II     0.32    0.144       0.06        0.8              6           0.1

Table 4 Parameters for the static spiking synapses connecting the read out neurons with each liquid neuron. 'EE' and 'EI' denote whether the source and target neurons of a connection fire excitatory (E) or inhibitory (I) action potentials. The covariance of delaymean is 0.1

τsyn [ms] (EE)   τsyn [ms] (EI)   delaymean [ms] (EE)   delaymean [ms] (EI)
3                6                1.5                   0.8
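To illustrate how the values in Table 2 enter the neuron model, here is a minimal leaky integrate-and-fire step in Python (a sketch with simple Euler integration, not the CSim implementation; the constant input current and the fixed draws from the uniform ranges are illustrative assumptions):

    import numpy as np

    def simulate_lif(i_syn, dt=1e-4, c_m=30e-9, r_m=1e6,
                     v_thresh=15e-3, v_resting=0.0, v_reset=13.8e-3,
                     t_refract=3e-3, i_inject=14e-9):
        """Sketch of a leaky integrate-and-fire neuron with the Table 2 values:
        C_m = 30 nF, R_m = 1 MOhm, V_thresh = 15 mV, T_refract = 3 ms (E);
        V_reset and I_inject are single draws from the listed uniform ranges.
        i_syn is the additional synaptic input current per time step."""
        tau_m = c_m * r_m                  # membrane time constant: 30 ms
        v = v_resting
        refract_until = 0.0
        spikes = []
        for step, i_in in enumerate(i_syn):
            t = step * dt
            if t < refract_until:
                v = v_reset                # hold the potential while refractory
                continue
            v += dt * (-(v - v_resting) + r_m * (i_in + i_inject)) / tau_m
            if v >= v_thresh:
                spikes.append(t)
                v = v_reset
                refract_until = t + t_refract
        return spikes

    # toy usage: 100 ms of a constant extra synaptic current of 2 nA
    print(simulate_lif(np.full(1000, 2e-9)))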
The ball is detected within an image by simple color-blob detection, leading to a binary image of the ball. We can use this simple image preprocessing since all objects on the RoboCup field are color-coded and the ball is the only red one. The segmented image is presented to the 8 × 6 sensor field of the LSM. The activation of each sensor is equivalent to the percentage of the sensory area that is covered by the ball. To group contiguous ball movements from the moment the ball entered the robot's field of view up to the point the ball left it (movies), we wrote an add-on for the XVision environment. Tracking information is stored line by line for each image, containing coordinates, radius and time elapsed since start. Given this movie recording equipment, it was possible to record several hundred raw data movie files.
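As an illustration of this preprocessing (a sketch, not the authors' XVision add-on; approximating the covered area by counting pixels is our own simplification), the mapping from a detected ball to the 8 × 6 sensor activations could look like this:

    import numpy as np

    def sensor_activation(ball_x, ball_y, radius,
                          image_w=320, image_h=240, grid=(8, 6)):
        """Sketch: each sensor's activation is the fraction of its area covered
        by the ball. The 320 x 240 image is divided into an 8 x 6 grid of
        rectangular sensor cells; coverage is approximated per pixel."""
        cols, rows = grid
        cell_w, cell_h = image_w / cols, image_h / rows
        ys, xs = np.mgrid[0:image_h, 0:image_w]
        inside = (xs - ball_x) ** 2 + (ys - ball_y) ** 2 <= radius ** 2
        activation = np.zeros((rows, cols))
        for r in range(rows):
            for c in range(cols):
                cell = inside[int(r * cell_h):int((r + 1) * cell_h),
                              int(c * cell_w):int((c + 1) * cell_w)]
                activation[r, c] = cell.mean()   # fraction of the cell covered
        return activation

    # example: a ball of radius 25 pixels near the image center
    print(np.round(sensor_activation(160, 120, 25), 2))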
We collected a large set of 674 video sequences of the ball rolling with different velocities and directions across the field. The video sequences have different lengths and contain images in 50 ms time steps. These video sequences are converted into the equivalent sequences of activation patterns of the input sensors. Figure 4 shows such a sequence. The activation sequences are randomly divided into a training set (85%) and a validation set (15%) used to train and evaluate the prediction. Training and evaluation is conducted for the prediction of 1 time step (50 ms), 2 time steps (100 ms) and 4 time steps (200 ms) ahead. The corresponding target activation sequences are simply obtained by shifting the input activation sequences 1, 2 or 4 steps forward in time.

5 Simulation and learning

Simulation for the training set is carried out sequence by sequence: for each collected activation sequence, the neural circuit is reset, input data are assigned to the input layer, recorders are set up to record the liquid's activity, the simulation is started, and the corresponding recorded liquid activity is stored for the training part. The training is performed by calculating the weights of all static synapses connecting each liquid neuron with all output layer neurons using linear regression.² Let {m_i,j[n]} be the activation sequence for sensor i out of the 8 × 6 sensor pool for one sequence j out of the training set. Let {x_i[n]} = {{m_i,1}, {m_i,2}, ..., {m_i,N}} be the concatenation of all N activation sequences of the training set. With the expected output sequence {y_i[n]} = {x_i[n + p]} consisting of the input sequence shifted by p prediction time steps, w_i as the weights of output neuron i and {psp[n]} as the sequence of postsynaptic potentials of all liquid neurons recorded during simulation, the regression writes as

w_i = regress({y_i[n]}, {psp[n]})

for one neuron in the output layer. The least squares method is used for approximation. To get a more robust model, white noise was added to {psp[n]}.

² In fact the injection current Iinject for each output layer neuron is also calculated. For simplification this bias is treated as the 0th weight.

Analogous to the simulation with the training set, simulation is then carried out on the validation set of activation sequences. The resulting output neuron activation sequences are stored for evaluating the network's performance.


Fig. 4 Upper row: Ball movement recorded by the camera. Lower row: Activation of the input sensor field

Fig. 5 Sensor activation for a prediction one timestep ahead. Input activation, target output activation, predicted activation and absolute error between target activation and predicted activation (left to right)

6 Results

We introduce the mean absolute error and the correlation coefficient to evaluate the performance of the network. The mean absolute error is the absolute difference between the activation values of the target and output sequences of the validation set, divided by the number of neurons in the input/output layer and the length of the sequence. This average error per output neuron and per image yields a reasonable measure for the performance on validation sets of different lengths. Figure 5 shows an example for a prediction and its error.

A problem which arises if only this mean absolute error is used for evaluation is that networks with nearly no output activation also produce a low mean absolute error, because most of the neurons in the target activation pattern are not covered by the ball and are therefore not activated, leading to a low average error per image. The correlation coefficient measures the linear dependency of two random variables; if its value is zero, the two variables are not correlated. The correlation coefficient is calculated in a similar way as the mean absolute error. Therefore the higher the coefficient, the higher the probability of getting a correlation as large as the observed value without coincidence involved. In our case a relation between mean absolute error and correlation coefficient exists: a high correlation coefficient indicates a low mean absolute error.
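A short sketch of the two measures as we read the definitions above (NumPy; the exact normalization used by the authors may differ in detail):

    import numpy as np

    def mean_absolute_error(target, output):
        """Average absolute difference per output neuron and per image.
        target, output: arrays of shape (n_images, n_neurons)."""
        return np.abs(target - output).mean()

    def correlation_coefficient(target, output):
        """Pearson correlation between target and predicted activations."""
        return np.corrcoef(target.ravel(), output.ravel())[0, 1]

    # toy usage
    target = np.random.rand(100, 48)
    output = target + 0.1 * np.random.randn(100, 48)
    print(mean_absolute_error(target, output),
          correlation_coefficient(target, output))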
In Fig. 6 the mean absolute errors, averaged over all single images in the activation sequences of the validation set, and the correlation coefficients for the prediction one timestep (50 ms) ahead are shown for various parameter combinations. The parameter values range for both landscapes from 0.1 to 5.7 for wscale and from 0.5 to 5.7 for λ. If both wscale and λ are high, there is too much activation in the liquid. Remember, λ controls the probability of a connection and wscale controls the strength of a connection. This high activity hampers the network in distinguishing between the input and the noise. Both landscapes indicate a good area if at least one of the parameters is low. Best results are achieved if both parameters are low (e.g. wscale = 0.5, λ = 1.0). The figure clearly shows the close relation between the mean absolute error and the correlation coefficient. Furthermore, good results for the prediction can be observed for some parameter combinations. Maximum correlation coefficients are listed in Table 5.

Table 5 Maximum correlation coefficients achieved by the LSM according to the desired prediction time

Prediction time    Correlation coefficient
50 ms              0.86
100 ms             0.74
200 ms             0.53

We also compare the results achieved with two (100 ms) and four (200 ms) time steps predicted. In order to compare the results of both predictions for different parameter combinations, we again use a landscape plot of the correlation coefficients. Figure 7 shows the correlation coefficients for parameter values ranging from 0.1 to 5.7 for wscale and from 0.5 to 5.7 for λ. The regions of good results remain the same as in the one-timestep prediction, though lower correlation coefficients were achieved (about 0.7 at two timesteps and about 0.5 at four timesteps).

Not surprisingly, the performance decreases when the task gets harder (i.e. the prediction time increases). Nevertheless, the results are good enough for reasonable predictions. Figure 8 shows an example of the activations and the error for the prediction two timesteps ahead. It clearly shows that the center of the output activation is in the region of high activation in the input and that the prediction is reasonably good. Figure 9 shows that the activation is more and more blurred as the prediction time increases.

7 Discussion

In this section we discuss some observations that can help to give answers to questions such as "Why do certain parameter combinations (wscale, λ) yield better results than others?", "How come a recurrent neural network is capable of predicting something?" and "Will the precision of the prediction rise with the network size?". Moreover, the LSM exhibits effects similar to the kernel functions used in kernel methods, like kernel PCA; thus, we discuss the pros and cons of both approaches. This section is completed by reasoning about the real-time applicability of the LSM.


Fig. 6 Mean absolute error landscape (upper plot) and the corresponding correlation coefficient landscape (lower plot) for a prediction of one time step (50 ms) ahead. Mean connection weight (wscale) ∈ [0.1, 5.7], average connection distance λ ∈ [0.5, 5.7]

Fig. 7 Correlation coefficient landscape for a prediction of two timesteps (100 ms) ahead on the upper plot and of four timesteps (200 ms) on the lower plot. Mean connection weight (wscale) ∈ [0.1, 5.7], average connection distance λ ∈ [0.5, 5.7]

7.1 Edge of chaos

Certain parameter combinations (e.g. the light shaded region in Fig. 6) yield better results than others. In [15] it is shown that cortical microcircuits do operate at the edge of chaos, a region located at the boundary between ordered and chaotic behavior. It turns out that in the study of neural systems this research direction is of special interest, since dynamic systems exhibit enhanced computational power in this region.

At this critical line, the antagonistic effects of the fading memory property [23] and the separation property [17] reach an equilibrium state. The ordered phase is typically characterized by the fading memory property: small differences in the network state tend to decrease rapidly over time. In chaotic networks these differences are highly amplified and do not vanish. In the landscape plots we presented in Section 6, regions can be spotted whose parameter combinations yield optimal performance according to the task (see Table 5). This region of highest correlation coefficients does not change throughout the various prediction tasks. In our understanding the network operates at the edge of chaos when initialized with the corresponding parameter combinations. Similar parameter regions are obtained in Legenstein et al. [15] when performing a spike train classification task using a LSM.

7.2 Temporal integration

The task of predicting ball movements from simple 2-D images asks for some kind of memory. Hence, the neural microcircuit needs to feature the proper degree of temporal integration. According to [21] the temporal memory is influenced significantly by the number of neurons comprising the liquid and the type of synapses used for connecting these neurons. Thus, by increasing the network size a better temporal integration ability is achieved.


Fig. 8 Sensor activation for a prediction two timesteps ahead. Input activation, target output activation, predicted activation and absolute error between target activation and predicted activation (left to right). Parameters: mean connection weight wscale = 1.0, average connection distance λ = 2.0

Fig. 9 Target output activation for some frame and corresponding predicted activations when predicting 1, 2 and 4 timesteps ahead (left to right)

Nevertheless, we keep a network size of 144 neurons in the liquid, not only because of the desired real-time applicability but also because of the small performance improvement discussed in Section 7.3. An important factor is the usage of dynamic synapses, which, as reported in [21], yield superior performance compared to static synapses.

7.3 Size and topology

As mentioned in Section 2, the network acts similarly to the kernel functions of support vector machines due to its recurrency. The input is projected into a high-dimensional state within the neural microcircuit. Therefore a network state consists of an abundance of non-linear combinations. The dimensionality of that particular state depends on the number of neurons in the network and the density of the connections between the neurons. Clearly, if the network is too small, the non-linearity that is hidden in the input sequence cannot be contained in the network state and is therefore not revealed by linear learning rules. On the other hand, if the network is too large, the main part of the non-linear combinations in the network state is redundant and a significantly better performance is not achieved. We carried out the same experiments we reported in the previous sections with network sizes of 36 and 576 liquid neurons. In the first case the results were unsatisfactory and in the latter case the resulting correlation coefficients exceeded only marginally the ones with 144 neurons. Thus a network size of 144 neurons in the liquid is sufficient for this prediction task.

7.4 Comparison with a kernel method

We also carried out this movement prediction task employing a suitable kernel method. We selected Kernel Principal Component Analysis (kernel PCA) [24], since it is a powerful technique for extracting non-linear structure from input data that is mapped into a high-dimensional space in order to perform PCA in that space.

The same raster input that is fed into the LSM is rearranged into vector form in order to apply a kernel function, the basic component in all kernel methods. It can be described as a "comparison function" k : X × X → R, where X denotes the input data set. The value returned by k(x, x') (where x, x' ∈ X) is large when x and x' are "similar". We make use of a radial basis function kernel to obtain a Gram matrix (kernel matrix) that is then diagonalized to get an eigenvalue decomposition. Eigenvalues and corresponding eigenvectors are then sorted in descending order beginning with the largest. By projecting the transformed input data onto the eigenvectors, principal components are obtained in an unsupervised manner. A simple linear learning algorithm (linear regression) is carried out to calculate coefficients that map the principal components to the desired target data. In the evaluation stage, the same kernel function is applied to the validation data to obtain again a kernel matrix. The transformed validation data are projected onto the eigenvectors that were acquired during the training stage. By means of the extracted features and the previously learnt coefficients a prediction of the ball position is calculated.
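The kernel PCA pipeline described here can be sketched as follows (scikit-learn is used as a stand-in for the authors' implementation; the 40 features follow the value reported below, but the gamma value is only illustrative, since it is not the same parameterization as the kernel width of 0.95 mentioned there):

    import numpy as np
    from sklearn.decomposition import KernelPCA
    from sklearn.linear_model import LinearRegression

    def kpca_predict(train_x, train_y, test_x, n_features=40, gamma=1.0):
        """Sketch: RBF-kernel PCA feature extraction followed by linear regression.

        train_x/test_x : flattened 8 x 6 sensor activations, one row per frame
        train_y        : target activations p time steps ahead"""
        kpca = KernelPCA(n_components=n_features, kernel="rbf", gamma=gamma)
        feats_train = kpca.fit_transform(train_x)   # eigendecomposition of the Gram matrix
        reg = LinearRegression().fit(feats_train, train_y)
        feats_test = kpca.transform(test_x)         # project validation data onto the
        return reg.predict(feats_test)              # eigenvectors from training

    # toy usage with random stand-ins for the 48-dimensional frames
    train_x, train_y = np.random.rand(200, 48), np.random.rand(200, 48)
    test_x = np.random.rand(50, 48)
    print(kpca_predict(train_x, train_y, test_x).shape)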
The correlation coefficient is calculated to evaluate the performance (see Table 6) and to be able to compare the results with the LSM approach. The training set comprises 180 activation sequences and the validation set 100 activation sequences. Both sets were chosen randomly out of the original 674 sequences. The number of used features depends on the decrease of the eigenvalues: eigenvectors whose eigenvalues are close to zero are not taken into account (in order to separate information from noise). 40 features sufficed for the linear regression. The RBF kernel width was set to 0.95 to achieve the largest correlation coefficients.

The kernel method performs worse than the LSM approach, mainly because it lacks temporal integration.


Table 6 Comparison of maximum correlation coefficients achieved by kernel PCA and the LSM

Prediction time    kernel PCA    LSM
50 ms              0.86          0.86
100 ms             0.67          0.74
200 ms             0.43          0.53

There is only information about the ball position in one input vector; information about the ball trajectory is missing. Moreover, the number of training sequences is limited by the kernel method itself. Because of these computational constraints only 180 sequences were used for training, resulting in matrices of approximately 4000 × 4000 elements that have to be processed. That limitation is disadvantageous compared to the LSM, since not all possible input variations occur during the training period. Furthermore, kernel PCA can only be applied offline, whereas the LSM provides any-time computing: as soon as input is fed into the network the corresponding output is available.

7.5 Real-time applicability

In order to utilize the LSM not only in real-world data applications but also in real-time applications, one needs a simulation framework that provides the proper interface as well as sufficient simulation speed. In [3] the real-time extensions of the LSM simulator are used successfully for imitation learning experiments with a Khepera miniature robot. The work introduces a simple extension enabling real-time experiments. This of course only works for network sizes and activities that can be simulated in a time smaller than the time that is passing in the simulated neural network. This constraint is not too hard to meet for network sizes like the ones we used in our experiments. E.g. our simulations of the networks with 144 neurons were performed on a desktop PC with a 1.9 GHz Intel® Pentium® 4 CPU and 512 MB RAM and took about 1/4 of real-time. This framework enables us to implement a much richer range of experiments that can use noisy real-world data on agents that operate in real-time. We are currently working on a goal keeper where we use the ball prediction from this article to control a robot that is able to intercept a ball. For a more detailed discussion of the real-time framework of the LSM see [4].

8 Conclusion

In this work we propose a biologically realistic approach for the computation of time series of real-world images. The Liquid State Machine (LSM), a biologically inspired computation paradigm, is used to learn ball prediction within the RoboCup robotic soccer domain. The advantages of the LSM are that it projects the input data into a high-dimensional space, and therefore simple learning methods, e.g. linear regression, can be used to train the readout. Furthermore, the liquid, a pool of inter-connected neurons, serves as a memory which holds the current and some past inputs up to a certain point in time (temporal memory). Finally, this kind of computation is also biologically more plausible than other approaches like Artificial Neural Networks or Kalman filters.

Experiments within the RoboCup domain show that the LSM approach is able to reliably predict ball movement up to 200 ms ahead. The experimental setup for this task and the corresponding results are presented. These results support the idea of the LSM as a generic powerful prediction mechanism for time series.

Furthermore, a deeper discussion was given of the philosophy behind the liquid state machine, the necessary topologies and sizes, and its real-time applicability. Moreover, a systematic comparison to the kernel PCA method was made to show that the more general LSM approach performs similarly to less general methods.

References

1. Bear MF (2000) Neuroscience: exploring the brain. Williams and Wilkins, Baltimore, MD
2. Bishop C (1995) Neural networks for pattern recognition. Oxford University Press, Oxford, UK
3. Burgsteiner H (2005) Training networks of biological realistic spiking neurons for real-time robot control. In: Proceedings of the 9th International Conference on Engineering Applications of Neural Networks, pp 129–136
4. Burgsteiner H (2006) Imitation learning for real-time robot control. Int J Eng Appl Artif Intell 19:741–752
5. Elman J (1990) Finding structure in time. Cognitive Sci 14:179–211
6. Fernando C, Sojakka S (2003) Pattern recognition in a bucket: a real liquid brain. In: Advances in Artificial Life: 7th European Conference, Lecture Notes in Computer Science, vol 2801, Springer, Berlin/Heidelberg, pp 588–597
7. Ferrein A, Fritz C, Lakemeyer G (2004) On-line decision-theoretic Golog for unpredictable domains. In: Proc. 4th Cognitive Robotics Workshop at ECAI 04
8. Fraser G, Steinbauer G, Wotawa F (2004) A modular architecture for a multi-purpose mobile robot. In: Innovations in Applied Artificial Intelligence, IEA/AIE, Lecture Notes in Artificial Intelligence, vol 3029, Springer, Canada
9. Gupta A, Wang Y, Markram H (2000) Organizing principles for a diversity of GABAergic interneurons and synapses in the neocortex. Science 287:273–278
10. Hager G, Toyama K (1998) The XVision system: a general purpose substrate for portable real-time vision applications. Comput Vis Image Underst 69:23–37
11. Hopfield J (1982) Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Science 79:2554–2558
12. Jaeger H (2001) The echo state approach to analysing and training recurrent neural networks. Tech Rep 148, GMD


13. Jordan M, Wolpert D (1999) Computational motor control. In: Gazzaniga M (ed) The Cognitive Neurosciences. MIT Press, Cambridge, MA
14. Kohonen T (2001) Self-organizing maps, 3rd edn. Springer-Verlag
15. Legenstein R, Maass W (2005) What makes a dynamical system computationally powerful? In: Haykin S, Principe JC, Sejnowski T, McWhirter J (eds) New Directions in Statistical Signal Processing: From Systems to Brain. MIT Press
16. Legenstein RA, Markram H, Maass W (2003) Input prediction and autonomous movement analysis in recurrent circuits of spiking neurons. Reviews in the Neurosciences (Special Issue on Neuroinformatics of Neural and Artificial Computation) 14(1–2):5–19
17. Maass W, Legenstein RA, Markram H (2002) A new approach towards vision suggested by biologically realistic neural microcircuit models. In: Buelthoff HH, Lee SW, Poggio TA, Wallraven C (eds) Biologically Motivated Computer Vision. Proc. of the Second International Workshop, BMCV 2002, Lecture Notes in Computer Science, vol 2525, Springer, Berlin, pp 282–293
18. Maass W, Natschlaeger T, Markram H (2002) Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Comput 14(11):2531–2560
19. Markram H, Wang Y, Tsodyks M (1998) Differential signaling via the same axon of neocortical pyramidal neurons. Proceedings of the National Academy of Science 95(9):5323–5328
20. Maybeck PS (1990) The Kalman filter: an introduction to concepts. In: Cox I, Wilfong G (eds) Autonomous robot vehicles, Springer-Verlag, pp 194–204
21. Natschläger T, Markram H, Maass W (2004) Computational models for generic cortical microcircuits. In: Computational Neuroscience: A Comprehensive Approach, pp 575–605
22. Pearlmutter B (1995) Gradient calculation for dynamic recurrent neural networks: a survey. IEEE Transactions on Neural Networks 6(5):1212–1228
23. Boyd S, Chua LO (1985) Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Trans. on Circuits and Systems, pp 1150–1161
24. Schölkopf B, Smola A, Müller KR (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319
25. Thomson A, West D, Wang Y, Bannister A (2002) Synaptic connections and small circuits involving excitatory and inhibitory neurons in layers 2–5 of adult rat and cat neocortex: triple intracellular recordings and biocytin labelling in vitro. Cerebral Cortex 12(9):936–953
26. Verma V, Simmons R, Gordon G, Thrun S (2004) Particle filters for fault diagnosis. IEEE Robotics and Automation Magazine 11(2):56–66

Harald Burgsteiner graduated from Salzburg Technical High School in the field of Electronics and Information Technology and went on to receive his M.Sc. and Ph.D. from the Graz University of Technology. He passed the exams with distinction and received his degree with honors. Mr. Burgsteiner worked as a research and teaching assistant at Prof. Maass' Institute for Theoretical Computer Science at the Graz University of Technology. His main working area was to explore new learning algorithms for neural networks on robots in real-world environments. He left the group in Spring 2003. Harald Burgsteiner is currently working at the Graz University of Applied Sciences as a Professor for Medical Informatics.

Mark Kröll is a Master student at the Institute for Theoretical Computer Science, Graz University of Technology. Currently he works at the Division of Knowledge Discovery, Know-Center Graz. His scientific interests are in the fields of Machine Learning and Kernel Methods.

Alexander Leopold received his B.Sc. degree in Telematics from Graz University of Technology in 2005 and is currently writing his master thesis at the Signal Processing and Speech Communication Laboratory. His research interests are computational intelligence and stochastic signal processing.

Gerald Steinbauer received a M.Sc. in Computer Engineering (Telematik) in 2001 from Graz University of Technology. He is currently a researcher at the Institute for Software Technology at the Graz University of Technology and works on his Ph.D. thesis focused on intelligent robust control of autonomous mobile robots. His research interests include autonomous mobile robots, sensor fusion, world modeling, robust robot control and RoboCup. He built up the RoboCup Middle-Size League Team of Graz University of Technology and currently works as its project leader. He is a member of the IEEE Robotics and Automation Society, the IEEE Computer Society and the Austrian Society for Artificial Intelligence. Moreover, he is co-founder and member of the Austrian RoboCup National Chapter.
