
A Neural Network Pole Balancer

that Learns and Operates on a Real Robot in Real Time

Dean Hougen John Fischer Deva Johnam


[email protected] [email protected] [email protected]

Artificial Intelligence, Robotics, and Vision Laboratory


Department of Computer and Information Sciences
University of Minnesota
4-192 Electrical Engineering and Computer Science Building
200 Union St. SE, Minneapolis, MN 55455

Abstract

A neural network approach to the classic inverted pendulum task is presented. This is the task of keeping a rigid pole, hinged to a cart and free to fall in a plane, in a roughly vertical orientation by moving the cart horizontally in the plane while keeping the cart within some maximum distance of its starting position. This task constitutes a difficult control problem if the parameters of the cart-pole system are not known precisely or are variable. It also forms the basis of an even more complex control-learning problem if the controller must learn the proper actions for successfully balancing the pole given only the current state of the system and a failure signal when the pole angle from the vertical becomes too great or the cart exceeds one of the boundaries placed on its position.

The approach presented is demonstrated to be effective for the real-time control of a small, self-contained mini-robot, specially outfitted for the task. Origins and details of the learning scheme, specifics of the mini-robot hardware, and results of actual learning trials are presented.

1 Introduction

Pole-balancing is the task of keeping a rigid pole, hinged to a cart and free to fall in a plane, in a roughly vertical orientation by moving the cart horizontally in the plane while keeping the cart within some maximum distance of its starting position (see Figure 1). Variously known as the inverted pendulum problem, pole-balancing, stick-balancing, or broom-balancing, this task is a classic object of study in both system dynamics (where the equations of its motion are of interest) [4], [16] and control theory (where control systems capable of balancing the pole are of interest) [20]. The dynamics are by now well understood, but they provide a difficult task nonetheless, as the system is inherently unstable. Further, if the system parameters are not known precisely, or may vary, then the task of constructing a suitable controller is, accordingly, more difficult.

Figure 1: The cart-pole system. The pole must remain within roughly 12° of vertical and the cart within 4.0 m of its starting position.

The pole-balancing task is quickly becoming a classic object of study in the theory of control-learning, as well. Systems that can learn to balance poles were first developed over thirty years ago [6], [23] and many more have been developed since (e.g. [1], [3], [5], [7], [8], [9], [11], [15], and [19]). All of these systems have fallen
short of fully solving the inverted pendulum problem (see Subsection 5.3 "Discussion") due to the difficult nature of the problem. In the most difficult version of the problem (which we refer to in this paper as the full or entire problem), the learning system is provided with nothing but a vector encoding the current state of the cart-pole system and a single failure signal which indicates that any one of the four error conditions (pole fell left, pole fell right, cart went too far left, or cart went too far right) has occurred. Because a great many actions will have been specified by the controller when failure is finally signalled, determining which actions were "correct" and which were the cause of failure is an extremely difficult credit and blame assignment problem.

Perhaps due to the difficulty of this problem, or perhaps due to a failure to understand the shortcomings of simulation, most researchers have used their control-learning systems only on simulations of cart-pole systems. What is likely the first control system to learn to balance a pole in an actual physical system without the aid of an outside teacher was only recently developed (see [12]). Yet even this system does not solve the complete problem on its own; fully half the problem is solved for it before it even begins the training process (see Subsection 5.3 "Discussion"). We present, then, what we believe to be the first control system which learns to solve the entire pole-balancing problem for a real, physical cart-pole system.

2 Problem Definition

The pole balancing problem, as it pertains to control-learning theory, is really a class of problems of the general form described above. For simulation results, a standard (as presented in [2]) is generally, but not always (e.g. [9]), followed. For actual cart-pole systems, the parameters are naturally quite variable. For our cart-pole system, many of the system parameters are known only very roughly and others are unknown. It is not necessary for us to know the particulars of our cart-pole system, as it is the task of the learning system to decide the correct control actions, and the learning system is designed to work on the class of problems, not on any particular instantiation of it.

Approximations of the cart-pole system parameters are as follows (see Figure 1): the mass of the cart is roughly 1 kilogram; the pole has a roughly uniform mass of 50 grams and a length of approximately 1 meter, giving a center of mass approximately 0.5 meters from the pivot. The cart is restricted to stay within approximately 4.0 meters of its initial position and the pole within roughly 12° of vertical. Violating either of these conditions causes a failure signal to be generated and the trial to end.

The standard of allowing the cart only "bang-bang" control (allowing only for a preset force to be applied in either direction on the cart, rather than a variable force application) was approximated by the application of a single torque pulse in either direction by the motor on any time step. The magnitude of this torque pulse is not known.

The coefficient of friction in the "hinge" between the cart and the pole (actually a variable resistor, see Section 4 "The Robot") was not estimated, nor was the friction in the axles nor that produced by the motor itself, nor was the coefficient of sliding friction between the tires and the floor. Finally, changes due to a reduction in battery power during the run were not estimated.

To reduce the computational load on the mini-robot's processor (see Section 4.2 "Mini-Computer Board"), the state vector was restricted to approximations of the cart position and pole angle. (Standard versions of the pole-balancing problem provide the controller with cart velocity and pole angular velocity as well.)

3 SONNET

The learning system follows the Self-Organizing Neural Network with Eligibility Traces (SONNET) paradigm first described in [11]. The SONNET paradigm delineates a general class of control-learning systems which are appropriate for problems in which correct control actions are not known, but a feedback mechanism that allows for an overall evaluation of system performance (success and/or failure signals) is available, and for which system performance is temporally based on network responses.

The SONNET paradigm combines the self-organizing behavior of Kohonen's Self-Organizing Topological Feature Maps (Kohonen Maps, see [14]) with the concept of an eligibility trace (see Section 3.2 "The Eligibility Trace") to create a class of novel and powerful control-learning systems. A SONNET system (known as PBMax after its intended task and particulars of its computations) was applied to a standard simulation of the pole-balancing problem [11]. Here, a (new) SONNET-style network is applied to an instantiation of the inverted pendulum problem using a real cart-pole system for the first time.

3.1 Self-Organizing Maps

Kohonen has proposed a set of connectionist systems based on the recognition that in biological neural networks (i.e. brains) there are regions (especially in the cerebral cortex) for which topology-preserving relations exist between patterns of input from sensory organs and the physical arrangement of the neurons themselves [14,
pp. 119-122]. These areas provide efficient representations for interrelated data. Kohonen Maps, then, are a class of conceptually similar artificial maps that use several simple techniques (such as lateral inhibition) in concert to achieve the same general effects as those found in biological systems. Kohonen Maps have been used extensively as pattern classifiers.

3.2 The Eligibility Trace

Biological neurons are highly complex, and their functions are only very roughly approximated by the artificial neurons in today's connectionist systems. One function found in biological neural networks, as noted in [13], is what we refer to here as the 'eligibility trace'. This function is the result of neurons becoming more amenable to change immediately after they fire. This plasticity reduces with time, but provides an opportunity for learning based on feedback following the neuron's activity.

Building on this idea, we can construct an artificial neural network capable of learning based on temporal feedback. The network receives input, gives responses and, when feedback signals arrive, is updated based on the neurons' eligibilities for change. Notably, this feedback can come from some other system (such as a cart-pole system) that the network is causally connected to.

3.3 The PBMin Learning System

A particular learning system, the Pole Balancer for the Mini-robot (PBMin), was constructed under the SONNET paradigm. Unlike previous SONNET systems, PBMin is restricted to using integer values, due to restrictions imposed by the mini-robot's onboard processor. This fact should be kept in mind when viewing the equations given below.

3.3.1 Welcome to the neighborhood

One concept borrowed for SONNET from Kohonen Maps is that of 'neighboring.' This concept relates to the topological arrangement of the neurons, which does not change as the network learns. In PBMin, the topology is rectangular; the nearest neighbors of any neuron are those neurons which have identification numbers differing by one in either or both digits from those of the neuron in question. For example, the nearest neighbors of neuron (3,3) are (2,2), (2,3), (2,4), (3,2), (3,4), (4,2), (4,3), and (4,4). In Figure 2, all nearest neighbors are directly connected by line segments.

Figure 2: The rectangular network topology. Neurons are labeled (1,1) through (6,6); nearest neighbors are connected by line segments.

A 'neighborhood', then, is a set of neurons within some "distance" (defined according to the network's topology) of the neuron in question. A neighborhood may have any width from zero (the neighborhood is restricted to the neuron itself) to the width of the network itself (such that the entire network is within the neighborhood).

More formally, for a rectangular network, if U is the set of units u in the network, (i, j) the numerical identifier of each neuron, and ω the width of the neighborhood, then the neighborhood N of neuron n is defined as

    N_n = { u ∈ U : |i_n − i_u| ≤ ω ∧ |j_n − j_u| ≤ ω }.   (1)

In PBMin, a constant neighborhood width of one was used; the neighborhood for any given neuron was therefore reduced to the set of nearest neighbors plus the neuron itself. Figure 2, therefore, shows the entire connectivity of the network. The rectangular topology was chosen to match the two-dimensional input (state) vector (see Section 2 "Problem Definition").

3.3.2 Cooperation and competition

Each neuron has one "weight" (an integer) associated with it for each dimension of the input space. Because PBMin receives a two-dimensional input vector, each neuron correspondingly has two input weights. These weights are initially set so that the neurons are laid out in a rectangular matrix which matches the shape of the network's topology and uniformly covers approximately the central 15% of the input space. Each time a new input X is given to the network, its values are compared with the values of the weights of all of the neurons. To reduce computation time, the similarity measure used in this comparison was simply the sum of the differences between the entries in the input vector and the corresponding weights. (More commonly, the Euclidean distance would be used.) The neuron s which has weights most closely matching the input vector (according to the similarity measure) is called the "selected" neuron.

Formally, the selected neuron s is defined as

    ∃ s: |w_s,1 − X_1| + |w_s,2 − X_2| ≤ |w_u,1 − X_1| + |w_u,2 − X_2|  ∀ (u ∈ U), s ∈ U,   (2)

where w_n is the pair of weights for neuron n. If more than one neuron satisfies equation (2), then one of these neurons is selected by some arbitrary method. The weights of the selected neuron are updated to match the input vector more closely according to the equation

    w^new = w^old + α (x − w^old),   (3)

where x is the entry in the input vector to which the given weight corresponds, and α is a scaling factor which, for PBMin, was kept at 0.002.

The neurons in the neighborhood of the selected neuron also have their weights updated according to equation (3). In this way, all the neurons in the network compete for selection and each neuron, when selected, cooperates with the others in its neighborhood to arrive at new input weight values.

3.3.3 Self-organized output

These concepts from Kohonen Maps are extended for control by adding one or more output values (weights) for each neuron. These output weights may be updated in a manner similar to that used for the input weights. For example, if an output response is given by the selected neuron, and this value differs from the correct output response, the output weight of the selected neuron might be updated according to the equation

    v^new = v^old + β (r − v^old),   (4)

where v is the output weight, r is the correct response, and β is a scaling factor. This idea has, in fact, been used to create networks which can learn to self-organize output as well as input (e.g. [10], [17], and [18]). The limiting factor to these approaches is that they require a teacher be available to give the network a "correct" answer. These supervised learning schemes are appropriate to some problem domains, but other domains require that the learning system solve the problem independently.

3.3.4 Eligibility for adaptation

The use of an eligibility trace makes a teacher unnecessary in domains in which system performance is temporally dependent on network responses. For PBMin, the eligibility for adaptation e, for each neuron n, was calculated according to the equation

    e_n^new = δ e_n^old + ι σ(o_n),   (5)

where δ is the rate of eligibility decay, ι is the initial eligibility for change for a neuron which has just fired, and σ is a function of the neuron's output o. Only the selected neuron gives an output response, so σ is defined to be zero for all neurons besides the one selected. For non-selected neurons, then, equation (5) reduces to a simple decay in the level of eligibility. All neurons have their eligibility level initially set to zero.

Conceptually, PBMin has a single output weight for each neuron which may take on both positive and negative values; a positive value indicates a push (torque pulse) in one direction and a negative value a push in the other direction. PBMin also has two separate eligibility traces, one for reward and the other for punishment. The function σ is such that an eligibility for reward will increase the magnitude of a neuron's output value without changing its sign. An eligibility for punishment, on the other hand, might result in either a reduction in magnitude without a change in sign, or might cause a change in sign and a change (either reduction or increase) in magnitude. Which of these results is the case depends on the neuron's initial output value and its eligibility for punishment. Both reward and punishment signals are generated only upon failure, so output values stay constant within a trial, but change between trials. The central difference between the eligibility for reward and that for punishment is the decay rate: for reward the decay rate δ_r is 0.99 and for punishment the decay rate δ_p is 0.90.

Building on the eligibility function, the output weights of the neurons are updated according to

    v_n^new = ( Σ_{u ∈ N_n} ( v_u^old + ρ(S) e_u ) ) / ξ,   (6)

where ρ is the feedback response from the controlled system S, and ξ is the number of neurons in the neighborhood in question. In this way neurons cooperate to ensure that nearby units have similar output values. Note, however, that there is no competition to be the neuron selected for change. All neurons are updated on failure (when there is feedback from the controlled system).
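The update rules of Subsections 3.3.1 through 3.3.4 can be sketched in code. The following is an illustrative sketch only, not the authors' implementation: the 6 × 6 network size follows Figure 2, but the use of floating point (the robot itself was restricted to integers), the value of ι, reading the similarity measure as a sum of absolute differences, treating σ as simply 1 for the selected neuron, and collapsing the separate reward and punishment signals into one signed feedback value ρ are all simplifying assumptions.

```python
# Sketch of the PBMin/SONNET update rules (equations (1)-(6)).
# Assumptions noted in the text above; not the authors' code.
import random

SIZE = 6                         # 6 x 6 rectangular topology (Figure 2)
ALPHA = 0.002                    # input-weight learning rate, eq. (3)
DELTA_R, DELTA_P = 0.99, 0.90    # reward/punishment eligibility decay, eq. (5)
IOTA = 1.0                       # initial eligibility of a just-fired neuron (assumed)

def make_network(lo, hi):
    """Input weights uniformly covering the central 15% of [lo, hi]^2."""
    span = (hi - lo) * 0.15
    mid = (lo + hi) / 2.0
    step = span / (SIZE - 1)
    net = {}
    for i in range(SIZE):
        for j in range(SIZE):
            w = [mid - span / 2 + i * step, mid - span / 2 + j * step]
            net[(i, j)] = {"w": w, "v": random.uniform(-1, 1),
                           "er": 0.0, "ep": 0.0}
    return net

def neighborhood(n):
    """Eq. (1) with width one: nearest neighbors plus the neuron itself."""
    i, j = n
    return [(a, b) for a in range(SIZE) for b in range(SIZE)
            if abs(a - i) <= 1 and abs(b - j) <= 1]

def select(net, x):
    """Eq. (2): smallest sum of absolute input-weight differences."""
    return min(net, key=lambda n: abs(net[n]["w"][0] - x[0]) +
                                  abs(net[n]["w"][1] - x[1]))

def step(net, x):
    """One time step: update input weights and eligibilities, return output."""
    s = select(net, x)
    for n in neighborhood(s):            # cooperative input-weight update, eq. (3)
        for k in range(2):
            net[n]["w"][k] += ALPHA * (x[k] - net[n]["w"][k])
    for n in net:                        # eligibility traces, eq. (5)
        fired = IOTA if n == s else 0.0  # sigma is zero for non-selected neurons
        net[n]["er"] = DELTA_R * net[n]["er"] + fired
        net[n]["ep"] = DELTA_P * net[n]["ep"] + fired
    return net[s]["v"]                   # sign gives the push direction

def on_failure(net, rho):
    """Eq. (6): neighborhood-averaged output-weight update on failure."""
    trace = "er" if rho > 0 else "ep"
    new_v = {}
    for n in net:
        nbrs = neighborhood(n)
        new_v[n] = sum(net[u]["v"] + rho * net[u][trace] for u in nbrs) / len(nbrs)
    for n in net:
        net[n]["v"] = new_v[n]
```

In a PBMin-style run, `step` would be called on every time step of a trial, and `on_failure` once when the failure signal arrives, after which the next trial begins with the weights retained.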
4 The Robot

The robot, like the learning system, is named PBMin. PBMin is an inexpensive, small, self-contained robot. It has its own onboard mini-computer and operates independently once the pole-balancing program is loaded.

Figure 3: The PBMin chassis, a modified radio-controlled car.

4.1 Chassis

The chassis is made from a modified radio-controlled car (see Figure 3). Most of the original electronic components, except for the motors, are removed. A 7.2 volt battery is attached to the back end of the chassis. The plastic that originally covered the radio-controlled car is removed. The mini-computer board, the motor board, and the pole angle sensor are stacked in the middle of the car. PBMin's position sensor is mounted on the front of the chassis.

4.2 Mini-Computer Board

The mini-computer board contains the MicroController Unit (MCU) and memory (see Figure 4). The mini-computer board is 2 1/4 inches wide and 4 inches long.

The MCU is an 8 megahertz 68HC11A1FN microcontroller made by Motorola. PBMin makes use of two of the MCU's internal devices: (1) the analog to digital converters, which are used to get the angle of the pole and obtain PBMin's position, and (2) the serial port, which is used to receive the pole balancing program from the main computer.

The memory on the mini-computer board is divided into Random Access Memory (RAM) and Erasable Programmable Read Only Memory (EPROM). The mini-computer board has 32 kilobytes of RAM, which is used to store and run the pole balancing program, and 16 kilobytes of EPROM, which is used to start up the MCU.

Figure 4: The mini-computer board, showing the MCU, RAM, and EPROM, with serial and analog ports.

4.3 Motor Board

The motor board is the same size as the mini-computer board and is stacked immediately below it on the chassis. The motor board supplies power to PBMin's drive motor, allowing the robot to go forwards or backwards, or stop.

4.4 Pole Angle Sensor

The pole angle sensor is a variable resistor attached to a pole (see Figure 5(a)). The pole that PBMin uses is a common yard stick. As the pole tips, it turns the shaft of the variable resistor. The variable resistor converts the angle of the pole into a voltage, which is read by the mini-computer board.

4.5 Cart Position Sensor

The position sensor is made up of two infrared transmitter and receiver pairs (IR modules, see Figure 5(b)). A transmitter sends out infrared light, which bounces off the paper disk, to its receiver. This paper disk has white and black regions on it (see Figure 5(c)) and is attached to PBMin's left front wheel. When PBMin moves forwards and backwards, the wheel and disk rotate, and the receiver senses different amounts of infrared light (see Figure 5(d) and (e)).

When an IR module senses a white region, it sends a signal to the mini-computer board. The mini-computer board looks at the two signals it receives from the two IR modules and interprets which way the wheel has turned and by how much.

Figure 5: (a) the pole angle sensor (pole, shaft, and variable resistor); (b) the IR transmitter/receiver modules; (c) the black-and-white paper disk on the wheel; (d), (e) the modules sensing different disk regions as the wheel turns.
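The two-signal arrangement of Subsection 4.5 is the standard quadrature-encoder scheme: the two IR modules read the disk out of phase, so the order in which the signals change gives the direction of rotation, and the number of transitions gives the distance. The paper does not publish the board's firmware; the decoder below is the textbook state-table construction, offered only as an illustration.

```python
# Sketch of two-channel (quadrature) decoding for the cart position
# sensor.  The state table is the standard quadrature construction,
# not PBMin's actual firmware (which is not given in the paper).

# Map (previous A, previous B, current A, current B) -> position delta.
_STEP = {
    (0, 0, 0, 1): +1, (0, 1, 1, 1): +1, (1, 1, 1, 0): +1, (1, 0, 0, 0): +1,
    (0, 0, 1, 0): -1, (1, 0, 1, 1): -1, (1, 1, 0, 1): -1, (0, 1, 0, 0): -1,
}

class WheelDecoder:
    def __init__(self):
        self.prev = (0, 0)
        self.position = 0          # net wheel rotation, in disk segments

    def update(self, a, b):
        """Feed one sample of the two IR receiver signals (0 or 1 each)."""
        self.position += _STEP.get(self.prev + (a, b), 0)  # 0: no valid change
        self.prev = (a, b)
        return self.position
```

Feeding the forward sequence (0,1), (1,1), (1,0), (0,0) advances the position by one count per transition; the same transitions in the opposite order decrement it, which is how the board can tell forwards from backwards with only two binary signals.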

5 Results and Discussion

Experimental results have shown the PBMin learning system to be an effective real-time controller for pole-balancing and, by extension, that the SONNET paradigm (previously only used for the control of simulated systems) can be used for the construction of real-world controllers. The performance of the PBMin system is seen to approach optimal control for this robot on this task.

5.1 Experimental Set-up

The robot was run in a large, open, carpeted room. The size of the room dictated the size of the "track": the room allowed the robot to go 4 meters in either direction of its starting position.

The robot was programmed with three different programs: (1) the PBMin learning system described above (see Subsection 3.3 "The PBMin Learning System"), (2) a preprogrammed solution that we believe produces optimal control given only the cart position and pole angle, and (3) a program that used a file of random numbers for completely random control of the cart-pole system. For each program, the robot performed runs of 50 trials and sent back results.

5.2 Results

As a baseline for comparison, the robot was controlled by random numbers. Three runs of fifty trials were performed. The average time for these trials was 1.6 seconds.

To get a measure of the potential balancing ability of the robot, the network was given the optimal solution. One run of fifty trials was performed. The average trial time of this run was 12.8 seconds.

The PBMin learning system was run with different sets of random initial output weights. Five runs of the first set of weights were performed and an average trial time of 3.3 seconds was obtained. One run each of three additional sets of weights was performed and an average trial time of 3.3 seconds was obtained. The fact that the average trial time stayed constant regardless of the values of the random initial weights demonstrates that the ability of the system to learn is not dependent on the initial weight values.

5.3 Discussion

Because of the difficulty of this problem, many authors (e.g. [7], [8], [22], and [23]) have constructed systems which learn by watching the actions of a person who controls the cart-pole system, and a few (e.g. [7] and [9]) have made systems which learn by observing the cart-pole system as it is controlled by another computer program. These systems are completely incapable of solving the pole-balancing problem without this outside aid. Others (e.g. [5]) have been forced to give their systems "hints" in order for them to solve the problem.

Most systems which are said to solve the pole-balancing problem without teachers or hints really do no such thing. In these authors' systems (e.g. [3], [15], [19], and [21]) the input space is partitioned by the researcher, rather than by the learning system. This head start makes a significant difference. In one of the few instances where the entire pole-balancing problem is approached [1], a single-layer neural network that can learn to balance a pole quite well when the input space partitioning is precomputed by the author fails to learn much beyond the level of random control when applied to the entire problem, and a two-layer network takes, on the average, nearly 100 times as long to learn when the input space partitioning is not provided for it. (The only other case that we are aware of where learning pole-balancing is attempted without a pre-partitioning of the input space is in the first SONNET paper [11].)

PBMin is applied to the entire pole-balancing problem; it needs to dynamically partition the input space while learning the output space. To reduce the computational load on the processor, however, it was necessary to restrict the input vector to two dimensions. This means that our system could not take cart velocity or pole angular velocity into account. These elements of the state vector must be taken into account if long-term stability is to be achieved.

In the only previous paper in which researchers attempt to apply a control-learning scheme to a real cart-pole system [12], the input space is partitioned by the researchers, rather than by the learning system. Because of this, and because of differences between their cart-pole system and the one presented here, direct comparisons between results obtained with the two systems may be misleading. For instance, one might note that their system never exceeded a run length of 10 seconds within the first 50 trials whereas ours never failed to achieve this milestone. However, in fairness to their system, it should be mentioned that such quick learning was probably never a goal for those researchers, as their system was provided with an external power source, whereas ours had a very limited battery capacity. Or, one might note that, in some versions, their system achieved levels of success (as measured by time until failure) that ours never did. This success, however, came only after at least 150 trials, whereas our runs were stopped at 50 trials due to limitations imposed by the battery.

Perhaps the best comparison for our learning system is with the performance of other learning systems on the same hardware. Unfortunately, other learning schemes are simply too computationally expensive to be implemented on our mini-robot in real time. We can compare our results, however, with two ends of the automatic control spectrum. These are completely random control and what we believe to be optimal control given only the pole angle and cart position. The averages of each of these strategies are plotted along with the results of a single run of the learning scheme in Figure 6. As can be seen, the controller's learned behavior approached the average performance of the "optimal" controller.

It should be noted that the optimal controller varied widely in its performance. From a minimum of 90 time steps until failure to a maximum of 847, the performance of the optimal controller highlights the effects of initial conditions and noise on the results obtained by any control scheme for this cart-pole system.

Figure 6: Trial time (seconds) versus trial number for the "optimal" controller (average), an example run of the learning scheme, and the random baseline (average).
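The paper does not specify the hand-coded "optimal" controller beyond saying that it uses only the cart position and pole angle and obeys the bang-bang standard of Section 2. Purely as an illustration of what such a controller could look like, one plausible form is a sign test on a weighted combination of the two measurements; the gains below are invented for the sketch and are not the authors' values.

```python
# Hypothetical bang-bang controller using only pole angle and cart
# position, as in Section 2.  The control law and gains are our
# illustration; the paper does not give the actual controller.

K_ANGLE = 10.0   # drive the cart under the falling pole
K_POS = 1.0      # bias pushes so the cart drifts back toward center

def bang_bang(theta_deg, x_m):
    """Return +1 (full push one way) or -1 (full push the other way)."""
    return 1 if K_ANGLE * theta_deg + K_POS * x_m > 0 else -1
```

The position term works indirectly: a push away from center briefly tips the pole back toward center, after which the angle term steers the cart home, which is why a usable law is possible at all without velocity measurements.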
6 Acknowledgments

This research was carried out in its entirety at the Artificial Intelligence, Robotics, and Vision Laboratory of the University of Minnesota. We would like to thank the other members of the laboratory for their comments, questions, suggestions and support. Thanks in particular to Chris Smith for help on most aspects of this project, Mike McGrath for his work on the interface software, Maria Gini for her funding support, made possible by the AT&T Foundation and grant NSF/DUE-9351513, and her advice on dealing with real robots, and James Slagle for his help in developing the SONNET learning paradigm.

References

[1] C. Anderson. "Learning to control an inverted pendulum using neural networks." IEEE Control Systems Magazine, Vol. 9, No. 3, 31-37, 1989.

[2] C. Anderson and W. Miller. "Challenging Control Problems," in Neural Networks for Control, 475-510. W. Miller, R. Sutton, and P. Werbos, eds., MIT Press, Cambridge, MA, 1991.

[3] A. Barto, R. Sutton, and C. Anderson. "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-13, 834-846, 1983.

[4] R. Cannon. Dynamics of Physical Systems, 703-710. McGraw-Hill, New York, 1967.

[5] M. Connell and P. Utgoff. "Learning to control a dynamic physical system," in Proceedings AAAI-87, Vol. 2, 456-460. American Association for Artificial Intelligence, Seattle, 1987.

[6] P. Donaldson. "Error decorrelation: a technique for matching a class of functions," in Proceedings: III International Conference on Medical Electronics, 173-178, 1960.

[7] A. Guez and J. Selinsky. "A trainable neuromorphic controller." Journal of Robotic Systems, Vol. 5, No. 4, 363-388, 1988.

[8] A. Guez and J. Selinsky. "A neuromorphic controller with a human teacher," in IEEE International Conference on Neural Networks, Vol. 2, 595-602, 1988.

[9] D. Handelman and S. Lane. "Fast sensorimotor skill acquisition based on rule-based training of neural networks," in Neural Networks in Robotics, G. Bekey and K. Goldberg, eds., Kluwer Academic Publishers, Boston, 1991.

[10] R. Hecht-Nielsen. "Counterpropagation networks." Applied Optics, Vol. 26, No. 23, 4979-4984, 1987.

[11] D. Hougen. "Use of an eligibility trace to self-organize output," in Science of Artificial Neural Networks II, D. Ruck, ed., SPIE 1966, 436-447, 1993.

[12] T. Jervis and F. Fallside. "Pole balancing on a real rig using a reinforcement learning controller." Technical Report CUED/F-INFENG/TR115, Cambridge University Engineering Department, Cambridge, England.

[13] A. Klopf. "Brain function and adaptive systems -- a heterostatic theory," in Proceedings of the International Conference on Systems, Man, and Cybernetics, 1974.

[14] T. Kohonen. Self-Organization and Associative Memory, Ch. 5. 3rd ed., Springer-Verlag, Berlin, 1989.

[15] D. Michie and R. Chambers. "Boxes: An experiment in adaptive control," in Machine Intelligence, E. Dale and D. Michie, eds., Oliver and Boyd, Edinburgh, 1968.

[16] K. Ogata. System Dynamics, 531-536. Prentice-Hall, Englewood Cliffs, New Jersey, 1978.

[17] H. Ritter and K. Schulten. "Extending Kohonen's self-organizing mapping algorithm to learn ballistic movements," in Neural Computers, R. Eckmiller and C. von der Malsburg, eds., Vol. F41, 393-406. Springer, Heidelberg, 1987.

[18] H. Ritter, T. Martinetz, and K. Schulten. "Topology conserving maps for learning visuo-motor coordination." Neural Networks, Vol. 2, No. 3, 1989.

[19] C. Sammut. "Experimental results from an evaluation of algorithms that learn to control dynamic systems," in Proceedings of the Fifth International Conference on Machine Learning, 437-443. Morgan Kaufmann, San Mateo, California, 1988.

[20] J. Schafer and R. Cannon. "On the control of unstable mechanical systems," in Automatic and Remote Control III: Proceedings of the Third Congress of the International Federation of Automatic Control, Paper 6C, 1966.

[21] O. Selfridge, R. Sutton, and A. Barto. "Training and tracking in robotics," in Proceedings IJCAI-85, 670-672. International Joint Conference on Artificial Intelligence, Los Angeles, 1985.

[22] V. Tolat and B. Widrow. "An adaptive 'broom balancer' with visual inputs," in IEEE International Conference on Neural Networks, Vol. 2, 641-647, San Diego, 1988.

[23] B. Widrow. "The original adaptive neural net broom-balancer," in International Symposium on Circuits and Systems, 351-357, 1987.
