Backpropagation through space, time and the brain
Paul Haider, Benjamin Ellenberger, Jakob Jordan, Kevin Max,
Ismael Jaras, Laura Kriener, Federico Benitez, Mihai A. Petrovici
Abstract  In Machine Learning (ML), the answer to the spatiotemporal credit assignment problem is almost universally given by the error backpropagation algorithm, through either space (BP) or space and time (BPTT). However, BP(TT) is well known to rely on biologically implausible assumptions, in particular the dependency on both spatially and temporally non-local information. Here, we introduce Generalized Latent Equilibrium (GLE), a computational framework for spatio-temporal credit assignment in dynamical physical systems. We start by defining an energy based on neuron-local mismatches, from which we derive both neuronal dynamics via stationarity and parameter dynamics via gradient descent. The resulting dynamics can be interpreted as a real-time, biologically plausible approximation of BPTT in deep cortical networks with continuous-time, leaky neuronal dynamics and continuously active, local synaptic plasticity. As in the Latent Equilibrium framework (Haider et al., 2022), GLE exploits the ability of biological neurons to phase-shift their output rate with respect to their membrane potential. The benefit of incorporating this property is twofold: first, this type of computation allows the mapping of time-continuous inputs to neuronal space; second, it enables the temporal inversion of feedback signals, which is essential in order to approximate the adjoint states necessary for estimating useful parameter updates.

[Figure 1, panels (a)–(c): panel (b) plots the phase shift of r, e and λ under their respective temporal operators against τr/τm; panel (c) plots validation accuracy against training epoch for GLE, Linear, MLP, CNN and GRU.]
Figure 1: (a) Mapping of GLE equations to hierarchical, cortical microcircuits with feedforward (red) and feedback (blue) paths. (b) The temporal modulation of GLE errors e is exactly equal to that of the exact adjoint error signals λ, as well as inverted w.r.t. the feedforward inputs r. (c) Average validation accuracy over 5 seeds of our GLE networks trained online on the MNIST1D dataset (cf. Greydanus, 2020), compared to standard ML approaches such as MLPs, TCNs or GRUs trained with offline BP(TT).

Mathematical framework  The GLE framework can be defined as a set of postulates from which network structure and dynamics are derived.
First, we abstract biological neurons to perform two fundamental temporal operations: low-pass filtering, i.e., retrospective integration of inputs x(t) into the membrane u(t), denoted by the operator $\mathcal{I}^-_{\tau^\mathrm{m}}$,

$$u(t) = \mathcal{I}^-_{\tau^\mathrm{m}}\{x(t)\} := \frac{1}{\tau^\mathrm{m}} \int_{-\infty}^{t} x(s)\, \exp\!\left(-\frac{t-s}{\tau^\mathrm{m}}\right) \mathrm{d}s \,, \quad (1)$$

and prospective coding in its output r(t), denoted by the operator $\mathcal{D}^+_{\tau^\mathrm{r}}$,

$$r(t) = \varphi\!\left(\mathcal{D}^+_{\tau^\mathrm{r}}\{u(t)\}\right) := \varphi\!\left(\left[1 + \tau^\mathrm{r}\,\frac{\mathrm{d}}{\mathrm{d}t}\right] u(t)\right) . \quad (2)$$
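To make these two operations concrete, the following minimal sketch (our own illustration, not code from the paper; the names `lowpass` and `prospective` and the Euler scheme are our choices) discretizes Eqs. (1) and (2) and checks that, for τr = τm, the prospective operator undoes the phase lag of the low-pass filter:

```python
import numpy as np

def lowpass(x, tau, dt):
    """Retrospective integration I^-_tau (Eq. 1): Euler-discretized
    low-pass filter, tau * du/dt = -u + x."""
    u = np.zeros_like(x)
    for t in range(1, len(x)):
        u[t] = u[t - 1] + dt / tau * (x[t - 1] - u[t - 1])
    return u

def prospective(u, tau, dt):
    """Prospective operator D^+_tau (Eq. 2, before the nonlinearity):
    (1 + tau * d/dt) u, with a finite-difference derivative."""
    return u + tau * np.gradient(u, dt)

dt, tau_m, tau_r = 1e-3, 10e-3, 10e-3
t = np.arange(0.0, 0.5, dt)
x = np.sin(2 * np.pi * 5.0 * t)          # time-continuous input

u = lowpass(x, tau_m, dt)                # membrane potential, Eq. (1)
r = np.tanh(prospective(u, tau_r, dt))   # output rate, Eq. (2) with phi = tanh

# With tau_r = tau_m, D^+ undoes the phase lag of I^-: the prospective
# potential tracks x up to discretization error (checked after the transient).
print(np.abs(prospective(u, tau_m, dt)[100:] - x[100:]).max())
```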
Next, we postulate the real-time energy function

$$E(t) = \sum_i \lVert e_i(t) \rVert^2 + C(t) \,, \quad (3)$$

from which both neuronal dynamics and plasticity are derived. The mismatches $e_i = \mathcal{D}^+_{\tau^\mathrm{m}}\{u_i\} - \sum_j W_{ij}\, \varphi\!\left(\mathcal{D}^+_{\tau^\mathrm{r}}\{u_j\}\right) - b_i$ represent the difference between a neuron's prospective output and what its presynaptic partners expect it to be, and the cost C(t) is a measure of the incorrectness of the output neurons.
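For concreteness, here is a small sketch of Eq. (3) (our own construction; the network size, the tanh activation, the quadratic output cost and the random state are placeholder assumptions) that evaluates the mismatches and the real-time energy for one network state:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                    # neurons
W = rng.normal(scale=0.3, size=(n, n))   # synaptic weights W_ij
b = np.zeros(n)                          # biases
tau_m, tau_r = 10e-3, 10e-3
phi = np.tanh

u = rng.normal(size=n)    # membrane potentials u_i(t)
du = rng.normal(size=n)   # their temporal derivatives du_i/dt

# Prospective potential D^+_{tau_m}{u} and prospective rates phi(D^+_{tau_r}{u}).
u_prosp = u + tau_m * du
r = phi(u + tau_r * du)

# Neuron-local mismatches: prospective potential vs. presynaptic prediction.
e = u_prosp - (W @ r + b)

# Output cost: here, a squared error of the last two neurons' rates vs. a target.
target = np.array([0.5, -0.5])
C = 0.5 * np.sum((r[-2:] - target) ** 2)

E = np.sum(e ** 2) + C    # real-time energy, Eq. (3)
print(E)
```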
Neuronal dynamics satisfy a stationarity condition

$$\mathcal{I}^-_{\tau^\mathrm{m}}\!\left\{\frac{\partial E}{\partial \mathcal{D}^+_{\tau^\mathrm{m}}\{u_i\}}\right\} + \mathcal{I}^-_{\tau^\mathrm{r}}\!\left\{\frac{\partial E}{\partial \mathcal{D}^+_{\tau^\mathrm{r}}\{u_i\}}\right\} = 0 \,, \quad (4)$$

leading to classical leaky neuronal membranes

$$\tau^\mathrm{m}_i\, \dot{u}_i = -u_i + \sum_j W_{ij}\, \varphi\!\left(\mathcal{D}^+_{\tau^\mathrm{r}}\{u_j\}\right) + b_i + e_i \,, \quad (5)$$

but with prospective neuronal interactions.
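Before turning to the mapping onto cortical circuits, here is a one-step Euler sketch of the dynamics in Eq. (5) (again our own toy illustration; the circuit size, time step and nonlinearity are arbitrary choices, and the prospective rate reuses the derivative from the previous step):

```python
import numpy as np

def gle_membrane_step(u, du, W, b, e, tau_m, tau_r, dt, phi=np.tanh):
    """One Euler step of Eq. (5): tau_m * du/dt = -u + W phi(D^+_{tau_r}{u}) + b + e.
    The prospective presynaptic rate uses the derivative of the previous step."""
    r = phi(u + tau_r * du)               # prospective presynaptic rates
    du_new = (-u + W @ r + b + e) / tau_m
    return u + dt * du_new, du_new

rng = np.random.default_rng(1)
n, dt, tau_m, tau_r = 4, 1e-4, 10e-3, 10e-3
W = rng.normal(scale=0.3, size=(n, n))
b, e = np.zeros(n), np.zeros(n)           # no error input in this toy run
u, du = rng.normal(size=n), np.zeros(n)

for _ in range(2000):                     # ~20 membrane time constants
    u, du = gle_membrane_step(u, du, W, b, e, tau_m, tau_r, dt)
print(u)                                  # relaxed membrane potentials
```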
These equations suggest a direct mapping to cortical microcircuitry (Fig. 1a). For hierarchical networks, feedback errors are backpropagated over layers but temporally inverted with the same operators that modulate the feedforward signals $r_\ell = \varphi\!\left(\mathcal{D}^+_{\tau^\mathrm{r}} \mathcal{I}^-_{\tau^\mathrm{m}}\{W_\ell\, r_{\ell-1}\}\right)$:

$$e_\ell = \mathcal{D}^+_{\tau^\mathrm{m}} \mathcal{I}^-_{\tau^\mathrm{r}}\!\left\{\varphi'_\ell \odot W^\mathsf{T}_{\ell+1}\, e_{\ell+1}\right\} \quad (6)$$
$$\;\approx \mathcal{I}^+_{\tau^\mathrm{m}} \mathcal{D}^-_{\tau^\mathrm{r}}\!\left\{\varphi'_\ell \odot W^\mathsf{T}_{\ell+1}\, \lambda_{\ell+1}\right\} = \lambda_\ell \,. \quad (7)$$

This property of inverse temporal modulation is crucial, as it ensures that feedforward signals $r_\ell$ and feedback errors $e_\ell$ are in sync w.r.t. their phase shift. Our prospective errors approximate the exact, but temporally non-local, errors given by the adjoint variables $\lambda_\ell$ (Fig. 1b), where $\mathcal{I}^+_{\tau}\{x(t)\} = \frac{1}{\tau} \int_{t}^{\infty} x(s)\, e^{\frac{t-s}{\tau}}\, \mathrm{d}s$ and $\mathcal{D}^-_{\tau}\{x(t)\} = \left(1 - \tau\,\frac{\mathrm{d}}{\mathrm{d}t}\right) x(t)$. Finally, plasticity of all parameters $\theta \in \{W, b, \tau\}$ performs gradient descent on the energy. For example, in the case of synaptic weights, this leads to a bio-plausible, error-correcting rule, as described in, e.g., Urbanczik & Senn, 2014:

$$\dot{W}_{ij} = -\eta_W\, \partial E / \partial W_{ij} = \eta_W\, e_i\, r_j \,. \quad (8)$$

Altogether, this approximates gradient descent on the total integrated cost over time $C = \int_0^T C(t)\, \mathrm{d}t$ and enables our GLE networks to achieve results competitive with well-known, powerful ML architectures (Fig. 1c), but with fully local, always-on, phase-free learning in real time.
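As a closing illustration of Eqs. (6) and (8), the following toy sketch (our own construction; the random "output error" trace, layer sizes and learning rate are placeholders, and `lowpass`/`prospective` are the same hypothetical helpers as above) backpropagates a top-layer error through one weight matrix, applies the temporally inverting composition D⁺_{τm} I⁻_{τr}, and accumulates the corresponding always-on weight update:

```python
import numpy as np

def lowpass(x, tau, dt):
    """I^-_tau applied along the time axis (axis 0)."""
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + dt / tau * (x[t - 1] - y[t - 1])
    return y

def prospective(x, tau, dt):
    """D^+_tau along the time axis: x + tau * dx/dt."""
    return x + tau * np.gradient(x, dt, axis=0)

rng = np.random.default_rng(2)
dt, tau_m, tau_r, eta = 1e-3, 10e-3, 10e-3, 1e-3
T, n_low, n_top = 500, 6, 3
W_top = rng.normal(scale=0.3, size=(n_top, n_low))         # W_{l+1}

# Toy traces: presynaptic rates r_l(t) and a top-layer error e_{l+1}(t).
r_low = np.sin(2 * np.pi * 3.0 * dt * np.arange(T))[:, None] * rng.normal(size=n_low)
phi_prime = 1.0 - np.tanh(lowpass(r_low, tau_m, dt)) ** 2   # phi'_l, crude membrane stand-in
e_top = rng.normal(scale=0.1, size=(T, n_top))

# Eq. (6): spatially backpropagated, temporally inverted feedback error for layer l.
fb = phi_prime * (e_top @ W_top)                            # phi'_l ⊙ W_{l+1}^T e_{l+1}
e_low = prospective(lowpass(fb, tau_r, dt), tau_m, dt)

# Eq. (8): always-on plasticity of W_{l+1}, integrated over the run,
# driven by its postsynaptic errors and presynaptic rates.
dW_top = eta * dt * sum(np.outer(e_top[t], r_low[t]) for t in range(T))
print(e_low.shape, dW_top.shape)
```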