Ivan K. Boikov, Alfredo de Rossi, Mihai A. Petrovici - Ultrafast Neural Sampling With Spiking Nanolasers
Results

…the onset of an "active" state

zk(t) = 1 ⇔ neuron has fired in (t − τ; t] ,    (1)

with zk(t) = 0 otherwise. In BMs, the probability of the neuron ensemble being in a particular state z follows the Boltzmann distribution p(z) ∝ exp[−E(z)/(kB T)], where the energy E is defined as

E = − Σ_{k<j} Wkj zk zj − Σ_k bk zk ,

with neuronal biases bk and synaptic weights Wkj = Wjk; kB is the Boltzmann constant, and T is the ensemble (Boltzmann) temperature. Here, we disregard the units, assume kB T = 1 and omit this term for brevity.

Figure 1: Spiking nanolasers. (a) Schematic of a PSN coupled to a waveguide (gray). Here, the laser is a photonic crystal; its modes are standing-wave, and optical spikes are coupled out to the waveguide in both directions. E is an electric field, λ is the wavelength in vacuum, and n is the refractive index of the absorber. (b) Optical spike emission. Dashed lines separate sections with different scaling of the time axis.

In SBS, a spiking neural network approximates p(z) through a time-continuous analogue of Gibbs sampling [11]. A particularly interesting variant builds on
LIF sampling (LIFS) neurons [12, 13], as they represent a de-facto standard model across the vast majority of neuromorphic platforms. In LIFS, the required stochasticity is generated by adding (or exploiting pre-existing) noise on the neuronal membranes, which consequently follow the Ornstein-Uhlenbeck process

τm u̇k = (b − u) + Σ_j Σ_{tspike,j} Wkj κ(t − tspike,j) + σW dW ,    (2)

with an autocorrelation determined by the neuronal membrane time constant τm (dW represents a Wiener process scaled by a noise amplitude σW). The interaction between neurons is mediated by an additive postsynaptic potential (PSP) kernel κ that is triggered by an incoming spike at time tspike.

Spiking nanolasers

The PSNs discussed in this work are semiconductor lasers composed of two sections: a gain section and an SA. The two sections form a single resonator and are therefore spanned by a single laser mode (Fig. 1a). The gain section is pumped to reach local electron population inversion and stimulated emission of photons. The SA is not pumped, such that here absorption always dominates, yet saturates as the flux of absorbed photons increases. As the pump rate increases beyond a certain threshold, the absorption is overtaken by the stimulated emission; akin to negative differential resistance in electronic oscillators, this introduces a positive feedback which initiates the emission of a pulse. In turn, this depletes the population of excited electron-hole pairs, hence the gain, and the laser is shut off. Therefore, the emission of a new pulse immediately following a previous one is strongly suppressed. After some time, the pumping re-establishes the initial gain, and the nanolaser can spike again. This process (Fig. 1b) bears a strong resemblance to the generation of action potentials and the subsequent refractoriness found in biological neurons, as described by the Hodgkin-Huxley model [31].

The spiking dynamics in a semiconductor laser with an SA are described by the Yamada model [27, 32], which ignores spontaneous emission and therefore assumes a perfectly deterministic response. Here, we consider a laser with a single optical mode; its field is distributed over a volume comparable to λ³, where λ is the wavelength. Moreover, the active region, where electron and hole pairs are created, is even smaller. Under these conditions, spontaneous emission is not negligible, as the fraction of it going into the mode – the spontaneous emission factor β – is between 0.1 and 1.0, whereas in macroscopic semiconductor lasers it is 10⁻⁴ or less [33]. Therefore, in nanolasers, the average number of photons is much smaller, and the relative noise due to the granularity is much larger, which disrupts the otherwise deterministic spiking at regular intervals controlled by the pumping strength.

A rigorous description of noise requires a quantum mechanical formalism, which is exceedingly complicated for semiconductor systems. Therefore, the semiconductor laser is often approximated as a homogeneously broadened two-level system, as intraband scattering is fast enough that electrons and holes are in thermal equilibrium [34, 35]. The light-matter interaction within a semiconductor can therefore be described as a collection of dipoles interacting with the same optical mode. With some approximations, such as fast dephasing of the polarization with respect to the damping rate of carriers and photons, the quantum mechanical descriptions lead to rate equations [36, 37].
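Between spikes, the free membrane of Eq. (2) is a plain Ornstein-Uhlenbeck process — the same statistics that the PSN gain is shown to follow below. A minimal Euler-Maruyama sketch, with purely illustrative parameter values (not those used in this work):

```python
import numpy as np

# Euler-Maruyama sketch of the free membrane of Eq. (2):
#   tau_m * du = (b - u) dt + sigma_w dW
# All parameter values below are illustrative assumptions.
rng = np.random.default_rng(0)
tau_m = 10.0      # membrane time constant (ms)
b = -55.0         # bias / resting potential (mV)
sigma_w = 2.0     # noise amplitude
dt, n_steps = 0.01, 200_000

u = np.empty(n_steps)
u[0] = b
for t in range(1, n_steps):
    # deterministic drift toward b plus a scaled Wiener increment
    drift = (b - u[t - 1]) / tau_m * dt
    diffusion = sigma_w / tau_m * np.sqrt(dt) * rng.standard_normal()
    u[t] = u[t - 1] + drift + diffusion

# discard the transient and look at the stationary statistics
tail = u[n_steps // 2:]
print(f"mean={tail.mean():.1f} mV, var={tail.var():.2f} mV^2")
```

The tail statistics approach the stationary OU values, mean b and variance σW²/(2τm) = 0.2 here; the PSN gain between spikes performs an analogous autocorrelated random walk.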
The rate equations describe the evolution in time of the populations of excited dipoles ne and photons S in a cavity mode due to multiple processes. Stimulated emission and absorption are described by a term GS, where G = γr(2ne − n0) represents the gain, n0 is the total number of dipoles, and γr is the radiative transition rate. Spontaneous emission and photon damping are described by γr ne and γS. Since ne ≤ n0, the maximum gain is limited to γr n0; this condition is ensured by an optical pumping term γp(n0 − ne), where γp is the pumping rate and n0 − ne represents pumping saturation.

The stochastic equations are formed by including the Langevin forces Fi(t), which are random variables with zero mean and auto-/cross-correlation strengths ⟨Fi(t)Fj(t)⟩ = 2Dij. The Langevin forces are calculated consistently with the rate equations [37] based on the McCumber noise model [38] (see Nanolaser noise model in Methods). Here, we extend the model by including two separate sections, one providing gain with an excited population ne, and another one representing the SA, denoted with the suffix a, with an excited population na, both interacting with the same mode:

Ṡ = GS + γr ne + γr,a na + FS(t) ,
ṅe = −γr (2ne − n0) S − γt ne + γp (n0 − ne) + Fe(t) ,    (3)
ṅa = −γr,a (2na − n0,a) S − γt,a na + Fa(t) ,

where G = γr(2ne − n0) + γr,a(2na − n0,a) − γ is the net gain including photon damping, and other parameters are defined in Table 1 (see Nanolaser with quantum wells in Methods). For clarity, we normalize γp by the threshold pumping strength in the absence of noise, γp^thr (see Nanolaser with quantum wells in Methods), which we later refer to as the "threshold". It is important to note that with noise, spike emission can also occur with γp < γp^thr.

Fig. 2a shows a time trace of the membrane potential of a single LIFS neuron with parameters from [13] and Gaussian noise. This serves as a reference for the behavior of PSNs, which we discuss below. In Fig. 2b we show time traces of the PSN gain with pumping far from and close to the spiking threshold, respectively. In this system, the gain is a stochastic variable as it depends on stochastic electron densities. As a result, when a PSN is not refractory, the gain undergoes a random walk as shown in Fig. 2c, similarly to the LIFS membrane potential. The ISI distribution, gain autocorrelation and probability density function of a PSN are shown in Fig. 2d, e and f, respectively. These are close to the corresponding properties of LIFS neurons, with small deviations explained by the additional nonlinear terms of the PSN state equations.

It has been pointed out that the granularity of light challenges numerical integration based on the Langevin forces [39–41]. Therefore, we further cross-checked the results by implementing a rigorous discrete PSN model, as proposed in [40], and concluded that both methods lead to the same statistical properties of the PSN (see Discrete nanolaser model in Methods).

Figure 2: PSNs emulate LIFS neurons. (a) Membrane potential of a LIF neuron. Shading represents the "active" state after a spike emission. Below: properties of a PSN far from (blue, γp = 0.955γp^thr) and close to (orange, γp = 0.995γp^thr) the spiking threshold. (b) Gain time traces. After a spike is emitted, the gain is reduced drastically, and the PSN is considered "active" for a time shown with shading. (c) Gain time traces between spike emissions. (d) ISI histogram. (e) Autocorrelation of the gain. For each spike at ts, the interval (ts − 0.5τ; ts + τ) was omitted from the analysis to exclude the highly nonlinear regime dynamics of the spiking process. The case far from the threshold is fitted with an Ornstein-Uhlenbeck process, as also obeyed by the free membrane potential of LIFS neurons. (f) Histograms of gain values between spike emissions and fits with normal distributions (black lines).

Networks of spiking nanolasers

Interference-based interaction between PSNs is challenging: their bistability, combined with a potentially large number, will complicate the necessary resonance alignment. For this reason, in this work we assume that PSNs are incoherent, and their interaction is mediated by photodiodes. Spikes are extracted from a PSN by coupling to a waveguide. Connection weights can be implemented optically using a waveguide crossbar array [42]. There, each output waveguide incoherently combines optical signals from input waveguides, each weighted by a set of couplers. Due to a lack of interference, these couplers can only implement non-negative synaptic weights, but in BMs, weights can be negative as well. For this reason, we follow [43] and use balanced photodetectors, which require an N × 2N crossbar array for N PSNs (see Fig. 3a). As a result, optical spikes are converted into electrical current
[Figure 3: (a) Pump/Gain/Laser schematic with balanced (+/−) outputs; (b) PSP vs. time; (c) Gain.]

…equation in Methods):

dG ≈ [−(G − Gp)/τG + 2γr n0 ∆γp(t)] dt + σG dW ,

where Gp is a drift term depending on the pumping strength, and the second term corresponds to the PSN interaction (see Eq. (4)). This equation is identical to that of the membrane potential of a LIFS neuron (see Eq. (2)) under two assumptions. First, na ≪ ne, which holds during non-spiking with moderate pumping strength. Second, changes of ne due to incoming spikes need to be small,
Figure 4: Translation of Boltzmann parameters to PSN parameters. (a) Spiking nanolaser activation function (crosses) with a logistic function fit (line). (b) Impact of connection weight on activation of a receiving neuron. Lines and crosses show activation of a BM neuron and a nanolaser, respectively. Each color corresponds to a receiving neuron bias b1 = −3, −2, . . . , 3, increasing along the arrow.

p(z1 = 1|Gp,1(b1), κ12). The goal is to find a proportionality coefficient ξ such that

p(z1 = 1|Gp,1(b1), ξκ12) ≈ p(z1 = 1|b1, W12) .    (7)

The result is given in Fig. 4b. We find that for limited biases and weights, PSNs replicate the behaviour of LIFS neurons well. Based on Eq. (7), we find the translation rule for weights:

Wkj = ξκkj .    (8)

Optical sampling from Boltzmann distributions

To investigate the accuracy of sampling with PSNs, we first consider sampling from predefined Boltzmann distributions over a small set of binary random variables. Following [13], biases and weights were drawn from Beta distributions: bk ∼ 1.2(B(0.5, 0.5) − 0.5) and Wkj ∼ β(B(0.5, 0.5) − 0.5), where β controls the range of weights and is either 0.6, 1.2 or 2.4. The generated biases and weights are translated to PSN parameters using Eqs. (6) and (8). The sampled distributions p are compared to the target distributions p∗ by means of the Kullback-Leibler divergence DKL(p ∥ p∗).

First, we optimize the PSP timescale controlled by τU. On one hand, τU must be small enough that a PSP does not last longer than the refractory period τ. On the other hand, too short a τU will make the PSP short, but strong, which can break the operating regime assumptions (see Simplified gain equation in Methods). In LIFS neurons, the PSP timescale is ideally slightly shorter than the refractory period [13, Fig. 7]; we use this as a starting point. In Fig. 5a we sweep τU and track the sampling accuracy for all considered β. We find that the optimal τU is approximately 0.37τ. Figure 3b shows a PSP with such a timescale, and indeed, the PSP becomes negligible after t = τ.

Figure 5: Sampling from random Boltzmann distributions with PSN networks. (a) Optimization of the photodiode timescale: τU is swept and the sampling performance computed for PSN networks with maximum weights of 0.6 (blue), 1.2 (orange) and 2.4 (green). The solid black line is the mean DKL for each τU. The dashed line shows τU = 0.37τ. (b) Convergence of sampling from 10 random Boltzmann distributions with weights up to 0.6, 1.2 and 2.4, from left to right. (c) Spike raster during sampling from a Boltzmann distribution with weights up to 2.4. (d) Sampling result for the Boltzmann distribution in (c). Gray: analytical distribution, green: sampling result. (e) Sampling from conditional distributions. From left to right: p(z1345|z2 = [1]), p(z245|z13 = [1, 0]) and p(z12|z345 = [1, 1, 1]). Colors match (d).

We proceed to sample from sets of 10 random Boltzmann distributions for different β. Fig. 5b shows the convergence of the sampling procedure towards the target distribution. We find that within each set the convergence is almost identical. In Fig. 5d we compare a distribution of samples to the exact distribution for β = 2.4, and we note a very close match. Figure 5c shows a raster of spikes during this sampling.

Next, we demonstrate Bayesian inference by sampling from conditional probability distributions. We split the five neurons into two arbitrary groups Y and X. The first group is clamped to an arbitrarily chosen state y, and the neurons in the second group are free; their probability distribution is the conditional probability distribution p(X|Y = y). In PSNs, we clamp the state by significantly reducing or increasing the pumping strength. Figure 5e shows the close match between the correct conditionals and those sampled with our PSNs. We therefore conclude that a network of PSNs can sample accurately from a wide range of Boltzmann distributions over small state spaces, as well as accurately perform Bayesian inference therein.
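The evaluation pipeline above can be sketched compactly: draw b and W from the stated Beta distributions, enumerate the exact target p∗, draw samples, and compute DKL(p ∥ p∗). In this hedged stand-in, exact ancestral sampling replaces the PSN network, and the seed and sample count are arbitrary assumptions:

```python
import itertools
import numpy as np

# Biases and weights as above: b ~ 1.2(B(0.5,0.5)-0.5), W ~ beta(B(0.5,0.5)-0.5)
rng = np.random.default_rng(1)
n, beta = 5, 1.2
b = 1.2 * (rng.beta(0.5, 0.5, size=n) - 0.5)
W = np.triu(beta * (rng.beta(0.5, 0.5, size=(n, n)) - 0.5), k=1)
W = W + W.T  # symmetric weights Wkj = Wjk, zero diagonal

# Exact target distribution p*(z) ∝ exp(-E(z)) over all 2^n states
states = np.array(list(itertools.product([0, 1], repeat=n)))
energies = np.array([-0.5 * z @ W @ z - b @ z for z in states])
p_target = np.exp(-energies)
p_target /= p_target.sum()

# Stand-in sampler (the PSN network plays this role in the text)
idx = rng.choice(len(states), size=100_000, p=p_target)
p_emp = np.bincount(idx, minlength=len(states)) / len(idx)

# DKL(p || p*) with the convention 0*log(0) = 0
nonzero = p_emp > 0
dkl = np.sum(p_emp[nonzero] * np.log(p_emp[nonzero] / p_target[nonzero]))
print(f"DKL = {dkl:.1e}")
```

With exact samples, the residual DKL is set by sampling noise alone (of order 10⁻⁴ for 10⁵ samples over 32 states); a physical sampler adds its approximation error on top.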
Figure 6: Sampling from an arbitrary distribution with a network of PSNs. (a) Architecture of the implemented PSN network. Each line represents a pair of symmetric synaptic connections. (b) Convergence of sampling with an RBM (orange) and a PSN network (green). (c) Comparison of the target distribution (blue) to the distributions sampled by the RBM and the PSN network. (d) Sampling from conditional distributions. From left to right: p(z134|z2 = [0]), p(z24|z13 = [1, 0]) and p(z3|z124 = [1, 0, 1]). Colors match (c).

Optical sampling from arbitrary distributions

Boltzmann distributions are only a subset of all possible probability distributions over binary variables. However, the "fully visible" sampling networks described above can be extended by adding "hidden" neurons that are not observed during sampling. Consequently, the probability distribution of the visible layer becomes a marginal distribution over the full state space, which can, in principle, take any shape, given a large enough hidden space. To simplify training and improve convergence, a hierarchical network structure is preferable, with no horizontal connections within individual layers [14]. Here, we emulate such a two-layer restricted Boltzmann machine (RBM) with an equivalently structured PSN network (see Fig. 6a). Given enough hidden neurons, an RBM can sample from any distribution with arbitrary precision [44]. Consequently, by implementing such an RBM with PSNs, optical sampling from arbitrary distributions can be achieved.

For the target distribution p∗(z) we choose four binary variables, with the probability of each state sampled from the inverse continuous uniform distribution and normalized such that their sum is unity. The probability distribution is shown in Fig. 6c.

Using the contrastive divergence algorithm [45], we train an RBM with 4 visible and 10 hidden neurons. Its sampled distribution over visible neurons pRBM(z) is shown in Fig. 6c. The parameters of the trained RBM were then transferred to a network of 14 PSNs. Its sampled distribution over visible neurons pPSN(z) is also shown in Fig. 6c. We find that the accuracy of the PSN network is very close to that of the RBM, and sampling from the target distribution p∗(z) and the several conditionals shown in Fig. 6d is correspondingly accurate.

Optical probabilistic inference

In this section, we demonstrate Bayesian inference from incomplete information with PSN networks. For better visualization, we chose three images of the digits "0", "3" and "4" from the MNIST dataset [46], rescaled to 12×12 and with brightness rounded to zero or unity. These images were mapped to a fully connected network of 144 PSNs, with one pixel assigned to each neuron. The PSN network was then trained as an associative memory to store the prepared images (see Probabilistic inference training in Methods). Similar to the simulations described in previous sections, we first trained an equivalent BM with wake-sleep and then mapped the resulting parameters to the PSN network.

First, we assess the mixing capability of the network by observing its "dreaming" phase (corresponding to the sleep phase during training). Without external input, the PSN network correctly samples from the prior and switches randomly between states forming the three images with approximately equal probability. Figure 7a shows a two-dimensional projection of PSN samples onto vectors corresponding to the images (see Probabilistic inference visualization in Methods). We found that the result is close to that of the BM, and in most cases the network switches between the numbers every few samples (Fig. 7b), indicating good mixing between the states.

Next, we assess the inference capability of the network in a pattern completion scenario. By applying additional bias to a few neurons that are only active for two out of the three patterns, we provide informative, but limited input to the network. We chose five pixels that are black for "0" and "3", but not "4" (Fig. 7c), and apply an additional positive bias to the corresponding neurons. The biases of other neurons are unchanged, i.e. no information is given. Such an input is ambiguous w.r.t. "0" and "3", but incompatible with "4". As a result, the PSN network only generates complete images of "0" and "3", and randomly switches between them.

Learning from data

Sampling from arbitrary distributions implies the ability to sample from distributions dictated by real-world data. Here, we learn a generative model of handwritten digits based on the MNIST dataset [46]. Following [19], we round the brightness of each pixel to minimum or maximum.

To work with this dataset, we use a hierarchical sampling network: an RBM with three layers: visible, hidden and label (Fig. 8a). The brightness of pixels is mapped to
Figure 7: Bayesian inference with a network of PSNs. (a) Two-dimensional projection of network states after sampling from the prior trained to store images of the digits "0", "3" and "4". Colored dots show projections of the images. More samples are shown with darker dots; those inside the half-circles are considered to be close to a corresponding digit. The red line shows a trajectory of 100 samples. (b) Histogram of the time, in samples, spent inside the half-circles in (a) (colors match). (c) Input for Bayesian inference. The five red markers show the pixels where positive bias is applied. Such an input is ambiguous w.r.t. "0" and "3", but incompatible with "4". (d,e) Same as (a,b) when provided the input shown in (c).

the activity of a neuron in the visible layer. The activity in the label layer shows which digit is represented in the visible layer at the current point in time. For example, if the visible neurons form an image of "0", only the label neuron corresponding to "0" will spike.

In this section, we consider three tasks: completion, guided dreaming and classification. For completion, visible neurons are clamped to the brightness of the corresponding pixels, except for a bottom-right quadrant, which is assumed "obscured" and remains free alongside other neurons. The task is for the free visible neurons to complete the obscured quadrant. For guided dreaming, one label neuron is clamped, and the visible neurons are expected to form an image of the corresponding digit. For classification, visible neurons are clamped to the brightness of the corresponding pixels, and the most active label neuron on average is taken as the answer ("winner-takes-all").

For these tasks, we use the BM parameters from [48] (Fig. 8b,c), as they were already optimized for LIFS networks and are therefore expected to be favorable for an implementation with PSNs. This network is composed of 1194 neurons: 784 in the visible, 400 in the hidden and 10 in the label layers, respectively. Its implementation with PSNs thus represents a scaling test for our general approach.

The simulation of PSNs during the tasks starts with a burn-in phase, where no input is provided. Then, the inputs are provided sequentially, with each subsequent input following immediately after the previous one.

For pattern completion with PSNs, 20 samples were drawn for each image. This is a difficult inference task, as the network should not simply produce an average image, but instead needs to adapt to the style of each individual input sample. Figure 8d shows the results for a few images from the MNIST testing dataset. We found that in most cases, the obscured parts were completed to a large degree of accuracy.

For guided dreaming, label neurons were clamped for 200τ. To enforce top-down control (from labels to pixels), we strengthened the weights between the hidden and label layers by a factor of 2, while marginally reducing the others by 10% (see Discussion). In Fig. 8e, we show how the PSN network can thereby be used for generating images from all learned classes.

While BMs are designed primarily as generative networks, especially when trained purely with contrastive Hebbian methods and without dedicated backpropagation-based fine-tuning, input classification can still be viewed as a form of Bayesian inference; thus, it is instructive to compare classification performance between the original BM and its PSN implementation.

Simulating the processing of 10,000 images of the MNIST testing dataset with a PSN network of this size is computationally intensive; we therefore drew samples until the saturation of the classification convergence curve (Fig. 8f). In this case, we drew 50 samples for each image, yielding an average classification accuracy of 87.0%. The RBM is much less demanding; we could thus draw 500 samples, obtaining 87.8% accuracy. Their confusion matrices are compared in Fig. 8g; we note their similarity, as well as the only marginal performance loss caused by the exchange of substrates.

Interestingly, during the completion task, the networks perform classification as well. However, the accuracy of the PSN network is reduced to 75.0%, compared to 85.6% for the RBM (Fig. 8f); we expect this to be mitigated by further fine-tuning or direct in-situ training of the PSN network (see Discussion). However, in both cases, we observe that the PSN network only needs a few samples to converge to a solution, which is faster than the RBM and can prove beneficial in time-constrained scenarios.

Figure 8: Network of spiking nanolasers applied to the MNIST dataset. (a) Scheme of the hierarchical sampling network. Each line represents a symmetric connection. (b),(c) Histograms of Boltzmann machine parameters. Strongly negative biases (−30) were omitted from the figures. (d) Time trace of spiking nanolaser-based MNIST completion with a bottom-right quarter patch occlusion. Restored pixels are orange. (e) Guided dreaming. Each image corresponds to a separate dream; the activity of neurons in the visible layer was averaged over the last 20τ of each dream. The image positions correspond to a projection in two dimensions using t-SNE [47]. (f) Convergence of classification with an RBM and a corresponding PSN network with complete images (solid lines) and with the bottom-right quarter patch occluded (dashed lines). (g) Confusion matrices for classification with complete images (left pair) and with the patch occluded (right pair).

Discussion

In this work, we have demonstrated the feasibility of spike-based sampling using networks of photonic spiking neurons. We have rigorously derived an analogy between the gain dynamics of a two-section semiconductor laser and the membrane dynamics of biological neurons, both above and below the spiking threshold. Using the resulting translation rules, we have mapped the learned parameters of Boltzmann machines to the corresponding quantities in nanolaser networks and have demonstrated accurate sampling across varied tasks of different scale and complexity.

While the analogy between PSNs, ideal LIFS neurons and ideal BM neurons represents a good approximation, we take note of two explicit differences. First, the refractoriness in PSNs is a result of a steep drop in the gain. This is more akin to the strong, but relative refractoriness of biological neurons, which is not as absolute as the
refractoriness assumed by the LIFS model. Second, the PSP shape in PSN networks is close to an alpha function. This represents a deviation from the interaction kernels in BMs, which are rectangular. Nevertheless, neither of these properties is significantly detrimental to the ultimate network sampling accuracy. This is in line with observations from [11] and [13], which also explicitly address the issues of relative refractoriness and PSP shape. We expect the accuracy to further improve when parameter translation is replaced by in-situ training of PSN hardware.

The use of photonic timescales fosters a large improvement of sampling speeds compared to biological or electronic timescales. Even for highly accelerated neuromorphic systems such as BrainScaleS-1 [19] and BrainScaleS-2 [10], convergence speeds for sampling from small Boltzmann distributions over 5 random variables amount to ca. 10 seconds. With PSNs, these convergence times drop by over 4 orders of magnitude to ca. 10² microseconds. This acceleration factor would directly translate to a corresponding decrease in times-to-solution for any neuromorphic applications of SBS. Beyond the examples of Bayesian inference discussed here, these include tasks as diverse as stochastic constrained optimization [49] or quantum tomography [20, 21].

In this work, we have considered photonic neuronal samplers ranging in size from a few PSNs up to more than a thousand. While the implementation of individual components has seen recent experimental validation, the large-scale implementation of PSN networks in integrated photonics faces several challenges, which we discuss below.

We have assumed photonic crystal nanolasers as PSNs. Their small footprint promises high-density integration, but power dissipation then becomes an important issue. It is therefore necessary to reduce the amount of power required for PSN operation. Electrically driven nanolasers with a pump threshold of 10 µA have been recently demonstrated [50]. Photonic crystal nanolasers similar to those used in this work can also be pumped electrically, requiring about 100 µA each [29]. We estimate that for the MNIST tasks, 1194 nanolasers would require 3×3 mm² of chip space and, excluding optical and electrical losses and controller power consumption, approximately 700 mW of power: 100 mW for pumping and 600 mW for the amplification of spikes. With a sampling rate of 0.1 GHz, such a network of PSNs is equivalent to an RBM running at 64 TFLOP/s and 95 TFLOP/J.

The photonic crystal nanolasers are complex optical structures that require a dedicated fabrication process. A recently demonstrated technique of micro transfer printing, where each cavity is transferred from a wafer onto the chip, has been shown to be scalable [51].

The implementation of dense programmable optical interconnects is one of the most challenging and actively pursued goals. In this work, the largest interconnect matrix considered was 784×400. As the PSN network is incoherent, the implementation of positive and negative weights would require balanced photodetectors, which doubles the number of required matrix outputs (see Fig. 3a). A naïve implementation with integrated programmable switch matrices [52] would be limited by chip space. The largest implementations currently demonstrated, such as a 240×240 port switch [53], would still be several times below the requirement. However, the number of necessary input ports can be significantly reduced by frequency multiplexing. Three-dimensional interconnects [54] are also promising for more extreme cases and multi-chip systems.

Table 1: Parameters of the PSN model.

Parameter   Value            Definition
γ           0.2 THz          photon damping rate
γr          2.79 × 10⁻⁶ γ    transition rate
γr,a        5.31 × 10⁻⁶ γ    same, for the SA
γt          1.28 × 10⁻³ γ    carrier damping rate
γt,a        1.01 × 10⁻³ γ    same, for the SA
χg          3                differential gain ratio
n0          1.02 × 10⁶       gain section dipole count
n0,a        8.20 × 10⁵       same, in the SA

Methods

Nanolaser with quantum wells

The rate equations (3) describe the interaction of light with quantum dots [36, 37]. These semiconductor nanostructures, akin to artificial atoms, localize carriers within a few nanometers, and are therefore well modelled with a two-level electronic system. However, this work builds on the experimental results reported in [30], where the gain material consists of quantum wells. There, unlike in quantum dots, the gain depends nonlinearly on the density of carriers in the conduction band (i.e. the population of excited dipoles in our model divided by the volume of the section). This nonlinear dependence is approximated by a piecewise linear function. The slope is larger at lower carrier density, with a ratio χg. To ensure that ne ≤ n0, we corrected the pumping term in Eq. (3) by replacing ne → 2ne/(χg + 1).

The pump rate at threshold is computed from Eq. (3) by assuming a steady state without noise, G = 0 and S ≈ 0, which leads to γp^thr = γt ne^thr/[n0 − 2ne^thr/(χg + 1)], where ne^thr = (γ + γr,a n0,a + γr n0)/2γr.

Nanolaser noise model

The stochastic equations (3) are composed of deterministic ("drift") and stochastic ("diffusion") parts that can be represented in matrix form:

du = µ(u, t) dt + σ(u, t) dW ,

where u = [S, ne, na]^T. In this work, the diffusion term follows the approach in [38, A.13.1.2] based on the McCumber noise model [55] and is comprised of five Langevin forces corresponding to the following groups of processes:

1. electron-photon interaction in the gain section,
9
2. same, in the SA,

3. electronic processes inside the gain section,

4. intrinsic optical loss,

5. electronic processes inside the SA.

The Langevin forces are stochastic processes represented by Wiener processes with zero average and cross-correlation 2D_ij; for i = j, the latter represents the autocorrelation or noise spectral density. For the groups of processes described above, we find

2D^e_SS = γr n0 S + γr ne ,
2D^a_SS = γr,a n0,a S + γr,a na ,
2D^o_ee = γp (n0 − 2ne/(1 + χg)) + (γt − γr) ne ,
2D^γ_SS = γ S ,
2D^o_aa = (γt,a − γr,a) na ,

where ne, na and S are the averaged (not stochastic) values. This way, the diffusion matrix becomes

σ(u, t) = [  √(2D^e_SS)    √(2D^a_SS)   0            √(2D^γ_SS)   0
            −√(2D^e_SS)    0            √(2D^o_ee)   0            0
             0            −√(2D^a_SS)   0            0            √(2D^o_aa) ] ,

with each column corresponding to one of the five Langevin forces listed above.

The equations are integrated using the SKSROCK solver from the DifferentialEquations.jl library [56] in the Julia programming language [57].

Simplified gain equation

Here, we derive Eq. (5). Consider a network of PSNs. The derivative of the gain of a PSN follows from Eq. (3):

Ġ = 2γr ṅe + 2γr,a ṅa .

Assume the PSN is not currently emitting a spike, i.e. its gain performs the random walk shown in Fig. 2c, in which case S ≈ 0. Then, substituting the derivatives of the electron populations from Eq. (3):

Ġ ≈ − 2γr γt ne + 2γr (γp + ∆γp(t))(n0 − ne) − 2γr,a γt,a na + 2γr Fe(t) + 2γr,a Fa(t)
  = − 2γr ne (γt + γp) + 2γr γp n0 − 2γr,a γt,a na + 2γr ∆γp(t)(n0 − ne) + 2γr Fe(t) + 2γr,a Fa(t) ,

where ∆γp(t) is given in Eq. (4). Then, replacing 2γr ne = G + γ − γr,a (2na − n0,a) + γr n0:

Ġ ≈ − (γt + γp)(G + γ − γr,a (2na − n0,a) + γr n0) + 2γr ∆γp(t)(n0 − ne) + 2γr γp n0 − 2γr,a γt,a na + 2γr Fe(t) + 2γr,a Fa(t) .

Rearranging the terms, we find

Ġ ≈ − (γt + γp) G + 2γr,a (γt + γp − γt,a) na
    − (γt + γp)(γ + γr,a n0,a + γr n0) + 2γr γp n0
    + 2γr (n0 − ne) ∆γp(t) + 2γr Fe(t) + 2γr,a Fa(t) .

The second term on the first line is negligible compared to the first, as during the random walk na ≪ ne. The terms on the second line can be considered a drift term:

Gp = 2τG γr γp n0 − (γ + γr,a n0,a + γr n0) ,

where τG = 1/(γt + γp). This way, the dynamical equation becomes

dG ≈ [−(G − Gp)/τG + 2γr (n0 − ne) ∆γp(t)] dt + σG dW .

Here, the interaction term can be nonlinear, as ne changes due to incoming spikes; we assume that such interaction is weak. Moreover, during the walk, the gain section is close to transparency, i.e. ne ≈ n0/2. Finally, we find

dG ≈ [−(G − Gp)/τG + 2γr n0 ∆γp(t)] dt + σG dW .

Discrete nanolaser model

In this work, we assumed a continuous model, i.e. S, ne and na are continuous variables and stochastic processes are approximated by Langevin forces. However, the particles are discrete, and so are the processes. In a typical laser, the number of photons and electron-hole pairs is large, and such a model is a good approximation. However, for nanolasers this can be disputed, given their small volume and the operation regime close to the threshold. Therefore, we consider a discrete model based on [37, 58].

We consider all processes separately – 10 in total – as stochastic with rates: γr/r,a S ne/a for stimulated emission in the gain section / SA, γr/r,a S (n0/0,a − ne/a) for photon absorption in the gain section / SA, γS for optical loss, γr/r,a ne/a for spontaneous emission in the gain section / SA, γt/t,a ne/a for nonradiative recombination and out-of-mode spontaneous emission in the gain section / SA, and γp (n0 − 2ne/(χg + 1)) for pumping. When an event happens, a single particle is added to or removed from the appropriate variables. For example, for stimulated emission in the SA, S → S + 1 and na → na − 1. Such a simulation is considerably more computationally demanding, but is rigorous and more accurate for this system.

The simulation was carried out using the SSAStepper solver from the DifferentialEquations.jl library [56]. Fig. 9 shows a comparison between the models. We find that they give very similar results in the operating regime of interest.

Probabilistic inference training

The approach used here follows [13]. A fully visible BM was trained to store three digits – “0”, “3” and “4” – taken from the MNIST dataset and scaled down to 12×12. The intensity of each pixel, ranging from zero to unity, was rounded to 0.05 or 0.95. This way, we define the target statistics ⟨zk⟩tgt and ⟨zk zj⟩tgt for the BM. We start with arbitrary weights wkj and biases bj and collect a sufficient number of samples to estimate ⟨zk⟩ and ⟨zk zj⟩. Then, we refine the BM parameters using the following update rules:
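The data-minus-model structure of these updates, following [13], can be sketched in Python; the function name, the NumPy implementation, and the learning rate below are illustrative assumptions, not the code used in this work:

```python
import numpy as np

def update_bm(samples, mean_tgt, corr_tgt, W, b, lr=0.1):
    """One data-minus-model update of a fully visible Boltzmann machine.

    samples:  (n_samples, n_units) array of binary network states z
    mean_tgt: target statistics <z_k>_tgt
    corr_tgt: target statistics <z_k z_j>_tgt
    """
    mean = samples.mean(axis=0)                # model estimate of <z_k>
    corr = samples.T @ samples / len(samples)  # model estimate of <z_k z_j>
    W = W + lr * (corr_tgt - corr)             # dW ∝ <z_k z_j>_tgt − <z_k z_j>
    W = (W + W.T) / 2                          # keep weights symmetric
    np.fill_diagonal(W, 0.0)                   # no self-interaction
    b = b + lr * (mean_tgt - mean)             # db ∝ <z_k>_tgt − <z_k>
    return W, b
```

The update drives the sampled statistics toward the target statistics; iterating it with freshly collected samples corresponds to the refinement loop described above.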
[Figure: (a) histograms of the gain (Count vs. Gain (norm.)) below and above threshold; (b) gain autocorrelation vs. delay (ns) below and above threshold; (c), (d) panels showing Count and p(z = 1).]

Training Networks action POST-DIGITAL, project number 860360. M.A.P. gratefully acknowledges the continuing support of the Manfred Stärk Foundation for the NeuroTMA Lab.

References

[1] J. Göltz et al. “Fast and energy-efficient neuromorphic deep learning with first-spike times”. In: Nature Machine Intelligence 3.9 (2021), pp. 823–835.
[14] G. E. Hinton, S. Osindero, and Y.-W. Teh. “A fast learning algorithm for deep belief nets”. In: Neural Computation 18.7 (2006), pp. 1527–1554. doi: 10.1162/neco.2006.18.7.1527.
[15] G. E. Hinton and R. R. Salakhutdinov. “Reducing the dimensionality of data with neural networks”. In: Science 313.5786 (2006), pp. 504–507.
[16] A.-r. Mohamed, G. E. Dahl, and G. Hinton. “Acoustic modeling using deep belief networks”. In: IEEE Transactions on Audio, Speech, and Language Processing 20.1 (2011), pp. 14–22.
[17] A. M. Abdel-Zaher and A. M. Eldeib. “Breast cancer classification using deep belief networks”. In: Expert Systems with Applications 46 (2016), pp. 139–144.
[18] G. Carleo and M. Troyer. “Solving the quantum many-body problem with artificial neural networks”. In: Science 355.6325 (2017), pp. 602–606.
[19] A. F. Kungl et al. “Accelerated physical emulation of Bayesian inference in spiking neural networks”. In: Frontiers in Neuroscience 13 (2019), p. 1201. doi: 10.3389/fnins.2019.01201.
[20] S. Czischek et al. “Spiking neuromorphic chip learns entangled quantum states”. In: SciPost Physics 12.1 (2022), p. 039.
[21] R. Klassert et al. “Variational learning of quantum ground states on spiking neuromorphic hardware”. In: iScience 25.8 (2022).
[22] R. Ho, K. W. Mai, and M. A. Horowitz. “The future of wires”. In: Proceedings of the IEEE 89.4 (2001), pp. 490–504. doi: 10.1109/5.920580.
[23] H. Wünsche et al. “Excitability of a semiconductor laser by a two-mode homoclinic bifurcation”. In: Physical Review Letters 88.2 (2001), p. 023901. doi: 10.1103/PhysRevLett.88.023901.
[24] B. Romeira et al. “Excitability and optical pulse generation in semiconductor lasers driven by resonant tunneling diode photo-detectors”. In: Optics Express 21.18 (Sept. 2013), pp. 20931–20940. doi: 10.1364/OE.21.020931.
[25] P. R. Prucnal et al. “Recent progress in semiconductor excitable lasers for photonic spike processing”. In: Advances in Optics and Photonics 8.2 (2016), pp. 228–299. doi: 10.1364/AOP.8.000228.
[26] J. Robertson et al. “Toward Neuromorphic Photonic Networks of Ultrafast Spiking Laser Neurons”. In: IEEE Journal of Selected Topics in Quantum Electronics 26.1 (2020), pp. 1–15. doi: 10.1109/JSTQE.2019.2931215.
[27] S. Barbay, R. Kuszelewicz, and A. M. Yacomotti. “Excitability in a semiconductor laser with saturable absorber”. In: Optics Letters 36.23 (2011), pp. 4476–4478. doi: 10.1364/OL.36.004476.
[28] M. A. Nahmias et al. “A leaky integrate-and-fire laser neuron for ultrafast cognitive computing”. In: IEEE Journal of Selected Topics in Quantum Electronics 19.5 (2013), pp. 1–12. doi: 10.1109/JSTQE.2013.2257700.
[29] G. Crosnier et al. “Hybrid indium phosphide-on-silicon nanolaser diode”. In: Nature Photonics 11.5 (2017), pp. 297–300. doi: 10.1038/nphoton.2017.56.
[30] M. Delmulle et al. “Excitability in a PhC nanolaser with an integrated saturable absorber”. In: European Quantum Electronics Conference. Optica Publishing Group. 2023, jsiii_6_2.
[31] A. L. Hodgkin and A. F. Huxley. “A quantitative description of membrane current and its application to conduction and excitation in nerve”. In: The Journal of Physiology 117.4 (1952), p. 500.
[32] M. Yamada. “A theoretical analysis of self-sustained pulsation phenomena in narrow-stripe semiconductor lasers”. In: IEEE Journal of Quantum Electronics 29.5 (1993), pp. 1330–1336. doi: 10.1109/3.236146.
[33] G. Björk, A. Karlsson, and Y. Yamamoto. “On the linewidth of microcavity lasers”. In: Applied Physics Letters 60.3 (1992), pp. 304–306. doi: 10.1063/1.106693.
[34] S. Wieczorek et al. “The dynamical complexity of optically injected semiconductor lasers”. In: Physics Reports 416.1-2 (2005), pp. 1–128. doi: 10.1016/j.physrep.2005.06.003.
[35] G. P. Agrawal. “Population pulsations and nondegenerate four-wave mixing in semiconductor lasers and amplifiers”. In: JOSA B 5.1 (1988), pp. 147–159. doi: 10.1364/JOSAB.5.000147.
[36] A. Moelbjerg et al. “Dynamical Properties of Nanolasers Based on Few Discrete Emitters”. In: IEEE Journal of Quantum Electronics 49.11 (2013), pp. 945–954. doi: 10.1109/JQE.2013.2282464.
[37] J. Mørk and G. Lippi. “Rate equation description of quantum noise in nanolasers with few emitters”. In: Applied Physics Letters 112.14 (2018), p. 141103. doi: 10.1063/1.5022958.
[38] L. A. Coldren, S. W. Corzine, and M. L. Masanovic. Diode Lasers and Photonic Integrated Circuits. Wiley, 2012. doi: 10.1002/9781118148167.
[39] G. L. Lippi, J. Mørk, and G. P. Puccioni. “Numerical solutions to the Laser Rate Equations with noise: technical issues, implementation and pitfalls”. In: Nanophotonics VII. Vol. 10672. SPIE. 2018, pp. 82–95. doi: 10.1117/12.2305948.
[40] E. C. André, J. Mørk, and M. Wubs. “Efficient stochastic simulation of rate equations and photon statistics of nanolasers”. In: Optics Express 28.22 (2020), pp. 32632–32646. doi: 10.1364/OE.405979.
[41] D. Elvira et al. “Higher-order photon correlations in pulsed photonic crystal nanolasers”. In: Physical Review A 84.6 (2011), p. 061802. doi: 10.1103/PhysRevA.84.061802.
[42] S. Ohno et al. “Si microring resonator crossbar array for on-chip inference and training of the optical neural network”. In: ACS Photonics 9.8 (2022), pp. 2614–2622. doi: 10.1021/acsphotonics.1c01777.
[43] M. A. Nahmias et al. “A laser spiking neuron in a photonic integrated circuit”. In: arXiv preprint arXiv:2012.08516 (2020). doi: 10.48550/arXiv.2012.08516.
[44] N. Le Roux and Y. Bengio. “Representational power of restricted Boltzmann machines and deep belief networks”. In: Neural Computation 20.6 (2008), pp. 1631–1649. doi: 10.1162/neco.2008.04-07-510.
[45] G. E. Hinton. “Training products of experts by minimizing contrastive divergence”. In: Neural Computation 14.8 (2002), pp. 1771–1800.
[46] Y. LeCun et al. “Gradient-based learning applied to document recognition”. In: Proceedings of the IEEE 86.11 (1998), pp. 2278–2324. doi: 10.1109/5.726791.
[47] L. Van der Maaten and G. Hinton. “Visualizing data using t-SNE”. In: Journal of Machine Learning Research 9.11 (2008).
[48] A. Korcsak-Gorzo et al. “Cortical oscillations support sampling-based computations in spiking neural networks”. In: PLoS Computational Biology 18.3 (2022), e1009753. doi: 10.1371/journal.pcbi.1009753.
[49] M. Davies et al. “Advancing neuromorphic computing with Loihi: A survey of results and outlook”. In: Proceedings of the IEEE 109.5 (2021), pp. 911–934.
[50] E. Dimopoulos et al. “Electrically-Driven Photonic Crystal Lasers with Ultra-low Threshold”. In: Laser & Photonics Reviews 16.11 (2022), p. 2200109. doi: 10.1002/lpor.202200109.
[51] A. S. Greenspon et al. “Scalable construction of hybrid quantum photonic cavities”. In: arXiv preprint arXiv:2410.03851 (2024).
[52] H. Zhou et al. “Photonic matrix multiplication lights up photonic accelerator and beyond”. In: Light: Science & Applications 11.1 (2022), p. 30.
[53] T. J. Seok et al. “Wafer-scale silicon photonic switches beyond die size limit”. In: Optica 6.4 (2019), pp. 490–494.
[54] J. Moughames et al. “Three-dimensional waveguide interconnects for scalable integration of photonic neural networks”. In: Optica 7.6 (2020), pp. 640–646.
[55] D. E. McCumber. “Intensity Fluctuations in the Output of cw Laser Oscillators. I”. In: Physical Review 141.1 (Jan. 1966), pp. 306–322. doi: 10.1103/PhysRev.141.306.
[56] C. Rackauckas and Q. Nie. “DifferentialEquations.jl – a performant and feature-rich ecosystem for solving differential equations in Julia”. In: Journal of Open Research Software 5.1 (2017), p. 15. doi: 10.5334/jors.151.
[57] J. Bezanson et al. “Julia: A fresh approach to numerical computing”. In: SIAM Review 59.1 (2017), pp. 65–98. doi: 10.1137/141000671.
[58] G. Puccioni and G. Lippi. “Stochastic Simulator for modeling the transition to lasing”. In: Optics Express 23.3 (2015), pp. 2369–2374.