RRAMs in Neuromorphic Computing
AFFILIATIONS
School of Electrical and Computer Engineering, Purdue University, 465 Northwestern Ave., West Lafayette, Indiana 47906, USA
Note: This paper is part of the special collection on Brain Inspired Electronics.
a) Author to whom correspondence should be addressed: [email protected]
ABSTRACT
Historically, memory technologies have been evaluated based on their storage density, cost, and latencies. Beyond these metrics, the need to enable smarter and more intelligent computing platforms at a low area and energy cost has brought forth interesting avenues for exploiting non-volatile memory (NVM) technologies. In this paper, we focus on non-volatile memory technologies and their applications to bio-inspired neuromorphic computing, enabling spike-based machine intelligence. Spiking neural networks (SNNs) based on discrete neuronal “action potentials” are not only bio-fidel but also an attractive candidate for achieving energy-efficiency, as compared to state-of-the-art continuous-valued deep learning networks.
I. INTRODUCTION

The human brain remains a vast mystery and continues to baffle researchers from various fields alike. It has intrigued neuroscientists with its underlying neural circuits and the topology of brain networks that, as a whole, result in vastly diverse cognitive and decision-making functionalities. Equally, computer engineers have been fascinated by the energy-efficiency of the biological brain in comparison to state-of-the-art silicon computing solutions. For example, the Bluegene supercomputer1 consumed megawatts of power2 to simulate the activity of a cat's brain.3 This is in contrast to the roughly 20 W consumed by the brain while simultaneously rendering far more complex tasks, including cognition, control, movement, and decision making. The massive connectivity of the brain fueling its cognitive abilities, together with its unprecedented energy-efficiency, makes it by far the most remarkable known intelligent system. It is, therefore, not surprising that brain-inspired, or neuromorphic, computing paradigms have emerged, characterized by energy-efficient spike-based information passing, robustness, and adaptability. Interestingly, both the brain's cognitive ability and its energy-efficiency stem from basic computation and storage primitives called neurons and synapses, respectively.

Networks comprising artificial neurons and synapses have, therefore, been historically explored for solving various intelligent problems. Over the years, neural networks have evolved significantly and are usually categorized, based on the characteristic neural transfer function, as first, second, and third generation networks.4 As shown in Fig. 2, the first generation neurons, called perceptrons,4 had a step-function response to the neuronal inputs. Step perceptrons, however, were not scalable to deeper layers and were extended to Multi-Layer Perceptrons (MLPs) using non-linear functional units.5 These constitute the second generation of neurons, based on a continuous neuronal output with non-linear characteristic functions such as the sigmoid5 and the ReLU (Rectified Linear Unit).6 Deep Learning Networks (DLNs) as we know them today are based on such second generation neural networks. The present revolution in artificial intelligence is currently fueled by such DLNs using global learning algorithms based on the gradient descent rule.7 Deep learning has been used for a myriad of applications, including classification, recognition, prediction, cognition, and decision making, with unprecedented success.8 However, a major requirement to achieve the vision of intelligence everywhere is to enable energy-efficient computing much beyond existing deep learning solutions. Toward that end, networks of spiking neurons are expected to hold promise for building an energy-efficient alternative to traditional DLNs.
FIG. 1. Neuromorphic computing as a brain-inspired paradigm to achieve the cognitive ability and energy-efficiency of the biological brain. “Hardware” and “Algorithms” form the two key aspects of neuromorphic systems. As shown on the right-hand side, a generic neuromorphic chip consists of several “Neuro-Cores” interconnected through an address event representation (AER) based network-on-chip (NOC). Neuro-Cores consist of arrays of synapses with neurons at the periphery. Non-volatile technologies including PCM, RRAM, MRAM, and FG devices have been used to mimic neurons and synapses at various levels of bio-fidelity.
FIG. 2. Three generations of neural networks. The first generation (Gen-I) of networks used step transfer functions and was not scalable; the second generation (Gen-II) uses transfer functions such as the Rectified Linear Unit (ReLU), which have fueled today's deep learning networks. The third generation (Gen-III) uses spiking neurons resembling the neural activity of their biological counterparts. The three components of an SNN are (1) neurons, (2) synapses, and (3) learning. (1) Neurons: three broad classes of spiking neurons that researchers attempt to mimic using NVMs are Leaky-Integrate-Fire (LIF), Integrate-Fire (IF), and Stochastic neurons. (2) Synapses: the key attributes needed for a particular device to function as a synapse are its ability to map synaptic efficacy (wherein a synaptic weight modulates the strength of the neuronal signal) and its ability to perform multiplication and dot-product operations. (3) Learning: as shown in the figure, learning can be achieved either through supervised or unsupervised algorithms. From an NVM perspective, various NVM technologies are being used to mimic neuronal and synaptic functionalities with appropriate learning capabilities. At an architectural level, arrays of such NVMs are connected through the network-on-chip to enable seamless integration of a large neural network.
From the energy-efficiency perspective, SNNs have two advantages. First, the fact that neurons exchange information through discrete spikes is explicitly utilized in hardware systems to enable energy-efficient event-driven computations. By event-drivenness, it is implied that only those units in the hardware system that have received a spike are active, while all other units remain idle, reducing the energy expenditure. Second, such an event-driven scheme also enables Address Event Representation (AER).9 AER is an asynchronous communication scheme wherein the sender transmits its address on the system bus and the receiver regenerates the spikes based on the addresses it receives through the system bus. Thereby, instead of transmitting and receiving the actual data, event addresses are exchanged between the sender and the receiver, leading to energy-efficient transfer of information.
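As a concrete illustration of the address-exchange idea, the following minimal Python sketch (ours, not from the paper; the function names are hypothetical) encodes a sparse spike vector into AER events and regenerates it at the receiver.

```python
# Illustrative sketch of Address Event Representation (AER): instead of
# transmitting full spike vectors, only the addresses of spiking neurons
# are sent; the receiver regenerates the spikes from those addresses.

def aer_encode(spikes):
    """Return the addresses (indices) of neurons that spiked this time step."""
    return [addr for addr, s in enumerate(spikes) if s == 1]

def aer_decode(addresses, n_neurons):
    """Regenerate the binary spike vector from the received addresses."""
    spikes = [0] * n_neurons
    for addr in addresses:
        spikes[addr] = 1
    return spikes

spikes = [0, 1, 0, 0, 1, 0, 0, 0]            # sparse activity: 2 of 8 neurons fire
events = aer_encode(spikes)                   # -> [1, 4]: only 2 addresses sent
assert aer_decode(events, len(spikes)) == spikes
```

The sparser the activity, the fewer addresses cross the bus, which is precisely why event-driven communication saves energy.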
In addition to the emulation of neuro-synaptic dynamics and the use of event-driven hardware, two notable developments, namely, (1) the emergence of various non-volatile technologies and (2) the focus on learning algorithms for networks of spiking neurons, have accelerated the efforts in driving neural network hardware closer toward achieving both energy-efficiency and improved cognitive abilities. Non-volatile technologies have facilitated area- and energy-efficient implementations of neuromorphic systems. As we will see in Sec. III of the manuscript, these devices are of particular interest since they are governed by intrinsic physics that can be mapped directly to certain aspects of biological neurons and synapses. This implies that, instead of using multiple transistors to imitate neuronal and synaptic behavior, a single non-volatile device can often capture the essential dynamics through its intrinsic physics.

SNNs build upon two key aspects of biological neurons and synapses. Let us highlight a few representative behaviors for both neurons and synapses that form the basic set of neuro-synaptic dynamics usually replicated through non-volatile devices.
A. Neurons

Neural interactions are time-varying electro-chemical dynamics that give rise to the brain's diverse functionalities. These dynamical behaviors, in turn, are governed by the voltage-dependent opening and closing of various charge pumps that are selective to specific ions such as Na+ and K+.10,11 In general, a neuron maintains a resting potential across its cell membrane by maintaining a constant charge gradient. Incoming spikes to a neuron lead to an increase in its membrane potential in a leaky-integrate manner until the potential crosses a certain threshold, after which the neuron emits a spike and remains non-responsive for a certain period of time called the refractory period. A typical spike (or action potential) is shown in Fig. 3, highlighting the specific movements of charged ions through the cell membrane. Additionally, it has been known that the firing activity of neurons is stochastic in nature.12,13

FIG. 3. (a) The biological neuron and a typical spiking event. Various ions and the role they play in producing the spiking event are shown. (b) A simplified neural computing model.

Having known the generic qualitative nature of neural functionality, it is obvious that a resulting model, describing the intricacies of a biological neuron, would consist of complex dynamical equations. In fact, detailed mathematical models such as the Hodgkin–Huxley model14 and the spike response model have been developed, which closely match the behavior of biological neurons. However, implementing such models in hardware is prohibitively complex, and simplified abstractions—Leaky-Integrate-Fire (LIF), Integrate-Fire (IF), and stochastic neurons—are typically used instead. In its simplest form, a stochastic firing behavior can be modeled by a firing probability, which increases with the input stimulus. However, stochasticity can also be combined with LIF and IF neurons, such that once the neuron crosses the threshold, it emits a spike only according to a probabilistic function.

LIF neurons are the most widely used in the domain of SNNs. The leaky nature of LIF neurons renders a regularizing effect on their firing rates. This can particularly help frequency-based adaptation mechanisms that we will discuss in the next section.18 IF neurons are typically used in supervised learning algorithms. In these algorithms, the learning mechanism does not have temporal significance, and hence, temporal regularization is not required. Stochastic neurons, on the other hand, have a different computing principle. Due to the probabilistic nature of firing, they can also act as a regularizer and lead to better generalization behavior in neural networks. All the aforementioned neurons can leverage the inherent device physics of NVM devices for efficient hardware implementation.
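To make the LIF abstraction concrete, here is a minimal discrete-time sketch in Python; the leak factor, threshold, and refractory period are illustrative values we chose for demonstration, not parameters of any device in this review.

```python
# Minimal discrete-time leaky-integrate-and-fire (LIF) neuron sketch.

def lif_neuron(input_current, leak=0.9, v_th=1.0, t_ref=2):
    v, refractory, spikes = 0.0, 0, []
    for i_t in input_current:
        if refractory > 0:                 # neuron is unresponsive after a spike
            refractory -= 1
            spikes.append(0)
            continue
        v = leak * v + i_t                 # leaky integration of incoming current
        if v >= v_th:                      # threshold crossing -> emit a spike
            spikes.append(1)
            v = 0.0                        # reset the membrane potential
            refractory = t_ref             # enter the refractory period
        else:
            spikes.append(0)
    return spikes

print(lif_neuron([0.3] * 20))              # constant input -> periodic firing
```

Setting `leak=1.0` turns the same loop into an IF neuron, and replacing the deterministic threshold test with a probabilistic one yields the stochastic variants discussed above.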
B. Synapses

Information transmission in biological systems is governed by the exchange of electrical pulses between adjacent neurons through connecting bridges, commonly known as synapses. Synaptic efficacy, representing the strength of a connection through an internal variable, is the basic criterion for any device to work as an artificial synapse. Neuro-chemical changes can induce plasticity in synapses by permanently manipulating the release of neurotransmitters and controlling the responsiveness of the cells to them. Such plasticity is believed to be the fundamental basis of learning and memory in the biological brain. From the neuromorphic perspective, synaptic learning strategies can be broadly classified into two major classes: (1) unsupervised learning and (2) supervised learning.

1. Unsupervised learning

Unsupervised learning is a class of learning algorithms associated with the self-organization of weights without access to labeled data. In the context of hardware implementations, unsupervised learning relates to biologically inspired localized learning rules, where the weight updates in a synapse depend solely on the activities of the neurons on its either end. Unsupervised learning in spike-based systems can be broadly classified into (i) Spike Timing Dependent Plasticity (STDP) and (ii) frequency dependent plasticity.

Spike timing dependent plasticity (STDP), shown in Fig. 4, is a learning rule that strengthens or weakens the synaptic weight based on the relative timing between the activities of the connected neurons. This kind of learning was first experimentally observed in rat hippocampal glutamatergic synapses.19 It involves both long-term potentiation (LTP),20 which signifies an increase in the synaptic weight, and long-term depression (LTD),21 which signifies a reduction in the synaptic weight. LTP is realized through STDP when the post-synaptic neuron fires after the pre-synaptic activity, whereas LTD results from the reverse temporal order.
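The classical exponential STDP window described above (and demonstrated by Bi and Poo19) can be summarized in a few lines of Python; the amplitudes and the time constant below are illustrative constants, not experimental values.

```python
import math

# Sketch of the classical exponential STDP rule: the weight change depends
# on the timing difference dt = t_post - t_pre between the two neurons.

A_PLUS, A_MINUS, TAU = 0.01, 0.012, 20.0    # illustrative constants (ms)

def stdp_dw(t_pre, t_post):
    dt = t_post - t_pre
    if dt > 0:                               # pre before post -> potentiation (LTP)
        return A_PLUS * math.exp(-dt / TAU)
    else:                                    # post before pre -> depression (LTD)
        return -A_MINUS * math.exp(dt / TAU)

for dt in (2.0, 10.0, -2.0, -10.0):
    print(dt, round(stdp_dw(0.0, dt), 5))    # weight change shrinks with |dt|
```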
2. Supervised learning
Although unsupervised learning is believed to form the dominant part of learning in biological synapses, the scope of its applicability is still in its nascent stages in comparison to conventional deep learning. An alternative ex situ learning methodology to enable spike-based processing in deep SNNs is to restrict the training to the analog domain, i.e., using the greedy gradient descent algorithm as in conventional DLNs, and to convert such an analog-valued neural network to the spiking domain for inferencing. Various conversion algorithms24–26 have been proposed to perform nearly lossless transformation from the DLN to the SNN. These algorithms address several concerns pertaining to the conversion process, primarily emerging due to differences in neuron functionalities in the two domains. Such conversion approaches have been demonstrated to scale to state-of-the-art neural network architectures such as ResNet and VGG performing classification tasks on complex image datasets such as ImageNet.27 More recently, there has been a considerable effort in realizing gradient-based learning in the spiking domain itself28 to eliminate conversion losses.

FIG. 4. Different kinds of learning strategies can be broadly classified into (i) spike timing dependent plasticity (STDP), (ii) frequency dependent plasticity, and (iii) gradient-based learning. STDP induces both potentiation and depression of synaptic weights in a non-volatile fashion based on the difference in spike timing of pre-neurons and post-neurons, Δt. Classical STDP assumes an exponential relationship with Δt, as demonstrated by Bi and Poo.19 Other variants of STDP have also been observed in mammalian brains. Frequency dependent plasticity manifests itself in the form of short-term plasticity (STP) and long-term potentiation (LTP). The change in the synaptic weight, in this case, depends on how frequently the synapse receives stimulus. STP and LTP form the basis of short-term and long-term memory in biological systems. Finally, gradient-based learning is a supervised learning scheme where the change in the synaptic weight depends on gradients calculated from the error between the predicted and the ideal output.

III. NON-VOLATILE TECHNOLOGIES FOR NEUROMORPHIC HARDWARE

As elaborated in Sec. II, SNNs not only are biologically inspired neural networks but also potentially offer energy-efficient hardware solutions due to their inherent sparsity and asynchronous signal processing. Advantageously, non-volatile technologies provide two additional benefits with respect to neuromorphic computing. First, the inherent physics of such devices can be exploited to capture the functionalities of biological neurons and synapses. Second, these devices can be connected in a crossbar fashion allowing analog-mixed-signal in-memory computations, resulting in highly energy-efficient hardware implementations.

In this section, we first delve into the possibilities and challenges of such non-volatile devices, based on various technologies, used to emulate the characteristics of synapses and neurons. Subsequently, we describe how crossbar structures of such non-volatile devices can be used for in-memory computing, along with the associated challenges.

A. Phase change devices

Phase change materials (PCMs) such as chalcogenides are the front-runners among emerging non-volatile devices—with speculation about possible commercial offerings—for high-density, large-scale storage solutions.31 These materials can encode multiple intermediate states, rendering them capable of storing multiple bits in a single cell. More recently, PCM devices have also emerged as a promising candidate for neuromorphic computing due to their multi-level storage capabilities. In this section, we discuss various neuromorphic applications of PCM devices.

1. PCM as neurons

PCM devices show reversible switching between amorphous and crystalline states, which have highly contrasting electrical and optical properties. In fact, this switching dynamics can directly lead to integrate-and-fire behaviors in PCM-based neurons. The device structure of such a neuron comprises a phase change material sandwiched between two electrodes, as shown in Fig. 5(a).
FIG. 5. (a) Device structure of a PCM-based IF neuron.29 The thickness of the amorphous region (shown in red) represents the membrane potential of the neuron. The inte-
grating and firing behaviors for different incident pulse amplitudes and frequencies are shown (bottom). (b) Device structure of a photonic IF neuron based on PCM (GST).30
The input pulses coming through the INPUT port get coupled to the ring waveguide and eventually to the GST element, changing the amorphous thickness. The output at the
“THROUGH” port represents the membrane potential, which depends on the state of the GST element.
The mushroom structure shows the shape of the switching volume just above the region known as the heater. The heater is usually made of resistive elements such as W, and high current densities at the contact interface between the phase change material and the heater cause locally confined Joule heating. When the PCM in the neuron is in its initial amorphous state, a voltage pulse can be applied whose amplitude is low enough so as not to melt the device but high enough to induce crystal growth. The resulting amorphous thickness, u_a, on application of such a pulse is given as29

\frac{du_a}{dt} = -v_g\big(R_{th}(u_a)\,P_p + T_{amb}\big), \qquad u_a(0) = u_0, \qquad (3)

where v_g is the crystal growth velocity, dependent on the temperature determined by its argument R_{th}(u_a)P_p + T_{amb}. Here, R_{th} is the thermal resistance and T_{amb} is the interface temperature between the amorphous and crystalline regions. The variable u_a in Eq. (3) can be interpreted as the neuron's membrane potential, with P_p being the input variable controlling the dynamics. On successive application of crystallization pulses, the amorphous thickness u_a decreases, leading to higher conductance and a temporal integration of the membrane potential. Beyond a certain threshold conductance level, the neuron fires, or in other words, the PCM changes to a crystalline state. A reset mechanism puts the neuron back in its original amorphous state. The aforementioned integrate-and-fire characteristics in PCM neurons are accompanied by inherent stochasticity, which arises from the different amorphous states created by repeated resets of the neuron.
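The integrate-and-fire dynamics of Eq. (3) can be visualized with a short numerical sketch. The Python toy model below (ours, not from Ref. 29) uses a hypothetical growth-velocity function and a constant thermal resistance purely for illustration; no constant here is a calibrated device parameter.

```python
# Toy Euler integration of the PCM neuron of Eq. (3): crystallization pulses
# shrink the amorphous thickness u_a until a threshold is crossed and the
# neuron "fires"; a RESET then restores the amorphous state.

def v_g(temperature):
    # Hypothetical growth velocity (nm per pulse) above a nominal 300 K ambient.
    return 1e-3 * max(temperature - 300.0, 0.0)

def pcm_neuron(pulse_powers, u0=10.0, u_fire=5.0, r_th=10.0, t_amb=300.0, dt=1.0):
    u_a, fired = u0, []
    for p in pulse_powers:
        temp = r_th * p + t_amb               # effective interface temperature
        u_a -= v_g(temp) * dt                 # Eq. (3): du_a/dt = -v_g(...)
        if u_a <= u_fire:                     # thin amorphous dome -> high G -> fire
            fired.append(True)
            u_a = u0                          # RESET back to the amorphous state
        else:
            fired.append(False)
    return fired

print(pcm_neuron([50.0] * 12))                # integrates over pulses, then fires
```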
2. PCM as synapses

The SET operation is performed when an exponential current above the threshold voltage leads to heating of the material above its crystallization temperature and switches it to the crystalline state, as depicted by the I–V characteristics in Fig. 6(a). The crystallization (or “SET”) pulses are much longer than the amorphization (or RESET) pulses, as shown in Fig. 6(b). Multiple states are achieved by progressively crystallizing the material, thus reducing the amorphous thickness.

These multi-level PCM synapses can be used to perform unsupervised on-chip learning using the STDP rule.33 LTP and LTD through STDP involve a gradual increase and decrease in the conductance of PCM devices, respectively. However, such a gradual increase or decrease in conductance needs precise control, which is difficult to achieve using identical current pulses. As a result, by configuring a series of programming pulses of increasing or decreasing amplitude [Fig. 7(a)], both LTP and LTD have been demonstrated using PCM devices.34–36 In this particular scheme, the pre-spikes consist of a number of pulses of gradually decreasing or increasing amplitude, whereas the post-spike consists of a single negative pulse. The difference between the magnitudes of the pre-spike and the post-spike due to the overlap of the pulses varies with the time difference, resulting in a change in the conductance of the synapse that follows the STDP learning rule. The scheme for potentiation is explained in Fig. 7(a). A simplified STDP learning rule with a constant weight update can also be implemented using a single programming pulse by shaping the pulses appropriately,33 as shown in Fig. 7(b). However, such pulse shaping requires additional circuitry. These schemes rely on a single PCM device representing a synapse. Alternatively, a “2-PCM” synapse uses two devices per synapse, one for potentiation and one for depression [cf. Fig. 22(a)].
FIG. 7. (a) STDP learning in PCM synapses34 by a series of pulses of increasing (decreasing) amplitude demonstrating LTP behavior (left), similar to neuroscientific experiments.19 (b) A simplified STDP rule implemented with a single, appropriately shaped programming pulse.33
A gradual change in the optical response of PCM elements, by modulating the refractive index, can be achieved through varying the number of programming pulses. This has been exploited to experimentally demonstrate unsupervised STDP learning in photonic synapses.39 To scale beyond single devices, the rectangular waveguides used in this work can be replaced with microring resonators to perform unsupervised learning in an atemporal fashion.40

3. PCM crossbars

We have thus far talked about isolated PCM devices mimicking neuronal and synaptic behaviors. Interestingly, these devices can be connected in an integrated arrangement to perform in-memory computations involving a series of multiply-and-accumulate (MAC) operations. Such operations can be broadly represented as a multiplication between an input vector and the synaptic weight matrix, which is key to many neural computations. Vector–matrix multiplication (VMM) operations require multiple cycles on a standard von-Neumann computer. Interestingly, arranging PCM devices in a crossbar fashion (or, in more general terms, arranging resistive memories in a crossbar fashion) can engender a new, massively parallel paradigm of computing. The VMM operation, which is otherwise fairly cumbersome, can be performed organically through the application of Kirchoff's laws as follows. This can be understood through Fig. 8, where each PCM device encodes the synaptic strength in the form of its conductance. The current through each device is proportional to the voltage applied and the conductance of the device. Currents from all the devices in a column get added in accordance with Kirchoff's current law to produce a column current, which is the result of the dot-product of the voltages and conductances. Such a dot-product operation can be mathematically represented as

I_j = \sum_i V_i \, G_{ij}, \qquad (4)

where V_i represents the voltage on the i-th row and G_{ij} represents the conductance of the element at the intersection of the i-th row and the j-th column. This ability of parallel computing within the memory array, using single-element memory cells capable of packing multiple bits, paves the way for faster, energy-efficient, and high-storage neuromorphic systems.

In addition to synaptic computations, PCM crossbars can also be used for on-chip learning, which involves dynamic writing into individual devices. However, parallel writing to two-terminal devices in a crossbar is not feasible, as the programming current might sneak into undesired cells, resulting in inaccurate conductance updates. To alleviate the concern of sneak current paths, two-terminal PCM devices are usually used in conjunction with a transistor or a selector. Such memory cell structures are termed “1T-1R” or “1S-1R” (shown in Fig. 8) and are extensively used in NVM crossbar arrays. Such 1T-1R crossbar arrays can be seamlessly used for on-line learning schemes such as STDP. To that effect, PCM crossbars were among the first to experimentally demonstrate on-chip STDP-based learning,41,42 and simple pattern recognition tasks were conducted using the arrays.
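The ideal crossbar dot-product of Eq. (4) amounts to a single matrix–vector product. The short NumPy sketch below (with arbitrary illustrative conductance and voltage values) makes the equivalence explicit.

```python
import numpy as np

# Ideal crossbar dot-product of Eq. (4): column currents are the product of
# the row-voltage vector and the conductance matrix, I_j = sum_i V_i * G_ij.

rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, size=(64, 32))   # synaptic conductances (S), 64x32 array
V = rng.uniform(0.0, 0.2, size=64)           # input voltages (V) on the rows

I = V @ G                                    # one-shot analog MAC: 32 column currents
assert np.allclose(I, [sum(V[i] * G[i, j] for i in range(64)) for j in range(32)])
```

In hardware, the entire product is evaluated in one read cycle by Kirchoff's current law, rather than by the explicit loop used here for verification.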
B. Metal-oxide RRAMs and CBRAMs

1. RRAMs and CBRAMs as neurons

Resistive RAM (RRAM) devices are often abstracted as memristors, whose current–voltage relationship and internal state variable w evolve as

I = g_M \frac{w}{L} V(t), \qquad \frac{dw}{dt} = f\big(w(t), V(t)\big). \qquad (5)

Interestingly, the RRAM device can be used in an integrator circuit as a resistor in parallel with an external capacitance, as shown in Fig. 9 (top), to emulate the LIF characteristics, where the conductance of the device can be used as an internal variable.59 When the memristor is in its OFF state, the current through the circuit is low, and hence, it does not output a spike. Once the memristor reaches its ON state, the current suddenly jumps, which can be converted to an analog spike. The voltage across the memristor, in that case, obeys the dynamics of an LIF neuron, given by Eq. (1) in Sec. II A. A similar neuron circuit has also been explored for CBRAM devices based on Cu/Ti/Al2O3 60 [Fig. 9 (bottom)].

FIG. 10. (a) Basic device structure of RRAM devices consisting of a metal-oxide layer sandwiched between two electrodes. (b) I–V characteristics showing varying SET and RESET points, leading to different resistance states.
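A minimal numerical sketch of the abstract memristor model of Eq. (5) is given below; the linear state dynamics and all constants are purely illustrative assumptions on our part, chosen only to show how the conductance-defining state w drifts under applied voltage.

```python
# Sketch of the memristor model of Eq. (5): the current depends on an internal
# state variable w (e.g., filament length), which itself evolves with voltage.

G_MAX, LENGTH, MU = 1e-4, 1.0, 10.0          # hypothetical device constants

def memristor_step(w, v, dt=1e-3):
    i = G_MAX * (w / LENGTH) * v             # I = g_M (w/L) V(t)
    dw = MU * v                              # dw/dt = f(w, V); linear toy choice
    w = min(max(w + dw * dt, 0.0), LENGTH)   # state stays within physical bounds
    return i, w

w = 0.1
for _ in range(5):                           # positive pulses gradually SET the device
    i, w = memristor_step(w, v=1.0)
    print(f"I = {i:.2e} A, w = {w:.4f}")
```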
Unlike PCMs, to emulate the differential equations of the LIF neuron, an R-C circuit configuration is used. If the leaky behavior is not required, the internal state of the neuron, or the membrane potential, can be directly encoded in the oxygen concentration in the device. By manipulating the migration of oxygen vacancies using post-synaptic pulses, IF neurons can be realized with oxide-based devices.63 To that effect, oxide-based devices have been used to design common neuronal models involving leaky behavior, such as the Hodgkin–Huxley model and the leaky IF model.64

2. Metal-oxide RRAMs and CBRAMs as synapses

Much like PCM devices, RRAM devices can also be programmed to multiple intermediate states between the two extreme resistance states. In metal-oxide RRAMs, the switching mechanisms can be categorized as (a) filamentary and (b) non-filamentary. Filamentary switching results from the formation and rupture of filamentary conductive paths due to thermal redox reactions between the metal electrodes and the oxide material. The “forming” or SET process occurs at a high electric field due to the displacement and drift of oxygen atoms from the lattice. These oxygen vacancies form localized conductive filaments, which form the basis of filamentary conduction in RRAM devices. The forming voltage can be reduced by thinning down the oxide layer65 and by controlling annealing temperatures during deposition.66 The RESET mechanism, on the other hand, is well debated, and ionic migration has been cited as the most probable phenomenon.67,68 A unified model of the switching process is, however, still a subject of ongoing research.
The extent of switching, and hence the programmed conductance, can be controlled by using a higher pulse amplitude. Stochastic synapses have the ability to encode information in the form of probability, thus achieving significant compression over their deterministic counterparts. Learning stochastically using binary synapses has been demonstrated to achieve pattern learning.76 Unsupervised learning using multi-state memristors can also be performed probabilistically to yield robust learning against corrupted input data.77

Oxides of some transition metals, such as Pr0.7Ca0.3MnO3 (PCMO), exhibit non-filamentary switching as well. This type of switching, on the other hand, results from several possible phenomena, such as charge-trapping or defect migration at the interface of the metal and the oxide, which end up modulating the electrostatic or Schottky barrier. Although the switching physics in non-filamentary RRAM devices is different from that in filamentary RRAMs, the fundamental behavior of using these RRAM devices as synapses is quite similar. Non-filamentary RRAMs can also be programmed using different voltage pulses to exhibit multi-level synaptic behavior. Moreover, varying pulse widths can instantiate partial SET/RESET characteristics, which have been used to implement STDP characteristics in RRAM synapses.78,79 Encoding the conductance change in the number of pulses, coupled with appropriate waveform engineering, can enable the various kinds of STDP behaviors, explained in Sec. II B, in isolated RRAM devices showing non-filamentary switching.80

In addition to long-term learning methods, RRAM devices with controllable volatility can also be used to mimic frequency dependent learning, thus enabling a transition from short-term to long-term memory.81 By controlling the frequency and amplitude of the incoming pulses, STP-LTP characteristics have been achieved in WO3-based RRAM synapses.82 In general, higher-amplitude pulses in quick succession are required to transition the device from decaying weights to a more stable, persistent state. Such metastable switching dynamics of RRAM devices have been used to perform spatiotemporal computation on correlated patterns.83

Thus far, we have discussed how metal-oxide RRAM devices can emulate synaptic behavior. Next, we will discuss CBRAM devices, which exhibit similar switching behavior upon replacing the oxide material with an electrolyte. The switching mechanism is analogous to filamentary RRAM, except that the filament is a metallic conductive path formed through electro-chemical reactions. This technology has garnered interest due to its fast and low-power switching. Most CBRAM devices are based on Ag electrodes, where the resistive switching behavior arises from the contrast in conductivity between Ag-rich and Ag-poor regions; an effective conductance model for such devices has been derived in Ref. 88.
Crossbar size                 512×512   108×54   128×128   128×16   512×512
ON/OFF ratio                  10        5        N/A       10       N/A
Area per operation (μm²)      22.12     24       0.05      31.15    N/A
Latency (ns)                  80        10       13.7      0.6      9.8
Energy-efficiency (TOPS/W)    28        1.37     141       11       121.38
Spintronic devices provide yet another avenue for realizing the stochastic-LIF neuron. MTJs are composed of two ferromagnetic (FM) nanomagnets sandwiching a spacer layer,105 as shown in Fig. 12(a). Nanomagnets encode information in the form of the direction of magnetization and can be engineered to stabilize in two opposite directions. The relative direction of the two FMs—parallel (P) vs anti-parallel (AP)—results in two distinct resistive states—LOW vs HIGH resistance. Switching the MTJ from the P to the AP state, or vice versa, can be achieved by passing a current through the MTJ, resulting in a transfer of torque from the incoming spins to the FMs. Interestingly, the dynamics of the spin under excitation from a current-induced torque can be looked upon as stochastic-LIF dynamics. Mathematically, the spin dynamics of an FM, shown in Fig. 12(b), can be expressed effectively using the stochastic Landau–Lifshitz–Gilbert–Slonczewski (s-LLGS) equation,

\frac{\partial \hat{m}}{\partial t} = -|\gamma|(\hat{m} \times H_{EFF}) + \alpha\left(\hat{m} \times \frac{\partial \hat{m}}{\partial t}\right) + \frac{1}{qN_s}(\hat{m} \times I_s \times \hat{m}),

which can be rewritten in explicit form as

\frac{1+\alpha^2}{|\gamma|}\frac{\partial \hat{m}}{\partial t} = -(\hat{m} \times H_{EFF}) - \alpha(\hat{m} \times \hat{m} \times H_{EFF}) + \frac{1}{qN_s}(\hat{m} \times I_s \times \hat{m}), \qquad (7)

where \hat{m} is the unit vector of the free-layer magnetization, \gamma is the gyromagnetic ratio of the electron, and \alpha is Gilbert's damping ratio.

FIG. 12. (a) MTJ-based neuron102 showing the device structure (top) and leaky-integrate characteristics (bottom); the magnetization of the free layer integrates under the influence of incoming current pulses. Reprinted from Sengupta et al., Sci. Rep. 6, 30039 (2016); Copyright 2016 Author(s), licensed under a Creative Commons Attribution (CC BY) license. (b) ME-oxide-based LIF neuron103 showing the device structure (top) and LIF characteristics (bottom). Reproduced with permission from Jaiswal et al., IEEE Trans. Electron Devices 64(4), 1818–1824 (2017); Copyright 2017 IEEE. (c) SHE-MTJ-based stochastic neuron102 showing the device structure (top) and the stochastic switching characteristics (bottom). Reprinted from Sengupta et al., Sci. Rep. 6, 30039 (2016); Copyright 2016 Author(s), licensed under a Creative Commons Attribution (CC BY) license. (d) DWM-based IF spiking neuron104 showing the device structure (top) and the integration and firing behavior (bottom) over time. For incident input spikes, the domain wall moves toward the MTJ at the end, thus decreasing the resistance of the device; when the domain wall reaches the end, the resistance is low enough for the neuron to fire. Reproduced with permission from Sengupta and Roy, Appl. Phys. Rev. 4(4), 041105 (2017); Copyright 2017 AIP Publishing.
Here, H_{EFF} is the effective magnetic field, including the shape anisotropy field, the external field, and the thermal field. This equation bears similarities with the leaky-integrate-and-fire behavior of a neuron. The last term represents the spin transfer torque (STT) phenomenon, which causes the magnetization to rotate by transferring the torque generated through the change in angular momentum of the incoming electrons. Interestingly, the first two terms can be related to the “leak” dynamics of an LIF neuron, while the last term relates to the integrating behavior of the neuron, as follows. When an input current pulse or “spike” is applied, the magnetization starts integrating, or precessing, toward the opposite stable magnetization state owing to the STT effect (last term). In the absence of such a spike, the magnetization leaks back toward the original magnetization state (Gilbert damping, second term). Furthermore, due to the nano-scale size of the magnet, the switching dynamics is a strong function of a stochastic thermal field, leading to the stochastic behavior. This thermal field can be modeled using Brown's model.106 In terms of Eq. (7), the thermal field can be incorporated into H_{EFF} as a magnetic field,

H_{thermal} = \zeta \sqrt{\frac{2\alpha kT}{|\gamma| M_s V}}, \qquad (8)

where \zeta is a zero-mean, unit-variance Gaussian random variable, V is the volume of the free layer, T is the temperature, and k is the Boltzmann constant. A typical stochastic-LIF behavior using an MTJ is shown in Fig. 12(a).102

While the two-terminal MTJ does represent the requisite neuronal dynamics, spin-orbit-torque (SOT) MTJs based on the spin-Hall effect (SHE) offer two additional benefits. First, the input charge current flows through an underlying heavy-metal layer, and a spin-polarized current is generated, which rotates the magnetization in the adjacent MTJ such that the switching probability increases as the magnitude of the input current is increased. This in turn implies that the incoming current passes through a much lower metal resistance, and sees a constant metal resistance throughout the switching process, as opposed to current-based switching in conventional two-terminal MTJs. As we will see later, the existence of a low input resistance for the neuron allows easy interfacing with synaptic crossbar arrays. Second, the decoupled read–write path in SOT-MTJs allows for independent optimization of the read (inferencing) and write (learning) paths. A typical SOT-MTJ and its sigmoid-like stochastic switching behavior are shown in Fig. 12(c). While the aforementioned behaviors depicted in Fig. 12(c) correspond to an SOT-MTJ with a high energy barrier (10–60 kT), telegraphic SOT-MTJs with energy barriers as low as 1 kT have also been explored as stochastic neurons.107

In addition to smaller magnets, wherein the entire magnet switches like a giant spin, longer magnets known as domain wall magnets (DWMs)108 have been used as IF neurons. DWMs consist of two oppositely directed magnetic domains separated by a domain wall [see Fig. 12(d)]. Electrons flowing through the DWM continuously exchange angular momentum with the local magnetic moments. The current-induced torque affects the misaligned neighboring moments around the domain wall region, thus displacing the domain wall along the direction of current flow. The instantaneous membrane potential is encoded in the position of the domain wall, which moves under the influence of the incoming spike current.
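The sigmoid-like switching probability of Fig. 12(c) can be illustrated with a short Monte-Carlo sketch. The probability model below is a phenomenological stand-in of our own (with hypothetical critical current and slope parameters), not a solution of the s-LLGS equation (7) with the thermal field of Eq. (8).

```python
import math
import random

# Monte-Carlo sketch of a stochastic MTJ neuron: thermal noise makes switching
# probabilistic, yielding a sigmoid-like probability vs input current.

I_C, BETA = 100e-6, 15.0                     # hypothetical critical current, slope

def switch_probability(i_in):
    return 1.0 / (1.0 + math.exp(-BETA * (i_in / I_C - 1.0)))

def mtj_fires(i_in):
    return random.random() < switch_probability(i_in)

for i_in in (50e-6, 90e-6, 100e-6, 110e-6, 150e-6):
    trials = sum(mtj_fires(i_in) for _ in range(10000))
    print(f"I = {i_in*1e6:5.0f} uA -> P(switch) ~ {trials/10000:.2f}")
```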
FIG. 13. STDP learning scheme in the DWM-based spin synapse112 using peripheral transistors. The exponential characteristics of STDP are realized by operating MSTDP in
the sub-threshold region and applying a linearly increasing voltage at its gate. MSTDP is activated when a pre-neuron spikes, and the programming current (shown in blue)
through the transistor is injected into the HM layer (grey) when a post-neuron spikes. Reproduced with permission from Sengupta et al., Phys. Rev. Appl. 6(6), 064003 (2016).
Copyright 2017 American Physical Society.
When the domain wall reaches the far end, the device is in the Anti-Parallel (AP) configuration, which defines the HIGH resistance state of the device. With respect to the position of the domain wall, x, the equivalent conductance of the device changes as

G_{eq} = \frac{x}{L} G_P + \frac{L-x}{L} G_{AP} + G_{DW}, \qquad (9)

where L is the length of the free layer, and G_P, G_{AP}, and G_{DW} are the conductances of the parallel region, the anti-parallel region, and the domain wall region, respectively.

In MTJ-based binary synapses, the switching probability can be controlled by varying the amplitude or duration of the programming pulse, as shown in Fig. 12(c). This benefit of controlled stochasticity leads to energy-efficient learning in binary synapses implemented using MTJs.115,116 An advantage of on-chip stochastic learning is that the operating currents are lower than the critical current for switching, which reduces the programming energy.
Spintronic synapses can further be arranged in a crossbar fashion to emulate large-scale neural networks, both in a fully connected form114 and as convolutional networks.121 In addition to inferencing frameworks based on spin synapses, STDP-based learning112 has also been explored at an array level, as shown in Fig. 14, to perform feature recognition and image classification tasks. As discussed earlier, MTJ-based binary synapses require stochasticity for effective learning. They can leverage the inherent stochasticity in the network, and a population of such synapses can perform on-line learning, which not only achieves energy-efficiency but also enables extremely compressed networks.116

These simulation-based designs and results show significant promise for spin-based neuromorphic systems. However, several technological challenges need to be overcome to realize large-scale systems with spin. As alluded to earlier, the ON/OFF ratio between the two extreme resistance states is governed by the TMR of the MTJ, which has been experimentally demonstrated to reach 600% (Ref. 122), leading to an ON/OFF ratio of 7. This is significantly lower than that of other competitive technologies and poses a limitation on the range of synaptic weight representation at an array level. Second, MTJs can only represent binary information. For multi-bit representation, it is necessary to use domain wall devices or multiple binary MTJs at the cost of area density. However, since synapses in neural networks usually encode information in an analog fashion, the lack of multi-state representation in MTJs can potentially limit the area-efficiency of non-volatile spin devices for neuromorphic applications. The lack of multi-bit precision can be alleviated with architectural design facets, retaining spin devices as a viable option to emulate synaptic behavior for large-scale neuromorphic systems.

D. Ferroelectric FETs

Similar to the phase change and ferromagnetic materials, another member of the functional material family is ferroelectric (FE) materials. In addition to being electrically insulating, ferroelectric materials exhibit non-zero spontaneous polarization (P), even in the absence of an applied electric field (E). By applying an external electric field (more than a threshold value, called the coercive field), the polarization direction can be reversed. Such electric field driven polarization switching behavior of FEs is highly non-linear (compared to dielectric materials) and exhibits non-volatile hysteretic characteristics. Due to the inherent non-volatile nature, FE-based capacitors have historically been investigated for non-volatile memory elements. However, in ferroelectric field effect transistors (FEFETs), an FE layer is integrated in the gate stack of a standard transistor, thus offering all the benefits of CMOS technology in addition to several unique features offered by the FE. The FE layer electrostatically couples to the underlying transistor. Due to such coupling, FEFETs offer non-volatile memory states by virtue of the polarization retention of the FE. Besides CMOS process compatibility, one of the most appealing features of FEFET-based memory is the ability of voltage-based READ/WRITE operation, unlike the current-based READ/WRITE schemes in other non-volatile memory technologies.
FIG. 14. A crossbar arrangement of spintronic synapses connected between pre-neurons A and B and post-neurons C and D, showing peripheral circuits for enabling STDP
learning.112 Reproduced with permission from Sengupta et al., Phys. Rev. Appl. 6(6), 064003 (2016). Copyright 2017 American Physical Society.
In this section, we will briefly discuss the recent progress in FEFET-based neuro-mimetic devices.

1. FEFETs as neurons

The dynamics in a ferroelectric FET device can be used to mimic the functionality of a biological neuron. In a scaled FEFET, if identical sub-threshold pulses (“sub-coercive” in the context of FE) are applied at the gate terminal [shown in Fig. 15(a) (leftmost)], the device remains in the OFF state (since the sub-threshold pulses are insufficient for polarization switching). However, after a certain number of pulses are received, the FEFET abruptly switches to the highly conductive state [Fig. 15(a) (rightmost)]. Such phenomena can be understood as the initial nucleation of nano-domains followed by an abrupt polarization reversal of the entire grain connecting the source and drain of the FEFET. Before the critical threshold is reached, the nucleated nano-domains are not capable of inducing a significant charge inversion in the channel, leading to the absence of a conduction path (OFF state). The accumulative P-switching presented in Ref. 125 appears to be invariant with respect to the time difference between consecutive excitation pulses, and therefore, the device acts as an integrator. Moreover, the firing dynamics of such FEFET-based neurons can be tuned by modulating the amplitude and duration of the voltage pulses.123,125 However, to implement the leaky behavior, a proposed option is to modulate the depolarization field or to insert a negative inhibit voltage in the intervals between consecutive excitation pulses. Apart from this externally emulated leaky process, an intrinsically leaky polarization behavior was theoretically predicted in Ref. 126. Such spontaneous polarization relaxation has been attributed to domain wall instability,126 and such a process has recently been experimentally demonstrated in an HfxZr1−xO2 (HZO) thin film.127 By harnessing such quasi-leaky behavior along with the accumulative and abrupt polarization switching in FEs, a quasi-leaky-integrate-fire (QLIF) FEFET-based neuron can offer intrinsic homeostatic plasticity. Network-level simulations utilizing the QLIF neuron showed a 2.3× reduction in the firing rate compared to the traditional LIF neuron while maintaining an accuracy of 84%–85% across varying network sizes.127 Such an energy-efficient spiking neuron can potentially enable ultra-low-power data processing in energy-constrained environments.

2. FEFETs as synapses

We have seen how the switching behavior of a FEFET can mimic the behavior of a biological neuron. The switching behavior also produces bi-stability in FEFETs, which makes them particularly suitable for synaptic operations. The bi-stable nature of the spontaneous polarization of ferroelectric materials causes the voltage-induced polarization switching characteristics to be intrinsically hysteretic. The device structure of a FEFET-based synapse is similar to that of the neuronal device, as shown in Fig. 15(b) (leftmost). The FE layer electrostatically couples with the underlying transistor and, due to such coupling, FEFETs offer non-volatile memory states by virtue of the polarization retention of the ferroelectric (FE) material. In a mono-domain FE (where the FE area is comparable to the domain size), two stable polarization states (−P and +P) can be achieved.
FIG. 15. (a) FEFET device structure showing an integrated ferroelectric layer in the gate stack of the transistor (leftmost). A series of pulses can be applied to emulate the inte-
grating behavior of neurons and the eventual firing through abrupt switching of the device.123 (b) A FEFET synaptic device (leftmost) showing programming pulsing schemes
generating the STDP learning curve based on the change in charge stored in the device.124
These two states produce two distinct conductances for the underlying transistor. Such states can also be referred to as “low VT” (corresponding to +P) and “high VT” (corresponding to −P) states.128 Even though the polarization at the lattice level (microscopic polarization) can take only two values (+P or −P), in a macroscopic scenario the multi-domain nature of FE films (with an area significantly larger than the domain size) allows multiple levels of polarization to be achieved. Furthermore, the polycrystalline nature of the FE film offers a distribution in the polarization switching voltages (coercive voltage) and times (nucleation time) across different grains. As a result, a voltage-pulse-dependent polarization tuning can be obtained such that the overall polarization of the FE film can be gradually switched. This corresponds to a gradual tuning of channel conductivity (or VT) in FEFETs and can be readily exploited to mimic multi-level synapses,124,129 in a manner similar to what has already been reported for PCMs and RRAMs. As noted above, FEFETs are highly CMOS compatible, which makes their application as neuro-mimetic devices quite appealing.

Recently, several FEFET-based analog synaptic devices have been experimentally demonstrated,124,130,131 where conductance potentiation and depression via gradual VT tuning were obtained by applying voltage pulses at the gate terminal. However, in the case of identical voltage pulses, the observed potentiation and depression characteristics are highly non-linear and asymmetric with respect to the number of pulses. To overcome such non-ideal effects, different non-identical pulsing schemes were proposed in Ref. 130, which utilize a gradual modulation of pulse magnitude or pulse time. Such non-identical pulsing schemes improve the linearity and symmetry of the weight updates.

3. FEFET crossbars

FEFETs utilize an electric field driven writing scheme, and such a feature is unique when compared with spin-, PCM-, and RRAM-based synaptic devices. Therefore, FEFET-based synaptic devices are potential candidates for low-power realization of neuro-mimetic hardware. These transistor-like devices can also be arranged in a crossbar fashion to perform dot-product operations. Simulation studies using populations of such neuronal and synaptic devices have been performed for image classification tasks.130–132 We discussed earlier that the multi-state conductance of FEFETs originates from the multi-domain behavior of the FE layer in the gate stack. However, such multi-domain features of the FE (domain size and patterns) are highly dependent on the physical properties of the FE (i.e., thickness, grain size, etc.).126 As a consequence, in a FEFET synaptic array, the multi-state behavior of FEFETs may suffer from the variability of the FE layer along with the variation induced by the underlying transistors. Therefore, large-scale implementation of synaptic arrays with identical FEFET characteristics will be challenging, which can potentially be overcome with high-quality fabrication of FE films and variation-aware designs. Despite the benefits offered by FEFETs, the technology is still at a nascent stage in the context of neuro-mimetic devices, and crossbar-level implementations will potentially be explored in the future.

E. Floating gate devices
FIG. 17. Floating-gate leaky-integrate-and-fire neuron133 showing (a) the integrating circuit, (b) and (c) the feedback amplifier circuits for the thresholding operation, and (d) the reset circuit.
At the architecture (or system) level, NVMs and crossbars provide interesting opportunities for energy- and area-efficiency. NVMs provide a radical departure from state-of-the-art von-Neumann machines due to the following two factors: (1) NVM-based crossbars are being looked upon by the research community as the holy grail for enabling in-memory, massively parallel dot-product operations, and (2) the high storage density offered by NVMs allows construction of spatial neuromorphic architectures, leading to higher levels of energy, area, and latency improvements.144–147 Spatial architectures differ from conventional processors in the sense that the latter rely heavily on various levels of memory hierarchy, and data have to be shuffled back and forth between various memory sub-systems over long distances (between on-chip and off-chip memory). As such, the energy and time spent in getting the data into the right level of the memory hierarchy, before it can be processed, lead to the memory-wall bottleneck. Since the storage density of NVMs is much larger [a single static random access memory (SRAM) cell storing one bit of data consumes ~150 F² of area, compared to an NVM that can take 4 F² of space while storing multiple bits], they lend themselves easily to distributed spatial architectures. This implies that an NVM-based neuromorphic chip can have a crossbar array that stores a subset of the network weights, and multiple such crossbars can be arranged in a tiled manner, wherein weights are almost readily available within each tile for processing.

Keeping in view the aforementioned discussion, a generic NVM-based distributed spatial architecture is shown in Fig. 19, enabling mapping of neural network applications entirely using on-chip NVM.

The analog computations in such crossbars, however, deviate from the ideal dot-product of Eq. (4) due to parasitic peripheral resistances. For instance, the effective input voltage seen by the i-th row degrades as

V_{i,ni} = V_i \, \frac{1/R_s}{1/R_s + \sum_j \frac{1}{R_{ij} + R_{sink}}}. \qquad (11)

Here, I_j is the current of the j-th column, V_i is the input voltage to the i-th row of the crossbar, R_{ij} = 1/G_{ij} is the resistance (conductance) of the synaptic element connecting the i-th row with the j-th column, V_{i,ni} is the degraded input voltage due to the effect of peripheral resistances, R_s is the effective source resistance, and R_{sink} is the effective sink resistance. These resistances in relation to a crossbar are shown in Fig. 21. This modeling gives us an intuition about the behavior of crossbars, which can help preserve the computation accuracy.

FIG. 20. A comparison of energy consumption for stochastic spin neurons for various energy-barrier heights.156 Reproduced with permission from Liyanagedera et al., Phys. Rev. Appl. 8(6), 064017 (2017). Copyright 2017 American Physical Society.
For example, lower synaptic resistances result in higher currents, which result in larger parasitic drops across the metal lines. On the other hand, higher operating resistances might lead to low sensing margins, necessitating expensive peripheral circuitry. The presence of sneak paths in synaptic crossbars can also adversely affect the programming process, thus harming the performance of on-chip learning systems.

Several additional device-level issues also limit applicability. These include the variability in RRAM states, which can detrimentally affect the verity of analog computations in synaptic elements. This is primarily due to the uncontrolled nature of the variability in filamentary RRAM or CBRAM devices.159 PCM devices, on the other hand, in spite of being less prone to variability, suffer from resistance drift due to structural relaxations after the melt-quench process.
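The input-degradation effect of Eq. (11) can be quantified with a short simulation. The sketch below (ours; all resistance values are illustrative assumptions) compares ideal and degraded column currents for a small crossbar.

```python
import numpy as np

# Sketch of the source-resistance degradation of Eq. (11): each input voltage
# V_i is attenuated by the divider formed by R_s and the parallel combination
# of the synaptic paths (R_ij + R_sink).

def degraded_inputs(V, R, R_s=100.0, R_sink=50.0):
    V_ni = np.empty_like(V)
    for i in range(len(V)):
        load = np.sum(1.0 / (R[i, :] + R_sink))          # sum_j 1/(R_ij + R_sink)
        V_ni[i] = V[i] * (1.0 / R_s) / (1.0 / R_s + load)
    return V_ni

rng = np.random.default_rng(1)
R = rng.uniform(1e4, 1e5, size=(64, 32))                 # synaptic resistances (ohm)
V = rng.uniform(0.0, 0.2, size=64)                       # input voltages (V)

V_ni = degraded_inputs(V, R)
I_ideal = V @ (1.0 / R)                                  # Eq. (4) with G = 1/R
I_real = V_ni @ (1.0 / R)
print("mean column-current error: %.2f%%" % (100 * np.mean(1 - I_real / I_ideal)))
```

Consistent with the discussion above, lowering the synaptic resistances in this toy model increases the loading term and, hence, the computation error.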
The use of Digital-to-Analog Converters (DACs) and Analog-to-Digital Converters (ADCs) is essential toward building large-scale neuromorphic systems. As shown in Fig. 21, DACs are used to convert bit-streamed data to voltages, whereas ADCs convert the analog voltage outputs from a sample-and-hold array back into digital bits. These converters are especially necessary as the sizes of neural network models are much larger than the size of a single crossbar. As a result, multiple crossbars are required to represent the entire neural network, which necessitates digital communication between the outputs of individual crossbars. As the crossbar size increases, the precision requirements for ADCs become higher, leading to enormous power consumption, which can potentially reduce the energy benefits that NVM crossbars inherently offer. However, the inherent robustness of neural networks toward computation errors may allow us to design approximate peripheral circuitry based on ADCs with lower precision requirements. Moreover, efficient mapping of crossbars and introducing programmability in peripheral precision requirements can potentially preserve the benefits offered by NVM technology. In light of challenges such as device variations, non-ideal resistances, sneak paths, and peripheral design, careful design space exploration is required to identify optimum operating resistances and crossbar sizes of synaptic elements, along with efficient device-circuit-algorithm co-design for exploring effective mitigation techniques.

C. Mitigating crossbar non-idealities

NVM provides a massively parallel mode of computation using analog crossbars, but the non-idealities discussed above introduce errors in both inference and learning. One option is to re-train the network on the hardware itself; re-training, however, does not recover the performance of an ideal neural network without any non-idealities. The presence of non-idealities in the forward path of a neural network may require a modified backpropagation algorithm to closely resemble the ideal neural network.162 For unsupervised learning algorithms such as STDP, the impact of non-idealities may be significantly lower due to the ease of enabling on-line learning, which can automatically account for the errors. In addition to static non-idealities in the crossbars, the non-linearity and asymmetry of the programming characteristics of NVM devices can also be detrimental to the performance of the network. Reliable mitigation of such programming errors can be performed by novel pulsing schemes.166,167 These pulsing schemes involve modulation of pulse widths based on the current conductance state, which helps restore linearity.

Beyond re-training, other static compensation techniques can also be used to recover some system-level inaccuracies. For example, the limited ON/OFF ratio and precision of NVM synaptic devices can result in computational errors, which can be taken care of by effective mapping of weight matrices to synaptic conductances.168 Static transformations of weight matrices have been explored to alleviate circuit-level non-idealities.169 This methodology performs a gradient search to identify weight matrices that, in the presence of non-idealities, resemble the ideal weight matrices. Most of the compensation techniques adopted to account for computation inaccuracies in NVM crossbars address very specific problems. A more complete and holistic analysis, modeling, and mitigation of crossbar non-idealities are necessary to completely understand and harness NVM crossbars.
FIG. 22. (a) Two separate NVM devices used for LTP and LTD, and the resulting output of the synapse is fed to the neuron. (b) Multiple NVM devices connected in parallel to
increase the current range of the synapse. (c) Through the use of an arbitrator, any one of the devices is selected for learning.
D. Multi-memristive synapses

As shown in Fig. 22(a), one reported approach uses two separate NVM devices for LTP and LTD. Realizing depression on a single device would have required decrementing the PCM device resistance, and, given the complex nature of the waveforms required to write into PCM devices, this would have led to additional area overhead. In yet another work, more than one memristor was connected in parallel [Fig. 22(b)]170 to increase the current range of the overall synaptic cell. For learning, an arbitration scheme was used to select one memristor and program it in accordance with the learning scheme, as shown in Fig. 22(c). With reference to these examples, we believe that such schemes, wherein device-level constraints are mitigated through the use of clever circuit techniques, can be a key enabler for NVMs in neuromorphic computing, without solely relying on better material stacks and manufacturing processes for improved device characteristics.
E. Beyond neuro-synaptic devices and STDP

As would be apparent by now, the state of the art in neuromorphic hardware using non-volatile devices falls into two broad categories of works: (1) those that mimic the LIF dynamics of a neuron using device characteristics and (2) those that are geared toward synaptic functionalities and the associated learning through STDP in shallow SNNs. The state of the art on the algorithmic side of neuromorphic computing, on the other hand, has taken a step beyond LIF dynamics and STDP learning. We have briefly discussed how supervised learning, such as gradient descent, can also be used for spike-based systems. Previously, supervised learning was performed in the artificial neural network (ANN) domain, and the trained networks were then converted to SNNs.27 …

F. NVM for digital in-memory computing

As discussed above, the non-idealities of analog in-memory computing still remain a major technical roadblock. In contrast, one could use digital in-memory computing to implement robust on-chip SNNs. Such implementations can use various digital techniques, for example, read-only-memory (ROM)-embedded RAM in NVM arrays174 or peripheral circuits that perform in-memory digital computations.175 Interestingly, these works do not require heavy re-engineering of the devices themselves. As such, they can readily benefit from the recent technological and manufacturing advancements driven by industry toward commercializing various non-volatile technologies as memory solutions.

Furthermore, in a large neural network, NVMs can serve as significance-driven in-memory compute accelerators. For example, layers of the network that are less susceptible to noise can be accelerated using analog in-memory computing, while layers that need more accurate computations can be mapped onto NVM arrays performing digital in-memory computing. Thus, fine-grained heterogeneous in-memory computing (both digital and analog) can be used in unison to achieve both lower energy consumption and higher application accuracy. It is also well known that NVMs that store data digitally are easier to program than analog storage, which requires multiple "read-verify" cycles, as sketched below. Thus, on-chip learning, which requires frequent weight updates, is more amenable to digital or heterogeneous (digital + analog) computing arrays than to analog storage of data. Additionally, bit errors induced by digital computing can be readily rectified using error-correcting codes. Therefore, resorting to digital processing for critical or error-susceptible computations could help widen the design space for the use of NVMs as SNN accelerators.
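The programming-cost asymmetry noted above can be illustrated in a few lines: analog storage needs a closed program-and-verify loop, whereas a binary cell needs a single saturating pulse, with residual bit errors left to error correction. The write-noise model, tolerance, and cycle counts below are illustrative assumptions, not measured device behavior.

```python
import numpy as np

rng = np.random.default_rng(1)

def write_pulse(g, target):
    """One imprecise analog write: moves g toward the target with a
    large random spread (an illustrative write-noise model)."""
    return g + (target - g) * rng.normal(0.5, 0.2)

def program_analog(target, tol=0.02, max_cycles=100):
    """Closed-loop program-and-verify: pulse, read back, repeat."""
    g, cycles = 0.0, 0
    while abs(g - target) > tol and cycles < max_cycles:
        g = write_pulse(g, target)  # one write pulse...
        cycles += 1                 # ...plus one verify read per iteration
    return g, cycles

def program_digital(bit):
    """Binary storage: a single saturating pulse suffices; residual
    bit errors are left to error-correcting codes (ECC)."""
    return float(bit), 1

g, n = program_analog(0.63)
print(f"analog: reached {g:.3f} after {n} read-verify cycles")
print("digital:", program_digital(1))
```

Each read-verify iteration costs a write pulse plus a sense operation, which is why frequent weight updates during on-chip learning favor digital or heterogeneous storage.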
…computers while implementing such models have led to a three-decade-long search for bio-plausible computing paradigms, drawing inspiration from the elusive energy-efficiency of the brain. To that effect, non-volatile technologies offer a promising solution toward realizing such computing systems. In this review article, we discuss how the rich intrinsic physics of non-volatile devices, based on various technologies, can be exploited to emulate bio-plausible neuro-synaptic functionalities in spiking neural networks. We delve into the generic requirements of the basic functional units of SNNs and how they can be realized using various non-volatile devices. These devices can be connected in an intricate arrangement to realize a massively parallel in-memory computing crossbar structure, representing a radical departure from the existing von-Neumann computing model. A huge number of such computing units can be arranged in a tiled architecture to realize extremely area- and energy-efficient large-scale neuromorphic systems. Finally, we discuss the challenges of, and possible solutions to, realizing neuromorphic systems using non-volatile devices. We believe that non-volatile technologies show significant promise and immense potential as the building blocks of future neuromorphic systems. In order to truly realize that potential, a joint research effort is necessary, right from materials that achieve better trade-offs between stability and programming speed and exhibit more linear and symmetric characteristics. This material investigation should be complemented with effective device-circuit co-design to alleviate the variations and other non-idealities that introduce errors into neuromorphic computations. Finally, there must be efficient …

5. S. K. Pal and S. Mitra, "Multilayer perceptron, fuzzy sets, and classification," IEEE Trans. Neural Networks 3, 683–697 (1992).
6. V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010), pp. 807–814.
7. D. E. Rumelhart, G. E. Hinton, R. J. Williams et al., "Learning representations by back-propagating errors," Nature 323(6088), 533–536 (1986).
8. Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature 521, 436 (2015).
9. K. A. Boahen, "Point-to-point connectivity between neuromorphic chips using address events," IEEE Trans. Circuits Syst. II 47, 416–434 (2000).
10. G. M. Shepherd, The Synaptic Organization of the Brain (Oxford University Press, 2003).
11. M. Mahowald and R. Douglas, "A silicon neuron," Nature 354, 515 (1991).
12. E. T. Rolls and G. Deco, The Noisy Brain: Stochastic Dynamics as a Principle of Brain Function (Oxford University Press, Oxford, 2010), Vol. 34.
13. B. Nessler, M. Pfeiffer, L. Buesing, and W. Maass, "Bayesian computation emerges in generic cortical microcircuits through spike-timing-dependent plasticity," PLoS Comput. Biol. 9, e1003037 (2013).
14. A. L. Hodgkin and A. F. Huxley, "A quantitative description of membrane current and its application to conduction and excitation in nerve," J. Physiol. 117, 500–544 (1952).
15. L. F. Abbott, "Lapicque's introduction of the integrate-and-fire model neuron (1907)," Brain Res. Bull. 50, 303–304 (1999).
16. A. V. Hill, "Excitation and accommodation in nerve," Proc. R. Soc. London, Ser. B 119, 305–355 (1936).
17. C. D. Geisler and J. M. Goldberg, "A stochastic model of the repetitive activity of neurons," Biophys. J. 6, 53–69 (1966).
18. Y.-H. Liu and X.-J. Wang, "Spike-frequency adaptation of a generalized leaky integrate-and-fire model neuron," J. Comput. Neurosci. 10, 25–45 (2001).
19. G.-Q. Bi and M.-M. Poo, "Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type," J. Neurosci. 18, 10464–10472 (1998).
32. T. Tuma, M. Le Gallo, A. Sebastian, and E. Eleftheriou, "Detecting correlations using phase-change neurons and synapses," IEEE Electron Device Lett. 37, 1238–1241 (2016).
33. S. Ambrogio, N. Ciocchini, M. Laudato, V. Milo, A. Pirovano, P. Fantini, and D. Ielmini, "Unsupervised learning by spike timing dependent plasticity in phase change memory (PCM) synapses," Front. Neurosci. 10, 56 (2016).
34. D. Kuzum, R. G. Jeyasingh, B. Lee, and H.-S. P. Wong, "Nanoelectronic programmable synapses based on phase change materials for brain-inspired computing," Nano Lett. 12, 2179–2186 (2012).
35. M. Suri, O. Bichler, D. Querlioz, O. Cueto, L. Perniola, V. Sousa, D. Vuillaume, C. Gamrat, and B. DeSalvo, "Phase change memory as synapse for ultra-dense neuromorphic systems: Application to complex visual pattern extraction," in 2011 International Electron Devices Meeting (IEEE, 2011), p. 4.
36. Y. Li, Y. Zhong, L. Xu, J. Zhang, X. Xu, H. Sun, and X. Miao, "Ultrafast synaptic events in a chalcogenide memristor," Sci. Rep. 3, 1619 (2013).
37. O. Bichler, M. Suri, D. Querlioz, D. Vuillaume, B. DeSalvo, and C. Gamrat, "Visual pattern extraction using energy-efficient '2-PCM synapse' neuromorphic architecture," IEEE Trans. Electron Devices 59, 2206–2214 (2012).
38. D. Kuzum, R. G. Jeyasingh, and H.-S. P. Wong, "Energy efficient programming of nanoelectronic synaptic devices for large-scale implementation of associative and temporal sequence learning," in 2011 International Electron Devices Meeting (IEEE, 2011), pp. 30–33.
39. Z. Cheng, C. Ríos, W. H. Pernice, C. D. Wright, and H. Bhaskaran, "On-chip photonic synapse," Sci. Adv. 3, e1700160 (2017).
40. J. Feldmann, N. Youngblood, C. Wright, H. Bhaskaran, and W. Pernice, "All-optical spiking neurosynaptic networks with self-learning capabilities," Nature 569, 208 (2019).
41. S. B. Eryilmaz, D. Kuzum, R. G. Jeyasingh, S. Kim, M. BrightSky, C. Lam, and H.-S. P. Wong, "Experimental demonstration of array-level learning with phase change synaptic devices," in 2013 IEEE International Electron Devices Meeting (IEEE, 2013), p. 25.
51. … "drift due to chalcogenide structural relaxation," in 2007 IEEE International Electron Devices Meeting (IEEE, 2007), pp. 939–942.
52. M. Suri, D. Garbin, O. Bichler, D. Querlioz, D. Vuillaume, C. Gamrat, and B. DeSalvo, "Impact of PCM resistance-drift in neuromorphic systems and drift-mitigation strategy," in Proceedings of the 2013 IEEE/ACM International Symposium on Nanoscale Architectures (IEEE Press, 2013), pp. 140–145.
53. Y. Watanabe, J. Bednorz, A. Bietsch, C. Gerber, D. Widmer, A. Beck, and S. Wind, "Current-driven insulator–conductor transition and nonvolatile memory in chromium-doped SrTiO3 single crystals," Appl. Phys. Lett. 78, 3738–3740 (2001).
54. A. Beck, J. Bednorz, C. Gerber, C. Rossel, and D. Widmer, "Reproducible switching effect in thin oxide films for memory applications," Appl. Phys. Lett. 77, 139–141 (2000).
55. W. Zhuang, W. Pan, B. Ulrich, J. Lee, L. Stecker, A. Burmaster, D. Evans, S. Hsu, M. Tajiri, A. Shimaoka et al., "Novel colossal magnetoresistive thin film nonvolatile resistance random access memory (RRAM)," in International Electron Devices Meeting, Technical Digest (IEEE, 2002), pp. 193–196.
56. L. Goux, P. Czarnecki, Y. Y. Chen, L. Pantisano, X. Wang, R. Degraeve, B. Govoreanu, M. Jurczak, D. Wouters, and L. Altimime, "Evidences of oxygen-mediated resistive-switching mechanism in TiN\HfO2\Pt cells," Appl. Phys. Lett. 97, 243509 (2010).
57. C. Rohde, B. J. Choi, D. S. Jeong, S. Choi, J.-S. Zhao, and C. S. Hwang, "Identification of a determining parameter for resistive switching of TiO2 thin films," Appl. Phys. Lett. 86, 262907 (2005).
58. Z. Wei, Y. Kanzawa, K. Arita, Y. Katoh, K. Kawai, S. Muraoka, S. Mitani, S. Fujii, K. Katayama, M. Iijima et al., "Highly reliable TaOx ReRAM and direct evidence of redox reaction mechanism," in 2008 IEEE International Electron Devices Meeting (IEEE, 2008), pp. 1–4.
59. M. Al-Shedivat, R. Naous, G. Cauwenberghs, and K. N. Salama, "Memristors empower spiking neurons with stochasticity," IEEE J. Emerging Sel. Top. Circuits Syst. 5, 242–253 (2015).
72. B. Rajendran, Y. Liu, J.-S. Seo, K. Gopalakrishnan, L. Chang, D. J. Friedman, and M. B. Ritter, "Specifications of nanoscale devices and circuits for neuromorphic computational systems," IEEE Trans. Electron Devices 60, 246–253 (2013).
73. K. Seo, I. Kim, S. Jung, M. Jo, S. Park, J. Park, J. Shin, K. P. Biju, J. Kong, K. Lee et al., "Analog memory and spike-timing-dependent plasticity characteristics of a nanoscale titanium oxide bilayer resistive switching device," Nanotechnology 22, 254023 (2011).
74. I.-T. Wang, Y.-C. Lin, Y.-F. Wang, C.-W. Hsu, and T.-H. Hou, "3D synaptic architecture with ultralow sub-10 fJ energy per spike for neuromorphic computation," in 2014 IEEE International Electron Devices Meeting (IEEE, 2014), p. 28.
75. Z. Wang, S. Ambrogio, S. Balatti, and D. Ielmini, "A 2-transistor/1-resistor artificial synapse capable of communication and stochastic learning in neuromorphic systems," Front. Neurosci. 8, 438 (2015).
76. S. Yu, B. Gao, Z. Fang, H. Yu, J. Kang, and H.-S. P. Wong, "Stochastic learning in oxide binary synaptic device for neuromorphic computing," Front. Neurosci. 7, 186 (2013).
77. A. Serb, J. Bill, A. Khiat, R. Berdan, R. Legenstein, and T. Prodromakis, "Unsupervised learning in probabilistic neural networks with multi-state metal-oxide memristive synapses," Nat. Commun. 7, 12611 (2016).
78. S. Park, H. Kim, M. Choo, J. Noh, A. Sheri, S. Jung, K. Seo, J. Park, S. Kim, W. Lee et al., "RRAM-based synapse for neuromorphic system with pattern recognition function," in 2012 International Electron Devices Meeting (IEEE, 2012), pp. 10–12.
79. S. Park, A. Sheri, J. Kim, J. Noh, J. Jang, M. Jeon, B. Lee, B. Lee, B. Lee, and H.-J. Hwang, "Neuromorphic speech systems using advanced ReRAM-based synapse," in 2013 IEEE International Electron Devices Meeting (IEEE, 2013), pp. 25–26.
80. N. Panwar, B. Rajendran, and U. Ganguly, "Arbitrary spike time dependent plasticity (STDP) in memristor by analog waveform engineering," IEEE Electron Device Lett. 38, 740–743 (2017).
90. M. Suri, D. Querlioz, O. Bichler, G. Palma, E. Vianello, D. Vuillaume, C. Gamrat, and B. DeSalvo, "Bio-inspired stochastic computing using binary CBRAM synapses," IEEE Trans. Electron Devices 60, 2402–2409 (2013).
91. T. Ohno, T. Hasegawa, A. Nayak, T. Tsuruoka, J. K. Gimzewski, and M. Aono, "Sensory and short-term memory formations observed in a Ag2S gap-type atomic switch," Appl. Phys. Lett. 99, 203108 (2011).
92. K.-H. Kim, S. Gaba, D. Wheeler, J. M. Cruz-Albrecht, T. Hussain, N. Srinivasa, and W. Lu, "A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications," Nano Lett. 12, 389–395 (2012).
93. Y. Wang, T. Tang, L. Xia, B. Li, P. Gu, H. Yang, H. Li, and Y. Xie, "Energy efficient RRAM spiking neural network for real time classification," in Proceedings of the 25th Edition on Great Lakes Symposium on VLSI (ACM, 2015), pp. 189–194.
94. G. Pedretti, V. Milo, S. Ambrogio, R. Carboni, S. Bianchi, A. Calderoni, N. Ramaswamy, A. Spinelli, and D. Ielmini, "Memristive neural network for on-line learning and tracking with brain-inspired spike timing dependent plasticity," Sci. Rep. 7, 5288 (2017).
95. C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song, N. Davila, C. E. Graves et al., "Analogue signal and image processing with large memristor crossbars," Nat. Electron. 1, 52 (2018).
96. M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," in European Conference on Computer Vision (Springer, 2016), pp. 525–542.
97. P. Wijesinghe, A. Ankit, A. Sengupta, and K. Roy, "An all-memristor deep spiking neural computing system: A step toward realizing the low-power stochastic brain," IEEE Trans. Emerging Top. Comput. Intell. 2, 345–358 (2018).
98. G. Sun, X. Dong, Y. Xie, J. Li, and Y. Chen, "A novel architecture of the 3D stacked MRAM L2 cache for CMPs," in 2009 IEEE 15th International Symposium on High Performance Computer Architecture (IEEE, 2009), pp. 239–249.
113. M. Sharad, C. Augustine, G. Panagopoulos, and K. Roy, "Spin-based neuron model with domain-wall magnets as synapse," IEEE Trans. Nanotechnol. 11, 843–853 (2012).
114. A. Sengupta, Y. Shim, and K. Roy, "Proposal for an all-spin artificial neural network: Emulating neural and synaptic functionalities through domain wall motion in ferromagnets," IEEE Trans. Biomed. Circuits Syst. 10, 1152–1160 (2016).
115. A. F. Vincent, J. Larroque, N. Locatelli, N. B. Romdhane, O. Bichler, C. Gamrat, W. S. Zhao, J.-O. Klein, S. Galdin-Retailleau, and D. Querlioz, "Spin-transfer torque magnetic memory as a stochastic memristive synapse for neuromorphic systems," IEEE Trans. Biomed. Circuits Syst. 9, 166–174 (2015).
116. G. Srinivasan, A. Sengupta, and K. Roy, "Magnetic tunnel junction based long-term short-term stochastic synapse for a spiking neural network with on-chip STDP learning," Sci. Rep. 6, 29545 (2016).
117. D. Zhang, L. Zeng, Y. Zhang, W. Zhao, and J. O. Klein, "Stochastic spintronic device based synapses and spiking neurons for neuromorphic computation," in 2016 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH) (IEEE, 2016), pp. 173–178.
118. A. Sengupta and K. Roy, "Short-term plasticity and long-term potentiation in magnetic tunnel junctions: Towards volatile synapses," Phys. Rev. Appl. 5, 024012 (2016).
119. M. Romera, P. Talatchian, S. Tsunegi, F. A. Araujo, V. Cros, P. Bortolotti, J. Trastoy, K. Yakushiji, A. Fukushima, H. Kubota et al., "Vowel recognition with four coupled spin-torque nano-oscillators," Nature 563, 230 (2018).
120. A. Hirohata, H. Sukegawa, H. Yanagihara, I. Žutić, T. Seki, S. Mizukami, and R. Swaminathan, "Roadmap for emerging materials for spintronic device applications," IEEE Trans. Magn. 51, 1–11 (2015).
121. A. Sengupta, M. Parsa, B. Han, and K. Roy, "Probabilistic deep spiking neural systems enabled by magnetic tunnel junction," IEEE Trans. Electron Devices 63, 2963–2970 (2016).
122. S. Ikeda, J. Hayakawa, Y. Ashizawa, Y. Lee, K. Miura, H. Hasegawa, M. …
132. B. Obradovic, T. Rakshit, R. Hatcher, J. Kittl, R. Sengupta, J. G. Hong, and M. S. Rodder, "A multi-bit neuromorphic weight cell using ferroelectric FETs, suitable for SoC integration," IEEE J. Electron Devices Soc. 6, 438–448 (2018).
133. V. Kornijcuk, H. Lim, J. Y. Seok, G. Kim, S. K. Kim, I. Kim, B. J. Choi, and D. S. Jeong, "Leaky integrate-and-fire neuron circuit based on floating-gate integrator," Front. Neurosci. 10, 212 (2016).
134. D. Kahng and S. M. Sze, "A floating gate and its application to memory devices," Bell Syst. Tech. J. 46, 1288–1295 (1967).
135. R. H. Fowler and L. Nordheim, "Electron emission in intense electric fields," Proc. R. Soc. London, Ser. A 119, 173–181 (1928).
136. M. Lenzlinger and E. Snow, "Fowler-Nordheim tunneling into thermally grown SiO2," J. Appl. Phys. 40, 278–283 (1969).
137. T.-S. Jung, Y.-J. Choi, K.-D. Suh, B.-H. Suh, J.-K. Kim, Y.-H. Lim, Y.-N. Koh, J.-W. Park, K.-J. Lee, J.-H. Park et al., "A 3.3 V 128 Mb multi-level NAND flash memory for mass storage applications," in 1996 IEEE International Solid-State Circuits Conference, Digest of Technical Papers, ISSCC (IEEE, 1996), pp. 32–33.
138. M. Bauer, R. Alexis, G. Atwood, B. Baltar, A. Fazio, K. Frary, M. Hensel, M. Ishac, J. Javanifard, M. Landgraf et al., "A multilevel-cell 32 Mb flash memory," in Proceedings of the 30th IEEE International Symposium on Multiple-Valued Logic (ISMVL 2000) (IEEE, 2000), pp. 367–368.
139. M. Holler, S. Tam, H. Castro, and R. Benson, "An electrically trainable artificial neural network (ETANN) with 10240 floating gate synapses," in International Joint Conference on Neural Networks (1989), Vol. 2, pp. 191–196.
140. B. W. Lee, B. J. Sheu, and H. Yang, "Analog floating-gate synapses for general-purpose VLSI neural computation," IEEE Trans. Circuits Syst. 38, 654–658 (1991).
141. P. E. Hasler, C. Diorio, B. A. Minch, and C. Mead, "Single transistor learning synapses," in Advances in Neural Information Processing Systems (Curran Associates, 1995), pp. 817–824.
151. A. Yousefzadeh, E. Stromatias, M. Soto, T. Serrano-Gotarredona, and B. Linares-Barranco, "On practical issues for stochastic STDP hardware with 1-bit synaptic weights," Front. Neurosci. 12, 665 (2018).
152. G. Srinivasan and K. Roy, "ReStoCNet: Residual stochastic binary convolutional spiking neural network for memory-efficient neuromorphic computing," Front. Neurosci. 13, 189 (2019).
153. S. Agarwal, T.-T. Quach, O. Parekh, A. H. Hsia, E. P. DeBenedictis, C. D. James, M. J. Marinella, and J. B. Aimone, "Energy scaling advantages of resistive memory crossbar based computation and its application to sparse coding," Front. Neurosci. 9, 484 (2016).
154. P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura et al., "A million spiking-neuron integrated circuit with a scalable communication network and interface," Science 345, 668–673 (2014).
155. M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain et al., "Loihi: A neuromorphic manycore processor with on-chip learning," IEEE Micro 38, 82–99 (2018).
156. C. M. Liyanagedera, A. Sengupta, A. Jaiswal, and K. Roy, "Stochastic spiking neural networks enabled by magnetic tunnel junctions: From nontelegraphic to telegraphic switching regimes," Phys. Rev. Appl. 8, 064017 (2017).
157. E. O. Neftci, B. U. Pedroni, S. Joshi, M. Al-Shedivat, and G. Cauwenberghs, "Stochastic synapses enable efficient brain-inspired learning machines," Front. Neurosci. 10, 241 (2016).
158. W. Senn and S. Fusi, "Convergence of stochastic learning in perceptrons with binary synapses," Phys. Rev. E 71, 061907 (2005).
159. S. Yu, X. Guan, and H.-S. P. Wong, "On the stochastic nature of resistive switching in metal oxide RRAM: Physical modeling, Monte Carlo simulation, and experimental characterization," in 2011 International Electron Devices Meeting (IEEE, 2011), p. 17.
160. D. Ielmini, A. L. Lacaita, and D. Mantegazza, "Recovery and drift dynamics of resistance and threshold voltages in phase-change memories," IEEE Trans. Electron Devices 54, 308–315 (2007).
166. … "on-chip learning," in 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) (IEEE, 2015).
167. I. Kataeva, F. Merrikh-Bayat, E. Zamanidoost, and D. Strukov, "Efficient training algorithms for neural networks based on memristive crossbar circuits," in 2015 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2015).
168. M. Hu, J. P. Strachan, Z. Li, E. M. Grafals, N. Davila, C. Graves, S. Lam, N. Ge, J. J. Yang, and R. S. Williams, "Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication," in Proceedings of the 53rd Annual Design Automation Conference (ACM, 2016), p. 19.
169. B. Liu, H. Li, Y. Chen, X. Li, T. Huang, Q. Wu, and M. Barnell, "Reduction and IR-drop compensation techniques for reliable neuromorphic computing systems," in 2014 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) (IEEE, 2014).
170. I. Boybat, M. Le Gallo, S. Nandakumar, T. Moraitis, T. Parnell, T. Tuma, B. Rajendran, Y. Leblebici, A. Sebastian, and E. Eleftheriou, "Neuromorphic computing with multi-memristive synapses," Nat. Commun. 9, 2514 (2018).
171. J. H. Lee, T. Delbruck, and M. Pfeiffer, "Training deep spiking neural networks using backpropagation," Front. Neurosci. 10, 508 (2016).
172. Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi, "Spatio-temporal backpropagation for training high-performance spiking neural networks," Front. Neurosci. 12, 331 (2018).
173. C. Lee, P. Panda, G. Srinivasan, and K. Roy, "Training deep spiking convolutional neural networks with STDP-based unsupervised pre-training followed by supervised fine-tuning," Front. Neurosci. 12, 435 (2018).
174. D. Lee, X. Fong, and K. Roy, "R-MRAM: A ROM-embedded STT MRAM cache," IEEE Electron Device Lett. 34, 1256–1258 (2013).
175. S. Jain, A. Ranjan, K. Roy, and A. Raghunathan, "Computing in memory with spin-transfer torque magnetic RAM," IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 26, 470–483 (2018).