Memristor-Based Analog Computation and Neural Network Classification with a Dot Product Engine
Adv. Mater. 2018, 30, 1705914 (1 of 10) © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
15214095, 2018, 9. Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/adma.201705914. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License.
www.advancedsciencenews.com www.advmat.de
networks to sizes found in human brains (>10¹⁵ synapses). Crossbars built out of memristors offer a scalable nanotechnology[26] with broad conductance tunability,[27] and various bioneurological qualities such as stochasticity and transient plasticity.[28,29] In order to be implemented practically, demonstrations of sufficient memristor crossbar yield, repeatability, and controllability are required for these targeted applications.

Many recent works have explored using memristor crossbar arrays for computation, particularly in neural network applications.[30–43] These works considered various memristor technologies, but predominantly chalcogenide phase change memories (PCM) and transition metal-oxide memristors. However, the majority of these works relied heavily on simulations to forecast performance and accuracy for computations within crossbars, without demonstrating actual computations within memristor crossbars. In many cases, the coupled nonlinearities, sneak currents, and other circuit issues were ignored, which are critical to consider when assessing the promise of memristors for computation. The few previous experimental works in full memristor arrays have been limited to small sizes (<1024 memristors), binary device states, or exhibited limitations on reconfigurability. Large neural networks (165 000 synapses) were demonstrated with PCM arrays,[42] but this work was limited by a sequential interface and could not carry out VMM computations in a single step with access to all word-lines and bit-lines simultaneously, as would be required for an actual computational accelerator. Face recognition with a 128 × 8 1T1R array was demonstrated with online learning,[41] and sparse encoding was demonstrated within small memristor arrays.[40] However, neither work demonstrated or reported computations within the array, nor crucially how the VMM computational accuracy relates to the final performance of the application. The current lack of understanding of how materials, device, and circuit issues relate to a memristor array's computing capability is a major obstacle for designers attempting to simulate and develop complex architectures utilizing memristor crossbar arrays for real computational applications.

In this work, we present a platform for reprogrammable analog computations with integrated transistor-memristor arrays that we call the Dot Product Engine (DPE). Our platform enables exploration of dense VMM computations with arbitrary target matrices converted directly from digital software algorithms. Further, our material stack design for the 1T1M memristor cells enables the precise analog tuning of the memristor conductance over a wide range to implement these arbitrary target matrices faithfully. Our approach differs from online trained implementations[34,41] in which the derived matrix values are trained specifically for that crossbar array's circuit properties. Instead, our approach enables general acceleration of any matrix operations, taking digital input vectors and matrices, converting into the analog domain for low power, high speed computation, and then providing digital outputs. We demonstrate VMM within memristor arrays up to 128 × 64 in size at 10 MHz, yielding over 16 000 multiplications and additions in a single clock step, and leading to a forecasted performance of about 115 trillion operations per second per Watt. The resulting VMM outputs have 6 bits of precision, and we demonstrate multiple reprogramming iterations of our memristor arrays for different applications. Finally, we implement a single-layer neural network to classify the MNIST database of handwritten characters, showing 89.9% recognition accuracy.

The DPE system was designed to perform precise programming of individual memristor cells, reprogramming of the memristor conductance matrix, and, critically, VMM computation in a single step within the memristor crossbar. The VMM computation is performed by applying an input vector of voltages to the rows as brief simultaneous pulses (Ohm's law) and the resulting summed currents (Kirchhoff's law) are collected along the columns (Figure 1a). These currents are converted to a voltage (via a transimpedance amplifier, or TIA) and measured after a short delay from the input voltage pulses. The memristor array is programmed so that the individual conductances of the memristor cells comprise the desired computational kernel, and as memristors are nonvolatile, the memristor array maintains the programmed computational kernel. As many applications such as neural network inference do not require frequent reprogramming, the memristor crossbars can be programmed once and then ignored. However, a new application can easily be implemented by simply reprogramming the memristor conductance matrix values and supplying the appropriate inputs to the rows, as we will demonstrate below.

Within the memristor array, we utilize one transistor-one memristor (1T1M) cells in which the transistor limits and controls current during programming to facilitate high accuracy programming but is left fully open in computation mode. The CMOS–memristor integration is carried out in a foundry-compatible back-end-of-the-line (BEOL) process (Figure 1b). A transition metal-oxide memristor layer is deposited and patterned atop CMOS access transistors and wiring with 2.0 µm technology. While such 1T1M integration can increase the area compared to purely passive crossbar arrays,[34] even 1T1M-based architectures can reduce silicon area compared to purely digital approaches.[11] Passive or 1S1R arrays, typically utilized to provide a degree of isolation during programming through device nonlinearity, are not suitable for many VMM applications, as the nonlinearity directly invalidates the utilization of Ohm's law and reduces the programming accuracy of analog levels, substantially compromising VMM computation accuracy. Our memristor device stack consists of Ta/HfO2/Pd (cross-section shown in Figure 1b) and has been developed to provide multilevel conductances implemented by continuous tuning of the chemical composition of a Ta-rich conduction channel within the switching matrix.[44,45] To achieve constant resistance under different voltages (i.e., a linear I–V relationship), we selected a resistance range of 1.1–10.0 kΩ (or conductance of 100–900 µS) for accurate analog VMM operation.[46] With other memristor material systems, we previously demonstrated experimentally that 64 reprogrammable conductance levels (6 bits) can be achieved in individual 1T1M cells.[27] Here, we extend that work from individual cells to full arrays constituting thousands of cells by programming a 128 × 64 array to display the Hewlett Packard Enterprise logo and mapping the greyscale logo image to ≈180 distinct memristor conductance levels. All programming and computational signals are generated from peripheral printed circuit boards (PCBs) connected to the memristor arrays through probe-card connections (Figure S1, Supporting Information), and the design allows for an expandable set of row and column boards that can simultaneously drive (or read)
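The single-step VMM described above (row voltages in, Ohm's law per cell, Kirchhoff's-law current summation per column, TIA readout) can be sketched numerically. This is our illustrative reconstruction, not the authors' code; the read-voltage range and TIA feedback resistance are assumptions, while the array size, conductance window, and operation count follow the text.

```python
import numpy as np

# Illustrative sketch (not the authors' code) of the single-step VMM:
# an input voltage vector on the rows, Ohm's law in each 1T1M cell, and
# Kirchhoff's-law current summation along the columns.

rng = np.random.default_rng(0)
rows, cols = 128, 64

G = rng.uniform(100e-6, 900e-6, size=(rows, cols))  # conductances (S)
v = rng.uniform(0.0, 0.2, size=rows)                # row voltages (V), assumed range

i_out = G.T @ v        # column currents (A): all 64 columns in one step

R_f = 1e3              # TIA feedback resistance (ohms), assumed
v_out = -R_f * i_out   # each summed column current converted to a voltage

# Operation count quoted in the text: 128*64 = 8192 multiplications
# plus 127 additions per column over 64 columns = 16 320 ops per step.
ops = rows * cols + (rows - 1) * cols
assert ops == 16320
```

One design point worth noting: the analog array performs all 16 320 multiply and add operations concurrently in a single clock step, which is where the efficiency forecast in the text comes from.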
Figure 2. Memristor cell 1T1M properties and tuning capability. a) Memristor switching sweeps with a range of achievable memristor conductance
values. Plot shows the conductance (at 0.2 V) following the programming pulse amplitude given by the x axis. By adjusting transistor gate voltage Vgate,
applied memristor voltage Vmemristor, and pulse width, the memristor cell can be tuned to different conductance values, shown here for the example
of increasing SET gate voltage. b) Example feedback programming algorithm for obtaining a desired memristor conductance (indicated by two black
dashed lines in the top panel). For each programming cycle, the algorithm decides whether to apply a SET or RESET operation. The programming
algorithm starts by increasing Vmemristor. If the memristor conductance does not reach the desired target, then Vgate is iteratively increased as well.
Finally, the pulse width can additionally be increased as necessary. This algorithm will correct for overshooting, or the occasional drift or disturb that
can occur while programming other cells in the array. c) Histogram of the programming error (target − final programmed value) for the single-layer
neural network weight matrix shown in Figure 3a. d) Histogram of the standard deviation for the set of devices targeted for a particular programming
level for the pattern shown in Figure 1c. This shows the spread of values for individual conductance levels in the desired conductance matrix. The
inset shows the histogram of programming error (Programmed − Targeted conductance) in µS, similar to that shown in (c). Note that the inset plot
is zoomed into the main error peak, while outlier points far off-scale are also present, similar to the inset of (c).
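The write-verify loop described for panel (b) above can be sketched as follows. This is a hedged reconstruction, not the authors' code: the escalation order (pulse amplitude, then gate voltage, then pulse width) and the overshoot correction follow the caption, but all starting values, step sizes, and the deterministic toy device model standing in for a real 1T1M cell are our assumptions.

```python
# Hedged reconstruction of the feedback programming loop of Figure 2b.
# Each cycle reads the cell, applies SET if the conductance is below the
# tolerance band or RESET if above, escalates pulse amplitude, then gate
# voltage, then pulse width when moving in one direction, and backs the
# amplitude off after an overshoot (polarity flip).

def program_cell(read, set_pulse, reset_pulse, target, tol, max_cycles=50):
    v, vg, width = 0.8, 1.0, 1e-6   # amplitude (V), gate (V), width (s): assumed
    last_sign = 0
    for cycle in range(max_cycles):
        g = read()
        if abs(g - target) <= tol:
            return cycle, g                      # inside tolerance band
        sign = 1 if g < target else -1           # SET vs RESET decision
        if sign == last_sign or last_sign == 0:
            if v < 2.0:                          # escalate amplitude first,
                v = min(2.0, v + 0.05)
            elif vg < 1.8:                       # then gate voltage,
                vg += 0.05
            else:                                # finally pulse width
                width *= 1.5
        else:
            v = max(0.2, 0.5 * v)                # back off after overshoot
        (set_pulse if sign > 0 else reset_pulse)(v, vg, width)
        last_sign = sign
    return max_cycles, read()

# Toy deterministic cell: each pulse shifts conductance by 10 uS per volt.
class ToyCell:
    def __init__(self):
        self.g = 150e-6
    def read(self):
        return self.g
    def set_pulse(self, v, vg, width):
        self.g += 10e-6 * v
    def reset_pulse(self, v, vg, width):
        self.g -= 10e-6 * v

cell = ToyCell()
cycles, g = program_cell(cell.read, cell.set_pulse, cell.reset_pulse,
                         target=500e-6, tol=20e-6)
assert cycles < 50 and abs(g - 500e-6) <= 20e-6
```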
The individual conductance state tuning shown in Figure 2a is systematically applied to all memristor cells sequentially within an array using a feedback algorithm to program the large number of memristors to match the desired conductance matrix G_target for a given application. Parallel programming schemes in which a vector of row and column voltages is applied in a single step have been proposed by other groups,[47] particularly for neural network training, but are not explored here. For a given target conductance and tolerance range, the programming feedback algorithm can vary the applied voltage pulse amplitude and sign, gate voltage, and pulse width (Figure 2b). The top panel of Figure 2b shows the conductance trajectory over 200 programming cycles, the middle panel shows the changing applied voltage pulse amplitude and sign, and the bottom panel shows the changing gate voltage magnitude. After applying this programming algorithm to every cell in an array, the result is a target pattern[48] such as those in Figure 1c, Figure 3a, and Figure 4a, demonstrating different applications for images, signal processing, and neural network inference, respectively. There is a trade-off between the number of attempted programming cycles and the final programming error. In Figure 2, 50 programming cycles were used and the associated errors (G_actual − G_target) are plotted as histograms in Figure 2c and the inset of (d). As seen, the majority of cells are centered close to zero error, although the inset of (d) yields a standard deviation of 72.7 µS due to outliers beyond the plotted x axis. The
Figure 3. Operation and accuracy of DPE VMM operations for different use cases. a) Two VMM applications programmed and implemented on the
same DPE array. First, on the left, a signal processing application was tested using the discrete-cosine transform (DCT) which converts a time-based
signal into its frequency components. The same 1T1M memristor array was then reprogrammed for the second application on the right. This second
matrix implements a neural network application using a single-layer softmax neural network for recognition of handwritten digits. The DPE VMM output
distribution is not Gaussian, and some cells can remain stuck at either high or low conductances (identified as defects) while others follow a log-normal tail.[49] However, the spread of individual conductance levels is fairly reasonable, as shown in the main panel of Figure 2d. Here, the histogram of standard deviations is plotted for the different targeted programming levels in the pattern shown in Figure 1c. This effectively gives the spread in conductance for individual target conductance levels in the desired conductance matrix and shows that most levels have a spread <10 µS. More details on the array-level feedback tuning algorithms are found in the Supporting Information.

A critical parameter for analog computing is the equivalent digital precision of the computation results. We have quantified these results in our DPE system for different use cases in Figure 3. VMM operations were performed for both a signal processing application of the discrete cosine transform (DCT) as well as neural network inference for the MNIST database (Figure 3a). The DCT application converts a time-based signal into its frequency components and demonstrates full bipolar inputs and matrix values, using two cells per DCT matrix value in order to represent signed numbers.[46] The neural network application uses only positive inputs and matrix values, implementing a softmax layer for recognition of handwritten digits. Figure 3b shows a histogram of the measured VMM errors compared to ideal expected results. A circuit simulation was also performed (in gray) that takes into account wire resistances and sneak path currents, showing a fairly good match to the experimental results and hence an understanding of the error source, which is primarily circuit parasitics. These can be corrected with a linear scaling factor for each column (see below). Figure 3c shows the VMM errors for the neural network application along with the circuit simulations, again showing a close match.

To understand the circuit simulations and the key sources of VMM errors, a visualization of the analog signal deterioration due to parasitics is shown in Figure 3d for a 16 × 16 array. The color of the middle pillar at each vertex indicates the conductance of the device, and five "stuck on" defects (red dots) are shown. This array is operated by applying voltages on all rows from the left and grounding all columns on top, as in the experimental setup. Thus, the signal degrades from left to right (red to white color) and from top to bottom.

To improve the accuracy of our analog VMM computations, each column output is linearly calibrated to account for the above circuit parasitics. The calibration parameters are determined by first running one hundred test input patterns and finding the linear scaling that best matches the expected results. This is a one-time linear tuning of the TIA circuit in each column, without any performance overhead. When necessary, an additional calibration process can also include compensation to reduce the impact of very high or low conductance defects, as discussed in Note S4 in the Supporting Information. A simple linear scaling reduces errors significantly, as shown in Figure 3e. Here, it is shown that the final error for the DPE system is typically well below 50 µS and corresponds to ≈6 bits of VMM output accuracy. This is expected to improve further with state-of-the-art fabrication that has higher device yield (fewer defects), lower wire resistances, and by operating memristors at lower conductance states.

Here, we experimentally demonstrate a single-layer neural network inference application with the DPE for handwriting recognition. Neural networks are a key application of interest for acceleration,[6,11,20,34,38] and this application is characterized by frequent reuse of matrix convolution kernels (and thus an advantage of using memristors by reducing data-fetching), tolerance for defects, and reduced precision requirements.[50–52] We program a software-trained single-layer network in a 96 × 40 portion of a 128 × 64 memristor crossbar for handwritten digit recognition on the full 10 000-image MNIST dataset.[53] To our knowledge, this is the largest demonstration utilizing memristor crossbars in hardware to date. Implementing this single-layer neural network in the DPE platform requires reshaping and partitioning of the neural network weight matrix and input images. A software single-layer network for MNIST classifies 784 pixel inputs (28 × 28) into 10 possible classes (0 to 9), yielding a network size of 7840 weights. To fit this network into a smaller array capable of supporting 4096 elements, we resize the MNIST images to 19 × 20 and retrain the weight matrix in software. Ideally (without reshaping), this implementation would use a 380 × 10 array, but we reshape this into an equivalent 96 × 40 array. The overall recognition accuracy of the neural network in software did not degrade due to the image resizing, remaining at 92.4%. To perform the digit classification using the DPE, input images are processed by unwrapping the image to a single 1 × 380 vector (Figure 4a), and then partitioning this into four sets of 96 values (four zeroes are added to the end of the input vector to get the 384 numbers). This yields four sets of pixel values that are converted to voltage signals and applied to the rows of the memristor array, which is programmed to the trained weight matrix (right side of Figure 4a). Only ten output columns are needed for each of the four sets, performing the partial synaptic weight matrix multiplication. A full-image classification is concluded after four such computations, resulting in four 1 × 10 vectors which are summed, and the resulting digit with the maximal value yields the predicted classification of the
Figure 3 (continued). yields the digit classification. b) Histogram of experimentally measured DPE VMM error from the raw mathematical VMM for the DCT application, and comparison results from a circuit simulation (in gray) that takes into account wire resistances and sneak path currents. As shown, the circuit simulation reproduces the experimental results quite well, indicating that the main source of error of the DPE VMM relative to a raw mathematical VMM is simply circuit parasitics, which are easily corrected with a linear scaling factor per column output. c) Same as in (b) but for the neural network application. Inset plots experimentally measured VMM data and circuit simulations versus raw mathematical VMM for both applications. d) Visualized circuit simulation showing circuit nonidealities in a 16 × 16 array with 5 "stuck on" defects as red vertices. This array is operated by applying voltages on all rows from the left and grounding all columns on top. Therefore, the input signal along the rows degrades left to right (red to white color), and the grounding paths along the columns degrade similarly. The color of the middle pillar at each vertex indicates device conductance. e) Histogram of VMM error following implementation of a linear scaling factor for each column for the DCT and MNIST applications. Inset shows excellent agreement of raw VMM and experimental VMM following a simple linear column scaling.
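The one-time per-column linear calibration discussed above amounts to fitting a scale and an offset for each column against known-good VMM results from test patterns. The sketch below is our reconstruction under assumed synthetic distortions (a per-column gain droop, an offset, and small noise standing in for the real circuit parasitics), not the authors' calibration code.

```python
import numpy as np

# Sketch of the one-time per-column linear calibration: run test input
# patterns, record each column's raw output next to the ideal VMM
# result, and fit scale/offset per column by least squares. The
# "hardware" here is synthetic: ideal VMM plus assumed distortions.

rng = np.random.default_rng(0)
rows, cols, n_tests = 128, 64, 100

G = rng.uniform(100e-6, 900e-6, size=(rows, cols))   # kernel (siemens)
V = rng.uniform(0.0, 0.2, size=(n_tests, rows))      # 100 test patterns

ideal = V @ G                                        # (100, 64), in amps
gain = rng.uniform(0.90, 1.00, size=cols)            # parasitic droop (assumed)
offset = rng.normal(0.0, 1e-5, size=cols)            # assumed offsets
raw = ideal * gain + offset + rng.normal(0.0, 1e-6, size=ideal.shape)

# Fit ideal ~= a * raw + b per column; degree-1 polyfit returns (a, b).
fits = np.array([np.polyfit(raw[:, j], ideal[:, j], 1) for j in range(cols)])
corrected = raw * fits[:, 0] + fits[:, 1]

# One rough notion of equivalent output precision: full scale over the
# residual error spread, in bits (the text reports about 6 bits on the
# real system; this synthetic setup is cleaner).
bits = np.log2(ideal.max() / (corrected - ideal).std())
assert np.abs(corrected - ideal).max() < np.abs(raw - ideal).max()
```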
Figure 4. Experimental demonstration of a single-layer neural network for MNIST handwritten digit classification. a) Illustration of the computing pro-
cedure. A single-layer softmax neural network is trained offline, converted to target conductance values, and programmed into a 96 × 40 1T1M crossbar
array. For a given input image, the 19 × 20 pixel image is unwrapped to a 380-element input vector, converted to voltages, and applied to the memristor matrix in
four sets. The summed result of the four outputs yields the DPE classification. This is illustrated here for the digit “9.” b) Some example classification
results of the softmax single-layer neural network on the inset digits “4,” “9,” and “5” with ideal software results (blue) compared to the experimental
results (red). c) Total recognition accuracy for each digit for 10 000 images from the MNIST database. Results are shown for the single-layer trained
software result compared to the experimental system (same color legend as in (b)). The overall recognition accuracy is 89.9%.
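The partitioned inference of Figure 4a can be sketched as follows. This is our reconstruction with random stand-ins for the trained weights and the input image, and the exact reshaping used by the authors may differ; the block shows one way to realize the described flow (unwrap 19 × 20 to 380 values, pad to 384, split into four blocks of 96, apply to a 96 × 40 array, sum four 1 × 10 partial outputs, take the argmax). The linear weight-to-conductance map uses the 100–700 µS window quoted in the text; signed weights (e.g., two cells per value) are skipped for brevity.

```python
import numpy as np

# Sketch of the partitioned MNIST inference in Figure 4a (our
# reconstruction; weights and image are random stand-ins).

rng = np.random.default_rng(0)

W = rng.normal(size=(380, 10))            # trained weights (stand-in)
image = rng.uniform(0, 1, size=(19, 20))  # resized input image (stand-in)

# Reshape the ideal 380 x 10 weight matrix into an equivalent 96 x 40
# array: pad to 384 rows, split into four 96 x 10 slices side by side.
W_pad = np.vstack([W, np.zeros((4, 10))])             # 384 x 10
blocks = W_pad.reshape(4, 96, 10)                     # four 96 x 10 slices
W_array = np.concatenate(list(blocks), axis=1)        # 96 x 40

# Linear map onto the 100-700 uS conductance window quoted in the text.
lo, hi = 100e-6, 700e-6
span = W_array.max() - W_array.min()
G = lo + (W_array - W_array.min()) / span * (hi - lo)

# Inference: unwrap and zero-pad the image, run four partial VMMs,
# sum the four 1 x 10 outputs, and take the argmax as the digit.
x = np.concatenate([image.reshape(-1), np.zeros(4)])  # 384 inputs
partials = [x[k*96:(k+1)*96] @ G[:, k*10:(k+1)*10] for k in range(4)]
scores = np.sum(partials, axis=0)                     # shape (10,)
digit = int(np.argmax(scores))
assert scores.shape == (10,)
```

Because the affine weight-to-conductance map adds the same constant to every digit's score (the offset term multiplies the sum of all inputs), the argmax of the crossbar scores matches the argmax of the ideal matrix product in this idealized, noise-free sketch.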
input image. The weight matrix was linearly mapped to the conductance range of the memristor cells from 100 to 700 µS. The experimental output can show nonideal values caused by a combination of small cell-level programming errors as well as stuck-ON defects, as can be seen in the output current per column for a few example input patterns (digits 4, 9, 5) in the 10k MNIST set (Figure 4b). These defects are expected to be significantly reduced in state-of-the-art fabrication facilities.

Our experimental demonstration of inference on the 10k test patterns from MNIST shows the promise of our hardware acceleration. The full recognition accuracy for all 10k testing patterns as a function of the input digit shows that our hardware implementation can even outperform the software-trained network (for digits 3 and 0, see Figure 4c). With linear correction, the accuracy of the hardware DPE neural network implementation reaches 89.9%, only a 2.5% reduction compared to the ideal software accuracy. This remaining accuracy loss is due to device programming inaccuracies and cumulative finite wire resistances across the array. It is expected that more faithful array programming, fewer memristor cell defects, and especially the use of multilayer neural network implementations will help close the accuracy gap to software levels (see multilayer simulations[54]). Additionally, using nonlinearity in the activation function would be expected to increase performance further. Our results directly computed the forward inference using existing trained neural networks, showing that these may be utilized directly without redesigning or retraining for the particular hardware. This is well-matched to the paradigms of IoT and edge computing, in which the goal is to deploy and use trained networks in mobile applications where low power and low cost (small chip area) are key, but speed and performance cannot be compromised. Additionally, in such applications, a reduced precision can often be tolerated. Consequently, a key figure of merit is the computational power efficiency of the system. The present design, including power consumption in the peripheral circuitry, is estimated to yield 115 TOPS W−1 (Tera operations per second per Watt, see Table S2 in the Supporting Information).

A key question is how the power efficiency of such an analog DPE system compares to the state of the art, both in performing VMM computations and more broadly in convolutional neural network applications. This requires forecasting the DPE when implemented in an integrated chip and using scaled technology nodes. Although the present system operates at 10 MHz, this is due to the larger parasitics of 2 µm technology and probe-card interfaces. By assuming <100 nm technology and integrated control electronics, an operating frequency of 150 MHz is easily achievable, along with analog–digital converters (ADCs) operating at <133 µW[57] per column channel. Given the 16 320 multiplications and additions performed in a 128 × 64 array, this leads to a computational efficiency of 115 TOPS W−1. In comparison, digital technology performing the same VMM operations at only 4-bit accuracy using 40 nm CMOS is estimated to operate at 7 TOPS W−1.[40] In addition, the high power and area cost of ADCs could be alleviated by only applying ADCs to final outputs of multilayer neural networks. For a large family of neural network algorithms, ADCs could also be replaced with simple circuits such as threshold gates, comparators, or amplifiers, even further reducing DPE power and area.

The above shows the power efficiency in replacing digital VMM blocks with analog, memristor-based DPE circuits. An important question is whether such computational efficiency is maintained at the application level, for example in full convolutional neural network (CNN) inference where a streaming input of images is classified. In an earlier architectural study and performance estimation,[11] we evaluated CNN applications computed by a DPE-based system composed of many 128 × 128 arrays, each with only 2 bits per cell and a 10 MHz operating frequency. All ADC elements, data routing, and buffering are taken into account, along with the costs in breaking down the larger images across multiple arrays. The results show a 14.8×, 5.5×, and 7.5× improvement in throughput (inferences per second), energy, and computational density (inferences per second per chip area) over the leading digital ASIC implementation[6] for the same task. Thus, the present work experimentally validates the assumptions for that architectural study, providing a baseline for future analog computing systems and the potential to accelerate and significantly lower energy consumption for important applications.

In this work, we have experimentally validated the potential for analog computing in nonvolatile memristor crossbar arrays. We demonstrated VMM computations in arrays up to 128 × 64, supporting the assumptions in architectures showing significant acceleration of neural network and signal processing computations compared to digital implementations.[11,54,55] We further showed that multiple, stable conductance levels can be realized in 1T1M cells composed of Ta/HfO2 memristors with 6 bits of precision. Given the low-precision requirements in many applications,[50–52] this exceeds the demands considerably. We showed that large memristor crossbars can carry out single-step VMM operations in-memory, leading to high throughput and low energy consumption. The computing accuracy was shown to be acceptable for machine learning applications including image recognition, and we also demonstrated reprogrammability for multiple applications. Direct implementation of the forward inference of a neural network for MNIST image recognition was shown, yielding 89.9% accuracy for a single layer. Reducing stuck-ON cells and using multilayer neural networks are expected to increase the accuracy to state of the art.[56]

Experimental Section

Transistor and Memristor Integration: An array of NMOS transistors was fabricated in a commercial fab with low wire resistance. The transistors had a feature size of 2 µm. The memristor arrays were fabricated in a university clean room aligned to the underlying transistors, following an argon plasma to remove protective metal-oxide layers. Photolithography patterning was used, along with thin-film deposition and liftoff. Sputter deposition of 5 nm silver (Ag) and 200 nm palladium (Pd) was used for the metal vias. After lift-off in warm acetone, the sample was annealed at 300 °C for 30 min in nitrogen with a flow of 20 sccm. A 60 nm Pd layer with a 5 nm tantalum (Ta) adhesive layer was sputtered to serve as the bottom electrode. A 5 nm HfO2 switching layer was deposited by atomic layer deposition using water and tetrakis(dimethylamido)hafnium as precursors at 250 °C, to ensure high film quality and step coverage. The patterning of the switching layer was done by photolithography and reactive ion etch
[29] Z. Wang, S. Joshi, S. E. Savel'ev, H. Jiang, R. Midya, P. Lin, M. Hu, N. Ge, J. P. Strachan, Z. Li, Q. Wu, M. Barnell, G.-L. Li, H. L. Xin, R. S. Williams, Q. Xia, J. J. Yang, Nat. Mater. 2017, 16, 101.
[30] M. Suri, O. Bichler, D. Querlioz, O. Cueto, L. Perniola, V. Sousa, D. Vuillaume, C. Gamrat, B. DeSalvo, in Proc. 2011 IEEE Int. Electron Devices Meeting (IEDM), IEEE, Piscataway, NJ, USA 2011, p. 4.
[31] S. Yu, B. Gao, Z. Fang, H. Yu, J. Kang, H.-S. P. Wong, in Proc. 2012 IEEE Int. Electron Devices Meeting (IEDM), IEEE, Piscataway, NJ, USA 2012, pp. 10–14.
[32] G. W. Burr, P. Narayanan, R. M. Shelby, S. Sidler, I. Boybat, C. di Nolfo, Y. Leblebici, in Proc. 2015 IEEE Int. Electron Devices Meeting (IEDM), IEEE, Piscataway, NJ, USA 2015, p. 4.
[33] G. W. Burr, R. M. Shelby, A. Sebastian, S. Kim, S. Kim, S. Sidler, K. Virwani, M. Ishii, P. Narayanan, A. Fumarola, L. L. Sanches, I. Boybat, M. Le Gallo, K. Moon, J. Woo, H. Hwang, Y. Leblebici, Adv. Phys.: X 2017, 2, 89.
[34] M. Prezioso, F. Merrikh-Bayat, B. D. Hoskins, G. C. Adam, K. K. Likharev, D. B. Strukov, Nature 2015, 521, 61.
[35] P. M. Sheridan, C. Du, W. D. Lu, IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 2327.
[36] S. Ambrogio, S. Balatti, V. Milo, R. Carboni, Z.-Q. Wang, A. Calderoni, N. Ramaswamy, D. Ielmini, IEEE Trans. Electron Devices 2016, 63, 1508.
[37] S. Yu, Z. Li, P.-Y. Chen, H. Wu, B. Gao, D. Wang, W. Wu, H. Qian, in Proc. 2016 IEEE Int. Electron Devices Meeting (IEDM), IEEE, Piscataway, NJ, USA 2016, pp. 12–16.
[38] S. Agarwal, S. J. Plimpton, D. R. Hughart, A. H. Hsia, I. Richter, J. A. Cox, C. D. James, M. J. Marinella, in Proc. 2016 Int. Joint Conf. on Neural Networks (IJCNN), IEEE, Piscataway, NJ, USA 2016, pp. 929–938.
[39] G. Indiveri, E. Linn, S. Ambrogio, in Resistive Switching: From Fundamentals of Nanoionic Redox Processes to Memristive Device Applications (Eds: D. Ielmini, R. Waser), Wiley-VCH, Weinheim, Germany 2016, pp. 715–736.
[40] P. M. Sheridan, F. Cai, C. Du, W. Ma, Z. Zhang, W. D. Lu, Nat. Nanotechnol. 2017, 12, 784.
[41] P. Yao, H. Wu, B. Gao, S. B. Eryilmaz, X. Huang, W. Zhang, Q. Zhang, N. Deng, L. Shi, H.-S. P. Wong, H. Qian, Nat. Commun. 2017, 8, 15199.
[42] G. W. Burr, R. M. Shelby, S. Sidler, C. di Nolfo, J. Jang, I. Boybat, R. S. Shenoy, P. Narayanan, K. Virwani, E. U. Giacometti, B. N. Kurdi, H. Hwang, IEEE Trans. Electron Devices 2015, 62, 3498.
[43] S. Yu, P. Y. Chen, Y. Cao, L. Xia, Y. Wang, H. Wu, in Proc. 2015 IEEE Int. Electron Devices Meeting (IEDM), IEEE, Piscataway, NJ, USA 2015, pp. 17.3.1–17.3.4.
[44] A. Wedig, M. Luebben, D.-Y. Cho, M. Moors, K. Skaja, V. Rana, T. Hasegawa, K. K. Adepalli, B. Yildiz, R. Waser, I. Valov, Nat. Nanotechnol. 2015, 11, 67.
[45] H. Jiang, L. Han, P. Lin, Z. Wang, M. H. Jang, Q. Wu, M. Barnell, J. J. Yang, H. L. Xin, Q. Xia, Sci. Rep. 2016, 6, 28525.
[46] C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song, N. Davila, C. E. Graves, Z. Li, J. P. Strachan, P. Lin, Z. Wang, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, Q. Xia, Nat. Electron. 2017, https://doi.org/10.1038/s41928-017-0002-z.
[47] S. Agarwal, T.-T. Quach, O. Parekh, A. H. Hsia, E. P. DeBenedictis, C. D. James, M. J. Marinella, J. B. Aimone, Front. Neurosci. 2016, 9, 484.
[48] K.-H. Kim, S. Gaba, D. Wheeler, J. M. Cruz-Albrecht, T. Hussain, N. Srinivasa, W. Lu, Nano Lett. 2012, 12, 389.
[49] G. Medeiros-Ribeiro, F. Perner, R. Carter, H. Abdalla, M. D. Pickett, R. S. Williams, Nanotechnology 2011, 22, 95702.
[50] M. Courbariaux, Y. Bengio, J.-P. David, presented at the International Conference on Learning Representations (ICLR), San Diego, CA, USA, May 2015.
[51] M. Rastegari, V. Ordonez, J. Redmon, A. Farhadi, in Proc. European Conf. on Computer Vision, Springer, Cham, Switzerland 2016, pp. 525–542.
[52] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, Y. Zou, arXiv preprint arXiv:1606.06160, 2016.
[53] Y. LeCun, C. Cortes, C. J. C. Burges, The MNIST Database of Handwritten Digits, National Institute of Standards and Technology, Gaithersburg, MD, USA 1998.
[54] M. Hu, J. P. Strachan, Z. Li, E. M. Grafals, N. Davila, C. Graves, S. Lam, N. Ge, J. J. Yang, R. S. Williams, in Proc. 2016 53rd ACM/EDAC/IEEE Design Automation Conf. (DAC), Association for Computing Machinery, New York 2016, pp. 1–6.
[55] M. Hu, J. P. Strachan, in Proc. 2016 IEEE Int. Conf. on Rebooting Computing (ICRC), IEEE, Piscataway, NJ, USA 2016, pp. 1–5.
[56] C. Liu, M. Hu, J. P. Strachan, H. Li, in Proc. 2017 54th ACM/EDAC/IEEE Design Automation Conf. (DAC), Association for Computing Machinery, New York 2017, pp. 1–6.
[57] G. Van Der Plas, B. Verbruggen, in Proc. 2008 IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Pap., IEEE, Piscataway, NJ, USA 2008, pp. 242–610.