Gan TNNLS

LICENSE: CC BY 4.0
13-04-2022 / 19-04-2022

CITATION: Sahay, Shubham; Nikam, Honey; Satyam, Siddharth (2022): Energy-Efficient Implementation of Generative Adversarial Networks on Passive RRAM Crossbar Arrays. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.19590667.v1

DOI: 10.36227/techrxiv.19590667.v1
Abstract— Generative algorithms such as GANs are at the cusp of the next revolution in the field of unsupervised learning and large-scale artificial data generation. However, the adversarial (competitive) co-training of the discriminative and generative networks in a GAN makes them computationally intensive and hinders their deployment on resource-constrained IoT edge devices. Moreover, the frequent data transfer between the discriminative and generative networks during training significantly degrades the efficacy of von-Neumann GAN accelerators such as those based on GPUs and FPGAs. Therefore, there is an urgent need for the development of ultra-compact and energy-efficient hardware accelerators for GANs. To this end, in this work, we propose to exploit passive RRAM crossbar arrays for performing the key operations of a fully-connected GAN: (a) true random noise generation for the generator network, (b) vector-by-matrix multiplication with unprecedented energy-efficiency during the forward pass and backward propagation, and (c) in-situ adversarial training using the hardware-friendly Manhattan's rule. Our extensive analysis utilizing an experimentally calibrated phenomenological model for the passive RRAM crossbar array reveals an unforeseen trade-off between the accuracy and the energy dissipated while training the GAN network with different noise inputs to the generator. Furthermore, our results indicate that the spatial and temporal variations and true random noise, which are otherwise undesirable for memory applications, boost the energy-efficiency of the GAN implementation on passive RRAM crossbar arrays without degrading its accuracy.

Index Terms— Generative Adversarial Networks, Passive RRAM crossbar, Manhattan's rule, True random noise.

Siddharth Satyam and Honey Nikam are with the Department of Mechanical Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India. Shubham Sahay is with the Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India (e-mail: [email protected]). This work was partially supported by the Semiconductor Research Corporation (SRP Task 3056.001) and IIT Kanpur.

I. INTRODUCTION

The unprecedented development in the field of supervised deep learning models, which rely on large labelled or annotated data sets, has transformed almost all facets of human endeavor in this era of the internet of things (IoT) and big data. However, their application is limited in domains where generating such labelled data is extremely difficult or costly. Therefore, unsupervised and semi-supervised generative models, which may learn the patterns or features in a complicated data set and generate high-quality artificial (fake) data bearing features similar to those of the original data set, have been extensively explored [1], [2]. Generative Adversarial Networks (GANs), a subclass of generative models, are considered one of the most promising approaches towards large-scale synthetic data generation and unsupervised/semi-supervised learning [1], [2]. GANs have been applied to solve a wide variety of problems including image synthesis and super-resolution, 3D object generation, image-to-text translation (and vice-versa), image-to-image translation, speech recognition, attention prediction, autonomous driving, etc. [1]–[3]. In GANs, two adversarial (competitive) networks are co-trained alternately: the generator network is trained to produce artificial (fake) data which may not be distinguishable from the original data set, while the discriminator network is trained to classify whether the input data belongs to the original data set or is obtained from the generator network. The training methodology for GANs involves movement of a large amount of data between the two adversarial networks, leading to significant memory and computational resource consumption [3]–[9], which restricts their application in IoT edge devices with limited energy and area. Therefore, the development of a compact and ultra-low power GAN processing engine is indispensable for enabling unsupervised learning and generating artificial data on resource-constrained IoT edge devices.

Digital GAN accelerators based on FPGAs and GPUs exhibit significantly high latency and energy consumption owing to the intensive data shuffling between the storage and computational blocks due to their inherent von-Neumann architecture. Since vector-by-matrix multiplication (VMM) is the fundamental operation in GANs during the forward pass and backward propagation using optimization algorithms such as gradient descent, RMSprop, ADAM, etc. [1]–[3], the data shuffling and the energy consumption while training GANs can be significantly reduced by performing in-memory VMM operations exploiting cross-point arrays of emerging non-volatile memories. Recently, several innovative deep convolutional (DC) GAN architectures were proposed, including layer-wise pipelined computations [4], an efficient deconvolutional operation [5], a computational deformation technique to facilitate efficient utilization of computational resources in transpose convolution [6], and a ternary GAN [7] utilizing in-memory VMM engines based on RRAMs (with binary and 2-bit storage capability) and SOT-MRAMs. Moreover, a hybrid CMOS-analog RRAM-based implementation of DCGAN (without the pooling layer) including digital error propagation and weight
update units was also proposed [8]. However, the non-linearity in the RRAM conductance update during the training process was not considered, and the weight-sign crossbar and sequential reading of the output limit the efficacy of the analog DCGAN implementation.

Recently, a fully-connected GAN implementation utilizing the intrinsic read noise (conductance fluctuation during the read operation) and write noise (imprecise conductance tuning during the write operation) of an active (1T-1R) RRAM crossbar array was experimentally demonstrated in [9] for generating (3 classes of) artificial digital patterns of handwritten digits after training on a reduced MNIST dataset. An optimal level of write noise was found to increase the diversity in the generated patterns and mitigate the mode-dropping issue [9]. Although the selector MOSFET in the active (1T-1R) RRAM configuration reduces the cell leakage current, improves the tuning precision, provides current compliance, and facilitates partial/selective programming of the array for large-scale hardware demonstrations of neuromorphic networks [10]–[14], it also leads to a significantly large area overhead. The scalability of the selector MOSFET is limited since it has to provide large programming/forming currents to the RRAM device and sustain high voltages during the forming/write operation [15].

On the other hand, passive RRAM crossbar arrays exhibit a significantly reduced area, lower fabrication complexity and cost, and an inherent scaling benefit since they do not require a selector MOSFET [15]–[21]. However, unlike active (1T-1R) RRAM cells, the passive RRAM crossbar cells are susceptible to sneak-path leakage currents and half-select cell disturbance [16], [17]. Moreover, the spatial variation in the switching threshold and the limited crossbar yield degrade their performance [16], [17]. Nevertheless, recent advancements in the fabrication process and material stacks for RRAMs, novel programming schemes and conductance mapping techniques have enabled the realization and CMOS BEOL integration of large passive RRAM crossbar arrays with high conductance tuning precision, large yield and uniformity, highly non-linear characteristics and significantly suppressed sneak-path leakage current [15], [18]–[20]. Considering the promising scaling prospects of passive RRAM crossbar arrays, it becomes imperative to explore their potential for the implementation of GANs. Moreover, the inherent spatial and temporal variations of the passive RRAM crossbar array can be exploited to extract a true random noise source for the generator network.

To this end, in this work, we develop a hardware-aware simulation framework and demonstrate a compact and ultra-energy-efficient fully-connected GAN utilizing passive RRAM crossbar arrays for synthetic image generation. We propose a methodology to generate true random noise using the passive RRAM crossbar array as input to the generator network, perform the VMM during the forward pass and backward propagation, and carry out in-situ adversarial training of the GAN using the hardware-friendly Manhattan's rule (fixed-pulse training) on passive RRAM crossbar arrays. While most hardware solutions aimed towards efficient implementation of GANs focus on the discriminative network, we also investigated the impact of different (pseudo-random and true random) noise inputs to the generator network on the training energy and accuracy of the fully-connected GAN. Our extensive analysis utilizing an experimentally calibrated phenomenological model for the passive RRAM crossbar array (which accurately captures non-ideal effects such as spatial and temporal device variations and noise) indicates that the proposed GAN implementation with true random noise input to the generator and device-to-device variations in the passive RRAM crossbar array exhibits a significantly enhanced energy-efficiency with accuracy comparable to the software implementation. Our results may provide an incentive for experimental demonstration of GANs on passive RRAM crossbar arrays.

The manuscript is organized as follows: Sections I.A and I.B provide a brief overview of fully-connected GANs and passive RRAM crossbar arrays, respectively. The intricate details regarding the simulation framework developed in this work utilizing an experimentally calibrated phenomenological model for the passive RRAM crossbar array are discussed in Section II. The performance estimates of the proposed GAN implementation using passive RRAM crossbar arrays for important metrics such as accuracy, diversity in the generated images, area and energy are reported in Section III, while the conclusions are drawn in Section IV.

A. Generative Adversarial Networks (GANs)

Fully-connected GANs rely on an adversarial training process that involves a zero-sum game between two multi-layer perceptrons: the generator and the discriminator network, as shown in Fig. 1. Two conflicting objectives must be fulfilled while training a fully-connected GAN: the discriminator should be able to predict accurately whether the data produced by the generator is real or fake, and the generator should be able to synthesize artificial data that contains features which are indistinguishable from the original data and deceive the discriminator into classifying it as real data. The generator network is fed with a noise input and produces a fake image based on its weights and parameters. These fake images are fed to the discriminator along with the real images and the discriminator

Fig. 1. Schematic representation of a fully-connected GAN. The generator produces a fake image while the discriminator gives the output probability for both real and fake image inputs (represented by the black boxes). These are fed back during backward propagation (represented by the dashed blue lines).
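The adversarial objectives described in Section I.A can be condensed into a toy training loop. The sketch below is a minimal NumPy illustration under assumed layer sizes, learning rate and data (a 16-dimensional noise input, 64-pixel images and random binary "real" data); none of these values are taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's configuration):
# 16-dim noise -> 64-pixel "image"; discriminator maps 64 pixels -> 1 score.
NOISE, HID, IMG = 16, 32, 64

def init(n_in, n_out):
    return rng.normal(0, 0.1, (n_in, n_out))

# Generator and discriminator as single-hidden-layer perceptrons.
Gw1, Gw2 = init(NOISE, HID), init(HID, IMG)
Dw1, Dw2 = init(IMG, HID), init(HID, 1)

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gen(z):
    h = np.tanh(z @ Gw1)
    return sigmoid(h @ Gw2), h

def disc(x):
    h = np.tanh(x @ Dw1)
    return sigmoid(h @ Dw2), h

real = (rng.random((256, IMG)) < 0.5).astype(float)  # stand-in "real" data

lr = 0.05
for step in range(200):
    z = rng.normal(size=(32, NOISE))
    fake, gh = gen(z)
    # --- discriminator update: push D(real) -> 1, D(fake) -> 0 ---
    batch = real[rng.integers(0, 256, 32)]
    for x, target in ((batch, 1.0), (fake, 0.0)):
        p, h = disc(x)
        err = p - target                      # dL/dlogit for BCE + sigmoid
        dDw2 = h.T @ err
        dDw1 = x.T @ ((err @ Dw2.T) * (1 - h**2))
        Dw1 -= lr * dDw1; Dw2 -= lr * dDw2
    # --- generator update: push D(fake) -> 1 through a frozen D ---
    fake, gh = gen(z)
    p, h = disc(fake)
    err = p - 1.0
    dfake = ((err @ Dw2.T) * (1 - h**2)) @ Dw1.T
    dpre = dfake * fake * (1 - fake)          # back through output sigmoid
    Gw2 -= lr * (gh.T @ dpre)
    Gw1 -= lr * (z.T @ ((dpre @ Gw2.T) * (1 - gh**2)))
```

In the hardware implementation proposed in this work, the floating-point gradient steps above would be replaced by fixed-pulse (Manhattan's rule) conductance updates applied directly on the crossbar.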
Fig. 5. Conductance evolution while training the GAN implemented on 54 (64 × 64) RRAM crossbar arrays for the generator network with (a) true random noise input considering device-to-device variations and (b) pseudo random noise input.
RRAMs across ≈2 million data points [30]. The change in the conductance-state ΔG follows the dynamic equation [30]:

ΔG = D_m(G_0, V_p, t_p) + D_d2d(G_0, V_p, t_p)     (4)

where D_m is the expected noise-free absolute conductance change, which depends on the amplitude V_p and duration t_p of the voltage pulse as well as the conductance-state G_0, and D_d2d represents the device-to-device variations for different RRAMs on the passive crossbar array.

III. RESULTS AND DISCUSSION

We perform an extensive analysis of the proposed GAN implementation on the passive RRAM crossbar array utilizing the hardware-aware simulation framework developed in Section II. We compare the performance of the GAN implementation with different noise inputs to the generator in terms of metrics such as accuracy, energy consumption and area. Moreover, to investigate the impact of device-to-device variations on the efficacy of the GAN, we analyse the performance of the proposed GAN implementation both in the presence and absence of spatial variations (by appropriately switching the mismatch/variation flag in the compact model).

For efficient performance benchmarking, we train (a) a software GAN, and the GAN implementations based on the passive RRAM crossbar array with (b) pseudo-random noise input, (c) true-random noise input without considering device-to-device variations, and (d) true-random noise input considering device-to-device variations for synthesizing the digit "3", and divide the training set from the MNIST data set into 10 batches (of size 608). The evolution of the conductance-states of the cells in
Fig. 6. The synthetic images of digit "3" generated by (a) the software GAN implementation, and the GAN implementation based on the passive RRAM crossbar array utilizing (b) pseudo random normal noise input to the generator network, (c) true random noise input to the generator network without considering the spatial variations, and (d) true random noise input to the generator network while considering the device-to-device variations.
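The conductance-update dynamics of Eq. (4) and the fixed-pulse Manhattan's-rule training it enables can be sketched as follows. The functional form of D_m, the pulse parameters, the conductance bounds and the variation spread below are invented placeholders for illustration; they do not reproduce the calibrated phenomenological model of [30].

```python
import numpy as np

rng = np.random.default_rng(1)

G_MIN, G_MAX = 1e-6, 1e-4          # conductance bounds in siemens (placeholders)

# Placeholder for the expected noise-free update D_m(G0, Vp, tp): larger
# pulses move G further, and the update saturates near the bounds (a stand-in
# for the non-linear conductance update, not the calibrated model of [30]).
def D_m(G0, Vp, tp):
    room = (G_MAX - G0) if Vp > 0 else (G0 - G_MIN)
    return np.sign(Vp) * 0.05 * room * (abs(Vp) * tp / 1e-6)

# Device-to-device term D_d2d: a zero-mean spread scaled per device,
# switched off when the mismatch/variation flag is cleared.
def D_d2d(G0, Vp, tp, spread, mismatch=True):
    if not mismatch:
        return np.zeros_like(G0)
    return spread * abs(D_m(G0, Vp, tp)) * rng.standard_normal(G0.shape)

def apply_pulse(G, Vp, tp, spread=0.3, mismatch=True):
    """One fixed write pulse: Eq. (4), Delta-G = D_m + D_d2d."""
    G = G + D_m(G, Vp, tp) + D_d2d(G, Vp, tp, spread, mismatch)
    return np.clip(G, G_MIN, G_MAX)

# Manhattan's rule: apply one fixed potentiating or depressing pulse per
# cell according to the *sign* of the weight gradient only (grad < 0 means
# the weight, and hence the conductance, should increase).
def manhattan_update(G, grad, Vp=1.0, tp=1e-6, mismatch=True):
    up = apply_pulse(G, +Vp, tp, mismatch=mismatch)
    down = apply_pulse(G, -Vp, tp, mismatch=mismatch)
    return np.where(grad < 0, up, np.where(grad > 0, down, G))

G = np.full((4, 4), 5e-5)
grad = rng.standard_normal((4, 4))
G_new = manhattan_update(G, grad)
```

Because only the gradient sign is used, the update reduces to selecting one of two fixed pulses per cell, which is what makes the scheme attractive for in-situ training on passive crossbars.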
Fig. 7. Synthetic images of different digits generated by the proposed GAN implementation on passive RRAM crossbar array exploiting true random
noise input to the generator network.
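The in-memory VMM performed on the crossbar during the forward pass and backward propagation can be illustrated with a standard differential conductance mapping. The conductance range and the linear weight-to-conductance scaling below are illustrative assumptions, not the mapping used in this work.

```python
import numpy as np

# Map a signed weight matrix onto two non-negative conductance matrices
# (G_pos, G_neg) and compute the VMM as a difference of column currents.
G_MIN, G_MAX = 1e-6, 1e-4   # siemens, illustrative placeholder range

def weights_to_conductances(W):
    w_max = np.abs(W).max() or 1.0
    scale = (G_MAX - G_MIN) / w_max
    G_pos = G_MIN + scale * np.clip(W, 0, None)   # positive weight part
    G_neg = G_MIN + scale * np.clip(-W, 0, None)  # negative weight part
    return G_pos, G_neg, scale

def crossbar_vmm(v, G_pos, G_neg, scale):
    """Ohm's law + Kirchhoff's current law: column currents I = v @ G;
    the differential pair cancels the common G_MIN offset."""
    I = v @ G_pos - v @ G_neg          # per-column current difference
    return I / scale                   # back to weight units

W = np.array([[0.5, -1.0], [2.0, 0.25]])
v = np.array([1.0, -0.5])
G_pos, G_neg, scale = weights_to_conductances(W)
out = crossbar_vmm(v, G_pos, G_neg, scale)
# out approximates v @ W
```

Since G_pos - G_neg = scale * W by construction, the differential read recovers the signed product exactly in this idealized sketch; on real arrays, the non-idealities discussed in Section III perturb this result.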
IV. CONCLUSIONS
In this work, we have proposed a highly scalable, compact
and energy-efficient GAN accelerator which performs the key
operations such as forward pass and backward propagation,
training and noise generation in-situ on a passive RRAM
crossbar array. Unlike the prior GAN implementations, we
have also evaluated the impact of the noise input used for
training the generator network on the performance of GAN.
Our extensive investigation utilizing an experimentally cali-
brated phenomenological model for passive RRAM crossbar
array reveals that training GAN with a true-random noise
input leads to a significant reduction in the training energy
without degrading the accuracy. Our results may encourage
experimental demonstration of GAN accelerators on passive
RRAM crossbar arrays.
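The in-situ true-random-noise generation summarized above can be sketched schematically: comparing consecutive noisy reads of nominally identical cells yields bits decided by read noise alone (cf. [26], [27]). The read-noise magnitude and the XOR whitening step below are assumptions for illustration, not the extraction scheme of this work.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative model: each read of an RRAM cell returns its conductance plus
# zero-mean read noise (random telegraph / thermal fluctuations). The
# 1%-of-G noise level is an assumption for illustration.
def read_conductance(G, noise_frac=0.01):
    return G * (1.0 + noise_frac * rng.standard_normal(G.shape))

def random_bits(G, n_reads):
    """Compare consecutive noisy reads of the same cells; the comparison
    outcome is decided by read noise alone, yielding raw random bits."""
    bits = []
    for _ in range(n_reads):
        a, b = read_conductance(G), read_conductance(G)
        bits.append((a > b).astype(np.uint8).ravel())
    raw = np.concatenate(bits)
    # Simple XOR whitening of bit pairs to reduce residual bias.
    half = len(raw) // 2
    return raw[0::2][:half] ^ raw[1::2][:half]

G = np.full((8, 8), 5e-5)            # 8x8 tile of nominally identical cells
bits = random_bits(G, n_reads=64)
noise = 2.0 * bits[:16].astype(float) - 1.0   # map bits to a +/-1 noise vector
```

A bit stream extracted this way inherits its entropy from physical fluctuations rather than a deterministic seed, which is the property the generator input exploits.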
[4] F. Chen, L. Song and Y. Chen, "ReGAN: A pipelined ReRAM-based accelerator for generative adversarial networks," 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), 2018, pp. 178-183. doi: 10.1109/ASPDAC.2018.8297302.
[5] Z. Fan, Z. Li, B. Li, Y. Chen and H. Li, "RED: A ReRAM-based deconvolution accelerator," 2019 Design, Automation and Test in Europe Conference and Exhibition (DATE), 2019, pp. 1763-1768. doi: 10.23919/DATE.2019.8715103.
[6] F. Chen, L. Song, H. Li and Y. Chen, "ZARA: A novel zero-free dataflow accelerator for generative adversarial networks in 3D ReRAM," 2019 56th ACM/IEEE Design Automation Conference (DAC), 2019, pp. 1-6.
[7] A. S. Rakin, S. Angizi, Z. He and D. Fan, "PIM-TGAN: A processing-in-memory accelerator for ternary generative adversarial networks," 2018 IEEE 36th International Conference on Computer Design (ICCD), 2018, pp. 266-273. doi: 10.1109/ICCD.2018.00048.
[8] O. Krestinskaya, B. Choubey and A. P. James, "Memristive GAN in analog," Scientific Reports, vol. 10, no. 1, pp. 1-14, 2020. doi: 10.1038/s41598-020-62676-7.
[9] Y. Lin et al., "Demonstration of generative adversarial network by intrinsic random noises of analog RRAM devices," 2018 IEEE International Electron Devices Meeting (IEDM), 2018, pp. 3.4.1-3.4.4. doi: 10.1109/IEDM.2018.8614483.
[10] H. Wu et al., "Device and circuit optimization of RRAM for neuromorphic computing," 2017 IEEE International Electron Devices Meeting (IEDM), 2017, pp. 11.5.1-11.5.4. doi: 10.1109/IEDM.2017.8268372.
[11] P. Yao, H. Wu, B. Gao, J. Tang, Q. Zhang, W. Zhang, J. J. Yang and H. Qian, "Fully hardware-implemented memristor convolutional neural network," Nature, vol. 577, pp. 641-646, 2020. doi: 10.1038/s41586-020-1942-4.
[12] M. Hu, C. E. Graves, C. Li, Y. Li, N. Ge, E. Montgomery, N. Davila, H. Jiang, R. S. Williams, J. J. Yang and Q. Xia, "Memristor-based analog computation and neural network classification with a dot product engine," Advanced Materials, vol. 30, 2018. doi: 10.1002/adma.201705914.
[13] S. Ambrogio, S. Balatti, V. Milo, R. Carboni, Z. Q. Wang, A. Calderoni, N. Ramaswamy and D. Ielmini, "Neuromorphic learning and recognition with one-transistor-one-resistor synapses and bistable metal oxide RRAM," IEEE Transactions on Electron Devices, vol. 63, no. 4, pp. 1508-1515, 2016. doi: 10.1109/TED.2016.2526647.
[14] F. Cai, J. M. Correll, S. H. Lee, Y. Lim, V. Bothra, Z. Zhang, M. P. Flynn and W. D. Lu, "A fully integrated reprogrammable memristor-CMOS system for efficient multiply-accumulate operations," Nature Electronics, vol. 2, pp. 290-299, 2019. doi: 10.1038/s41928-019-0270-x.
[15] H. Kim, H. Nili, M. Mahmoodi and D. Strukov, "4K-memristor analog-grade passive crossbar circuit," Nature Communications, vol. 12, no. 1, pp. 1-11, 2021. doi: 10.1038/s41467-021-25455-0.
[16] M. Prezioso et al., "Training and operation of an integrated neuromorphic network based on metal-oxide memristors," Nature, vol. 521, pp. 61-64, 2015. doi: 10.1038/nature14441.
[17] F. Alibart, E. Zamanidoost and D. Strukov, "Pattern classification by memristive crossbar circuits using ex situ and in situ training," Nature Communications, vol. 4, no. 1, pp. 1-7, 2013. doi: 10.1038/ncomms3072.
[18] F. M. Bayat, M. Prezioso, B. Chakrabarti, H. Nili, I. Kataeva and D. Strukov, "Implementation of multilayer perceptron network with highly uniform passive memristive crossbar circuits," Nature Communications, vol. 9, no. 1, pp. 1-7, 2018. doi: 10.1038/s41467-018-04482-4.
[19] P. M. Sheridan, F. Cai, C. Du, W. Ma, Z. Zhang and W. D. Lu, "Sparse coding with memristor networks," Nature Nanotechnology, vol. 12, no. 8, pp. 784-789, 2017. doi: 10.1038/nnano.2017.83.
[20] H. Yeon, P. Lin, C. Choi, S. H. Tan, Y. Park, D. Lee, J. Lee, F. Xu, B. Gao, H. Wu, H. Qian, Y. Nie, S. Kim and J. Kim, "Alloying conducting channels for reliable neuromorphic computing," Nature Nanotechnology, vol. 15, pp. 574-579, 2020. doi: 10.1038/s41565-020-0694-5.
[21] H. Nikam, S. Satyam and S. Sahay, "Long short-term memory implementation exploiting passive RRAM crossbar array," IEEE Transactions on Electron Devices. doi: 10.1109/TED.2021.3133197.
[22] M. Hu, J. P. Strachan, Z. Li, E. M. Grafals, N. Davila, C. Graves, S. Lam, N. Ge, J. J. Yang and R. S. Williams, "Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication," in Proc. 53rd ACM/IEEE Design Automation Conference (DAC), pp. 1-6, 2016. doi: 10.1145/2897937.2898010.
[23] M. J. Marinella et al., "Multiscale co-design analysis of energy, latency, area, and accuracy of a ReRAM analog neural training accelerator," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 8, no. 1, pp. 86-101, 2018. doi: 10.1109/JETCAS.2018.2796379.
[24] S. Sahay, M. Bavandpour, M. R. Mahmoodi and D. Strukov, "A 2T-1R cell array with high dynamic range for mismatch-robust and efficient neurocomputing," in Proc. IEEE Int. Memory Workshop (IMW), pp. 1-4, 2020. doi: 10.1109/IMW48823.2020.9108142.
[25] M. Bavandpour, S. Sahay, M. R. Mahmoodi and D. Strukov, "Efficient mixed-signal neurocomputing via successive integration and division," IEEE Transactions on VLSI Systems, vol. 28, no. 3, pp. 823-827, 2020. doi: 10.1109/TVLSI.2019.2946516.
[26] H. Nili, G. C. Adam, B. Hoskins, M. Prezioso, J. Kim, M. R. Mahmoodi, F. M. Bayat, O. Kavehei and D. B. Strukov, "Hardware-intrinsic security primitives enabled by analogue state and nonlinear conductance variations in integrated memristors," Nature Electronics, vol. 1, no. 3, pp. 197-202, 2018. doi: 10.1038/s41928-018-0039-7.
[27] S. Sahay, A. Kumar, V. Parmar and M. Suri, "OxRAM RNG circuits exploiting multiple undesirable nanoscale phenomena," IEEE Transactions on Nanotechnology, vol. 16, no. 4, pp. 560-566, 2017. doi: 10.1109/TNANO.2016.2647623.
[28] S. Sahay and M. Suri, "Recent trends in hardware security exploiting hybrid CMOS-resistive memory circuits," Semiconductor Science and Technology, vol. 32, no. 12, p. 123001, 2017. doi: 10.1088/1361-6641/aa8f07.
[29] I. Kataeva, F. Merrikh-Bayat, E. Zamanidoost and D. Strukov, "Efficient training algorithms for neural networks based on memristive crossbar circuits," in Proc. IEEE Int. Joint Conf. Neural Networks (IJCNN), pp. 1-8, 2015. doi: 10.1109/IJCNN.2015.7280785.
[30] H. Nili, A. F. Vincent, M. Prezioso, M. R. Mahmoodi, I. Kataeva and D. B. Strukov, "Comprehensive compact phenomenological modeling of integrated metal-oxide memristors," IEEE Transactions on Nanotechnology, vol. 19, pp. 344-349, 2020. doi: 10.1109/TNANO.2020.2982128.
[31] G. C. Adam, B. D. Hoskins, M. Prezioso, F. Merrikh-Bayat, B. Chakrabarti and D. B. Strukov, "3-D memristor crossbars for analog and neuromorphic computing applications," IEEE Transactions on Electron Devices, vol. 64, no. 1, pp. 312-318, 2016. doi: 10.1109/TED.2016.2630925.