
Energy-Efficient Implementation of Generative Adversarial Networks on Passive RRAM Crossbar Arrays


This paper was downloaded from TechRxiv (https://www.techrxiv.org).

LICENSE

CC BY 4.0

SUBMISSION DATE / POSTED DATE

13-04-2022 / 19-04-2022

CITATION

Sahay, Shubham; Nikam, Honey; Satyam, Siddharth (2022): Energy-Efficient Implementation of Generative
Adversarial Networks on Passive RRAM Crossbar Arrays. TechRxiv. Preprint.
https://doi.org/10.36227/techrxiv.19590667.v1

DOI

10.36227/techrxiv.19590667.v1

Energy-Efficient Implementation of Generative Adversarial Networks on Passive RRAM Crossbar Arrays

Siddharth Satyam, Honey Nikam and Shubham Sahay, Member, IEEE

Abstract— Generative algorithms such as GANs are at the cusp of the next revolution in the field of unsupervised learning and large-scale artificial data generation. However, the adversarial (competitive) co-training of the discriminative and generative networks in a GAN makes them computationally intensive and hinders their deployment on resource-constrained IoT edge devices. Moreover, the frequent data transfer between the discriminative and generative networks during training significantly degrades the efficacy of von-Neumann GAN accelerators such as those based on GPUs and FPGAs. Therefore, there is an urgent need for the development of ultra-compact and energy-efficient hardware accelerators for GANs. To this end, in this work, we propose to exploit passive RRAM crossbar arrays for performing the key operations of a fully-connected GAN: (a) true random noise generation for the generator network, (b) vector-by-matrix multiplication with unprecedented energy-efficiency during the forward pass and backward propagation, and (c) in-situ adversarial training using a hardware-friendly Manhattan's rule. Our extensive analysis utilizing an experimentally calibrated phenomenological model for the passive RRAM crossbar array reveals an unforeseen trade-off between the accuracy and the energy dissipated while training the GAN network with different noise inputs to the generator. Furthermore, our results indicate that the spatial and temporal variations and true random noise, which are otherwise undesirable for memory applications, boost the energy-efficiency of the GAN implementation on passive RRAM crossbar arrays without degrading its accuracy.

Index Terms— Generative Adversarial Networks, Passive RRAM crossbar, Manhattan's rule, True random noise.

Siddharth Satyam and Honey Nikam are with the Department of Mechanical Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India. Shubham Sahay is with the Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India (e-mail: [email protected]). This work was partially supported by the Semiconductor Research Corporation (SRP Task 3056.001) and IIT Kanpur.

I. INTRODUCTION

The unprecedented development in the field of supervised deep learning models, which rely on large labelled or annotated data sets, has transformed almost all facets of human endeavor in this era of the internet of things (IoT) and big data. However, their application is limited in domains where generating such labelled data is extremely difficult or costly. Therefore, unsupervised and semi-supervised generative models, which may learn the patterns or features in a complicated data set and generate high-quality artificial (fake) data bearing features similar to those of the original data set, have been extensively explored [1], [2]. Generative Adversarial Networks (GANs), a subclass of generative models, are considered one of the most promising approaches towards large-scale synthetic data generation and unsupervised/semi-supervised learning [1], [2]. GANs have been applied to solve a wide variety of problems including image synthesis and super-resolution, 3D object generation, image-to-text translation (and vice-versa), image-to-image translation, speech recognition, attention prediction, autonomous driving, etc. [1]–[3]. In GANs, two adversarial (competitive) networks are co-trained alternately: the generator network is trained to produce artificial (fake) data which may not be distinguishable from the original data set, while the discriminator network is trained to classify whether the input data belongs to the original data set or was obtained from the generator network. The training methodology for GANs involves movement of a large amount of data between the two adversarial networks, leading to significant memory and computational resource consumption [3]–[9], which restricts their application in IoT edge devices with limited energy and area. Therefore, the development of a compact and ultra-low-power GAN processing engine is indispensable for enabling unsupervised learning and generating artificial data on resource-constrained IoT edge devices.

The digital GAN accelerators based on FPGAs and GPUs exhibit significantly high latency and energy consumption owing to the intensive data shuffling between the storage and computational blocks due to their inherent von-Neumann architecture. Since vector-by-matrix multiplication (VMM) is the fundamental operation in GANs during the forward pass and backward propagation using optimization algorithms such as gradient descent, RMSprop, ADAM, etc. [1]–[3], the data shuffling and the energy consumption while training GANs can be significantly reduced by performing in-memory VMM operations exploiting cross-point arrays of emerging non-volatile memories. Recently, several innovative deep convolutional (DC) GAN architectures including layer-wise pipelined computations [4], efficient deconvolutional operation [5], a computational deformation technique to facilitate efficient utilization of computational resources in transpose convolution [6], and a ternary GAN [7] utilizing in-memory VMM engines based on RRAMs (with binary and 2-bit storage capability) and SOT-MRAMs were proposed. Moreover, a hybrid CMOS-analog RRAM-based implementation of DCGAN (without the pooling layer) including digital error propagation and weight

update units was also proposed [8]. However, the non-linearity in the RRAM conductance update during the training process was not considered, and the weight-sign crossbar and sequential reading of the output limit the efficacy of the analog DCGAN implementation.

Recently, a fully-connected GAN implementation utilizing the intrinsic read noise (conductance fluctuation during the read operation) and write noise (imprecise conductance tuning during the write operation) of an active (1T-1R) RRAM crossbar array was experimentally demonstrated in [9] for generating (3 classes of) artificial digital patterns of handwritten digits after training on a reduced MNIST dataset. An optimal level of write noise was found to increase the diversity in the generated patterns and mitigate the mode-dropping issue [9]. Although the selector MOSFET in the active (1T-1R) RRAM configuration reduces the cell leakage current, improves the tuning precision, provides current compliance, and facilitates partial/selective programming of the array for large-scale hardware demonstrations of neuromorphic networks [10]–[14], it also leads to a significantly large area overhead. The scalability of the selector MOSFET is limited since it has to provide large programming/forming currents to the RRAM device and sustain high voltages during forming/write operation [15].

On the other hand, the passive RRAM crossbar arrays exhibit a significantly reduced area, lower fabrication complexity and cost, and an inherent scaling benefit since they do not require a selector MOSFET [15]–[21]. However, unlike active (1T-1R) RRAM cells, the passive RRAM crossbar cells are susceptible to sneak-path leakage currents and half-select cell disturbance [16], [17]. Moreover, the spatial variation in the switching threshold and the limited crossbar yield degrade their performance [16], [17]. Nevertheless, the recent advancements in the fabrication process and material stacks for RRAMs, novel programming schemes and conductance mapping techniques have enabled the realization and CMOS BEOL integration of large passive RRAM crossbar arrays with high conductance tuning precision, large yield and uniformity, highly non-linear characteristics and significantly suppressed sneak-path leakage current [15], [18]–[20]. Considering the promising scaling prospects of the passive RRAM crossbar arrays, it becomes imperative to explore their potential for implementation of GANs. Moreover, the inherent spatial and temporal variations of the passive RRAM crossbar array can be exploited to extract a true random noise source for the generator network.

To this end, in this work, we develop a hardware-aware simulation framework and demonstrate a compact and ultra-energy-efficient fully-connected GAN utilizing passive RRAM crossbar arrays for synthetic image generation. We propose a methodology to generate true random noise using the passive RRAM crossbar array for input to the generator network, perform the VMM during the forward pass and backward propagation, and carry out in-situ adversarial training of the GAN using a hardware-friendly Manhattan's rule (fixed-pulse training) on passive RRAM crossbar arrays. While most hardware solutions aimed towards efficient implementation of GANs focus on the discriminative networks, we also investigated the impact of different (pseudo-random and true-random) noise inputs to the generator network on the training energy and accuracy of the fully-connected GAN. Our extensive analysis utilizing an experimentally calibrated phenomenological model for the passive RRAM crossbar array (which accurately captures non-ideal effects such as spatial and temporal device variations and noise) indicates that the proposed GAN implementation with true random noise input to the generator and device-to-device variations in the passive RRAM crossbar array exhibits a significantly enhanced energy-efficiency with accuracy comparable to the software implementation. Our results may provide incentive for experimental demonstration of GANs on passive RRAM crossbar arrays.

The manuscript is organized as follows: Sections I.A and I.B provide a brief overview of the fully-connected GANs and passive RRAM crossbar arrays, respectively. The intricate details regarding the simulation framework developed in this work utilizing an experimentally calibrated phenomenological model for the passive RRAM crossbar array are discussed in Section II. The performance estimates of the proposed GAN implementation using passive RRAM crossbar arrays for important metrics such as accuracy, diversity in the generated images, area and energy are reported in Section III, while the conclusions are drawn in Section IV.

A. Generative Adversarial Networks (GANs)

Fig. 1. The schematic representation of a fully-connected GAN. The generator produces a fake image while the discriminator gives the output probability for both real and fake image inputs (represented by the black boxes). These are fed back during backward propagation (represented by the dashed blue lines).

Fully-connected GANs rely on an adversarial training process that involves a zero-sum game between two multi-layer perceptrons: the generator and the discriminator network, as shown in Fig. 1. Two conflicting objectives must be fulfilled while training a fully-connected GAN: the discriminator should be able to predict accurately whether the data produced by the generator is real or fake, and the generator should be able to synthesize artificial data that contains features which are indistinguishable from the original data and deceive the discriminator into classifying it as real data. The generator network is fed with a noise input and produces a fake image based on its weights and parameters. These fake images are fed to the discriminator along with the real images and the discriminator

Fig. 2. 3D view of the passive RRAM crossbar array based on the Pt/Al2O3/TiO2−x/Ti/Pt stack used in this work.

assigns appropriate labels to these images (whether fake or real). The cost function used for training can be formulated as [2]:

Cost = Ex∼Pdata log(D(x)) + Ez∼Pz log(1 − D(G(z)))    (1)

where G(z; Wg) represents the mapping of an input vector consisting of randomly distributed data from the noise variable Pz to the generator output based on the parameters Wg, and D(x; Wd) represents a mapping from the data space to a scalar quantity which indicates the probability that the input image to the discriminator belongs to the real data set (or the artificial data provided by the generator). The cost function represented by equation (1) is a "minimax" game in which the generator tries to minimise the cost (such that D(G(z)) → 1 and Cost → −∞) while the discriminator tries to maximise the cost (D(G(z)) → 0), and the game concludes when the system reaches the Nash equilibrium [9].

B. Passive RRAM crossbar arrays

Filamentary resistive RAMs (RRAMs) are metal-insulator-metal (MIM) structures in which the insulator (typically a non-stoichiometric transition metal oxide) exhibits a reversible switching between different resistance-states with the aid of electrical pulses. RRAM devices can be arranged in two configurations: (a) an active 1T-1R crossbar, where the RRAM devices are integrated on top of the drain of a selector MOSFET which provides efficient current compliance and precise conductance-tuning capability at the cost of an increased area overhead, and (b) a passive crossbar array, in which the RRAMs are realized at the intersection of orthogonal word lines (WLs) and bit lines (BLs) as shown in Fig. 2. The passive RRAM crossbar provides an inherent scaling benefit since the footprint of the RRAM device is dictated by the width of the metal interconnects rather than a selector MOSFET. Moreover, the VMM operation can be performed in-situ on a passive RRAM crossbar array exploiting physical laws with an unprecedented energy-efficiency by encoding the inputs as WL voltages and the weights as the conductance of the RRAM devices [22]–[25].

II. MODELING APPROACH AND SIMULATION FRAMEWORK

Fig. 3. Proposed scheme for generation of true-random noise for the generator network.

Considering the scaling prospects and ultra-high energy-efficiency of the passive RRAM crossbar arrays while performing in-situ computations, we implemented a vanilla GAN to synthesize handwritten digits from the MNIST data set. The MNIST data set is a collection of 28×28 pixel images of handwritten digits, which were flattened into 784-dimensional vectors and then normalized to the range [-1, 1].

The generator network in a GAN model creates a mapping between an input latent space and an output sample space. The inputs are provided from a random distribution (noise) to ensure that the generator takes a different input at each iteration and generates a different instance of output data. The weights of the generator network are then trained considering the output probability of the discriminator corresponding to the input fed by the generator at each iteration. In our implementation, the random input is a 1D array, and the output is a flattened 2D image (28 × 28 pixels) arranged as a 1D array with 784 elements. While most of the prior hardware GAN implementations have focused only on the discriminator network or the deconvolution operation of the generator network, in this work, we have also explored the impact of the input noise distribution of the generator network on the energy consumption and the accuracy of the GAN implementation. We have considered two types of input noise distribution: (a) pseudo-random noise input, where the samples are generated using a software pseudo random number generator (PRNG) from a standard normal distribution N(0, 1) with a predefined seed, and (b) true-random noise input, where the samples are generated exploiting the inherent spatial and temporal variations in the passive RRAM crossbar array.

We propose a novel methodology to generate the random noise inputs from the passive RRAM crossbar array as shown in Fig. 3. We randomly select n (out of N) columns from the passive RRAM crossbar array in each iteration and divide them into two groups of n/2 columns each. We program all the RRAM cells in the two sets of n/2 columns using identical write pulses in an attempt to realize cells with the same conductance-states (within the range of 150 µS to 300 µS) in

both groups. However, owing to the spatial (device-to-device) variation in the switching threshold voltage of the RRAMs in the array, the RRAM cells at different locations in the array exhibit different conductance-states [16], [26]. The random input bit is then generated by applying a read pulse to m rows (selected randomly out of N rows in each iteration) and comparing the integrated current from the two groups of n/2 columns. The temporal read current fluctuation and random telegraph noise (RTN) further add to the entropy and enhance the randomness [27], [28].
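The comparison-based bit generation described above can be sketched in a few lines of NumPy. The 64×64 array size and the 150–300 µS conductance window follow the text; the variation and read-noise magnitudes (SIGMA_D2D, SIGMA_READ) are illustrative assumptions, not values from the calibrated model:

```python
import numpy as np

rng = np.random.default_rng()  # software entropy standing in for physical device noise

# Assumed parameters: array size and conductance window from the text,
# noise magnitudes are hypothetical placeholders.
N = 64                  # rows = columns of one crossbar
G_MIN, G_MAX = 150e-6, 300e-6
SIGMA_D2D = 10e-6       # device-to-device spread after identical write pulses
SIGMA_READ = 1e-6       # temporal read fluctuation / RTN per read


def random_bit(n=8, m=8):
    """One true-random bit from comparing two groups of n/2 columns."""
    target = rng.uniform(G_MIN, G_MAX)            # identical write pulses target one state
    cols = rng.choice(N, size=n, replace=False)   # n randomly selected columns
    # After programming, each cell deviates from the target (spatial variation).
    G = np.clip(target + SIGMA_D2D * rng.standard_normal((N, len(cols))), G_MIN, G_MAX)
    rows = rng.choice(N, size=m, replace=False)   # read pulse applied to m random rows
    G_read = G[rows] + SIGMA_READ * rng.standard_normal((m, n))  # temporal noise
    # Compare the integrated currents of the two groups of n/2 columns.
    i_a = G_read[:, : n // 2].sum()
    i_b = G_read[:, n // 2 :].sum()
    return int(i_a > i_b)


noise = np.array([random_bit() for _ in range(100)])  # a 100-bit generator input
```

Because the two groups are programmed identically, the sign of the current difference is decided only by the spatial and temporal variations, which is what makes the output bit unpredictable.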
We initialise a generator network of dimension [100, 128, 784] with random weights and feed it with a noise input of size 100 bits generated either through the software PRNG or the proposed true random noise generator (TRNG) utilising the passive RRAM crossbar array. The forward propagation of inputs, represented by the affine transformation z in equation (2), is followed by the application of an activation function a = f(z) at each layer:

z = Wᵀ · x + b    (2)

where b represents the biases and W represents the weights, which are stored in the passive RRAM crossbar array in the form of conductance-states in the proposed GAN implementation. For any practical application, the fully-connected GAN implementations require a large number of weights which cannot be accommodated on a single passive RRAM crossbar array. Therefore, we utilize 54 (64 × 64 [15]) RRAM crossbar arrays in the proposed GAN implementation. Moreover, for extracting optimal performance, the conductance values of the RRAMs are chosen in the range of Gmin = 150 µS to Gmax = 300 µS. Since conductance values are always positive while the weights of a neural network can be bipolar in software, we propose a novel conductance-to-weight mapping scheme f : G → W as:

Wij = ±{Wmin + ((Gij − Gmin)/(Gmax − Gmin)) · (Wmax − Wmin)}    (3)

where Wmax and Wmin are the maximum and minimum weights used during the training process. While Wmin is kept at 0 for both the discriminator and the generator, we have clipped Wmax to 0.4 for the generator and 0.15 for the discriminator. The proposed mapping scheme requires that the weights do not change their sign during the training process, i.e. positive weights remain positive and negative weights remain negative throughout the training of the GAN. The + and − signs in equation (3) represent the mapping of positive and negative weights from the conductances. Using this mapping scheme, we assign separate RRAM cells for encoding the positive and negative weights on the same passive RRAM crossbar, unlike the differential scheme where the positive and negative weights are stored in different crossbars.

A 784-dimensional vector which represents the fake image is produced at the output of the generator after the forward propagation. The generator output along with the real image (from the flattened 784-dimensional MNIST training data) are then fed to the discriminator network with a dimension of [784, 128, 1]. The discriminator network generates a scalar output which indicates the probability of the input image being real or fake. The outputs of the discriminator and the generator networks are then used as inputs for the back propagation step to determine the gradients and train the GAN.

Fig. 4. The flowchart of the weight update process used in this work. The superscripts k and k+1 indicate the kth and (k+1)th iterations while ∆G represents the absolute conductance change.

We use a hardware-friendly fixed-amplitude in-situ training methodology, also known as the Manhattan's rule [29], which only requires the calculation of the sign of the weight gradient (∆W) for tuning the weights (conductance-states of the RRAM cells encoding the weights in the passive crossbar array). Depending on the sign of the gradient and the conductance-state of the RRAM cell, the update rule (shown in Fig. 4) for a positive weight can be given as:

G = Go + ∆G(Go, Vset, tp),    if ∆W > 0
G = Go − ∆G(Go, Vreset, tp),  if ∆W < 0
G = Gmin,                     if Go − ∆G < Gmin
G = Gmax,                     if Go + ∆G > Gmax

whereas the update rule for a negative weight can be given as:

G = Go − ∆G(Go, Vset, tp),    if ∆W > 0
G = Go + ∆G(Go, Vreset, tp),  if ∆W < 0
G = Gmin,                     if Go − ∆G < Gmin
G = Gmax,                     if Go + ∆G > Gmax

where Go is the conductance-state of the RRAM cell, ∆G is the absolute change in the conductance value, Vset = 0.8 V and Vreset = −0.8 V are the amplitudes of the set and reset (fixed-amplitude) pulses, respectively, and tp = 100 ns is the pulse width. For evaluating the change in the conductance value ∆G upon the application of the fixed-amplitude pulses, we use an experimentally calibrated comprehensive phenomenological model for the passive RRAM crossbar array based on the Pt/Al2O3/TiO2−x/Ti/Pt stack [30]. The model not only captures the static characteristics including noise, but also reproduces the experimentally observed dynamic set/reset/conductance-tuning behavior including device-to-device variations and non-linearity for more than 324

Fig. 5. Conductance evolution while training the GAN implemented on 54 (64 × 64) RRAM crossbar arrays for the generator network with (a) true random noise input considering device-to-device variations and (b) pseudo random noise input.

RRAMs across ≈2 million data points [30]. The change in the conductance-state ∆G follows the dynamic equation [30]:

∆G = Dm(G0, Vp, tp) + Dd2d(G0, Vp, tp)    (4)

where Dm is the expected noise-free absolute conductance change, which depends on the amplitude Vp and duration tp of the voltage pulse as well as the conductance-state, and Dd2d represents the device-to-device variations for different RRAMs on the passive crossbar array.

III. RESULTS AND DISCUSSION

We perform an extensive analysis of the proposed GAN implementation on the passive RRAM crossbar array utilizing the hardware-aware simulation framework developed in Section II. We compare the performance of the GAN implementation with different noise inputs to the generator in terms of metrics such as accuracy, energy consumption and area. Moreover, to investigate the impact of device-to-device variations on the efficacy of the GAN, we analyse the performance of the proposed GAN implementation both in the presence and absence of spatial variations (by appropriately switching the mismatch/variation flag in the compact model).

Fig. 6. The synthetic images of digit "3" generated by (a) the software GAN implementation and the GAN implementation based on the passive RRAM crossbar array utilizing (b) pseudo random normal noise input to the generator network, (c) true random noise input to the generator network without considering the spatial variations, and (d) true random noise input to the generator network while considering the device-to-device variations.

For efficient performance benchmarking, we train (a) a software GAN, and the GAN implementations based on the passive RRAM crossbar array with (b) pseudo-random noise input, (c) true-random noise input without considering device-to-device variations and (d) true-random noise input considering device-to-device variations for synthesizing the digit "3", and divide the training set from the MNIST data set into 10 batches (of size 608). The evolution of the conductance-states of the cells in

54 (64×64) passive RRAM crossbar arrays during the training of the GAN with pseudo-random noise input and true-random noise input in the presence of spatial variations are shown in Fig. 5. It can be observed that different noise inputs to the generator network during training lead to a significantly different conductance-state distribution of the RRAM cells (which represent the weight matrices). This results in synthesized images with different quality and features for the different GAN implementations, as shown in Fig. 6. Moreover, the representative images generated by the GAN implementation on the passive RRAM crossbar array with true random noise input to the generator after training on the MNIST dataset are also shown in Fig. 7. As can be observed from Fig. 7, not all classes of synthesized images are generated with the same accuracy.

Fig. 7. Synthetic images of different digits generated by the proposed GAN implementation on the passive RRAM crossbar array exploiting true random noise input to the generator network.
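The weight matrices mentioned above are recovered from the stored conductance-states through the mapping of equation (3). A minimal sketch of that read-out, using the conductance window and clipping values stated in Section II (the example conductances and sign pattern here are synthetic stand-ins, not data from the simulation framework):

```python
import numpy as np

G_MIN, G_MAX = 150e-6, 300e-6   # conductance window from the text (siemens)


def conductance_to_weight(G, sign, w_min=0.0, w_max=0.4):
    """Equation (3): map conductances to (signed) weights.

    `sign` is +1/-1 per cell, since separate cells on the same crossbar
    encode positive and negative weights; w_max = 0.4 corresponds to the
    generator (0.15 would be used for the discriminator).
    """
    frac = (G - G_MIN) / (G_MAX - G_MIN)          # normalized position in window
    return sign * (w_min + frac * (w_max - w_min))


# Synthetic example: a 4x4 block of conductances and a fixed sign pattern.
rng = np.random.default_rng(0)
G = rng.uniform(G_MIN, G_MAX, size=(4, 4))
sign = np.where(rng.random((4, 4)) < 0.5, 1.0, -1.0)
W = conductance_to_weight(G, sign)
```

Note that the mapping is affine in G, so a cell at Gmin reads out as a zero weight and a cell at Gmax reads out as ±Wmax, which is why the sign of a weight can never flip during training under this scheme.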

A. Accuracy

Evaluation of the quality of fake images generated by a GAN model is a challenging task. While the method of visual inspection can give a qualitative judgement, several quantitative techniques have also been reported recently [9]. In this work, we use an image classification-based methodology to estimate the accuracy of the GAN with the aid of features extracted from the generated images. We trained a multi-layer perceptron on the MNIST dataset and used it to extract the features of the fake images and classify them into different class labels. As can be observed from Fig. 8, although the software implementation of the GAN shows the highest accuracy after training for 10 batches (1 epoch), the proposed GAN implementation on the passive RRAM crossbar array utilizing in-situ training exhibits comparable accuracy when its generator network is fed with true random noise input generated from the RRAM crossbar. Furthermore, the large device-to-device variations in the RRAM switching threshold across the array degrade the accuracy of the GAN implementation. However, the GAN implementation based on the passive RRAM crossbar array with true-random noise input still exhibits better accuracy as compared to the GAN implementation with pseudo-random noise input, even in the presence of hardware imperfections such as device-to-device variations, noise and non-linearity. Moreover, the accuracy of all the GAN implementations converges towards their optimal value after training for a large number of batches.

Fig. 8. Evolution of the accuracy of different GAN implementations during the training process.

B. Energy consumption

The conventional von-Neumann GAN implementations such as those based on CPUs and GPUs involve frequent data transfer between the memory and processing units, significantly increasing their energy consumption and latency. Moreover, simultaneous training of two adversarial networks results in a considerably large number of weight updates during the training process with massive exchange of parameters between the generator and the discriminator. Therefore, the training process dominates the energy landscape of the hardware GAN implementations. However, the forward pass and the backward propagation in the proposed GAN implementation based on a passive RRAM crossbar array are inherently non-von-Neumann and significantly reduce the data movement between the processing and storage units, increasing its energy-efficiency. Furthermore, the hardware-friendly in-situ training approach utilizing the Manhattan's rule eliminates the need for calculation of the exact values of the weight gradients in the peripheral circuitry, further reducing the energy consumption. However, during the in-situ training using Manhattan's rule, set/reset voltage pulses are applied to the RRAM cells in the crossbar array after each batch (weight-update iteration) to change their conductance-state according to the sign of the weight gradients, which leads to an energy consumption given by:

Ei = Vset² · Gi−1 · tp,    if Gi > Gi−1
Ei = Vreset² · Gi−1 · tp,  if Gi < Gi−1
Ei = 0,                    otherwise

where Vset and Vreset are the amplitudes of the set/reset voltage pulses and tp is the time period of the pulse applied to change the conductance-state from Gi−1 to Gi. Owing to

the large number of weight updates, the energy consumed during the conductance-update process dominates the energy landscape of the proposed GAN implementation on the passive RRAM crossbar array.
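The per-pulse energy expression above, combined with the fixed-amplitude update rule of Section II, can be sketched as follows. The simple multiplicative conductance-update model (`delta_g`) is a purely illustrative stand-in for the calibrated phenomenological model of [30]; the pulse amplitudes, pulse width and conductance window follow the text:

```python
import numpy as np

G_MIN, G_MAX = 150e-6, 300e-6        # conductance window (S), from the text
V_SET, V_RESET = 0.8, -0.8           # fixed pulse amplitudes (V)
T_P = 100e-9                         # pulse width (s)


def delta_g(g):
    """Illustrative stand-in for the calibrated dG(Go, Vp, tp) model [30]."""
    return 0.02 * g                  # a few percent change per fixed pulse


def manhattan_update(g, grad_sign, weight_sign=+1):
    """One fixed-amplitude update following only the sign of the gradient."""
    step = delta_g(g)
    # For a positive weight, dW > 0 fires a set pulse (conductance up) and
    # dW < 0 a reset pulse; the roles are mirrored for a negative weight.
    up = (grad_sign > 0) if weight_sign > 0 else (grad_sign < 0)
    g_new = float(np.clip(g + step if up else g - step, G_MIN, G_MAX))
    if g_new == g:
        return g, 0.0                # clipped at a bound: no state change, no energy
    # Energy of the applied pulse: V^2 * G_(i-1) * t_p, per the equation above.
    v = V_SET if g_new > g else V_RESET
    return g_new, v**2 * g * T_P


# A short sequence of gradient signs for one cell encoding a positive weight.
g, total_energy = 200e-6, 0.0
for sign in (+1, +1, -1, +1):
    g, e = manhattan_update(g, sign)
    total_energy += e
```

Summing the per-pulse energies over all cells and all weight-update iterations gives the batch-training energy discussed next.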
Utilizing the above equation, the energy consumed during the training process for each batch (weight-update iteration) was calculated for the GAN implementations with (a) pseudo-random noise inputs to the generator, and true-random noise input to the generator (b) considering spatial variations in the RRAM cells and (c) without considering device-to-device variations, as shown in Fig. 9. While the batch-training energy increases with the number of batches for the GAN implementation with a pseudo-random noise input to the generator, training the GAN with a true-random noise input to the generator leads to a reduction in the batch-training energy as the training progresses. This can also be inferred from the conductance evolution trends for the two cases as shown in Fig. 5. While the conductance-states change significantly while training the GAN with pseudo-random noise inputs from epoch 10 to epoch 20, the conductance update is rather gradual while training with true-random noise inputs.

Fig. 9. Energy dissipation in the passive RRAM crossbar array in different batches while training the different GAN implementations.

The cumulative energy consumption during the training process, until the GAN implementations converge to their optimal accuracy, is also shown in Fig. 10. The energy consumption is lower when the GAN implementation is trained with true-random noise inputs (48.34 µJ) as compared to the GAN trained with pseudo-random noise input (52.06 µJ). Moreover, device-to-device variations do not lead to a significant change in the energy consumption of the proposed GAN implementation on the passive RRAM crossbar array.

Fig. 10. Cumulative energy dissipated in the passive RRAM crossbar array while training the different GAN implementations.

C. Area

For practical applications such as image super-resolution, etc., GANs need to generate high-resolution images, which requires large weight matrices between the different layers of the adversarial networks. Therefore, a highly scalable and compact realization of the synaptic elements (representing weights) is crucial for area-efficient hardware GAN implementations. Since a single RRAM cell in a passive RRAM crossbar array occupies an area of 0.36 µm² (0.6 µm × 0.6 µm), a 64×64 crossbar has a footprint of 1474.56 µm². The total area occupied by the proposed GAN implementation with 54 such crossbars is 0.079 mm², which is extremely low as compared to the area occupied by an active 1T-1R array of similar size [10]. Moreover, the area-efficiency of the proposed implementation can be further enhanced by utilizing 3D-integration of several layers of RRAM cells [31].

IV. CONCLUSIONS

In this work, we have proposed a highly scalable, compact and energy-efficient GAN accelerator which performs the key operations such as forward pass and backward propagation, training and noise generation in-situ on a passive RRAM crossbar array. Unlike the prior GAN implementations, we have also evaluated the impact of the noise input used for training the generator network on the performance of the GAN. Our extensive investigation utilizing an experimentally calibrated phenomenological model for the passive RRAM crossbar array reveals that training the GAN with a true-random noise input leads to a significant reduction in the training energy without degrading the accuracy. Our results may encourage experimental demonstration of GAN accelerators on passive RRAM crossbar arrays.

REFERENCES

[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," Advances in Neural Information Processing Systems, vol. 27, 2014. doi:10.5555/2969033.2969125.
[2] A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, "Generative adversarial networks: An overview," IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 53-65, 2018. doi:10.1109/MSP.2017.2765202.
[3] N. Shrivastava, M. A. Hanif, S. Mittal, S. R. Sarangi, and M. Shafique, "A survey of hardware architectures for generative adversarial networks," Journal of Systems Architecture, vol. 118, p. 102227, 2021. doi:10.1016/j.sysarc.2021.102227.
8

[4] F. Chen, L. Song and Y. Chen, "ReGAN: A pipelined ReRAM-based accelerator for generative adversarial networks," 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), 2018, pp. 178-183. doi: 10.1109/ASPDAC.2018.8297302.
[5] Z. Fan, Z. Li, B. Li, Y. Chen and H. Li, "RED: A ReRAM-based Deconvolution Accelerator," 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2019, pp. 1763-1768. doi: 10.23919/DATE.2019.8715103.
[6] F. Chen, L. Song, H. Li and Y. Chen, "ZARA: A Novel Zero-free Dataflow Accelerator for Generative Adversarial Networks in 3D ReRAM," 2019 56th ACM/IEEE Design Automation Conference (DAC), 2019, pp. 1-6.
[7] A. S. Rakin, S. Angizi, Z. He and D. Fan, "PIM-TGAN: A Processing-in-Memory Accelerator for Ternary Generative Adversarial Networks," 2018 IEEE 36th International Conference on Computer Design (ICCD), 2018, pp. 266-273. doi: 10.1109/ICCD.2018.00048.
[8] O. Krestinskaya, B. Choubey, and A. P. James, "Memristive GAN in analog," Scientific Reports, vol. 10, no. 1, pp. 1-14, 2020. doi:10.1038/s41598-020-62676-7.
[9] Y. Lin et al., "Demonstration of Generative Adversarial Network by Intrinsic Random Noises of Analog RRAM Devices," 2018 IEEE International Electron Devices Meeting (IEDM), 2018, pp. 3.4.1-3.4.4. doi: 10.1109/IEDM.2018.8614483.
[10] H. Wu et al., "Device and circuit optimization of RRAM for neuromorphic computing," 2017 IEEE International Electron Devices Meeting (IEDM), 2017, pp. 11.5.1-11.5.4. doi: 10.1109/IEDM.2017.8268372.
[11] P. Yao, H. Wu, B. Gao, J. Tang, Q. Zhang, W. Zhang, J. J. Yang and H. Qian, "Fully hardware-implemented memristor convolutional neural network," Nature, vol. 577, pp. 641-646, 2020. doi:10.1038/s41586-020-1942-4.
[12] M. Hu, C. E. Graves, C. Li, Y. Li, N. Ge, E. Montgomery, N. Davila, H. Jiang, R. S. Williams, J. J. Yang and Q. Xia, "Memristor-based analog computation and neural network classification with a dot product engine," Advanced Materials, vol. 30, 2018. doi:10.1002/adma.201705914.
[13] S. Ambrogio, S. Balatti, V. Milo, R. Carboni, Z. Q. Wang, A. Calderoni, N. Ramaswamy and D. Ielmini, "Neuromorphic learning and recognition with one-transistor-one-resistor synapses and bistable metal oxide RRAM," IEEE Transactions on Electron Devices, vol. 63, no. 4, pp. 1508-1515, 2016. doi: 10.1109/TED.2016.2526647.
[14] F. Cai, J. M. Correll, S. H. Lee, Y. Lim, V. Bothra, Z. Zhang, M. P. Flynn and W. D. Lu, "A fully integrated reprogrammable memristor-CMOS system for efficient multiply-accumulate operations," Nature Electronics, vol. 2, pp. 290-299, 2019. doi:10.1038/s41928-019-0270-x.
[15] H. Kim, H. Nili, M. Mahmoodi and D. Strukov, "4K-memristor analog-grade passive crossbar circuit," Nature Communications, vol. 12, no. 1, pp. 1-11, 2021. doi:10.1038/s41467-021-25455-0.
[16] M. Prezioso et al., "Training and operation of an integrated neuromorphic network based on metal-oxide memristors," Nature, vol. 521, pp. 61-64, 2015. doi:10.1038/nature14441.
[17] F. Alibart, E. Zamanidoost, and D. Strukov, "Pattern classification by memristive crossbar circuits using ex situ and in situ training," Nature Communications, vol. 4, no. 1, pp. 1-7, 2013. doi:10.1038/ncomms3072.
[18] F. M. Bayat, M. Prezioso, B. Chakrabarti, H. Nili, I. Kataeva and D. Strukov, "Implementation of multilayer perceptron network with highly uniform passive memristive crossbar circuits," Nature Communications, vol. 9, no. 1, pp. 1-7, 2018. doi:10.1038/s41467-018-04482-4.
[19] P. M. Sheridan, F. Cai, C. Du, W. Ma, Z. Zhang and W. D. Lu, "Sparse coding with memristor networks," Nature Nanotechnology, vol. 12, no. 8, pp. 784-789, 2017. doi:10.1038/nnano.2017.83.
[20] H. Yeon, P. Lin, C. Choi, S. H. Tan, Y. Park, D. Lee, J. Lee, F. Xu, B. Gao, H. Wu, H. Qian, Y. Nie, S. Kim and J. Kim, "Alloying conducting channels for reliable neuromorphic computing," Nature Nanotechnology, vol. 15, pp. 574-579, 2020. doi:10.1038/s41565-020-0694-5.
[21] H. Nikam, S. Satyam and S. Sahay, "Long Short-Term Memory Implementation Exploiting Passive RRAM Crossbar Array," IEEE Transactions on Electron Devices, 2021. doi: 10.1109/TED.2021.3133197.
[22] M. Hu, J. P. Strachan, Z. Li, E. M. Grafals, N. Davila, C. Graves, S. Lam, N. Ge, J. J. Yang and R. S. Williams, "Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication," in Proc. 53rd ACM/IEEE Design Automation Conference (DAC), pp. 1-6, 2016. doi:10.1145/2897937.2898010.
[23] M. J. Marinella et al., "Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 8, no. 1, pp. 86-101, 2018. doi: 10.1109/JETCAS.2018.2796379.
[24] S. Sahay, M. Bavandpour, M. R. Mahmoodi and D. Strukov, "A 2T-1R cell array with high dynamic range for mismatch-robust and efficient neurocomputing," in Proc. IEEE Int. Memory Workshop (IMW), pp. 1-4, 2020. doi:10.1109/IMW48823.2020.9108142.
[25] M. Bavandpour, S. Sahay, M. R. Mahmoodi and D. Strukov, "Efficient mixed-signal neurocomputing via successive integration and division," IEEE Transactions on VLSI Systems, vol. 28, no. 3, pp. 823-827, 2020. doi:10.1109/TVLSI.2019.2946516.
[26] H. Nili, G. C. Adam, B. Hoskins, M. Prezioso, J. Kim, M. R. Mahmoodi, F. M. Bayat, O. Kavehei, and D. B. Strukov, "Hardware-intrinsic security primitives enabled by analogue state and nonlinear conductance variations in integrated memristors," Nature Electronics, vol. 1, no. 3, pp. 197-202, 2018. doi: 10.1038/s41928-018-0039-7.
[27] S. Sahay, A. Kumar, V. Parmar, and M. Suri, "OxRAM RNG circuits exploiting multiple undesirable nanoscale phenomena," IEEE Transactions on Nanotechnology, vol. 16, no. 4, pp. 560-566, 2017. doi: 10.1109/TNANO.2016.2647623.
[28] S. Sahay and M. Suri, "Recent trends in hardware security exploiting hybrid CMOS-resistive memory circuits," Semiconductor Science and Technology, vol. 32, no. 12, p. 123001, 2017. doi: 10.1088/1361-6641/aa8f07.
[29] I. Kataeva, F. Merrikh-Bayat, E. Zamanidoost and D. Strukov, "Efficient training algorithms for neural networks based on memristive crossbar circuits," in Proc. IEEE Int. Joint Conf. Neural Networks (IJCNN), pp. 1-8, 2015. doi: 10.1109/IJCNN.2015.7280785.
[30] H. Nili, A. F. Vincent, M. Prezioso, M. R. Mahmoodi, I. Kataeva and D. B. Strukov, "Comprehensive compact phenomenological modeling of integrated metal-oxide memristors," IEEE Transactions on Nanotechnology, vol. 19, pp. 344-349, 2020. doi: 10.1109/TNANO.2020.2982128.
[31] G. C. Adam, B. D. Hoskins, M. Prezioso, F. Merrikh-Bayat, B. Chakrabarti and D. B. Strukov, "3-D memristor crossbars for analog and neuromorphic computing applications," IEEE Transactions on Electron Devices, vol. 64, no. 1, pp. 312-318, 2016. doi:10.1109/TED.2016.2630925.
