
Quantum-Classical Hybrid Machine Learning for Image Classification

(ICCAD Special Session Paper)

Mahabubul Alam¹, Satwik Kundu¹, Rasit Onur Topaloglu², Swaroop Ghosh¹
¹School of Electrical Engineering and Computer Science, Penn State University, University Park
²IBM Corporation
[email protected], [email protected], [email protected], [email protected]

arXiv:2109.02862v3 [cs.CV] 17 Sep 2021

Abstract—Image classification is a major application domain for conventional deep learning (DL). Quantum machine learning (QML) has the potential to revolutionize image classification. In a typical DL-based image classification pipeline, we use a convolutional neural network (CNN) to extract features from the image and a multi-layer perceptron (MLP) network to create the actual decision boundaries. QML models can be useful in both of these tasks. On one hand, convolution with parameterized quantum circuits (Quanvolution) can extract rich features from the images. On the other hand, quantum neural network (QNN) models can create complex decision boundaries. Therefore, Quanvolution and QNN can be combined into an end-to-end QML model for image classification. Alternatively, we can extract image features separately using classical dimension reduction techniques such as Principal Component Analysis (PCA) or a Convolutional Autoencoder (CAE) and use the extracted features to train a QNN. We review two proposals on quantum-classical hybrid ML models for image classification, namely, the Quanvolutional Neural Network and dimension reduction using a classical algorithm followed by a QNN. In particular, we make a case for trainable filters in Quanvolution and for CAE-based feature extraction for image datasets (instead of dimension reduction using linear transformations such as PCA). We discuss various design choices, potential opportunities, and drawbacks of these models. We also release a Python-based framework to create and explore these hybrid models with a variety of design choices.

I. INTRODUCTION

Quantum computing is a new computing paradigm with tremendous future potential. Even though the technology is still in a nascent stage, the community is seeking computational advantage from quantum computers (i.e., quantum supremacy) for practical applications. Recently, Google claimed quantum supremacy with a 53-qubit quantum processor by completing a specific computation in 200 seconds that might take 10K years [1] (later rectified to 2.5 days [2]) on a state-of-the-art supercomputer.

The near-term quantum devices have a limited number of qubits. Moreover, they suffer from various types of noise (decoherence, gate errors, measurement errors, crosstalk, etc.). Due to these limitations, these machines are not yet suitable for executing quantum algorithms that rely on high orders of error correction (e.g., Shor's factorization, Grover's search). Quantum machine learning (QML) promises to achieve quantum advantage with near-term machines because it is based on a variational principle (similar to other near-term algorithms such as the Quantum Approximate Optimization Algorithm or QAOA [3] and the Variational Quantum Eigensolver or VQE [4]) that does not necessitate error correction [5].

Image classification is one of the most useful ML tasks, with wide applications in autonomous driving [7], [8], medical diagnostics [9], [10], and biometric security [11], [12], to name a few. An image classification ML pipeline generally consists of two stages: (i) feature extraction, and (ii) classification based on the extracted features. Before the rise of convolutional neural networks (CNN), various statistical techniques (e.g., SIFT, SURF, FAST), commonly referred to as feature engineering, dominated feature extraction from images [13]. The extracted features are then used as inputs to a classifier (e.g., KNN, SVM, Decision Tree, Naive Bayes, MLP) [14]. CNN can extract the features and learn the classification decision boundaries simultaneously, and thus eliminates the tedious step of feature engineering. As a result, CNN has become the ML algorithm of choice for image classification in recent years. It has also achieved human-level accuracy in many image recognition tasks [9], [10], [15].

Several QML models have been proposed for image classification to exploit quantum computers in practical use cases [6], [16]–[22]. In [6], the authors proposed Quanvolutional Neural Networks, where parametric quantum circuits are used as filters/kernels to extract features from images. These quantum filters take image segments as inputs and produce output feature maps by transforming the data in the quantum space. The output features are used as inputs to an MLP network. In [17], the authors extended the classical transfer learning approach to the quantum domain. Here, the trained convolutional layers of a classical deep neural network are used to extract image features. Later, a Quantum Neural Network (QNN) is trained separately to learn the classification decision boundaries from these features. Several works have used classical dimension reduction techniques (e.g., Principal Component Analysis or PCA) to extract image features and later used them as inputs to a QNN [23], [24]. In [16], the authors propose the Quantum Convolutional Neural Network (QCNN), which is motivated by CNN. Here, convolutions are multi-qubit operations performed on neighboring pairs of qubits. These convolutions are followed by pooling layers, which are implemented by measuring a subset of the qubits and using the results to control subsequent operations. The network ends with pairwise operations on the remaining qubits before measurement.

In this paper, we review two promising hybrid architectures for image classification: (i) the Quanvolutional Neural Network (Quanvolution + MLP), and (ii) classical dimension reduction + QNN. We discuss their design choices, characteristics, enhancements, and potential drawbacks. In particular, we advocate for trainable quantum filters in Quanvolution, and for the classical Convolutional Autoencoder (CAE) for image feature extraction in quantum-classical hybrid image classification models.
Fig. 1. (a) shows a classical convolution operation. (b) shows a toy Quanvolution operation proposed in [6]. In Quanvolution, a quantum circuit (also referred
to as quantum filter) encodes an image segment as a quantum state, and produces output features corresponding to that segment through state transformation
using a parameterized circuit and subsequent measurement operations. (c) shows the network diagram of a toy Multi-layer Perceptron (MLP) network. A
conventional Quantum Neural Network (QNN) is shown in (d). It consists of a data encoding circuit, a parameterized circuit, and measurement operations.

A QNN/quantum filter has a myriad of design choices in terms of encoding methods, parametric circuits, and measurement operations. However, in this work, we only use two configurations for demonstration. The accompanying Python-based framework supports a wide variety of QNN/quantum filter design choices (6 encoding circuits, 19 parametric circuits, and 6 measurement circuits). Interested readers can utilize/extend this framework to explore the design space.

In the remainder of the paper, we cover basics of quantum computing and QNN in Section II, discuss the hybrid architectures in Section III, present relevant results in Section IV, and draw the conclusions in Section V.

II. PRELIMINARIES

Qubits, Quantum Gates, State Vector, & Measurements: The qubit is analogous to the classical bit. However, unlike a classical bit, a qubit can be in a superposition state, i.e., a combination of |0⟩ and |1⟩ at the same time. A variety of qubit technologies exist, e.g., superconducting qubits, trapped ions, neutral atoms, and silicon spin qubits, to name a few [25]. Quantum gates, either single-qubit (e.g., the Pauli-X (σx) gate) or multi-qubit (e.g., the 2-qubit CNOT gate), modulate the state of qubits to perform computations. These gates perform either a fixed or a tunable computation, e.g., an X gate flips a qubit state while the RY(θ) gate rotates the qubit along the Y-axis by θ. A two-qubit gate changes the state of one qubit (target qubit) based on the current state of the other qubit (control qubit). A quantum circuit can contain many gate operations. Qubits are measured in a desired basis to retrieve the final state of a quantum program. In physical quantum computers, measurements are generally restricted to a computational basis, e.g., the Z-basis in IBM quantum computers.

Expectation Value of an Operator: The expectation value is the average of the eigenvalues, weighted by the probabilities that the state is measured to be in the corresponding eigenstate. Mathematically, the expectation value of an operator σ is defined as ⟨ψ|σ|ψ⟩, where |ψ⟩ is the qubit state vector. It varies between the minimum and maximum eigenvalues of the operator. For example, the Pauli-Z (σz) operator has two eigenvalues: +1 and -1. Therefore, the Pauli-Z expectation value of a qubit will vary in the range [-1, 1] depending on the qubit state.

Quantum Neural Network: A QNN involves parameter optimization of a PQC to obtain a desired input-output relationship. A QNN generally consists of three segments: (i) a classical-to-quantum data encoding (or embedding) circuit, (ii) a parameterized circuit, and (iii) measurement operations. A variety of encoding methods are available in the literature [26]. For continuous variables, the most widely used encoding scheme is angle encoding, where a continuous input classical feature is encoded as a rotation of a qubit along the desired axis (X/Y/Z) [26]–[29]. For 'n' classical features, we require 'n' qubits. For example, RZ(f1) on a qubit in superposition (a Hadamard (H) gate is used to put the qubit in superposition) is used to encode a classical feature 'f1' in Fig. 2(b). We can also encode multiple continuous variables in a single qubit using sequential rotations. For example, 'f1', 'f2', 'f3', and 'f4' are encoded using consecutive RZ(f1), RX(f2), RZ(f3), and RX(f4) rotations on a single qubit in Fig. 2(c). As the states produced by a qubit rotation along any axis repeat in 2π intervals (Fig. 2(a)), features are generally scaled within 0 to 2π (or -π to π) in a data pre-processing step.

The parametric circuit has two components: entangling operations and parameterized single-qubit rotations. The entangling operations are a set of multi-qubit operations between all the qubits to generate correlated states [29]. The following parametric single-qubit operations are used to search through the solution space. This combination of entangling and single-qubit rotation operations is referred to as a parametric layer in QNN. A widely used parametric layer architecture is shown in Fig. 2(d) [27], [30]. Here, CRZ(θ) gates between neighboring qubits create the entanglement, followed by rotations along the Y-axis using RY(θ). Normally, these layers are repeated multiple times to extend the search space [27], [28].

QNN Cost Functions: Qubits in a QNN circuit are measured in the computational basis to retrieve the output state. A cost function is derived from the measurements to train the network [27], [28], [31]. For example, in a binary classification problem, the authors of [27] measured all the qubits in the QNN model in the Pauli-Z basis and associated class 0 with the probability of obtaining even parity, and class 1 with odd parity. The model is then trained using the binary cross-entropy loss. In [32], the authors used the Pauli-Z expectation value of a single qubit (-1 associated with class 1 and +1 associated with class 0) for a binary classifier and trained it using the mean squared error (MSE) loss. In [23], the authors fed the outputs of the QNN to a classical neural network and trained it using the binary cross-entropy loss function.

Fig. 2. (a) Bloch sphere representation of a qubit. At any given step, a qubit can be rotated along the X, Y, or Z axis by applying a gate. The states will
repeat in 2π intervals. (b) Angle encoding 1:1 (1 continuous variable encoded in a single qubit state using RZ rotation, n qubits are required to encode n
continuous variables as an n-qubit state). (c) Angle encoding 4:1 (4 continuous variables encoded in a single qubit state using alternating RZ and RX rotations,
4 qubits encode 16 continuous variables as a 4-qubit state). (d) Parametric layer used in this work. Parametric CRZ gates entangle the qubits; this is followed
by single qubit RY rotations. Each n-qubit parametric layer has 2n circuit parameters.
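For illustration, the building blocks above can be written down concretely. Below is a minimal PennyLane sketch that combines the 1 variable/qubit angle encoding of Fig. 2(b) with the parametric layer of Fig. 2(d) and Pauli-Z measurements. The function names and the ring-shaped CRZ connectivity are our illustrative assumptions, not code from the released framework [50].

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

def angle_encode(features):
    # Fig. 2(b): one classical feature per qubit, encoded as an RZ rotation
    # on a qubit placed in superposition by a Hadamard gate.
    for q in range(n_qubits):
        qml.Hadamard(wires=q)
        qml.RZ(features[q], wires=q)

def parametric_layer(theta):
    # Fig. 2(d): CRZ gates entangle neighboring qubits, followed by RY
    # rotations; an n-qubit layer has 2n parameters.
    for q in range(n_qubits):
        qml.CRZ(theta[q], wires=[q, (q + 1) % n_qubits])
    for q in range(n_qubits):
        qml.RY(theta[n_qubits + q], wires=q)

@qml.qnode(dev)
def qnn(features, weights):
    angle_encode(features)
    for theta in weights:          # weights: shape (n_layers, 2 * n_qubits)
        parametric_layer(theta)
    # Pauli-Z expectation of each qubit varies in [-1, 1] (Section II).
    return [qml.expval(qml.PauliZ(q)) for q in range(n_qubits)]

# Features are pre-scaled to [0, 2*pi) since rotations repeat in 2*pi intervals.
features = np.random.uniform(0, 2 * np.pi, n_qubits)
weights = np.random.uniform(-np.pi, np.pi, (3, 2 * n_qubits))
print(qnn(features, weights))
```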

Training QNN: QNNs can be trained using any gradient-based optimization algorithm such as Adam [33] or Adagrad [34]. To apply these methods, we need to compute the gradients [35], [36] of the QNN outputs with respect to the circuit parameters. The parameter-shift rule is a known method to compute these gradients [35], [36]. Conceptually, the parameter-shift rule is very similar to the age-old finite difference method, which uses two evaluations of a target function in close proximity to compute the gradient with respect to a parameter. Unlike finite difference, the two evaluation points can be far from each other in the parameter-shift rule. As a result, it shows greater resilience to shot noise and measurement errors compared to finite difference [36]. Alternatively, one can also use a gradient-free optimizer such as Nelder-Mead to train a QNN [37]. However, a gradient-free optimizer may perform poorly when the network has many parameters.

Fig. 3. A quantum-classical hybrid neural network based on the Quanvolutional Neural Network architecture [6] with a single Quanvolutional layer (Quanvolution → Flatten → Linear → ReLU → Linear → SoftMax). One can stack multiple Quanvolutional layers, or combine convolutional layers with Quanvolutional layers, to build more complex networks.
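As a toy illustration of the rule (our example, not the paper's code), consider the Pauli-Z expectation after a single RY rotation; the standard two-term shift recovers the analytic gradient exactly:

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def circuit(theta):
    qml.RY(theta, wires=0)
    return qml.expval(qml.PauliZ(0))  # equals cos(theta) for input |0>

def parameter_shift_grad(f, theta, s=np.pi / 2):
    # Two extra circuit executions per parameter, at theta + s and theta - s;
    # unlike finite difference, s does not need to be small.
    return (f(theta + s) - f(theta - s)) / (2 * np.sin(s))

theta = 0.3
print(parameter_shift_grad(circuit, theta))  # matches -sin(0.3) analytically
```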
III. HYBRID ARCHITECTURES FOR IMAGE CLASSIFICATION

A. Quanvolution + MLP

Quanvolution is simply an extension of classical convolution, where small local regions of an image are sequentially processed with the same kernel/filter. A kernel is a small 2D matrix. The dot product between the kernel and the image segment is used to generate an output feature. For 3D RGB images, separate kernels are applied across the channels (2D planes), and they are collectively referred to as filters. For 2D images, filters and kernels are synonymous. The results obtained for each region are usually associated with different channels of a single output pixel. The union of all the output pixels produces a new image-like object, which can be further processed by additional layers. A toy convolutional layer operation is shown in Fig. 1(a). In Quanvolution, quantum circuits mimic the behavior of classical CNN filters.

Quantum Filters: The quantum filters encode image segments as the input state of a quantum circuit. The state is transformed using a parameterized quantum circuit, and subsequent measurement operations produce output features corresponding to that segment. Fig. 1(b) shows a toy Quanvolution layer. Here, a 3-qubit quantum circuit is used as a filter. It encodes 3x3 image segments as a 3-qubit quantum state using 3 Rot(α, β, γ) rotations (Rot is an arbitrary quantum gate that takes three rotation parameters). Similar to CNN, these quantum filters are moved across the 2D plane in finite steps (strides) to generate the complete output feature map of the image.

The qubit size of a filter depends on the chosen encoding method and the kernel size. For example, if we use the 1 variable/qubit encoding method (Fig. 2(b)), the resulting quantum filter will be a 16-qubit circuit for a kernel size of 4x4. It will reduce to a 4-qubit circuit if we choose the 4 variables/qubit encoding method (Fig. 2(c)). Concepts like data re-uploading [38] can be used to encode an arbitrary number of variables in an arbitrary number of qubits. The choice of encoding method will most likely be dictated by the availability of quantum resources [21]. In this work, we use the 4 variables/qubit encoding method for 4x4 kernels. We use the circuit architecture shown in Fig. 2(d) (Circuit 13 of [30]) as our preferred PQC in the quantum filters. We also use the Pauli-Z expectation values of the qubits as output features (an n-qubit filter generates n features/image segment). Increasing the number of filters increases the number of extracted features for the downstream classifier. Similar to a CNN with a high number of stages, a large number of quantum filters improves performance (lower cost/higher accuracy/faster training) [6].

Filter Trainability: In the original work in [6], the quantum filters did not have any trainable parameters. However, in CNN, filters have trainable weights, and they are learned during the training. Similarly, quantum filters can also have trainable parameters. For example, we can either initialize the PQC parameters (θ1–θ2n in Fig. 2(d)) randomly and keep them constant throughout the training, or we can update them during the training alongside the other parameters in the network.
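To make this concrete, here is a hedged PennyLane sketch of such a 4-qubit quantum filter (the 4 variables/qubit encoding of Fig. 2(c), the parametric layers of Fig. 2(d), and Pauli-Z outputs) slid across an image. The helper names and the plain Python patch loop are ours; the released framework may organize this differently.

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits, n_layers = 4, 3
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def quantum_filter(patch, weights):
    # patch: flattened 4x4 image segment, 16 values scaled to [0, 2*pi).
    # 4 variables/qubit encoding (Fig. 2(c)): alternating RZ/RX rotations.
    for q in range(n_qubits):
        f = patch[4 * q: 4 * q + 4]
        qml.Hadamard(wires=q)
        qml.RZ(f[0], wires=q); qml.RX(f[1], wires=q)
        qml.RZ(f[2], wires=q); qml.RX(f[3], wires=q)
    for layer in weights:               # weights: (n_layers, 2 * n_qubits)
        for q in range(n_qubits):       # parametric layer of Fig. 2(d)
            qml.CRZ(layer[q], wires=[q, (q + 1) % n_qubits])
        for q in range(n_qubits):
            qml.RY(layer[n_qubits + q], wires=q)
    # An n-qubit filter produces n features per image segment.
    return [qml.expval(qml.PauliZ(q)) for q in range(n_qubits)]

def quanvolve(image, weights, k=4, stride=4):
    # Slide the filter across the 2D image in steps of `stride`
    # (7x7 positions for a 28x28 image with k = stride = 4).
    steps = (image.shape[0] - k) // stride + 1
    out = np.zeros((steps, steps, n_qubits))
    for i in range(steps):
        for j in range(steps):
            seg = image[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = quantum_filter(seg.flatten(), weights)
    return out

# Non-trainable filter: fix `weights` after random initialization. Trainable
# filter: optimize `weights` jointly with the classical layers, with gradients
# computed via the parameter-shift rule.
weights = np.random.uniform(-np.pi, np.pi, (n_layers, 2 * n_qubits))
features = quanvolve(np.random.uniform(0, 2 * np.pi, (28, 28)), weights)
```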
Trainable filters result in a many-fold increase in quantum circuit executions during the training. For example, if a quantum filter has p trainable parameters, it will add 2×p more quantum circuit executions for each image segment to compute the required gradients using the parameter-shift rule [36]. Note that feature generation using non-trainable quantum filters is equivalent to a random transformation of the image segments. A classical random function can replace such filters. In fact, the work in [6] showed that the performance of classical random transformations of the image segments matches the performance of random transformations with non-trainable quantum filters. However, trainable quantum circuits are hard to simulate classically [39]. If they exhibit significant performance benefits over their non-trainable counterparts, it will be worthwhile for the research community to explore them for possible quantum advantage [40].

Fig. 4. Convolutional Autoencoder (CAE) architecture used in this work to extract image features (for both the MNIST and Fashion-MNIST datasets). Encoder: Conv2d(1, 8) → ReLU → Conv2d(8, 16) → ReLU → BatchNorm2d(16) → Conv2d(16, 32) → ReLU → Linear(288, 128) → ReLU → Linear(128, d) → LayerNorm(d). Decoder: Linear(d, 128) → ReLU → Linear(128, 288) → ReLU → ConvTranspose2d(32, 16) → ReLU → BatchNorm2d(16) → ConvTranspose2d(16, 8) → BatchNorm2d(8) → ReLU → ConvTranspose2d(8, 1). The Conv2d/ConvTranspose2d layers use kernel size 3x3 and stride 2; 'd' is the latent-space dimension.
Network Design: Similar to CNN, a Quanvolutional layer can have many filters, and multiple Quanvolutional layers can be stacked upon each other to develop a deep Quanvolutional Neural Network [6], [21]. The outputs from the final Quanvolutional layer can be fed to an MLP (or a QNN). One can also apply classical non-linear activation functions (for additional non-linearity) and maxpooling (downsampling) at the output of a Quanvolutional layer. One can create separate filters by initializing the same PQC with different random seeds. Alternatively, we can use different PQC architectures, encoding methods, and measurement operations altogether to create different filters. One can also stack classical convolutional layers with Quanvolutional layers. Fig. 3 shows the network diagram used in this work, with a single Quanvolutional layer followed by two fully connected classical layers.

Number of Circuit Executions: The number of quantum circuit executions per sample during training/inference depends on the kernel size, the image size, and the stride (the amount of movement of the kernel in terms of pixels). For a 28x28 image and a 4x4 kernel, we need to execute a total of 7x7 quantum circuits when stride = 4 (a single non-trainable quantum filter). If this filter has 10 trainable parameters, the total number of circuit executions becomes 7x7 + 2x10x7x7, where the latter 2x10x7x7 circuit executions are necessary to compute the gradients using the parameter-shift rule (2 extra circuit executions for each parameter [36]). If we take 50 samples per batch, we will need 50x(7x7 + 2x10x7x7) filter executions for each batch during training. This is in fact a prohibitively large number for a single batch and a single filter. However, all these circuits are independent of each other. Hence, one can argue that all these computations can be done simultaneously if one has access to multiple quantum computing resources.
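This bookkeeping can be written out in a few lines (an illustrative helper of ours, not part of the released framework):

```python
def executions_per_batch(image=28, kernel=4, stride=4, n_params=10, batch=50):
    positions = ((image - kernel) // stride + 1) ** 2   # 7x7 = 49 positions
    forward = positions                                 # feature-map circuits
    grads = 2 * n_params * positions                    # parameter-shift terms
    return batch * (forward + grads)

print(executions_per_batch())  # 50 * (49 + 2*10*49) = 51450 executions/batch
```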
B. Classical Dimension Reduction + QNN

Another popular hybrid QML model targeted at smaller quantum devices uses a classical algorithm (e.g., Principal Component Analysis or PCA, Linear Discriminant Analysis or LDA, etc.) to reduce the data dimension to a level that is tractable for a small QNN model [23], [24], [41]. Although PCA/LDA work quite well to extract the most salient features from small tabular data, they are not suitable for extracting features from large images. Autoencoders (AE), particularly Convolutional Autoencoders (CAE), are much more powerful tools for image feature extraction/dimension reduction [42], [43]. PCA is a linear transformation of the data, whereas AE/CAE can model more complex non-linear relationships in the data using non-linear activation functions and regularization [43].

AE/CAE: AEs are a specific type of feedforward neural network. They compress the input into a lower-dimensional code using an encoder network and then reconstruct the output from this representation through a decoder network. The code is a compact representation of the input, also called the latent-space representation. The distance between the input and the reconstructed output (e.g., the MSE loss) is used as the feedback signal to train the network. Both the encoder and the decoder in a simple AE consist of several fully connected layers. CAE provides a better architecture than AE to extract the textural features of images. In a CAE, the encoder block starts with one or more successive convolutional layers, and the decoder block ends with convolutional transpose/deconvolutional layers. In the middle, there is a fully connected AE whose innermost layer is composed of a small number of neurons. Once trained, the encoder block can be used as a standalone entity to extract a lower-dimensional representation of the input data.

Fig. 4 shows the CAE network architecture used in this work (for both the MNIST and Fashion-MNIST datasets). The final ConvTranspose2d layer uses Sigmoid activation; 'd' is the dimension of the latent space.

Network Design: The hybrid network (Fig. 5) consists of two separate networks - a CAE and a QNN. The CAE is trained with the original image dataset to learn a lower-dimensional representation of the data. The trained encoder network is used to extract image features. A conventional QNN is trained with these extracted features and image labels to perform the final classification. When both of these networks are trained, the encoder block and the QNN block are used together to classify data samples. We refer to this architecture as CAE+QNN.

QNN Design-Space: As mentioned earlier, numerous choices exist for the encoding circuits, PQC, and measurement circuits to build a QNN model. The accompanying Python framework of this work supports a wide variety of these choices, which will impact the learnability of the QNN [30]. However, in this work, we only use the single feature/qubit encoding method (Fig. 2(b)), the PQC layer of Fig. 2(d), and Z-basis measurements of the qubits in the QNN. We also restrict the number of parametric layers to 3. Following [23], we feed the QNN outputs to a fully-connected layer. The number of output neurons is equal to the number of classes in the dataset.
Fig. 5. The CAE + QNN network architecture. The trained CAE Encoder block creates a lower-dimensional representation of the image for the QNN. Training: the CAE is trained on the image dataset; the trained Encoder extracts features; the QNN is trained separately with the extracted dataset. Inference: Data → Encoder → QNN → Class Assignment.
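Under these choices, the two-stage flow of Fig. 5 can be sketched with PennyLane's TorchLayer. This is a schematic under our own naming, not the released implementation [50]; the CAE of the earlier sketch supplies the encoder.

```python
import torch
import pennylane as qml

d = 10                                   # latent-space dimension
dev = qml.device("default.qubit", wires=d)

@qml.qnode(dev)
def qnode(inputs, weights):
    for q in range(d):                   # 1 feature/qubit encoding (Fig. 2(b))
        qml.Hadamard(wires=q)
        qml.RZ(inputs[q], wires=q)
    for layer in weights:                # 3 parametric layers of Fig. 2(d)
        for q in range(d):
            qml.CRZ(layer[q], wires=[q, (q + 1) % d])
        for q in range(d):
            qml.RY(layer[d + q], wires=q)
    return [qml.expval(qml.PauliZ(q)) for q in range(d)]

qnn = qml.qnn.TorchLayer(qnode, weight_shapes={"weights": (3, 2 * d)})
classifier = torch.nn.Sequential(qnn, torch.nn.Linear(d, 3))  # 3-class head

# Stage 1: train the CAE on raw images with the MSE reconstruction loss.
# Stage 2: freeze the trained encoder, map each image to a d-dimensional
# feature vector, and train `classifier` on (features, labels) with a
# cross-entropy loss. At inference: encoder -> QNN -> class assignment.
```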
CAE+QNN Vs. Transfer Learning: Although the CAE+QNN network has some similarities with transfer learning [17], there are some noteworthy differences as well. Both of these approaches extract image features using a classical network. In transfer learning, the convolutional layers of a classical CNN, trained with a different dataset (e.g., AlexNet trained to classify the ImageNet dataset), are used to extract features for a target dataset. In contrast, the CAE in the CAE+QNN network is trained separately to extract features from the target dataset. Therefore, the CAE-extracted features may capture more variance in the target dataset compared to transfer learning, and thus may provide better performance (lower training cost/higher accuracy). The features extracted through transfer learning can be more generic [17], and transfer learning eliminates the need to train the classical network separately (as it is already trained).

IV. EVALUATION

In this Section, we compare the performance differences between (a) Quanvolutional Neural Networks with trainable and non-trainable filters, and (b) the CAE + QNN and PCA-based approaches, for a variety of datasets.

Datasets: We pick the MNIST and Fashion-MNIST datasets for this work, which are widely used in contemporary works in QML research (each pixel value scaled within 0-1) [44], [45]. Both of these datasets have 60,000 training samples and 10,000 test samples of 2D images (28x28 pixels) that belong to 10 different classes. For empirical evaluation of the Quanvolution approach, we have picked 1,200 samples from three different classes to create 6 smaller classification datasets - MNIST_179, MNIST_246, MNIST_358, Fashion_012, Fashion_345, and Fashion_678. MNIST_179 has ≈400 samples of digits 1, 7, and 9 each. Similarly, Fashion_012 has ≈400 samples of classes 0 (t-shirt/top), 1 (trouser/pant), and 2 (pullover shirt) each. We have reduced the dimension of the samples from 28x28 to 14x14 using maxpooling to lower the simulation time. To compare CAE and PCA, we have trained the corresponding CAE (with latent dimensions of 5 and 10) and PCA models with the entire MNIST and Fashion-MNIST datasets. Later, we have created 6 smaller classification datasets as before using the trained models with various latent dimensions (5/10) and principal components (10).

Metrics: We have divided the datasets into two equal sets for training and validation (600 samples/set). We use the average loss and accuracy over the entire training and validation datasets to measure the performance of the QML models [46].

Training Setup: We use the gradient-based Adagrad, SGD, and Adam optimizers to train these models [47], [48]. We use the same set of hyper-parameters across all the runs (learning rate = 0.5 for all the quantum/hybrid models).

Trainable Vs. Non-trainable Filters in Quanvolution: We trained Quanvolutional neural networks with a single Quanvolutional layer (Fig. 3) for six 3-class classification problems with a trainable and a non-trainable quantum filter (stride = 4). We used 4-qubit circuits as Quanvolutional filters. We used the 4 variables/qubit encoding method shown in Fig. 2(c) to encode 4x4 pixels into a 4-qubit state. The input pixels were scaled to 0-2π (originally 0-1). The PQC architecture in Fig. 2(d) was used with 3 parametric layers (3x2x4 parameters). In Quanvolution with trainable filters, these PQC parameters were trained alongside other network parameters using gradient descent. In the non-trainable filter, we set the PQC parameters randomly (-π to π) at the beginning and kept them constant throughout the training. Pauli-Z expectation values of the qubits were used as the output features. The results are tabulated in Table I (performance after 10 training epochs). Each training epoch took ≈195 seconds with the trainable filter compared to ≈57 seconds with the non-trainable filter on a single Core i7-10750H machine with 16 GB RAM.

On average, Quanvolution with a trainable filter provided 15.98% lower training loss, 7.49% lower validation loss, 3.46% higher training accuracy, and 3.32% higher validation accuracy after 10 training epochs. In some cases, Quanvolution with a non-trainable filter performed at a similar level as its trainable counterpart. For example, MNIST_179 and MNIST_358 provided similar performance in both of these approaches. However, in all other cases, there was a noticeable performance gap between the two approaches. We repeated the experiments 5 times with the MNIST_179 and MNIST_358 datasets with different random initializations. However, the performance remained at similar levels in both approaches. In fact, both models performed poorly on these two datasets compared to the others (average training loss of 0.535 against 0.266 overall with the trainable filter). The overall results indicate potential benefits of trainable filters, which can be worthwhile to explore in the future.

CAE + QNN: We trained the CAE in Fig. 4 with the 60,000 training samples of the MNIST and Fashion-MNIST datasets with latent dimensions of 5 and 10 (optimizer: Adam, learning rate: 0.001, weight-decay: e−5, epochs: 30, batch-size: 50). The extracted datasets (5/10 features) were used to train a QNN. In the QNN, we used the 1 variable/qubit encoding method shown in Fig. 2(b). We used 5 and 10 qubits for the 5-feature and 10-feature datasets, respectively. The QNN shared the same PQC architecture and output measurements as the quantum filters (Fig. 2(d)). We restricted the parametric layers to 3 in the 10-qubit model (3x2x10 parameters). To match the number of trainable circuit parameters, we restricted the parametric layers to 6 in the 5-qubit models (6x2x5 parameters).
TABLE I
QUANVOLUTIONAL NEURAL NETWORK PERFORMANCE AFTER 10 EPOCHS OF TRAINING (OPTIMIZER: ADAGRAD, LEARNING RATE: 0.5)

            |      Quanvolution - Non-trainable Filters     |        Quanvolution - Trainable Filters
Dataset     | Train Loss | Val. Loss | Train Acc.| Val. Acc.| Train Loss | Val. Loss | Train Acc.| Val. Acc.
MNIST_179   |   0.3717   |  0.5537   |  0.8416   |  0.7830  |   0.3881   |  0.5338   |  0.8500   |  0.7733
MNIST_246   |   0.3562   |  0.4607   |  0.8733   |  0.8333  |   0.2684   |  0.4879   |  0.9100   |  0.8583
MNIST_358   |   0.6577   |  0.8453   |  0.6916   |  0.6350  |   0.6825   |  0.9452   |  0.7083   |  0.6300
Fashion_012 |   0.1149   |  0.4213   |  0.9533   |  0.8833  |   0.0560   |  0.3813   |  0.9783   |  0.9200
Fashion_345 |   0.1416   |  0.2788   |  0.9433   |  0.8983  |   0.0827   |  0.1670   |  0.9666   |  0.9366
Fashion_678 |   0.2626   |  0.4434   |  0.8950   |  0.8450  |   0.1226   |  0.2631   |  0.9650   |  0.9216

TABLE II
CAE + QNN NETWORK PERFORMANCE AFTER 20 EPOCHS OF TRAINING OF THE QNN (OPTIMIZER: SGD, LEARNING RATE: 0.5)

            |       CAE + QNN: Latent Dimension = 5         |       CAE + QNN: Latent Dimension = 10
Dataset     | Train Loss | Val. Loss | Train Acc.| Val. Acc.| Train Loss | Val. Loss | Train Acc.| Val. Acc.
MNIST_179   |   0.1479   |  0.1676   |  0.9633   |  0.9533  |   0.1385   |  0.1120   |  0.9633   |  0.9683
MNIST_246   |   0.0574   |  0.0793   |  0.9900   |  0.9883  |   0.0676   |  0.0791   |  0.9833   |  0.9816
MNIST_358   |   0.3630   |  0.3281   |  0.8917   |  0.9183  |   0.2000   |  0.1866   |  0.9350   |  0.9383
Fashion_012 |   0.3620   |  0.2888   |  0.9083   |  0.9200  |   0.2154   |  0.1758   |  0.9266   |  0.9483
Fashion_345 |   0.3876   |  0.2084   |  0.8500   |  0.7533  |   0.1793   |  0.1541   |  0.9300   |  0.9466
Fashion_678 |   0.2468   |  0.1962   |  0.9233   |  0.9483  |   0.2968   |  0.2662   |  0.9033   |  0.9550

The results are tabulated in Table II (performance after 20 epochs of training). All these models were trainable, as evident from their loss and accuracy values. In the CAE+QNN model, the chosen latent dimension (d) dictates the QNN architecture. It also affects the overall network performance. A higher value of d means more input features for the QNN model, which generally translates to better training performance of the QNN. On average, the CAE + QNN model with d = 10 provided 29.85% lower training loss, 23.23% lower validation loss, 2.08% higher training accuracy, and 4.68% higher validation accuracy after 20 training epochs compared to d = 5. Therefore, a higher d (at the cost of a larger QNN) may provide better performance in practical applications.

CAE + QNN Vs. PCA + QNN: As PCA uses a linear transformation, the extracted image features are expected to be poor, which may translate to poor training performance of the QNN. To perform this comparison, we trained a PCA model with the 60,000 MNIST training samples and extracted 4,000 samples (≈400/class) as before with 10 principal components as the feature variables. We also extracted another 4,000 samples from the trained CAE with d = 10. Later, we trained 10-qubit QNN models (parametric layers set to 3) with these datasets for 20 epochs using the same set of training hyper-parameters (optimizer: Adagrad, learning rate: 0.5). The results are shown in Table III. As expected, the CAE+QNN approach outperformed the PCA+QNN approach by a significant margin. The CAE + QNN model provided 48.47% lower training loss, 49.2% lower validation loss, 14.1% higher training accuracy, and 25.3% higher validation accuracy.

TABLE III
CAE + QNN AND PCA + QNN NETWORK PERFORMANCE AFTER 20 EPOCHS OF TRAINING ON THE MNIST DATASET (4000 SAMPLES, 10 CLASSES)

Approach      | Train Loss | Val. Loss | Train Acc.| Val. Acc.
PCA(10) + QNN |   0.6496   |  0.6724   |  0.7875   |  0.7175
CAE(10) + QNN |   0.3336   |  0.3463   |  0.8965   |  0.8980

Python Framework Supports: Numerous choices exist for the QNN/quantum filter design using various encoding methods, parametric circuit architectures, and measurements. In this work, we only explored a limited set (Fig. 2). However, we also release a Python-based framework (created using the PennyLane, TensorFlow, and PyTorch packages [47]–[49]) that supports 19 parametric circuit architectures from [30], 6 encoding techniques, and 6 measurement circuits [50]. We also make the datasets available through the repository. Interested readers can utilize this repository for further exploration of these models on any chosen dataset.

V. CONCLUSION

In this article, using empirical evidence, we argue that trainable quantum filters in Quanvolution may provide performance benefits over non-trainable filters, and thus can be worthwhile to explore for potential quantum advantage on image classification tasks. We also show that, in the latter architecture, dimension reduction with a convolutional autoencoder (CAE) can be more useful than linear transformation-based approaches such as PCA for image datasets.

Acknowledgements: The work is supported in part by NSF (CNS-1722557, CCF-1718474, OIA-2040667, DGE-1723687 and DGE-1821766) and seed grants from Penn State ICDS and the Huck Institute of the Life Sciences.
REFERENCES

[1] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell et al., "Quantum supremacy using a programmable superconducting processor," Nature, vol. 574, no. 7779, pp. 505–510, 2019.
[2] E. Pednault, J. Gunnels, D. Maslov, and J. Gambetta, "On 'quantum supremacy'," IBM Research Blog, vol. 21, 2019.
[3] E. Farhi and H. Neven, "Classification with quantum neural networks on near term processors," arXiv preprint arXiv:1802.06002, 2018.
[4] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, "Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets," Nature, vol. 549, no. 7671, pp. 242–246, 2017.
[5] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, "The theory of variational hybrid quantum-classical algorithms," New Journal of Physics, vol. 18, no. 2, p. 023023, 2016.
[6] M. Henderson, S. Shakya, S. Pradhan, and T. Cook, "Quanvolutional neural networks: powering image recognition with quantum circuits," Quantum Machine Intelligence, vol. 2, no. 1, pp. 1–9, 2020.
[7] X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, "Multi-view 3D object detection network for autonomous driving," in Proceedings of the IEEE CVPR, 2017, pp. 1907–1915.
[8] D. Maturana and S. Scherer, "VoxNet: A 3D convolutional neural network for real-time object recognition," in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015, pp. 922–928.
[9] S. M. McKinney, M. Sieniek, V. Godbole, J. Godwin, N. Antropova, H. Ashrafian, T. Back, M. Chesus, G. S. Corrado, A. Darzi et al., "International evaluation of an AI system for breast cancer screening," Nature, vol. 577, no. 7788, pp. 89–94, 2020.
[10] A. Esteva, B. Kuprel, R. A. Novoa, J. Ko, S. M. Swetter, H. M. Blau, and S. Thrun, "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, pp. 115–118, 2017.
[11] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: Closing the gap to human-level performance in face verification," in Proceedings of the IEEE CVPR, 2014, pp. 1701–1708.
[12] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proceedings of the IEEE CVPR, 2015, pp. 815–823.
[13] M. Nixon and A. Aguado, Feature Extraction and Image Processing for Computer Vision. Academic Press, 2019.
[14] J. Friedman, T. Hastie, R. Tibshirani et al., The Elements of Statistical Learning. Springer Series in Statistics, New York, 2001, vol. 1, no. 10.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.
[16] I. Cong, S. Choi, and M. D. Lukin, "Quantum convolutional neural networks," Nature Physics, vol. 15, no. 12, pp. 1273–1278, 2019.
[17] A. Mari, T. R. Bromley, J. Izaac, M. Schuld, and N. Killoran, "Transfer learning in hybrid classical-quantum neural networks," Quantum, vol. 4, p. 340, 2020.
[18] Y. Li, R.-G. Zhou, R. Xu, J. Luo, and W. Hu, "A quantum deep convolutional neural network for image recognition," Quantum Science and Technology, vol. 5, no. 4, p. 044003, 2020.
[19] I. Kerenidis, J. Landman, and A. Prakash, "Quantum algorithms for deep convolutional neural networks," arXiv preprint arXiv:1911.01117, 2019.
[20] Y. Dang, N. Jiang, H. Hu, Z. Ji, and W. Zhang, "Image classification based on quantum k-nearest-neighbor algorithm," Quantum Information Processing, vol. 17, no. 9, pp. 1–18, 2018.
[21] M. Henderson, J. Gallina, and M. Brett, "Methods for accelerating geospatial data processing using quantum computers," Quantum Machine Intelligence, vol. 3, no. 1, pp. 1–9, 2021.
[22] J. Li, M. Alam, C. M. Sha, J. Wang, N. V. Dokholyan, and S. Ghosh, "Drug discovery approaches using quantum machine learning," arXiv preprint arXiv:2104.00746, 2021.
[23] H.-Y. Huang, M. Broughton, M. Mohseni, R. Babbush, S. Boixo, H. Neven, and J. R. McClean, "Power of data in quantum machine learning," Nature Communications, vol. 12, no. 1, pp. 1–9, 2021.
[24] E. Grant, M. Benedetti, S. Cao, A. Hallam, J. Lockhart, V. Stojevic, A. G. Green, and S. Severini, "Hierarchical quantum classifiers," npj Quantum Information, vol. 4, no. 1, pp. 1–8, 2018.
[25] M. A. Nielsen and I. Chuang, Quantum Computation and Quantum Information. American Association of Physics Teachers, 2002.
[26] M. Schuld, R. Sweke, and J. J. Meyer, "Effect of data encoding on the expressive power of variational quantum-machine-learning models," Physical Review A, vol. 103, no. 3, p. 032430, 2021.
[27] A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. Figalli, and S. Woerner, "The power of quantum neural networks," Nature Computational Science, vol. 1, no. 6, pp. 403–409, 2021.
[28] M. Schuld, A. Bocharov, K. M. Svore, and N. Wiebe, "Circuit-centric quantum classifiers," Physical Review A, 2020.
[29] S. Lloyd, M. Schuld, A. Ijaz, J. Izaac, and N. Killoran, "Quantum embeddings for machine learning," arXiv preprint arXiv:2001.03622, 2020.
[30] S. Sim, P. D. Johnson, and A. Aspuru-Guzik, "Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms," Advanced Quantum Technologies, 2019.
[31] M. Schuld and N. Killoran, "Quantum machine learning in feature Hilbert spaces," Physical Review Letters, vol. 122, no. 4, p. 040504, 2019.
[32] M. Alam, A. Ash-Saki, and S. Ghosh, "Addressing temporal variations in qubit quality metrics for parameterized quantum circuits," in 2019 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). IEEE, 2019, pp. 1–6.
[33] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[34] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of Machine Learning Research, vol. 12, no. 7, 2011.
[35] L. Banchi and G. E. Crooks, "Measuring analytic gradients of general quantum evolution with the stochastic parameter shift rule," Quantum, vol. 5, p. 386, 2021.
[36] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, "Evaluating analytic gradients on quantum hardware," Physical Review A, vol. 99, no. 3, p. 032331, 2019.
[37] W. Lavrijsen, A. Tudor, J. Müller, C. Iancu, and W. de Jong, "Classical optimizers for noisy intermediate-scale quantum devices," in 2020 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2020, pp. 267–277.
[38] A. Pérez-Salinas, A. Cervera-Lierta, E. Gil-Fuster, and J. I. Latorre, "Data re-uploading for a universal quantum classifier," Quantum, vol. 4, p. 226, 2020.
[39] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, "Supervised learning with quantum-enhanced feature spaces," Nature, vol. 567, no. 7747, pp. 209–212, 2019.
[40] P. Atchade-Adelomou and G. Alonso-Linaje, "Quantum enhanced filter: Qfilter," arXiv preprint arXiv:2104.03418, 2021.
[41] K. Batra, K. M. Zorn, D. H. Foil, E. Minerali, V. O. Gawriljuk, T. R. Lane, and S. Ekins, "Quantum machine learning algorithms for drug discovery applications," Journal of Chemical Information and Modeling, 2021.
[42] J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, "Stacked convolutional auto-encoders for hierarchical feature extraction," in International Conference on Artificial Neural Networks. Springer, 2011, pp. 52–59.
[43] M. Chen, X. Shi, Y. Zhang, D. Wu, and M. Guizani, "Deep features learning for medical image analysis with convolutional autoencoder neural network," IEEE Transactions on Big Data, 2017.
[44] L. Deng, "The MNIST database of handwritten digit images for machine learning research [best of the web]," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012.
[45] H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms," arXiv preprint arXiv:1708.07747, 2017.
[46] M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009.
[47] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., "PyTorch: An imperative style, high-performance deep learning library," arXiv preprint arXiv:1912.01703, 2019.
[48] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., "TensorFlow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.
[49] V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, M. S. Alam, S. Ahmed, J. M. Arrazola, C. Blank, A. Delgado, S. Jahangiri et al., "PennyLane: Automatic differentiation of hybrid quantum-classical computations," arXiv preprint arXiv:1811.04968, 2018.
[50] https://github.com/mahabubul-alam/iccad_2021_invited_QML
