How to Build a Software Quantum Simulator
Gilberto Javier Díaz , Luiz Angelo Steffenel * , Carlos Jaime Barrios * , Jean Francois Couturier *
doi: 10.20944/preprints202409.1497.v1
Copyright: This is an open access article distributed under the Creative Commons
Attribution License which permits unrestricted use, distribution, and reproduction in any
medium, provided the original work is properly cited.
Preprints.org (www.preprints.org) | NOT PEER-REVIEWED | Posted: 19 September 2024 doi:10.20944/preprints202409.1497.v1
Disclaimer/Publisher’s Note: The statements, opinions, and data contained in all publications are solely those of the individual author(s) and
contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting
from any ideas, methods, instructions, or products referred to in the content.
Article
Gilberto Díaz 1 , Luiz Steffenel 2, * , Carlos Barrios 1, * and Jean Couturier 2, *
1 Universidad Industrial de Santander, Bucaramanga, Colombia; [email protected]
2 Université de Reims Champagne-Ardenne, Reims, France
* Correspondence: [email protected] (L.S.); [email protected] (C.B.);
[email protected] (J.C.)
Abstract: Software quantum simulators are the most accessible tools for designing and testing quantum algorithms.
This paper presents a comprehensive approach to building a software-based quantum simulator designed
to run on classical computing architectures. We explore fundamental quantum computing concepts, including
state vector representations, quantum gates, and memory management techniques. The simulator prototype
implements various memory optimization strategies, such as full-state representation, dynamic state pruning,
and shared memory parallelization with OpenMP and distributed memory models using MPI. Additionally, data
compression techniques, like ZFP, are explored to enhance simulation performance by reducing memory footprint.
The results are validated through performance comparisons with leading open-source quantum simulators, such
as Intel-QS, QuEST, and qsim. Our findings highlight the trade-offs between computational overhead and memory
efficiency. This demonstrates that a hybrid approach using distributed memory and compression offers the
best scalability for simulating large quantum systems. This work provides a foundation for developing efficient
quantum simulators supporting more complex quantum algorithms on classical hardware.
1. Introduction
One of the primary motivations for developing quantum computing is that it has been shown, theoretically,
to allow efficient solutions to some complex problems whose best-known classical solution
has a cost that is exponential in the input size. Quantum superposition, quantum uncertainty, and quantum
entanglement are powerful resources that can be used to encode, decode, transmit, and process
information in ways that are not possible in the classical world.
Recent technological advances have enabled the development of real quantum devices accessed
through the cloud. However, these devices are expected to remain limited in the short term in both
the number and the quality of their fundamental component, the qubit. In the quantum circuit model,
Atom Computing has 1180 qubits, and IBM Osprey has 433 qubits. In the adiabatic model, the D-Wave 2000Q
has 2000 qubits. These quantum computers are prototypes that are neither scalable nor sufficient to test
complex quantum algorithms. Constructing a full-scale quantum computer comprising millions of qubits is
a longer-term prospect.
The growing interest in quantum computing and the limitations of real quantum devices have
led many organizations to develop software quantum simulators that run on classical
computers. These simulators are popular tools for testing quantum computing concepts under
ideal conditions, avoiding hardware challenges such as the limited number and quality of physical qubits
and the need for quantum error correction. Lists of recent initiatives are maintained on several websites
[1–4]. The large number of projects reflects the area’s growth, but it also makes it difficult for researchers to
decide which tool to use in their research.
Quantum computing simulators, which operate on classical computers, have emerged as valuable
and widely used tools in the field of quantum computing. These simulators play a crucial role in
the development, testing, and validation of quantum algorithms before they are implemented on
actual quantum hardware. One of the primary advantages of quantum simulators is their accessibility.
Unlike quantum computers, which are still relatively scarce and often require significant resources
and expertise to operate, simulators can be run on conventional computers. This accessibility allows a
broader range of researchers and developers to explore quantum algorithms and concepts without the
need for physical quantum computing resources.
Quantum simulators offer a controlled environment for designing and refining quantum algorithms.
They can simulate ideal quantum systems without the noise and error rates present in current
quantum hardware, providing clearer insights into the theoretical performance of an algorithm. This
is particularly useful for educational purposes and theoretical research, where understanding the
principles of quantum computation and algorithm design is the main focus.
However, simulating quantum computing models on classical computers requires exponential
time and involves complex memory management. Using conventional
techniques to simulate an arbitrary quantum process significantly larger than any of the
existing quantum prototypes quickly exhausts the memory of a classical computer. For
instance, storing a 60-qubit quantum state would take about 18,000 petabytes (18 exabytes)
of classical computer memory. Researchers therefore try to mitigate these challenges by proposing
efficient simulators.
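As a rough check of the 60-qubit figure, assuming each amplitude is stored as two double-precision values (16 bytes, an assumption consistent with the default precision described later for QuEST), the arithmetic can be written as:

```latex
\[
  \underbrace{2^{60}}_{\text{amplitudes}} \times \underbrace{16\ \text{bytes}}_{\text{per amplitude}}
  = 2^{64}\ \text{bytes}
  \approx 1.8 \times 10^{19}\ \text{bytes}
  \approx 18\ \text{exabytes}.
\]
```

Under the same assumption, every additional qubit doubles the memory required.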
This work explores the fundamental principles of developing a software-based quantum simulator
capable of performing simulations on classical computers.
Figure 1. Visual representation of qubit: a qubit can be written as a superposition α0 |0⟩ + α1 |1⟩.
The Bloch sphere is commonly used to depict a qubit (Figure 1b). Two angles represent the
state, $0 \le \theta \le \pi$ and $0 \le \phi < 2\pi$. Thus, the state |ψ⟩ can be rewritten as
$|\psi\rangle = \cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$. The vector from the origin to the point representing the state makes an angle of θ with
the z-axis, and its component in the x-y plane makes an angle of ϕ with the x-axis. The state |0⟩ is the
North Pole of the sphere, and the state |1⟩ is the South Pole.
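To make the Bloch-sphere parametrization concrete, the following minimal sketch converts the two angles into the pair of amplitudes of $\cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$. The function name `blochToAmplitudes` and the `Amplitude` alias are illustrative choices, not part of any simulator discussed here.

```cpp
#include <cmath>
#include <complex>
#include <utility>

using Amplitude = std::complex<double>;

// Returns (alpha0, alpha1) for the Bloch angles theta in [0, pi] and phi in [0, 2*pi).
// alpha0 = cos(theta/2) is real; alpha1 = e^{i*phi} * sin(theta/2).
std::pair<Amplitude, Amplitude> blochToAmplitudes(double theta, double phi) {
    const Amplitude alpha0(std::cos(theta / 2.0), 0.0);
    // std::polar(r, angle) builds r * e^{i*angle}; r is non-negative for theta in [0, pi].
    const Amplitude alpha1 = std::polar(std::sin(theta / 2.0), phi);
    return {alpha0, alpha1};
}
```

With θ = 0 this yields the North Pole state |0⟩, and with θ = π/2, ϕ = 0 it yields an equal superposition of |0⟩ and |1⟩.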
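The general equation of an n-qubit state is $|\psi\rangle = \sum_{x=0}^{2^n-1} \alpha_x |x\rangle$ or, in its expanded form,

$$|\psi\rangle = \alpha_{0\ldots00}|0\ldots00\rangle + \alpha_{0\ldots01}|0\ldots01\rangle + \cdots + \alpha_{1\ldots11}|1\ldots11\rangle \tag{1}$$

where $|0\ldots00\rangle = |0\rangle \otimes \cdots \otimes |0\rangle \otimes |0\rangle$, ..., $|1\ldots11\rangle = |1\rangle \otimes \cdots \otimes |1\rangle \otimes |1\rangle$. As we can see, a single complex
number can specify a single-qubit state, so n complex numbers can specify any tensor product of n
individual single-qubit states.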
The special characteristic of quantum states is that they allow the system to be in several states
simultaneously; this is called superposition [7].
Quantum bits are not constrained to be wholly 0 or wholly 1 at a given instant. In quantum
physics if a quantum system can be found to be in one of a discrete set of states, which we’ll write as
|0⟩ or |1⟩, then, whenever it is not being observed it may also exist in a superposition, or blend of those
states simultaneously [8].
Because a qubit can take on any one of infinitely many states, one can think that a single qubit
could store lots of classical information. However, the properties of quantum measurement severely
restrict the amount of information that can be extracted from a qubit. Information about a quantum bit
can be obtained only by measurement, and any measurement results in one of only two states, the two
basis states associated with the measuring device; thus, a single measurement yields, at most, a single
classical bit of information [9].
Quantum entanglement describes a correlation between different parts of a quantum system
that surpasses anything classically possible. It arises when the subsystems interact so that the
resulting state of the whole system cannot be expressed as the direct product of the states of its parts
[5]. States that cannot be written as the tensor product of n single-qubit states are called entangled
states, and most quantum states are entangled [9]. States that can be written as such a tensor product
are said to be separable.
In the Quantum Circuits model, the fundamental transformation of a quantum state is carried
out using Quantum Gates, which are the basic components of quantum circuits. Quantum gates are
analogous to classical logic gates but operate on qubits instead of classical bits.
To transform the state of Equation (1), we need $2^n \times 2^n$ unitary matrices. Applying a single-qubit
gate G to the i-th qubit of an n-qubit quantum state amounts to multiplying the state vector of
coefficients $\alpha_i$ by the matrix

$$\underbrace{\mathbf{1}_2 \otimes \cdots \otimes \mathbf{1}_2}_{n-i-1} \otimes\, G \otimes \underbrace{\mathbf{1}_2 \otimes \cdots \otimes \mathbf{1}_2}_{i} \tag{2}$$
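For illustration only, the operator of Equation (2) can be built explicitly with repeated Kronecker products. The sketch below uses hypothetical helper names (`kron`, `expandGate`) and dense `std::vector` matrices; it places $n-i-1$ identities to the left of G and i to the right, as in Equation (2).

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using Amplitude = std::complex<double>;
using Matrix = std::vector<std::vector<Amplitude>>;

// Kronecker (tensor) product of two dense, non-empty matrices.
Matrix kron(const Matrix& a, const Matrix& b) {
    const std::size_t ra = a.size(), ca = a[0].size();
    const std::size_t rb = b.size(), cb = b[0].size();
    Matrix out(ra * rb, std::vector<Amplitude>(ca * cb));
    for (std::size_t i = 0; i < ra; ++i)
        for (std::size_t j = 0; j < ca; ++j)
            for (std::size_t k = 0; k < rb; ++k)
                for (std::size_t l = 0; l < cb; ++l)
                    out[i * rb + k][j * cb + l] = a[i][j] * b[k][l];
    return out;
}

// Builds the 2^n x 2^n operator of Equation (2) for a 2x2 gate g acting on qubit i (i < n).
Matrix expandGate(const Matrix& g, std::size_t n, std::size_t i) {
    const Amplitude one(1.0, 0.0), zero(0.0, 0.0);
    const Matrix id = {{one, zero}, {zero, one}};
    Matrix full(1, std::vector<Amplitude>(1, one));  // 1x1 seed for the product
    for (std::size_t pos = 0; pos < n; ++pos)
        full = kron(full, (pos == n - i - 1) ? g : id);
    return full;
}
```

The resulting matrix has $2^n \times 2^n$ entries, so this construction is only practical for a handful of qubits; it is useful mainly for checking a pairwise implementation on small cases.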
Figure 2. Quantum Gates: The Pauli X gate acts linearly and takes the state α|0⟩ + β|1⟩ to the corresponding
state in which the roles of |0⟩ and |1⟩ have been interchanged; it is the quantum equivalent of the
NOT gate for classical computers. The Hadamard gate is the first authentic quantum gate because
it can generate superposition states. The Phase Shift gate is a single-qubit gate that leaves the basis state |0⟩
unchanged and maps the state |1⟩ to $e^{i\phi}|1\rangle$.
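The three gates described in this caption can be written down as 2×2 complex matrices. The sketch below is a minimal illustration; the names `Gate2x2`, `pauliX`, `hadamard`, `phaseShift`, and `applyToQubit` are hypothetical and are not taken from the prototype's QuantumGate class.

```cpp
#include <array>
#include <cmath>
#include <complex>

using Amplitude = std::complex<double>;
// Row-major 2x2 matrix: {g11, g12, g21, g22}.
using Gate2x2 = std::array<Amplitude, 4>;

// Pauli X: swaps the amplitudes of |0> and |1>.
Gate2x2 pauliX() {
    return {Amplitude(0.0, 0.0), Amplitude(1.0, 0.0),
            Amplitude(1.0, 0.0), Amplitude(0.0, 0.0)};
}

// Hadamard: maps a basis state to an equal-weight superposition.
Gate2x2 hadamard() {
    const double s = 1.0 / std::sqrt(2.0);
    return {Amplitude(s, 0.0), Amplitude(s, 0.0),
            Amplitude(s, 0.0), Amplitude(-s, 0.0)};
}

// Phase shift: leaves |0> unchanged and multiplies |1> by e^{i*phi}.
Gate2x2 phaseShift(double phi) {
    return {Amplitude(1.0, 0.0), Amplitude(0.0, 0.0),
            Amplitude(0.0, 0.0), std::polar(1.0, phi)};
}

// Applies a gate to a single-qubit state (alpha0, alpha1).
std::array<Amplitude, 2> applyToQubit(const Gate2x2& g,
                                      const std::array<Amplitude, 2>& q) {
    return {g[0] * q[0] + g[1] * q[1], g[2] * q[0] + g[3] * q[1]};
}
```

For example, `applyToQubit(hadamard(), {1, 0})` gives both amplitudes equal to $1/\sqrt{2}$, the superposition mentioned in the caption.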
• Innovative Features: Each simulator offers unique capabilities that set them apart, such as optimized
algorithms, integration with widely-used programming frameworks, or novel approaches
to handling quantum state representations. For example, qsim’s integration with Cirq and its
ability to simulate up to 40 qubits on a high-performance workstation make it a significant tool
for developers and researchers.
• Adoption and Partnerships: Some of these simulators are backed by major tech companies and
have extensive partnerships within the industry, increasing their influence and credibility.
• Academic and Commercial Use: These tools are not only used in academic research but are
also increasingly adopted by industries for practical applications, which demonstrates their
effectiveness and robustness.
• Recent Updates and Community Support: The continual updates, community support, and
documentation available for these tools contribute to their status as leaders in the field. This
ongoing development ensures they remain relevant and useful as quantum computing technology
evolves.
• Open Collaboration: Open-source projects encourage open collaboration among developers,
researchers, and users. Ensuring the source code is available for modification and redistribution
fosters a community-driven development approach. This can lead to rapid improvements and
innovations, as a diverse group of contributors can work on the software.
The combination of these factors makes these simulators outstanding in the current world of
quantum computing, pointing towards their innovativeness and leadership in technological advancement.
To evaluate the performance of the selected simulators, the following platforms were used:
• Platform 1: One node of the Guane cluster at the Supercomputing Center of Universidad Industrial
de Santander, with the following configuration: two AMD EPYC 9554 64-Core Processors
and 375 GB of RAM.
• Platform 2: A workstation with one Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz processor, 32
GiB of RAM, and an NVIDIA Corporation GP106GL Quadro P2000.
3.1. Intel-QS
Intel-QS is an open-source quantum circuit simulator implemented in C++. It uses multiprocessing and
has an intuitive Python interface. It is a full-state vector simulator that supports arbitrary single-qubit gates
and two-qubit controlled gates [10]. The Intel Quantum Simulator leverages the full capabilities
of an HPC system through its shared and distributed memory implementation. The implementation
on a single node incorporates enhancements such as vectorization, threading, and cache optimization
through the process of gate fusion. The primary object in the Intel Quantum Simulator (IQS) is the
QubitRegister, representing the quantum state of the qubits in the system of interest. When declaring a
QubitRegister, the number of qubits must be specified to allocate enough memory to describe their
state. The state can then be initialized to any computational basis state, uniquely identified by its
index.
3.2. Quantum++
Quantum++ is a general-purpose, multi-threaded, high-performance quantum simulator. The library
is not restricted to qubit systems or specific quantum information processing tasks, being capable
of simulating arbitrary quantum processes [11]. Quantum++ is developed using standard C++17
and has minimal external dependencies. It primarily utilizes the Eigen 3 linear algebra template
library, which is header-only. Additionally, when available, it employs the OpenMP library for
multi-threading. The primary data types are complex vectors and complex matrices, such as complex
dynamic matrices, double dynamic matrices, complex dynamic column vectors, and complex dynamic row
vectors.
3.3. qsim
Developed by Google, qsim is an optimized quantum circuit simulator that can
simulate up to 40 qubits on a powerful workstation [12]. Integrated with
Cirq, it provides a robust environment for developing and testing quantum algorithms. To achieve
state-of-the-art simulation of quantum circuits, it uses gate fusion, AVX/FMA vectorized instructions,
and OpenMP multi-threading. qsim relies on cuQuantum to integrate GPU support.
3.4. cuQuantum
NVIDIA’s cuQuantum SDK is another leading tool, designed to accelerate quantum circuit
simulations on GPUs. This toolkit is essential for developers looking to leverage the power of GPUs
to enhance simulation performance and scalability. It provides an integrated programming model
tailored for a hybrid environment, enabling the combined operation of CPUs, GPUs, and QPUs.
3.5. QuEST
QuEST, or the Quantum Exact Simulation Toolkit, is a high-performance open-source quantum
computing simulator designed for simulating quantum circuits, state-vectors, and density matrices.
Developed by the Quantum Technology Theory Group at the University of Oxford, QuEST is distinguished
by its ability to utilize multithreading, GPU acceleration, and distribution, making it highly
effective across various computing environments, from laptops to networked supercomputers. The
toolkit is capable of simulating both pure quantum states and mixed states with precision, and supports
a wide array of quantum operations. It allows for simulations that are extensible and adaptable,
thanks to its open-source nature and support for various back-end hardware via its simple and flexible
interface [13]. QuEST represents a pure state for a system of n qubits using $2^n$ complex floating-point
numbers, with each real and imaginary component having double precision by default. However,
QuEST can be configured to use single or quad precision if desired. The simulator stores the state
using C/C++ primitives, which means that by default, the state vector alone consumes $16 \times 2^n$ bytes
of memory.
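The $16 \times 2^n$ figure is easy to evaluate for a given machine. The helper below is a small illustrative sketch (the name `stateVectorBytes` is hypothetical, not a QuEST function); the second parameter allows other precisions, for example 8 bytes per amplitude for single precision.

```cpp
#include <cstdint>

// Bytes needed to hold a full n-qubit state vector when each complex amplitude
// occupies bytesPerAmplitude bytes (16 for two double-precision components).
// The result only fits in 64 bits for numQubits <= 59 with 16-byte amplitudes.
std::uint64_t stateVectorBytes(unsigned numQubits,
                               std::uint64_t bytesPerAmplitude = 16) {
    return (std::uint64_t{1} << numQubits) * bytesPerAmplitude;
}
```

For instance, `stateVectorBytes(30)` is 17,179,869,184 bytes (16 GiB), so a double-precision 31-qubit state vector alone would fill 32 GiB of memory.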
3.6. Qrack
Qrack is a high-performance quantum computer simulator that is written in C++ and supports
OpenCL and CUDA [14]. It is particularly notable for its ability to simulate arbitrary numbers
of entangled qubits, limited only by system resources. Qrack is designed to be embedded in other
projects and includes a comprehensive suite of standard quantum gates, along with variations suitable
for register operations and arbitrary rotations. The simulator is integrated with other quantum
computing frameworks like ProjectQ and Qiskit, enhancing its versatility and application. Qrack also
features optimizations for noiseless pure state simulations and includes tools that aid in the control,
extension, and visualization of data from quantum circuits. Qrack maintains the state representation
in a factorized form to enhance simulation efficiency. A general ket state |ψ⟩ of n qubits is described by
$O(2^n)$ complex amplitudes.
Other projects, like XACC and Qiskit, provide a full-stack approach to quantum computing,
including a simulator and compilers and the possibility to run the program on real quantum processors.
The following graphs compare some of these simulators under equal conditions.
First, GPU-capable simulators are depicted in Figure 3a. Second, OpenMP-capable simulators are
shown in Figure 3b.
4. Implementing a Simulator
To gain a deeper understanding of the fundamental operations of quantum computing and to
test the various memory management approaches, a software quantum simulator prototype was
developed in C++: The Memory eFficient Quantum Simulator (TMFQS) [15]. This prototype was
designed so that strategies can be changed easily through minimal modifications.
In particular, the data structures that represent the fundamental concepts of quantum
computing and the use of compression libraries can be adjusted easily.
It has to be pointed out that this prototype does not implement all the concepts of quantum
computing, such as quantum error correction, entanglement, measurement and an extended set of
quantum gates. This prototype’s primary purpose is to provide a minimal platform for understanding
the principles of building a software quantum simulator. Several scenarios were implemented to carry
out the tests.
• Dynamic memory management: The primary purpose is to test the strategy of removing less
probable states.
• Full State: The objective is to accelerate the simulations, avoiding the overhead introduced by the
search of the states.
• Full State with OpenMP: The intention is to accelerate the simulations of the previous version.
• Full State with data compression: The purpose is to test a lossy compression library like ZFP.
• Full State with MPI: The main objective of this scenario is to distribute the amplitude vector
among different computing nodes, allowing for a greater number of qubits.
• Full State with MPI and data compression: Here, data compression was incorporated into the
previous scenario.
Figure 4. Class Diagram of the Prototype: QuantumRegister class represents a quantum state and
implements the main method to transform a quantum state (applyGate). The QuantumGate class
implements a small set of quantum gates using the matrix representation.
Where αi are the amplitudes. As we said previously, these amplitudes are complex numbers, so
we need two float or two double numbers to represent them in the code. Of course, the state vector
must fit in the local memory.
The amplitudes of the states are implemented using a one-dimensional double-precision array
stored in a contiguous memory space. To increase performance, a single array was used to store both
the real and the imaginary parts of each amplitude; that is, the state vector was linearized. The real
parts are placed in the odd positions of this array, and the imaginary parts are placed in the
even positions. This strategy avoids jumping between two arrays, one for the real part and one for the
imaginary part. Figure 5 depicts this data structure.
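A minimal sketch of such a linearized layout is shown below. It assumes that "odd positions" is counted from one, so that with zero-based C++ indexing the real part of amplitude i sits at index 2i and the imaginary part at index 2i + 1; the class name `InterleavedState` and its methods are hypothetical and are not the prototype's actual interface.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// State vector stored as one contiguous array of doubles:
// [re(a0), im(a0), re(a1), im(a1), ...].
class InterleavedState {
public:
    explicit InterleavedState(unsigned numQubits)
        : data_((std::size_t{1} << numQubits) * 2, 0.0) {
        data_[0] = 1.0;  // start in the basis state |0...0>
    }

    std::size_t numStates() const { return data_.size() / 2; }

    // Reads amplitude `state` from its two adjacent slots.
    std::complex<double> get(std::size_t state) const {
        return {data_[2 * state], data_[2 * state + 1]};
    }

    // Writes amplitude `state` into its two adjacent slots.
    void set(std::size_t state, std::complex<double> a) {
        data_[2 * state] = a.real();
        data_[2 * state + 1] = a.imag();
    }

private:
    std::vector<double> data_;
};
```

Keeping both components of an amplitude adjacent means that one cache line serves the real and the imaginary part together, which is the locality benefit the paragraph above describes.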
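$$G_k|\psi\rangle = |\psi'\rangle = \begin{pmatrix} \alpha'_{0\ldots00} \\ \alpha'_{0\ldots01} \\ \vdots \\ \alpha'_{1\ldots10} \\ \alpha'_{1\ldots11} \end{pmatrix} \tag{4}$$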
Applying a gate to the k-th qubit of a quantum register of N qubits is equivalent to applying the gate to pairs of
amplitudes whose indices differ only in the k-th bit of their binary representation.
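$$\begin{aligned}
\alpha'_{00} &= \tfrac{1}{\sqrt{2}} \cdot 1 + \tfrac{1}{\sqrt{2}} \cdot 0 = \tfrac{1}{\sqrt{2}} \\
\alpha'_{10} &= \tfrac{1}{\sqrt{2}} \cdot 1 - \tfrac{1}{\sqrt{2}} \cdot 0 = \tfrac{1}{\sqrt{2}}
\end{aligned} \tag{8}$$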
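$$\begin{aligned}
\alpha'_{*\ldots*1_c*\ldots*0_t*\ldots*} &= g_{11} \cdot \alpha_{*\ldots*1_c*\ldots*0_t*\ldots*} + g_{12} \cdot \alpha_{*\ldots*1_c*\ldots*1_t*\ldots*} \\
\alpha'_{*\ldots*1_c*\ldots*1_t*\ldots*} &= g_{21} \cdot \alpha_{*\ldots*1_c*\ldots*0_t*\ldots*} + g_{22} \cdot \alpha_{*\ldots*1_c*\ldots*1_t*\ldots*}
\end{aligned} \tag{9}$$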
Let’s see how to apply the controlled phase shift (CPS) gate to the second qubit of the state |11⟩, controlled by the first
qubit. All amplitudes are equal to 0 except $\alpha_{11}$, which is equal to 1. Substituting these values into
Equation (9), we have:

$$\begin{aligned}
\alpha'_{10} &= 1 \cdot 0 + 0 \cdot 1 = 0 \\
\alpha'_{11} &= 0 \cdot 0 + e^{i\phi} \cdot 1 = e^{i\phi}
\end{aligned} \tag{10}$$

Thus, we obtain the amplitude values for the states |10⟩ and |11⟩.
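The controlled update of Equation (9) can be sketched as a loop that only touches amplitude pairs whose control bit is 1. The function below is an illustrative sketch, not the prototype's code: the name `applyControlledGate` and the `Gate2x2` alias are hypothetical, the state is held as a plain vector of complex numbers, and qubit 0 is assumed to be the leftmost (most significant) bit.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <vector>

using Amplitude = std::complex<double>;
using Gate2x2 = std::array<Amplitude, 4>;  // row-major: {g11, g12, g21, g22}

// Applies g to qubit `target` on every basis state whose `control` bit is 1.
void applyControlledGate(std::vector<Amplitude>& state, unsigned numQubits,
                         unsigned control, unsigned target, const Gate2x2& g) {
    const std::size_t controlMask = std::size_t{1} << (numQubits - 1 - control);
    const std::size_t targetMask = std::size_t{1} << (numQubits - 1 - target);
    for (std::size_t i = 0; i < state.size(); ++i) {
        // Handle each pair once, starting from the index whose target bit is 0,
        // and skip states whose control bit is 0 (they are left unchanged).
        if ((i & controlMask) == 0 || (i & targetMask) != 0) continue;
        const std::size_t j = i | targetMask;  // partner with target bit 1
        const Amplitude a0 = state[i];
        const Amplitude a1 = state[j];
        state[i] = g[0] * a0 + g[1] * a1;
        state[j] = g[2] * a0 + g[3] * a1;
    }
}
```

Running it on the state |11⟩ with the phase shift matrix reproduces the worked example: the amplitude of |10⟩ stays 0 and the amplitude of |11⟩ becomes $e^{i\phi}$.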
Qubits Order
Some simulators, like Qiskit, reverse the order of the qubits such that qubit 0 corresponds to the
least significant bit of the binary representation of the state. In this case, the distance between
$\alpha'_{*\ldots*0_k*\ldots*}$ and $\alpha'_{*\ldots*1_k*\ldots*}$ is equal to $2^k$.
In this work, we maintain the natural order of the qubits. For example, in state |011⟩, qubit 0 is
the leftmost, qubit 1 is in the middle, and qubit 2 is the rightmost. Therefore, the distance between
$\alpha'_{*\ldots*0_k*\ldots*}$ and $\alpha'_{*\ldots*1_k*\ldots*}$ is equal to $2^{(\text{numQubits}-1)-k}$, where k is the index of the qubit. To illustrate this, Figure 6 shows the
distance between the states of a 4-qubit state vector.
Generally, a single-qubit gate can be applied to a quantum register performing the following
pseudo-code.
done
In summary, calculating the amplitudes for the current state and the new affected state is done
as follows: Determine the value of the current state’s amplitude using Equation (6). Then, find the
pair corresponding to the current state, and finally, calculate the value of the latter using that same
equation.
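The steps summarized above can be sketched in C++ as follows. This is an illustration under stated assumptions rather than the prototype's actual applyGate: it assumes the per-pair update is the usual 2×2 matrix-vector product (the same form as Equation (9) without the control condition), it stores amplitudes as `std::complex<double>` instead of the linearized array, and it treats qubit 0 as the leftmost bit. The names `applyGate` and `Gate2x2` are illustrative.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <vector>

using Amplitude = std::complex<double>;
using Gate2x2 = std::array<Amplitude, 4>;  // row-major: {g11, g12, g21, g22}

// Applies the single-qubit gate g to qubit k of an n-qubit state vector in place.
void applyGate(std::vector<Amplitude>& state, unsigned numQubits, unsigned k,
               const Gate2x2& g) {
    // Distance between the two members of a pair: 2^((numQubits - 1) - k).
    const std::size_t distance = std::size_t{1} << (numQubits - 1 - k);
    for (std::size_t i = 0; i < state.size(); ++i) {
        if (i & distance) continue;  // already updated together with its partner
        const std::size_t j = i + distance;
        // Read both old amplitudes before overwriting either of them.
        const Amplitude a0 = state[i];
        const Amplitude a1 = state[j];
        state[i] = g[0] * a0 + g[1] * a1;
        state[j] = g[2] * a0 + g[3] * a1;
    }
}
```

Applying the Hadamard matrix to qubit 0 of |00⟩ with this function gives amplitudes of $1/\sqrt{2}$ for |00⟩ and |10⟩, in agreement with Equation (8).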
To find the pair corresponding to the current state, we can use two methods. The first calculates
the distance using the relation $2^{(\text{numQubits}-1)-k}$, as explained before. The second method
applies an XOR operation between the binary representation of the current state and a sequence of
0s with a 1 placed in the k-th position, corresponding to the qubit we are working on. For example,
when applying a quantum gate to qubit 0 of the 4-qubit state |0101⟩, we can find the corresponding
pair using the following operation:

$$0101 \oplus 1000 = 1101 \tag{11}$$

This result can be corroborated in Figure 6. C++ offers bitwise operators that execute this operation
efficiently.
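Both ways of locating the pair can be expressed with a few bitwise operations. The helpers below are an illustrative sketch with hypothetical names (`qubitMask`, `pairByDistance`, `pairByXor`), assuming the natural qubit order in which qubit 0 is the leftmost bit.

```cpp
#include <cstddef>

// Mask with a single 1 at the bit that represents qubit k (qubit 0 = leftmost bit).
std::size_t qubitMask(unsigned numQubits, unsigned k) {
    return std::size_t{1} << (numQubits - 1 - k);
}

// Method 1: move by the distance 2^((numQubits - 1) - k), forwards if the
// k-th bit of the state is 0 and backwards if it is 1.
std::size_t pairByDistance(std::size_t state, unsigned numQubits, unsigned k) {
    const std::size_t distance = qubitMask(numQubits, k);
    return (state & distance) ? state - distance : state + distance;
}

// Method 2: flip the k-th bit with a single XOR.
std::size_t pairByXor(std::size_t state, unsigned numQubits, unsigned k) {
    return state ^ qubitMask(numQubits, k);
}
```

For the state |0101⟩ and qubit 0, both functions return 1101 (decimal 13). The XOR form needs no branch, which is one reason it may be preferable inside a tight loop.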
Because of this, the approach of eliminating less probable states was discarded early. We therefore
focus on pure states, which implies that the state vector contains the complete information about the
quantum state, and this approach was adopted for the rest of this simulator’s design.
The QuantumRegister::applyGate method iterates through the state vector, implementing Equation (6).
To enhance performance, we partition the data and execute instructions on segments of the
state vector, thereby speeding up the simulation. It is crucial to carefully manage the method’s internal
variables to prevent race conditions.
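One way to partition this loop with OpenMP is to iterate over amplitude pairs instead of individual states, so that every iteration writes to a distinct pair. The sketch below is a hypothetical illustration of that idea and not the prototype's implementation; `applyGateParallel` and `Gate2x2` are illustrative names, and qubit 0 is assumed to be the leftmost bit. Compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.

```cpp
#include <array>
#include <complex>
#include <cstddef>
#include <vector>

using Amplitude = std::complex<double>;
using Gate2x2 = std::array<Amplitude, 4>;  // row-major: {g11, g12, g21, g22}

void applyGateParallel(std::vector<Amplitude>& state, unsigned numQubits,
                       unsigned k, const Gate2x2& g) {
    const std::size_t distance = std::size_t{1} << (numQubits - 1 - k);
    const long long numPairs = static_cast<long long>(state.size() / 2);

    // Each iteration owns exactly one pair (i, j), so no two threads ever
    // write to the same element of the state vector.
    #pragma omp parallel for schedule(static)
    for (long long p = 0; p < numPairs; ++p) {
        const std::size_t idx = static_cast<std::size_t>(p);
        // Insert a 0 at the target-bit position of idx to obtain index i.
        const std::size_t low = idx & (distance - 1);
        const std::size_t i = ((idx - low) << 1) | low;
        const std::size_t j = i | distance;
        // Variables declared inside the loop body are private to each thread.
        const Amplitude a0 = state[i];
        const Amplitude a1 = state[j];
        state[i] = g[0] * a0 + g[1] * a1;
        state[j] = g[2] * a0 + g[3] * a1;
    }
}
```

Declaring the temporaries inside the loop body is what keeps them thread-private; sharing them across iterations is a typical source of the race conditions mentioned above.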
$$\text{numProcs} = \frac{2^{\text{numQubits}}}{2^m} \tag{12}$$

where $2^m$ is the number of states per process. Here we can face two cases:
Figure 9 shows the pairwise calculation scheme for a 5-qubit state vector when a gate is applied to each qubit.
Partitioning with 2, 4, and 8 processes is also shown, which makes it easier to visualize the communication
required in the distributed programming model.
For instance, consider performing a calculation on qubit 0 of the state |00010⟩; the corresponding
pair would be |10010⟩. If two processes are used, communication should be established with process 1.
If four processes are utilized, the remote process is process 2. Lastly, if eight processes are employed,
the remote process will be process 4.
We use the following expression to calculate the identifier of the process where the corresponding pair
is located:

$$\text{remoteProcID} = \left\lfloor \frac{\text{pairState}}{2^m} \right\rfloor \tag{13}$$
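Equations (12) and (13) translate directly into integer arithmetic. The sketch below uses hypothetical helper names (`statesPerProcess`, `remoteProcId`) and assumes the number of processes is a power of two, so that the state vector splits evenly; it deliberately leaves out any MPI calls.

```cpp
#include <cstddef>

// Equation (12) rearranged: states held by each process (2^m) when 2^numQubits
// amplitudes are split evenly among numProcs processes (a power of two).
std::size_t statesPerProcess(unsigned numQubits, std::size_t numProcs) {
    return (std::size_t{1} << numQubits) / numProcs;
}

// Equation (13): identifier of the process that owns pairState,
// obtained with integer division by the number of states per process.
std::size_t remoteProcId(std::size_t pairState, std::size_t statesPerProc) {
    return pairState / statesPerProc;
}
```

For the 5-qubit example, the pair of |00010⟩ on qubit 0 is |10010⟩ (decimal 18), and the helpers give process 1 with two processes, process 2 with four, and process 4 with eight, matching the values quoted in the text.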
In Figure 9, it is evident that for 2 processes, specifically regarding qubit 0, the number of
communications required is $2^{\text{numQubits}}/2$. This substantially degrades performance. To mitigate the
overhead caused by the extensive number of communications, the entire segment of the state vector is
exchanged between the peer processes involved, as outlined in Equation (6). The calculations are then
made locally, and the results are communicated back to the original process.
For this reason, we are unable to use the total sum of the local memory of each node to increase
the number of qubits; we can only utilize half of the combined memory from all nodes. Figure 10
depicts this idea.
The test was executed by initializing the first state with a probability equal to one, that is to say,
1 × |0000⟩. Then, we repeated the experiment with 1 × |0001⟩ and so on until executing the test with
the last state 1 × |1111⟩. The results of executing this quantum circuit with Intel-QS, Quantum++, and
TMFQS were the same.
5. Results
This section presents the results of several quantum simulation tests performed using the prototype
software quantum simulator developed in C++. By simulating fundamental quantum operations,
we can assess how well these strategies reduce memory consumption and improve the efficiency of
quantum computing simulations on classical hardware. Throughout the section, we compare the
simulator’s performance with and without the proposed memory management techniques, highlighting
the improvements achieved. By providing a comprehensive evaluation of these memory management
strategies, this section aims to contribute to ongoing efforts to make quantum computing simulations
more efficient and scalable, ultimately advancing the field of quantum computing.
Fourier Transform algorithm. The quantum register contains all the states at the end of executing the
Quantum Fourier transform algorithm. Due to the initial superposition process, the quantum register
also has all the states in the first stage of Grover’s algorithm. Therefore, this approach does not work
well for these algorithms.
For these reasons, along with the risks outlined previously, we have decided to discard this
approach because its numerous disadvantages outweigh its benefits.
Figure 12. QFT performance with different approaches: (a) Dynamic Memory vs. Full-State Approach; (b) QFT Full-State Approach with OpenMP (log10).
We observe a significant decrease in the processing time between the serial and parallel execution
of the full-state version as the number of qubits increases. That is, a considerable acceleration is
obtained by parallelizing the simulation. However, for clearer interpretation of the results, we show
the results calculating the base 10 logarithm of the simulation time. In the graph shown in Figure 12b,
it is evident that for smaller numbers of qubits, there is an overhead caused by the setup of the parallel
environment.
The graph of Figure 13b shows the amount of memory used by both simulator versions. We can
observe that the compression approach is highly efficient. This enables the possibility of increasing the
number of qubits in simulations.
Figure 13. QFT performance with ZFP: (a) QFT Full-State with ZFP (log10); (b) QFT Full-State with ZFP Data Size.
Figure 14. QFT performance with the distributed and hybrid memory models: (a) QFT Full-State with MPI; (b) QFT Full-State with MPI and OpenMP.
Comparing the results of the graphs in Figure 14a,b, we see that the combination of MPI and
OpenMP increases the performance, especially for cases where the size of the state vector portion at
each node is large.
Figure 15. QFT with ZFP: (a) QFT with MPI with Data Compression; (b) Quantum Simulators Comparison.
As can be seen in the graph in Figure 15b, the Intel-QS simulator performs worse than the other
simulators, while QuEST exhibits the best performance. Our prototype TMFQS performs acceptably
compared to these mature tools, which have been optimized, for example, by using libraries such as MKL
in the case of Intel-QS.
In conclusion, building a software quantum simulator requires a delicate balance between theoretical
understanding and practical implementation strategies. The limitations of current quantum
hardware, including qubit count and quality, drive the need for quantum simulators that allow
researchers to explore quantum algorithms on classical computers. This work has shown that memory
management techniques, such as dynamic pruning, full-state representation, and data compression,
are essential for optimizing the simulation of quantum systems. While pruning techniques introduce
certain challenges, such as fidelity loss and increased computational complexity, full-state representation
with parallelization (via OpenMP or MPI) provides a robust framework for simulating larger
quantum states. The use of data compression, such as ZFP, further extends the capacity to simulate a
greater number of qubits without exceeding memory limits, though it introduces some overhead in
processing time.
The comparative performance of the prototype simulator against established simulators like
Intel-QS, QuEST, and qsim demonstrates the viability of these memory management techniques. By
combining distributed and shared memory models, along with data compression, the simulator can
handle increasingly complex simulations. Ultimately, this work contributes valuable insights into
making quantum computing simulations more scalable and efficient, supporting the broader field of
quantum computing as it continues to evolve.
References
1. Report, Q.C. Qbit Count. https://quantumcomputingreport.com/scorecards/qubit-count/, 2019.
2. Quantiki. List of QC simulators. https://www.quantiki.org/wiki/list-qc-simulators, 2019.
3. Fingerhuth, M. Open-Source Quantum Software Projects. https://github.com/qosf/os_quantum_software, 2019.
4. Team, Q.O.S.F. Quantum Open Source Foundation. https://qosf.org/, 2019.
5. Bergou, J.A.; Hillery, M. Introduction to the Theory of Quantum Information Processing; Springer Publishing
Company, Incorporated, 2013.
6. Artur Ekert, P.H.; Inamori, H. Basic concepts in quantum computation. Coherent atomic matter waves 2001,
pp. 661–701.
7. Shoshany, B. In layman’s term, what is a quantum state? https://www.quora.com/In-laymans-term-what-is-a-quantum-state, 2018.
8. Williams, C.P. Explorations in Quantum Computing, Second Edition; Texts in Computer Science, Springer,
2011. doi:10.1007/978-1-84628-887-6.
9. Eleanor, R.; Wolfgang, P. Quantum Computing, A Gentle Introduction; The MIT Press, 2011.
10. Guerreschi, G.G.; Hogaboam, J.; Baruffa, F.; Sawaya, N. Intel Quantum Simulator: A cloud-ready high-
performance simulator of quantum circuits, 2020, [arXiv:quant-ph/2001.10554].
11. Gheorghiu, V. Quantum++: A modern C++ quantum computing library 2014. [arXiv:1412.4704].
doi:10.1371/journal.pone.0208073.
12. team, Q.A.; collaborators. qsim, 2020. doi:10.5281/zenodo.4023103.
13. Jones, T.; Brown, A.; Bush, I.; Benjamin, S.C. QuEST and High Performance Simulation of Quantum
Computers. Scientific Reports 2019, 9, 10736. doi:10.1038/s41598-019-47174-9.
14. Strano, D. Qrack. https://vm6502q.readthedocs.io/en/latest/, 2019.
15. Díaz, G. Prototype Quantum Computing Simulator. https://github.com/diaztoro/TMFQSfullstate.git, 2024.
16. Trieu, D.B. Large-Scale Simulations of Error-Prone Quantum Computation Devices. Dr. (univ.), Universität
Wuppertal, Jülich, 2009. Record converted from VDB: 12.11.2012; Universität Wuppertal, Diss., 2009.
17. Smelyanskiy, M.; Sawaya, N.P.D.; Aspuru-Guzik, A. qHiPSTER: The Quantum High Performance Software
Testing Environment. CoRR 2016, abs/1601.07195.
18. Lindstrom, P. Fixed-Rate Compressed Floating-Point Arrays. IEEE Transactions on Visualization and
Computer Graphics 2014, 20, 2674–2683. doi:10.1109/TVCG.2014.2346458.