Advanced Optical Materials Template
1 Introduction
Since the advent of computers, researchers have been captivated by the prospect of endowing machines
with human-like intelligence, a field known as Artificial Intelligence (AI). The ultimate goal is to em-
power machines with cognitive abilities, including abstract thinking, decision-making, adapting to new
situations, creativity, and social skills [1]. AI, which focuses on specific tasks, has seamlessly integrated
into everyone’s daily life, evident in applications such as automatic photo-tagging, customer service advi-
sory calls, and personalized product recommendations.
The impact of AI extends even further, encompassing its potential to enhance medical diagnosis, drug
design, and cancer detection/treatment. This rapid advancement in AI can be largely attributed to the
swift growth in computational power, encompassing both data storage and processing speed. Further-
more, the rapid progress in deep learning algorithms and their applications across various fields has spurred
the demand for high-performance computing platforms.
With that in mind, a key challenge in AI computing is the von Neumann bottleneck [2, 3, 4], stemming
from the architectural design of computer systems. This bottleneck limits system throughput as a result
of the bandwidth constraints on data transferred in and out. To address this bottleneck, which is
primarily caused by the physical separation between the CPU and memory, various strategies have been
investigated that utilize modified von Neumann devices, such as GPU architectures, to process large
models and datasets [5, 6, 7]; however, the scalability of such approaches is still limited by the latency
associated with fetching data from memory islands. However, the
past decade has witnessed a growing interest in exploring alternative computing paradigms to acceler-
ate deep learning tasks. These alternatives include the exploration of optical-related disciplines such as
metasurfaces, non-linear photonics, and the utilization of photonics accelerators [8, 9].
Photonic accelerators, also known as optical accelerators, have emerged in the last decade, leveraging
prior enabling photonic technologies such as modulators, photo-detectors, and optical filters [10]. This
growth is illustrated in Figure 1 in terms of publications per year, alongside the major evolutionary
milestones of traditional electronic AI technologies such as AlexNet [11], AI co-processors [12],
Microsoft Project Catapult [13], and in-memory computing [14]. Unlike traditional electronic com-
ponents such as transistors, electronic switches, modulators, and microprocessors, photonic accelerators
Figure 1: Published literature on Photonic accelerators.
utilize light particles (photons) to process information. This innovative approach enables parallel pro-
cessing and faster information transfer, making it particularly advantageous for AI workloads involving
intensive matrix calculations in neural network operations. The success of photonic-based accelerators
has been driven by decades of innovations at the device and chip level of optical systems. These devel-
opments build upon foundational photonic technologies such as lasers, modulators, photo-detectors, and
optical filters. This progress is illustrated in Figure 1, which displays the yearly increase in related publi-
cations and contrasts this with the developmental milestones of traditional electronic AI technologies.
The figure highlights key optical devices and systems introduced in the early 2000s in silicon photonics,
like photonic Wavelength Division Multiplexing (WDM) filters [15, 16], Mach-Zehnder Interferometer
(MZI) modulators [17, 18] and IQ-modulators [19]. This evolution continued with the advent of smaller-
sized Microring Resonators (MRRs), crucial in many optical filter designs and high-speed Non-return-to-
zero (NRZ) modulators [20] with bandwidths around 25 GHz. Pulse Amplitude Modulation with Four
Levels (PAM4) modulation schemes have been explored using the ring resonator in order to increase the
throughput per footprint of device [21]. These ring resonators, possessing high Q factors, have been en-
gineered to function as switches, integrators, differentiators, and memory elements at terahertz (THz)
frequencies.
Recently, these devices have been integrated to create optical neural networks that are energy-efficient,
compact, and offer high throughput [22]. Figure 1 also includes two throughput/mm² lines for compar-
ison. The first line, indicating 25 TOPS/mm², has been achieved by various optical devices. This high
level of performance is achieved at lower power consumption due to parallelism (utilizing multiple wave-
lengths) as well as a smaller footprint per wavelength. Some examples of these innovations are photonic
tensor cores [23, 24, 25], programmable phase-change metasurface for multimode photonic Convolutional
Neural Networks (CNNs) [26, 27], in-memory computing, and hybrid co-processors.
In contrast, the second line in the figure shows a throughput of 10 TOPS/mm², representing the pro-
jected maximum capability of current non-photonic hardware accelerators. This comparison underscores
the advancements and potential of photonic technologies in achieving higher throughput and efficiency in
computing, particularly in the context of neural network processing and AI acceleration. The earliest
optical accelerators can be traced to assemblages of typical lab bench-top discrete optical components
interconnected with long fiber spools, intended to perform canonical mathematical functions [28, 29].
The first such function essential to AI computations is unitary matrix transformation, which was first
demonstrated optically in 1994 using optical beam splitters [30]. This development laid the groundwork
for subsequent advancements in photonic integrated computations using MZIs. It was later shown by
Miller et al. [31, 32, 33] that meshes of MZI networks could be self-configured to define adaptive filter
functions. Such reconfigurable networks are promising candidates for building adaptive neural networks
and photonic FPGA (Field-Programmable Gate Array) systems [34, 35, 36, 37].
However, optical computing has been viewed skeptically in applications that require large data storage
and efficient flow control such as activation functions and non-linear computations. Therefore, the cur-
rent trend is to use photonic accelerators in areas that maximize the inherent advantages of optics. These
applications include parallel computing using WDM, polarization diversity, and mode multiplexing [38].
The long path length often associated with MZI circuits, however, has challenged the suitability of their
use in large-scale photonic circuits like accelerators and FPGA fabrics. This context still raises consid-
erable concerns regarding high latency and insertion loss due to the longer physical length of the circuit
[39].
MRRs, on the other hand, present an alternative with better scalability and compactness. When light
goes through ring resonators such as in 2 × 2 switches, the drop port of the switch induces a time de-
lay determined by the Q factor of the ring [40, 41, 42, 43]. The latency can be tuned by either inserting
Phase-changing Materials (PCMs) as cladding or cascading additional switches in tandem. The phase
transition of the PCMs leads to appreciable alterations in their optical properties, controllable either
electrically or optically [27, 44]. This characteristic offers a notable advantage in power efficiency for pro-
grammable photonic devices compared to electro-optic or thermo-optic methods [45, 46].
Moreover, incorporating non-volatile PCMs as photonic devices enables optical memory functions and
in-memory computing, achieved by transmitting optical input through the programmed device. Opti-
cal memory in ring resonators has also been studied using Volterra series in microwave photonics [47].
The memory effect is modelled as a multi-dimensional impulse response in the time domain or Volterra
kernels in the frequency domain. By using the ring resonator as a differentiator, it is possible to induce
nonlinear mixing of multiple wavelengths to realize a frequency-dependent memory function. Yet an-
other key enabling technology, highlighted in Figure 1 for realizing synaptic-like non-linear functions in
photonic computing is the optoelectronic neuron.
In certain applications, such as deep learning inference, the trained synaptic weights may require
infrequent updates or none at all. Here, non-volatile analog memory, as in "in-memory" computing, is
advantageous. This can be achieved either optically or electronically using PCMs [48, 49, 50, 51]. By
using digital electronic drivers with photonics-compatible firmware, a real-time neural network can be
established. Such a neuron is a hybrid of well-modelled electronic non-linearities with negligibly
lossy optical systems. Photodetectors (PDs), modulators, and lasers are combined in building such a de-
vice with the PD generating electrical current in proportion to the incident optical power in a waveguide
[23, 52, 53, 54].
Likewise, a spiking function can be realized when photons are generated from a threshold-based semicon-
ductor laser [55]. Such optoelectronic spiking neural networks have been realized with very low energy
consumption of 4.8 fJ/bit operating at 10 Gbit/s [53]. However, such neurons require constant biasing
of the laser source which increases the overall power consumption for a photonic accelerator with many
neurons. Nonetheless, spiking neural networks present attractive energy efficiency metrics owing to the
spontaneous and mostly idle mode of operation in neuromorphic communication [56].
Figure 1 also plots two lines indicating the projected speed of operation for various technologies and sys-
tems. The reported compute efficiencies together with the theoretical projections in TOPS per mm² help
to classify the throughput contribution of the various technologies. Here, TOPS (tera-operations per
second) is normalized by the processor area as a figure-of-merit for performance. For the SiN devices, the
area of one MAC unit cell is 285µm × 354µm [57, 58]. This, when operating at 12 GHz with 4 input
vectors via WDM, corresponds to a compute density of 1.2 TOPS/mm². If SOI MRR devices with a
nominal bend radius of 5 µm and compact electrical power management are used instead, the area of the
MAC unit cell could be reduced to less than 30 µm × 30 µm, increasing the compute density to 420
TOPS/mm² per input channel [59, 60]. In-memory-computing photonic tensor cores show a predicted
compute density and compute efficiency of 880 TOPS/mm² and 5.1 TOPS/W for a 64×64 Xbar core at a
25 GHz clock speed
[61]. Compared with digital electronic accelerators (ASICs and GPUs), the photonic core shows
orders-of-magnitude improvements in both compute density and efficiency.
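As a rough sketch of this figure-of-merit (not a reproduction of the cited measurements), compute density can be estimated as operations per second divided by unit-cell area; the function below and its 2-ops-per-MAC (multiply plus accumulate) convention are assumptions.

```python
# Sketch: compute density (TOPS/mm^2) of a photonic MAC unit cell.
# Assumes 2 ops (multiply + accumulate) per MAC per clock cycle.

def compute_density_tops_per_mm2(clock_hz, wdm_channels, width_um, height_um):
    ops_per_second = 2 * clock_hz * wdm_channels       # 2 ops per MAC
    area_mm2 = (width_um * 1e-3) * (height_um * 1e-3)  # um -> mm
    return ops_per_second / area_mm2 / 1e12            # -> TOPS/mm^2

# SiN MAC cell: 285 um x 354 um, 12 GHz clock, 4 WDM input vectors
sin_density = compute_density_tops_per_mm2(12e9, 4, 285, 354)
print(f"SiN cell: {sin_density:.2f} TOPS/mm^2")    # ~0.95

# SOI MRR cell: 30 um x 30 um, same clock, single input channel
mrr_density = compute_density_tops_per_mm2(12e9, 1, 30, 30)
print(f"SOI MRR cell: {mrr_density:.1f} TOPS/mm^2")  # ~26.7
```

Under these assumptions the SiN cell lands near 1 TOPS/mm², close to the reported 1.2; the reported 420 TOPS/mm² per-channel MRR figure additionally depends on operating assumptions (clock rate, op counting, modulation format) not captured by this simple model.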
This study investigates and reviews the implementation of deep learning accelerators from a very gran-
Figure 2: Classification of Photonic accelerators.
ular design methodology perspective to a very broad application perspective. The rest of the paper is
organized as follows: In Section 2, a total of 3 broad branches of photonic accelerator classes are dis-
cussed. Next, Section 3 provides an overview of how Deep Learning (DL) computations work and what
role accelerators play in these processes. In Section 4, DL Photonic accelerators are classified into 7 dis-
tinct categories based on their design. Then, Section 5 further discusses methodology approaches aimed
at improving design efficiency. Finally, Section 6 concludes by pointing out possible research gaps that
remain unexplored and discussing the extracted key points.
2.1 Modalities
Optical Processing Units (OPUs) leverage photonic devices for machine learning applications, perform-
ing a broad spectrum of mathematical and logical tasks crucial for deep learning algorithms. Photonic
Integrated Circuits (PICs), the predominant form of OPUs, are engineered for efficiency in operations
such as matrix multiplications and convolutions, which are fundamental in deep learning models [62].
Optical processors implementing matrix-vector multiplications at Gb/s processing rates have been
demonstrated in the literature [63, 64, 65, 66, 67]. One example of such an OPU is LightOn [63], which
has been shown to operate at 50 TeraOPS/watt with input vector dimensions of 1 million × 2 million.
This OPU circumvents the limitations of the von Neumann architecture by reducing the computational
time from O(n²) to O(1), i.e., the computation time is independent of data size. OPUs that break
memory-dependent computation would allow direct single-chip implementation of larger datasets beyond
Random Access Memory (RAM) limits. This emerging field leverages photons
for energy-efficient matrix multiplications, capitalizing on their high speed and compatibility with the
semiconductor industry. Companies like Lightmatter, Lightelligence, Luminous, and LightOn [68] are
the new face of developing Photonic Neural Networks (PNNs) for low-power Multiply-and-accumulate
2.2 Operating Mechanisms
Figure 3: Analog Photonics ODE Solver. Reproduced without changes under terms of the CC-BY license [100]. 2016, Li et
al., published by Springer Nature.
strated by a diffractive Optical Neural Network (ONN) accelerator in which hidden layers have their
phases tuned with the help of extra electronic Digital to Analog Converters (DACs) and drivers [89].
This offloading can markedly enhance the overall throughput and efficiency of machine learning systems
[90, 91, 92, 93]. In [94], a hybrid optoelectronic CNN was designed that allows for more difficult clas-
sification tasks than the standard optical correlator [28]. These hybrid OPUs have demonstrated scala-
bility and the facilitation of complex computational acceleration within standard computing frameworks
[95, 96, 97].
cessors that work with discrete optical signals, enabling operations like binary logic and bit manipula-
tion. These processors are used in optical computing for specific tasks. Quantum Digital Optical Proces-
sors (QDOP) leverage discrete photonic qubits for quantum computation [111]. They enable quantum
gates and algorithms that work with digital quantum information, facilitating quantum-enhanced ma-
chine learning algorithms. Photonic quantum systems offer practical approaches for areas such as quan-
tum communication, quantum sensing, quantum computing, and simulation. A recent study shows the
potential applications of photonic quantum computers in optical processing units. Photonic Integrated
Circuits (PICs) have also been used for digital signal manipulation where the complex optical circuits al-
low for the precise control and manipulation of discrete photons, making them suitable for digital-like
processing tasks. Many mathematical operations have been solved using Binary Photonic Arithmetic
(BPA) where photonic accelerators perform binary arithmetic operations using discrete optical signals
[111]. These can be applied to various machine learning tasks, especially in binary neural networks and
binary-coded optimization problems. Digital photonic data transmission has also emerged for data
compression, multiplexing, and encoding in optical interconnects handling digital data between
processing units and memory components in high-performance computing clusters.
2.3 Applications
As a subgroup of ONNs, Photonic Deep Learning Accelerators (PDLAs) focus on leveraging photonic
technology to accelerate deep learning models. They utilize the speed and parallelism of photons to en-
hance various aspects of deep learning, including CNNs and Recurrent Neural Networks (RNNs) [99].
One recent innovation in this field is the development of integrated photonic devices that can perform
complex matrix operations crucial for neural network computations at much higher speeds than tradi-
tional electronic hardware. Researchers are also exploring the use of photonic circuits for faster and more
efficient training of deep neural networks.
Optical co-processors represent another vital development, acting in tandem with CPUs and GPUs. These
co-processors, particularly those dedicated to matrix multiplication tasks, have been integrated into the
hardware ecosystem to enhance machine learning throughput [111]. This integration is supported by ad-
vances in optical interconnects, which have improved bandwidth and reduced latency, critical for dis-
tributed machine learning systems. High-bandwidth optical interconnects are central to optical data trans-
mission accelerators, and recent advancements here have focused on increasing data rates, decreasing
power consumption, and achieving higher reliability. Technologies such as silicon photonics are paving
the way for scalable, energy-efficient data transmission [98]. Lastly, the area of Quantum-Enhanced Clas-
sical Machine Learning Accelerators (CMLA) is an intriguing new development within quantum photon-
ics. This approach seeks to enhance classical machine learning algorithms using quantum-inspired meth-
ods, with a promise of solving complex optimization problems more quickly than classical approaches
[111].
Also, within the context of deep learning, another paradigm in focused applications is the use of Quan-
tum Photonic Machine Learning Accelerators (QPMLAs). These accelerators aim to harness the power
of quantum mechanics for machine learning tasks. Recent innovations include the development of quan-
tum photonic processors capable of executing quantum algorithms that could significantly speed up cer-
tain machine learning tasks [99]. Further research in this field can offer the potential for exponential speedup
in solving optimization problems and enhancing pattern recognition tasks.
These subgroups represent cutting-edge research and development in the field of optical photonic accel-
erators for machine learning, with ongoing efforts to harness the potential of photonic technology and
quantum mechanics for faster, more efficient, and scalable machine learning solutions. The latest innova-
tions are driving progress in areas such as deep learning, quantum-enhanced algorithms, and high-speed
data handling, pushing the boundaries of what’s possible in the realm of machine learning hardware ac-
celeration.
The introduction of GPUs, ASICs, and neuromorphic chips like IBM's TrueNorth [113] and Intel's Loihi
[114] has dramatically improved energy efficiency and speed. However, hardware accelerators for neural
networks face two primary challenges: (i) maximizing the parallelism of neural networks requires scaling
the accelerators, and (ii) minimizing energy consumption necessitates optimization of data movement
[115]. This has led to the emergence of analog neural networks characterized by rapid processing,
WDM-assisted parallelism, and high energy efficiency. Figure 4 illustrates the WDM concept widely
employed in various photonic accelerator designs.

Figure 4: Illustration of Wavelength Division Multiplexing (WDM) [112].
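The WDM idea (one value per wavelength channel, weighted in parallel, and summed at a photodetector) can be sketched with a few illustrative numbers; the values below are assumptions, not measured data.

```python
# Toy sketch of WDM-parallel weighted addition (illustrative values only).
# Each wavelength channel carries one input; an MRR-like filter applies a
# weight per channel; a photodiode sums all channel powers into one output.

inputs = [0.2, 0.5, 0.1, 0.7]   # power carried on channels lambda_1..lambda_4
weights = [0.9, 0.3, 0.6, 0.4]  # per-wavelength transmission (tuned filters)

# parallel weighting: one multiplication per wavelength, all at once
weighted = [w * x for w, x in zip(weights, inputs)]

# photodetection: the photodiode sums all wavelengths into one photocurrent
photocurrent = sum(weighted)

print(f"photocurrent ~ {photocurrent:.3f}")  # equals dot(weights, inputs)
```

The single detector output is exactly the dot product of the weight and input vectors, which is why WDM maps so naturally onto the matrix-vector multiplications of neural networks.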
The compact size of photonic integrated silicon platforms and their seamless integration with electronic
systems make them highly desirable. Implementations of silicon photonic accelerators utilize photonic
components to significantly reduce energy consumption, especially during matrix–vector multiplications
in deep-learning models and CNNs. Unlike electronic logic, passive silicon photonic structures consume
very little power, as light merely propagates through them [116].
a′ ← a + (w × x)
Subsequently, for a more NN-specific use of MAC operations, one can think of different layers of NNs
that contain different nodes within each layer. Nodes j (or neurons) receive signals from a large
number of other nodes i for a set of input variables xi and output variables yj . A weighted sum is
calculated based on the inputs, yj = Σi wij xi . In the next layer, yj is passed through a non-linear
function:

x′j = f { Σi wij xi }
Any non-linear operation can be expressed in the form of f {x} (e.g., ReLUs, pooling operations, etc.).
Weighted sums can be expressed as ai = ai−1 + wi xi for i = 1, . . . , M . It takes M parallel MAC operations
Figure 5: Typical signal pathway for a modern AI chip. Energy is consumed primarily by moving data in systems widely
used in literature. The passing of information occurs between MAC processors performing a + (wx), memory caches, and
non-linear operations f {x} [117].
to operate each neuron. The number of MAC operations per time step required for a neural network of
size N is M × N , or M operations for each node in the network. For a fully interconnected network of
N nodes (the M = N case), the number of MAC operations required per time step ∆t (or per
characteristic time constant τ in analog hardware) is N². Energy is also consumed by the non-linear
function f {x}; however, this operation scales as O(N) rather than O(N²) and is thus not very costly.
The MAC becomes the dominant hardware bottleneck as the network size N grows [117].
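The operation-count argument can be sketched directly; the layer size N = 64 and the toy weights below are arbitrary choices for illustration, with ReLU standing in for f {x}.

```python
# Sketch of the operation-count argument: a fully interconnected layer of
# N nodes needs N^2 MACs per time step, while the nonlinearity f{x}
# costs only N evaluations.

def relu(v):
    return max(0.0, v)

N = 64
x = [0.01 * i for i in range(N)]                             # toy inputs
w = [[0.001 * (i + j) for j in range(N)] for i in range(N)]  # toy weights

macs = 0
y = []
for j in range(N):
    acc = 0.0
    for i in range(N):
        acc += w[j][i] * x[i]  # one MAC: multiply and accumulate
        macs += 1
    y.append(relu(acc))        # one nonlinear evaluation per node

print(macs)    # N^2 = 4096 MAC operations
print(len(y))  # only N = 64 nonlinear evaluations
```

Doubling N quadruples the MAC count but only doubles the nonlinearity count, which is exactly why the MAC array dominates the hardware budget as networks grow.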
Essentially, computations in the photonic domain are passive; therefore, even in the case of O(N²)
fixed-point operations, the energy cost scales efficiently, at roughly O(N). Ultimately, it is only the
periphery of modulation and detection that can create bottlenecks in photonic matrix multiplication [117].
4.1 WDM-based Accelerators
Figure 6: An MRR bank-based Broadcast-and-Weight (BW) protocol. A bundle of wavelengths is propagated through an
MRR bank as it enters. Through the tuning of corresponding rings, each bank weights each wavelength. Photodiodes
create photocurrents by adding all wavelengths together. The photocurrents modulate light waves of wavelength λm .
Multiplexing of all laser beams is used to broadcast the beams to the next layer. Reproduced, with permission [118], from
Mehrabian et al. 2018 © IEEE.
Ultimately, with this design, PCNNA addresses a significant drawback associated with using MRRs. In
traditional setups, the number of microrings required to perform multiplication in the MAC operation
scales as Ni ×Ni+1 , where N represents the number of neurons in a layer, and i represents the layer num-
ber. However, in PCNNA’s innovative design, only kernel values over the input feature map of a specific
size N are utilized in the convolution operation. This careful selection serves the purpose of controlling
the number of MRR banks employed. As a consequence, only N kernel MRRs are necessary for weight-
ing, thereby requiring just N MRR banks. By employing this selective approach, PCNNA achieves sig-
nificant savings in terms of wavelength representations and MRRs needed for demultiplexing incoming
wavelengths in the subsequent layer. Figure 7 provides an overview of this selective use of MRR weight
banks for kernel convolutions.
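The savings can be illustrated with a quick count; the layer and kernel sizes below are illustrative choices, not figures from the PCNNA paper.

```python
# Sketch: MRR count for a fully connected mapping between layers i and i+1
# versus PCNNA-style kernel-only weighting (sizes assumed for illustration).

def fully_connected_mrrs(n_i, n_i_plus_1):
    # one ring per weight between every pair of neurons in adjacent layers
    return n_i * n_i_plus_1

def kernel_only_mrrs(kernel_size):
    # only the kernel values over the receptive field need weighting rings
    return kernel_size * kernel_size

n_i, n_next = 256, 256  # neurons per layer (hypothetical)
k = 3                   # 3x3 convolution kernel

print(fully_connected_mrrs(n_i, n_next))  # 65536 rings
print(kernel_only_mrrs(k))                # 9 rings per weight bank
```

The gap between the two counts grows quadratically with layer width, which is the core of PCNNA's hardware saving.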
As part of PCNNA’s concept of reduced parameter count, sparsity considerations became the guiding
principle for photonic accelerator development. Therefore, [119] proposed Albireo, which essentially fo-
cuses on introducing sparsity convolution based on WDM. Photonic Locally Connected Units (PLCUs)
are the main component of the design, for which a series of photonic computation schemes are presented
to leverage multicast data distribution and shared parameters inherent to CNNs. A WDM dot product
processing is also used in these proposed PLCUs and through passively overlapping receptive fields using
WDM, Albireo leverages shared parameters to significantly increase computation parallelism.
To extend the analogy of WDM-based sparsity designs to even deeper CNNs beyond Albireo, DNNARA-
E was introduced by Peng et al. [120]. DNNARA-E is a hybrid optoelectronic computing architecture
and Residue Number System (RNS) accelerator. The authors proposed a novel approach by combining
residue adders and optical multipliers within a matrix-vector multiplication architecture. This innovative
method not only reduced optical critical paths but also minimized power consumption, making it highly
suitable for intricate and deeper DL networks like ResNet [121], as further discussed in Section 6. Facil-
itating sparsity in these deeper networks is achieved through high-level parallelism at the system level,
a capability inherent in WDM for high-speed operations. Consequently, RNS seamlessly transitions be-
tween electrical and optical modes, utilizing one-hot encoding.
With the RNS, a number is represented by its residues with respect to pairwise coprime moduli, and
because residue arithmetic is carry-free between digits, high parallelism can be achieved. Results can be
computed separately for each residue digit and assembled at the end, so addition is represented by
mappings in a residue arithmetic system. Thus, every modulo digit has a single-bit output without
repetition, enabling computation-in-the-network using one-hot-encoded photonic routing. This process
ensures that convolution takes place using fewer parameters.
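A minimal sketch of the residue arithmetic underlying this scheme (the moduli are chosen for illustration; the photonic one-hot routing itself is not modelled):

```python
# Sketch of Residue Number System (RNS) arithmetic with illustrative,
# pairwise coprime moduli. Each digit is computed independently, which is
# what enables digit-parallel, one-hot-encoded photonic routing.

MODULI = (3, 5, 7)  # pairwise coprime; dynamic range = 3*5*7 = 105

def to_rns(x):
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    # addition proceeds per digit with no carries between digits
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

def from_rns(digits):
    # brute-force Chinese Remainder reconstruction (fine for a sketch)
    for x in range(3 * 5 * 7):
        if to_rns(x) == tuple(digits):
            return x

a, b = 23, 49
s = rns_add(to_rns(a), to_rns(b))
print(from_rns(s))  # (23 + 49) mod 105 = 72
```

Because each modulo digit depends only on its own residues, the three additions here could run on three independent optical paths with no carry propagation between them.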
Nevertheless, even with sparsity designs aimed at deeper models, cross-talk between channels and within
4.2 Memristor-based Accelerators
Figure 7: A 16×16 input feature map with 5 kernels of 3×3, without filtering the input feature map and with the input
feature map filtered to only pass through the receptive field. As can be seen, taking advantage of narrow receptive fields
results in fewer MRRs. Reproduced, with permission [118], from Mehrabian et al. 2018 © IEEE.
channels still presented an issue for deeper networks. Therefore, [122] presented an MRR weight bank-
based accelerator that demonstrates parallel cascading in the WDM system. The aforementioned design
utilized nanoscale etching to embed neuron nodes into silicon substrates in order to implement this neu-
romorphic network physically. When the input optical signal is captured, the MRR weight bank modu-
lates the output signal of the laser near its threshold. In addition, feedback is used to achieve non-linearity
in the system. WDM is then achieved by using MRRs at the nodes, each addressing a specific wavelength
of light. With a paired laser channel and a probing Source Meter (SM), each MRR is powered by an
electrical source, and its resonance is thermally tuned after calibration. An illustrative overview of this
process is
showcased in Figure 8.
Figure 8: Schematic representation of how an on-chip MRR weight bank can be used in an experimental setup to implement
a photonic accelerator. The components include Distributed Feedback Lasers (DFBs), a Source Meter (SM), and a Bal-
anced Photo-detector (BPD). Reproduced without changes under terms of the CC-BY license [122]. 2023, Zhang et al.,
published on arXiv.
compared to the FPGA-based Caffeine accelerator [124] and the memristor crossbar-based ISAAC accelerator
[125], ConvLight demonstrated exceptional performance, boasting 250× and 28× higher CE, respectively.
This remarkable efficiency can be attributed to ConvLight’s fully analog nature, in contrast to the slower
DACs and ADCs used in other approaches [126]. These comparisons were based on training and infer-
ence tasks executed on four versions of the VGG [127] model applied to the MNIST dataset [128].
Nevertheless, due to ConvLight’s focus on inference with offline training, attention shifted toward accel-
erating the online training of deep neural networks. This challenge was addressed by Dang et al. [129],
who introduced a novel photonics-based backpropagation accelerator named BPLight. The aim was to
enhance performance for large-size deep learning online training. Additionally, the authors presented
a comprehensive design for a CNN that incorporates the BPLight accelerator. This CNN architecture
is tailored for end-to-end training and inference, utilizing a combination of photonics components and
memristors. With a configurable memristor-integrated photonic CNN accelerator design, the proposed
BPLight-CNN stands out as an analog and scalable silicon photonic-based backpropagation accelerator.
In the overall CNN design, BPLight introduced a reversible convolution approach for each layer. Lever-
aging energy-efficient Semiconductor Optical Amplifiers (SOAs), optical comparators, and a fully ana-
log feature extraction method, BPLight demonstrates superior computational and energy efficiency com-
pared to conventional GPU implementations. However, it is essential to acknowledge that insertion losses
in photonic components could potentially impact accuracy, especially in deeper stages of deep learning.
Addressing these insertion losses represents a crucial opportunity to enhance the accelerator’s applicabil-
ity and scalability for more complex CNN models.
Following the introduction of the BPLight and ConvLight accelerators in the domain, the use of memris-
tors was limited to memristor-based crossbar array memory. This constraint aimed to enable the adop-
tion of a fully optical photonic accelerator for comprehensive online training and inference. This concept
materialized with the introduction of LiteCON [130], an all-photonic neuromorphic accelerator designed
for efficient training and inference of deep learning models. LiteCON employs silicon microdisk-based
convolution, memristor-based memory, and Dense WDM (DWDM) to achieve its functionality.
LiteCON comprises four key components: a feature extractor, a feature classifier, a backpropagation ac-
celerator, and a weight updater. These components enable complete analog end-to-end CNN training
and inference. In LiteCON, convolution layers are constructed using silicon microdisks, Rectified Linear
Unit (ReLU) [131] layers are utilized in feature extractors and classifiers, and pooling layers are incorpo-
rated in the construction of these extractors and classifiers. The Feedforward CNN accelerators consist
of both Feature Extractors (FEs) and Feature Classifiers (FCs). Backpropagation accelerators are con-
structed using microdisks, splitters, and multiplexers, while LiteCON’s weight update unit is comprised
of memristors. This analog configuration categorizes LiteCON as neuromorphic, mimicking the behavior
4.3 FPGA Photonic Accelerators
Figure 9: Schematic diagram of a Convolutional layer in ConvLight. Readapted, with permission [123], from Dang et al.
2017 © IEEE.
of a neural network, specifically a CNN. LiteCON operates entirely in the analog domain, incorporat-
ing silicon photonic microdisk-based convolution, memristive memory, high-speed photonic waveguides,
and analog amplifiers. The efficiency of LiteCON is enhanced by the combination of its compact foot-
print, low-power characteristics, and the ultrafast nature of silicon microdisks. For simplicity of explana-
tion, an input of 9 pixels can be assumed when describing the photonic convolution analogy in LiteCON.
Therefore, in the memristor crossbar, 9 pixels are stored as analog inputs as In11 , In12 , . . . In33 . For sim-
plification, another assumption is that there are 9 weights or a 3 × 3 filter. There exists w11 , w12 , . . . w33
for the filter matrix. Using a multiplexed waveguide, the weights are modulated onto 9 wavelength chan-
nels. By using a microdisk, each input pixel Inxy is converted into a weight-carrying channel. This is
done by modulating In11 into the channel carrying weight w11 . By doing this, the microdisk performs
amplitude modulation, or multiplication, so that the channel now contains w11 × In11 . Likewise, other
channels carry w12 × In12 , . . . w33 × In33 . As a result, a photodiode captures all of these photonic signals
in the sum total, i.e., w11 × In11 + w12 × In12 + · · · w33 × In33 . Briefly, this is how the accelerator works
using photonic convolution.
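Numerically, the 9-pixel analogy reduces to an element-wise multiply followed by a photodiode sum; a minimal sketch with illustrative analog values (the pixel and weight numbers are assumptions, not from the LiteCON paper):

```python
# Sketch of LiteCON's photonic convolution analogy for a 3x3 input and a
# 3x3 filter: each microdisk modulates one input pixel onto a
# weight-carrying wavelength channel (multiplication), and a photodiode
# sums all channels (accumulation).

inputs = [[0.1, 0.2, 0.3],
          [0.4, 0.5, 0.6],
          [0.7, 0.8, 0.9]]   # In11 .. In33 (illustrative analog values)

weights = [[1.0, 0.5, 0.2],
           [0.3, 0.8, 0.1],
           [0.6, 0.4, 0.9]]  # w11 .. w33 (illustrative filter)

# microdisk amplitude modulation: channel xy now carries w_xy * In_xy
channels = [w * p for wrow, prow in zip(weights, inputs)
            for w, p in zip(wrow, prow)]

# photodiode: captures the sum total of all weighted channels
output = sum(channels)
print(f"{output:.2f}")
```

The single photodiode reading is the full 3 × 3 convolution result for this window, computed in one optical pass.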
4.4 Scalability-focused Accelerators

Figure 10: Diagram of a homodyne optical neural network with a single layer. (a) A neural network composed of K layers, where each layer consists of matrix-vector multiplications (gray) and element-wise non-linearities (red). (b) Implementing a single layer. The matrix multiplication process involves combining input and weight signals and performing balanced homodyne detection (inset) between each pair of signal and weight. Reproduced without changes under terms of the CC-BY license [134]. 2019, Hamerly et al., published by American Physical Society.

The core processing element block of OASys incorporates optics. This compact system design incorporates free-space bulk optics, SLMs, FPGAs, and relevant electronics. The optical setup consists of a Fourier-based convolution and multiplication system that complements the conventional FPGA processor by leveraging highly parallel optical processing. Therefore, a self-contained system like this can be used for in situ computations without having to communicate with a host.
Figure 11: An overview of the SONIC architecture, with CONV layer-specific VDUs and FC layer-specific VDUs. Reproduced, with permission [136], from Sunny et al. 2022 © IEEE.

The data input/output scales as O(mk) + O(nk) + O(mn), while the number of MACs scales as O(mnk). A performance range of ∼10 fJ/MAC can therefore be feasible for moderately large problems in CNNs (m, n, k ≥ 100) at moderate I/O energies, displaying the scalable nature of such an accelerator design.
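The favorable compute-to-I/O ratio behind this estimate can be checked directly: for an (m × k)·(k × n) matrix product, the MAC count grows as mnk while the data I/O grows as mk + nk + mn, so each transferred value is reused more as the problem grows. A quick sketch:

```python
def io_amortization(m, n, k):
    """MACs per datum transferred for an (m x k) @ (k x n) matrix product."""
    macs = m * n * k                 # multiply-accumulate operations
    io = m * k + n * k + m * n       # values moved in and out
    return macs / io

# For m = n = k = 100, each transferred value supports ~33 MACs,
# so per-MAC I/O energy is amortized away as the problem grows
assert round(io_amortization(100, 100, 100), 1) == 33.3
```

This is why the fJ/MAC figure improves for larger matrices even when the per-value I/O energy stays fixed.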
The photonic accelerator benchmarking results primarily focused on popular classification models designed for scalable image classification. These models were compared against common benchmarks, including the ImageNet dataset. To address scalability concerns in real-world deployment scenarios, [135] introduced a universal optical vector convolutional accelerator capable of operating beyond 10 tera operations per second (TOPS). Specifically tailored for facial image recognition, this design is based on the simultaneous interleaving of temporal, wavelength, and spatial dimensions, made possible by an integrated microcomb source. This innovative approach is scalable and adaptable to much more intricate networks, catering to demanding applications such as unmanned vehicles and real-time video recognition.
With optical convolutions, the accelerator in [135] is capable of processing and compressing large amounts of data. The interleaving of wavelength, temporal, and spatial dimensions is enabled by Kerr frequency combs, or microcombs. Using the same hardware, the convolution accelerator can act both as a convolutional accelerator front-end and, combined with fully connected neurons, as a complete CNN. As the accelerator scheme is a stand-alone, universal system, it can be used with both electrical and optical interfaces. Consequently, it is capable of serving as a universal high-bandwidth data compression front end for neuromorphic hardware, either optical or electronic, enabling massive-data, ultrahigh-bandwidth machine learning.
Another entry under this subcategory, dubbed SONIC [136], used photonic components to increase scalability by supporting sparse NN layers. SONIC was designed to exploit sparsity in order to accelerate Sparse Neural Networks (SpNNs) in an energy-efficient and low-latency manner.
In the photonic domain, SONIC's core computes multiplications and accumulations for fully-connected and convolution layers using Vector-dot-product Units (VDUs). It also integrates various peripheral electronic modules to connect with the main memory, map image data to photonic VDUs, and post-process the photonic core's outputs. This post-processing involves applying non-linearities, accumulating partial sums, and performing other operations. Different wavelengths are generated by Vertical Cavity Surface Emitting Lasers (VCSELs) within VDUs using DAC arrays, which convert buffered signals into analog tuning signals. For photonic summation, analog signals are converted to digital values via ADC arrays before being processed and buffered for further use. With such a modular, vector-granularity-aware structure, SONIC is designed for high throughput and scalability. Furthermore, its operation is optimized through sparsity-aware data compression and dataflow techniques. Figure 11 provides an overview of the SONIC accelerator architecture.
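SONIC's sparsity motivation can be illustrated in miniature: storing only the nonzero weights as (index, value) pairs lets a dot product skip zero multiplications entirely. A toy sketch (not SONIC's actual compression format):

```python
def compress(vec):
    """Keep only nonzero entries as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(vec) if v != 0.0]

def sparse_dot(compressed_w, x):
    """Dot product touching only the nonzero weights."""
    return sum(v * x[i] for i, v in compressed_w)

w = [0.0, 1.5, 0.0, 0.0, -2.0, 0.0]   # a sparse weight vector
x = [3.0, 2.0, 1.0, 5.0, 4.0, 6.0]    # a dense activation vector
cw = compress(w)
assert len(cw) == 2                    # only two multiplications remain
assert sparse_dot(cw, x) == 1.5 * 2.0 + (-2.0) * 4.0
```

In hardware, the analogous saving is that a VDU never has to dedicate wavelengths or conversion energy to zero-valued weights.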
Next, to further explore real-time deployment, [116] presented two novel Bayesian learning schemes, a regularized and a fully Bayesian one, for classifying handwritten digits in the MNIST [128] dataset using 512 phase shifters.
4.5 All Optical Neural Network Architecture
Then, in conjunction with pre-characterization stages that provide passive offsets, the new schemes sig-
nificantly reduce the processing power required by the PIC without compromising classification accuracy.
Therefore, on top of reducing energy, the full Bayesian scheme provides information about phase shifter
sensitivity. As a result, the phase actuators may be partially deactivated and the driving system is sig-
nificantly simplified.
The phase tuning process in [116] is based on an offline training scheme that takes into account uncer-
tainty. The novel feature of this study is its Bayesian approach to photonic accelerators, where, instead
of defining optimum phase shifter values through training, a parametric Probability Distribution Func-
tion (PDF) is defined for each phase shifter, which is optimized by updating variational parameters at
every iteration. Aside from indicating the correct values for phase shifters, this Bayesian procedure also
quantifies their robustness to phase deviation. Using this data, novel algorithms can be developed for
adjusting and controlling photonic accelerators, which will increase their noise robustness and scalability.
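The idea of a per-phase-shifter distribution can be illustrated with a toy example: treating each shifter's learned posterior as a Gaussian (mean, std), a broad posterior marks a shifter that is robust to phase deviation and could be partially deactivated, while a narrow one demands precise driving. All numbers below are hypothetical:

```python
import numpy as np

# Hypothetical learned posteriors: one (mean, std) per phase shifter
means = np.array([0.10, 1.57, 0.75, 2.30])
stds = np.array([0.01, 0.40, 0.05, 0.60])

# A broad posterior tolerates large phase deviation: rank shifters by robustness
robust_order = np.argsort(-stds)
assert list(robust_order) == [3, 1, 2, 0]

# Shifters with broad posteriors could be partially deactivated,
# simplifying the driving system, as described for the fully Bayesian scheme
deactivate = stds > 0.3
assert int(deactivate.sum()) == 2
```

The actual schemes in [116] optimize variational parameters of these PDFs during training; the sketch only shows how the resulting stds translate into a robustness ranking.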
4.7 Conventional NNs Single Task/Operation Accelerators
5.1 Architectural Improvements
Demonstrations have measured extinction ratios (ERs) of single MRRs up to 25 dB. Hence, it is more efficient to design and develop solutions using both MZIs and MRRs together, as introduced by [142].
While process technology improvement can mitigate conversion overhead, it remains a fundamental limi-
tation affecting the speed and efficiency of designs in this field. Additionally, incorporating more optical
operations might be beneficial, but it can lead to losses in optical devices, thereby reducing SNR and bit
precision. Similarly, MZIs have limited bandwidth, causing latency in weight programming. In this sub-
section, the exploration focuses on photonic accelerators that utilize a combination of photonic compo-
nents and concepts to maximize efficiency during design [142].
One notable entry in this design direction, the IPCNN accelerator presented in [143], keeps efficiency at its core by integrating optical delay lines into the accelerator architecture. The design focuses on efficient convolution layers realized through photonic convolution. As a result, the electronic circuits that manipulate data prior to matrix multiplication are eliminated, reducing power consumption and latency, while fewer E/O interfaces lower energy consumption.
In IPCNN, the electronic circuitry responsible for data patching and allocation is replaced by optical de-
lay lines functioning as data buffers. As a result, the data manipulation process becomes nearly power-
free and operates at the speed of light. IPCNN utilizes WDM to combine multiple delay lines into one,
addressing the footprint issue associated with individual delay lines and making chip fabrication more
practical. With WDM implementation, the number of optical delay lines is significantly reduced, en-
abling the feasibility of current integration technologies. The performance of the IPCNN system is fur-
ther evaluated considering practical fabrication challenges such as noise, imbalance, and insertion loss,
as well as criteria including prediction accuracy, maximal integration scale, computing speed, and energy
efficiency.
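The delay-line buffering idea can be shown in a toy form: reading a serial pixel stream through taps of increasing delay exposes each sliding convolution window with no electronic patching logic. A purely illustrative sketch (not IPCNN's optical implementation):

```python
def delay_taps(stream, n_taps):
    """Outputs of n_taps delay lines: tap k delays the stream by k samples.
    Reading all taps at time t yields the window ending at sample t."""
    windows = []
    for t in range(n_taps - 1, len(stream)):
        windows.append([stream[t - k] for k in range(n_taps - 1, -1, -1)])
    return windows

# A serial stream of (hypothetical) pixel values
stream = [4.0, 1.0, 0.0, 2.0, 5.0]
windows = delay_taps(stream, 3)
# Every length-3 sliding window appears, ready for a 3-tap convolution
assert windows == [[4.0, 1.0, 0.0], [1.0, 0.0, 2.0], [0.0, 2.0, 5.0]]
```

In IPCNN, WDM lets several such tap sets share one physical delay line, which is what keeps the footprint practical.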
The performance of IPCNN is significantly influenced by two critical factors. The SNR is primarily de-
termined by photodetection noise, where lower noise levels translate to higher efficiency, larger hardware
scales, and reduced power consumption. To mitigate this noise, PDs and Trans-Impedance Amplifiers
(TIA) must be constrained in bandwidth. Specifically, if the modulation rate is fm, the PDs need to have a bandwidth of fm/2. Additionally, insertion losses can occur due to laser inputs, modulators, and delay lines. The use of advanced lithium niobate modulators can help minimize these losses; employing low-loss delay lines together with integrated lithium niobate modulators further reduces insertion loss, enhancing overall system efficiency.
Efficiency in neural network accelerators has also been explored through the paradigm of cross-layer network acceleration. This approach, proposed in [144] as CrossLight, utilizes silicon photonics as an NN accelerator and optimizes hardware-software designs holistically, considering multiple layers in the design stack. The CrossLight architecture achieves this through device-level engineering, allowing for greater tolerance to process variations; circuit-level tuning, minimizing thermal crosstalk; and architecture-level engineering, enhancing resolution, energy efficiency, and throughput. The authors introduced an enhanced photonic device design fabricated to improve resilience to manufacturing process variations, together with a high-speed device tuning circuit that simultaneously supports large thermally-induced resonance shifts. CrossLight also improves wavelength reuse and implements matrix decomposition at the architecture level, enhancing the achievable weight resolution, energy efficiency, and throughput.
Furthermore, [145] introduced TRON, the first silicon photonic hardware accelerator for Vision Transformers (ViTs), designed to support the latest advancements in deep learning models, particularly transformer-based networks. TRON utilizes non-coherent silicon photonics to accelerate transformer neural networks.
The TRON architecture serves as a fundamental framework for inference in transformer models. At its core, the TRON accelerator comprises Feed-forward (FF) and Multi-head Attention (MHA) units, allowing for the reuse of encoder and decoder resources.
5.2 Algorithmic Designs
Figure 12: Overview of the TRON accelerator architecture. Reproduced with permission [145], 2023, Afifi et al., published by Association for Computing Machinery.
Integrated Electronic Control Units (ECUs) facilitate interaction with the main memory, buffer interme-
diate results, and map matrices to the photonic architecture. Additionally, software optimization tech-
niques can be applied to further reduce the memory footprint of the transformer, leading to enhanced
performance and efficiency. Figure 12 illustrates the TRON accelerator architecture.
Figure 13: Pixel NN Accelerator: (a) A simple STR configuration using bitwise multiplication and addition. (b) An
OMAC unit that performs multiplication optically while addition and shifting are performed electrically. (c) Optical ac-
cumulation with extended OMAC. Reproduced, with permission [148], from Shiflett et al. 2020 © IEEE.
In this design, multiplication is performed in the optical domain, which significantly reduces energy use. Additionally, instead of preloading the filter weights into the MRRs, photonics can also be used to send the weight information to the OMACs on a specific channel. The only active component is the MRR, so scaling up means driving the optical signal with more intensity. Figure 13 provides an overview of the process flow in the proposed architecture design.
Several accelerators discussed thus far rely on ripple-carry adders and SRAMs, both of which severely limit the frequency and inference throughput of the accelerator, primarily due to the adder's long critical path and the SRAM's long access latency.
To address this problem, [149] proposed LightBulb, a photonic non-volatile memory (NVM)-based accelerator that processes binarized CNNs using photonic XNOR gates and popcount units. With LightBulb's photonic XNOR and popcount units, inferences of binarized CNNs are processed electro-optically.
To replace floating-point MACs with XNORs and popcounts, LightBulb first binarizes the weights and activations of a CNN into linear combinations of {−1, +1} values. The authors also propose XNOR gates based on photonic microdisks and an XNOR-based ADC built with PCM-based photonics. A photonic XNOR gate, as well as the ADC, can operate at 50 GHz. LightBulb is further equipped with photonic racetrack memory, which serves as input and output registers to support its high-frequency operation. The study implements, evaluates, and compares LightBulb against state-of-the-art GPU, FPGA, ASIC, ReRAM, and photonic CNN accelerators, demonstrating its efficiency.
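The MAC-to-XNOR/popcount reduction used by binarized CNNs works because, for vectors over {−1, +1}, the dot product equals 2·(number of matching positions) − n, and matches are exactly the popcount of a bitwise XNOR. A small sketch of this identity (illustrative, not LightBulb's hardware):

```python
def binarize(v):
    """Map real values to {-1, +1} by sign, as in binarized CNNs."""
    return [1 if x >= 0 else -1 for x in v]

def xnor_popcount_dot(a, b):
    """Dot product of two {-1,+1} vectors via match counting:
    encoding -1 as bit 0 and +1 as bit 1, matches = popcount(a XNOR b)."""
    matches = sum(1 for ai, bi in zip(a, b) if ai == bi)
    return 2 * matches - len(a)

w = binarize([0.3, -1.2, 0.8, -0.1])
x = binarize([1.0, 2.0, -0.5, -0.9])
ref = sum(a * b for a, b in zip(w, x))  # ordinary MAC over {-1,+1}
assert xnor_popcount_dot(w, x) == ref
```

Each floating-point multiply-accumulate thus collapses to a 1-bit gate plus a counter, which is what makes a 50 GHz photonic XNOR gate so effective here.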
Similar to LightBulb, with a focus on DNN acceleration efficiency and latency, the design by [150] combined WDM with a Residue Number System (RNS) to present a hybrid opto-electronic computing architecture for accelerating DNNs. By reducing the area of the accelerator, WDM enables a high level of parallelism while reducing the number of optical components. RNS also yields optical components with short optical critical paths, which reduces optical losses as well as the need for high laser power. A key feature of the RNS compute modules is their one-hot encoding, which facilitates fast switching between the electrical and optical domains.
As a result of its high parallelism, RNS is well suited for CNNs and DNNs, particularly those dominated by MACs. For this reason, the authors try to avoid repeatedly converting binary numbers into residue numbers. Activation functions like the sigmoid and hyperbolic tangent are difficult to implement directly with RNS; as a result, σ and tanh are treated as Taylor series and implemented as polynomials.
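The per-modulus parallelism that makes RNS attractive for MACs can be sketched as follows: with pairwise coprime moduli, each residue channel performs its multiply-accumulate independently, and the final accumulator agrees with the residues of the true result. An illustrative sketch with arbitrary small moduli (not the moduli used in [150]):

```python
# Residue Number System (RNS) sketch with pairwise coprime moduli
MODULI = (5, 7, 9)  # dynamic range = 5 * 7 * 9 = 315

def to_rns(x):
    """Represent an integer by its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)

def rns_mac(acc, a, b):
    """Multiply-accumulate performed independently (i.e., in parallel) per modulus."""
    return tuple((r + a_i * b_i) % m
                 for r, a_i, b_i, m in zip(acc, to_rns(a), to_rns(b), MODULI))

acc = to_rns(0)
for a, b in [(3, 4), (6, 7), (2, 9)]:
    acc = rns_mac(acc, a, b)

# 3*4 + 6*7 + 2*9 = 72; the RNS accumulator matches its residues exactly
assert acc == to_rns(72)
```

Because each channel only ever handles small residues, its optical critical path stays short, which is the efficiency argument made above.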
Architecture Total Network Parameters (Million) FLOPS (Billion)
AlexNet [158] 62.38 1.50
VGG-16 138.42 19.60
Inception-v3 [159] 24.00 6.00
ResNet-152 60.30 11.00
MobileNet-v2 [160] 6.90 1.17
ShuffleNet-v1 (1x) [161] 1.87 0.14
DenseNet-121 [162] 7.98 5.69
EfficientNet-B1 [163] 7.80 0.70
Vision Transformer (Base/16) [164] 86.60 17.60
Table 1: Total number of parameters and FLOPS in the most commonly used classification deep learning models in ma-
chine learning literature.
6.1 Scalability
Inherently, photonic accelerators must be able to support the parameter capacity of widely used DL models in order to replace electronic accelerators. As shown in Table 1, ResNet-152, a popular and widely used DL classification architecture by Microsoft, has already surpassed 60 million parameters. Therefore, one direct and effective solution is to manufacture larger-scale photonic accelerators. Consequently, as presented in Section 4, partially or fully optical accelerators promise massive parallelism by employing WDM, which has been adopted by a majority of photonic accelerators in the literature, and possibly Mode Division Multiplexing (MDM), which is yet to be realized or investigated in this domain [157]. Ultimately, the more scalable and efficient a photonic accelerator architecture is, the more capable it is of supporting sophisticated DL neural networks.
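As a back-of-the-envelope check of this weight-capacity requirement, the raw storage implied by the parameter counts in Table 1 at 32-bit precision is simply 4 bytes per parameter:

```python
# Rough weight-storage footprint (FP32, 4 bytes per parameter)
# for a few models from Table 1
params_millions = {"AlexNet": 62.38, "ResNet-152": 60.30, "ViT-Base/16": 86.60}

footprint_mb = {name: p * 1e6 * 4 / 1e6 for name, p in params_millions.items()}
assert round(footprint_mb["ResNet-152"]) == 241  # ~241 MB of weights alone
```

Any photonic accelerator aspiring to replace an electronic one must therefore stream or store hundreds of megabytes of weights, which motivates the multiplexing strategies discussed here.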
For instance, a large-scale silicon photonic MRR array requires a seamless control technique, which can be achieved using integrated photoconductive heaters without the need for additional components, sophisticated tuning algorithms, or additional electrical interfaces.

6.2 Future Perspectives

Integrated with silicon photonics, lithium niobate and barium titanate electro-optical modulators provide high-speed phase modulation and low operating voltage, making them extremely attractive for photonic accelerators. Taking into account the high thermal tuning costs associated with phase shifters in MZI mesh schemes, scaling to a larger matrix can be problematic; the accumulation of thermal energy across thousands of phase shifter units makes photonic accelerators less competitive [157]. In this review of photonic accelerator methodologies, the analysis weighed positives and negatives to address efficiency and scalability concerns. These aspects emerge as key areas requiring further research.
Despite the considerable advantages that photonic DL accelerators offer over their electronic counter-
parts, several challenges persist. One such challenge is the implementation of caching memory subsys-
tems for neural networks, which becomes intricate when scaling up to handle real workloads generat-
ing substantial intermediate data. To execute large-scale neural networks, electronic memories such as
SRAM and DRAM can be integrated with optical video memory modules. The development of cross-
layer co-design tools and the establishment of simulation methodologies are also crucial. The design space
exploration of nanophotonic accelerators involves detailed power/performance modeling. Lastly, defining hardware-software interfaces for nanophotonic accelerators and considering optical-electrical heterogeneous computing models are important [165].
Equations with constant coefficients can be discretized using Toeplitz matrices. Given that an n × n Toeplitz matrix has only 2n − 1 parameters, it is reasonable to expect that linear systems T x = b, analogous to NNs, can be solved in fewer flops than Lower-Upper (LU) factorization would require; in fact, there are methods that require only O(n²) flops [166]. As can be seen in Figure 14, Toeplitz matrices exhibit a constant-diagonal structure, making them suitable for scenarios requiring shift invariance, as is often the case in classification deep learning neural networks [167].
Hence, implementations of DL utilizing photonic accelerators inherently come with limitations that restrict the complexity of DL models that can be effectively supported. A fundamental requirement for such demanding computations is unitary weight matrices; therefore, machine learning mechanisms need to transform weight matrices into their nearest unitary counterparts. Additionally, DL models that perform convolutions on the data need to be handled differently in order to fit within the parameters of the photonic system. It has also been shown that non-unitary matrices can be converted to unitary ones, and that linear algebra techniques can transform Convolutional Neural Networks (CNNs) into models capable of feed-forward learning using Toeplitz matrix operations for convolutions [168]. Experimental results in [168] showed that post-training or iterative techniques to find the nearest unitary weight matrix can be applied to photonic chips with minimal loss in accuracy, while CNNs adapted well to a photonic configuration employing a Toeplitz matrix implementation. This approach addresses the limitations of DL models for deployment in photonic FPGAs and therefore opens possibilities for fully optical accelerators.
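A standard way to realize the nearest-unitary step mentioned above is the polar decomposition: for W = UΣVᴴ (the SVD), the closest unitary matrix in the Frobenius norm is UVᴴ. A minimal numpy sketch (illustrative; [168] may use a different procedure):

```python
import numpy as np

def nearest_unitary(W):
    """Closest unitary matrix to square W in the Frobenius norm:
    take the SVD W = U @ diag(s) @ Vh and drop the singular values."""
    U, _, Vh = np.linalg.svd(W)
    return U @ Vh

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))      # an arbitrary (non-unitary) weight matrix
Q = nearest_unitary(W)
assert np.allclose(Q @ Q.T.conj(), np.eye(4))  # Q is unitary (orthogonal here)
```

Projecting trained weights onto Q in this way is the kind of post-training transformation that lets a conventional weight matrix be loaded onto a unitary photonic mesh.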
An excellent property of a CNN is that it illustrates a straightforward but significant observation about computer image recognition: since detecting an object is largely a separate task from detecting where it appears in the image, detection should be equivariant with translation. In a CNN, the linear transformation is not fully connected; however, if the unitary weight matrices of a photonic component acting as a placeholder for weight or pixel values, such as an MRR or MZI, are implemented as a matrix multiplication followed by a non-linear activation, convolution can be achieved despite the photonic restriction to unitary weight matrices.

Figure 14: Toeplitz Matrix (P)

In this way, instead of using for-loops to perform 2D convolutions on images, the filter is converted into a Toeplitz matrix and the image into a vector, followed by a single matrix multiplication to complete the convolution. As such, one matrix multiplication can be effectively replicated using photonic components. The method relies on forcing the Toeplitz matrix to be square; since unitary matrices are square by definition, this step is necessary [168].
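The filter-to-Toeplitz construction just described can be sketched in the 1D case: the reversed filter slides along the rows of a Toeplitz matrix T so that T x equals the valid convolution of x with the filter. An illustrative numpy sketch (not the implementation of [168]):

```python
import numpy as np

def conv1d_toeplitz(h, n):
    """Toeplitz matrix T such that T @ x == np.convolve(x, h, mode='valid')
    for any length-n signal x, where h is the 1D filter."""
    f = len(h)
    T = np.zeros((n - f + 1, n))
    for i in range(n - f + 1):
        T[i, i:i + f] = h[::-1]   # reversed filter shifts one step per row
    return T

h = np.array([1.0, -2.0, 3.0])          # a hypothetical filter
x = np.array([4.0, 1.0, 0.0, 2.0, 5.0])  # a hypothetical signal
T = conv1d_toeplitz(h, len(x))
assert np.allclose(T @ x, np.convolve(x, h, mode='valid'))
```

The 2D case works the same way with a doubly blocked Toeplitz matrix and a flattened image; padding T to a square matrix then makes the nearest-unitary projection applicable.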
In contrast to fully-connected deep neural networks, the Toeplitz nature of an N × N matrix P allows the encoding of only N degrees of freedom. Toeplitz implementations, however, allow photonic FPGAs to handle CNNs with the least possible complexity, without utilizing backpropagation algorithms. With the introduction of Toeplitz implementations for photonic FPGAs, and possibly fully photonic accelerators, larger networks can be implemented more efficiently; the majority of work on photonic accelerators reviewed within the scope of this paper reports results on narrower networks due to complexity and size restrictions. With such workarounds enabling deep and effective DL implementations, training large models in the photonic domain, such as ConvNeXt [169] and deep ViTs, and operating fully optical DL accelerators for larger and deeper CNN models will become possible [168].
7 Conclusion
It has been demonstrated that Matrix-Vector Multiplication (MVM) operations, utilized in CNNs through convolutions, can be accelerated either partially or entirely by photonics. This allows remarkable speed and significantly lower energy consumption compared to electronic counterparts, showing promise for the acceleration of more complex AI applications. The high compactness and integration density of on-chip integrated photonic circuit platforms make them excellent frameworks for artificial neural networks. In contrast, electronic components, due to their complexity, necessitate a large number of transistors and an additional scheduler program to perform simple operations. MVM operations, on the other hand, can be effortlessly implemented using fundamental photonic components like MRRs, MZIs, and diffractive planes.
Our study therefore explored the use of photonic deep learning accelerators in neuromorphic systems with the aim of reducing power consumption and enhancing processing speed in the future. The pursuit of improved metrics, such as J/MAC or MAC/s, has driven a significant segment of the photonic community to meticulously replicate algorithmic neural networks in photonic hardware. Overall, a significant and consistent drawback highlighted in our paper is scalability and, as evident in this study, Wavelength Division Multiplexing (WDM) is often employed for multiplexing in accelerator designs to address this issue. However, despite its significant added value, WDM demands sophisticated hardware, leading to increased costs and a larger physical footprint. Ultimately, this calls for a shift towards even more efficient design approaches, as discussed in the paper, driven by the already promising entries in this emerging domain.
Acknowledgements
This work was made possible with the support of the NYUAD Research Enhancement Fund. The au-
thors express their gratitude to the NYU Abu Dhabi Center for Smart Engineering Materials and The
Center for Cyber Security for their valuable contributions and support.
References
[1] J. Schmidhuber, Neural networks 2015, 61 85.
[2] X.-Y. Xu, X.-M. Jin, ACS Photonics 2023, 10, 4 1027.
[3] J. Von Neumann, IEEE Annals of the History of Computing 1993, 15, 4 27.
[4] M. D. Hill, G. S. Sohi, Readings in computer architecture, Gulf Professional Publishing, 2000.
[5] J. Song, Y. Cho, J.-S. Park, J.-W. Jang, S. Lee, J.-H. Song, J.-G. Lee, I. Kang, In 2019 IEEE In-
ternational Solid-State Circuits Conference-(ISSCC). IEEE, 2019 130–132.
[6] A. C. Elster, T. A. Haugdahl, Computing in Science & Engineering 2022, 24, 2 95.
[7] J. Koomey, S. Berard, M. Sanchez, H. Wong, IEEE Annals of the History of Computing 2010, 33,
3 46.
[8] M. Ahmadi, H. Bolhasani, Photonic neural networks: A compact review, 2023.
[9] J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. Stappers, M. Le Gallo, X. Fu,
A. Lukashchuk, A. S. Raja, et al., Nature 2021, 589, 7840 52.
[10] M. S. Rasras, D. M. Gill, M. P. Earnshaw, C. R. Doerr, J. S. Weiner, C. A. Bolle, Y.-K. Chen,
IEEE Photonics Technology Letters 2009, 22, 2 112.
[11] A. Krizhevsky, I. Sutskever, G. E. Hinton, Advances in neural information processing systems
2012, 25.
[12] P. Colangelo, O. Segal, A. Speicher, M. Margala, In 2019 IEEE High Performance Extreme Com-
puting Conference (HPEC). IEEE, 2019 1–8.
[13] D. Chiou, In 2017 IEEE International Symposium on Workload Characterization (IISWC). IEEE
Computer Society, 2017 124–124.
[14] C. Ríos, N. Youngblood, Z. Cheng, M. Le Gallo, W. H. Pernice, C. D. Wright, A. Sebastian,
H. Bhaskaran, Science advances 2019, 5, 2 eaau5759.
[15] S. Xiao, M. H. Khan, H. Shen, M. Qi, Optics Express 2007, 15, 12 7489.
[16] S. Cheung, T. Su, K. Okamoto, S. Yoo, IEEE Journal of Selected Topics in Quantum Electronics
2013, 20, 4 310.
[17] V. J. Sorger, N. D. Lanzillotti-Kimura, R.-M. Ma, X. Zhang, Nanophotonics 2012, 1, 1 17.
[18] E. Timurdogan, C. M. Sorace-Agaskar, J. Sun, E. Shah Hosseini, A. Biberman, M. R. Watts, Na-
ture communications 2014, 5, 1 1.
[19] H. Sepehrian, J. Lin, L. A. Rusch, W. Shi, Journal of Lightwave Technology 2019, 37, 13 3078.
[20] J. Rosenberg, W. Green, S. Assefa, D. Gill, T. Barwicz, M. Yang, S. Shank, Y. Vlasov, Optics ex-
press 2012, 20, 24 26411.
[21] Y. Ban, J. Verbist, M. Vanhoecke, J. Bauwelinck, P. Verheyen, S. Lardenois, M. Pantouvaki,
J. Van Campenhout, In 2019 IEEE Optical Interconnects Conference (OI). IEEE, 2019 1–2.
[22] M. A. Nahmias, T. F. De Lima, A. N. Tait, H.-T. Peng, B. J. Shastri, P. R. Prucnal, IEEE Jour-
nal of Selected Topics in Quantum Electronics 2019, 26, 1 1.
[23] M. Miscuglio, G. C. Adam, D. Kuzum, V. J. Sorger, APL Materials 2019, 7, 10.
[24] X. Ma, J. Meng, N. Peserico, M. Miscuglio, Y. Zhang, J. Hu, V. J. Sorger, In Optical Fiber Com-
munication Conference. Optica Publishing Group, 2022 M2E–4.
[25] N. Peserico, X. Ma, B. J. Shastri, V. J. Sorger, Emerging Topics in Artificial Intelligence (ETAI)
2022 2022, 12204 53.
[26] C. Wu, S. Lee, H. Yu, R. Peng, I. Takeuchi, M. Li, In 2020 IEEE Photonics Conference (IPC).
IEEE, 2020 1–2.
[27] C. Wu, H. Yu, S. Lee, R. Peng, I. Takeuchi, M. Li, Nature communications 2021, 12, 1 96.
[28] B. Javidi, J. Li, Q. Tang, Applied optics 1995, 34, 20 3950.
[29] B. Javidi, Optical Engineering 1990, 29, 9 1013.
[30] M. Reck, A. Zeilinger, H. J. Bernstein, P. Bertani, Physical review letters 1994, 73, 1 58.
[31] D. A. Miller, Journal of Lightwave Technology 2013, 31, 24 3987.
[32] D. A. Miller, Optics express 2013, 21, 5 6360.
[33] D. Miller, In APS March Meeting Abstracts, volume 2015. 2015 S6–001.
[34] G. Cong, N. Yamamoto, T. Inoue, M. Okano, Y. Maegami, M. Ohno, K. Yamada, Optics Express
2019, 27, 18 24914.
[35] J. Wang, S. Gu, In 2021 11th International Conference on Information Science and Technology
(ICIST). IEEE, 2021 571–577.
[36] M. T. Ajili, Y. Hara-Azumi, IEEE Access 2022, 10 9603.
[37] J. Qiu, J. Wang, S. Yao, K. Guo, B. Li, E. Zhou, J. Yu, T. Tang, N. Xu, S. Song, et al., In Pro-
ceedings of the 2016 ACM/SIGDA international symposium on field-programmable gate arrays.
2016 26–35.
[38] D. A. Miller, Optics express 2013, 21, 17 20220.
[39] J. Hardy, J. Shamir, Optics Express 2007, 15, 1 150.
[40] O. Schwelb, Journal of Lightwave Technology 2004, 22, 5 1380.
[41] Q. Xu, J. Shakya, M. Lipson, Optics express 2006, 14, 14 6463.
[42] L. Zhang, R. Ji, L. Jia, L. Yang, P. Zhou, Y. Tian, P. Chen, Y. Lu, Z. Jiang, Y. Liu, et al., Optics
letters 2010, 35, 10 1620.
[43] Q. Xu, D. Fattal, R. G. Beausoleil, Optics express 2008, 16, 6 4309.
[44] S. R. Tamalampudi, G. Dushaq, J. E. Villegas, N. S. Rajput, B. Paredes, E. Elamurugu, M. S.
Rasras, Optics Express 2021, 29, 24 39395.
[45] G. Dushaq, B. Paredes, J. E. Villegas, S. R. Tamalampudi, M. Rasras, Optics Express 2022, 30,
10 15986.
[46] S. R. Tamalampudi, G. Dushaq, J. E. Villegas, B. Paredes, M. S. Rasras, Journal of Lightwave
Technology 2023.
[94] J. Chang, V. Sitzmann, X. Dun, W. Heidrich, G. Wetzstein, Scientific reports 2018, 8, 1 12324.
[95] Z. Ying, C. Feng, Z. Zhao, S. Dhar, H. Dalir, J. Gu, Y. Cheng, R. Soref, D. Z. Pan, R. T. Chen,
Nature communications 2020, 11, 1 2154.
[96] Z. Ying, C. Feng, Z. Zhao, J. Gu, R. Soref, D. Z. Pan, R. T. Chen, IEEE Photonics Journal 2020,
12, 6 1.
[97] Q. Xu, R. Soref, Optics Express 2011, 19, 6 5244.
[98] M. Li, Y. Deng, J. Tang, et al., Scientific Reports 2016, 6, 19985.
[99] N. Stroev, N. G. Berloff, Analog photonics computing for information processing, inference and optimisation, 2023.
[100] M. Li, Y. Deng, J. Tang, S. Sun, J. Yao, J. Azaña, N. hua Zhu, Scientific Reports 2016, 6.
[101] M. Nakajima, K. Tanaka, T. Hashimoto, Communications Physics 2021, 4, 1 20.
[102] T. K. Paraiso, T. Roger, D. G. Marangon, I. De Marco, M. Sanzaro, R. I. Woodward, J. F. Dynes,
Z. Yuan, A. J. Shields, Nature Photonics 2021, 15, 11 850.
[103] E. Pelucchi, G. Fagas, I. Aharonovich, D. Englund, E. Figueroa, Q. Gong, H. Hannes, J. Liu, C.-Y.
Lu, N. Matsuda, et al., Nature Reviews Physics 2022, 4, 3 194.
[104] P. Sibson, J. E. Kennard, S. Stanisic, C. Erven, J. L. O’Brien, M. G. Thompson, Optica 2017, 4, 2
172.
[105] S. Buck, R. Coleman, H. Sargsyan, arXiv preprint arXiv:2107.02151 2021.
[106] D. Bunandar, A. Lentine, C. Lee, H. Cai, C. M. Long, N. Boynton, N. Martinez, C. DeRose,
C. Chen, M. Grein, et al., Physical Review X 2018, 8, 2 021009.
[107] Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, S. Massar, Scientific
reports 2012, 2, 1 287.
[108] K. Vandoorne, J. Dambre, D. Verstraeten, B. Schrauwen, P. Bienstman, IEEE transactions on
neural networks 2011, 22, 9 1469.
[109] L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutiérrez, L. Pesquera, C. R. Mirasso,
I. Fischer, Optics express 2012, 20, 3 3241.
[110] G. Paulin, R. Andri, F. Conti, L. Benini, IEEE Transactions on Very Large Scale Integration
(VLSI) Systems 2021, 29, 9 1624.
[111] A. W. Elshaari, W. Pernice, K. Srinivasan, O. Benson, V. Zwiller, Nature Photonics 2020, 14, 5.
[112] H. Ishio, J. Minowa, K. Nosu, Journal of Lightwave Technology 1984, 2, 4 448.
[113] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jack-
son, N. Imam, C. Guo, Y. Nakamura, et al., Science 2014, 345, 6197 668.
[114] M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi,
N. Imam, S. Jain, et al., Ieee Micro 2018, 38, 1 82.
[115] T. Wang, C. Wang, X. Zhou, H. Chen, arXiv preprint arXiv:1901.04988 2018.
[116] G. Sarantoglou, A. Bogris, C. Mesaritakis, S. Theodoridis, IEEE Journal of Selected Topics in Quantum Electronics 2022, 28, 6 1.
[117] M. A. Nahmias, T. F. de Lima, A. N. Tait, H.-T. Peng, B. J. Shastri, P. R. Prucnal, IEEE Journal
of Selected Topics in Quantum Electronics 2020, 26, 1 1.
[140] L. De Marinis, F. Nesti, M. Cococcioni, N. Andriolli, In OSA Advanced Photonics Congress (AP) 2020 (IPR, NP, NOMA, Networks, PVLED, PSC, SPPCom, SOF). Optica Publishing Group, 2020 PsTh1F.3, URL https://opg.optica.org/abstract.cfm?URI=PSC-2020-PsTh1F.3.
[141] F. Ashtiani, M. B. On, D. Sanchez-Jacome, D. Perez-Lopez, S. J. Ben Yoo, A. Blanco-Redondo, In
2023 Optical Fiber Communications Conference and Exhibition (OFC). 2023 1–3.
[142] C. Demirkiran, F. Eris, G. Wang, J. Elmhurst, N. Moore, N. C. Harris, A. Basumallik, V. J.
Reddi, A. Joshi, D. Bunandar, arXiv preprint arXiv:2109.01126 2021.
[143] S. Xu, J. Wang, W. Zou, arXiv preprint arXiv:1910.12635 2019.
[144] F. Sunny, A. Mirza, M. Nikdast, S. Pasricha, In 2021 58th ACM/IEEE Design Automation Con-
ference (DAC). 2021 1069–1074.
[145] S. Afifi, F. Sunny, M. Nikdast, S. Pasricha, In Proceedings of the Great Lakes Symposium on VLSI
2023. 2023 15–21.
[146] Q. Lou, W. Liu, W. Liu, F. Guo, L. Jiang, In 2020 25th Asia and South Pacific Design Automa-
tion Conference (ASP-DAC). 2020 464–469.
[147] S. Hochreiter, J. Schmidhuber, Advances in Neural Information Processing Systems 1996, 9.
[148] K. Shiflett, D. Wright, A. Karanth, A. Louri, In 2020 IEEE International Symposium on High Per-
formance Computer Architecture (HPCA). IEEE, 2020 474–487.
[149] F. Zokaee, Q. Lou, N. Youngblood, W. Liu, Y. Xie, L. Jiang, In 2020 Design, Automation & Test
in Europe Conference & Exhibition (DATE). IEEE, 2020 1438–1443.
[150] J. Peng, Y. Alkabani, S. Sun, V. J. Sorger, T. El-Ghazawi, In Proceedings of the 49th International
Conference on Parallel Processing. 2020 1–11.
[151] F. P. Sunny, A. Mirza, M. Nikdast, S. Pasricha, ACM Trans. Embed. Comput. Syst. 2021, 20, 5s.
[152] F. Sunny, M. Nikdast, S. Pasricha, In 2022 IEEE Computer Society Annual Symposium on VLSI
(ISVLSI). IEEE, 2022 98–103.
[153] Y. Li, K. Wang, H. Zheng, A. Louri, A. Karanth, IEEE Transactions on Circuits and Systems I:
Regular Papers 2022, 69, 7 2730.
[154] F. Sunny, M. Nikdast, S. Pasricha, In Proceedings of the Great Lakes Symposium on VLSI 2022.
2022 367–371.
[155] M. Al-Qadasi, L. Chrostowski, B. Shastri, S. Shekhar, APL Photonics 2022, 7, 2.
[156] S. Zheng, J. Zhang, W. Zhang, arXiv preprint arXiv:2303.01287 2023.
[157] H. Zhou, J. Dong, J. Cheng, W. Dong, C. Huang, Y. Shen, Q. Zhang, M. Gu, C. Qian, H. Chen,
et al., Light: Science & Applications 2022, 11, 1 30.
[158] A. Krizhevsky, I. Sutskever, G. E. Hinton, In F. Pereira, C. Burges, L. Bottou, K. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012 URL https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
[159] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for
computer vision, 2015.
[160] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam,
Mobilenets: Efficient convolutional neural networks for mobile vision applications, 2017.
[161] X. Zhang, X. Zhou, M. Lin, J. Sun, Shufflenet: An extremely efficient convolutional neural network
for mobile devices, 2017.
[162] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017 4700–4708.
[163] M. Tan, Q. Le, In International Conference on Machine Learning. PMLR, 2019 6105–6114.
[164] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani,
M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, CoRR 2020, abs/2010.11929.
[165] K.-i. Kitayama, M. Notomi, M. Naruse, K. Inoue, S. Kawakami, A. Uchida, APL Photonics 2019, 4, 9.
[166] G. H. Golub, C. F. Van Loan, Matrix Computations, Johns Hopkins University Press 2013.
[167] S. Kokila, A. Jayachandran, Remote Sensing Applications: Society and Environment 2023, 29
100881.
[168] G. Agrafiotis, E. Makri, I. Kalamaras, A. Lalas, K. Votis, D. Tzovaras, In Proceedings of the Northern Lights Deep Learning Workshop, volume 4. 2023.
[169] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, S. Xie, A convnet for the 2020s, 2022.
Mohammad Atwany is currently a PhD student at the IBME branch of Engineering Science at the University of Oxford. He completed his BSc (Hons) and Master's degrees in Machine Learning in December 2022, then joined NYU as a research engineer in the Department of Electrical Engineering before officially joining the MultiMeDIA lab at the University of Oxford in October 2023. He is motivated by a broad range of research interests in Photonic Integrated Circuit (PIC) design, photonics in biomedical engineering, and machine learning, including Domain Generalization and Domain Adaptation in interventional medicine.
Solomon M. Serunjogi received the B.Sc. degree in telecommunications from Kyambogo University, Kyambogo, Uganda, and the M.Sc. and Ph.D. degrees from the Department of Electrical Engineering and Computer Science at the Masdar Institute of Science and Technology, Abu Dhabi, UAE. He is currently a research associate in the Photonics lab at NYU Abu Dhabi. His research interests include microwave photonics, signal processing, and the design of CMOS driver circuits for high-speed telecom applications.
Mahmoud Rasras, an Associate Professor of Electrical Engineering at New York University Abu Dhabi (NYUAD), earned his Ph.D. in physics from the Catholic University of Leuven, Belgium, with research conducted at imec. After more than 11 years of industrial research at Bell Labs, Alcatel-Lucent, NJ, USA, he joined NYUAD, having previously served as a faculty member and Director of the SRC/Globalfoundries Center of Excellence for Integrated Photonics at the Masdar Institute, UAE. Dr. Rasras has authored 180 papers in high-impact journals and holds many granted US patents. He specializes in 2D materials optoelectronics, silicon photonics, plasmonic-enhanced optoelectronics, and microwave photonics. Dr. Rasras served as an Associate Editor of Optics Express and is a senior IEEE member and a member of the Mohammed Bin Rashid Academy of Scientists.