Asynchronous Wrapper Based Low Power GALS Structural QDMA

The document discusses an asynchronous wrapper-based low-power globally asynchronous locally synchronous (GALS) structural queue direct memory access (QDMA). It proposes using asynchronous wrappers with locally synchronous modules that communicate through asynchronous finite state machines defined by signal transition graphs. This GALS architecture implemented for point-to-point interfaces can be modified for multipoint interfaces. The QDMA subsystem is implemented using a simulator and evaluated based on latency, power, and gate count, showing improvements over previous methods.


IETE Journal of Research

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/tijr20

Asynchronous Wrapper-Based Low-Power GALS Structural QDMA

B.K. Vinay, S. Pushpa Mala & S. Deekshitha

To cite this article: B.K. Vinay, S. Pushpa Mala & S. Deekshitha (2022): Asynchronous
Wrapper-Based Low-Power GALS Structural QDMA, IETE Journal of Research, DOI:
10.1080/03772063.2021.2021814

To link to this article: https://doi.org/10.1080/03772063.2021.2021814

Published online: 24 Jan 2022.


Asynchronous Wrapper-Based Low-Power GALS Structural QDMA

B.K. Vinay1,2, S. Pushpa Mala2 and S. Deekshitha3

1 Department of Electronics and Communication Engineering, Vidyavardhaka College of Engineering, Mysuru 570002, India; 2 Department of
Electronics and Communication, Dayananda Sagar University, Bangalore, Karnataka 560114, India; 3 CMR Institute of Technology, Bangalore,
Karnataka 560037, India

ABSTRACT
The design of System-on-Chip systems using synchronous circuits involves complex clock distribution strategies, which pose challenges for designers integrating large-scale systems. Globally Asynchronous Locally Synchronous architectures containing asynchronous port controllers encapsulated in a self-timed wrapper have been adopted in this work. These port controllers communicate through Asynchronous Finite State Machines that are defined by Signal Transition Graphs and implemented using the C element. This GALS architecture, implemented for the point-to-point interface, can also be modified for the multipoint interface. The proposed methodology uses a two-phase handshake protocol to communicate between two Locally Synchronous modules as it has fewer signal transitions, which, in turn, reduces latency. In this paper, the Queue Direct Memory Access subsystem is implemented using the Vivado simulator on an UltraScale+™ device at a maximum frequency of 257.4 MHz, and various parameters are reported. A comparison shows that the proposed wrapper improves latency by 53% and reduces power dissipation by 27%, at the cost of a 13% increase in gate count.

KEYWORDS
Asynchronous wrapper; Globally asynchronous locally synchronous (GALS); Handshake protocol; Muller C element; Port controller; Signal transition graphs (STG); Synthesis logic

1. INTRODUCTION
A System-on-Chip (SoC) integrates individual IPs, each with a specific functionality, onto a single platform. Synchronous circuits require optimization of clock distribution networks to attain reduced latency, which is a complex process [1]. Consequently, a methodology for implementing asynchronous designs has to be developed. Globally Asynchronous Locally Synchronous (GALS) architectures encompass asynchronous wrappers with locally synchronous (LS) modules. These LS modules communicate with each other using handshake protocols, which are technically asynchronous [2]. An asynchronous wrapper uses three handshake processes. The first handshake is at the input port for receiving data, the second is between the input port and the output port for generating the clock, and the third is at the output port for transferring the data. This wrapper has an average power consumption of 1 mW for a data stream at 50 MHz [3].

Stretchable Clock Asynchronous Flexible FPGA Interfaces (SCAFFI) interconnect the LS modules to a Field Programmable Gate Array (FPGA) for GALS architectures. These architectures use arbiters to pause the LS modules' clock before the data are transferred. At a later stage, the clock is restarted once the data achieve a stable state, a method known as clock stretching. Clock stretching eliminates metastability. The performance gain is smaller due to increased modular multiplications during the execution phase [4]. A novel architecture for the asynchronous GALS wrapper has been proposed in which port controllers communicate with the asynchronous wrapper using a direct mapping style [5]. With an increase in the number of input and output ports for data communication, the wrapper becomes more complex and its area increases. Data transfer in SoCs with GALS systems uses multi-point interfaces, reducing latency and area [6]. The asynchronous wrapper (AW) realizes fault tolerance for autonomous modules and delay-insensitive (DI) designs, which do not require isochronic fork conditions to be met [7].

Various design methodologies for GALS architecture include pausible clocks, asynchronous and locally synchronous modules. Pausible clocks avoid metastability by delaying the sampling of the clock until the arrival of data. In asynchronous interface design styles, the signal received from the outer clock domain is transferred to the local clock domain by synchronizers [8]. LS design styles analyze time bounds, overcoming the need for handshaking for data transfer [9]. Signal Transition Graphs (STG) represent the flow of positive and negative edges of the signals. In the proposed wrapper, a modified STG is

adopted to reduce the communication time between two LS modules. A latch is added between two LS modules to store data for efficient communication [10]. Furthermore, a gated clock-based interface for GALS has been suggested wherein the external clock is gated to drive the local clock of the LS modules based on the request from port controllers [11]. The GALS interface uses First In First Out (FIFO) buffers operating in asynchronous mode for data transfer between mixed clock-based LS modules [12]. The latency involved in synchronization between two LS modules is reduced using a high-bandwidth communication scheme, the STARI-based GALS interface, which deploys a single-stage FIFO at the receiver and benefits from the stability of the clock [13]. Oliveira et al. [7] proposed a single-port controller for managing data communication in multipoint and point-to-point GALS for reduced area consumption. Stretchable clocks are realized to control the clock generator [14]. Asynchronous elements such as join and fork can also be used: "join" combines various data signals and sends them to a GALS module, and "fork" sends data to various sinks [3].

Applications involving SoCs with multiple IPs integrated on a single chip are quite challenging to design due to advancements in the scale of integration. The complexity of an SoC circuit design escalates due to the constant reduction in feature size brought about by scaling. The design of an SoC circuit plays a vital role in increasing the performance of the system. Synchronous strategies for designing an SoC adopt a master clock for the synchronization of various data signals across the chip. These synchronous design strategies lead to various challenges, such as clock skew and high dynamic power consumption at high frequencies. They also require complex timing analysis that accounts for the capacitive loads and interconnect resistances of the clock signals. Due to the complexity involved in synchronous design strategies, SoC applications have adopted asynchronous design strategies. Hence, GALS techniques are introduced for asynchronous designs to achieve the maximum performance of the SoC system. Process-voltage-temperature (PVT) variations are within the tolerance levels for asynchronous circuits compared to synchronous circuits, meeting the requirements for robust applications. Performance parameters such as low power, high speed and reduced electromagnetic interference are improved by opting for asynchronous design strategies over synchronous ones.

GALS techniques simplify timing analysis and shorten the time to market for an SoC circuit by reusing functional IP blocks. These structures adopt a modular approach by developing individual LS modules, which are integrated via port controllers with asynchronous logic developed using CAD tools. Furthermore, two-phase and four-phase handshake protocols are implemented by port controllers to initiate asynchronous communication between the sender and receiver LS modules. In the proposed methodology, a two-phase handshake protocol is adopted since it has fewer transitions and reduced latency compared to a four-phase handshake protocol. The communication between two LS modules can be point-to-point or point-to-multipoint. The AW encapsulates port controllers besides the LS modules. The port controllers, modeled by Asynchronous Finite State Machines (AFSMs) to provide asynchronous communication between LS modules, are made hazard-free by implementing them using STG. The logical equations are mapped by the STG into standard library cells using a 3D tool, and finally the gate-level netlist is generated. Point-to-point communication between AWs involves a single incoming and a single outgoing signal. The AWs can be generalized to multi-point GALS with multiple incoming and outgoing signals, which cannot be activated concurrently as arbiters are not used. Although point-to-multipoint GALS wrappers consume more area on the chip than point-to-point GALS wrappers, they eliminate redundancy to a greater extent. They coordinate the sending and receiving of data by activating LS modules accordingly through stretchable clocks. These stretchable clocks are chosen over pausible clocks to design the wrapper to handle reduced performance issues.

2. GALS INTERFACE: AN OVERVIEW
GALS modules adopt LS modules with their own clock generator and asynchronous port controllers encapsulated in a self-timed wrapper. The operation of an asynchronous port controller is modeled by AFSMs implemented through STG. Since implementation through STG promises hazard-free ports, the ports can be designed in burst mode or extended burst mode formats. The local clock generator is made tunable for stopping and adjusting the frequency to synchronize data transfer (Figure 1).

The new clock pulse is generated only if the request from all ports is low to stop the clock. Metastability is resolved by receiving all the requests from ports with different mutual exclusion (MuTex) elements.

Data communication between the sender and the receiver in asynchronous systems follows handshake protocols indicating data arrival and availability. Figure 2 [15]

Figure 1: Asynchronous wrapper

Figure 2: Communication protocol: (a) four-phase handshake. (b) Two-phase handshake

represents two communication protocols, i.e. two-phase and four-phase handshake protocols, communicating through request and acknowledge signals. In a two-phase protocol, Figure 2(a), a request signal is sent from the transmitting circuit to the receiving circuit, indicating the presence of data, and as the receiver circuit receives the data, the acknowledge signal undergoes a transition. In a four-phase protocol, Figure 2(b), the start of data transmission is indicated by the transmitter circuit as the request signal makes a transition, and the receiver acknowledgement is denoted by a transition in the acknowledge signal. This, in turn, causes the request signal to return to its initial state at the transmission side. Furthermore, data are accepted by the receiver. The acknowledge signal is restored after the request signal is restored [15].

Return to Zero (RTZ) signaling is also known as a four-phase bundled data protocol. The sender transfers data, which is indicated by setting the Request signal high. The receiver accepts the data, which is indicated by setting the Ack signal high. The response from the sender is indicated by a high-to-low transition on the Request signal (this shows that data validity is not guaranteed any further). Finally, a high-to-low transition on the Ack signal indicates an acknowledgement by the receiver. Thereafter, the sender may initiate the next communication cycle. Although simplicity is its advantage, the RTZ nature of this protocol means that more energy and time are consumed. If the time to process valid data (when the Request signal is high) and the time to process null data (when the Request signal is low) are equal, then the resultant data rate or throughput is reduced by a factor of

Figure 3: Proposed design flow

2. To overcome these disadvantages, a two-phase bundled data protocol can be used; it is also known as Non-Return to Zero (NRZ) signaling or transition signaling. Here, information on the Request and Ack signals is transferred as signal transitions, and there is no difference between a 1->0 and a 0->1 transition. This shows that four-phase protocols have more signal transitions than two-phase protocols during data transfer. As a result, two-phase protocols are chosen over four-phase protocols, and comparatively, a lower latency is obtained.

3. PROPOSED DESIGN FLOW
GALS architectures comprise asynchronous wrappers constituting LS modules and port controllers to handle communication between various LS modules. The proposed design flow is depicted in Figure 3. The IP functionality is defined using a Hardware Description Language in the Vivado Tool Suite at the RTL development stage. The IP integrator in Vivado interconnects various IP cores by instantiating them to build the final Queue Direct Memory Access (QDMA) module. Design verification is done using the Vivado Simulator to verify specific functionalities of the QDMA module. The synthesized netlist is used to analyze the design hierarchy and to ensure design optimization by eliminating redundant logic modules. The syntax is verified, and the obtained netlist is saved as a Native Generic Circuit (NGC) file.

Further into the process, floor planning and place and route (PAR) are performed as part of the design implementation. Translate builds the design file, a combination of the relevant netlist with constraints, wherein the constraints assign the ports to physical components in the FPGA. This information is saved as a UCF file.

Figure 4: Asynchronous wrapper with point-to-point communication

The Map process fits the submodules of the entire circuit onto the FPGA. PAR carries out the placement and routing process. Functional simulation is performed after the translation process to validate the functionality of the module. Static timing analysis and power analysis reports are generated after the PAR process, comprising the quantitative summary of time delay and power consumed. The bitstream generated after the design implementation is used to configure the target Virtex UltraScale FPGA device.

Figure 5: Stretchable clock

3.1 QDMA
Various integrated blocks in the UltraScale+™ encompass QDMA for large DMA. This provides improved performance and flexibility with its bridge infrastructure, data transfer with a large packet count and higher bandwidth. QDMA implements queues that can be configured to operate in different modes with a PCI Express interface for virtualized application spaces and a broad range of malfunctioning. It also provides for enhanced traffic management. Descriptors, incorporating QDMA,

could be used to transfer data from Host to Card (H2C) or vice versa.

3.2 Fast GALS Interface
In asynchronous wrappers with point-to-point communication (Figure 4), the clock generator is only active during data transfer. When the Start signal transitions from 0 to 1, the XOR gate drives the stop signal from 0 to 1, activating the clock. After processing the data, the Done signal goes from 0 to 1, and hence data are transferred to the output port through the FIFO block, which is a storage element at the receiver in the data path. To receive new data, the Start and Done signals return to their initial state, i.e. from 1 to 0. AFSMs are implemented in port controllers, communication is enabled through the implemented STGs, and their structures are defined by the C element. A latch is added to the dataflow path to prevent metastability. Logic hazards are not introduced as each control signal occupies one Look-Up Table (LUT) for conventional mapping. Pausible clocking in GALS systems has a major drawback of metastability [16] when the rising edge of the arriving clock and the Request signal occur simultaneously.

Stretchable clock circuits are used to resolve the issue of synchronization failure and reduce power consumption. Stretchable clock architectures built from basic gates are unreliable in a few input states [17]. Stretchable clock circuits with standard cells have a large delay compared to basic gates. Thus, the stretchable clock circuits in the proposed wrapper employ the Muller-C element, in which the Stretch signal controls the clock generator, as shown in Figure 5. During data transfer between two LS modules, the phase of the St1 and St2 signals of the clock domains is stretched.

Figure 6: Signal transition graph (STG) of a stretchable clock

Figure 6 shows the signal transitions involved in the W-port and R-port. If the LS module is ready to transfer data, then wr+ is triggered, and when the LS module is ready to accept data, rd+ is triggered. Figure 7 shows the trace plot of the stretchable clock STG simulated using Petri nets.

Wrappers can be designed for the multipoint GALS architecture (Figure 8), with two incoming signals and a single

Figure 7: Simulation of stretchable clock


Figure 8: Asynchronous wrapper with point-to-multipoint communication

outgoing signal, while the data are not transferred concurrently as the arbiter is not present. The control module processes one data signal at a given time and transfers the data to the FIFO block after processing. The design of a multipoint GALS architecture is much more complex than point-to-point GALS. The point-to-point GALS wrapper contains a single input port and a single output port. The wrapper architecture can be modified into a multipoint topology with several input and output ports depending on the application, but only one request can be processed at a time. Thus, wrapper reusability is available in a multipoint wrapper, saving overall area and power consumption. "Join" and "Fork" are used for multipoint GALS interfaces. Arbiters can be used for concurrent processing in multipoint GALS.

4. RESULTS AND DISCUSSION
The QDMA module is simulated using the Vivado Tool Suite on an UltraScale+™ xcvu9pfsgd2104 device at a maximum frequency of 257.4 MHz. Table 1 shows the confidence levels obtained for the proposed GALS architecture. The proposed design uses stretchable clocks, thus improving the source clock and destination clock delay, respectively (Table 2).

There is an improvement in latency compared with [18], since the GALS wrapper of [18], which comprises a pausible clock with a four-phase handshake protocol, has several more transitions. The circuit functions if these bounds are met correctly. The average latency is reduced by

82% compared to [18], 81% compared to [2] and 83% compared to [4] by the proposed GALS architecture.

Table 1: Confidence level
User input data                Confidence
Design implementation state    Low
Clock nodes activity           High
I/O nodes activity             Low
Internal nodes activity        Medium
Overall confidence level       Medium

Table 2: Environment set-up
Ambient temp (C)       25.0
ThetaJA (C/W)          0.5
Airflow (LFM)          250
Heat sink              Medium (medium profile)
ThetaSA (C/W)          0.7
Board selection        Medium (10"×10")
# of board layers      12 to 15 (12–15 Layers)

Table 3: Average time of latency for GALS architectures
AW [18]    AW [2]     AW [4]     AW [19]     Proposed AW
33.8 ns    38.3 ns    35.6 ns    25.52 ns    6.40 ns

Table 4: Dynamic power dissipation
With stretchable clock    Without stretchable clock
164.98 nW                 235.16 nW

Table 5: Results obtained by the proposed GALS architecture
Specification               Power without GALS    Power with GALS
Total on-chip power (W)     5.422                 3.990
Design power budget (W)     Unspecified∗          Unspecified∗
Power budget margin (W)     NA                    NA
Dynamic (W)                 2.910                 1.500
Device static (W)           2.512                 2.490
Effective TJA (C/W)         0.5                   0.5
Max ambient (C)             97.2                  97.9
Junction temperature (C)    27.8                  27.1
Confidence level            Medium                Medium

Another reason for the improvement is the implementation of two-phase handshaking signals over four-phase handshaking signals. Two-phase handshaking signals have the edge over four-phase handshaking signals, with fewer signal transitions and lower dynamic power dissipation.

Synchronization is achieved using a D-latch followed by a T-flip-flop to avoid metastability, circumventing system failure. The signal reaching the D-latch is asynchronous, and this signal will not reach the T-flip-flop if it is metastable. Once the signal resolves from the metastable state to a valid logic level, it passes through the T-flip-flop, which gives the output with respect to the synchronized signal. These circuits, which combine a D-latch and a T-flip-flop to convert an asynchronous signal into a synchronous signal, are called synchronizer circuits and eliminate the issue of metastability. These synchronizers are low-power, consume less area, are highly reliable with a high MTBF (Mean Time Between Failures), and have low latency. However, synchronization between wrappers is accomplished using handshake signals. In the proposed methodology, the synchronizer circuit of [19] is replaced with a FIFO-based synchronizer, which reduces bandwidth and ensures reliable communication. The FIFO-based synchronizer ensures the matching of frequency rates.

The stretchable clock architecture in the proposed GALS wrapper uses the Muller-C element; hence, it operates at higher frequencies than circuits employing standard cells. Thus, the stretchable clock gating scheme used in this work improves performance by resolving the metastability encountered when pausible clocking is applied in the wrapper. Table 4 shows that the dynamic power dissipation reduces by 29% when a stretchable clock is used in the GALS wrapper. Hence, stretchable clocking schemes can be used in GALS techniques for low-power applications.

Static power dissipation remains unchanged. The average power dissipated is reduced by 48% due to the implementation of an asynchronous wrapper compared to circuits without GALS (Table 3). Comparatively, a two-phase bundled-data protocol is more efficient than a four-phase bundled-data protocol since the return-to-zero transition, with its cost in performance and power dissipation, is avoided. Edge-sensitive devices are often more complicated than level-sensitive devices.

The response of control logic and storage elements to signal transitions is more complex. Thus, a two-phase bundled data protocol is the chosen approach in a high-speed system with unconditional data flow. Besides, there is a significant reduction in dynamic power dissipation owing to the reduced transitions in the two-phase handshake protocol and to clock gating techniques. The reduced power dissipation and better throughput are achieved at the cost of a marginal increase in average gate count.

There is a trade-off in gate count of up to 13%. There is a marginal increase in the LUT utilization due to logic implementation, as shown in Tables 4 and 5. The improvement in performance and latency achieved is 7% and 5%, respectively (Tables 6–9).
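
The Muller C-element used in the stretchable clock has a simple behavioural definition: its output follows the inputs when they agree and holds its previous value when they differ. The Python sketch below models that behaviour and uses it in a toy clock whose rising edge is withheld while a stretch request is pending. Only the C-element behaviour is standard; the `StretchableClock` class, its signal names and the way the stretch request gates the clock are assumptions made for illustration and are not taken from Figure 5.

```python
"""Behavioural model of a Muller C-element and a toy stretchable clock.

Only the C-element behaviour is standard; the clock wiring is hypothetical.
"""


class CElement:
    """Two-input Muller C-element: the output changes only when inputs agree."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.out = a  # both inputs agree: output follows them
        return self.out   # inputs differ: output holds its previous value


class StretchableClock:
    """Toy clock: a stretch request keeps the clock low until it is released."""

    def __init__(self) -> None:
        self.c_element = CElement(0)
        self.clk = 0

    def step(self, stretch: int) -> int:
        want = 1 - self.clk              # the oscillator wants to toggle
        permit = 0 if stretch else want  # stretch withholds the rising edge only
        self.clk = self.c_element.update(want, permit)
        return self.clk


if __name__ == "__main__":
    clock = StretchableClock()
    stretch_requests = [0, 0, 0, 1, 1, 1, 0, 0]
    trace = [clock.step(s) for s in stretch_requests]
    print(trace)  # [1, 0, 1, 0, 0, 0, 1, 0]
```

While the stretch request is high the low phase is extended for as long as needed, and the clock resumes with a full-width pulse once the request is released, which is the property that lets data settle before the next sampling edge.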

Table 6: Power consumed by different modules with GALS
Entity             Power (W)
local_sync_core    1.915
LS_core            1.915
inst_core          1.915

Table 7: Power consumed by different modules without GALS
Entity             Power (W)
GALS_async_core    1.489
LS_core            1.489
inst_core          1.489

Table 8: Power consumption report
Site type                  Area without GALS    Utilization %    Area with GALS    Utilization %
CLB LUTs                   25751                2.18             29647             2.51
LUT as logic               22799                1.93             26667             2.26
LUT as memory              2952                 0.50             2980              0.50
LUT as distributed RAM     2952                                  2980
LUT as shift register      0                                     0
CLB registers              53706                2.27             62347             2.64
Register as flip flop      53642                2.27             62283             2.63
Register as latch          64                   < 0.01           64                < 0.01
CARRY8                     1201                 0.81             1683              1.14
F7 Muxes                   1393                 0.24             1198              0.20
F8 Muxes                   689                  0.23             351               0.12
F9 Muxes                   0                    0.00             0                 0.00

Table 9: Area utilization report
On-chip                    Without GALS power (W)    With GALS power (W)
Clocks                     0.495                     0.329
CLB logic                  0.409                     0.341
LUT as logic               0.274                     0.238
LUT as distributed RAM     0.067                     0.06
Register                   0.059                     0.039
CARRY8                     0.009                     0.005
BUFG                       < 0.001                   < 0.001
Others                     0                         0
F7/F8 Muxes                0                         0
Signals                    0.584                     0.418
Block RAM                  1.404                     0.395
URAM                       0.015                     0.015
I/O                        0.002                     0.002
Static power               2.512                     2.49
Total                      5.422                     3.99

5. CONCLUSION
Some of the challenges faced in implementing an SoC circuit can be overcome by adopting GALS architectures. The architecture proposed in this paper contains an asynchronous wrapper with a single port controller for the QDMA subsystem. The proposed work uses a two-phase bundled-data protocol over a four-phase bundled-data protocol since it provides increased performance, although the circuit implementation is quite complex. The two-phase communication protocol adopted in this paper reduces the latency as it has fewer signal transitions. The single port controller controls the entire communication between LS modules, and it can be modified for multi-point and point-to-point interfaces. The motive behind this approach is to design the system in a modular way, wherein each module of the system provides more optimistic delay models, and the interconnection between independent modules is established based on Delay Insensitive models. Therefore, the proposed architecture contributes to latency reduction and adopts more power-efficient techniques, traded off against a marginal increase in the gate count.

DISCLOSURE STATEMENT
No potential conflict of interest was reported by the author(s).

ORCID
B.K. Vinay http://orcid.org/0000-0001-7778-1376

REFERENCES
1. E. G. Friedman, "Clock distribution networks in synchronous digital integrated circuits," Proc. IEEE, Vol. 89, pp. 665–92, 2001. doi:10.1109/5.929649

2. J. Muttersbach, "Globally-asynchronous locally-synchronous architectures for VLSI systems," Ph.D. thesis, ETH, Zurich, 2001.

3. M. Krstic, and E. Grass, "System integration by request-driven GALS design," IEEE Proc. Comput. Digit Tech., Vol. 153, no. 5, pp. 362–72, September 2006. doi:10.1049/ip-cdt:20050210

4. J. Pontes, R. Soares, E. Carvalho, F. Moraes, and N. Calazans, "SCAFFI: An intrachip FPGA asynchronous interface based on hard macros," in 25th International Conference on Computer Design, 2007, pp. 541–546. doi:10.1109/ICCD.2007.4601950

5. D. L. Oliveira, L. A. Faria, and E. Lussari, "Design of an improved and robust asynchronous wrapper (AW) for FPGA applications," J. Integr. Circuits Syst., Vol. 8, no. 1, pp. 54–63, 2013. doi:10.29292/jics.v8i1.372

6. D. L. Oliveira, T. Curtinhas, L. A. Faria, J. L. V. Oliveira, and L. Romano, "Design of gated-clock asynchronous wrappers for multi-point GALS systems," IEEE ANDESCON, 1–4, 2016. doi:10.1109/ANDESCON.2016.7836214

7. D. L. Oliveira, E. Lussari, S. S. Sato, and L. A. Faria, "An asynchronous interface with robust control for globally asynchronous locally-synchronous systems," J. Technol. Manage., Vol. 5, no. 1, pp. 91–102, 2013. doi:10.5028/jatm.v5i1.191

8. D. M. Chapiro, "Globally-asynchronous locally-synchronous systems," PhD thesis, Stanford University. October 1984.

9. P. Teehan, M. Greenstreet, and G. Lemieux, "A survey and taxonomy of GALS design styles," IEEE Des. Test Comput., Vol. 24, pp. 418–28, September–October 2007. doi:10.1109/MDT.2007.151

10. Y.-T. Chang, W.-C. Chen, H.-Y. Tsai, W.-M. Cheng, C.-J. Chen, and F.-C. Cheng, "A Low-latency GALS interface implementation," in 2010 IEEE Asia Pacific Conference on Circuits and Systems, 6–9 Dec. 2010, pp. 1183–1186. doi:10.1109/APCCAS.2010.5774997

11. E. Amini, M. Najibi, and H. Pedram, "Globally asynchronous locally synchronous wrapper circuit based on clock gating," in Symposium on Emerging VLSI Technologies and Architectures, 2006. doi:10.1109/ISVLSI.2006.48

12. D. Kim, M. Kim, and G. E. Sobelman, Asynchronous FIFO interfaces for GALS on-chip switched networks. SoC, 2005.

13. A. Chakraborty, and M. R. Greenstreet. Efficient self-timed interfaces for crossing clock domains. Ninth International Symposium on Asynchronous Circuits and Systems, ASYNC, 2003. pp. 78–88. doi:10.1109/ASYNC.2003.1199168

14. D. L. Oliveira, T. Curtinhas, L. A. Faria, H. A. Delsoto, and L. Romano, "Design of low-latency asynchronous wrapper for GALS systems," in XVIII Simpósio de Aplicações Operacionais em Áreas de Defesa (SIGE), 2016.

15. A. Reddy Ravi, "Globally-asynchronous, locally-synchronous wrapper configurations for point-to-point and multi-point data communication," Master thesis, University of Central Florida, 2004.

16. K. Y. Yun, and R. P. Donohue, "Pausible clocking: A first step toward heterogeneous systems," in Proceedings of International Conference on Computer Design (ICCD), Texas, USA, Oct. 7–9, 1996, pp. 118–23.

17. D. S. Bormann, and P. Y. K. Cheung, "Asynchronous wrapper for heterogeneous systems," in Proceedings of International Conference on Computer Design (ICCD), Texas, USA, Oct. 12–15, 1997, pp. 307–14.

18. D. L. Oliveira, T. Curtinhas, L. A. Faria, and L. Romano, "A novel asynchronous interface with pausible clock for partitioned synchronous modules," in IEEE 6th Latin American Symposium on Circuits & Systems (LASCAS), 2015, pp. 1–4. doi:10.1109/LASCAS.2015.7250441

19. T. Curtinhas, D. L. Oliveira, O. Saotome, and J. B. Brandolin, "FPGA implementation of low-latency robust asynchronous interfaces for GALS systems," in Electronics Electrical Engineering and Computing (INTERCON) 2018 IEEE XXV International Conference, 2018, pp. 1–4. doi:10.1109/LASCAS.2015.7250441

AUTHORS
B.K. Vinay has a Bachelor's Degree in Electronics and Communication Engineering and a Master's in Signal Processing & VLSI Design. He has more than 7 years of teaching and industry experience. Currently he is pursuing his Ph.D. in VLSI design. Some of his projects have been funded by the Department of Science & Technology, Government of India, and the Karnataka State Council for Science and Technology. His research interests include Low power VLSI Design, Analog and Mixed signal VLSI Design, Circuit design and simulations, DSP and Embedded Systems Design.

Corresponding author. Email: [email protected]

S. Pushpa Mala has completed her Ph.D. at Jain University, Bangalore. Her research interests include Image Processing, Signal Processing and Very Large-Scale Integrated Systems. Some of her projects have been funded by the Karnataka State Council for Science and Technology. She has published around 30 papers in various indexed international journals and conferences. She is a SMIEE.

Corresponding author. Email: [email protected]

S. Deekshitha is an undergraduate student in the Department of Electronics and Communication Engineering at CMRIT, Visvesvaraya Technological University. Her research interests include VLSI Design.

Email: [email protected]


