Design and FPGA-implementation of Asynchronous Circuits Using Two-Phase Handshaking
Abstract—This paper addresses the design and FPGA-prototyping of asynchronous circuits using static data-flow handshake components implemented using the two-phase bundled-data protocol. The contributions are partly tutorial and partly scientific. The paper introduces the design process, including initialization and design of coupled rings with any number of tokens. Following this, the paper presents gate-level implementations of the full set of handshake components as well as some peephole optimizations that merge the implementation of several components. The components are implemented using the click-template. The handshake register implementation is extended with circuitry that decouples the phase of the handshake signals on the input and output ports. Such decoupling is needed to facilitate implementation of rings with one token (or in the general case, rings with any number of tokens). Finally, the paper illustrates the design process using two circuits: one that outputs the sequence of Fibonacci numbers, and one that computes the greatest common divisor of two positive integers. All components are described in VHDL, and all code is available as open source. All components and the two circuits mentioned have been tested on a Digilent Nexys4DDR FPGA board.

I. INTRODUCTION

When engineering students learn digital electronics, they typically do lab exercises where they design (small) synchronous sequential circuits and implement these in FPGA technology. A similar situation does not exist for asynchronous design. Despite decades of research, there are no widely used tools, and the situation has not improved during the last decade. CAD tools are typically developed by and used within individual university groups and companies. Many of these groups have used variants of CSP [10] to describe asynchronous circuits and systems. Some examples are [2], [16], [25]. The last of these was later commercialized by the start-up company Handshake Solutions, and at one point, their Haste language and synthesis tools were available to universities through Europractice [5]. Our experience at that time was that students ended up writing concurrent programs with very limited understanding of what hardware their programs would generate, a paradox in light of the full transparency of syntax-directed compilation.

For a newcomer, and in a teaching context, we believe that less is more, and for that reason we aim for a simple and straightforward component-based approach. Our aim is to provide students with FPGA implementations (i.e., synthesizable VHDL descriptions) of the handshake components presented in [23, Ch. 3]. From these, students can build static data-flow structures by simply wiring together the relevant components, simulate the circuits, and implement and operate their circuits using a conventional FPGA board. For this purpose, the click-element template [18], which uses only D-flip-flops and combinational gates, seems to be a good fit.

The contributions of the paper are partly tutorial and partly scientific. The material and insights presented emerged from a course on asynchronous design, where students were asked to design and build small asynchronous circuits. This turned out to be surprisingly difficult. The reason is that when going beyond simple pipelines, many important details, such as initialization, the number of tokens in rings, and the implementation of components, are not well covered in the literature. The aim of our paper is to fill this void, and to enable newcomers to experiment with, and get hands-on experience with, the design and implementation of small asynchronous circuits, in order to support the learning process.

The paper makes three contributions: (1) We discuss and decide on a set of design guidelines, including how to implement two-phase rings with any number of tokens (often just a single token). (2) We present the design and FPGA implementation of the set of handshake components from [23, Ch. 3]. The handshake registers are what we call "phase-decoupled" (based on ideas first proposed in [19]). The rest of the components are transparent to the handshaking, in contrast to what is used in most other works based on the click-template. (3) We illustrate the use of the design guidelines and the component library using two small circuits: one that emits the sequence of Fibonacci numbers and one that computes the greatest common divisor of two unsigned numbers. All code and all examples are available as open source.

The paper is structured as follows: Section II presents background and related work. Section III discusses design challenges and presents a set of design guidelines or policies. Section IV presents the design and FPGA implementation of the set of handshake components. Section V shows component optimizations that fuse several components. Section VI presents the two example circuits, built as static data-flow structures, and finally Section VII concludes the paper.

II. BACKGROUND AND RELATED WORK

A. Data-flow components

Asynchronous circuits are often designed using data-flow components. The data-flow abstraction decouples high-level thinking from low-level implementation details including what
uthorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on January 05,2025 at 15:39:54 UTC from IEEE Xplore. Restrictions apply
B. Rings using two-phase handshaking

In the static data-flow structure view, a three-stage ring using four-phase handshaking has its three latches holding a valid token, an empty token and a bubble. Such a three-stage ring containing one valid token can be used to implement iterative computations where the result from the current step depends on the result from the previous step.

For two-phase designs, the situation is different. The static data-flow structure abstraction involves only tokens and bubbles,
[Figures: rings of handshake registers R1, R2 and R3 annotated with tokens, barriers, bubbles and phase values; panels (a), (b) and (c).]
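The token/bubble view of a ring translates directly into initial phases. As stated for the handshake register below, a stage holding a token has opposite request and acknowledge phases on its output channel, and a bubble has equal phases. Since the acknowledge on a stage's output channel is the input phase of its downstream neighbor, the rule can be captured in a small helper. The package below is a hypothetical sketch, not part of the paper's library. The names ring_init_pkg and init_out_phase are ours, and the result is intended for the PHASE_INIT_OUT generic of Listing 1.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package (not part of the paper's library) that turns
-- the token/bubble view of a ring into an initial output phase for a stage.
package ring_init_pkg is
  -- holds_token         : true if the stage initially holds a token
  -- downstream_in_phase : PHASE_INIT_IN of the downstream stage, i.e. the
  --                       initial acknowledge seen on this stage's output
  function init_out_phase(holds_token         : boolean;
                          downstream_in_phase : std_logic) return std_logic;
end package ring_init_pkg;

package body ring_init_pkg is
  function init_out_phase(holds_token         : boolean;
                          downstream_in_phase : std_logic) return std_logic is
  begin
    if holds_token then
      -- Token: request and acknowledge on the output channel differ
      return not downstream_in_phase;
    else
      -- Bubble: request and acknowledge on the output channel are equal
      return downstream_in_phase;
    end if;
  end function init_out_phase;
end package body ring_init_pkg;
```

Consider a three-stage ring R1 → R2 → R3 → R1 in which every PHASE_INIT_IN is '0' and R1 holds the single token. R1 would get PHASE_INIT_OUT => init_out_phase(true, '0'), which is '1'. R2 and R3 would get init_out_phase(false, '0'), which is '0'. With these values, R2 sees a token on its input and an empty output channel, so it is the first stage whose click condition in Listing 1 holds.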
Listing 1. Phase-decoupled handshake register

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- needed for to_unsigned

entity decoupled_hs_reg is
  generic (
    -- The original defaults DATA_WIDTH to a package constant that is not
    -- shown; 8 is used here only so the listing compiles on its own.
    DATA_WIDTH     : natural   := 8;
    VALUE          : natural   := 0;    -- initial data value
    PHASE_INIT_IN  : std_logic := '0';
    PHASE_INIT_OUT : std_logic := '0');
  port (
    rst      : in  std_logic;
    -- Input channel
    in_ack   : out std_logic;
    in_req   : in  std_logic;
    in_data  : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    -- Output channel
    out_req  : out std_logic;
    out_data : out std_logic_vector(DATA_WIDTH-1 downto 0);
    out_ack  : in  std_logic);
end decoupled_hs_reg;

architecture behavioral of decoupled_hs_reg is
  signal phase_in, phase_out, click : std_logic;
  signal data_sig : std_logic_vector(DATA_WIDTH-1 downto 0);
begin
  out_req  <= phase_out;
  in_ack   <= phase_in;
  out_data <= data_sig;

  -- Fire when a new token waits on the input (in_req /= phase_in)
  -- and the output channel is empty (out_ack = phase_out)
  click <= (in_req xor phase_in) and (out_ack xnor phase_out);

  clock_regs : process (click, rst)
  begin
    if rst = '1' then
      phase_in  <= PHASE_INIT_IN;
      phase_out <= PHASE_INIT_OUT;
      data_sig  <= std_logic_vector(to_unsigned(VALUE, DATA_WIDTH));
    elsif rising_edge(click) then
      phase_in  <= not phase_in;   -- acknowledge the input token
      phase_out <= not phase_out;  -- emit a token on the output
      data_sig  <= in_data;        -- capture the bundled data
    end if;
  end process;
end behavioral;

Fig. 5. Function Block

Fig. 6. A Xilinx FPGA is composed of slices (each containing a number of LUTs and DFFs). Slices are identified by Cartesian coordinates.

B. Handshake Register

The handshake register and the phase-decoupled handshake register are described in the previous section, and their implementations can be seen in Fig. 2. The VHDL code for the phase-decoupled handshake register is shown in Listing 1.

When deciding the initial state of the phase flip-flops in a circuit, it is important to note that if a stage holds a token, the request and acknowledge signals in its output channel must have opposite phases. If a stage represents a bubble, the request and acknowledge signals in its output channel must have the same phase; that is, its output phase must equal the input phase of its downstream neighbor (cf. policies P1 and P2).

The click pulse has a very short duration. This does not cause problems for the edge-triggered flip-flops (FFs) on the FPGA we used for testing. If desired, the pulse width can be increased by delaying the self-resetting of the control circuit. This can be done by adding a delay to the click signal (delaying the clocking of the phase and data flip-flops) or by adding a delay after one of the phase flip-flops (delaying In_Ack or Out_Req).

C. Function blocks and delay elements

A function block is an ordinary combinational circuit extended with a request and an acknowledge signal; see Fig. 5. The request signal must be delayed by more than the propagation delay of the combinational circuit. For this, we use delay elements that are initially set with a very large safety margin. Later, based on post-place-and-route simulation, the designer may manually trim the delays to better match the propagation delay of the logic. Automating this process is future work. This simple implementation of a function block does not offer any joining of inputs or forking of outputs.

The delay elements are implemented following the guidelines outlined in [14]: a chain of LUTs whose relative physical placement on the FPGA is constrained. The LUT component used in this implementation has a single input and implements a buffer (a so-called LUT1 initialized with truth table "10"). To obtain reproducible delay values, the placement of the LUTs that implement delay elements is crucial. The rloc attribute allows the designer to specify the relative location of the slices in which the LUTs are placed. The relative placement is specified using the Cartesian coordinates (X#Y#) of the slices, as illustrated in Fig. 6. The VHDL code for the delay element is shown in Listing 2. As seen in the code, the chain of LUTs is placed in a single column in slices X0Y0, X0Y1, X0Y2, . . . in the following order, as specified by the Y-index of the slice: (0, 1, 0, 1, 0, 1, 0, 1, 2, 3, . . . ). Because consecutive LUTs are always placed in different slices, each hop includes inter-slice wire delay, which increases the total delay.
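As an illustration of the function block of Fig. 5, the sketch below wraps a combinational datapath with a request/acknowledge pair. The datapath is an adder, chosen only for illustration. The sketch makes three assumptions. First, the delay_element entity of Listing 2 is compiled into the work library, which in turn requires the Xilinx UNISIM library. Second, both operands arrive on a single input channel that has already been joined upstream, because the function block itself does no joining. Third, the entity name add_fb and the generic DELAY_SIZE are hypothetical and are not taken from the paper's library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical function block (name and datapath chosen for illustration):
-- combinational logic plus a matched delay on the request.
entity add_fb is
  generic (
    DATA_WIDTH : natural := 8;
    DELAY_SIZE : natural range 1 to 30 := 10);  -- LUTs in the delay chain
  port (
    -- Input channel (operands assumed joined upstream)
    in_req   : in  std_logic;
    in_ack   : out std_logic;
    in_a     : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    in_b     : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    -- Output channel
    out_req  : out std_logic;
    out_ack  : in  std_logic;
    out_data : out std_logic_vector(DATA_WIDTH-1 downto 0));
end add_fb;

architecture rtl of add_fb is
begin
  -- Combinational datapath (the "CL" box of Fig. 5)
  out_data <= std_logic_vector(unsigned(in_a) + unsigned(in_b));

  -- Request is delayed by the LUT chain of Listing 2 so that it reaches
  -- the next register only after out_data has settled
  req_delay : entity work.delay_element
    generic map (size => DELAY_SIZE)
    port map (d => in_req, z => out_req);

  -- The block holds no state, so the acknowledge passes straight back
  in_ack <= out_ack;
end rtl;
```

In line with the procedure described above, DELAY_SIZE would start out generous and then be trimmed after post-place-and-route simulation. Routing the acknowledge straight through keeps the block transparent to the two-phase handshake, so the registers on either side determine the phases.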
Listing 2. Delay element

library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.lut1;

entity delay_element is
  generic (
    size : natural range 1 to 30 := 10);  -- number of LUTs in the chain
  port (
    d : in  std_logic;   -- Data in
    z : out std_logic);  -- Delayed data out
end delay_element;

architecture lut of delay_element is
  component lut1
    generic (
      init : bit_vector := "10");
    port (
      I0 : in  std_ulogic;
      O  : out std_ulogic);
  end component;

  -- Internal signals
  signal s_connect : std_logic_vector(size downto 0);
  -- Signal constraints: keep the chain from being optimized away
  attribute DONT_TOUCH : string;
  attribute DONT_TOUCH of s_connect : signal is "true";
  attribute rloc : string;
begin
  s_connect(0) <= d;

  -- Create a ripple chain of LUTs
  lut_chain : for index in 0 to (size-1) generate
    signal o : std_logic;
    type y_placement is array (integer range 0 to 29) of integer;
    -- Y coordinates for relative location
    constant y_val : y_placement := (0, 1, 0, 1, 0, 1, 0, 1,
      2, 3, 2, 3, 2, 3, 2, 3, 4, 5, 4, 5, 4, 5, 4, 5, 6, 7, 6, 7, 6, 7);
    attribute rloc of delay_lut : label is
      "X0Y" & integer'image(y_val(index));
  begin
    delay_lut : lut1
      generic map (
        init => "10")  -- truth table: O = I0 (buffer)
      port map (
        I0 => s_connect(index),
        O  => o);

    s_connect(index+1) <= o after 1 ns;  -- 'after' affects simulation only
  end generate lut_chain;

  -- Connect the output of the delay element
  z <= s_connect(size);
end lut;

Fig. 7. (a) Fork. (b) Join.

Fig. 8. Merge

D. Join and Fork

Simple and straightforward implementations of the join and fork components are shown in Fig. 7. They are textbook implementations [23, Sect. 5.2] that use a click circuit to implement the functionality of a C-element.

Following design policies P1 and P2, the simple join in Fig. 7 can always be used. The phase flip-flop is initialized according to the state of the input and output channels. Because the component is transparent to handshaking, these are guaranteed to be in phase. Again, the shaded rectangles indicate combinational logic that is implemented in a single LUT in an FPGA.

E. Merge

The implementation of the Merge is shown in Fig. 8. It assumes mutually exclusive inputs and therefore uses separate phase flip-flops (denoted Pa and Pb) in the input ports. As the input and output phase flip-flops are clocked by separate signals, it also needs a separate phase flip-flop (denoted Pc) in the output port.

The circuit functions as follows: A transition on either InA_Req or InB_Req asserts either sel_a or sel_b, and the multiplexer propagates the corresponding input data to the output (OutC_Data). This also creates a rising edge on the signal click_out, which causes a transition on OutC_Req. Finally, this creates a (silent) falling transition on signal click_in. When the right-hand environment later acknowledges by toggling OutC_Ack, this causes a rising edge on signal click_in. This clocks both Pa and Pb and causes a transition on InA_Ack if the operation of the merge was started by a transition on InA_Req, or a transition on InB_Ack if it was started by a transition on InB_Req.
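The sketch below shows one possible click-style description of a two-input Merge. As in the text, the input phase flip-flops Pa and Pb both copy their own request when click_in rises, so no gated clock is needed. However, the sketch is not a transcription of Fig. 8. In particular, click_out is formed here as the parity inA_req xor inB_req xor pc xor K, where K is a constant fixed by the reset phases. This term rises once per new input token and is unaffected when Pa or Pb later update. Fig. 8 appears to derive click_out from sel_a and sel_b instead. The sketch assumes mutually exclusive inputs and that all three channels are idle after reset. The entity and generic names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-input Merge sketch (mutually exclusive inputs,
-- all channels assumed idle after reset).
entity merge_sketch is
  generic (
    DATA_WIDTH   : natural   := 8;
    PHASE_INIT_A : std_logic := '0';
    PHASE_INIT_B : std_logic := '0';
    PHASE_INIT_C : std_logic := '0');
  port (
    rst       : in  std_logic;
    inA_req   : in  std_logic;
    inA_ack   : out std_logic;
    inA_data  : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    inB_req   : in  std_logic;
    inB_ack   : out std_logic;
    inB_data  : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    outC_req  : out std_logic;
    outC_ack  : in  std_logic;
    outC_data : out std_logic_vector(DATA_WIDTH-1 downto 0));
end merge_sketch;

architecture rtl of merge_sketch is
  -- Makes click_out low when every channel is idle after reset
  constant K : std_logic := PHASE_INIT_A xor PHASE_INIT_B xor PHASE_INIT_C;
  signal pa, pb, pc          : std_logic;
  signal sel_a               : std_logic;
  signal click_in, click_out : std_logic;
begin
  inA_ack  <= pa;
  inB_ack  <= pb;
  outC_req <= pc;

  -- A pending token on A shows up as inA_req /= pa
  sel_a     <= inA_req xor pa;
  outC_data <= inA_data when sel_a = '1' else inB_data;

  -- Rises when a new input token arrives; falls again once pc toggles
  click_out <= inA_req xor inB_req xor pc xor K;
  -- Rises when the output channel is acknowledged (outC_ack = pc)
  click_in  <= outC_ack xnor pc;

  out_phase : process (click_out, rst)
  begin
    if rst = '1' then
      pc <= PHASE_INIT_C;
    elsif rising_edge(click_out) then
      pc <= not pc;                    -- forward the token
    end if;
  end process;

  in_phases : process (click_in, rst)
  begin
    if rst = '1' then
      pa <= PHASE_INIT_A;
      pb <= PHASE_INIT_B;
    elsif rising_edge(click_in) then
      -- Each phase FF copies its own request: only the active input changes
      pa <= inA_req;
      pb <= inB_req;
    end if;
  end process;
end rtl;
```

The parity form was chosen to avoid a glitch that a naive term such as (sel_a or sel_b) and (outC_ack xnor pc) would produce. When the acknowledge arrives, that term would briefly go high again before Pa or Pb has been updated, and this could toggle Pc a second time.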
This way of using a phase flip-flop to produce an acknowledge based on the corresponding request is a small variation that we prefer to the clock gating used in the buffered Merge in [18] and the plain Merge described in [13]. A gated clock produced by an AND gate requires the gating signal to be stable in a time window overlapping the period where the clock signal is high. Our solution avoids this timing requirement.

Fig. 9. MUX

Fig. 10. DEMUX

Fig. 12. Implementation of a phase-decoupled fused Join+Register+Fork component with two input channels (A and B) and two output channels (C and D).

F. MUX and DEMUX

The MUX and DEMUX components are used to implement conditional flow control. The MUX has two input channels (InA, InB), a selection (input) channel (InSel) for choosing between InA and InB, and an output channel (OutC). Fig. 9 shows the implementation of the MUX. The phase flip-flops Pa, Pb and Ps are all clocked by the same signal, derived from the function OutC_Req = OutC_Ack, which rises on every transition of the incoming acknowledge. The phase flip-flop Pc drives the request signal of the output channel (signal OutC_Req) and is toggled whenever there is a token on both the selector channel and the selected input. Similar to the Merge, the MUX has phase-decoupled channels due to the nature of its function.

Fig. 10 shows the implementation of the DEMUX (inspired by [13]). It has two input channels (InA and InSel) and two output channels (OutB and OutC). The component joins the two inputs and produces an output on the selected channel. Similar to the MUX, the DEMUX has multiple internal phase flip-flops. The phase flip-flops Pb and Pc are clocked when both request signals on the input channels have transitioned. Phase flip-flop Pa (participating in the input channel handshakes) is clocked whenever an acknowledgement is received, as indicated by the expression (OutB_Ack = OutB_Req) ∧ (OutC_Ack = OutC_Req). Again, we prefer this style of clocking to the gated clocking used in the components described in [13], [18].

V. PEEPHOLE OPTIMIZATIONS

It is possible to reduce the hardware cost of a circuit by performing peephole optimizations, where certain combinations of handshake components are replaced by a single fused circuit. All of these optimizations involve merging a handshake register with one or more of the passive components. The original click paper [18] showed how easy it is to extend the click-template with join functionality on the input and fork functionality on the output. The same holds for our phase-decoupled handshake register. Below we describe a range of such fused components.

A. Join+Register+Fork, Join+Register and Register+Fork

The schematic symbols for a handshake register fused with a Join and/or a Fork are shown in Fig. 11, and Fig. 12 shows
the implementation of a fused Join+Register+Fork circuit with two input channels and two output channels. For simplicity, the figure shows a design with separate phase flip-flops for each

Fig. 14. Schematic of the GCD circuit.

outer ring containing the same components and handshake register RF1. The inner ring has two handshake registers and one token. The outer ring has three handshake registers and two tokens. By following policy P2, we can ensure correct initialization of both rings. Notice that the schematic shows no annotations on the input channels of join J0; our passive
Fig. 15. Post-synthesis timing simulation of the Fibonacci circuit.

Notice that R0 has different phases on the input and output channels. This is because the ring MX0 → RF0 → F0 → R0 has a single token. The other rings in this circuit are MX0 → RF0 → DX0 → RF1 → DX1 → ME0 and MX0 → RF0 → F0 → DX0 → RF1 → DX1 → ME0. In all of the rings, the tokens eventually get spread across several components, as seen in the step-wise illustration provided in the Git repository.

C. FPGA Implementation

Both the Fibonacci circuit and the GCD circuit have been implemented on a Digilent Nexys4DDR FPGA board (with a Xilinx Artix-7 chip), and the circuits have been operated manually. Input channels are implemented using a debounced pushbutton for the request signal, a set of switches for the data, and an LED for the acknowledge signal. Output channels are implemented using LEDs for the request signal and the data signals, and a debounced pushbutton for the acknowledge signal. The corresponding XDC files (constraint files specifying the pinout) are included in the design sources in the GitHub repository. In the component source files, the "DONT_TOUCH" attribute is set on combinational signals and registers to force the tools to keep these signals rather than optimize them away. Therefore, only minimal project setup is necessary to use the designs.

A post-synthesis simulation of the Fibonacci circuit is shown in Fig. 15. The first five signals are the environment signals. Below these, selected internal signals are plotted and grouped by component. A file showing a similar simulation of the GCD circuit is included in the GitHub repository.

Both circuits work correctly in simulation and on the actual FPGA board. As this paper focuses on the design process and on FPGA prototyping, we use delay elements with very conservative (high) values. For this reason, it does not make sense to report performance measures. A more detailed discussion of performance and performance optimization is beyond the scope of this paper.

VII. CONCLUSION

This paper presented a simple, structural approach to the design and FPGA implementation of asynchronous circuits using data-flow handshake components. The aim of the paper is to enable students, and others who are learning asynchronous design, to design and implement small asynchronous circuits using FPGA technology.

The components use two-phase bundled-data handshaking and are implemented using a novel phase-decoupled extension of the click-element template. This phase decoupling allows the implementation of nested rings with any number of tokens, including the most typical situation: rings with a single token. In this way, two-phase bundled-data implementations of iterative/recursive functions are now possible.

The paper presents the implementation (described in VHDL) of all components in the library, and it illustrates the design method using two example circuits: Fibonacci and greatest common divisor. All code, including the design examples, is available as open source.

SOURCE CODE

The paper is accompanied by an online repository [11] containing: (a) schematics and VHDL source code for all the handshake components; (b) schematics and source code for the two design examples, including VHDL test-benches for simulation; (c) a sequence of snapshots of the schematics illustrating the token-flow operation of the circuits.

REFERENCES

[1] Filipp Akopyan, Jun Sawada, Andrew Cassidy, et al. TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, 34(10):1537–1557, 2015.
[2] Erik Brunvand and Robert F. Sproull. Translating concurrent programs into delay-insensitive circuits. In Proc. Int'l. Conf. Computer-Aided Design, pages 262–265, November 1989.
[3] S. Chatterjee, M. Kishinevsky, and U. Y. Ogras. xMAS: Quick formal modeling of communication fabrics to enable verification. IEEE Design & Test of Computers, 29(3):80–88, 2012.
[4] M. Davies, N. Srinivasa, T. Lin, et al. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1):82–99, 2018.
[5] Europractice. URL: http://www.europractice.com.
[6] P. D. Ferguson, A. Efthymiou, T. Arslan, and D. Hume. Optimising self-timed FPGA circuits. In Proc. Euromicro Conference on Digital System Design: Architectures, Methods and Tools, pages 563–570, 2010.
[7] Alberto Ghiribaldi, Davide Bertozzi, and Steven M. Nowick. A transition-signaling bundled data NoC switch architecture for cost-effective GALS multicore systems. In Proc. Design, Automation and Test in Europe Conference and Exhibition, pages 332–337, 2013.
[8] Mark R. Greenstreet, Jørgen Staunstrup, and Ted E. Williams. Self-timed iteration. In Carlo H. Séquin, editor, Proceedings of VLSI '87, pages 269–282. IFIP, August 1987.
[9] Quoc Thai Ho, Jean-Baptiste Rigaud, Laurent Fesquet, Marc Renaudin, and Robin Rolland. Implementing asynchronous circuits on LUT based FPGAs. In Field-Programmable Logic and Applications: Reconfigurable Computing Is Going Mainstream, pages 36–46. Springer, 2002.
[10] C. A. R. Hoare. Communicating sequential processes. Communications of the ACM, 21(8):666–677, August 1978.
[11] https://github.com/zuzkajelcicova/Async-Click-Library.
[12] Lana Josipović, Radhika Ghosal, and Paolo Ienne. Dynamically scheduled high-level synthesis. In Proc. ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), pages 127–136, 2018.
[13] I. Kotleas, D. R. Humphreys, R. B. Sørensen, E. Kasapaki, F. Brandner, and J. Sparsø. A loosely synchronizing asynchronous router for TDM-scheduled NOCs. In Proc. IEEE/ACM International Symposium on Networks-on-Chip (NOCS), pages 151–158, 2014.
[14] Jon Neerup Lassen. FPGA prototyping of asynchronous networks-on-chip. Master's thesis, Dept. of Information Technology, Technical University of Denmark, 2008. Report IMM-M.Sc.-2008-26, available at http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=7126.
[15] Rajit Manohar. Reconfigurable asynchronous logic. In Proc. Custom Integrated Circuits Conference (CICC), pages 13–20. IEEE, 2006.
[16] Alain J. Martin. Compiling communicating processes into delay-insensitive VLSI circuits. Distributed Computing, 1(4):226–234, 1986.
[17] David E. Muller. Asynchronous logics and application to information processing. In H. Aiken and W. F. Main, editors, Proc. Symp. on Application of Switching Theory in Space Technology, pages 289–297. Stanford University Press, 1963.
[18] Ad Peeters, Frank te Beest, Mark de Wit, and Willem Mallon. Click elements: An implementation style for data-driven compilation. In Proc. IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), pages 3–14, 2010.
[19] M. Roncken, S. M. Gilla, H. Park, N. Jamadagni, C. Cowan, and I. Sutherland. Naturalized communication and testing. In Proc. IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), pages 77–84, 2015.
[20] Basit Riaz Sheikh and Rajit Manohar. An asynchronous floating-point multiplier. In Proc. IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), pages 89–96, 2012.
[21] M. Singh and S. M. Nowick. MOUSETRAP: High-speed transition-signaling asynchronous pipelines. IEEE Transactions on VLSI Systems, 15(6):684–698, 2007.
[22] Danil Sokolov, Ivan Poliakov, and Alex Yakovlev. Analysis of static data flow structures. Fundamenta Informaticae, 88(4):581–610, 2008.
[23] J. Sparsø and S. Furber, editors. Principles of asynchronous circuit design: A systems perspective. Kluwer Academic Publishers, 2001.
[24] Ivan E. Sutherland. Micropipelines. Communications of the ACM, 32(6):720–738, June 1989.
[25] C. H. van Berkel, C. Niessen, M. Rem, and R. J. J. Saeijs. VLSI programming and silicon compilation: A novel approach from Philips research. In Proceedings of the 1988 IEEE International Conference on Computer Design, pages 150–166. IEEE, 1988.