IEEE - 35239

Implementation of an RTL synthesizable asynchronous FIFO for conveying data by avoiding actual data movement via FIFO

Shruti Sharma (a)

a. University School of Information and Communication Technology, Guru Gobind Singh Indraprastha
University (GGSIPU), Delhi, India. Email: [email protected]
(Student of Masters of Technology in Electronics and Communication Engineering, Final Year)

Abstract

An enhanced approach is adopted in which asynchronous design is integrated into the conventional synchronous design flow to avoid problems such as critical delays, clock skew and power consumption control. This is made possible by accepting Verilog models, which allows logic synthesis and functional verification of the design through RTL synthesis. The modified pipeline design followed in this paper is asynchronous in nature and has the notable dynamic feature of being latch-less in its control operations; with proper sequencing it can achieve the implied latching functionality of dynamic gates. This work evaluates the performance of asynchronous and synchronous FIFO design topologies built on a scheme in which data movement within the FIFO is avoided. The design is implemented and synthesized at the register-transfer level (RTL) in Verilog using a few sequential latches and digital logic gates, and it is simulated at the gate level with the Xilinx ISE 12.1 tool to obtain delay and power figures, while remaining simple to integrate into advanced design flows and processes.

Keywords: Asynchronous, FPGA, RTL, JTAG

1. INTRODUCTION
Pipelining-centered designs have proven to be one of the practical ways to deliver high-performance digital systems. Pipelining has been beneficial in synchronous systems, where it has been an essential technique over the past decades: it preserves parallelism and therefore raises system throughput, whether for high-performance processors such as graphics and multimedia units or for signal processors [1]. Synchronous and asynchronous logic rest on protocols that are strictly time-driven and demand-driven, respectively [2]. To obtain a more robust approach against variations in process, voltage and temperature (PVT variations), the protocol used in most works employs handshaking: the sender of data issues a request token, or simply raises a request signal, and in response the receiver issues an acknowledgement token, or simply raises an acknowledge signal. The operation of the pipeline in [2] depends on its key feature of using latches that are normally transparent, unlike most latch-based synchronous pipelines, so that data passes directly through them and the pipeline forms a flow-through combinational path. Usually, when data passes through a stage's latch, a transition on the C control input makes the latches opaque, thereby storing the data and protecting it from further changes on the stage's input channel. Once the data has advanced through the next stage's latches, a transition on the P control input, communicated from the next stage, makes the current stage's latches transparent again, safely allowing the next data item to enter. Integrating asynchronous design into the conventional synchronous design flow avoids problems such as critical delays, clock skew and power consumption control. This has proved to be an enhanced practice that fits an asynchronous FIFO well into the conventional synchronous design flow, owing to its acceptance of Verilog

6th ICCCNT 2015


July 13 - 15, 2015, Denton, U.S.A
models (augmented with SystemVerilog features), which allows any standard Verilog simulator that supports SystemVerilog features [3] to be used for functional verification, and owing to its generation of Verilog models suitable for register-transfer level (RTL) synthesis.

The four-phase handshake protocol [3,4] is used in this work mainly to obtain an asynchronous FIFO that differs from flow-through FIFOs, i.e. one in which the movement of data through the FIFO is avoided in order to reduce latency. Briefly, one cycle of the four-phase protocol has two phases: a work phase and a reset phase. The work phase extends from the rising edge of the request to the rising edge of the acknowledgement, and indicates the handling of a request and its completion. The reset phase consists of the return-to-zero of both the request and the acknowledgement signals. The signalling transitions can be arranged differently to trade cost against performance. The work in [3] presents an early acknowledgement protocol as an improvement over the normal four-phase protocol that entirely conceals the reset phase of the signalling: the receiver's acknowledgement is indicated by the falling edge of the acknowledge signal rather than by the rising edge as in the normal four-phase protocol. In the early acknowledgement protocol the request signal is reset as soon as both the request and the acknowledgement signals are high. The end of the ongoing transaction is marked by the actual acknowledgement, given by the falling edge of the acknowledgement signal, which also resets the acknowledgement signal for the next request-acknowledge cycle. In this way the protocol removes the reset phase inherent in the simple four-phase protocol.

The International Technology Roadmap for Semiconductors (ITRS), in its 2008 edition, states a clear need for asynchronous communication protocols for control and synchronization in integrated circuits (ICs) [5]. The 2011 edition further emphasizes asynchronous handshaking and the globally asynchronous locally synchronous (GALS) design approach in integrated circuits [6]. Most system-on-chip (SoC) designs adopt the synchronous style, which assumes a discrete notion of time; this assumption considerably reduces the complexity of digital circuit design and allows circuits to be modelled easily in register-transfer level (RTL) languages such as VHDL or Verilog. The work in [7] presents a set of modelling guidelines and synthesis techniques for designing asynchronous pipelines. To retain the low area and power consumption of an asynchronous control system, that work offers a method that avoids the conventional syntax-directed translation approach. Instead, it adopts a data-driven, coarse-grain design style to synthesise an asynchronous control system, thereby restricting the asynchronous implementation of the communication channels. The method is adopted because it integrates easily into conventional synchronous design styles: it is based on Verilog and SystemVerilog specifications and can produce register-transfer level (RTL) models that enable functional simulation and logic synthesis with currently available computer-aided design tools.

The work in [8] proposed an RTL-synthesizable asynchronous FIFO design that uses a modified pipeline construct, which is beneficial for designing FIFOs in which data movement within the FIFO is avoided. With the modified pipeline concept, the complexity of the control pipelines that exchange the tokens or signals permitting reads and writes in the FIFO can be reduced. A further benefit described in that work is that the design is written at RTL in the hardware description language VHDL, which makes it possible to combine it with other synchronous RTL designs to build a large, heterogeneous system with both asynchronous and synchronous parts, such as a GALS on-chip system.

2. ASYNCHRONOUS VS. SYNCHRONOUS DESIGN
Asynchronously designed circuits offer several benefits over their synchronous equivalents; one is the complete avoidance of clock skew, which has become a major concern in deep sub-micron technologies [9]. Synchronous circuits face growing difficulty in distributing a global clock signal while keeping clock skew low throughout the circuit, owing to shrinking technologies, rapidly growing design sizes and the demand for higher clock speeds. The global synchronization of events in synchronous designs makes it necessary to slow the circuits down to compensate for the skew; otherwise the skew may become a much bigger concern as feature sizes shrink further. Clock distribution also tends to consume more circuit power.

In contrast, an asynchronous design, which is demand-driven, offers a way to overcome such difficulties simply by removing the clock signal; it employs a handshaking protocol among the neighbours participating in a transfer of data


without being globally synchronized by a single clock, which makes it more energy-efficient. However, fully asynchronous circuit designs are still not accepted as a complete replacement for synchronous designs because of the lack of advanced computer-aided design tools.

A digital circuit is said to be synchronous if all activities in the circuit are controlled by a single clock signal. A circuit that operates otherwise, with no clock signal, is said to be non-synchronous or simply asynchronous. The work in [10] presents an implementation of a GALS pipelined processor on commercial synchronous FPGAs, in which a pipelined accumulator-based processor is implemented, as a sample pipelined processor, with variable stage delays. That work introduces a port controller that lets the pipeline function correctly under varying stage delays, resulting in increased performance and relatively reduced power consumption. The work in [9] also categorizes asynchronous circuits by the delays assumed for wires and gates. The delay-insensitive (DI) model, which is the more robust, operates correctly irrespective of any wire or gate delay, but it is a very restrictive class. The quasi-delay-insensitive (QDI) model relaxes this by assuming that signal transitions arrive at the same time only at the end points of designated forks.

3. ARCHITECTURE OF ASYNCHRONOUS FIFO
In this paper, an asynchronous FIFO is presented that avoids data movement in the pipeline: the data does not actually move through the pipeline but is held in latches connected in an asynchronous manner in the data cell blocks. The FIFO contains two main units: a control block and a data cell/bank block. The basic block diagram of the asynchronous FIFO architecture is given in figure 1.

A. Control Logic Block:
The control block has two inputs (wr_req and rd_req) and two outputs (wr_ack and rd_ack). The FIFO works with a four-phase bundled-data handshaking protocol. The process of writing data to the FIFO (asynchronous/synchronous) can be explained in the following steps:
Step a: The sender sets the request signal (wr_req) after the data to be sent (Data_In) is ready.
Step b: The FIFO sets the acknowledge signal (wr_ack) after receiving the incoming data successfully.
Step c: The sender responds to the wr_ack signal by resetting the wr_req signal.
Step d: Finally, the FIFO resets the push-ack (wr_ack) signal after push-req (wr_req) has been reset.

Reading data from the FIFO is similar to writing (pushing), except that the data is supplied by the FIFO and received by the receiver. The control block contains one control cell for each data cell in the data bank block and supplies that data cell with a control signal as its second input. The wr_req and rd_req input signals are fed into every control cell by default. The basic principle of the control block is that a control cell responds to write and read requests only when it holds permission, in the form of a token, for that operation. After system reset, the token for the read operation is granted to control cell 1. The write/push token is then passed to the next control cell once the cell currently holding it has completed writing data into the FIFO, and when the write/push token reaches the last cell in the control block it returns to control cell 1. With this token-passing scheme, data can be written into or read from the asynchronous FIFO without actually moving it through the FIFO.

B. Data Bank Block:
The data block is composed of data cells, each containing an array of latches, and a multiplexer. Its function is to latch the incoming data (data_in) and to output the data (data_out) requested by the control signals from the control block. Each data cell takes two inputs: data_in, which is fed to every cell by default, and an enable/control signal from its control cell. The enable/control signals from the control block are used as enables for latching the incoming data. The latches used in the data cells are the same as those used in the control block, and the number of latches in a data cell depends on the data width of the FIFO. The read_ctr/en signal shown in figure 1 combines the read_ctr/en signals from all the control cells and is used as the control signal for selecting the requested data among the data cells. An arbiter is also used in the proposed asynchronous FIFO because the FIFO can handle only one request, write or read, at a time; the arbiter grants either the read or the write request at any one time. Table 1 lists the characteristics of the implemented FIFO.
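
The token-passing scheme of Section 3 can be illustrated with a small behavioural model. The Python sketch below is not the paper's Verilog: the class name TokenFifo, the occupancy counter used for full/empty detection, the choice to start both tokens at the first cell, and the default depth of 4 cells and width of 64 bits (the values quoted in Section 5) are assumptions made for illustration. Each method call stands for one request granted by the arbiter, so a write and a read never overlap.

```python
class TokenFifo:
    """Behavioural model of a FIFO whose stored words never move between cells."""

    def __init__(self, depth=4, width=64):
        self.depth = depth
        self.mask = (1 << width) - 1      # truncate words to the data width
        self.cells = [None] * depth       # data bank: one latch array per data cell
        self.wr_token = 0                 # control cell that holds the write token
        self.rd_token = 0                 # control cell that holds the read token
        self.count = 0                    # occupancy, bookkeeping for this model only

    def write(self, data_in):
        """Latch data_in into the cell holding the write token, then pass the token on."""
        if self.count == self.depth:
            return False                  # full: the request is not acknowledged
        self.cells[self.wr_token] = data_in & self.mask
        self.wr_token = (self.wr_token + 1) % self.depth  # last cell hands back to first
        self.count += 1
        return True

    def read(self):
        """Select the cell holding the read token through the output multiplexer."""
        if self.count == 0:
            return None                   # empty: nothing to supply
        data_out = self.cells[self.rd_token]              # multiplexer select = token index
        self.rd_token = (self.rd_token + 1) % self.depth
        self.count -= 1
        return data_out


fifo = TokenFifo(depth=4, width=64)
for word in (0xA, 0xB, 0xC):
    fifo.write(word)
first = fifo.read()                       # oldest word, taken from cell 0
```

In this model a read leaves the stored words where they were latched; only the token indices advance, which is the sense in which data movement through the FIFO is avoided.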


Fig 1. Basic block diagram of the presented asynchronous pipeline

Table 1: Characteristics of the implemented FIFO

Signal                Description

Wr_req / Rd_req       The write (push_req) and read (pop_req) input signals are fed into every cell of the control block by default. The basic principle is that every control cell responds to push/write and pop/read request signals only when it has permission for the corresponding operation.

Wr_ack / Rd_ack       Composed of the push/pop-ack signals from each cell of the control block, which is reasonable because only one cell of this block asserts its push/pop acknowledge signal for the current push/pop request.

Data_in / Data_out    The function of the data bank block is to latch the incoming data or to output the data requested by the control signals from the Control Logic block.
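
The request and acknowledge signals in Table 1 follow the four-phase sequence of steps a to d in Section 3A. The following Python sketch traces one write transaction under stated assumptions: the function name four_phase_write and its parameters are hypothetical, and the combination of the per-cell push-ack wires into wr_ack is modelled as a logical OR, which the paper describes only as a composition of the cell signals.

```python
def four_phase_write(depth, token_cell):
    """Return the (step, wr_req, wr_ack) trace of one four-phase write transaction."""
    push_ack = [0] * depth                # one acknowledge wire per control cell
    trace = []

    def wr_ack():
        # Only the token holder asserts its ack, so OR-ing the wires is unambiguous.
        return int(any(push_ack))

    wr_req = 1                            # step a: sender raises request, Data_In valid
    trace.append(("a", wr_req, wr_ack()))
    push_ack[token_cell] = 1              # step b: token-holding cell latches and acknowledges
    trace.append(("b", wr_req, wr_ack()))
    wr_req = 0                            # step c: sender withdraws the request
    trace.append(("c", wr_req, wr_ack()))
    push_ack[token_cell] = 0              # step d: cell releases acknowledge, both lines idle
    trace.append(("d", wr_req, wr_ack()))
    return trace


trace = four_phase_write(depth=4, token_cell=2)
```

Steps a and b form the work phase and steps c and d the reset phase described in the Introduction; a read transaction on rd_req / rd_ack would follow the same four steps.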

4. THE DESIGN FLOW
The basic design flow used in the experimental setup is elaborated in this section and shown in figure 2; it describes the head-on approach applied to the problem pursued in our project. The head-on approach is a step-by-step design flow that runs from design entry to design synthesis, to design implementation and finally to device programming. Design entry begins with creating a new project and, under it, the project files, including the user constraints file (.ucf), which is then populated with timing, pin and area constraints. Design verification involves both functional verification and timing verification, which are carried out at different points in the design flow: before synthesis and after translate, respectively. The primary concern is circuit specification and functionality, i.e. the design process begins, before synthesis, with system-level behavioral modelling of the asynchronous pipeline model of the FIFO, and this pipeline model is then evaluated. The RTL and timing evaluation is done using test vectors. With Verilog used at the register-transfer level (RTL), particular care was taken in coding the two blocks, the control logic and the data bank, in an attempt to write entirely parameterized code for the design. Once RTL or behavioral simulation has confirmed the correct functionality of the circuit, logic synthesis is performed: the synthesis tool first generates a technology-independent schematic from the Verilog code and then optimizes the circuit for the selected FPGA-specific library (Spartan-3A, XC3S1200E). In addition, the timing, pin and area constraints must be defined. The Xilinx ISE design implementation includes translate, map and


place and route (PAR), and receives the input netlist (.edt) file generated by the XST tool. First, a translation program translates this input netlist, together with the design constraints, into a Xilinx database file. After a successful run of the translation program, the map program maps the design onto the Xilinx FPGA. Lastly, the PAR program takes the mapped design and places and routes it on the FPGA, producing the input for the bit-stream generator (BitGen). Finally, to program the Xilinx device, the placed and routed design is used to create a programming file (.bit) for configuration. iMPACT is then used to program the device through a programming cable. Prior to programming the FPGA, a timing simulation needs to be run, generating a JTAG file for debugging the circuit, regarded as a set of timing functions and requirements corresponding to the RTL code.
Fig 2. The Design Flow

5. The Simulation Results
After all the key constituents, i.e. the 2-input C-element, the control logic blocks and the data bank block, were constructed in RTL and the asynchronous FIFO design was built at gate level using Verilog, it can be concluded that the asynchronous pipelined FIFO design fits logically into the synchronous design flow and tools, and that it can readily be integrated with other synchronous designs. The difference is that the synchronous pipelined FIFO design applies the four-phase handshake protocol under the control of a global clock to read or write data, and a few counters can be included in the control logic to track the current positions of the read and write tokens. The presented asynchronous and synchronous FIFO designs have been synthesized at RTL using Verilog. The data width of the data bank and the depth of the FIFO are taken as 64 bits and four cells, respectively. Area can be estimated from the number of LUTs and, together with the dynamic power consumption after synthesis, is illustrated in the device utilization table 2. The estimated delay and power, calculated with the Xilinx ISE 12.1 tool, are given in table 3. The supporting RTL schematic is provided in figure 3.

6. Conclusion
The asynchronous FIFO design presented here avoids moving data through the FIFO by exchanging tokens in its control pipelines and using a multiplexer on its data register bank. With no global clock governing the control logic and data bank blocks, the asynchronous pipelined FIFO is suitable for integration with other synchronous RTL designs to build large mixed systems with both synchronous and asynchronous features, such as macro synchronous micro asynchronous (MSMA) pipelines and globally asynchronous locally synchronous (GALS) on-chip networks. Compared with synchronous FIFOs of similar capability, the new FIFO is simpler to design because of the absence of timing constraints, and it has relatively lower delay and power values.


Table 2: FPGA utilization for the implemented FIFO architecture

Logic Utilization                                  Used    Available   Utilization
Number of 4 input LUTs                             565     17,344      3%
Number of Slices containing only related logic     288     288         100%
Number of Slices containing unrelated logic        0       288         0%
Total Number of 4 input LUTs                       569     17,344      3%
  Number used as logic                             565
  Number used as a route-thru                      4
Number of bonded IOBs                              12      190         6%
Average Fanout of Non-Clock Nets                   3.63

Table 3: Delay and Power of FIFO architecture


WANG FIFO            Delay (ns)   Power (mW)
Asynchronous FIFO    408.165      2.86
Synchronous FIFO     414.352      3.84
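
To put the Table 3 figures in relative terms, the short Python calculation below expresses the asynchronous FIFO's delay and power as fractional reductions with respect to the synchronous FIFO. The helper name relative_reduction is illustrative and not part of the paper; the four input values are copied from Table 3.

```python
def relative_reduction(reference, value):
    """Fractional reduction of value with respect to reference."""
    return (reference - value) / reference


delay_async_ns, delay_sync_ns = 408.165, 414.352   # Table 3, delay column
power_async_mw, power_sync_mw = 2.86, 3.84         # Table 3, power column

delay_saving = relative_reduction(delay_sync_ns, delay_async_ns)
power_saving = relative_reduction(power_sync_mw, power_async_mw)
print(f"delay: {delay_saving:.2%}, power: {power_saving:.2%}")
```

On these numbers the delay reduction is roughly 1.5% and the power reduction roughly 25.5%, so the reported advantage of the asynchronous design lies mainly in power.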

Fig 3. Generated schematic of the presented asynchronous FIFO design


References

[1] Nowick, S. and Singh, M., "High-Performance Asynchronous Pipelines: An Overview," IEEE Design & Test of Computers, vol. 28, no. 5, pp. 8-22, Sept 2011.

[2] Steininger, A., Veeravalli, V. S., Alexandrescu, D., Costenaro, E., Anghel, L., "Exploring the State Dependent SET Sensitivity of Asynchronous Logic - The Muller-Pipeline Example", 32nd IEEE International Conference on Computer Design (ICCD), pp. 61-67, Oct 2014.

[3] Mannakkara, C. and Yoneda, T., "Asynchronous Pipeline Controller Based on Early Acknowledgement Protocol", 8th International Conference on Application of Concurrency to System Design (ACSD), pp. 118-127, June 2008.

[4] Furber, S. B. and Day, P., "Four-phase micropipeline latch control circuits", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 4, no. 2, pp. 247-253, June 1996.

[5] Semiconductor Industry Association, "The International Technology Roadmap for Semiconductors", ITRS 2008 Edition, 2008.

[6] Semiconductor Industry Association, "The International Technology Roadmap for Semiconductors: 2011", ITRS 2011 Edition Design, 2011.

[7] Law, C., Gwee, B. H. and Chang, J. S., "Modeling and Synthesis of Asynchronous Pipelines", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 4, pp. 682-695, April 2011.

[8] Wang, X. and Nurmi, J., "A RTL Asynchronous FIFO Design Using Modified Micropipeline", International Baltic Electronics Conference, pp. 1-4, Oct. 2006.

[9] Moreira, M. T., Oliveira, B. S., Moraes, F. G. and Calazans, N. L. V., "Impact of C-elements in asynchronous circuits," 13th International Symposium on Quality Electronic Design (ISQED), pp. 437-343, March 2012.

[10] Farouk, H. A., El-Hadidi, M. T., "Implementing Globally Asynchronous Locally Synchronous Processor Pipeline on Commercial Synchronous FPGAs", IEEE 17th International Conference on Telecommunications (ICT), pp. 989-994, April 2010.
