UNIT5 - PLDS, CPLDs & FPGA
6/27/2024
UNIT-5
Contents
• Programmable Logic Devices
• Storage Devices
• PLA & PAL
• CPLDs
• FPGAs
Introduction
• Popular digital ICs such as TTL or CMOS parts have
fixed functionality, and the user has no option to
change or modify it, i.e. they work according to the
design given by the manufacturer.
• To get the functionality users expected from these
ICs, designers started looking for a methodology
by which the functionality of an IC could be
modified or changed.
• This led to the idea of using fuses in ICs, which
very soon gained momentum.
Contd..
• This was the motivation for the invention of
programmable devices, which was realized in the early
1970s with the design of a PLD by Ron Cline of
Signetics.
• The method of changing or modifying the
functionality of an IC using fuses was
appreciated and soon gained momentum.
• Blowing a fuse between two contacts, or keeping
the fuse intact, was done using software, and hence
these devices were called Programmable Logic
Devices (PLDs).
Contd..
• Programmable Read-Only Memories (PROMs)
were the first programmable logic devices to
achieve widespread use in digital systems.
• PROMs allowed the chip vendor to store code
in the device using a simple and relatively
inexpensive desktop programmer.
• This new device was called a programmable
Contd..
• The process of storing the code in the PROM
is called programming, or “burning” the
PROM. PROMs, like ROMs, retain their
contents even after power has been turned
off.
• The PROMs were initially intended for storing
code and constant data but design engineers
found them useful for implementing logic also.
• Engineers could program state machine logic
into a PROM, creating what are called
“microcoded” state machines.
A simple PROM CELL
Contd..
• A PROM can be constructed with an array of
fuses and transistors, as shown in the previous
slide. The fuses are like household fuses: each
consists of a wire that breaks the connection when
a large current passes through it.
• To program a one-bit cell as a logic 0 or 1, the
Contd..
• Eventually, erasable PROMs were developed
which allowed users to program, erase, and
reprogram the devices using an inexpensive
desktop programmer.
• Typically, PROMs now refer to devices that
cannot be erased after being programmed.
• Erasable PROMs include erasable
programmable read-only memories (EPROMs)
that are programmed by applying high-voltage
electrical signals and erased by flooding the
devices with UV light.
Contd..
• Electrically erasable programmable read-only
memories (EEPROMs) are programmed and
erased by applying high voltages to the
device.
• Flash EPROMs are programmed and erased
electrically and have sections that can be
erased electrically in a short time and
independently of other sections within the
device.
Contd..
• PROMs are excellent for implementing any kind
of combinational logic with a limited number of
inputs and outputs. Each output can be any
combinational function of the inputs, no matter
how complex.
• The problem with PROMs is that they tend to be
extremely slow.
• Even today, access times are of the order of
40 nanoseconds or more.
• Hence they are not useful for high-speed
applications.
Memory Arrays
Mask-programmed ROMs: data is written during chip fabrication using a photo mask
Fused ROMs: data is written by blowing fuses electrically, hence it cannot be modified later
Programmable Read-Only Memories (PROMs): data is written after chip fabrication
Erasable PROMs: the complete block is erased using UV light that passes through a glass window
Flash: programmed using a high electrical voltage; erases data in blocks, hence faster
Memory Architecture
Stores a large number of bits
m × n: m words of n bits each
k = log2(m) address input signals, or m = 2^k words
e.g., a 4,096 × 8 memory: 32,768 bits, 12 address input signals, 8 input/output data signals
Memory access
r/w: selects read or write
enable: read or write only when asserted
multiport: multiple accesses to different locations simultaneously
[Figure: external view of a 2^k × n read and write memory with r/w and enable inputs, address lines A0 … A(k-1), and data lines Q0 … Q(n-1)]
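The m × n organization above can be sketched in Verilog, the language used later in these slides. This is only an illustrative model, not a vendor memory: the module and port names are hypothetical, the clock is an assumption (the slide shows only r/w and enable), the r/w encoding (1 = read, 0 = write) is assumed, and the shared input/output data signals are split into separate din and dout ports for simplicity. The default parameters follow the 4,096 × 8 example.

```verilog
// Illustrative sketch of a 2^K x N read/write memory.
// Names, the clock, and the r/w encoding are assumptions.
module ram_sketch #(
    parameter K = 12,            // address bits, so m = 2**K = 4,096 words
    parameter N = 8              // bits per word
) (
    input  wire         clk,
    input  wire         enable,  // access happens only when asserted
    input  wire         rw,      // assumed encoding: 1 = read, 0 = write
    input  wire [K-1:0] addr,
    input  wire [N-1:0] din,
    output reg  [N-1:0] dout
);
    // m words of n bits each: 4,096 x 8 = 32,768 bits by default
    reg [N-1:0] mem [0:(1<<K)-1];

    always @(posedge clk) begin
        if (enable) begin
            if (rw)
                dout <= mem[addr];   // read the addressed word
            else
                mem[addr] <= din;    // write the addressed word
        end
    end
endmodule
```

With K = 12 the address bus is 12 bits wide, matching k = log2(4,096) = 12.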
Semiconductor Memory Types (Cont.)
RAM: the stored data is volatile
DRAM
A capacitor stores the data, and a transistor accesses the capacitor
Needs a refresh operation
Low cost and high density ⇒ used for main memory
SRAM
Consists of a latch
Does not need a refresh operation
High speed and low power consumption ⇒ mainly used for cache memory and memory in hand-held devices
ROM: “Read-Only” Memory
Nonvolatile
Can be read from but not written to by a processor in a microcomputer system
Store software program for general-purpose processor
Store constant data (parameters) needed by system
Implement combinational circuits (e.g., decoders)
[Figure: external view of a 2^k × n ROM with an enable input, address lines A0 … A(k-1), and data lines Q0 … Q(n-1)]
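As a small illustration of the last use listed above, a ROM whose contents are a truth table behaves as a combinational circuit. The sketch below stores a 2-to-4 decoder in a 4 × 4 ROM. The module name, port names, and the choice to drive zeros when enable is low are assumptions made for the example.

```verilog
// Illustrative sketch: a 4 x 4 ROM whose contents implement a 2-to-4 decoder.
// Names and the disabled-output value are assumptions.
module rom_decoder_sketch (
    input  wire       enable,
    input  wire [1:0] addr,     // the decoder inputs act as the ROM address
    output wire [3:0] q         // the stored word is the decoder output
);
    reg [3:0] rom [0:3];

    // ROM contents: word i has only bit i set (one-hot decode)
    initial begin
        rom[0] = 4'b0001;
        rom[1] = 4'b0010;
        rom[2] = 4'b0100;
        rom[3] = 4'b1000;
    end

    assign q = enable ? rom[addr] : 4'b0000;
endmodule
```

Changing the stored words changes the logic function without changing the structure, which is the property PROM-based logic relies on.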
Example: 8 × 4 ROM
[Figure: internal view of the ROM]
Memory – ROM
ROM Arrays
Memory – ROM
NOR-based ROM
To read from the array, the Row line is asserted and the desired Column line is observed
A NOR-based ROM is similar to a hex keypad
Memory – ROM
NAND-based ROM
A NAND-based ROM is a different array architecture; it uses a depletion-load NMOS as the pull-up transistor
The Column NMOS transistors are connected in series with the column lines (i.e. a NAND configuration)
If an NMOS exists in the Column line and the Row line is asserted, the NMOS will pull the Column line down and represent a stored '0'
If an NMOS is absent on the Column line and the
EPROM: Erasable programmable ROM
(b) Large positive voltage at gate causes negative charges to move out of channel and get trapped in floating gate, storing a logic 0
(c) (Erase) Shining UV rays on surface of floating gate causes negative charges to return to channel from floating gate, restoring the logic 1
(d) An EPROM package showing quartz window through which UV light can pass
Can be erased and reprogrammed thousands of times
Reduced storage permanence
Program lasts about 10 years but is susceptible to radiation and electric noise
Typically used during design development
[Figure: panels (a) to (d); labels include +15V, source, drain, and 5-30 min]
Sample EPROM components
Sample EPROM programmers
EEPROM: Electrically erasable programmable ROM
Programmed and erased electronically
typically by using higher than normal voltage
can program and erase individual words
Better write ability
can be in-system programmable with built-in circuit to provide higher than normal voltage
built-in memory controller commonly used to hide details from memory user
writes are very slow due to erasing and programming
“busy” pin indicates to the processor that the EEPROM is still writing
can be erased and programmed tens of thousands of times
Similar storage permanence to EPROM (about 10 years)
Far more convenient than EPROMs, but more expensive
FLASH PROM
Extension of EEPROM
Same floating gate principle
Same write ability and storage permanence
Fast erase
Large blocks of memory erased at once, rather than one word at a time
Blocks typically several thousand bytes large
Writes to single words may be slower
Entire block must be read, word updated, then entire block written back
Used with embedded microcomputer systems storing large data items in nonvolatile memory
e.g., digital cameras, MP3 players, cell phones
Read/Write Memory
SRAM
[Figure: SRAM cell with bit, write, write_b, read, and read_b signals]
6T SRAM Cell
Cell size accounts for most of array size
Reduce cell size at expense of complexity
6T SRAM cell
Used in most commercial chips
Data stored in cross-coupled inverters
Read:
Precharge bit, bit_b
Raise wordline
Write:
Drive data onto bit, bit_b
Raise wordline
[Figure: 6T SRAM cell with bitlines bit and bit_b and wordline word]
SRAM Read
Precharge both bitlines high
Then turn on wordline
One of the two bitlines will be pulled down by the cell
Ex: A = 0, A_b = 1
bit discharges, bit_b stays high
A must not flip
Read stability: N1 >> N2
[Figure: 6T cell schematic (P1, P2, N1 to N4, nodes A and A_b) and waveforms of word, bit, bit_b, A, and A_b versus time (ps)]
SRAM Write
Drive one bitline high, the other low
Then turn on wordline
Bitlines overpower cell with new value
Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
Force A_b low, then A rises high
Writability: N2 >> P1
[Figure: 6T cell schematic (P1, P2, N1 to N4, nodes A and A_b) and waveforms of word, bit_b, and A_b versus time (ps)]
DRAM
DRAMs store their contents as charge on a capacitor rather than in a feedback loop.
The cell must be periodically read and refreshed so that its contents do not leak
away. Like SRAM, the cell is accessed by asserting the wordline to connect the capacitor to the
bitline.
DRAM READ
When the wordline rises, the capacitor shares its charge with the bitline, causing a voltage change that can be
sensed.
Some DRAMs drive the wordline to Vddp = Vdd + Vt to avoid a degraded level when writing a ‘1’.
The DRAM capacitor must be as physically small as possible to achieve good density.
According to the charge-sharing equation the voltage swing on bitline during readout
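A common textbook form of the charge-sharing relation is sketched below. It assumes the bitline is precharged to V_DD/2 and the cell stores either 0 or V_DD; the symbols C_cell (cell capacitance) and C_bit (bitline capacitance) and the numeric values are illustrative assumptions, not taken from the slides.

```latex
% Charge sharing between the cell capacitor and the bitline
% (assumes bitline precharged to V_DD/2; symbols are illustrative)
\[
\Delta V \;=\; \frac{V_{DD}}{2}\cdot\frac{C_{\text{cell}}}{C_{\text{cell}} + C_{\text{bit}}}
\]
% Hypothetical numbers: V_DD = 1.8 V, C_cell = 30 fF, C_bit = 300 fF
\[
\Delta V \;=\; 0.9\,\text{V}\cdot\frac{30}{30 + 300} \;\approx\; 0.082\,\text{V}
\]
```

Because C_bit is usually much larger than C_cell, the swing is small, which is why a sense amplifier is needed on the bitline.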
Serial Access Memories
Shift Register
[Figure: shift register with clk, Din, and Dout; labeled 8]
Serial In Parallel Out
[Figure: shift register with clk and Sin inputs and parallel outputs P0, P1, P2, P3]
Parallel In Serial Out
[Figure: shift register with parallel inputs P0, P1, P2, P3, shift/load and clk inputs, and serial output Sout]
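A parallel-in serial-out register like the one in the figure can be sketched as follows. The module and port names are hypothetical, and two details are assumptions because the figure itself did not survive: shift/load is taken as 1 = shift, 0 = load, and P3 is taken as the bit nearest Sout, so it is shifted out first.

```verilog
// Illustrative 4-bit parallel-in serial-out (PISO) shift register.
// Assumptions: shift_load = 1 shifts, 0 loads; P3 leaves first.
module piso_sketch (
    input  wire       clk,
    input  wire       shift_load,
    input  wire [3:0] p,          // parallel inputs P3..P0
    output wire       sout
);
    reg [3:0] r;

    always @(posedge clk) begin
        if (!shift_load)
            r <= p;                   // load all four bits at once
        else
            r <= {r[2:0], 1'b0};      // shift toward bit 3, fill with 0
    end

    assign sout = r[3];               // serial output
endmodule
```

After one load cycle, four shift cycles present P3, P2, P1, P0 on sout in turn.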
FIFO, LIFO Queues
Content Addressable Memories
CAMs
[Figure: CAM block with adr, data/key, read, and write inputs and a match output]
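What is CAM?
Read operation in traditional memory: an address is supplied and the stored word is returned (in the figure, 0110X)
Content Addressable Memory: a data word is supplied and the matching location is found (in the figure, the search word is 01101)
[Figure: table of stored words, 01: 0110X, 10: 011XX, 11: 10011]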
Simplified CAM Block Diagram
The input to the system is the search word.
The search word is broadcast on the search lines.
Each match line indicates whether there was a match between the search word and the stored word.
The encoder specifies the match location.
If there are multiple matches, a priority encoder selects the first match.
The hit signal flags the case in which there is no match.
The search word is long, ranging from 36 to 144 bits.
Table size ranges from a few hundred to 32K entries.
Address space: 7 to 15 bits.
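The search path described above (compare the key against every stored word, then priority-encode the match lines) can be sketched in a few lines. This is a tiny binary CAM for illustration only: the names, the valid bits, the 5-bit by 4-entry size, and the polarity of hit (asserted when some entry matches) are all assumptions, and don't-care (X) entries are not modeled.

```verilog
// Illustrative binary CAM with a priority encoder.
// Names, valid bits, sizes, and hit polarity are assumptions.
module cam_sketch #(
    parameter W = 5,    // word width
    parameter D = 4,    // number of entries
    parameter A = 2     // address bits, 2**A >= D
) (
    input  wire         clk,
    input  wire         write,
    input  wire [A-1:0] adr,        // write address
    input  wire [W-1:0] data,       // write data, or search key
    output reg          hit,        // 1 when at least one entry matches
    output reg  [A-1:0] match_adr   // lowest matching address
);
    reg [W-1:0] store [0:D-1];
    reg [D-1:0] valid;
    integer i;

    initial valid = {D{1'b0}};

    always @(posedge clk)
        if (write) begin
            store[adr] <= data;
            valid[adr] <= 1'b1;
        end

    // Compare the key with every entry in parallel. Scanning from the
    // highest index down means the lowest matching index is assigned
    // last, which gives priority to the first match.
    always @* begin
        hit = 1'b0;
        match_adr = {A{1'b0}};
        for (i = D-1; i >= 0; i = i - 1)
            if (valid[i] && store[i] == data) begin
                hit = 1'b1;
                match_adr = i[A-1:0];
            end
    end
endmodule
```

Every stored word needs its own comparator, which is one reason for the large footprint and power consumption noted for CAMs.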
Types of CAMs
CAM Advantages
They associate the input (comparand) with their memory contents in one clock cycle.
They are configurable in multiple formats of width and depth of search data, which allows searches to be conducted in parallel.
CAMs can be cascaded to increase the size of the lookup tables that they can store.
New entries can be added to their tables, so they can learn what they did not know before.
They are one of the appropriate solutions for higher speeds.
CAM Disadvantages
They cost several hundred dollars per CAM, even in large quantities.
They occupy a relatively large footprint on a card.
They consume excessive power.
Generic system engineering problems:
Interface with network processor.
Simultaneous table update and lookup requests.
Classification of PLDs
• The classification of PLDs is given below.
Simple Programmable Logic Device [SPLD]
Contd…
Contd..
• There are three main types of SPLD
architectures:
• (i) Programmable logic array (PLA),
• (ii) Programmable array logic (PAL), and
• (iii) Generic array logic (GAL)
Contd…
• Two of the most popular SPLDs are the PALs
produced by Advanced Micro Devices (AMD)
known as the 16R8 and 22V10.
• Both of these devices are industry standards
Contd..
• Another widely used and second sourced
SPLD is the Altera Classic EP610.
• This device is similar in complexity to
PALs, but it offers more flexibility in the
way that outputs are produced and has
larger AND- and OR- planes.
• In the EP610, outputs can be registered
and the flip-flops are configurable as any
of D, T, JK, or SR.
PLA- Programmable logic array
• The PLA consists of two programmable planes,
AND and OR. The AND plane consists of
programmable interconnect along with AND
gates.
• The OR plane consists of programmable
PROGRAMMABLE ARRAY LOGIC (PAL)
• An early and widely used programmable device was the
programmable array logic (PAL) developed by
Monolithic Memories Inc. (MMI).
• The PAL is similar to the PLA, but in a PAL device only
the AND array is programmable. The OR array is
fixed by the manufacturer.
• This makes PAL devices easier to program and
less expensive than PLAs. On the other hand,
since the OR array is fixed, a PAL is less flexible
than a PLA device.
Schematic Representation - PAL
Block diagram of PAL
PAL contd..
• The PAL device has n input lines which are
fed to buffers/inverters.
• Buffers/inverters are connected to inputs of
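Verilog code to design a PLA
In this example:
• The AND plane has four products P1, P2, P3, P4
• The OR plane sums these products to form the outputs F1 & F2, represented as follows.
F1 = (AB + AC)

module PLA (
    input wire A,
    input wire B,
    input wire C,
    output wire F1,
    output wire F2);
    wire P1, P2, P3, P4;
    // AND plane (these lines are missing from the slide as extracted;
    // P1 and P2 are assumed from the stated F1 = AB + AC, and P3 and P4
    // are assumed to match the PAL example)
    assign P1 = A & B;
    assign P2 = A & C;
    assign P3 = ~A & B;
    assign P4 = ~A & ~B;
    // OR plane
    assign F1 = P1 | P2;
    assign F2 = P3 | P4;
endmodule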
Verilog code to design a PAL
In this example:
• The AND plane has four products P1, P2, P3, P4
• The OR plane has fixed connections to produce the outputs F1 & F2, represented as follows.
F1 = (AB + AC')
F2 = (A'B + A'B')

module PAL (
    input wire A,
    input wire B,
    input wire C,
    output wire F1,
    output wire F2);
    wire P1, P2, P3, P4;
    // AND plane (programmable)
    assign P1 = A & B;
    assign P2 = A & ~C;
    assign P3 = ~A & B;
    assign P4 = ~A & ~B;
    // OR plane (fixed connections), following F1 and F2 above
    assign F1 = P1 | P2;
    assign F2 = P3 | P4;
endmodule
GAL-Generic Array Logic
• PAL and PLA devices are one-time
programmable (OTP) based on PROM, so the
PAL or PLA configuration cannot be changed
after it has been configured.
• This limitation means that the configured
device would have to be discarded and a new
device configured. The GAL, although similar
to the PAL architecture, uses EEPROM and
can be reconfigured.
Contd…
• The Generic Array Logic (GAL) device was
invented by Lattice Semiconductor.
• The GAL was an improvement on the PAL
because one device was able to take the place
of many PAL devices or could even have
functionality not covered by the original range.
Its primary benefit, however, is that it is
erasable and re-programmable, making
prototyping and design changes easier
for engineers.
Complex Programmable Logic Devices
(CPLDs)
• CPLDs were pioneered by Altera, first in their
family of chips called Classic EPLDs, and then
in three additional series, called MAX 5000, MAX
7000 and MAX 9000.
• The CPLD is the complex programmable logic
device, which is more complex than the SPLD.
• It is built on the SPLD architecture and supports
much larger designs. Consequently, the SPLD can be
used to integrate the functions of a number of
discrete digital ICs into a single device and the
CPLD can be used to integrate the functions of a
number of SPLDs into a single device.
Contd..
• CPLD architecture is based on a small number of
logic blocks and a global programmable
interconnect.
• Instead of relying on a programming unit to
configure the chip, it is advantageous to be able to
perform the programming while the chip is still
attached to its circuit board.
• This method of programming is called
In-System Programming (ISP). It is not usually
provided for PLAs or PALs, but it is available for
the more sophisticated chips known as complex
programmable logic devices.
Architecture of CPLD
Contd…
• The CPLD consists of a number of logic
blocks or functional blocks, each of which
contains a macrocell and either a PLA or
PAL circuit arrangement.
• In the diagram eight logic blocks are
shown. The building block of the CPLD is
the macro-cell, which contains logic
implementing disjunctive normal form
expressions and more specialized logic
operations.
Contd..
• The macro cell provides additional circuitry
to accommodate registered or nonregistered
outputs, along with signal polarity control.
• Polarity control provides an output that is a
true signal or a complement of the true
signal.
• The actual number of logic blocks within a
CPLD varies; the more logic blocks
available, the larger the design that can be
configured.
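The output stage of a macrocell, as described above, can be sketched as a flip-flop, a multiplexer that chooses the registered or non-registered path, and an XOR for polarity control. The module, port, and configuration-bit names are hypothetical, and real macrocells contain more than this (for example clock, set/reset, and feedback options).

```verilog
// Illustrative macrocell output stage: registered or combinational
// output, with polarity control. All names are assumptions.
module macrocell_sketch (
    input  wire clk,
    input  wire sum_of_products,  // result from the AND-OR array
    input  wire registered,       // config bit: 1 = registered output
    input  wire invert,           // config bit: 1 = complemented output
    output wire out
);
    reg q;
    initial q = 1'b0;

    always @(posedge clk)
        q <= sum_of_products;     // registered copy of the logic result

    // Select the registered or non-registered path
    wire sel = registered ? q : sum_of_products;

    // Polarity control: true signal or its complement
    assign out = sel ^ invert;
endmodule
```

The two configuration bits would be held in the device's programmable cells, so the same macrocell can serve as combinational logic or as a state element.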
Contd..
• In the center of the design is a global
programmable interconnect.
• This interconnect allows connections to the
logic block macrocells and the I/O cell arrays
(the digital I/O cells of the CPLD connecting
to the pins of the CPLD package).
• The programmable interconnect is usually
based on either array-based interconnect or
multiplexer-based interconnect.
CPLD Architecture
Contd..
• Multiplexer-based interconnect uses digital
multiplexers connected to each of the
macrocell inputs within the logic blocks.
• Specific signals within the programmable
Contd..
Every FPGA consists of the following
elements.
• Configurable logic blocks (CLBs)
• Configurable input/output blocks (IOBs)
• A two-layer metal network of vertical and
horizontal lines for interconnecting the
CLBs, called the programmable
interconnect.
XILINX Logic Cell Array(LCA)
• LCA is the novel architectural feature
introduced by XILINX in 1985
for its FPGA devices. It is effectively a
proprietary, trademarked architecture that
XILINX implemented for its FPGA devices.
• The XILINX LCA architecture consists
of three major components:
(i) Configurable Logic Blocks (CLBs),
(ii) Input/Output Blocks (IOBs), and
(iii) Programmable Interconnect.
Contd…
• In addition, configuration memory is used to
hold the configuration program bits which
control the configuration of CLBs, IOBs and
interconnect.
• This LCA architecture consists of an interior
matrix of logic blocks and a surrounding ring of
I/O interface blocks.
• Interconnect resources occupy the channels
between the rows and columns of logic blocks
and between the logic blocks and I/O blocks.
Like a microprocessor the LCA is a program
Contd..
• The functions of the LCA's configurable
logic blocks and I/O blocks and their
interconnection are controlled by a
configuration program stored in an on-chip
memory.
• The configuration program is loaded
automatically from an external memory on
power-up or on command, or is programmed
by a microprocessor as part of system
initialization.
Contd..
As shown in the diagram below, the configuration
memory consists of a distributed array of
static memory cells.
During configuration the cell is written
through the data line, and it is read through
the data line during the read-back operation.
LCA-Architecture
• The core of the LCA is a matrix of
identical Configurable Logic Blocks (CLBs). Each CLB
contains programmable combinational logic
and storage registers.
• The combinational logic section of the
CLB Block Diagram
I/O Blocks
• The periphery of the Logic Cell Array is made
up of user programmable input/output blocks
(IOBs).
• Each block can be programmed independently
to be an input, an output or a bi-directional pin
with three-state control. Inputs can be
programmed to recognize either TTL or
CMOS thresholds.
• Each IOB also includes flip-flops that can be
a) General Purpose Interconnect
• It consists of a grid of five horizontal and
five vertical metal segments located between
the rows and columns of logic and IOBs.
• Each segment is the height or width of a
logic block.
• Switching matrices join the ends of these
segments and allow programmed
interconnections between the metal grid
segments of adjoining rows and columns.
Contd..
Contd...
• The switches of an un-programmed device are
all non-conducting.
• The connections through the switch matrix
may be established by the automatic routing or
by selecting the desired pairs of matrix pins to
be connected or disconnected.
• The interconnect buffers are available to
propagate signals in either direction on a given
general interconnect segment.
• These bidirectional (bidi) buffers are found
adjacent to the switching matrices, above and
to the right.
b) Direct Interconnect
• Direct interconnect provides the most
efficient implementation of networks between
adjacent CLBs or I/O Blocks. Signals routed
from block to block using the direct
interconnect exhibit minimum interconnect
propagation delay and use no general interconnect
resources.
Contd..
Contd…
• Direct interconnect should be used to
maximize the speed of high-performance
portions of logic.
• Where logic blocks are adjacent to IOBs,
direct connect is provided alternately to the
IOB inputs (I) and outputs (O) on all four
edges of the die.
• The right edge provides additional direct
connects from CLB outputs to adjacent IOBs.
c) Long lines
• The Long lines bypass the switch matrices
and are intended primarily for signals that
must travel a long distance, or must have
minimum skew among multiple destinations.
• Long lines run vertically and horizontally the
height or width of the interconnect area.
• Each interconnection column has four
vertical Long lines, and each interconnection
row has two horizontal Long lines.
Contd..
Contd…
• Two additional Long lines are located adjacent
to the outer sets of switching matrices.
• Long lines can be driven by a logic block or IOB
output on a column-by-column basis.
• This capability provides a common low skew
control or clock line within each column of
logic blocks.
• Isolation buffers are provided at each input to a
Contd..
• There are two primary uses for the SRAM cells.
Most of them are used to set the select lines to
multiplexers that steer interconnect signals.
• The majority of the remaining SRAM cells are
used to store the data in the lookup-tables (LUTs)
that are typically used in SRAM-based FPGAs to
implement logic functions.
• Historically, SRAM cells were used to control the
tri-state buffers and simple pass transistors that
were also used for programmable interconnect.
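The two main uses of SRAM configuration cells described above can be sketched in Verilog. In both modules the configuration bits are shown as plain inputs for illustration; in a real device they would come from on-chip SRAM cells written at configuration time. The module names, port names, and sizes (a 4-input LUT and a 4-to-1 routing multiplexer) are assumptions.

```verilog
// Illustrative 4-input LUT: 16 SRAM bits hold the truth table and the
// logic inputs select one of them. Names and sizes are assumptions.
module lut4_sketch (
    input  wire [15:0] cfg,   // 16 configuration bits (truth table)
    input  wire [3:0]  in,    // logic inputs used as the index
    output wire        out
);
    assign out = cfg[in];
endmodule

// Illustrative routing multiplexer: SRAM bits drive the select lines
// and steer one of four interconnect tracks to the output.
module routing_mux_sketch (
    input  wire [1:0] sel_cfg,  // configuration bits on the select lines
    input  wire [3:0] tracks,   // candidate interconnect signals
    output wire       y
);
    assign y = tracks[sel_cfg];
endmodule
```

For example, loading cfg with 16'h8000 makes lut4_sketch behave as a 4-input AND gate, since only the entry for in = 4'b1111 is 1.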
• SRAM-based programming technology has
become the dominant approach for FPGAs
because of its re-programmability and its use
of standard CMOS process technology, which
leads to the increased integration,
higher speed and lower dynamic power
consumption of new processes with smaller
geometries.
Contd..
• There are, however, a number of drawbacks
associated with SRAM-based
programming technology.
• For example, an SRAM cell requires 6 transistors,
which makes this technology costly in terms of
area compared to other programming
technologies.
• Further, SRAM cells are volatile in nature and
external devices are required to permanently store
the configuration data.
• These external devices add to the cost and area
overhead of SRAM-based FPGAs.
ii) Flash Programming Technology
• An important alternative to SRAM-based
programming technology is the use of flash or
EEPROM based programming technology. This
technology injects charge onto a gate that
“floats” above the transistor.
• This approach is used in flash or EEPROM memory
cells. These cells are non-volatile; they do not lose
information when the device is powered down.
• With modern IC fabrication processes, it has become
possible to use the floating gate cells directly as
switches.
Contd..
• Flash memory cells, in particular, are now used
because of their improved area efficiency.
• The widespread use of flash memory cells for
non-volatile memory chips ensures that flash
manufacturing processes will benefit from
steady decreases in process geometries.
• Flash-based programming technology offers
several advantages. For example, this
programming technology is nonvolatile
in nature.
• Flash-based programming technology is also
more area efficient than SRAM-based
programming technology.
• Flash-based programming technology has its
own disadvantages also.
• Unlike SRAM-based programming
technology, flash based devices cannot be
reconfigured/reprogrammed an infinite
number of times.
• Also, flash-based technology uses non-
standard CMOS process.
Contd..
• This flash-based programming technology
offers several unique advantages, most
importantly non-volatility.
• This feature eliminates the need for the
Contd..
• In devices from Altera, Xilinx and Lattice,
on-chip flash memory is used to provide non-
volatile storage while SRAM cells are still
used to control the programmable elements in
the design.
• This addresses the problems associated with
• Programming an anti-fuse requires extra
circuitry to deliver the high programming
voltage and a relatively high current of 5
mA or more.
• This is done through fairly sizable pass
transistors that provide addressing to each
anti-fuse. Anti-fuse technology is used in
the FPGAs from Actel, QuickLogic,
and Crosspoint.
Contd..
• A major advantage of the anti-fuse is its
small size, little more than the cross-section
of two metal wires.
• But this advantage is limited by the large
size of the necessary programming
transistors, which handle large currents, and
the inclusion of isolation transistors that are
sometimes needed to protect low voltage
transistors from high programming voltages.
Contd..
• A second major advantage of an anti-fuse is its
relatively low series resistance.
• The on-resistance of the ONO anti-fuse is 300
Contd..
• The limitation of this technology is that it
does not make use of a standard CMOS
process.
• Also, devices based on anti-fuse programming
technology cannot be reprogrammed.
• The ideal technology should be
re-programmable and non-volatile, and should use a
standard CMOS process.
• But it is clear that none of the above
technologies satisfies these conditions.
Comparison of Programming Technologies
FPGA Design Flow
• Early PLD and FPGA designs were
performed largely by hand, but today's
complex programmable logic devices require
the use of an integrated Computer-Aided
Design (CAD) system.
• Both commercial CAD tool vendors and FPGA
companies offer appropriate tools.
• For example, traditional Electronic Design
Automation (EDA) vendors such as Cadence,
Mentor Graphics, Synopsys, and Viewlogic
offer tools to support FPGA designs.
contd..
• These tools are typically used for the front-end
design entry and simulation operations and
provide the necessary interfaces to vendor-
specific back-end tools for chip placement and
routing.
• Examples of vendor specific tools are the
Xilinx XACT system and the Altera
MAX+PLUS II software.
• Altera's MAX+PLUS II software
supports the entire design flow on either PC or
workstation platforms.
Steps of FPGA Design Flow
• The first step in the design process is the description
of the logic circuit, which can be done either by
schematic capture tool or with Boolean expressions.
• This is followed by a translation that converts the
original circuit description into a standard format
used by the suitable CAD tools (Ex: XILINX CAD
tools).
• The circuit is then passed through CAD programs that
partition it into appropriate logic blocks, select a
specific location in the FPGA for each logic block,
and form the required interconnections (Cadence,
Viewlogic, OrCAD, etc.).
Initial Design Entry
• The detailed description of the logic circuit is
entered using a schematic capture program. In the
design entry phase, RTL or schematic entry is used to
create the logic to be implemented in the device.
• Pin assignments can also be made, including pin
placement information, and timing constraints that
might be necessary for building a functioning design.
• In the design entry step, a schematic or Block Design
File (.bdf) is created as the top-level design. The
library of parameterized modules (LPM) functions
are added and Verilog HDL code is used to add a
logic block.
Contd..
• The library may be supplied either by the
vendor of the schematic capture program or
by an FPGA vendor (such as Xilinx or Altera).
• An alternative way to specify the logic circuit is
to use Boolean expressions or a state machine
language.
• This is done without the graphical interface.
Sometimes it is possible to use a mixture of
both schematic and Boolean expressions.
Translation to XNF Format
Contd..
• In the design flow, simulation is
very important to learn, and there are entire
applications devoted to simulating hardware
designs.
• There are two types of simulation, RTL and
timing. RTL (or functional) simulation allows
you to verify that your code is functionally
correct. Timing (or post-place-and-route)
simulation verifies that the design meets
timing and functions appropriately in the
device.
contd..
• After completion of the design, its
performance is checked either by downloading
the configuration bits into the FPGA or by using
an interface to a timing simulation program.
• If the performance is not satisfactory, suitable
modifications are made at some point in the
design flow.
• Once the timing and functionality are verified,
the implementation is complete.
APPLICATIONS OF FPGAs
• FPGAs have gained rapid acceptance over the
past two decades.
• Users can apply them to a wide range of
applications like random logic, integrating
multiple SPLDs, device controllers,
communication encoding and filtering,
small- to medium-size systems with SRAM
blocks, and many more.
• Another interesting FPGA application is
prototyping designs to be implemented in
gate arrays by using one or more large
FPGAs.
contd..
• Another application is the emulation of entire
large hardware systems via the use of many
interconnected FPGAs.
• FPGAs offer particularly powerful solutions
for meeting machine vision, industrial
networking, motor control, and video
surveillance needs.
• For example, the flexibility of FPGAs allows
designers to quickly adapt to changing image
sensor interfaces and image processing
requirements, evolve analysis capabilities to
keep pace with market requirements, and add
features and functions long after deployment.
contd..
• FPGAs are also used as custom computing
machines.
• This involves using the programmable parts to
execute software, rather than compiling the
software for execution on a regular CPU.
• FPGAs provide a unique combination of highly
parallel custom computation, relatively low
manufacturing/engineering costs, and low
power requirements.
Contd..
• FPGAs meet critical timing and performance
requirements with parallel processing and real-
time industrial application performance,
permitting greater system integration and lower
development cost.
• In areas such as Industrial Networking and
Imaging, where the protocols and standards are
shifting and changing, the programmability of
FPGAs versus fixed logic chips such as ASICs
and ASSPs allows for both faster time-to-
market and longer time-in-market.
Conclusion
• Low cost and fast manufacturing turnaround are
the secret behind the market success of
FPGAs.
• Though the large, slow programmable
switches limit the speed performance
of FPGAs, improvements in
architecture and CAD tools will overcome
these disadvantages.
• Over time FPDs will become the dominant
technology for implementing digital circuits.
Technology Mapping for FPGA
• The high functionality of FPGA logic blocks
presents new challenges for logic synthesis.
Technology mapping provides a solution
for FPGAs that use lookup tables to implement
combinational logic.
• Technology mapping is the process of
transforming a technology-independent
Boolean network into a technology-dependent
network.
• For example, a K-input lookup table (LUT) is a digital
memory that can implement any Boolean function of
K variables.
Contd..
• Technology mapping is the logic synthesis task
that is directly concerned with selecting the
circuit elements used to implement the
optimized circuit.
• Previous approaches to technology mapping
have focused on using circuit elements from a
limited set of simple gates.
• However such approaches are inappropriate
Contd..
• In library-based technology mapping, the set of
available circuit elements is represented as a library
of functions, and the construction of the optimized
circuit is divided into three subproblems:
• (i) decomposition, (ii) matching, and (iii) covering.
• The original network is first decomposed into a
canonical representation that uses limited fan-in NAND
nodes.
• This decomposition guarantees that no node in the
network is too large to be implemented by some
library element, provided the library includes NAND
gates that reach the fan-in limit.
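The decomposition step can be illustrated with a minimal sketch that rewrites one wide NAND as a network of NAND nodes with bounded fan-in. The node representation, the function names, and the choice of a fan-in limit of 2 are assumptions for illustration only; real mappers use more elaborate decompositions.

```python
from itertools import product


def nand(*children):
    # A node is ("NAND", [children]); a bare string is a primary input.
    return ("NAND", list(children))


def inverter(node):
    # Modelled here as a one-input NAND.
    return nand(node)


def wide_nand(inputs, limit=2):
    """Return a NAND-only network computing NAND(inputs), fan-in <= limit (limit >= 2)."""
    if len(inputs) <= limit:
        return nand(*inputs)
    size = -(-len(inputs) // limit)  # ceiling division: inputs per group
    groups = [inputs[i:i + size] for i in range(0, len(inputs), size)]
    # The AND of a group is the inverted NAND of that group.
    terms = [g[0] if len(g) == 1 else inverter(wide_nand(g, limit)) for g in groups]
    return nand(*terms)


def evaluate(node, values):
    if isinstance(node, str):
        return values[node]
    return int(not all(evaluate(child, values) for child in node[1]))


def max_fan_in(node):
    if isinstance(node, str):
        return 0
    return max([len(node[1])] + [max_fan_in(child) for child in node[1]])


names = ["a", "b", "c", "d", "e"]
network = wide_nand(names, limit=2)
print(max_fan_in(network))
```

After this step every node fits within the assumed fan-in limit, so matching and covering never meet a node that is too large for the library.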
contd..
• After decomposition, the network is
partitioned into a forest of trees. The optimal
subcircuit covering each tree is constructed,
and finally the circuit covering the entire
network is assembled from these subcircuits.
• To form the forest of trees, the decomposed
Contd..
• The major obstacle to applying library-based
technology mapping to LUT circuits is the
large number of different functions that a K-
input LUT can implement.
• The function implemented by a K-input LUT
is determined by the values stored in its 2^K
memory bits. Since each bit can
independently be either 0 or 1, there are 2^(2^K)
different Boolean functions of K variables.
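A few lines of Python show how quickly these numbers grow. The helper names are chosen here for illustration.

```python
def lut_memory_bits(k):
    # A K-input LUT stores one bit per input pattern.
    return 2 ** k


def lut_function_count(k):
    # Each memory bit is independently 0 or 1.
    return 2 ** lut_memory_bits(k)


for k in range(2, 6):
    print(k, lut_memory_bits(k), lut_function_count(k))
```

Already at K = 4 there are 65,536 distinct functions, which is why a library listing every LUT function is impractical.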
Contd..
• For values of K greater than 3, the library required to
represent a K-input LUT becomes very large.
• The size of the library can be reduced by noting that
some patterns are equivalent after a permutation of
inputs.
• The inversion of outputs or inputs, which is trivially
accomplished with a LUT, can also produce
equivalent patterns.
• Another alternative is to use a partial library tuned to
take advantage of the network structure likely to be
produced by technology-independent logic
optimization.
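The reduction obtained from these equivalences can be checked by brute force for small K. The sketch below counts distinct patterns when functions are grouped by input permutation only, and when input and output inversion are allowed as well. The function names are assumptions made for this example, and the approach is practical only for small K.

```python
from itertools import permutations, product


def transform(table, k, perm, in_neg, out_neg):
    """Truth table of g(x) = f(permuted and optionally inverted x) xor out_neg."""
    result = []
    for inputs in product((0, 1), repeat=k):
        moved = tuple(inputs[perm[i]] ^ in_neg[i] for i in range(k))
        address = int("".join(map(str, moved)), 2)
        result.append(table[address] ^ out_neg)
    return tuple(result)


def count_classes(k, allow_inversion):
    perms = list(permutations(range(k)))
    in_negs = list(product((0, 1), repeat=k)) if allow_inversion else [(0,) * k]
    out_negs = (0, 1) if allow_inversion else (0,)
    seen, classes = set(), 0
    for table in product((0, 1), repeat=2 ** k):
        if table in seen:
            continue
        classes += 1  # first member of a new equivalence class
        for perm in perms:
            for in_neg in in_negs:
                for out_neg in out_negs:
                    seen.add(transform(table, k, perm, in_neg, out_neg))
    return classes


print(count_classes(3, allow_inversion=False))
print(count_classes(3, allow_inversion=True))
```

For K = 3 the 256 functions fall into far fewer classes, so a library only needs one representative pattern per class.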
LUT-based Technology Mapping
• The limitations of earlier technology mapping
approaches paved the way for the development
of technology mapping that deals specifically
with LUT circuits.
• The first LUT-based technology mappers
appeared in the 1990s and were later improved to
optimize the delay of LUT circuits
by minimizing the number of LUT levels in
the final circuit.
Contd..
• In LUT-based FPGAs (for example, Xilinx
FPGAs) the building blocks are LUTs and
flip-flops.
• In a LUT-based FPGA chip the basic
programmable logic block is a K-input lookup
table (K-LUT), which can implement any
Boolean function of up to K variables.
• Technology mapping for LUT-based FPGA
designs covers a general Boolean network
with K-LUTs to obtain a functionally equivalent
K-LUT network.
Contd..
• The main objectives in LUT mapping are:
(i) Cost-optimal mapping, i.e. minimizing the
number of LUTs and the number of CLBs.
(ii) Delay-optimal mapping, i.e. minimizing the
number of LUT levels and the delays
(including routing delays).
(iii) Maximizing the routability of the mapping
schemes.
• LUT-based technology mapping can be implemented
using two types of algorithms. They are:
(a) The area algorithm
(b) The delay algorithm
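The area and delay objectives can be made concrete by measuring a mapped network. The sketch below is not a mapping algorithm; it only evaluates two hand-made 3-LUT mappings of a 7-input AND. The dictionary format and all signal names are assumptions for illustration.

```python
def lut_count(network):
    # Area objective: one entry per LUT.
    return len(network)


def lut_levels(network, outputs):
    """Delay objective: the largest number of LUTs on any input-to-output path.

    network maps each LUT name to the list of its fan-in signals; any signal
    that is not a key is treated as a primary input (level 0).
    """
    cache = {}

    def level(signal):
        if signal not in network:
            return 0
        if signal not in cache:
            cache[signal] = 1 + max(level(s) for s in network[signal])
        return cache[signal]

    return max(level(o) for o in outputs)


# Two hypothetical mappings of a 7-input AND onto 3-input LUTs.
chain = {"L1": ["a", "b", "c"], "L2": ["L1", "d", "e"], "L3": ["L2", "f", "g"]}
balanced = {"L1": ["a", "b", "c"], "L2": ["d", "e", "f"], "L3": ["L1", "L2", "g"]}

print(lut_count(chain), lut_levels(chain, ["L3"]))
print(lut_count(balanced), lut_levels(balanced, ["L3"]))
```

Both mappings use the same number of LUTs, but the balanced one has fewer levels, which is the kind of difference a delay-oriented algorithm looks for.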
MULTIPLEXER BASED TECHNOLOGY
MAPPING
• Multiplexer-based technology mapping is
used for Actel FPGAs, because their logic block
architectures are MUX based.
• It is also relevant to recent Xilinx Virtex-6
devices, whose LUT-based logic blocks include
dedicated multiplexers.
• In Actel FPGAs, the multiplexers are small,
which suits the objectives of area optimization
and minimum delay.
Contd..
• Circuits usually contain a large number of
multiplexers (MUXes).
• This is especially true for circuits that are automatically
synthesized from high-level descriptions.
• MUXes exist in the data paths of circuits, where they
are used to route operands to operators. Also, the
control logic is frequently specified as a CASE
statement in HDL descriptions.
• MUXes arise as a result of a direct translation of
CASE statements in HDLs into a logic-level
description.
Contd..
• The main objective of MUX-based technology
mapping is to describe a combinational circuit in
terms of Boolean equations and realize it using the
minimum number of basic blocks of the target
MUX-based architecture, while minimizing the delay
on the critical path.
• In this algorithm, an appropriate base function,
a library of cells, and a set of pattern graphs
are selected.
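One simple way to see how a Boolean equation becomes a network of multiplexers is Shannon expansion with a fixed variable order. The sketch below is an assumption-laden illustration, not the library and pattern-graph algorithm described above, and it does not guarantee a minimum number of multiplexers. All names are hypothetical.

```python
from itertools import product


def mux(select, when0, when1):
    return ("MUX", select, when0, when1)


def map_to_muxes(func, variables):
    """Shannon-expand func over variables into a tree of 2:1 multiplexers."""
    def expand(assigned, remaining):
        if not remaining:
            return int(bool(func(**assigned)))
        var, rest = remaining[0], remaining[1:]
        low = expand({**assigned, var: 0}, rest)
        high = expand({**assigned, var: 1}, rest)
        if low == high:            # no dependence on var in this branch
            return low
        if (low, high) == (0, 1):  # the two cofactors reduce to var itself
            return var
        return mux(var, low, high)
    return expand({}, list(variables))


def evaluate(node, values):
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return values[node]
    _, select, when0, when1 = node
    return evaluate(when1 if values[select] else when0, values)


def count_muxes(node):
    if isinstance(node, tuple):
        return 1 + count_muxes(node[2]) + count_muxes(node[3])
    return 0


def f(a, b, c):
    return (a and b) or c


tree = map_to_muxes(f, ["a", "b", "c"])
print(tree)
print(count_muxes(tree))
```

The result depends on the variable order, which is one reason a real mapper searches over alternatives instead of expanding in a fixed order.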
Contd..
• The advantage of MUX-based technology
mapping is that it generates optimal mappings,
which are often much better than those
produced by conventional heuristic techniques.
• Moderately large circuits can be mapped
optimally in a small amount of time. Very large
circuits can be mapped near-optimally by
partitioning the circuits and mapping each
partition individually.
XILINX XC3000 FPGA Device
• Xilinx introduced the first FPGA family, called
the XC2000 series, in 1985 and subsequently offered
three more series of FPGAs, namely XC3000,
XC4000, and XC5000.
• The XC3000 series of FPGA devices was
introduced by Xilinx Inc. in 1987.
• This was the most successful family of
FPGAs. The XC3000 architecture includes
enhancements to the XC2000
architecture to improve performance, density,
and usability.
Contd..
• The XC3000 family covers a range of nominal
device densities from 2,000 to 9,000 gates
(practically achievable densities from 1,000 to
6,000 gates), with up to 144 user-definable I/Os.
• The XC3000 configurable logic block (CLB) is
substantially larger than that of the XC2000. Each of
its lookup tables has four inputs and requires
16 bits of configuration memory.
• There are now four distinct families within the
XC3000 series of FPGA devices.
XC3000 Family of Devices
Contd..
• Each IOB includes input and output storage
elements and I/O options selected by
configuration memory cells.
• A choice of two clocks is available on each
• The XC3000 CLB has two flip-flops, to ensure
that all combinational logic can be followed by
a pipelining flip-flop.
• The register-rich CLB allows the XC3000 to
implement state-intensive applications and
heavily pipelined designs efficiently.
• Each CLB has a combinatorial logic section,
two flip-flops, and an internal control section.
The CLB has five logic inputs (A, B, C, D and
E).
XC3000 CLB
Contd..
• Data input for the flip-flops within a CLB is
supplied from the function F or G outputs of
the combinatorial logic, or from the block input, DI.
• Both flip-flops in each CLB share the
asynchronous reset, RD, which, when enabled, is
dominant over clocked inputs.
• All flip-flops are reset by the active-Low chip
input, RESET, or during the configuration
process.
Programmable Interconnect
• Programmable-interconnection resources in the Field
Programmable Gate Array provide routing paths to
connect inputs and outputs of the IOBs and CLBs
into logic networks.
• Interconnections between blocks are composed of a
two-layer grid of metal segments.
• Specially designed pass transistors, each controlled
by a configuration bit, form programmable
interconnect points (PIPs) and switching matrices
used to implement the necessary connections between
selected metal segments and block pins.
Contd..
• The XC3000 interconnect structure has five
general interconnect lines both vertically and
horizontally.
• In addition, each CLB has direct connections to
adjacent CLBs both vertically and horizontally.
• Three types of metal resources are provided to
accommodate various network interconnect
requirements:
• General-purpose interconnect
• Direct connection
• Long lines (multiplexed busses and wide AND
gates)
XC3000 Interconnect
XILINX XC4000 FPGA Device
• The XC4000 was designed to improve
performance and gate density for large
designs.
• Several dedicated features were added to the
general-purpose logic features of the XC3000,
resulting in a combination of special-purpose
and general-purpose functions.
• The XC4000 family was designed using
placement and routing tools to evaluate
architectural decisions.
The basic building blocks in the XC4000 family
• Lookup tables are used for the implementation of logic
functions.
• A designer can use a function generator to
implement any Boolean function of a given
number of inputs by pre-loading the memory with
the bit pattern corresponding to the truth table of
the function.
• All functions of a function generator have the same
timing: the time to look up the result in the memory.
• Therefore, the inputs to the function generator are
fully interchangeable by simple rearrangement of
the bits in the lookup table.
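The interchangeability of inputs can be sketched as a rearrangement of the stored bits. In the example below, `perm[i]` names the logical signal wired to physical pin `i`. The function names and the example function are assumptions for illustration.

```python
from itertools import product


def address(values):
    # The first value is the most significant address bit.
    result = 0
    for v in values:
        result = (result << 1) | v
    return result


def truth_table(func, k):
    return [int(bool(func(*inputs))) for inputs in product((0, 1), repeat=k)]


def permute_inputs(bits, k, perm):
    """Rearranged LUT bits for the same function after the pins are swapped.

    Physical pin i now carries the logical signal that was input perm[i].
    """
    new_bits = [0] * (2 ** k)
    for pins in product((0, 1), repeat=k):
        logical = [0] * k
        for pin, signal in enumerate(perm):
            logical[signal] = pins[pin]
        new_bits[address(pins)] = bits[address(logical)]
    return new_bits


def f(a, b, c):
    return (a and not b) or c


original = truth_table(f, 3)
perm = [2, 0, 1]  # pin 0 carries c, pin 1 carries a, pin 2 carries b
rewired = permute_inputs(original, 3, perm)
print(original)
print(rewired)
```

Because only the memory contents change, a router is free to use whichever physical pin is easiest to reach, with no timing penalty.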
Contd..
• A Programmable Interconnect Point (PIP) is a
pass transistor controlled by a memory cell.
The PIP is the basic unit of the configurable
interconnect mechanism.
• The wire segments on each side of the
transistor are connected or not, depending on the
value in the memory cell.
• The pass transistor introduces resistance into
the interconnect path, and hence adds delay.
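A PIP can be modelled as a switch between two named wire segments, closed only when its memory cell holds 1. The sketch below finds how many conducting PIPs a signal crosses between two segments. The class, the function, and the segment names are hypothetical and do not describe any vendor data structure.

```python
from collections import deque


class PIP:
    """A pass transistor between two wire segments, gated by one memory cell."""

    def __init__(self, segment_a, segment_b, memory_bit=0):
        self.segment_a = segment_a
        self.segment_b = segment_b
        self.memory_bit = memory_bit


def pips_traversed(pips, start, goal):
    """Fewest conducting PIPs between two segments, or None if unconnected."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        segment = queue.popleft()
        if segment == goal:
            return distance[segment]
        for pip in pips:
            if not pip.memory_bit:
                continue  # transistor is off: the segments stay isolated
            if pip.segment_a == segment:
                neighbour = pip.segment_b
            elif pip.segment_b == segment:
                neighbour = pip.segment_a
            else:
                continue
            if neighbour not in distance:
                distance[neighbour] = distance[segment] + 1
                queue.append(neighbour)
    return None


pips = [
    PIP("clb1_out", "h1", 1),
    PIP("h1", "v1", 1),
    PIP("v1", "clb2_in", 0),
]
print(pips_traversed(pips, "clb1_out", "clb2_in"))
pips[2].memory_bit = 1  # programming the last memory cell completes the route
print(pips_traversed(pips, "clb1_out", "clb2_in"))
```

Each PIP on the path adds series resistance, so the count returned here is a rough indicator of routing delay.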
Advanced Features of the XC4000 FPGAs
• CLBs can be used as on-chip RAM
• Fast carry chain for high-speed implementation
of arithmetic
• Boundary scan compatibility (JTAG)
• Wide decode logic
• More global clocks
• Faster placement and routing algorithms
• Scaled routing resources
Configurable Logic Block (CLB)
• The XC4000 CLB is similar to the
XC3000 CLB. It contains three lookup tables
(F, G, and H) and two flip-flops.
• The two primary lookup tables, F and G,
can each implement any function of four variables.
• These two results can be brought out of the
block independently, or they can be combined
with another input in the H lookup table to
make any function of five inputs or some
functions of up to nine inputs.
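The way F, G, and H combine can be sketched with Shannon expansion on the fifth input: F holds the function with that input at 0, G holds it with that input at 1, and H selects between them. A 9-input AND shows the "some functions of up to nine inputs" case. The helper names are assumptions, and the model ignores the CLB's actual pin naming.

```python
from itertools import product


def make_lut(func, k):
    bits = [int(bool(func(*inputs))) for inputs in product((0, 1), repeat=k)]

    def lookup(*inputs):
        address = 0
        for v in inputs:
            address = (address << 1) | v
        return bits[address]

    return lookup


def map_five_input_function(func):
    """Split func(a, b, c, d, e) across F, G and H by expanding on e."""
    f_lut = make_lut(lambda a, b, c, d: func(a, b, c, d, 0), 4)
    g_lut = make_lut(lambda a, b, c, d: func(a, b, c, d, 1), 4)
    h_lut = make_lut(lambda f, g, e: g if e else f, 3)

    def clb(a, b, c, d, e):
        return h_lut(f_lut(a, b, c, d), g_lut(a, b, c, d), e)

    return clb


def map_nine_input_and():
    and4 = make_lut(lambda a, b, c, d: a and b and c and d, 4)
    and3 = make_lut(lambda f, g, h1: f and g and h1, 3)

    def clb(*x):
        # F and G see different inputs here, so only some 9-input functions fit.
        return and3(and4(*x[0:4]), and4(*x[4:8]), x[8])

    return clb


def parity5(a, b, c, d, e):
    return a ^ b ^ c ^ d ^ e


parity_clb = map_five_input_function(parity5)
and9_clb = map_nine_input_and()
print(parity_clb(1, 0, 1, 1, 0))
```

Any five-input function fits because F and G share the same four inputs. With nine distinct inputs the structure is fixed, so only functions that decompose this way can be mapped.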
Contd..
• The XC3000 can implement arithmetic with
the sum in one lookup table and the carry in another
lookup table.
• The XC4000 CLB can also implement arithmetic in
this way, but because the speed of an arithmetic
operation is dominated by the speed of the
carry chain, the XC4000 CLB includes
dedicated high-speed carry logic.
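The sum-in-one-LUT, carry-in-another arrangement can be sketched as a ripple-carry adder in which each bit position uses two three-input functions. The function names are illustrative assumptions. The loop shows why the carry path dominates: each stage must wait for the carry from the previous one.

```python
def sum_lut(a, b, carry_in):
    # Contents of the "sum" lookup table: three-input XOR.
    return a ^ b ^ carry_in


def carry_lut(a, b, carry_in):
    # Contents of the "carry" lookup table: majority of the three inputs.
    return (a & b) | (carry_in & (a ^ b))


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]  # least significant bit first


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_add(x_bits, y_bits):
    carry = 0
    result = []
    for a, b in zip(x_bits, y_bits):
        result.append(sum_lut(a, b, carry))
        carry = carry_lut(a, b, carry)  # feeds the next bit position
    return result, carry


sum_bits, carry_out = ripple_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sum_bits), carry_out)
```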
Block Diagram - CLB
XC4000 I/O BLOCK
• Signals to be output from the chip can be
registered before output and enabled by a
separate control signal.
• Outputs can optionally be pulled up or down,
and the output driver can be configured with
either a fast or a slow slew rate.
• Inputs from the pad can be brought into the
interior of the chip directly, registered, or both,
to facilitate multiplexed bus interfaces.
Contd..
• The XC4000 IOB includes boundary scan logic
compatible with the ANSI/IEEE 1149.1 (JTAG)
boundary scan standard.
• The boundary scan can check internal logic or
external logic.
• Scan operations can take place before and after
the FPGA is programmed and do not interfere
with the operation of the part.
Interconnect Structure
• The XC4000 interconnect is arranged in horizontal
and vertical channels.
• Each channel contains some number of short wire
segments that span a single CLB (the number of
segments in each channel depends on the specific part
number), longer segments that span two CLBs, and
very long segments that span the entire length or
width of the chip.
• Programmable switches are available to connect the
inputs and outputs of the CLBs to the wire segments,
or to connect one wire segment to another.
Contd..
The figure below shows only the wire
segments in a horizontal channel, and does
not show the vertical routing channels, the
CLB inputs and outputs, or the routing
switches.
Contd..
• The salient feature about the Xilinx
interconnect is that signals must pass through
switches to reach one CLB from another, and
the total number of switches traversed depends
on the particular set of wire segments used.
• Thus, speed-performance of an implemented
circuit depends in part on how the wire
segments are allocated to individual signals by
CAD tools.
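The dependence of speed on the number of switches can be estimated with the Elmore delay of an RC ladder, a common first-order model in which each switch contributes a series resistance and each wire segment a capacitance to ground. The model choice and the unit values are assumptions for illustration. They are not Xilinx data.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder.

    Stage i has series resistance resistances[i] followed by
    capacitance capacitances[i] to ground.
    """
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_resistance += r
        delay += upstream_resistance * c
    return delay


def switch_path_delay(switch_count, r_switch, c_segment):
    # Identical switches and segments: delay grows as n * (n + 1) / 2.
    return elmore_delay([r_switch] * switch_count, [c_segment] * switch_count)


# Normalized units (R = 1, C = 1) so only the growth trend matters.
for n in (1, 2, 4, 8):
    print(n, switch_path_delay(n, 1.0, 1.0))
```

Under this model, doubling the number of switches on a path more than doubles the delay, which is why the allocation of wire segments by the CAD tools matters for speed.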
Actel FPGAs
• In contrast to Xilinx FPGAs, the
devices manufactured by Actel are based on
anti-fuse technology.
• Actel offers three main families. They are:
Contd..
• The logic blocks in the Actel devices are
relatively small in comparison to the LUT-based
ones, and are based on multiplexers.
• Each logic block comprises an AND gate and an
OR gate that are connected to a multiplexer-based
circuit block.
• The multiplexer circuit is arranged such that,
in combination with the two logic gates, a very
wide range of functions can be realized in a
single logic block.
Contd..
• Actel's interconnect is organized in horizontal
routing channels.
• The channels consist of wire segments of various
lengths with anti-fuses to connect logic blocks to wire
segments or one wire to another.
• Also, Actel chips have vertical wires that overlay the
logic blocks, for signal paths that span multiple rows.
• In terms of speed-performance, it is evident that
Actel chips are not fully predictable, because the
number of anti-fuses traversed by a signal depends on
how the wire segments are allocated during circuit
implementation by CAD tools.
QuickLogic pASIC FPGAs
• QuickLogic is the main competitor of Actel in
anti-fuse-based FPGAs.
• It produces two families of devices, called pASIC
and pASIC-2. The pASIC-2 is an enhanced version
of pASIC.
• The pASIC consists of a regular two-dimensional
array of blocks called pASIC Logic Blocks (pLBs).
• The logic capacity of the first generation of QuickLogic
FPGAs is between 48 and 380 pLBs, or 500 to
4,000 equivalent MPGA gates.
Contd..
As shown in the figure below, pASIC has similarities to
other FPGAs: the overall structure is array-based
like Xilinx FPGAs, the logic blocks use multiplexers
similar to Actel FPGAs, and the interconnect consists
of only long lines, as in the Altera FLEX 8000.
Contd..
• pASIC's multiplexer-based logic block is shown in the figure
below. It is more complex than Actel's Logic Module,
with more inputs and wide (6-input) AND gates on the
multiplexer select lines. Every logic block also contains a
flip-flop.
Altera FLEX 8000 and FLEX 10000 FPGAs
• The first FPGA chips from Altera were simple
arrays of logic cells, which are relatively simple
logic elements (LEs), each comprising
a three-input lookup table (LUT) to generate
logic functions, a single configurable flip-flop,
and multiplexers for routing the signals and
selecting clocks.
• The logic cells were connected by switch boxes
instead of fixed interconnect. The general
architecture of Altera's FPGAs is shown in the
next slide.
Architecture of ALTERA FPGA
• Altera offers two high-performance FPGA series,
both called FLEX.
• Altera's FLEX 8000 series consists of a
three-level hierarchy similar to that of CPLDs.
• However, the lowest level of the hierarchy
consists of a set of lookup tables, rather than
an SPLD-like block, and so the FLEX 8000 is
categorized here as an FPGA.
Contd..
• The architecture of the FLEX 8000 is shown in
the next slide.
• The basic logic block, called a Logic Element
(LE), contains a four-input LUT, a flip-flop,
and special-purpose carry circuitry for
arithmetic circuits (similar to the Xilinx XC4000).
• The LE also includes cascade circuitry that
allows for efficient implementation of wide
AND functions.
Architecture of Altera FLEX 8000 FPGA
contd..
• A major difference between the FLEX 8000 and
Xilinx chips is that the FastTrack interconnect consists
of only long lines. This makes the FLEX 8000 easy for
CAD tools to automatically configure.
• All Fast-Track wires horizontal wires are
Structure of the Concurrent Logic Block
Crosspoint Solutions FPGAs
• Crosspoint FPGAs differ from other
FPGAs because they are configurable at the
transistor level, as opposed to the logic block level
in other FPGAs.
• Basically, the architecture consists of rows of
transistor pairs, where the rows are separated
by horizontal wiring segments.
• Vertical wiring segments are also available
for connections among the rows.
Contd..
• Each transistor row comprises two lines of
series-connected transistors, with one line
being NMOS and the other PMOS.
• The wiring resources allow individual
transistor pairs to be interconnected to
implement CMOS logic gates.
• The programming technology used for the
programmable switches is similar to the ViaLink
anti-fuse, which is based on amorphous
silicon.
Contd..
• The structure of the transistor pair rows is
shown in the next slide.
• The diagram shows the implementation
of a NOR gate and a NAND gate using
the transistor lines.
• The transistor gates, drains, and sources can
be programmably interconnected to other
transistors and also to power and ground.
Structure of the Transistor Pair
The series connections along the lines are broken where
necessary by permanently holding a transistor in
its OFF state. A wide range of logic gates can be
implemented by the transistor lines and the
interconnection patterns.
contd..
• The FPGAs currently offered by Crosspoint
Solutions have a total logic capacity of 4200
gates.
• The chip has 256 rows of transistor pairs, and
an additional 64 rows of multiplexer-like
structures are provided.
• With their row-based architecture, anti-fuse
programming technology, and multiplexers, the
Crosspoint FPGAs are most similar to
Actel FPGAs.
ALGOTRONIX CAL-1024
• This design has a two-dimensional mesh array
structure which resembles the gate array “sea
of gates” architecture.
• Like the Xilinx architecture, Algotronix uses
static RAM programming technology to
specify the function performed by each logic
cell and to control the switching of
connections between cells.
• The CAL1024 design contains 1024 identical
logic cells arranged in a 32 x 32 matrix.
contd..
• The design is considered to be a mesh-connected
architecture since each cell is directly connected
to its nearest north, south, east, and west
neighbors.
• In addition to these direct connects, two global
interconnect signals are routed to each cell to
distribute clock and other “low skew
requirement” control signals.
• The figure in the next slide shows the basic array
architecture, indicating both nearest-neighbor and
global connections to the logic cells.
Basic Array Architecture
contd..
• The basic building block of the Algotronix design
is a configurable cell containing multiplexers and
a function unit.
• As indicated in the figure, the function unit is
preceded by multiplexers which select the source
for the X1 and X2 inputs.
• The function unit is capable of generating any
logic function of the two inputs, or of operating as
a D-type latch.
• There are four additional multiplexers which
select the function output or one of the external
inputs for routing to each of the four outputs
(north, south, east, and west).
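The cell described above can be sketched as two input multiplexers, a two-input function unit, and four output multiplexers. This is a simplified model under stated assumptions: the configuration format, the class name `Cell`, and the rule that each output may select any side's input are hypothetical, and the D-type latch mode is left out.

```python
SIDES = ("north", "south", "east", "west")


class Cell:
    """One configurable cell of a mesh-connected array (simplified)."""

    def __init__(self, x1_source, x2_source, function_bits, routes):
        self.x1_source = x1_source          # side feeding the X1 input
        self.x2_source = x2_source          # side feeding the X2 input
        self.function_bits = function_bits  # 4-entry truth table, index (x1 << 1) | x2
        self.routes = routes                # per output side: "function" or an input side

    def evaluate(self, inputs):
        x1 = inputs[self.x1_source]
        x2 = inputs[self.x2_source]
        result = self.function_bits[(x1 << 1) | x2]
        outputs = {}
        for side in SIDES:
            source = self.routes[side]
            outputs[side] = result if source == "function" else inputs[source]
        return outputs


XOR = [0, 1, 1, 0]
cell = Cell(
    x1_source="west",
    x2_source="north",
    function_bits=XOR,
    routes={"north": "south", "south": "north", "east": "function", "west": "east"},
)
out = cell.evaluate({"north": 1, "south": 0, "east": 0, "west": 0})
print(out)
```

In this example the cell sends the XOR of its west and north inputs to the east, while the other three outputs simply pass neighbouring signals through, so the same cell acts as both logic and routing.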