UNIT5 - PLDS, CPLDs & FPGA

UNIT 5

PLDs, CPLDs & FPGA ARCHITECTURES

6/27/2024
UNIT-5
Contents
• Programmable Logic Devices
• Storage Devices
• PLA & PAL
• CPLDs
• FPGAs

Introduction
• Popular digital ICs such as TTL or CMOS parts have
fixed functionality, and the user has no option to
change or modify it, i.e. they work according to
the design given by the manufacturer.
• To get the functionality they needed from these
ICs, designers started looking for a method by
which the functionality of an IC could be
modified or changed.
• This led to the idea of using fuses in ICs,
which soon gained momentum.
Contd..
• This was the motivation for the invention of
programmable devices, realized in the early
1970s with the design of a PLD by Ron Cline of
Signetics.
• Each fuse between two contacts is either blown
or left intact, and the choice is made using
software; hence these devices were called
Programmable Logic Devices (PLDs).
Contd..
• Programmable Read-Only Memories (PROMs)
were the first programmable logic devices to
achieve widespread use in digital systems.
• A PROM allowed the user to store code
in the device using a simple and relatively
inexpensive desktop programmer.
Contd..
• The process of storing the code in the PROM
is called programming, or “burning” the
PROM. PROMs, like ROMs, retain their
contents even after power has been turned
off.
• The PROMs were initially intended for storing
code and constant data but design engineers
found them useful for implementing logic also.
• Engineers could program state machine logic
into a PROM, creating what are called
“micro-coded” state machines.
A simple PROM CELL

Contd..
• A PROM can be constructed with an array of
fuses and transistors as shown in the previous
slide. The fuses are like household fuses that
consist of a wire that breaks connection when
a large amount of current goes through it.
• To program a one-bit cell as a logic 0 or 1, the
fuse for that cell is selectively burned out or
left connected, respectively.
Contd..
• Eventually, erasable PROMs were developed
which allowed users to program, erase, and
reprogram the devices using an inexpensive
desktop programmer.
• Typically, PROMs now refer to devices that
cannot be erased after being programmed.
• Erasable PROMs include erasable
programmable read only memories (EPROMs)
that are programmed by applying high-voltage
electrical signals and erased by flooding the
devices with UV light.
Contd..
• Electrically erasable programmable read only
memories (EEPROMs) are programmed and
erased by applying high voltages to the
device.
• Flash EPROMs are programmed and erased
electrically and have sections that can be
erased electrically in a short time and
independently of other sections within the
device.
Contd..
• PROMs are excellent for implementing any kind
of combinational logic with a limited number of
inputs and outputs. Each output can be any
combinatorial function of the inputs, no matter
how complex.
• The problem with PROMs is that they tend to be
extremely slow.
• Even today, access times are of the order of
40 nanoseconds or more.
• Hence they are not useful for high-speed
applications.
Memory Arrays
  Random Access Memory
    Read/Write Memory (RAM) (Volatile)
      Static RAM (SRAM)
      Dynamic RAM (DRAM)
    Read Only Memory (ROM) (Nonvolatile)
      Mask ROM
      Programmable ROM (PROM)
      Erasable Programmable ROM (EPROM)
      Electrically Erasable Programmable ROM (EEPROM)
      Flash ROM
  Serial Access Memory
    Shift Registers
      Serial In Parallel Out (SIPO)
      Parallel In Serial Out (PISO)
    Queues
      First In First Out (FIFO)
      Last In First Out (LIFO)
  Content Addressable Memory (CAM)
Read Only Memory (ROM) CLASSIFICATION

Mask Programmed ROMs - Data is written during chip fabrication using a photo mask

Fused ROMs - Data is written by blowing fuses electrically, hence it cannot be modified later

Programmable Read Only Memories (PROMs) - Data is written after chip fabrication

Erasable PROMs - The complete block is erased using UV light, which passes through a glass window

Electrically Erasable PROMs - 8-bit data is erased at a time, hence slower

Flash - Programmed using high electrical voltage. Erases data in blocks, hence faster
Memory Architecture
Stores large number of bits
  m × n: m words of n bits each
  k = log2(m) address input signals, or m = 2^k words
  e.g., 4,096 × 8 memory:
    32,768 bits
    12 address input signals
    8 input/output data signals
Memory access
  r/w: selects read or write
  enable: read or write only when asserted
  multiport: multiple accesses to different locations simultaneously
[Figure: external view of a 2^k × n read and write memory with inputs r/w, enable and A0 … Ak-1, and data lines Qn-1 … Q0]
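The m × n organisation above can be sketched as a small behavioural Verilog model. This is only an illustration: the module name simple_ram, the parameter names, the clocked interface, the separate din/dout ports (the slide shows shared input/output data signals) and the polarity of rw are all assumptions, not something the slides specify.

```verilog
// Hypothetical sketch of an m x n read/write memory (names are assumptions).
module simple_ram #(
    parameter WORDS     = 4096,           // m: number of words
    parameter WIDTH     = 8,              // n: bits per word
    parameter ADDR_BITS = $clog2(WORDS)   // k = log2(m), 12 for 4,096 words
) (
    input  wire                 clk,
    input  wire                 enable,   // read or write only when asserted
    input  wire                 rw,       // assumed polarity: 1 = read, 0 = write
    input  wire [ADDR_BITS-1:0] addr,
    input  wire [WIDTH-1:0]     din,
    output reg  [WIDTH-1:0]     dout
);
    // Storage array: WORDS words of WIDTH bits (4,096 x 8 = 32,768 bits)
    reg [WIDTH-1:0] mem [0:WORDS-1];

    always @(posedge clk) begin
        if (enable) begin
            if (rw)
                dout <= mem[addr];        // read the addressed word
            else
                mem[addr] <= din;         // write the addressed word
        end
    end
endmodule
```

With the default parameters the model matches the 4,096 × 8 example: 12 address signals and 8 data bits per word.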
Semiconductor Memory Types (Cont.)
RAM: the stored data is volatile
DRAM
  A capacitor to store data, and a transistor to access the capacitor
  Needs refresh operation
  Low cost and high density ⇒ it is used for main memory
SRAM
  Consists of a latch
  Does not need the refresh operation
  High speed and low power consumption ⇒ it is mainly used for cache memory and memory in hand-held devices
ROM: “Read-Only” Memory
Nonvolatile
Can be read from but not written to, by a processor in a microcomputer system
Traditionally written to, “programmed”, before inserting into the microcomputer system
Uses
  Store software program for general-purpose processor
  Store constant data (parameters) needed by system
  Implement combinational circuits (e.g., decoders)
[Figure: external view of a 2^k × n ROM with inputs enable and A0 … Ak-1, and data lines Qn-1 … Q0]
Example: 8 × 4 ROM
Horizontal lines = words
Vertical lines = data
Lines connected only at circles
Decoder sets word 2’s line to 1 if address input is 010
Data lines Q3 and Q1 are set to 1 because there is a “programmed” connection with word 2’s line
Word 2 is not connected with data lines Q2 and Q0
Output is 1010
[Figure: internal view of an 8 × 4 ROM. A 3×8 decoder with enable and address inputs A0, A1, A2 drives the word lines (word 0, word 1, word 2, …); programmable connections join word lines to the data lines Q3 Q2 Q1 Q0]
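The 8 × 4 ROM example can be sketched in Verilog as a lookup on the address. Only word 2 (address 010, output 1010) is given on the slide; the contents of every other word are not stated, so the zeros below are placeholders, and the module name rom_8x4 and the behaviour when enable is low are assumptions.

```verilog
// Hypothetical sketch of the 8 x 4 ROM example; only word 2 comes from the slide.
module rom_8x4 (
    input  wire       enable,
    input  wire [2:0] addr,   // {A2, A1, A0}
    output reg  [3:0] q       // {Q3, Q2, Q1, Q0}
);
    always @(*) begin
        q = 4'b0000;                      // assumed output when not enabled
        if (enable) begin
            case (addr)
                3'b010:  q = 4'b1010;     // word 2: connections on Q3 and Q1 only
                default: q = 4'b0000;     // placeholder: other words not given
            endcase
        end
    end
endmodule
```

The case statement plays the role of the decoder plus the programmed connections: the address selects one word line, and the constant on that line lists which data lines are connected.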
Memory – ROM
ROM Arrays
There are two basic types of ROM arrays
  1) NOR-based ROM
  2) NAND-based ROM
NOR-based ROM
  All Column Lines are pulled up using a PMOS transistor (or resistor)
  The Row Lines are connected to the gates of NMOS transistors at the intersection of Row and Column Lines
  The presence or absence of the NMOS transistors dictates whether a 1 or a 0 is stored
  If the NMOS transistor is present, it will pull down the Column Line when its gate is driven high by the Row Line
  If the NMOS transistor is absent, the Column Line will not be pulled down, so it will remain pulled up by the PMOS
Memory – ROM
NOR-based ROM
  In order to Read from the array, the Row line is asserted and the desired Column line is observed
  A NOR-based ROM is similar to a Hex Keypad
Memory – ROM
NAND-based ROM
  A NAND-based ROM is a different array architecture
  It uses a depletion-load NMOS as the pull-up transistor
  The Column NMOS’s are connected in series with the column lines (i.e. a NAND configuration)
  Since the NMOS’s are in series, in order to Read from a Row, all of the other Rows must be turned ON
  This means that, in order to distinguish the Row we are asserting, we write a ‘0’ to it
  If an NMOS exists on the Column line at the addressed Row, it is turned OFF and breaks the series path, so the Column Line remains pulled high by the depletion NMOS and represents a stored ‘1’
  If an NMOS is absent on the Column line at the addressed Row, the other (turned ON) NMOS’s pull the Column Line down and represent a stored ‘0’
Memory – ROM
NAND-based ROM
- In this configuration, if an NMOS is present, it will represent a “stored 1” since in order to address its location, the Row line is driven to a ‘0’ and the NMOS is not turned on. This leaves the Column line pulled HIGH.
- If an NMOS is absent, it will represent a “stored 0” since all of the other Row Line NMOS’s are turned on and will pull the Column Line LOW
- This gives the opposite behavior as in a NOR-based ROM

                 NOR   NAND
  NMOS present    0     1
  NMOS absent     1     0

- It also gives a complementary addressing scheme

                                   NOR   NAND
  Address Row Line by driving:      1     0
  All other Row Lines driven to:    0     1
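The two tables can be checked with a small behavioural model of a single column. This is a logic-level sketch only (no transistor sizing or pull-up strength); the module name rom_column_model, the ports present and sel, and the one-hot row selection are assumptions made for illustration.

```verilog
// Hypothetical logic-level model of one ROM column in the NOR and NAND styles.
module rom_column_model #(
    parameter ROWS = 4
) (
    input  wire [ROWS-1:0] present,   // 1 where an NMOS exists on this column
    input  wire [ROWS-1:0] sel,       // one-hot: the row being addressed
    output wire            nor_col,   // value read from a NOR-based column
    output wire            nand_col   // value read from a NAND-based column
);
    // NOR: addressed row driven to 1, all others to 0. Any present NMOS whose
    // gate is high pulls the column low; otherwise it stays pulled up.
    assign nor_col = ~|(present & sel);

    // NAND: addressed row driven to 0, all others to 1. A position conducts if
    // it has no NMOS (a plain wire) or its gate is high. The column goes low
    // only when every position in the series chain conducts.
    assign nand_col = ~&(~present | ~sel);
endmodule
```

With an NMOS present at the addressed row, nor_col reads 0 and nand_col reads 1; with it absent, the values swap, which is the complementary behaviour listed in the tables.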
Mask-programmed ROM
Connections “programmed” at fabrication
  set of masks
Lowest write ability
  only once
Highest storage permanence
  bits never change unless damaged
Typically used for final design of high-volume systems
  spread out NRE (non-recurring engineering) cost for a low unit cost
EPROM: Erasable programmable ROM
Programmable component is a MOS transistor
  Transistor has “floating” gate surrounded by an insulator
  (a) Negative charges form a channel between source and drain storing a logic 1
  (b) Large positive voltage at gate causes negative charges to move out of channel and get trapped in floating gate storing a logic 0
  (c) (Erase) Shining UV rays on surface of floating-gate causes negative charges to return to channel from floating gate restoring the logic 1
  (d) An EPROM package showing quartz window through which UV light can pass
Better write ability
  can be erased and reprogrammed thousands of times
Reduced storage permanence
  program lasts about 10 years but is susceptible to radiation and electric noise
Typically used during design development
[Figure: floating-gate transistor with source and drain, shown (a) with 0V at the gate, (b) with +15V at the gate, (c) under UV erase for 5-30 min, and (d) the EPROM package]
Sample EPROM components

Sample EPROM programmers
EEPROM: Electrically erasable programmable ROM
Programmed and erased electronically
  typically by using higher than normal voltage
  can program and erase individual words
Better write ability
  can be in-system programmable with built-in circuit to provide higher than normal voltage
    built-in memory controller commonly used to hide details from memory user
  writes very slow due to erasing and programming
    “busy” pin indicates to processor EEPROM still writing
  can be erased and programmed tens of thousands of times
Similar storage permanence to EPROM (about 10 years)
Far more convenient than EPROMs, but more expensive
FLASH PROM
Extension of EEPROM
  Same floating gate principle
  Same write ability and storage permanence
Fast erase
  Large blocks of memory erased at once, rather than one word at a time
  Blocks typically several thousand bytes large
Writes to single words may be slower
  Entire block must be read, word updated, then entire block written back
Used with embedded microcomputer systems storing large data items in nonvolatile memory
  e.g., digital cameras, MP3, cell phones
Read/Write Memory
SRAM
[Figure: SRAM memory cell with bit, write, write_b, read and read_b lines]
6T SRAM Cell
Cell size accounts for most of array size
  Reduce cell size at expense of complexity
6T SRAM Cell
  Used in most commercial chips
  Data stored in cross-coupled inverters
Read:
  Precharge bit, bit_b
  Raise wordline
Write:
  Drive data onto bit, bit_b
  Raise wordline
[Figure: 6T SRAM cell between bitlines bit and bit_b, gated by the wordline word]
SRAM Read
Precharge both bitlines high
Then turn on wordline
One of the two bitlines will be pulled down by the cell
Ex: A = 0, A_b = 1
  bit discharges, bit_b stays high
  But A bumps up slightly
Read stability
  A must not flip
  N1 >> N2
[Figure: 6T cell schematic (P1, P2, N1 to N4, nodes A and A_b, bitlines bit and bit_b, wordline word) and simulated waveforms of word, bit, bit_b, A and A_b against time (ps)]
SRAM Write
Drive one bitline high, the other low
Then turn on wordline
Bitlines overpower cell with new value
Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
  Force A_b low, then A rises high
Writability
  Must overpower feedback inverter
  N2 >> P1
[Figure: 6T cell schematic (P1, P2, N1 to N4, nodes A and A_b, bitlines bit and bit_b, wordline word) and simulated waveforms of word, bit_b, A and A_b against time (ps)]
DRAM

DRAMs store their contents as charge on a capacitor rather than in a feedback loop.
The cell must be periodically read and refreshed so that its contents do not leak
away. Like an SRAM cell, it is accessed by asserting the wordline to connect the capacitor to the
bitline.
DRAM READ

On a read, the bitline is precharged to Vdd/2.

When the wordline rises, the capacitor shares its charge with the bitline, causing a voltage change that can be
sensed.
Some DRAMs drive the wordline to Vddp = Vdd + Vt to avoid a degraded level when writing a ‘1’.
The DRAM capacitor must be as physically small as possible to achieve good density.
According to charge-sharing equation the voltage swing on bitline during readout
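The slide's own equation did not survive extraction. As a hedged reconstruction, the standard charge-sharing result for a bitline precharged to Vdd/2 can be derived as follows; the symbols C_cell (cell capacitance), C_bit (bitline capacitance) and V_cell (stored cell voltage, 0 or V_DD) are names assumed here, not taken from the slide.

```latex
% Charge is conserved when the wordline connects the cell to the bitline:
C_{bit}\,\frac{V_{DD}}{2} + C_{cell}\,V_{cell} = (C_{bit} + C_{cell})\,V_{final}

% Voltage change seen on the bitline:
\Delta V = V_{final} - \frac{V_{DD}}{2}
         = \frac{C_{cell}}{C_{cell} + C_{bit}} \left( V_{cell} - \frac{V_{DD}}{2} \right)

% For a full stored level (V_{cell} = V_{DD} or V_{cell} = 0):
|\Delta V| = \frac{V_{DD}}{2} \cdot \frac{C_{cell}}{C_{cell} + C_{bit}}
```

Because C_bit is usually much larger than C_cell, the swing is small, which is why a sense amplifier is needed and why shrinking the cell capacitor trades density against signal.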
Serial Access Memories
Serial access memories do not use an address
The types of SAM are
  Shift Registers
  Serial In Parallel Out (SIPO)
  Parallel In Serial Out (PISO)
  Queues (FIFO, LIFO)
Shift Register
– Shift registers store and delay data
– Simple design: cascade of registers
– Watch your hold times!
[Figure: cascade of registers clocked by clk, with input Din and output Dout]
Serial In Parallel Out
– 1-bit shift register reads in serial data
– After N steps, presents N-bit parallel output
[Figure: shift register clocked by clk with serial input Sin and parallel outputs P0 P1 P2 P3]
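A serial-in parallel-out register of this kind can be sketched as below. The module name sipo, the parameter N (assumed to be at least 2) and the choice that the newest bit lands in P0 (following the order Sin, P0, P1, P2, P3 in the figure) are assumptions.

```verilog
// Hypothetical sketch of an N-bit serial-in parallel-out shift register.
module sipo #(
    parameter N = 4                   // assumed N >= 2
) (
    input  wire         clk,
    input  wire         sin,          // serial data input
    output reg  [N-1:0] p             // parallel output, p[0] holds the newest bit
);
    initial p = {N{1'b0}};            // start from a known state in simulation

    // Each clock, every stored bit moves one stage along and sin enters p[0].
    always @(posedge clk)
        p <= {p[N-2:0], sin};
endmodule
```

After N clock edges, all N serial bits are available together on p.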
Parallel In Serial Out
– Load all N bits in parallel when shift = 0
– Then shift one bit out per cycle
[Figure: shift register clocked by clk with parallel inputs P0 P1 P2 P3, a shift/load control and serial output Sout]
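The parallel-in serial-out behaviour can be sketched in the same way. The module name piso, the parameter N (assumed to be at least 2) and the bit order (P0 shifted out first) are assumptions; the slide only fixes that shift = 0 means load.

```verilog
// Hypothetical sketch of an N-bit parallel-in serial-out shift register.
module piso #(
    parameter N = 4                   // assumed N >= 2
) (
    input  wire         clk,
    input  wire         shift,        // 0 = load all N bits, 1 = shift
    input  wire [N-1:0] p,            // parallel data input
    output wire         sout          // serial output
);
    reg [N-1:0] r;

    always @(posedge clk) begin
        if (!shift)
            r <= p;                   // parallel load
        else
            r <= {1'b0, r[N-1:1]};    // move the next bit towards r[0]
    end

    assign sout = r[0];               // assumed order: P0 leaves first
endmodule
```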
FIFO, LIFO Queues
– First In First Out (FIFO)
  – Initialize read and write pointers to first element
  – Queue is EMPTY
  – On write, increment write pointer
  – If write almost catches read, Queue is FULL
  – On read, increment read pointer
– Last In First Out (LIFO)
  – Also called a stack
  – Use a single stack pointer for read and write
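The FIFO pointer scheme can be sketched as a small synchronous queue. Note one deliberate difference from the slide: instead of declaring FULL when the write pointer "almost catches" the read pointer, this sketch gives each pointer one extra bit so that full and empty can be told apart while using every slot. The module name sync_fifo, the port names and the power-of-two DEPTH are assumptions.

```verilog
// Hypothetical sketch of a synchronous FIFO with read and write pointers.
module sync_fifo #(
    parameter WIDTH    = 8,
    parameter DEPTH    = 8,                 // assumed to be a power of two, >= 2
    parameter PTR_BITS = $clog2(DEPTH)
) (
    input  wire             clk,
    input  wire             rst,            // synchronous reset
    input  wire             wr_en,
    input  wire             rd_en,
    input  wire [WIDTH-1:0] din,
    output reg  [WIDTH-1:0] dout,
    output wire             empty,
    output wire             full
);
    reg [WIDTH-1:0]  mem [0:DEPTH-1];
    reg [PTR_BITS:0] wr_ptr, rd_ptr;        // one extra wrap bit per pointer

    // Equal pointers: nothing stored. Same index but different wrap bit: full.
    assign empty = (wr_ptr == rd_ptr);
    assign full  = (wr_ptr[PTR_BITS] != rd_ptr[PTR_BITS]) &&
                   (wr_ptr[PTR_BITS-1:0] == rd_ptr[PTR_BITS-1:0]);

    always @(posedge clk) begin
        if (rst) begin
            wr_ptr <= 0;                    // both pointers start at the first element
            rd_ptr <= 0;
        end else begin
            if (wr_en && !full) begin
                mem[wr_ptr[PTR_BITS-1:0]] <= din;
                wr_ptr <= wr_ptr + 1'b1;    // on write, increment write pointer
            end
            if (rd_en && !empty) begin
                dout   <= mem[rd_ptr[PTR_BITS-1:0]];
                rd_ptr <= rd_ptr + 1'b1;    // on read, increment read pointer
            end
        end
    end
endmodule
```

A LIFO would replace the two pointers with a single stack pointer that increments on write and decrements on read.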
Content Addressable Memories
CAMs
– Extension of ordinary memory (e.g. SRAM)
  – Read and write memory as usual
  – Also match to see which words contain a key
[Figure: CAM block with inputs adr, data/key, read and write, and output match]
What is CAM?
Content Addressable Memory is a special kind of memory!
Read operation in traditional memory:
  Input is the address location of the content that we are interested in.
  Output is the content of that address.
In CAM it is the reverse:
  Input is associated with something stored in the memory.
  Output is the location where the associated content is stored.

Example table used for both cases (X = don't care):
  Address   Stored word
  00        1 0 1 X X
  01        0 1 1 0 X
  10        0 1 1 X X
  11        1 0 0 1 1

Traditional Memory: input address 01, output 0110X
Content Addressable Memory: input word 01101, output address 01
Simplified CAM Block Diagram
The input to the system is the search word.
The search word is broadcast on the search lines.
Match line indicates if there was a match between the search word and the stored word.
Encoder specifies the match location.
If there are multiple matches, a priority encoder selects the first match.
Hit signal specifies if there is no match.
The search word is long, ranging from 36 to 144 bits.
Table size ranges from a few hundred to 32K entries.
Address space: 7 to 15 bits.
Type of CAMs
Binary CAM (BCAM) only stores 0s and 1s
  Applications: MAC table consultation. Layer 2 security related VPN segregation.
Ternary CAM (TCAM) stores 0s, 1s and don’t cares.
  Application: when we need wildcards, such as layer 3 and 4 classification for QoS and CoS purposes. IP routing (longest prefix matching).
Available sizes: 1Mb, 2Mb, 4.7Mb, 9.4Mb, and 18.8Mb.
CAM entries are structured as multiples of 36 bits rather than 32 bits.
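A ternary CAM lookup with a priority encoder can be sketched as follows. The four entries are the ones from the example table earlier (101XX, 0110X, 011XX, 10011); the module name tcam_4x5, the value/care-mask encoding of don't cares and the active-high hit output are assumptions. A real CAM stores its entries in writable cells rather than constants.

```verilog
// Hypothetical sketch of a 4-entry, 5-bit ternary CAM search.
module tcam_4x5 (
    input  wire [4:0] key,          // search word broadcast to all entries
    output reg  [1:0] match_addr,   // location of the first matching entry
    output wire       hit           // assumed polarity: 1 = at least one match
);
    // Each entry is a value plus a care mask (care bit 0 = don't care).
    localparam [4:0] VAL0 = 5'b10100, CARE0 = 5'b11100;  // 101XX
    localparam [4:0] VAL1 = 5'b01100, CARE1 = 5'b11110;  // 0110X
    localparam [4:0] VAL2 = 5'b01100, CARE2 = 5'b11100;  // 011XX
    localparam [4:0] VAL3 = 5'b10011, CARE3 = 5'b11111;  // 10011

    // One match line per entry: all cared-for bits must agree with the key.
    wire [3:0] match;
    assign match[0] = ((key ^ VAL0) & CARE0) == 5'b00000;
    assign match[1] = ((key ^ VAL1) & CARE1) == 5'b00000;
    assign match[2] = ((key ^ VAL2) & CARE2) == 5'b00000;
    assign match[3] = ((key ^ VAL3) & CARE3) == 5'b00000;

    assign hit = |match;

    // Priority encoder: the lowest-numbered matching entry wins.
    always @(*) begin
        casez (match)
            4'b???1: match_addr = 2'd0;
            4'b??10: match_addr = 2'd1;
            4'b?100: match_addr = 2'd2;
            4'b1000: match_addr = 2'd3;
            default: match_addr = 2'd0;   // no match; hit is 0
        endcase
    end
endmodule
```

Searching for 01101 matches both entry 01 (0110X) and entry 10 (011XX); the priority encoder reports 01, the first match. All four comparisons happen in parallel, which is what lets a CAM answer in one cycle.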
CAM Advantages
They associate the input (comparand) with their memory contents in one clock cycle.
They are configurable in multiple formats of width and depth of search data, which allows searches to be conducted in parallel.
CAMs can be cascaded to increase the size of the lookup tables that they can store.
New entries can be added to their tables so that they learn what they did not know before.
They are one of the appropriate solutions for higher speeds.
CAM Disadvantages
They cost several hundred dollars per CAM, even in large quantities.
They occupy a relatively large footprint on a card.
They consume excessive power.
Generic system engineering problems:
  Interface with network processor.
  Simultaneous table update and lookup requests.
Classification of PLDs
• The classification of PLDs is given
below.

Simple Programmable Logic Device [SPLD]

• As the name suggests, an SPLD has a simple
architecture. A PROM is a good example of an
SPLD.
• An SPLD is capable of implementing hundreds of
gates and is normally programmed by the user
using inexpensive programmers.
• The main limitation of SPLDs is their low
logic capacity due to the restricted nature of
the AND-OR planes.
BASIC CIRCUIT OF PLD
Contd..
• There are three main types of SPLD
architectures:
(i) Programmable logic array (PLA),
(ii) Programmable array logic (PAL), and
(iii) Generic array logic (GAL)
• In digital design, Programmable Logic Arrays (PLAs)
and Programmable Array Logic (PAL) are types of
programmable logic devices used to implement
combinational logic circuits.
Configuration of SPLDs

Contd…
• Two of the most popular SPLDs are the PALs
produced by Advanced Micro Devices (AMD)
known as the 16R8 and 22V10.
• Both of these devices are industry standards
and are widely second-sourced by various
companies.
• The name “16R8” means that the PAL has a
maximum of 16 inputs (there are 8 dedicated
inputs and 8 input/outputs), and a maximum
of 8 outputs.
Contd…
• The “R” refers to the type of outputs
provided by the PAL and means that each
output is “registered” by a D flip-flop.
• Similarly, the “22V10” has a maximum of
22 inputs and 10 outputs. Here, the “V”
means each output is versatile and can be
configured in various ways, some
configurations registered and some not.

Contd..
• Another widely used and second sourced
SPLD is the Altera Classic EP610.
• This device is similar in complexity to
PALs, but it offers more flexibility in the
way that outputs are produced and has
larger AND- and OR- planes.
• In the EP610, outputs can be registered
and the flip-flops are configurable as any
of D, T, JK, or SR.
PLA- Programmable logic array
• The PLA consists of two programmable planes,
AND and OR. The AND plane consists of
programmable interconnect along with AND
gates.
• The OR plane consists of programmable
interconnect along with OR gates.
• Each of the inputs can be connected to an
AND gate with any of the other inputs by
connecting the crossover point of the vertical
and horizontal interconnect lines in the AND
gate programmable interconnect.
Contd..
• Initially, the crossover points are not
electrically connected, but configuring the
PLA will connect particular cross over points
together.
• The AND gate is drawn with a single line to its
input. This is a drawing convention; it
means that any of the inputs (vertical lines)
can be connected. Hence, for four PLA inputs,
the AND gate also has four inputs. The single
output from each of the AND gates is applied
to the OR gate programmable interconnect.
Schematic Representation- PLA

PROGRAMMABLE ARRAY LOGIC (PAL)
• The programmable array logic (PAL) was
developed by Monolithic Memories Inc (MMI).
• The Programmable Array Logic or PAL is
similar to the PLA, but in a PAL device only the AND
array is programmable. The OR array is
fixed by the manufacturer.
• This makes PAL devices easier to program and
less expensive than PLAs. On the other hand,
since the OR array is fixed, a PAL is less flexible
than a PLA device.
Schematic Representation- PAL

Block diagram of PAL
PAL contd..
• The PAL device has n input lines which are
fed to buffers/inverters.
• Buffers/inverters are connected to inputs of
AND gates through programmable links.
Outputs of AND gates are then fed to the OR
array with fixed connections.
Verilog code to design a PLA
In this example:
• The AND plane has four products P1, P2, P3, P4
• The OR plane sums these products to form the
outputs F1 & F2, represented as follows
(A' denotes the complement of A).
F1 = AB + AC
F2 = BC + A'B'

module PLA (
    input wire A,
    input wire B,
    input wire C,
    output wire F1,
    output wire F2);
    wire P1, P2, P3, P4;

    // AND plane: each product term is a programmed AND of inputs
    assign P1 = A & B;      // AB
    assign P2 = A & C;      // AC
    assign P3 = B & C;      // BC
    assign P4 = ~A & ~B;    // A'B'

    // OR plane: programmable selection of product terms per output
    assign F1 = P1 | P2;    // F1 = AB + AC
    assign F2 = P3 | P4;    // F2 = BC + A'B'

endmodule
Verilog code to design a PAL
In this example:
• The AND plane has four products P1, P2, P3, P4
• The OR plane has fixed connections to produce
the outputs F1 & F2, represented as follows
(A' denotes the complement of A).
F1 = AB + AC'
F2 = A'B + A'B'

module PAL (
    input wire A,
    input wire B,
    input wire C,
    output wire F1,
    output wire F2);
    wire P1, P2, P3, P4;

    // AND plane (programmable)
    assign P1 = A & B;      // AB
    assign P2 = A & ~C;     // AC'
    assign P3 = ~A & B;     // A'B
    assign P4 = ~A & ~B;    // A'B'

    // OR plane (fixed connections)
    assign F1 = P1 | P2; // F1 is OR of P1 and P2
    assign F2 = P3 | P4; // F2 is OR of P3 and P4

endmodule
GAL-Generic Array Logic
• PAL and PLA devices are one-time
programmable (OTP) based on PROM, so the
PAL or PLA configuration cannot be changed
after it has been configured.
• This limitation means that the configured
device would have to be discarded and a new
device configured. The GAL, although similar
to the PAL architecture, uses EEPROM and
can be reconfigured.

Contd…
• The Generic Array Logic (GAL) device was
invented by Lattice Semiconductor.
• The GAL was an improvement on the PAL
because one device was able to take the place
of many PAL devices or could even have
functionality not covered by the original range.
Its primary benefit, however, is that it is
erasable and re-programmable, making
prototyping and design changes easier for
engineers.
Complex Programmable Logic Devices
(CPLDs)
• CPLDs were pioneered by Altera, first in their
family of chips called Classic EPLDs, and then
in three additional series, called MAX 5000, MAX
7000 and MAX 9000.
• The CPLD is the Complex Programmable Logic
Device, which is more complex than the SPLD.
• It is built on the SPLD architecture and supports
much larger designs. The SPLD can be
used to integrate the functions of a number of
discrete digital ICs into a single device, and the
CPLD can be used to integrate the functions of a
number of SPLDs into a single device.
Contd..
• CPLD architecture is based on a small number of
logic blocks and a global programmable
interconnect.
• Instead of relying on a programming unit to
configure the chip, it is advantageous to be able to
perform the programming while the chip is still
attached to its circuit board.
• This method of programming is called
In-System Programming (ISP). It is not usually
provided for PLAs or PALs, but it is available for
the more sophisticated chips known as Complex
Programmable Logic Devices.
Architecture of CPLD

Contd…
• The CPLD consists of a number of logic
blocks or functional blocks, each of which
contains a macrocell and either a PLA or
PAL circuit arrangement.
• In the diagram eight logic blocks are
shown. The building block of the CPLD is
the macro-cell, which contains logic
implementing disjunctive normal form
expressions and more specialized logic
operations.
Contd..
• The macro cell provides additional circuitry
to accommodate registered or nonregistered
outputs, along with signal polarity control.
• Polarity control provides an output that is a
true signal or a complement of the true
signal.
• The actual number of logic blocks within a
CPLD varies; the more logic blocks
available, the larger the design that can be
configured.
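The macrocell features described above (registered or non-registered output, plus polarity control) can be sketched as a small Verilog module. The module name macrocell and the signal names sop, invert and registered are assumptions; in a real CPLD the two control inputs would be configuration bits fixed at programming time rather than ports.

```verilog
// Hypothetical sketch of a CPLD macrocell output stage.
module macrocell (
    input  wire clk,
    input  wire sop,          // sum-of-products result from the AND-OR array
    input  wire invert,       // polarity control: 1 = output the complement
    input  wire registered,   // 1 = registered output, 0 = combinational output
    output wire out
);
    // Polarity control: XOR passes the true signal or its complement.
    wire d = sop ^ invert;

    // Optional D flip-flop for registered outputs.
    reg q;
    always @(posedge clk)
        q <= d;

    // Select between the registered and the non-registered path.
    assign out = registered ? q : d;
endmodule
```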
Contd..
• In the center of the design is a global
programmable interconnect.
• This interconnect allows connections to the
logic block macrocells and the I/O cell arrays
(the digital I/O cells of the CPLD connecting
to the pins of the CPLD package).
• The programmable interconnect is usually
based on either array-based interconnect or
multiplexer-based interconnect

CPLD Architecture
Contd..
• Multiplexer-based interconnect uses digital
multiplexers connected to each of the
macrocell inputs within the logic blocks.
• Specific signals within the programmable
interconnect are connected to specific inputs
of the multiplexers.
• It would not be practical to connect all
internal signals within the programmable
interconnect to the inputs of all multiplexers
due to size and speed of operation
considerations.
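One multiplexer of a multiplexer-based interconnect can be sketched as below. The module name interconnect_mux, the parameter N and the port names are assumptions; cfg_sel stands for configuration bits that would be fixed when the device is programmed.

```verilog
// Hypothetical sketch of one interconnect multiplexer feeding a macrocell input.
module interconnect_mux #(
    parameter N        = 8,            // number of interconnect signals wired to this mux
    parameter SEL_BITS = $clog2(N)
) (
    input  wire [N-1:0]        candidates,   // the subset of signals reachable here
    input  wire [SEL_BITS-1:0] cfg_sel,      // configuration bits choosing one of them
    output wire                to_macrocell
);
    // Only N of the device's internal signals are available at this input,
    // which keeps the multiplexer small and fast.
    assign to_macrocell = candidates[cfg_sel];
endmodule
```

Keeping N small is the size and speed compromise the last bullet describes: each macrocell input sees only some of the internal signals.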
FIELD PROGRAMMABLE GATE
ARRAYS (FPGA)

• The concept of the FPGA emerged in 1985
with the XC2064™ FPGA family from
Xilinx.
• The “FPGA is an integrated circuit that
contains many (64 to over 10,000) identical
logic cells that can be viewed as standard
components.”
• The individual cells are interconnected by a
matrix of wires and programmable switches.
Contd..
• Unlike CPLDs (Complex Programmable Logic
Devices) FPGAs contain neither AND nor OR
planes.
• The FPGA architecture consists of
configurable logic blocks, configurable I/O
blocks, and programmable interconnect.
• Also, there will be clock circuitry for driving
the clock signals to each logic block, and
additional logic resources such as ALUs,
memory, and decoders may be available.
Contd..
• The two basic types of programmable
elements for an FPGA are static RAM and
anti-fuses.
• Each logic block in an FPGA has a small
number of inputs and one output.
• A look-up table (LUT) is the most commonly
used type of logic block within FPGAs.
• Accordingly, there are two types of FPGAs: (i) SRAM
based FPGAs and (ii) anti-fuse technology
based FPGAs (OTP).
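A look-up table can be sketched in a few lines of Verilog. The module name lut4, the 4-input size and the INIT parameter are assumptions chosen for illustration; in an SRAM-based FPGA the 16 INIT bits would be configuration memory cells rather than a compile-time parameter.

```verilog
// Hypothetical sketch of a 4-input look-up table (LUT).
module lut4 #(
    parameter [15:0] INIT = 16'h8000   // truth table; 16'h8000 is a 4-input AND
) (
    input  wire [3:0] in,              // small number of inputs
    output wire       out              // one output
);
    // The inputs address one bit of the stored truth table.
    assign out = INIT[in];
endmodule
```

Changing INIT changes the function without changing the hardware: 16'hFFFE gives a 4-input OR, and 16'h6996 gives a 4-input XOR.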
FPGA-Architecture

Contd..
Every FPGA consists of the following
elements.
• Configurable logic blocks (CLBs)
• Configurable input output blocks (IOBs)
• A two-layer metal network of vertical and
horizontal lines for interconnecting the
CLBs, called the Programmable
Interconnect.
XILINX Logic Cell Array(LCA)
• The LCA is the architectural feature
introduced by XILINX in the year 1985
for their FPGA devices. It is essentially a
proprietary, trademarked architecture of
XILINX implemented in their FPGA devices.
• The XILINX LCA architecture consists
of three major components. They are
(i) Configurable Logic Blocks (CLBs)
(ii) Input/Output Blocks (IOBs) and
(iii) Programmable Interconnect.
Contd…
• In addition, configuration memory is used to
hold the configuration program bits which
control the configuration of CLBs, IOBs and
interconnect.
• This LCA architecture consists of an interior
matrix of logic blocks and a surrounding ring of
I/O interface blocks.
• Interconnect resources occupy the channels
between the rows and columns of logic blocks
and between the logic blocks and I/O blocks.
Like a microprocessor the LCA is a program
Contd..
• The functions of the LCA’s configurable
logic blocks and I/O blocks and their
interconnection are controlled by a
configuration program stored in an on-chip
memory.
• The configuration program is loaded
automatically from an external memory on
power-up or on command, or is programmed
by a microprocessor as part of system
initialization.
Contd..
As shown in the diagram below, the configuration
memory consists of a distributed array of
static memory cells.
During configuration, the cell is written
through the data line; during a read-back operation,
it is read through the same data line.
LCA-Architecture
• The core of the LCA is a matrix of
identical Configurable Logic Blocks (CLBs). Each CLB
contains programmable combinational logic
and storage registers.
• The combinational logic section of the
block is capable of implementing any Boolean
function of its input variables.
• The registers can be loaded from the
combinational logic or directly from a CLB
input. The register outputs can be inputs to the
combinational logic via an internal feedback
path.
Block Diagram
CLB Block Diagram

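The CLB structure described above (combinational logic, a storage register, and a feedback path) can be sketched roughly as follows. This is not the actual Xilinx CLB: the module name clb_sketch, the 4-input truth table and the parameters USE_FEEDBACK and REG_FROM_INPUT are assumptions standing in for configuration bits.

```verilog
// Hypothetical, simplified sketch of a configurable logic block.
module clb_sketch #(
    parameter [15:0] LUT_INIT       = 16'h0000, // truth table of the logic section
    parameter        USE_FEEDBACK   = 0,        // 1 = register output replaces in[3]
    parameter        REG_FROM_INPUT = 0         // 1 = register loads din directly
) (
    input  wire       clk,
    input  wire [3:0] in,        // CLB logic inputs
    input  wire       din,       // direct data input to the register
    output wire       comb_out,  // combinational output
    output reg        q          // storage register output
);
    initial q = 1'b0;            // known starting state for simulation

    // Internal feedback path: the register output can act as a logic input.
    wire [3:0] lut_in = USE_FEEDBACK ? {q, in[2:0]} : in;

    // Combinational section: any Boolean function of its input variables.
    assign comb_out = LUT_INIT[lut_in];

    // The register loads from the logic section or directly from a CLB input.
    always @(posedge clk)
        q <= REG_FROM_INPUT ? din : comb_out;
endmodule
```

For example, with LUT_INIT = 16'h00FF and USE_FEEDBACK = 1, comb_out is the complement of q, so the register toggles on every clock.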
I/O Blocks
• The periphery of the Logic Cell Array is made
up of user programmable input/output blocks
(IOBs).
• Each block can be programmed independently
to be an input, an output or a bi-directional pin
with three-state control. Inputs can be
programmed to recognize either TTL or
CMOS thresholds.
• Each IOB also includes flip-flops that can be
used to buffer inputs and outputs.
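An I/O block with three-state control and an optional input flip-flop can be sketched as below. The module name iob_sketch, the port names and the polarity of tristate_n are assumptions; TTL versus CMOS thresholds are an electrical property and are not modelled. In a real device cfg_reg_in would be a configuration bit.

```verilog
// Hypothetical sketch of a programmable I/O block.
module iob_sketch (
    input  wire clk,
    input  wire out_data,     // value from the logic array to drive onto the pin
    input  wire tristate_n,   // assumed polarity: 1 = drive the pad, 0 = high impedance
    input  wire cfg_reg_in,   // 1 = pass the input through the flip-flop
    inout  wire pad,          // the package pin
    output wire in_data       // value delivered to the logic array
);
    // Output path: three-state buffer onto the pad.
    assign pad = tristate_n ? out_data : 1'bz;

    // Input path: direct, or buffered by a flip-flop.
    reg in_q;
    always @(posedge clk)
        in_q <= pad;

    assign in_data = cfg_reg_in ? in_q : pad;
endmodule
```

Holding tristate_n low makes the pin a pure input, holding it high makes it an output, and switching it during operation gives a bi-directional pin.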
Programmable Interconnect
• In FPGAs, three types of metal resources are
provided to fulfill various network
interconnect requirements. They are
a) General Purpose Interconnect
b) Direct Interconnect
c) Long lines (multiplexed buses and wide AND
gates)
a) General Purpose Interconnect
• It consists of a grid of five horizontal and
five vertical metal segments located between
the rows and columns of logic blocks and IOBs.
• Each segment is the height or width of a
logic block.
• Switching matrices join the ends of these
segments and allow programmed
interconnections between the metal
grid segments of adjoining rows and columns.
Contd...
• The switches of an un-programmed device are
all non-conducting.
• The connections through the switch matrix
may be established by the automatic routing or
by selecting the desired pairs of matrix pins to
be connected or disconnected.
• The interconnect buffers are available to
propagate signals in either direction on a given
general interconnect segment.
• These bidirectional (bidi) buffers are found
adjacent to the switching matrices, above and
to the right.
b) Direct Interconnect
• Direct interconnect provides the most
efficient implementation of networks between
adjacent CLBs or I/O Blocks. Signals routed
from block to block using the direct
interconnect exhibit minimum interconnect
propagation delay and use no general interconnect
resources.
Contd..
• Direct interconnect should be used to
maximize the speed of high-performance
portions of logic.
• Where logic blocks are adjacent to IOBs,
direct connect is provided alternately to the
IOB inputs (I) and outputs (O) on all four
edges of the die.
• The right edge provides additional direct
connects from CLB outputs to adjacent IOBs.
c) Long lines
• The Long lines bypass the switch matrices
and are intended primarily for signals that
must travel a long distance, or must have
minimum skew among multiple destinations.
• Long lines run vertically and horizontally the
height or width of the interconnect area.
• Each interconnection column has four
vertical Long lines, and each interconnection
row has two horizontal Long lines.
Contd..
• Two additional Long lines are located adjacent
to the outer sets of switching matrices.
• Long lines can be driven by a logic block or IOB
output on a column-by-column basis.
• This capability provides a common low skew
control or clock line within each column of
logic blocks.
• Isolation buffers are provided at each input to a
Long line and are enabled automatically by the
development system when a connection is
made.
Programming Technologies
• A number of programming technologies
have been used for reconfigurable
architectures.
• Each of these technologies has different
characteristics, which have a significant effect on the
programmable architecture.
Some of the well-known technologies are
(i) SRAM-based programming technology
(ii) Flash programming technology (EEPROM)
(iii) Anti-fuse based programming technology
i) SRAM-Based Programming Technology
• Static memory cells are the basic cells used for
SRAM-based FPGAs.
• Most commercial vendors, such as XILINX,
Lattice and Altera, use static memory
(SRAM) based programming technology in
their devices.
• These devices use static memory cells which
are distributed throughout the FPGA to provide
configurability.
Contd..
• There are two primary uses for the SRAM cells.
Most of them are used to set the select lines to
multiplexers that steer interconnect signals.
• The majority of the remaining SRAM cells are
used to store the data in the lookup-tables (LUTs)
that are typically used in SRAM-based FPGAs to
implement logic functions.
• Historically, SRAM cells were used to control the
tri-state buffers and simple pass transistors that
were also used for programmable interconnect.
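The two uses of SRAM cells described above can be sketched in a few lines of Python. This is only an illustrative model, not any vendor's implementation: the names `Lut` and `routing_mux`, and the convention that the first input is the least significant address bit, are assumptions made for the example.

```python
class Lut:
    """K-input lookup table: 2**K configuration (SRAM) bits hold the truth table."""

    def __init__(self, k, config_bits):
        if len(config_bits) != 2 ** k:
            raise ValueError("a K-input LUT needs exactly 2**K configuration bits")
        self.k = k
        self.bits = list(config_bits)

    def evaluate(self, inputs):
        # The logic inputs act as an address into the configuration memory
        # (inputs[0] is treated as the least significant address bit).
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.bits[address]


def routing_mux(select_bits, sources):
    """Interconnect multiplexer whose select lines are driven by SRAM cells."""
    index = sum(bit << i for i, bit in enumerate(select_bits))
    return sources[index]


# Configure a 2-input LUT as XOR: addresses 0..3 hold 0, 1, 1, 0.
xor_lut = Lut(2, [0, 1, 1, 0])
```

Reprogramming the device then amounts to loading different values into `config_bits` and `select_bits`; the hardware structure itself does not change.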

6/27/2024
• SRAM-based programming technology has
become the dominant approach for FPGAs
because of its re-programmability and its use
of standard CMOS process technology, which
lets it benefit from the increased integration,
higher speed and lower dynamic power
consumption of new processes with smaller
geometries.

6/27/2024
Contd..
• There are, however, a number of drawbacks
associated with SRAM-based programming
technology.
• For example, an SRAM cell requires 6 transistors,
which makes this technology costly in terms of
area compared to other programming
technologies.
• Further SRAM cells are volatile in nature and
external devices are required to permanently store
the configuration data.
• These external devices add to the cost and area
overhead of SRAM-based FPGAs.
6/27/2024
ii) Flash Programming Technology
• An important alternative to the SRAM-based
programming technology is the use of flash or
EEPROM based programming technology. This
technology injects charge onto a gate that
“floats” above the transistor.
• This approach is used in flash or EEPROM memory
cells. These cells are non-volatile; they do not lose
information when the device is powered down.
• With modern IC fabrication processes, it has become
possible to use the floating gate cells directly as
switches.

6/27/2024
Contd..
• Flash memory cells, in particular, are now used
because of their improved area efficiency.
• The widespread use of flash memory cells for
non-volatile memory chips ensures that flash
manufacturing processes will benefit from
steady decreases in process geometries.
• Flash-based programming technology offers
several advantages. For example, this
programming technology is nonvolatile in
nature.
6/27/2024
• Flash-based programming technology is also
more area efficient than SRAM-based
programming technology.
• Flash-based programming technology has its
own disadvantages also.
• Unlike SRAM-based programming
technology, flash based devices cannot be
reconfigured/reprogrammed an infinite
number of times.
• Also, flash-based technology uses a
non-standard CMOS process.
6/27/2024
Contd..
• This flash-based programming technology
offers several unique advantages, most
importantly non-volatility.
• This feature eliminates the need for the
external resources required to store and load
configuration data when SRAM-based
programming technology is used.
• Additionally, a flash-based device can function
immediately upon power-up instead of having
to wait for the loading of configuration data.
6/27/2024
Contd..
• The flash approach is more area efficient than
SRAM-based technology which requires up to
six transistors to implement the programmable
storage.
• The programming circuitry, such as the high
and low voltage buffers needed to program the
cell, contributes an area overhead not present
in SRAM-based devices.

6/27/2024
Contd..
• In devices from Altera, Xilinx and Lattice,
on-chip flash memory is used to provide non-
volatile storage while SRAM cells are still
used to control the programmable elements in
the design.
• This addresses the problems associated with
the volatility of pure-SRAM approaches, such
as the cost of additional storage devices or the
possibility of configuration data interception,
while maintaining the infinite
re-configurability of SRAM-based devices.
6/27/2024
iii) Anti-fuse Programming Technology
• An alternative to SRAM and floating
gate-based technologies is anti fuse
programming technology.
• This technology is based on structures
which exhibit very high-resistance under
normal circumstances but can be
programmably “blown” (in reality,
connected) to create a low resistance link.
6/27/2024
Contd..

• An anti-fuse is a two terminal device with an
unprogrammed state presenting a very high
resistance between its terminals.
• When a high voltage (from 11 to 20 volts,
depending on the type of anti-fuse) is applied
across its terminals the anti-fuse will “blow”
and create a low resistance link.
• This link is permanent.

6/27/2024
• Programming an anti-fuse requires extra
circuitry to deliver the high programming
voltage and a relatively high current of 5
mA or more.
• This is done through fairly sizable pass
transistors that provide addressing to each
anti-fuse. Anti-fuse technology is used in
the FPGAs from Actel, QuickLogic,
and Crosspoint.
6/27/2024
Contd..
• A major advantage of the anti-fuse is its
small size, little more than the cross-section
of two metal wires.
• But this advantage is limited by the large
size of the necessary programming
transistors, which handle large currents, and
the inclusion of isolation transistors that are
sometimes needed to protect low voltage
transistors from high programming voltages.

6/27/2024
Contd..
• A second major advantage of an anti-fuse is its
relatively low series resistance.
• The on-resistance of the ONO anti-fuse is 300
to 500 ohms, while that of the amorphous silicon
anti-fuse is 50 to 100 ohms.
• Additionally, the parasitic capacitance of an
unprogrammed amorphous anti-fuse is
significantly lower than for other programming
technologies.

6/27/2024
Contd..
• The limitation of this technology is that it
does not make use of a standard CMOS
process.
• Also, anti-fuse programming technology based
devices cannot be reprogrammed.
• The ideal technology should be re-programmable
and non-volatile, and should use a standard
CMOS process.
• But it is clear that none of the above
technologies satisfies all of these conditions.
6/27/2024
Comparison of Programming
Technologies

In spite of all the advantages and
disadvantages, SRAM-based programming
technology is the most widely used programming
technology. The main reason is its use of a
standard CMOS process. For this reason it is
expected that this technology will continue to
dominate the other two programming
technologies.
6/27/2024
Commercially available FPGAs

6/27/2024
FPGA Design Flow
• The earlier PLD and FPGA designs were
performed largely by hand, but today's
complex programmable logic devices require
the use of an integrated Computer-Aided
Design (CAD) system.
• Both commercial CAD tool vendors and FPGA
companies offer appropriate tools.
• For example, traditional Electronic Design
Automation (EDA) vendors such as Cadence,
Mentor Graphics, Synopsys, and Viewlogic
offer tools to support FPGA design.
6/27/2024
contd..
• These tools are typically used for the front-end
design entry and simulation operations and
provide the necessary interfaces to vendor-
specific back-end tools for chip placement and
routing.
• Examples of vendor specific tools are the
Xilinx XACT system and the Altera
MAX+PLUS II software.
• Altera's MAX+PLUS II software
supports the entire design flow on either PC or
workstation platforms.
6/27/2024
Steps of FPGA Design Flow
• The first step in the design process is the description
of the logic circuit, which can be done either by
schematic capture tool or with Boolean expressions.
• This is followed by a translation that converts the
original circuit description into a standard format
used by the suitable CAD tools (Ex: XILINX CAD
tools).
• The circuit is then passed through CAD programs that
partition it into appropriate logic blocks, select a
specific location in the FPGA for each logic block
and form the required interconnections (Cadence,
Viewlogic, OrCAD, etc.).
6/27/2024
Initial Design Entry
• The detailed description of the logic circuit is
entered using a schematic capture program. In the
design entry phase, RTL or schematic entry is used to
create the logic to be implemented in the device.
• Pin assignments can also be made, including pin
placement information, and timing constraints that
might be necessary for building a functioning design.
• In the design entry step a schematic or Block Design
File (.bdf) is created that is the top-level design. The
library of parameterized modules (LPM) functions
are added and Verilog HDL code is used to add a
logic block
6/27/2024
Contd..
• The library may be supplied either by the
vendor of the schematic capture program or
by an FPGA vendor (such as Xilinx or Altera).
• An alternate way to specify the logic circuit is
to use a Boolean expression or state machine
language.
• This is done without the graphical interface.
Sometimes it is possible to use a mixture of
both schematic and Boolean expressions.

6/27/2024
Translation to XNF Format

After the logic circuit is successfully designed and
merged into one circuit, it is translated into a special
format that is understood by the CAD tools. For
Xilinx this format is called Xilinx Netlist Format
(XNF). This translation utility is supplied by
Xilinx or by the vendor of the logic entry tool. The
translation process may also involve automatic
optimizations of the circuit.
6/27/2024
Partition
• The XNF circuit is partitioned into logic cells
(this partition is also known as Technology
Mapping).
• This technology mapping converts the XNF
circuit, which is a netlist of basic logic gates,
into a netlist of Xilinx logic cells.
• The logic cell used depends on which Xilinx
product the circuit is to be implemented in.
XACT tools also attempt to optimize the
circuit during this step.
6/27/2024
Place and Route

• Place & Route is performed either by CAD
tools, manually by the user, or by a mixture of the
two.
• The first step is placement, in which each logic cell
generated during the partition step is assigned to a
specific location in the FPGA.
• Automatic placement can be done using the
simulated annealing algorithm.
• After placement, the required interconnections
among the logic cells must be realized by selecting
wire segments and routing switches within the
FPGA interconnection resources.
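As a rough illustration of placement by simulated annealing, the sketch below places cells on a small grid and tries to reduce total wirelength. It is a teaching sketch under stated assumptions, not the algorithm used by the XACT tools: the half-perimeter cost function, the pairwise-swap move, the cooling schedule and all parameter values are choices made only for this example.

```python
import math
import random


def wirelength(placement, nets):
    """Sum of half-perimeter bounding boxes over all nets (assumed cost model)."""
    total = 0
    for net in nets:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal_placement(cells, nets, width, height, seed=0,
                     start_temp=5.0, cooling=0.9, moves_per_temp=100,
                     min_temp=0.01):
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    if len(cells) > len(sites):
        raise ValueError("more cells than available sites")
    # One entry per site; None marks an empty site.
    occupants = list(cells) + [None] * (len(sites) - len(cells))
    rng.shuffle(occupants)

    def as_placement(occ):
        return {cell: sites[i] for i, cell in enumerate(occ) if cell is not None}

    cost = wirelength(as_placement(occupants), nets)
    best, best_cost = list(occupants), cost
    temp = start_temp
    while temp > min_temp:
        for _ in range(moves_per_temp):
            i, j = rng.sample(range(len(sites)), 2)
            occupants[i], occupants[j] = occupants[j], occupants[i]
            new_cost = wirelength(as_placement(occupants), nets)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with a
            # probability that shrinks as the temperature drops.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = list(occupants), cost
            else:
                occupants[i], occupants[j] = occupants[j], occupants[i]  # undo
        temp *= cooling
    return as_placement(best), best_cost
```

Accepting some cost-increasing moves at high temperature is what lets the search escape poor local arrangements; production placers add timing-driven costs and far more careful schedules.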
6/27/2024
Contd..
• The XACT tools provide a critical path timing
analyzer which provides delay information on the
longest through shortest paths through the chip.
• In addition, the physical layout timing information
can also be back-annotated to the schematics to get
more accurate functional simulation results.
• The final step in the Xilinx design flow is the
creation of the BIT file which contains the binary
programming data needed to configure the SRAM
bits of the target chip.
• This file is then downloaded to configure the chip for
final functional and timing tests of the programmed
Dr. Y. Narasimha Murthy, Ph.D.
Compilation
• After creating the design it must be compiled.
Compilation converts the design into a
bitstream that can be downloaded into the
FPGA.
• The most important output of compilation is an
SRAM Object File (.sof), which is used to
program the device.
• The software also generates other report files
that provide information about the code as it
compiles

6/27/2024
Contd..
• In the design flow process the simulation is
very important to learn, and there are entire
applications devoted to simulating hardware
designs.
• There are two types of simulation, RTL and
timing. RTL (or functional) simulation allows
you to verify that your code behaves as
intended. Timing (or post place-and-route)
simulation verifies that the design meets
timing and functions appropriately in the
device.
6/27/2024
contd..
• After completion of the design, its
performance is checked either by downloading
the configuration bits into the FPGA or by using
an interface to a timing simulation program.
• If the performance is not satisfactory, suitable
modifications are made at some point in the
design flow.
• Once the timing and functionality are verified,
the implementation is complete.

6/27/2024
APPLICATIONS OF FPGAs
• FPGAs have gained rapid acceptance over the
past two decades.
• Users can apply them to a wide range of
applications like random logic, integrating
multiple SPLDs, device controllers,
communication encoding and filtering,
small- to medium-size systems with SRAM
blocks, and many more.
• Another interesting FPGA application is
prototyping designs to be implemented in
gate arrays by using one or more large
FPGAs.
6/27/2024
contd..
• Another application is the emulation of entire
large hardware systems via the use of many
interconnected FPGAs.
• FPGAs offer particularly powerful solutions
for meeting machine vision, industrial
networking, motor control, and video
surveillance needs.
• For example, the flexibility of FPGAs allows
designers to quickly adapt to changing image
sensor interfaces and image processing
requirements, evolve analysis capabilities to
keep pace with market requirements, and add
features and functions long after deployment.
6/27/2024
contd..
• FPGAs are also used as custom computing
machines.
• This involves using the programmable parts to
execute software, rather than compiling the
software for execution on a regular CPU.
• FPGAs provide a unique combination of highly
parallel custom computation, relatively low
manufacturing/engineering costs, and low
power requirements.

6/27/2024
Contd..
• FPGAs meet critical timing and performance
requirements with parallel processing and real-
time industrial application performance,
permitting greater system integration and lower
development cost.
• In areas such as Industrial Networking and
Imaging, where the protocols and standards are
shifting and changing, the programmability of
FPGAs versus fixed logic chips such as ASICs
and ASSPs allows for both faster time-to-
market and longer time-in-market.
6/27/2024
Conclusion
• Low cost and fast manufacturing turnaround are
the secret behind the market success of
FPGAs.
• Though the large, slow programmable
switches limit the speed performance of
FPGAs, improvements in architecture and CAD
tools will help overcome these
disadvantages.
• Over time FPDs will become the dominant
technology for implementing digital circuits.

6/27/2024
Technology Mapping for FPGA
• The high functionality of FPGA logic blocks
presents new challenges for logic synthesis.
Technology mapping addresses this for
FPGAs that use lookup tables to implement
combinational logic.
• Technology mapping is a process of
transforming a technology independent
Boolean network into a technology dependent
network.
• For example a K input lookup table (LUT) is a digital
memory that can implement any Boolean function of
K variables
6/27/2024
Contd..
• Technology mapping is the logic synthesis task
that is directly concerned with selecting the
circuit elements used to implement the
optimized circuit.
• Previous approaches to technology mapping
have focused on using circuit elements from a
limited set of simple gates.
• However, such approaches are inappropriate
for complex logic blocks where each logic
block can implement a large number of
functions.
6/27/2024
Library-Based Technology Mapping
• In library based mapping, gates or components
are selected from a technology library to
implement a circuit.
• Hence it is also referred to as library binding.
This method generates a technology
mapping for a given Boolean network using a
characterized cell library with the objective of
cost optimization or delay optimization.

6/27/2024
Contd..
• In this method the set of available circuit elements is
represented as a library of functions and the
construction of the optimized circuit is divided into
three sub-problems:
(i) Decomposition, (ii) Matching and (iii) Covering.
• The original network is first decomposed into a
canonical representation that uses limited fan-in NAND
nodes.
• This decomposition guarantees that there will be no
nodes in the network that are too large to be
implemented by any library element, provided the
library includes NAND gates that reach the fan-in limit.

6/27/2024
contd..
• After decomposition the network is
partitioned into a forest of trees. The optimal
sub-circuit covering each tree is constructed
and finally the circuit covering the entire
network is assembled from these sub-circuits.
• To form the forest of trees, the decomposed
network is partitioned at fan-out nodes into a
set of single-output sub-networks.
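The partitioning step can be sketched as follows. This is a minimal illustration, assuming the decomposed network is given as a dictionary `fanins` that maps each gate to the signals it reads; the function name and data layout are hypothetical and are not taken from any particular mapper.

```python
def partition_into_trees(fanins, outputs):
    """Split a gate network into trees rooted at outputs and fan-out nodes.

    fanins  : dict mapping gate name -> list of source names (gates or inputs)
    outputs : set of gate names that are primary outputs
    """
    # Count how many gates read each signal.
    fanout_count = {}
    for sources in fanins.values():
        for src in sources:
            fanout_count[src] = fanout_count.get(src, 0) + 1

    # A gate starts a new tree if it is a primary output or drives
    # more than one other gate.
    roots = [g for g in fanins if g in outputs or fanout_count.get(g, 0) > 1]

    trees = {}
    for root in roots:
        members, stack = [], [root]
        while stack:
            node = stack.pop()
            members.append(node)
            for src in fanins[node]:
                # Stop at primary inputs and at the roots of other trees.
                if src in fanins and src not in roots:
                    stack.append(src)
        trees[root] = sorted(members)
    return trees
```

For a network where gate `g1` feeds both `g2` and `g3`, and `g4` combines `g2` and `g3`, the call returns one tree containing only `g1` and another containing `g2`, `g3` and `g4`; each tree can then be covered independently.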

6/27/2024
Contd..
• The major obstacle to applying library-based
technology mapping to LUT circuits is the
large number of different functions that a K-
input LUT can implement.
• The function implemented by a K-input LUT
is determined by the values stored in its 2^K
memory bits. Since each bit can
independently be either 0 or 1, there are 2^(2^K)
different Boolean functions of K variables. For
K = 4 this is 16 bits and 65,536 functions.

6/27/2024
Contd..
• For values of K greater than 3 the library required to
represent a K-input LUT becomes very large.
• The size of the library can be reduced by noting that
some patterns are equivalent after a permutation of
inputs.
• The inversion of outputs or inputs, which is trivially
accomplished with a LUT, can also produce
equivalent patterns.
• Another alternative is to use a partial library tuned to
take advantage of the network structure likely to be
produced by technology independent logic
optimization.
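To get a feel for how much input permutation shrinks the library, the brute-force sketch below counts the functions of K inputs and the number that remain distinct once functions differing only by a reordering of inputs are merged. The function names are hypothetical, and the enumeration is only practical for small K.

```python
from itertools import permutations, product


def count_functions(k):
    """Number of Boolean functions of k variables: 2**(2**k)."""
    return 2 ** (2 ** k)


def permute_inputs(bits, k, perm):
    """Truth table of the function obtained by reordering the inputs."""
    table = []
    for address in range(2 ** k):
        inputs = [(address >> i) & 1 for i in range(k)]
        permuted = [inputs[perm[i]] for i in range(k)]
        new_address = sum(bit << i for i, bit in enumerate(permuted))
        table.append(bits[new_address])
    return tuple(table)


def count_permutation_classes(k):
    """Count functions that stay distinct under any permutation of inputs."""
    seen = set()
    classes = 0
    for bits in product((0, 1), repeat=2 ** k):
        if bits in seen:
            continue
        classes += 1
        # Mark every function reachable from this one by reordering inputs.
        for perm in permutations(range(k)):
            seen.add(permute_inputs(bits, k, perm))
    return classes
```

For K = 2 the 16 functions collapse to 12 classes, and for K = 3 the 256 functions collapse to 80. The reduction helps, but the library still grows very quickly with K.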
6/27/2024
LUT-based Technology Mapping
• The limitations of earlier technology mapping
approaches paved the way for the development
of technology mapping that deals specially
with LUT circuits.
• The first LUT based technology mappers
appeared in the 1990s and were later improved to
optimize the delay performance of LUT circuits
by minimizing the number of levels of LUTs in
the final circuit.

6/27/2024
Contd..
• In LUT based FPGAs (example XILINX
FPGAs) the building blocks are LUTs and
Flip-Flops.
• In an LUT based FPGA chip the basic
programmable logic block is a K-input Look
Up Table (K-LUT), which can implement any
Boolean function of up to K variables.
• The technology mapping in LUT based FPGA
designs is to cover a general Boolean network
using K-LUTs to obtain a functionally equivalent
K-LUT network.
6/27/2024
Contd..
• The main objectives in LUT mapping are
(i) Cost optimal mapping, i.e. minimizing the
number of LUTs and minimizing the number of
CLBs
(ii) Delay optimal mapping, i.e. minimizing the
number of LUT levels and minimizing the delays
(including routing delays)
(iii) Maximizing the routability of the mapping
schemes.
• The LUT based technology mapping can be
implemented using two types of algorithms. They are
(a) the Area algorithm and
(b) the Delay algorithm.
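The two main cost measures, LUT count and LUT levels, can be evaluated on a mapped network with a short sketch like the one below. It assumes the mapped network is a dictionary from each LUT's name to the list of signals it reads; this representation and the function name are illustrative only, and routing delay is ignored.

```python
def lut_network_cost(network, primary_inputs):
    """Return (number of LUTs, number of LUT levels) for a mapped network.

    network        : dict mapping LUT name -> list of fan-in signal names
    primary_inputs : set of signal names that are circuit inputs
    """
    depth_cache = {}

    def depth(node):
        # Primary inputs sit at level 0; a LUT is one level above
        # its deepest fan-in.
        if node in primary_inputs:
            return 0
        if node not in depth_cache:
            depth_cache[node] = 1 + max(
                (depth(src) for src in network[node]), default=0)
        return depth_cache[node]

    levels = max((depth(name) for name in network), default=0)
    return len(network), levels
```

An area-oriented mapper tries to reduce the first number and a delay-oriented mapper the second; the two goals often pull in different directions.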
6/27/2024
MULTIPLEXER BASED TECHNOLOGY
MAPPING
• This Multiplexer based technology mapping is
used in ACTEL FPGAs and in recent Xilinx
VIRTEX 6 FPGA devices, because their logic
block architectures are MUX based.
• In Actel based FPGAs, the size of the
multiplexers is small and suitable to achieve
the objective of area optimization and
minimum delays.
6/27/2024
Contd..
• Circuits usually contain a large number of
multiplexers (MUXes).
• This is mainly true for circuits that are automatically
synthesized from high-level descriptions.
• MUXes exist in the data-paths of circuits, where they
are used to route operands to operators. Also, the
control logic is frequently specified as a CASE
statement in HDL descriptions.
• MUXs arise as a result of a direct translation of
CASE statements in HDLs into a logic-level
description

6/27/2024
Contd..
• The main objective behind this Mux based
technology mapping is to describe a
combinational circuit in terms of Boolean
equations and realize it using the minimum
number of basic blocks of the target Mux
based architecture while minimizing the delay
on the critical path.
• In this algorithm an appropriate base function,
a library of cells and a set of pattern graphs
are selected.
6/27/2024
Contd..
• The advantages of MUX based technology
mapping are that it generates optimal mappings,
which are often much better than those
produced by conventional heuristic techniques.
• Moderately large circuits can be mapped
optimally in a small amount of time. Very large
circuits can be mapped near-optimally by
partitioning the circuits and mapping each
partition individually

6/27/2024
XILINX XC3000 FPGA Device
• Xilinx introduced the first FPGA family, called
the XC2000 series, in 1984 and next offered
three more series of FPGAs, namely XC3000,
XC4000, and XC5000.
• XC3000 series of FPGA devices were
introduced in 1985 by XILINX Inc.
• This was the most successful family of
FPGAs. The XC3000 architecture includes
enhancements to the XC2000
architecture to improve performance, density
and usability.
6/27/2024
Contd..
• The XC3000 Family covers a range of nominal
device densities from 2,000 to 9,000 gates,
practically achievable densities from 1,000 to
6,000 gates with up to 144 user-definable I/Os.
• The XC3000 Configurable Logic Block is
substantially larger than that of the XC2000. Each of
the lookup tables has four inputs and requires
16 bits of configuration memory.
• There are now four distinct families within the
XC3000 Series of FPGA devices

6/27/2024
XC3000 Family of Devices

The basic LCA (Logic Cell Array) of the
XC3000 consists of three components:
Programmable I/O Blocks, Configurable Logic
Blocks and Programmable Interconnect. In addition to
these, a small amount of configurable memory is also
present.
6/27/2024
Programmable I/O Block
• The I/O Block of the XC3000 is more complex
than the XC2000 IOB. The important
addition is a flip-flop in the output path.
• By registering the data in the IOB, the
clock-to-out time does not include interconnect delays.
• Each user-configurable IOB provides an
interface between the external package pin of
the device and the internal user logic. Each
IOB includes both registered and direct input
paths
6/27/2024
Programmable I/O Block

6/27/2024
Contd..
• Each IOB includes input and output storage
elements and I/O options selected by
configuration memory cells.
• A choice of two clocks is available on each
die edge. The polarity of each clock line (not
each flip-flop or latch) is programmable.
• Each input circuit also provides input clamping
diodes to provide electrostatic protection, and
circuits to inhibit latch-up produced by input
currents.
6/27/2024
Configurable Logic Block(CLB)
• The XC3000 CLB is substantially larger than
the XC2000 CLB.
• Each of the look-up tables has four inputs
rather than three and hence requires sixteen
bits of configuration memory rather than eight.
• The lookup tables can be combined with a
multiplexer to produce any function of five
inputs and some functions of up to seven
inputs
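The five-input case follows from Shannon expansion: one four-input table holds the function with the fifth input at 0, the other holds it with the fifth input at 1, and the multiplexer picks between them. The sketch below illustrates the idea only; the function names and the bit ordering are assumptions, not the actual XC3000 configuration format.

```python
def lut4(bits, a, b, c, d):
    """Four-input lookup table with 16 configuration bits."""
    return bits[a | (b << 1) | (c << 2) | (d << 3)]


def split_five_input_function(func):
    """Shannon expansion on e: derive the contents of the two 4-input LUTs."""
    def table(e):
        return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1, e)
                for i in range(16)]
    return table(0), table(1)


def clb_five_input(f_bits, g_bits, a, b, c, d, e):
    """The multiplexer selects one LUT output using the fifth input."""
    return lut4(g_bits, a, b, c, d) if e else lut4(f_bits, a, b, c, d)


def parity5(a, b, c, d, e):
    """Example target function: five-input parity."""
    return a ^ b ^ c ^ d ^ e


f_bits, g_bits = split_five_input_function(parity5)
```

Because any five-input function can be split this way, two 16-bit tables plus one multiplexer are enough for all of them.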

6/27/2024
• The XC3000 CLB has two flip-flops, to ensure
that all combinational logic can be followed by
a pipelining flip-flop.
• The register-rich CLB allows the XC3000 to
implement state-intensive applications and
heavily pipelined designs efficiently.
• Each CLB has a combinatorial logic section,
two flip-flops, and an internal control section.
The CLB has five logic inputs (A, B, C, D and
E)
6/27/2024
XC3000 CLB

6/27/2024
Contd..
• Data input for the flip-flops within a CLB is
supplied from the function F or G outputs of
the combinatorial logic, or the block input, DI.
• Both flip-flops in each CLB share the
asynchronous RD which, when enabled, is
dominant over clocked inputs.
• All flip-flops are reset by the active-Low chip
input, RESET, or during the configuration
process.

6/27/2024
Programmable Interconnect
• Programmable-interconnection resources in the Field
Programmable Gate Array provide routing paths to
connect inputs and outputs of the IOBs and CLBs
into logic networks.
• Interconnections between blocks are composed of a
two-layer grid of metal segments.
• Specially designed pass transistors, each controlled
by a configuration bit, form programmable
interconnect points (PIPs) and switching matrices
used to implement the necessary connections between
selected metal segments and block pins.
6/27/2024
Contd..
• The XC3000 interconnect structure has five
general interconnect lines both vertically and
horizontally .
• In addition each CLB has direct connections to
adjacent CLBs both vertically and horizontally.
• Three types of metal resources are provided to
accommodate various network interconnect
requirements.
• General Purpose Interconnect
• Direct Connection
• Long lines (multiplexed busses and wide AND
gates)
6/27/2024
XC3000 Interconnect

6/27/2024
XILINX XC4000 FPGA Device
• The XC4000 was designed to improve
performance and gate density for large
designs.
• Several dedicated features were added to the
general purpose logic features of the XC3000,
resulting in an interesting combination of
special-purpose and general-purpose functions.
• The XC4000 family was designed using
placement and routing tools to evaluate
architectural decisions.
6/27/2024
The basic building blocks in the XC4000 family
• Look-up tables for implementation of logic
functions.
• A designer can use a function generator to
implement any Boolean function of a given
number of inputs by pre-loading the memory with
the bit pattern corresponding to the truth table of
the function.
• All functions of a function generator have the
same timing: the time to look up results in the memory.
• Therefore, the inputs to the function generator are
fully interchangeable by simple rearrangement of
the bits in the look-up table.
6/27/2024
Contd..
• A Programmable Interconnect Point(PIP) is a
pass transistor controlled by a memory cell.
The PIP is the basic unit of configurable
interconnect mechanism.
• The wire segments on each side of the
transistor are connected depending on the
value in the memory cell.
• The pass transistor introduces resistance into
the interconnected paths and hence delay
occurs.
6/27/2024
Advanced Features of the XC4000 FPGAs
• CLBs can be used as on-chip RAM
• Fast carry chain for high-speed implementation
of arithmetic
• Boundary scan compatibility (JTAG)
• Wide decode logic
• More global clocks
• Faster placement and routing algorithms
• Scaled routing resources.

6/27/2024
Configurable Logic Block (CLB)
• The XC4000 CLB is similar to the
XC3000 CLB. It contains three lookup tables
(F, G and H) and two flip-flops.
• The two primary look-up tables F & G
implement any function of four variables.
• These two results can be brought out of the
block independently or they can be combined
with another input in the H –look up table to
make any function of five inputs or some
function of up to nine inputs.
6/27/2024
Contd..
• The XC3000 can implement arithmetic with
sum in one look-up table and carry in another
look-up table.
• The XC4000 CLB can implement arithmetic in
this way also, but as the speed of the arithmetic
operation is dominated by the speed of the
carry chain, the XC4000 CLB includes
dedicated high speed carry logic.
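The "sum in one look-up table, carry in another" arrangement can be sketched as a ripple-carry adder built from two three-input tables per bit. This is an illustrative model only (the table layout and names are assumptions), but it shows why the carry path dominates: each bit must wait for the carry table of the bit below it.

```python
# Truth tables indexed by a | (b << 1) | (carry_in << 2).
SUM_LUT = [a ^ b ^ cin
           for cin in (0, 1) for b in (0, 1) for a in (0, 1)]
CARRY_LUT = [(a & b) | (cin & (a ^ b))
             for cin in (0, 1) for b in (0, 1) for a in (0, 1)]


def ripple_add(x, y, width):
    """Add two unsigned numbers bit by bit using the two lookup tables."""
    carry = 0
    result = 0
    for bit in range(width):
        a = (x >> bit) & 1
        b = (y >> bit) & 1
        address = a | (b << 1) | (carry << 2)
        result |= SUM_LUT[address] << bit
        # The carry must ripple through every bit position in turn.
        carry = CARRY_LUT[address]
    return result, carry
```

Dedicated carry logic replaces the general-purpose table and routing on the carry path with a fast fixed connection, which shortens exactly this serial chain.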

6/27/2024
Block Diagram - CLB

6/27/2024
XC4000 I/O BLOCK
• The signals to be output from the chip can be
registered before output and enabled by a
separate control signal.
• Outputs can be optionally pulled up or down
and the output driver can be configured with
either a fast or a slow slew rate.
• Inputs from the pad can be brought into the
interior of the chip directly, registered, or both
to facilitate multiplexed bus interfaces.

6/27/2024
Contd..
• The XC4000 IOB includes boundary scan logic
compatible with the ANSI/IEEE 1149.1 (JTAG)
boundary scan standard.
• The boundary scan can check internal logic or
external logic.
• Scan operations can take place before and after
the FPGA is programmed and do not interfere
with the operation of the part.

6/27/2024
Interconnect Structure
• The XC4000 interconnect is arranged in horizontal
and vertical channels.
• Each channel contains some number of short wire
segments that span a single CLB (the number of
segments in each channel depends on the specific part
number), longer segments that span two CLBs, and
very long segments that span the entire length or
width of the chip.
• Programmable switches are available to connect the
inputs and outputs of the CLBs to the wire segments,
or to connect one wire segment to another.
6/27/2024
Contd..
The figure below shows only the wire
segments in a horizontal channel, and does
not show the vertical routing channels, the
CLB inputs and outputs, or the routing
switches.

6/27/2024
Contd..
• The salient feature about the Xilinx
interconnect is that signals must pass through
switches to reach one CLB from another, and
the total number of switches traversed depends
on the particular set of wire segments used.
• Thus, speed-performance of an implemented
circuit depends in part on how the wire
segments are allocated to individual signals by
CAD tools.

6/27/2024
Actel FPGAs
• In contrast to XILINX FPGAs, the
devices manufactured by Actel are based on
anti-fuse technology.
• Actel offers three main families. They are:
Act 1, Act 2, and Act 3.


• Actel devices are based on a structure similar
to traditional gate arrays; the logic blocks are
arranged in rows and there are horizontal
routing channels between adjacent rows.
6/27/2024
LOGIC BLOCK –ACTEL FPGA

6/27/2024
Contd..
• The logic blocks in the Actel devices are
relatively small in comparison to the LUT
based ones, and are based on
multiplexers.
• Each block comprises an AND and an OR gate that are
connected to a multiplexer based circuit block.
• The multiplexer circuit is arranged such that,
in combination with the two logic gates, a very
wide range of functions can be realized in a
single logic block.
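A simplified model shows how such a block realizes different functions just by choosing what is wired to its inputs. The pin names and the exact arrangement below (a four-to-one multiplexer with one select line driven by an AND gate and the other by an OR gate) are assumptions made for illustration and should not be read as the precise Actel schematic.

```python
def mux_logic_block(d00, d01, d10, d11, a0, b0, a1, b1):
    """Four-to-one multiplexer with gated select lines (assumed structure)."""
    s0 = a0 & b0          # AND gate drives the low select line
    s1 = a1 | b1          # OR gate drives the high select line
    if s1:
        return d11 if s0 else d10
    return d01 if s0 else d00


# Different functions come from tying inputs to constants or signals.
def and2(x, y):
    return mux_logic_block(0, 1, 0, 0, x, y, 0, 0)


def or2(x, y):
    return mux_logic_block(0, 0, 1, 1, 0, 0, x, y)


def xor2(x, y):
    return mux_logic_block(0, 1, 1, 0, y, 1, x, 0)
```

The same block also acts as a plain two-to-one multiplexer, which is one reason multiplexer based blocks suit the datapath-heavy circuits produced by synthesis.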
6/27/2024
Contd..
• Actel's interconnect is organized in horizontal
routing channels.
• The channels consist of wire segments of various
lengths with anti-fuses to connect logic blocks to wire
segments or one wire to another.
• Also, Actel chips have vertical wires that overlay the
logic blocks, for signal paths that span multiple rows.
• In terms of speed-performance, it is evident that
Actel chips are not fully predictable, because the
number of anti-fuses traversed by a signal depends on
how the wire segments are allocated during circuit
implementation by CAD tools.
6/27/2024
Quicklogic pASIC FPGAs
• QuickLogic is the main competitor for Actel in
anti-fuse-based FPGAs.
• It produces two families of devices, called pASIC
and pASIC-2. The pASIC-2 is an enhanced version
of pASIC.
• The pASIC consists of a regular two-dimensional
array of blocks called pASIC Logic Blocks (pLBs).
• The logic capacity of the first generation of QuickLogic
FPGAs is between 48 and 380 pLBs, or 500 to
4000 equivalent MPGA gates.

6/27/2024
Contd..
As shown in the figure below, pASIC has similarities to
other FPGAs, i.e. the overall structure is array-based
like Xilinx FPGAs, the logic blocks use multiplexers
similar to Actel FPGAs, and the interconnect consists
of only long lines like in Altera FLEX 8000.

6/27/2024
Contd..
• pASIC's multiplexer-based logic block is shown in the figure
below. It is more complex than Actel's Logic Module,
with more inputs and wide (6-input) AND-gates on the
multiplexer select lines. Every logic block also contains a
flip-flop.

6/27/2024
Altera FLEX 8000 and FLEX 10000 FPGAs
• The first FPGA chips from Altera were simple
arrays of logic cells, which are relatively simple
logic elements (LEs), each element comprising
a three-input look-up table (LUT) to generate
logic functions, a single configurable flip-flop
and multiplexers for routing the signals and
selecting clocks.
• The logic cells were connected by switch boxes
instead of fixed interconnect. The general
architecture of Altera's FPGAs is shown in the
next slide.

6/27/2024
Architecture of ALTERA FPGA

• Altera offers two high-performance FPGA families
in its FLEX series: FLEX 8000 and FLEX 10000.
• Altera's FLEX 8000 series consists of a three-
level hierarchy similar to CPLDs.
• However, the lowest level of the hierarchy
consists of a set of lookup tables rather than
an SPLD-like block, and so the FLEX 8000 is
categorized here as an FPGA.

Contd..
• The architecture of FLEX 8000 is shown in the
next slide.
• The basic logic block, called a Logic Element
(LE), contains a four-input LUT, a flip-flop,
and special-purpose carry circuitry for
arithmetic circuits (similar to Xilinx XC4000).
• The LE also includes cascade circuitry that
allows for efficient implementation of wide
AND functions.
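A minimal behavioural sketch of such a logic element, assuming the cascade circuitry simply ANDs the LUT output with a cascade input from the neighbouring LE. The class name LogicElement and the helper wide_and8 are hypothetical, and carry logic is omitted.

```python
class LogicElement:
    """Toy logic element: a 4-input LUT, a cascade AND, and a D flip-flop."""

    def __init__(self, truth_table):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs 16 truth-table bits")
        self.truth_table = list(truth_table)
        self.q = 0  # flip-flop state

    def combinational(self, a, b, c, d, cascade_in=1):
        # The four inputs form the address into the LUT contents.
        index = (a << 3) | (b << 2) | (c << 1) | d
        # Assumed cascade behaviour: AND with the neighbour's output.
        return self.truth_table[index] & cascade_in

    def clock(self, a, b, c, d, cascade_in=1):
        # On a clock edge the flip-flop captures the combinational result.
        self.q = self.combinational(a, b, c, d, cascade_in)
        return self.q


# LUT contents for a 4-input AND: only address 0b1111 stores a 1.
AND4 = [1 if i == 0b1111 else 0 for i in range(16)]


def wide_and8(bits):
    """8-input AND built from two LEs linked by the cascade chain."""
    first, second = LogicElement(AND4), LogicElement(AND4)
    partial = first.combinational(*bits[:4])
    return second.combinational(*bits[4:], cascade_in=partial)


if __name__ == "__main__":
    print(wide_and8([1, 1, 1, 1, 1, 1, 1, 1]))
    print(wide_and8([1, 1, 1, 0, 1, 1, 1, 1]))
```

Without the cascade path, an 8-input AND would need a third LUT to combine the two partial results, which costs an extra LE and an extra trip through the general interconnect.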

Architecture of Altera FLEX 8000 FPGA

Contd..
• A major difference between FLEX 8000 and
Xilinx chips is that FastTrack, the FLEX 8000
interconnect, consists of only long lines. This
makes the FLEX 8000 easy for CAD tools to
configure automatically.
• All horizontal FastTrack wires are
identical, so interconnect delays in the
FLEX 8000 are more predictable than in FPGAs
that employ many shorter segments,
because there are fewer programmable
switches in the longer paths.
Contd..
• Predictability is further aided by the fact that
connections between horizontal and vertical
lines pass through active buffers.
• The FLEX 8000 architecture has been
extended in the state-of-the-art FLEX 10000
family.
• FLEX 10000 offers all of the features of FLEX
8000, with the addition of variable-sized
blocks of SRAM called Embedded Array
Blocks (EABs). Each row in a
FLEX 10000 chip has an EAB on one end.
Concurrent Logic FPGA Device
• Concurrent Logic offers the
CFA6006 FPGA, which is based on a two-
dimensional array of identical blocks, where
each block is symmetrical on its four sides.
• The array holds 3136 such blocks, providing
a total logic capacity of about 5000 equivalent
gates.
• Connections are formed using multiplexers
that are configured by a static RAM
programming technology.
Contd..
• The structure of the Concurrent Logic block is
shown in the next slide. It comprises user-
configurable multiplexers, basic gates, and a D-
type flip-flop.
• The Concurrent Logic FPGA is especially suitable for
register-intensive and arithmetic applications,
since the logic block can easily implement a
half-adder and a register bit.
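To illustrate why a half-adder plus a register bit per block suits arithmetic, the sketch below chains such cells into a synchronous counter. It is a behavioural model only; the names half_adder, CounterCell, and count are hypothetical and do not describe the CFA6006 circuit itself.

```python
def half_adder(a: int, b: int):
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


class CounterCell:
    """One block: a half-adder feeding a one-bit register (D flip-flop)."""

    def __init__(self):
        self.q = 0  # register bit

    def step(self, carry_in: int) -> int:
        # Add the incoming carry to the stored bit, store the sum,
        # and pass the carry on to the next cell.
        total, carry_out = half_adder(self.q, carry_in)
        self.q = total
        return carry_out


def count(cells, ticks: int) -> int:
    """Clock the chain `ticks` times and return the stored value."""
    for _ in range(ticks):
        carry = 1  # adding 1 on every clock makes the chain a counter
        for cell in cells:  # least significant cell first
            carry = cell.step(carry)
    return sum(cell.q << i for i, cell in enumerate(cells))


if __name__ == "__main__":
    print(count([CounterCell() for _ in range(4)], 5))
```

Each counter bit uses exactly one block, so an n-bit counter needs n blocks and no additional logic.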

Structure of the Concurrent Logic Block

Crosspoint Solutions FPGAs
• Crosspoint FPGAs differ from other
FPGAs because they are configurable at the
transistor level, as opposed to the logic-block level
in other FPGAs.
• The architecture consists of rows of
transistor pairs, where the rows are separated
by horizontal wiring segments.
• Vertical wiring segments are also available
for connections among the rows.
Contd..
• Each transistor row comprises two lines of
series-connected transistors, with one line
being NMOS and the other PMOS.
• The wiring resources allow individual
transistor pairs to be interconnected to
implement CMOS logic gates.
• The programming technology used for the
programmable switches is similar to the Via-Link
anti-fuse, which is based on amorphous
silicon.
Contd..
• The structure of the transistor pair rows is
shown in the next slide.
• The diagram shows the implementation
of a NOR gate and a NAND gate using
the transistor lines.
• The transistor gates, drains, and sources can
be programmably interconnected to other
transistors and also to power and ground.
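How NMOS and PMOS transistor pairs combine into gates can be sketched with a switch-level model of standard static CMOS: the NMOS network pulls the output to ground and the complementary PMOS network pulls it to power. The helper names below are hypothetical, and the model ignores the anti-fuse wiring itself.

```python
def nmos(gate: int) -> bool:
    """An NMOS transistor conducts when its gate is high."""
    return gate == 1


def pmos(gate: int) -> bool:
    """A PMOS transistor conducts when its gate is low."""
    return gate == 0


def series(*switches: bool) -> bool:
    """A series chain conducts only if every transistor conducts."""
    return all(switches)


def parallel(*switches: bool) -> bool:
    """Parallel branches conduct if any transistor conducts."""
    return any(switches)


def cmos_output(pull_up: bool, pull_down: bool) -> int:
    """Resolve the output node from the two complementary networks."""
    if pull_up and not pull_down:
        return 1
    if pull_down and not pull_up:
        return 0
    raise ValueError("output is floating or shorted")


def nand2(a: int, b: int) -> int:
    # NMOS in series to ground, PMOS in parallel to power.
    return cmos_output(parallel(pmos(a), pmos(b)), series(nmos(a), nmos(b)))


def nor2(a: int, b: int) -> int:
    # NMOS in parallel to ground, PMOS in series to power.
    return cmos_output(series(pmos(a), pmos(b)), parallel(nmos(a), nmos(b)))


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, nand2(a, b), nor2(a, b))
```

The same transistor pairs yield either gate; only the programmed connections to power, ground, and each other differ.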

Structure of the Transistor Pair
The series connections across the lines are broken where
necessary by permanently holding a transistor in its
OFF state. A wide range of logic gates can be
implemented by the transistor lines and the
interconnection patterns.

Contd..
• The FPGA currently offered by Crosspoint
Solutions has a total logic capacity of 4200
gates.
• The chip has 256 rows of transistor pairs, plus
an additional 64 rows of multiplexer-like
structures.
• With their row-based architecture, anti-fuse
programming technology, and multiplexers, the
Crosspoint FPGAs are most similar to
Actel FPGAs.
ALGOTRONIX CAL-1024
• This design has a two-dimensional mesh array
structure which resembles the gate array "sea
of gates" architecture.
• Like the Xilinx architecture, Algotronix uses
static RAM programming technology to
specify the function performed by each logic
cell and to control the switching of
connections between cells.
• The CAL1024 design contains 1024 identical
logic cells arranged in a 32 × 32 matrix.
Contd..
• The design is considered to be a mesh-connected
architecture since each cell is directly connected
to its nearest north, south, east, and west
neighbors.
• In addition to these direct connections, two global
interconnect signals are routed to each cell to
distribute clock and other "low skew
requirement" control signals.
• The figure in the next slide shows the basic array
architecture, indicating both nearest-neighbor and
global connections to the logic cells.
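The nearest-neighbor connectivity of a 32 × 32 mesh can be sketched as a coordinate calculation. This sketch assumes that edge cells simply have fewer neighbors inside the array (how the real device treats its edges is not modelled), and the function name neighbours is hypothetical.

```python
SIZE = 32  # the CAL1024 array is 32 x 32 cells


def neighbours(row: int, col: int, size: int = SIZE) -> dict:
    """Return the in-array north/south/east/west neighbours of a cell."""
    if not (0 <= row < size and 0 <= col < size):
        raise ValueError("cell lies outside the array")
    candidates = {
        "north": (row - 1, col),
        "south": (row + 1, col),
        "east": (row, col + 1),
        "west": (row, col - 1),
    }
    # Assumption: positions beyond the array edge are simply dropped.
    return {
        side: (r, c)
        for side, (r, c) in candidates.items()
        if 0 <= r < size and 0 <= c < size
    }


def direct_links(size: int = SIZE) -> int:
    """Count cell-to-cell links; each link is seen from both of its ends."""
    ends = sum(len(neighbours(r, c, size))
               for r in range(size) for c in range(size))
    return ends // 2


if __name__ == "__main__":
    print(neighbours(0, 0))
    print(neighbours(16, 16))
    print(direct_links())
```

Because every connection is local, a signal that must cross the array passes through many intermediate cells, which is the usual trade-off of a purely mesh-connected design.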

Basic Array Architecture

Contd..
• The basic building block of the Algotronix design
is a configurable cell containing multiplexers and
a function unit.
• As indicated in the figure, the function unit is
preceded by multiplexers which select the source
for the X1 and X2 inputs.
• The function unit is capable of generating any
logic function of the two inputs, or of operating as
a D-type latch.
• There are four additional multiplexers which
select the function output or one of the external
inputs for routing to each of the four outputs
(north, south, east, and west).
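The cell described above can be sketched behaviourally as two input multiplexers, a 4-entry truth table for the function unit, and four output multiplexers. The class name CalCell and its configuration fields are hypothetical, the D-type latch mode is omitted, and the exact set of sources each real multiplexer can select is not modelled.

```python
SIDES = ("north", "south", "east", "west")


class CalCell:
    """Toy model of a cell: input muxes, function unit, output muxes."""

    def __init__(self, func_table, x1_source, x2_source, output_select):
        if len(func_table) != 4:
            raise ValueError("a two-input function needs 4 truth-table bits")
        self.func_table = list(func_table)   # any of the 16 functions
        self.x1_source = x1_source           # side feeding input X1
        self.x2_source = x2_source           # side feeding input X2
        self.output_select = output_select   # per side: "function" or a side

    def evaluate(self, inputs: dict) -> dict:
        # Input multiplexers pick which neighbours drive X1 and X2.
        x1 = inputs[self.x1_source]
        x2 = inputs[self.x2_source]
        f = self.func_table[(x1 << 1) | x2]
        # Output multiplexers route either f or an external input
        # to each of the four sides.
        outputs = {}
        for side in SIDES:
            choice = self.output_select[side]
            outputs[side] = f if choice == "function" else inputs[choice]
        return outputs


if __name__ == "__main__":
    xor_table = [0, 1, 1, 0]
    cell = CalCell(
        xor_table, "north", "west",
        {"north": "function", "south": "north",
         "east": "function", "west": "east"},
    )
    print(cell.evaluate({"north": 1, "south": 0, "east": 0, "west": 0}))
```

Routing an external input straight to an output is how a cell acts as a wire, which matters in a mesh where cells provide the only path between distant neighbors.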