
International Journal of Computer Applications (0975 – 8887)

Volume 94 – No.17, May 2014

ASIC Implementation of 32 and 64 bit Floating Point ALU using Pipelining

Dave Omkar R.
Student
VIT University
Vellore, TamilNadu

Aarthy M.
Assistant Professor
VIT University
Vellore, TamilNadu

ABSTRACT
The 32-bit and 64-bit floating point Arithmetic Logic Unit (ALU) is a main part in the design of computers. The aim of this paper is to achieve higher performance through pipelining compared to a non-pipelined design. This ALU includes all the arithmetic and logical operations. The pipelined modules are independent of each other. The novelty is the design of pipelined modules for left shift, right shift, increment, decrement and the logical operations. The arithmetic pipelined modules are also modified. These modules use the single and double precision IEEE 754 standard formats to carry out the required operation. All modules in the ALU design are realized using Verilog HDL. Test vectors are applied to the inputs of the floating point ALU to verify its functionality. The simulation is carried out with the ModelSim 6.5b simulator and RTL synthesis is done with the RTL Compiler tool in Cadence. Physical design of this architecture is done with the Cadence SoC Encounter tool in 180 nm technology.

General Terms
Algorithm, Floating point number.

Keywords
ALU, ASIC, IEEE 754, LSB, MSB, Verilog HDL.

1. INTRODUCTION
Floating point numbers are used when numbers need to be very large or very small [1]. Floating point representation has the advantages of resolution and accuracy compared to fixed point number representation. Numbers in floating point are represented in the form of a bit string. This bit string is a combination of a sign bit, a mantissa and an exponent. This representation is defined by the IEEE 754 standard [2]. The single precision floating point format is shown in Fig 1 [2].

Bit 31    Bits 30-23    Bits 22-0
Sign      Exponent      Mantissa
1 bit     8 bit         23 bit

Fig 1: Basic IEEE 754 standard format for single precision

For the double precision IEEE 754 standard, the difference from Fig 1 is that the exponent is 11 bits wide and the mantissa is 52 bits wide.

The format for single precision is written below.

$N = (-1)^{s} \times 2^{e-127} \times (1.f)$    (1)

where $s$ is the sign bit, $e$ is the biased exponent with $0 < e < 255$, and $f$ is the 23-bit mantissa field.

For double precision, the difference is in the exponent bias: it is 1023 instead of 127, and the range of $e$ is $0 < e < 2047$.

The ALU is a digital module that performs all the arithmetic and logical operations. It is an important block in a CPU. Depending on the selection bits, the ALU executes the appropriate operation and gives the result. Along with the ALU output there are also status bits which represent exceptions in the arithmetic operations. They are result zero, overflow, underflow, divide by zero and normal operation. Pipelining is a technique to produce outputs faster and reduce the delay in the design. It allows many operations to proceed in parallel. Pipelining shortens the critical path in the circuit and hence increases the speed.

Generally in pipelining, each stage performs its operation at each clock pulse and concurrently the output of the previous stage is given to the next stage, so no clock pulse is wasted [3]. Implementing a pipelined architecture of the floating point ALU gives faster results. The proposed 32 and 64 bit floating point ALUs carry out 16 different arithmetic and logical operations with pipelining [4]. The modified addition, multiplication and division algorithms for floating point numbers are designed using Verilog HDL. The proposed left shift, right shift, increment, decrement and all logical modules are also implemented for single precision and double precision.

In the proposed pipelined modules, there are at most 6 stages as shown in Fig 2. So the first output appears after 6 clock pulses and the second output appears at the 7th clock pulse. This reduces the number of clock pulses needed for a sequence of operations.
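
As an illustration of the single precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 mantissa bits), the following Python sketch splits a 32-bit word into its fields and evaluates the value of a normalized number. It is a software sketch only; the function names `unpack32` and `value32` are hypothetical and are not taken from the paper, whose modules are written in Verilog HDL.

```python
# Software sketch of the single precision field layout; not the paper's Verilog.

def unpack32(word):
    """Split a 32-bit IEEE 754 word into (sign, exponent, mantissa field)."""
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    return sign, exponent, fraction


def value32(word):
    """Value of a normalized number (0 < e < 255): (-1)^s * 2^(e-127) * 1.f"""
    sign, exponent, fraction = unpack32(word)
    if not 0 < exponent < 255:
        raise ValueError("only normalized numbers are handled in this sketch")
    significand = 1.0 + fraction / 2**23   # add the implied leading 1
    return (-1.0) ** sign * 2.0 ** (exponent - 127) * significand


print(unpack32(0x3F800000))   # fields of the word that encodes 1.0
print(value32(0xC0B00000))    # the word used for -5.5 in Section 4
```

For double precision the same idea would apply with an 11-bit exponent, a 52-bit mantissa field and a bias of 1023.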


Fig 2: Stages in Pipelining

2. BACKGROUND
The main target of the previous work was to implement a 16 bit floating point ALU using pipelined modules in VHDL [1]. It can be viewed in Fig 3. The sub-objectives were to design pipelined addition and subtraction. The operations were limited to only four arithmetic operations: addition, subtraction, multiplication and division [4].

The addition, subtraction, multiplication and division were done by using the arithmetic operators. The previous work has been done for 16 and 32 bit floating point ALUs [4]. The maximum number of pipeline stages was 4.

Fig 3: Top level view of the ALU design

3. DESIGN AND METHODOLOGY
The new architecture of the 32 bit and 64 bit floating point ALU with pipelined modules has been implemented. It contains all the arithmetic as well as logical operations. These modules have 4 or more pipelined stages.

3.1 Modified Top Level architecture of 32-bit ALU
As shown in Fig 4, the modified top level architecture of the 32-bit floating point ALU consists of 3 levels. In the first level there are 3 demultiplexers: the first selects the first operand, the second selects the second operand, and the last selects the clock. In the second level there are 16 blocks covering all the arithmetic and logical operations, selected by the selection bits as shown in TABLE 1. Each block has two outputs, ALU out and status. The third level consists of 2 multiplexers. The first multiplexer selects the ALU output of the chosen operation and the second multiplexer selects the status bits. These status bits are shown in TABLE 2.

Fig 4: Modified top level view of 32 bit Floating Point ALU

Table 1. Selection of ALU operation

No.   Selection bits[3:0]   ALU Operation
1     0000                  Addition
2     0001                  Subtraction
3     0010                  Multiplication
4     0011                  Division
5     0100                  Reciprocal
6     0101                  Left Shift
7     0110                  Right Shift
8     0111                  Increment
9     1000                  Decrement
10    1001                  AND
11    1010                  OR
12    1011                  NAND
13    1100                  NOR
14    1101                  XOR
15    1110                  XNOR
16    1111                  NOT
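
The selection scheme of Table 1 can be illustrated with a small software model. The sketch below is written in Python rather than the paper's Verilog HDL, and the names `ALU_OPERATIONS`, `STATUS_CODES` and `select_operation` are hypothetical. It only models the lookup performed by the selection bits and the status encoding of Table 2, not the pipelined datapath.

```python
# Lookup model of the 4-bit selection field (Table 1) and the 3-bit status
# field (Table 2). The list index equals the value of Selection bits[3:0].
ALU_OPERATIONS = [
    "Addition", "Subtraction", "Multiplication", "Division",
    "Reciprocal", "Left Shift", "Right Shift", "Increment",
    "Decrement", "AND", "OR", "NAND", "NOR", "XOR", "XNOR", "NOT",
]

STATUS_CODES = {
    0b000: "Result zero",
    0b001: "Overflow",
    0b010: "Underflow",
    0b011: "Normal Operation",
    0b100: "Divide by Zero",
}


def select_operation(selection_bits):
    """Return the operation name for a 4-bit selection value (0..15)."""
    if not 0 <= selection_bits <= 0b1111:
        raise ValueError("selection bits must fit in 4 bits")
    return ALU_OPERATIONS[selection_bits]


for code in (0b0000, 0b0100, 0b1111):
    print(format(code, "04b"), select_operation(code))
```

In hardware the same choice is made by the demultiplexers and multiplexers of Fig 4; the list lookup here plays the role of the 16:1 multiplexer.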


Table 2. Status bits and status

No.   Status bits[2:0]   Status
1     000                Result zero
2     001                Overflow
3     010                Underflow
4     011                Normal Operation
5     100                Divide by Zero

3.2 Modified Top level architecture of 64-bit ALU
The difference between the 32 and 64 bit floating point ALUs is in the size of the operands. For the 64 bit floating point ALU, the operands A and B are 64 bits wide, because this ALU uses the double precision IEEE 754 standard format. So the top view of the 64 bit floating point ALU is constructed as in Fig 4, but the inputs and output are 64 bits instead of 32 bits.

3.3 Modified 32-bit and 64-bit Pipelined floating point Addition/Subtraction module
The algorithm and architecture for the 32 and 64 bit pipelined floating point addition/subtraction have been designed.

3.3.1 Modified Addition/Subtraction Algorithm
The flowchart in Fig 5 consists of the following steps.
-Start.
-Take two floating point numbers in IEEE 754 standard.
-Separate mantissa, exponent and sign bits and add the implied bit in the mantissa.
-Compare the exponents. If e1=e2, do not take the difference or shift; just pass the value of exponent and mantissa. If e1>e2 or e2>e1, take the difference of the exponents and left shift the smaller mantissa by the difference.
-Depending upon the above conditions take one mantissa, exponent and sign bit.
-XOR the sign bits. If the resulting bit is 1, subtract the mantissas; otherwise add the mantissas.
-Check whether the MSB of the result mantissa is 1. If yes, normalization is not needed. If no, left shift the mantissa until the MSB becomes 1 and decrement the exponent by 1.
-Compute the final mantissa and drop the implied bit, then combine the resultant mantissa, exponent and sign bit to form the IEEE 754 format.
-Terminate.

Fig 5: Flowchart for modified addition/subtraction algorithm

For 64 bit add/sub, the procedure is the same but the mantissa and exponent are 52 bits and 11 bits wide. For 32 and 64 bit subtraction, the change is in the operation on the mantissas, i.e. addition becomes subtraction and vice versa.

3.3.2 Modified Pipelined Addition/Subtraction architecture
For this addition/subtraction algorithm, a new 32 bit pipelined addition/subtraction module has been implemented. D flip-flops (D-FF) are used for the pipelining. It is shown in Fig 6.

Fig 6: 32-bit pipelined add/sub architecture

The working of the above architecture is explained below.

3.3.2.1 Unpack module
This module separates the mantissa, exponent and sign bit of the floating point numbers. It also adds the implied bit to the mantissa.

3.3.2.2 Comparator and Barrel Shifter
This module compares the exponents of the operands and shifts the mantissa of the smaller operand by the difference of the exponents.

3.3.2.3 Add/sub module
This module adds or subtracts the mantissas depending upon the signs of the operands. The choice is determined by the XOR of the two sign bits of the operands. A 24-bit ripple carry adder for single precision and a 53-bit ripple carry adder for double precision are used for addition because of their simplicity. For subtraction, a ripple borrow subtractor is used. It uses full subtractors instead of full adders.

3.3.2.4 Normalize module
In this module, the final result is normalized. If the MSB of the addition result is 0, the mantissa is left shifted until the MSB becomes 1, and the exponent is decremented by 1 for each shift.

3.3.2.5 Exception Checker
In this module, the exceptions overflow, underflow, result zero and normal operation are checked by examining the mantissa and exponent. If the exponent is 255, the "Overflow" exception is raised. If the exponent is 1, the "Underflow" exception is raised. If both operands are zero, the "Result zero" exception is raised. Otherwise "Normal Operation" is raised.

3.3.2.6 Packer
The Packer module combines the resultant sign bit, exponent and mantissa. It drops the implied bit from the resultant mantissa.
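
The addition/subtraction flow of Fig 5 can be sketched as a small software model. The Python code below is a sketch under stated assumptions, not the paper's Verilog implementation. It handles only zero and normalized single precision operands, and it truncates instead of rounding (the paper lists rounding logic as future work). Alignment is done by right shifting the mantissa of the operand with the smaller exponent, which is the conventional choice, whereas the flowchart labels this step as a left shift. The overflow and underflow thresholds are chosen for illustration. The names `unpack`, `fp_add` and `fp_sub` are hypothetical.

```python
# Software model of the add/sub flow for zero and normalized single precision
# operands. Truncates instead of rounding; NaN, infinity and denormals are
# not handled.

def unpack(word):
    """Return (sign, exponent, 24-bit mantissa with the implied bit)."""
    sign = word >> 31
    exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    if exponent != 0:
        mantissa |= 1 << 23              # add the implied bit
    return sign, exponent, mantissa


def fp_add(a, b):
    s1, e1, m1 = unpack(a)
    s2, e2, m2 = unpack(b)
    # Align (assumption): right shift the mantissa of the smaller-exponent operand.
    if e1 >= e2:
        m2 >>= e1 - e2
        exponent = e1
    else:
        m1 >>= e2 - e1
        exponent = e2
    if s1 ^ s2:                          # signs differ: subtract the mantissas
        if m1 >= m2:
            mantissa, sign = m1 - m2, s1
        else:
            mantissa, sign = m2 - m1, s2
    else:                                # signs equal: add the mantissas
        mantissa, sign = m1 + m2, s1
    if mantissa == 0:
        return 0, "Result zero"
    if mantissa >> 24:                   # carry out of bit 23: shift right once
        mantissa >>= 1
        exponent += 1
    while not (mantissa >> 23) & 1:      # normalize: left shift until MSB is 1
        mantissa <<= 1
        exponent -= 1
    # Illustrative thresholds (assumption of this sketch).
    if exponent >= 255:
        status = "Overflow"
    elif exponent <= 0:
        status = "Underflow"
    else:
        status = "Normal Operation"
    # Pack: drop the implied bit and combine sign, exponent and mantissa.
    word = (sign << 31) | ((exponent & 0xFF) << 23) | (mantissa & 0x7FFFFF)
    return word, status


def fp_sub(a, b):
    return fp_add(a, b ^ (1 << 31))      # subtraction flips the sign of b


result, status = fp_sub(0x41740000, 0xC0B00000)   # 15.25 - (-5.5)
print(format(result, "08x"), status)
```

The last two lines apply the model to the subtraction operands quoted in Section 4.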


The modified 64-bit pipelined floating point add/sub module is implemented by changing the widths of the mantissa and exponent in the operands.

3.4 Modified 32-bit and 64-bit Pipelined floating point Multiplication module
The algorithm and architecture for the 32 and 64 bit pipelined floating point multiplication have been implemented.

3.4.1 Modified Multiplication Algorithm
It is shown in Fig 7. The flowchart consists of the following steps.
-Start.
-Take two floating point numbers in IEEE 754 standard.
-Separate mantissa, exponent and sign bits and add the implied bit in the mantissa.
-Calculate the sign of the result by XOR operation, add the exponents and subtract the bias, and multiply the mantissas of the two operands.
-Check whether the MSB of the result mantissa is 1. If yes, normalization is not needed. If no, left shift the mantissa until the MSB becomes 1 and decrement the exponent by 1.
-Check the exception and raise the status bits.
-Compute the final mantissa and drop the implied bit, then combine the resultant mantissa, exponent and sign bit to form the IEEE 754 format.
-Terminate.

Fig 7: Flowchart for modified multiplication algorithm

For 64 bit multiplication, the procedure is the same but the mantissa and exponent are 52 bits and 11 bits wide, instead of 23 bits and 8 bits in 32 bit multiplication.

3.4.2 Modified Pipelined Multiplication architecture

Fig 8: 32-bit pipelined multiplication architecture

The working of Fig 8 is explained below.

3.4.2.1 Sign Calculation module
This module calculates the sign of the result by the XOR of the two sign bits of the operands. If the resultant sign is 0, the result is positive; otherwise it is negative.

3.4.2.2 Exponent adder with bias subtraction
It computes the result exponent by adding the exponents of the two operands and subtracting the bias of (01111111)b.

3.4.2.3 Mantissa Multiplier module
This module multiplies the two mantissas. The multiplication should be 24x24 in single precision and 53x53 in double precision. But to reduce the area, 12x12 multiplication has been used in the implementation of the mantissa multiplier module. A carry save multiplier has been used because it needs fewer half adders and full adders.

The 64 bit pipelined floating point multiplication module is implemented by changing the widths of the mantissa and exponent in the operands.

3.5 Modified 32-bit and 64-bit Pipelined floating point Division module

3.5.1 Modified Division Algorithm
It is shown in Fig 9. The flowchart consists of the following steps.
-Start.
-Take two floating point numbers in IEEE 754 standard.
-Separate mantissa, exponent and sign bits and add the implied bit to the mantissa.
-Calculate the sign of the result by XOR operation, make the first mantissa 2n bits by padding 0s to the LSB, and subtract the exponents and add the bias.
-Compare the mantissas. If m1<m2, no shift is needed. If m1>m2, right shift m1 by 1 and decrement the exponent by 1.
-Divide m1 by m2 and compute the result mantissa.
-Check whether the MSB of the result mantissa is 1. If yes, normalization is not needed. If no, left shift the mantissa until the MSB becomes 1 and decrement the exponent by 1.
-Check the exception and raise the status bits.
-Compute the final mantissa and drop the implied bit, then combine the resultant mantissa, exponent and sign bit to form the IEEE 754 format.
-Terminate.

Fig 9: Flowchart for modified division algorithm
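
The multiplication and division flows of Fig 7 and Fig 9 share the same sign and exponent handling and differ only in the mantissa operation. The Python sketch below models both for zero and normalized single precision operands. It is a sketch under stated assumptions rather than the paper's Verilog design: results are truncated instead of rounded, the mantissa product is computed with a full 24x24 multiplication rather than the 12x12 carry save multiplier of the paper, the quotient is computed by a textbook restoring division, and normalization is folded into a single conditional shift. The names `unpack`, `pack`, `restoring_divide`, `fp_mul` and `fp_div` are hypothetical.

```python
# Software model of the multiply and divide flows for zero and normalized
# single precision operands. Results are truncated, not rounded.
BIAS = 127


def unpack(word):
    """Return (sign, exponent, 24-bit mantissa with the implied bit)."""
    sign = word >> 31
    exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    if exponent != 0:
        mantissa |= 1 << 23                  # add the implied bit
    return sign, exponent, mantissa


def pack(sign, exponent, mantissa):
    """Drop the implied bit, combine the fields and report a status."""
    if exponent >= 255:                      # illustrative thresholds (assumption)
        status = "Overflow"
    elif exponent <= 0:
        status = "Underflow"
    else:
        status = "Normal Operation"
    word = (sign << 31) | ((exponent & 0xFF) << 23) | (mantissa & 0x7FFFFF)
    return word, status


def restoring_divide(dividend, divisor, bits):
    """Bit-serial restoring division; returns (quotient, remainder)."""
    remainder, quotient = 0, 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # shift in next bit
        remainder -= divisor                                  # trial subtraction
        if remainder < 0:
            remainder += divisor                              # restore
            quotient <<= 1                                    # quotient bit 0
        else:
            quotient = (quotient << 1) | 1                    # quotient bit 1
    return quotient, remainder


def fp_mul(a, b):
    s1, e1, m1 = unpack(a)
    s2, e2, m2 = unpack(b)
    sign = s1 ^ s2                           # sign by XOR
    if m1 == 0 or m2 == 0:
        return sign << 31, "Result zero"
    exponent = e1 + e2 - BIAS                # add exponents, subtract bias
    product = m1 * m2                        # 48-bit product
    if product >> 47:                        # product in [2, 4): one extra shift
        mantissa = product >> 24
        exponent += 1
    else:
        mantissa = product >> 23
    return pack(sign, exponent, mantissa)


def fp_div(a, b):
    s1, e1, m1 = unpack(a)
    s2, e2, m2 = unpack(b)
    sign = s1 ^ s2
    if m2 == 0:
        return 0, "Divide by Zero"
    if m1 == 0:
        return sign << 31, "Result zero"
    exponent = e1 - e2 + BIAS                # subtract exponents, add bias
    quotient, _ = restoring_divide(m1 << 24, m2, 48)  # dividend padded with 0s
    if quotient >> 24:                       # m1 >= m2: quotient in [1, 2)
        mantissa = quotient >> 1
    else:                                    # m1 < m2: quotient in (0.5, 1)
        mantissa = quotient
        exponent -= 1
    return pack(sign, exponent, mantissa)


print([format(w, "08x") for w in (fp_mul(0x40400000, 0x3FC00000)[0],
                                  fp_div(0x41100000, 0x40400000)[0])])
```

The final line multiplies 3.0 by 1.5 and divides 9.0 by 3.0, two cases whose results are exactly representable, so truncation does not affect them.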


3.5.2 Modified Pipelined Division architecture

Fig 10: 32 bit pipelined division architecture

The working of Fig 10 is explained below.

3.5.2.1 Divide Aligner
This module aligns the mantissas to get the desired result. As mentioned in the division algorithm, if the first mantissa is greater than the second mantissa, the Division Aligner shifts the first mantissa by 1 and decrements the exponent by 1. It also sets the division flag for exceptions such as divide by zero, result zero and normal operation.

3.5.2.2 Divide Module
This module divides the mantissas by the restoring method. It is explained below.
-First shift left the dividend by 1.
-Subtract the divisor. If the carry is 1, do not restore. If the carry is 0, i.e. the answer is negative, then restore by adding the divisor back.
-Place the carry as the LSB of the intermediate answer.
-Repeat this procedure for n iterations, where n is the number of bits in the divisor. Here n is 24 bits for single precision and 53 bits for double precision.

3.5.2.3 Exponent Subtractor
It subtracts the exponents of the two operands and adds the bias of 127 in single precision and 1023 in double precision.

3.6 The Proposed 32 and 64-bit Floating Point Reciprocal Architecture
For the reciprocal architecture, the algorithm is the same as the division algorithm. The only difference is that the first mantissa is always 1. The reciprocal architecture is shown in Fig 11.

Fig 11: 32-bit proposed pipelined Reciprocal architecture

3.6.1 2:1 Multiplexer
This multiplexer selects which of the two operands is to be reciprocated.

3.6.2 Sign passer
This module just passes the sign of the operand to the next module.

3.6.3 Reciprocal Aligner
It aligns the mantissa of the operand. In detail, it compares the mantissa of 1 with the mantissa of the given operand and aligns that mantissa according to the division algorithm given in Section 3.5.1.

3.6.4 Exponent Subtractor
It subtracts the exponent of the operand from the exponent of 1 and adds the bias of 127 in single precision and 1023 in double precision.

3.6.5 Divide Module
The Divider module divides the mantissa of 1 by the mantissa of the operand. These mantissas are taken from the previous module, the Reciprocal Aligner, which aligns the mantissa according to the conditions mentioned in the division algorithm.

3.7 The Proposed 32 and 64-bit Floating Point Left shift and Right Shift architecture

Fig 12: 32-bit proposed pipelined Left Shift architecture

As shown in Fig 12, the Unpack module, Exception Checker and Packer module are the same as described in Section 3.3.2. The new block is the Left Shifter. It is explained below.

3.7.1 Left Shifter
This module left shifts the floating point operand by one bit position. In floating point form this is done by incrementing the exponent by 1.
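
A left shift by one position doubles a number and a right shift halves it, so for a normalized floating point operand the shift can be modelled by changing only the exponent field. The Python sketch below illustrates this for single precision. The function names `fp_left_shift`, `fp_right_shift` and `to_float` are hypothetical, and exception handling is reduced to a simple range check, which is an assumption of this sketch.

```python
import struct


def fp_left_shift(word):
    """Double a normalized single precision word by incrementing its exponent."""
    exponent = (word >> 23) & 0xFF
    if not 0 < exponent < 254:
        raise ValueError("operand outside the range handled by this sketch")
    return word + (1 << 23)      # exponent field starts at bit 23


def fp_right_shift(word):
    """Halve a normalized single precision word by decrementing its exponent."""
    exponent = (word >> 23) & 0xFF
    if not 1 < exponent < 255:
        raise ValueError("operand outside the range handled by this sketch")
    return word - (1 << 23)


def to_float(word):
    """Interpret a 32-bit pattern as a Python float, for checking only."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


print(to_float(fp_left_shift(0x41740000)))    # 15.25 doubled
print(to_float(fp_right_shift(0x41740000)))   # 15.25 halved
```

The sign and mantissa fields pass through unchanged, which matches the role of the Unpack and Packer modules around the shifter block.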


For the Right Shift architecture, the difference is that a Right Shifter replaces the Left Shifter of Fig 12. In the Right Shifter, the exponent is decremented by 1 to get the operand right shifted.

3.8 The Proposed 32 and 64-bit Floating Point Increment and Decrement architecture

Fig 13: 32-bit proposed pipelined Increment architecture

The working of Fig 13 is explained below.

3.8.1 Comparator and shifter
This module compares the exponent of the operand with 01111111 in single precision and 01111111111 in double precision, which are the exponents of 1, because incrementing a number means adding 1 to it. The single precision IEEE 754 format of 1 is 3F800000 and the double precision format is 3FF0000000000000. The comparator compares these exponents and the shifter shifts the mantissa of 1 by the difference between the exponent of the operand and the exponent of 1.

3.8.2 Mantissa Increment by 1
This module adds or subtracts the mantissa of the operand and the mantissa of 1 depending upon the sign of the first operand.

For the Decrement architecture, the difference is that a Mantissa decrement by 1 block replaces the Mantissa increment by 1 block of Fig 13. In the Decrement architecture, 1 is subtracted from the mantissa instead of added.

3.9 The proposed 32-bit and 64-bit Pipelined Logical modules

Fig 14: 32-bit proposed pipelined Logical AND module

3.9.1 Comparator and shifter
The resultant sign is calculated by the XOR operation of the sign bits of the two operands.

3.9.2 Exponent Passer
This passer passes the value of the output exponent to the next module.

3.9.3 AND Gate
This gate performs the logical AND operation on the mantissas of the two operands.

The logical modules such as OR, NOR, NOT, XOR and XNOR are implemented by replacing the AND gate in Fig 14 with the corresponding gate. The new 64 bit pipelined floating point logical modules for the above operations are implemented by changing the operand size to 64 bits instead of 32 bits.

4. RESULTS AND DISCUSSION
The simulations have been done in ModelSim 6.5 by applying different test vectors to the 32 and 64 bit floating point ALUs with pipelined modules. The simulation results are shown by merging two operations of the ALU. The synthesis results in 180 nm for both ALUs are shown in TABLE 3.

For 32-bit and 64-bit operations of the ALU, the inputs and outputs are in IEEE 754 standard format. For example, addition and subtraction are performed as under.

For Addition,
Operand 1 = (21.43) d = (41ab70a4) h = (40356e147ae147ae) h.
Operand 2 = (7.23) d = (40e75c29) h = (401ceb851eb851ec) h.
Output = (28.67) d = (41E55C29) h = (403cae147ae147ae) h.

For Subtraction,
Operand 1 = (15.25) d = (41740000) h = (402e800000000000) h.
Operand 2 = (-5.5) d = (c0b00000) h = (C016000000000000) h.
Output = (20.75) d = (41a60000) h = (4034c00000000000) h.

4.1 Simulation Results for 32-bit and 64-bit Floating Point ALU

Fig 15: output of 32-bit Floating Point Addition and Subtraction
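
The operand encodings quoted in Section 4 can be cross-checked in software. The Python sketch below uses the standard `struct` module to convert decimal values to their single and double precision IEEE 754 bit patterns. The helper names `to_hex32` and `to_hex64` are hypothetical. The check is shown for the subtraction example, whose values are exactly representable in binary.

```python
import struct


def to_hex32(value):
    """Single precision IEEE 754 bit pattern of value, as 8 hex digits."""
    return struct.pack(">f", value).hex()


def to_hex64(value):
    """Double precision IEEE 754 bit pattern of value, as 16 hex digits."""
    return struct.pack(">d", value).hex()


# Operands and result of the subtraction example: 15.25 - (-5.5) = 20.75
for value in (15.25, -5.5, 15.25 - (-5.5)):
    print(value, to_hex32(value), to_hex64(value))
```

The printed patterns are lowercase hexadecimal and can be compared directly with the values listed in the subtraction example.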


Fig 16: output of 32-bit Floating Point Multiplication and division

Fig 17: output of 32-bit Floating Point Reciprocal

Fig 18: output of 64-bit Floating Point Addition and Subtraction

Fig 19: output of 64-bit Floating Point Multiplication and Division

Fig 20: output of 64-bit Floating Point Reciprocal


4.2 Synthesis results

4.2.1 Report summary of area, power and delay

Table 3. Parameters for 32 bit Floating Point ALU with and without Pipelining

Parameters              32-bit Floating point ALU    32-bit Floating point ALU
                        with Pipelining              without Pipelining
Cells                   41092                        37621
Cell Area (mm2)         0.936860                     0.820796
Power (nW)              213482898.96                 15799143.81
Worst path delay (ns)   0.891                        16.231

Table 4. Parameters for 64 bit Floating Point ALU with and without Pipelining

Parameters              64-bit Floating point ALU    64-bit Floating point ALU
                        with Pipelining              without Pipelining
Cells                   136564                       103096
Cell Area (mm2)         3.1175726                    2.763565
Power (nW)              971665797.7                  773982042.61
Worst path delay (ns)   1.018                        34.323

From the Table 3 and Table 4, it can be summarized that the delay is less for 32 and 64 bit Floating Point ALU with Pipelining compared to without pipelining. The frequencies of operation of 32-bit Floating Point ALU and 64-bit Floating Point ALU with Pipelining are 1.122 GHz and 0.9823 GHz respectively.

4.2.2 RTL Schematic

Fig. 21: RTL Schematic of 32-bit Floating Point ALU with Pipelining

Fig. 22: RTL Schematic of 64-bit Floating Point ALU with Pipelining

4.3 Backend Results

Fig 23: Chip Layout of 32-bit floating point ALU with pipelining

Fig 24: Chip Layout of 64-bit floating point ALU with pipelining
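
The operating frequencies quoted in Section 4.2.1 follow from the worst path delays in Table 3 and Table 4, assuming the clock period equals the worst path delay. The Python sketch below reproduces that arithmetic and also computes the ratio of non-pipelined to pipelined delay. The names `DELAYS_NS` and `max_frequency_ghz` are hypothetical.

```python
# Worst path delays in ns, taken from Table 3 and Table 4.
DELAYS_NS = {
    "32-bit": {"pipelined": 0.891, "non_pipelined": 16.231},
    "64-bit": {"pipelined": 1.018, "non_pipelined": 34.323},
}


def max_frequency_ghz(delay_ns):
    """Clock frequency in GHz if one clock period equals delay_ns."""
    return 1.0 / delay_ns


for name, delays in DELAYS_NS.items():
    freq = max_frequency_ghz(delays["pipelined"])
    ratio = delays["non_pipelined"] / delays["pipelined"]
    print(f"{name}: {freq:.4f} GHz, delay ratio {ratio:.1f}")
```

A delay in nanoseconds inverts directly to a frequency in gigahertz, so no unit conversion factor is needed.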


5. ACKNOWLEDGEMENTS
It is a pleasure to thank Mr. Jaykrishanan P., Asst. Professor from the department of SENSE (School of Electronics Engineering), VIT University, for helping in the project work. His guidance, encouragement and suggestions were helpful from the start to the end of this project.

6. CONCLUSION AND FUTURE WORK
In this paper, the 32-bit and 64-bit floating point ALUs using pipelining are implemented successfully, and they are compared with the 32 and 64 bit floating point ALUs without pipelining with respect to area, power and delay. The simulation with different test vectors is done in ModelSim 6.5b. Rounding logic for the floating point numbers after the required arithmetic and logical operations can be implemented for the 32 and 64 bit floating point ALUs. The power obtained after synthesis can be lowered by different low power techniques. For the complete analysis of the ASIC design, one can do the post-layout simulation (Formal verification) that was left out of this paper.

7. REFERENCES
[1] Rajit Ram Singh, Asish Tiwari, Vinay Kumar Singh, Geetam S Tomar, "VHDL environment for floating point Arithmetic Logic Unit -ALU design and simulation", 2011 International Conference on Communication Systems and Network Technologies.
[2] Kai Hwang, "Advanced Computer Architecture" (book).
[3] ANSI/IEEE Std 754-1985, "IEEE Standard for Binary Floating-Point Arithmetic", IEEE, New York, 1985.
[4] Mamu Bin Ibne Reaz, MEEE, Md. Shabiul Islam, MEEE, Mohd. S. Sulaiman, MEEE, "Pipeline Floating Point ALU Design using VHDL", ICSE2002 Proc. 2002, Penang, Malaysia.
[5] Shao Jie, Ye Ning, Zhang Xiao-Yan, "An IEEE compliant Floating-point Adder with the Deeply Pipelining paradigm on FPGAs", 2008 International Conference on Computer Science and Software Engineering.
[6] Prashant Gurjar, Rashmi Sola Pooja Kansliwal, Mahendra Vucha, "VLSI Implementation of Adders for High Speed ALU".
[7] A. Anand Kumar, "Fundamentals of Digital Circuits" (book).
[8] V. Narasimha Rao, V. Swathi, "Normalization on floating point multiplication using Verilog HDL", International Journal of VLSI and Embedded Systems-IJVES, ISSN: 2249 – 6556.
[9] Poornima M, Shivaraj Kumar Patil, Shivukumar, Shridhar K P, Sanjay H, "Implementation of Multiplier using Vedic Algorithm", International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume-2, Issue-6, May 2013.
[10] Itagi Mahi P. and S. S. Kerur, "Design and Simulation of Floating Point Pipelined ALU Using HDL and IP Core Generator", ISSN 2277 – 4106, ©2013 INPRESSCO.
[11] Sukhmeet Kaur, Suman, Manpreet Singh Manna, Rajeev Agarwal, "VHDL Implementation of Non Restoring Division Algorithm Using High Speed Adder/Subtractor", International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, Vol. 2, Issue 7, July 2013.
[12] Shuchita Pare, Dr. Rita Jain, "32 Bit Floating Point Arithmetic Logic Unit ALU Design and Simulation", IJETECS, Vol-1, Issue 8, December 2012.
[13] Deepti Shrivastava, Rajesh Nema, "Double Precision floating point ALU Implementation using VHDL", International Journal of Advanced Electronics & Communication Systems, approved by CSIR-NISCAIR, ISSN NO: 2277-7318.
[14] V. Vinay Chamkur, Chetana R., "Design and Implementation of IEEE-754 Addition and Subtraction for Floating Point Arithmetic Logic Unit", in Proceedings of International Conference on Computer Science, Information and Technology, Pune, ISBN 978-93-81693-83-4, 23rd June 2012.
[15] Surendra Singh Rajpoot, Nidhi Maheshwari, D.S. Yadav, "Design and Implementation of efficient 32-bit floating Point multiplier using verilog", International Journal of Engineering and Computer Science, Vol-2, Issue 6, June 2013, Page no. 2098-2101.
[16] Samir Palnitkar, "Verilog HDL: A Guide to Digital Design and Synthesis", second edition (book).
[17] Cadence, "GUI Guide for Encounter® RTL Compiler", user manual, Product Version 6.1, June 2006.
[18] Cadence, "Using Encounter® RTL Compiler", user manual, Product Version 9.1, September 14, 2009.
[19] Website: http://babbage.cs.qc.cuny.edu/IEEE-754.old/32bit.html
[20] Website: http://www.academic.marist.edu/~jzbv/architecture/MultiplicationDivisionFP.htm

IJCATM : www.ijcaonline.org

6 pages
FPGA Selection: LTC2387-18 S.No Pin - Name Pin - No. - ADC Mode Purpose
No ratings yet
FPGA Selection: LTC2387-18 S.No Pin - Name Pin - No. - ADC Mode Purpose
6 pages
6.hardware Software Codesign Ijrect
No ratings yet
6.hardware Software Codesign Ijrect
6 pages
Verilog Imp...
No ratings yet
Verilog Imp...
105 pages
Advanced VLSI Architecture Design For Emerging Digital Systems
No ratings yet
Advanced VLSI Architecture Design For Emerging Digital Systems
78 pages
FPGA Training: by Ushasri Merugu 21 Dec 2012
No ratings yet
FPGA Training: by Ushasri Merugu 21 Dec 2012
5 pages
Designing Finite State Machines (FSM) Using Verilog
No ratings yet
Designing Finite State Machines (FSM) Using Verilog
8 pages
System Verilog Lecture
No ratings yet
System Verilog Lecture
82 pages
Basic FPGA Architectures: Altera Xilinx
No ratings yet
Basic FPGA Architectures: Altera Xilinx
8 pages
Prac. 1 Keil Simulator
No ratings yet
Prac. 1 Keil Simulator
19 pages
Lec20 RTL Design
No ratings yet
Lec20 RTL Design
40 pages
Altera JTAG-to-Avalon-MM Tutorial: D. W. Hawkins (Dwh@ovro - Caltech.edu) March 14, 2012
No ratings yet
Altera JTAG-to-Avalon-MM Tutorial: D. W. Hawkins (Dwh@ovro - Caltech.edu) March 14, 2012
45 pages
Lab Report Fpga
No ratings yet
Lab Report Fpga
34 pages
Verilog HDL Introduction: Textbook
No ratings yet
Verilog HDL Introduction: Textbook
41 pages
Pulpissimo: Datasheet: The Pulp Team
No ratings yet
Pulpissimo: Datasheet: The Pulp Team
101 pages
Mediotek Health Systems PVT Ltd. Chennai
No ratings yet
Mediotek Health Systems PVT Ltd. Chennai
2 pages
Introduction To Quartus II 9.1 Web Edition
No ratings yet
Introduction To Quartus II 9.1 Web Edition
7 pages
VLSI Lab Manual
No ratings yet
VLSI Lab Manual
117 pages
Buchblock
No ratings yet
Buchblock
103 pages
Application-Specific Integrated Circuit ASIC A Complete Guide
From Everand
Application-Specific Integrated Circuit ASIC A Complete Guide
Gerardus Blokdyk
No ratings yet
Floating Point Multiplier With The Use of Alu
No ratings yet
Floating Point Multiplier With The Use of Alu
4 pages
32 Bit Floating Point ALU
0% (1)
32 Bit Floating Point ALU
7 pages
Circular Data Correlation PDF
No ratings yet
Circular Data Correlation PDF
24 pages
8 SQL Data Types in Sap Hana
No ratings yet
8 SQL Data Types in Sap Hana
8 pages
Unit-1_Slides_COA_updated
No ratings yet
Unit-1_Slides_COA_updated
66 pages
Cp-Ii (Java) Notes by Deepak Gaikar
No ratings yet
Cp-Ii (Java) Notes by Deepak Gaikar
59 pages
05-TYPES-OF-PIPELINING-
No ratings yet
05-TYPES-OF-PIPELINING-
56 pages
Reference For XnumbersXla v6p0p5p2
No ratings yet
Reference For XnumbersXla v6p0p5p2
18 pages
Band Math
No ratings yet
Band Math
10 pages
Goldscmidt Algo
No ratings yet
Goldscmidt Algo
4 pages
Basic Elements of C++ PDF
No ratings yet
Basic Elements of C++ PDF
12 pages
Design and Implementation of Single Precision Pipelined Floating Point Co-Processor
No ratings yet
Design and Implementation of Single Precision Pipelined Floating Point Co-Processor
4 pages
Migrating From DB2 To PostgreSQL - What You Should Know - Severalnines
No ratings yet
Migrating From DB2 To PostgreSQL - What You Should Know - Severalnines
13 pages
Computer Architecture CS F342 Ca-Lect7
No ratings yet
Computer Architecture CS F342 Ca-Lect7
11 pages
Representing Geography: Geographic Information Systems and Science, 2nd Edition
No ratings yet
Representing Geography: Geographic Information Systems and Science, 2nd Edition
22 pages
Apiref
No ratings yet
Apiref
704 pages
Unit - Ii Arithmetic For Computers
No ratings yet
Unit - Ii Arithmetic For Computers
28 pages
IEEE Standard 754 Floating Point Numbers
No ratings yet
IEEE Standard 754 Floating Point Numbers
7 pages
MATLAB Programming Fundamentals-MathWorks (2023)
No ratings yet
MATLAB Programming Fundamentals-MathWorks (2023)
1,602 pages
MD Nastran 2006 DMAP Programmer's Guide
100% (1)
MD Nastran 2006 DMAP Programmer's Guide
1,848 pages
MC6839 Floating-Point ROM Manual PDF
No ratings yet
MC6839 Floating-Point ROM Manual PDF
94 pages
COA - Unit2 Floating Point Arithmetic 3
No ratings yet
COA - Unit2 Floating Point Arithmetic 3
19 pages
LabVIEW Workbook v1.2
No ratings yet
LabVIEW Workbook v1.2
39 pages
4.4_1 New Floating Point.pptx
No ratings yet
4.4_1 New Floating Point.pptx
22 pages
Floating Point
No ratings yet
Floating Point
33 pages
MIPS Green Sheet
No ratings yet
MIPS Green Sheet
2 pages
Java
No ratings yet
Java
309 pages
Data Types in Java Notes
No ratings yet
Data Types in Java Notes
4 pages
Chapter IV Computer Arithmetic
No ratings yet
Chapter IV Computer Arithmetic
133 pages



Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits)

Fig 1: Basic IEEE 754 standard format for single precision
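
The field layout in Fig 1 can be illustrated with a minimal Python sketch. It is not part of the paper's Verilog design; the function names `decode_single` and `value_of` are hypothetical and chosen only for illustration. The sketch assumes the standard single precision bias of 127 and handles only normalized numbers (0 < e < 255).

```python
import struct


def decode_single(x):
    """Round x to IEEE 754 single precision and return (sign, exponent, mantissa)."""
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an unsigned int.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF        # 23 bits, fraction without the hidden 1
    return sign, exponent, mantissa


def value_of(sign, exponent, mantissa):
    """Rebuild the value of a normalized number from its three fields."""
    if not 0 < exponent < 255:
        raise ValueError("only normalized numbers (0 < e < 255) are handled")
    return (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)


print(decode_single(-6.5))           # -> (1, 129, 5242880)
print(value_of(1, 129, 5242880))     # -> -6.5
```

Here -6.5 equals -1.625 × 2^2, so the stored exponent is 2 + 127 = 129 and the mantissa field holds 0.625 × 2^23. For double precision the same idea would apply with an 11-bit exponent, a 52-bit mantissa and a bias of 1023.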

multiplexer is used to select status bits. These status bits are

[Pipeline diagram: Stage1 to Stage6 for Addition and Subtraction]

can be viewed in Fig 3. The sub-objectives were to design

The addition, subtraction, multiplication and division were

Table 1. Selection of ALU operation

No.   Selection bits[3:0]   ALU Operation
5     0100                  Reciprocal

3. DESIGN AND METHODOLOGY

3.2 Modified Top level architecture of 64-bit ALU

The difference between 32 and 64 bit floating point ALU is in

3.3 Modified 32-bit and 64-bit Pipelined

3.3.2.5 Exception Checker

3.5 Modified 32-bit and 64-bit Pipelined

Compute the final mantissa & drop

Separate mantissa, exponent

3.5.2 Modified Pipelined Division architecture

Fig 10: 32-bit pipelined division architecture

The working of Fig. 10 is explained below.

3.5.2.2 Divide Module

3.6.2 Sign passer

3.6.4 Exponent Subtractor

architecture is shown in Fig 11.

Fig 13: 32-bit proposed pipelined Increment architecture

Fig 14: 32-bit proposed pipelined Logical AND

3.9.1 Comparator and shifter

3.9.2 Exponent Passer

Fig 16: Output of 32-bit Floating Point Multiplication and Division

Fig 17: Output of 32-bit Floating Point Reciprocal

Fig 18: Output of 64-bit Floating Point Addition and Subtraction

Fig 19: Output of 64-bit Floating Point Multiplication and Division

Fig 20: Output of 64-bit Floating Point Reciprocal

4.2 Synthesis results

Parameters    32-bit Floating    32-bit Floating
Cells         41092              37621
Cell          0.936860           0.820796
Power(nW)     213482898.96       15799143.81
Worst path    0.891              16.231

Parameters    64-bit Floating    64-bit Floating
Cells         136564             103096
Cell          3.1175726          2.763565
Power(nW)     971665797.7        773982042.61
Worst path    1.018              34.323

Fig 24: Chip Layout of 64-bit floating point ALU with

Fig 21: RTL Schematic of 32-bit Floating Point ALU with

5. ACKNOWLEDGEMENTS

[9] Poornima M, Shivaraj Kumar Patil, Shivukumar,
