Area-Delay Efficient and Low-Power Carry Skip adder for high performance computing systems
Abstract—High-performance computing architectures are a primary need for modern devices running compute-intensive applications such as signal and image processing. This paper presents a novel low-complexity carry-skip adder design that provides high speed and consumes low power, making it suitable for the development of high-performance signal processing cores. The proposed adder is derived from reformulated Boolean expressions that avoid redundant computations and reduce the critical path delay. The proposed architectures are coded in VHDL and synthesized with Synopsys Design Compiler using a 65 nm CMOS library. Synthesis results demonstrate that the proposed 32-bit carry skip adder reduces the delay, area and power by 10.2%, 13.6% and 8% respectively compared with the best known carry skip adder. Finally, the proposed 16- and 32-bit adders reduce the area-delay product by 14.4% and 22.5% respectively over the existing adder.

Index Terms—Carry-Skip adder, Low-power, High-speed, Computing architecture, Logic complexity.

I. INTRODUCTION

Low-power and area-efficient processing cores are required by modern portable devices because they run several compute-intensive image/video processing applications. These signal processing cores (e.g. filters) require several adders and multipliers/dividers for their implementation [1]. The adder is the primary component: it is most frequently used as a standalone arithmetic unit and is also used to design other arithmetic units such as subtractors, multipliers and dividers [2]. Among the various adder architectures, the ripple carry adder (RCA) is one of the most area- and power-efficient designs because of its simple structure. However, the performance of the RCA is limited by its long carry propagation delay [2]. To reduce this large carry propagation delay, different methods, namely carry look-ahead (CLA), carry select (CSL), parallel prefix and carry-skip (CSK), are presented in [3]–[11]. In addition, approximate adders have been presented to achieve high performance and low energy consumption for error-tolerant applications [12].

The adder developed on the basis of CLA [3] provides high speed, but its single-level implementation for operands with large bit-width makes it area and power inefficient. Therefore, CLA adders for large bit-widths are designed by cascading 4-bit CLAs, which provides a trade-off between delay and logic complexity. The parallel prefix carry computation method yields many carry computing architectures with different area and speed characteristics [4], [5]. Although the power and area consumption of the CSL adders [6]–[9] is notably higher than that of the RCA, they significantly reduce the delay. On the other hand, adder architectures based on the CSK technique provide higher speed than the RCA with moderate logic complexity [10], [11], [13].

Among these adders, the CSK adder provides high performance with small area overhead. The conventional CSK adder presented in [13] consists of segments designed using a small-sized RCA and CSK logic. The complexity of such a segment is high due to the use of dedicated CSK logic. Further, a CSK logic with reduced delay, obtained by using the compound logic cells AND-OR-Invert (AOI) and OR-AND-Invert (OAI), is presented in [14]. This CSK design provides higher performance with nearly the same implementation complexity as the adder presented in [13]. However, it is possible to develop a carry skip adder architecture with better design metrics by analyzing the different Boolean operations and the critical path present in the state-of-the-art CSK adder. The major contributions are as follows:

• The paper presents an analysis of the logic operations performed in the CSK adder segment and of its path delay.
• Further, a modified logic formulation and the corresponding architecture for the CSK adder segment are proposed.
• Furthermore, a novel high-performance CSK adder architecture with large bit-width, built from the proposed adder segments, is presented.
• Finally, the paper demonstrates a 22.5% reduction in area-delay product (ADP) for the proposed 32-bit adder design over the existing adder.

The rest of the paper is organized as follows. Section II presents an analysis of the state-of-the-art carry skip adder, and Section III presents the low-complexity CSK adder segments. A new high-speed carry skip adder is presented in Section IV, and the synthesis results and their analysis are presented in Section V. Finally, Section VI concludes the paper.

II. ANALYSIS OF STATE-OF-THE-ART CARRY SKIP ADDER

This section presents an analysis of the data flow and logic operations involved in the adder segment of the state-of-the-art carry-skip adder presented in [14], in order to identify the redundant logic and the critical path. The m-bit adder segment illustrated in Fig. 1 contains an RCA module, an incrementation block (IB), a logical AND gate and AOI carry skip logic. The RCA module can be implemented by cascading full adders (FAs).
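The cascading of full adders mentioned above can be illustrated with a small behavioral model. This is only a sketch in Python, not the VHDL used in the paper; the function names `full_adder` and `ripple_carry_add` and the bit-serial loop are assumptions made for illustration.

```python
def full_adder(x: int, y: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = x ^ y ^ c
    carry = (x & y) | (c & (x ^ y))
    return s, carry


def ripple_carry_add(x: int, y: int, c_in: int, m: int) -> tuple[int, int]:
    """Add two m-bit operands by cascading m full adders.

    Returns (m-bit sum, carry_out). The carry of stage i feeds stage i + 1,
    which is the long carry-propagation path that limits RCA speed.
    """
    total, carry = 0, c_in
    for i in range(m):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    # 4-bit example: 11 + 6 with no carry-in.
    print(ripple_carry_add(0b1011, 0b0110, 0, 4))
```

Because every stage waits for the carry of the previous one, the worst-case delay of this structure grows linearly with m, which is the limitation the skip logic is meant to relieve.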
to the CG unit. The CG unit then computes the carry bits ($fc_{m-2:0}$), which are given to the SG unit to produce the final sum bits ($s_{m-1:0}$). The output carry bit generation (OCBG) unit includes a final carry bit generation (FCBG) module that computes the output carry ($c^{0}_{out}$) with $c_{in} = 0$ according to Eq. 2(d), a logical AND gate that generates the carry-skip control signal ($hs_{AND}$) according to Eq. 2(e), and an AOI carry skip logic that generates the output carry in complemented form ($\overline{c_{out}}$) according to Eq. 2(f).

Fig. 2. Proposed adder segment based on the AOI carry skip logic.

Similarly, the architecture of the segment based on the OR-AND-Invert (OAI) carry skip logic is developed and shown in Fig. 3. In the OCBG unit, the output carry $c^{0}_{out}$ and the carry control signal $hs_{AND}$ are generated in complemented form, while the RCA section is identical to the architecture shown in Fig. 2.

Fig. 3. Proposed adder segment based on the OAI carry skip logic. (Block labels recoverable from the figure: inputs $x_{m-1:0}$, $y_{m-1:0}$; OCBG unit containing the m-bit FCBG with $c_{in} = 0$ and the OAI logic; RCA containing the HS-HC unit with outputs $hs_{m-1:0}$ and $hc_{m-1:0}$, the CG unit with output $fc_{m-2:0}$, and the SG unit with output $s_{m-1:0}$.)

The logic operations involved in the existing and modified logic formulations are compared in Table I. Table I shows that only the operations in the incrementation block (IB) and the OCBG block differ, while the other operations are the same. The area of the AND-OR (AO) CMOS cell (8 transistors) is significantly less than the total area of the AND (6 transistors) and XOR (12 transistors) cells (18 transistors in total) [15]. Therefore, the adder segment based on the modified logic formulation is expected to occupy less silicon area than the existing adder segment. Because of the reduced number of transistors, the proposed adder segment will exhibit reduced switching capacitance, which will lead to a reduction in power consumption and delay. In addition, the control signal is generated from the half-sum bits rather than from the sum bits produced by the RCA, as is the case in [14], which is expected to reduce the delay significantly.

RCA: ripple carry adder, CSG: control signal generation, CSK: carry skip operation, FCBG: final carry bit generation, IB: incrementation block.

The proposed adder segments are used to design a high-performance, low-complexity CSK adder, which is presented in the next section.

IV. PROPOSED CARRY-SKIP ADDER

The architecture of the carry skip adder for large bit-widths can be developed by cascading smaller proposed adder segments and a conventional RCA. The generalized proposed architecture contains the proposed 4-bit AOI and OAI adder segments at the even and odd positions respectively, except for the least significant ($j = 1$) position, where a conventional RCA is used. For example, a 16-bit carry skip adder for the addition of two 16-bit operands (a and b) is shown in Fig. 4. It is constructed by cascading the proposed 4-bit adder segments (with alternating AOI and OAI structures) and a 4-bit conventional RCA.

Fig. 4. Proposed 16-bit carry skip adder. (Labels recoverable from the figure: operand slices $a_{15:12}$, $b_{15:12}$ down to $a_{3:0}$, $b_{3:0}$; AOI and OAI segments and an RCA with $c_{in1}$ = '0'; sum outputs $s_{15:12}$ down to $s_{3:0}$.)

In the proposed architecture, the 16-bit input operands (a, b) are segmented into four 4-bit operands ($a_{4j-1:4(j-1)}$ and $b_{4j-1:4(j-1)}$), which are applied to the respective adder segments, and the corresponding partial sums ($s_{4j-1:4(j-1)}$) are concatenated to obtain the desired sum, where $1 \le j \le m/4$. The additional NOT gate is used to achieve non-complemented carry output from the most significant adder
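The 16-bit organization described in this section can be sketched as a behavioral Python model. This is only an illustration under stated assumptions: Eqs. 2(d)-2(f) are not reproduced in this excerpt, so the model uses the textbook carry-skip relation (segment carry-out equals the carry-out computed with zero carry-in, ORed with the skip control ANDed with the carry-in). The names `segment`, `aoi`, `oai` and `carry_skip_add16` are hypothetical, and recovering the true carry for the sum bits by inversion is a modelling shortcut rather than the paper's circuit.

```python
M = 4  # segment width used in the 16-bit example


def segment(x: int, y: int, c_in: int, m: int = M) -> tuple[int, int, int]:
    """Behavioral m-bit segment: returns (sum, c0_out, hs_and).

    c0_out is the carry-out computed with the carry-in forced to 0;
    hs_and is the AND of all half-sum bits x_i XOR y_i (skip control).
    """
    mask = (1 << m) - 1
    s = (x + y + c_in) & mask
    c0_out = (x + y) >> m
    hs_and = int((x ^ y) == mask)
    return s, c0_out, hs_and


def aoi(a: int, b: int, c: int) -> int:
    """AND-OR-Invert cell: NOT((a AND b) OR c)."""
    return 1 - ((a & b) | c)


def oai(a: int, b: int, c: int) -> int:
    """OR-AND-Invert cell: NOT((a OR b) AND c)."""
    return 1 - ((a | b) & c)


def carry_skip_add16(a: int, b: int) -> tuple[int, int]:
    """16-bit add: RCA at j = 1, then AOI, OAI, AOI segments at j = 2..4."""
    mask = (1 << M) - 1
    # j = 1: conventional RCA with c_in = 0 and a true-polarity carry-out.
    total = (a & mask) + (b & mask)
    result, carry, inverted = total & mask, total >> M, False
    for j in range(2, 5):
        shift = M * (j - 1)
        x, y = (a >> shift) & mask, (b >> shift) & mask
        # Modelling shortcut: recover the true carry-in for the sum bits.
        true_cin = carry ^ 1 if inverted else carry
        s, c0_out, hs_and = segment(x, y, true_cin)
        if j % 2 == 0:
            # AOI segment: true-polarity inputs, complemented carry-out.
            carry, inverted = aoi(hs_and, carry, c0_out), True
        else:
            # OAI segment: complemented inputs, true-polarity carry-out.
            carry, inverted = oai(1 - hs_and, carry, 1 - c0_out), False
        result |= s << shift
    # The most significant segment (j = 4) is AOI, so one NOT restores polarity.
    c_out = carry ^ 1 if inverted else carry
    return result, c_out


if __name__ == "__main__":
    print(carry_skip_add16(0xFFFF, 0x0001))
```

Alternating the AOI and OAI cells lets each segment consume the carry polarity produced by the previous one, so in this model a single inverter is needed only at the final carry output.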