
Area-Delay Efficient and Low-Power Carry Skip adder for high performance computing systems

The paper presents a novel low-complexity carry-skip adder (CSK) design aimed at enhancing performance in high-performance computing systems while minimizing power consumption and area. The proposed 32-bit CSK adder demonstrates a reduction in delay, area, and power by 10.2%, 13.6%, and 8% respectively compared to existing designs, and achieves a 22.5% reduction in area-delay product. The architecture utilizes modified logic formulations to optimize performance and reduce redundancy in computations.

2019 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS)

Area-Delay Efficient and Low-Power Carry Skip Adder for High Performance Computing Systems
Sujit Kumar Patel∗ , Bharat Garg† , Anurag Mahajan ‡ and Shireesh Kumar Rai§
∗†§ ECE Department, Thapar Institute of Engineering and Technology Patiala, Punjab, India
‡ E&TC Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
Email: {∗ sujit.patel,† bharat.garg, § skumar.rai}@thapar.edu ‡ [email protected]

Abstract—High-performance computing architectures are a primary need of modern devices that run compute-intensive applications such as signal and image processing. This paper presents a novel low-complexity carry-skip adder design that provides high speed and consumes low power, making it suitable for the development of high-performance signal processing cores. The proposed adder is derived from reformulated Boolean expressions that avoid redundant computations and reduce the critical-path delay. The proposed architectures are coded in VHDL and synthesized with Synopsys Design Compiler using a 65nm CMOS library. Synthesis results demonstrate that the proposed 32-bit carry-skip adder reduces the delay, area and power by 10.2%, 13.6% and 8% respectively compared with the best known carry-skip adder. Finally, the proposed 16- and 32-bit adders reduce the area-delay product by 14.4% and 22.5% respectively over the existing adder.

Index Terms—Carry-Skip adder, Low-power, High-speed, Computing architecture, Logic complexity.

I. INTRODUCTION

Low-power and area-efficient processing cores are required by modern portable devices because they run several compute-intensive image/video processing applications. These signal processing cores (e.g. filters) require several adders and multipliers/dividers for their implementation [1]. The adder is the primary component: it is most frequently used as a standalone arithmetic unit and is also used to design other arithmetic units such as subtractors, multipliers and dividers [2]. Among the various adder architectures, the ripple carry adder (RCA) is one of the most area- and power-efficient designs owing to its simple structure. However, the performance of the RCA is limited by its long carry propagation delay [2]. To reduce this delay, different methods, namely carry look-ahead (CLA), carry select (CSL), parallel prefix and carry-skip (CSK), are presented in [3]–[11]. In addition, approximate adders have been presented to achieve high performance and low energy consumption for error-tolerant applications [12].

The adder developed on the basis of CLA [3] provides high speed, but a single-level implementation for operands of large width makes it area and power inefficient. Therefore, CLA adders for large bit-widths are designed by cascading 4-bit CLAs, which provides a trade-off between delay and logic complexity. The parallel prefix carry computation method provides many carry computing architectures with different area and speed characteristics [4], [5]. Although the power and area consumption of the CSL adders [6]–[9] are notably higher than those of the RCA, they significantly reduce the delay. On the other hand, adder architectures based on the CSK technique provide higher speed than the RCA with moderate logic complexity [10], [11], [13].

Among these adders, the CSK adder provides high performance with a small area overhead. The conventional CSK adder presented in [13] consists of segments, each designed using a small RCA and CSK logic. The complexity of this segment is high due to the use of dedicated CSK logic. Further, a CSK logic with reduced delay, obtained by using the compound logic cells AND-OR-Invert (AOI) and OR-AND-Invert (OAI), is presented in [14]. This CSK design provides higher performance than the adder presented in [13] with nearly the same implementation complexity. However, it is possible to develop a carry-skip adder architecture with better design metrics by analysing the Boolean operations and the critical path present in the state-of-the-art CSK adder. The major contributions are as follows:
• The paper presents an analysis of the logic operations performed in the CSK adder segment and of its path delay.
• Further, a modified logic formulation and the corresponding architecture for the CSK adder segment are proposed.
• Furthermore, a novel high-performance CSK adder architecture for large bit-widths using the proposed adder segments is presented.
• Finally, the paper demonstrates a 22.5% reduction in area-delay product (ADP) for the proposed 32-bit adder design over the existing adder.

The rest of the paper is organized as follows: Section II presents an analysis of the state-of-the-art carry-skip adder, and Section III presents the low-complexity CSK adder segments. A new high-speed carry-skip adder is presented in Section IV, and the synthesis results and their analysis are presented in Section V. Finally, Section VI concludes the paper.

II. ANALYSIS OF STATE-OF-THE-ART CARRY SKIP ADDER

This section presents an analysis of the data flow and logic operations involved in the adder segment of the state-of-the-art carry-skip adder presented in [14] to identify the redundant logic and the critical path. The m-bit adder segment illustrated in Fig. 1 contains an RCA module, an incrementation block (IB), a logical AND gate and AOI carry-skip logic. The RCA module can be implemented by cascading full adders (FAs).
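As a rough behavioural illustration of the segment structure just described (RCA module with its carry-in tied to 0, AND of the RCA sum bits, carry-skip logic and incrementation block), the following Python sketch models an m-bit segment at the bit level. It is only a sketch under stated assumptions: the function names (`existing_segment`, `to_bits`, `from_bits`, `check_segment`) are hypothetical, bit lists are little-endian, and the carry output is kept in true polarity, whereas the AOI cell in [14] delivers it in complemented form.

```python
from itertools import product


def existing_segment(x_bits, y_bits, c_in):
    """Behavioural model of an m-bit carry-skip segment (little-endian bit lists)."""
    # RCA module: cascaded full adders with the carry-in tied to 0.
    rca_sum, carry0 = [], 0
    for x, y in zip(x_bits, y_bits):
        half_sum, half_carry = x ^ y, x & y
        rca_sum.append(half_sum ^ carry0)
        carry0 = half_carry | (half_sum & carry0)

    # Logical AND of the RCA sum bits gives the carry-skip control signal.
    skip = int(all(rca_sum))

    # Carry-skip logic. Assumption: true polarity is modelled here; the AOI
    # cell of the hardware would output the complement of this value.
    c_out = carry0 | (c_in & skip)

    # Incrementation block: add the real carry-in to the RCA sum bits.
    final_sum, carry = [], c_in
    for bit in rca_sum:
        final_sum.append(bit ^ carry)
        carry = bit & carry
    return final_sum, c_out


def to_bits(value, width):
    """Integer to little-endian list of bits."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Little-endian list of bits to integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def check_segment(width=4):
    """Compare the model with integer addition for every input combination."""
    for x, y, c_in in product(range(2 ** width), range(2 ** width), (0, 1)):
        s, c_out = existing_segment(to_bits(x, width), to_bits(y, width), c_in)
        if from_bits(s) + (c_out << width) != x + y + c_in:
            return False
    return True
```

Calling `check_segment(4)` exhaustively compares the model with ordinary integer addition for a 4-bit segment. Note that the sum bits are effectively produced twice, once by the RCA loop and once by the incrementation loop.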

978-1-7281-4655-3/19/$31.00 ©2019 IEEE
DOI 10.1109/iSES47678.2019.00074
The logic operations performed in these units are given by the following equations.

Fig. 1. Carry-skip adder segment with AOI carry-skip logic [14]. (Blocks: RCA module with carry-in '0', incrementation block, AND gate and AOI logic; inputs $x_{m-1:0}$, $y_{m-1:0}$, $c_{in}$; outputs $fs_{m-1:0}$, $c_{out}$.)

Operations performed in the ith FA of the RCA:
$hs_i = x_i \oplus y_i; \quad hc_i = x_i \cdot y_i$ (1a)
$c^0_i = hc_i + hs_i \cdot c^0_{i-1}; \quad c^0_{-1} = 0$ (1b)
$c^0_{out} = c^0_{m-1}$ (1c)
$fs^0_i = hs_i \oplus c^0_{i-1}$ (1d)
Logical AND of sum-bits:
$s^0_{AND} = \prod_{i=0}^{m-1} fs^0_i$
Carry Generation:
$\overline{c_{out}} = \overline{c^0_{out} + c_{in} \cdot s^0_{AND}}$ (1e)
Incrementation Block:
$c_i = fs^0_i \cdot c_{i-1}; \quad c_{-1} = c_{in}$ (1f)
$fs_i = fs^0_i \oplus c_{i-1}$ (1g)

where $0 \le i \le m-1$. The ith FA first computes the intermediate sum and carry bits (half-sum $hs_i$ and half-carry $hc_i$) and then computes the output carry ($c^0_i$) and sum ($fs^0_i$). Further, the carry-skip control signal ($s^0_{AND}$) is obtained by the logical AND of the sum bits. The incrementation block (IB) computes the output sum bit ($fs_i$) from $fs^0_i$ of the RCA and the input carry ($c_{in}$). From the above logic operations, we make the following observations:
1) The sum bits are generated twice with different carry inputs: by the RCA (with $c_{in} = 0$) and by the IB (with $c_{in}$). This architecture therefore exhibits redundancy. With a suitable modification of the logic expressions, the redundant operations can be avoided to reduce the implementation area.
2) Further, the final sum bits exhibit a large delay because they are processed first by the RCA and then by the incrementation block. This delay can be reduced by computing the sum bits in an RCA with input $c_{in}$ and separately computing the final carry out ($c^0_{out}$) with $c_{in} = 0$.
3) Finally, from the given expressions it can be observed that the carry-skip control signal is generated from the sum bits of the RCA, which have a longer data-path delay. The delay of the control signal is therefore large and can be reduced to achieve high-speed operation.

On the basis of the above analysis, the modified logic expressions and the corresponding low-complexity, high-speed CSK adder segment are presented in the next section.

III. PROPOSED LOW-COMPLEXITY CSK ADDER SEGMENT

This section first presents the modified logic expressions for the CSK adder segment and then presents the corresponding architecture. Finally, a high-level complexity analysis of the proposed adder segment against the existing adder segment is presented.

A. Modified Logic Expression for Proposed Adder Segment

Based on the above analysis, we split the RCA module into small units to reduce the delay and implementation complexity by reusing the intermediate signals. The modified logic expressions for the proposed adder segment are given by the following equations:

Operations in RCA with input carry $c_{in}$:
$hs_i = x_i \oplus y_i; \quad hc_i = x_i \cdot y_i$ (2a)
$fc_i = hc_i + hs_i \cdot fc_{i-1}; \quad fc_{-1} = c_{in}$ (2b)
$s_i = hs_i \oplus fc_{i-1}$ (2c)
Output carry generation with $c_{in} = 0$:
$c^0_i = hc_i + hs_i \cdot c^0_{i-1}; \quad c^0_{-1} = 0$
$c^0_{out} = c^0_{m-1}$ (2d)
Carry-skip control signal:
$hs_{AND} = \prod_{i=0}^{m-1} hs_i$ (2e)
Carry Generation:
$\overline{c_{out}} = \overline{c^0_{out} + c_{in} \cdot hs_{AND}}$ (2f)

where $0 \le i \le m-1$. Fig. 2 shows the proposed carry-skip adder segment derived from the modified logic Eq. 2(a) - 2(f). In this architecture, the RCA is constructed from the half-sum and half-carry (HS-HC), carry generation (CG) and sum generation (SG) units. The HS-HC unit computes the half-sum ($hs_{m-1:0}$) and half-carry ($hc_{m-1:0}$) bits and sends them
to the CG unit. The CG unit then computes the carry bits ($fc_{m-2:0}$), which are given to the SG unit to produce the final sum bits ($s_{m-1:0}$). The output carry bit generation (OCBG) unit includes: a final carry bit generation (FCBG) module to compute the output carry ($c^0_{out}$) with $c_{in} = 0$ according to Eq. 2(d), a logical AND gate to generate the carry-skip control signal ($hs_{AND}$) according to 2(e), and an AOI carry-skip logic to generate the output carry in complemented form ($\overline{c_{out}}$) according to 2(f).

Fig. 2. Proposed adder segment based on the AOI carry skip logic. (Blocks: OCBG unit with m-bit FCBG ($c_{in} = 0$) and AOI logic; RCA with HS-HC, CG and SG units.)

Similarly, the architecture of the segment based on the OR-AND-Invert (OAI) carry-skip logic is developed and shown in Fig. 3. In the OCBG unit, the output carry $c^0_{out}$ and the carry control signal $hs_{AND}$ are generated in complemented form, while the RCA section is identical to the architecture shown in Fig. 2.

Fig. 3. Proposed adder segment based on the OAI carry skip logic. (Blocks: OCBG unit with m-bit FCBG ($c_{in} = 0$) and OAI logic; RCA with HS-HC, CG and SG units.)

The logic operations involved in the existing and modified logic formulations are compared in Table I. From Table I, it is observed that only the operations in the incrementation block (IB) and the OCBG block differ, while the other operations are the same. The area of an AND-OR (AO) CMOS cell (8 transistors) is significantly less than the total area of the AND (6 transistors) and XOR (12 transistors) cells [15]. Therefore, it is expected that the adder segment based on the modified logic formulation will occupy less silicon area than the existing adder segment. Due to the reduced number of transistors, the proposed adder segment will exhibit reduced switching capacitance, which will lead to a reduction in power consumption and delay. In addition, the control signal is generated from the half-sum bits instead of the sum bits generated by the RCA, as is the case in [14], which will offer significantly reduced delay.

TABLE I
COMPARISON OF THE LOGIC OPERATIONS INVOLVED IN THE PROPOSED AND EXISTING ADDER SEGMENT

Operations in   Existing [14]   Proposed   Comments
RCA             Yes             Yes        Same operations
CSG             Yes             Yes        Same operation
CSK             Yes             Yes        Same AOI/OAI operation
IB              Yes             No         Includes (m − 1) AND and m XOR operations
FCBG            No              Yes        (m − 1) AND-OR operations

RCA: ripple carry adder, CSG: control signal generation, CSK: carry skip operation, FCBG: final carry bit generation, IB: incrementation block.

The proposed adder segments are used to design a high-performance, low-complexity CSK adder, which is presented in the next section.

IV. PROPOSED CARRY-SKIP ADDER

The architecture of the carry-skip adder for a large bit-width can be developed by cascading smaller proposed adder segments and a conventional RCA. The generalized proposed architecture contains proposed 4-bit AOI and OAI adder segments at the even and odd positions respectively, except for the least significant ($j = 1$) position, where a conventional RCA is used. For example, a 16-bit carry-skip adder for the addition of two 16-bit operands (a and b) is shown in Fig. 4. It is constructed by cascading the proposed 4-bit adder segments (with alternating AOI and OAI structures) and a 4-bit conventional RCA. In the proposed architecture, the 16-bit input operands (a, b) are segmented into four 4-bit operands ($a_{4j-1:4(j-1)}$ and $b_{4j-1:4(j-1)}$), which are applied to the respective adder segments, and the corresponding partial sums ($s_{4j-1:4(j-1)}$) are concatenated to obtain the desired sum, where $1 \le j \le n/4$ and n is the adder bit-width (n = 16 here). An additional NOT gate is used to obtain the non-complemented carry output from the most significant adder segment. This architecture provides higher speed and occupies less area than the adder of [14].

Fig. 4. Proposed 16-bit carry skip adder. (From most to least significant: AOI segment on $a_{15:12}$, $b_{15:12}$; OAI segment on $a_{11:8}$, $b_{11:8}$; AOI segment on $a_{7:4}$, $b_{7:4}$; RCA on $a_{3:0}$, $b_{3:0}$ with $c_{in1}$ = '0'; outputs $s_{15:0}$ and $c_{out}$.)
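The cascading described above can be sketched behaviourally in Python. The segment function follows Eqs. (2a)-(2f); everything else is an assumption made for illustration: the names `proposed_segment` and `csk_add` are hypothetical, carries are kept in true polarity (the AOI/OAI alternation and the final NOT gate only affect gate-level signal polarity and are not modelled), and the least significant RCA is represented by the same function with its carry-in set to 0, which is behaviourally equivalent to a plain RCA.

```python
def proposed_segment(x_bits, y_bits, c_in):
    """Behavioural model of the proposed m-bit segment (little-endian bit lists)."""
    # HS-HC unit, Eq. (2a): half-sum and half-carry bits.
    hs = [x ^ y for x, y in zip(x_bits, y_bits)]
    hc = [x & y for x, y in zip(x_bits, y_bits)]

    # CG and SG units, Eqs. (2b)-(2c): ripple the real carry-in, form the sum.
    # The last value of fc is computed by the loop but never used, matching
    # the CG unit, which only supplies fc[m-2:0] to the SG unit.
    s_bits, fc = [], c_in
    for hs_i, hc_i in zip(hs, hc):
        s_bits.append(hs_i ^ fc)
        fc = hc_i | (hs_i & fc)

    # FCBG module, Eq. (2d): output carry computed with carry-in = 0.
    c0 = 0
    for hs_i, hc_i in zip(hs, hc):
        c0 = hc_i | (hs_i & c0)

    # Carry-skip control signal, Eq. (2e): AND of the half-sum bits.
    hs_and = int(all(hs))

    # Carry generation, Eq. (2f), modelled here in true polarity.
    c_out = c0 | (c_in & hs_and)
    return s_bits, c_out


def csk_add(a, b, width=16, seg_width=4):
    """Add two unsigned integers by cascading segments, LSB segment first."""
    total, carry = 0, 0  # carry-in of the least significant segment is '0'
    for j in range(width // seg_width):
        shift = j * seg_width
        x = [(a >> (shift + i)) & 1 for i in range(seg_width)]
        y = [(b >> (shift + i)) & 1 for i in range(seg_width)]
        s_bits, carry = proposed_segment(x, y, carry)
        # Concatenate the partial sum into its position in the result.
        total |= sum(bit << i for i, bit in enumerate(s_bits)) << shift
    return total, carry
```

For instance, `csk_add(0xFFFF, 1)` exercises the skip path in every segment above the first. The model only checks functional correctness; it says nothing about delay, area or power, which depend on the gate-level implementation.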
The next section compares the synthesis results to analyse the performance improvement in terms of area, delay and power consumption.

V. ASIC SYNTHESIS RESULTS

This section presents the synthesis results of the proposed and existing carry-skip adders for comparative analysis. These adders are first coded in VHDL for 16- and 32-bit widths and then synthesized in Synopsys Design Compiler (SDC) using a 65nm CMOS library. The area, delay and power (at 200MHz) metrics obtained from SDC are summarized in Table II for comparative analysis. From the synthesis results, it can be observed that the proposed 32-bit CSK adder has 10.2% less delay, 13.6% less area and 8% less power than the existing adder [14].

TABLE II
SYNTHESIS RESULTS OF THE PROPOSED AND EXISTING CSK ADDERS USING 65 NM CMOS PDK

Adder       Bit-width (n)   Delay (ns)   Area (μm²)   Power (μW)
CSKA [14]   16              0.29         627.84       16.73
CSKA [14]   32              0.39         985.32       35.18
Proposed    16              0.27         577.44       15.53
Proposed    32              0.35         851.04       32.39

Power is estimated at the normalized clock frequency 200 MHz.

The area-delay product (ADP¹) is computed for the proposed and existing adder designs and illustrated in Fig. 5 for comparative analysis. Fig. 5 shows that the proposed adder has a significantly lower ADP than the best known existing adder architecture [14]. It is clear from the synthesis results that the proposed adder outperforms the best-known adder architecture.

Fig. 5. Comparison of Area-Delay product (ADP) for adders of different bit-width. (Bar chart: ADP in μm² × ns versus bit-width n = 16 and 32, for the proposed adder and CSKA [14].)

VI. CONCLUSION

In this paper, we presented a new carry-skip adder segment obtained by reformulating the Boolean expressions to eliminate the redundancy existing in the state-of-the-art adder architecture. Further, a high-performance adder segment is presented on the basis of the reformulated Boolean expressions. Finally, a large bit-width adder architecture is proposed by cascading the proposed adder segments and an RCA segment. The effectiveness of the proposed adder over the existing one is evaluated by implementing both in VHDL and synthesizing them with Synopsys Design Compiler using a 65nm CMOS library. The synthesis results show that the proposed 32-bit carry-skip adder has 10.2% less delay, 13.6% less area and 8% less power than the best available carry-skip adder. Therefore, the proposed adder architecture can be effectively utilized in high-performance computing systems.

REFERENCES

[1] A. Jaiswal, B. Garg, V. Kaushal, and G. Sharma, “SPAA-aware 2D Gaussian smoothing filter design using efficient approximation techniques,” in VLSI Design (VLSID), 2015 28th International Conference on. IEEE, 2015, pp. 333–338.
[2] B. Parhami, Computer arithmetic. Oxford university press, 1999, vol. 20.
[3] K. Hwang, “Computer arithmetic principles, architecture, and design,” 1979.
[4] A. Beaumont-Smith and C.-C. Lim, “Parallel prefix adder design,” in Computer Arithmetic, 2001. Proceedings. 15th IEEE Symposium on. IEEE, 2001, pp. 218–225.
[5] R. E. Ladner and M. J. Fischer, “Parallel prefix computation,” Journal of the ACM (JACM), vol. 27, no. 4, pp. 831–838, 1980.
[6] Y. He, C.-H. Chang, and J. Gu, “An area efficient 64-bit square root carry-select adder for low power applications,” in 2005 IEEE International Symposium on Circuits and Systems. IEEE, 2005, pp. 4082–4085.
[7] B. Ramkumar and H. M. Kittur, “Low-power and area-efficient carry select adder,” IEEE transactions on very large scale integration (VLSI) systems, vol. 20, no. 2, pp. 371–375, 2012.
[8] B. K. Mohanty and S. K. Patel, “Area–delay–power efficient carry-select adder,” IEEE transactions on circuits and systems II: express briefs, vol. 61, no. 6, pp. 418–422, 2014.
[9] M. Bahadori, M. Kamal, A. Afzali-Kusha, and M. Pedram, “An energy and area efficient yet high-speed square-root carry select adder structure,” Computers & Electrical Engineering, vol. 58, pp. 101–112, 2017.
[10] M. Lehman and N. Burla, “Skip techniques for high-speed carry-propagation in binary arithmetic units,” IRE Transactions on Electronic Computers, no. 4, pp. 691–698, 1961.
[11] K. Chirca, M. Schulte, J. Glossner, H. Wang, B. Mamidi, P. Balzola, and S. Vassiliadis, “A static low-power, high-performance 32-bit carry skip adder,” in Euromicro Symposium on Digital System Design, 2004. DSD 2004. IEEE, 2004, pp. 615–619.
[12] B. Garg and Y. Bisht, “A Novel High Performance Reverse Carry Propagate Adder for Energy Efficient Multimedia Applications,” in IEEE International Symposium on Smart Electronic Systems (IEEE-iSES). IEEE, 2019, pp. 1–4.
[13] M. Alioto and G. Palumbo, “A simple strategy for optimized design of one-level carry-skip adders,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 50, no. 1, pp. 141–148, 2003.
[14] M. Bahadori, M. Kamal, A. Afzali-Kusha, M. Pedram et al., “High-speed and energy-efficient carry skip adder operating under a wide range of supply voltage levels,” IEEE Trans. VLSI Syst., vol. 24, no. 2, pp. 421–433, 2016.
[15] J. M. Rabaey, A. Chandrakasan, and B. Nikolić, Digital integrated circuits: a design perspective, 2003.

¹ ADP = Area × Delay
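With the ADP definition above, the percentage reductions quoted in the abstract and in Section V can be recomputed from the Table II figures. The short Python sketch below does this as a sanity check; the dictionary layout and the helper names (`TABLE_II`, `reduction_pct`, `adp_reduction_pct`) are illustrative assumptions, while the numbers are copied from Table II.

```python
# Table II values: (delay in ns, area in um^2, power in uW at 200 MHz).
TABLE_II = {
    ("CSKA [14]", 16): (0.29, 627.84, 16.73),
    ("CSKA [14]", 32): (0.39, 985.32, 35.18),
    ("Proposed", 16): (0.27, 577.44, 15.53),
    ("Proposed", 32): (0.35, 851.04, 32.39),
}


def reduction_pct(existing, proposed):
    """Relative reduction of `proposed` with respect to `existing`, in percent."""
    return 100.0 * (existing - proposed) / existing


def adp(delay_ns, area_um2):
    """Area-delay product as defined in the footnote: ADP = Area x Delay."""
    return area_um2 * delay_ns


def adp_reduction_pct(bit_width):
    """ADP reduction of the proposed adder over CSKA [14] for one bit-width."""
    d_old, a_old, _ = TABLE_II[("CSKA [14]", bit_width)]
    d_new, a_new, _ = TABLE_II[("Proposed", bit_width)]
    return reduction_pct(adp(d_old, a_old), adp(d_new, a_new))


def metric_reductions_pct(bit_width):
    """Delay, area and power reductions (percent) for one bit-width."""
    old = TABLE_II[("CSKA [14]", bit_width)]
    new = TABLE_II[("Proposed", bit_width)]
    return tuple(reduction_pct(o, n) for o, n in zip(old, new))


if __name__ == "__main__":
    for n in (16, 32):
        delay, area, power = metric_reductions_pct(n)
        print(f"n={n}: delay {delay:.1f}%, area {area:.1f}%, "
              f"power {power:.1f}%, ADP {adp_reduction_pct(n):.1f}%")
```

Up to rounding, the recomputed values agree with the reported 14.4% and 22.5% ADP reductions and with the 10.2%, 13.6% and 8% figures for the 32-bit adder.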
