A Novel Approximate Adder Design Using Error Reduced Carry Prediction and Constant Truncation

This document summarizes a research paper that proposes a novel approximate adder design using improved carry prediction and constant truncation with error reduction. The proposed adder achieves significantly better accuracy compared to existing approximate adders, with up to a 98.9% reduction in error distance. It also offers excellent hardware efficiency, with up to a 95.7% reduction in the power-accuracy and area-delay products compared to other adders. When considering both hardware cost and accuracy, the proposed adder performs best among those analyzed in the research.


Received July 30, 2021, accepted August 21, 2021, date of publication August 27, 2021, date of current version September 7, 2021.


Digital Object Identifier 10.1109/ACCESS.2021.3108443

A Novel Approximate Adder Design Using Error Reduced Carry Prediction and Constant Truncation
JUNGWON LEE (Graduate Student Member, IEEE), HYOJU SEO (Graduate Student Member, IEEE), HYELIN SEOK (Student Member, IEEE), AND YONGTAE KIM (Member, IEEE)
School of Computer Science and Engineering, Kyungpook National University, Buk-gu, Daegu 41566, South Korea
Corresponding author: Yongtae Kim ([email protected])
This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT)
under Grant NRF-2020R1A4A1019628, and in part by the Ministry of Education under Grant NRF-2019R1I1A3A01061266.

ABSTRACT This paper proposes a novel approximate adder that exploits an error-reduced carry prediction and constant truncation with error reduction schemes. The proposed adder design techniques significantly improve overall computation accuracy while providing excellent hardware efficiency. Particularly, the proposed carry prediction technique can reduce a prediction error rate by up to 75% compared to existing approximate adders considered in this paper. Furthermore, the error reduction technique also enhances the overall computation accuracy by decreasing the error distance (ED). Our experimental results show that the proposed adder improves the normalized mean ED (NMED) and mean relative ED (MRED) by up to 91.4% and 98.9%, respectively, compared to the other approximate adders. Importantly, an excellent design tradeoff allows the proposed adder to be the most competitive of the adders under consideration. Specifically, the proposed adder achieves up to 95.7%, 91.1%, and 93.2% reductions of the power-NMED, energy-NMED, and area-delay product (ADP)-NMED products, respectively, compared to the other adders. Our adder enhances the power-, energy-, and ADP-MRED products by up to 99.4% compared to the others. In particular, the figure of merit (FoM) considering both hardware and accuracy of the proposed adder is up to 93.05% smaller than that of the other approximate adders considered herein. Furthermore, we confirm that the approximation errors caused by the proposed adder have very little impact on output quality when adopted in practical applications, such as digital image processing and machine learning.

INDEX TERMS Approximate adder, approximate computing, carry prediction, constant truncation, error reduction.

The associate editor coordinating the review of this manuscript and approving it for publication was Cihun-Siyong Gong.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021 119939
J. Lee et al.: Novel Approximate Adder Design

I. INTRODUCTION
With the prevalence of battery-operated mobile and portable devices, power and energy consumption have become the key constraints in system design because applications on these devices process a vast amount of computationally intensive information, such as multimedia (i.e., image, video, and audio) processing, deep learning, data mining, and recognition, under a limited power and energy budget [1]–[6]. Many applications do not always require perfect computation accuracy [7]–[9]. For example, multimedia processing that involves human senses is error-tolerant. In other words, humans usually do not perceive the output quality degradation caused by computation errors in these applications, and a certain level of error can be acceptable. The limitation of human perception offers an opportunity for a new computing paradigm, approximate computing, which trades computation accuracy for power and energy [10]–[12]. Because adders are fundamental arithmetic components in computing systems, the design of efficient approximate adders is a practical way to enable approximate computing. Therefore, it has gained remarkable attention from researchers, and a significant number of approximate adder designs have been presented in the technical literature [13]–[35]. We will review some existing approximate adders in Section II.

Approximate adders can be classified as block-based and full adder (FA)-based designs. Block-based approximate adders split an entire adder into multiple smaller sub-adders that perform partial additions concurrently [22]–[29]. The main idea of this approach is to cut a long carry propagation chain to achieve faster additions. However, it requires more area and power than FA-based approximate adders. FA-based adders use approximate 1-bit FAs to add some lower-order input bits approximately by replacing accurate FAs with approximate ones in the corresponding bit positions [13]–[21]. This improves the area and power performance at the expense of computation accuracy.

In this paper, we propose a new approximate adder design based on new approximate FA cells, enhanced carry prediction, and a constant truncation with error reduction. The proposed carry prediction scheme reduces the prediction error rate by up to 75% compared to the existing approximate adders considered here. Also, the truncation with error reduction logic enhances the overall computation accuracy while reducing energy and power consumption. When implemented in a 32-nm CMOS technology, the proposed adder is 1.49×, 1.90×, and 3.12× more area-, power-, and energy-efficient, respectively, than a traditional adder. Furthermore, compared to existing approximate adders, our adder improves overall computation accuracy by up to 98.9%. When jointly analyzing the adders in terms of hardware and accuracy, the proposed design is the most competitive among the adders considered.

In summary, this paper makes the following key contributions in designing approximate adders:
• We present a novel efficient approximate adder design that effectively trades off between hardware cost and computation accuracy through systematic analysis, and prove that our design outperforms the others by extensively comparing it with 12 approximate adders.
• We propose 1) a new carry prediction scheme that reduces the prediction error rate by up to 75% compared to the others, 2) approximate FA cells that improve accuracy, and 3) a constant truncation with an error reduction scheme that reduces hardware cost while offering good accuracy performance.

The remainder of this paper is organized as follows. Section II provides a brief review of existing approximate adders. In Section III, we present the proposed adder, which consists of our proposed approximate FAs, novel carry prediction, and constant truncation with error reduction. Illustrated examples of the adder operation and mathematical analysis of the carry prediction error rate and overall error rate are also provided. Then, Section IV explains the experimental results and systematic analysis of the proposed adder as well as an extensive comparison with the 12 existing approximate adders. Also, a joint analysis of the adders in hardware and accuracy aspects is presented. In Section V, the application of the approximate adders to digital image processing and machine learning is presented. Finally, Section VI concludes the work.

II. RELATED WORKS
The lower-part OR adder (LOA) and error tolerant adder I (ETAI) are two representative approximate adders implemented using an approximate FA for the least significant bits (LSBs) of a multibit adder [13], [19], and many variants of them have been presented so far [14]–[18], [20], [21].

The LOA consists of two parts: an accurate part and an inaccurate part [13]. The former uses a traditional precise adder, such as the ripple carry adder (RCA) or carry-lookahead adder (CLA), to calculate the most significant bits (MSBs) with no computation error. The latter, in contrast, uses only an OR operation to approximately obtain the LSB summations. Furthermore, the output of an AND operation for the MSB input pair of the inaccurate part is utilized as a carry input to the accurate part to improve overall computation accuracy. Design variants based on the LOA have been proposed to further optimize the LOA, such as LOA without the AND-based carry prediction (LOAWA), optimized lower-part constant OR adder (OLOCA), hardware optimized and error reduced approximate adder (HOERAA), hardware optimized adder having a near-normal error distribution (HOAANED), and hybrid error reduction lower-part OR adder (HERLOA) [14]–[18]. The LOAWA is identical to the LOA, except for the AND-based carry prediction [14]. In other words, the carry input to the accurate part is fixed to a constant "0," which degrades accuracy but improves the computation speed. The OLOCA is also similar to the LOA in that the OR operation is utilized for the inaccurate part approximation, but it outputs a constant "1" to a few LSBs regardless of the corresponding bit inputs [15]. This also degrades accuracy a bit while reducing hardware cost. In addition to the OLOCA, the HOERAA uses the OR operation for two MSB input pairs of the inaccurate part and sets the remaining LSB outputs to a constant "1" regardless of the inputs [16]. For the MSB output of the inaccurate part, it uses a 2-to-1 multiplexer to select "0" or an OR operation output of the corresponding input pairs. The multiplexer output is then used in an OR operation with the AND gate output of the second MSB input pair of the inaccurate part. Also, it includes an AND-based carry prediction for the accurate part, which also serves as the selection input of the multiplexer. The HOAANED is derived from the HOERAA by including one additional OR gate at the MSB of the inaccurate part [17]. This OR gate contributes to the improvement of an error metric, and thus, the HOAANED produces outputs with almost normal error distributions. To enhance overall computation accuracy, the HERLOA combines the basic LOA structure with the hybrid error reduction scheme [18]. Figure 1 shows the architecture of the inaccurate part of the HERLOA. Note that the accurate part is the same as the LOA (i.e., precise adder). When the second MSB input pair of the inaccurate part is both "1," error reduction logic decreases the error distance (ED) by investigating the MSB input pair. The grayed gates in Figure 1 are the hybrid error reduction logic, while the others are the LOA logic. The error rate is reduced by replacing an OR gate at the MSB in the LOA with an XOR gate in the HERLOA.

The ETAI, like the LOA, divides an adder into two parts [19]. The inaccurate part of the ETAI utilizes its own modified XOR operation instead of the traditional


OR operation. Note that the ETAI also uses a precise adder for MSB additions. Furthermore, the absence of carry prediction for the accurate part is a key difference from the LOA; it reduces accuracy while improving the speed. The carry predicting ETA (CPETA) was presented to improve computation accuracy by adding a carry prediction scheme to the accurate part [20]. The CPETA adopts the AND-based carry prediction, which is the same as the LOA. Figure 2 shows a block diagram of the simplified ETA (SETA), which optimizes the ETAI's modified XOR operation to reduce hardware costs without a significant accuracy loss [21]. The modified XOR operation of the ETAI scans the input pairs of the inaccurate part from the MSB to the LSB to check whether both bits of each input pair are "1," whereas the SETA only checks a specific input pair to test if both bits are "1." This reduces hardware costs compared to the ETAI without significant accuracy degradation.

FIGURE 1. Architecture of inaccurate part of HERLOA [18].

FIGURE 2. Block diagram of SETA [21].

Different from the LOA, ETAI, and their variants that split an adder into two parts, other approximate adder structures have been presented in the literature as well. The reconfigurable approximate CLA (RAP-CLA) comprises several small blocks (i.e., windows) that overlap each other to predict the carry of each bit position [22]. In other words, each carry is speculated by a sub-block to reduce the critical path delay by cutting the long carry propagation path. The block-based carry speculative approximate adder (BCSA) includes a number of non-overlapped blocks, each of which consists of a sub-adder, a carry predict unit, a select unit, and a multiplexer [23]. Each block's carry is predicted by either the carry predict unit or the sub-adder and selected by the select unit. Additionally, the BCSA with its own error recover unit (BCSAERU) can improve the accuracy of the original BCSA without increasing the delay when an error occurs in a certain condition.

III. PROPOSED APPROXIMATE ADDER
This section presents our proposed FA cell-based approximate adder, which exploits a novel carry prediction scheme and a constant truncation technique to reduce the ED and improve overall computation accuracy. We call our adder the error reduced carry prediction approximate adder (ERCPAA). We denote two n-bit input operands and one n-bit output of the adder as $A_{n-1:0}$, $B_{n-1:0}$, and $S_{n-1:0}$, respectively. Also, $A_i$, $B_i$, and $S_i$ represent the (i)th LSBs of $A_{n-1:0}$, $B_{n-1:0}$, and $S_{n-1:0}$, respectively.

A. OVERALL ADDER ARCHITECTURE
Figure 3 shows the overall hardware architecture of the proposed approximate adder with n-bit inputs. An n-bit adder is divided into two parts: a k-bit accurate part and an (n − k)-bit inaccurate part, where k < n. The accurate part simply consists of a k-bit precise adder that produces an accurate output (i.e., $S_{n-1:n-k}$) from k MSB inputs (i.e., $A_{n-1:n-k}$ and $B_{n-1:n-k}$) and a carry input (i.e., $C_{in}$). The inaccurate part uses some of the remaining LSB inputs to generate an approximate output and a carry input to the precise adder. Note that the sizes of the accurate part and inaccurate part do not have to be equal, and the precise adder can be implemented as any type of traditional adder, such as RCA and CLA. The inaccurate part is further divided into three parts: an array of the proposed approximate FA cells, a carry prediction logic, and a constant truncation with error reduction logic. The proposed FA cell (see blue-highlighted box) simplifies the conventional single-bit FA cell to produce an approximate summation and an approximate carry, and is placed in some higher-order bit positions of the inaccurate part. The carry prediction logic, which is highlighted in green, generates the carry input to the precise adder. While most FA-based approximate adders employ an AND operation with the MSB inputs of the inaccurate part to produce the carry input, our prediction logic leverages the two most significant input pairs to improve carry prediction accuracy at the cost of two additional logic gates. The constant truncation with error reduction logic highlighted in red sets l LSB outputs (i.e., $S_{l-1:0}$) to either a constant "0" or "1," depending on input conditions, to reduce hardware costs. In other words, the l LSB inputs are not used to generate approximate summations. It also assigns the other output bits except for the MSB of the inaccurate part to a constant "0" to reduce the ED under certain input conditions. We will describe the condition that determines whether the output bits are fixed to "0" or "1" with illustrative examples in Section III-D.

B. PROPOSED APPROXIMATE FULL ADDER
An FA is the key building block for carry propagate adders (e.g., RCA). The traditional 1-bit FA adds two inputs, $A_i$ and $B_i$, as well as a carry from the previous bit position $C_{i-1}$ and


produces a sum $S_i$ and a carry output $C_i$ using

$$S_i = A_i \oplus B_i \oplus C_{i-1} \tag{1}$$
$$C_i = A_i B_i + A_i C_{i-1} + B_i C_{i-1} \tag{2}$$

FIGURE 3. Overall hardware architecture of the proposed approximate adder, termed error reduced carry prediction approximate adder (ERCPAA).

TABLE 1. Truth table for traditional FA and proposed approximate FAs.
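As a point of reference, Equations (1) and (2) can be chained bit by bit into a ripple carry adder. The Python sketch below is only a behavioral illustration under that reading, not the authors' Verilog implementation; the function names `full_adder` and `ripple_carry_add` are hypothetical.

```python
def full_adder(a: int, b: int, c_prev: int) -> tuple[int, int]:
    """Exact 1-bit full adder: Eq. (1) for the sum, Eq. (2) for the carry."""
    s = a ^ b ^ c_prev
    c = (a & b) | (a & c_prev) | (b & c_prev)
    return s, c


def ripple_carry_add(a: int, b: int, n: int, c_in: int = 0) -> tuple[int, int]:
    """Add two n-bit operands by chaining n full adders from LSB to MSB.

    Returns (n-bit sum, carry out). Each carry depends on the previous one,
    which is the propagation chain that approximate adders try to shorten.
    """
    carry = c_in
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    # 11 + 6 = 17, i.e. a 4-bit sum of 1 with a carry out of 1.
    print(ripple_carry_add(0b1011, 0b0110, 4))
```

The loop makes the dependency explicit: bit i cannot be resolved until the carry of bit i − 1 is known.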

Although the FA requires two XOR gates to generate a sum, we replace the XOR gates with OR counterparts to do the same approximately and reduce hardware overhead in our approximate FA. In addition, the FA generates a carry output $C_i$ from not only the two inputs, $A_i$ and $B_i$, but also the carry from the previous bit position $C_{i-1}$. In other words, the carry of the previous bit position can be propagated to the next bit position through the current FA, resulting in a long critical path delay and degraded hardware performance in the carry propagate adders. To reduce the critical path delay and hardware overhead, we remove the dependency on the carry from the previous bit position when generating the carry output in our FA. Thus, the Boolean equations of our approximate FA are given by

$$S_{i,ERCPAA} = A_i + B_i + C_{i-1} \tag{3}$$
$$C_{i,ERCPAA} = A_i B_i \tag{4}$$

Consequently, the approximate part using the proposed FA cell does not form a carry propagation chain from the lower to the higher-order bit positions, and thus the delay of the approximate part stays constant even when the approximate part becomes larger (i.e., k decreases under a given n). Note that the MSB position of the inaccurate part has a different configuration of the FA, which uses an XOR gate instead of the OR gate to generate the sum of the two input operands $A_i$ and $B_i$. This improves the overall computation accuracy since the XOR-based FA gives a more accurate sum, and it also allows the carry prediction logic to produce a more accurate carry input to the precise adder than the OR-based FA. Table 1 depicts the truth table of the traditional and proposed FAs. The proposed FAs introduce errors if exactly one of the operands is "1" and the carry of the previous bit position is "1." The OR-based FA causes an additional error at sum $S_i$ when both operands are "1" and the carry of the previous bit position is "0," since (3) then yields "1" whereas (1) yields "0."

C. PROPOSED CARRY PREDICTION TECHNIQUE
The accurate part can take a carry input generated from the inaccurate part to improve overall computation accuracy at the expense of a few logic gates [13], [15]–[18], [20]. The AND-based carry prediction scheme, which has an error rate of approximately 25%, is widely adopted since it is easily implemented by performing an AND operation with the inaccurate part's MSB inputs (i.e., $A_{n-k-1}$ AND $B_{n-k-1}$) to produce the carry input to the precise adder. In our proposed prediction, only two additional gates (i.e., an AND gate and an OR gate in the green highlighted box in Figure 3) are utilized to produce the carry input with twice the prediction accuracy of the conventional AND-based one. Also, the inputs of the MSB and its previous bit position of the inaccurate part (i.e., $A_{n-k-1:n-k-2}$ and $B_{n-k-1:n-k-2}$) are exploited to predict the carry input. Let $P_i$ denote the propagate signal of the (i)th bit position; the carry from the previous bit position $C_{i-1}$ is propagated to the carry output $C_i$ if the propagate signal is "1," defined as

$$P_i = A_i \oplus B_i \tag{5}$$


$$C_i = C_{i-1} \quad \text{if } P_i = 1 \tag{6}$$
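The cell equations (1)–(4) from Section III-B can be tabulated directly to see where the approximate cells deviate from the exact FA. The following Python sketch is a minimal illustration, not the authors' implementation. The function names are hypothetical, and the XOR-based cell is assumed to compute $(A_i \oplus B_i) + C_{i-1}$ with the carry of (4), which is one reading of the description of the MSB cell.

```python
from itertools import product


def exact_fa(a, b, c_prev):
    """Traditional FA, Eqs. (1) and (2)."""
    return a ^ b ^ c_prev, (a & b) | (a & c_prev) | (b & c_prev)


def or_fa(a, b, c_prev):
    """OR-based approximate FA, Eqs. (3) and (4); '+' denotes a logical OR."""
    return a | b | c_prev, a & b


def xor_fa(a, b, c_prev):
    """Assumed MSB variant: the first OR becomes an XOR, so P_i of Eq. (5) is shared."""
    return (a ^ b) | c_prev, a & b


def mismatching_rows(cell):
    """Input rows (A_i, B_i, C_{i-1}) where a cell differs from the exact FA."""
    return [row for row in product((0, 1), repeat=3) if cell(*row) != exact_fa(*row)]


if __name__ == "__main__":
    print("A B Cprev | exact (S, C) | OR-based | XOR-based")
    for row in product((0, 1), repeat=3):
        print(*row, "|", exact_fa(*row), "|", or_fa(*row), "|", xor_fa(*row))
    print("OR-based mismatches: ", mismatching_rows(or_fa))
    print("XOR-based mismatches:", mismatching_rows(xor_fa))
```

Under these assumptions the OR-based cell differs from the exact FA on three of the eight input rows and the XOR-based cell on two.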

Since our carry prediction scheme leverages the inputs of two bit positions, a carry can be generated from either the (n − k − 1)th or (n − k − 2)th bit position. If a carry is produced in the (n − k − 1)th bit position, the carry input $C_{in}$ is simply $C_{n-k-1}$. On the other hand, if a carry is generated in the (n − k − 2)th bit position, the carry $C_{n-k-2}$ should be propagated through the (n − k − 1)th bit position to pass it to the accurate part. Therefore, the carry input $C_{in}$ is derived by

$$C_{in} = C_{n-k-1} + P_{n-k-1} C_{n-k-2} \tag{7}$$

where $C_i$ is defined in (4). According to Equation (7), one XOR, three AND, and one OR gates are required to generate the carry input $C_{in}$. $C_{n-k-1}$ and $C_{n-k-2}$ can be obtained from the proposed FAs in the corresponding bit positions, and $P_{n-k-1}$ can also be calculated using the XOR gate of the FA in the MSB position of the inaccurate part. It is worth noting that one of the reasons to replace the OR with an XOR in the FA at the MSB is to generate the $P_{n-k-1}$ signal. Therefore, we only need two additional gates (see green box in Figure 3) to implement the proposed carry prediction logic.

FIGURE 4. Operations of the proposed adder; (a) constant "1" truncation and (b) constant "0" truncation with error reduction.
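The prediction of Equation (7) can be compared with the conventional AND-based prediction by exhaustively enumerating the inputs of the inaccurate part. The Python sketch below is a behavioral illustration only; the function names and the parameter `m` (standing for n − k, the width of the inaccurate part) are hypothetical, and uniformly distributed inputs are assumed.

```python
from fractions import Fraction


def true_carry(a: int, b: int, m: int) -> int:
    """Exact carry out of the m-bit inaccurate part (a, b < 2**m)."""
    return (a + b) >> m


def and_prediction(a: int, b: int, m: int) -> int:
    """Conventional prediction: AND of the MSB input pair."""
    return (a >> (m - 1)) & (b >> (m - 1)) & 1


def ercp_prediction(a: int, b: int, m: int) -> int:
    """Eq. (7): C_in = C_{m-1} + P_{m-1} C_{m-2}, with C_i = A_i B_i and P_i = A_i xor B_i."""
    a1, b1 = (a >> (m - 1)) & 1, (b >> (m - 1)) & 1
    a2, b2 = (a >> (m - 2)) & 1, (b >> (m - 2)) & 1
    return (a1 & b1) | ((a1 ^ b1) & (a2 & b2))


def error_rate(predict, m: int) -> Fraction:
    """Fraction of all 2**(2m) input pairs whose predicted carry is wrong."""
    wrong = sum(
        predict(a, b, m) != true_carry(a, b, m)
        for a in range(1 << m)
        for b in range(1 << m)
    )
    return Fraction(wrong, 1 << (2 * m))


if __name__ == "__main__":
    m = 8  # n - k for n = 16 and k = 8
    print("AND-based:", float(error_rate(and_prediction, m)))
    print("Eq. (7):  ", float(error_rate(ercp_prediction, m)))
```

For m = 8 the enumeration gives 127/512 (about 24.8%) for the AND-based prediction, in line with the roughly 25% quoted above, and 63/512 (about 12.3%) for Equation (7).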
Since our carry prediction is achieved using the inputs of the two MSB positions, it is correct when a carry is generated from any of these two bit positions. However, the carry prediction would be incorrect when a carry is produced from any lower-order bit position below the (n − k − 2)th bit position and this carry is propagated through the (n − k − 2)th and (n − k − 1)th bit positions. Assuming that the two operands A and B are bitwise independent, the propagate signal and carry are also bitwise independent. We denote the event that a carry is generated from the (n − k − 3)th or any of its lower-order bit positions by $E_{ca}$:

$$E_{ca} = C_{n-k-3} + P_{n-k-3} C_{n-k-4} + P_{n-k-3} P_{n-k-4} C_{n-k-5} + \cdots + \prod_{i=1}^{n-k-3} P_i C_0 \tag{8}$$

where $C_i$ and $P_i$ are defined in (4) and (5), respectively, and the probability of this event is given by

$$\begin{aligned}
P(E_{ca}) &= P(C_{n-k-3}) + P(P_{n-k-3} C_{n-k-4}) + P(P_{n-k-3} P_{n-k-4} C_{n-k-5}) + \cdots + P\Big(\prod_{i=1}^{n-k-3} P_i C_0\Big) \\
&= P(C_{n-k-3}) + P(P_{n-k-3}) P(C_{n-k-4}) + P(P_{n-k-3}) P(P_{n-k-4}) P(C_{n-k-5}) + \cdots + P(P_{n-k-3}) \cdots P(P_1) P(C_0) \\
&= \frac{1}{4} + \frac{1}{2} \cdot \frac{1}{4} + \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{4} + \cdots + \frac{1}{2^{n-k-3}} \cdot \frac{1}{4} \\
&= \frac{1}{2} \left( 1 - \frac{1}{2^{n-k-2}} \right)
\end{aligned} \tag{9}$$

where $C_{n-k-3}$, $P_{n-k-3} C_{n-k-4}$, $P_{n-k-3} P_{n-k-4} C_{n-k-5}$, $\cdots$, $\prod_{i=1}^{n-k-3} P_i C_0$ are mutually exclusive. Therefore, the error rate of the carry prediction of the proposed adder $ER_{CP}$ is given by

$$\begin{aligned}
ER_{CP}(n, k) &= P(P_{n-k-1} P_{n-k-2} E_{ca}) \\
&= P(P_{n-k-1}) P(P_{n-k-2}) P(E_{ca}) \\
&= \frac{1}{2^3} \left( 1 - \frac{1}{2^{n-k-2}} \right)
\end{aligned} \tag{10}$$

Note that $P_{n-k-1}$, $P_{n-k-2}$, and $E_{ca}$ are independent.

D. CONSTANT TRUNCATION WITH ERROR REDUCTION
The proposed adder outputs a constant to a few LSBs to reduce hardware overhead by sacrificing overall accuracy slightly, since the lower-order outputs have relatively less impact on the accuracy than higher-order outputs [15]–[17], [33]–[35]. Figure 4 exhibits an example of constant truncation operations with error reduction using the adder design parameters n = 16, k = 8, and l = 4. As shown in Figure 4(a), our adder sets the l LSB outputs to "1" regardless of the inputs of the corresponding bit positions. When a carry is generated from the (n − k − 2)th bit position and then propagated through the (n − k − 1)th bit position, our adder performs error reduction. In short, the reduction is performed when $P_{n-k-1} C_{n-k-2} = 1$. Under this input condition, the correct output of the (n − k − 1)th bit is "0"; however, our FA produces "1" as the output at this bit position as shown


in Figure 4(b). This means that the approximate summation will be larger than the correct one in this case. Hence, instead of forcing the l LSB outputs to "1," the proposed adder sets all outputs of the inaccurate part except for its MSB position to "0" (i.e., $S_{n-k-2:0} = 0$), making the approximate output closer to the correct addition. Under the given input shown in Figure 4(b), the ED, defined by $|S_{approximate} - S_{correct}|$ where $S_{approximate}$ and $S_{correct}$ are the approximate and correct summations, respectively, decreases from 211 to 84. This reduction technique allows up to a $2^{n-k-1} - 1$ decrease in the ED.

E. ERROR RATE ANALYSIS
The proposed adder generates an output error when the two input operands $A_i$ and $B_i$ of any bit position from the (n − k − 2)th to the (l)th LSB are both "1." In other words, if the inputs of at least one OR-based FA are both "1," an error occurs. According to Table 1, the OR-based FA produces an incorrect output at sum $S_i$ when $A_i = 1$ and $B_i = 1$, whereas the XOR-based counterpart does not. This input condition also generates a carry for the next bit position, which results in an output error at the sum of the next bit position. Furthermore, an error occurs when the two inputs of any bit position in the constant truncation part (i.e., the (l − 1)th to (0)th bit positions) are both "0" or both "1," because the part fixes the output to "1." In other words, the output of a constant truncation bit is correct only when exactly one of the two inputs is "1." To simplify the error rate analysis, we first calculate the probability of the input condition that makes the output of the adder correct, and then obtain the error rate as its complement. Since the proposed adder produces correct outputs when $A_i$ and $B_i$ are not both "1" for $l \le i \le n-k-2$ and $A_i \ne B_i$ for $0 \le i \le l-1$, we can define an event $E_{co}$ that the adder always generates correct outputs as follows:

$$E_{co} = \prod_{i=l}^{n-k-2} \overline{A_i B_i} \cdot \prod_{i=0}^{l-1} \left( A_i \overline{B_i} + \overline{A_i} B_i \right) \tag{11}$$

and the probability of this event is given by

$$\begin{aligned}
P(E_{co}) &= P\Big(\prod_{i=l}^{n-k-2} \overline{A_i B_i}\Big) P\Big(\prod_{i=0}^{l-1} \left( A_i \overline{B_i} + \overline{A_i} B_i \right)\Big) \\
&= P(\overline{A_{n-k-2} B_{n-k-2}}) \cdots P(\overline{A_l B_l}) \times P(A_{l-1} \overline{B_{l-1}} + \overline{A_{l-1}} B_{l-1}) \cdots P(A_0 \overline{B_0} + \overline{A_0} B_0) \\
&= \left( \frac{3}{4} \right)^{n-k-l-1} \left( \frac{1}{2} \right)^{l}
\end{aligned} \tag{12}$$

Note that we assumed that the two input operands A and B are bitwise independent. The error rate of the proposed adder is the probability of the complement of the event. Therefore, the error rate $ER_{ERCPAA}$ is given by

$$ER_{ERCPAA}(n, k, l) = 1 - P(E_{co}) = 1 - \left( \frac{3}{4} \right)^{n-k-l-1} \left( \frac{1}{2} \right)^{l} \tag{13}$$

IV. EXPERIMENTAL RESULTS AND COMPARISON
In this section, we evaluate the performance of the proposed adder in terms of both hardware costs and computation accuracy through systematic analysis. Also, an extensive comparison with other existing approximate adders is presented to demonstrate the potential benefits of the proposed adder.

A. EXPERIMENT SETUP AND EVALUATION
We designed our adder in Verilog HDL and synthesized it with the Synopsys 32-nm generic library (SAED32) using Synopsys Design Compiler to examine the hardware characteristics of the proposed approximate adder in terms of area, delay, power, and energy [36]–[38]. We implemented a 16-bit adder using an 8-bit RCA-based precise adder (i.e., n = 16 and k = 8). Prior studies suggested that a size of 7 to 9 bits for the inaccurate part would be appropriate to obtain a good tradeoff between output quality and power and energy saving for practical applications, such as video and image processing, and a 16-bit adder was widely adopted in these applications [7], [9], [23], [32], [39]. Therefore, we chose the adder design parameters of n = 16 and k = 8. In addition to the hardware cost, we also analyzed the error characteristics of the proposed adder by developing a software-based simulator. To exhaustively test a 16-bit adder, $2^{32}$ distinct input pairs would have to be considered, which is extremely expensive to compute. Therefore, we applied 10 million (i.e., $10^7$) uniformly distributed random input pairs to the proposed adder to obtain the error characteristics measured by various error metrics, such as the overall error rate, carry prediction error rate, mean error distance (MED), normalized mean error distance (NMED), and mean relative error distance (MRED).

B. TRADEOFF ANALYSIS OF THE PROPOSED ADDER
In our proposed design, the area, power, and energy performance degrade as the design parameter l decreases because a smaller l requires more logic gates to implement the adder. However, as l decreases, the overall computation accuracy improves. The power-NMED product was introduced to assess approximate adders considering the power and accuracy performance together [40]. Since this metric does not consider the area aspect, we can consider a new joint metric, the area-NMED product, to analyze the area and accuracy performance collectively. Similarly, the power-MRED and area-MRED products can be employed to jointly analyze the costs and accuracy.

To seek the best tradeoff between the hardware cost and accuracy of our adder, we adjusted design parameter l and obtained the power-NMED/MRED products and area-NMED/MRED products. It is noteworthy that the delay stays constant as l varies, and thus we exclude the delay from the tradeoff analysis. Figure 5 shows the tradeoff of the hardware costs and accuracy for the proposed adder with various values of l. We varied the design parameter l from 1 to 6 because our adder requires at least two FAs at the


FIGURE 5. Tradeoff analysis between hardware costs and accuracy for the
proposed 16-bit adder with various values of l , ranging from 1 to 6.
FIGURE 6. Comparison of carry prediction error rates of approximate
adders.
TABLE 2. Accuracy performance of the proposed adder under various
values of the design parameters.

(n − k − 1)th and (n − k − 2)th bit positions to produce the carry input (i.e., l = 6) and at least one constant truncation bit (i.e., l = 1). As expected, the power and area improve and the NMED worsens as l increases. Specifically, the power dissipations at l = 1 and l = 6 are 40.6 µW and 35.3 µW, respectively, and the area occupations are 150.4 µm² and 126.7 µm², respectively. In contrast, the NMED degrades from 0.864 × 10⁻³ at l = 1 to 0.974 × 10⁻³ at l = 6. To show the tradeoffs clearly, the product values are normalized using the corresponding value of the adder with l = 1. Note that the lower the product value is, the better the tradeoff between the hardware costs and accuracy. According to the power-NMED and area-NMED products in Figure 5, the proposed adder has the best tradeoff performance at l = 5, which means a 5-bit constant truncation. From the perspective of the power-MRED and area-MRED products, however, the best tradeoff is found at l = 4. While the power-/area-MRED products at l = 4 and l = 5 are almost the same, the difference between the NMED counterparts at l = 4 and l = 5 is relatively larger than that of the MRED counterparts. Therefore, we use our 16-bit adder design with a parameter of l = 5 for comparison with other adders (i.e., n = 16, k = 8, and l = 5).

C. ACCURACY OF THE PROPOSED ADDER WITH DIFFERENT PARAMETERS
To examine the accuracy performance of the proposed adder under different adder sizes and design parameters, we adjusted parameters k and l of the proposed 32-bit adder (i.e., n = 32). Table 2 lists the error rate, MED, and MRED of the proposed adder at various values of the parameters. Here, we gave the approximate part of the adder three non-constant bits according to the previous tradeoff analysis, and thus parameter l was set to n − k − 3. As the parameter k increases, in other words, as the size of the accurate part increases at a given n, the accuracy performance improves in terms of error rate, MED, and MRED, as expected. The error rate worsens drastically as k decreases and quickly reaches almost 100%. The MED and MRED values increase more than 15× and 11×, respectively, when the parameter k decreases by 4 at the given n = 32.

D. PERFORMANCE COMPARISON WITH EXISTING ADDERS
We also designed nine existing approximate adders that have similar architectures (LOA, LOAWA, OLOCA, HOERAA, HOAANED, HERLOA, ETAI, CPETA, and SETA) and an accurate adder (RCA) to compare them with our adder. To be fair, the adders were synthesized with the same 32-nm library using Synopsys Design Compiler, and 16-bit adders with an 8-bit RCA-based precise adder were implemented. For the OLOCA and SETA, design parameters of l and i were set to 6 and 7, respectively [15], [21]. Also, the error metrics were extracted under 10 million uniformly generated random input pairs. Furthermore, three more approximate adders that employ different architectures (RAP-CLA, BCSA, and BCSAERU) were designed using the identical design methodology and included in the comparison for completeness. The 16-bit adders with an RCA-based sub-adder and a block size of 4 were used for these adders [22], [23].
First, to demonstrate the superiority of our carry prediction technique, we compare the carry prediction error rates of the approximate adders, as shown in Figure 6. Note that the RAP-CLA, BCSA, and BCSAERU were excluded from this comparison because, owing to their different architecture, they do not have a carry into the precise adder. The absence of carry prediction in the LOAWA, ETAI, and SETA results in a nearly 50% carry prediction error. Note that the carry input to the precise adder is set to "0" in these adders. The LOA, OLOCA, HOERAA, HOAANED, HERLOA, and


TABLE 3. Performance summary of various adders.

CPETA include the AND-based carry speculation to the precise adder, which reduces the error rate to approximately 25%. The proposed prediction scheme is the most accurate among all adders and has an error rate of 12.305%, which is identical to the one calculated using Equation (10). Furthermore, our adder achieves error rate reductions of 75.3% and 50.4% on average compared with the adders without carry prediction and with the AND-based carry prediction, respectively.
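The roughly 50% and 25% carry prediction error rates quoted above can be reproduced with a short exhaustive check. The sketch below is not the authors' evaluation flow: it assumes an 8-bit approximate part (n = 16, k = 8) with uniformly distributed inputs, models the adders without prediction as having the carry into the precise part tied to 0, and models AND-based speculation as the AND of the two most significant bits of the approximate part, as in the LOA. The function name is illustrative only. The proposed scheme itself is not modeled; its 12.305% rate is simply taken from the text.

```python
# Exhaustive carry prediction error rates for an m-bit approximate part.
# Assumptions (not taken from the paper's code): uniform inputs, "no
# prediction" ties the carry into the precise part to 0, and AND-based
# speculation uses the AND of the two MSBs of the approximate part.

def carry_prediction_error_rates(m=8):
    total = 1 << (2 * m)
    no_pred_errors = 0
    and_pred_errors = 0
    for a in range(1 << m):
        for b in range(1 << m):
            true_carry = (a + b) >> m                      # exact carry out of the lower m bits
            and_carry = (a >> (m - 1)) & (b >> (m - 1))    # AND of the two MSBs
            no_pred_errors += (true_carry != 0)
            and_pred_errors += (true_carry != and_carry)
    return no_pred_errors / total, and_pred_errors / total


no_pred, and_pred = carry_prediction_error_rates(8)
proposed = 0.12305  # rate reported in the text, not recomputed here

print(f"no prediction : {no_pred:.4%}")
print(f"AND-based     : {and_pred:.4%}")
print(f"reduction vs. no prediction: {1 - proposed / no_pred:.1%}")
print(f"reduction vs. AND-based    : {1 - proposed / and_pred:.1%}")
```

Under these assumptions the two exhaustive rates come out at about 49.8% and 24.8%, and the reductions relative to the reported 12.305% round to 75.3% and 50.4%, in line with the figures in the text.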
FIGURE 7. Comparison of mean relative error distances (MREDs) of approximate adders.

Table 3 summarizes the hardware costs and accuracy performance of various adders. The RCA is the slowest adder because of the long carry propagation chain from the LSB to MSB. The longest delay results in the largest energy (i.e., power-delay product; PDP) consumption, although the BCSAERU dissipates the largest power. The RAP-CLA is the fastest thanks to the relatively shorter carry chain generated by the blocks, but it occupies the largest area because a significant number of blocks is required to predict the carry for each bit position. Although the RAP-CLA consumes the second largest power, its energy consumption is relatively small and similar to that of the LOA, ETAI, their variants, and the proposed adder ERCPAA. The BCSA and BCSAERU have almost the same delay, but the BCSAERU has slightly larger area, power, and energy consumption than the BCSA while it shows slightly better accuracy performance. The adders that fix some LSB outputs to "1" and have AND-based carry prediction (i.e., OLOCA, HOERAA, and HOAANED) show similar hardware characteristics. Furthermore, the HOERAA and HOAANED are almost the same in both hardware and accuracy performance. The lack of a carry prediction to the accurate part allows the corresponding adders (i.e., LOAWA, ETAI, and SETA) to be the fastest among the FA-based adders; however, it results in poor MED and NMED performance compared to other FA-based approximate adders. In terms of area and power, the proposed adder ERCPAA is comparable to the HERLOA. It has the longest delay among the approximate adders due to the proposed carry prediction scheme, which causes a relatively larger energy consumption, but it still has 3.12× higher energy efficiency than the RCA. The LOA, LOAWA, ETAI, and SETA have the same error rate of 90.0%, but the LOA has at least 61% better MED and NMED performance than the others. The LOA variants that force a few LSB outputs to "1" (i.e., OLOCA, HOERAA, and HOAANED) degrade the error rate compared to the LOA, to as much as 99%, while maintaining a similar MED and NMED performance. The RAP-CLA, BCSA, and BCSAERU show very good error rates of less than 22% but relatively poor MED and NMED performance compared with the others. These adders can cause computation errors in the higher-order bit positions, whereas the errors of the other FA-based approximate adders are concentrated in the lower-order bit positions (i.e., the approximate part). Although the BCSAERU has the lowest error rate among the approximate adders, the proposed adder has the best MED and NMED performance. Specifically, the proposed adder shows 4.09× and 4.1× better MED and NMED than the BCSAERU, respectively.
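The 90.0% error rate quoted for the LOA follows directly from its lower-part OR approximation. As a minimal sketch under stated assumptions, the code below enumerates every input pair for a standard LOA model: the lower 8 bits are combined with a bitwise OR, the carry into the precise part is the AND of the two lower-part MSBs, and the upper part is added exactly, so the error distance depends only on the lower 8 bits of each operand. The function names are hypothetical, and this is not the paper's simulation, which sampled 10 million random input pairs.

```python
# Exhaustive error rate (ER) and mean error distance (MED) for a lower-part
# OR adder (LOA) with an m-bit approximate part. Assumption: the upper part
# is added exactly, so only the lower m bits of each operand matter.

def loa_lower(a, b, m):
    """Approximate sum of the lower m bits, including the speculated carry."""
    or_bits = a | b                               # OR replaces the lower-part full adders
    carry = (a >> (m - 1)) & (b >> (m - 1))       # AND-based carry into the precise part
    return or_bits + (carry << m)


def loa_error_metrics(m=8):
    total = 1 << (2 * m)
    errors = 0
    ed_sum = 0
    for a in range(1 << m):
        for b in range(1 << m):
            ed = abs(loa_lower(a, b, m) - (a + b))  # error distance for this pair
            errors += (ed != 0)
            ed_sum += ed
    return errors / total, ed_sum / total


er, med = loa_error_metrics(8)
print(f"LOA error rate: {er:.4f}")   # closed form: 1 - (3/4)**8
print(f"LOA MED       : {med:.3f}")
```

In this model the result is exact only when no bit position of the lower part has both operand bits set, which gives an error rate of 1 − (3/4)^8 ≈ 0.8999 and matches the 90.0% reported for the LOA. The MED obtained this way is an exhaustive average and need not agree with the sampled value in Table 3 to the last digit.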


FIGURE 8. Normalized power-normalized mean error distance (NMED), energy-NMED, and area-delay product (ADP)-NMED products of
approximate adders.

Figure 7 shows the MREDs of the approximate adders. For ease of comparison, the MRED values were normalized using the corresponding values of the LOA. The MREDs of the LOAWA, OLOCA, ETAI, and SETA are slightly greater than that of the LOA. In particular, the MRED values of the RAP-CLA, BCSA, and BCSAERU far exceed those of the others, and their values were inserted outside the bars. Specifically, the RAP-CLA exhibits the worst MRED performance, which is 52.02× greater than the LOA. The HOERAA and HOAANED have almost identical MRED values, which are 27.5% less than the LOA on average. In terms of MRED performance, the proposed adder is comparable to the HERLOA and CPETA. Specifically, the proposed adder reduces the MRED by 41.2% and 98.9% compared to the LOA and RAP-CLA, respectively.

E. JOINT ANALYSIS BETWEEN HARDWARE AND ACCURACY OF APPROXIMATE ADDERS
The error rate is an important metric to assess the accuracy of approximate adders. Unfortunately, its usefulness in evaluating an adder might be limited because it only considers the presence of an error but not the implication (e.g., distance/magnitude) of the error on the additions [40]. Hence, we adopted ED-based metrics, such as NMED and MRED, rather than the error rate to better represent the accuracy of the adders in the joint analysis. The power-NMED product is widely used to evaluate approximate adders in terms of power and accuracy jointly [40]. Similarly, the energy-NMED product was considered to analyze the energy aspect [18]. Unfortunately, neither of these two products includes the area or delay of approximate adders. The area-delay product (ADP) is a widely employed metric to evaluate hardware resources in terms of area and delay [15]. Therefore, we can consider a new joint metric, the ADP-NMED product, to analyze the tradeoff between area, delay, and accuracy.
Figure 8 exhibits the power-NMED, energy-NMED, and ADP-NMED products for 13 approximate adders. To compare these products effectively among the adders, they were normalized by the corresponding LOA values, and the values were inserted outside the bars. The proposed adder outperforms the other approximate adders in all these joint metrics. The RAP-CLA, BCSA, and BCSAERU show very poor tradeoff performance, and their three product values far exceed those of the other adders because, although they are a bit faster, they consume a larger area, power, and energy than the other adders, as shown in Table 3. Also, they exhibit relatively worse accuracy, which deteriorates the tradeoff performance. Among the FA-based approximate adders, the ETAI has the worst tradeoff performance, and the approximate adders that exclude the carry prediction (i.e., LOAWA and SETA) have values similar to those of the ETAI. Although the lack of carry prediction allows these adders to be relatively efficient in terms of area, delay, and power, the poor accuracy degrades the overall tradeoff performance so that their three product values are at least 50% higher than those of the LOA. The OLOCA and CPETA have similar power-NMED and energy-NMED products, which are slightly better than the LOA. In addition, the HOERAA and HOAANED are comparable in all three products because of almost identical hardware architecture. The HERLOA has nearly the same power-NMED and energy-NMED products as the HOERAA and HOAANED. However, the larger area occupation that stems from the hybrid error reduction scheme results in a higher ADP-NMED product. In summary, our adder has the best tradeoff performance among the compared approximate adders. Specifically, the power-NMED, energy-NMED, and ADP-NMED products of the proposed adder are 95.7%, 91.1%, and 93.2% lower than those of the RAP-CLA, respectively.
Similar to the joint metrics using NMED, we can take into account the metrics using MRED as well. Figure 9 shows the power-MRED, energy-MRED, and ADP-MRED products for the approximate adders. The values that were added outside the bars were normalized by the corresponding LOA values, and the three products using MRED exhibit


FIGURE 9. Normalized power-mean relative error distance (MRED), energy-MRED, and area-delay product (ADP)-MRED products of approximate adders.

similar trends to those using NMED. The RAP-CLA, BCSA, and BCSAERU have significantly larger product values than the others. Specifically, the power-MRED, energy-MRED, and ADP-MRED products of these adders are at least 30×, 23×, and 25× greater than those of the LOA, respectively. Among the FA-based adders, the LOAWA shows the worst performance in the products using MRED, and the values of the ETAI and SETA are close to those of the LOAWA. The LOA and OLOCA have similar tradeoff performance, with a gap of less than 7% between their product values. The proposed adder demonstrates excellent tradeoff performance and is comparable to the HOERAA, HOAANED, HERLOA, and CPETA in all three products. In particular, our adder achieves reductions in the power-MRED, energy-MRED, and ADP-MRED products of 99.4%, 98.9%, and 99.1%, respectively, relative to the RAP-CLA.
Finally, to evaluate the approximate adders in terms of the various hardware costs together with the accuracy performance (i.e., NMED), we define the following product as a figure of merit (FoM) for the approximate adders.

FoM = Energy × Delay × Area × NMED (14)

In (14), better energy efficiency, higher speed, and smaller area, together with good accuracy performance in the error distance, result in a smaller value for this FoM. Figure 10 exhibits the FoM of the approximate adders, and the values were normalized by the corresponding values of the LOA. Note that the lower the FoM value is, the better the approximate adder performance. The LOA, HERLOA, and CPETA have similar FoM values, and so do the LOAWA, ETAI, and SETA. Unfortunately, the FoMs of the RAP-CLA, BCSA, and BCSAERU are much greater than those of the other FA-based approximate adders, and the numbers outside the bars indicate their FoM values. Specifically, the normalized FoM of these adders exceeds 10 because the poor NMED performance severely deteriorates the FoM, even though they are relatively faster than the others. The proposed adder ERCPAA is similar to but shows better FoM performance than the OLOCA, HOERAA, and HOAANED. The excellent design tradeoff between hardware costs and accuracy (i.e., NMED) allows our design to be the most competitive adder among the approximate adders considered here. In particular, the FoM of our adder is 93.05% smaller than that of the RAP-CLA.

FIGURE 10. Normalized figure of merit (FoM) of approximate adders; energy-delay-area-normalized mean error distance (NMED) product.

V. APPLICATIONS OF APPROXIMATE ADDERS
Approximate adders can be utilized in many error-tolerant applications. To examine the effectiveness of the proposed adder in practical applications, we adopted our adder and existing approximate adders in a couple of applications and compared their performance.

A. DIGITAL IMAGE PROCESSING
First, the approximate adders were applied to digital image processing. In particular, we considered Gaussian smoothing filtering, which is achieved by a 2-D convolution of an image and a Gaussian kernel, and used the peak signal-to-noise ratio (PSNR) to measure the output image quality. The


FIGURE 11. Original input image and output images with PSNRs of Gaussian smoothing filtering using an accurate adder, the existing
approximate adders, and the proposed adder.

TABLE 4. PSNRs of various images by Gaussian smoothing filtering using the approximate adders.

following 5 × 5 Gaussian kernel G is used for filtering [39].

$$G = \frac{1}{2^{8}} \begin{bmatrix} 1 & 3 & 6 & 3 & 1 \\ 3 & 15 & 25 & 15 & 3 \\ 6 & 25 & 41 & 25 & 6 \\ 3 & 15 & 25 & 15 & 3 \\ 1 & 3 & 6 & 3 & 1 \end{bmatrix} \qquad (15)$$

For the Gaussian smoothing operation, the addition was performed using an accurate adder as well as the proposed and existing approximate adders, whereas multiplication and division were performed accurately. Additionally, since Gaussian smoothing filtering is useful to reduce image noise, we added zero-mean, Gaussian white noise with a variance of 0.01 to the original lena image, which is a grayscale image


FIGURE 12. Original data and clustered data with WCSS by k-means clustering using various adders.


with a size of 512 × 512, and then performed filtering [41]. We employed an accurate adder (RCA), the proposed adder, and 12 existing approximate adders in the filtering. The PSNR values were calculated against the images obtained by applying Gaussian filtering to the original input image using the accurate adder. First, the approximate adders with design parameters of n = 16 and k = 8 were applied to the filtering, and we found that all approximate adders, except for the RAP-CLA, BCSA, and BCSAERU whose block sizes were set to 4, produce visually very similar output images, although our adder generates the best image quality with the highest PSNR. Therefore, to make the output images more visually distinguishable, we reduced the size of the accurate part to 3 and the block size of the approximate adders by half.
Figure 11 shows the original noisy image and output images of Gaussian smoothing filtering using various adders. The BCSA shows the worst PSNR value of 8.20 dB among the images. The output images processed by the LOAWA, ETAI, and SETA share an identical PSNR value of 8.25 dB. Similarly, the LOA and OLOCA generate the same output image quality. The PSNRs of images with the HOERAA, CPETA, and BCSAERU range from 9.83 dB to 10.93 dB. In other words, the image quality processed by these adders is between those processed by the LOA/OLOCA and LOAWA/ETAI/SETA. The HOAANED, HERLOA, and RAP-CLA yield slightly better output images than the LOA/OLOCA. The proposed adder produces the best image quality, which is clearly visible to the human eye, with a PSNR value of 20.84 dB, which means that the filtered image is the closest to the one generated by the accurate adder. This confirms that the approximation errors of the proposed adder have a negligible impact on the processing quality and thus, it is suitable for digital image processing applications. To further examine the approximate adders in the application, we performed the Gaussian smoothing filtering for eight more well-known benchmark images (cameraman, peppers, baboon, F-16, couple, fishing boat, clock, and airplane) obtained from [42]. Note that the same white noise was added to these images. The PSNRs of the filtered output images generated by the approximate adders are listed in Table 4. All images exhibit a PSNR trend similar to that of the lena image. Our ERCPAA achieves the best PSNR value for all benchmark images among the approximate adders in the Gaussian smoothing filtering application.

B. MACHINE LEARNING
In addition to the filtering application, we also took machine learning into consideration to explore the efficacy of the proposed adder. Specifically, we examined the performance of the approximate adders in k-means clustering, which is an unsupervised machine learning algorithm and extensively utilized in data mining [43]. Basically, the algorithm groups a set of unlabeled data points into k different clusters such that each data point belongs to only one cluster. When clustering, it minimizes the sum of distances between the data points and the centroids of the corresponding clusters, which is defined by the within-cluster sum of squares (WCSS). Therefore, it iteratively calculates the distances, for which the subtraction operation is mainly used. We applied the approximate adders to this operation [28]. Note that the subtraction can be done by 2's complement addition. We obtained an unlabeled dataset comprising 1000 data points from [44] and set the number of clusters k to 5.
Figure 12 demonstrates the visualized 2-D original dataset and clustered dataset using the accurate adder, the existing approximate adders, and the proposed adder. The WCSS values were extracted to evaluate the quality of the clustering results using the different adders [28]. A value closer to the one clustered by the accurate adder indicates a better clustering result. The LOAWA, ETAI, and SETA show a similar clustering result, and so do the LOA and OLOCA. The LOA/OLOCA produce much better clustering quality than the LOAWA/ETAI/SETA because the latter do not include any carry prediction logic to the precise adder, which degrades computation accuracy. The proposed approximate adder exhibits the best clustering result, closest to the one using the accurate adder. The HOERAA, HOAANED, HERLOA, and CPETA yield slightly worse results than the proposed adder. Unfortunately, the RAP-CLA, BCSA, and BCSAERU show poor clustering performance and do not allow the dataset to be partitioned properly. Specifically, the WCSS values of these adders are up to 384% and 378% larger than those of the accurate and proposed adders, respectively. In summary, the proposed adder has the best performance in terms of WCSS in k-means clustering as well.

VI. CONCLUSION
In this paper, we presented a new approximate adder that combines error-reduced carry prediction and constant truncation with error reduction schemes. The proposed carry prediction scheme achieves an error rate reduction of up to 75% compared to the existing approximate adders, and the proposed error reduction technique improves the overall computation accuracy by decreasing the error distance. We systematically analyzed our design and sought the best tradeoff between hardware costs and accuracy by adjusting the adder design parameter. When implemented in the 32-nm CMOS technology, the proposed design has 1.90× and 3.12× greater power and energy efficiency, respectively, than the RCA, with NMED and MRED improvements of up to 91.4% and 98.9%, respectively, compared to the existing approximate adders. Importantly, our design achieves 95.7%, 91.1%, and 93.2% reductions in the power-NMED, energy-NMED, and ADP-NMED products, respectively, compared to the RAP-CLA due to an excellent design tradeoff. Our adder also reduces the power-, energy-, and ADP-MRED products by up to 99.4% compared to the others. In particular, in terms of the FoM considering hardware resources (i.e., energy, delay, and area) and the accuracy performance


(i.e., NMED), the proposed adder is up to 93.05% better than the RAP-CLA. The proposed adder has been adopted in a digital image processing application, where it barely affects the output image quality and produces the result closest to that of the accurate adder. Additionally, we have demonstrated the performance of our adder in a machine learning application, and the result has shown that the proposed adder outperforms the other approximate adders. Therefore, the proposed adder is well suited to energy-efficient and error-tolerant applications, such as machine learning, neuromorphic computing, and digital signal processing.

ACKNOWLEDGMENT
(Jungwon Lee and Hyoju Seo contributed equally to this work.)

REFERENCES
[1] L. Jiao and J. Zhao, "A survey on the new generation of deep learning in image processing," IEEE Access, vol. 7, pp. 172231–172263, 2019.
[2] C. Lammie, A. Olsen, T. Carrick, and M. R. Azghadi, "Low-power and high-speed deep FPGA inference engines for weed classification at the edge," IEEE Access, vol. 7, pp. 51171–51184, 2019.
[3] Y. Kim, Y. Zhang, and P. Li, "A reconfigurable digital neuromorphic processor with memristive synaptic crossbar for cognitive computing," ACM J. Emerg. Technol. Comput. Syst., vol. 11, no. 4, pp. 38:1–38:25, Apr. 2015.
[4] B. Liu, Z. Wang, W. Zhu, Y. Sun, Z. Shen, L. Huang, Y. Li, Y. Gong, and W. Ge, "An ultra-low power always-on keyword spotting accelerator using quantized convolutional neural network and voltage-domain analog switching network-based approximate computing," IEEE Access, vol. 7, pp. 186456–186469, 2019.
[5] I. Khan, S. Choi, and Y.-W. Kwon, "Earthquake detection in a static and dynamic environment using supervised machine learning and a novel feature extraction method," Sensors, vol. 20, no. 3, p. 800, Feb. 2020.
[6] Q. Wang, P. Li, and Y. Kim, "A parallel digital VLSI architecture for integrated support vector machine training and classification," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 8, pp. 1471–1484, Aug. 2015.
[7] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-power digital signal processing using approximate adders," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 32, no. 1, pp. 124–137, Jan. 2013.
[8] Y. S. Yang and Y. Kim, "Approximate digital leaky Integrate-and-fire neurons for energy efficient spiking neural networks," IEIE Trans. Smart Process. Comput., vol. 9, no. 3, pp. 252–259, Jun. 2020.
[9] A. Raha, H. Jayakumar, and V. Raghunathan, "Input-based dynamic reconfiguration of approximate arithmetic units for video encoding," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 3, pp. 846–857, Mar. 2016.
[10] T. Moreau, A. Sampson, and L. Ceze, "Approximate computing: Making mobile systems more efficient," IEEE Pervasive Comput., vol. 14, no. 2, pp. 9–13, Apr. 2015.
[11] S. Mittal, "A survey of techniques for approximate computing," ACM Comput. Surv., vol. 48, no. 4, pp. 1–33, May 2016.
[12] Q. Xu, M. Todd, and S. K. Nam, "Approximate computing: A survey," IEEE Design Test, vol. 33, no. 1, pp. 8–22, Feb. 2016.
[13] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, "Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 4, pp. 850–862, Apr. 2010.
[14] P. Albicocco, G. C. Cardarilli, A. Nannarelli, M. Petricca, and M. Re, "Imprecise arithmetic for low power image processing," in Proc. Conf. Rec. 46th Asilomar Conf. Signals, Syst. Comput. (ASILOMAR), Nov. 2012, pp. 983–987.
[15] A. Dalloo, A. Najafi, and A. Garcia-Ortiz, "Systematic design of an approximate adder: The optimized lower part constant-OR adder," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 8, pp. 1595–1599, Aug. 2018.
[16] P. Balasubramanian and D. L. Maskell, "Hardware optimized and error reduced approximate adder," Electronics, vol. 8, no. 11, p. 1212, Oct. 2019.
[17] P. Balasubramanian, R. Nayar, D. L. Maskell, and N. E. Mastorakis, "An approximate adder with a near-normal error distribution: Design, error analysis and practical application," IEEE Access, vol. 9, pp. 4518–4530, 2021.
[18] H. Seo, Y. S. Yang, and Y. Kim, "Design and analysis of an approximate adder with hybrid error reduction," Electronics, vol. 9, no. 3, p. 471, Mar. 2020.
[19] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, "Design of low-power high-speed truncation-error-tolerant adder and its application in digital signal processing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 8, pp. 1225–1229, Aug. 2010.
[20] Y. Kim, "An accuracy enhanced error tolerant adder with carry prediction for approximate computing," IEIE Trans. Smart Process. Comput., vol. 8, no. 4, pp. 324–330, Aug. 2019.
[21] J. Lee, H. Seo, Y. Kim, and Y. Kim, "Approximate adder design with simplified lower-part approximation," IEICE Electron. Exp., vol. 17, no. 15, pp. 1–3, Aug. 2020.
[22] O. Akbari, M. Kamal, A. Afzali-Kusha, and M. Pedram, "RAP-CLA: A reconfigurable approximate carry look-ahead adder," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 65, no. 8, pp. 1089–1093, Aug. 2018.
[23] F. Ebrahimi-Azandaryani, O. Akbari, M. Kamal, A. Afzali-Kusha, and M. Pedram, "Block-based carry speculative approximate adder for energy-efficient applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 67, no. 1, pp. 137–141, Jan. 2020.
[24] Y. Kim, Y. Zhang, and P. Li, "An energy efficient approximate adder with carry skip for error resilient neuromorphic VLSI systems," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2013, pp. 130–137.
[25] Y. Kim, Y. Zhang, and P. Li, "Energy efficient approximate arithmetic for error resilient neuromorphic computing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 11, pp. 2733–2737, Nov. 2015.
[26] A. B. Kahng and S. Kang, "Accuracy-configurable adder for approximate arithmetic designs," in Proc. 49th Annu. Design Autom. Conf. (DAC), 2012, pp. 820–825.
[27] M. Shafique, W. Ahmad, R. Hafiz, and J. Henkel, "A low latency generic accuracy configurable adder," in Proc. 52nd Annu. Design Autom. Conf., Jun. 2015, pp. 81:1–81:6.
[28] J. Hu, Z. Li, M. Yang, Z. Huang, and W. Qian, "A high-accuracy approximate adder with correct sign calculation," Integration, vol. 65, pp. 370–388, Mar. 2019.
[29] V. Camus, M. Cacciotti, J. Schlachter, and C. Enz, "Design of approximate circuits by fabrication of false timing paths: The carry cut-back adder," IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 8, no. 4, pp. 746–757, Dec. 2018.
[30] M. Pashaeifar, M. Kamal, A. Afzali-Kusha, and M. Pedram, "Approximate reverse carry propagate adder for energy-efficient DSP applications," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 11, pp. 2530–2541, Nov. 2018.
[31] N.-C. Huang, S.-Y. Chen, and K.-C. Wu, "Sensor-based approximate adder design for accelerating error-tolerant and deep-learning applications," in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2019, pp. 692–697.
[32] L. Soares, M. da Rosa, C. Machado, E. da Costa, and S. Bampi, "Design methodology to explore hybrid approximate adders for energy-efficient image and video processing accelerators," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 66, no. 6, pp. 2137–2150, Jun. 2019.
[33] H. Seo and Y. Kim, "A new approximate adder with duplicate-constant scheme for energy efficient applications," in Proc. IEEE Int. Conf. Consum. Electron. Asia (ICCE-Asia), Nov. 2020, pp. 1–2.
[34] F. Frustaci, S. Perri, P. Corsonello, and M. Alioto, "Energy-quality scalable adders based on nonzeroing bit truncation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 4, pp. 964–968, Apr. 2019.
[35] H. Seo, Y. S. Yang, and Y. Kim, "An energy-efficient imprecise adder with a lower-part constant approximation," in Proc. Int. SoC Design Conf. (ISOCC), Oct. 2020, pp. 143–144.
[36] H. Bhatnagar, Advanced ASIC Chip Synthesis: Using Synopsys Design Compiler Physical Compiler and Prime Time. Norwell, MA, USA: Kluwer, 2002.


[37] R. Goldman, K. Bartleson, T. Wood, K. Kranen, V. Melikyan, and E. Babayan, "32/28 nm educational design kit: Capabilities, deployment and future," in Proc. IEEE Asia Pacific Conf. Postgraduate Res. Microelectron. Electron. (PrimeAsia), Dec. 2013, pp. 284–288.
[38] (Jan. 2012). Synopsys Digital Standard Cell Library SAED_EDK32/28_CORE Databook Revision 1.0.0. Accessed: Jul. 27, 2021. [Online]. Available: https://www.synopsys.com/community/university-program/teaching-resources.html
[39] B. Garg and S. K. Patel, "Reconfigurable carry look-ahead adder trading accuracy for energy efficiency," J. Signal Process. Syst., vol. 93, no. 1, pp. 99–111, Jan. 2021.
[40] J. Liang, J. Han, and F. Lombardi, "New metrics for the reliability of approximate and probabilistic adders," IEEE Trans. Comput., vol. 62, no. 9, pp. 1760–1771, Sep. 2013.
[41] M. Masadeh, O. Hasan, and S. Tahar, "Input-conscious approximate multiply-accumulate (MAC) unit for energy-efficiency," IEEE Access, vol. 7, pp. 147129–147142, 2019.
[42] The USC-SIPI Image Database. Accessed: Jul. 27, 2021. [Online]. Available: http://sipi.usc.edu/database/database.php
[43] K. P. Sinaga and M.-S. Yang, "Unsupervised K-means clustering algorithm," IEEE Access, vol. 8, pp. 80716–80727, 2020.
[44] Clustering Benchmark. Accessed: Jul. 27, 2021. [Online]. Available: http://github.com/deric/clustering-benchmark

JUNGWON LEE (Graduate Student Member, IEEE) is currently pursuing the integrated B.S. and M.S. degrees with the School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea. Her research interests include deep learning and approximate arithmetic.

HYOJU SEO (Graduate Student Member, IEEE) received the B.S. degree from the School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea, in 2020, where she is currently pursuing the M.S. degree. Her research interests include approximate computing, neuromorphic computing, deep learning accelerator, and image processing.

HYELIN SEOK (Student Member, IEEE) is currently pursuing the integrated B.S. and M.S. degrees with the School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea. Her research interests include approximate arithmetic and new computing systems.

YONGTAE KIM (Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, Republic of Korea, in 2007 and 2009, respectively, and the Ph.D. degree from the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA, in 2013. From 2013 to 2018, he was a Software Engineer with Intel Corporation, Santa Clara, CA, USA. Since 2018, he has been with the School of Computer Science and Engineering, Kyungpook National University, Daegu, South Korea, where he is currently an Assistant Professor. His research interests include energy efficient integrated circuits and systems, particularly, neuromorphic computing and approximate computing, and new memory devices and architectures.
