A Novel Approximate Adder Design Using Error Reduced Carry Prediction and Constant Truncation
ABSTRACT This paper proposes a novel approximate adder that exploits an error-reduced carry prediction
and constant truncation with error reduction schemes. The proposed adder design techniques significantly
improve overall computation accuracy while providing excellent hardware efficiency. Particularly, the pro-
posed carry prediction technique can reduce a prediction error rate by up to 75% compared to existing
approximate adders considered in this paper. Furthermore, the error reduction technique also enhances the
overall computation accuracy by decreasing the error distance (ED). Our experimental results show that
the proposed adder improves the normalized mean ED (NMED) and mean relative ED (MRED) by up
to 91.4% and 98.9%, respectively, compared to the other approximate adders. Importantly, an excellent
design tradeoff allows the proposed adder to be the most competitive of the adders under consideration.
Specifically, the proposed adder achieves up to 95.7%, 91.1%, and 93.2% reductions of the power-NMED,
energy-NMED, and area-delay product (ADP)-NMED products, respectively, compared to the other adders.
Our adder enhances the power-, energy-, and ADP-MRED products by up to 99.4% compared to the others.
In particular, the figure of merit (FoM) considering both hardware and accuracy of the proposed adder is up
to 93.05% smaller than that of the other approximate adders considered herein. Furthermore, we confirm
that the approximation errors caused by the proposed adder have very little impact on output quality when
adopted in practical applications, such as digital image processing and machine learning.
INDEX TERMS Approximate adder, approximate computing, carry prediction, constant truncation, error
reduction.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021 119939
J. Lee et al.: Novel Approximate Adder Design
chain to achieve faster additions. However, it requires more area and power than FA-based approximate
adders. FA-based adders use approximate 1-bit FAs to add some lower-order input bits approximately by
replacing accurate FAs with approximate ones in the corresponding bit positions [13]–[21]. This improves
the area and power performance at the expense of computation accuracy.
In this paper, we propose a new approximate adder design based on new approximate FA cells, enhanced
carry prediction, and a constant truncation with error reduction. The proposed carry prediction scheme
reduces the prediction error rate by up to 75% compared to the existing approximate adders considered
here. Also, the truncation with error reduction logic enhances the overall computation accuracy while
reducing energy and power consumption. When implemented in a 32-nm CMOS technology, the proposed adder
is 1.49×, 1.90×, and 3.12× more area-, power-, and energy-efficient, respectively, than a traditional
adder. Furthermore, compared to existing approximate adders, our adder improves overall computation
accuracy by up to 98.9%. When jointly analyzing the adders in terms of hardware and accuracy, the
proposed design is the most competitive among the adders considered.
In summary, this paper makes the following key contributions in designing approximate adders:
• We present a novel efficient approximate adder design that effectively trades off between hardware
cost and computation accuracy through systematic analysis, and show that our design outperforms the
others by extensively comparing it with 12 approximate adders.
• We propose 1) a new carry prediction scheme that reduces the prediction error rate by up to 75%
compared to the others, 2) approximate FA cells that improve accuracy, and 3) a constant truncation
with an error reduction scheme that reduces hardware cost while offering good accuracy performance.
The remainder of this paper is organized as follows. Section II provides a brief review of existing
approximate adders. In Section III, we present the proposed adder, which consists of our proposed
approximate FAs, novel carry prediction, and constant truncation with error reduction. Illustrated
examples of the adder operation and mathematical analysis of the carry prediction error rate and
overall error rate are also provided. Then, Section IV explains the experimental results and systematic
analysis of the proposed adder as well as an extensive comparison with the 12 existing approximate
adders. Also, a joint analysis of the adders in hardware and accuracy aspects is presented. In
Section V, the application of the approximate adders to digital image processing and machine learning
is presented. Finally, Section VI concludes the work.

II. RELATED WORKS
The lower-part OR adder (LOA) and error tolerant adder I (ETAI) are two representative approximate
adders implemented using an approximate FA for the least significant bits (LSBs) of a multibit adder
[13], [19], and many of their variants have been presented so far [14]–[18], [20], [21].
The LOA consists of two parts: an accurate part and an inaccurate part [13]. The former part uses a
traditional precise adder, such as the ripple carry adder (RCA) and carry-lookahead adder (CLA), to
calculate the most significant bits (MSBs) with no computation error. In contrast, the latter part only
uses an OR operation to approximately obtain LSB summations. Furthermore, the output of an AND operation
for the MSB input pair of the inaccurate part is utilized as a carry input to the accurate part to
improve overall computation accuracy. Design variants based on the LOA have been proposed to further
optimize the LOA, such as LOA without the AND-based carry prediction (LOAWA), optimized lower-part
constant OR adder (OLOCA), hardware optimized and error reduced approximate adder (HOERAA), hardware
optimized adder having a near-normal error distribution (HOAANED), and hybrid error reduction lower-part
OR adder (HERLOA) [14]–[18]. The LOAWA is identical to the LOA, except for the AND-based carry
prediction [14]. In other words, the carry input to the accurate part is fixed to a constant ‘‘0,’’
which degrades accuracy but improves the computation speed. The OLOCA is also similar to the LOA in that
the OR operation is utilized for the inaccurate part approximation, but it outputs a constant ‘‘1’’ to a
few LSBs regardless of the corresponding bit inputs [15]. This also slightly degrades accuracy while
reducing hardware cost. In addition to the OLOCA, the HOERAA uses the OR operation for two MSB input
pairs of the inaccurate part and sets the remaining LSB outputs to a constant ‘‘1’’ regardless of the
inputs [16]. For the MSB output of the inaccurate part, it uses a 2-to-1 multiplexer to select ‘‘0’’ or
an OR operation output of the corresponding input pairs. The multiplexer output is then used in an OR
operation with the AND gate output of the second MSB input pair of the inaccurate part. Also, it
includes an AND-based carry prediction for the accurate part, which also serves as the selection input
of the multiplexer. The HOAANED is derived from the HOERAA by including one additional OR gate at the
MSB of the inaccurate part [17]. This OR gate contributes to the improvement of an error metric, and
thus, the HOAANED produces outputs with almost normal error distributions. To enhance overall
computation accuracy, the HERLOA combines the basic LOA structure with the hybrid error reduction
scheme [18]. Figure 1 shows the architecture of the inaccurate part of the HERLOA. Note that the
accurate part is the same as the LOA (i.e., precise adder). When the second MSB input pair of the
inaccurate part is both ‘‘1,’’ error reduction logic decreases the error distance (ED) by investigating
the MSB input pair. The grayed gates in Figure 1 are the hybrid error reduction logic, while the others
are the LOA logic. The error rate is reduced by replacing an OR gate at the MSB in the LOA with an XOR
gate in the HERLOA.
The ETAI, like the LOA, divides an adder into two parts [19]. The inaccurate part of the ETAI utilizes
its own modified XOR operation instead of the traditional
FIGURE 3. Overall hardware architecture of the proposed approximate adder, termed error reduced carry prediction approximate adder (ERCPAA).
TABLE 1. Truth table for traditional FA and proposed approximate FAs.
produces a sum $S_i$ and a carry output $C_i$ using
$$S_i = A_i \oplus B_i \oplus C_{i-1} \tag{1}$$
$$C_i = A_i B_i + A_i C_{i-1} + B_i C_{i-1} \tag{2}$$
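As a quick check of (1) and (2), the following Python sketch models the traditional 1-bit FA and chains n of them into a ripple carry adder. The function names `full_adder` and `ripple_carry_add` are illustrative choices and do not come from the paper; the sketch only covers the accurate FA, not the proposed approximate cells.

```python
def full_adder(a, b, c_in):
    """Accurate 1-bit full adder following equations (1) and (2)."""
    s = a ^ b ^ c_in                            # (1) sum bit
    c_out = (a & b) | (a & c_in) | (b & c_in)   # (2) carry-out bit
    return s, c_out


def ripple_carry_add(a, b, n):
    """Add two n-bit unsigned integers by chaining n full adders."""
    carry = 0
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    # The final carry becomes bit n of the (n + 1)-bit result.
    return total | (carry << n)


if __name__ == "__main__":
    print(ripple_carry_add(0b1011, 0b0110, 4))
```

Because the chain uses only accurate FAs, its output should match ordinary integer addition for every n-bit input pair.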
$$C_i = C_{i-1} \quad \text{if } P_i = 1 \tag{6}$$
in Figure 4(b). This means that the approximate summation will be larger than the correct one in this
case. Hence, instead of forcing the l LSB outputs to ‘‘1,’’ the proposed adder sets all outputs of the
inaccurate part except for its MSB position to ‘‘0’’ (i.e., $S_{n-k-2:0} = 0$), making the approximation
output closer to the correct addition. Under the given input shown in Figure 4(b), the ED, defined by
$|S_{approximate} - S_{correct}|$ where $S_{approximate}$ and $S_{correct}$ are the approximate and
correct summations, respectively, decreases from 211 to 84. This reduction technique allows up to a
$2^{n-k-1} - 1$ decrease in the ED.

E. ERROR RATE ANALYSIS
The proposed adder generates an output error when two input operands $A_i$ and $B_i$ of any bit position
from the $(n-k-2)$th to the $(l)$th LSBs are both ‘‘1.’’ In other words, if the inputs of at least one
OR-based FA are both ‘‘1,’’ an error occurs. According to Table 1, the OR-based FA produces an incorrect
output at sum $S_i$ when $A_i = 1$ and $B_i = 1$, whereas the XOR-based counterpart does not. This input
condition generates a carry for the next bit position, which results in an output error at the sum of
the next bit position. Furthermore, an error occurs when both the inputs of any bit position at the
constant truncation part (i.e., $(l-1)$th to $(0)$th bit position) are either ‘‘0’’ or ‘‘1’’ because the
part fixes the output to ‘‘1.’’ In other words, the constant truncation part output is correct only when
exactly one of the two inputs is ‘‘1.’’ To simplify the error rate analysis, we first calculate the
probability of the input condition that makes the output of the adder correct, and then the error rate
is obtained by taking its complement. Since the proposed adder produces correct outputs when $A_i$ and
$B_i$ are not both ‘‘1’’ for $l \le i \le n-k-2$ and $A_i \ne B_i$ for $0 \le i \le l-1$, we can define
an event $E_{co}$ that the adder always generates correct outputs as follows:

$$E_{co} = \prod_{i=l}^{n-k-2} \overline{A_i B_i} \cdot \prod_{i=0}^{l-1} \left( A_i \overline{B_i} + \overline{A_i} B_i \right) \tag{11}$$

and the probability of this event is given by

$$
\begin{aligned}
P(E_{co}) &= P\Big(\prod_{i=l}^{n-k-2} \overline{A_i B_i}\Big)\, P\Big(\prod_{i=0}^{l-1} \left( A_i \overline{B_i} + \overline{A_i} B_i \right)\Big) \\
&= P(\overline{A_{n-k-2} B_{n-k-2}}) \cdots P(\overline{A_l B_l}) \\
&\quad \times P(A_{l-1} \overline{B_{l-1}} + \overline{A_{l-1}} B_{l-1}) \cdots P(A_0 \overline{B_0} + \overline{A_0} B_0) \\
&= \left(\frac{3}{4}\right)^{n-k-l-1} \left(\frac{1}{2}\right)^{l}
\end{aligned} \tag{12}
$$

Note that we assumed that the two input operands A and B are bitwise independent. The error rate of the
proposed adder is the probability of the complement of the event. Therefore, the error rate
$ER_{ERCPAA}$ is given by

$$ER_{ERCPAA}(n, k, l) = 1 - P(E_{co}) = 1 - \left(\frac{3}{4}\right)^{n-k-l-1} \left(\frac{1}{2}\right)^{l} \tag{13}$$

IV. EXPERIMENTAL RESULTS AND COMPARISON
In this section, we evaluate the performance of the proposed adder in terms of both hardware costs and
computation accuracy through systematic analysis. Also, an extensive comparison with other existing
approximate adders is presented to demonstrate the potential benefits of the proposed adder.

A. EXPERIMENT SETUP AND EVALUATION
We designed our adder in Verilog HDL and synthesized it with the Synopsys 32-nm generic library (SAED32)
using Synopsys Design Compiler to examine the hardware characteristics of the proposed approximate adder
in terms of area, delay, power, and energy [36]–[38]. We implemented a 16-bit adder using an 8-bit
RCA-based precise adder (i.e., n = 16 and k = 8). Prior studies suggested that a size of 7 to 9 bits for
the inaccurate part would be appropriate to obtain a good tradeoff between output quality and power and
energy saving for practical applications, such as video and image processing, and a 16-bit adder was
widely adopted in these applications [7], [9], [23], [32], [39]. Therefore, we chose the adder design
parameters of n = 16 and k = 8. In addition to the hardware cost, we also analyzed the error
characteristics of the proposed adder by developing a software-based simulator. Exhaustively testing a
16-bit adder requires $2^{32}$ distinct input pairs, which is computationally intensive. Therefore, we
applied 10 million (i.e., $10^7$) uniformly distributed random input pairs to the proposed adder to
obtain the error characteristics measured by various error metrics, such as the overall error rate,
carry prediction error rate, mean error distance (MED), normalized mean error distance (NMED), and mean
relative error distance (MRED).

B. TRADEOFF ANALYSIS OF THE PROPOSED ADDER
In our proposed design, the area, power, and energy performance degrade as the design parameter l
decreases because a smaller l requires more logic gates to implement the adder. However, as l decreases,
the overall computation accuracy performance improves. The power-NMED product was introduced to assess
approximate adders considering the power and accuracy performance together [40]. Since this metric does
not consider the area aspect, we can consider a new joint metric, the area-NMED product, to analyze the
area and accuracy performance collectively. Similarly, the power-MRED and area-MRED products can be
employed to jointly analyze the costs and accuracy.
To seek the best tradeoff between the hardware cost and accuracy of our adder, we adjusted design
parameter l and obtained the power-NMED/MRED products and area-NMED/MRED products. It is noteworthy that
the delay is consistent, although l varies, and thus we exclude the delay from the tradeoff analysis.
Figure 5 shows the tradeoff of the hardware costs and accuracy for the proposed adder with various
values of l. We varied the design parameter l from 1 to 6 because our adder requires at least two FAs
at the
FIGURE 5. Tradeoff analysis between hardware costs and accuracy for the
proposed 16-bit adder with various values of l , ranging from 1 to 6.
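The closed-form error rate in (13) can be evaluated for the design points swept here (n = 16, k = 8, l from 1 to 6). The following Python sketch is a minimal illustration: `ercpaa_error_rate` evaluates (13) with exact fractions, and `brute_force_error_rate` enumerates the bit positions covered by the event in (11) to cross-check the formula under the same assumption of independent, uniformly distributed input bits. Both function names are hypothetical, and the brute-force routine checks only the stated input conditions, not a gate-level model of the adder.

```python
from fractions import Fraction


def ercpaa_error_rate(n, k, l):
    """Error rate from (13): 1 - (3/4)^(n-k-l-1) * (1/2)^l."""
    or_cells = n - k - l - 1          # bit positions l .. n-k-2
    p_correct = Fraction(3, 4) ** or_cells * Fraction(1, 2) ** l
    return 1 - p_correct


def brute_force_error_rate(n, k, l):
    """Count input patterns satisfying the correctness event in (11)."""
    m = n - k - 1                     # bit positions 0 .. n-k-2
    trunc_mask = (1 << l) - 1         # constant truncation part
    correct = 0
    for a in range(1 << m):
        for b in range(1 << m):
            upper_ok = ((a & b) >> l) == 0                 # no position with both inputs 1
            lower_ok = ((a ^ b) & trunc_mask) == trunc_mask  # inputs differ at every position
            correct += upper_ok and lower_ok
    return 1 - Fraction(correct, 1 << (2 * m))


if __name__ == "__main__":
    for l in range(1, 7):
        print(l, float(ercpaa_error_rate(16, 8, l)))
```

Using `Fraction` keeps the comparison between the formula and the enumeration exact rather than subject to floating-point rounding.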
FIGURE 6. Comparison of carry prediction error rates of approximate
adders.
TABLE 2. Accuracy performance of the proposed adder under various
values of the design parameters.
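The software-based evaluation described in Section IV-A (uniformly distributed random input pairs, ED-based metrics) can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulator: the LOA of Section II is used as a stand-in adder because its structure is fully described in the text, NMED is assumed to be the MED divided by the maximum exact sum, and input pairs whose exact sum is zero are skipped in the MRED. The names `loa_add` and `error_metrics` are hypothetical.

```python
import random


def loa_add(a, b, n=16, k=8):
    """LOA model: exact upper k bits, bitwise OR on the lower n-k bits.

    The carry into the accurate part is the AND of the MSB input pair
    of the inaccurate part, as described in Section II.
    """
    m = n - k
    low = (a | b) & ((1 << m) - 1)
    carry_in = (a >> (m - 1)) & (b >> (m - 1)) & 1
    high = (a >> m) + (b >> m) + carry_in
    return (high << m) | low


def error_metrics(adder, n=16, samples=100_000, seed=0):
    """Monte Carlo estimate of error rate, MED, NMED, and MRED."""
    rng = random.Random(seed)
    max_sum = 2 * ((1 << n) - 1)      # assumed NMED normalization
    errors = ed_sum = red_count = 0
    red_sum = 0.0
    for _ in range(samples):
        a, b = rng.getrandbits(n), rng.getrandbits(n)
        exact = a + b
        ed = abs(adder(a, b) - exact)  # error distance
        errors += ed != 0
        ed_sum += ed
        if exact:                      # assumption: skip exact == 0
            red_sum += ed / exact
            red_count += 1
    med = ed_sum / samples
    return {
        "error_rate": errors / samples,
        "MED": med,
        "NMED": med / max_sum,
        "MRED": red_sum / red_count if red_count else 0.0,
    }


if __name__ == "__main__":
    print(error_metrics(loa_add))
```

Any other adder model with the same `(a, b)` signature can be passed to `error_metrics` to compare designs on identical random inputs.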
FIGURE 8. Normalized power-normalized mean error distance (NMED), energy-NMED, and area-delay product (ADP)-NMED products of
approximate adders.
Figure 7 shows the MREDs of the approximate adders. To effectively make the comparison, the MRED values
were normalized using the corresponding values of the LOA. The MREDs of the LOAWA, OLOCA, ETAI, and SETA
are slightly greater than that of the LOA. In particular, the MRED values of the RAP-CLA, BCSA, and
BCSAERU far exceed those of the others, and their values were inserted outside the bars. Specifically,
the RAP-CLA exhibits the worst MRED performance, which is 52.02× greater than the LOA. The HOERAA and
HOAANED have almost identical MRED values, which are 27.5% less than the LOA on average. In terms of
MRED performance, the proposed adder is comparable to the HERLOA and CPETA. Specifically, the proposed
adder reduces the MRED by 41.2% and 98.9% compared to the LOA and RAP-CLA, respectively.

E. JOINT ANALYSIS BETWEEN HARDWARE AND ACCURACY OF APPROXIMATE ADDERS
The error rate is an important metric to assess the accuracy of approximate adders. Unfortunately, its
usefulness to evaluate the adder might be limited because it only considers the presence of an error
but not the implication (e.g., distance/magnitude) of the error on the additions [40]. Hence, we adopted
ED-based metrics, such as NMED and MRED, to better represent the accuracy of the adders rather than the
error rate in the joint analysis. The power-NMED product is widely used to evaluate approximate adders
in terms of power and accuracy jointly [40]. Similarly, the energy-NMED product was considered to
analyze the energy aspect [18]. Unfortunately, neither of these two products includes the area or delay
of approximate adders. The area-delay product (ADP) is a widely employed metric to evaluate hardware
resources in terms of area and delay [15]. Therefore, we can consider a new joint metric, the ADP-NMED
product, to analyze the tradeoff between area, delay, and accuracy.
Figure 8 exhibits the power-NMED, energy-NMED, and ADP-NMED products for 13 approximate adders. To
compare these products effectively among the adders, they were normalized by the corresponding LOA
values, and the values were inserted outside the bars. The proposed adder outperforms the other
approximate adders in all these joint metrics. The RAP-CLA, BCSA, and BCSAERU show very poor tradeoff
performance and the three product values far exceed the other adders because they are a bit faster but
consume a larger area, power, and energy than the other adders as shown in Table 3. Also, they exhibit
relatively worse accuracy that deteriorates the tradeoff performance. Among the FA-based approximate
adders, the ETAI has the worst tradeoff performance and the approximate adders that exclude the carry
prediction (i.e., LOAWA and SETA) have similar values to the ETAI. Although the lack of carry prediction
allows these adders to be relatively efficient in terms of area, delay, and power, the poor accuracy
degrades the overall tradeoff performance so that their three product values are at least 50% higher
than those of the LOA. The OLOCA and CPETA have similar power-NMED and energy-NMED products, which are
slightly better than the LOA. In addition, the HOERAA and HOAANED are comparable in all three products
because of almost identical hardware architecture. The HERLOA has nearly the same power-NMED and
energy-NMED products as the HOERAA and HOAANED. However, the larger area occupation stemming from the
hybrid error reduction scheme results in a higher ADP-NMED product. In summary, our adder has the best
tradeoff performance among the compared approximate adders. Specifically, the power-NMED, energy-NMED,
and ADP-NMED products of the proposed adder are 95.7%, 91.1%, and 93.2% lower than those of the RAP-CLA,
respectively.
Similar to the joint metrics using NMED, we can take into account the metrics using MRED as well.
Figure 9 shows the power-MRED, energy-MRED, and ADP-MRED products for the approximate adders. The values
that were added outside the bars were normalized by the corresponding LOA values, and the three products
using MRED exhibit
FIGURE 9. Normalized power-normalized mean relative error distance (MRED), energy-MRED, and area-delay product (ADP)-MRED
products of approximate adders.
FIGURE 11. Original input image and output images with PSNRs of Gaussian smoothing filtering using an accurate adder, the existing
approximate adders, and the proposed adder.
TABLE 4. PSNRs of various images by Gaussian smoothing filtering using the approximate adders.
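The Gaussian smoothing experiment summarized in Figure 11 and Table 4 replaces only the additions of the convolution with the adder under test, while multiplication and division stay accurate. The Python sketch below illustrates that setup with the 5 × 5 kernel of (15). It is a simplified illustration, not the authors' code: image borders are copied unchanged, the division by 2^8 is a right shift, results are clamped to 8 bits, and the LOA of Section II stands in for the approximate adders. The names `smooth`, `psnr`, `exact_add`, and `loa_add` are hypothetical.

```python
import math

# Integer kernel from (15); the common factor 1/2^8 is applied as a shift.
KERNEL = [
    [1, 3, 6, 3, 1],
    [3, 15, 25, 15, 3],
    [6, 25, 41, 25, 6],
    [3, 15, 25, 15, 3],
    [1, 3, 6, 3, 1],
]


def exact_add(a, b):
    return a + b


def loa_add(a, b, n=16, k=8):
    """Stand-in approximate adder: LOA with AND-based carry prediction."""
    m = n - k
    low = (a | b) & ((1 << m) - 1)
    carry_in = (a >> (m - 1)) & (b >> (m - 1)) & 1
    return (((a >> m) + (b >> m) + carry_in) << m) | low


def smooth(image, add=exact_add):
    """Filter interior pixels; only the accumulation uses `add`."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]   # borders kept as-is (assumption)
    for y in range(2, height - 2):
        for x in range(2, width - 2):
            acc = 0
            for dy in range(5):
                for dx in range(5):
                    product = KERNEL[dy][dx] * image[y + dy - 2][x + dx - 2]
                    acc = add(acc, product)
            out[y][x] = min(255, acc >> 8)
    return out


def psnr(reference, test, peak=255):
    """PSNR in dB of `test` against `reference` (lists of pixel rows)."""
    count = len(reference) * len(reference[0])
    squared = sum((r - t) ** 2
                  for ref_row, test_row in zip(reference, test)
                  for r, t in zip(ref_row, test_row))
    if squared == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 * count / squared)
```

With 8-bit pixels the exact accumulator stays below 2^16 (the kernel weights sum to 253), which is consistent with the 16-bit adders evaluated here. A PSNR for an approximate adder would be obtained as `psnr(smooth(img), smooth(img, add=loa_add))`, that is, against the output of the accurate adder.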
following 5 × 5 Gaussian kernel G is used for filtering [39].

$$G = \frac{1}{2^{8}} \begin{bmatrix} 1 & 3 & 6 & 3 & 1 \\ 3 & 15 & 25 & 15 & 3 \\ 6 & 25 & 41 & 25 & 6 \\ 3 & 15 & 25 & 15 & 3 \\ 1 & 3 & 6 & 3 & 1 \end{bmatrix} \tag{15}$$

For the Gaussian smoothing operation, the addition was performed using an accurate adder as well as the
proposed and existing approximate adders, whereas multiplication and division were performed accurately.
Additionally, since Gaussian smoothing filtering is useful to reduce image noise, we added zero-mean,
Gaussian white noise with a variance of 0.01 to the original lena image, which is a grayscale image
FIGURE 12. Original data and clustered data with WCSS by k-means clustering using various adders.
with a size of 512 × 512, and then performed filtering [41]. We employed an accurate adder (RCA), the
proposed adder, and 12 existing approximate adders in the filtering. The PSNR values were calculated
against the images obtained by applying Gaussian filtering to the original input image using the
accurate adder. First, the approximate adders with design parameters of n = 16 and k = 8 were applied to
the filtering and we found that all approximate adders, except for the RAP-CLA, BCSA, and BCSAERU whose
block sizes were set to 4, produce visually very similar output images, although our adder generates the
best image quality with the highest PSNR. Therefore, to make the output images more visually
distinguishable, we reduced the size of the accurate part to 3 and the block size of the approximate
adders by half.
Figure 11 shows the original noisy image and output images of Gaussian smoothing filtering using various
adders. The BCSA shows the worst PSNR value of 8.20 dB among the images. The output images processed by
the LOAWA, ETAI, and SETA share an identical PSNR value of 8.25 dB. Similarly, the LOA and OLOCA
generate the same output image quality. The PSNRs of images with the HOERAA, CPETA, and BCSAERU range
from 9.83 dB to 10.93 dB. In other words, the image quality processed by these adders is between those
processed by the LOA/OLOCA and LOAWA/ETAI/SETA. The HOAANED, HERLOA, and RAP-CLA yield slightly better
output images than the LOA/OLOCA. The proposed adder produces the best image quality, distinctly visible
to the human eye, with a PSNR value of 20.84 dB, which means that the filtered image is the closest to
the one generated by the accurate adder. This confirms that the approximation errors of the proposed
adder have a negligible impact on the processing quality and thus, it is suitable for digital image
processing applications. To further examine the approximate adders in the application, we performed the
Gaussian smoothing filtering for eight more well-known benchmark images (cameraman, peppers, baboon,
F-16, couple, fishing boat, clock, and airplane) obtained from [42]. Note that the same white noise was
added to these images. The PSNRs of the filtered output images generated by the approximate adders are
listed in Table 4. All images exhibit a PSNR trend similar to that of the lena image. Our ERCPAA achieves
the best PSNR value for all benchmark images among the approximate adders in the Gaussian smoothing
filtering application.

B. MACHINE LEARNING
In addition to the filtering application, we also took machine learning into consideration to explore
the efficacy of the proposed adder. Specifically, we examined the performance of the approximate adders
in k-means clustering, which is an unsupervised machine learning algorithm and extensively utilized in
data mining [43]. Basically, the algorithm groups a set of unlabeled data points into k different
clusters such that each data point belongs to only one cluster. When clustering, it minimizes the sum of
distances between the data points and the centroids of the corresponding clusters, which is defined by
the within cluster sum of squares (WCSS). Therefore, it iteratively calculates the distances, where the
subtraction operation is mainly used in this algorithm. We applied the approximate adders to the
operation [28]. Note that the subtraction can be done by 2’s complement addition. We obtained an
unlabeled dataset comprising 1000 data points from [44] and set the number of clusters k to 5.
Figure 12 demonstrates the visualized 2-D original dataset and clustered dataset using the accurate
adder, the existing approximate adders, and the proposed adder. The WCSS values were extracted to
evaluate the quality of the clustering results using the different adders [28]. A value closer to the
one clustered by the accurate adder indicates a better clustering result. The LOAWA, ETAI, and SETA show
a similar clustering result, and so do the LOA and OLOCA. The LOA/OLOCA produce much better clustering
quality than the LOAWA/ETAI/SETA because the latter do not include any carry prediction logic to the
precise adder and this degrades computation accuracy. The proposed approximate adder exhibits the best
clustering result, closest to the one using the accurate adder. The HOERAA, HOAANED, HERLOA, and CPETA
yield slightly worse results than the proposed adder. Unfortunately, the RAP-CLA, BCSA, and BCSAERU show
poor clustering performance and do not allow the dataset to be partitioned properly. Specifically, the
WCSS values of these adders are up to 384% and 378% larger than those of the accurate and proposed
adders, respectively. In summary, the proposed adder has the best performance in terms of WCSS in
k-means clustering as well.

VI. CONCLUSION
In this paper, we presented a new approximate adder that combines error-reduced carry prediction and
constant truncation with error reduction schemes. The proposed carry prediction scheme achieves an error
rate reduction of up to 75% compared to the existing approximate adders, and the proposed error
reduction technique improves the overall computation accuracy by decreasing the error distance. We
systematically analyzed our design and sought the best tradeoff between hardware costs and accuracy by
adjusting the adder design parameter. When implemented in the 32-nm CMOS technology, the proposed design
has 1.90× and 3.12× greater power and energy efficiency, respectively, than the RCA, with NMED and MRED
improvements of up to 91.4% and 98.9%, respectively, compared to the existing approximate adders.
Importantly, our design achieves 95.7%, 91.1%, and 93.2% reductions in the power-NMED, energy-NMED, and
ADP-NMED products, respectively, compared to the RAP-CLA due to an excellent design tradeoff. Our adder
also reduces the power-, energy-, and ADP-MRED products by up to 99.4% compared to the others.
Particularly, in terms of the FoM considering hardware resources (i.e., energy, delay, and area) and the
accuracy performance
(i.e., NMED), the proposed adder is up to 93.05% better than the RAP-CLA. The proposed adder has been
adopted in a digital image processing application, where it rarely affects the output image quality and
yields results closest to those of the accurate adder. Additionally, we have demonstrated the
performance of our adder in a machine learning application and the result has shown that the proposed
adder outperforms the other approximate adders. Therefore, the proposed adder is well suited to
energy-efficient and error-tolerant applications, such as machine learning, neuromorphic computing, and
digital signal processing.

ACKNOWLEDGMENT
(Jungwon Lee and Hyoju Seo contributed equally to this work.)

REFERENCES
[1] L. Jiao and J. Zhao, ‘‘A survey on the new generation of deep learning in image processing,’’
IEEE Access, vol. 7, pp. 172231–172263, 2019.
[2] C. Lammie, A. Olsen, T. Carrick, and M. R. Azghadi, ‘‘Low-power and high-speed deep FPGA inference
engines for weed classification at the edge,’’ IEEE Access, vol. 7, pp. 51171–51184, 2019.
[3] Y. Kim, Y. Zhang, and P. Li, ‘‘A reconfigurable digital neuromorphic processor with memristive
synaptic crossbar for cognitive computing,’’ ACM J. Emerg. Technol. Comput. Syst., vol. 11, no. 4,
pp. 38:1–38:25, Apr. 2015.
[4] B. Liu, Z. Wang, W. Zhu, Y. Sun, Z. Shen, L. Huang, Y. Li, Y. Gong, and W. Ge, ‘‘An ultra-low power
always-on keyword spotting accelerator using quantized convolutional neural network and voltage-domain
analog switching network-based approximate computing,’’ IEEE Access, vol. 7, pp. 186456–186469, 2019.
[5] I. Khan, S. Choi, and Y.-W. Kwon, ‘‘Earthquake detection in a static and dynamic environment using
supervised machine learning and a novel feature extraction method,’’ Sensors, vol. 20, no. 3, p. 800,
Feb. 2020.
[6] Q. Wang, P. Li, and Y. Kim, ‘‘A parallel digital VLSI architecture for integrated support vector
machine training and classification,’’ IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23,
no. 8, pp. 1471–1484, Aug. 2015.
[7] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, ‘‘Low-power digital signal processing using
approximate adders,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 32, no. 1,
pp. 124–137, Jan. 2013.
[8] Y. S. Yang and Y. Kim, ‘‘Approximate digital leaky Integrate-and-fire neurons for energy efficient
spiking neural networks,’’ IEIE Trans. Smart Process. Comput., vol. 9, no. 3, pp. 252–259, Jun. 2020.
[9] A. Raha, H. Jayakumar, and V. Raghunathan, ‘‘Input-based dynamic reconfiguration of approximate
arithmetic units for video encoding,’’ IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24,
no. 3, pp. 846–857, Mar. 2016.
[10] T. Moreau, A. Sampson, and L. Ceze, ‘‘Approximate computing: Making mobile systems more
efficient,’’ IEEE Pervasive Comput., vol. 14, no. 2, pp. 9–13, Apr. 2015.
[11] S. Mittal, ‘‘A survey of techniques for approximate computing,’’ ACM Comput. Surv., vol. 48, no. 4,
pp. 1–33, May 2016.
[12] Q. Xu, M. Todd, and S. K. Nam, ‘‘Approximate computing: A survey,’’ IEEE Design Test, vol. 33,
no. 1, pp. 8–22, Feb. 2016.
[13] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, ‘‘Bio-inspired imprecise computational
blocks for efficient VLSI implementation of soft-computing applications,’’ IEEE Trans. Circuits Syst. I,
Reg. Papers, vol. 57, no. 4, pp. 850–862, Apr. 2010.
[14] P. Albicocco, G. C. Cardarilli, A. Nannarelli, M. Petricca, and M. Re, ‘‘Imprecise arithmetic for
low power image processing,’’ in Proc. Conf. Rec. 46th Asilomar Conf. Signals, Syst. Comput. (ASILOMAR),
Nov. 2012, pp. 983–987.
[15] A. Dalloo, A. Najafi, and A. Garcia-Ortiz, ‘‘Systematic design of an approximate adder: The
optimized lower part constant-OR adder,’’ IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26,
no. 8, pp. 1595–1599, Aug. 2018.
[16] P. Balasubramanian and D. L. Maskell, ‘‘Hardware optimized and error reduced approximate adder,’’
Electronics, vol. 8, no. 11, p. 1212, Oct. 2019.
[17] P. Balasubramanian, R. Nayar, D. L. Maskell, and N. E. Mastorakis, ‘‘An approximate adder with a
near-normal error distribution: Design, error analysis and practical application,’’ IEEE Access, vol. 9,
pp. 4518–4530, 2021.
[18] H. Seo, Y. S. Yang, and Y. Kim, ‘‘Design and analysis of an approximate adder with hybrid error
reduction,’’ Electronics, vol. 9, no. 3, p. 471, Mar. 2020.
[19] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, ‘‘Design of low-power high-speed
truncation-error-tolerant adder and its application in digital signal processing,’’ IEEE Trans. Very
Large Scale Integr. (VLSI) Syst., vol. 18, no. 8, pp. 1225–1229, Aug. 2010.
[20] Y. Kim, ‘‘An accuracy enhanced error tolerant adder with carry prediction for approximate
computing,’’ IEIE Trans. Smart Process. Comput., vol. 8, no. 4, pp. 324–330, Aug. 2019.
[21] J. Lee, H. Seo, Y. Kim, and Y. Kim, ‘‘Approximate adder design with simplified lower-part
approximation,’’ IEICE Electron. Exp., vol. 17, no. 15, pp. 1–3, Aug. 2020.
[22] O. Akbari, M. Kamal, A. Afzali-Kusha, and M. Pedram, ‘‘RAP-CLA: A reconfigurable approximate carry
look-ahead adder,’’ IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 65, no. 8, pp. 1089–1093,
Aug. 2018.
[23] F. Ebrahimi-Azandaryani, O. Akbari, M. Kamal, A. Afzali-Kusha, and M. Pedram, ‘‘Block-based carry
speculative approximate adder for energy-efficient applications,’’ IEEE Trans. Circuits Syst. II, Exp.
Briefs, vol. 67, no. 1, pp. 137–141, Jan. 2020.
[24] Y. Kim, Y. Zhang, and P. Li, ‘‘An energy efficient approximate adder with carry skip for error
resilient neuromorphic VLSI systems,’’ in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD),
Nov. 2013, pp. 130–137.
[25] Y. Kim, Y. Zhang, and P. Li, ‘‘Energy efficient approximate arithmetic for error resilient
neuromorphic computing,’’ IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 11,
pp. 2733–2737, Nov. 2015.
[26] A. B. Kahng and S. Kang, ‘‘Accuracy-configurable adder for approximate arithmetic designs,’’ in
Proc. 49th Annu. Design Autom. Conf. (DAC), 2012, pp. 820–825.
[27] M. Shafique, W. Ahmad, R. Hafiz, and J. Henkel, ‘‘A low latency generic accuracy configurable
adder,’’ in Proc. 52nd Annu. Design Autom. Conf., Jun. 2015, pp. 81:1–81:6.
[28] J. Hu, Z. Li, M. Yang, Z. Huang, and W. Qian, ‘‘A high-accuracy approximate adder with correct sign
calculation,’’ Integration, vol. 65, pp. 370–388, Mar. 2019.
[29] V. Camus, M. Cacciotti, J. Schlachter, and C. Enz, ‘‘Design of approximate circuits by fabrication
of false timing paths: The carry cut-back adder,’’ IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 8,
no. 4, pp. 746–757, Dec. 2018.
[30] M. Pashaeifar, M. Kamal, A. Afzali-Kusha, and M. Pedram, ‘‘Approximate reverse carry propagate
adder for energy-efficient DSP applications,’’ IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,
vol. 26, no. 11, pp. 2530–2541, Nov. 2018.
[31] N.-C. Huang, S.-Y. Chen, and K.-C. Wu, ‘‘Sensor-based approximate adder design for accelerating
error-tolerant and deep-learning applications,’’ in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE),
Mar. 2019, pp. 692–697.
[32] L. Soares, M. da Rosa, C. Machado, E. da Costa, and S. Bampi, ‘‘Design methodology to explore
hybrid approximate adders for energy-efficient image and video processing accelerators,’’ IEEE Trans.
Circuits Syst. I, Reg. Papers, vol. 66, no. 6, pp. 2137–2150, Jun. 2019.
[33] H. Seo and Y. Kim, ‘‘A new approximate adder with duplicate-constant scheme for energy efficient
applications,’’ in Proc. IEEE Int. Conf. Consum. Electron. Asia (ICCE-Asia), Nov. 2020, pp. 1–2.
[34] F. Frustaci, S. Perri, P. Corsonello, and M. Alioto, ‘‘Energy-quality scalable adders based on
nonzeroing bit truncation,’’ IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 4,
pp. 964–968, Apr. 2019.
[35] H. Seo, Y. S. Yang, and Y. Kim, ‘‘An energy-efficient imprecise adder with a lower-part constant
approximation,’’ in Proc. Int. SoC Design Conf. (ISOCC), Oct. 2020, pp. 143–144.
[36] H. Bhatnagar, Advanced ASIC Chip Synthesis: Using Synopsys Design Compiler Physical Compiler and
Prime Time. Norwell, MA, USA: Kluwer, 2002.
[37] R. Goldman, K. Bartleson, T. Wood, K. Kranen, V. Melikyan, and E. Babayan, ‘‘32/28 nm educational design kit: Capabilities, deployment and future,’’ in Proc. IEEE Asia Pacific Conf. Postgraduate Res. Microelectron. Electron. (PrimeAsia), Dec. 2013, pp. 284–288.
[38] (Jan. 2012). Synopsys Digital Standard Cell Library SAED_EDK32/28_CORE Databook Revision 1.0.0. Accessed: Jul. 27, 2021. [Online]. Available: https://www.synopsys.com/community/university-program/teaching-resources.html
[39] B. Garg and S. K. Patel, ‘‘Reconfigurable carry look-ahead adder trading accuracy for energy efficiency,’’ J. Signal Process. Syst., vol. 93, no. 1, pp. 99–111, Jan. 2021.
[40] J. Liang, J. Han, and F. Lombardi, ‘‘New metrics for the reliability of approximate and probabilistic adders,’’ IEEE Trans. Comput., vol. 62, no. 9, pp. 1760–1771, Sep. 2013.
[41] M. Masadeh, O. Hasan, and S. Tahar, ‘‘Input-conscious approximate multiply-accumulate (MAC) unit for energy-efficiency,’’ IEEE Access, vol. 7, pp. 147129–147142, 2019.
[42] The USC-SIPI Image Database. Accessed: Jul. 27, 2021. [Online]. Available: http://sipi.usc.edu/database/database.php
[43] K. P. Sinaga and M.-S. Yang, ‘‘Unsupervised K-means clustering algorithm,’’ IEEE Access, vol. 8, pp. 80716–80727, 2020.
[44] Clustering Benchmark. Accessed: Jul. 27, 2021. [Online]. Available: http://github.com/deric/clustering-benchmark

HYOJU SEO (Graduate Student Member, IEEE) received the B.S. degree from the School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea, in 2020, where she is currently pursuing the M.S. degree. Her research interests include approximate computing, neuromorphic computing, deep learning accelerator, and image processing.

HYELIN SEOK (Student Member, IEEE) is currently pursuing the integrated B.S. and M.S. degrees with the School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea. Her research interests include approximate arithmetic and new computing systems.