AxRMs: Approximate Recursive Multipliers Using High-Performance Building Blocks
ABSTRACT Recursive multipliers (RMs) are regarded as a class of low-power multipliers because they provide a wide range of power-quality configuration options. 2×2 multipliers are the elementary building blocks of this recursive topology; however, most state-of-the-art approximate recursive designs are based on 4×4 building blocks. The design space exploration of approximate recursive multipliers (AxRMs) using 2×2 multipliers is therefore still an open research problem. To add configurability and flexibility to the design of AxRMs, 2-bit multipliers with high performance and low area are required. In this article, two approximate 2×2 multipliers are proposed that exhibit a double-sided error distribution. Compared to the best existing approximate 2×2 multiplier, the proposed design reduces area by 52 percent and improves delay by 25 percent while keeping the error bounded. Then, three 8×8 multipliers of variable accuracy are designed using different configurations of the approximate 2×2 multipliers. AxRM1 is the most accurate design; it improves the mean relative error distance (MRED) by 50 percent compared to the existing best MRED-optimized design. AxRM3 has an MRED similar to that of the previous best 2×2-based AxRM (called MACISH); however, AxRM3 exhibits a 13 percent better power-delay product (PDP) than MACISH because low-power, high-performance 2×2 multipliers are used to build the larger multipliers. The proposed approximate multipliers are applied to a cutting-edge error-tolerant application, i.e., convolutional neural networks. AxRM2 provides the best quality-power trade-off: 32.64 percent power savings are achieved with 1.10 percent better classification accuracy.
INDEX TERMS Low-power, accuracy-energy trade-off, error compensation, building blocks, Pareto front
2168-6750 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
VOLUME 10, NO. 2, APRIL-JUNE 2022
Waris et al.: AxRMs: Approximate Recursive Multipliers Using High-Performance Building Blocks
TABLE 1. Truth table and error distance of the proposed approximate 2×2 multipliers with error compensation feature.
Inputs        Exact         Mul2a         Mul2b         Output (Dec.)   Error distance
a1 a0 b1 b0   c3 c2 c1 c0   c3 c2 c1 c0   c3 c2 c1 c0   Mul2a  Mul2b    ED1 (Mul2a)  ED2 (Mul2b)
0  0  0  0    0  0  0  0    0  0  0  0    0  0  0  0    0      0         0            0
0  0  0  1    0  0  0  0    0  0  0  1    0  0  0  0    1      0        +1            0
0  0  1  0    0  0  0  0    0  0  0  0    0  0  0  0    0      0         0            0
0  0  1  1    0  0  0  0    0  0  0  1    0  0  0  0    1      0        +1            0
0  1  0  0    0  0  0  0    0  0  0  0    0  0  0  0    0      0         0            0
0  1  0  1    0  0  0  1    0  0  0  1    0  0  0  1    1      1         0            0
0  1  1  0    0  0  1  0    0  0  1  0    0  0  0  0    2      0         0           -2
0  1  1  1    0  0  1  1    0  0  1  1    0  0  0  1    3      1         0           -2
1  0  0  0    0  0  0  0    0  0  0  0    0  0  1  0    0      2         0           +2
1  0  0  1    0  0  1  0    0  0  1  1    0  0  1  0    3      2        +1            0
1  0  1  0    0  1  0  0    0  1  0  0    0  1  1  0    4      6         0           +2
1  0  1  1    0  1  1  0    0  1  1  1    0  1  1  0    7      6        +1            0
1  1  0  0    0  0  0  0    0  0  0  0    0  0  1  0    0      2         0           +2
1  1  0  1    0  0  1  1    0  0  1  1    0  0  1  1    3      3         0            0
1  1  1  0    0  1  1  0    0  1  1  0    0  1  1  0    6      6         0            0
1  1  1  1    1  0  0  1    0  1  1  1    0  1  1  1    7      7        -2           -2
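The rows of Table 1 can be reproduced with a few Boolean expressions. The following Python model is a sketch, not the authors' Verilog: the bit equations are inferred from the truth table (they are consistent with the gate counts given in Section II, three AND gates and one OR gate for Mul2a and two AND gates for Mul2b), and the names `mul2a`, `mul2b`, and `error_distances` are illustrative.

```python
def mul2a(a: int, b: int) -> int:
    """Approximate 2x2 product; bit equations inferred from Table 1 (Mul2a)."""
    a1, a0 = (a >> 1) & 1, a & 1
    b1, b0 = (b >> 1) & 1, b & 1
    c0 = b0                       # exact logic would be a0 AND b0
    c1 = (a1 & b0) | (a0 & b1)    # OR in place of the exact XOR
    c2 = a1 & b1
    return (c2 << 2) | (c1 << 1) | c0   # c3 is always 0


def mul2b(a: int, b: int) -> int:
    """Approximate 2x2 product; bit equations inferred from Table 1 (Mul2b)."""
    a1, a0 = (a >> 1) & 1, a & 1
    b1, b0 = (b >> 1) & 1, b & 1
    c0 = a0 & b0
    c1 = a1                       # plain wire, no gate
    c2 = a1 & b1
    return (c2 << 2) | (c1 << 1) | c0   # c3 is always 0


def error_distances(mul) -> list:
    """Signed error (approximate minus exact) for all 16 inputs, in Table 1 row order."""
    return [mul(a, b) - a * b for a in range(4) for b in range(4)]


if __name__ == "__main__":
    for name, mul in (("Mul2a", mul2a), ("Mul2b", mul2b)):
        eds = error_distances(mul)
        print(name, eds, "sum of signed errors:", sum(eds))
```

Under these assumptions Mul2a produces four errors of +1 and one of -2, while Mul2b produces three errors of +2 and three of -2, so the signed errors of Mul2b sum to zero over the 16 input combinations.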
improvement up to 61 percent compared to the exact 4-bit multiplier.

The very first approximate 2×2 multiplier in which a reduction in the critical path delay and area was achieved is referred to as UDM [7]. The 2-bit multiplier proposed by Rehman et al. [8] has a reduced maximum error magnitude compared to [7], with no improvement in delay or area. The design in [9] is a 2-bit multiplier that mirrors [7], i.e., it produces an error case (e = +2) that is the additive inverse of the error in [7] (e = −2). However, this is achieved at an increased hardware cost compared to [7]. Another 2-bit multiplier with a large negative error (e = −4) has been proposed in [10]. It exhibits a larger delay than [7] because an XOR-based design is used. The existing 2-bit multipliers [8]–[10] exhibit no improvement in the critical path delay compared to the state-of-the-art approximate multiplier [7]. Moreover, relative to [7], they have a larger area and power consumption. To explore the power-quality trade-off in the design of AxRMs, a wide range of power-quality configuration options is needed for approximate arithmetic modules. The main contributions of our work are summarized as follows:
1) Two 2×2 multipliers (called Mul2a and Mul2b) of variable accuracy/power are proposed. Mul2b exhibits an improvement of 52 percent in terms of area compared to the existing state-of-the-art approximate 2-bit multiplier.
2) Large-size multipliers (AxRMs) are then designed using the proposed 2-bit multipliers. A comprehensive error analysis is presented by evaluating the AxRMs against different input distributions.
3) The proposed AxRM1 multiplier outperforms the prior error-energy Pareto front due to its internal error compensation feature, achieving 61 percent better error characteristics for comparable energy dissipation.
The rest of the paper is organized as follows. The proposed approximate 2×2 multipliers are presented in Section II. Section III describes the design of 8×8 multipliers using 2-bit multipliers. Section IV presents the comparative analysis of the proposed and existing state-of-the-art low-power approximate multipliers. The evaluation of the proposed designs for convolutional neural networks is presented in Section V. Section VI concludes the paper.

II. THE PROPOSED 2×2 MULTIPLIER
The double-sided error distribution property has shown promising results in compensating errors [11]. This is because positive and negative errors complement each other, resulting in internal self-healing of the error. In this section, approximate 2×2 multipliers (Mul2a/Mul2b) with an error compensation/cancellation feature are investigated. Table 1 shows the truth table of the proposed 2×2 multipliers (called Mul2a and Mul2b). All the approximated outputs in Mul2b have the same error distance (ED) magnitude of two, whereas Mul2a has only one output with an ED magnitude of two. In Mul2a five outputs are approximated, while Mul2b contains six inaccurate outputs. Among the five approximated outputs of Mul2a, four generate a positive error while one shows a negative error. Similarly, in Mul2b, three positive and three negative errors are introduced. Figure 1 shows the gate-level logic of the proposed 2×2 multipliers. Mul2a consists of three AND gates and one OR gate, whereas Mul2b is further simplified to two AND gates.

FIGURE 1. Logic diagrams of the proposed approximate 2×2 multipliers (a) Mul2a and (b) Mul2b.

As the approximated outputs (in both
the proposed designs) produce both positive and negative errors, the errors can complement each other in the partial product reduction process. This is further elaborated with a numerical example. Consider a 4×4 multiplier designed using 2×2 multipliers; four 2-bit multipliers are utilized (Figure 2(a)). For example, let $A = (1110)_2$ and $B = (1011)_2$. This implies $A_H = (11)_2 = 3$, $A_L = (10)_2 = 2$, $B_H = (10)_2 = 2$, $B_L = (11)_2 = 3$. Using approximate multipliers affects the result. The output can be expressed as

$$C_{4\times 4} = A_L B_L + 4(A_L B_H) + 4(A_H B_L) + 16(A_H B_H), \tag{1}$$

where the constants 4 and 16 represent the shift factors. The multiplication carried out using the existing state-of-the-art approximate 2×2 multiplier [7] is shown in Figure 2(b). In [7], the output for the input (3 × 3) is approximated to 7; therefore, its errors are always negative. The same inputs have been passed to the approximate 4×4 multiplier designed using Mul2a. The first two partial products are approximated, as shown in Figure 2(c). The approximation of the first partial product introduces an error of +1, while an error of −2 is introduced for the second partial product. These positive and negative errors compensate each other in the partial product reduction step. However, as not all the errors are canceled out, the resultant output has an error compared to the exact result. In contrast, using Mul2b to design a 4-bit multiplier shows a zero-mean error behavior for the specified inputs (Figure 2(d)). The second and third partial products introduce errors of −2 and +2, respectively. As both these partial products have equal numerical weights, perfect error cancellation is performed and the final result is the same as the exact result (Figure 2(a)).

FIGURE 2. Utilization of 2-bit multipliers against inputs $A = (1110)_2$ and $B = (1011)_2$: (a) exact output, (b) UDM [7] output, (c) Mul2a output, and (d) Mul2b output.

Table 2 shows the hardware savings and the output accuracy (quantified as mean error distance) of the proposed designs compared to the exact and existing state-of-the-art approximate 2-bit multiplier designs. Compared to the exact design, improvements of up to 65 and 72 percent in area and power are achieved, respectively. Mul2b is the fastest and the most power-efficient design, with a near-to-zero mean error profile. It has already been shown through a numerical example that for certain inputs this multiplier can produce zero-mean error. Therefore, using this approximate block in the design of AxRMs can achieve significant hardware savings at the cost of a bounded error. Moreover, Mul2a has a mean error distance similar to that of the state-of-the-art UDM [7]; however, due to its reduced power consumption, it exhibits a 59 percent better PDP than [7].

III. THE PROPOSED 8×8 APPROXIMATE RECURSIVE MULTIPLIERS USING 2×2 MULTIPLIER
An n-bit recursive multiplier can be designed using $(n/2)^2$ elementary multipliers. In this work, the 8-bit multiplier is under investigation, which is designed using sixteen 2-bit
TABLE 3. 8-bit multipliers using Approx. 2-bit multipliers.
TABLE 4. Error analysis of approximate multipliers.
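The recursive decomposition of Eq. (1) and the numerical example of Figure 2 can be checked with a short Python model. This is a sketch under stated assumptions rather than the authors' implementation: the bit equations of `mul2a` and `mul2b` are inferred from Table 1, `udm` models [7] as exact except for the 3 × 3 case, every block position uses the same 2-bit multiplier, and the partial products are summed exactly, whereas the AxRM configurations may mix different blocks. All function names are illustrative.

```python
from typing import Callable

Block = Callable[[int, int], int]  # model of a 2-bit x 2-bit multiplier


def exact2(a: int, b: int) -> int:
    return a * b


def udm(a: int, b: int) -> int:
    # Assumption: UDM [7] is exact except for 3 x 3, which it maps to 7.
    return 7 if (a, b) == (3, 3) else a * b


def mul2a(a: int, b: int) -> int:
    # Bit equations inferred from Table 1 (assumption, not the published netlist).
    a1, a0, b1, b0 = (a >> 1) & 1, a & 1, (b >> 1) & 1, b & 1
    return ((a1 & b1) << 2) | (((a1 & b0) | (a0 & b1)) << 1) | b0


def mul2b(a: int, b: int) -> int:
    # Bit equations inferred from Table 1 (assumption, not the published netlist).
    a1, a0, b1, b0 = (a >> 1) & 1, a & 1, (b >> 1) & 1, b & 1
    return ((a1 & b1) << 2) | (a1 << 1) | (a0 & b0)


def recursive_mul(a: int, b: int, n: int, block: Block) -> int:
    """Multiply two n-bit operands (n = 2, 4, 8, ...) using only 2x2 blocks."""
    if n == 2:
        return block(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a_hi, a_lo = a >> h, a & mask
    b_hi, b_lo = b >> h, b & mask
    ll = recursive_mul(a_lo, b_lo, h, block)
    lh = recursive_mul(a_lo, b_hi, h, block)
    hl = recursive_mul(a_hi, b_lo, h, block)
    hh = recursive_mul(a_hi, b_hi, h, block)
    # For n = 4 this is Eq. (1): LL + 4*LH + 4*HL + 16*HH.
    return ll + ((lh + hl) << h) + (hh << n)


if __name__ == "__main__":
    A, B = 0b1110, 0b1011
    for name, block in (("exact", exact2), ("UDM", udm),
                        ("Mul2a", mul2a), ("Mul2b", mul2b)):
        print(name, recursive_mul(A, B, 4, block))
```

For A = 14 and B = 11 the model returns 154 with exact blocks, 146 with UDM, 147 with Mul2a, and 154 with Mul2b, which agrees with the error cancellation described for Figure 2(d). With n = 8 the recursion calls the 2-bit block sixteen times, i.e., $(n/2)^2$ times.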
Mul2a and Mul2b multipliers cancel errors in the partial product reduction step. The proposed approximate multipliers have been evaluated using an AlexNet CNN that classifies the ImageNet ILSVRC2012 dataset. AxRM1 exhibits an improvement of 1.4 percent in classification accuracy with 20.93 percent power savings. To encourage and help further research in this direction, the synthesizable Verilog files are provided as open-source libraries at https://sourceforge.net/projects/approxrecursivemul/.

ACKNOWLEDGMENTS
This work was supported by grants from the National Natural Science Foundation of China (62022041 and 61871216), and the Six Talent Peaks Project in Jiangsu Province (2018-XYDXX-009).

REFERENCES
[1] W. Liu, F. Lombardi, and M. Shulte, “A retrospective and prospective view of approximate computing,” Proc. IEEE, vol. 108, no. 3, pp. 394–399, Mar. 2020.
[2] C. Chen et al., “Optimally approximated and unbiased floating-point multiplier with runtime configurability,” in Proc. Int. Conf. Comput. Aided Des., 2020, pp. 1–9.
[3] V. Leon, K. Asimakopoulos, S. Xydis, D. Soudris, and K. Pekmestzi, “Cooperative arithmetic-aware approximation techniques for energy-efficient multipliers,” in Proc. 56th ACM/IEEE Des. Automat. Conf., 2019, pp. 1–6.
[4] M. S. Ansari, H. Jiang, B. F. Cockburn, and J. Han, “Low-power approximate multipliers using encoded partial products and approximate compressors,” IEEE Trans. Emerg. Sel. Topics Circuits Syst., vol. 8, no. 3, pp. 404–416, Sep. 2018.
[5] Y. Guo, H. Sun, and S. Kimura, “Design of power and area efficient lower-part-OR approximate multiplier,” in Proc. IEEE Region 10 Conf., 2018, pp. 2110–2115.
[6] H. Waris, C. Wang, W. Liu, J. Han, and F. Lombardi, “Hybrid partial product-based high-performance approximate recursive multipliers,” IEEE Trans. Emerg. Topics Comput., early access, Aug. 04, 2020, doi: 10.1109/TETC.2020.3013977.
[7] P. Kulkarni, P. Gupta, and M. Ercegovac, “Trading accuracy for power with an underdesigned multiplier architecture,” in Proc. Int. Conf. VLSI Des., 2011, pp. 346–351.
[8] S. Rehman et al., “Architectural-space exploration of approximate multipliers,” in Proc. Int. Conf. Comput.-Aided Des., 2016, pp. 1–8.
[9] G. A. Gillani, M. A. Hanif, M. Krone, S. H. Gerez, M. Shafique, and A. B. J. Kokkeler, “SquASH: Approximate square-accumulate with self-healing,” IEEE Access, vol. 6, pp. 49112–49128, 2018.
[10] G. A. Gillani, M. A. Hanif, B. Verstoep, S. H. Gerez, M. Shafique, and A. B. J. Kokkeler, “MACISH: Designing approximate MAC accelerators with internal-self-healing,” IEEE Access, vol. 7, pp. 77142–77160, 2019.
[11] H. Waris, C. Wang, W. Liu, and F. Lombardi, “AxBMs: Approximate Radix-8 booth multipliers for high-performance FPGA-based accelerators,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, no. 5, pp. 1566–1570, May 2021.
[12] A. G. M. Strollo, E. Napoli, D. De Caro, N. Petra, and G. D. Meo, “Comparison and extension of approximate 4–2 compressors for low-power approximate multipliers,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 9, pp. 3021–3034, Sep. 2020.
[13] R. Pilipovic and P. Bulic, “On the design of logarithmic multiplier using radix-4 booth encoding,” IEEE Access, vol. 8, pp. 64578–64590, 2020.
[14] S. Vahdat, M. Kamal, A. Afzali-Kusha, and M. Pedram, “TOSAM: An energy-efficient truncation-and rounding-based scalable approximate multiplier,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 5, pp. 1161–1173, May 2019.
[15] V. Mrazek, R. Hrbacek, Z. Vasicek, and L. Sekanina, “EvoApprox8b: Library of approximate adders and multipliers for circuit design and benchmarking of approximation methods,” in Proc. Des. Automat. Test Eur. Conf. Exhib., 2017, pp. 258–261.
[16] S. Hashemi et al., “DRUM: A dynamic range unbiased multiplier for approximate applications,” in Proc. Int. Conf. Comput.-Aided Des., 2015, pp. 418–425.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Int. Conf. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[18] Z.-G. Tasoulas, G. Zervakis, I. Anagnostopoulos, H. Amrouch, and J. Henkel, “Weight-oriented approximation for energy-efficient neural network inference accelerators,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 12, pp. 4670–4683, Dec. 2020.
[19] M. S. Ansari, V. Mrazek, B. F. Cockburn, L. Sekanina, Z. Vasicek, and J. Han, “Improving the accuracy and hardware efficiency of neural networks using approximate multipliers,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 28, no. 2, pp. 317–328, Feb. 2020.