
Area-Delay-Power Efficient VLSI Architecture of FIR Filter For Processing Seismic Signal


IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 68, NO. 11, NOVEMBER 2021 3451

Area-Delay-Power Efficient VLSI Architecture of FIR Filter for Processing Seismic Signal

Sudipta Bose, Arijit De, Member, IEEE, and Indrajit Chakrabarti, Member, IEEE

Abstract—Low-complexity, high-speed and re-configurability are the primary requirements of the finite impulse response (FIR) filters employed for the processing of acquired seismic signal in real-time seismic-alert-system. The common sub-expression elimination (CSE) technique is employed widely to reduce the hardware complexity by minimizing the logic operators (LOs) and logic depths (LDs) in digital FIR filter. In this brief, a novel matrix grouped CSE (MCSE) algorithm has been proposed which outperforms the existing CSE algorithms in terms of LOs and LDs minimization. Moreover, a new half-unit biased (HUB) based rounding technique is incorporated in the proposed design to reduce the truncation error while maintaining low-complexity and the cut-set retiming technique is employed to reduce the critical-path-delay (CPD). Two hardware efficient FIR filter architectures (I and II) involving the proposed canonical signed digit (CSD) based MCSE algorithm, HUB rounding and cut-set retiming approach have been presented. Further, the architecture II represents a hardware efficient realization of a reconfigurable FIR filter. The hardware implementation of the architectures is performed on both FPGA and ASIC platforms. The hardware implementation of the proposed architecture I yields more than 25%, 49%, 38%, 36% and, 31% reduction in LOs, CPD, effective latency, area-delay-product (ADP) and, power-delay-product (PDP), respectively, over the state-of-the-art CSE based architectures. Whereas the reconfigurable architecture II exhibits nearly 48%, 12% and, 13% reduction in CPD, ADP and, PDP over the counterpart.

Index Terms—Finite impulse response (FIR), half-unit biased (HUB), canonical signed digit (CSD), common sub-expression elimination (CSE), seismic signal.

Manuscript received February 1, 2021; revised March 14, 2021 and April 19, 2021; accepted May 4, 2021. Date of publication May 17, 2021; date of current version October 28, 2021. This work was supported by the Visvesvaraya Ph.D. Scheme for Electronics and IT, MeitY, Government of India under Grant MEITY-PHD-2509. This brief was recommended by Associate Editor Y. Pu. (Corresponding author: Sudipta Bose.)
The authors are with the Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India (e-mail: [email protected]; [email protected]; [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSII.2021.3081257.
Digital Object Identifier 10.1109/TCSII.2021.3081257
1549-7747 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

Eliminating noise from the seismic signal is one of the crucial challenges in seismic signal processing [1]–[3] for seismic-alert-system (SAS) [1]. Digital finite impulse response (FIR) filter is widely used for various seismic signal processing applications due to its linear-phase property and stability [1]. The drawback of the conventional FIR filter architectures [4] is that it contains a large number of multiplication operations, which requires a large computation time and leads to excessive area and power consumption for real-time applications. Therefore, research on the area-power-delay optimized FIR filter architecture by CSE has gained importance in the recent research works [5]–[10]. In [5], a hybrid CSE technique has been employed in order to implement an efficient multiplier-less FIR filter. A reconfigurable FIR filter architecture has been proposed in [7] by employing a CSD based VHCSE algorithm, which outperforms the counterpart [6] in terms of reduction in hardware cost. In the case of the SAS, it is essential to implement a low-power, high-speed architecture of the low pass FIR filter so that the processing time of the SAS can be reduced to provide a warning more advanced in time while consuming low power in the real-time sensor-based system. The motivation of this brief lies in the design of area-delay-power efficient FIR filter architecture for pre-processing seismic signal with high precision (reduced truncation error) in real-time. The primary contributions of the work presented in this brief are as follows.

i) A novel matrix grouped CSE (MCSE) algorithm has been proposed which reduces the computational burden and achieves the best possible reduction in LOs and LDs while comparing with the existing CSE algorithms in literature [5]–[10].
ii) A low-power and high-speed FIR filter (architecture I) has been implemented for seismic signal pre-processing. The proposed architecture uses CSD coded filter coefficients as input data and applies the MCSE algorithm with HUB [11]–[13] fixed point rounding format, to achieve a reduction in the hardware cost. Moreover, high speed (low CPD) is achieved by retiming the adder tree. The comparison on hardware complexity of the FIR filter architecture for seismic application [1]–[3] reveals that the proposed architecture I is superior to the existing [5], [8]–[10], [14] FIR architectures.
iii) For real-time applications, the key requirement is the dynamically reconfigurable filter coefficients. Hence, the MCSE based reconfigurable architecture II has been proposed which outperforms the state-of-the-art [6]–[7], [15] in terms of hardware complexity and CPD.

The brief has been organized as follows. In Section II, the existing algorithms have been discussed. The novel MCSE algorithm has been presented in Section III. The proposed VLSI architectures of FIR filter are discussed in Section IV. The implementation results are given in Section V. Finally, the brief is concluded in Section VI.

II. PRELIMINARIES

The CSE algorithm can be applied in an architecture where a set of constant multipliers are used to multiply a common variable, i.e., the transposed direct form FIR filter [4].
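The property that makes the transposed direct form suitable for CSE, namely that every coefficient multiplies the same input sample in a given clock cycle, can be illustrated in software. The following is a minimal behavioural sketch in Python and not the hardware described in this brief; the function names `fir_transposed` and `fir_direct` are hypothetical, and floating-point products stand in for the shift-and-add constant multipliers.

```python
def fir_transposed(coeffs, samples):
    """Transposed direct-form FIR (behavioural model).

    Every coefficient multiplies the SAME current input sample, so the
    products form one multiple-constant-multiplication block; this shared
    input is what allows common subexpressions to be reused across taps.
    """
    regs = [0.0] * (len(coeffs) - 1)   # delay registers between tap adders
    out = []
    for x in samples:
        products = [h * x for h in coeffs]          # constant-multiplier block
        y = products[0] + (regs[0] if regs else 0.0)
        # Update the register chain; regs[k + 1] still holds its old value
        # when regs[k] is written because k increases.
        for k in range(len(regs) - 1):
            regs[k] = products[k + 1] + regs[k + 1]
        if regs:
            regs[-1] = products[-1]
        out.append(y)
    return out


def fir_direct(coeffs, samples):
    """Reference convolution y(n) = sum_k h_k * x(n - k), used as a check."""
    return [
        sum(h * samples[n - k] for k, h in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]
```

Feeding a unit impulse into `fir_transposed` returns the coefficient list itself, and for any input the result should agree with `fir_direct` up to floating-point rounding.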
The architecture of a transposed direct form FIR filter is shown in Fig. 1(a), where the common input variable $x(n)$ has been multiplied with all the four constant coefficients ($h_0$–$h_3$). In case of CSE algorithm, constant coefficient multiplication for each tap can be performed by employing basic operations, i.e., addition/subtraction and hardwired shift as shown in Fig. 1(b).

Fig. 1. Architecture of (a) transposed direct form FIR filter, (b) one tap CSD encoding under conventional rounding [16], (c) one tap CSD encoding under HUB [12], (d) ERNs under conventional and HUB, and (e) average frequencies of occurrences of CSs in conventional and HUB based 16-bit coefficients of filters having different tap lengths.

1) Encoding Constant Filter Coefficients in CSD: The hardware cost of the FIR architecture can be reduced by encoding constant coefficients in canonical signed digit (CSD) representation. A signed number can be represented by CSD using bit values 1, 0, and −1. For example, we consider the 17-bit binary representation of the FIR filter coefficient $h_0 = 0.498 = 0.0111111110000001$. As there are nine non-zero bits, it requires eight adders to execute constant multiplication. However, the 17-bit CSD representation $h_0 = 0.10000000\bar{1}0000001$ has only three non-zero bits and requires only two adder/subtractor operations to execute the multiplication ($h_0 \times x(n)$), as shown in Fig. 1(b) and Fig. 1(c).

2) CSD Based Vertical CSE (VCSE) Algorithm: The VCSE algorithm identifies the multiple presence of the vertical common subexpressions (VCSs), i.e., $[1\ 1]$ and $[1\ \bar{1}]$ ($\bar{1}$ represents −1), and their negative (2's complement) versions ($[\bar{1}\ \bar{1}]$ and $[\bar{1}\ 1]$) that exist in CSD based filter coefficients. The VCSs $[1\ 1]$ and $[1\ \bar{1}]$ can be defined as $x_1 = x + x[-1]$ and $x_2 = x - x[-1]$ respectively, where $x$ denotes the input signal and $x[-M]$ signifies $x$ delayed by $M$ clock cycles.

3) CSD Based Horizontal CSE (HCSE) Algorithm: The HCSE algorithm identifies the presence of multiple horizontal common subexpressions (HCSs), i.e., $[1\,0\,1]$, $[1\,0\,\bar{1}]$, $[1\,0\,0\,1]$, $[1\,0\,0\,\bar{1}]$, and their 2's complement versions that exist in CSD based filter coefficients, to eliminate the redundant common bit patterns with the shifted version.

4) Fixed Point Arithmetic Operation Using HUB: The half-unit-biased (HUB) rounding has been presented in [11]–[13] to reduce quantization error as well as optimize the hardware cost of digital filters. The HUB [12] representation of any number is obtained by shifting half of the weight of the least significant bit (LSB) of the exactly represented number (ERN) in conventional format, i.e., $X' = \sum_{j=-m}^{n-1} X_j 2^{j} + 2^{-m-1}$ as shown in Fig. 1(d), where $X$ denotes a fixed point real number with $n$ integer bits and $m$ fractional bits and the last term ($2^{-m-1}$) corresponds to the implicit LSB (ILSB) of value one. The ILSB is not required to be stored or transmitted. It is taken into account only when any arithmetic operation is performed on the HUB number, which helps in reducing the word-length of the entire architecture by one bit. As shown in Fig. 1(c), after each addition operation the output of 23-bit word-length is transmitted or stored. Whereas, under the conventional RN rounding format [16], after each addition operation 24-bit word-length is transmitted to maintain the same rounding accuracy. Moreover, there is no requirement for the addition operation after bit-wise inversion for 2's complement in HUB format as no carry is propagated to the explicit bits. As a result, the overall hardware cost can be significantly reduced for the CSE algorithm realization under HUB.

Fig. 2. 4th order FIR filter coefficients encoded in CSD under HUB.

III. PROPOSED MCSE ALGORITHM

We have analyzed the frequency of occurrences [17] of the CSs $[1\,0\,1]$, $[1\,0\,\bar{1}]$, $[1\,0\,0\,1]$, $[1\,0\,0\,\bar{1}]$ and their negative versions for the 16-bit CSD based filter coefficients of FIR filters having different tap lengths (20–500). The analysis has been done for both 16-bit conventional CSD-coefficients and HUB based CSD-coefficients where ILSB of value 1 is appended. For all the FIR designs under analysis, passband $\omega_p$ and stopband $\omega_s$ are normalized and uniformly distributed over the range of $[0, \pi]$. Basically, the passband $\omega_p$ and stopband $\omega_s$ have been varied from $0.01\pi$ to $0.99\pi$. We have chosen different values of the parameter $|\omega_s - \omega_p| = 0.1\pi$, $0.05\pi$, $0.02\pi$ and $0.01\pi$ for each different type of filter, namely low pass (LP), high pass (HP), band pass (BP), and band stop (BS), to carry out the analysis. The coefficients are generated using the MATLAB FDA tool by selecting the filter specifications. The results of the aforementioned analysis, as depicted in Fig. 1(e), reveal that the two primarily effective CSs $[1\,0\,1]$ and $[1\,0\,\bar{1}]$ combined take nearly 70% for both HUB based CSD-coefficients and conventional rounding based CSD-coefficients out of all CSs in terms of average frequencies of occurrences. The CSs $[1\,0\,1]$ and $[1\,0\,\bar{1}]$ combined will reduce the computational burden as compared to the other CSs in the HCSE algorithm. Hence, we use $[1\,0\,1]$ and $[1\,0\,\bar{1}]$ in our illustration. When $x$ denotes the input data and $2^{-R}$ the hardwired right shift operation by $R$ bits, the HCSs $[1\,0\,1]$ and $[1\,0\,\bar{1}]$ can be defined as $x_3 = x + 2^{-2}x$ and $x_4 = x - 2^{-2}x$ respectively.

1) CSD Based Matrix Grouped CSE (MCSE) Algorithm: We have proposed novel CSs, namely matrix grouped CSs (MCSs), represented by three types of $2 \times 3$ matrix, i.e.,
$\begin{bmatrix} N & 0 & N \\ N & 0 & N \end{bmatrix}$, $\begin{bmatrix} N & 0 & N \\ 0 & 0 & N \end{bmatrix}$ and $\begin{bmatrix} N & 0 & N \\ N & 0 & 0 \end{bmatrix}$
as type-1, type-2 and type-3 respectively, where $N$ represents a non-zero (1 or $\bar{1}$) bit. The type-1 MCSs include
$\begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & 1 \\ \bar{1} & 0 & \bar{1} \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & 1 \\ \bar{1} & 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & \bar{1} \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & \bar{1} \\ 1 & 0 & \bar{1} \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & \bar{1} \\ \bar{1} & 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & \bar{1} \\ 1 & 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & \bar{1} \\ \bar{1} & 0 & \bar{1} \end{bmatrix}$
denoted as $x_5$, $x_6$, $x_7$, $x_8$, $x_9$, $x_{10}$, $x_{11}$, $x_{12}$ respectively, and their negated 2's complement versions. Similarly, the type-2 and type-3 MCSs include
$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & \bar{1} \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & \bar{1} \\ 0 & 0 & 1 \end{bmatrix}$,
$\begin{bmatrix} 1 & 0 & \bar{1} \\ 0 & 0 & \bar{1} \end{bmatrix}$ denoted as $x_{13}$, $x_{14}$, $x_{15}$, $x_{16}$ and
$\begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & 1 \\ \bar{1} & 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & \bar{1} \\ 1 & 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & \bar{1} \\ \bar{1} & 0 & 0 \end{bmatrix}$ denoted as $x_{17}$, $x_{18}$, $x_{19}$, $x_{20}$, and their negated 2's complement versions, respectively. The MCSs can be defined as
$x_5 = x_3 + x_3[-1]$, $x_6 = x_3 - x_3[-1]$, $x_7 = x_3 - x_4[-1]$, $x_8 = x_3 + x_4[-1]$,
$x_9 = x_4 + x_4[-1]$, $x_{10} = x_4 - x_4[-1]$, $x_{11} = x_4 + x_3[-1]$, $x_{12} = x_4 - x_3[-1]$,
$x_{13} = x_3 + 2^{-2}x[-1]$, $x_{14} = x_3 - 2^{-2}x[-1]$, $x_{15} = x_4 + 2^{-2}x[-1]$, $x_{16} = x_4 - 2^{-2}x[-1]$,
$x_{17} = x_3 + x[-1]$, $x_{18} = x_3 - x[-1]$, $x_{19} = x_4 + x[-1]$, $x_{20} = x_4 - x[-1]$.
The type-1 MCSs can be realized through addition/subtraction of present or delayed HCSs ($x_3 = x + 2^{-2}x$ and $x_4 = x - 2^{-2}x$). The type-2 is realized through the addition/subtraction of the 2-bit right shifted delayed input signal ($x$) with the HCSs ($x_3$ or $x_4$), and the type-3 realization is the same as type-2 with the right shift operation eliminated.

Using the statistical analysis conducted in this work and the equation to determine the number of LOs ($C_{LO}$) given in [17], we get $C_{LO} = 0.2134 \times B_{NZ} - 0.8926 \times B_{CS} + 3.7402 \times B_u$, where $B_{NZ}$ is the total number of non-zero bits in the coefficient set, $B_u$ is the number of unpaired bits that do not form CSs and $B_{CS}$ is the number of CSs that can be formed from the non-zero bits. It is evident that $C_{LO}$ is mainly dependent on $B_u$ ($C_{LO} \propto B_u$). The proposed MCS combines more ($\geq 3$) non-zero bits as compared to the existing CSE algorithms [5], [8], [9]–[10]. Moreover, the proposed MCS is realized by combining the effective CSs ($x_3 = [1\,0\,1]$ and $x_4 = [1\,0\,\bar{1}]$) with high frequency of occurrences. It is evident that the MCSs with more ($\geq 3$) non-zero bits and higher frequency of occurrences would result in smaller $B_u$ and larger $B_{CS}$, which in turn reduces the $C_{LO}$ needed to realize the coefficient multiplier as compared to the counterparts [5], [8], [9]–[10]. Further, we have analyzed that the average frequency of occurrences of the primary CSs for HUB based CSD-coefficients is ≈30% more than for conventional CSD-coefficients as shown in Fig. 1(e), which indicates a reduction in $C_{LO}$. The MCSE algorithm identifies all possible CSs with highest priority (type-1 MCSs > type-2&3 MCSs > HCSs > VCSs) given to MCSs.

IV. VLSI ARCHITECTURES OF FIR FILTER FOR ACQUIRED SEISMIC SIGNAL PREPROCESSING

We have proposed two architectures (I and II) of FIR filter for processing the seismic signal.

A. Proposed Architecture I: High-Speed and Low-Complexity

The proposed architecture I of FIR for pre-processing seismic signal employs the MCSE algorithm, retimed adder tree, and half-unit biased (HUB) [11]–[13] rounding.

1) Illustrative Example: The filter coefficients of a fourth-order low pass digital FIR filter with a cut-off frequency 15 Hz have been generated using the MATLAB FDA tool. The values of the coefficients are $h_0 = h_4 = 0.02010370826$, $h_1 = h_3 = 0.23086668180$ and $h_2 = 0.49805921985$. The corresponding 16-bit CSD representation of the FIR filter coefficients in HUB format is depicted in Fig. 2. It can be observed that the HCSs $x_3 = [1\,0\,1] = x + 2^{-2}x$, $x_3' = [\bar{1}\,0\,\bar{1}] = -(x + 2^{-2}x) = -x_3$ and the VCS $x_1 = [1\ 1] = x + x[-1]$ are indicated using dashed rectangles. The MCSs $x_6 = \begin{bmatrix} 1 & 0 & 1 \\ \bar{1} & 0 & \bar{1} \end{bmatrix} = x_3 + x_3'[-1]$ and $x_6' = \begin{bmatrix} \bar{1} & 0 & \bar{1} \\ 1 & 0 & 1 \end{bmatrix} = x_3' + x_3[-1]$ are indicated by solid rectangles. By applying the proposed MCSE algorithm, we obtain the following filter output

$$y = 2^{-6}x_6 + 2^{-11}x_1 + 2^{-14}x_6 + 2^{-2}x[-1] + 2^{-1}x[-2] - 2^{-9}x[-2] + 2^{-16}x[-2] + 2^{-2}x[-3] + 2^{-6}x_6'[-3] + 2^{-11}x_1'[-3] + 2^{-14}x_6'[-3] \qquad (1)$$

We have realized (1) in hardware as shown in Fig. 3. To reduce the hardware complexity, controlled delay registers have been used to realize VCS and MCS as shown in Fig. 3. Using no delay (0) the CSs $x_1$, $x_6$ can be realized, and using two delays (2D) the reflexive versions of the CSs, i.e., $x_1'$, $x_6'$, can be realized for the symmetric one-half of coefficients. The total number of adders required for the realization of the constant coefficient multiplier block, as given in [8], is defined as $N_{cma} = N_c - N/2$, where $N_c$ is the total number of non-zero bits present in the symmetric half coefficient set and $N$ is the filter tap length. Hence, it is evident from Fig. 2 that $N_{cma} = 11$ for the existing algorithm [8], and two more adders are used for two 2's complement conversions; as a result, the overall LO requirement is 13. However, the LO (number of adders in coefficient multiplier block) count for the proposed MCSE and HUB based approach is 7 as shown in Fig. 3; as a result, nearly 46% reduction in LO is achieved over the counterpart [8]. As shown in Fig. 3, registers have been inserted in the feed forward path by applying the retiming technique on the adder tree structure to reduce the CPD. The throughput is $f_p = 2/T_{ADD}$, where $T_{ADD}$ is the computation delay of an adder.

Fig. 3. Proposed FIR filter architecture I involving canonical signed digit (CSD) based MCSE, HUB rounding and retimed adder tree.

B. Proposed Architecture II: Reconfigurable

In this brief, the proposed architecture II of the FIR filter is reconfigurable. The MCSE algorithm (3-bit VCS and 3-bit HCS) has been applied to the CSD based 16-bit coefficients under the HUB format. The 16-bit CSD based coefficients h[15:0] presented in HUB format contain sign and magnitude parts s[15:0] and m[15:0], which have been stored in LUTs. The least significant 15 bits of the coefficient are partitioned into groups of 3 bits, i.e., P1 = (s[14:12], m[14:12]), P2 = (s[11:9], m[11:9]), P3 = (s[8:6], m[8:6]), P4 = (s[5:3], m[5:3]) and P5 = (s[2:0], m[2:0]). The architecture of the proposed constant multiplier block of the FIR filter is shown in Fig. 4(a), which consists of four units: (i) Partial product (PP)
generation unit to generate partial products by employing the shift and add technique. The internal architecture of the PP generation (PPG) unit is shown in Fig. 4(b). The proposed MCSE utilizes ten possible 3-bit CSD-based CSs, i.e., "001", "010", "100", "101", "10$\bar{1}$" and their negative versions, which produce the PPs ($x_1$–$x_{10}$). To generate the PPs, only two adders are required, for "101" and "10$\bar{1}$" respectively. Other PPs can be generated by hardwired shifting. Due to the involvement of the HUB format, the corresponding negative partial products ($x_2$, $x_4$, $x_6$, $x_8$, and $x_{10}$) are generated by employing only bit-wise inversion (1's complement) operation instead of 2's complement. (ii) The control signal generator (CSG) unit identifies the similarity among the groups P1–P5 by comparing the sign and magnitude parts of each group with other groups. As we append an ILSB of value one to create the 16-bit CSD coefficient under HUB format, comparing the magnitude of the last three bits P5 with the upper 3-bit magnitudes P1–P4 requires only a 1-bit comparator instead of a 3-bit comparator, as shown in Fig. 4(c). (iii) The multiplexer (MUX2–MUX6) layer unit is used to select the PP corresponding to the 3-bit VCS as shown in Fig. 4(a). (iv) A controlled adder layer is used to execute the controlled additions of multiplexed (MUX7–MUX16) PPs according to the MCSE. The retimed adder tree structure reduces CPD.

Fig. 4. Proposed reconfigurable FIR filter architecture II: (a) architecture of the proposed reconfigurable constant multiplier block, (b) internal architecture of partial product generator unit, and (c) internal architecture of control signal generator unit.

V. IMPLEMENTATION RESULTS AND COMPARISON

The behavioral models of the filters under comparison have been coded in Verilog HDL for the same filter coefficients, quantization, and number of pipelined stages for each benchmark to make a fair comparison. We have optimized the design of all benchmark filters by executing the synthesis using the Xilinx VIVADO design suite 2016.1 with the resource sharing and retiming options enabled. We have used the Xilinx ZYNQ-XC7Z020-1CLG84C FPGA device as the target hardware platform and the results have been obtained from the post-place-and-route report. ASIC implementation has been done using Synopsys Design Compiler and the UMC 65 nm CMOS standard cell technology library. To perform the power analysis we have used an input data set with a switching probability of 0.5 and a toggle rate of 0.25. The power dissipation has been estimated by using Synopsys PrimeTime (containing PrimePower) with a frequency of 50 MHz. The proposed non-reconfigurable architecture I (AR-I$_{\text{HUB}}$) has been compared with the existing non-reconfigurable architectures [5], [8]–[10], [14] as shown in Table I. Whereas, reconfigurable architecture II has been compared with existing reconfigurable FIR [6]–[7], [15] as shown in Table II. For the HUB rounding based proposed architectures, the word-length of the input is 19 bits (10-bit integer and 9-bit fraction) and the coefficients are 16 bits (1-bit integer and 15-bit fraction). The HUB format helps in reducing the word-length of the entire architecture by one bit while maintaining the same rounding accuracy as the conventional format [16]. To validate the superiority of the proposed architecture I in seismic signal processing, three benchmark FIR filters (A, B and C) related to the seismic application as specified in [1], [2], and [3] respectively are considered.

TABLE I. VLSI Implementation Comparison for Proposed Architecture-I
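The two HUB properties relied on in this section, that truncation followed by the implicit LSB gives round-to-nearest accuracy and that negation needs only a bit-wise inversion, can be checked numerically. The following Python sketch is a software illustration only, under the assumption that a HUB value with $m$ explicit fractional bits equals $(k + 0.5) \cdot 2^{-m}$ for an integer $k$ holding the explicit bits; all function names are hypothetical and none of this models the synthesized hardware.

```python
import math


def quantize_trunc(x, m):
    """Plain truncation to m fractional bits (error can approach 2**-m)."""
    return math.floor(x * 2**m) / 2**m


def quantize_rn(x, m):
    """Conventional round-to-nearest to m fractional bits."""
    return math.floor(x * 2**m + 0.5) / 2**m


def quantize_hub(x, m):
    """HUB quantization: truncate to m explicit bits, then account for the
    implicit LSB of weight 2**-(m + 1), which is never stored."""
    return math.floor(x * 2**m) / 2**m + 2.0 ** -(m + 1)


def hub_value(k, m):
    """Real value of a HUB word whose explicit bits hold the integer k."""
    return (k + 0.5) / 2**m


def max_error(quantizer, m, steps=10000):
    """Largest absolute quantization error over a grid on [-1, 1]."""
    return max(abs(i / steps - quantizer(i / steps, m))
               for i in range(-steps, steps + 1))
```

With $m = 9$ (the input fraction width quoted above), `max_error(quantize_hub, 9)` stays within $2^{-10}$, the same bound as round-to-nearest, whereas plain truncation exceeds it. Negation is exact with inversion alone because `hub_value(~k, m) == -hub_value(k, m)` for any integer `k`, since `~k` equals `-k - 1`.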
A is a 5-tap low-pass filter with a cut-off frequency of 15 Hz [1]. B is the low-pass filter of order 21 from [2]. C is a low-pass filter with filter length 140 from [3]. To implement the ERN [16] based FIR (AR-I$_{\text{CON}}$), the same coefficients, quantization, and number of pipeline stages as those of the proposed AR-I$_{\text{HUB}}$ have been used. It can be observed from Table I that, for the three benchmark filters, AR-I$_{\text{HUB}}$ outperforms the MCSE based architecture (AR-I$_{\text{CON}}$) under the conventional RN rounding format [16] in terms of reduction in ADP (40%), CPD (44%) and PDP (25%) due to the involvement of the hardware efficient HUB rounding. The involvement of the proposed MCSE algorithm in the architecture (AR-I$_{\text{HUB}}$) helps in achieving an average 35%, 30%, 25% and 40% reduction in LO over hybrid-CSE [5], improved CSE method [8], TCSE [9] and multiple adder graphs method [10] respectively. Due to the significant reduction in LO, the area and power consumption also reduce for AR-I$_{\text{HUB}}$. The average reductions in CPD, ADP, and PDP for the AR-I$_{\text{HUB}}$ are 61%, 36% and 31% respectively over [5], [8], [9]–[10] and 49%, 56% and 60% respectively over the faithfully rounded Booth encoding multipliers [14]. Whereas, the FPGA implementation reveals a reduction in LUT (≈22%) and slice (≈35%) requirement for AR-I$_{\text{HUB}}$ over the counterpart [14]. It can be observed from Table II that the average reductions in CPD, ADP, and PDP of the proposed reconfigurable AR-II (20-tap 19 × 16) are 39%, 36% and 2% respectively over configurable Booth (CBM) [15], and 48%, 12%, and 13% respectively over CSD-VHCSE [7]. The CBM [15] requires more area due to the involvement of the relatively complex additional circuits to achieve configuration. The HUB format helps in reducing the hardware complexity of the PPG and CSG units for the AR-II as compared to the counterparts [6]–[7]; as a result, the area requirement reduces. Moreover, the AR-I and AR-II employ a retimed binary-tree-structured adder which ensures the lowest LD and least number of transitions, which in turn reduces power dissipation and CPD as compared to the counterparts [5]–[15]. FPGA implementation of the 20-tap AR-II reveals a reduction in slice delay product (SDP) by 38%, 47% and 34% over [6], [15] and [7], respectively. Moreover, Table III reveals that the proposed AR-I yields more than 38% reduction in effective latency over the existing [5]–[15] architectures.

TABLE II. VLSI Implementation Comparison for Proposed Architecture-II

TABLE III. Comparison of Latency of 20-Tap FIR Filter Design

VI. CONCLUSION

By observing the implementation results, it can be concluded that the involvement of HUB format, CSD-based MCSE, and retiming method fulfill the objective of a substantial reduction in the hardware complexity and CPD as compared with the state-of-the-art [5]–[15] and support the basic requirement of a real-time application, e.g., SAS.

REFERENCES

[1] S. Kumar, R. Vig, and P. Kapur, "Development of earthquake event detection technique based on STA/LTA algorithm for seismic alert system," J. Geol. Soc. India, vol. 92, pp. 679–686, Dec. 2018.
[2] S. Chikhalikar, O. Khandekar, and C. Bhattacharya, "Design of real-time acquisition and filtering for MEMS-based accelerometer data in microcontroller," in Proc. IEEE Electron Devices Kolkata Conf., Kolkata, India, Nov. 2018, pp. 15–18.
[3] D. Xu and J. Chiu, "Design of a high-order FIR digital filtering and variable gain ranging seismic data acquisition system," in Proc. IEEE Southeastcon, Charlotte, NC, USA, Apr. 1993, p. 6.
[4] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York, NY, USA: Wiley, 1999.
[5] I. Sharma, A. Kumar, L. Balyan, and G. K. Singh, "A new hybrid CSE technique for multiplier-less FIR filter," in Proc. IEEE Int. Conf. Digit. Signal Process. (DSP), London, U.K., 2017, pp. 1–5.
[6] I. Hatai, I. Chakrabarti, and S. Banerjee, "An efficient constant multiplier architecture based on vertical-horizontal binary common sub-expression elimination algorithm for reconfigurable FIR filter synthesis," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 4, pp. 1071–1080, Apr. 2015.
[7] I. Hatai, I. Chakrabarti, and S. Banerjee, "A computationally efficient reconfigurable constant multiplication architecture based on CSD decoded vertical–horizontal common sub-expression elimination algorithm," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 1, pp. 130–140, Jan. 2018.
[8] A. P. Vinod, E. Lai, D. L. Maskell, and P. K. Meher, "An improved common subexpression elimination method for reducing logic operators in FIR filter implementations without increasing logic depth," Integration, vol. 43, no. 1, pp. 124–135, 2010.
[9] S. Roy and A. Chandra, "A triangular common subexpression elimination algorithm with reduced logic operators in FIR filter," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 67, no. 12, pp. 3527–3531, Dec. 2020.
[10] J.-H. Han and I.-C. Park, "FIR filter synthesis considering multiple adder graphs for a coefficient," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 5, pp. 958–962, May 2008.
[11] J. Hormigo and J. Villalba, "Optimizing DSP circuits by a new family of arithmetic operators," in Proc. Asilomar Conf. Signals Syst. Comput., Pacific Grove, CA, USA, Nov. 2014, pp. 871–875.
[12] J. Hormigo and J. Villalba, "New formats for computing with real-numbers under round-to-nearest," IEEE Trans. Comput., vol. 65, no. 7, pp. 2158–2168, Jul. 2016.
[13] S. Bose, A. De, and I. Chakrabarti, "Framework for automated earthquake event detection based on denoising by adaptive filter," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 9, pp. 3070–3083, Sep. 2020.
[14] S.-F. Hsiao, J.-H. Z. Jian, and M.-C. Chen, "Low-cost FIR filter designs based on faithfully rounded truncated multiple constant multiplication/accumulation," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 5, pp. 287–291, May 2013.
[15] S.-R. Kuang and J.-P. Wang, "Design of power-efficient configurable Booth multiplier," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 568–580, Mar. 2010.
[16] P. Kornerup, J.-M. Muller, and A. Panhaleux, "Performing arithmetic operations on round-to-nearest representations," IEEE Trans. Comput., vol. 60, no. 2, pp. 282–291, Feb. 2011.
[17] R. Mahesh and A. P. Vinod, "A new common subexpression elimination algorithm for realizing low-complexity higher order digital filters," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 2, pp. 217–229, Feb. 2008.
