Area-Delay-Power Efficient VLSI Architecture of FIR Filter For Processing Seismic Signal
Abstract—Low complexity, high speed and reconfigurability are the primary requirements of the finite impulse response (FIR) filters employed for processing the acquired seismic signal in a real-time seismic-alert-system. The common sub-expression elimination (CSE) technique is widely employed to reduce the hardware complexity by minimizing the logic operators (LOs) and logic depths (LDs) in digital FIR filters. In this brief, a novel matrix grouped CSE (MCSE) algorithm has been proposed which outperforms the existing CSE algorithms in terms of LOs and LDs minimization. Moreover, a new half-unit biased (HUB) based rounding technique is incorporated in the proposed design to reduce the truncation error while maintaining low complexity, and the cut-set retiming technique is employed to reduce the critical-path-delay (CPD). Two hardware-efficient FIR filter architectures (I and II) involving the proposed canonical signed digit (CSD) based MCSE algorithm, HUB rounding and cut-set retiming approach have been presented. Further, architecture II represents a hardware-efficient realization of a reconfigurable FIR filter. The hardware implementation of the architectures is performed on both FPGA and ASIC platforms. The hardware implementation of the proposed architecture I yields more than 25%, 49%, 38%, 36%, and 31% reduction in LOs, CPD, effective latency, area-delay-product (ADP), and power-delay-product (PDP), respectively, over the state-of-the-art CSE based architectures. The reconfigurable architecture II, in turn, exhibits nearly 48%, 12%, and 13% reduction in CPD, ADP, and PDP over its counterpart.
Index Terms—Finite impulse response (FIR), half-unit biased (HUB), canonical signed digit (CSD), common sub-expression elimination (CSE), seismic signal.
Manuscript received February 1, 2021; revised March 14, 2021 and April 19, 2021; accepted May 4, 2021. Date of publication May 17, 2021; date of current version October 28, 2021. This work was supported by the Visvesvaraya Ph.D. Scheme for Electronics and IT, MeitY, Government of India under Grant MEITY-PHD-2509. This brief was recommended by Associate Editor Y. Pu. (Corresponding author: Sudipta Bose.)
The authors are with the Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India (e-mail: [email protected]; [email protected]; [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSII.2021.3081257.
Digital Object Identifier 10.1109/TCSII.2021.3081257
1549-7747 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
I. INTRODUCTION
Eliminating noise from the seismic signal is one of the crucial challenges in seismic signal processing [1]–[3] for the seismic-alert-system (SAS) [1]. Digital finite impulse response (FIR) filters are widely used for various seismic signal processing applications due to their linear-phase property and stability [1]. The drawback of the conventional FIR filter architectures [4] is that they contain a large number of multiplication operations, which require a large computation time and lead to excessive area and power consumption for real-time applications. Therefore, research on area-power-delay optimized FIR filter architectures based on CSE has gained importance in recent works [5]–[10]. In [5], a hybrid CSE technique has been employed to implement an efficient multiplier-less FIR filter. A reconfigurable FIR filter architecture has been proposed in [7] by employing a CSD based VHCSE algorithm, which outperforms its counterpart [6] in terms of reduction in hardware cost. In the case of the SAS, it is essential to implement a low-power, high-speed architecture of the low-pass FIR filter so that the processing time of the SAS can be reduced, providing an earlier warning while consuming low power in the real-time sensor-based system. The motivation of this brief lies in the design of an area-delay-power efficient FIR filter architecture for pre-processing the seismic signal with high precision (reduced truncation error) in real time. The primary contributions of the work presented in this brief are as follows.
i) A novel matrix grouped CSE (MCSE) algorithm has been proposed which reduces the computational burden and achieves a larger reduction in LOs and LDs than the existing CSE algorithms in the literature [5]–[10].
ii) A low-power and high-speed FIR filter (architecture I) has been implemented for seismic signal pre-processing. The proposed architecture uses CSD-coded filter coefficients as input data and applies the MCSE algorithm with the HUB [11]–[13] fixed-point rounding format to reduce the hardware cost. Moreover, high speed (low CPD) is achieved by retiming the adder tree. A comparison of the hardware complexity of FIR filter architectures for seismic applications [1]–[3] reveals that the proposed architecture I is superior to the existing [5], [8]–[10], [14] FIR architectures.
iii) For real-time applications, the key requirement is dynamically reconfigurable filter coefficients. Hence, the MCSE based reconfigurable architecture II has been proposed, which outperforms the state-of-the-art [6], [7], [15] in terms of hardware complexity and CPD.
The brief is organized as follows. In Section II, the existing algorithms are discussed. The novel MCSE algorithm is presented in Section III. The proposed VLSI architectures of the FIR filter are discussed in Section IV. The implementation results are given in Section V. Finally, the brief is concluded in Section VI.
II. PRELIMINARIES
The CSE algorithm can be applied in an architecture where a set of constant multipliers is used to multiply a common variable, i.e., the transposed direct form FIR filter [4]. The
Authorized licensed use limited to: VIT University. Downloaded on November 17,2024 at 06:41:27 UTC from IEEE Xplore. Restrictions apply.
3452 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 68, NO. 11, NOVEMBER 2021
Fig. 2. 4th order FIR Filter Coefficients encoded in CSD under HUB.
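Fig. 2 itself is an image that is not reproduced here. The Python sketch below shows one way such a coding can be obtained, under stated assumptions rather than as the authors' tool flow. It assumes a 16-bit HUB word made of 15 stored fractional bits plus an implicit LSB (ILSB) of one. It uses the HUB property that round-to-nearest reduces to truncating the stored bits. It then recodes the whole 16-bit word, ILSB included, into CSD with the standard non-adjacent-form recurrence. The helper names `hub_word`, `csd_digits` and `csd_terms` are hypothetical. Applied to the illustrative-example coefficients of Section IV-A, the resulting signed powers of two match the ones that appear in the filter output (1).

```python
import math

# Assumption: 15 stored fractional bits; the 16th bit is the implicit LSB (ILSB) of one.
STORED_BITS = 15

def hub_word(h: float) -> int:
    """16-bit HUB code of 0 <= h < 1, returned as an integer k meaning k * 2**-16.

    Under HUB, round-to-nearest is plain truncation of the stored bits;
    the ILSB (weight 2**-16) is then appended as a trailing one.
    """
    stored = math.floor(h * 2 ** STORED_BITS)
    return 2 * stored + 1

def csd_digits(n: int) -> list:
    """Canonical signed digit (non-adjacent form) of integer n, LSB first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d           # n is now a multiple of 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

def csd_terms(h: float) -> list:
    """Non-zero CSD digits of the HUB word of h as (sign, e) pairs meaning sign * 2**-e, MSB first."""
    k = hub_word(h)
    width = STORED_BITS + 1
    return [(d, width - i) for i, d in reversed(list(enumerate(csd_digits(k)))) if d != 0]

for name, h in (("h0", 0.02010370826), ("h1", 0.23086668180), ("h2", 0.49805921985)):
    print(name, csd_terms(h))
```

For example, `csd_terms(0.49805921985)` gives `[(1, 1), (-1, 9), (1, 16)]`, i.e. $2^{-1} - 2^{-9} + 2^{-16}$, where the last digit is the ILSB.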
BOSE et al.: AREA-DELAY-POWER EFFICIENT VLSI ARCHITECTURE OF FIR FILTER FOR PROCESSING SEISMIC SIGNAL 3453
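$\begin{bmatrix}1&0&\bar{1}\\0&0&\bar{1}\end{bmatrix}$ denoted as $x_{13}$, $x_{14}$, $x_{15}$, $x_{16}$, and $\begin{bmatrix}1&0&1\\1&0&0\end{bmatrix}$, $\begin{bmatrix}1&0&1\\\bar{1}&0&0\end{bmatrix}$, $\begin{bmatrix}1&0&\bar{1}\\1&0&0\end{bmatrix}$, $\begin{bmatrix}1&0&\bar{1}\\\bar{1}&0&0\end{bmatrix}$ denoted as $x_{17}$, $x_{18}$, $x_{19}$, $x_{20}$, together with their negated 2's complement versions, respectively. The MCSs can be defined as
$$
\begin{aligned}
x_5 &= x_3 + x_3[-1], & x_6 &= x_3 - x_3[-1], & x_7 &= x_3 - x_4[-1], & x_8 &= x_3 + x_4[-1],\\
x_9 &= x_4 + x_4[-1], & x_{10} &= x_4 - x_4[-1], & x_{11} &= x_4 + x_3[-1], & x_{12} &= x_4 - x_3[-1],\\
x_{13} &= x_3 + 2^{-2}x[-1], & x_{14} &= x_3 - 2^{-2}x[-1], & x_{15} &= x_4 + 2^{-2}x[-1], & x_{16} &= x_4 - 2^{-2}x[-1],\\
x_{17} &= x_3 + x[-1], & x_{18} &= x_3 - x[-1], & x_{19} &= x_4 + x[-1], & x_{20} &= x_4 - x[-1].
\end{aligned}
$$
The type-1 MCSs ($x_5$–$x_{12}$) are realized through addition/subtraction of present or delayed HCSs ($x_3 = x + 2^{-2}x$ and $x_4 = x - 2^{-2}x$). The type-2 MCSs ($x_{13}$–$x_{16}$) are realized through addition/subtraction of the 2-bit right-shifted delayed input signal ($x$) with an HCS ($x_3$ or $x_4$). The type-3 realization ($x_{17}$–$x_{20}$) is the same as type-2 except that the right-shift operation is eliminated.
Fig. 3. Proposed FIR Filter Architecture I involving Canonical Signed Digit (CSD) based MCSE, HUB rounding and retimed adder tree.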
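The MCS definitions above can be sketched in Python as a table of signal expressions. This is a minimal sketch that assumes finite input sequences with zero initial state. It uses exact rational arithmetic (`fractions.Fraction`) so that the 2-bit right shift $2^{-2}$ is exact. The helper names (`delay`, `mcs_table`, and so on) are illustrative only. Once the two HCSs $x_3$ and $x_4$ have been formed, each MCS costs a single adder.

```python
from fractions import Fraction

def delay(sig, k=1):
    """sig[n - k] with zero initial state (the z^-k operator on a finite sequence)."""
    return ([Fraction(0)] * k + list(sig))[:len(sig)]

def scale(sig, c):
    return [c * v for v in sig]

def add(a, b):
    return [u + v for u, v in zip(a, b)]

def sub(a, b):
    return [u - v for u, v in zip(a, b)]

def mcs_table(x):
    """Return {k: x_k} for the MCSs x5..x20 built from the HCSs x3 and x4."""
    q = Fraction(1, 4)               # 2^-2: a hardwired 2-bit right shift
    x3 = add(x, scale(x, q))         # HCS [1 0 1]  = x + 2^-2 x
    x4 = sub(x, scale(x, q))         # HCS [1 0 -1] = x - 2^-2 x
    xd = delay(x)                    # x[-1]
    x3d, x4d = delay(x3), delay(x4)  # x3[-1], x4[-1]
    xqd = scale(xd, q)               # 2^-2 x[-1]
    return {
        # type-1: HCS +/- delayed HCS
        5: add(x3, x3d), 6: sub(x3, x3d), 7: sub(x3, x4d), 8: add(x3, x4d),
        9: add(x4, x4d), 10: sub(x4, x4d), 11: add(x4, x3d), 12: sub(x4, x3d),
        # type-2: HCS +/- 2-bit right-shifted delayed input
        13: add(x3, xqd), 14: sub(x3, xqd), 15: add(x4, xqd), 16: sub(x4, xqd),
        # type-3: HCS +/- delayed input, no shift
        17: add(x3, xd), 18: sub(x3, xd), 19: add(x4, xd), 20: sub(x4, xd),
    }
```

Feeding a unit impulse `[1, 0, 0]` makes each MCS return its two-row matrix read as binary weights. For example, entry 6 starts with `5/4, -5/4`, the row values of $x_6$ (top row $1\,0\,1$, bottom row $\bar{1}\,0\,\bar{1}$).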
Using the statistical analysis conducted in this work and the equation for the number of LOs ($C_{LO}$) given in [17], we get
$$C_{LO} = 0.2134 \times B_{NZ} - 0.8926 \times B_{CS} + 3.7402 \times B_u,$$
where $B_{NZ}$ is the total number of non-zero bits in the coefficient set, $B_u$ is the number of unpaired bits that do not form CSs, and $B_{CS}$ is the number of CSs that can be formed from the non-zero bits. Since $B_u$ carries the largest coefficient (3.7402), $C_{LO}$ is mainly dependent on $B_u$ ($C_{LO} \propto B_u$). The proposed MCS combines more (≥3) non-zero bits than the existing CSE algorithms [5], [8]–[10]. Moreover, the proposed MCS is realized by combining the effective CSs ($x_3 = [101]$ and $x_4 = [10\bar{1}]$), which have a high frequency of occurrence. MCSs with more (≥3) non-zero bits and a higher frequency of occurrence result in a smaller $B_u$ and a larger $B_{CS}$. This in turn reduces the $C_{LO}$ required to realize the coefficient multiplier compared with the counterparts [5], [8]–[10]. Further, our analysis shows that the average frequency of occurrence of the primary CSs for HUB based CSD coefficients is ≈30% higher than for conventional CSD coefficients, as shown in Fig. 1(e), which indicates a reduction in $C_{LO}$. The MCSE algorithm identifies all possible CSs, with the highest priority given to MCSs (type-1 MCSs > type-2 & 3 MCSs > HCSs > VCSs).
IV. VLSI ARCHITECTURES OF FIR FILTER FOR ACQUIRED SEISMIC SIGNAL PREPROCESSING
We have proposed two architectures (I and II) of the FIR filter for processing the seismic signal.
A. Proposed Architecture I: High-Speed and Low-Complexity
The proposed architecture I of the FIR filter for pre-processing the seismic signal employs the MCSE algorithm, a retimed adder tree, and half-unit biased (HUB) [11]–[13] rounding.
1) Illustrative Example: The filter coefficients of a fourth-order low-pass digital FIR filter with a cut-off frequency of 15 Hz have been generated using the MATLAB FDA tool. The values of the coefficients are $h_0/h_4 = 0.02010370826$, $h_1/h_3 = 0.23086668180$ and $h_2 = 0.49805921985$. The corresponding 16-bit CSD representation of the FIR filter coefficients in HUB format is depicted in Fig. 2. It can be observed that the HCSs $x_3 = [101] = x + 2^{-2}x$ and $\bar{x}_3 = [\bar{1}0\bar{1}] = -(x + 2^{-2}x) = -x_3$, and the VCS $x_1 = \begin{bmatrix}1\\1\end{bmatrix} = x + x[-1]$, are indicated using dashed rectangles. The MCSs $x_6 = \begin{bmatrix}1&0&1\\\bar{1}&0&\bar{1}\end{bmatrix} = x_3 + \bar{x}_3[-1]$ and $\bar{x}_6 = \begin{bmatrix}\bar{1}&0&\bar{1}\\1&0&1\end{bmatrix} = \bar{x}_3 + x_3[-1]$ are indicated by solid rectangles.
By applying the proposed MCSE algorithm, we obtain the following filter output:
$$
\begin{aligned}
y ={}& 2^{-6}x_6 + 2^{-11}x_1 + 2^{-14}x_6 + 2^{-2}x[-1] + 2^{-1}x[-2] - 2^{-9}x[-2]\\
&+ 2^{-16}x[-2] + 2^{-2}x[-3] + 2^{-6}\bar{x}_6[-3] + 2^{-11}x_1[-3] + 2^{-14}\bar{x}_6[-3].
\end{aligned}
\tag{1}
$$
We have realized (1) in hardware as shown in Fig. 3. To reduce the hardware complexity, controlled delay registers have been used to realize the VCS and MCS, as shown in Fig. 3. With no delay (0), the CSs $x_1$ and $x_6$ are realized. With two delays (2D), the reflexive versions of these CSs are realized for the symmetric half of the coefficients. As given in [8], the total number of adders required for the realization of the constant coefficient multiplier block is $N_{cma} = N_c - N/2$, where $N_c$ is the total number of non-zero bits present in the symmetric half coefficient set and $N$ is the filter tap length. Hence, from Fig. 2, $N_{cma} = 11$ for the existing algorithm [8]. Two more adders are needed for the two 2's complement conversions, so the overall LO requirement is 13. However, the LO count (number of adders in the coefficient multiplier block) for the proposed MCSE and HUB based approach is 7, as shown in Fig. 3. This amounts to a reduction of nearly 46% in LO (13 → 7) over the counterpart [8]. As shown in Fig. 3, registers have been inserted in the feed-forward path by applying the retiming technique to the adder tree structure to reduce the CPD. The throughput is $f_p = 2/T_{ADD}$, where $T_{ADD}$ is the computation delay of an adder.
Fig. 4. Proposed reconfigurable FIR filter architecture II: (a) architecture of the proposed reconfigurable constant multiplier block; (b) internal architecture of the partial product generator unit; (c) internal architecture of the control signal generator unit.
TABLE I
VLSI IMPLEMENTATION COMPARISON FOR PROPOSED ARCHITECTURE-I
B. Proposed Architecture II: Reconfigurable
In this brief, the proposed architecture II of the FIR filter is reconfigurable. The MCSE algorithm (3-bit VCS and 3-bit HCS) has been applied to the CSD based 16-bit coefficients under the HUB format. Each 16-bit CSD based coefficient h[15:0] in HUB format consists of sign and magnitude parts s[15:0] and m[15:0], which are stored in LUTs. The least significant 15 bits of the coefficient are partitioned into 3-bit groups, i.e., P1 = (s[14:12], m[14:12]), P2 = (s[11:9], m[11:9]), P3 = (s[8:6], m[8:6]), P4 = (s[5:3], m[5:3]) and P5 = (s[2:0], m[2:0]). The architecture of the proposed constant multiplier block of the FIR filter is shown in Fig. 4(a). It consists of four units:
(i) A partial product (PP) generation unit generates partial products by employing the shift-and-add technique. The internal architecture of the PP generation (PPG) unit is shown in Fig. 4(b). The proposed MCSE utilizes ten possible 3-bit CSD-based CSs, i.e., "001", "010", "100", "101", "10$\bar{1}$" and their negative versions, which produce the PPs ($x_1$–$x_{10}$). Only two adders are required to generate the PPs, one each for "101" and "10$\bar{1}$". The other PPs are obtained by hardwired shifting. Because of the HUB format, the corresponding negative partial products ($x_2$, $x_4$, $x_6$, $x_8$, and $x_{10}$) are generated by bit-wise inversion (1's complement) alone instead of the 2's complement.
(ii) The control signal generator (CSG) unit identifies the similarity among the groups P1–P5 by comparing the sign and magnitude parts of each group with those of the other groups. An implicit LSB (ILSB) of value one is appended to create the 16-bit CSD coefficient under the HUB format. As a result, comparing the magnitude of the last three bits (P5) with the upper 3-bit magnitudes (P1–P4) requires only a 1-bit comparator instead of a 3-bit comparator, as shown in Fig. 4(c).
(iii) A multiplexer layer (MUX2–MUX6) selects the PP corresponding to each 3-bit VCS, as shown in Fig. 4(a).
(iv) A controlled adder layer executes the controlled additions of the multiplexed (MUX7–MUX16) PPs according to the MCSE. The retimed adder tree structure reduces the CPD.
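The illustrative example of Section IV-A can be checked end to end with the sketch below. It evaluates (1) term by term from the shared sub-expressions $x_3$, $x_1$ and $x_6$. It then compares the result with a direct-form FIR that uses the coefficient values implied by (1): $h_0 = h_4 = 1317 \cdot 2^{-16}$, $h_1 = h_3 = 15131 \cdot 2^{-16}$ and $h_2 = 32641 \cdot 2^{-16}$.

The sketch rests on three assumptions. First, the reflexive CS used for the symmetric half is $\bar{x}_6 = -x_6$, which is what keeps $h_3 = h_1$ and $h_4 = h_0$. Second, arithmetic is exact (`Fraction`). Third, the filter starts from a zero initial state. `fir_mcse` and `fir_direct` are hypothetical names. The sketch models the arithmetic of (1) only, not the retimed hardware of Fig. 3.

```python
from fractions import Fraction as F

def delay(sig, k):
    """sig[n - k] with zero initial state (z^-k on a finite sequence)."""
    return ([F(0)] * k + list(sig))[:len(sig)]

def fir_direct(x, h):
    """Reference direct-form FIR: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]

def fir_mcse(x):
    """Evaluate (1) from the shared sub-expressions x3, x1 and x6."""
    x3 = [v + v / 4 for v in x]                     # HCS [1 0 1] = x + 2^-2 x
    x1 = [a + b for a, b in zip(x, delay(x, 1))]    # VCS x + x[-1]
    x6 = [a - b for a, b in zip(x3, delay(x3, 1))]  # MCS x3 - x3[-1]
    x6_bar = [-v for v in x6]                       # reflexive CS, assumed equal to -x6
    terms = [  # (weight, signal) pairs of (1)
        (F(1, 2**6), x6), (F(1, 2**11), x1), (F(1, 2**14), x6),
        (F(1, 2**2), delay(x, 1)),
        (F(1, 2**1), delay(x, 2)), (F(-1, 2**9), delay(x, 2)), (F(1, 2**16), delay(x, 2)),
        (F(1, 2**2), delay(x, 3)),
        (F(1, 2**6), delay(x6_bar, 3)), (F(1, 2**11), delay(x1, 3)), (F(1, 2**14), delay(x6_bar, 3)),
    ]
    return [sum(c * s[n] for c, s in terms) for n in range(len(x))]

# Coefficient values implied by (1), in units of 2^-16; symmetric 5-tap filter.
H0, H1, H2 = F(1317, 2**16), F(15131, 2**16), F(32641, 2**16)
H = [H0, H1, H2, H1, H0]

x = [F(v) for v in (3, -1, 4, 1, -5, 9, 2, -6)]
assert fir_mcse(x) == fir_direct(x, H)
```

These implied values lie within $2^{-16}$ of the MATLAB coefficients quoted in the illustrative example.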
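Two details of architecture II lend themselves to a short numerical sketch. The PPG unit negates partial products with a bit-wise inversion only, and the LUT fields are split into the groups P1–P5. Both are illustrated below under stated assumptions. A HUB number is modeled as a two's-complement stored word $w$ with an implicit LSB of one appended, so its value is $w + 1/2$ in units of the stored LSB. Each sign or magnitude field is modeled as a 16-bit integer. Inverting every bit of a two's-complement word gives $-w-1$, and the implicit half-LSB turns that into an exact negation, so no +1 carry is needed. `hub_value`, `hub_negate` and `groups_3bit` are hypothetical helpers, and the 15-bit `WIDTH` is an assumed data word length.

```python
from fractions import Fraction

WIDTH = 15  # assumed stored word length in bits; the implicit LSB is not stored

def hub_value(word: int, width: int = WIDTH) -> Fraction:
    """Value of a HUB word in units of its stored LSB: signed(word) + 1/2."""
    signed = word - (1 << width) if (word >> (width - 1)) & 1 else word
    return Fraction(2 * signed + 1, 2)

def hub_negate(word: int, width: int = WIDTH) -> int:
    """HUB negation: bit-wise inversion (1's complement) only, no +1 carry chain."""
    return ~word & ((1 << width) - 1)

def groups_3bit(field16: int) -> list:
    """Split bits [14:0] of a 16-bit sign or magnitude field into P1..P5."""
    # P1 = [14:12], P2 = [11:9], P3 = [8:6], P4 = [5:3], P5 = [2:0]
    return [(field16 >> shift) & 0b111 for shift in (12, 9, 6, 3, 0)]

# Exhaustive check over all stored words: inversion alone is an exact HUB negation.
assert all(hub_value(hub_negate(w)) == -hub_value(w) for w in range(1 << WIDTH))
```

For a conventional two's-complement word the same inversion is off by one (it yields $-w-1$). That missing +1, and its carry chain, is what the HUB format removes from the PPG unit.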
TABLE II
VLSI IMPLEMENTATION COMPARISON FOR PROPOSED ARCHITECTURE-II
TABLE III
COMPARISON OF LATENCY OF 20-TAP FIR FILTER DESIGN
a 5-tap low-pass filter with a cut-off frequency of 15 Hz [1]. B is the low-pass filter of order 21 from [2]. C is a low-pass filter with filter length 140 from [3]. To implement the ERN [16] based FIR (AR-I$_{\text{CON}}$), the same coefficients, quantization, and number of pipeline stages as those of the proposed AR-I$_{\text{HUB}}$ have been used. It can be observed from Table I that, for the three benchmark filters, AR-I$_{\text{HUB}}$ outperforms the MCSE based architecture under the conventional RN rounding format [16] (AR-I$_{\text{CON}}$). It reduces ADP by 40%, CPD by 44%, and PDP by 25%, owing to the hardware-efficient HUB rounding. The proposed MCSE algorithm in architecture AR-I$_{\text{HUB}}$ achieves an average reduction in LO of 35%, 30%, 25%, and 40% over hybrid-CSE [5], the improved CSE method [8], TCSE [9], and the multiple adder graphs method [10], respectively. Owing to the significant reduction in LO, the area and power consumption of AR-I$_{\text{HUB}}$ are also reduced. The average reductions in CPD, ADP, and PDP for AR-I$_{\text{HUB}}$ are 61%, 36%, and 31%, respectively, over [5], [8]–[10]. Over the faithfully rounded Booth encoding multipliers [14], they are 49%, 56%, and 60%, respectively. Moreover, the FPGA implementation reveals a reduction in LUT (≈22%) and slice (≈35%) requirement for AR-I$_{\text{HUB}}$ over the counterpart [14].
It can be observed from Table II that the average reductions in CPD, ADP, and PDP of the proposed reconfigurable AR-II (20-tap, 19 × 16) are 39%, 36%, and 2%, respectively, over the configurable Booth multiplier (CBM) [15]. Over CSD-VHCSE [7], they are 48%, 12%, and 13%, respectively. The CBM [15] requires more area because of the relatively complex additional circuits needed to achieve configurability. The HUB format reduces the hardware complexity of the PPG and CSG units of AR-II compared with the counterparts [6], [7], and as a result the area requirement is reduced. Moreover, AR-I and AR-II employ a retimed binary-tree-structured adder, which ensures the lowest LD and the fewest transitions. This in turn reduces the power dissipation and CPD compared with the counterparts [5]–[15]. FPGA implementation of the 20-tap AR-II reveals a reduction in slice delay product (SDP) by 38%, 47%, and 34% over [6], [15], and [7], respectively. Moreover, Table III reveals that the proposed AR-I yields more than 38% reduction in effective latency over the existing [5]–[15] architectures.
VI. CONCLUSION
The implementation results show that the use of the HUB format, CSD-based MCSE, and retiming achieves a substantial reduction in hardware complexity and CPD compared with the state-of-the-art [5]–[15]. It thereby supports the basic requirements of real-time applications such as the SAS.
REFERENCES
[1] S. Kumar, R. Vig, and P. Kapur, "Development of earthquake event detection technique based on STA/LTA algorithm for seismic alert system," J. Geol. Soc. India, vol. 92, pp. 679–686, Dec. 2018.
[2] S. Chikhalikar, O. Khandekar, and C. Bhattacharya, "Design of real-time acquisition and filtering for MEMS-based accelerometer data in microcontroller," in Proc. IEEE Electron Devices Kolkata Conf., Kolkata, India, Nov. 2018, pp. 15–18.
[3] D. Xu and J. Chiu, "Design of a high-order FIR digital filtering and variable gain ranging seismic data acquisition system," in Proc. IEEE Southeastcon, Charlotte, NC, USA, Apr. 1993, p. 6.
[4] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York, NY, USA: Wiley, 1999.
[5] I. Sharma, A. Kumar, L. Balyan, and G. K. Singh, "A new hybrid CSE technique for multiplier-less FIR filter," in Proc. IEEE Int. Conf. Digit. Signal Process. (DSP), London, U.K., 2017, pp. 1–5.
[6] I. Hatai, I. Chakrabarti, and S. Banerjee, "An efficient constant multiplier architecture based on vertical-horizontal binary common sub-expression elimination algorithm for reconfigurable FIR filter synthesis," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 4, pp. 1071–1080, Apr. 2015.
[7] I. Hatai, I. Chakrabarti, and S. Banerjee, "A computationally efficient reconfigurable constant multiplication architecture based on CSD decoded vertical–horizontal common sub-expression elimination algorithm," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 1, pp. 130–140, Jan. 2018.
[8] A. P. Vinod, E. Lai, D. L. Maskell, and P. K. Meher, "An improved common subexpression elimination method for reducing logic operators in FIR filter implementations without increasing logic depth," Integration, vol. 43, no. 1, pp. 124–135, 2010.
[9] S. Roy and A. Chandra, "A triangular common subexpression elimination algorithm with reduced logic operators in FIR filter," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 67, no. 12, pp. 3527–3531, Dec. 2020.
[10] J.-H. Han and I.-C. Park, "FIR filter synthesis considering multiple adder graphs for a coefficient," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 5, pp. 958–962, May 2008.
[11] J. Hormigo and J. Villalba, "Optimizing DSP circuits by a new family of arithmetic operators," in Proc. Asilomar Conf. Signals Syst. Comput., Pacific Grove, CA, USA, Nov. 2014, pp. 871–875.
[12] J. Hormigo and J. Villalba, "New formats for computing with real-numbers under round-to-nearest," IEEE Trans. Comput., vol. 65, no. 7, pp. 2158–2168, Jul. 2016.
[13] S. Bose, A. De, and I. Chakrabarti, "Framework for automated earthquake event detection based on denoising by adaptive filter," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 9, pp. 3070–3083, Sep. 2020.
[14] S.-F. Hsiao, J.-H. Z. Jian, and M.-C. Chen, "Low-cost FIR filter designs based on faithfully rounded truncated multiple constant multiplication/accumulation," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 5, pp. 287–291, May 2013.
[15] S.-R. Kuang and J.-P. Wang, "Design of power-efficient configurable booth multiplier," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 568–580, Mar. 2010.
[16] P. Kornerup, J.-M. Muller, and A. Panhaleux, "Performing arithmetic operations on round-to-nearest representations," IEEE Trans. Comput., vol. 60, no. 2, pp. 282–291, Feb. 2011.
[17] R. Mahesh and A. P. Vinod, "A new common subexpression elimination algorithm for realizing low-complexity higher order digital filters," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 2, pp. 217–229, Feb. 2008.