
Root Power2019

This paper proposes a new low complexity architecture for computing Nth roots and powers. It uses a binary logarithm-binary inverse logarithm relation approach, rather than the typical CORDIC-based natural logarithm-exponential relation approach. The proposed architecture was modeled in VHDL, synthesized using a 45nm process, and shown to reduce area, power and latency compared to state-of-the-art CORDIC-based approaches. It also avoids limitations of CORDIC related to input range and complexity scaling with the negative index boundary.

Uploaded by

Ankur Patel

This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–I: REGULAR PAPERS 1

Low-Complexity Generic VLSI Architecture Design Methodology for Nth Root and Nth Power Computations

Suresh Mopuri and Amit Acharyya, Member, IEEE

Abstract: In this paper, we propose a low complexity architecture design methodology for fixed point root and power computations. The state of the art approaches perform the root and power computations based on the natural logarithm-exponential relation using Hyperbolic COordinate Rotation DIgital Computer (CORDIC). In this paper, any root and power computations have been performed using the binary logarithm-binary inverse logarithm relation. The designs are modeled using VHDL for fixed point numbers and synthesized under the TSMC 40-nm CMOS technology @ 1 GHz frequency. The synthesis results show that the proposed Nth root computation saves 19.38% on chip area and 15.86% power consumption when compared with the state of the art architecture for root computation without compromising the computational accuracy. Similarly, the proposed Nth power computation saves 38% on chip area and 35.67% power consumption when compared with the state of the art power computation without loss in accuracy. The proposed root and power computation designs save 8 clock cycles of latency when compared with the state of the art implementations.

Index Terms: CORDIC, logarithm, exponential, VLSI architecture, root computation, power computation, hyperbolic CORDIC.

I. INTRODUCTION

ROOT and power computations have been used in different areas such as atmospheric models, digital image synthesis, 3-D graphics and many VLSI signal processing applications [1]–[3]. However, the design and implementation of a low complexity as well as highly accurate VLSI architecture for such Nth root and Nth power computation is a challenging task for real time resource constrained platforms.

There are various approaches available for root computation. The well known method is the Newton-Raphson (NR) method, which requires an initial guess that may result in different precision in the outputs [4]–[6]. The hardware complexity of the NR method increases with increasing value of N. A general digit-recurrence algorithm is presented in [7] whose hardware complexity, like the NR approach, also depends on N. In [8], a top-level approach has been presented based on the binary logarithm-binary inverse logarithm relation, i.e., $R^{1/N} = 2^{\log_2(R)/N}$. But this approach [8] did not present the implementation details of the binary logarithm, division and binary inverse logarithm. Another approach was presented in [9] based on the natural logarithm-exponential relation, i.e., $R^{1/N} = \exp(\ln(R)/N)$, where the natural logarithm, division and exponential computations are performed using CORDIC. On the other hand, the powers are computed using multipliers [10]–[13], in which the square and cube operations were computed using reduced partial product arrays and ancient Indian Vedic mathematics. However, these approaches [10]–[13] are not generic for the Nth power computation. Such a generic approach for Nth power computation is proposed in [14] based on the natural logarithm-exponential relation, i.e., $R^{N} = \exp(\ln(R) \times N)$, where the natural logarithm and exponential computations are performed using CORDIC.

It is well known that the CORDIC performs several tasks such as trigonometric, hyperbolic and logarithmic functions, real and complex multiplications, division and square-root using shift add operations [15]–[21]. However, the convergence and precision of the CORDIC depend on its negative index (m) and positive index (n) boundaries respectively [15], [21] (elaborated in section II, please see equations (11), (12) and (13)). The CORDIC convergence boundary (m) poses the following limitations on the state of the art Nth root and Nth power computations [9], [14].
• The CORDIC negative index boundary (m) limits the input range of R and N. As the m value increases, the input ranges of R and N will increase.
• As the m value increases, the number of CORDIC iterative stages will increase; in turn the hardware complexity, area, power consumption and latency will increase.

Addressing the aforementioned limitations, in this paper,
• We propose a low complexity architecture design methodology for the Nth root and Nth power computation based on the binary logarithm-exponential relation using CORDIC.
• We propose the Binary Hyperbolic CORDIC algorithm to perform the binary logarithm and inverse binary logarithm computations.

Manuscript received March 2, 2019; revised July 13, 2019; accepted August 29, 2019. This work was supported in part by the Science and Engineering Research Board (SERB), Government of India, for the project entitled “Intelligent IoT enabled Autonomous Structural Health Monitoring System for Ships, Aeroplanes, Trains and Automobiles” through the Impacting Research Innovation and Technology (IMPRINT) Program under Grant IMP/2018/000375. This article was recommended by Associate Editor A. Cilardo. (Corresponding author: Amit Acharyya.)
The authors are with the Department of Electrical Engineering, Indian Institute of Technology Hyderabad (IIT Hyderabad), Hyderabad 502285, India (e-mail: [email protected]; amit_ [email protected]).
Digital Object Identifier 10.1109/TCSI.2019.2939720
1549-8328 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

The highlights of this paper are given below.
• The proposed approach, unlike the state of the art approach, will not depend on the CORDIC negative index boundary (m) for logarithm and exponential computations and thereby reduces the hardware complexity, power consumption and latency.
• The proposed architecture and the state of the art architecture are coded in VHDL for fixed point numbers and the ASIC implementation has been done at TSMC 45nm CMOS technology @ $V_{DD}$ = 1.08 V and clock frequency @ 1 GHz with the help of Synopsys Design Compiler (DC) and IC Compiler. The synthesis results show that the proposed Nth root design saves 19.38% on chip area and 15.86% power consumption when compared with the state of the art architecture [9] without compromising the computational accuracy. Similarly, the proposed Nth power computation design saves 38% on chip area and 35.67% power consumption when compared with the state of the art power computation [14].
• The proposed approach is one order superior in accuracy for the Nth root computation and two orders superior for the Nth power computation.
• The proposed approach will fix the shortcomings of [8], including the implementation of binary logarithm and inverse binary logarithm computations.

The rest of this paper is organized as follows: Section II provides the necessary theoretical background. Section III introduces the proposed methodology. Section IV details the experimental results and section V concludes the discussion.

II. THEORETICAL BACKGROUND

The state of the art approaches perform the Nth root and Nth power computations based on the natural logarithm-exponential relation [9], [14]

$$R^{1/N} = \exp\left(\frac{\ln(R)}{N}\right) \tag{1a}$$
$$R^{N} = \exp(\ln(R) \times N) \tag{1b}$$

The computations shown in (1) have been divided into three steps. The first step is the computation of the natural logarithm, i.e., ln(.), in both approaches [9], [14]. The next step is a division operation for root computation and a multiplication operation for power computation. The final step is the exponential computation, i.e., exp(.). The natural logarithm and exponential computations are performed in [9], [14] using Hyperbolic CORDIC. The division operation is performed using linear CORDIC [9]. The basic working principle of Hyperbolic CORDIC can be expressed as:

$$\begin{bmatrix} x_f \\ y_f \end{bmatrix} = R_H \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}; \quad R_H = \begin{bmatrix} \cosh(z) & \sinh(z) \\ \sinh(z) & \cosh(z) \end{bmatrix} \tag{2}$$

where $[x_0, y_0]$ and $[x_f, y_f]$ are the initial and final position vectors, $R_H$ is the hyperbolic rotation matrix and z is the angle of rotation [15]. By factoring out the cosh(z) term, the above equation can be rewritten as follows

$$\begin{bmatrix} x_f \\ y_f \end{bmatrix} = \cosh(z) \begin{bmatrix} 1 & \tanh(z) \\ \tanh(z) & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \tag{3}$$

TABLE I. CONVERGENCE RANGE OF CORDIC

The CORDIC performs the rotation iteratively through an angle $\alpha_i$ instead of performing the rotation directly through the angle z, where $\alpha_i = \tanh^{-1}(2^{-i})$, so that z can be decomposed as follows

$$z = \sum_{i=1}^{n} \sigma_i \alpha_i; \quad \sigma_i = \pm 1 \tag{4}$$

where $\sigma_i$ is the decomposition factor. The iteration formula for conventional Hyperbolic CORDIC can be expressed as follows

$$x_{i+1} = x_i + \sigma_i (2^{-i} y_i) \tag{5a}$$
$$y_{i+1} = y_i + \sigma_i (2^{-i} x_i) \tag{5b}$$
$$z_{i+1} = z_i - \sigma_i \tanh^{-1}(2^{-i}) \tag{5c}$$

where i is an integer starting with 1. The $\sigma_i$ is determined by the mode of operation [15]. Based on the mode of operation, the CORDIC has been divided into two classes: rotation mode CORDIC and vectoring mode CORDIC. The $\sigma_i$ for Hyperbolic Rotation (HR) mode CORDIC and Hyperbolic Vectoring (HV) mode CORDIC is given by (6a) and (6b) respectively

$$\sigma_i = \mathrm{sign}(z_i) \tag{6a}$$
$$\sigma_i = -\mathrm{sign}(y_i) \tag{6b}$$

The scale factor for the Hyperbolic CORDIC $K_H$ is expressed as:

$$K_H = \prod_{i=1}^{n} \cosh(\sigma_i \alpha_i) = \prod_{i=1}^{n} \cosh(\alpha_i) = \prod_{i=1}^{n} \frac{1}{\sqrt{1 - 2^{-2i}}} \tag{7}$$

From the CORDIC convergence theorem in [21], to guarantee the convergence, the iterations i = 4, 13, 40, ···, k, (3k + 1) must be repeated. The maximum z value of conventional Hyperbolic CORDIC is expressed as

$$|z|_{max} = \sum_{i=1}^{n} \alpha_i = \sum_{i=1}^{n} \tanh^{-1}(2^{-i}) \tag{8}$$

The iterative formula for Linear Vectoring (LV) mode CORDIC [15] is expressed as follows

$$x_{i+1} = x_i \tag{9a}$$
$$y_{i+1} = y_i + \sigma_i (2^{-i} x_i) \tag{9b}$$
$$z_{i+1} = z_i - \sigma_i (2^{-i}) \tag{9c}$$

where i is an integer starting with 0, and $\sigma_i = -\mathrm{sign}(y_i)$. Table I summarizes the convergence range of the HR, HV and LV CORDIC as n → ∞. The convergence limits of CORDIC shown in Table I are not enough for the implementation of logarithm, exponential and division computations in practical applications [9], [15], [21]. Hence, the convergence range for
MOPURI AND ACHARYYA: LOWCOMPLEXITY GENERIC VLSI ARCHITECTURE DESIGN METHODOLOGY 3

the Hyperbolic CORDIC was improved in [21] by considering the negative index numbers as i = −m, −m+1, ···, 1, ···, n. However, the iteration formula cannot be the same as (5). As mentioned in [21], when i ≤ 0, the term $2^{-i}$ in (5) will be replaced with the term $(1 - 2^{-2^{-i+1}})$. The iterative formula of Hyperbolic CORDIC (5) for i ≤ 0 can be rewritten as follows

$$x_{i+1} = x_i + \sigma_i (1 - 2^{-2^{-i+1}}) y_i \tag{10a}$$
$$y_{i+1} = y_i + \sigma_i (1 - 2^{-2^{-i+1}}) x_i \tag{10b}$$
$$z_{i+1} = z_i - \sigma_i \tanh^{-1}(1 - 2^{-2^{-i+1}}) \tag{10c}$$

where $\sigma_i$ is the same as shown in (6). The iterative formula of LV CORDIC for i ≤ 0 is the same as shown in (9) [21]. If n is large, the convergence of LV CORDIC depends on m, which is expressed as follows

$$z_n \leftarrow z_0 + \frac{y_0}{x_0} = \mathrm{VecL}_z(m, n, x_0, y_0, z_0) \le 2^{m+1} \tag{11}$$

where $\mathrm{VecL}_z(.)$ represents the z component output of LV CORDIC. If n is large, the x, y, z of the Hyperbolic CORDIC tend to the results shown in (12) and (13) for the rotation and vectoring modes respectively.

$$x_n \leftarrow \frac{1}{K_H}(\cosh(z_0) x_0 + \sinh(z_0) y_0) = \mathrm{RotH}_x(m, n, x_0, y_0, z_0) \tag{12a}$$
$$y_n \leftarrow \frac{1}{K_H}(\sinh(z_0) x_0 + \cosh(z_0) y_0) = \mathrm{RotH}_y(m, n, x_0, y_0, z_0) \tag{12b}$$
$$z_n \leftarrow 0 = \mathrm{RotH}_z(m, n, x_0, y_0, z_0) \tag{12c}$$
$$x_n \leftarrow \frac{1}{K_H}\sqrt{x_0^2 - y_0^2} = \mathrm{VecH}_x(m, n, x_0, y_0, z_0) \tag{13a}$$
$$y_n \leftarrow 0 = \mathrm{VecH}_y(m, n, x_0, y_0, z_0) \tag{13b}$$
$$z_n \leftarrow z_0 + \tanh^{-1}\left(\frac{y_0}{x_0}\right) = \mathrm{VecH}_z(m, n, x_0, y_0, z_0) \tag{13c}$$

where $\mathrm{RotH}_c(.)$ and $\mathrm{VecH}_c(.)$ represent the rotation mode and vectoring mode of the Hyperbolic CORDIC respectively and c represents the output of the x or y or z component. For different values of m, the improvement in the convergence range of CORDIC is summarized in Table II. It can be noted from Table II that the convergence range of CORDIC increases as m increases.

TABLE II. IMPROVEMENT IN CONVERGENCE RANGE OF CORDIC

Now the steps involved in (1) can be computed using HV, HR and LV CORDIC as follows [9], [14]:

Step1: The logarithm can be computed using HV CORDIC. Consider inputs to the HV CORDIC as $x_0 = R + 1$, $y_0 = R - 1$ and $z_0 = 0$. The ln(R) can be computed as:

$$\ln(R) = 2 \times \mathrm{VecH}_z(m, n, R + 1, R - 1, 0) = 2 \times \tanh^{-1}\left(\frac{R - 1}{R + 1}\right) \tag{14}$$

The ln(R) value can be obtained by shifting the HV CORDIC output shown in (14) by 1 bit to the left.

Step2: The division operation can be performed using LV CORDIC by considering inputs to the LV CORDIC as $x_0 = N$, $y_0 = \ln(R)$ and $z_0 = 0$

$$\frac{\ln(R)}{N} = \mathrm{VecL}_z(m, n, N, \ln(R), 0) \tag{15}$$

Step3: The exponential computation can be performed using the HR CORDIC by considering inputs as $x_0 = K_H$, $y_0 = 0$ and $z_0 = \frac{\ln(R)}{N}$. The outputs are $x_n = \mathrm{RotH}_x(m, n, K_H, 0, \frac{\ln(R)}{N}) = \cosh(\frac{\ln(R)}{N})$ and $y_n = \mathrm{RotH}_y(m, n, K_H, 0, \frac{\ln(R)}{N}) = \sinh(\frac{\ln(R)}{N})$. The exponential is computed by adding the HR CORDIC outputs as follows

$$\exp\left(\frac{\ln(R)}{N}\right) = \mathrm{RotH}_x\left(m, n, K_H, 0, \frac{\ln(R)}{N}\right) + \mathrm{RotH}_y\left(m, n, K_H, 0, \frac{\ln(R)}{N}\right) \tag{16}$$

In $R^N$ computation, the input to the HR CORDIC is $z_0 = \ln(R) \times N$. The above equation can be rewritten as

$$\exp(\ln(R) \times N) = \mathrm{RotH}_x(m, n, K_H, 0, \ln(R) \times N) + \mathrm{RotH}_y(m, n, K_H, 0, \ln(R) \times N) \tag{17}$$

From the above three steps, the $R^{1/N}$ computation will depend on m. The convergence of R and N is summarized in Table III for different values of m.

TABLE III. CONVERGENCE RANGE OF R AND N

From Table III, (14), (15), (16) and (17), it can be inferred that the steps involved in $R^{1/N}$ and $R^N$ computations will have the following limitations.
• The convergence of R and N values depends on the negative index boundary (m) of CORDIC.
• The input range of R and N increases as m increases.
• As m increases, the number of iterative stages in CORDIC also increases, resulting in an increase in the hardware complexity, latency and power consumption.

Addressing all the limitations, in the next section we introduce a low complexity architecture design methodology for Nth root and Nth power computations.

III. PROPOSED METHODOLOGY AND ARCHITECTURE

In this section, a low complexity architecture design methodology for Nth root and Nth power computation is proposed.
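As a reference point for the conventional flow of Section II, Step 1 (the natural logarithm of (14), computed with the HV iteration (5) and the decision rule (6b)) can be sketched in floating-point Python. This is only a minimal sketch under stated assumptions, not the fixed-point VHDL design of [9]: the names `hyperbolic_indices`, `hv_cordic_z` and `ln_cordic` and the default `n = 24` are hypothetical, only positive indices are modelled (the negative index boundary m is ignored), and `math.atanh` stands in for a stored angle table.

```python
import math

def hyperbolic_indices(n):
    """Indices 1..n with i = 4, 13, 40, ... (k -> 3k + 1) repeated once."""
    seq, rep = [], 4
    for i in range(1, n + 1):
        seq.append(i)
        if i == rep:          # repeated iteration needed for convergence
            seq.append(i)
            rep = 3 * rep + 1
    return seq

def hv_cordic_z(x, y, z=0.0, n=24):
    """Hyperbolic vectoring mode: drives y toward 0 and returns z,
    which tends to z0 + atanh(y0 / x0) as in (13c)."""
    for i in hyperbolic_indices(n):
        sigma = -1.0 if y >= 0 else 1.0            # sigma_i = -sign(y_i), (6b)
        t = 2.0 ** -i
        x, y = x + sigma * t * y, y + sigma * t * x  # (5a), (5b)
        z -= sigma * math.atanh(t)                 # (5c)
    return z

def ln_cordic(R, n=24):
    """ln(R) = 2 * atanh((R - 1) / (R + 1)), eq. (14); positive indices only."""
    return 2.0 * hv_cordic_z(R + 1.0, R - 1.0, 0.0, n)

print(ln_cordic(3.0), math.log(3.0))
```

Because no negative indices are used, the sketch only converges while $\tanh^{-1}((R-1)/(R+1))$ stays below the sum of the stored angles, which is the input-range limitation discussed in Section II.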

A. Proposed Methodology

The $R^{1/N}$ and $R^N$ computations can be performed using the binary logarithm-binary inverse logarithm relation.

$$R^{1/N} = 2^{\left(\frac{\log_2(R)}{N}\right)} \tag{18a}$$
$$R^{N} = 2^{(\log_2(R) \times N)} \tag{18b}$$

The binary logarithm $\log_2(.)$ and binary exponential $2^{(.)}$ cannot be performed by Hyperbolic CORDIC as it is, due to the rotation matrix $R_H$. Hence, we will investigate the properties of $R_H$ and introduce a new $R_H$ to compute the binary logarithm $\log_2(.)$ and binary exponential $2^{(.)}$. From (2), it can be observed that the determinant of $R_H$ is $\cosh^2(z) - \sinh^2(z) = 1$. In $R_H$, cosh(z) is an even function and sinh(z) is an odd function. The natural exponential can be computed using cosh(z) and sinh(z) as follows

$$e^{z} = \cosh(z) + \sinh(z) \tag{19a}$$
$$e^{-z} = \cosh(z) - \sinh(z) \tag{19b}$$

Now, we will define two new functions $\cosh_b(z)$ and $\sinh_b(z)$, which are even and odd functions respectively, for the inverse binary logarithm computation. The binary inverse logarithm computation can be expressed as

$$2^{z} = \cosh_b(z) + \sinh_b(z) \tag{20a}$$
$$2^{-z} = \cosh_b(z) - \sinh_b(z) \tag{20b}$$

By solving (20), $\cosh_b(z)$ and $\sinh_b(z)$ can be obtained as follows

$$\cosh_b(z) = \frac{2^{z} + 2^{-z}}{2} \tag{21a}$$
$$\sinh_b(z) = \frac{2^{z} - 2^{-z}}{2} \tag{21b}$$

From (21), it can be noted that the defined $\cosh_b(z)$ is an even function, $\sinh_b(z)$ is an odd function and $\cosh_b^2(z) - \sinh_b^2(z) = 1$. Hence, cosh(z) and sinh(z) can be replaced with $\cosh_b(z)$ and $\sinh_b(z)$ in $R_H$ for binary logarithm and inverse binary logarithm computation and (2) can be rewritten as

$$\begin{bmatrix} x_f \\ y_f \end{bmatrix} = \begin{bmatrix} \cosh_b(z) & \sinh_b(z) \\ \sinh_b(z) & \cosh_b(z) \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \tag{22}$$

The Hyperbolic CORDIC in (22) is intended for computation of the binary logarithm and inverse binary logarithm. Therefore, it is named Binary Hyperbolic CORDIC. Taking the $\cosh_b(z)$ term out from (22) and writing $\frac{\sinh_b(z)}{\cosh_b(z)}$ as $\tanh_b(z)$, (22) can be rewritten as

$$\begin{bmatrix} x_f \\ y_f \end{bmatrix} = \cosh_b(z) \begin{bmatrix} 1 & \tanh_b(z) \\ \tanh_b(z) & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \tag{23}$$

From (21), $\tanh_b(z) = \frac{2^{z} - 2^{-z}}{2^{z} + 2^{-z}}$. As mentioned in section II, the CORDIC performs the rotation iteratively with predefined angle $\alpha_i$ instead of rotating directly through z, where $\alpha_i = \tanh_b^{-1}(2^{-i})$. Assume that $\tanh_b(z) = t$; then t can be derived as follows

$$t = \frac{2^{z} - 2^{-z}}{2^{z} + 2^{-z}} \tag{24}$$

The above equation can be rewritten as

$$t = \frac{2^{2z} - 1}{2^{2z} + 1} \tag{25}$$

From the above equation, $z = \tanh_b^{-1}(t)$ can be derived as follows

$$\tanh_b^{-1}(t) = \frac{1}{2}\log_2\left(\frac{1 + t}{1 - t}\right); \quad |t| < 1 \tag{26}$$

The rotation angle z can be decomposed in terms of the predefined angles $\alpha_i$ as

$$z = \sum_{i=1}^{n} \sigma_i \alpha_i = \sum_{i=1}^{n} \sigma_i \tanh_b^{-1}(2^{-i}); \quad \sigma_i = \pm 1 \tag{27}$$

The decomposition factor ($\sigma_i$) is determined by the mode of operation. Using (23), the rotation formula for the i-th iteration with corresponding angle $\alpha_i$ is

$$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \cosh_b(\sigma_i \alpha_i) \begin{bmatrix} 1 & \tanh_b(\sigma_i \alpha_i) \\ \tanh_b(\sigma_i \alpha_i) & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} \tag{28}$$

From (21), $\sinh_b(.)$ is an odd function and $\cosh_b(.)$ is an even function, so that $\tanh_b(.)$ is an odd function. Since $\sigma_i = \pm 1$, (28) can be rewritten as

$$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \cosh_b(\alpha_i) \begin{bmatrix} 1 & \sigma_i \tanh_b(\alpha_i) \\ \sigma_i \tanh_b(\alpha_i) & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} \tag{29}$$

where $\tanh_b(\sigma_i \alpha_i) = \sigma_i \tanh_b(\alpha_i)$. Using (27) and (29), the iterative formula of CORDIC is expressed as follows

$$x_{i+1} = x_i + \sigma_i (2^{-i} y_i) \tag{30a}$$
$$y_{i+1} = y_i + \sigma_i (2^{-i} x_i) \tag{30b}$$
$$z_{i+1} = z_i - \sigma_i \tanh_b^{-1}(2^{-i}) \tag{30c}$$

where i is an integer starting with 1, and $\sigma_i = \mathrm{sign}(z_i)$ and $\sigma_i = -\mathrm{sign}(y_i)$ for rotation and vectoring mode respectively. The scale factor $\cosh_b(\alpha_i)$ is independent of the decomposition factor $\sigma_i$, so that instead of scaling during each iteration, the magnitude of the final output can be scaled by the final scale factor $K_b$. The $K_b$ is computed using the following equation

$$K_b = \prod_{i=1}^{n} \cosh_b(\alpha_i) \tag{31}$$

By substituting $\cosh_b(z) = \frac{1}{\sqrt{1 - \tanh_b^2(z)}}$ and $\tanh_b(\alpha_i) = 2^{-i}$ in (31), the above equation can be rewritten as

$$K_b = \prod_{i=1}^{n} \frac{1}{\sqrt{1 - \tanh_b^2(\alpha_i)}} = \prod_{i=1}^{n} \frac{1}{\sqrt{1 - 2^{-2i}}} \tag{32}$$

From (7) and (32), it can be noted that the scale factors $K_H$ and $K_b$ are the same. The maximum z can be computed as

$$|z|_{max} = \sum_{i=1}^{n} \alpha_i = \sum_{i=1}^{n} \tanh_b^{-1}(2^{-i}) \tag{33}$$

As n → ∞, the convergence of Binary Hyperbolic rotation mode (BR) CORDIC and Binary Hyperbolic vectoring mode (BV) CORDIC have been tabulated in Table IV.
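The definitions (21), (26) and (33) can be checked numerically with a short Python sketch. It is an illustration only: the function names and the choice n = 40 are assumptions, and the repeated iterations i = 4, 13, 40 from Section II are assumed to be included in the sum of (33).

```python
import math

def cosh_b(z):
    return (2.0 ** z + 2.0 ** -z) / 2.0          # (21a)

def sinh_b(z):
    return (2.0 ** z - 2.0 ** -z) / 2.0          # (21b)

def tanh_b(z):
    return sinh_b(z) / cosh_b(z)                 # (24)

def atanh_b(t):
    return 0.5 * math.log2((1.0 + t) / (1.0 - t))  # (26), valid for |t| < 1

def indices(n):
    """Indices 1..n with 4, 13, 40, ... repeated (assumed to apply here too)."""
    seq, rep = [], 4
    for i in range(1, n + 1):
        seq.append(i)
        if i == rep:
            seq.append(i)
            rep = 3 * rep + 1
    return seq

def z_max(n=40):
    """Sum of the predefined angles alpha_i = atanh_b(2**-i), eq. (33)."""
    return sum(atanh_b(2.0 ** -i) for i in indices(n))

z = 0.75
print(cosh_b(z) ** 2 - sinh_b(z) ** 2)   # determinant property, close to 1
print(cosh_b(z) + sinh_b(z), 2.0 ** z)   # (20a)
print(z_max(), tanh_b(z_max()))          # convergence limit and its tanh_b
```

Since $\tanh_b^{-1}(t) = \tanh^{-1}(t)/\ln 2$, every stored angle of the binary variant is the conventional angle scaled by $1/\ln 2$, which is why the iteration structure and scale factor are unchanged while the convergence limit grows.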

TABLE IV. CONVERGENCE OF BINARY HYPERBOLIC CORDIC

TABLE V. CONVERGENCE OF IMPROVED BINARY HYPERBOLIC CORDIC

From Table IV, it can be noted that the scale factor of the Binary Hyperbolic CORDIC is the same as that of the conventional Hyperbolic CORDIC. The convergence range of the Binary Hyperbolic CORDIC is improved when compared with the conventional Hyperbolic CORDIC shown in Table II. The convergence limit shown in Table IV is not adequate for the implementation of logarithm and exponential computations. The convergence range of Binary Hyperbolic CORDIC can be improved by considering negative indices as i = −m, −m+1, ···, 1, ···, n and replacing the term $2^{-i}$ with the term $(1 - 2^{-2^{-i+1}})$ as mentioned in section II. Now the convergence range and scale factor depend on the negative index boundary (m). For different m values, the convergence range and scale factor of the proposed improved Binary Hyperbolic CORDIC have been summarized in Table V. It can be noted from Table V that the convergence range of the Binary CORDIC increases as m increases. If n is considered large, the x, y, z of the Binary Hyperbolic CORDIC tend to the results shown in (34) and (35) for the rotation and vectoring modes respectively.

$$x_n \leftarrow \frac{1}{K_b}(\cosh_b(z_0) x_0 + \sinh_b(z_0) y_0) = \mathrm{RotH}_x^b(m, n, x_0, y_0, z_0) \tag{34a}$$
$$y_n \leftarrow \frac{1}{K_b}(\sinh_b(z_0) x_0 + \cosh_b(z_0) y_0) = \mathrm{RotH}_y^b(m, n, x_0, y_0, z_0) \tag{34b}$$
$$z_n \leftarrow 0 = \mathrm{RotH}_z^b(m, n, x_0, y_0, z_0) \tag{34c}$$
$$x_n \leftarrow \frac{1}{K_b}\sqrt{x_0^2 - y_0^2} = \mathrm{VecH}_x^b(m, n, x_0, y_0, z_0) \tag{35a}$$
$$y_n \leftarrow 0 = \mathrm{VecH}_y^b(m, n, x_0, y_0, z_0) \tag{35b}$$
$$z_n \leftarrow z_0 + \tanh_b^{-1}\left(\frac{y_0}{x_0}\right) = \mathrm{VecH}_z^b(m, n, x_0, y_0, z_0) \tag{35c}$$

where $\mathrm{RotH}_c^b(.)$ and $\mathrm{VecH}_c^b(.)$ represent the rotation mode and vectoring mode of Binary Hyperbolic CORDIC respectively. The Hyperbolic CORDIC can be extended to other logarithm and inverse logarithm computations by storing appropriate predefined angles.

Our main goal now is to perform the computations shown in (18), which can be performed using the Binary Hyperbolic CORDIC shown in (34) and (35). The computations shown in (18) are now divided into the following three steps.

Step1: The first step is the binary logarithm $\log_2(R)$ computation. Consider inputs to the Binary Hyperbolic Vectoring mode (BV) CORDIC as $x_0 = R + 1$, $y_0 = R - 1$ and $z_0 = 0$. The output $z_n$ is expressed using (35c) and (26) as follows

$$\log_2(R) = 2 \times \mathrm{VecH}_z^b(m, n, R + 1, R - 1, 0) = 2 \times \tanh_b^{-1}\left(\frac{R - 1}{R + 1}\right) \tag{36}$$

From Table IV, the convergence limit of the BV CORDIC is $\tanh_b^{-1}(\frac{y_0}{x_0}) \le 1.6132$. From (36), $\frac{R - 1}{R + 1} \le \tanh_b(1.6132) = 0.8069$ and $R \in [\frac{1}{9.36}, 9.36]$.

The range $R \in [\frac{1}{9.36}, 9.36]$ is limited and may not be adequate for many practical applications where $R \notin [\frac{1}{9.36}, 9.36]$. Therefore, the range of R can be increased by considering negative indices like the state of the art design [9] and as shown in Table V. But this technique increases the hardware complexity, latency and power consumption. Hence, we will introduce a normalization procedure which enhances the convergence limit without increasing the hardware complexity and latency.

Consider the working range of R as $r \in [1, 2]$. If $R \notin [1, 2]$, R can be scaled down to the working range by performing a simple shifting operation. For example, if $2^{k} < R \le 2^{k+1}$ where k is an integer, R is scaled down to the working range by right shifting R by k bits. The R value can be expressed as:

$$R = 2^{k} \ast r; \quad r \in [1, 2] \tag{37}$$

Now $\log_2(R)$ can be computed as follows

$$\log_2(R) = \log_2(2^{k} \ast r) = k + \log_2(r) \tag{38}$$

Now $\log_2(r)$ can be computed using the BV CORDIC by considering inputs $x_0 = (r + 1)$, $y_0 = (r - 1)$, $z_0 = 0$ and m = × (don’t care condition).

$$\log_2(r) = 2 \times \mathrm{VecH}_z^b(\times, n, r + 1, r - 1, 0) \tag{39}$$

where m = × denotes that the computation in (39) will not depend on the negative index boundary (m). The computations shown in (37) and (38) can be performed with shift and add operations. From (39), it can be noted that the logarithm computation is independent of m, which reduces the hardware complexity and latency. As an example, consider R = 49; then $2^5 < R \le 2^6$ and k = 5. After shifting R by k = 5 bits to the right, r becomes 1.53125. The $\log_2(r)$ is computed using BV CORDIC as shown in (39), giving $\log_2(r) = 0.6147$. Thereafter $\log_2(r)$ is brought to its original value by adding k = 5, i.e., $\log_2(R) = 5.6147$ as shown in (38).

Step2: The next step in root computation is the division operation using LV CORDIC. The inputs to the LV CORDIC are $x_0 = N$, $y_0 = \log_2(R)$ and $z_0 = 0$. From (11), the output converges to the following result.

$$\frac{\log_2(R)}{N} = \mathrm{VecL}_z(m, n, N, \log_2(R), 0) \tag{40}$$

The second step in the $R^N$ computation is multiplication, i.e., $\log_2(R) \times N$, which can be performed using any conventional multiplier.
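Step 1 above (pre-log normalization (37), (38) followed by the BV CORDIC of (39)) can be sketched in floating-point Python as follows. This is a sketch under assumptions rather than the authors' fixed-point design: the function names and `n = 24` are hypothetical, `math.frexp` stands in for the hardware shifter (it yields r in [1, 2), so an exact power of two maps to r = 1 rather than r = 2), and `math.log2` stands in for the stored angle table.

```python
import math

def atanh_b(t):
    return 0.5 * math.log2((1.0 + t) / (1.0 - t))   # (26)

def indices(n):
    """Indices 1..n with 4, 13, 40, ... repeated once each."""
    seq, rep = [], 4
    for i in range(1, n + 1):
        seq.append(i)
        if i == rep:
            seq.append(i)
            rep = 3 * rep + 1
    return seq

def prelog_normalize(R):
    """Split R > 0 into (k, r) with R = 2**k * r and 1 <= r < 2, as in (37)."""
    mant, exp = math.frexp(R)        # R = mant * 2**exp, 0.5 <= mant < 1
    return exp - 1, 2.0 * mant

def bv_cordic_z(x, y, z=0.0, n=24):
    """Binary hyperbolic vectoring mode, (30) with sigma_i = -sign(y_i)."""
    for i in indices(n):
        sigma = -1.0 if y >= 0 else 1.0
        t = 2.0 ** -i
        x, y = x + sigma * t * y, y + sigma * t * x
        z -= sigma * atanh_b(t)
    return z

def log2_cordic(R, n=24):
    k, r = prelog_normalize(R)
    return k + 2.0 * bv_cordic_z(r + 1.0, r - 1.0, 0.0, n)   # (38), (39)

print(prelog_normalize(49.0))   # (5, 1.53125)
print(log2_cordic(49.0))        # close to 5.6147
```

After normalization $(r-1)/(r+1)$ is at most 1/3, so $\tanh_b^{-1}$ of the input ratio stays near 0.5 at worst, well inside the 1.6132 limit, which is why no negative indices are needed.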

Step3: The final step is the binary inverse logarithm computation. The $2^{\frac{\log_2(R)}{N}}$ can be computed using Binary Rotation mode CORDIC (BR CORDIC). Consider the inputs to BR CORDIC as $x_0 = K_b$, $y_0 = 0$ and $z_0 = \frac{\log_2(R)}{N}$, so that the outputs of BR CORDIC can be expressed using (34)

$$x_n = \mathrm{RotH}_x^b\left(m, n, K_b, 0, \frac{\log_2(R)}{N}\right) = \cosh_b\left(\frac{\log_2(R)}{N}\right) \tag{41a}$$
$$y_n = \mathrm{RotH}_y^b\left(m, n, K_b, 0, \frac{\log_2(R)}{N}\right) = \sinh_b\left(\frac{\log_2(R)}{N}\right) \tag{41b}$$

The binary inverse logarithm can be computed by adding the BR CORDIC outputs, which is expressed as follows

$$2^{\left(\frac{\log_2(R)}{N}\right)} = \cosh_b\left(\frac{\log_2(R)}{N}\right) + \sinh_b\left(\frac{\log_2(R)}{N}\right) = R^{1/N} \tag{42}$$

In $R^N$ computation, $z_0$ will be $\log_2(R) \times N$. From Table IV, it is observed that the BR CORDIC has a convergence limit which limits the input range for exponential computation, i.e., $z_0 = \log_2(R) \times N \le 1.6132$ or $z_0 = \frac{\log_2(R)}{N} \le 1.6132$. However, this limit $z_0 \le 1.6132$ is not adequate for practical applications. Therefore, it can be enhanced by considering negative indices for BR CORDIC as shown in Table V, but this technique increases the hardware complexity and latency. Hence, we introduce a normalization procedure to enhance the convergence limit.

Consider a real number P ≤ 1. If P > 1, it can be split into two parts as shown in the following equation

$$P = P_I + P_F \tag{43}$$

where $P_I$ and $P_F$ represent the integer and fractional parts of P respectively. Now $2^P$ can be computed using the following equation

$$2^{P} = 2^{P_I + P_F} = 2^{P_I} \ast 2^{P_F} \tag{44}$$

Now $P_F$ is less than 1. The $2^{P_F}$ can be computed using basic BR CORDIC considering inputs as $x_0 = K_b$, $y_0 = 0$ and $z_0 = P_F \le 1.6132$. The $2^{P_F}$ can be computed as

$$2^{P_F} = \mathrm{RotH}_x^b(\times, n, K_b, 0, P_F) + \mathrm{RotH}_y^b(\times, n, K_b, 0, P_F) \tag{45}$$

where m = × denotes that the computation shown in (45) is independent of m. After the $2^{P_F}$ computation, the result is brought to the original value by shifting it $P_I$ bits to the left. The steps shown in (43) and (44) can be performed using a simple shifting operation, which makes the BR CORDIC independent of m, resulting in the reduction of hardware complexity. As an example, consider $\frac{\log_2(R)}{N} = P = 5.36$; then $P_I = 5$, $P_F = 0.36$ and $2^{0.36} = 1.2834$. Now $2^{P_F}$ is brought to its original value by shifting $P_I = 5$ bits to the left, i.e., $2^P = 41.0696$. Therefore, $2^{\left(\frac{\log_2(R)}{N}\right)}$ or $2^{(\log_2(R) \times N)}$ can be computed by using the above normalization procedure and the BR CORDIC. From (39) and (45), the logarithm and exponential computations are independent of the CORDIC convergence limit (m), which reduces the hardware complexity and latency in root and power computations. A similar normalization procedure for inverse binary logarithm was presented in [8]. But the implementation details of the normalization procedure and $2^{P_F}$ computation were not provided. In the proposed approach, we presented the implementation details of the normalization procedure and the Binary Hyperbolic CORDIC algorithm to perform the $2^{P_F}$ computation. The hardware complexity of the proposed Binary Hyperbolic CORDIC is the same as that of conventional Hyperbolic CORDIC, which is discussed in section IV-C. The proposed Binary Hyperbolic CORDIC algorithm allows us to apply the pre-log normalization and pre-exponential normalization procedures for the binary logarithm and binary exponential computations, which reduces the number of iterations.

Fig. 1. Computational flow for the proposed methodology a) Nth Root computation b) Nth Power computation.

B. Illustration of the Proposed Methodology

The computational flow of the proposed methodology is depicted in Fig. 1(a) and Fig. 1(b) for the $R^{1/N}$ and $R^N$ computations respectively. The functionality of the proposed methodology is illustrated with an example here. Consider two real numbers R = 67.55 and N = 4.78. In the $R^{1/N}$ and $R^N$ computations the first step is the binary logarithm computation.
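The pre-exponential normalization (43), (44) and the BR CORDIC evaluation (45) from Section III-A can be sketched in floating-point Python. The sketch is illustrative only: the names `br_cordic`, `scale_factor` and `exp2_cordic` and the default `n = 24` are assumptions, `math.floor` and a power-of-two multiply stand in for the hardware split and left shift, and $K_b$ is taken over the same index sequence as the iterations, including the repeated ones.

```python
import math

def atanh_b(t):
    return 0.5 * math.log2((1.0 + t) / (1.0 - t))   # (26)

def indices(n):
    """Indices 1..n with 4, 13, 40, ... repeated once each."""
    seq, rep = [], 4
    for i in range(1, n + 1):
        seq.append(i)
        if i == rep:
            seq.append(i)
            rep = 3 * rep + 1
    return seq

def scale_factor(n=24):
    """K_b of (32), taken over every executed iteration."""
    k = 1.0
    for i in indices(n):
        k /= math.sqrt(1.0 - 2.0 ** (-2 * i))
    return k

def br_cordic(x, y, z, n=24):
    """Binary hyperbolic rotation mode, (30) with sigma_i = sign(z_i)."""
    for i in indices(n):
        sigma = 1.0 if z >= 0 else -1.0
        t = 2.0 ** -i
        x, y = x + sigma * t * y, y + sigma * t * x
        z -= sigma * atanh_b(t)
    return x, y

def exp2_cordic(P, n=24):
    P_I = math.floor(P)                    # integer part, (43)
    P_F = P - P_I                          # fractional part in [0, 1)
    x, y = br_cordic(scale_factor(n), 0.0, P_F, n)
    return (x + y) * 2.0 ** P_I            # (45), then shift by P_I as in (44)

print(exp2_cordic(5.36))     # close to 41.0696
print(exp2_cordic(1.2715))   # close to 2.4141
```

Starting from $x_0 = K_b$ cancels the $1/K_b$ gain of the unscaled iterations, so $x_n + y_n$ directly approximates $\cosh_b(P_F) + \sinh_b(P_F) = 2^{P_F}$.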

Fig. 2. Iteration structure for a) BR CORDIC, b) BV CORDIC and c) LV CORDIC.

Step1: The binary logarithm can be computed using pre-log normalization and BV CORDIC (using (37), (38) and (39)) as shown in Fig. 1(a) and Fig. 1(b). The pre-log normalization can be performed using a shifting operation. The input to the pre-log normalizer is R = 67.55; then $2^6 < R \le 2^7$ and the outputs are r = 1.05546875 and k = 6. Now $\log_2(r)$ can be computed using BV CORDIC. Consider the inputs to the BV CORDIC as $x_0 = r + 1 = 2.05546875$, $y_0 = r - 1 = 0.05546875$ and $z_0 = 0$. The output of BV CORDIC is $z_n = \frac{1}{2}\log_2(r) = 0.0389$. The $\log_2(r)$ is obtained by shifting $z_n$ one bit to the left, giving $\log_2(r) = 0.0779$. The $\log_2(R)$ can be computed by adding k = 6 to $\log_2(r)$, giving $\log_2(R) = 6.0779$.

Step2: The second step in $R^{1/N}$ is the division computation. The division can be performed using LV CORDIC. The inputs to the LV CORDIC are $x_0 = N = 4.78$, $y_0 = \log_2(R) = 6.0779$ and $z_0 = 0$. The output of LV CORDIC is $z_n = \frac{\log_2(R)}{N} = 1.2715$, which is treated as D. The second step in $R^N$ is the multiplication operation. The inputs to the multiplier are $x_0 = N = 4.78$ and $y_0 = \log_2(R) = 6.0779$; the output $z_n = 29.0523$ is treated as M.

Step3: The final step of the $R^{1/N}$ and $R^N$ computations is the binary inverse logarithm computation. The binary inverse logarithm can be computed using pre-exponential normalization and BR CORDIC (using (43), (44) and (45)) as shown in Fig. 1(a) and Fig. 1(b). In $R^{1/N}$ computation, the input to the pre-exponential normalization is P = D = 1.2715. The outputs are $P_I = 1$, $P_F = 0.2715$. Now $2^{P_F}$ can be computed using basic BR CORDIC considering inputs as $x_0 = K_b$, $y_0 = 0$ and $z_0 = P_F = 0.2715$. The outputs of BR CORDIC are $x_n = 1.0178$, $y_n = 0.1893$. The $2^{P_F}$ is obtained by adding $x_n$ and $y_n$, giving $2^{P_F} = 1.2071$. The $2^P$ is obtained by shifting $2^{P_F}$ by $P_I = 1$ bit to the left; then $2^D = 2.4141 = R^{1/N} = 67.55^{1/4.78}$. In $R^N$ computation, the input to the pre-exponential normalization is P = M = 29.0523. The outputs are $P_I = 29$, $P_F = 0.0523$. Now $2^{P_F}$ can be computed using basic BR CORDIC considering inputs as $x_0 = K_b$, $y_0 = 0$ and $z_0 = P_F = 0.0523$. The outputs of BR CORDIC are $x_n = 1.0007$, $y_n = 0.0363$. The $2^{P_F}$ is obtained by adding $x_n$ and $y_n$, giving $2^{P_F} = 1.0369$. The $2^P$ is obtained by shifting $2^{P_F}$ by $P_I = 29$ bits to the left. Then $2^M = 5.5669 \times 10^{8} = R^N = 67.55^{4.78}$.

shown in section III-A. From the proposed methodology, the binary logarithm and binary inverse logarithm can be performed using Binary Hyperbolic CORDIC. The iterative formula for the proposed Binary Hyperbolic CORDIC is shown in (30). Using (30) and (6), the iteration structures of the Binary Hyperbolic CORDIC for rotation and vectoring mode are shown in Fig. 2(a) and Fig. 2(b) respectively. Similarly, using (9), the iteration structure of LV CORDIC is shown in Fig. 2(c). The iteration stages in the BV CORDIC, BR CORDIC and LV CORDIC are cascaded with each other to form a pipeline architecture. The critical path for the $R^{1/N}$ computation is a shift-add operation, which is the same as the state of the art approach [9]. The critical path for the $R^N$ computation is a multiplication operation, which is the same as the state of the art approach [14]. The proposed architectures are implemented in pipeline fashion. Therefore, an output is available every clock cycle. The throughput of the proposed approach for $R^{1/N}$ and $R^N$ computation is 100%, which is the same as the state of the art approaches [9], [14].

IV. EXPERIMENTAL RESULTS AND DISCUSSION

A. Verification of Proposed Methodology

In this subsection, the correctness of the proposed methodology has been verified by modeling in MATLAB and simulating the absolute errors. The Absolute Error (AE) is defined as

$$AE = \left|\frac{T - M}{T}\right| \tag{46}$$

where T is the true value of the Nth root or Nth power and M is the measured value of the Nth root or Nth power using the proposed method. Another important criterion is the Mean Absolute Error (MAE), which is defined as follows

$$MAE = \frac{\sum_{j=1}^{Num} AE}{Num} \tag{47}$$

where Num denotes the number of test cases. The steps involved in the proposed approach and the state of the art approaches [9], [14] depend on the m and n values. Before simulating the errors, the dependency on the m and n values has been analyzed. In $R^{1/N}$ computation, the state of the approach [9] performed the software implementation as well
as hardware implementation for R ∈ [10−6 , 106 ] and N ∈
C. Proposed Architecture [2, 1002] [9]. In order to compare our proposed architecture
1
We implemented the architecture for the R N and R N com- with the state of the art architecture [9] on a uniform platform,
putations in pipeline manner using the proposed methodology we also consider the same values of R and N. In the state of

TABLE VI
DEPENDENCY OF m AND n FOR PROPOSED METHODOLOGY

TABLE VII
VERIFICATION OF THE PROPOSED METHODOLOGY (MATLAB SIMULATION)
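
The three steps of the worked example, together with the error metric (46), can be sketched in floating-point Python. This is only a minimal sketch under stated assumptions: `math.log2` and `2.0 ** p_f` stand in for the BV and BR CORDIC stages, ordinary division and multiplication stand in for the LV CORDIC and the multiplier, and every function name is hypothetical rather than taken from the paper. The normalizer below returns r in [1, 2), a slightly different boundary convention from the interval $2^k < R \le 2^{k+1}$ used in the example.

```python
import math


def pre_log_normalize(R):
    """Split R > 0 into (r, k) with R = r * 2**k and 1 <= r < 2 (a shift in hardware)."""
    mantissa, exponent = math.frexp(R)  # R = mantissa * 2**exponent, 0.5 <= mantissa < 1
    return 2.0 * mantissa, exponent - 1


def binary_log(R):
    """Step 1: log2(R) = k + log2(r). math.log2 stands in for the BV CORDIC stage."""
    r, k = pre_log_normalize(R)
    return k + math.log2(r)


def binary_inverse_log(P):
    """Step 3: 2**P = 2**P_F shifted by P_I bits. 2.0 ** p_f stands in for BR CORDIC."""
    p_i = math.floor(P)      # integer part, applied as a shift
    p_f = P - p_i            # fractional part in [0, 1)
    return math.ldexp(2.0 ** p_f, p_i)


def nth_root(R, N):
    """R**(1/N): the division stands in for the LV CORDIC stage (Step 2)."""
    return binary_inverse_log(binary_log(R) / N)


def nth_power(R, N):
    """R**N: the product stands in for the multiplier (Step 2)."""
    return binary_inverse_log(binary_log(R) * N)


def absolute_error(true_value, measured):
    """Absolute error as defined in (46)."""
    return abs((true_value - measured) / true_value)
```

With R = 67.55 and N = 4.78, `nth_root` returns approximately 2.4141 and `nth_power` approximately 5.5669e8, in line with the worked example.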

the art approach [9], from Table III, $m$ is chosen as 2 for HV CORDIC, so $R \le 1.0562 \times 10^6$. If $m = 2$, from Table II, for the HR CORDIC $z_0 \le 6.935112$, so $N \ge \frac{\ln(10^6)}{6.935112} \approx 2$. The maximum $N$ value is chosen as 1002. The convergence condition of LV CORDIC is $\frac{\ln(R)}{N} \le 2^{m+1}$. The minimum value of $N$ is 2, so $\frac{\ln(10^6)}{2} \le 2^{m+1}$ and $m$ should be 2. The $n$ value is considered as 20. In the proposed approach, the input $R$ is independent of the $m$ value. In the proposed approach, from (39), (40) and (45), the division operation alone depends on the $m$ value. The convergence condition of LV CORDIC is $\frac{\log_2(10^6)}{2} \le 2^{m+1}$ and $m$ should be 3. The dependency of $m$ and $n$ for the proposed and state of the art approach [9] has been summarized in Table VI. From Table VI, it can be observed that the proposed approach requires one extra iteration for the division computation because $\frac{\log_2(10^6)}{2} > \frac{\ln(10^6)}{2}$ must also satisfy the bound $\le 2^{m+1}$. However, it can be noted from Table VI that the proposed approach is independent of $m$ for the logarithm and exponential computations. This reduces the number of iterations involved in the logarithm and exponential computations. This also reduces the hardware complexity and latency, which is discussed in IV-C in detail.

Similarly, for the $R^N$ computation, the dependency analysis of $m$ and $n$ is carried out here. From (39) and (45), it is evident that the proposed approach is independent of $m$, whereas the state of the art approach [14] depends on $m$. Consider $R \in [10^{-2}, 100]$ and $\ln(100) = 4.6052$. From Table III, to compute $\ln(100)$, $m$ should be 1 for HV CORDIC. The HR CORDIC has been used for the exponential computation. The input to the HR CORDIC is $z_0 = \ln(R) \times N$. From Table II, if $m = 4$ for HR CORDIC then $z_0 \le 24.255$, so $\ln(R) \times N \le 24.255$ and $N \le 5.2669$. Therefore, the range of $N$ can be increased by increasing $m$, which results in additional hardware complexity and latency. Hence, the proposed methodology for $R^N$ computation does not have any limitation on its input ranges of $R$ and $N$. The dependency of $m$ and $n$ for the $R^N$ computation has been summarized in Table VI. For the $m$ and $n$ values shown in Table VI, the proposed and the state of the art approaches [9], [14] are coded in MATLAB and the absolute errors are simulated using (46) and (47). $Num$ is chosen as 5 million, and $R$ and $N$ are generated randomly. The results are summarized in Table VII. From Table VII, it is evident that the proposed approach for $R^{1/N}$ computation is one order superior and the proposed approach for $R^N$ computation is two orders superior in terms of maximum AE and MAE when compared with the respective state of the art approaches [9], [14]. The MAE of the proposed Binary Hyperbolic CORDIC is the same as that of the conventional Hyperbolic CORDIC. However, the proposed approach achieves a better MAE than the state of the art approach due to the input ranges of the CORDIC. For example, in root computation, it can be noted from Table VII that the input ranges of $R$ and $N$ for the state of the art approach and the proposed approach are the same. However, for the logarithm computation the proposed Binary Hyperbolic CORDIC is operated on inputs in $[1, 2]$, whereas in the state of the art approach the conventional Hyperbolic CORDIC is operated on inputs in $[10^{-6}, 10^6]$. Similarly, for the exponential computation, the proposed Binary Hyperbolic CORDIC is operated on $[0, 1]$, but in the state of the art approach the conventional Hyperbolic CORDIC operates within the range $[0, 6.935112]$.

B. Word Length Analysis

In this subsection, before implementing the proposed architectures on hardware, first we will analyze the input word

TABLE VIII
WORD LENGTHS REQUIRED FOR THE STATE OF THE ART ARCHITECTURE AND PROPOSED ARCHITECTURE FOR $R^{1/N}$ COMPUTATION
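
To make the restricted input ranges concrete ($r \in [1, 2]$ for the logarithm and $P_F \in [0, 1]$ for the inverse logarithm), the following floating-point sketch runs a base-2 hyperbolic CORDIC in vectoring and rotation modes. It rests on assumptions that should be checked against the paper's iteration formula (30), which is not reproduced in this part of the text: the angle table is taken as $\tanh^{-1}(2^{-i}) / \ln 2$, the rotation-mode input $x_0$ is taken as the reciprocal of the accumulated gain, iterations 4 and 13 are repeated, and all function names are hypothetical. It models neither fixed-point word lengths nor the hardware pipeline.

```python
import math


def iteration_indices(n):
    """Indices 1..n, with 4, 13, 40, ... (k -> 3k + 1) executed twice."""
    indices, repeat = [], 4
    for i in range(1, n + 1):
        indices.append(i)
        if i == repeat:
            indices.append(i)
            repeat = 3 * repeat + 1
    return indices


def angle(i):
    """Assumed base-2 elementary angle: atanh(2**-i) / ln(2)."""
    return math.atanh(2.0 ** -i) / math.log(2.0)


def gain(n):
    """Accumulated hyperbolic CORDIC gain over all executed iterations."""
    return math.prod(math.sqrt(1.0 - 4.0 ** -i) for i in iteration_indices(n))


def bv_cordic_log2(r, n=20):
    """Vectoring mode: drive y towards 0 so that z accumulates 0.5 * log2(r)."""
    x, y, z = r + 1.0, r - 1.0, 0.0
    for i in iteration_indices(n):
        d = -1.0 if y >= 0.0 else 1.0
        x, y, z = x + d * y * 2.0 ** -i, y + d * x * 2.0 ** -i, z - d * angle(i)
    return 2.0 * z  # the one-bit left shift of z_n


def br_cordic_exp2(p_f, n=20):
    """Rotation mode: drive z towards 0; x + y then approximates 2**p_f."""
    x, y, z = 1.0 / gain(n), 0.0, p_f
    for i in iteration_indices(n):
        d = 1.0 if z >= 0.0 else -1.0
        x, y, z = x + d * y * 2.0 ** -i, y + d * x * 2.0 ** -i, z - d * angle(i)
    return x + y
```

For n = 20 the index list has 22 entries, matching the effective iteration count discussed in the word length analysis, and both functions agree with `math.log2` and `2 ** p_f` to roughly the resolution of the last elementary angle.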

lengths required for each step. First, let us analyze the word lengths required by the state of the art approach for $R^{1/N}$ computation. The state of the art approach [9] implemented its design for $R \in [10^{-6}, 10^6]$ and $N \in [2, 1002]$. The state of the art design [9] chose 27 bits for the fractional part of the data. The maximum value of $R$ is $10^6$ and $\log_2(10^6) \approx 20$, so the integer part of $R$ is 20 bits. The input data for the next module (LV CORDIC) are $N$ and $\ln(R)$. The integer part of the input data for the LV CORDIC depends on the maximum of $\ln(R)$ and $N$. The maximum value of $\ln(R)$ is $\ln(2^{20}) = 13.863 \approx 2^4$, so four bits are necessary to represent the $\ln(\cdot)$ value. The maximum value of $N$ is $1002 \approx 2^{10}$, so ten bits are required to represent the $N$ value. Therefore, the integer part of the input for LV CORDIC is chosen as 10 bits. The input to the final step depends on the maximum of $\sinh\left(\frac{\ln(R)}{N}\right)$ and $\cosh\left(\frac{\ln(R)}{N}\right)$. Since $\sinh\left(\frac{\ln(2^{20})}{2}\right) \approx \cosh\left(\frac{\ln(2^{20})}{2}\right) \approx 512.0005$, the integer part for the input of HR CORDIC is 10 bits. However, the integer part of the input data for HR CORDIC is chosen as 11 bits to avoid truncation errors due to the iteration formula. An extra sign bit is added in front of every input data and the word length requirements for each step are tabulated in Table VIII.

Next, we will analyze the input data word lengths required for the proposed architecture for $R^{1/N}$ computation. We will consider the same input range for $R$ and $N$ as the state of the art approach [9], to compare the proposed architecture with the state of the art architecture [9] on a uniform platform, and the integer part and fractional part of $R$ are chosen as 20 and 27 bits respectively. Here, we followed the word length selection methodology presented in the state of the art approach [9]. For hyperbolic CORDIC, the convergence range and precision depend on its positive index ($m$) and negative index ($n$) boundaries respectively. The fractional word length depends on the $n$ value. In the hyperbolic CORDIC, the iterations $4, 13, \ldots, 3k + 1$ need to be repeated. The negative index boundary ($n$) considered in the MATLAB simulation is 20. For $n = 20$, the iterations 4 and 13 need to be repeated in hyperbolic CORDIC and the number of effective iterations will be $n_{eff} = 22$. To achieve $n$-bit precision in the output of CORDIC, the internal registers should have $\log_2(n)$ extra bits at the LSB position [15]. Therefore, the fractional word length is $n_{eff} + \lceil \log_2(n_{eff}) \rceil = 27$. The first step is pre-log normalization as shown in Fig. 1(a). The output of the first step is $r$, which is fed as input to the BV CORDIC. From (37), $r \in [1, 2]$. The maximum of $R$ is $2^{20}$; therefore, to bring $R$ into the range $r \in [1, 2]$, it is required to shift $R$ by $k = 19$ bits to the right. The least significant 19 bits may be ignored while performing normalization. However, we considered an additional 18 bits in the fractional part of $r$ to improve the accuracy of the logarithm computation. The fractional part of $r$ is set to 45 bits. The integer part of $r$ is considered as 2 bits because $r \in [1, 2]$. After performing the logarithm computation, the least significant 18 bits of the fractional part are ignored. The next step is compensation of the logarithm by adding $k$ to $\log_2(r)$. The maximum value of $k$ is 19. The integer part of $k$ is set to 5 bits. The input data for LV CORDIC are $N$ and $\log_2(R)$. The word length of the input data for the LV CORDIC depends on the maximum of $N$ and $\log_2(R)$. The maximum value is 1002; therefore, the integer part for $N$ and $\log_2(R)$ is chosen as 10 bits. The maximum input to the pre-exponential normalization step is $P = \frac{\log_2(2^{20})}{2} = 10$. The integer part of the input $P$ is set to 4 bits. The input to the final CORDIC depends on the maximum of $\sinh_b(P_F)$ and $\cosh_b(P_F)$. $P_F$ is less than or equal to 1, so that $\max\{\cosh_b(P_F)\} = \max\{\sinh_b(P_F)\} \approx 1.25$. The integer part required for the input of BR CORDIC is 2 bits. The final output is shifted by $P_I$ bits to the original value. An extra sign bit is added in front of every input data and the settings are summarized in Table VIII.

From Table VIII, it can be noted that an extra step is required in the proposed approach in the logarithm computation, i.e., pre-log normalization, which involves only shift operations. The input word length to HV CORDIC and BV CORDIC

TABLE IX
WORD LENGTHS REQUIRED FOR THE STATE OF THE ART ARCHITECTURE AND PROPOSED ARCHITECTURE FOR $R^N$ COMPUTATION
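
The word length rules used in this analysis reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the integer width is taken as $\lfloor \log_2(\max) \rfloor + 1$, and the fractional width as $n_{eff} + \lceil \log_2(n_{eff}) \rceil$, which is one reading of the guard-bit rule cited from [15]. The paper's final choices sometimes add headroom on top of these values (for example, 11 bits rather than 10 for the HR CORDIC input).

```python
import math


def effective_iterations(n):
    """n plus one extra pass for each repeated index 4, 13, 40, ... not exceeding n."""
    count, repeat = n, 4
    while repeat <= n:
        count += 1
        repeat = 3 * repeat + 1
    return count


def fractional_bits(n):
    """Fractional word length: n_eff plus ceil(log2(n_eff)) guard bits."""
    n_eff = effective_iterations(n)
    return n_eff + math.ceil(math.log2(n_eff))


def integer_bits(max_value):
    """Unsigned integer bits needed to hold values up to max_value (>= 1)."""
    return math.floor(math.log2(max_value)) + 1


if __name__ == "__main__":
    print(effective_iterations(20), fractional_bits(20))      # n_eff and fractional bits
    print(integer_bits(10 ** 6), integer_bits(1002))          # R and N
    print(integer_bits(math.log(2 ** 20)), integer_bits(19))  # ln(R) and k
```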

is the same. The proposed approach will have more accuracy in the logarithm computation because the range is $r \in [1, 2]$ and the fractional part of $r$ consists of 45 bits instead of 27 bits. Similarly, an extra step is required in the exponential computation, i.e., pre-exponential normalization, which involves only shift operations. The input word length required by the BR CORDIC is smaller than that of the HR CORDIC due to pre-exponential normalization, which reduces the hardware complexity.

Now, we will analyze the input data word length required for $R^N$ computation for the $m$ and $n$ shown in Table VI. We consider the input range of $R$ as $R \in [10^{-2}, 100]$, as mentioned in Table VII. The fractional part of $R$ is chosen as 27 bits to achieve an average precision of $10^{-7}$. The integer part of $R$ depends on the maximum $R$ value. The integer part will be $\log_2(100) \approx 7$ bits. In the state of the art approach [14], the exponential computation depends on $m$ as shown in Table VI. From Table VI, $m = 4$, which limits the $N$ value ($N \le 5.2669$). Hence, we consider $N \in [1, 5]$. The multiplier word length depends on the maximum value $\ln(R) = \ln(100) = 4.6051$ and $N = 5$. The maximum multiplier output is $\ln(100) \times 5 = 23.0258$. Therefore, the integer part of the multiplier is chosen as 5 bits. The word length for HR CORDIC depends on $\sinh(\ln(100) \times 5) = \cosh(\ln(100) \times 5) = 1.71 \times 10^{10}$. The integer part required by HR CORDIC is 34 bits. The word lengths required by the state of the art approach [14] are tabulated in Table IX for $R^N$ computation. In the proposed approach, the logarithm and exponential computations are independent of the CORDIC convergence limit ($m$). A word length analysis similar to that for the $R^{1/N}$ computation is performed for the $R^N$ computation, and the input word lengths required in each step are tabulated in Table IX. From Table IX, it can be noted that the word length required by the BR CORDIC is 32 bits smaller than that of the HR CORDIC. This reduces the hardware complexity in the exponential computation. The hardware complexity analysis is performed in the following subsection.

C. Hardware Complexity and Timing Analysis

In this subsection, we analyze the performance of the proposed architecture and compare it with the state of the art architectures [9], [14] in terms of hardware complexity, latency and throughput. Throughout the analysis we keep a generalized view of the CORDIC stages $m$, $n$ and the word length $b$. A Ripple Carry Adder (RCA) and a Conventional Array Multiplier (CAM) are considered here to provide a comparison on a uniform platform. A $b$-bit RCA requires $b$ full adders (FA). A $b \times b$ CAM requires $b(b - 2)$ FA plus $b$ half adders (HA) and $b^2$ AND gates. In addition, one FA cell requires 24 transistors, one HA cell consists of 12 transistors and a two-input AND gate consists of 6 transistors. Based on the approach presented in [9] and [18], the transistor count for the proposed architecture is expressed in terms of the Transistor Count ($TC$) of the RCA and CAM. We can calculate $TC_{RCA} = 24b$ and $TC_{CAM} = 6b(5b - 6)$. In the Hyperbolic CORDIC, each iteration requires six add operations for $i > 0$ and eight add operations for $i \le 0$. In the LV CORDIC, each iteration requires two add operations for all values of $i$. In the conventional Hyperbolic CORDIC, for $i > 0$ the critical path is one shift and one add operation, but for $i \le 0$ the critical path is one shift and two add operations [9]. The state of the art approach [9] used a folding-delay technique to maintain the critical path as one shift and one add operation. The consequence of the folding-delay technique is that an iteration with $i \le 0$ requires two clock cycles and an iteration with $i > 0$ requires one clock cycle [9]. For LV CORDIC each iteration requires one clock cycle.

In the state of the art approaches [9], [14], the natural logarithm is computed using HV CORDIC along with two additional add operations as shown in (14). The total $TC$ involved in the natural logarithm computation for the state of the art design [9] is expressed as follows

$$TC_{natural\_log} = (8 \times (m + 1) + 6 \times n + 2) \times TC_{RCA} \tag{48}$$

TABLE X
TRANSISTOR COUNT AND CLOCK CYCLE ANALYSIS
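
The closed forms $TC_{RCA} = 24b$ and $TC_{CAM} = 6b(5b - 6)$ follow directly from the cell counts given in the text. A short sketch, with hypothetical function and constant names, that builds both counts from the per-cell transistor numbers so that the closed forms can be checked numerically:

```python
# Transistors per cell, as stated in the text.
FA_TRANSISTORS = 24   # full adder
HA_TRANSISTORS = 12   # half adder
AND_TRANSISTORS = 6   # two-input AND gate


def tc_rca(b):
    """b-bit ripple carry adder: b full adders."""
    return FA_TRANSISTORS * b


def tc_cam(b):
    """b x b conventional array multiplier: b(b-2) FA, b HA and b^2 AND gates."""
    return (FA_TRANSISTORS * b * (b - 2)
            + HA_TRANSISTORS * b
            + AND_TRANSISTORS * b * b)


if __name__ == "__main__":
    for b in (8, 16, 32):
        # Second and third columns should agree: 24b(b-2) + 12b + 6b^2 = 6b(5b-6).
        print(b, tc_rca(b), tc_cam(b), 6 * b * (5 * b - 6))
```

Expanding the sum gives $24b^2 - 48b + 12b + 6b^2 = 30b^2 - 36b = 6b(5b - 6)$, which is the expression used for the multiplier.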

The number of clock cycles required for the natural logarithm computation is expressed as

$$CLK_{natural\_log} = 2 \times m + n + 3 \tag{49}$$

In the proposed approach, the binary logarithm is computed using basic BV CORDIC along with two additional add operations as shown in (37), (38) and (39). The compensation with $k$ is performed by an add operation. The $TC$ involved in the $\log_2(r)$ computation is $TC_{\log_2(r)} = (6 \times n + 2) \times TC_{RCA}$. The total $TC$ required for the binary logarithm computation can be expressed as follows

$$TC_{binary\_log} = TC_{\log_2(r)} + TC_{RCA} \tag{50}$$

From (37), (38) and (39), the number of clock cycles required for the binary logarithm computation is expressed as

$$CLK_{binary\_log} = n + 4 \tag{51}$$

The division operation has been performed using LV CORDIC in the proposed design and the state of the art design as shown in (15) and (40). The $TC$ involved in the division computation is expressed as

$$TC_{div} = 2 \times (m + n + 1) \times TC_{RCA} \tag{52}$$

The number of clock cycles required for the division computation is expressed as

$$CLK_{div} = m + n + 1 \tag{53}$$

The second step in $R^N$ computation is the multiplication operation, which is performed using a CAM, and the number of clock cycles required by the CAM is 1. The final step in the state of the art design is the natural exponential computation. The natural exponential is computed using HR CORDIC and one add operation as shown in (16). The $TC$ involved in the natural exponential computation is expressed in the following equation

$$TC_{natural\_exp} = (8 \times (m + 1) + 6 \times n + 1) \times TC_{RCA} \tag{54}$$

The number of clock cycles required for the natural exponential computation using (16) can be expressed as

$$CLK_{natural\_exp} = 2 \times m + n + 3 \tag{55}$$

The binary inverse logarithm in the proposed design is computed using basic BR CORDIC and one add operation as shown in (44) and (45). The $TC$ involved in the binary inverse logarithm computation is given by

$$TC_{binary\_inv\_log} = (6 \times n + 1) \times TC_{RCA} \tag{56}$$

The number of clock cycles required for the binary inverse logarithm computation using (44) and (45) is given by

$$CLK_{binary\_inv\_log} = n + 2 \tag{57}$$

The total $TC$ and clock cycles required for each step have been summarized in Table X for the values of $m$ and $n$ shown in Table VI and the word lengths shown in Table VIII and Table IX. From Table VI, $n$ is considered as 20. In the conventional and Binary Hyperbolic CORDIC, as per the CORDIC convergence theorem, iterations $i = 4, 13$ are to be repeated, which results in additional complexity and latency. The repeated iterations are also accounted for in the $TC$ and $CLK$ computation summarized in Table X for root and power computations. The TS (Transistor Saving) is defined as follows

$$TS = 1 - \frac{TC_{proposed}}{TC_{State\,of\,the\,art}} \tag{58}$$

In the proposed approach, if the pre-log normalization and pre-exponential normalization procedures are not performed, the number of iterations and word lengths required for the proposed Binary Hyperbolic CORDIC are the same as for the conventional Hyperbolic CORDIC. Therefore, the hardware complexity of the proposed approach is the same as that of [9] and [14] when normalization is not performed. As can be seen from Table X, the proposed approach saves 20.55% and 42.01% TS for $R^{1/N}$ and $R^N$ computations when compared with the state of the art approaches [9], [14] respectively. The proposed approach also saves 8 clock cycles of latency compared with the state of the art approaches [9], [14].

D. Implementation Results

The proposed architectures and the state of the art architectures are coded in VHDL as per the word lengths shown in Table VIII and Table IX. The ASIC implementation was done for the proposed architecture in TSMC 45 nm CMOS technology at $V_{DD} = 1.08$ V and a clock frequency of 1 GHz with the help of Synopsys Design Compiler (DC) and IC Compiler. The synthesis results of the ASIC implementation are shown in Table XI. The state of the art approach [9]

TABLE XI
HARDWARE IMPLEMENTATION FOR THE PROPOSED AND STATE OF THE ART ARCHITECTURES
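
The per-stage cost expressions (48) to (58) can be collected into a small calculator. The sketch below is a hedged illustration with hypothetical function names: each stage function returns a pair (number of RCA-equivalent adders, number of clock cycles), and the adder count still has to be multiplied by $TC_{RCA} = 24b$ using the word length $b$ of that stage. It does not attempt to reproduce Table X, which additionally accounts for the repeated iterations and for the different word lengths of each stage.

```python
def natural_log(m, n):
    """State of the art HV CORDIC stage, (48) and (49)."""
    return 8 * (m + 1) + 6 * n + 2, 2 * m + n + 3


def binary_log(n):
    """Proposed BV CORDIC stage plus the add of k, (50) and (51)."""
    return (6 * n + 2) + 1, n + 4


def division(m, n):
    """LV CORDIC division stage, (52) and (53)."""
    return 2 * (m + n + 1), m + n + 1


def natural_exp(m, n):
    """State of the art HR CORDIC stage, (54) and (55)."""
    return 8 * (m + 1) + 6 * n + 1, 2 * m + n + 3


def binary_inv_log(n):
    """Proposed BR CORDIC stage, (56) and (57)."""
    return 6 * n + 1, n + 2


def transistor_saving(tc_proposed, tc_state_of_art):
    """Transistor saving TS as defined in (58)."""
    return 1.0 - tc_proposed / tc_state_of_art


if __name__ == "__main__":
    n = 20
    # Logarithm stage only: the proposed stage has no dependence on m.
    print(natural_log(2, n), binary_log(n))
    # Exponential / inverse-logarithm stage only.
    print(natural_exp(2, n), binary_inv_log(n))
```

Comparing `natural_log(m, n)` with `binary_log(n)` shows where the saving comes from: the $8(m + 1)$ adders and $2m$ extra cycles of the negative-index iterations disappear once the input is normalized.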

performed its ASIC implementation at 1 GHz. To compare the proposed approach with the state of the art approach on a uniform platform, the synthesis frequency is selected as 1 GHz. From Table XI, it can be noted that the maximum clock frequency of the proposed design is the same as that of the state of the art approach. The critical path of the proposed design is the same as that of the state of the art design, as mentioned in Section III-C. For this reason, the maximum operating frequency of the proposed design is the same as that of the state of the art design.

From Table XI, the proposed design for $R^{1/N}$ computation saves 19.38% of chip area and 15.86% of power consumption when compared with the state of the art architecture [9]. Similarly, from Table XI, the proposed design for $R^N$ computation saves 38% of chip area and 35.7% of power consumption when compared with the state of the art architecture [14]. It can be noted that the proposed methodology is independent of the technology node. However, to provide insight into the FPGA implementation of the proposed methodology, FPGA prototyping is performed on a Xilinx Virtex-6 (XC6VLX240T). From Table XI, it can be observed that the proposed design saves 20.25% and 39.32% of LUT consumption for root and power computations respectively. In order to evaluate the energy efficiency, the sampling rate per watt criterion [9] is adopted here by assuming the sampling rate equals $1000 \times f$ MSPS (Million Samples Per Second). The sampling rate per watt can be expressed as $\frac{1000 \times f}{p(W)}$ MSPS/W, where $f$ is the frequency in GHz and $p$ is the power consumption in watts. From Table XI, it can be noted that at 1 GHz the proposed approach can process 4.436 million additional root computations per second per watt (i.e., per joule) when compared with the state-of-the-art method [9]. Similarly, the proposed approach can process 11.6 million additional power computations per joule when compared with the state-of-the-art method [14].

E. Accuracy

In order to better understand the accuracy of the proposed approach, here we will illustrate an example for root computation. The Binary Hyperbolic CORDIC performs $n_{eff} = 22$ iterations so that 22 bits are accurate. The input ranges for BV CORDIC and BR CORDIC are $[\frac{1}{9.36}, 9.36]$ and $[0, 1.6132]$ respectively. However, in the proposed approach we limit the input ranges to $r \in [1, 2]$ for BV CORDIC and $P_F \in [0, 1]$ for BR CORDIC, which gives better precision. The bit position errors of BV CORDIC and BR CORDIC are simulated for the input ranges $r \in [1, 2]$ and $P_F \in [0, 1]$ based on the approach presented in [9] and [17]. The bit position error of the Binary Hyperbolic CORDIC is shown in Fig. 3.

Fig. 3. Bit position error of BV CORDIC and BR CORDIC.

Let us consider R = 999999.22721512243151664733 = 11110100001000111111.0011101000101010110001010011. After the range reduction, r has 48 bits, with r = 1.1110100001000111111001110100010101011000101001 and k = 19 = 10011. From Fig. 3, it is evident that approximately 23 bits are accurate for the BV CORDIC. The output of BV CORDIC is log2(r) = 0.111011100111101100110XXXX, where XXXX is noise. After adding k to log2(r), log2(R) = 10011.111011100111101100110XXXX.

TABLE XII
COMPARISON WITH FLOATING-POINT EXPONENTIATION UNITS

Consider N = 2; the result of the division is P = 01001.111101110011110110011XXXXXX. Now P_I = 9 (01001) and P_F = 0.111101110011110110011. The value 2^(P_F) can be computed using BR CORDIC. It can be noted from Fig. 3 that approximately 23 bits are accurate for the BR CORDIC. The result is 2^(P_F) = 1.11110011111111111111000XXXX, and it can be brought to the original value by shifting P_I = 9 bits to the left. The result R^(1/2), i.e., 2^P, is 1111100111.11111111111001XXXX, but the actual R^(1/2) is 1111100111.11111111111001101010110101101. In this example, the absolute error is $2^{-14} = 6.1035 \times 10^{-5}$ and, considering the 39-bit output, 26 bits are approximately accurate.

The HDL simulation is carried out for the proposed architectures using (46) and (47) for 5 million test cases. The MAE and maximum AE are tabulated in Table XI. The MAE for the proposed root computation is $7.6958 \times 10^{-6} \approx 2^{-17}$. Considering the 39-bit output, 29 bits are approximately accurate for the proposed root computation. From Table XI, the MAE for the state of the art [9] root computation is $7.3852 \times 10^{-5} \approx 2^{-14}$ and, considering the 39-bit output, 26 bits are approximately accurate. Similarly, from Table XI, the MAE for the proposed power computation is $8.2492 \times 10^{-6} \approx 2^{-17}$. Therefore, considering the 62-bit output, 52 bits are approximately accurate in the proposed power computation. From Table XI, the MAE for the state of the art power computation [14] is $9.5642 \times 10^{-4} \approx 2^{-10}$. Therefore 45 bits are accurate in the 62-bit output.

F. Discussion on Floating-Point Exponentiation Units

A power computation method was presented in [22] for floating-point numbers based on the logarithm-exponential relation in (1b). A fair comparison between the proposed approach and [22] on a uniform platform is not possible for the following reasons. The approach in [22] performs the power computation for floating-point numbers, whereas the proposed approach performs the computation for fixed-point numbers. Another reason is that the implementation technology nodes used in the proposed approach and in [22] are different. However, to provide more insight to the readers, in this subsection we tabulate the comparison results between the proposed approach and [22] in Table XII. In [22], the logarithm and exponential computations are performed using a second order polynomial approximation. The polynomial approximation methods are implemented using look-up tables for low precision. From Table XII, it can be noted that the approach in [22] requires 7 BRAMs, whereas the proposed approach requires 0 BRAMs. For higher precision, a generic polynomial approximation approach has been used, which increases the resource consumption. A generalized Hyperbolic CORDIC algorithm is presented in [23] to compute arbitrary logarithm and exponential computations. The CORDIC approach has a significant advantage over the approximation approach in terms of area and power consumption for logarithm and exponential computations when high precision is required [23]. The proposed approach computes logarithm and exponential computations using CORDIC. Therefore, the proposed approach will have a significant advantage over the approach in [22] when high precision is required.

V. CONCLUSION

In this paper, we proposed a low complexity $N$th root and $N$th power computation architecture design methodology for real time applications. The state of the art approaches [9], [14] perform the $N$th root and $N$th power computation based on the natural logarithm-exponential relation using Hyperbolic CORDIC. In the state of the art approaches [9], [14], the CORDIC negative index boundary ($m$) poses a limitation on the $R$ and $N$ values and also increases the hardware complexity and latency. The proposed approach performs the $R^{1/N}$ and $R^N$ computations based on the binary logarithm-binary inverse logarithm relation shown in [8]. The proposed approach is independent of the CORDIC negative index boundary ($m$), which reduces the hardware complexity and latency. Subsequently, low complexity architectures have been designed for $N$th root computation and $N$th power computation using VHDL and synthesized under the TSMC 45-nm CMOS technology at 1 GHz frequency. The synthesis results show that the proposed $N$th root architecture saves 19.38% of chip area and 15.86% of power consumption when compared with the state of the art $N$th root architecture [9]. Similarly, the proposed $N$th power architecture saves 38% of chip area and 35.67% of power consumption when compared with the state of the art $N$th power architecture [14]. The proposed approach for $R^{1/N}$ computation is one order superior and the proposed approach for $R^N$ computation is two orders superior in terms of maximum absolute error and mean absolute error when compared with the respective state of the art approaches [9], [14]. The critical path for the proposed approach is the same as that of the state of the art approach. The throughput of the proposed approach is 100%.

ACKNOWLEDGMENT

All the Computer Aided Design tools are supported under the Special Manpower Development Program (SMDP) of the Ministry of Electronics and Information Technology (MEITY), Government of India.

REFERENCES

[1] A. S. Glassner, Principles of Digital Image Synthesis. San Mateo, CA, USA: Morgan Kaufmann, 1995.
[2] D. Harris, “A powering unit for an Open GL lighting engine,” in Proc. 35th Asilomar Conf. Signals, Syst., Comput., Nov. 2001, pp. 1641–1645.

[3] S. P. Mohanty, N. Ranganathan, and R. K. Namballa, “VLSI implemen- [19] S. Aggarwal, P. K. Meher, and K. Khare, “Concept, design,
tation of visible watermarking for secure digital still camera design,” and implementation of reconfigurable CORDIC,” IEEE Trans. Very
in Proc. 17th Int. Conf. VLSI Design, Mumbai, India, Jun. 2004, Large Scale Integr. (VLSI) Syst., vol. 24, no. 4, pp. 1588–1592,
pp. 1063–1068. Apr. 2016.
[4] W. Liu and A. Nannarelli, “Power efficient division and square root [20] S. Mopuri, S. Bhardwaj, and A. Acharyya, “Coordinate rotation-based
unit,” IEEE Trans. Comput., vol. 61, no. 8, pp. 1059–1070, Apr. 2012. design methodology for square root and division computation,” IEEE
Trans. Circuits Syst., II, Exp. Briefs, vol. 66, no. 7, pp. 1227–1231, Jul. 2019.
[5] A. Seth and W.-S. Gan, “Fixed-point square roots using L-b truncation,” IEEE Signal Process. Mag., vol. 28, no. 6, pp. 149–153, Nov. 2011.
[6] H. Kabuo et al., “Accurate rounding scheme for the Newton-Raphson method using redundant binary representation,” IEEE Trans. Comput., vol. 43, no. 1, pp. 43–51, Jan. 1994.
[7] P. Montuschi, J. D. Bruguera, L. Ciminiera, and J.-A. Piñeiro, “A digit-by-digit algorithm for mth root extraction,” IEEE Trans. Comput., vol. 56, no. 12, pp. 1696–1706, Dec. 2007.
[8] A. Vázquez and J. D. Bruguera, “Composite iterative algorithm and architecture for q-th root calculation,” in Proc. IEEE Symp. Comput. Arith. (ARITH), Jul. 2011, pp. 52–61.
[9] Y. Luo, Y. Wang, H. Sun, Y. Zha, Z. Wang, and H. Pan, “CORDIC-based architecture for computing nth root and its implementation,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 12, pp. 4183–4195, Dec. 2018.
[10] A. A. Liddicoat and M. J. Flynn, “Parallel square and cube computations,” in Proc. 34th Asilomar Conf. Signals, Syst. Comput., vol. 2, Oct./Nov. 2000, pp. 1325–1329.
[11] J. E. Stine and J. M. Blank, “Partial product reduction for parallel cubing,” in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI), Porto Alegre, Brazil, Mar. 2007, pp. 337–342.
[12] S. Bui, J. E. Stine, and M. Sadeghian, “Experiments with high speed parallel cubing units,” in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, Tampa, FL, USA, Jul. 2014, pp. 48–53.
[13] H. Thapliyal, S. Kotiyal, and M. B. Srinivas, “Design and analysis of a novel parallel square and cube architecture based on ancient Indian Vedic mathematics,” in Proc. 48th Midwest Symp. Circuits Syst., vol. 2, Aug. 2005, pp. 1462–1465.
[14] J.-A. Pineiro, M. D. Ercegovac, and J. D. Bruguera, “High-radix iterative algorithm for powering computation,” in Proc. 16th IEEE Symp. Comput. Arithmetic, Santiago de Compostela, Spain, Jun. 2003, pp. 204–211. doi: 10.1109/ARITH.2003.1207680.
[15] P. Meher, J. Valls, T.-B. Juang, K. Sridharan, and K. Maharatna, “50 years of CORDIC: Algorithms, architectures, and applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 9, pp. 1893–1907, Sep. 2009.
[16] S. Aggarwal, P. K. Meher, and K. Khare, “Scale-free hyperbolic CORDIC processor and its application to waveform generation,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 2, pp. 314–326, Feb. 2013.
[17] A. Acharyya, K. Maharatna, B. M. Al-Hashimi, and J. Reeve, “Coordinate rotation based low complexity N-D fast ICA algorithm and architecture,” IEEE Trans. Signal Process., vol. 59, no. 8, pp. 3997–4011, Aug. 2011.
[18] S. Mopuri and A. Acharyya, “Low-complexity methodology for complex square-root computation,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 11, pp. 3255–3259, Nov. 2017.
[21] X. Hu, R. G. Harber, and S. C. Bass, “Expanding the range of convergence of the CORDIC algorithm,” IEEE Trans. Comput., vol. 40, no. 1, pp. 13–21, Jan. 1991.
[22] F. de Dinechin, P. Echeverría, M. López-Vallejo, and B. Pasca, “Floating-point exponentiation units for reconfigurable computing,” ACM Trans. Reconfigurable Technol. Syst., vol. 6, no. 1, May 2013, Art. no. 4.
[23] Y. Luo, Y. Wang, Y. Ha, Z. Wang, S. Chen, and H. Pan, “Generalized hyperbolic CORDIC and its logarithmic and exponential computation with arbitrary fixed base,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 9, pp. 2159–2169, Sep. 2019.

Suresh Mopuri received the B.Tech. degree (Hons.) in electronics and communication engineering from the Sri Venkateswara University College of Engineering, Tirupati, India, in 2012. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering, Indian Institute of Technology Hyderabad (IIT Hyderabad), as an External Student. He joined as a Research Scholar with the Indian Institute of Technology, Hyderabad. He is also a Scientist with the Tracking Systems Group, Indian Space Research Organization (ISRO). His research interests include signal processing algorithms, VLSI architectures, low power design techniques, radar signal processing, and weather signal processing.

Amit Acharyya (M’11) received the Ph.D. degree from the School of Electronics and Computer Science, University of Southampton, U.K., in 2011. He is currently an Associate Professor with the Indian Institute of Technology Hyderabad (IIT Hyderabad), India. His research interests include VLSI systems design for real-time resource-constrained applications, machine learning and signal processing hardware architecture design, edge computing, health-care technology, hardware security, and design for testability and reliability.