A Novel Area-Efficient VLSI Architecture for Recursion Computation in LTE Turbo Decoders
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 62, NO. 6, JUNE 2015
Abstract—Long-term evolution (LTE) aims to achieve peak data rates in excess of 300 Mb/s for next-generation wireless communication systems. Turbo codes, the channel-coding scheme specified in LTE, suffer from a low decoding throughput due to their iterative decoding algorithm. One efficient approach to achieving a promising throughput is to use multiple maximum a posteriori (MAP) cores in parallel, which results in a large area overhead. The two computationally challenging units in a MAP core are the α and β recursion units. Although several methods have been proposed to shorten the critical path of these recursion units, an area-efficient architecture with minimum silicon area is still missing. In this brief, a novel relation between the α and β metrics is introduced, leading to a novel add–compare–select (ACS) architecture. The proposed technique can be applied to both the precise approximation of log-MAP and the max-log-MAP ACS architectures. The proposed ACS design, which is implemented in a 0.13-μm CMOS technology and customized for the LTE standard, results in, at most, 18.1% less area compared with the designs reported to date while maintaining the same throughput level.

Index Terms—Add–compare–select (ACS) unit, long-term evolution (LTE), parallel architecture, radix-4, recursion unit, turbo decoder, very-large-scale integration (VLSI).

Manuscript received October 8, 2014; revised December 3, 2014; accepted January 13, 2015. Date of publication February 24, 2015; date of current version May 29, 2015. This brief was recommended by Associate Editor G. Masera.
The authors are with the Department of Electrical Engineering, Sharif University of Technology, Tehran 14588 89694, Iran (e-mail: ardakani.arash@gmail.com; [email protected]).
Digital Object Identifier 10.1109/TCSII.2015.2407232
1549-7747 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Many advanced wireless communication standards have adopted turbo codes as the channel coding scheme due to their near-Shannon error-correcting performance [1]. The decoding procedure is performed in two different half iterations, where the reliability of received bits is computed in the form of extrinsic values using interleavers and soft-input–soft-output (SISO) decoders in an iterative way. On even half iterations, the decoding process is performed on the noninterleaved data and parity, whereas on odd half iterations, the interleaved data are decoded. The extrinsic values, representing the reliability of the information bits, are sent to the other half iteration by passing through the interleaver/deinterleaver unit until an acceptable error level is achieved.

Recently, long-term evolution (LTE) advanced has emerged as the next-generation wireless communication standard, which is aimed at higher peak data rates close to 3 Gb/s [2]. The turbo decoder, which is specified in LTE, proves to be a limiting block toward this goal due to its iterative decoding nature, high latency, and significant silicon area consumption. The decoding procedure is performed using the algorithm presented in [3]. Since the implementation of the actual maximum a posteriori (MAP) algorithm incurs very high computational complexity, two modified forms of the MAP algorithm, i.e., the max-log-MAP and log-MAP algorithms [4], [5], are commonly realized instead.

In these two alternative methods, the MAP core consists of log-likelihood ratio (LLR) units, as well as the core units to compute α, β, and γ, i.e., the forward, backward, and branch metrics, respectively. In fact, the α and β units, due to their recursive computation nature, are the most challenging units to implement, occupying almost 40% of the whole MAP core area [6]. The γ unit, on the other hand, is a trivial part of the turbo decoder, consisting of a few additions. Therefore, an area-efficient architecture for the computation of the α and β metrics is highly desirable, which has always been a challenge in the literature.

In order to address this challenge, in this brief, a new relation between the α and β metrics is introduced; based on this new relation, a novel add–compare–select (ACS) unit for forward and backward computation is proposed. The proposed scheme results in, at most, an 18.1% reduction in the silicon area compared with the designs reported to date.

II. TURBO DECODER ALGORITHM

The MAP algorithm, which provides the a posteriori probability for each bit, is used in iterative decoding of turbo codes. The MAP algorithm provides the probability of the decoded bit $u_k$ being either +1 or −1 for the received symbol sequence $y$ by calculation of the LLR values as

$$L(u_k \mid y) = \log \frac{p(u_k = +1 \mid y)}{p(u_k = -1 \mid y)} \tag{1}$$

where $p(u_k = +1 \mid y)$ and $p(u_k = -1 \mid y)$ denote the probabilities of bit $u_k$ being +1 and −1, respectively.

The turbo encoder specified in LTE consists of two recursive convolutional encoders, an interleaver, and a feed-through path, as shown in Fig. 1(a). The feed-through passes one block of $K$ information bits, which are called systematic bits $x^s_k$, where $k = 0, 1, \ldots, K - 1$. The parity generated by the first convolutional encoder is denoted by $x^{p1}_k$. By permuting the systematic bits via the interleaver and passing them through the second convolutional encoder, the second sequence of parity is generated, which is denoted by $x^{p2}_k$. On the receiver side, the reliability of bits is computed iteratively by exchanging the extrinsic LLRs between two SISO decoders based on (1), as
ARDAKANI AND SHABANY: ARCHITECTURE FOR RECURSION COMPUTATION IN LTE TURBO DECODERS 569
Fig. 1. (a) Turbo encoder. (b) Turbo decoder. (c) Radix-2 trellis diagram. (d) Partial radix-4 trellis diagram.
depicted in Fig. 1(b). Another representation of a convolutional encoder is by using a trellis diagram, as shown in Fig. 1(c), depicting two steps of the LTE turbo encoder.

Applying a few mathematical manipulations to (1) leads to

$$\mathrm{LLR}(u_k) = \log \frac{\sum_{u_k = +1} \tilde{\alpha}_{k-1}(s')\,\tilde{\beta}_k(s)\,\tilde{\gamma}_k(s', s)}{\sum_{u_k = -1} \tilde{\alpha}_{k-1}(s')\,\tilde{\beta}_k(s)\,\tilde{\gamma}_k(s', s)} \tag{2}$$

where $\tilde{\alpha}_k(s)$, $\tilde{\beta}_k(s)$, and $\tilde{\gamma}_k(s', s)$ denote the forward, backward, and branch metrics, respectively. The $s$ and $s'$ indexes are associated with trellis steps $k$ and $k - 1$, respectively. The MAP algorithm traverses the trellis in both forward and backward directions to get the state metrics $\tilde{\alpha}_k(s)$ and $\tilde{\beta}_k(s)$, respectively. The transition value in the $k$th stage from the state $s'$ to the state $s$ is denoted by $\tilde{\gamma}_k(s', s)$. The calculations of the $\tilde{\alpha}_k(s)$, $\tilde{\beta}_k(s)$, and $\tilde{\gamma}_k(s', s)$ metrics are performed as

$$\tilde{\alpha}_k(s) = \sum_{s'} \tilde{\gamma}_k(s', s)\,\tilde{\alpha}_{k-1}(s') \tag{3}$$

$$\tilde{\beta}_{k-1}(s') = \sum_{s} \tilde{\gamma}_k(s', s)\,\tilde{\beta}_k(s) \tag{4}$$

$$\tilde{\gamma}_k(s', s) = \exp\left(\frac{1}{2} L_e(u_k)\,u_k + \frac{1}{2} L_c X^s_k u_k + \frac{1}{2} L_c X^p_k c_k\right) \tag{5}$$

where $X^s_k$ and $X^p_k$ are the received soft inputs corresponding to the transmitted bits $x^s_k$ and $x^p_k$, respectively. The value of $L_e(u_k)$ denotes the extrinsic value of $u_k$, and $L_c$ is the channel reliability measure. $u_k$ and $c_k$ are the transmitted values of the systematic and parity bits, respectively, which can be either +1 or −1.

Due to the high computational complexity of the MAP algorithm, which results from the exponential and multiplication calculations, an equivalent logarithmic form is typically employed, where a multiplication is converted to an addition. In this case, the corresponding equations in (3)–(5) can be reformulated as

$$\alpha_k(s) = \log \sum_{s'} \exp\left(\gamma_k(s', s) + \alpha_{k-1}(s')\right) \tag{6}$$

$$\beta_{k-1}(s') = \log \sum_{s} \exp\left(\gamma_k(s', s) + \beta_k(s)\right) \tag{7}$$

$$\gamma_k(s', s) = \frac{1}{2} L_e(u_k)\,u_k + \frac{1}{2} L_c X^s_k u_k + \frac{1}{2} L_c X^p_k c_k \tag{8}$$

where α, β, and γ are defined as

$$\alpha_k(s) = \log\left(\tilde{\alpha}_k(s)\right) \tag{9}$$

$$\beta_k(s) = \log\left(\tilde{\beta}_k(s)\right) \tag{10}$$

$$\gamma_k(s', s) = \log\left(\tilde{\gamma}_k(s', s)\right). \tag{11}$$

The preceding logarithmic formulation of the MAP algorithm is used to make the implementation of this algorithm feasible. The γ values, according to (8), can be readily realized through a few additions, which are not critical in hardware. In fact, the computation of the α, β, and LLR values makes up the major part of the algorithm, occupying the major fraction of the silicon area. In order to implement the logarithmic computations efficiently in hardware, two common approaches are normally used, namely, the max-log-MAP and the precise approximation of log-MAP algorithms, which are described in the following.

Consider the following equation, which is used to implement the logarithm:

$$\max{}^{*}(z, t) = \log\left(e^{z} + e^{t}\right) = \max(z, t) + \log\left(1 + e^{-|z - t|}\right) \tag{12}$$

where the max function denotes the maximum value. In the precise approximation of log-MAP method, the first term in (12), i.e., $\max(z, t)$, can be easily implemented by a comparator, whereas the second term, i.e., $\log(1 + e^{-|z - t|})$, is implemented using a lookup table (LUT) (see [7]). On the other hand, the max-log-MAP method relies on the approximation of $\log(e^{z} + e^{t})$ by the maximum of $z$ and $t$, i.e., $\max^{*}(z, t) \approx \max(z, t)$ (see [6]). The hardware realization of the max-log-MAP results in a shorter critical path and a lower computational complexity compared with the precise approximation of log-MAP implementation, at the cost of an inevitable performance loss.

By using either the max-log-MAP or the precise approximation of log-MAP algorithm, the recursive computation is performed as

$$\alpha_k(s) = \max_{s'}{}^{*}\left(\gamma_k(s', s) + \alpha_{k-1}(s')\right) \tag{13}$$

$$\beta_{k-1}(s') = \max_{s}{}^{*}\left(\gamma_k(s', s) + \beta_k(s)\right). \tag{14}$$
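The recursions in (12)–(14) can be illustrated with a short software sketch. This is only a minimal floating-point model under stated assumptions, not the hardware ACS architecture of this brief: the function names (`max_star`, `max_log`, `forward_step`, `backward_step`), the dictionary layout of the branch metrics, and the two-state toy trellis are all hypothetical and chosen for illustration. The correction term of (12) is evaluated exactly here, whereas a hardware realization would typically read it from an LUT.

```python
import math


def max_star(z, t):
    """Jacobian logarithm of (12): log(e^z + e^t), evaluated exactly."""
    return max(z, t) + math.log1p(math.exp(-abs(z - t)))


def max_log(z, t):
    """Max-log-MAP approximation: the correction term of (12) is dropped."""
    return max(z, t)


def forward_step(alpha_prev, gamma, combine):
    """One step of (13). gamma[s_prev] maps each successor state s to
    gamma_k(s_prev, s); only valid trellis branches are listed
    (hypothetical data layout)."""
    alpha = [None] * len(alpha_prev)
    for s_prev, branches in enumerate(gamma):
        for s, g in branches.items():
            cand = g + alpha_prev[s_prev]
            # Pairwise use of max* is exact because log-sum-exp is associative.
            alpha[s] = cand if alpha[s] is None else combine(alpha[s], cand)
    return alpha


def backward_step(beta, gamma, combine):
    """One step of (14), using the same branch-metric layout."""
    beta_prev = [None] * len(beta)
    for s_prev, branches in enumerate(gamma):
        for s, g in branches.items():
            cand = g + beta[s]
            beta_prev[s_prev] = (
                cand if beta_prev[s_prev] is None
                else combine(beta_prev[s_prev], cand)
            )
    return beta_prev


# Toy two-state trellis with made-up branch metrics (not the LTE trellis).
gamma = [{0: 0.5, 1: -0.5}, {0: -0.5, 1: 0.5}]
alpha_prev = [0.0, -1.0]

alpha_exact = forward_step(alpha_prev, gamma, max_star)   # log-MAP style
alpha_approx = forward_step(alpha_prev, gamma, max_log)   # max-log-MAP style
beta_approx = backward_step([0.0, -1.0], gamma, max_log)
```

Passing `max_star` or `max_log` as the `combine` argument switches between the two approaches described above while the add and select structure of the recursion stays the same, which mirrors the observation that both variants share the same ACS skeleton.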
570 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 62, NO. 6, JUNE 2015
TABLE I
SETS OF EQUAL γS FOR RADIX-4 COMPUTATION
V. PROPOSED SCHEME
According to (16) and (17), the following expressions are
obtained:
REFERENCES
[1] C. Berrou and A. Glavieux, “Near optimum error correcting coding
and decoding: Turbo-codes,” IEEE Trans. Commun., vol. 44, no. 10,
pp. 1261–1271, Oct. 1996.
[2] S. Belfanti, C. Roth, M. Gautschi, C. Benkeser, and Q. Huang, “A 1 Gbps
LTE-advanced turbo-decoder ASIC in 65 nm CMOS,” in Proc. VLSIC
Symp., Jun. 2013, pp. C284–C285.
[3] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear
codes for minimizing symbol error rate (Corresp.),” IEEE Trans. Inf.
Theory, vol. IT-20, no. 2, pp. 284–287, Mar. 1974.
[4] V. Franz and J. Anderson, “Concatenated decoding with a reduced-
search BCJR algorithm,” IEEE J. Sel. Areas Commun., vol. 16, no. 2,
pp. 186–195, Feb. 1998.
[5] J. Woodard and L. Hanzo, “Comparative study of turbo decoding techniques: An overview,” IEEE Trans. Veh. Technol., vol. 49, no. 6, pp. 2208–2233, Nov. 2000.
[6] C. Studer, “Iterative MIMO decoding: Algorithms and VLSI implementation aspects,” Ph.D. dissertation, Dept. Inform. Technol. Elect. Eng., ETH Zurich, Zurich, Switzerland, Jun. 2009.
[7] L. Li, R. Maunder, B. Al-Hashimi, and L. Hanzo, “A low-complexity turbo decoder architecture for energy-efficient wireless sensor networks,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 1, pp. 14–22, Jan. 2013.
[8] G. Fettweis and H. Meyr, “Parallel Viterbi algorithm implementation: Breaking the ACS-bottleneck,” IEEE Trans. Commun., vol. 37, no. 8, pp. 785–790, Aug. 1989.
[9] C. Studer, C. Benkeser, S. Belfanti, and Q. Huang, “Design and implementation of a parallel turbo-decoder ASIC for 3GPP-LTE,” IEEE J. Solid-State Circuits, vol. 46, no. 1, pp. 8–17, Jan. 2011.
[10] Z. Wang, “High-speed recursion architectures for MAP-based turbo decoders,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 4, pp. 470–474, Apr. 2007.
[11] C. Studer, S. Fateh, C. Benkeser, and Q. Huang, “Implementation trade-offs of soft-input soft-output MAP decoders for convolutional codes,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 11, pp. 2774–2783, Nov. 2012.
[12] M. Bickerstaff, L. Davis, C. Thomas, D. Garrett, and C. Nicol, “A 24 Mb/s radix-4 logMAP turbo decoder for 3GPP-HSDPA mobile wireless,” in Proc. IEEE ISSCC Tech. Dig. Papers, 2003, vol. 1, pp. 150–484.
[13] S. Papaharalabos, P. Mathiopoulos, G. Masera, and M. Martina, “On optimal and near-optimal turbo decoding using generalized max operator,” IEEE Commun. Lett., vol. 13, no. 7, pp. 522–524, Jul. 2009.
[14] J.-F. Cheng and T. Ottosson, “Linearly approximated log-MAP algorithms for turbo decoding,” in Proc. IEEE VTC—Spring, Tokyo, Japan, May 2000, vol. 3, pp. 2252–2256.
[15] S. Talakoub, L. Sabeti, B. Shahrrava, and M. Ahmadi, “An improved max-log-MAP algorithm for turbo decoding and turbo equalization,” IEEE Trans. Instrum. Meas., vol. 56, no. 3, pp. 1058–1063, Jun. 2007.
[16] B. Classon, K. Blankenship, and V. Desai, “Channel coding for 4G systems with adaptive modulation and coding,” IEEE Wireless Commun., vol. 9, no. 2, pp. 8–13, Apr. 2002.
[17] W. Gross and P. Gulak, “Simplified MAP algorithm suitable for implementation of turbo decoders,” Electron. Lett., vol. 34, no. 16, pp. 1577–1578, Aug. 1998.
[18] H. Wang, H. Yang, and D. Yang, “Improved Log-MAP decoding algorithm for turbo-like codes,” IEEE Commun. Lett., vol. 10, no. 3, pp. 186–188, Mar. 2006.
[19] C.-C. Wong, M.-W. Lai, C.-C. Lin, H.-C. Chang, and C.-Y. Lee, “Turbo decoder using contention-free interleaver and parallel architecture,” IEEE J. Solid-State Circuits, vol. 45, no. 2, pp. 422–432, Feb. 2010.
[20] M. Martina, S. Papaharalabos, P. Mathiopoulos, and G. Masera, “Simplified log-MAP algorithm for very low-complexity turbo decoder hardware architectures,” IEEE Trans. Instrum. Meas., vol. 63, no. 3, pp. 531–537, Mar. 2014.

max-log-MAP algorithm, the scaled max-log-MAP is normally used, which reduces the performance gap between the max-log-MAP and log-MAP algorithms to 0.1 dB, whereas near-optimal performance to the log-MAP algorithm provides some advantages over the scaled max-log-MAP, as detailed in [13]. To this point, many architectures based on the precise approximation of the log-MAP algorithm have been presented in the literature. The proposed MSR method can also be applied to these architectures [13]–[18], and their MSR versions result in up to 18.1% lower area without any performance degradation. It is worth mentioning that the architectures shown in Table II result in different bit error rate performances (see [13]).

Moreover, in order to increase the throughput, a higher radix architecture can be used. In [19], a two-stage technique for higher radix computation is proposed, where two serial radix-$2^{P_T/2}$ ACS stages are used instead of one radix-$2^{P_T}$ recursive computation, presenting a 55% reduction in the final silicon area for $P_T = 4$. The MSR method, which is proposed in this brief, can also be used on top of this two-stage technique, providing an additional 15% saving in the silicon area when the conventional ACS architecture is used. In fact, since the proposed MSR technique is devised for radix-4 ACS architectures, a higher radix ACS architecture can be implemented by dividing it into a few radix-4 ACS blocks. For instance, a radix-16 ACS architecture, i.e., $P_T = 4$, may be implemented using two radix-4 architectures. As for the other values of $P_T$, e.g., $P_T = 3$, the radix-$2^2 \times 2^1$ architecture may be used, where the MSR is applied to the radix-$2^2$ stage for area reduction purposes. In [20], the higher radix ACS architecture is constructed based on a basic architecture referred to as maximum value generator 2 (MVG-2). The MSR method can also be applied to the MVG-2 architecture, resulting in a lower silicon area compared with its original architecture.

VII. CONCLUSION

In this brief, by investigating the relation between the recursion computations, a novel method has been proposed,