TCAS_CUI_2020
Abstract— The low-density parity-check (LDPC) code, a very promising error-correction code, has been adopted as the channel coding scheme in the fifth-generation (5G) new radio. However, it is very challenging to design a high-performance decoder for 5G LDPC codes because their numerous inherent degree-1 variable nodes are highly prone to errors. In this article, the problem is solved gracefully by developing a low-complexity check-node update function that greatly improves the reliability of check-to-variable messages. By further incorporating the proposed column degree adaptation strategy, our decoder offers a 0.4dB performance gain over existing ones. In addition, this article presents an efficient 5G LDPC decoder architecture. Benefiting from the specific structure of 5G LDPC codes, layer merging, a split storage method, and a selective-shift structure are introduced to significantly reduce decoding delay and area consumption. Implementation results in 90-nm CMOS technology demonstrate that the proposed decoder architecture improves the throughput-to-area ratio by up to 173.3% compared to a conventional design.

Index Terms— Low-density parity-check codes, 5G LDPC decoder, high-performance, VLSI implementation.

I. INTRODUCTION

coding scheme in the enhanced mobile broadband (eMBB) scenario [5]. LDPC codes can perform close to the Shannon limit when paired with the belief propagation (BP) decoding algorithm [6]. However, the BP algorithm involves complex non-linear functions in check-node (CN) processing, leading to large implementation complexity. As an alternative, the min-sum (MS) algorithm [7] was proposed and became the primary solution in practical applications. By approximating the non-linear functions with simple summation and comparison operations, the MS algorithm achieves a significant complexity reduction at the cost of a noticeable performance loss. By introducing a correction factor into decoding, the normalized MS (NMS) and offset MS (OMS) algorithms offer a better balance between decoding complexity and performance [8].

This article targets the design of an area-efficient and high-performance 5G LDPC decoder. In general, 5G LDPC codes are built from a concatenation of a high-rate LDPC code and a low-density generator matrix (LDGM) code [9]. Since the variable nodes (VNs) in the LDGM part are degree-1 VNs, which can only receive one check-to-variable (CTV) message
Authorized licensed use limited to: UNIVERSITE DE CERGY PONTOISE. Downloaded on December 03,2020 at 17:12:01 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
CUI et al.: DESIGN OF HIGH-PERFORMANCE AND AREA-EFFICIENT DECODER FOR 5G LDPC CODES 3
matrix and adopts a dual diagonal structure for parity bits to simplify the encoding process. The extension part has an equal number of VNs and CNs, and all extension VNs are degree-1 nodes. O denotes an all-zero matrix and I denotes an identity matrix. The core checks usually have a higher row degree than the extension checks. The leftmost two columns of the base matrix correspond to the punctured bits, also known as the state bits. One important feature of 5G LDPC codes is that they are extremely irregular, which means there are significant differences among row degrees and among column degrees. For instance, in base matrix BG2, $d_v$ varies from 1 to 23 and $d_c$ varies from 3 to 10.

C. Fixed-Point LDPC Decodings

Assume an LDPC codeword $\mathbf{c} = \{c_0, c_1, \ldots, c_{N-1}\}$ is transmitted over the additive white Gaussian noise (AWGN) channel using binary phase shift keying (BPSK) modulation. The received vector $\mathbf{y}$ is

$$y_i = x_i + n_i, \quad n_i \sim \mathcal{N}(0, \sigma^2), \quad i = 0, 1, \ldots, N-1, \tag{1}$$

where $x_i = 1 - 2c_i$ and $n_i$ is a Gaussian random variable with zero mean and variance $\sigma^2$. In fixed-point implementations, the quantized version of $\mathbf{y}$, denoted by $\boldsymbol{\gamma}$, is typically the input to the decoder. Let $\mathcal{M}$ represent the input alphabet, which comprises integers; then $\gamma_i = [\mu \cdot y_i]_{\mathcal{M}}$, where $\mu > 0$ is a constant referred to as the gain factor and $[x]_{\mathcal{M}}$ returns the integer in $\mathcal{M}$ closest to $x$. If the input messages are expressed with $q$ bits, then $\mathcal{M} = \{-Q, \ldots, -1, 0, 1, \ldots, +Q\}$, where $Q = 2^{q-1} - 1$. In fact, $\mu = 2^q$ means that all channel LLR values are shifted $q$ bits to the left and then rounded to integers, which is the same as the usual quantization method when $q$ fraction bits are preserved. Moreover, the introduced quantization method is more flexible because $\mu$ can be optimized to values other than $2^q$ for better decoding performance [23].

Let $\alpha_{m,n}$ and $\beta_{n,m}$ denote the messages passed from the $m$th CN to the $n$th VN and from the $n$th VN to the $m$th CN, respectively, and let $\tilde{\boldsymbol{\gamma}}$ denote the a-posteriori-probability (APP) vector. The exchanged messages $\alpha_{m,n}$ and $\beta_{n,m}$ are quantized to $q$ bits. Since the APP messages are generally larger than the input and exchanged messages, $\tilde{\gamma}_n$ is quantized to $\tilde{q}$ bits, where $\tilde{q} > q$, to avoid clipping. $\mathcal{A} = \{-\tilde{Q}, \ldots, -1, 0, 1, \ldots, +\tilde{Q}\}$ denotes the alphabet for $\tilde{\boldsymbol{\gamma}}$, where $\tilde{Q} = 2^{\tilde{q}-1} - 1$.

The decoding process of the layered schedule is described as follows.

1) Initialization: Assign the values of the input vector $\boldsymbol{\gamma}$ to the APP vector $\tilde{\boldsymbol{\gamma}}$. Moreover, all CTV messages $\alpha_{m,n}$ are initialized with zeros.

2) Iterative Process: In the layered schedule, each iteration comprises several decoding layers. The decoding is executed layer by layer, and each layer has three steps.

Step 1 (VN update): In the $t$th iteration, the variable-to-check (VTC) message $\beta_{n,m}^{(t)}$ is calculated by

$$\beta_{n,m}^{(t)} = \big[\tilde{\beta}_{n,m}^{(t)}\big]_{\mathcal{M}} = \big[\tilde{\gamma}_n^{(t)} - \alpha_{m,n}^{(t-1)}\big]_{\mathcal{M}}. \tag{2}$$

Step 2 (CN update): In BP decoding, the CTV message is given by

$$\alpha_{m,n}^{(t)} = \tau_{m,n}^{(t)} \cdot \phi\Big(\sum_{n' \in \mathcal{N}(m)\setminus n} \phi\big(|\beta_{n',m}^{(t)}|\big)\Big), \tag{3}$$

where $\tau_{m,n}^{(t)} = \prod_{n' \in \mathcal{N}(m)\setminus n} \operatorname{sgn}(\beta_{n',m}^{(t)})$ and $\phi(x) = -\log[\tanh(x/2)]$. Considering that $\phi^{-1}(x) = \phi(x)$ and that the magnitude of $\alpha_{m,n}^{(t)}$ is dominated by the minimum input $|\beta_{n',m}^{(t)}|$ [24], MS simplifies (3) to

$$\alpha_{m,n}^{(t)} \approx \tau_{m,n}^{(t)} \cdot \phi\Big(\phi\big(\min_{n' \in \mathcal{N}(m)\setminus n} |\beta_{n',m}^{(t)}|\big)\Big) = \tau_{m,n}^{(t)} \cdot \min_{n' \in \mathcal{N}(m)\setminus n} |\beta_{n',m}^{(t)}|. \tag{4}$$

Since $\phi(|\beta_{n',m}^{(t)}|) > 0$, we have $\phi\big(\min_{n'} |\beta_{n',m}^{(t)}|\big) < \sum_{n'} \phi(|\beta_{n',m}^{(t)}|)$, where $n'$ ranges over $\mathcal{N}(m)\setminus n$ in both terms. Moreover, because $\phi(x)$ is a decreasing function, it can be deduced from (3) and (4) that MS decoding overestimates the magnitudes of CTV messages compared to BP decoding, leading to performance degradation [11]. To alleviate the overestimation, an offset factor is included in OMS decoding, as shown in (5):

$$\alpha_{m,n}^{(t)} = \tau_{m,n}^{(t)} \cdot \max\Big(\min_{n' \in \mathcal{N}(m)\setminus n} |\beta_{n',m}^{(t)}| - \lambda,\ 0\Big), \tag{5}$$

where $\lambda$ denotes the offset factor. In fixed-point implementations, $\lambda$ is typically fixed to 1, which is the least significant bit (LSB) under the integer representation [20].

The only difference between the AMS and OMS algorithms is the CN processing procedure. To reduce the error probability of degree-1 VNs, the AMS decoder processes the core checks and extension checks differently using different offset factors [10], as shown in (6). For the core checks, $\lambda$ is set to 1 to obtain the gain from the offset principle, while $\lambda$ is set to 0 for the extension checks to reduce the offset effect on these VNs.

$$\alpha_{m,n}^{(t)} = \begin{cases} \text{applying (5)}, & \text{for the core checks} \\ \text{applying (4)}, & \text{for the extension checks.} \end{cases} \tag{6}$$

Step 3 (APP update): In order to achieve better precision, $\tilde{\beta}_{n,m}^{(t)}$ in (2) is used to update the APP values according to

$$\tilde{\gamma}_n^{(t)} = \big[\alpha_{m,n}^{(t)} + \tilde{\beta}_{n,m}^{(t)}\big]_{\mathcal{A}}, \tag{7}$$

where the function $[\cdot]_{\mathcal{A}}$ ensures that the updated APP values are taken from alphabet $\mathcal{A}$.

After all layers have been processed, the tentative codeword $\hat{\mathbf{c}}^{(t)}$ is obtained by applying the hard decision to the vector $\tilde{\boldsymbol{\gamma}}^{(t)}$ according to

$$\hat{c}_n^{(t)} = HD(\tilde{\gamma}_n^{(t)}) = \begin{cases} 0, & \tilde{\gamma}_n^{(t)} \geq 0 \\ 1, & \tilde{\gamma}_n^{(t)} < 0. \end{cases}$$

The decoding stops when all parity-check equations are satisfied or the maximum number of iterations $It_{max}$ is reached.
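The layered fixed-point procedure above can be sketched in Python as follows. This is a minimal, non-authoritative illustration under stated assumptions: the parity-check matrix is an arbitrary toy matrix given as lists of column indices (not a 5G base graph), each row is treated as one layer, a per-row `is_core` flag selects (5) with λ = 1 or (4) as in the AMS rule (6), rounding follows Python's `round` (the text does not specify a tie rule), zero-valued messages are treated as positive, and the helper names (`sat`, `quantize`, `cn_update`, `layered_decode`) are hypothetical.

```python
def sat(x, bits):
    """Saturate x to {-(2**(bits-1) - 1), ..., +(2**(bits-1) - 1)}."""
    lim = 2 ** (bits - 1) - 1
    return max(-lim, min(lim, x))


def quantize(y, mu, q):
    """gamma_i = [mu * y_i]_M: round to the nearest integer, then saturate to q bits."""
    return [sat(int(round(mu * v)), q) for v in y]


def cn_update(betas, offset):
    """MS update (4) when offset == 0, OMS update (5) when offset > 0.

    The sign of a zero message is taken as positive (an assumption).
    """
    out = []
    for k in range(len(betas)):
        others = betas[:k] + betas[k + 1:]          # extrinsic inputs N(m) \ n
        sign = -1 if sum(1 for b in others if b < 0) % 2 else 1
        mag = min(abs(b) for b in others)
        out.append(sign * max(mag - offset, 0))
    return out


def layered_decode(rows, is_core, gamma, q=4, q_app=6, it_max=15):
    """Layered AMS-style decoding with one layer per row of a toy matrix.

    rows    -- list of rows, each a list of the VN (column) indices it checks
    is_core -- per-row flag: core rows use (5) with lambda = 1, others use (4)
    gamma   -- quantized channel input produced by quantize()
    """
    app = list(gamma)                              # initialization: APP <- input
    alpha = [[0] * len(cols) for cols in rows]     # all CTV messages start at 0
    c_hat = [0 if v >= 0 else 1 for v in app]
    for it in range(1, it_max + 1):
        for m, cols in enumerate(rows):
            # Step 1, (2): the unsaturated beta~ is kept for the APP update
            beta_t = [app[n] - alpha[m][k] for k, n in enumerate(cols)]
            beta = [sat(b, q) for b in beta_t]
            # Step 2, (6): offset 1 on core checks, plain MS on extension checks
            alpha[m] = cn_update(beta, 1 if is_core[m] else 0)
            # Step 3, (7): APP update saturated to q_app bits
            for k, n in enumerate(cols):
                app[n] = sat(alpha[m][k] + beta_t[k], q_app)
        c_hat = [0 if v >= 0 else 1 for v in app]  # hard decision HD(.)
        if all(sum(c_hat[n] for n in cols) % 2 == 0 for cols in rows):
            return c_hat, it                       # all parity checks satisfied
    return c_hat, it_max
```

For example, with the toy rows `[[0, 1, 2, 4], [0, 1, 3, 5], [0, 2, 3, 6]]`, a hypothetical gain factor μ = 2 and one weakly flipped sample in an all-zero codeword, `quantize([1.0, 1.0, -0.3, 1.0, 1.0, 1.0, 1.0], 2, 4)` gives `[2, 2, -1, 2, 2, 2, 2]`, and `layered_decode` recovers the all-zero word in the first iteration. Keeping the unsaturated β̃ for (7) while saturating β to q bits for the CN mirrors the distinction the text draws between (2) and (7).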
III. THE PROPOSED IAMS DECODING ALGORITHM

A. The Modified CN-Update Function

As mentioned above, all extension VNs in 5G LDPC codes have degree 1, and each is connected to a unique CN. Consequently, these VNs receive only one CTV message in each iteration, so they are sensitive to the reliability of CTV messages and to the choice of offset factor. In fixed-point implementations, the offset factor is generally not optimal, so the reliability of CTV messages is limited by the limited bit representation of messages; this is the main reason for the severe performance degradation of the fixed-point OMS decoder. In order to improve the performance of 5G LDPC decoders, in this subsection we propose a new CN-update function that improves the reliability of CTV messages.

Denote the first and second minimum magnitudes of the input VTC messages in a CN by $min_1$ and $min_2$, respectively. In order to keep the computational complexity low, we only use these two values, which are already available in a conventional MS decoder, to design the new CN-update function. Let $idx_1$ and $idx_2$ be the indices of the VNs corresponding to $min_1$ and $min_2$, respectively. $\mathcal{I}(m)$ is defined as $\mathcal{I}(m) = \{idx_1, idx_2\}$ and $\bar{\mathcal{I}}(m) = \mathcal{N}(m) \setminus \mathcal{I}(m)$. Observing (3), we notice that, for $n \in \bar{\mathcal{I}}(m)$, both $min_1$ and $min_2$ are extrinsic VTC messages used to calculate the CTV message $\alpha_{m,n}^{(t)}$. Since the magnitude of $\alpha_{m,n}^{(t)}$ is dominated by the minimum magnitude of the extrinsic VTC messages, sufficient precision can be achieved if the first and second minimum magnitudes of the extrinsic VTC messages are both employed to approximate the CN-update function of the BP algorithm. Therefore, for $n \in \bar{\mathcal{I}}(m)$, we approximate the CN-update function (3) by

$$\alpha_{m,n}^{(t)} = \tau_{m,n}^{(t)} \cdot \phi\big(\phi(min_1) + \phi(min_2)\big). \tag{8}$$

It can be seen that the overestimation of the CTV messages in the MS algorithm is alleviated by (8), since more extrinsic VTC messages are included. Based on the approximate-min* algorithm proposed in [15], (8) can also be written as

$$\alpha_{m,n}^{(t)} = \tau_{m,n}^{(t)} \cdot (min_1 \boxplus min_2), \tag{9}$$

where $x \boxplus y = \min(|x|, |y|) - \log\frac{1 + e^{-|x-y|}}{1 + e^{-|x+y|}}$. In fact, (9) can be viewed as MS decoding with an offset factor that is inherently optimized by BP decoding. For simplicity, let $a$ represent $min_1$, so that $min_2 = a + \Delta$. Therefore, the offset factor $\lambda$ in (9) is

$$\lambda = \log\frac{1 + e^{-\Delta}}{1 + e^{-(2a+\Delta)}}.$$

Since $min_1$ and $min_2$ are both non-negative integers in fixed-point implementations, $a$ and $\Delta$ are also non-negative integers. Therefore, we can conclude that $\lambda \geq 0$, so the quantized version of $\lambda$ is

$$\lambda = \left\lfloor \log\frac{1 + e^{-\Delta}}{1 + e^{-(2a+\Delta)}} + \frac{1}{2} \right\rfloor.$$

Property: The offset factor $\lambda$ equals 1 only when $min_1$ and $min_2$ are both strictly positive and equal. Otherwise, $\lambda = 0$.

Proof: To prove this property, we consider three cases.

Case 1: $min_1 = 0$. In this case, $a = 0$. Then,

$$\lambda = \left\lfloor \log\frac{1 + e^{-\Delta}}{1 + e^{-\Delta}} + \frac{1}{2} \right\rfloor = 0.$$

Case 2: $min_1 > 0$ and $min_1 \neq min_2$. In this case, $a \geq 1$ and $\Delta \geq 1$. Since the ratio $\frac{1 + e^{-\Delta}}{1 + e^{-(2a+\Delta)}}$ decreases as $\Delta$ grows,

$$\log\frac{1 + e^{-\Delta}}{1 + e^{-(2a+\Delta)}} \leq \log\frac{1 + e^{-1}}{1 + e^{-(2a+1)}} < \log(1 + e^{-1}) \;\Rightarrow\; \log\frac{1 + e^{-\Delta}}{1 + e^{-(2a+\Delta)}} + \frac{1}{2} < \log(1 + e^{-1}) + \frac{1}{2} \approx 0.8133 \;\Rightarrow\; \lambda = 0.$$

Case 3: $min_1 > 0$ and $min_1 = min_2$. In this case, $a \geq 1$ and $\Delta = 0$. Substituting $\Delta = 0$, we have

$$\log\frac{1 + e^{-\Delta}}{1 + e^{-(2a+\Delta)}} = \log\frac{2}{1 + e^{-2a}} < \log 2 \;\Rightarrow\; \log\frac{1 + e^{-\Delta}}{1 + e^{-(2a+\Delta)}} + \frac{1}{2} < \log 2 + \frac{1}{2} \approx 1.1931 \;\Rightarrow\; \lambda < 2.$$

Also, since $a \geq 1$,

$$\log\frac{2}{1 + e^{-2a}} \geq \log\frac{2}{1 + e^{-2}} \;\Rightarrow\; \log\frac{1 + e^{-\Delta}}{1 + e^{-(2a+\Delta)}} + \frac{1}{2} \geq \log\frac{2}{1 + e^{-2}} + \frac{1}{2} \approx 1.0662 \;\Rightarrow\; \lambda \geq 1.$$

Therefore, $\lambda = 1$.

Based on this property, the offset factor for $n \in \bar{\mathcal{I}}(m)$ can be determined from $min_1$ and $min_2$. For $n \in \mathcal{I}(m)$, a more precise correction factor cannot be obtained from $min_1$ and $min_2$ alone. Since the MS decoder performs better than the OMS decoder on 5G LDPC codes in fixed-point implementations [10], $\lambda$ is set to 0 for $n \in \mathcal{I}(m)$. The proposed CN-update function is shown in (10); it retains the low-complexity property.

$$\alpha_{m,n}^{(t)} = \begin{cases} \tau_{m,n}^{(t)} \cdot min_2, & n = idx_1 \\ \tau_{m,n}^{(t)} \cdot min_1, & n = idx_2 \\ \tau_{m,n}^{(t)} \cdot \max(min_1 - 1, 0), & n \in \bar{\mathcal{I}}(m) \ \&\ \Delta = 0 \\ \tau_{m,n}^{(t)} \cdot min_1, & n \in \bar{\mathcal{I}}(m) \ \&\ \Delta \neq 0. \end{cases} \tag{10}$$

To demonstrate the effectiveness of the proposed CN-update function, the mismatch probabilities of different CN-update functions are shown in Fig. 2, where the exchanged messages are quantized to 4 bits, i.e., $q = 4$. Therefore, $|\beta_{n,m}|$ can only take values from 0 to 7, so the total number of combinations of the received messages in a degree-$d_c$ CN is $8^{d_c}$ ($= 2^{(q-1) \cdot d_c}$). For each combination, if the CTV value calculated by the tested decoder is not equal to the CTV value calculated by the 4-bit quantized BP decoder, we count it as a mismatch case. The mismatch probability is obtained by testing all $8^{d_c}$ cases and then calculating the proportion of mismatch cases.
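The Property and the CN-update rule (10) can be checked with a short Python sketch. It assumes the natural logarithm, the floor(x + 1/2) rounding used for the quantized λ above, q = 4 (magnitudes 0 to 7), and a positive sign for zero-valued messages; the function names are hypothetical, and the code is an illustration rather than the authors' CNU.

```python
import math


def offset_lambda(a, delta):
    """Quantized offset factor of (9): floor(log((1+e^-D)/(1+e^-(2a+D))) + 1/2)."""
    ratio = (1 + math.exp(-delta)) / (1 + math.exp(-(2 * a + delta)))
    return math.floor(math.log(ratio) + 0.5)


def iams_cn_update(betas):
    """CTV messages of one CN following (10), from min1, min2, idx1 and idx2 only."""
    mags = [abs(b) for b in betas]
    idx1 = min(range(len(mags)), key=mags.__getitem__)
    idx2 = min((k for k in range(len(mags)) if k != idx1), key=mags.__getitem__)
    min1, min2 = mags[idx1], mags[idx2]
    neg = sum(1 for b in betas if b < 0) % 2      # parity of negative inputs
    out = []
    for k, b in enumerate(betas):
        tau = -1 if neg ^ (b < 0) else 1          # sign of the other inputs
        if k == idx1:
            mag = min2
        elif k == idx2:
            mag = min1
        elif min1 == min2:                        # Delta = 0: offset of 1
            mag = max(min1 - 1, 0)
        else:                                     # Delta != 0: no offset
            mag = min1
        out.append(tau * mag)
    return out


# Exhaustive check of the Property over all 3-bit magnitudes (q = 4),
# with min2 = a + delta kept within 0..7.
for a in range(8):
    for delta in range(8 - a):
        expected = 1 if (a > 0 and delta == 0) else 0
        assert offset_lambda(a, delta) == expected
```

For instance, `iams_cn_update([3, -2, 5, 4])` returns `[-2, 3, -2, -2]`, while `iams_cn_update([3, -3, 5, 4])` (where $min_1 = min_2$) returns `[-3, 3, -2, -2]`. Only $min_1$, $min_2$, $idx_1$, $idx_2$ and the input signs are used, which is why the IAMS decoder has to keep $idx_2$ in addition to the usual MS quantities.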
From Fig. 2, we can see that, compared to the MS, OMS, and AMS decodings, the proposed CN-update function shows a much lower mismatch probability in the simulated row degree region, which is also the row degree region of BG2 codes. Therefore, the reliability of CTV messages is significantly improved, especially for the extension checks. It can also be seen that, compared with the MS decoder, the OMS decoder shows a higher mismatch probability for the extension checks but a lower one for the core checks. The AMS decoder [10] combines the advantages of the MS and OMS decoders, which explains its performance improvement.

B. Column Degree Adaptation

As stated before, 5G LDPC codes are extremely irregular, and there is a wide variation in column degrees: the column degree varies from 1 to 23 in base matrix BG2 and from 1 to 30 in BG1. With more neighboring CNs, high-degree VNs usually have larger APP magnitudes, which are called strong messages. These strong messages can be helpful or harmful to the decoding process, depending on whether they are correct. In the waterfall region, where many bits are received incorrectly, incorrect strong messages tend to hinder the correction of the received bits. In the error-floor region, where channel conditions are good and trapping sets dominate the decoding performance [25], correct strong messages can overcome the incorrect messages in trapping sets and thus help improve the decoding performance [26]. Therefore, the requirement for strong messages differs between SNR regions.

In order to manage the influence of strong messages on the decoding process, we propose a column degree adaptation strategy in which the CTV messages passed from a CN to different VNs are computed non-uniformly. Observing (5) and (10), we can conclude that the magnitudes of CTV messages computed by OMS decoding are generally smaller than those computed by the proposed CN-update function. To limit the growth of the magnitudes of strong messages, the CTV messages transmitted to VNs whose degrees are larger than a threshold D are computed using the CN-update function of OMS decoding rather than the proposed CN-update function. To avoid over-correcting strong messages, the column degree adaptation is only applied to core checks, whose CTV messages show a lower mismatch probability than those of extension checks under OMS decoding, as shown in Fig. 2. Consequently, the influence of strong messages on the decoding process can be managed to some extent by adjusting the parameter D, and the decoding performance can be better balanced between the waterfall and error-floor regions. Moreover, one can select a proper D to obtain the best performance in the required SNR region.

The effectiveness of the column degree adaptation is illustrated in Fig. 3. In this work, we divide the codeword into $N_b$ groups, and each group corresponds to a column in base matrix $\mathbf{H}_B$. Consequently, a group consists of $Z$ bits, and the bits in each group have the same column degree. In simulations, a group is considered erroneous if it contains at least one error bit. Since 5G LDPC codes are extremely irregular, the degrees of bits vary greatly, so bits with different column degrees may perform differently under the same decoding algorithm. Because bits in different groups may have different column degrees, Fig. 3 shows the error rate of each group at $E_b/N_0$ = 2.0dB. The R = 1/5, Z = 52, N = 2600 5G LDPC code defined by BG2 is used, and all decodings are quantized with parameters $(q, \tilde{q}) = (4, 6)$. For each decoding, at least 1000 error frames are collected. We denote the decoding where only the proposed CN-update function is applied as M1, and the decoding where both the proposed CN-update function and column degree adaptation are applied as M2, namely the IAMS algorithm. The parameter D is selected by traversing all row degrees of the code to find the value that yields the best performance in Monte-Carlo simulations. For the selected code, D = 6.

Since the degrees of bits in the first two groups are much larger than those of the others, these bits have more chances to be corrected, so the first two groups show the best performance, especially for the OMS and M2 decoders. Since Fig. 3 shows simulation results in the low-SNR region, where many bits are received incorrectly, the propagation of incorrect strong messages has a larger negative influence on decoding than the imprecise offset factor. Therefore, the OMS decoder performs better than the MS decoder. However, they both perform worse than the AMS decoder [10], which is the state-of-the-art one
12: for $n = 0$ to $N-1$ do
13:     $\hat{c}_n^{(t)} = HD(\tilde{\gamma}_n^{(t)})$
14: if $\hat{\mathbf{c}}^{(t)} \cdot \mathbf{H}^T = \mathbf{0}$ then
15:     break
output: $\hat{\mathbf{c}}^{(t)}$

IV. NUMERICAL RESULTS

In this section, the decoding performance of the proposed IAMS algorithm is illustrated and compared with the MS, OMS, and AMS decodings. All decodings use the layered schedule. In practical applications, the number of quantization bits used in LDPC decoders is usually no more than 6 in order to reduce area and power consumption. Therefore, the quantization parameters are set to $(q, \tilde{q}) = (4, 6)$ in this work. Moreover, the performance of the floating-point MS and OMS algorithms, which also use the layered schedule, is shown for reference. The offset value for the floating-point OMS decoding is set to 0.2. The simulation results are obtained through Monte-Carlo simulations that generate at least 100 error frames for each plotted point. Because the fraction of degree-1 bits is very small in high-rate 5G LDPC codes, while our approach targets the performance of the degree-1 bits, the proposed decoder is more suitable for low-to-moderate-rate 5G LDPC codes. Therefore, we consider two 5G LDPC codes with different rates and lengths: an R = 1/5, Z = 52, N = 2600 BG2 code and an R = 2/3, Z = 104, N = 3432 BG1 code. For simplicity, it is assumed that the codeword is sent only once, without any hybrid automatic repeat request (HARQ) scheme.

A. Performance Comparisons

Since the maximum number of iterations is typically less than 20 in practical implementations because of throughput requirements, while the decoders need about 100 iterations to saturate, Fig. 4 and Fig. 5 show the simulation results on the R = 1/5, Z = 52, N = 2600 BG2 code for $It_{max}$ = 15 and $It_{max}$ = 100, respectively. For a fair comparison, the channel gain factor of each decoding is fixed and optimized by simulations to find the value that performs best at FER = $10^{-7}$, with a search step of 0.05. The optimal values for the OMS, MS, AMS, and IAMS decoders are 1.3, 1.1, 0.85, and 0.8, respectively. Due to the imprecise offset factor, the OMS decoder suffers from severe performance degradation under (4, 6) quantization, which could be compensated by increasing the quantization length by one bit. Compared to AMS decoding, the proposed IAMS decoding shows much better performance. When the threshold D is well selected, the performance gain could be 0.4dB in the
TABLE I
THE VALUES OF RECEIVED CTV MESSAGES FOR A VN BELONGING TO A TRAPPING SET
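Returning to the column degree adaptation of Section III-B, a minimal Python sketch of its per-edge rule follows. For a core check, edges to VNs whose column degree exceeds the threshold D receive the OMS message (5) with λ = 1; every other edge receives the message of the proposed CN-update function (10); extension checks always use (10). The function name, its arguments, and the zero-is-positive sign convention are assumptions made for illustration. The default D = 6 is the value reported for the R = 1/5 BG2 code, and the example degrees below are arbitrary.

```python
def adaptive_cn_update(betas, vn_degrees, is_core, D=6):
    """CTV messages of one CN under column degree adaptation (IAMS sketch).

    betas      -- saturated VTC messages entering the CN
    vn_degrees -- column degree of the VN attached to each edge
    is_core    -- True for a core check, False for an extension check
    D          -- column degree threshold
    """
    mags = [abs(b) for b in betas]
    idx1 = min(range(len(mags)), key=mags.__getitem__)
    idx2 = min((k for k in range(len(mags)) if k != idx1), key=mags.__getitem__)
    min1, min2 = mags[idx1], mags[idx2]
    neg = sum(1 for b in betas if b < 0) % 2          # parity of negative inputs
    out = []
    for k, b in enumerate(betas):
        tau = -1 if neg ^ (b < 0) else 1              # sign of the other inputs
        extrinsic_min = min2 if k == idx1 else min1   # min over the other edges
        if is_core and vn_degrees[k] > D:
            mag = max(extrinsic_min - 1, 0)           # OMS update (5), lambda = 1
        elif k in (idx1, idx2) or min1 != min2:
            mag = extrinsic_min                       # rows 1, 2 and 4 of (10)
        else:
            mag = max(min1 - 1, 0)                    # row 3 of (10): Delta = 0
        out.append(tau * mag)
    return out
```

With `betas = [3, -2, 5, 4]` and hypothetical degrees `[8, 2, 12, 3]`, a core check returns `[-1, 3, -1, -2]`: the two edges with degree above 6 get the smaller OMS magnitude, which is how the strategy limits the growth of strong messages. The same inputs on an extension check return `[-2, 3, -2, -2]`, i.e., plain (10).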
so one can easily modify the proposed decoder architecture to support different quantization schemes when the number of quantization bits remains unchanged.

For a QC-LDPC code defined by an $M_b \times N_b$ base matrix $\mathbf{H}_B$, the number of decoding layers $L$ usually equals $M_b$. Therefore, the parallelism degree of the decoder equals the expansion factor $Z$. As shown in Fig. 11, the input and exchanged messages are quantized to $q$ bits and the APP values are quantized to $\tilde{q}$ bits. All control signals are generated by the Controller. Two memory blocks, namely the APP memory and the CTV memory, are used to store the APP and CTV messages, respectively. The CTV memory is implemented with dual-port random access memory (DP-RAM) to support simultaneous read and write operations. In order to allow massively parallel read, write, and initialization operations, the APP memory is implemented with registers. In the proposed architecture, the APP memory is divided into three parts and the CTV memory consists of two parts. The reason for this configuration will be presented in the following subsections.

In each decoding layer, APP messages are first read from the APP memory and then passed to the Read Network, which rearranges and selects these messages according to the current processing layer to ensure they are processed by the proper left barrel shifters (LBSs) and VN units (VNUs). Similarly, the Write Network rearranges the updated APP messages to ensure they are stored at the correct addresses of the APP memory. Let $d_{c,\max}$ denote the maximum row degree of the code. In the proposed architecture, $d_{c,\max}$ pairs of LBSs and VNUs are used. Messages output from the Read Network are first left-rotated by the LBSs according to the corresponding shift factors and then passed to the VNUs to calculate the VTC messages. By adopting the shift-factor generation method presented in [20], the data write-back barrel shifters can be eliminated.

After being saturated to $q$ bits, the VTC messages are sent to the CN unit (CNU), which generates the CTV messages. The CNU is implemented using the area-efficient architecture proposed in [28]. In the IAMS decoder, $idx_2$ must also be calculated and stored, which is the main difference from other decoders. As shown in Fig. 12, CTV messages are stored in a compressed format to reduce the memory cost. Therefore, the width of the CTV memory is $z \times (d_{c,\max} + 2 \cdot (q - 1 + \log_2 d_{c,\max}))$. Since the CTV messages of all layers need to be stored, the depth of the CTV memory is $L$. In order to convert the CTV messages from the compressed format to the uncompressed format, two de-compressors are inserted into the decoder to generate the final CTV messages for the following calculations. Then, the APP values can be updated. After they are written back to the APP memory, one decoding layer is finished.

To minimize the number of clock cycles, no pipeline is inserted into the proposed architecture. Therefore, one decoding layer takes one clock cycle, and the total number of clock cycles is $L \times It_{max}$. The throughput $\theta$ is computed as

$$\theta = \frac{f \times N}{L \times It_{max}}, \tag{11}$$

where $f$ denotes the frequency of the decoder.

Fig. 12. Compressed format of CTV messages.

B. Memory and Clock Cycles Reduction

By observing the structure of the BG1 and BG2 matrices, it can be found that part of each matrix has the orthogonality property, meaning that no VN is connected to two consecutive layers. For instance, the 21st to 46th rows of the BG1 matrix are orthogonal. Similarly, the 21st to 42nd rows of the BG2 matrix are orthogonal. In two orthogonal layers, the APP messages updated in the previous layer are not used in the next layer. Therefore, the decoding processes of two such layers are independent. Based on this feature of 5G LDPC codes, we propose a layer merging method to reduce the number of clock cycles. A similar idea was also applied in [29] to optimize a pipelined decoder for the IEEE 802.11ad standard. However, the configurations of these two architectures are different.

In the proposed architecture, two consecutive layers in the orthogonal part are processed simultaneously. Therefore, the number of decoding layers in the orthogonal part is halved, which leads to fewer clock cycles. For BG1 and BG2 codes, the number of clock cycles can be reduced by 28.3% and 26.2%, respectively. Because the row degrees in the orthogonal part are all less than $d_{c,\max}/2$, no additional LBS or VNU is needed and the APP memory remains unchanged. However, since two layers are processed in one clock cycle, the CNU and CTV memory must be modified so that two sets of CTV messages can be generated and stored at the same time. Fig. 13 shows the architecture of the CNU, which is divided into two subunits. When two orthogonal layers are processed simultaneously, two sets of VTC messages are input to $\text{CNU}_{1st}$ and $\text{CNU}_{2nd}$, respectively. In this case, the Compare & Select unit is disabled, so two sets of CTV messages are output from the CNU. Let $d_{c,o}$ denote the maximum row degree in the orthogonal part. In order to store two sets of CTV messages at the same address, the width of the CTV memory is set to

$$W = \max\big\{z \times (d_{c,\max} + 2 \cdot (q - 1 + \log_2 d_{c,\max})),\ 2 \times z \times (d_{c,o} + 2 \cdot (q - 1 + \log_2 d_{c,o}))\big\}. \tag{12}$$

Table II shows the size of the CTV memory when $q = 4$. As can be seen, although the width of the CTV memory increases slightly after applying layer merging, the depth is reduced owing to the smaller number of layers. Therefore, besides reducing the number of clock cycles, the proposed layer merging method reduces the size of the CTV memory by 26.2% and 13.9% for BG1 and BG2 codes, respectively.

Considering that 5G LDPC codes are extremely irregular and the degrees of some layers are relatively small, setting the width of the CTV memory according to (12) leads to a great waste of memory resources. To further reduce the memory cost, we present a split storage method. As mentioned
Fig. 15. The structure of the APP memory for extension bits.
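The throughput formula (11), the CTV memory width (12), and the clock-cycle saving of layer merging can be checked with a short Python sketch. Assumptions: $z$ is the expansion factor, the $\log_2$ terms in (12) are rounded up to whole bits, the orthogonal rows are the 21st to 46th (BG1) and 21st to 42nd (BG2) rows quoted in Section V-B, BG1 and BG2 have 46 and 42 rows in total, and all function names are hypothetical. Any $d_c$ values passed to the width functions are placeholders, not the entries of Table II.

```python
import math


def throughput(f_hz, n_bits, num_layers, it_max):
    """Eq. (11): one layer per clock cycle, no pipeline."""
    return f_hz * n_bits / (num_layers * it_max)


def ctv_word_bits(z, dc, q):
    """Compressed CTV word for z CNs: dc sign bits, min1/min2 (q-1 bits each)
    and idx1/idx2 (ceil(log2 dc) bits each, rounding up is an assumption)."""
    return z * (dc + 2 * (q - 1 + math.ceil(math.log2(dc))))


def ctv_width_merged(z, dc_max, dc_o, q):
    """Eq. (12): one word must also hold two merged orthogonal layers."""
    return max(ctv_word_bits(z, dc_max, q), 2 * ctv_word_bits(z, dc_o, q))


def merged_layers(num_layers, first_orth, last_orth):
    """Layers left after processing rows first_orth..last_orth (0-based,
    inclusive) in pairs; an odd leftover row still takes its own cycle."""
    n_orth = last_orth - first_orth + 1
    return num_layers - n_orth // 2


def consecutive_rows_orthogonal(rows):
    """True if no column (VN) appears in two consecutive rows (layers)."""
    return all(not (set(a) & set(b)) for a, b in zip(rows, rows[1:]))


for name, layers, first, last in (("BG1", 46, 20, 45), ("BG2", 42, 20, 41)):
    merged = merged_layers(layers, first, last)
    saving = 100 * (layers - merged) / layers
    print(f"{name}: {layers} -> {merged} layers, {saving:.1f}% fewer cycles")
```

With these row ranges, merging leaves 33 of 46 layers for BG1 and 31 of 42 for BG2, i.e., 28.3% and 26.2% fewer clock cycles, consistent with the reductions stated in Section V-B. Because the cycle count in (11) is $L \times It_{max}$, the relative saving does not depend on $It_{max}$.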
TABLE III
ASIC SYNTHESIS RESULTS ON 90-nm CMOS TECHNOLOGY

TABLE IV
THE AREA OF EACH BLOCK
Fig. 16. Mapping relationship between the input and output messages in the
Read Network.
0.848 mm² to 0.343 mm² after applying the proposed modifications, a decrease of up to 59.6%. Moreover, the area of the CTV memory is reduced by 25.1%, which is less than the theoretical analysis (29.2%). This is mainly because the area of a DP-RAM is not fully determined by its data size, so the reduction of the total area is not strictly equal to that of the data. We also notice that the area of the APP memory increases slightly, which results from applying the selective-shift structure. However, considering that this structure greatly benefits the interconnection blocks, the overhead is acceptable.

VII. CONCLUSION

In this article, we propose a high-performance decoding algorithm, named the improved adapted min-sum algorithm, for fixed-point decoding of 5G LDPC codes. To reduce the error probability of degree-1 VNs, a new CN-update function is designed, and the column degree adaptation is proposed to alleviate the excessive growth of the posterior probability in high-degree VNs. As a result, the proposed decoder outperforms the state-of-the-art AMS decoder by 0.4dB in FER performance. We also present an efficient architecture for 5G LDPC decoders. First, the layer merging technique is applied based on the orthogonality property of the base matrix. Then, the split storage method is adopted to further reduce the CTV memory cost. Finally, the interconnection blocks are optimized by using the selective-shift structure and the message reordering method. Implementation results demonstrate that the proposed architecture improves the throughput-to-area ratio by 173.3%.

REFERENCES

[1] R. G. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory, vol. 8, no. 1, pp. 21–28, Jan. 1962.
[2] IEEE 802.11n Wireless LAN Medium Access Control MAC and Physical Layer PHY Specifications, Standard IEEE 802.11n-D2.0, 2007.
[3] Second Generation Framing Structure, Channel Coding and Modulation …
[13] K. Sun and M. Jiang, “A hybrid decoding algorithm for low-rate LDPC codes in 5G,” in Proc. 10th Int. Conf. Wireless Commun. Signal Process. (WCSP), Hangzhou, China, Oct. 2018, pp. 1–5.
[14] X. Wu, M. Jiang, and C. Zhao, “Decoding optimization for 5G LDPC codes by machine learning,” IEEE Access, vol. 6, pp. 50179–50186, 2018.
[15] C. Jones, E. Valles, M. Smith, and J. Villasenor, “Approximate-MIN* constraint node updating for LDPC code decoding,” in Proc. IEEE Mil. Commun. Conf. (MILCOM), Boston, MA, USA, Oct. 2003, pp. 157–162.
[16] K. Zhang, X. Huang, and Z. Wang, “A high-throughput LDPC decoder architecture with rate compatibility,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 4, pp. 839–847, Apr. 2011.
[17] C.-C. Cheng, J.-D. Yang, H.-C. Lee, C.-H. Yang, and Y.-L. Ueng, “A fully parallel LDPC decoder architecture using probabilistic min-sum algorithm for high-throughput applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 9, pp. 2738–2746, Sep. 2014.
[18] H.-C. Lee, M.-R. Li, J.-K. Hu, P.-C. Chou, and Y.-L. Ueng, “Optimization techniques for the efficient implementation of high-rate layered QC-LDPC decoders,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 64, no. 2, pp. 457–470, Feb. 2017.
[19] I. Tsatsaragkos and V. Paliouras, “A reconfigurable LDPC decoder optimized for 802.11n/AC applications,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 1, pp. 182–195, Jan. 2018.
[20] T. T. Nguyen-Ly, V. Savin, K. Le, D. Declercq, F. Ghaffari, and O. Boncalo, “Analysis and design of cost-effective, high-throughput LDPC decoders,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 3, pp. 508–521, Mar. 2018.
[21] C.-Y. Liang, M.-R. Li, H.-C. Lee, H.-Y. Lee, and Y.-L. Ueng, “Hardware-friendly LDPC decoding scheduling for 5G HARQ applications,” in Proc. ICASSP-IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Brighton, U.K., May 2019, pp. 1418–1422.
[22] R. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inf. Theory, vol. IT-27, no. 5, pp. 533–547, Sep. 1981.
[23] Z. Mheich, T.-T. Nguyen-Ly, V. Savin, and D. Declercq, “Code-aware quantizer design for finite-precision min-sum decoders,” in Proc. IEEE Int. Black Sea Conf. Commun. Netw. (BlackSeaCom), Varna, Bulgaria, Jun. 2016, pp. 1–5.
[24] W. E. Ryan, “An introduction to LDPC codes,” in CRC Handbook for Coding and Signal Processing for Magnetic Recording Systems, B. Vasic, Ed. Boca Raton, FL, USA: CRC Press, 2004, ch. 36.
[25] T. Richardson, “Error floors of LDPC codes,” in Proc. 41st Annu. Allerton Conf. Commun., Control, Comput., Oct. 2003, pp. 1426–1435.
[26] X. Zhang and P. H. Siegel, “Quantized iterative message passing …
tion Systems for Broadcasting, Interactive Services, News Gathering decoders with low error floor for LDPC codes,” IEEE Trans. Commun.,
and Other Broadband Satellite Applications (DVB-S2), ETSI, Sophia vol. 62, no. 1, pp. 1–14, Jan. 2014.
Antipolis, France, 2009. [27] F. Angarita, J. Valls, V. Almenar, and V. Torres, “Reduced-complexity
[4] Standard: Synchronization Standard for Distributed Transmission, min-sum algorithm for decoding LDPC codes with low error-floor,”
ATSC, Boston, MA, USA, 2007. IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 7, pp. 2150–2158,
[5] Multiplexing and Channel Coding, document TS 38.212 V15.0.0, 3GPP, Jul. 2014.
Dec. 2017. [28] C. Zhang, S. Weng, X. You, and Z. Wang, “Area-efficient check node
[6] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity- unit architecture for single block-row quasi-cyclic LDPC codes,” in
check codes under message-passing decoding,” IEEE Trans. Inf. Theory, Proc. IEEE Asia Pacific Conf. Circuits Syst. (APCCAS), Ishigaki, Japan,
vol. 47, no. 2, pp. 599–618, Feb. 2001. Nov. 2014, pp. 17–20.
[7] M. P. C. Fossorier, M. Mihaljevic, and H. Imai, “Reduced complexity [29] M. Weiner, B. Nikolic, and Z. Zhang, “LDPC decoder architecture for
iterative decoding of low-density parity check codes based on belief high-data rate personal-area networks,” in Proc. IEEE Int. Symp. Circuits
propagation,” IEEE Trans. Commun., vol. 47, no. 5, pp. 673–680, Syst. (ISCAS), Janeiro, Brazil, May 2011, pp. 1784–1787.
May 1999.
[8] J. Chen, A. Dholakia, E. Eleftheriou, M. P. C. Fossorier, and X.-Y. Hu,
“Reduced-complexity decoding of LDPC codes,” IEEE Trans. Commun.,
vol. 53, no. 8, pp. 1288–1299, Aug. 2005.
[9] T. Richardson and S. Kudekar, “Design of low-density parity check
codes for 5G new radio,” IEEE Commun. Mag., vol. 56, no. 3, pp. 28–34,
Mar. 2018. Hangxuan Cui received the B.S. degree in under-
[10] K. Le Trung, F. Ghaffari, and D. Declercq, “An adaptation of min- water acoustic engineering from Northwestern Poly-
sum decoder for 5G low-density parity-check codes,” in Proc. IEEE Int. technical University, Xi’an, China, in 2017. He is
Symp. Circuits Syst. (ISCAS), Sapporo, Japan, May 2019, pp. 1–5. currently pursuing the Ph.D. degree with Nanjing
[11] LDPC Decoding With Adjusted Min-Sum, document R1-1610140, TSG University.
RAN WG1 #86bis, 3GPP, Qualcomm Incorporated, Lisbon, Portugal, His research interests include channel coding algo-
Oct. 2016. rithms and low-power and high-throughput VLSI
[12] W. Zhou and M. Lentmaier, “Generalized two-magnitude check node systems for digital signal processing.
updating with self correction for 5G LDPC codes decoding,” in
Proc. 12th Int. ITG Conf. Syst., Commun. Coding, Rostock, Germany,
Mar. 2019, pp. 1–6.
Fakhreddine Ghaffari (Member, IEEE) received the degree in electrical engineering and the master’s degree from the National School of Electrical Engineering (ENIS), Tunisia, in 2001 and 2002, respectively, and the Ph.D. degree in electronics and electrical engineering from the University of Sophia Antipolis, France, in 2006.
He is currently an Associate Professor with the Université de Cergy-Pontoise, France. His research interests include VLSI design and implementation of reliable digital architectures for wireless communication applications on ASIC/FPGA platforms and the study of mitigating transient faults from algorithmic and implementation perspectives for high-throughput applications.

Khoa Le (Member, IEEE) received the bachelor’s and M.Sc. degrees in electronics engineering from the Ho Chi Minh City University of Technology (HCMUT), Vietnam, in 2010 and 2012, respectively, and the Ph.D. degree from the Université de Cergy-Pontoise, France, in 2017. He is currently a Post-Doctoral Researcher with the ETIS Laboratory, ENSEA, France. His research interests include error-correcting code algorithms, their analysis, and their implementations in FPGA/ASIC.

David Declercq (Senior Member, IEEE) was born in June 1971. He received the Ph.D. degree in statistical signal processing from the Université de Cergy-Pontoise, France, in 1998. From 2009 to 2014, he held a junior position with the Institut Universitaire de France. He is currently a Full Professor with ENSEA, Cergy. He is also the General Secretary of the National GRETSI Association. He worked for several years on the particular family of LDPC codes, from both the code design and the decoder design aspects. Since 2003, he has been developing strong expertise in non-binary LDPC codes and decoders in high-order Galois fields GF(q). A large part of his research projects is related to non-binary LDPC codes. He has mainly investigated two aspects: the design of GF(q) LDPC codes for short and moderate lengths, and the simplification of the iterative decoders for GF(q) LDPC codes under complexity/performance tradeoff constraints. He has published more than 40 articles in major journals [the IEEE TRANSACTIONS ON COMMUNICATIONS, the IEEE TRANSACTIONS ON INFORMATION THEORY, the IEEE COMMUNICATIONS LETTERS, and EURASIP Journal on Wireless Communications and Networking (JWCN)], and more than 120 articles in major conferences in information theory and signal processing. His research interests include digital communications and error-correction coding theory.

Jun Lin (Senior Member, IEEE) received the B.S. degree in physics and the M.S. degree in microelectronics from Nanjing University, Nanjing, China, in 2007 and 2010, respectively, and the Ph.D. degree in electrical engineering from Lehigh University, Bethlehem, in 2015. From 2010 to 2011, he was an ASIC Design Engineer with AMD. In summer 2013, he was an Intern with Qualcomm Research, Bridgewater, NJ, USA. In June 2015, he joined the School of Electronic Science and Engineering, Nanjing University, where he is currently an Associate Professor. He was a member of the Design and Implementation of Signal Processing Systems (DISPS) Technical Committee of the IEEE Signal Processing Society. His current research interests include low-power high-speed VLSI design for digital signal processing and deep learning, hardware acceleration for big data processing, and emerging computer architectures.
He was a co-recipient of the Merit Student Paper Award at the IEEE Asia Pacific Conference on Circuits and Systems in 2008, the Best Paper Award at the IEEE Computer Society Annual Symposium on VLSI (ISVLSI) in 2019, and the Best Paper Award (First Place) at the IEEE International Workshop on Signal Processing Systems (SiPS) in 2019. He was a recipient of the 2014 IEEE Circuits & Systems Society (CAS) Student Travel Award.

Zhongfeng Wang (Fellow, IEEE) received the B.E. and M.S. degrees from the Department of Automation, Tsinghua University, Beijing, China, in 1988 and 1990, respectively, and the Ph.D. degree from the University of Minnesota, Minneapolis, in 2000. He was with Oregon State University and National Semiconductor Corporation. From 2007 to 2016, he was a Leading VLSI Architect with Broadcom Corporation, CA, USA. Since 2016, he has been a Distinguished Professor with Nanjing University, China.
He is a world-recognized expert on Low-Power High-Speed VLSI Design for Signal Processing Systems. He has published more than 200 technical articles with multiple best paper awards received from the IEEE technical societies, among which is the VLSI Transactions Best Paper Award of 2007. He has edited one book, VLSI, and has held more than 20 U.S. and China patents. In the current record, he has had many articles ranking among the top 25 most (annually) downloaded manuscripts in the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS. His current research interests include optimized VLSI design for digital communications and deep learning. He has also served as a TPC member and in various chair roles for tens of international conferences. Moreover, he has contributed significantly to industrial standards. So far, his technical proposals have been adopted by more than 15 international networking standards. In 2015, he was elevated to IEEE Fellow for contributions to VLSI design and implementation of FEC coding. In the past, he has served as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: REGULAR PAPERS, and the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS for many terms.