This paper presents an efficient nested key equation solver (KES) design for short generalized integrated interleaved BCH codes, aimed at improving error correction capabilities in storage class memories. The proposed design reduces the critical path to one multiplier and eliminates the need for scalar pre-computation, achieving significant area and area-time product reductions compared to previous methods. The architecture is particularly beneficial for short codes, maintaining low redundancy while enhancing decoding efficiency.


Efficient Nested Key Equation Solver for Short Generalized Integrated Interleaved BCH Codes


Zhenshan Xie and Xinmiao Zhang
Dept. of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210 USA
Email: {xie.855, zhang.8952}@osu.edu

Abstract—Generalized integrated interleaved (GII) codes can nest BCH sub-codewords to form stronger BCH codewords. They are among the best candidates for error correction in the new storage class memories (SCMs). However, SCMs require short codeword length and low redundancy. In this case, the nested key equation solver (KES), which is a key step in GII decoding, has a small number of iterations. The initialization and/or scalar pre-computation in previous nested KES designs has large area and may take even longer than the iterations themselves. This paper proposes an efficient nested KES design for short GII-BCH codes. The polynomial updating is decomposed into two steps to reduce the critical path without requiring scalar pre-computation. Besides, the KES is reformulated to reduce the number of clock cycles without incurring any area overhead. For an example code over $GF(2^{10})$ that protects 2560 bits with 10% redundancy, the proposed design achieves at least 25% area reduction and 37% reduction on the area-time product averaged over the nested decoding rounds compared to prior efforts.

This work was supported in part by Kioxia Corporation and by the National Science Foundation under Award No. 2011785.

I. INTRODUCTION

The new fast storage class memories (SCMs) may bring paradigm shifts to many systems, such as computer memory architecture, machine learning, and big data analytics. Error-correcting codes with hyper-speed decoding and excellent correction capability are essential to realizing the speed potential of SCMs. Generalized integrated interleaved (GII) codes [1], [2] that generate stronger BCH codewords by nesting short BCH sub-codewords are among the best candidates for SCMs.

GII decoding has two stages [2]. The first is the traditional BCH decoding on individual sub-words. A key equation solver (KES), such as the Berlekamp-Massey (BM) algorithm, takes the syndromes and computes the error locator polynomial. If some sub-words failed the decoding or were miscorrected, the multi-round second-stage nested decoding is activated to correct extra errors. In each round, higher-order sub-word syndromes are derived from those of the nested words. Then the nested KES is carried out to update the error locator polynomial and accordingly correct more errors.

Since the higher-order syndromes are not available in the beginning to initialize the discrepancy polynomial, the conventional reformulated inversionless (ri-)BM architecture [3] cannot be used to reduce the critical path of the nested KES. Re-initializing the discrepancy polynomial as in [4] before the nested KES requires many multiplier-adder trees. The critical path is reduced to two multipliers by incorporating higher-order syndromes iteratively into existing KES results in [5]–[7]. Although it can be further reduced by a half using the slow-down and re-timing techniques, the decoding of two sub-words needs to be interleaved. By pre-computing the combined polynomial scalars, the critical path is truly reduced to one multiplier in [8], [9], although 5 clock cycles are needed for the pre-computation.

Since SCMs require short codeword length and low redundancy, the error correction capabilities of the BCH codes involved in the GII codes and their differences are small. This means that the KES of each nested decoding round has a small number of iterations and the involved polynomials are short. In this case, the scalar pre-computations in [7], [9], which target longer codes, require not only more clock cycles but also larger area compared to the nested KES iterations themselves.

This paper proposes an efficient nested KES design for short GII-BCH codes. The polynomial updating with long data path is decomposed into two steps so that the critical path is reduced to one multiplier without any scalar pre-computation. Shareable substructures are identified to reduce the area requirement. Additionally, by reformulating the nested KES algorithm, the order of the error locator and discrepancy polynomial updating is switched so that one more clock cycle is eliminated in the end. Efficient implementation architectures are also developed for the proposed nested KES algorithm. Although the proposed design needs two clock cycles for each nested KES iteration, it does not require expensive scalar pre-computation. For an example code over $GF(2^{10})$ that protects 2560 bits with 10% redundancy considered for SCMs, the proposed architecture achieves at least 25% area reduction and 37% improvement on the area-time product (ATP) averaged over the nested decoding rounds compared to prior work.

II. GII-BCH DECODING AND NESTED KES ALGORITHMS

Let $\mathcal{C}_v \subseteq \cdots \subseteq \mathcal{C}_1 \subset \mathcal{C}_0$ be $v+1$ BCH codes defined over $GF(2^q)$ with error-correction capabilities $t_v \ge \cdots \ge t_1 > t_0$. A GII-BCH $[m,v]$ code with $m$ length-$n$ sub-codewords can be constructed from these BCH codes as [2]:

$$\mathcal{C} \triangleq \Big\{ [c_0(x), c_1(x), \cdots, c_{m-1}(x)] : c_i(x) \in \mathcal{C}_0,\ \tilde{c}_l(x) = \sum_{i=0}^{m-1} \alpha^{il}(x) c_i(x) \in \mathcal{C}_{v-l},\ 0 \le l < v \Big\}, \tag{1}$$
where $c_i(x)$ is a sub-codeword, $\tilde{c}_l(x)$ is a nested codeword, $\alpha$ is a primitive element of $GF(2^q)$, and $\alpha^{il}(x)$ is the polynomial form of the standard basis representation of $\alpha^{il}$.

GII-BCH decoding includes two stages. The first is the traditional BCH decoding that corrects $\le t_0$ errors in each received sub-word. $2t_0$ syndromes are computed from each received sub-word, $y(x)$, as $S_j = y(\alpha^{j+1})$ $(0 \le j < 2t_0)$. Then a KES, such as the BM algorithm [10], uses the syndromes to iteratively compute the error locator polynomial $\Lambda(x)$. In iteration $r$, a discrepancy coefficient $\delta^{(r)} = \sum_i \Lambda_i^{(r)} S_{r-i}$ is calculated to update the polynomials. The riBM algorithm utilizes a discrepancy polynomial $\hat{\Delta}^{(r)}(x) = \Lambda(x)S(x)/x^r$, whose constant coefficient, $\hat{\Delta}_0^{(r)}$, equals $\delta^{(r)}$ [3]. $\hat{\Delta}(x)$ is initialized as $S(x)$ and updated in parallel with $\Lambda(x)$ so that the critical path is reduced to one multiplier and one adder.

The second-stage nested decoding can correct more errors. $2t$ higher-order syndromes are needed to correct $t$ extra errors. Let the indices of the sub-words with extra errors be $i_0, \cdots, i_{b-1}$ $(b \le v)$. Since the nested codewords are at least $t_1$-error-correcting, $2(t_1 - t_0)$ higher-order syndromes are computed as $\tilde{S}_j^{(l)} = \tilde{y}_l(\alpha^{j+1})$ $(0 \le l < b,\ 2t_0 \le j < 2t_1)$, where $\tilde{y}_l(x) = \sum_{i=0}^{m-1} \alpha^{il}(x) y_i(x)$. From (1), higher-order syndromes for the sub-words can be derived as

$$\left[S_j^{(i_0)}, S_j^{(i_1)}, \cdots, S_j^{(i_{b-1})}\right]^T = A^{-1} \left[\tilde{S}_j^{(0)}, \tilde{S}_j^{(1)}, \cdots, \tilde{S}_j^{(b-1)}\right]^T, \tag{2}$$

where $A_{u,w} = \alpha^{i_w u (j+1)}$ $(0 \le u, w < b)$. Then the BM algorithm takes the $2(t_1 - t_0)$ higher-order syndromes to correct $\le t_1$ errors in each of the $b$ sub-words. If there are $b' \le v-1$ sub-words with more errors, $2(t_2 - t_1)$ higher-order syndromes are derived in the next round and this process is repeated for up to $v$ rounds.

The KES in the nested decoding cannot continue from the result of the sub-word decoding and use the riBM algorithm to shorten the critical path since the higher-order syndromes are not available in the beginning to initialize $\hat{\Delta}(x)$. In [4], $\hat{\Delta}(x)$ is re-initialized according to all the higher-order syndromes using expensive multiplier-adder trees. Algorithm 1 [7] incorporates two higher-order syndromes, $S_{r+1}$ and $S_{r+2}$, to update $\hat{\Delta}(x)$ in iteration $r$ so that the correct $\delta^{(r+2)}$ is always ready before iteration $r+2$ and the odd iterations are skipped for GII-BCH decoding. In this algorithm, 'even' and 'odd' denote the even and odd coefficients, respectively, of the polynomials. $B(x)$ and $\hat{\Theta}(x)$ are auxiliary polynomials assisting the updating of $\Lambda(x)$ and $\hat{\Delta}(x)$, respectively. A scaled nested KES (SNK) algorithm was also developed in [7] to reduce the area requirement. The critical paths of these algorithms have two multipliers due to the computations in Line 3. Although they are reduced to one multiplier by applying slow-down and re-timing [11], two sub-words are interleaved to increase the efficiency. The critical path is truly shortened to one multiplier in the scaled fast nested KES (SFNK) algorithm [9] by combining and pre-computing the scalars in Line 3, which introduces 5 clock cycles at initialization. Besides, both the SNK and SFNK designs require a number of multipliers for scalar pre-computation.

Algorithm 1: Nested GII-BCH KES Algorithm
Input: $\Lambda^{(u)}_{\text{even}}(x)$, $\Lambda^{(u)}_{\text{odd}}(x)$, $B^{(u)}_{\text{even}}(x)$, $B^{(u)}_{\text{odd}}(x)$, $\hat{\Delta}^{(u)}_{\text{even}}(x)$, $\hat{\Theta}^{(u)}_{\text{even}}(x)$, $\gamma^{(u)}$, $k^{(u)}$ from previous KES; $S_i$ $(u \le i < w)$; $S_w = 0$
Initialization:
   $\hat{\Delta}^{(u)}_{\text{even}}(x) = \hat{\Delta}^{(u)}_{\text{even}}(x) + S_u \Lambda^{(u)}_{\text{even}}(x)$
   $\hat{\Theta}^{(u)}_{\text{even}}(x) = \hat{\Theta}^{(u)}_{\text{even}}(x) + S_u B^{(u)}_{\text{even}}(x)$
Iterations: for $r = u, u+2, \cdots, w-2$
1) $\Lambda^{(r+2)}_{\text{even}}(x) = \gamma^{(r)} \Lambda^{(r)}_{\text{even}}(x) + \hat{\Delta}^{(r)}_0 x^2 B^{(r)}_{\text{even}}(x)$
2) $\Lambda^{(r+2)}_{\text{odd}}(x) = \gamma^{(r)} \Lambda^{(r)}_{\text{odd}}(x) + \hat{\Delta}^{(r)}_0 x^2 B^{(r)}_{\text{odd}}(x)$
3) $\hat{\Delta}^{(r+2)}_{\text{even}}(x) = \gamma^{(r)} \big(\hat{\Delta}^{(r)}_{\text{even}}(x)/x^2 + S_{r+1} \Lambda^{(r)}_{\text{odd}}(x)/x + S_{r+2} \Lambda^{(r)}_{\text{even}}(x)\big) + \hat{\Delta}^{(r)}_0 \big(\hat{\Theta}^{(r)}_{\text{even}}(x) + S_{r+1} x B^{(r)}_{\text{odd}}(x) + S_{r+2} x^2 B^{(r)}_{\text{even}}(x)\big)$
   if ($\hat{\Delta}^{(r)}_0 \ne 0$ and $k^{(r)} \ge -1$)
4)    $B^{(r+2)}_{\text{even}}(x) = \Lambda^{(r)}_{\text{even}}(x)$; $B^{(r+2)}_{\text{odd}}(x) = \Lambda^{(r)}_{\text{odd}}(x)$
5)    $\hat{\Theta}^{(r+2)}_{\text{even}}(x) = \hat{\Delta}^{(r)}_{\text{even}}(x)/x^2 + S_{r+1} \Lambda^{(r)}_{\text{odd}}(x)/x + S_{r+2} \Lambda^{(r)}_{\text{even}}(x)$
6)    $\gamma^{(r+2)} = \hat{\Delta}^{(r)}_0$; $k^{(r+2)} = -k^{(r)} - 2$
   else
7)    $B^{(r+2)}_{\text{even}}(x) = x^2 B^{(r)}_{\text{even}}(x)$; $B^{(r+2)}_{\text{odd}}(x) = x^2 B^{(r)}_{\text{odd}}(x)$
8)    $\hat{\Theta}^{(r+2)}_{\text{even}}(x) = \hat{\Theta}^{(r)}_{\text{even}}(x) + S_{r+1} x B^{(r)}_{\text{odd}}(x) + S_{r+2} x^2 B^{(r)}_{\text{even}}(x)$
9)    $\gamma^{(r+2)} = \gamma^{(r)}$; $k^{(r+2)} = k^{(r)} + 2$

A major application of GII codes is SCMs, and they require short codes with low redundancy, such as 2560 data bits protected by 10% redundancy. An example GII-BCH code for these code parameters is a [4,3] code over $GF(2^{10})$ with sub-codeword length $n = 704$ and $[t_0, t_1, t_2, t_3] = [3, 5, 6, 11]$. For such short codes, the 5 clock cycles for scalar pre-computation [9] are more than the number of clock cycles for the nested KES iterations themselves, which is $t_i - t_{i-1}$ for nested decoding round $i$. Also when $t_v$ is smaller, the scalar pre-computation in [7] and [9] may dominate the overall nested KES area.

III. EFFICIENT KES FOR GII-BCH NESTED DECODING

An alternative approach is proposed in this section to reduce the critical path of the nested KES to one multiplier. Instead of combining and pre-computing the scalars, the computations in Line 3 of Algorithm 1 are broken down and implemented in two clock cycles. Besides, reformulations are carried out to eliminate one clock cycle in the end. As a result, the proposed nested KES for nested decoding round $i$ requires $2(t_i - t_{i-1})$ clock cycles with no extra latency for scalar pre-computation or initialization. For small $t_i - t_{i-1}$ as in short GII codes, the proposed design achieves significant latency reduction compared to previous approaches. Additionally, multipliers are shared among the computations and no complicated scalar pre-calculation is needed. Hence, the proposed design also requires much smaller area than prior architectures.

Each term in the parentheses in Line 3 of Algorithm 1 has up to one scalar, and the second scalar is outside of the parentheses. Denote the sums in the first and second parentheses by $\hat{\Delta}'^{(r)}_{\text{even}}(x)/x^2$ and $\hat{\Theta}'^{(r)}_{\text{even}}(x)$, respectively. To reduce the critical path to one multiplier, they are computed in the first clock cycle and then multiplied with $\gamma^{(r)}$ and $\hat{\Delta}^{(r)}_0$ in the second clock cycle. They are also used to update $\hat{\Theta}^{(r+2)}_{\text{even}}(x)$ as in Lines 5 and 8 of Algorithm 1. The coefficients of the same degree in these polynomials can be calculated using four multipliers. In the second clock cycle, each coefficient
in $\hat{\Delta}^{(r+2)}_{\text{even}}(x) = \gamma^{(r)} \hat{\Delta}'^{(r)}_{\text{even}}(x)/x^2 + \hat{\Delta}^{(r)}_0 \hat{\Theta}'^{(r)}_{\text{even}}(x)$ is computed by using another two multipliers. Besides, the four multipliers for the $\hat{\Delta}'^{(r)}_{\text{even}}(x)/x^2$ and $\hat{\Theta}'^{(r)}_{\text{even}}(x)$ calculation are shared to update $\Lambda^{(r+2)}(x)$ as in Lines 1 and 2 and also reused for the $\hat{\Delta}^{(u)}_{\text{even}}(x)$ and $\hat{\Theta}^{(u)}_{\text{even}}(x)$ initialization. Each iteration has two clock cycles and the initialization needs one. Hence, $1 + 2(t_i - t_{i-1})$ clock cycles are required to finish the nested KES over one sub-word in decoding round $i$ using this split polynomial updating. This latency is much shorter than the $5 + (t_i - t_{i-1})$ clock cycles for the SFNK design when $(t_i - t_{i-1})$ is small, such as in the case of the short code with $[t_0, t_1, t_2, t_3] = [3, 5, 6, 11]$.

In the above split polynomial updating, $\hat{\Delta}'^{(r)}_{\text{even}}(x)/x^2$ and $\hat{\Theta}'^{(r)}_{\text{even}}(x)$ are first computed and then $\Lambda^{(r+2)}(x)$ is updated. However, only $\Lambda(x)$ is needed as the KES output. If the order of polynomial updating can be switched, then the $\hat{\Delta}'^{(r)}_{\text{even}}(x)/x^2$ and $\hat{\Theta}'^{(r)}_{\text{even}}(x)$ computation for the last iteration can be skipped. Accordingly, the nested KES latency can be reduced by one clock cycle, which is a significant reduction when $(t_i - t_{i-1})$ is small. It can be observed from Algorithm 1 that the $\Lambda^{(r+2)}_{\text{even}}(x)$ in Line 1 multiplied by $S_{r+2}$ happens to be the sum of the third items in the two parentheses of Line 3. Besides, the $\Lambda^{(r+2)}_{\text{odd}}(x)$ in Line 2 multiplied by $S_{r+1}/x$ is the sum of the second items in the parentheses of Line 3. Hence, $\Lambda^{(r+2)}_{\text{even}}(x)$ and $\Lambda^{(r+2)}_{\text{odd}}(x)$ can be reused to calculate $\hat{\Delta}^{(r+2)}_{\text{even}}(x)$ in the next clock cycle. Besides, the critical path can be kept as one multiplier since the remaining two terms in the $\hat{\Delta}^{(r+2)}_{\text{even}}(x)$ formula also have one single scalar each.

Our proposed reformulated nested GII-BCH KES algorithm is listed in Algorithm 2. This algorithm also takes the output of the KES results from the sub-word decoding or the previous nested decoding round as the input. In the first nested decoding round, $S_{u-1}$ is set to zero. Hence, the computations in Lines 1 and 2 of Algorithm 2 are essentially the same as the polynomial initialization in Algorithm 1. Accordingly, the constant coefficient of $\hat{\Delta}^{(r)}_{\text{even}}(x)$ is the discrepancy coefficient. In the next clock cycle, all the computations in Lines 3 through 13 are carried out. The dashed line in Algorithm 2 is used to separate the computations executed in the even and odd clock cycles. The updating of $\Lambda^{(r+2)}_{\text{even}}(x)$ and $\Lambda^{(r+2)}_{\text{odd}}(x)$ is identical to that in Algorithm 1. At the same time, the first items in the two parentheses in Line 3 of Algorithm 1 are multiplied with the corresponding scalars as in Line 5 of Algorithm 2. The result is denoted by $\bar{\Delta}^{(r+2)}_{\text{even}}(x)$. Then in the first clock cycle of the next iteration, $\bar{\Delta}^{(r+2)}_{\text{even}}(x)$ is further added with another two items as in Line 1. As explained previously, the sum of these two terms equals the sum of the other four terms of the $\hat{\Delta}^{(r+2)}_{\text{even}}(x)$ formula in Algorithm 1. Therefore, the $\hat{\Delta}^{(r)}_{\text{even}}(x)$ computed in Line 1 of Algorithm 2 is the same as the $\hat{\Delta}^{(r)}_{\text{even}}(x)$ in Algorithm 1. Similar computations are carried out to derive $\hat{\Theta}^{(r)}_{\text{even}}(x)$. At the end of the second clock cycle of the last iteration, $\Lambda^{(w)}(x)$ is computed. Hence this process takes $2(t_i - t_{i-1})$ clock cycles for nested decoding round $i$. The $\bar{\Delta}^{(w)}_{\text{even}}(x)$ and $\bar{\Theta}^{(w)}_{\text{even}}(x)$ derived in the end are updated to $\hat{\Delta}^{(w)}_{\text{even}}(x)$ and $\hat{\Theta}^{(w)}_{\text{even}}(x)$, respectively, at the beginning of the KES of the next nested decoding round according to Lines 1 and 2 of Algorithm 2.

Algorithm 2: Nested GII-BCH KES w. Switched Updating
Input: $\Lambda^{(u)}_{\text{even}}(x)$, $\Lambda^{(u)}_{\text{odd}}(x)$, $B^{(u)}_{\text{even}}(x)$, $B^{(u)}_{\text{odd}}(x)$, $\bar{\Delta}^{(u)}_{\text{even}}(x)$, $\bar{\Theta}^{(u)}_{\text{even}}(x)$, $\gamma^{(u)}$, $k^{(u)}$ from previous KES; syndromes $S_i$ $(u-1 \le i \le w-2)$; set $S_{u-1} = 0$ for the first nested decoding round
Iterations: for $r = u, u+2, \cdots, w-2$
1) $\hat{\Delta}^{(r)}_{\text{even}}(x) = \bar{\Delta}^{(r)}_{\text{even}}(x) + S_{r-1} \Lambda^{(r)}_{\text{odd}}(x)/x + S_r \Lambda^{(r)}_{\text{even}}(x)$
2) $\hat{\Theta}^{(r)}_{\text{even}}(x) = \bar{\Theta}^{(r)}_{\text{even}}(x) + S_{r-1} B^{(r)}_{\text{odd}}(x)/x + S_r B^{(r)}_{\text{even}}(x)$
- - - - - - - - - - - - - - - - - - - -
3) $\Lambda^{(r+2)}_{\text{even}}(x) = \gamma^{(r)} \Lambda^{(r)}_{\text{even}}(x) + \hat{\Delta}^{(r)}_0 x^2 B^{(r)}_{\text{even}}(x)$
4) $\Lambda^{(r+2)}_{\text{odd}}(x) = \gamma^{(r)} \Lambda^{(r)}_{\text{odd}}(x) + \hat{\Delta}^{(r)}_0 x^2 B^{(r)}_{\text{odd}}(x)$
5) $\bar{\Delta}^{(r+2)}_{\text{even}}(x) = \gamma^{(r)} \hat{\Delta}^{(r)}_{\text{even}}(x)/x^2 + \hat{\Delta}^{(r)}_0 \hat{\Theta}^{(r)}_{\text{even}}(x)$
6) if ($\hat{\Delta}^{(r)}_0 \ne 0$ and $k^{(r)} \ge -1$)
7)    $B^{(r+2)}_{\text{even}}(x) = \Lambda^{(r)}_{\text{even}}(x)$; $B^{(r+2)}_{\text{odd}}(x) = \Lambda^{(r)}_{\text{odd}}(x)$
8)    $\bar{\Theta}^{(r+2)}_{\text{even}}(x) = \hat{\Delta}^{(r)}_{\text{even}}(x)/x^2$
9)    $\gamma^{(r+2)} = \delta^{(r)}$; $k^{(r+2)} = -k^{(r)} - 2$
10) else
11)    $B^{(r+2)}_{\text{even}}(x) = x^2 B^{(r)}_{\text{even}}(x)$; $B^{(r+2)}_{\text{odd}}(x) = x^2 B^{(r)}_{\text{odd}}(x)$
12)    $\bar{\Theta}^{(r+2)}_{\text{even}}(x) = \hat{\Theta}^{(r)}_{\text{even}}(x)$
13)    $\gamma^{(r+2)} = \gamma^{(r)}$; $k^{(r+2)} = k^{(r)} + 2$

IV. HARDWARE ARCHITECTURES AND COMPARISONS

An efficient architecture is developed in this section to implement the proposed low-latency nested KES algorithm for the decoding of short GII-BCH codes. Then the hardware complexity is analyzed and compared with prior designs using the example code over $GF(2^{10})$ with $[t_0, t_1, t_2, t_3] = [3, 5, 6, 11]$.

Fig. 1 shows the overall architecture of the nested KES for GII-BCH decoding. It consists of $\lceil (t_v+1)/2 \rceil$ groups of processing elements (PEs) and a control unit. The PEs carry out the polynomial updating in Algorithm 2 in parallel. In the first clock cycle of each iteration, they implement Lines 1 and 2 of Algorithm 2. The discrepancy coefficient, $\hat{\Delta}^{(r)}_0$, is calculated in the leftmost PE1 at the end of this clock cycle. Then in the second clock cycle of each iteration, $\hat{\Delta}^{(r)}_0$ is sent to each PE through the control logic to update the rest of the polynomials. Only the highest coefficient of $\hat{\Theta}^{(r)}_{\text{even}}(x)$ needs to be calculated by the two highest PE0s. Hence, they are simplified to two multipliers and one adder as in Fig. 1.

Fig. 1. Overall architecture for proposed nested GII-BCH KES.

The details of one group of PEs are illustrated in Fig. 2(a). The critical path is denoted by the thicker wires. Applying re-timing along the cutsets denoted by the dashed lines, the critical path is reduced to 1 multiplier and 3 adders/multiplexers as shown in the architecture of Fig. 2(b).

Fig. 2. PE architectures for proposed nested GII-BCH KES: (a) before re-timing; (b) after re-timing.

The complexity of the proposed design is listed in Table I. The control unit is simple and its area is negligible. Since $t_v = 11$, $h+1 = \lceil (11+1)/2 \rceil = 6$ groups of PEs are needed. Over $GF(2^{10})$, each multiplier using normal basis representation can be implemented by the area of around 174 XOR gates with 6 gates in the critical path. The areas of each adder, register, and multiplexer are about those of 10, 30, and 10 XOR gates, respectively. A multiplexer with a constant input has half the complexity of a general multiplexer. From these assumptions, the overall complexity of the proposed design for the example code can be estimated as listed in Table I.

TABLE I
COMPLEXITIES OF NESTED GII-BCH KES ARCHITECTURES FOR EXAMPLE CODE OVER $GF(2^{10})$ WITH $[t_0, t_1, t_2, t_3] = [3, 5, 6, 11]$

Design       | Mult. | Add | Reg. | Mux | Mux with const. input | Inv. | Total # XORs | Crit. path # gates | # clks/sub-word of KES in nested round 1, 2, 3 | Normalized ATP in nested round 1, 2, 3
-------------|-------|-----|------|-----|-----------------------|------|--------------|--------------------|------------------------------------------------|---------------------------------------
re-init. [4] | 86    | 58  | 36   | 28  | 0                     | 0    | 16904        | 10                 | 3, 2, 6 ($1+(t_i-t_{i-1})$)                    | 1.60, 2.13, 1.28
SNK [7]      | 46    | 42  | 98   | 18  | 39                    | 0    | 11739        | 9                  | 5, 3, 11 ($1+2(t_i-t_{i-1})$)*                 | 1.67, 2.00, 1.47
SFNK [9]     | 78    | 57  | 85   | 37  | 40                    | 2    | 18042        | 10                 | 7, 6, 10 ($5+(t_i-t_{i-1})$)                   | 3.98, 6.83, 2.28
proposed     | 38    | 43  | 48   | 28  | 9                     | 0    | 8807         | 9                  | 4, 2, 10 ($2(t_i-t_{i-1})$)                    | 1.00, 1.00, 1.00

For comparisons, the complexities of the nested KES architecture using the re-initialization scheme [4], SNK architecture [7], and SFNK architecture [9] are also included in Table I. In the design of [4] for the example code, 50 multipliers and 40 adders organized in tree structures are required to re-initialize $\hat{\Delta}_{\text{even}}(x)$ and $\hat{\Theta}_{\text{even}}(x)$, before a simplified version of the riBM architecture for BCH codes is used to carry out the KES process. As a result, this design requires around twice the area with a longer critical path compared to the proposed design.

Even though the SNK and SFNK designs have the same number of PE groups as the proposed architecture and each group also has 6 multipliers, they require many more multipliers for scalar pre-computation. Besides, the application of the slow-down technique in the SNK architecture doubles the number of registers. Also, an inverter over $GF(2^{10})$ takes the area of around 390 XOR gates to implement. Accordingly, it can be estimated that the proposed design requires 25% and 51% smaller area than the SNK and SFNK architectures, respectively. Additionally, the SFNK design has one more gate in the critical path.

The numbers of clock cycles needed for the KES over one sub-word in the $i$-th nested decoding round of different designs for the example GII-BCH code are listed in Table I. Note that the SNK design interleaves the decoding of two sub-words and hence its latency for two sub-words is the same as that for one sub-word. Although the proposed design requires two clock cycles for each iteration, it does not have the complicated scalar pre-computation of the SFNK design, which takes 5 clock cycles regardless of $t_i - t_{i-1}$. As a result, the proposed design achieves lower latency for the short example code in the first two nested decoding rounds. Earlier nested decoding rounds are carried out with much higher probabilities than later rounds in GII decoding. Also, the proposed reformulation saves one clock cycle compared to the SNK architecture. Although the design from [4] has even lower latency in terms of the clock cycle number, its area requirement is almost twice as large. ATP is usually used to compare designs with different area and latency. It can be calculated that the proposed design achieves at least $1 - (1/1.60 + 1/2.13 + 1/1.28)/3 = 37\%$ lower ATP averaged over the three nested decoding rounds for the example code compared to prior efforts.

For GII-BCH codes with smaller $t_v$, the area saving of the proposed design would be even larger, because the scalar pre-computation takes a significant portion of the overall area in the SNK and SFNK architectures and its complexity does not change with $t_v$. For codes with larger $v$, which have better error-correcting performance under the same redundancy, the differences $t_i - t_{i-1}$ are even smaller. In this case, the proposed design can achieve even more significant latency reduction.

V. CONCLUSIONS

This paper proposes an efficient design to reduce the area complexity and decoding latency of the nested KES for short GII-BCH codes. The proposed split polynomial updating reduces the critical path to one multiplier without pre-computing combined scalars. Besides, substantial area saving is achieved by sharing hardware units for polynomial updating. Additionally, one clock cycle is eliminated from the nested KES of each decoding round by reformulating the nested KES algorithm. Efficient hardware architectures are also developed and the critical path is further reduced by re-timing. Overall, the proposed nested KES design can achieve significant reductions in the area and latency for short GII-BCH codes compared to previous designs. Future work will study other decoder components for short GII-BCH codes.
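
The area, latency, and ATP figures quoted in Section IV can be cross-checked with a few lines of arithmetic. The Python sketch below is only an illustration: the unit counts, critical paths, and clock-cycle formulas are copied from Table I, the XOR-gate costs are the ones stated in Section IV, and the function names are hypothetical. It also assumes that ATP is the product of area, critical path (in gates), and clock cycles, normalized to the proposed design; the paper does not spell this formula out, but under this assumption the normalized ATP column of Table I is reproduced.

```python
# XOR-gate equivalents quoted in Section IV for units over GF(2^10).
# A multiplexer with a constant input is taken as half a general multiplexer.
XOR_COST = {"mult": 174, "add": 10, "reg": 30, "mux": 10, "mux_const": 5, "inv": 390}

T = [3, 5, 6, 11]                                   # [t0, t1, t2, t3] of the example code
GAPS = [T[i] - T[i - 1] for i in range(1, len(T))]  # t_i - t_(i-1) for rounds 1, 2, 3

# Unit counts (same order as XOR_COST), critical path in gates, and
# clock cycles per sub-word as a function of the gap d, copied from Table I.
DESIGNS = {
    "re-init. [4]": {"units": (86, 58, 36, 28, 0, 0), "crit": 10, "clks": lambda d: 1 + d},
    "SNK [7]": {"units": (46, 42, 98, 18, 39, 0), "crit": 9, "clks": lambda d: 1 + 2 * d},
    "SFNK [9]": {"units": (78, 57, 85, 37, 40, 2), "crit": 10, "clks": lambda d: 5 + d},
    "proposed": {"units": (38, 43, 48, 28, 9, 0), "crit": 9, "clks": lambda d: 2 * d},
}


def area_xors(name):
    """Total area of a design in XOR-gate equivalents."""
    return sum(n * c for n, c in zip(DESIGNS[name]["units"], XOR_COST.values()))


def clocks(name):
    """Clock cycles per sub-word of the KES in nested rounds 1, 2, 3."""
    return [DESIGNS[name]["clks"](d) for d in GAPS]


def normalized_atp(name, ref="proposed"):
    """Per-round ATP relative to `ref`.

    Assumption: ATP = area * critical path * clock cycles.
    """
    def atp(n):
        return [area_xors(n) * DESIGNS[n]["crit"] * c for c in clocks(n)]
    return [a / b for a, b in zip(atp(name), atp(ref))]


def avg_atp_reduction(name):
    """Average ATP reduction of the proposed design relative to `name`."""
    ratios = normalized_atp(name)
    return 1 - sum(1 / r for r in ratios) / len(ratios)


if __name__ == "__main__":
    for name in DESIGNS:
        print(name, area_xors(name), clocks(name),
              [round(r, 2) for r in normalized_atp(name)])
    print("area saving vs SNK:", 1 - area_xors("proposed") / area_xors("SNK [7]"))
    print("area saving vs SFNK:", 1 - area_xors("proposed") / area_xors("SFNK [9]"))
    print("avg ATP reduction vs [4]:", avg_atp_reduction("re-init. [4]"))
```

Run as a script, this prints one line per design with its total XOR count, per-round clock cycles, and normalized ATP, followed by the area savings relative to the SNK and SFNK designs and the average ATP reduction relative to the design of [4].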
REFERENCES
[1] X. Tang and R. Koetter, “A novel method for combining algebraic
decoding and iterative processing,” in Proc. IEEE Int. Symp. Inf. Theory,
Seattle, WA, USA, Jul. 2006, pp. 474-478.
[2] Y. Wu, “Generalized integrated interleaved codes,” IEEE Trans. Inf.
Theory, vol. 63, no. 2, pp. 1102-1119, Feb. 2017.
[3] D. V. Sarwate and N. R. Shanbhag, “High-speed architecture for Reed-
Solomon decoders,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,
vol. 9, no. 5, pp. 641-655, Oct. 2001.
[4] W. Li, J. Lin, and Z. Wang, “A 124-Gb/s decoder for generalized
integrated interleaved codes,” IEEE Trans. Circuits and Syst. I: Reg.
Papers, vol. 66, no. 8, pp. 3174-3187, Aug. 2019.
[5] X. Zhang and Z. Xie, “Efficient architectures for generalized integrated
interleaved decoder,” IEEE Trans. Circuits Syst. I: Reg. Papers, vol. 66,
no. 10, pp. 4018-4031, Oct. 2019.
[6] Z. Xie and X. Zhang, “Scaled nested key equation solver for generalized
integrated interleaved decoder,” IEEE Trans. Circuits Syst. II: Exp.
Briefs, vol. 67, no. 11, pp. 2457-2461, Nov. 2020.
[7] Z. Xie and X. Zhang, “Reduced-complexity key equation solvers for
generalized integrated interleaved BCH decoders,” IEEE Trans. Circuits
Syst. I: Reg. Papers, vol. 67, no. 12, pp. 5520-5529, Dec. 2020.
[8] Z. Xie and X. Zhang, “Fast nested key equation solvers for generalized
integrated interleaved decoder,” IEEE Trans. Circuits Syst. I: Reg.
Papers, vol. 68, no. 1, pp. 483-495, Jan. 2021.
[9] Z. Xie and X. Zhang, “Scaled fast nested key equation solver for
generalized integrated interleaved BCH decoders,” in Proc. IEEE Int.
Conf. Acoust. Speech Signal Process., Toronto, Canada, Jun. 2021, pp.
7883-7887.
[10] E. R. Berlekamp, Algebraic Coding Theory, McGraw-Hill, 1968.
[11] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and
Implementation, John Wiley & Sons, 1999.
