10vlsisoc Sphere

This document summarizes a VLSI implementation of sphere decoding optimized for area and throughput. Sphere decoding is used for data detection in MIMO wireless systems. The authors propose architectural optimizations for sphere decoding cores to improve the area-delay product when multiple cores are used in wideband MIMO systems. These include a low-complexity Schnorr-Euchner enumeration and pipeline interleaving. VLSI results show the optimized implementation reduces the area-delay product by almost 50% compared to other reported sphere decoding implementations.

Area- and Throughput-Optimized VLSI Architecture of Sphere Decoding

Markus Wenk, Lukas Bruderer, Andreas Burg
Integrated Systems Laboratory
ETH Zurich, CH-8092 Zurich, Switzerland
E-mail: mawenk,bruderer,[email protected]
Christoph Studer
Communication Technology Laboratory
ETH Zurich, CH-8092 Zurich, Switzerland
E-mail: [email protected]
Abstract: Sphere decoding (SD) is a promising means for
implementing high-performance data detection in multiple-input
multiple-output (MIMO) wireless communication systems. In this
paper, we focus on the register transfer level implementation
of SD with minimum area-delay product for application in
wideband MIMO communication systems, such as IEEE 802.11n,
where multiple SD cores need to be instantiated. The basic
architectural considerations and the proposed optimizations are
explained based on hard-output SD, but are also applicable to
soft-output SD. Corresponding VLSI implementation results (for
both hard-output and soft-output SD) show an improvement in
the area-delay product by almost 50 % compared to that of other
SD implementations reported in the literature.
I. INTRODUCTION
The ability to increase throughput and range without requiring more bandwidth or transmit power renders multiple-input multiple-output (MIMO) communication the key technology for wideband communication standards [1]. The MIMO gains come, however, at the cost of (often significant) complexity required for data detection. Maximum-likelihood (ML) detection provides excellent error-rate performance, but a straightforward implementation requires exhaustively testing all possible transmit symbols. For high spectral efficiencies, the exponential increase of the number of candidate symbols (in the number of transmit antennas) is prohibitive, even for practical data rates.
The sphere decoding (SD) algorithm [2] is one of the most promising methods for ML detection in MIMO systems, since its average complexity is far below that of an exhaustive search. The basic idea behind SD is to transform MIMO detection into a weighted tree-search problem, which is then solved efficiently by a branch-and-bound procedure. The main drawback of this approach is that the decoding effort for SD is essentially determined by the number of nodes to be examined in that tree for each received symbol. For most VLSI implementations of SD, the number of visited nodes corresponds to the number of clock cycles required for each symbol [3]. This number depends on the channel and the noise realization. In the worst case, all nodes in the tree must be examined, corresponding to the (often prohibitive) complexity of an exhaustive search. Since on-chip storage and higher-layer requirements limit the latency that may be incurred to support the processing of symbols for which the decoding effort lies far above the average, the worst-case complexity of SD renders its application in real-world systems difficult. This problem can be mitigated by limiting the maximum decoding effort through early termination of the decoding process, e.g., [4]. Such constraints, however, lead to a tradeoff between the maximum decoding effort and the receiver performance. A universally applicable VLSI architecture for a MIMO detector suitable for wideband MIMO systems must therefore be tailored to provide a straightforward solution to adjust this tradeoff and minimize overall silicon area for a given minimum performance requirement.
Outline and Contributions: In this paper, we describe the design and optimization of an SD core that is suitable for wideband MIMO systems. To this end, we first review the SD algorithm and we argue that the optimization target for each SD core in a wideband system differs from that usually employed in narrow-band MIMO systems, where a single SD core can handle the throughput requirement (Sec. II). In Sec. III, we describe the register-transfer-level (RTL) architecture for hard-decision SD and propose a low-complexity approximation to the Schnorr-Euchner (SE) enumeration. We also introduce pipeline interleaving and analyze the level of pipelining required to yield the lowest area-delay product. In Sec. IV, we discuss our results and present a comparison to other SD implementations.
For better understanding and due to the limited page count, we focus on hard-output SD throughout the paper. The presented architecture, the proposed enumeration scheme, and pipeline interleaving can, however, also be applied to soft-output SD architectures (e.g., single tree-search (STS) SD [5]). To support this claim, corresponding performance and implementation results are presented.
II. SPHERE DECODING AND WIDEBAND
MIMO RECEIVER ARCHITECTURE
In the following, we introduce the MIMO system model, summarize the SD algorithm, and provide an overview of a wideband MIMO receiver architecture for the case where a single SD core is insufficient to meet the throughput requirements associated with a high communication bandwidth.
A. MIMO Detection as a Weighted Tree-Search Problem
System model: We consider a MIMO system employing spatial multiplexing with $M_T$ transmit and $M_R \geq M_T$ receive antennas. The data to be transmitted is mapped to $M_T$-dimensional transmit vectors $s \in \mathcal{O}^{M_T}$, where $\mathcal{O}$ is the complex-valued scalar constellation. The baseband input-output relation, as seen by the MIMO detector, is given by
$$y = Hs + n \tag{1}$$
where $H$ is the complex-valued $M_R \times M_T$ channel matrix and $n$ is an i.i.d. circularly symmetric complex Gaussian noise vector of dimension $M_R$. The ML detection rule for the input-output relation in (1) is given by
$$\hat{s} = \arg\min_{s \in \mathcal{O}^{M_T}} \|y - Hs\|^2. \tag{2}$$
Fig. 1. MIMO detection as a weighted tree-search problem illustrated for $M_T = 3$ and QPSK modulation.
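As a point of reference for the complexity discussion, the ML rule in (2) can be written as a brute-force search over all candidate vectors. The following Python sketch is illustrative only and is not taken from the paper; the function name `ml_detect_exhaustive`, the QPSK scaling, and the random noise-free test channel are assumptions made for this example.

```python
import itertools
import numpy as np

def ml_detect_exhaustive(y, H, constellation):
    """Return the vector s minimizing ||y - H s||^2 by testing every candidate."""
    num_tx = H.shape[1]
    best_s, best_metric = None, np.inf
    # |O|^M_T candidates: this is the exponential cost that SD tries to avoid.
    for candidate in itertools.product(constellation, repeat=num_tx):
        s = np.array(candidate)
        metric = np.linalg.norm(y - H @ s) ** 2
        if metric < best_metric:
            best_s, best_metric = s, metric
    return best_s, best_metric

# Hypothetical example: 3x3 channel, QPSK, no noise.
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(0)
H = (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))) / np.sqrt(2)
s_true = QPSK[[0, 3, 1]]
y = H @ s_true
s_hat, metric = ml_detect_exhaustive(y, H, QPSK)
```

For three transmit antennas and QPSK the loop tests 4^3 = 64 candidates; for four antennas and 64-QAM it would have to test 64^4, which illustrates why an exhaustive search does not scale.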
Sphere decoding: SD [6] starts from the QR decomposition of the channel matrix $H = QR$, with $Q$ being unitary of dimension $M_R \times M_T$ and $R$ being $M_T \times M_T$ upper-triangular. This decomposition allows (2) to be rewritten as
$$\hat{s} = \arg\min_{s \in \mathcal{O}^{M_T}} \|\tilde{y} - Rs\|^2 \tag{3}$$
with $\tilde{y} = Q^H y$. Thanks to the upper-triangularity of $R$, the minimization problem (3) can be interpreted as a weighted tree-search problem where the nodes of the tree on level $i$ are associated with a partial symbol vector $s^{(i)} = [\, s_i \ \cdots \ s_{M_T} ]^T$ and with a corresponding partial Euclidean distance (PED) $d_i\big(s^{(i)}\big)$. Fig. 1 illustrates the corresponding weighted tree for a MIMO system with $M_T = M_R = 3$ using QPSK modulation. When starting from the root of the tree (at level $i = M_T + 1$ with $d_{M_T+1} = 0$), the PEDs can efficiently be computed in a recursive manner according to
$$d_i\big(s^{(i)}\big) = d_{i+1}\big(s^{(i+1)}\big) + \big| b_{i+1} - R_{i,i} s_i \big|^2 \tag{4}$$
using the definition
$$b_{i+1} = \tilde{y}_i - \sum_{k=i+1}^{M_T} R_{i,k} s_k \tag{5}$$
when proceeding from a parent node on level $i + 1$ to one of its children on level $i$. The ML solution corresponds to the path through the tree leading to the leaf associated with the smallest PED. To find this leaf, SD traverses the tree in a depth-first manner. Complexity reduction (compared to an exhaustive search) is achieved by pruning those nodes from the tree for which $d_i\big(s^{(i)}\big)$ is larger than a radius $r > 0$. We use a technique known as radius reduction [6], which initializes the radius to $r \leftarrow \infty$ (prior to detection) and performs the radius update $r \leftarrow d_1\big(s^{(1)}\big)$ whenever a leaf node $s^{(1)}$ is reached. In the following, we refer to the condition $d_i\big(s^{(i)}\big) < r$ as the sphere constraint (SC).
Fig. 2. System architecture of a wideband MIMO receiver. (Blocks shown: OFDM demodulation; MIMO pre-processing with channel estimation and QR decomposition; MIMO detection with FIFO, scheduler with runtime limit, sphere decoders, and re-order buffer; channel decoding.)
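The depth-first search with radius reduction described above can be sketched in software as follows. This is a minimal Python model written for illustration, not the authors' RTL design: the function name `sphere_decode`, the recursive formulation, the zero-based level index, and the use of a full sort of the children at every node are assumptions of this sketch.

```python
import numpy as np

def sphere_decode(y, H, constellation):
    """Depth-first hard-output sphere decoder with radius reduction."""
    Q, R = np.linalg.qr(H)                 # reduced QR decomposition, H = Q R
    y_tilde = Q.conj().T @ y
    num_tx = H.shape[1]
    state = {"radius": np.inf, "best": None, "visited": 0}
    s = np.zeros(num_tx, dtype=complex)    # current partial symbol vector

    def search(level, ped_parent):
        # b as in (5): remove the contribution of the symbols already decided
        b = y_tilde[level] - np.dot(R[level, level + 1:], s[level + 1:])
        # PED increments of all children as in (4)
        increments = np.abs(b - R[level, level] * constellation) ** 2
        for idx in np.argsort(increments):  # ascending PED order
            ped = ped_parent + increments[idx]
            if ped >= state["radius"]:
                break                       # SC violated, also for all later children
            state["visited"] += 1
            s[level] = constellation[idx]
            if level == 0:                  # leaf reached: radius update
                state["radius"] = ped
                state["best"] = s.copy()
            else:
                search(level - 1, ped)

    search(num_tx - 1, 0.0)                 # root has PED 0
    return state["best"], state["radius"], state["visited"]

# Hypothetical example: 3x3 channel, QPSK, no noise.
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(0)
H = (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))) / np.sqrt(2)
s_true = QPSK[[0, 3, 1]]
s_hat, radius, visited = sphere_decode(H @ s_true, H, QPSK)
```

The counter `visited` corresponds to the number of examined nodes, which the text relates to the number of clock cycles per symbol in typical VLSI implementations.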
B. Wideband MIMO Receiver Architecture
In wideband MIMO systems, such as IEEE 802.11n, a single SD core is usually insufficient to support both the bandwidth and the (error-rate) performance requirements, even for advanced process technologies. Hence, multiple SD cores are necessary to meet the associated throughput and performance requirements.
Architecture overview: The high-level system architecture of a wideband MIMO receiver based on SD is illustrated in Fig. 2. The data flow starts with the OFDM demodulation. During a training phase, received training symbols are delivered to a MIMO preprocessing unit. This unit estimates the channel matrices H and performs necessary pre-computations on H (i.e., the QR decomposition). During the data phase, the demodulation unit and the MIMO preprocessing unit forward the received vectors and the results of the pre-computation of the corresponding channel matrices to the MIMO detector at a constant arrival rate, which is essentially given by the communication bandwidth of the system. In the MIMO detector, the information required to decode a symbol is first queued in a FIFO. A scheduler reads the entries of the FIFO and forwards them to the next idle SD core together with a runtime constraint (i.e., a constraint on the number of nodes that are allowed to be examined by SD). When the FIFO fills up, the runtime constraints are reduced to ensure that no data is lost. Note that this reduction degrades the quality of the detection (cf. footnote 1). The outputs from the N SD cores are collected and reordered since the variable runtime may cause decoded symbols to arrive out of order. The reordered symbol estimates are then forwarded to the channel-decoding block.
Footnote 1: The particularities of the scheduling mechanism and the associated performance tradeoffs are outside the scope of this paper, which focuses on the implications on the RTL optimization of the SD cores.
Fig. 3. High-level block diagram of the SD architecture. The shaded registers and the ring buffer (in the level cache) are only required when pipeline interleaving is applied.
Implications on SD core optimization: With the above-described architecture, the average decoding effort, i.e., the number of visited nodes that can be allocated for decoding of each symbol, is determined by
$$\bar{D} = \frac{N}{T_c B} \quad \text{[nodes]}$$
where $B$ denotes the bandwidth of the system (i.e., the arrival rate of the symbols to be decoded), $T_c$ is the clock period of an SD core (assuming one node in the tree is checked in each cycle), and $N$ is the number of SD instances. At the system level, the performance/complexity tradeoff can now be adjusted by the choice of $N$. The resulting area of such a system corresponds to $A_{\mathrm{tot}} = N A_{\mathrm{SD}}$, where $A_{\mathrm{SD}}$ denotes the silicon area of a single SD core. For large $N$, the overall silicon area for a guaranteed number of visited nodes $\bar{D}$ that can be used for decoding received symbols is given by
$$A_{\mathrm{tot}} \approx \bar{D}\, B\, \mathrm{AT}_{\mathrm{SD}} \quad \text{with} \quad \mathrm{AT}_{\mathrm{SD}} = T_c A_{\mathrm{SD}}. \tag{6}$$
From (6), it follows that if multiple SD cores are necessary to meet the performance requirements of a wideband MIMO system, the focus for the optimization of the SD core shifts from minimizing the area or maximizing the throughput to minimizing the corresponding area-delay (AT-)product $\mathrm{AT}_{\mathrm{SD}}$.
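The sizing argument above can be turned into a small calculation. The Python sketch below is only an illustration: the function names are made up for this example, the arrival rate of 20 million symbol vectors per second is a hypothetical operating point, and the clock frequency (383 MHz) and core area (97.1 kGE) are borrowed from the pipelined soft-output entry of Tbl. II.

```python
import math

def cores_required(avg_nodes, clock_hz, symbol_rate_hz):
    """Smallest N with N / (T_c * B) >= avg_nodes, where T_c = 1 / clock_hz."""
    return math.ceil(avg_nodes * symbol_rate_hz / clock_hz)

def total_area_estimate(avg_nodes, clock_hz, symbol_rate_hz, core_area_kge):
    """Large-N approximation of (6): avg_nodes * B * (T_c * A_SD), in kGE."""
    return avg_nodes * symbol_rate_hz * core_area_kge / clock_hz

# Hypothetical operating point: 100 nodes per symbol, 20e6 symbols per second.
n_cores = cores_required(100, 383e6, 20e6)
area_estimate = total_area_estimate(100, 383e6, 20e6, 97.1)
area_exact = n_cores * 97.1   # A_tot = N * A_SD with an integer number of cores
```

Because N must be an integer, the exact area `area_exact` is never smaller than the estimate; the gap shrinks relative to the total as N grows, which is why (6) is stated for large N.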
III. VLSI ARCHITECTURE OF HARD-OUTPUT SD
On the first level of hierarchy, the proposed SD architecture is similar to the one proposed in [3]. In the following, we summarize this architecture and describe a number of optimizations that result in an improved AT-product compared to previously reported SD implementations.
A. High-level Architecture
Fig. 3 shows the high-level block diagram of the proposed
SD circuit. The design is comprised of a metric computation
unit (MCU), a metric enumeration unit (MEU), an SC check
unit, a level-select multiplexer, and a cache.
The MCU is responsible for the forward iteration of the depth-first tree traversal. In the implementation of [3], this forward iteration includes the sequential evaluation of (5) and the computation of the PED in (4). In the present circuit (cf. Fig. 4), a slicer unit performs a decision on the nearest constellation point and the MCU computes $b_i$ (instead of $b_{i+1}$) in parallel to the PED of level $i + 1$, as proposed in [7]. The resulting $b_i$ is then used in the next iteration (provided that the SC is met); this optimization reduces the critical path without the need for additional hardware.
The MEU operates in parallel to the MCU. While the MCU is processing a node on layer $i$, the MEU selects the next-best constellation point on layer $i + 1$ according to an enumeration scheme and computes its PED. Hence, once the SD algorithm needs to move upward in the tree, the MCU can directly start the next forward iteration, as all required intermediate results have already been computed beforehand by the MEU. The RTL architecture of the MEU (cf. Fig. 4) is similar to the one of the MCU. However, the slicer unit that determines the closest constellation point (CP) is replaced by an enumeration unit that determines which CP should be considered next on layer $i + 1$.
Fig. 4. RTL block diagram of the MCU and MEU. The shaded registers are only required when pipeline interleaving is applied.
The cache stores intermediate results for each level computed by the MEU and the MCU. The SC check is carried out immediately after the computation of the new PEDs. MEU, MCU, level cache, and the result of the SC check decide on which layer the SD algorithm proceeds next. If a leaf that fulfills the SC is found, the radius is updated. In this case an additional clock cycle is necessary, as the PEDs in the level cache need to be checked against the new radius.
B. Enumeration Strategy
The enumeration strategy (implemented by the enumeration unit in the MEU) defines the order in which the children of a node are visited. Radius reduction (cf. Section II-A) is most efficient in combination with the Schnorr-Euchner (SE) enumeration [8], which visits the children of a node in ascending order of their PEDs. An important advantage of this enumeration strategy is that leaves that are more likely to lead to the ML solution are found early, which expedites the pruning of the tree. Moreover, enumeration of the children of a node can terminate as soon as the first child violates the SC.
Implementation of Schnorr-Euchner enumeration: For each visited node, SE enumeration comprises two types of operations. The first operation is to initialize the enumeration of the children by identifying the child associated with the smallest PED. This task can easily be accomplished by comparing $b_{i+1}$ in (5) to a number of decision boundaries, i.e., by performing a slicing operation in the MCU of Fig. 3. The second type of operation is to enumerate the remaining children in ascending order of their PEDs, which is a non-trivial task for complex-valued constellations. In order to minimize the AT-product $T_c A_{\mathrm{SD}}$ of the SD core, an efficient implementation of this operation is of paramount importance.
Exhaustive enumeration: Exhaustive enumeration is a straightforward (but rather inefficient) solution to perform SE enumeration [3]. The idea is to first compute the PEDs of all children of a node. During enumeration, a min-search (limited to the subset of children that have not yet been visited) identifies the next child. The main drawbacks of this solution are i) the area requirement to compute the PEDs of all children of a node, ii) the need to store them in the cache, and iii) the fact that a min-search is costly in terms of area and timing, especially for higher-order constellations.
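The exhaustive scheme just described (compute all PEDs, then repeatedly run a min-search over the children not yet visited) can be modeled in a few lines. The Python sketch below is an illustration only; the generator name `se_enumerate_exhaustive` and the unnormalized 16-QAM grid are assumptions of this example, not details from [3].

```python
import numpy as np

def se_enumerate_exhaustive(b, r_ii, constellation):
    """Yield (symbol, PED increment) pairs in ascending order of |b - r_ii * s|^2."""
    increments = np.abs(b - r_ii * constellation) ** 2   # PEDs of all children
    visited = np.zeros(len(constellation), dtype=bool)
    for _ in range(len(constellation)):
        # min-search restricted to the children that have not been visited yet
        masked = np.where(visited, np.inf, increments)
        idx = int(np.argmin(masked))
        visited[idx] = True
        yield constellation[idx], increments[idx]

# Hypothetical example: unnormalized 16-QAM grid and a received point b.
levels = np.array([-3, -1, 1, 3])
QAM16 = np.array([re + 1j * im for re in levels for im in levels])
sequence = list(se_enumerate_exhaustive(0.8 + 0.3j, 1.0, QAM16))
```

The model makes the three drawbacks visible: `increments` holds one PED per child, the whole array has to be kept while the node is active, and every step repeats a minimum search over the full constellation.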
Subset enumeration: More elaborate solutions for SE enumeration were presented in [3], [6], and [9]. The main idea of these approaches is to divide the complex-valued (two-dimensional) constellation into one-dimensional subsets, so that only one PED per subset needs to be computed and stored, which also reduces the complexity of the min-search. Unfortunately, the number of required subsets gets large for higher-order modulation schemes, which has considerable impact on circuit area and timing of implementations supporting 64-QAM.
C. Approximate SE Enumeration
The goal of considering approximations to SE enumeration is to perform the enumeration without the need for computing, caching, and comparing PEDs for multiple candidate CPs on the same level. Such approximations based on geometrical considerations were first proposed in [10] and [11]. The basic idea is to store predefined enumeration sequences in one or multiple look-up tables (LUTs). A fixed sequence is chosen based on several geometric rules that analyze the position of the received point $b_{i+1}$ relative to the closest CP. The accuracy of these techniques can be adjusted by the number and complexity of the associated selection criteria together with the number of predefined LUTs. The major drawback of this approach is the rather poor scaling behavior of the size of the LUTs required for higher-order modulation schemes.
Ordered $\ell_f$-Norm Enumeration: In the following, we describe an approximation to SE enumeration that can be implemented efficiently in hardware without the need for LUTs and therefore scales well to higher-order constellations (i.e., constellations including and beyond 64-QAM).
Inspired by the $\ell_f$-norm SD algorithm [3], [12], we define the $\ell_f$-norm of a vector $x$ according to $\|x\|_f = \max\{|\Re(x)|, |\Im(x)|\}$, where $\Re(x)$ and $\Im(x)$ denote the real and imaginary part of the entries of $x$, respectively. The starting point for the enumeration is trivially determined by the closest CP (in Euclidean distance). However, the CPs are enumerated (cf. footnote 2) according to their $\ell_f$-norm distance
$$d_f = \big| b_{i+1} - R_{i,i} s_i \big|_f = \max\big\{ \big|\Re(b_{i+1} - R_{i,i} s_i)\big|,\ \big|\Im(b_{i+1} - R_{i,i} s_i)\big| \big\}$$
from $b_{i+1}$. To this end, the area around the closest CP is first subdivided into eight sectors as illustrated in the lower right corner of Fig. 5. The sector containing $b_{i+1}$ is identified with simple geometric rules to define the second CP in the enumeration and the direction for the ordered $\ell_f$-norm enumeration. CPs with identical $\ell_f$-norm form one-dimensional subsets. All nodes within the same subset are processed before the algorithm selects the next subset. In the example provided in Fig. 5, the processing order of the one-dimensional subsets is illustrated by the leading number attached to each CP. Within each subset, zig-zag enumeration is applied around the CP closest to $b_{i+1}$; this is illustrated by the trailing number in Fig. 5. The members of each subset are returned in SE order and subsets are enumerated in order of increasing $\ell_f$-norm.
Footnote 2: We use the $\ell_f$-norm only for enumeration, whereas the algorithm in [3], [12] also uses it for distance computations.
Fig. 5. Principle of ordered $\ell_f$-norm enumeration for 64-QAM modulation.
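The resulting visiting order (subsets in order of increasing norm distance, SE order within each subset) can be emulated in software by sorting on two keys. The Python sketch below is a reference model of the ordering only, not of the sector logic and counters used in the hardware; the function name `ordered_lf_norm_sequence`, the assumption of a real-valued diagonal entry `r_ii`, and the unnormalized 64-QAM grid are choices made for this example.

```python
import numpy as np

def ordered_lf_norm_sequence(b, r_ii, constellation):
    """Order children by max(|Re|, |Im|) distance, ties by Euclidean distance."""
    # r_ii is assumed to be real so that both parts can be treated separately
    d_re = np.abs(b.real - r_ii * constellation.real)
    d_im = np.abs(b.imag - r_ii * constellation.imag)
    d_f = np.maximum(d_re, d_im)     # norm distance used for ordering the subsets
    d_2 = d_re ** 2 + d_im ** 2      # PED increment, used only inside a subset
    order = np.lexsort((d_2, d_f))   # last key (d_f) is the primary sort key
    return constellation[order]

# Hypothetical example: unnormalized 64-QAM grid and a received point b.
levels = np.arange(-7, 8, 2)
QAM64 = np.array([re + 1j * im for re in levels for im in levels])
b = 0.8 + 0.3j
sequence = ordered_lf_norm_sequence(b, 1.0, QAM64)
exact_se = QAM64[np.argsort(np.abs(b - QAM64) ** 2)]
```

For this example the first three entries of `sequence` agree with the exact SE order in `exact_se`, while later entries are not guaranteed to, which matches the approximation character of the scheme described in the text.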
RTL architecture: For the RTL implementation, the above-described enumeration algorithm can be split into two basic tasks: i) tracking of the position, size, and orientation of the linear subsets, and ii) zig-zag enumeration within the subsets and checking for the boundaries of the finite-size modulation alphabet. Both tasks can be implemented using simple combinational logic, comparators, and three counters. Hence, the required circuit complexity is low.
Impact on error-rate performance and number of visited nodes: Besides a reduction of the hardware complexity, the approximation to the SE enumeration has an impact on the number of visited nodes and on the (error-rate) performance of the SD algorithm. The reason for this impact is that the approximation does not guarantee that the children of a node are always enumerated strictly in ascending order of their PEDs (only the first three CPs always correspond to the first three CPs obtained by SE enumeration). Hence, numerical simulations are performed to verify that the error-rate implementation loss due to the approximation of the enumeration is low and that the number of visited nodes does not increase substantially. Corresponding results (cf. footnote 3) for hard- and soft-output SD are shown in Fig. 6. It can be seen that the loss in terms of the coded frame-error rate (FER) performance is negligible and that the number of visited nodes with $\ell_f$-norm enumeration is slightly lower (by approximately 5 %) than with exact SE enumeration.
Footnote 3: We consider coded (rate 2/3 convolutional code, constraint length 7, generator polynomials $[133_o\ 171_o]$, and random interleaving across space and frequencies) MIMO-OFDM transmission with $M_R = M_T = 4$, 64-QAM (Gray mapping), and 64 OFDM tones. One frame corresponds to 1536 coded bits. A TGn type C [13] channel model is used. We assume perfect channel state information at the receiver and employ minimum mean-square error sorted QR decomposition (MMSE-SQRD) [14] for SD preprocessing. The SNR is per receive antenna.
Fig. 6. FER performance and average number of visited nodes for ordered $\ell_f$-norm and SE enumeration ($M_T = M_R = 4$ using 64-QAM).
D. Pipeline Interleaving
Pipelining cannot directly be applied to SD due to the first-order
feedback path present in the architecture. Nevertheless,
symbol-wise pipeline interleaving can be used to shorten the
critical path. The main idea of this approach is to process
multiple (independent) symbol-vectors in parallel within the
same circuit. This basic idea has already been suggested for
SD [15], [16], but neither details on suitable locations of the
pipeline registers, nor a discussion of the number of pipeline
stages yielding the optimal AT-product has been provided.
Fig. 3 and Fig. 4 show the location of the pipeline registers
(in light grey) in the RTL architecture for three pipeline stages.
The location was manually chosen to approximately balance
the path delays between the pipeline stages and register-
retiming during synthesis was allowed for further optimization.
Besides adding the pipeline registers in the datapath, the level
cache in Fig. 3 was extended to a ring-buffer in which each
entry is associated with one of the symbols in the pipeline and
corresponds to one instance of the original level cache.
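The effect of symbol-wise pipeline interleaving on the cycle count can be illustrated with a toy model. The Python sketch below is a simplification made for this example and not a description of the actual circuit: it assumes that each symbol vector needs a known number of node visits, that the slots of the ring buffer are served in strict round-robin order, and that a free slot is refilled as soon as it is served. The function name `interleaved_cycles` is hypothetical.

```python
from collections import deque

def interleaved_cycles(node_counts, num_stages):
    """Clock cycles needed when num_stages independent searches share one datapath."""
    pending = deque(node_counts)      # symbol vectors waiting to be decoded
    slots = [0] * num_stages          # remaining node visits of the symbol in each slot
    cycles = 0
    while pending or any(slots):
        slot = cycles % num_stages    # each slot is served once every num_stages cycles
        if slots[slot] == 0 and pending:
            slots[slot] = pending.popleft()
        if slots[slot] > 0:
            slots[slot] -= 1          # one tree node of this symbol is examined
        cycles += 1
    return cycles

# Thirty symbol vectors with seven node visits each.
unpipelined = interleaved_cycles([7] * 30, 1)
three_stage = interleaved_cycles([7] * 30, 3)
```

In this model the number of cycles is the same in both cases as long as all slots stay busy, so the benefit of interleaving comes entirely from the shorter clock period that the additional pipeline registers allow. If fewer symbol vectors than slots are available, some cycles are wasted, as `interleaved_cycles([7] * 4, 3)` shows.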
IV. IMPLEMENTATION RESULTS AND COMPARISON
A. Results for Hard-Output SD
The AT-diagram in Fig. 7 shows the implementation results of hard-output SD with ordered $\ell_f$-norm enumeration and pipeline interleaving for different numbers of pipeline stages (cf. footnote 4). The proposed architectures have been implemented with support for multiple modulation schemes (BPSK, QPSK, 16-QAM, and 64-QAM) and for up to four spatial streams. The architecture with three pipeline stages achieves the best AT-product. The architectures with more than three pipeline stages also come close to AT-optimality, whereas the architectures with fewer pipeline stages are clearly outperformed in terms of hardware efficiency. For comparison, implementation results of previous hard-output SD implementations are also included in Fig. 7 (the results are also summarized in Tbl. I). It can be seen that the proposed unpipelined hard-output SD architecture outperforms previous unpipelined designs by at least 23% in terms of area and by at least 28% in terms of clock frequency (cf. footnote 5). Furthermore, the AT-product [kGE/MHz] of the proposed architecture with pipeline interleaving is more than a factor of two better than that of a previously reported implementation [15] with pipeline interleaving.
Footnote 4: The results were obtained by synthesizing the RTL description in VHDL with different timing constraints.
Fig. 7. AT-diagram of hard-output SD with different numbers of pipeline stages (horizontal axis: clock period in ns, 0 to 10; vertical axis: area in kGE, 0 to 80; curves: unpipelined and 2 to 6 pipeline stages, together with lines of constant AT). The optimal synthesis results (in terms of the area/delay tradeoff) are highlighted by circles and implementation results of previous architectures are indicated by stars. All designs are scaled to 0.13 µm CMOS technology.
B. Application to Soft-Output STS-SD
The proposed enumeration scheme and pipeline interleaving can also be applied to soft-output SD. The corresponding architecture is based on the soft-output single tree-search (STS) algorithm proposed in [5]. Fig. 6 demonstrates that, also for STS-SD, the FER performance loss due to the proposed $\ell_f$-norm enumeration scheme is negligible and the average number of visited nodes is slightly reduced. Implementation results for soft-output STS-SD with the proposed $\ell_f$-norm enumeration scheme are shown in Tbl. II and compared to previous soft-output detection implementations. The presented implementation is clearly superior in terms of area and clock frequency compared to the soft-output detector shown in [11]. The original implementation of soft-output STS-SD in [5] only supports 16-QAM modulation, which is the main reason for the smaller area in the unpipelined case. For hard-output SD, pipeline interleaving with three pipeline stages proved to be Pareto-optimal. As the additional units required for STS-SD do not influence the critical path, STS-SD was also implemented with three pipeline stages. Tbl. II shows that the AT-product is improved by more than 30 % due to pipeline interleaving.
Footnote 5: The clock frequencies of the designs are scaled to a 0.13 µm CMOS technology.

TABLE I
IMPLEMENTATION RESULTS AND COMPARISON OF HARD-OUTPUT SD.

| | [3] | [5] | [15] | This work | This work | This work |
|---|---|---|---|---|---|---|
| CMOS technology | 0.25 µm | 0.25 µm | 0.13 µm | 0.13 µm | 0.13 µm | 0.13 µm |
| Antennas | 4×4 | 4×4 | 4×4 | 1×1 to 4×4 | 1×1 to 4×4 | 1×1 to 4×4 |
| Modulation | 16-QAM | 16-QAM | 16-QAM | BPSK to 64-QAM | BPSK to 64-QAM | BPSK to 64-QAM |
| Norm | $\ell_f$ | $\ell_2$ | $\ell_f$ | $\ell_2$ | $\ell_2$ | $\ell_2$ |
| Enumeration | SE | SE | SE | ordered $\ell_f$-norm | ordered $\ell_f$-norm | ordered $\ell_f$-norm |
| Pipeline stages | no | no | 3 | no | 3 | 5 |
| Area (a) [kGE] | 50 | 34.4 | 70 | 27.1 | 38.4 | 55.3 |
| Freq. [MHz] | 137 (b) | 140 (b) | 333 | 196 | 455 | 625 |
| AT-product [kGE/MHz] | 0.37 | 0.25 | 0.21 | 0.14 | 0.08 | 0.09 |
| Throughput for $D_{avg} = 7$ (c) [Mbps] | 470 | 480 | 1141 | 672 | 1560 | 2143 |

TABLE II
IMPLEMENTATION RESULTS AND COMPARISON OF SOFT-OUTPUT SD FOR A 4×4 MIMO-OFDM SYSTEM.

| | [11] | [5] | This work | This work |
|---|---|---|---|---|
| CMOS technology | 0.13 µm | 0.25 µm | 0.13 µm | 0.13 µm |
| Modulation | 64-QAM | 16-QAM | BPSK to 64-QAM | BPSK to 64-QAM |
| Enumeration | tabular | SE | ordered $\ell_f$-norm | ordered $\ell_f$-norm |
| Pipeline stages | no | no | no | 3 |
| Area (a) [kGE] | 350 | 56.8 | 70.4 | 97.1 |
| Max. frequency [MHz] | 198 | 137 (b) | 183 | 383 |
| AT-product [kGE/MHz] | 1.77 | 0.41 | 0.38 | 0.25 |

(a) One GE corresponds to the area of a two-input drive-one NAND gate.
(b) Scaled from 0.25 µm to 0.13 µm by multiplying by 0.25/0.13.
(c) $D_{avg}$ denotes the average number of nodes used for block processing [4].

C. The Case for Multiple SD-Cores
In Section II, we argued that a single SD core is insufficient to meet the bandwidth and error-rate performance requirements of modern wireless communication standards such as IEEE 802.11n, where a throughput of 600 Mbps is required. From Tbl. I, we observe that hard-output SD meets the throughput requirement when early termination and block processing according to [4] are applied.
For soft-output STS-SD, the number of visited nodes is significantly increased: from seven for hard-output SD to at least 100 for soft-output STS-SD (cf. footnote 6). To illustrate the necessity for multiple soft-output STS-SD cores, we hypothetically assume $D_{avg} = 100$ for 64-QAM modulation. The throughput of one STS-SD core is then 92 Mbps. To fulfill the throughput requirement of 802.11n, up to seven STS-SD cores are required.
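The throughput figures quoted above can be reproduced with a short calculation. The Python sketch below assumes that the throughput equals the clock frequency times the number of bits per symbol vector (four spatial streams with six bits per 64-QAM symbol) divided by the average number of visited nodes; this relation is inferred here from the numbers in Tbl. I and the text rather than stated explicitly in the paper, and the function name `throughput_mbps` is hypothetical.

```python
import math

def throughput_mbps(clock_mhz, bits_per_vector, avg_nodes):
    """Average throughput when one tree node is examined per clock cycle."""
    return clock_mhz * bits_per_vector / avg_nodes

bits_4x4_64qam = 4 * 6   # four spatial streams, six bits per 64-QAM symbol

hard_unpipelined = throughput_mbps(196, bits_4x4_64qam, 7)     # Tbl. I, this work, unpipelined
hard_three_stage = throughput_mbps(455, bits_4x4_64qam, 7)     # Tbl. I, this work, 3 stages
soft_three_stage = throughput_mbps(383, bits_4x4_64qam, 100)   # Tbl. II, this work, 3 stages
cores_for_600_mbps = math.ceil(600 / soft_three_stage)
```

Under these assumptions the hard-output values match the throughput row of Tbl. I, the soft-output core reaches roughly 92 Mbps, and seven such cores are needed to exceed 600 Mbps.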
Footnote 6: An in-depth evaluation of the number of visited nodes for 64-QAM goes beyond the scope of this paper and involves optimizations of different parameters (e.g., clipping level, run-time constraint, SNR requirement). We expect, based on simulation results, a hundred to a few hundred nodes to be visited. For 16-QAM, the numbers have been presented in [5].
V. CONCLUSION
To meet the throughput and latency requirements of wideband systems (e.g., IEEE 802.11n) with sphere decoding (SD), multiple detection cores need to be instantiated. Therefore, the efficiency (i.e., the area-delay product) of a single SD core needs to be optimized. To this end, two techniques, namely ordered $\ell_f$-norm enumeration and pipeline interleaving, have been proposed. The enumeration scheme significantly reduces circuit area and the critical path delay. Simulations also showed that the performance loss due to the new enumeration scheme is negligible. With pipeline interleaving, multiple independent symbol vectors are processed in parallel and the available hardware resources are better exploited. A design-space exploration with different numbers of pipeline stages revealed that the architecture with three pipeline stages is the most efficient. With these two approaches, the area-delay product is improved by almost 50 % compared to that of other SD implementations. Finally, we showed that both approaches can also be applied to soft-output SD.
ACKNOWLEDGMENT
The authors thank H. Friederich, P. Luethi, N. Felber, W. Fichtner, and H. Bölcskei for their support during the design of the SD architecture. This work was partially supported by the STREP project MASCOT (IST-026905) within the Sixth Framework of the European Commission and by the Swiss National Science Foundation project No. PP002-119052.
REFERENCES
[1] H. B olcskei, D. Gesbert, C. Papadias, and A. J. van der Veen, Eds.,
Space-Time Wireless Systems: From Array Processing to MIMO Com-
munications. Cambridge Univ. Press, 2006.
[2] U. Fincke and M. Pohst, Improved methods for calculating vectors of
short length in a lattice, including a complexity analysis, Mathematics
of Computation, vol. 44, no. 170, pp. 463471, Apr. 1985.
[3] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and
H. B olcskei, VLSI implementation of MIMO detection using the sphere
decoding algorithm, IEEE J. Solid-State Circuits, vol. 40, no. 7, pp.
15661577, Jul. 2005.
[4] A. Burg, M. Borgmann, M. Wenk, C. Studer, and H. B olcskei, Ad-
vanced receiver algorithms for MIMO wireless communications, in
DATE 06: Proc. of the conf. on design, automation and test in Europe,
Mar. 2006, pp. 593598.
[5] C. Studer, A. Burg, and H. B olcskei, Soft-output sphere decoding:
Algorithms and VLSI implementation, IEEE J. Sel. Areas Commun.,
vol. 26, no. 2, pp. 290300, Feb. 2008.
[6] B. M. Hochwald and S. ten Brink, Achieving near-capacity on a
multiple-antenna channel, IEEE Trans. Commun., vol. 51, no. 3, pp.
389399, Mar. 2003.
[7] E. M. Witte, F. Borlenghi, G. Ascheid, R. Leupers, and H. Meyr, A
scalable VLSI architecture for soft-input soft-output depth-rst sphere
decoding, 2009, available online at http://arxiv.org/abs/0910.3427.
[8] E. Agrell, T. Eriksson, A. Vardy, and K. Z. r, Closest point search in
lattices, IEEE Trans. Inf. Theory, vol. 48, no. 8, pp. 22012214, Aug.
2002.
[9] C. Hess, M. Wenk, A. Burg, P. Luethi, C. Studer, N. Felber, and
W. Fichtner, Reduced-complexity MIMO detector with close-to ML
error rate performance, in Proc. 17th ACM Great Lakes Symposium on
VLSI, 2007, pp. 200203.
[10] B. Mennenga and G. Fettweis, Search sequence determination for tree
search based detection algorithms, in IEEE Sarnoff Symposium, Apr.
2009, pp. 16.
[11] L. Chun-Hao, W. To-Ping, and C. Tzi-Dar, A 74.8 mw soft-output
detector IC for 88 spatial-multiplexing MIMO communications, IEEE
J. Solid-State Circuits, vol. 45, no. 2, pp. 411421, Feb. 2010.
[12] D. Seethaler and H. B olcskei, Performance and complexity analysis of
innity-norm sphere-decoding, IEEE Trans. Inf. Theory, vol. 56, no. 3,
pp. 10851105, Mar. 2010.
[13] V. Erceg and et al., TGn channel models. IEEE 802.11-03/940r4, May
2004.
[14] D. Wübben, R. Böhnke, V. Kühn, and K.-D. Kammeyer, "MMSE
extension of V-BLAST based on sorted QR decomposition," in IEEE 58th
Vehicular Technology Conference, Oct. 2003, pp. 508–512.
[15] A. Burg, M. Wenk, and W. Fichtner, "VLSI implementation of pipelined
sphere decoding with early termination," in Proceedings of the European
Signal Processing Conference, Sep. 2006, invited paper.
[16] J. Lee, S.-C. Park, and S. Park, "A pipelined VLSI architecture for a
list sphere decoder," in Proc. IEEE Int. Symp. on Circuits and Systems
(ISCAS'06), Sep. 2006, pp. 397–400.
