A reconfigurable architecture for MIMO detection using CORDIC operator


Hongzhi Wang, Pierre Leray and Jacques Palicot
IETR/Supelec
Campus de Rennes
Av. de la Boulais, BP 81127
35511 Cesson-sevigne, France
{hongzhi.wang, pierre.leray, jacques.palicot}@supelec.fr

Abstract— An implementation of a reconfigurable architecture for MIMO V-BLAST (Vertical Bell Laboratories Layered Space-Time) detection based on the square root algorithm is presented. The decoder supports MIMO systems with various numbers of antennas, different throughputs and different signal constellations. The decoder architecture is based on a variable number of CORDIC (COordinate Rotation DIgital Computer) operators. The system prototype of the decoder reaches a 600 Mbit/s data rate on a Xilinx Virtex-II FPGA for a 2-antenna system with a QPSK signal constellation.

Index Terms—MIMO, V-BLAST, Square Root algorithm, CORDIC.

I. INTRODUCTION

To meet the demand for higher data rates and more system capacity without increasing bandwidth, an emerging technology called Multiple-Input Multiple-Output (MIMO) has appeared. It is well known that an extraordinary spectral efficiency can be achieved in MIMO systems [1]. MIMO is one of the most promising technologies for improving the performance of a wireless link. For example, it will be adopted in the next phase of the 3GPP (3rd Generation Partnership Project) standards in order to further increase the HSDPA (High-Speed Downlink Packet Access) system capacity and enhance the quality of Internet and multimedia services. MIMO is also a candidate for delivering the high performance expected of 4G broadband wireless systems for future mobile services [2]. In order to be used in these wireless standards, future MIMO systems will need to support multiple air interfaces and modulation formats. These are the reasons for the recent interest in reconfigurable architectures for MIMO systems.

MIMO decoders are generally implemented using DSP (Digital Signal Processing) processors. However, DSP processors cannot achieve high throughput in highly parallel 3G/4G applications. ASIC implementations, the traditional architectural solution, are the most computationally efficient, but they are not flexible enough to cope with the wide diversity of future systems. FPGAs are widely used in signal processing because of their reconfigurability and support for parallelism. An implementation of the square root algorithm was realized by Z. Guo in [5], but it is not adaptable to the different requirements of future systems. We propose here an FPGA implementation of the square root algorithm for V-BLAST detection which is based on a variable number of CORDIC operators. We show that this square root detector can be reconfigured to adapt to various numbers of antennas, different signal constellations and different throughputs.

In this paper we first give an overview of MIMO detection techniques in Section II. The square root algorithm is briefly described in Section III. The reconfigurable architecture for the square root decoder is detailed in Section IV. The throughput is analysed in Section V. The experimental results and performance analysis are provided in Section VI. The conclusions are stated in Section VII.

II. OVERVIEW OF MIMO DETECTION

The multiple-antenna system with M transmit antennas and N ≥ M receive antennas is modeled in baseband by the following relation:

$$ r = Hs + v \tag{1} $$

In relation (1), $s = [s_1, s_2, \ldots, s_M]^T$ is the transmitted symbol vector, in which each component $s_i$ is independently drawn from a complex constellation. The total transmit power is normalized to unity.
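As a small illustration of the power normalization just stated, the sketch below draws a symbol vector $s$ for M transmit antennas and scales each component by $1/\sqrt{M}$, so that the total transmit power $\sum_i |s_i|^2$ equals one. It assumes a QPSK constellation (the text allows any complex constellation) and equal power per antenna; the function name `qpsk_symbol_vector` is hypothetical.

```python
import cmath
import math
import random


def qpsk_symbol_vector(m, rng):
    """Draw s = [s_1, ..., s_M]^T with i.i.d. QPSK components (assumed constellation).

    Each unit-magnitude QPSK point is scaled by 1/sqrt(M), so the total
    transmit power sum(|s_i|^2) equals 1 up to floating-point error.
    """
    # The four QPSK points on the unit circle at 45, 135, 225 and 315 degrees.
    constellation = [cmath.exp(1j * math.pi * (2 * k + 1) / 4) for k in range(4)]
    scale = 1.0 / math.sqrt(m)
    return [scale * rng.choice(constellation) for _ in range(m)]


rng = random.Random(0)          # fixed seed for reproducibility
s = qpsk_symbol_vector(2, rng)  # M = 2 transmit antennas, as in the prototype
total_power = sum(abs(x) ** 2 for x in s)
print(round(total_power, 12))   # 1.0
```

With this scaling, each of the M antennas carries 1/M of the unit total power.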
The vector $r = [r_1, r_2, \ldots, r_N]^T$ is the received symbol vector and $v = [v_1, v_2, \ldots, v_N]^T$ is an independent and identically distributed (i.i.d.) complex zero-mean Gaussian noise vector with variance $\sigma^2$ per dimension. The elements $h_{ij}$ of H represent the complex channel gain between the j-th transmit antenna and the i-th receive antenna. These path gains are modeled as independent complex Gaussian random variables with zero mean and variance 0.5 per dimension. The channel characteristics do not change during the transmission period of an entire frame, in accordance with the quasi-static flat fading assumption.

Among MIMO detection algorithms, the optimal ML (maximum likelihood) detector is too complex to be implemented for a system with a large number of antennas and a large signal constellation. The sphere detector has a higher complexity than the V-BLAST square root detector [5]. Linear detectors such as MMSE (Minimum Mean Squared Error) and ZF (Zero-Forcing) have poor BER (bit error rate) performance. Hence the square root detector is an attractive solution for obtaining high performance with reasonable complexity.

III. DECODING ALGORITHM

The V-BLAST square root algorithm proposed in [4] avoids the repeated pseudo-inverse and matrix inverse computations by using unitary transformations. The computational cost is effectively reduced from $O(M^4)$ to $O(M^3)$ without degradation in BER performance, where M is the number of transmit antennas. The whole algorithm is described in the following steps:

A) Compute $P^{1/2}$ and $Q_\alpha$: for $i = 1, 2, \ldots, N$:

$$
\begin{bmatrix}
1 & (H)_i P_{i-1} \\
0_{M\times 1} & P_{i-1} \\
-e_i & Q_{i-1}
\end{bmatrix}
\Theta_i =
\begin{bmatrix}
\times & 0_{1\times M} \\
\times & P_i \\
\times & Q_i
\end{bmatrix}
\tag{2}
$$

In this relation, $P_0 = \beta I$, $Q_0 = 0_{N\times M}$, $e_i$ is the i-th unit vector of dimension N, $\Theta_i$ is any unitary transformation that block lower triangularizes the pre-array, and × denotes a result that is ignored. After N steps, we obtain $P^{1/2} = P_N$ and $Q_\alpha = Q_N$.

B) Determine the optimal ordering and nulling vectors: for $i = M, M-1, \ldots, 1$:

B1) Find the minimum-length row of $P^{1/2}$ and permute it to be the last (M-th) row. Permute s accordingly.

B2) Find a unitary transformation $\Sigma_i$ that block upper triangularizes $P_i^{1/2}$:

$$
P_i^{1/2}\,\Sigma_i =
\begin{bmatrix}
P_{i-1}^{1/2} & \times_{(i-1)\times 1} \\
0 & p_i
\end{bmatrix}
\tag{3}
$$

B3) Update $Q_\alpha$ to $Q_\alpha \Sigma_i$. The nulling vector for the i-th signal is given by

$$ w_i = p_i\, q_{\alpha,i}^{*} \tag{4} $$

where $q_{\alpha,i}^{*}$ is the i-th column of $Q_\alpha^{*}$.

B4) Compute $y_i = w_i r$; the i-th transmitted signal in s is then detected as the closest point in the signal constellation.

B5) Cancel the interference of the detected signal from the remaining received signal r:

$$ r \leftarrow r - s_i\,(H)_i \tag{5} $$

B6) Go back to step B1, but now with $P_{i-1}^{1/2}$ and $Q_{\alpha,i-1}$ (the first i−1 columns of $Q_\alpha$).

IV. ARCHITECTURE

Fig. 1. Block diagram of square root decoder architecture. [Labels recoverable from the diagram: input data H and r; modules M1, M2 and M3 performing the unitary transformations $\Theta_i$, $\Sigma_i$ and $\Sigma_i$ and producing $P^{1/2}$, $Q_\alpha$, $p_i$ and $q^*_{\alpha,i}$; M4, calculation of nulling vectors $w_i$; M5, calculation of $y_i$ and $s_i$; M6, interference cancellation; output data.]

The architecture of the MIMO square root decoder is illustrated in figure 1. It consists of 6 processing modules. The channel matrix H and the received vector r are assumed to have been pre-calculated. The first three modules (M1, M2, M3) use unitary transformations to compute $P^{1/2}$ (step A), $Q_\alpha$ (step A), $p_i$ (step B2) and $q^*_{\alpha,i}$ (step B3) by employing various numbers of CORDIC operators. The following module (M4) calculates the optimal ordering and the nulling vectors $w_i$ (step B3). Module M5 computes the transmitted symbol vector (step B4). The last module (M6) performs interference cancellation (step B5).

The modules M1, M2 and M3 have similar architectures [6]. Instead of the conventional triangular QR array, which employs too many processors, unitary transformations are used in these modules. Unitary transformations are performed by a sequence of numerically stable complex Givens rotations, which are well suited to hardware implementation.
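The sketch below illustrates, under stated assumptions, how a Givens rotation can be carried out with CORDIC using only shift-and-add steps: vectoring mode rotates a pair (a, b) onto the x-axis and returns the rotation angle, and rotation mode applies that same angle to other pairs. It is the textbook CORDIC technique, not the authors' VHDL design; it handles the real-valued case only, uses floating point with multiplications by $2^{-i}$ standing in for hardware shifts, and assumes 16 micro-rotations. The names `cordic_vectoring`, `cordic_rotation` and `givens_zero` are hypothetical. A complex Givens rotation can be assembled from such real rotations, for instance by first rotating each complex entry onto the real axis, which is a vectoring operation on its real and imaginary parts.

```python
import math

N_ITER = 16  # number of micro-rotations (assumed; sets the precision)

# Elementary angles atan(2^-i) and the gain compensation K = prod(1 / sqrt(1 + 2^-2i)).
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
K = 1.0
for i in range(N_ITER):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_vectoring(x, y):
    """Rotate (x, y) onto the positive x-axis; return (magnitude, angle of (x, y)).

    Assumes x >= 0, so the angle lies inside the CORDIC convergence range.
    """
    z = 0.0
    for i in range(N_ITER):
        d = -1.0 if y > 0 else 1.0  # rotate clockwise when y > 0, else counter-clockwise
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i  # shift-and-add step
        z -= d * ANGLES[i]          # accumulate the angle that was removed
    return x * K, z


def cordic_rotation(x, y, angle):
    """Rotate (x, y) by `angle` radians using the same shift-and-add steps."""
    z = angle
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0  # drive the residual angle toward zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x * K, y * K


def givens_zero(a, b, others):
    """Zero b in the pair (a, b) and apply the same rotation to each pair in `others`."""
    r, theta = cordic_vectoring(a, b)
    return r, [cordic_rotation(c, d, -theta) for c, d in others]


r, rest = givens_zero(3.0, 4.0, [(1.0, 0.0)])
print(round(r, 3))                                    # 5.0
print([tuple(round(v, 3) for v in p) for p in rest])  # [(0.6, -0.8)]
```

As a hypothetical mapping onto the pre-array of equation (2): (a, b) would be two entries of one row, `others` the entries of the same two columns in the remaining rows, and post-multiplying by the rotation zeroes b while every other row pair is rotated by the same angle. Computing the angle once and reusing it is what lets the rotation-mode work be spread over separate, pipelined CORDIC operators.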
The elementary hardware operator for these rotations is CORDIC, in which only shifters and adders are involved [3]; this reduces the computational complexity significantly.

We take as an example the calculation of $P^{1/2}$ and $Q_\alpha$ (first iteration) in module M1 to show how to design different architectures for various throughputs by changing the number of CORDIC operators. This iteration requires 29 CORDIC operations, which are illustrated in figure 2. The angles ($\theta_1$, $\theta_2$, $\phi_1$, $\phi_2$) for the CORDIC operations are given by the elements of equation (2). The elements of equation (2) are passed column by column to the CORDIC operator, which performs Givens rotations. The products are then stored in a buffer, waiting to be passed again to the CORDIC operator. This completes an iteration. After N iterations, the module output becomes $P^{1/2}$ and $Q_\alpha$.

Fig. 2. 29 CORDIC operations required for the calculation of $P_1^{1/2}$ and $Q_\alpha$ (first iteration) in the module M1. [Diagram: starting from $P_0^{1/2}$ and $Q_0$, the pre-array elements pass through CORDIC operations with angles $\theta_1, \ldots, \theta_4$ and $\phi_1, \ldots, \phi_4$ to produce $P^{1/2}$ and $Q_\alpha$.]

We propose a structure in which the number of CORDIC operators is adaptable, depending on the required throughput and the number of antennas. A fully parallel structure may waste computational capacity, since the channel data change more slowly than the received symbol data. Therefore several CORDIC operators are used iteratively to optimize the resources.

We compare here two architectures, A1 (3 parallel CORDIC) and A2 (5 parallel CORDIC). The organization of the calculations is shown in figure 3. A1 requires ten cycles to complete the computation, whereas A2 performs the same computation in seven cycles, so the throughput increases by a factor of 10/7, about 1.4. On the other hand, A2 occupies more FPGA area than A1. The detection throughput can be improved further by increasing the number of CORDIC operators. Conversely, if the throughput requirement is not critical, the number of CORDIC operators can be reduced down to a single CORDIC.

Fig. 3. Different number of parallel CORDIC for different throughput. [Schedules of the $\theta$ and $\phi$ operations over N cycles: (a) 3 parallel CORDIC, 10 cycles; (b) 5 parallel CORDIC, 7 cycles.]

All iterations of the CORDIC algorithm are performed in parallel, using a pipelined structure, as shown in figure 4. The pipelined structure ensures the highest possible throughput, because a CORDIC transformation can be completed every clock cycle.

Fig. 4. CORDIC unit diagram. [Cascade of Cordic_pipe stages separated by registers. Each stage takes $x_i$, $y_i$, $a_i$ and a stage constant, selects addition or subtraction in three adder-subtractors from the sign of $y_i$ or $a_i$, and outputs $x_{i+1}$, $y_{i+1}$, $a_{i+1}$; the unit has inputs x, y, a and outputs x', y', a'.]

The last three modules (M4, M5, M6) are based on processing elements (PEs). Every PE consists of a multiply-accumulate unit, an adder-subtractor and a buffer. The throughput can be improved by running several modules in parallel.

V. THROUGHPUT ANALYSIS

Different architectures are designed for various throughputs, as shown in Section IV. In this section we deduce the relation between the number of CORDIC operators and the required throughput.
The total number of CORDIC operators ($NC_{total}$) depends on the number of transmit and receive antennas in the MIMO system, that is, $NC_{total} = f(M, N)$. We define one frame as $N_{symbols/H}$ symbols at a frequency $F_s$. Under the assumption that the channel is stationary during the frame, the minimal number of CORDIC operators needed to keep the same throughput is obtained as follows:

$$ NC_{minimal} = \frac{NC_{total} \times F_s}{\mathrm{Freq} \times N_{symbols/H}} \tag{6} $$

The throughput of the square root decoder for a MIMO system with M transmit and N receive antennas is determined as:

$$ \mathrm{Throughput} = (\mathrm{Freq} \times N \times b)\left(\frac{NC_{used}}{NC_{minimal}}\right) \tag{7} $$

In equation (7), Freq represents the clock frequency of the CORDIC operators, b is the number of bits per symbol and $NC_{used}$ is the number of CORDIC operators actually used. The relation between throughput and number of CORDIC operators is illustrated in figure 5. The other factors are considered constant for a given MIMO system.

Fig. 5. Relation between throughput and number of CORDIC. [Plot of throughput versus number of CORDIC; the horizontal axis marks $NC_{used}$, $NC_{minimal}$ and $NC_{total}$, and the vertical axis marks the required and maximum throughput.]

VI. EXPERIMENTAL RESULTS

The decoder for a 2-antenna system with a QPSK signal constellation was designed in VHDL and simulated with ModelSim. It was implemented and tested on a Xilinx Virtex-II FPGA. Table I shows the synthesis results of different architectures with various numbers of CORDIC operators. The decoder operates at 148.6 MHz, and the frequency $F_s$ is taken to be equal to Freq.

The first architecture has a fully parallel structure, which wastes operators on repeated computations involving the same channel matrix. The minimal number of CORDIC operators is 16. With Freq = 148.6 MHz, N = 2 and b = 2 bits per QPSK symbol, equation (7) gives 148.6 × 2 × 2 ≈ 594 Mbit/s for $NC_{used} = NC_{minimal}$, which Table I reports as 600 Mbit/s. When the number of CORDIC operators is reduced to 8, the throughput is halved. However, the number of slices in the FPGA is not reduced in proportion to the number of CORDIC operators, because the controller takes up a larger share of the area as the number of CORDIC operators decreases. The throughput obtained is well above the requirements of current standards. For instance, the emerging IEEE 802.11n standard requires a data rate of 150 Mbit/s; in that case the number of CORDIC operators can be reduced even further.

TABLE I
SYNTHESIS RESULTS OF MIMO SQUARE ROOT DECODER

Target FPGA            Xilinx Virtex-II
Number of CORDIC       50        16        8
Number of slices       29036     14380     9936
Max. freq. (MHz)       148.6     148.6     148.6
Throughput (Mbit/s)    600       600       300

VII. CONCLUSIONS

A reconfigurable square root decoder for MIMO systems has been designed and implemented on an FPGA. It is attractive for future wireless applications, supporting different numbers of antennas, different modulation schemes and different throughputs. The architecture is made adaptable by employing different numbers of CORDIC operators. The CORDIC operator can also be used as a common operator for SDR applications [7]. The architectures of the modules are defined and synthesized individually with the Xilinx software tools [8]. Future work will address the management of dynamic reconfiguration of this decoder.

REFERENCES

[1] G. J. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Technical Journal, pp. 41-57, Autumn 1996.
[2] J. Hu and W. Lu, "Open wireless architecture - the core to 4G mobile communications," Communication Technology Proceedings, ICCT 2003, vol. 2, pp. 1337-1342, 9-11 April 2003.
[3] R. Andraka, "A survey of CORDIC algorithms for FPGAs," FPGA '98: Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, Monterey, CA, Feb. 22-24, 1998.
[4] B. Hassibi, "An efficient square-root algorithm for BLAST," http://mars.bell-labs.com/.
[5] Z. Guo and P. Nilsson, "A VLSI implementation of MIMO detection for future wireless communications," Proc. IEEE PIMRC'03, vol. 3, pp. 2852-2856, 2003.
[6] H. Wang, P. Leray and J. Palicot, "A reconfigurable architecture for MIMO square root decoder," International Workshop on Applied Reconfigurable Computing (ARC2006).
[7] J. Palicot and C. Roland, "FFT: a basic function for a reconfigurable receiver," ICT, Papeete, Tahiti, February 2003.
[8] http://www.xilinx.com/support/mysupport.htm.
