Design and Simulation of 5G Massive MIMO Kernel Algorithm On SIMD Vector Processor

This document discusses the design and simulation of 5G massive MIMO kernel algorithms on SIMD vector processors. It begins with an introduction to massive MIMO and its role in 5G communications. It then discusses channel estimation algorithms like LS and MMSE, as well as detection algorithms like ZF and MMSE. The main challenge is the complexity of matrix computations for massive MIMO. To address this, the paper proposes a parallel processing scheme using Gauss-Jordan elimination on a SIMD vector processor to optimize matrix inversion speed, which is crucial for 5G channel estimation and detection.
SPACES-2018, Dept. of ECE, K L Deemed to be UNIVERSITY

Design and Simulation of 5G Massive MIMO Kernel Algorithm on SIMD Vector Processor
Sadineni Sivakrishna
Department of ECE
Vignan's University
Guntur, India

Ravi Sekhar Yarrabothu
Department of ECE
Vignan's University
Guntur, India

Abstract — In cellular communications, Multiple-Input Multiple-Output (MIMO) and massive MIMO research has recently been attracting attention because of the need for high data rates in Long Term Evolution Advanced (LTE-A) and 5G communications. In MIMO baseband signal processing at the physical layer, both the channel estimation and the detection algorithms play a crucial role. This paper discusses the channel estimation algorithms least square (LS) and minimum mean square error (MMSE), and the detection algorithms zero forcing (ZF) and MMSE. Currently, none of the channel estimation algorithms of LTE-A offers the twin advantages of low battery consumption and very low latency, which is a key requirement of 5G. Massive MIMO with 128 or more antennas is expected to become the norm at 5G base stations. For ultra-low latency, the matrix computations of massive MIMO are the main bottleneck in realizing the channel estimation and massive MIMO detection algorithms, so optimizing these algorithms for 5G requires fast inversion of large complex matrices. In this paper, a parallel-processing coding scheme is proposed that uses a Gauss-Jordan elimination kernel algorithm on a single instruction multiple data (SIMD) vector processor to compute a complex matrix inverse at the speed needed for 5G channel estimation and detection.

Keywords—5G, MIMO, Massive MIMO, LS, MMSE, ZF, SIMD

I. INTRODUCTION

Mobile communication technology has developed from first generation (1G) mobile phone networks, which carried only analogue voice, to the fourth generation (4G), which carries digital voice, data, video and IoT traffic [1]. 5G wireless communication systems are now being developed, and the big challenges in designing and deploying 5G cellular systems are reducing power consumption, achieving ultra-low latency and ultra-high data rates, and increasing compatibility between IoT devices.

LTE Advanced is one of the major steps in the evolution of LTE networks towards 5G. Key technologies introduced in LTE-A include carrier aggregation and enhanced use of multiple antenna elements. The current paper focuses on multiple-input multiple-output (MIMO) [2][3] for 5G.

Massive MIMO: MIMO systems generally use multiple antennas located at both the source and the destination. Although it involves multiple technologies, MIMO essentially rests on a simple principle: the wireless network transmits and receives multiple data signals simultaneously over a single radio channel, as shown in Fig. 1. Standard MIMO networks generally use two or four antennas, whereas massive MIMO uses many more antennas.

Fig. 1. Massive MIMO system

Standard MIMO principles are already used in numerous Wi-Fi and LTE standards, and massive MIMO is one of the key technologies for the success of 5G cellular communications. The main challenge of MIMO signal processing is the complexity of channel estimation and detection.

Single instruction multiple data (SIMD) processing is one of the best types of parallel processing [5]. The main idea of a SIMD processor is to apply the same sequence of instructions to a large number of distinct data streams: each instruction is executed by a number of processing elements (PEs), as shown in Fig. 2.

SIMD processors are of two main types: array processors and vector processors. An array processor operates on multiple data elements at the same time, whereas a vector processor operates on multiple data elements in successive time steps.

The rest of this paper is organized as follows. Section II describes channel estimation and detection for the MIMO system, Section III presents the implementation of massive
MIMO matrix inversion, Section IV discusses the results, and Section V concludes the paper.

Fig. 2. SIMD processor (a single instruction stream drives several processing elements (PEs) fed from a shared data pool)

II. MIMO CHANNEL ESTIMATION AND DETECTION

Channel estimation estimates the current channel from the observed data. In this work, a pilot-aided [4] method is used: the channel is estimated by comparing the transmitted and received pilot signals. A pilot provides a reference signal known to both the transmitter and the receiver; here the reference signal is the demodulation reference signal (DMRS) [6]. Both the base station and the user equipment (UE) know the reference signal carried by the pilots. LTE-A uplink processing uses two reference signals: the DMRS, which is used for data reception, and the sounding reference signal (SRS) [7], which is used for scheduling and link adaptation.

A. Channel Estimation Algorithms
Two algorithms are used for channel estimation in LTE-A uplink physical-layer processing: least square (LS) and minimum mean square error (MMSE).

Least square: LS is one of the simplest channel estimation algorithms and is considered the minimum-complexity algorithm [7]:

$$\hat{H}_{LS} = (X^H X)^{-1} X^H Y$$

In the above equation, $X^H$ is the Hermitian transpose of the frequency-domain transmitted pilot signal $X$, and $Y$ is the received pilot signal.

Minimum mean square error: MMSE is one of the best and most widely used algorithms, since it takes the effect of noise into account. However, its high computational complexity is the major disadvantage of the MMSE algorithm [8]:

$$\hat{H}_{MMSE} = R_{HH_p}\left(R_{H_pH_p} + \sigma_n^2 (X^H X)^{-1}\right)^{-1} \hat{H}_{LS}$$

where $R_{H_pH_p}$ is the channel autocorrelation matrix at the pilot symbol positions, $R_{HH_p}$ is the cross-correlation matrix between the channel at the data symbol positions and at the pilot symbol positions, and $\sigma_n^2$ is the noise variance.

B. MIMO Detection Algorithms
In MIMO detection, the detector estimates the transmitted signal from the received signal and the estimated channel matrix. Once the channel matrix has been estimated, the transmitted signal is recovered from the received signal as the output of the detector [9]. Two algorithms are used to detect the signal: ZF detection and MMSE detection.

ZF detection: This is the simplest algorithm, with the lowest computational complexity. ZF detection multiplies the received symbol vector by the pseudo-inverse of the channel matrix [10], [11]:

$$\hat{x}_{ZF} = (H^H H)^{-1} H^H y$$

where $y$ is the received symbol vector and $H$ is the estimated channel matrix. One of the biggest issues with ZF detection is noise enhancement: inverting the channel also amplifies the noise contained in the received signal.

MMSE detection: MMSE detection addresses this issue of ZF by minimizing the mean square error. For unit-power transmitted symbols, the MMSE equalization matrix is

$$W_{MMSE} = (H^H H + \sigma_n^2 I)^{-1} H^H, \qquad \hat{x}_{MMSE} = W_{MMSE}\, y$$

Compared with ZF, MMSE detection accounts for the noise variance and reduces noise enhancement, at the cost of increased computational complexity.

Matrix inversion is required in both the MIMO channel estimation and the detection algorithms, and a conventional inversion takes a long time and causes high latency. To overcome this latency, the current paper proposes a 5G massive MIMO kernel algorithm running on a SIMD processor.

III. DESIGN OF MASSIVE MIMO MATRIX INVERSION KERNEL ALGORITHM ON SIMD PROCESSOR

In massive MIMO systems, the huge matrix inversion computations in channel estimation and detection are the biggest challenge for researchers. In this paper, a fast complex matrix inversion algorithm is proposed for a SIMD processor. The first step is to select a matrix inversion algorithm suited to the SIMD architecture. Gauss-Jordan elimination [12] is used here because it offers low computational complexity and excellent accuracy, and its data access and storage patterns suit SIMD parallelism. The flow chart of the massive MIMO matrix inversion algorithm is shown in Fig. 3.
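Both detectors of Section II reduce to inverting a Gram matrix, which is why the inversion dominates the latency. The NumPy sketch below illustrates this under stated assumptions: the helper names `zf_detect` and `mmse_detect` are hypothetical, transmitted symbols are assumed to have unit average power, and `np.linalg.inv` merely stands in for the Gauss-Jordan kernel that this paper maps onto the SIMD processor. For an M-antenna base station serving K single-antenna users, each call inverts a KxK matrix.

```python
import numpy as np


def zf_detect(h, y):
    """Zero-forcing detection: x_hat = (H^H H)^{-1} H^H y."""
    hh = h.conj().T                        # Hermitian transpose H^H
    gram = hh @ h                          # K x K Gram matrix H^H H
    return np.linalg.inv(gram) @ (hh @ y)  # explicit inverse: the step the kernel targets


def mmse_detect(h, y, noise_var):
    """MMSE detection: x_hat = (H^H H + sigma^2 I)^{-1} H^H y.

    Assumes unit-average-power transmitted symbols.
    """
    hh = h.conj().T
    k = h.shape[1]                         # number of transmitted streams (users)
    reg = hh @ h + noise_var * np.eye(k)   # regularised Gram matrix
    return np.linalg.inv(reg) @ (hh @ y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, k = 8, 4                            # illustrative: 8 BS antennas, 4 users
    h = (rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))) / np.sqrt(2)
    bits = rng.integers(0, 2, size=(k, 2))
    x = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)  # unit-power QPSK
    sigma = 0.05                           # per-component noise standard deviation
    y = h @ x + sigma * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
    print("sent:", np.round(x, 2))
    print("ZF:  ", np.round(zf_detect(h, y), 2))
    print("MMSE:", np.round(mmse_detect(h, y, noise_var=2 * sigma**2), 2))
```

In practice `np.linalg.solve` would usually be preferred to an explicit inverse; the `inv` call is kept only to mark the operation that a dedicated inversion kernel would replace.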
For the 5G kernel algorithm, the Gauss-Jordan elimination method is used to perform the matrix inversion, as follows:

• Select the pivot, and note the row and column position of the pivot.
• Interchange the columns and rows.
• Calculate the reciprocal of the pivot, and then apply the linear transformation to the rows/columns.
• Exchange rows and columns, and restart the pivot location selection.

A. SIMD instruction mapping:
After examining how the matrix inversion algorithm works and verifying its precision, the algorithm is mapped to SIMD instructions. The SIMD instruction mapping for each computation of the matrix inversion algorithm is given below.

Fig. 3. Flow chart of the massive MIMO matrix inversion algorithm (Start → analysis of the algorithm → evaluation of the precision of the matrix inversion algorithm → mapping of SIMD instructions for the matrix inversion computation → analysis of data access modes → data allocation → computing cost over the SIMD processor → End)

Select pivot: In each row, the algorithm selects the complex element with the maximum value as the pivot. The maximum value is found by using the TMAC2 instruction repeatedly.
Reciprocal: The reciprocal of the complex pivot is calculated with a parallel polynomial estimation method, again using TMAC2.
Linear transformation of rows and columns: The CMAC and CMUL instructions perform the row and column transformations by means of parallel complex multiplications and subtractions.
Data access modes: Five data access modes are designed for the matrix inversion algorithm. In mode 1, the processor performs an ergodic access: every element from the first row to the last row is accessed in order to select the pivot of each row. Mode 2 interchanges rows and columns according to the exact pivot position. In mode 3, each iteration of the complex matrix inversion algorithm processes the outermost loop of the computation without accessing the row of the current pivot. Mode 4 performs the same operation as mode 3 (row access), but with column access. Mode 5 allocates data to the SIMD processor.

B. Overall data allocation
Two vector memories of the SIMD processor are allocated for the matrix inversion. Some intermediate results, from the reciprocal, complex multiplication and subtraction steps, need to be stored in vector registers. Data allocation in the SIMD processor is shown in Fig. 4.

Fig. 4. Data allocation for SIMD (blocks: main memory holding the input and output matrices; input matrix (permuted); data buffer for Gauss-Jordan elimination; data buffer for row/column exchange; reference row; pivot selection; data for 1/x; polynomial calculation coefficients; intermediate results)

As Fig. 4 shows, the main memory stores the original input matrix and the output matrix. After permutation, vector memory 1 stores the input matrix. The squared magnitudes of the complex elements of the input matrix are calculated for pivot selection, and the data for the reciprocal (1/x) are stored together with the polynomial coefficients. The data buffer and register buffer are used for the row/column exchange and for calculating the reciprocal of the complex number. In vector memory 2, the reference row is used for Gauss-Jordan elimination, and the data buffer stores the elimination results of every row. Finally, the output matrix is held in vector memory 2 of the SIMD processor.
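The pivot selection, reciprocal, linear transformation and exchange steps described above can be sketched in scalar form as follows. This is a minimal NumPy sketch, not the authors' SIMD implementation: the function names are hypothetical, pivots are compared by squared magnitude |z|^2 = re^2 + im^2 (so no square root is needed), and because the paper does not give its polynomial, the reciprocal assumes the classic linear seed 48/17 - 32/17·m on the mantissa m, refined by Newton-Raphson steps. Column interchanges made during pivoting are undone at the end by permuting the rows of the result. The row updates inside the loop are the multiply-subtract operations that CMAC/CMUL would execute across SIMD lanes.

```python
import math

import numpy as np


def poly_reciprocal(x, newton_steps=4):
    """Approximate 1/x (x > 0) with a polynomial seed refined by Newton-Raphson.

    Assumed seed: 48/17 - 32/17 * m, where x = m * 2**e and 0.5 <= m < 1.
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)                   # split x into mantissa m and exponent e
    y = 48.0 / 17.0 - (32.0 / 17.0) * m    # first-order polynomial estimate of 1/m
    for _ in range(newton_steps):
        y = y * (2.0 - m * y)              # Newton step: relative error roughly squares
    return math.ldexp(y, -e)               # 1/x = (1/m) * 2**-e


def complex_reciprocal(z):
    """1/z = conj(z) / |z|^2, with |z|^2 formed as a multiply-accumulate."""
    mag2 = z.real * z.real + z.imag * z.imag
    inv = poly_reciprocal(mag2)
    return complex(z.real * inv, -z.imag * inv)


def gauss_jordan_inverse(a):
    """Invert a complex square matrix by Gauss-Jordan elimination.

    The pivot is the largest-magnitude entry of the current row; columns are
    interchanged to move it onto the diagonal, and the interchanges are undone
    on the rows of the result at the end.
    """
    a = np.array(a, dtype=complex)         # working copy of the input
    n = a.shape[0]
    if a.ndim != 2 or a.shape[1] != n:
        raise ValueError("matrix must be square")
    e = np.eye(n, dtype=complex)           # accumulates the row operations
    perm = np.arange(n)                    # perm[c] = original column now at c
    for k in range(n):
        # Select pivot: compare squared magnitudes in row k, columns k..n-1.
        row = a[k, k:]
        j = k + int(np.argmax(row.real ** 2 + row.imag ** 2))
        if a[k, j] == 0:
            raise np.linalg.LinAlgError("matrix is singular")
        # Interchange columns k and j and record the permutation.
        if j != k:
            a[:, [k, j]] = a[:, [j, k]]
            perm[[k, j]] = perm[[j, k]]
        # Reciprocal of the pivot, then normalise the pivot row.
        r = complex_reciprocal(a[k, k])
        a[k, :] *= r
        e[k, :] *= r
        # Linear transformation: clear column k in every other row.
        for i in range(n):
            if i != k:
                f = a[i, k]
                a[i, :] -= f * a[k, :]
                e[i, :] -= f * e[k, :]
    # Undo the column interchanges: row perm[c] of the inverse is row c of e.
    inv = np.empty_like(e)
    inv[perm, :] = e
    return inv


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    h = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
    h_inv = gauss_jordan_inverse(h)
    print("max |H^-1 H - I| =", np.abs(h_inv @ h - np.eye(8)).max())
```

Searching along the row and swapping columns, rather than the more common column search with row swaps, follows the row-wise pivot selection described above; both variants give the same inverse once the permutation is undone.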
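The last step of the flow in Fig. 3, computing the cost, can be approximated from closed-form operation counts. The expressions below are an assumption inferred from the values reported in Tables 1(a) and 1(b), which they reproduce to within rounding; the 1500 MHz clock and the one-operation-per-cycle conversion follow the results section, and the 16-way cycle counts are copied from Table 3. The function names are hypothetical.

```python
def gauss_jordan_op_counts(n):
    """Approximate scalar operation counts for inverting an n x n complex matrix.

    Assumed closed forms, inferred from the values in Tables 1(a) and 1(b).
    """
    return {
        "add_sub": n**3 - 2 * n**2 + n,
        "multiplies": n**3 - n,
        "reciprocals": n,
        "row_col_exchanges": 2 * n**2,     # approximate
        "comparisons": n**3 / 6,           # approximate
        "absolute_values": n**3 / 6,       # approximate
    }


def time_us(count, clock_mhz=1500.0):
    """Convert a count to microseconds, assuming one operation (or cycle) per clock."""
    return count / clock_mhz


if __name__ == "__main__":
    # 16-way SIMD cycle counts copied from Table 3.
    simd16_cycles = {8: 717, 16: 1756, 32: 6619, 64: 38233, 128: 275029, 256: 2131021}
    for n, cycles in simd16_cycles.items():
        t_orig = time_us(sum(gauss_jordan_op_counts(n).values()))
        t_simd = time_us(cycles)
        print(f"{n:>3}x{n:<3} original {t_orig:10.3f} us   "
              f"16-way {t_simd:9.3f} us   speed-up {t_orig / t_simd:5.1f}x")
```

For the 256x256 case this gives a speed-up of about 18.4x, in line with Table 4, where the original-algorithm times are likewise the operation totals of Table 1 counted at one operation per 1500 MHz clock cycle.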

IV. RESULTS AND ANALYSIS

In this section, the computational cost of the SIMD implementation is estimated and analyzed statistically. The cost of a complex matrix inversion algorithm can generally be classified into six elements: additions/subtractions, multiplications, conjugates/reciprocals, row/column exchanges, comparisons and absolute values. The computational complexity of the original algorithm was calculated by analyzing and counting the computations for an NxN matrix.

For the complex matrix inversion algorithm, the number of additions/subtractions is $N^3 - 2N^2 + N$, the number of multiplications is $N^3 - N$, the number of reciprocals is $N$, the number of row/column exchanges is approximately $2N^2$, and the numbers of comparisons and of absolute values are each approximately $N^3/6$. The computational cost of the Gauss-Jordan algorithm is shown in Tables 1(a) and 1(b).

Table 1(a). Computational cost of the Gauss-Jordan algorithm

Complex matrix inversion | 8x8 | 16x16 | 32x32
Add/sub | 392 | 3600 | 30752
Multiplies | 504 | 4080 | 32736
Reciprocals | 8 | 16 | 32
Row/column exchanges | 128 | – | 2048
Comparisons | 86 | 683 | 5462
Absolute values | 86 | 683 | 5462
Total | 1203 | – | 76490

Table 1(b). Computational cost of the Gauss-Jordan algorithm

Complex matrix inversion | 64x64 | 128x128 | 256x256
Add/sub | 254016 | 2064512 | 16646400
Multiplies | 262080 | 2097024 | 16776960
Reciprocals | 64 | 128 | 256
Row/column exchanges | – | 32768 | 131072
Comparisons | 43692 | 349526 | 2796203
Absolute values | 43692 | 349526 | 2796203
Total | 611734 | 4893483 | 39147094

Once the computational cost of the original complex matrix inversion algorithm is known, the cost of the algorithm with SIMD extensions can be estimated. A SIMD vector processor generally provides 4-, 8- and 16-way parallel fixed-point instructions, and these instructions are used to estimate the SIMD cost. There are six SIMD computational instructions: ADD, MUL, CMUL, TMAC2, CMAC and TMAX. The SIMD instructions and their equivalent operation counts are shown in Table 2.

Table 2. Computing instructions on the SIMD processor

Instruction | 4-way | 8-way | 16-way
ADD | 4 additions | 8 additions | 16 additions
MUL | 4 multiplications | 8 multiplications | 16 multiplications
CMUL | 4 complex multiplications | 8 complex multiplications | 16 complex multiplications
TMAC2 | 4 multiplications, 2 additions | 8 multiplications, 4 additions | 8 multiplications, 4 additions
CMAC | 4 complex multiplications, 4 complex additions | 8 complex multiplications, 8 complex additions | 16 complex multiplications, 16 complex additions
TMAX | 3 comparisons, 3 selections | 7 comparisons, 7 selections | 15 comparisons, 15 selections

For example, a MUL instruction executes 4, 8 or 16 multiplications in a 4-, 8- or 16-way parallel process, respectively. Because a SIMD vector processor offers this parallelism, the matrix inversion can be computed faster.

Apart from the computation itself, SIMD execution has an overhead consisting of three elements: control, data movement and dependency. The cost of execution on the SIMD processor therefore comprises the overhead and the computation part.

Table 3. SIMD cost estimation in cycles

Matrix size (antennas) | 4-way | 8-way | 16-way
8x8 | 955 | 796 | 717
16x16 | 3561 | 2411 | 1756
32x32 | 19909 | 11209 | 6619
64x64 | 140157 | 72581 | 38233
128x128 | 1074925 | 542461 | 275029
256x256 | 8474061 | 4247021 | 2131021

In Table 3, the 16-way parallel fixed-point instructions have the lowest cost, since a single SIMD instruction can perform up to 16 operations in parallel. To highlight the very low latency achieved by the SIMD vector processor, a clock frequency of 1500 MHz is assumed, and the operation totals of Tables 1(a) and 1(b) (one operation per cycle) and the cycle counts of Table 3 are converted into microseconds. Table 4 compares the execution time of the original algorithm with that of the SIMD-extended algorithm using the 16-way parallel process.

Table 4. Comparison of the original and 16-way extended algorithms

Matrix size | Original algorithm (μs) | 16-way SIMD extended algorithm (μs)
8x8 | 0.802 | 0.478
16x16 | 6.3826 | 1.1706
32x32 | 50.993 | 4.4126
64x64 | 407.822 | 25.488
128x128 | 3262.322 | 183.352
256x256 | 26098.062 | 1420.680

From Table 4, it is observed that for 8x8 MIMO the execution time is around 40% lower (0.478 μs instead of 0.802 μs) when using the 5G kernel
algorithm with Gauss-Jordan elimination. It is also observed that the gain increases with the order of MIMO: for 256x256 MIMO, the computation is about 18.4 times faster, which is a large benefit for 5G massive MIMO signal processing. As the order of MIMO increases, so does the efficiency of the algorithm. Since 5G networks will employ massive MIMO, the proposed algorithm can significantly reduce MIMO signal processing latency.

V. CONCLUSIONS
This paper discusses massive MIMO detection and estimation algorithms and identifies the operations that cause latency in baseband signal processing. Matrix inversion is one such operation. To increase the computational speed, a SIMD technique is used to shorten the matrix inversion time, so that the 5G kernel algorithm for channel estimation and MIMO detection can achieve very low latency. In this paper, a finely tuned parallel algorithm is designed and implemented on a single processor core, taking advantage of SIMD data-level parallelism. The results show that the performance gain grows as the order of MIMO increases. As future work, Gauss elimination can be considered for matrix inversion in massive MIMO signal processing, since it has already been shown to perform better for higher-order matrix inversion.

REFERENCES
[1] Agilent Technologies, "Introducing LTE-Advanced," March 8, 2011. [Online]. Available: http://cp.literature.agilent.com/litweb/pdf/5990-6706EN.pdf [Accessed: March 20, 2014].
[2] Edward Kasem and Jan Prokopec, "The evolution of LTE to LTE-Advanced and the corresponding changes in the uplink reference signals," Elektrorevue, ISSN 1213-1539, June 2012.
[3] Edward Kasem, Roman Marsalek, and Jiri Blumenstein, "Performance of LTE Advanced Uplink in a Flat Rayleigh Channel," Advances in Electrical and Electronic Engineering, Information and Communication Technologies and Services, vol. 11, p. 266, September 2013.
[4] Martin Henkel, Christoph Schilling, and Wolfgang Schroer, "Comparison of Channel Estimation Methods for Pilot Aided OFDM Systems," IEEE, 2007.
[5] Michael Sung, "SIMD Parallel Processing," MIT Computer Science and Artificial Intelligence Laboratory, Feb. 22, 2000.
[6] 3GPP, "Evolved Universal Terrestrial Radio Access (E-UTRA): Physical channels and modulation," TS 36.211, V12.3.0, 2014.
[7] Xiao-lin Hou and Hidetoshi Kayama, "Demodulation Reference Signal Design and Channel Estimation for LTE-Advanced Uplink," in Advances in Vehicular Networking Technologies, InTech, 2011.
[8] Michal Simko, Di Wu, Christian Mehlfuhrer, Johan Eilert, and Dake Liu, "Implementation Aspects of Channel Estimation for 3GPP LTE Terminals," in 11th European Wireless Conference 2011 – Sustainable Wireless Technologies, Vienna, 2011, pp. 1-5.
[9] Markus Myllyla and Johanna Ketonen, "MIMO Detector Algorithms and Their Implementations for LTE/LTE-A," GIGA seminar, November 1, 2010.
[10] Deric W. Waters, "Signal Detection Strategies and Algorithms for Multiple-Input Multiple-Output Channels," Doctoral thesis, Georgia Institute of Technology, December 2005.
[11] Cheng-yu Huang and Wei-Ho Chung, "An Improved MMSE-Based MIMO Detection Using Low-Complexity Constellation Search," IEEE Globecom 2010 Workshop on Broadband Wireless Access, 2010.
[12] "Gauss-Jordan Elimination Method," [Online]. Available: http://pages.pacificcoast.net/~cazelais/251/gauss jordan.pdf [Accessed: 5-Sep-2014].
