
Telecommunication Systems
https://doi.org/10.1007/s11235-018-0494-5

Distributed self optimization techniques for heterogeneous network environments using active antenna tilt systems

Muhammad Nauman Qureshi¹ · Moazzam Islam Tiwana¹ · Majed Haddad²

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Corresponding author: Muhammad Nauman Qureshi, [email protected]
Moazzam Islam Tiwana, [email protected] · Majed Haddad, [email protected]
¹ Department of Electrical Engineering, COMSATS University, Chak Shahzad, Islamabad, Pakistan
² University of Avignon, Avignon, France

Abstract
Active antenna systems in 4G and upcoming 5G networks offer the ability to electronically steer an antenna beam in any desired direction. This unique feature makes them a suitable candidate for realizing self organizing network (SON) architectures in 5G for optimizing key performance indicators such as throughput and file transfer time. In this paper, we analyse the effect of an increasing number of input variables and of the complexity of the learning technique on network performance. We compare the performance of the simple stochastic cellular learning automata (SCLA) technique, which uses only one input, with a comparatively complex Q-learning technique that uses two or more inputs. We use an FTP flow based 5G network simulator for our work. The proposed SON architecture model is distributed with optional inter-cell communication. Simulation results reveal that increasing the complexity of the learning process does not necessarily benefit system performance: the simple SCLA technique shows more robust performance than the Q-learning cases. However, within the same technique, increasing the number of input variables does benefit the system, indicating that a complex technique can ultimately prove beneficial in complicated scenarios provided it is able to quickly process and adapt to the environment.

Keywords 5G networks · Heterogeneous network · Self organizing networks · Antenna tilt · Reinforcement learning ·
Q learning · Self-optimization

1 Introduction

Future cellular networks are becoming exceedingly complex due to the increasing number of users, the requirement of high bandwidth multimedia services, changing propagation conditions and bandwidth limitation [1]. Several solutions have been presented to address this increasing demand for capacity and coverage, such as the Intelligent Distributed Antenna System (IDAS), small cells, relays, etc. [2,3]. As these new network elements are added, network management and operations become difficult, requiring intelligent and autonomous behaviour to reduce manual intervention. The 3rd Generation Partnership Project (3GPP) introduced the concept of SON starting from 4th generation networks [4]. SON has its roots in artificial intelligence and machine learning [5]. Characteristics of a SON given in [6] are (i) scalability, (ii) stability and (iii) agility. SON aims to achieve fully cognitive autonomic networking [4,5]. Such an intelligent network can lower the operational and maintenance costs of the network while satisfying the cellular needs of future users [6]. Our work falls under the umbrella of SON in 4G and 5G networks.

SON comprises three distinct areas, namely self configuration, self optimization and self healing [6,7]. In self optimization, there are three active areas of research, i.e., load balancing, capacity and coverage enhancement, and interference control [6]. Coverage adaptation is one use case of load balancing in which remotely controllable eNodeB (eNB) antenna radiation patterns can play an important role [8]. Active antenna systems (AAS) are advanced antenna structures that have active components such as tunable capacitors and/or microwave switches. They can be used to provide effective mechanisms to solve these challenges of 4G and 5G networks [9]. In this paper, we look into optimizing antenna tilt by exploiting the remote electronic tilt mechanism of AAS. Antenna tilt has a direct bearing on the eNB radiation pattern and can be effectively used to improve the Quality of Service (QoS) for network users [10]. Intelligent scheduling of physical resources, comprising available channels, transmit power, etc. in the time domain, can also be used to address load balancing issues, but is beyond the scope of this work [6,11].

In this paper, we compare different Reinforcement Learning (RL) methods based on their complexity and the input information they can process and learn. The optimization problem in SON can be formulated using different machine learning techniques like Support Vector Machines (SVM), artificial neural networks (ANN), random forests, self organizing maps, etc. through an incremental learning process [12–15]. Specifically, we choose SCLA [16] and compare it with Q-learning methods that use two or more input variables apart from the same action and feedback. SCLA is a simple RL technique that depends only on the action taken and the feedback from the environment. Q-learning, in contrast, can learn from a number of input variables apart from the action and the environment feedback. We also investigate the effect of increasing the number of input variables on Q-learning. The SON architecture design is fully distributed with optional neighbourhood cell cooperation based on a simple inter-cell distance criterion. The cooperation is meant for sharing past decisions and learning experience between cells to better predict the future course of action. To achieve quick adaptability to the changing network environment we have selected the epsilon greedy strategy for RL. The epsilon greedy strategy combines exploration and exploitation, and keeps the model open to exploration with a predefined probability. We find that the SCLA technique proves to be more robust and better able to react to environment dynamics than Q-learning techniques due to its simple design. Additionally, we find that within the same RL technique, i.e., Q-learning, the performance increases with additional inputs. However, there is a practical limit to these inputs for any particular technique, as with each additional input the complexity of learning increases, requiring more time to react to environment changes. Thus, the addition of input variables to a learning technique directly affects system agility.

2 Related work

In the literature, optimization of antenna tilt parameters for Long Term Evolution (LTE) has been studied from different SON design approaches. The SON design architecture can be centralized, distributed or hybrid. Centralized designs have scalability issues and are generally not preferred in SON [6]. Our proposed SON architecture design based on antenna tilt optimization is hybrid, i.e., distributed with optional inter-cell information exchange. This approach is better than a centralized one and also takes advantage of data sharing with neighbouring cells for enhanced learning where possible.

Antenna tilt optimization strategies based on distributed and hybrid SON approaches are presented in [17–19]. In [17], the antenna tilt selection process in the network is devised as a throughput utility fairness problem to maximize user utility in terms of its recorded throughput. The authors showed performance gains, but under high SINR regime conditions and with mandatory information sharing among all base stations, which makes the solution less practicable in the real world. Our proposed solution makes no assumption on the SINR regime, and neighbourhood information exchange is preferable but not mandatory. In [18], the authors based their antenna tilt optimization strategy on fuzzy RL that works in a fully distributed, asynchronous and autonomous way without prior network environment information. The approach is appreciable, but the application of the fuzzy step to reduce the state space of the network also scales down its flexibility for complex environment scenarios. We believe future networks will continue to increase in complexity, so SON models need the capacity to scale while remaining agile to best match network requirements, a precondition we have assumed in our work. The most relevant paper in comparison to our work is [19]. The authors in [19] proposed a distributed SON architecture with a communication and coordination link between eNBs. An RL algorithm is used in every eNB to optimize antenna tilt. The approach shows network performance improvements but suffers from two issues: (a) the exploration and exploitation stages of the learning process are done separately, thereby consuming time to learn and adapt, and (b) all eNBs in the network are not optimized simultaneously, but one after the other, making it less scalable. Thus, in the context of SON, the approach fails to achieve the desired characteristics of scalability and agility. In our work, we perform exploration and exploitation together by giving some time to exploration while exploiting. Also, in our model, all eNBs optimize their parameters simultaneously, thereby saving time.

The remainder of this paper is organized as follows: Section 3 presents the system model. Section 4 gives the SON techniques used. Section 5 shows the simulation results. Section 6 concludes the paper.

3 System model

3.1 Overview

Fig. 1 System model with hybrid SON architecture

A mobile LTE network in a full urban environment is considered, as shown in Fig. 1. We adopt a hybrid SON architecture, where each eNB acts as an independent SON entity and is able to cooperate with neighbouring eNBs through the X2 standard interface. Each eNB's KPIs, settings and position information are also shared with a central Network Management Server (NMS). Every eNB can find its neighbouring eNBs from the NMS based on a simple separation criterion and then interact directly with them, saving valuable time accessing, and avoiding overloading, the NMS. In this way, a fully distributed SON architecture is realized with cooperation between cells. The downlink interference model includes thermal noise and large scale fading. Uplink is not considered. We have mainly focused on the throughput of eNBs and UEs as the KPI for network optimization. The throughput achieved directly relates to the network QoS.
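As a concrete illustration of the neighbour discovery step described above, the following Python sketch shows how a SON entity could filter the NMS position registry using a simple inter-cell distance criterion; the data structures and function names are illustrative assumptions, not part of the MATLAB simulator, and the 500 m range anticipates the NeNB_Range value introduced in Sect. 4.1.

# Minimal sketch (assumed data structures): an eNB finds its SON
# cooperation neighbours from the NMS position registry using a
# simple inter-cell distance criterion.
import math

NENB_RANGE_M = 500.0  # pre-set neighbourhood distance criterion

def find_neighbours(own_id, nms_positions, max_dist=NENB_RANGE_M):
    """nms_positions: dict of eNB id -> (x, y) in metres, as registered with the NMS."""
    x0, y0 = nms_positions[own_id]
    neighbours = []
    for enb_id, (x, y) in nms_positions.items():
        if enb_id == own_id:
            continue
        if math.hypot(x - x0, y - y0) <= max_dist:
            neighbours.append(enb_id)
    return neighbours

# Example: eNB 1 discovers its neighbours once, then exchanges SON
# experience with them directly over X2 instead of polling the NMS.
print(find_neighbours(1, {1: (0, 0), 2: (300, 400), 3: (900, 0)}))  # -> [2]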

3.2 Antenna tilt design


Fig. 2 Antenna gain calculation with tilt angle

AAS enhances the coverage of an eNB site by applying an electronically steered beam. The beam can be focused in the desired elevation and azimuth direction of a UE u to improve gain and received signal power. In the current work, we have limited ourselves to changes only in the antenna tilt angle θ (see Fig. 2). The change in tilt can alter the eNB antenna coverage area. Thus, θ can effectively be used to reduce inter cell interference and enhance network capacity.

The gain of the antenna at a fixed location can be computed using the antenna elevation angle ψ and azimuth angle φ. ψ can be calculated from the antenna height and the ground distance between the eNB and that location point, while φ can be calculated from the antenna bore sight direction and the location point coordinates. For a trisectorial site, 3GPP [20] defines the azimuth gain A_H(φ), the elevation gain A_V(ψ) and the total gain A(φ, ψ) at location (φ, ψ) as:

A_H(\phi) = -\min\left[ 12 \left( \frac{\phi}{\phi_{3dB}} \right)^{2},\; A_m \right]    (1)

A_V(\psi) = -\min\left[ 12 \left( \frac{\psi - \theta}{\psi_{3dB}} \right)^{2},\; SL_{Av} \right]    (2)

A(\phi, \psi) = -\min\left\{ -\left[ A_H(\phi) + A_V(\psi) \right],\; A_m \right\}    (3)

where A_m is the backward attenuation factor in the horizontal plane and is taken as 25 dB, SL_{Av} is the backward attenuation factor in the vertical plane and is taken as 20 dB, φ_3dB is the half power azimuth beamwidth and ψ_3dB is the half power elevation beamwidth.

The combined radiation effect A(φ, ψ) is used to compute the eNB antenna gain G for any location.
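A short Python sketch of Eqs. (1)–(3) may make the gain computation concrete; A_m = 25 dB and SL_Av = 20 dB follow the text above, while the beamwidth values used in the example are illustrative assumptions rather than values quoted in the paper.

# Sketch of the 3GPP trisectorial antenna pattern of Eqs. (1)-(3).
# phi, psi and theta (tilt) are in degrees; gains are in dB.
def azimuth_gain(phi, phi_3db=70.0, a_m=25.0):
    return -min(12.0 * (phi / phi_3db) ** 2, a_m)

def elevation_gain(psi, theta, psi_3db=10.0, sl_av=20.0):
    return -min(12.0 * ((psi - theta) / psi_3db) ** 2, sl_av)

def total_gain(phi, psi, theta, a_m=25.0):
    # Eq. (3): combine both cuts, bounded by the backward attenuation A_m.
    return -min(-(azimuth_gain(phi) + elevation_gain(psi, theta)), a_m)

# Example: a UE 20 degrees off boresight in azimuth, elevation angle 12
# degrees, with the antenna tilted to 10 degrees.
print(total_gain(phi=20.0, psi=12.0, theta=10.0))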
3.3 Interference model

LTE uses OFDMA (Orthogonal Frequency Division Multiple Access) as its air access technology due to its high spectral efficiency. OFDMA subdivides the total cell bandwidth into many sub bands, such that each user is allocated specific sub-carriers for a defined amount of time [21,22]. In LTE, the basic set of twelve sub-carriers and one time slot is known as a Physical Resource Block (PRB). One PRB or a set of PRBs is assigned to a unique user, so that there is no overlap of PRBs. Some PRBs are reserved for guard and signalling information. Thus, within a cell the intra-cell interference is theoretically zero. The only interference observed by any UE in LTE is the inter cell interference. Considering a UE u attached to an eNB e, the average interference I_{ue} observed by it on each sub-carrier is given as:

I_{ue} = \sum_{i=1,\, i \neq e}^{k} M(i, e) \times \nu_i \times \frac{P_i G_i}{\xi_{iu}}    (4)

where M(i, e) is 1 if eNB i and eNB e use the same frequency, otherwise it is 0, ν_i is the ratio of allocated PRBs to total available PRBs in eNB i, P_i is the transmit power of eNB i, G_i is the antenna gain of eNB i, and ξ_{iu} is the link loss between eNB i and UE u.
Note that the link loss includes the path loss, cable loss, fading, etc. The signal to interference plus noise ratio observed by the UE u, based on the interference received I_{ue}, can be given as:

SINR_{ue} = \frac{P_e \times G_e}{\xi_{ue} \, (I_{ue} + \delta^2)}    (5)

where P_e is the transmit power of eNB e, G_e is the antenna gain of eNB e, ξ_{ue} is the link loss between eNB e and UE u, and δ² is the thermal noise power per carrier.
The Shannon modified capacity theorem provides an estimate of the data rate possible in an LTE environment given the SINR observed and the bandwidth used [23]. Thus we define our utility function U for a UE u served with a finite number of PRBs by eNB e as:

U_{ue} = B_{ue} \times BW_{eff} \times \log_2 \left( 1 + \frac{SINR_{ue}}{SINR_{eff}} \right)    (6)

where B_{ue} is the total bandwidth corresponding to the total number of PRBs assigned to user u by eNB e, BW_{eff} (bandwidth efficiency) is taken as 0.56, and SINR_{eff} (SINR efficiency) is taken as 2. The values of BW_{eff} and SINR_{eff} have been taken from [23] for the LTE environment.

The utility function of one SON entity or eNB, denoted as U_e, can now be given as the sum of all the data rates observed by its associated UEs. The goal of our system is to maximize this cell throughput based utility function:

U_e = \sum_{u=1}^{N} U_{ue}    (7)

where U_e is the utility function for eNB e and N is the total number of UEs associated with eNB e.

For any given period in a cellular network, the access probability (AP) is defined as the ratio of the number of users accepted by the network for access to the number of users rejected by the network. Now, in order to have a realization of the complete system throughput performance as a function of only the accepted users, we define the normalized throughput as

U_{NormSystem} = AP \times \frac{\sum_{e=1}^{k} U_e}{k}    (8)

where k stands for the total number of eNBs in the environment.
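To tie Eqs. (4)–(8) together, the sketch below computes the inter-cell interference, SINR and per-UE rate for one user, and then the normalized system throughput. The data structures, the thermal noise value and the linear (non-dB) bookkeeping are illustrative assumptions; Eq. (5) is implemented literally as written above.

# Sketch of the link model: Eq. (4) interference, Eq. (5) SINR,
# Eq. (6) per-UE utility and Eq. (8) normalized system throughput.
import math

BW_EFF, SINR_EFF = 0.56, 2.0   # LTE fitting parameters from [23]

def interference(serving, enbs, link_loss):
    # enbs: id -> dict(power=P_i [W], gain=G_i [linear], load=nu_i, co_channel=bool)
    # link_loss: id -> xi_iu (linear) between eNB i and the UE
    total = 0.0
    for i, e in enbs.items():
        if i == serving or not e["co_channel"]:
            continue
        total += e["load"] * e["power"] * e["gain"] / link_loss[i]
    return total

def sinr(serving, enbs, link_loss, noise=1e-13):
    # noise = delta^2, an illustrative thermal noise power per carrier.
    e = enbs[serving]
    i_ue = interference(serving, enbs, link_loss)
    return (e["power"] * e["gain"]) / (link_loss[serving] * (i_ue + noise))

def ue_utility(bandwidth_hz, sinr_value):
    # Eq. (6): modified Shannon rate in bit/s over the PRBs assigned to the UE.
    return bandwidth_hz * BW_EFF * math.log2(1.0 + sinr_value / SINR_EFF)

def normalized_system_throughput(cell_utilities, access_probability):
    # Eq. (8): access probability times the mean cell utility over k eNBs.
    return access_probability * sum(cell_utilities) / len(cell_utilities)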

4 Self optimization model

Each SON entity or eNB in the simulated LTE network runs its optimization function independently. The self-optimization model consists of: (i) an input block that accepts different system inputs, (ii) a learning and optimization block that learns and predicts the optimum eNB antenna tilt setting based on the inputs and past feedback, and (iii) an action block that decides either to explore the neighbourhood environment and select a random antenna tilt, or to exploit the learning done and accept the tilt angle proposed by the previous block. The system inputs comprise its own and neighbourhood antenna tilts, mobile users' data and different selected KPIs. The simulated environment runs for a predetermined number of time steps to get feedback in the form of observed KPIs. This functional model of a SON entity and its interaction with the environment is illustrated in Fig. 3. The details of the algorithms used in self-optimization are explained in the subsequent subsections.

Fig. 3 SON entity environment interaction

4.1 Reinforcement learning with Q-learning

RL is an area of machine learning concerned with how learning agents select actions in an environment based on their current states, with the aim of maximizing some defined mechanism of cumulative reward. RL is useful in solving control optimization problems similar to ours, i.e., finding the best action in every state observed by the system so as to optimize some objective function. The objective function can maximize the instant reward or the total discounted reward over a given time horizon, depending upon how it is defined and used. Specifically, in our case we want to maximize the eNB throughput by selecting the best tilt angle for any given state. We have selected the Q-learning RL technique for our work because:

(i) it is a model-free RL technique that can be applied to any given (finite) Markov decision process (MDP) to find an optimal action-selection policy. MDPs provide a mathematical framework for RL cases as they model the decision process in situations where results are partially random and partially under the decision maker's control. This is a typical case in our scenario, where each eNB has decision control over its own antenna tilt but not over the antenna tilts of other eNBs in the environment,
(ii) it works by learning an action-value function while following an optimal policy to achieve the desired utility. The rule that the agent follows in selecting actions based on the current state is known as the policy. On completing the learning process, the optimal policy can be formulated by selecting the action corresponding to the state having the maximum expected cumulative reward,
(iii) it can be directly applied to problems with stochastic transitions and rewards without changes,
(iv) it has been proven that Q-learning can eventually find an optimal policy for any finite MDP [24].

Algorithm 1 Q-learning algorithm for eNB antenna tilt

Define: s ∈ S the state space, a ∈ A the action space (tilt angles), T_Episode = episode time period, α = learning rate, reward function r = f(KPIs), γ = discount factor
Initialize: matrix Q(s, a) = 0, time t = 0, Episode = 1, set s, α, γ to fixed values, randomly select a
repeat
  if t mod T_Episode == 0
    Apply a and observe the KPIs, reward r and new state s'
    Select a' based on the ε-greedy strategy
    if Selection = Random
      Randomly select a'
    else
      Select a' from the Q(s, a) matrix based on max Q
    Update the Q table:
      Q(s, a) ← Q(s, a) + α [r + γ max_{a'} Q(s', a') − Q(s, a)]
    s ← s', a ← a'
until t = end of simulation

The Q-learning algorithm for each eNB entity or learning agent is given in Algorithm 1. The algorithm is characterized by states S and a set of actions A for each state in S. In effect, the algorithm calculates the quantity Q of a state-action pair, i.e., Q : S × A → R, where R is the set of possible rewards. In our case the state is governed by the number of inputs the agent learns from. The actions are the vector of different fixed antenna tilt positions used. By executing an action a ∈ A, the agent moves from one state to the next. Performing an action in a specific state provides the learning agent with a reward r ∈ R. The reward r is a numerical value that depends on the effect an action has on the utility function given in Eq. (7). We consider r a positive value for an increase in the utility U, and a negative value for a decrease in the observed utility, as given in Eq. (9). For an eNB e, the utility U at time t is expressed as U_e(t). Over a period of time, rewards add up for each state-action combination. The agent learns and is able to decide which action is optimal for any given state based on the total reward for that state-action combination. The cumulative reward is a weighted sum of the expected values of the rewards of all future action steps starting from the current state. The emphasis on the number of steps to be considered in weighing future rewards that require a time interval of Δt can be expressed as γ^{Δt}. γ is known as the discount factor and has a value between 0 and 1 (0 ≤ γ ≤ 1). γ negotiates between the importance of sooner versus later rewards.

r_t = \begin{cases} \text{Positive}, & \text{if } U_e(t) \geq U_e(t-1) \\ \text{Negative}, & \text{otherwise} \end{cases}    (9)

Q(s_t, a_t) ← Q(s_t, a_t) + α_t [r_{t+1} + γ \max_a Q(s_{t+1}, a) − Q(s_t, a_t)]    (10)

where r_{t+1} is the reward observed after performing action a_t in state s_t, whereas α_t is the learning rate between 0 and 1.

The Q-learning algorithm is based on Eq. (10). It is a simple value step update that takes the old value and makes a correction based on the new observation. Before the algorithm starts, Q is set to an (arbitrary) fixed value, chosen here as the zero matrix. After the simulation starts, the Q values are updated by each eNB entity depending on the state-action pair selected and the feedback from the observed network KPIs.

In this paper, we define and implement two Q-learning schemes based on the type of inputs. The first scheme is labelled QLA = f(θ, u_e(x, y), U_e). QLA has two types of inputs, i.e., its own antenna tilt position θ and the associated mobile users' positions u_e(x, y), and the last variable is its feedback U_e, which is the utility of eNB e, or the achieved throughput from Eq. (7). The total antenna tilt positions are 6°, 8°, 10°, 12°, 14° and 16°. These positions are selected lower than the 15° calibration tilt specified by [20]. To limit the input state combinations due to the associated mobile users' locations, they are classified into three sets depending on their ranges, as shown in Fig. 4. The first range Sec0 is from the eNB center to 100 m, the second Sec1 from 101 to 200 m, and the third Sec2 greater than 200 m. The maximum eNB coverage distance is expected to be around 300 m for our simulation environment. Sec0, Sec1 and Sec2 are taken as bit values indicating whether the corresponding range has active mobile users or not, and are encoded as a three bit number. In the second scheme, labelled QLB = f(θ, u_e(x, y), AvNTH_e, U_e), we add one additional input AvNTH_e, i.e., the average of the neighbouring cells' throughput. We find eNB cell neighbours by defining a pre-set maximum range for neighbours, i.e., NeNB_Range = 500 m. The locations of neighbouring cells can easily be obtained from the NMS network server, to which all cells provide location information when registering to the network.

Fig. 4 Encoding of UE position information
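The following Python sketch mirrors Algorithm 1 and the update of Eq. (10) for a single eNB agent; the learning rate, discount factor and ε are the values later listed in Table 1, while the state encoding, the ±1 reward magnitude and the interface to the simulator are simplified assumptions (the actual study uses a MATLAB simulator).

# Sketch of the per-eNB Q-learning tilt agent (Algorithm 1, Eq. (10)).
import random

TILTS = [6, 8, 10, 12, 14, 16]          # action space: tilt angles in degrees

class QLearningTiltAgent:
    def __init__(self, n_states, alpha=0.05, gamma=0.8, epsilon=0.01):
        self.q = [[0.0] * len(TILTS) for _ in range(n_states)]  # Q(s, a) = 0
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select_action(self, state):
        # epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon:
            return random.randrange(len(TILTS))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, reward, next_state):
        # Eq. (10): Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        best_next = max(self.q[next_state])
        td_error = reward + self.gamma * best_next - self.q[state][action]
        self.q[state][action] += self.alpha * td_error

# Reward per Eq. (9): positive if the eNB utility did not drop, else negative
# (the +/-1 magnitude is an assumption; the paper only fixes the sign).
def reward(utility_now, utility_prev):
    return 1.0 if utility_now >= utility_prev else -1.0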

The AvNTH_e input is quantized to 10 levels, i.e., from 0 × 175 kbps to 9 × 175 kbps, to limit the size of the input state matrix. The value of 175 kbps is roughly set so that the nominal value of AvNTH_e falls in the mid step range of 5, namely:

AvNTH_e = \sum_{i=1}^{k} U_i \;\Big|\; d_{intercell} \leq NeNB_{Range}    (11)
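A small sketch of how the QLA and QLB state indices could be formed from the inputs described above (three occupancy bits for Sec0–Sec2, the current tilt, and for QLB the quantized neighbourhood throughput). The exact encoding used in the simulator is not given in the paper, so this is an assumed layout, chosen to reproduce the 48 and 240 state counts quoted in Sect. 5.3.

# Assumed state encoding for the QLA (48 states) and QLB (240 states) schemes.
TILTS = [6, 8, 10, 12, 14, 16]

def sector_bits(ue_distances_m):
    # Sec0: 0-100 m, Sec1: 101-200 m, Sec2: > 200 m; bit is 1 if the range has active UEs.
    sec = [0, 0, 0]
    for d in ue_distances_m:
        sec[0 if d <= 100 else 1 if d <= 200 else 2] = 1
    return sec[0] * 4 + sec[1] * 2 + sec[2]            # three bit number, 0..7

def qla_state(tilt, ue_distances_m):
    return sector_bits(ue_distances_m) * len(TILTS) + TILTS.index(tilt)   # 8 x 6 = 48 states

def quantize_avnth(avnth_kbps, step_kbps=175.0, levels=10):
    return min(int(avnth_kbps // step_kbps), levels - 1)                  # 0..9

def qlb_state(tilt, ue_distances_m, avnth_kbps, avnth_levels=5):
    # Sect. 5.3 counts 5 neighbourhood-throughput states (48 x 5 = 240), even
    # though the quantizer above nominally has 10 steps; the cap to 5 levels
    # here is an assumption made to match that state count.
    level = min(quantize_avnth(avnth_kbps), avnth_levels - 1)
    return qla_state(tilt, ue_distances_m) * avnth_levels + level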
4.2 Stochastic cellular learning automata (SCLA)

In order to compare the performance of the Q-learning schemes QLA = f(θ, u_e(x, y), U_e) and QLB = f(θ, u_e(x, y), AvNTH_e, U_e), which can learn from more than one input variable, with a learning technique that learns only from its actions, we select SCLA = f(θ, U_e). In our earlier work [16], we applied this technique successfully for selecting orthogonal component carriers for neighbouring femtocells in a Heterogeneous Network (HetNet) environment. Particularly for HetNets, we have shown that the SCLA approach meets the SON requirements of scalability, stability and agility. SCLA follows a distributed architecture that allows quick adaptability to dynamic changes in the environment. Thus, SCLA meets the comparison requirement for our paper.

SCLA stems from the idea of applying the stochastic learning automata technique in cellular automata (CAs). CAs are mathematical models for systems having a large number of simple identical components with local interactions. The simple components act together to produce complex global behaviour. A stochastic learning automaton is a finite state machine that requires only environment feedback to achieve better performance. Stochastic learning automata can be applied to both stationary and non-stationary environments. As stochastic learning automata do not have predetermined relationships between actions and their results, there is no requirement for a closed form system model. By combining the machine learning technique of stochastic learning automata with plain CAs, the resulting distributed learning model is known as SCLA. Provided the learning rate α is kept between 0 and 1, SCLA converges to a local minimum. The proof of convergence of the SCLA technique can be found in [16,25,26].

In our work, SCLA picks out one eNB antenna tilt angle or action a from all possible positions or actions, i.e., k = |A|. This selection is done according to the probability vector q_t^e = [q_t^e(1), q_t^e(2), ..., q_t^e(k)] for eNB e at time t. Once the tilt angle has been selected, the probability vector is updated using the Discrete Pursuit Reward Inaction (DPRI) pursuit algorithm, which has good convergence properties [27]. SCLA learns through the feedback obtained from the environment. A positive or negative reinforcement signal r is based on the reward criteria we set earlier for the Q-learning algorithm in Eq. (9). With a learning rate α between 0 and 1, the eNB probability vector is updated as follows:

q_{t+1}^{e}(j) = \begin{cases} q_t^{e}(j) + (k-1)\,\alpha, & \text{if } r_t \geq 0 \\ q_t^{e}(j) - \alpha, & \text{otherwise} \end{cases}    (12)

Algorithm 2 Stochastic automata eNB antenna tilt selection

Define: eNB antenna tilt vector A = [a(1), a(2), ..., a(j), ..., a(k)], eNB antenna probability vector q_t(j) = 1/k ∀ j ∈ {1, ..., k}, utility U_e, reward r, time t, episode T_Episode = 1, averaging time period (AvP)
Initialize: U_e = 0, r = 0, t = 0, T_Episode = 1, T_Period = AvP, randomly select a from A
repeat
  if t mod AvP == 0 and T_Episode > 0
    Apply a and observe the KPIs using a moving averaging filter
    Calculate the reward r using Eq. (9)
    Select a based on the ε-greedy strategy
    if Selection = Random
      Randomly select a from A
    else
      Update the probability vector using Eq. (12)
      Select a(j) from A based on max q_t(j)
    Increment T_Episode
  else
    Increment t
until t = end of simulation

The pseudo-code of the SCLA algorithm is given in Algorithm 2. As the simulation time step t progresses, each eNB learning entity will continue to learn and improve its antenna tilt angle selections, and will reach a stage where the probability of the optimum tilt angle almost reaches unity. This stable condition is desirable if the neighbouring eNBs do not change their tilt selections. However, the model maintains its dynamic nature and can respond to any new change in the neighbourhood environment, as is the case here due to moving users.
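A minimal Python sketch of the SCLA action selection and the probability update of Eq. (12); treating Eq. (12) as a reward-inaction pursuit step, the selected action j gains (k−1)·α on a positive reward while every other action loses α (an assumption made so that the vector keeps summing to 1), and the clamping/renormalisation step is likewise an added assumption not stated in the paper.

# Sketch of SCLA tilt selection and the Eq. (12) probability update.
import random

def select_tilt(prob, explore_eps=0.01):
    if random.random() < explore_eps:
        return random.randrange(len(prob))                 # exploration
    return max(range(len(prob)), key=lambda j: prob[j])    # exploitation

def scla_update(prob, j, reward, alpha=0.005):
    k = len(prob)
    if reward >= 0:
        prob[j] += (k - 1) * alpha
        for i in range(k):
            if i != j:
                prob[i] -= alpha                           # keep the sum at 1
    else:
        prob[j] -= alpha
    prob[:] = [min(max(p, 0.0), 1.0) for p in prob]        # clamp to [0, 1]
    total = sum(prob)
    prob[:] = [p / total for p in prob]                    # renormalise
    return prob

# Example: 6 tilt angles start with uniform probability 1/6 each.
q = [1.0 / 6] * 6
q = scla_update(q, j=2, reward=+1)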
4.3 Exploration versus exploitation

Providing enough information to a system to learn is an important step in ensuring that the system is able to steer itself towards an optimum goal. In our work, we desire that our algorithm continues to explore its environment space, for the SON agility characteristic, and also exploits it by taking decisions based on its learning experience. This being a typical exploration versus exploitation problem, we selected the ε-greedy strategy, which allows exploration with probability ε and exploitation with probability (1 − ε). Exploration is done with uniform selection, without preference to any particular action. In the exploitation step, we choose the action corresponding to the best Q value in Algorithm 1 or the best probability in Algorithm 2.
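In code, the exploration/exploitation split described above amounts to the small helper below, shared in spirit by both learners; the scorer passed in would be a Q(s, ·) row for Algorithm 1 and the probability vector for Algorithm 2. This is an illustrative sketch only.

# Sketch of the epsilon-greedy action choice used by both learners.
import random

def epsilon_greedy(scores, epsilon=0.01):
    """scores: per-action values (a Q(s, .) row or the SCLA probability vector)."""
    if random.random() < epsilon:
        return random.randrange(len(scores))                   # explore uniformly
    return max(range(len(scores)), key=lambda a: scores[a])    # exploit the best action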

5 Simulation and results

5.1 Simulation environment

Fig. 5 Simulated environment

Table 1 Network simulator parameters
System BW: 5 MHz
Cell layout: 96 eNBs, single sector
Inter-site distance: 0.5–2 km
Sub carrier spacing: 15 kHz
Thermal noise density: −173 dBm/Hz
PRBs per eNB: 15
Path loss: 128.1 + 37.6 log10(R), R in km
Shadowing standard deviation: 6 dB
PRBs assigned to one mobile: 1–4 on a first come first serve basis
Mobility of mobiles: 70%
Mobile speed: max 15 m/s
Mobile antenna gain: 1.5 dBi
Mobile body loss: 1 dB
Traffic arrival rate λ: 6
Traffic model: FTP
File size: 57000 Kbits
Antenna tilt: 6° to 16°
Antenna tilt steps: 6
SCLA learning rate: 0.005
Q learning rate: 0.05
Q discount factor: 0.8
ε: 0.01

The MATLAB LTE simulator described in [28] has been modified for this work. A dense urban LTE environment has been simulated, as shown in Fig. 5. We consider downlink FTP transmission. The detailed simulation parameters are given in Table 1. The simulator proceeds in fixed time steps by taking correlated Monte Carlo snapshots. Users arrive following a Poisson process. With every time step new users are added to the environment. Old users are checked for their FTP downloads, and if the file has been completely transferred these users are removed from the environment. A Call Admission Control (CAC) procedure based on eNB resource availability and the received signal strength of mobile users has been implemented. A user explores and selects the eNB with the highest received downlink power, specifically the Reference Signal Received Power (RSRP) defined in LTE. The selected eNB then accepts the mobile if it has at least one PRB to spare. A call is dropped if a mobile enters an area with low cellular coverage. KPIs are updated on every time step; however, the optimization algorithms are run after fixed periods to account for normalization. The resulting KPI graphs and tables are based on a moving average filter of 500 steps. The initial 200 steps are excluded to avoid transient effects.
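The flow-level step described above could be summarised by the following sketch (Poisson arrivals, FTP departures, and admission by strongest received power with a one-PRB check). The data model and function structure are assumptions about the MATLAB simulator, shown in Python purely for illustration; in particular, the per-eNB constant RSRP is a simplification.

# Sketch of one flow-level snapshot: Poisson arrivals and FTP departures,
# with a simple CAC based on strongest RSRP plus PRB availability.
import math
import random

ARRIVAL_RATE = 6                 # lambda, mean new users per snapshot (Table 1)
FILE_SIZE_BITS = 57_000_000      # 57000 Kbits per FTP download (Table 1)

def poisson(lam):
    # Knuth's method, good enough for a sketch.
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

def step(enbs, users, rates_bps, dt=1.0):
    """enbs: list of dicts {'free_prbs': int, 'rsrp': float};
    users: list of dicts {'enb': index, 'remaining_bits': float};
    rates_bps: user index -> current Eq. (6) rate."""
    accepted = rejected = 0
    for _ in range(poisson(ARRIVAL_RATE)):
        best = max(range(len(enbs)), key=lambda i: enbs[i]['rsrp'])
        if enbs[best]['free_prbs'] >= 1:                 # CAC: one PRB to spare
            enbs[best]['free_prbs'] -= 1
            users.append({'enb': best, 'remaining_bits': FILE_SIZE_BITS})
            accepted += 1
        else:
            rejected += 1                                # counted against access probability
    for idx in reversed(range(len(users))):
        users[idx]['remaining_bits'] -= rates_bps.get(idx, 0.0) * dt
        if users[idx]['remaining_bits'] <= 0:            # FTP download finished
            enbs[users[idx]['enb']]['free_prbs'] += 1
            del users[idx]
    return accepted, rejected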
5.2 Results

The average system throughput, normalized system throughput (Eq. 8) and access probability comparison graphs for the three optimization techniques, i.e., SCLA, QLA and QLB, are shown in Figs. 6, 7 and 8 respectively. The normalized throughput graph gives an idea of the average throughput achieved by the accepted users in the environment only. These three comparison graphs show that SCLA has a clear advantage over the other optimization techniques over almost the complete range of traffic arrival rates. SCLA outperforms the other algorithms by up to 30%, with a 15% improvement on average. In these three figures, QLB performance is almost comparable to QLA. QLB achieves about a 10% advantage in the mid range λ = 9–15 but drops for low and high traffic arrival rates. The performance improvement of QLB over QLA is actually seen in Fig. 9, i.e., the average file transfer time comparison graph. QLB performs about 5% better than QLA from λ = 6–16. SCLA still maintains its advantage of 5% on average over the range λ = 2–18.

The CDF graph of all techniques is shown for λ = 10 in Fig. 10. The graph is also in agreement with our earlier plots of average system throughput and average user file transfer time. QLB shows improvement over QLA for all users. Compared with SCLA, QLB shows improvement for 70% of users, after which it lags behind. This cross over is because SCLA has a 5% improvement in average system throughput over QLB (see Fig. 6).

Fig. 6 System throughput comparison
Fig. 7 Normalized throughput comparison
Fig. 8 Access probability comparison
Fig. 9 File transfer time comparison for an average user
Fig. 10 CDF comparison at λ = 10

5.3 Analysis

The reason why SCLA performs better than the other techniques is its simple probability vector update rule. The SCLA probability vector dimension equals the total number of antenna tilt positions, i.e., 6. In comparison, QLA has a Q table of dimension 48 × 6, and QLB has 240 × 6 due to the input of 5 additional states of the neighbourhood throughput vector (48 × 5 = 240). Thus, the SCLA probability vector can be updated quickly and react much faster to the changing environment and traffic scenario than the QLB or QLA Q-matrix updates. Also, with a Q-matrix it is likely that some cells never get to see certain states and learn from them until they occur. Thus, the exploration time required for Q-learning is more than that for SCLA.

Comparing QLB with QLA, we find that the addition of a meaningful input does bring an advantage, even though a small one. Neighbourhood transmission has a direct bearing on the interference a cell observes and the actual throughput it can achieve for edge users. We should have observed better performance, but the learning time required for the larger QLB Q-matrix, i.e., of size 240 × 6, is obviously more than that for the 48 × 6 Q-matrix of QLA, an observation also supported by the SCLA performance. However, not all cells in the environment will have a large Q-matrix to learn. Cells not having neighbours within the defined criterion will have a smaller Q-matrix to learn. It is for this reason that the QLB file transfer time performance is better than that of QLA, although the advantage in system throughput is not much.

6 Conclusion

From the results we have obtained, we can conclude the following. First, simple RL techniques like SCLA perform much better compared to more complex techniques like Q-learning. Second, within the same RL technique, the addition of inputs improves the network performance provided the inputs are chosen wisely. Finally, input learning parameters cannot be added beyond a certain limit in reinforcement techniques like Q-learning, due to the computational complexity involved in updating a large state-action matrix if a particular system response time is desired. The limitations observed for Q-learning techniques can be addressed using artificial neural networks, which have the inbuilt characteristics of dimensionality reduction and of learning input weights. This is left for future work.

Compliance with Ethical Standards

Conflict of Interest On behalf of all authors, the corresponding author states that there is no conflict of interest.

References

1. Akyildiz, I. F., Gutierrez-Estevez, D. M., Balakrishnan, R., & Chavarria-Reyes, E. (2014). LTE-advanced and the evolution to beyond 4G (B4G) systems. Physical Communication, 10, 31–60. https://doi.org/10.1016/j.phycom.2013.11.009.
2. Hejazi, S. A., & Stapleton, S. P. (2014). Self-optimising intelligent distributed antenna system for geographic load balancing. IET Communications, 8(15), 2751–2761. ISSN 1751-8628.
3. Zahir, T., Arshad, K., Nakata, A., & Moessner, K. (2013). Interference management in femtocells. IEEE Communications Surveys & Tutorials, 15, 293–311. https://doi.org/10.1109/SURV.2012.020212.00101.
4. Damnjanovic, A., Montojo, J., Wei, Y., Ji, T., Luo, T., Vajapeyam, M., et al. (2011). A survey on 3GPP heterogeneous networks. IEEE Wireless Communications, 18(3), 10–21. https://doi.org/10.1109/MWC.2011.5876496.
5. Osterbo, O., & Grondalen, O. (2012). Benefits of self-organizing networks (SON) for mobile operators. Journal of Computer Networks and Communications, 2012(862527), 16. https://doi.org/10.1155/2012/862527.
6. Aliu, O. G., Imran, A., Imran, M. A., & Evans, B. (2013). A survey of self organisation in future cellular networks. IEEE Communications Surveys & Tutorials, 15(1), 336–361. https://doi.org/10.1109/SURV.2012.021312.00116.
7. 3GPP (2012). Self-configuration and self-optimizing network use cases and solutions. 3GPP technical report TR 36.902.
8. Razavizadeh, S. M., Ahn, M., & Lee, I. (2014). Three-dimensional beamforming: A new enabling technology for 5G wireless networks. IEEE Signal Processing Magazine, 31(6), 94–101. https://doi.org/10.1109/MSP.2014.2335236.
9. Kishiyama, Y., Benjebbour, A., Ishii, H., & Nakamura, T. (2012). Evolution concept and candidate technologies for future steps of LTE-A. In 2012 IEEE International Conference on Communication Systems (ICCS) (pp. 473–477). https://doi.org/10.1109/ICCS.2012.6406193.
10. Yilmaz, O. N. C., Hämäläinen, J., & Hämäläinen, S. (2013). Optimization of adaptive antenna system parameters in self-organizing LTE networks. Wireless Networks, 19(6), 1251–1267. https://doi.org/10.1007/s11276-012-0531-3.
11. Yang, Y., et al. (2018). DECCO: Deep-learning enabled coverage and capacity optimization for massive MIMO systems. IEEE Access, 6, 23361–23371.
12. Zhao, B., Gao, L., Liao, W., & Zhang, B. (2017). A new kernel method for hyperspectral image feature extraction. Geo-Spatial Information Science, 20(4), 309–318.
13. Guan, X., Liao, S., Bai, J., Wang, F., Li, Z., Wen, Q., et al. (2017). Urban land-use classification by combining high-resolution optical and long-wave infrared images. Geo-Spatial Information Science, 20(4), 299–308.
14. Wahyono, & Jo, K. H. (2014). A comparative study of classification methods for traffic signs recognition. In 2014 IEEE International Conference on Industrial Technology (ICIT) (pp. 614–619). https://doi.org/10.1109/ICIT.2014.6895001.
15. Ergul, E. (2017). Relative attribute based incremental learning for image recognition. CAAI Transactions on Intelligence Technology, 2(1), 1–11.
16. Qureshi, M. N., & Tiwana, M. I. (2016). A novel stochastic learning automata based SON interference mitigation framework for 5G HetNets. Radioengineering, 25(4), 763–773.
17. Partov, B., Leith, D. J., & Razavi, R. (2015). Utility fair optimization of antenna tilt angles in LTE networks. IEEE/ACM Transactions on Networking, 23(1), 175–185. https://doi.org/10.1109/TNET.2013.2294965.
18. Razavi, R., Klein, S., & Claussen, H. (2010). Self-optimization of capacity and coverage in LTE networks using a fuzzy reinforcement learning approach. In 21st Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 2010 (pp. 1865–1870). https://doi.org/10.1109/PIMRC.2010.5671622.
19. Dandanov, N., Al-Shatri, H., Klein, A., & Poulkov, V. (2017). Dynamic self-optimization of the antenna tilt for best trade-off between coverage and capacity in mobile networks. Wireless Personal Communications, 92(1), 251–278. https://doi.org/10.1007/s11277-016-3849-9.
20. 3GPP (2010). Evolved universal terrestrial radio access (E-UTRA); further advancements for (E-UTRA) physical layer aspects (release 9). 3GPP technical report TR 36.814 v9.0.0 (2010-03).
21. 3GPP (2012). Evolved universal terrestrial radio access (E-UTRA) and evolved universal terrestrial radio access network (E-UTRAN); overall description; stage 2. 3GPP technical specification TS 36.300.
22. 3GPP (2018). Technical specification group radio access network; Evolved universal terrestrial radio access (E-UTRA); Physical channels and modulation. 3GPP technical specification TS 36.211 v15.1.0.
23. Mogensen, P., Na, W., Kovacs, I. Z., Frederiksen, F., Pokhariyal, A., Pedersen, K. I., Kolding, T., Hugl, K., & Kuusela, M. (2007). LTE capacity compared to the Shannon bound. In IEEE 65th Vehicular Technology Conference (VTC2007-Spring) (pp. 1234–1238). https://doi.org/10.1109/VETECS.2007.260.
24. Watkins, C., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292. https://doi.org/10.1007/BF00992698.
25. Beigy, M., & Meybodi, M. R. (2004). A mathematical framework for cellular learning automata. Advances in Complex Systems, 07(03n04), 295–319. https://doi.org/10.1142/S0219525904000202.
26. Narendra, K. S., & Annaswamy, A. M. (2005). Stable Adaptive Systems. Mineola: Dover Publications. ISBN 0486442268.
27. Papadimitriou, G. I. (1994). A new approach to the design of reinforcement schemes for learning automata: Stochastic estimator learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 6(4), 649–654. https://doi.org/10.1109/69.298183.
28. Nasri, R., & Altman, Z. (2013). Handover adaptation for dynamic load balancing in 3GPP long term evolution systems. arXiv e-prints 1307.1212. http://adsabs.harvard.edu/abs/2013arXiv1307.1212N.

Muhammad Nauman Qureshi is currently pursuing his Ph.D. degree in Wireless Communication Networks at the Department of Electrical Engineering, COMSATS University, Islamabad, Pakistan. He received his B.E. in Avionics from the College of Aeronautical Engineering, National University of Science & Technology (NUST), Pakistan, in 1997 and his M.S. in Information Security from Sichuan University, Chengdu, China, in 2007. His research interests include 5G Heterogeneous Networks, Self Organising Networks, Artificial Intelligence and Fog Networks.

Moazzam Islam Tiwana received a B.Sc. degree in Electrical and Electronics Engineering from the University of Engineering and Technology, Taxila, Pakistan, in 2001, an M.Sc. degree in Digital Telecommunication Systems from ENST, Paris, France, in 2007, and a Ph.D. degree in Mobile Communications from Telecom SudParis, France, in 2010. His Ph.D. was with the R&D Group of Orange Labs of France Telecom. He has more than 9 years of industrial and academic experience, with research publications in reputed international journals.

Majed Haddad has been an assistant professor at the University of Avignon, France, since 2014, where his main research interests are in radio resource management, heterogeneous networks, green networks, complex networks and game theory. In 2004, Majed received his electrical engineering diploma from the National Engineering School of Tunis, Tunisia. He received the master degree from the University of Nice Sophia-Antipolis in France in 2005, and a doctorate in Electrical Engineering from the Eurecom institute in 2008. In 2009, he joined France Telecom R&D as a post-doctoral research fellow. In 2011, he joined the University of Avignon in France as a research assistant. From 2012 to 2014, Dr. Haddad was a research engineer at INRIA Sophia-Antipolis in France under an INRIA Alcatel-Lucent Bell Labs fellowship. Majed has published more than 50 research papers in international conferences, journals, book chapters and patents. He also acts as TPC chair, TPC member and reviewer for various prestigious conferences and journals.
