
A_Hybrid_machine_learning_based_model_for_congestion_prediction_in_mobile_networks

This paper presents a hybrid machine learning model for predicting congestion in mobile networks, combining unsupervised co-clustering and supervised logistic regression. The model utilizes Key Performance Indicators (KPIs) from a live LTE network to enhance prediction accuracy and proactively manage network congestion. Validation of the model shows improved performance metrics, demonstrating its effectiveness in maintaining Quality of Service (QoS) for end-users.


A Hybrid machine learning based model for congestion prediction in mobile networks

2022 IEEE 33rd Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) | 978-1-6654-8053-6/22/$31.00 ©2022 IEEE | DOI: 10.1109/PIMRC54779.2022.9977541


Sara Kassan, Orange Labs, Belfort, France
Imed Hadj-Kacem, Orange Labs, Belfort, France
Sana Ben Jemaa, Orange Labs, Paris, France
Sylvain Allio, Orange Labs, Belfort, France

Abstract—Congestion avoidance in radio access networks considerably enhances the end-user Quality of Service (QoS). Congestion should be predicted in advance to allow Self Organizing Networks (SON) algorithms to perform appropriate parameter adjustments (such as handover parameters for mobility load balancing). For this purpose, a novel and efficient hybrid congestion prediction model is proposed in this paper. This hybrid learning model combines unsupervised and supervised learning algorithms. The unsupervised learning consists of a co-clustering algorithm based on the Latent Block Model (LBM) that groups similar cells according to the behaviour of their KPIs over time. Following the co-clustering model, a logistic regression approach is applied on each cluster to predict congestion and alert operators so that congestion can be avoided in mobile networks. The applicability of the hybrid model is validated on real data consisting of Key Performance Indicators (KPIs) collected periodically for 12 days in a live Long-Term Evolution (LTE) network. The proposed hybrid model has proven its efficiency in congestion prediction in terms of accuracy, precision, recall and F-measure.

Index Terms—Congestion prediction, Radio access network, Co-clustering model, Hybrid learning model, functional logistic regression.

I. INTRODUCTION

Due to the increase in traffic demand and the emergence of several new services and technologies, mobile networks have become highly complex, and their management poses growing challenges for network operators. Operators therefore have to provide a high quality of service (QoS) to customers while reducing operational costs. Automation, through the introduction of Self Organizing Networks (SON), has been widely adopted for radio access network management. In recent years, artificial intelligence has gained momentum, as it is considered the next step beyond automation.

This paper proposes a model that learns from historical Key Performance Indicator (KPI) measurements and predicts future congestion in mobile networks. This model alerts operators of future radio congestion and allows them to act proactively by avoiding congestion before it occurs, which significantly enhances the QoS. Despite the great interest in prediction techniques, few works have tackled congestion prediction in mobile networks. In [1], two supervised learning algorithms based on forecast machine learning models are used and evaluated to show their effectiveness in forecasting the average downlink throughput of LTE base stations. The results have shown that both models have the ability to forecast the network behavior. Nevertheless, cell congestion is defined based on the average download speed per user and does not take into account typical congestion criteria used for network optimization such as cell load, average number of active users and traffic volume. In our work, we consider congestion rules and thresholds used by operational teams to decide on the congestion state of a cell. In [2], the authors developed unsupervised clustering approaches for features extracted from KPIs to group cells that show similar performances and, consequently, to identify the groups that perform below the desired threshold. However, only two groups of cells are identified using different clustering algorithms: one group consisting of the cells with the highest performance and the other one containing the worst performing cells. The clustering approach used should be adapted to form clusters with more specific behaviours or performances. We can also mention other recent works that predict congestion in mobile networks [3]. The authors considered different supervised learning techniques for congestion prediction: linear regression, logistic regression and random forest. The performance of the different approaches is compared in terms of computational complexity and prediction accuracy. Logistic regression using functional data presented the best trade-off between accuracy and complexity [3]. Therefore, we use the logistic regression approach to predict congestion, as it outperforms the state-of-the-art baselines and has lower complexity than other methods such as deep learning [4].

In order to enhance the precision of the congestion prediction, we propose to precede the prediction model with an unsupervised co-clustering approach that forms clusters of cells with similar behaviours. This proposed hybrid model increases the precision metric of the congestion prediction model based on logistic regression. Firstly, an unsupervised co-clustering approach is used to form clusters of cells and time intervals where KPIs are highly correlated. As each cluster contains similar behaviours and, consequently, KPIs that are highly correlated, we apply the logistic regression approach used in [3] to each cluster. This results in good congestion prediction performance with relatively low complexity. The congestion criterion also corresponds to field engineering rules used by network operational teams. The added value of the co-clustering approach compared with a simple supervised approach is evaluated on operational network data corresponding to large network areas with heterogeneous cell contexts in terms of traffic, radio environment etc.

The main contributions of this paper can be summarized as follows:

• Combining the co-clustering approach and the logistic regression to improve the congestion prediction.

• Performing experiments using real measurements collected from a live LTE network to validate the proposed hybrid model.

The rest of the paper is organized as follows. Section II presents the proposed hybrid model for congestion prediction in mobile networks. Section III describes the network data used for performance evaluation. Section IV provides numerical results. Section V concludes the paper and gives an overview of future work.

II. HYBRID LEARNING MODEL TO IMPROVE CONGESTION PREDICTION

This section gives an overview of data preparation and of the proposed model to improve congestion prediction in radio access networks. Since the extracted KPI measurements are discrete, the first step is to recover the functional nature of these KPIs using smoothing methods. Then, the hybrid approach, which combines an unsupervised co-clustering technique followed by the supervised regression model to predict future congestion, is applied on the functional data. The overall model is described in Fig. 1.

Fig. 1. Overall proposed prediction model

Mobile network data may be vulnerable to errors commonly encountered while collecting measurements. Such errors take different forms, such as missing values, outliers, noise and others [5], [6]. Noisy data reduces the available data to be analyzed, compromising the statistical power of the study, the learning capacity of the data and eventually the reliability of its results. Therefore, the first step in the proposed model consists in reducing the noise using a smoothing technique. After the data preparation, the unsupervised co-clustering approach based on the functional Latent Block Model (FunLBM) is used to form groups of cells that have similar behaviors in terms of KPIs depending on time groups [9]. Then, the supervised logistic regression is applied on each cluster. The proposed hybrid model removes uncongested clusters of cells from the prediction model and uses training sets of similar cells to improve the congestion prediction for each cluster.

A. Smoothing technique for functional data reconstruction

The smoothing technique aims at reconstructing a functional time series from the discrete observations of a given KPI [7], [8]. We assume that the dataset is a set of discrete observations composed of n rows of times and p columns of different KPI measurements for different cells. In this work, KPI measurements are collected for each cell during several days. To reconstruct the functional representation of the data from the discrete observations, smoothing methods assume that the curves belong to a finite-dimensional space spanned by basis functions such as B-splines, wavelets or trigonometric functions [9]. Each observed curve $x_{ij}(t)$, for $1 \le i \le n$, $1 \le j \le p$, can be expressed as a linear combination of basis functions $\{\phi_k(t)\}_{k=1\ldots M}$:

$$x_{ij}(t) = \sum_{k=1}^{M} a_{ijk}\,\phi_k(t), \quad t \in [0, T], \tag{1}$$

where $x_{ij}(t)$ is the curve that can be estimated by least-squares smoothing and $a_{ijk}$ is its basis expansion coefficient. Due to the nature of the real KPIs under study, we adopt a smoothing method with Fourier functions as the basis functions. In this work, the same basis $\{\phi_k(t)\}_{k=1\ldots M}$ is used for all functional features and the number of basis functions M can be set empirically.

B. Unsupervised Latent Block Model co-clustering for functional data

The Functional Latent Block Model is one of the most widely used models for co-clustering [10]. It simultaneously partitions the rows and the columns of functional data. The co-clustering defines $K_r$ row groups for groups of cells and $K_c$ column groups for time groups. Let the two independent random variables $R = (r_{ik_r})_{i=1\ldots n,\,k_r=1\ldots K_r}$ and $C = (c_{jk_c})_{j=1\ldots p,\,k_c=1\ldots K_c}$ indicate, respectively, the set of all partitions of rows into $K_r$ clusters of cells and the set of all partitions of columns into $K_c$ clusters of times, i.e. $c_{jk_c} = 1$ if column $j$ belongs to the column cluster $k_c$, and 0 otherwise. The $r_{ik_r}$ (resp. $c_{jk_c}$) are binary indicators of row $i$ (resp. column $j$) belonging to row cluster $k_r$ (resp. column cluster $k_c$), such that the random variables $x_{ij}$ are conditionally independent knowing $R$ and $C$. Applying that to functional data, we can define the functional latent block model by its density as follows:

$$p(a;\theta) = \sum_{r \in R} \sum_{c \in C} p(R;\theta)\, p(C;\theta)\, p(a \mid R, C;\theta). \tag{2}$$

By assuming that $\alpha_{k_r}$ and $\beta_{k_c}$ are the row and column mixing proportions (belonging to [0, 1] and summing to 1), such that $p(R;\theta) = \prod_{i,k_r} \alpha_{k_r}^{r_{ik_r}}$ and $p(C;\theta) = \prod_{j,k_c} \beta_{k_c}^{c_{jk_c}}$, the probability of the independent basis expansion coefficients is defined by [11]:

$$p(a \mid R, C;\theta) = \prod_{i,j,k_r,k_c} p(a_{ij};\theta_{k_r k_c})^{r_{ik_r} c_{jk_c}}. \tag{3}$$
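As a worked step that the paper does not spell out: for one fixed pair of partitions (R, C), taking the logarithm of the summand of Eq. (2) and using the product forms of p(R; θ), p(C; θ) and Eq. (3) turns the products into sums. Only symbols already defined in this section are used; the name of the left-hand side, log L_c, is introduced here purely for illustration.

```latex
\log L_c(a, R, C;\theta)
  = \sum_{i,k_r} r_{ik_r} \log \alpha_{k_r}
  + \sum_{j,k_c} c_{jk_c} \log \beta_{k_c}
  + \sum_{i,j,k_r,k_c} r_{ik_r}\, c_{jk_c} \log p(a_{ij};\theta_{k_r k_c})
```

Because the indicators are binary, each coefficient a_ij contributes only through the single block (k_r, k_c) to which row i and column j are assigned.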

This means that a subset of rows exhibits similar behavior across a subset of columns, and vice versa. Unlike clustering techniques such as K-means, which can ignore the functional nature of the data (which can reduce the clustering performance) and which are limited to clustering along one dimension [10], we use the LBM co-clustering model based on the work in [10] to form clusters of cells that behave similarly across a subset of times.
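The co-clustering step operates on the basis expansion coefficients of Eq. (1). A minimal sketch of how such coefficients could be estimated by least squares with a Fourier basis is given below. It uses NumPy; the function names, the 24-hour period, the choice M = 5 and the synthetic signal are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def fourier_basis(t, n_basis, period):
    """Evaluate a Fourier basis (constant, then sin/cos pairs) at times t."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    harmonic = 1
    while len(cols) < n_basis:
        w = 2.0 * np.pi * harmonic / period
        cols.append(np.sin(w * t))
        if len(cols) < n_basis:
            cols.append(np.cos(w * t))
        harmonic += 1
    return np.column_stack(cols)


def smooth_kpi(t, y, n_basis, period):
    """Least-squares estimate of the coefficients a_k of Eq. (1) for one curve."""
    phi = fourier_basis(t, n_basis, period)
    coeffs, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return coeffs, phi @ coeffs


# Hypothetical KPI: one day sampled every 15 min (96 points), in hours.
t = np.arange(96) * 0.25
rng = np.random.default_rng(0)
clean = 0.5 + 0.3 * np.sin(2.0 * np.pi * t / 24.0)
noisy = clean + 0.05 * rng.standard_normal(t.size)

# coeffs plays the role of (a_ij1, ..., a_ijM); fitted is the smoothed curve.
coeffs, fitted = smooth_kpi(t, noisy, n_basis=5, period=24.0)
```

Here `coeffs` is the vector a co-clustering model would receive for one cell and one KPI, and `fitted` is the reconstructed smooth curve evaluated at the sampling times.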

C. Supervised logistic regression model for functional LBM co-clustering data

The traffic load can be unbalanced between cells, which may dramatically increase the average delay of delivering services to users located in a victim cell area. This can adversely affect the quality of different services by producing congestion. Therefore, prediction of future congestion in mobile networks is urgently needed to limit the quality of service degradation. In our case, we consider a binary classification: a label Y ∈ {0, 1} is created by comparing the KPI values to the threshold, where Y = 1 indicates that there is congestion in the prediction horizon h and Y = 0 indicates that the future behavior is normal over the same time horizon. Given the efficiency and simplicity of the logistic regression model used in [3], we propose to use the same supervised model to predict congestion in mobile networks. Logistic regression is a discriminative probabilistic model: given $x_{ij}(t)$, the observation of a stochastic process of collected KPIs during a continuous time interval, it returns a probability distribution over the target variable Y. To avoid the decrease in congestion prediction performance due to inputs with heterogeneous behavior, we propose to create clusters that group cells with similar behaviors together, as described above in Section II-B. Then, we can apply the supervised logistic regression approach proposed in [4] to each cluster. As the clusters are formed based on similar behaviors, logistic regression is performed on each cluster to enhance prediction performance.

III. NETWORK DATA DESCRIPTION

The model is applied to a real LTE network in an urban area that corresponds to the North East of the Ile-De-France region of Paris, in France, with a population of nearly one and a half million. Fig. 2 illustrates the geographical area used for the data extraction. The data, extracted from an internal tool of Orange France, are generated within 229 cells situated at different locations and reflect different environment conditions in France. Thus, the data is extremely imbalanced. The period of observation covers 12 days in June 2020. It contains some ordinary workdays and weekends. The KPI measurements for each cell are collected periodically and aggregated with a granularity of 15 min to indicate the radio quality. The motivation behind this choice is that these KPIs are typically used for detecting and analyzing congestion on LTE networks. The operational proposal is to use six different radio quality indicators. However, there are just three relevant radio quality indicators in the current dataset, which are: “Average number of users on the downlink”, “Traffic volume in downlink” and “Cell load”. Table I shows these three KPIs, which allow a deep investigation in LTE networks. The dataset is labeled by comparing KPI measurements with KPI thresholds according to a desired level of performance. For mobile network engineers, if one of the KPIs described in Table I shows degradation with respect to fixed thresholds, congestion has occurred.

Fig. 2. Geographical area used for the data under study

TABLE I
LIST OF KPIS USED FOR CONGESTION PREDICTION ON LTE NETWORK.

KPI | Description
DL Traffic Volume | Utilization rate of PRB (Physical Resource Block) in downlink
Average Active User | Average active users within the cell
DL Load | Utilization rate of PRB (Physical Resource Block) in downlink

IV. NUMERICAL RESULTS AND DISCUSSION

In this section, we apply the proposed congestion prediction model on the data described in Section III without and with the LBM co-clustering approach. The objective is to illustrate the additional performance that the unsupervised functional LBM co-clustering approach can bring to the logistic regression model to predict congestion.

Firstly, we apply the supervised logistic regression to all the data to predict congestion. Then, we use the LBM co-clustering to regroup similar cells in terms of behaviours. After that, we apply the supervised logistic regression to each group of cells. Finally, we compare and analyse the results obtained by the overall model. We consider the congestion prediction where a regression method is implemented to predict the forecasted variable based on the past heterogeneous values. In the experiments, we use 80% of the data for training and the remaining 20% for testing (the first 9.6 days for learning and the last 2.4 days for testing). To avoid time loops and lookahead bias creeping into the evaluation stage, the congestion prediction cannot be validated by the K-fold cross-validation technique. In our study, the evaluation is given for different prediction horizons h in terms of confusion matrix, Accuracy (Acc), Precision (P), Recall (R) and F-measure.
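The evaluation protocol described above can be sketched as follows: a chronological split that keeps the earliest 80% of the samples for training, and the four metrics computed from binary labels. This is an illustrative sketch, not the authors' code; the function names, the toy labels, and the conventions for zero denominators (precision and recall set to 0, F-measure set to NaN) are assumptions.

```python
import math


def chronological_split(samples, train_fraction=0.8):
    """Keep the earliest samples for training and the latest for testing (no shuffling)."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-measure for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Assumed convention: a ratio with an empty denominator is reported as 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Assumed convention: the F-measure is NaN when precision + recall is 0.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = math.nan
    return {
        "acc": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# 12 days at a 15 min granularity give 12 * 96 = 1152 time steps per cell.
time_steps = list(range(12 * 96))
train_steps, test_steps = chronological_split(time_steps)

# Toy labels (hypothetical): 1 means congestion within the horizon h.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0, 1, 0]
scores = binary_metrics(y_true, y_pred)
```

Because the split is positional, every training time step precedes every test time step, which is the property that shuffled K-fold cross-validation would break.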

A. Future congestion prediction using only logistic regression approach

Congestion prediction is done using the labeled dataset of measurements described above. Logistic regression is adopted to predict congestion based on the past heterogeneous KPI measurements for different prediction horizons h, as shown in Fig. 3. Results presented in Fig. 3 show the degradation of congestion prediction performance over time, mostly for the recall and less for the precision. It is due to the degradation of the congestion prediction model efficiency over time. For our use case, the operator tends to maximize the precision. Thus, it only intervenes to modify network parameters when it is certain that congestion will actually occur in the future.

Fig. 3. Experimental results for congestion prediction of the logistic regression for different prediction horizons.

B. Future congestion prediction by the proposed hybrid model

In this section, we present the different clusters of cells for Paris North East obtained by the co-clustering LBM approach, as shown in Fig. 4. The suitable number of clusters is selected with the Greedy search algorithm using the Integrated Complete Likelihood-Bayesian Information Criterion (ICL-BIC) for choosing the number of blocks [9]. Cells in the same cluster, which are more similar to each other in terms of KPIs than to cells in other clusters, are represented by icons with the same color in Fig. 4. Then, the logistic regression prediction model is applied to each cluster to improve the congestion prediction performance. The initial dataset of Paris North East without the co-clustering approach and the different clusters obtained with the Functional LBM co-clustering approach are described in Table II.

Fig. 4. Clusters of Paris North East: 229 cells.

TABLE II
DESCRIPTION OF DATASET BEFORE AND AFTER FUNCTIONAL LBM CO-CLUSTERING APPROACH.

Data | Nb cells | Nb congestion | Nb non congestion | Congestion rate | Color in map
Paris North East | 229 | 28409 | 248452 | 0.1026 |
Cluster1 | 25 | 77 | 30148 | 0.0025 | blue
Cluster2 | 13 | 92 | 15625 | 0.0058 | white
Cluster3 | 25 | 163 | 30062 | 0.0054 | dark blue
Cluster4 | 18 | 7321 | 14441 | 0.336 | red
Cluster5 | 38 | 35 | 45907 | 0 | green
Cluster6 | 14 | 11416 | 5510 | 0.6745 | dark red
Cluster7 | 28 | 3078 | 30774 | 0.091 | pink
Cluster8 | 6 | 348 | 6906 | 0.048 | dark pink
Cluster9 | 56 | 5414 | 62290 | 0.07997 | beige
Cluster10 | 6 | 465 | 6789 | 0.063 | gray

Table II presents a summary of the initial dataset and the various clusters obtained with the FunLBM co-clustering approach. The initial dataset is imbalanced: the majority class of no congestion far outweighs the minority class of congestion, which represents just 10.3%. In this work, we keep the real distribution of the data without applying any imbalance correcting methods. These methods, which seek to correct the imbalance, frequently improve recall at the cost of precision or vice versa by adding more data. They might even spoil recall, precision or both [11]. Therefore, we do not handle our imbalanced data with imbalance correcting methods. The FunLBM co-clustering approach regroups the 229 cells in 10 clusters based on the daily similarity in cell behaviors represented by KPIs, without wrecking the real data properties. The co-clustering approach succeeds in efficiently separating similar cells based on their behaviors into various categories without any information about congestion. As illustrated in Table II, clusters 1, 2, 3 and 5 regroup cells with a very low daily congestion rate. Clusters 4 and 6 correspond to the groups with a high congestion rate. These clusters correspond to 70% of the global congestion in the initial dataset. Clusters 7, 8, 9 and 10 are the groups with an average congestion rate. Table II indicates that the co-clustering approach regroups cells in terms of their behaviors even if these clusters have similarity in terms of congestion rate. After the use of the LBM co-clustering approach, the logistic regression approach is applied to predict future congestion. The results of the hybrid model are grouped according to congestion rate. In our study, we focus on the precision metric for congestion prediction in radio access networks specified by the mobile network operator. As

shown in Table III, cluster 5 is the group of cells that are not congested at any time; the cells in this cluster are eliminated from the prediction model. Table III shows a very low precision in detecting congestion for a 30 min prediction horizon in the case of clusters with a low congestion rate. Therefore, it is better to eliminate these cells from the prediction model. To illustrate the advantages of eliminating cells with a low congestion rate from the prediction model, we analyse the confusion matrix for two different clusters with a low congestion rate in the case of training on the whole data and testing on these clusters. The confusion matrices for cluster1 and cluster2 show 16 real anomalies for cluster1 and 15 for cluster2, as shown in Table IV(a) and Table IV(b). The prediction model does not have the ability to predict them. These correspond to cells that were rarely congested over the whole period. However, the prediction model reports 25 false positive congested cells for cluster1 and 11 false positive congested cells for cluster2 that are not actually congested. This requires the operator to correct uncongested cells. Rather than correcting cells that are really congested with respect to the QoS, correcting uncongested cells incurs a significant additional cost in terms of budget and time, and can also decrease the reliability of network operation by transferring traffic to another cell. As the model does not have the ability to predict congestion for these cells, it is better to remove them from the prediction model. This analysis shows that the FunLBM co-clustering approach allows us to find, within clusters, the cells whose abnormalities we cannot predict, and to remove them from the prediction model.

TABLE III
PERFORMANCE FOR CLUSTERS WITH LOW CONGESTION RATE FOR 30 MIN PREDICTION HORIZON.

30 min | Acc | R | P | F | Congestion Rate | Nb cells
Dataset before clustering | 0.956 | 0.755 | 0.883 | 0.814 | 0.1026 | 229
Training and testing for cluster1 | 0.999 | 0.000 | 0.000 | NaN | 0.0025 | 25
Cluster1 training for all Data | 0.747 | 0.997 | 0.604 | 0.752 | 0.0025 | 25
Training and testing for cluster2 | 0.991 | 0.000 | 0.000 | NaN | 0.0058 | 13
Cluster2 training for all Data | 0.979 | 0.595 | 0.355 | 0.444 | 0.0058 | 13
Training and testing for cluster3 | 0.984 | 0.344 | 0.355 | 0.349 | 0.0054 | 25
Cluster3 training for all Data | 0.989 | 0.141 | 0.818 | 0.240 | 0.0054 | 25
Training and testing for cluster5 | 0.956 | 0.755 | 0.883 | 0.814 | 0 | 38
Cluster5 training for all Data | 0.999 | 0.000 | 0.000 | NaN | 0 | 38

TABLE IV
CONFUSION MATRIX FOR THE CASE OF USING TRAINING OBSERVATIONS OF THE WHOLE DATASET AND TESTING IN CLUSTER 1 AND CLUSTER 2.

(a) Cluster 1
 | Predicted 0 | Predicted 1
Actual 0 | 4959 | 25
Actual 1 | 13 | 3

(b) Cluster 2
 | Predicted 0 | Predicted 1
Actual 0 | 2574 | 11
Actual 1 | 15 | 0

Clusters with a high congestion rate represent 70% of the total number of congestion events in the initial dataset. Clusters with a High Congestion rate can be analysed as one group named HC-cluster, and the performance obtained by the prediction model in terms of different metrics (recall, precision and F1) is improved, as illustrated in Fig. 5. Fig. 5 shows that the performances for HC-Cluster are improved. The recall metric for HC-Cluster is increased by 15%, the precision is increased by 2% and the F-measure is increased by 10%. For the case of clusters having an average congestion rate, the results are presented in Table V. As shown in Table V, for cluster9 the precision is higher than 0.8 and for cluster7, the performances in the 2 cases are almost the same. For clusters 8 and 10, their behaviors are similar to those of clusters 7 and 9 in terms of one KPI. The number of cells in these clusters is very low (six cells). Therefore, the performance for these clusters cannot be considered indicative.

Fig. 5. Different CHs behaviors based on DL Load KPI for clusters that have similar high congestion rate.

TABLE V
PERFORMANCE FOR CLUSTERS WITH AVERAGE CONGESTION RATE FOR 30 MIN PREDICTION HORIZON.

h=30 min | Acc | R | P | F | Congestion Rate | Nb cells
Training and testing for cluster7 | 0.91 | 0.417 | 0.701 | 0.523 | 0.091 | 28
Testing for cluster7 and training for all Data | 0.920 | 0.488 | 0.753 | 0.592 | 0.091 | 28
Training and testing for cluster9 | 0.943 | 0.641 | 0.835 | 0.725 | 0.080 | 56
Testing for Cluster9 and training for all Data | 0.944 | 0.638 | 0.853 | 0.730 | 0.080 | 56

C. Summarised results

Combined with the above analysis, the results of our proposed hybrid model can be summarized in the different blocks presented in Fig. 6. Fig. 6 shows that the Functional LBM co-clustering approach used in this study succeeds in regrouping cells in clusters based on their behaviors. These clusters can be classified in three categories: High congestion

rate, Average congestion rate and Low congestion rate or without congestion. For the high congestion rate category, which represents 70% of the congestion cases in the initial dataset, the performance of the prediction model is improved. For the average congestion rate category, the prediction model keeps the same performance. For the low congestion rate category, which represents less than 2% of the initial dataset’s congestion, the cells that belong to this category are eliminated from the prediction model. In fact, for this kind of cell it is impossible to predict the congestion, and the model requires the operator to correct uncongested cells by generating wrong alarms (false positive congestion). So, we recommend not applying a prediction model to these cells, since it will generate many more false positives than true positives. Thus, the correction of the problems will be carried out in a reactive manner. Finally, the hybrid model proposed in this paper succeeds in regrouping cells based on their behaviors. This model can be used as a preventive maintenance tool which makes it possible to predict failures and avoid degradation of the quality of service of radio networks. For the last class of clusters, we give a recommendation that prevents the mobile network operator from intervening to correct false alarms. Note that, when the operational team corrects a non-existent congestion on a cell, this means that the users of the supposedly congested cell will be forced to perform a handover. Hence, the quality of service for these users deteriorates and the risk of dropped calls increases. To illustrate the advantages of eliminating cells with a low congestion rate from the prediction model, we analyse the confusion matrix for two different clusters with a low congestion rate, and it confirms that the prediction model does not have the ability to predict them. However, the prediction model requires the operator to correct uncongested cells. This analysis shows that the FunLBM co-clustering approach allows us to find, within clusters, the cells whose uncertain abnormalities we cannot predict, and to remove them from the prediction model.

Fig. 6. Hybrid proposed model result summary.

V. CONCLUSION

This paper proposes a hybrid congestion prediction model based on supervised and unsupervised approaches to enhance congestion prediction in LTE networks by observing key performance indicators. This model consists of four essential steps: 1) a smoothing step to transform discrete data to functional data, 2) an unsupervised co-clustering approach based on LBM to regroup similar cells in terms of behaviors, 3) a supervised logistic regression classification model for congestion applied to clusters and 4) prediction of future congestion in mobile networks. The model yields a performance enhancement in terms of congestion prediction in mobile networks, as shown through an application to real data from LTE networks. Besides, the supervised logistic regression approach applied to clusters of cells with similar behaviors obtained by the co-clustering approach improves the quality of congestion prediction in mobile networks compared with the logistic regression approach applied to heterogeneous cells with different behaviors. This work inspires exciting directions for future research. The problem of congestion prediction in radio access networks could be further extended by using deep learning neural networks, which could be compared with the hybrid model proposed in this article. Another improvement for our study is to introduce other KPIs, such as latency, that can affect the QoS in mobile networks. Moreover, an application of the model to other data extracted from other cities and for other periods of time could be interesting to validate the flexibility of the proposed hybrid model.

REFERENCES

[1] P. Torres, P. Marques, H. Marques, R. Dionísio, T. Alves, L. Pereira and J. Ribeiro, “Data analytics for forecasting cell congestion on LTE networks,” in 2017 Network Traffic Measurement and Analysis Conference (TMA), Dublin, 2017, pp. 1–6.
[2] R. Santos, M. Sousa, P. Vieira, M. P. Queluz and A. Rodrigues, “An Unsupervised Learning Approach for Performance and Configuration Optimization of 4G Networks,” in 2019 IEEE Wireless Communications and Networking Conference (WCNC), Marrakesh, Morocco, 2019, pp. 1–6.
[3] I. Hadj-Kacem, S. Ben Jemaa, S. Allio and Y. Ben Slimen, “Anomaly prediction in mobile networks: A data driven approach for machine learning algorithm selection,” in 2020 IEEE/IFIP Network Operations and Management Symposium (NOMS 2020), Budapest, Hungary, 2020, pp. 1–7.
[4] K. Ghosh, C. Bellinger, R. Corizzo, B. Krawczyk and N. Japkowicz, “On the combined effect of class imbalance and concept complexity in deep learning,” in 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 2021, pp. 4859–4868.
[5] B. M. Coronado, U. Mori, A. Mendiburu and J. Miguel-Alonso, “Survey of Network Intrusion Detection Methods from the Perspective of the Knowledge Discovery in Databases Process,” IEEE Transactions on Network and Service Management, 2020.
[6] S. Kyu Kwak and J. Hae Kim, “Statistical data preparation: management of missing values and outliers,” Korean Journal of Anesthesiology, vol. 70, no. 4, pp. 407–411, 2017.
[7] Y. Ben Slimen, S. Allio and J. Jacques, “Anomaly Prevision in Radio Access Networks Using Functional Data Analysis,” in GLOBECOM 2017 - 2017 IEEE Global Communications Conference, Singapore, 2017, pp. 1–6.
[8] Y. Ben Slimen, S. Allio and J. Jacques, “Model-based co-clustering for functional data,” Neurocomputing, vol. 291, pp. 97–108, 2018.
[9] G. Govaert and M. Nadif, Co-Clustering: Models, Algorithms and Applications, Wiley-ISTE, 2013.
[10] C. Bouveyron, L. Bozzi, J. Jacques and F. Xavier Jollois, “The Functional Latent Block Model for the Co-Clustering of Electricity Consumption Curves,” Journal of the Royal Statistical Society: Series C Applied Statistics, Wiley, In press, vol. 67, no. 4, pp. 897–915, 2018.
[11] B. Juba and H. S. Le, “Precision-Recall versus Accuracy and the Role of Large Data Sets,” The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), 2019.

Authorized licensed use limited to: University of Technology Sydney. Downloaded on February 26,2025 at 00:14:24 UTC from IEEE Xplore. Restrictions apply.
