
Predictive Congestion Management in Telecom Networks
Using Advanced Machine Learning Techniques
INDEX

REPORT:
1. Abstract
2. Preliminary Analysis for Understanding Data
3. Exploratory Data Analysis
4. Feature Engineering
5. Steps to Overcome Overfitting
6. Applying Machine Learning Models and Optimization
7. Approach to Estimate the Uncertainty
8. Conclusion

ANNEXURE:
1. Theory of Network Congestion
2. The Key Difference between 3G and 4G
3. Congestion and its Adverse Effect on Network
4. Backhaul Congestion
5. CatBoost
6. LightGBM
7. GBM
8. Ensemble Learning
9. Bayesian Optimisation
10. Grade of Service

Abstract
Networks, whether voice or data, are designed around many different variables. Two of the
most important factors to consider in network design are service and cost. Service is
essential for maintaining customer satisfaction, and this problem statement focuses on
that side of the coin by trying to predict congestion at cell towers. The usual features that
could identify individual towers were masked, forcing us to approach the problem from a
traffic-volume, traffic-type and time standpoint. We hypothesized various features using
traffic analysis concepts from industry, such as Grade of Service and Busy Hour Traffic,
which were confirmed by our Exploratory Data Analysis and visualizations. We also analyzed
natural human tendencies to turn knowledge about weekdays and holidays into meaningful
inputs for our model. We propose a machine learning algorithm based on these cell tower
statistics to predict the type of congestion that might occur, enabling telecom service
providers to take measures in advance.
We conducted our analysis in five major parts:
● Preliminary Analysis on the given data set and basic visualizations
● Exploratory Data Analysis
● Feature Interaction and Feature Engineering
● Applying Machine Learning Models and Optimization
● Ensemble Learning

Preliminary Analysis for Understanding Data


About Provided Datasets

File name      Number of Data Points
train.csv      78560
test.csv       26305

About features

Feature Category              Feature Name                                       Feature Count
Index                         'cell_name'                                        1
Time related Information      'par_day', 'par_hour', etc.                        5
Tower Information             'beam_direction', 'cell_range', etc.               6
Usage Details                 'health_total_bytes', 'video_total_bytes', etc.    26
Target (only in train.csv)    'Congestion_Type'                                  1

Exploratory Data Analysis

For Usage Details Features

● The histogram of these features shows data that are not normally distributed but
positively skewed (skewed to the right, with a single mode).
● This shape is typical of distributions that benefit from a logarithmic
transformation.
● After the log transformation, the data follow an approximately Normal distribution.
● All of the usage features and 'subscriber_count' show the same
distribution.
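This transformation can be sketched as follows; the column below is a synthetic stand-in for one of the 26 usage features (the real columns in train.csv are assumed to be non-negative byte counts):

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed stand-in for a usage column such as 'video_total_bytes'.
rng = np.random.default_rng(0)
df = pd.DataFrame({"video_total_bytes": rng.lognormal(mean=10, sigma=2, size=1000)})

# log1p handles zero-byte rows safely (log(0) is undefined).
df["video_total_bytes_log"] = np.log1p(df["video_total_bytes"])

# The raw column is heavily right-skewed; the log column is roughly symmetric.
print(df["video_total_bytes"].skew(), df["video_total_bytes_log"].skew())
```

The same transform would be applied to each usage column and to 'subscriber_count'.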

Tower Related Features

● We can see a uniform distribution of data across cell_range.
● In the graphs above, the x-axis represents the tuple (cell range, congestion type) and
the tuple (tilt, congestion type) respectively, while the y-axis shows the frequency of
occurrence of each tuple. Both graphs show that the distribution is uniform.

Distribution of Congestion_Type in the dataset

● We can observe that the dataset is uniformly distributed across the target classes.
Inference from graph

We tried visualizing the different byte volumes consumed by users.

The continuous variables representing bytes consumed were divided into 4 quartile
buckets (0-25%, 25-50%, 50-75%, 75-100%). We then analyzed the distribution of the
congestion classes, namely 3G Backhaul, 4G Backhaul, 4G RAN and No Congestion,
within each bucket.

A clear pattern is seen in the congestion distributions for the classes 4G Backhaul, 4G RAN
and No Congestion.
The following observations were made about the congestion classes:

1. 'No Congestion' follows a decreasing trend, as would be logically expected: as the
number of users on the network increases, we expect congestion to increase, i.e. we
expect 'No Congestion' to decrease.

2. For 4G RAN congestion, the trend is that congestion increases as the number of users
increases, which is again logically expected.

3. Similarly, 4G Backhaul shows the same trend as 4G RAN: congestion increases with the
number of users.

4. For 3G Backhaul congestion, we notice anomalous behavior that does not fit our
logical expectations. We expected congestion to rise with the number of users, but 3G
Backhaul congestion does not follow any particular trend across the byte features. For
some byte features it peaks at the 2nd quartile and then falls; for others it remains
more or less constant over the 1st and 2nd quartiles and falls thereafter; and for
others still there is a straight downward trend.

In a nutshell, all the congestion classes except 3G Backhaul fit a logical explanation; 3G
Backhaul is somewhat anomalous.
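The quartile analysis above can be sketched on synthetic data (the column name and labels here are stand-ins, not the competition data, and the label rule is deliberately simple):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 2000
# Toy stand-in: one bytes column and a congestion label correlated with it.
bytes_col = rng.lognormal(8, 1.5, n)
labels = np.where(bytes_col > np.quantile(bytes_col, 0.75), "4G_RAN", "NC")
df = pd.DataFrame({"total_bytes": bytes_col, "Congestion_Type": labels})

# Split the continuous bytes feature into 4 quartile buckets.
df["byte_quartile"] = pd.qcut(df["total_bytes"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# Distribution of congestion classes within each quartile (row-normalised).
dist = pd.crosstab(df["byte_quartile"], df["Congestion_Type"], normalize="index")
print(dist)
```

For each real byte feature, the same crosstab would show how the four congestion classes shift across the quartiles.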

Byte features are masked in clusters.

● We observed that the distributions of three types of bytes are similar, and likewise
similar distributions appear for other sets of byte features. It is uncommon to observe
this for completely unrelated byte features, yet the correlation between them is not
significant. This suggests that the byte columns are masked in distinct clusters.

Feature Engineering
Feature Creation Table

Feature Created                  Methodology Used
'weekday'                        Remainder of 'par_day' divided by 7
'holidays'                       Weekends and Christmas holidays
'sum_of_bytes'                   Sum of all 26 byte columns
'max_1', 'max_2', 'max_3'        Three largest values in each row of byte columns
'range_tilt'                     A function of cell_range and tilt
'High_Priority'                  Sum of log of high-priority data bytes
'Medium_Priority'                Sum of log of medium-priority data bytes
'Low_Priority'                   Sum of log of low-priority data bytes
'daily_peak_traffic'             2% of the day-wise sum of 'High_Priority'
'grade_of_service'               'High_Priority' / 'daily_peak_traffic'
'busy_hours_traffic'             Busiest hour for 'High_Priority' on each weekday

Feature Explanation:
Not all types of bytes sent across the network have the same level of importance and
urgency. A small increase in high-importance traffic is enough to flag congestion, whereas
a large increase in low-importance traffic may be needed before congestion is flagged. To
deal with this, we classified all byte signals into 3 priority levels.
                       High Volume               Medium Volume             Low Volume
High Priority Bytes    Congestion                May Lead To Congestion    No Congestion /
                                                                           May Lead To Congestion
Low Priority Bytes     Congestion /              No Congestion /           No Congestion
                       May Lead To Congestion    May Lead To Congestion

'High_Priority': For example, audio bytes need to be transferred as soon as possible;
failing this will reduce the Grade of Service.
Similarly, 'Medium_Priority' and 'Low_Priority' bytes sit at middle and lowest priority in
the transfer queue; even if they are transferred with some delay, they will not cause much
inconvenience.
All these priority columns are created by summing the log-transformed values of the byte
categories falling into each priority level for a row.
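A minimal sketch of this construction; the grouping of columns into tiers below is a hypothetical illustration, not the actual assignment used in the report:

```python
import numpy as np
import pandas as pd

# Hypothetical grouping of the masked byte columns into priority tiers.
priority_map = {
    "High_Priority": ["audio_total_bytes", "gaming_total_bytes"],
    "Medium_Priority": ["web_total_bytes", "health_total_bytes"],
    "Low_Priority": ["update_total_bytes", "mail_total_bytes"],
}

rng = np.random.default_rng(1)
cols = [c for group in priority_map.values() for c in group]
df = pd.DataFrame(rng.lognormal(6, 1, size=(100, len(cols))), columns=cols)

# Each priority feature is the row-wise sum of the log-transformed member columns.
for feature, members in priority_map.items():
    df[feature] = np.log1p(df[members]).sum(axis=1)

print(df[["High_Priority", "Medium_Priority", "Low_Priority"]].head())
```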

'daily_peak_traffic': The idea behind this feature is to approximate the peak high-priority
traffic volume over a 5-minute bucket that might be observed on a particular day. This was
approximated as 2% of the sum of the day's high-priority traffic.
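A sketch of the computation on a toy frame (the column names follow the report; the data are synthetic):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Toy frame: 3 days x 288 five-minute buckets of hypothetical high-priority bytes.
df = pd.DataFrame({
    "par_day": np.repeat([1, 2, 3], 288),
    "High_Priority": rng.lognormal(5, 1, 3 * 288),
})

# Approximate the daily peak of high-priority traffic as 2% of the daily total.
daily_sum = df.groupby("par_day")["High_Priority"].transform("sum")
df["daily_peak_traffic"] = 0.02 * daily_sum

print(df.groupby("par_day")["daily_peak_traffic"].first())
```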

'grade_of_service' (refer to Annexure): Grade of Service is an important parameter in data
transmission which quantifies the quality of service experienced by a user, and an
important feature for identifying and predicting congestion. We constructed an equivalent
from our data by taking the ratio of 'High_Priority' bytes to 'daily_peak_traffic', which
shows the relative byte usage at a given time with respect to the peak usage.
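Building on the peak approximation, the ratio can be sketched as (synthetic data, column names as in the report):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "par_day": np.repeat([1, 2], 288),
    "High_Priority": rng.lognormal(5, 1, 2 * 288),
})

# Peak approximation (2% of the daily total), then the GoS-style ratio.
df["daily_peak_traffic"] = 0.02 * df.groupby("par_day")["High_Priority"].transform("sum")
df["grade_of_service"] = df["High_Priority"] / df["daily_peak_traffic"]

# By construction, the ratios for one day sum to 1 / 0.02 = 50.
print(df.groupby("par_day")["grade_of_service"].sum())
```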

'busy_hours_traffic': A binary-encoded column in which:

1 - marks the hour at which the sum of the 'High_Priority' column is maximum on a given
day;
0 - otherwise.

This feature helps differentiate busy-hour traffic from normal byte traffic.
Encoding    0        1          2        3        4         5           6
Weekday     Friday   Saturday   Sunday   Monday   Tuesday   Wednesday   Thursday
Time (hr)   4 PM     3 PM       3 PM     3 PM     4 PM      2 PM        10 PM
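The busiest-hour flag described above can be sketched as (synthetic hourly data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
# Toy frame: 2 days x 24 hours of hypothetical high-priority byte totals.
df = pd.DataFrame({
    "par_day": np.repeat([1, 2], 24),
    "par_hour": np.tile(np.arange(24), 2),
    "High_Priority": rng.lognormal(5, 1, 48),
})

# Busiest hour per day: the row of the per-day maximum, mapped back to its hour.
busiest = (df.loc[df.groupby("par_day")["High_Priority"].idxmax()]
             .set_index("par_day")["par_hour"])

# Binary flag: 1 for rows falling in their day's busiest hour, 0 otherwise.
df["busy_hours_traffic"] = (df["par_hour"] == df["par_day"].map(busiest)).astype(int)

print(df.groupby("par_day")["busy_hours_traffic"].sum())  # exactly one busy hour per day
```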

'weekday': We saw a repetitive pattern in byte usage across the 30 days, and usage should
differ between weekdays and weekends. To incorporate this property, we created a new
feature by taking the remainder of 'par_day' divided by 7, thereby classifying days 1-30 of
the month into weekday codes 0-6.

'holidays': This feature builds on the one above. Once days are classified into weekdays
and weekends, we can form a larger grouping: if a given day is a holiday we label it 1, else
0. Holidays contain all weekends along with 2 Christmas dates; all other days are
considered working days.
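Both features can be sketched as follows. The weekend codes come from the report's encoding table (Saturday = 1, Sunday = 2); the specific Christmas 'par_day' values below are an assumption for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"par_day": np.arange(1, 31)})  # days 1..30 of the month

# Map the 30 calendar days onto 7 repeating weekday codes (0-6).
df["weekday"] = df["par_day"] % 7

# Holidays: weekends plus Christmas dates (Christmas day numbers are hypothetical).
WEEKEND_CODES = {1, 2}       # Saturday = 1, Sunday = 2 per the report's encoding
CHRISTMAS_DAYS = {24, 25}    # assumed par_day values for the 2 Christmas dates
df["holidays"] = (
    df["weekday"].isin(WEEKEND_CODES) | df["par_day"].isin(CHRISTMAS_DAYS)
).astype(int)

print(df.head(10))
```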

'sum_of_bytes': As congestion is highly affected by the amount of data, we took the sum of
all the byte columns to create a new column, which turned out to have one of the highest
feature importances in our model.

'max_1', 'max_2', 'max_3': The three largest byte values in each row across the 26 byte
columns.
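A sketch of the row-wise top-3 extraction (the 26 columns below are synthetic stand-ins for the masked byte features):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
# Stand-in for the 26 masked byte columns.
byte_cols = [f"bytes_{i}" for i in range(26)]
df = pd.DataFrame(rng.lognormal(6, 1, size=(50, 26)), columns=byte_cols)

# Sort each row descending and keep the three largest values.
top3 = np.sort(df[byte_cols].to_numpy(), axis=1)[:, ::-1][:, :3]
df["max_1"], df["max_2"], df["max_3"] = top3[:, 0], top3[:, 1], top3[:, 2]

print(df[["max_1", "max_2", "max_3"]].head())
```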

'range_tilt': The motive behind this feature is that the combination of cell range and tilt
directly affects congestion, so we created a function combining these two features:
range_tilt = 10 * tilt + cell_range.
Steps to overcome overfitting in model
K-fold cross validation
1. Cross-validation is a powerful preventive measure against overfitting.
2. It allows us to tune hyperparameters using only our original training set, keeping the
test set as a truly unseen dataset for selecting our final model.
3. We fixed K = 5 for k-fold cross-validation.
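A minimal sketch of 5-fold cross-validation on a toy classification problem (synthetic data and a plain sklearn gradient-boosting model stand in for the competition data and the actual boosted models):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Toy stand-in for the congestion dataset (4 target classes, as in the problem).
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=4, random_state=0)

model = GradientBoostingClassifier(n_estimators=50, max_depth=2, random_state=0)

# K = 5, as fixed in the report: each fold serves once as validation data.
scores = cross_val_score(model, X, y, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```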

Bayesian optimization (Refer to Annexure)

1. In contrast to random or grid search, Bayesian optimization keeps track of past
evaluation results, which it uses to form a probabilistic model mapping
hyperparameters to a probability of a score on the objective function.
2. We used Bayesian optimization to tune the hyperparameters, as it is faster than grid
search.

Dropping redundant features

1. We dropped the par_year and par_month columns because they were constant
throughout the dataset and therefore redundant.

Early stopping
1. Early stopping avoids overfitting by automatically selecting the point where
performance on a held-out dataset starts to degrade while performance on the training
dataset continues to improve, i.e. where the model starts to overfit.
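A sketch of early stopping using sklearn's gradient boosting (the report's boosted models expose similar options; the data here are synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)

# validation_fraction holds out part of the training data; boosting stops once
# the held-out score has not improved for n_iter_no_change rounds.
model = GradientBoostingClassifier(
    n_estimators=1000, learning_rate=0.1, max_depth=2,
    validation_fraction=0.2, n_iter_no_change=10, random_state=0,
)
model.fit(X, y)

# n_estimators_ is the number of boosting rounds actually run before stopping.
print(model.n_estimators_)
```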

Ensembling (Refer to Annexure)

1. Ensembles are machine learning methods for combining predictions from multiple
separate models.
2. We used 3 CatBoost models, 1 Gradient Boosting model and 2 LightGBM models,
stacked with a neural net as a top layer above these 6 boosting models.
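The stacking architecture can be sketched with sklearn stand-ins (the actual base learners were CatBoost, LightGBM and GBM models; here plain gradient-boosting models of varying depth play their role, and the data are synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           n_classes=4, random_state=0)

# Level-1 base learners: several boosting models with different depths.
base_learners = [
    (f"gbm_depth{d}", GradientBoostingClassifier(max_depth=d, n_estimators=30,
                                                 random_state=0))
    for d in (1, 2, 3)
]

# Level-2 model: a small neural net trained on the level-1 predictions,
# mirroring the report's 1-layer, tanh-activated top model.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=MLPClassifier(hidden_layer_sizes=(30,), activation="tanh",
                                  max_iter=2000, random_state=0),
    cv=5,
)
stack.fit(X, y)
print(stack.score(X, y))
```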

Applying Machine Learning Models and Optimization

Model                       Accuracy    MCC_score

Stacking Level 1
CatBoost Model 1            0.80728     0.7431
CatBoost Model 2            0.8081      0.7443
CatBoost Model 3            0.8051      0.7403
Gradient Boost              0.8012      0.7350
LightGBM Model 1            0.8024      0.7368
LightGBM Model 2            0.8052      0.7404

Stacking Level 2 (output)
Neural Net                  0.8153      0.745

Models              Hyperparameters

Stack Level 1
CatBoost Model 1    depth = 1, iterations = 9000, learning_rate = 0.1
CatBoost Model 2    depth = 2, iterations = 6000, learning_rate = 0.08
CatBoost Model 3    depth = 3, iterations = 8000, learning_rate = 0.05
Gradient Boost      subsample = 0.8, n_estimators = 3000, max_depth = 1
LightGBM Model 1    colsample_bytree = 0.5193, learning_rate = 0.3254,
                    max_depth = 2, n_estimators = 525, reg_alpha = 0.5067,
                    reg_lambda = 0.5, subsample = 0.9
LightGBM Model 2    colsample_bytree = 0.7, learning_rate = 0.3, max_depth = 1,
                    n_estimators = 2100, reg_alpha = 0.7, reg_lambda = 0.5,
                    subsample = 0.9

Stacking Level 2 (output)
Neural Net          1 layer, 30 neurons, activation = 'tanh'

Overall Model Architecture

Feature Importance:
Feature importance of the top 6 new features that we created.

Feature importance of the top 6 features overall: four of the six most important features
overall are features we created.

Approach to estimate the Uncertainty

Two measures were used to capture the uncertainty of our modeling. They are as follows:

Confusion Matrix
Using 5-fold cross-validation, we generated 5 confusion matrices to measure the variation
in the obtained results. This process was repeated 5 times to generate 25 confusion
matrices, and uncertainty is reported through the mean and standard deviation of these 25
matrices. This identifies the uncertainty in the prediction of individual congestion classes.
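The procedure can be sketched as follows (synthetic 4-class data and a simple classifier stand in for the real model; the 5x5 repeated CV structure matches the report):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import RepeatedStratifiedKFold

X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           n_classes=4, random_state=0)

# 5-fold CV repeated 5 times -> 25 confusion matrices, as in the report.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
mats = []
for train_idx, test_idx in cv.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    mats.append(confusion_matrix(y[test_idx], pred, labels=np.arange(4)))

mats = np.array(mats)
# Per-cell mean and standard deviation quantify class-wise prediction uncertainty.
print(mats.mean(axis=0).round(1))
print(mats.std(axis=0).round(1))
```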

Inferences:

1. Of all the wrong predictions, most are made for 4G Backhaul and 3G Backhaul.
2. No Congestion can be identified with good certainty, with only slight uncertainty with
respect to 3G Backhaul.
3. The model can clearly distinguish between 4G RAN and 3G Backhaul.
MCC Score
As the main evaluation metric for this problem statement is the Matthews Correlation
Coefficient (MCC), the standard deviation of the MCC score obtained through K-fold
cross-validation repeated 5 times can be used as a general uncertainty parameter.

MCC = 0.7447 ± 0.413%
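The mean-and-deviation estimate can be sketched as (synthetic data and a simple classifier; the reported 0.7447 comes from the actual model, not this toy):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           n_classes=4, random_state=0)

# MCC over 5-fold CV repeated 5 times -> 25 scores.
mcc = make_scorer(matthews_corrcoef)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         scoring=mcc, cv=cv)

# Mean +/- standard deviation of the MCC over the 25 folds.
print(f"MCC = {scores.mean():.4f} +/- {scores.std():.4f}")
```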

Conclusion
Since the majority of the features in this network congestion analysis describe the byte
packets sent and received over a period, we aligned our analysis accordingly. We
classified the types of byte packets according to their priority. Other important features
were based on the usage pattern within a single day; we used single-day usage for creating
features because the day was the only constant known quantity in the dataset. We also
classified days into weekdays and weekends. Since we were given many time intervals per
day, we created some very important features such as the peak hours of byte consumption.
We also rated the grade of service of a single data point with respect to the total byte
consumption on that day. Grade of service is a relative parameter that turned out to be a
very important feature in our model: if a data point has a low Grade of Service value,
there is a high chance that it will be classified as congestion.
ANNEXURE
THEORY OF NETWORK CONGESTION
Congestion arises in the network when routers are unable to process the data arriving at
them; this causes buffer overflows at the routers, resulting in the loss of some packets of
data. The only way to relieve congestion is to reduce the load on the network.

Ideally, the network should be accessible and fair to all the users. In order to ensure this
fairness, we should not allow some users to load the Radio Access Network(RAN), while other
subscribers experience a poor quality of service(QoS) or cannot even access the internet. Each
user’s traffic has to be prioritized such that the network allows all of the users to access services
with the appropriate bandwidth and within the desired latency bounds, even when there is
congestion.

Byte accuracy is crucial when evaluating the accuracy of traffic classification algorithms.
Studies of Internet traffic note that the majority of flows are small and account for only a
small portion of the total bytes and packets in the network ("mice" flows), while the
majority of traffic bytes are generated by a small number of large flows ("elephant"
flows). In one example from a 6-month data trace, the top (largest) 1% of flows accounted
for over 73% of the traffic in terms of bytes. With a threshold of 3.7 MB to differentiate
elephant and mice flows, the top 0.1% of flows accounted for 46% of the traffic (in bytes).
Presented with such a dataset, a classifier optimized to identify all but the top 0.1% of
the flows could attain 99.9% flow accuracy yet still misclassify 46% of the bytes in the
dataset.
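A toy illustration of this flow-versus-byte accuracy gap (the flow sizes and counts are synthetic, chosen only to reproduce the qualitative effect):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic trace: many small "mice" flows plus a few huge "elephant" flows.
mice = rng.integers(1_000, 100_000, size=9990)             # bytes per flow
elephants = rng.integers(50_000_000, 200_000_000, size=10)
sizes = np.concatenate([mice, elephants])

# A classifier that labels every mouse correctly but misses every elephant.
correct = np.concatenate([np.ones(9990, bool), np.zeros(10, bool)])

flow_accuracy = correct.mean()
byte_accuracy = sizes[correct].sum() / sizes.sum()
print(f"flow accuracy {flow_accuracy:.1%}, byte accuracy {byte_accuracy:.1%}")
```

Near-perfect flow accuracy coexists with poor byte accuracy, which is why byte-weighted evaluation matters.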

The key difference between 3G and 4G


● Speed
● Network Structure
● Switching Technology
● Data rate

Congestion and its adverse effect on network

The word congestion means excessive crowding, but in telecommunication the term is used
when a node, link or channel carries an excessive amount of data and thus degrades the
QoS (Quality of Service) of the network. Congestion is among the most common and
fastest-growing problems in today's networks. With the rapid growth of the Internet and
the increased demand to use it for voice and video applications, congestion increases and,
as a result, QoS degrades. That means all aspects of a connection, such as service
response time, loss, signal-to-noise ratio, cross-talk, echo, interrupts, frequency
response and loudness level, along with all other quality-of-service requirements,
degrade. Further increases in offered load then reduce the network throughput.
Different types of congestions and their respective causes and
impacts on network

Radio network controller congestion


RNC (Radio Network Controller) congestion occurs due to overload in RNCs, caused by
applications such as push e-mail, VPNs, mobile port scanning, Hypertext Transfer Protocol
(HTTP) over Secure Socket Layer (HTTPS), Secure Shell (SSH), location-based services,
push-to-talk, and wireless-specific signaling attacks and worms that introduce anomalously
high amounts of signaling into the network.

Congestion in wireless radio access network


Cellular wireless networks have become an indispensable part of the communication
infrastructure, and the RAN (Radio Access Network) is one of their key components. The
main task of the RAN is to provide a connection between a mobile station (cellular phone,
computer or any remotely controlled device) and the core network. The RAN is designed to
establish a certain number of calls with the core network; if it receives call
establishment requests beyond its capacity, it cannot serve the incoming requests of the
subscribers. This problem is called congestion in the RAN.

Backhaul congestion
Backhaul congestion is another type of congestion, which occurs because of applications
such as video download, video upload, P2P, File Transfer Protocol (FTP), single-source
flood attacks, and distributed-source flood attacks, all of which typically send large
volumes of traffic that congest the backhaul links between the base station, the RNC and
the network elements in the path.

CatBoost

CatBoost is a machine learning algorithm that uses gradient boosting on decision trees.

● It yields state-of-the-art results without the extensive parameter tuning typically
required by other machine learning methods, and
● provides powerful out-of-the-box support for the more descriptive data formats, such
as categorical features, that accompany many business problems.
LightGBM
LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is
designed to be distributed and efficient with the following advantages:
● Faster training speed and higher efficiency
● Lower memory usage
● Better accuracy
● Parallel and GPU learning supported
● Capable of handling large-scale data

GBM

GBM (Gradient Boosting Machine) is a boosting algorithm used when we have plenty of data
and want a model with high predictive power. Boosting is an ensemble of learning
algorithms which combines the predictions of several base estimators in order to improve
robustness over a single estimator: it combines multiple weak or average predictors to
build a strong predictor.

Ensemble Learning

Ensembling is the art of combining a diverse set of learners (individual models) to improve
the stability and predictive power of the model. The way we combine all the individual
predictions together is termed Ensemble Learning.

Here are the top 4 reasons for a model to be different. They can be different because of a mix of
these factors as well:

● Difference in population
● Difference in hypothesis
● Difference in modeling technique
● Difference in initial seed

Bayesian Optimization

Bayesian approaches, in contrast to random or grid search, keep track of past evaluation results
which they use to form a probabilistic model mapping hyperparameters to a probability of a
score on the objective function:
In the literature, this model is called a “surrogate” for the objective function and is represented
as p(y | x). The surrogate is much easier to optimize than the objective function and Bayesian
methods work by finding the next set of hyperparameters to evaluate on the actual objective
function by selecting hyperparameters that perform best on the surrogate function. In other
words:
1. Build a surrogate probability model of the objective function
2. Find the hyperparameters that perform best on the surrogate
3. Apply these hyperparameters to the true objective function
4. Update the surrogate model incorporating the new results
5. Repeat steps 2–4 until max iterations or time is reached
The aim of Bayesian reasoning is to become “less wrong” with more data which these
approaches do by continually updating the surrogate probability model after each evaluation of
the objective function.
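The five steps above can be sketched as a minimal 1-D loop with a Gaussian-process surrogate and the expected-improvement criterion (a toy objective stands in for "cross-validated loss as a function of a hyperparameter"; this is an illustration, not the library implementation used in the report):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Toy 1-D objective standing in for "CV loss vs. hyperparameter value".
def objective(x):
    return (x - 2.0) ** 2 + 0.5 * np.sin(5 * x)

rng = np.random.default_rng(0)
X_obs = rng.uniform(0, 5, size=3).reshape(-1, 1)   # initial evaluations
y_obs = objective(X_obs).ravel()
grid = np.linspace(0, 5, 500).reshape(-1, 1)       # candidate hyperparameters

for _ in range(10):
    # 1. Build/refit the surrogate p(y | x) on all evaluations so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_obs, y_obs)
    # 2. Find the candidate maximising expected improvement on the surrogate.
    mu, sigma = gp.predict(grid, return_std=True)
    best = y_obs.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = grid[np.argmax(ei)]
    # 3. Evaluate the true objective there, and 4. update the surrogate's data.
    X_obs = np.vstack([X_obs, [x_next]])
    y_obs = np.append(y_obs, objective(x_next))
    # 5. Repeat.

print(X_obs[np.argmin(y_obs)], y_obs.min())
```

Each iteration spends its effort deciding where to evaluate next, which is exactly the trade-off the surrounding text describes.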

At a high-level, Bayesian optimization methods are efficient because they choose the next
hyperparameters in an informed manner. The basic idea is: spend a little more time selecting
the next hyperparameters in order to make fewer calls to the objective function. In
practice, the time spent selecting the next hyperparameters is inconsequential compared to the
time spent in the objective function. By evaluating hyperparameters that appear more promising
from past results, Bayesian methods can find better model settings than random search in fewer
iterations.

Bayesian model-based methods can find better hyperparameters in less time because they
reason about the best set of hyperparameters to evaluate based on past trials.

A good visual intuition of what occurs in Bayesian optimization: after only two
evaluations, the initial estimate of the surrogate model (with its associated uncertainty
band) is typically a poor approximation of the actual objective function. After around
eight evaluations, however, the surrogate almost exactly matches the true function, so if
the algorithm selects the hyperparameters that maximize the surrogate, it will likely
yield very good results on the true evaluation function.

Grade of Service

Grade of Service (GoS) is defined as the probability that calls will be blocked while
attempting to seize circuits. It is written as a P.xx blocking factor, where xx is the
percentage of calls that are blocked for a traffic system. For example, traffic facilities
requiring P.01 GoS define a 1 percent probability of callers being blocked. A GoS of P.00
is rarely requested and will rarely happen, because to be 100 percent sure that there is no
blocking you would have to design a network where the caller-to-circuit ratio is 1:1;
moreover, most traffic formulas assume an infinite number of callers. We used an
approximation of this method, based on the volume of traffic transmitted (in bytes), to
help our model quantify the comparative workload at a particular time instant.
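For a concrete sense of the P.xx figure, the classical Erlang B formula gives the blocking probability for a trunk group under the infinite-caller assumption mentioned above (this is standard teletraffic theory, not the byte-based approximation used in the report):

```python
def erlang_b(traffic_erlangs: float, circuits: int) -> float:
    """Blocking probability B(A, N): the GoS 'P.xx' figure for a trunk group."""
    # Numerically stable iterative form of the Erlang B formula.
    b = 1.0
    for n in range(1, circuits + 1):
        b = traffic_erlangs * b / (n + traffic_erlangs * b)
    return b

# Example: 10 circuits offered ~4.5 Erlangs block about 1% of calls (P.01 GoS).
print(f"{erlang_b(4.5, 10):.4f}")
```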
