Predictive Congestion Management in Telecom Networks Using Advanced Machine Learning Techniques
INDEX

REPORT:
1. Abstract
7. Conclusion

ANNEXURE:
1. Theory of Network Congestion
4. Backhaul congestion
5. CatBoost
6. LightGBM
7. GBM
8. Ensemble Learning
Dataset:

File        Records
train.csv   78560
test.csv    26305

About features

Feature Category   Feature Name   Feature Count for given Category
Index              ‘cell_name’    1
● The following graph is a histogram of data that are not normally distributed but are strongly right-skewed.
● This histogram is typical of distributions that benefit from a logarithmic transformation.
● After the log transformation, the graph shows a normal distribution (a minimal sketch of this transformation follows the list below).
● Similarly, all of the usage features and ‘subscriber_count’ show the same distribution.
● We can observe that the dataset has a uniform distribution for each target variable.
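The transformation itself is simple. Below is a minimal sketch, assuming a pandas DataFrame `df`; the column names shown are illustrative, not the dataset's actual names.

```python
# Minimal sketch of the log transformation described above, assuming a
# pandas DataFrame `df` with non-negative, right-skewed byte columns.
import numpy as np
import pandas as pd

def log_transform(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy of df with log1p applied to the given columns."""
    out = df.copy()
    for col in columns:
        out[col] = np.log1p(out[col])  # log(1 + x) handles zero byte counts
    return out

# Hypothetical column names for illustration:
# df = log_transform(df, ["total_bytes", "subscriber_count"])
```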
Inference from graph
A clear pattern is seen in the congestion distributions for the classes 4G Backhaul, 4G RAN, and No Congestion.
The following observations were made about the congestion classes:
1. 'No Congestion' follows a decreasing trend, as would logically be expected: as the number of users on the network increases, we expect congestion to increase, i.e., we expect 'No Congestion' to decrease.
2. For 4G RAN congestion, the trend is that congestion increases as the number of users increases, which matches the same logic: congestion levels rise with the number of users.
3. Similarly, 4G Backhaul shows exactly the same trend as 4G RAN: congestion increases as the number of users increases.
4. For 3G Backhaul congestion, we notice some anomalous behavior that does not fit our logical expectations. We expected congestion to rise with the number of users, but 3G Backhaul congestion does not follow any particular trend across the bytes data: in some bytes data it peaks at the 2nd quartile and then falls; in some it remains more or less constant over the 1st and 2nd quartiles and falls thereafter; and in other cases there is a straight downward trend.
In a nutshell, all the congestion classes except 3G Backhaul fit a logical explanation, but 3G Backhaul is somewhat anomalous.
Feature Engineering
Feature Creation Table

Feature Created                Methodology Used
‘max_1’, ‘max_2’, ‘max_3’      Selected the three largest values from each bytes row
‘grade_of_service’             ‘High_Priority’ / ‘daily_peak_traffic’
Feature Explanation:
Not all types of bytes sent across the network have the same level of importance and urgency. A small increase in high-importance signals is enough to flag congestion, whereas a large increase in low-importance signals may be needed to flag it. To capture this priority among the different types of byte signals, we classified all byte signals into three priority groups.
[Table: byte signal categories grouped as High Volume, Medium Volume, and Low Volume]
‘High_Priority’: For example, audio bytes need to be transferred as soon as possible, failing which the Grade of Service is reduced.
Similarly, ‘Medium_Priority’ and ‘Low_Priority’ hold medium and lowest priority in the data-transfer queue; data in these classes can tolerate a small delay without causing much inconvenience.
Each of these priority columns is created by summing, for each row, the log-transformed values of the byte categories that fall into that group.
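A minimal sketch of this construction, assuming a pandas DataFrame `df`; the group membership lists below are placeholders, since the report's actual grouping table is not reproduced above.

```python
import numpy as np

# Hypothetical byte-category groupings (placeholders for the report's table);
# each priority column is the row-wise sum of log-transformed byte categories.
HIGH_COLS = ["audio_bytes", "video_bytes"]
MEDIUM_COLS = ["web_bytes", "email_bytes"]
LOW_COLS = ["p2p_bytes", "ftp_bytes"]

for name, cols in [("High_Priority", HIGH_COLS),
                   ("Medium_Priority", MEDIUM_COLS),
                   ("Low_Priority", LOW_COLS)]:
    df[name] = np.log1p(df[cols]).sum(axis=1)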
‘daily_peak_traffic’: The idea behind this feature was to approximate the peak high-priority traffic volume over a 5-minute bucket that might be observed on a particular day. This was approximated as 2% of the sum of the day's high-priority traffic.
This feature helps to differentiate between busy-hour traffic and normal traffic.
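A sketch of how this could be computed; taking the per-day totals per cell via 'cell_name' and 'par_day' is an assumption here. The 'grade_of_service' ratio from the feature creation table is included as well.

```python
# Approximate a day's peak 5-minute high-priority traffic as 2% of that
# day's total high-priority traffic (per-cell grouping keys assumed).
daily_total = df.groupby(["cell_name", "par_day"])["High_Priority"].transform("sum")
df["daily_peak_traffic"] = 0.02 * daily_total

# 'grade_of_service' as given in the feature creation table above
df["grade_of_service"] = df["High_Priority"] / df["daily_peak_traffic"]
```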
Encoding    0      1      2      3      4      5      6
Time (hr)   4 PM   3 PM   3 PM   3 PM   4 PM   2 PM   10 PM
‘weekday’: We saw a repetitive pattern in bytes usage across the 30 days, and the usage pattern should differ between weekdays and weekends. To incorporate this property, we created a new feature by taking the remainder of ‘par_day’ divided by 7, classifying days 1-30 of the month into weekday codes 0-6.
‘Holidays’: This feature is built on the feature above. Once we have the weekday/weekend classification, we can form a larger group: if a given day is a holiday we label it 1, else 0. Holidays contain all weekends along with 2 Christmas dates; all other days are considered working days.
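A sketch of both calendar features, assuming ‘par_day’ runs 1-30; which weekday codes correspond to weekends and which ‘par_day’ values are the Christmas dates are assumptions for illustration.

```python
# 'weekday': remainder of par_day divided by 7, mapping days 1-30 to 0-6
df["weekday"] = df["par_day"] % 7

WEEKEND_CODES = {5, 6}     # assumed codes for Saturday/Sunday
CHRISTMAS_DAYS = {24, 25}  # hypothetical par_day values for the 2 Christmas dates

# 'Holidays': 1 for weekends and Christmas dates, 0 for working days
df["Holidays"] = (df["weekday"].isin(WEEKEND_CODES)
                  | df["par_day"].isin(CHRISTMAS_DAYS)).astype(int)
```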
‘sum_of_bytes’: Since congestion is highly affected by the amount of data, we took the sum of all the bytes columns to create a new column, which also turns out to have one of the highest feature importances in our model.
‘max_1’, ‘max_2’, ‘max_3’: We took the three largest byte values in each row across the 26 byte columns.
‘Cell_tilt’: The motivation behind this feature is that the combination of cell range and tilt directly affects congestion. Hence, we created a feature combining the two: Cell_tilt = 10*tilt + cell_range.
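The remaining engineered features can be sketched as follows; `byte_cols` stands for the 26 byte columns, and its construction here is an assumption about the column naming.

```python
import numpy as np

byte_cols = [c for c in df.columns if c.endswith("_bytes")]  # assumed naming

# 'sum_of_bytes': total traffic per row
df["sum_of_bytes"] = df[byte_cols].sum(axis=1)

# 'max_1', 'max_2', 'max_3': the three largest byte values per row
top3 = np.sort(df[byte_cols].to_numpy(), axis=1)[:, -3:]
df["max_1"], df["max_2"], df["max_3"] = top3[:, 2], top3[:, 1], top3[:, 0]

# 'Cell_tilt': exactly as defined in the report
df["Cell_tilt"] = 10 * df["tilt"] + df["cell_range"]
```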
Steps to overcome overfitting in the model
K-fold cross validation
1. Cross-validation is a powerful preventive measure against overfitting.
2. It allows us to tune hyperparameters with only our original training set, keeping the test set as a truly unseen dataset for selecting our final model.
3. We fixed K = 5 for k-fold cross-validation (a minimal setup sketch follows below).
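A minimal setup sketch, assuming scikit-learn-style features `X`, labels `y`, and a classifier `model`; stratification is an assumption, as the report only fixes K = 5.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv)
print(f"CV score: {scores.mean():.3f} +/- {scores.std():.3f}")
```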
Early stopping
1. It avoids overfitting by automatically selecting the inflection point where performance on the validation dataset starts to decrease while performance on the training dataset continues to improve as the model starts to overfit (see the sketch below).
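A hedged sketch using LightGBM's early-stopping callback; the 50-round patience and the 80/20 split are illustrative choices, not the report's values.

```python
import lightgbm as lgb
from sklearn.model_selection import train_test_split

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
model = lgb.LGBMClassifier(n_estimators=2000)
# Stop when the validation score has not improved for 50 rounds
model.fit(X_tr, y_tr,
          eval_set=[(X_val, y_val)],
          callbacks=[lgb.early_stopping(stopping_rounds=50)])
```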
Stacking Level 1
[Table: stack level 1 models and their hyperparameters]
Overall Model Architecture
[Figure: overall model architecture]
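Since the level-1 model and hyperparameter details are not preserved above, the following is only an illustrative two-level stack with placeholder base learners, not the report's exact architecture.

```python
import lightgbm as lgb
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

stack = StackingClassifier(
    estimators=[("lgbm", lgb.LGBMClassifier()),         # placeholder base learners
                ("rf", RandomForestClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),  # level-2 meta-learner
    cv=5)
stack.fit(X, y)
```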
Feature Importance:
[Figure: feature importance of the top 6 new features that we created]
[Figure: feature importance of the top 6 features overall. Of the first 6 overall most important features, 4 are features we created.]
Confusion Matrix
Using 5-fold cross-validation, we generated 5 confusion matrices to measure the variation in the obtained results. This process was repeated 5 times to generate 25 confusion matrices, and uncertainty is shown through the element-wise mean and standard deviation of these 25 matrices. This is used to identify the uncertainty in the prediction of the individual congestion classes.
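A sketch of that procedure, assuming numpy arrays `X` and `y` and a scikit-learn-style `model`:

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import RepeatedStratifiedKFold

# 5 folds x 5 repeats = 25 confusion matrices
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
mats = []
for tr, te in cv.split(X, y):
    fitted = clone(model).fit(X[tr], y[tr])
    mats.append(confusion_matrix(y[te], fitted.predict(X[te])))
mats = np.stack(mats)
print("mean:\n", mats.mean(axis=0))
print("std:\n", mats.std(axis=0))
```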
Inferences:
1. Of all the wrong predictions, most are made for 4G Backhaul and 3G Backhaul.
2. No Congestion can be identified with good certainty, with only a little uncertainty with respect to 3G Backhaul.
3. The model can clearly distinguish between 4G RAN and 3G Backhaul.
MCC Score
As our main evaluation metric for this problem statement is the Matthews Correlation Coefficient (MCC), the standard deviation of the MCC scores obtained through K-fold cross-validation, iterated 5 times, can be used as the general uncertainty parameter.
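A sketch of this uncertainty estimate with scikit-learn (`model`, `X`, and `y` assumed as before):

```python
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
scores = cross_val_score(model, X, y,
                         scoring=make_scorer(matthews_corrcoef), cv=cv)
print(f"MCC = {scores.mean():.3f} +/- {scores.std():.3f}")
```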
Conclusion
Since the majority of the features for this network congestion analysis described the bytes sent and received over a period, we aligned our analysis accordingly. We classified the types of byte packets according to their priority. Some other important features were based on the usage pattern within a single day; we used single-day usage for creating features because that was the only constant, known quantity in the dataset. We also used the classification of days into weekdays and weekends. Since we were given many time intervals for a single day, we created some very important features such as the peak hours of bytes consumption. We also rated the grade of service of a single data point with respect to the total bytes consumption on that day. Grade of service is a relative parameter that turned out to be a very important feature in our model: if a data point has a low value for the Grade of Service feature, there is a high chance that it will be classified as congested.
ANNEXURE
THEORY OF NETWORK CONGESTION
Congestion arises in a network when routers are unable to process the data arriving at them; this causes buffer overflows at the routers, resulting in the loss of some packets of data. The only way to relieve congestion is to reduce the load on the network.
Ideally, the network should be accessible and fair to all users. To ensure this fairness, we should not allow some users to overload the Radio Access Network (RAN) while other subscribers experience a poor quality of service (QoS) or cannot access the internet at all. Each user's traffic has to be prioritized such that the network allows all users to access services with the appropriate bandwidth and within the desired latency bounds, even when there is congestion.
Byte accuracy is crucial when evaluating the accuracy of traffic classification algorithms. Researchers note that the majority of flows on the Internet are small and account for only a small portion of the total bytes and packets in the network (mice flows). On the other hand, the majority of the traffic bytes are generated by a small number of large flows (elephant flows). For example, in one 6-month data trace the top (largest) 1% of flows accounted for over 73% of the traffic in terms of bytes. With a threshold of 3.7 MB to differentiate elephant from mice flows, the top 0.1% of flows would account for 46% of the traffic (in bytes). Presented with such a dataset, a classifier optimized to identify all but the top 0.1% of the flows could attain 99.9% flow accuracy but still leave 46% of the bytes in the dataset misclassified.
The word congestion means excessive crowding, but in telecommunications the term is used when a node, link, or channel carries an excessive amount of data and thus degrades the QoS (Quality of Service) of the network. Congestion is among the most common and fastest-growing problems in today's networking systems. With the fast growth of the Internet and the increased demand for voice and video applications, congestion increases, and as a result the QoS degrades. That means all aspects of a connection, such as service response time, loss, signal-to-noise ratio, cross-talk, echo, interrupts, frequency response, loudness level, and all other quality-of-service requirements, degrade. Incremental increases in offered load also adversely affect network throughput.
Different types of congestion and their respective causes and impacts on the network
Backhaul congestion
Backhaul congestion is another type of congestion; it occurs because of applications such as video download, video upload, P2P, File Transfer Protocol (FTP), single-source flood attacks, and distributed-source flood attacks. All of these applications typically send large volumes of data that contribute to the congestion of the backhaul links between the base station, the RNC, and the network elements in the path.
CatBoost
CatBoost is a machine learning algorithm that uses gradient boosting on decision trees.
● It yields state-of-the-art results without extensive data training typically required by other machine learning methods, and
● provides powerful out-of-the-box support for the more descriptive data formats that accompany many business problems. A minimal usage sketch follows below.
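This sketch assumes a pandas DataFrame so categorical columns can be named; the parameter values are illustrative, not the report's configuration.

```python
from catboost import CatBoostClassifier

# 'cell_name' is treated as a categorical feature; iteration count illustrative
model = CatBoostClassifier(iterations=500, verbose=0)
model.fit(X_train, y_train, cat_features=["cell_name"])
preds = model.predict(X_test)
```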
LightGBM
LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages (a minimal usage sketch follows the list):
● Faster training speed and higher efficiency
● Lower memory usage
● Better accuracy
● Parallel and GPU learning supported
● Capable of handling large-scale data
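A minimal usage sketch of LightGBM's scikit-learn-style API; the parameter values are illustrative, not the report's tuned settings.

```python
import lightgbm as lgb

model = lgb.LGBMClassifier(num_leaves=31, learning_rate=0.05, n_estimators=500)
model.fit(X_train, y_train)
preds = model.predict(X_test)
```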
GBM
GBM is a boosting algorithm used when we deal with plenty of data and need predictions with high predictive power. Boosting is an ensemble of learning algorithms that combines the predictions of several base estimators in order to improve robustness over a single estimator; it combines multiple weak or average predictors to build a strong predictor.
Ensemble Learning
Ensembling is the art of combining a diverse set of learners (individual models) to improve the stability and predictive power of the model. The way we combine all of the individual predictions together is what is termed ensemble learning.
Here are the top 4 reasons for models to be different; they can also differ because of a mix of these factors:
● Difference in population
● Difference in hypothesis
● Difference in modeling technique
● Difference in initial seed
Bayesian Optimization
Bayesian approaches, in contrast to random or grid search, keep track of past evaluation results, which they use to form a probabilistic model mapping hyperparameters to a probability of a score on the objective function:

P(score | hyperparameters)

In the literature, this model is called a “surrogate” for the objective function and is represented
as p(y | x). The surrogate is much easier to optimize than the objective function, and Bayesian methods work by finding the next set of hyperparameters to evaluate on the actual objective function by selecting the hyperparameters that perform best on the surrogate function. In other words (a sketch follows the steps below):
1. Build a surrogate probability model of the objective function
2. Find the hyperparameters that perform best on the surrogate
3. Apply these hyperparameters to the true objective function
4. Update the surrogate model incorporating the new results
5. Repeat steps 2–4 until max iterations or time is reached
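A sketch of this loop using scikit-optimize's gp_minimize, which maintains a Gaussian-process surrogate of the objective; the search space and objective below are illustrative, not the report's actual setup.

```python
import lightgbm as lgb
from skopt import gp_minimize
from skopt.space import Integer, Real
from sklearn.model_selection import cross_val_score

space = [Integer(16, 128, name="num_leaves"),
         Real(0.01, 0.3, prior="log-uniform", name="learning_rate")]

def objective(params):
    num_leaves, learning_rate = params
    model = lgb.LGBMClassifier(num_leaves=num_leaves,
                               learning_rate=learning_rate)
    # gp_minimize minimizes, so return the negative CV score
    return -cross_val_score(model, X, y, cv=5).mean()

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best params:", result.x, "best score:", -result.fun)
```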
The aim of Bayesian reasoning is to become “less wrong” with more data which these
approaches do by continually updating the surrogate probability model after each evaluation of
the objective function.
At a high level, Bayesian optimization methods are efficient because they choose the next hyperparameters in an informed manner. The basic idea is to spend a little more time selecting the next hyperparameters in order to make fewer calls to the objective function. In
practice, the time spent selecting the next hyperparameters is inconsequential compared to the
time spent in the objective function. By evaluating hyperparameters that appear more promising
from past results, Bayesian methods can find better model settings than random search in fewer
iterations.
Bayesian model-based methods can find better hyperparameters in less time because they
reason about the best set of hyperparameters to evaluate based on past trials.
As a good visual description of what is occurring in Bayesian optimization, consider the figures below. The first shows an initial estimate of the surrogate model, in black with associated uncertainty in gray, after two evaluations. Clearly, the surrogate model is a poor approximation of the actual objective function in red.
[Figure: surrogate model after two evaluations]
The next figure shows the surrogate function after 8 evaluations. Now the surrogate almost exactly matches the true function. Therefore, if the algorithm selects the hyperparameters that maximize the surrogate, it will likely yield very good results on the true evaluation function.
[Figure: surrogate model after eight evaluations]
Grade of Service
Grade of Service (GoS) is defined as the probability that calls will be blocked while attempting to
seize circuits. It is written as P.xx blocking factor or blockage, where xx is the percentage of calls
that are blocked for a traffic system. For example, traffic facilities requiring P.01 GoS define a 1
percent probability of callers being blocked to the facilities. A GoS of P.00 is rarely requested and will rarely happen, because to be 100 percent sure that there is no blocking you would have to design a network where the caller-to-circuit ratio is 1:1. Also, most traffic formulas assume that there are an infinite number of callers. We used an approximation of this method, based on the volume of traffic transmitted (in bytes), to help our model quantify the comparative workload at a particular time instant.
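For reference, the classical traffic formula behind P.xx blocking under the infinite-caller assumption is Erlang B; below is a small sketch for illustration, not the approximation used in our model.

```python
def erlang_b(offered_erlangs: float, circuits: int) -> float:
    """Blocking probability for `offered_erlangs` of offered traffic on
    `circuits` circuits, via the numerically stable Erlang B recurrence:
    B(E, k) = E*B(E, k-1) / (k + E*B(E, k-1)), with B(E, 0) = 1."""
    b = 1.0
    for k in range(1, circuits + 1):
        b = offered_erlangs * b / (k + offered_erlangs * b)
    return b

# e.g. 10 erlangs offered to 15 circuits gives roughly P.04:
# erlang_b(10, 15) -> ~0.036
```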