Predictive Congestion Management in Telecom Networks Using Advanced Machine Learning Techniques
INDEX

REPORT:
1. Abstract
7. Conclusion

ANNEXURE:
1. Theory of Network Congestion
4. Backhaul congestion
5. CatBoost
6. LightGBM
7. GBM
8. Ensemble Learning
Dataset:

File        Records
train.csv   78560
test.csv    26305

About features

Feature Category   Feature Name   Feature Count for given Category
Index              ‘cell_name’    1
● The following graph is a histogram of data that are not normally distributed but are strongly right-skewed.
● This histogram is typical of distributions that benefit from a logarithmic transformation.
● After the log transformation, the graph shows a normal distribution (a minimal sketch of this transformation follows the list below).
● Similarly, all of the usage features and ‘subscriber_count’ show the same distribution.
● We can observe that the dataset has a uniform distribution for each target variable.
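The transformation itself is simple. Below is a minimal sketch, assuming a pandas DataFrame `df`; the column names shown are illustrative, not the dataset's actual names.

```python
# Minimal sketch of the log transformation described above, assuming a
# pandas DataFrame `df` with non-negative, right-skewed byte columns.
import numpy as np
import pandas as pd

def log_transform(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy of df with log1p applied to the given columns."""
    out = df.copy()
    for col in columns:
        out[col] = np.log1p(out[col])  # log(1 + x) handles zero byte counts
    return out

# Hypothetical column names for illustration:
# df = log_transform(df, ["total_bytes", "subscriber_count"])
```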
Inference from graph
A clear pattern is seen in the congestion distributions for the classes 4G Backhaul, 4G RAN, and No Congestion.
The following observations were made about the congestion classes:
1. 'No Congestion' follows a decreasing trend, as would logically be expected: as the number of users on the network increases, we expect congestion to increase, i.e., we expect 'No Congestion' to decrease.
2. For 4G RAN congestion, the trend is that congestion increases as the number of users increases, which matches the same logic: congestion levels rise with the number of users.
3. Similarly, 4G Backhaul shows exactly the same trend as 4G RAN: congestion increases as the number of users increases.
4. For 3G Backhaul congestion, we notice some anomalous behavior that does not fit our logical expectations. We expected congestion to rise with the number of users, but 3G Backhaul congestion does not follow any particular trend across the bytes data: in some bytes data it peaks at the 2nd quartile and then falls; in some it remains more or less constant over the 1st and 2nd quartiles and falls thereafter; and in other cases there is a straight downward trend.
In a nutshell, all the congestion classes except 3G Backhaul fit a logical explanation, but 3G Backhaul is somewhat anomalous.
Feature Engineering
Feature Creation Table

Feature Created                Methodology Used
‘max_1’, ‘max_2’, ‘max_3’      Selected the three largest values from each bytes row
‘grade_of_service’             ‘High_Priority’ / ‘daily_peak_traffic’
Feature Explanation:
Not all types of bytes sent across the network have the same level of importance and urgency. A small increase in high-importance signals is enough to flag congestion, whereas a large increase in low-importance signals may be needed to flag it. To capture this priority among the different types of byte signals, we classified all byte signals into three priority groups.
[Table: byte signal categories grouped as High Volume, Medium Volume, and Low Volume]
‘High_Priority’: For example, audio bytes need to be transferred as soon as possible, failing which the Grade of Service is reduced.
Similarly, ‘Medium_Priority’ and ‘Low_Priority’ hold medium and lowest priority in the data-transfer queue; data in these classes can tolerate a small delay without causing much inconvenience.
Each of these priority columns is created by summing, for each row, the log-transformed values of the byte categories that fall into that group.
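A minimal sketch of this construction, assuming a pandas DataFrame `df`; the group membership lists below are placeholders, since the report's actual grouping table is not reproduced above.

```python
import numpy as np

# Hypothetical byte-category groupings (placeholders for the report's table);
# each priority column is the row-wise sum of log-transformed byte categories.
HIGH_COLS = ["audio_bytes", "video_bytes"]
MEDIUM_COLS = ["web_bytes", "email_bytes"]
LOW_COLS = ["p2p_bytes", "ftp_bytes"]

for name, cols in [("High_Priority", HIGH_COLS),
                   ("Medium_Priority", MEDIUM_COLS),
                   ("Low_Priority", LOW_COLS)]:
    df[name] = np.log1p(df[cols]).sum(axis=1)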
‘daily_peak_traffic’: The idea behind this feature was to approximate the peak high-priority traffic volume over a 5-minute bucket that might be observed on a particular day. This was approximated as 2% of the sum of the day's high-priority traffic.
This feature helps to differentiate between busy-hour traffic and normal traffic.
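A sketch of how this could be computed; taking the per-day totals per cell via 'cell_name' and 'par_day' is an assumption here. The 'grade_of_service' ratio from the feature creation table is included as well.

```python
# Approximate a day's peak 5-minute high-priority traffic as 2% of that
# day's total high-priority traffic (per-cell grouping keys assumed).
daily_total = df.groupby(["cell_name", "par_day"])["High_Priority"].transform("sum")
df["daily_peak_traffic"] = 0.02 * daily_total

# 'grade_of_service' as given in the feature creation table above
df["grade_of_service"] = df["High_Priority"] / df["daily_peak_traffic"]
```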
Encoding    0      1      2      3      4      5      6
Time (hr)   4 PM   3 PM   3 PM   3 PM   4 PM   2 PM   10 PM
‘weekday’: We saw a repetitive pattern in bytes usage across the 30 days, and the usage pattern should differ between weekdays and weekends. To incorporate this property, we created a new feature by taking the remainder of ‘par_day’ divided by 7, classifying days 1-30 of the month into weekday codes 0-6.
‘Holidays’: This feature is built on the feature above. Once we have the weekday/weekend classification, we can form a larger group: if a given day is a holiday we label it 1, else 0. Holidays contain all weekends along with 2 Christmas dates; all other days are considered working days.
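A sketch of both calendar features, assuming ‘par_day’ runs 1-30; which weekday codes correspond to weekends and which ‘par_day’ values are the Christmas dates are assumptions for illustration.

```python
# 'weekday': remainder of par_day divided by 7, mapping days 1-30 to 0-6
df["weekday"] = df["par_day"] % 7

WEEKEND_CODES = {5, 6}     # assumed codes for Saturday/Sunday
CHRISTMAS_DAYS = {24, 25}  # hypothetical par_day values for the 2 Christmas dates

# 'Holidays': 1 for weekends and Christmas dates, 0 for working days
df["Holidays"] = (df["weekday"].isin(WEEKEND_CODES)
                  | df["par_day"].isin(CHRISTMAS_DAYS)).astype(int)
```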
‘sum_of_bytes’: Since congestion is highly affected by the amount of data, we took the sum of all the bytes columns to create a new column, which also turns out to have one of the highest feature importances in our model.
‘max_1’, ‘max_2’, ‘max_3’: We took the three largest byte values in each row across the 26 byte columns.
‘Cell_tilt’: The motivation behind this feature is that the combination of cell range and tilt directly affects congestion. Hence, we created a feature combining the two: Cell_tilt = 10*tilt + cell_range.
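The remaining engineered features can be sketched as follows; `byte_cols` stands for the 26 byte columns, and its construction here is an assumption about the column naming.

```python
import numpy as np

byte_cols = [c for c in df.columns if c.endswith("_bytes")]  # assumed naming

# 'sum_of_bytes': total traffic per row
df["sum_of_bytes"] = df[byte_cols].sum(axis=1)

# 'max_1', 'max_2', 'max_3': the three largest byte values per row
top3 = np.sort(df[byte_cols].to_numpy(), axis=1)[:, -3:]
df["max_1"], df["max_2"], df["max_3"] = top3[:, 2], top3[:, 1], top3[:, 0]

# 'Cell_tilt': exactly as defined in the report
df["Cell_tilt"] = 10 * df["tilt"] + df["cell_range"]
```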
Steps to overcome overfitting in the model
K-fold cross validation
1. Cross-validation is a powerful preventive measure against overfitting.
2. It allows us to tune hyperparameters with only our original training set, keeping the test set as a truly unseen dataset for selecting our final model.
3. We fixed K = 5 for k-fold cross-validation (a minimal setup sketch follows below).
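A minimal setup sketch, assuming scikit-learn-style features `X`, labels `y`, and a classifier `model`; stratification is an assumption, as the report only fixes K = 5.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv)
print(f"CV score: {scores.mean():.3f} +/- {scores.std():.3f}")
```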
Early stopping
1. It avoids overfitting by automatically selecting the inflection point where performance on the validation dataset starts to decrease while performance on the training dataset continues to improve as the model starts to overfit (see the sketch below).
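A hedged sketch using LightGBM's early-stopping callback; the 50-round patience and the 80/20 split are illustrative choices, not the report's values.

```python
import lightgbm as lgb
from sklearn.model_selection import train_test_split

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
model = lgb.LGBMClassifier(n_estimators=2000)
# Stop when the validation score has not improved for 50 rounds
model.fit(X_tr, y_tr,
          eval_set=[(X_val, y_val)],
          callbacks=[lgb.early_stopping(stopping_rounds=50)])
```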
Stacking Level 1
[Table: stack level 1 models and their hyperparameters]
Overall Model Architecture
[Figure: overall model architecture]
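Since the level-1 model and hyperparameter details are not preserved above, the following is only an illustrative two-level stack with placeholder base learners, not the report's exact architecture.

```python
import lightgbm as lgb
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

stack = StackingClassifier(
    estimators=[("lgbm", lgb.LGBMClassifier()),         # placeholder base learners
                ("rf", RandomForestClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),  # level-2 meta-learner
    cv=5)
stack.fit(X, y)
```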
Feature Importance:
[Figure: feature importance of the top 6 new features that we created]
[Figure: feature importance of the top 6 features overall. Of the first 6 overall most important features, 4 are features we created.]
Confusion Matrix
Using 5-fold cross-validation, we generated 5 confusion matrices to measure the variation in the obtained results. This process was repeated 5 times to generate 25 confusion matrices, and uncertainty is shown through the element-wise mean and standard deviation of these 25 matrices. This is used to identify the uncertainty in the prediction of the individual congestion classes.
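A sketch of that procedure, assuming numpy arrays `X` and `y` and a scikit-learn-style `model`:

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import RepeatedStratifiedKFold

# 5 folds x 5 repeats = 25 confusion matrices
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
mats = []
for tr, te in cv.split(X, y):
    fitted = clone(model).fit(X[tr], y[tr])
    mats.append(confusion_matrix(y[te], fitted.predict(X[te])))
mats = np.stack(mats)
print("mean:\n", mats.mean(axis=0))
print("std:\n", mats.std(axis=0))
```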
Inferences:
1. Of all the wrong predictions, most are made for 4G Backhaul and 3G Backhaul.
2. No Congestion can be identified with good certainty, with only a little uncertainty with respect to 3G Backhaul.
3. The model can clearly distinguish between 4G RAN and 3G Backhaul.
MCC Score
As our main evaluation metric for this problem statement is the Matthews Correlation Coefficient (MCC), the standard deviation of the MCC scores obtained through K-fold cross-validation, iterated 5 times, can be used as the general uncertainty parameter.
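A sketch of this uncertainty estimate with scikit-learn (`model`, `X`, and `y` assumed as before):

```python
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
scores = cross_val_score(model, X, y,
                         scoring=make_scorer(matthews_corrcoef), cv=cv)
print(f"MCC = {scores.mean():.3f} +/- {scores.std():.3f}")
```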
Conclusion
Since the majority of the features for this network congestion analysis described the bytes sent and received over a period, we aligned our analysis accordingly. We classified the types of byte packets according to their priority. Some other important features were based on the usage pattern within a single day; we used single-day usage for creating features because that was the only constant, known quantity in the dataset. We also used the classification of days into weekdays and weekends. Since we were given many time intervals for a single day, we created some very important features such as the peak hours of bytes consumption. We also rated the grade of service of a single data point with respect to the total bytes consumption on that day. Grade of service is a relative parameter that turned out to be a very important feature in our model: if a data point has a low value for the Grade of Service feature, there is a high chance that it will be classified as congested.
ANNEXURE
THEORY OF NETWORK CONGESTION
Congestion arises in a network when routers are unable to process the data arriving at them; this causes buffer overflows at the routers, resulting in the loss of some packets of data. The only way to relieve congestion is to reduce the load on the network.
Ideally, the network should be accessible and fair to all users. To ensure this fairness, we should not allow some users to overload the Radio Access Network (RAN) while other subscribers experience a poor quality of service (QoS) or cannot access the internet at all. Each user's traffic has to be prioritized such that the network allows all users to access services with the appropriate bandwidth and within the desired latency bounds, even when there is congestion.
Byte accuracy is crucial when evaluating the accuracy of traffic classification algorithms. Researchers note that the majority of flows on the Internet are small and account for only a small portion of the total bytes and packets in the network (mice flows). On the other hand, the majority of the traffic bytes are generated by a small number of large flows (elephant flows). For example, in one 6-month data trace the top (largest) 1% of flows accounted for over 73% of the traffic in terms of bytes. With a threshold of 3.7 MB to differentiate elephant from mice flows, the top 0.1% of flows would account for 46% of the traffic (in bytes). Presented with such a dataset, a classifier optimized to identify all but the top 0.1% of the flows could attain 99.9% flow accuracy but still leave 46% of the bytes in the dataset misclassified.
The word congestion means excessive crowding, but in telecommunications the term is used when a node, link, or channel carries an excessive amount of data and thus degrades the QoS (Quality of Service) of the network. Congestion is among the most common and fastest-growing problems in today's networking systems. With the fast growth of the Internet and the increased demand for voice and video applications, congestion increases, and as a result the QoS degrades. That means all aspects of a connection, such as service response time, loss, signal-to-noise ratio, cross-talk, echo, interrupts, frequency response, loudness level, and all other quality-of-service requirements, degrade. Incremental increases in offered load also adversely affect network throughput.
Different types of congestion and their respective causes and impacts on the network
Backhaul congestion
Backhaul congestion is another type of congestion; it occurs because of applications such as video download, video upload, P2P, File Transfer Protocol (FTP), single-source flood attacks, and distributed-source flood attacks. All of these applications typically send large volumes of data that contribute to the congestion of the backhaul links between the base station, the RNC, and the network elements in the path.
CatBoost
CatBoost is a machine learning algorithm that uses gradient boosting on decision trees.
● It yields state-of-the-art results without extensive data training typically required by other machine learning methods, and
● provides powerful out-of-the-box support for the more descriptive data formats that accompany many business problems. A minimal usage sketch follows below.
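This sketch assumes a pandas DataFrame so categorical columns can be named; the parameter values are illustrative, not the report's configuration.

```python
from catboost import CatBoostClassifier

# 'cell_name' is treated as a categorical feature; iteration count illustrative
model = CatBoostClassifier(iterations=500, verbose=0)
model.fit(X_train, y_train, cat_features=["cell_name"])
preds = model.predict(X_test)
```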
LightGBM
LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It is designed to be distributed and efficient, with the following advantages (a minimal usage sketch follows the list):
● Faster training speed and higher efficiency
● Lower memory usage
● Better accuracy
● Parallel and GPU learning supported
● Capable of handling large-scale data
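A minimal usage sketch of LightGBM's scikit-learn-style API; the parameter values are illustrative, not the report's tuned settings.

```python
import lightgbm as lgb

model = lgb.LGBMClassifier(num_leaves=31, learning_rate=0.05, n_estimators=500)
model.fit(X_train, y_train)
preds = model.predict(X_test)
```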
GBM
GBM is a boosting algorithm used when we deal with plenty of data and need predictions with high predictive power. Boosting is an ensemble of learning algorithms that combines the predictions of several base estimators in order to improve robustness over a single estimator; it combines multiple weak or average predictors to build a strong predictor.
Ensemble Learning
Ensembling is the art of combining a diverse set of learners (individual models) to improve the stability and predictive power of the model. The way we combine all of the individual predictions together is what is termed ensemble learning.
Here are the top 4 reasons for models to be different; they can also differ because of a mix of these factors:
● Difference in population
● Difference in hypothesis
● Difference in modeling technique
● Difference in initial seed
Bayesian Optimization
Bayesian approaches, in contrast to random or grid search, keep track of past evaluation results, which they use to form a probabilistic model mapping hyperparameters to a probability of a score on the objective function:

P(score | hyperparameters)

In the literature, this model is called a “surrogate” for the objective function and is represented
as p(y | x). The surrogate is much easier to optimize than the objective function, and Bayesian methods work by finding the next set of hyperparameters to evaluate on the actual objective function by selecting the hyperparameters that perform best on the surrogate function. In other words (a sketch follows the steps below):
1. Build a surrogate probability model of the objective function
2. Find the hyperparameters that perform best on the surrogate
3. Apply these hyperparameters to the true objective function
4. Update the surrogate model incorporating the new results
5. Repeat steps 2–4 until max iterations or time is reached
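A sketch of this loop using scikit-optimize's gp_minimize, which maintains a Gaussian-process surrogate of the objective; the search space and objective below are illustrative, not the report's actual setup.

```python
import lightgbm as lgb
from skopt import gp_minimize
from skopt.space import Integer, Real
from sklearn.model_selection import cross_val_score

space = [Integer(16, 128, name="num_leaves"),
         Real(0.01, 0.3, prior="log-uniform", name="learning_rate")]

def objective(params):
    num_leaves, learning_rate = params
    model = lgb.LGBMClassifier(num_leaves=num_leaves,
                               learning_rate=learning_rate)
    # gp_minimize minimizes, so return the negative CV score
    return -cross_val_score(model, X, y, cv=5).mean()

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best params:", result.x, "best score:", -result.fun)
```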
The aim of Bayesian reasoning is to become “less wrong” with more data which these
approaches do by continually updating the surrogate probability model after each evaluation of
the objective function.
At a high level, Bayesian optimization methods are efficient because they choose the next hyperparameters in an informed manner. The basic idea is to spend a little more time selecting the next hyperparameters in order to make fewer calls to the objective function. In
practice, the time spent selecting the next hyperparameters is inconsequential compared to the
time spent in the objective function. By evaluating hyperparameters that appear more promising
from past results, Bayesian methods can find better model settings than random search in fewer
iterations.
Bayesian model-based methods can find better hyperparameters in less time because they
reason about the best set of hyperparameters to evaluate based on past trials.
As a good visual description of what is occurring in Bayesian optimization, consider the figures below. The first shows an initial estimate of the surrogate model, in black with associated uncertainty in gray, after two evaluations. Clearly, the surrogate model is a poor approximation of the actual objective function in red.
[Figure: surrogate model after two evaluations]
The next figure shows the surrogate function after 8 evaluations. Now the surrogate almost exactly matches the true function. Therefore, if the algorithm selects the hyperparameters that maximize the surrogate, it will likely yield very good results on the true evaluation function.
[Figure: surrogate model after eight evaluations]
Grade of Service
Grade of Service (GoS) is defined as the probability that calls will be blocked while attempting to
seize circuits. It is written as P.xx blocking factor or blockage, where xx is the percentage of calls
that are blocked for a traffic system. For example, traffic facilities requiring P.01 GoS define a 1
percent probability of callers being blocked to the facilities. A GoS of P.00 is rarely requested and will rarely happen, because to be 100 percent sure that there is no blocking you would have to design a network where the caller-to-circuit ratio is 1:1. Also, most traffic formulas assume that there are an infinite number of callers. We used an approximation of this method, based on the volume of traffic transmitted (in bytes), to help our model quantify the comparative workload at a particular time instant.
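For reference, the classical traffic formula behind P.xx blocking under the infinite-caller assumption is Erlang B; below is a small sketch for illustration, not the approximation used in our model.

```python
def erlang_b(offered_erlangs: float, circuits: int) -> float:
    """Blocking probability for `offered_erlangs` of offered traffic on
    `circuits` circuits, via the numerically stable Erlang B recurrence:
    B(E, k) = E*B(E, k-1) / (k + E*B(E, k-1)), with B(E, 0) = 1."""
    b = 1.0
    for k in range(1, circuits + 1):
        b = offered_erlangs * b / (k + offered_erlangs * b)
    return b

# e.g. 10 erlangs offered to 15 circuits gives roughly P.04:
# erlang_b(10, 15) -> ~0.036
```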