Problem Statement - Inter Hall IIT KGP

This document provides a problem statement for predicting network congestion using cell tower statistics. The goal is to train machine learning models to predict the type of congestion using usage data like web browsing bytes and subscriber count for different cell towers. The training dataset includes anonymized data for cell towers in December 2018 with fields like tower ID, usage by activity, direction, range, and the target variable of congestion type. Models will be evaluated on their ability to accurately predict congestion type using the Matthews correlation coefficient.

PROBLEM STATEMENT

Overview:
In the telecommunications industry, one of the most important issues firms face is
network congestion. Congestion, even for short durations, has been shown to have a negative
impact on customer loyalty, especially in price-sensitive markets. To solve this problem effectively,
firms must be able to predict congestion in advance and take proactive action.
In this competition, you are required to train machine learning models that use cell tower statistics such as
usage, customer count, etc., to predict the type of congestion that might occur.
We are providing a subset of the original dataset, with some values randomized/masked to avoid
leakage of proprietary information. As a result, this dataset contains only sample data for December 2018
transactions.
Some fields in the current dataset are anonymised to avoid data leakage, and the usage data has
been scaled/randomized by a single constant factor.

Data Collection Methodology overview:

Incidents Table: tower-level activity
ESR Records: user-level activity

1. Aggregated over 5-minute periods
2. Inner joined

Data Dictionary:
You are provided data for cell towers with the following fields in training dataset:

1. cell_name: Cell tower number/name (masked name for cell towers)
2. 4G_rat: Indicator of whether the tower supports 3G or 4G
3. Par_year: Year under consideration (2018)
4. par_month: Month under consideration (December)
5. par_day: Day under consideration
6. par_hour: Hour under consideration
7. par_min: Minute bucket under consideration
a. Buckets are 5-minute intervals, e.g. a value of 15 means the statistics are aggregated over
minutes 10 to 15 of the hour
8. subscriber_count: Count of total subscribers for the cell in the specified time period
9. Usage data: Data usage by activity type; includes both upload and download bytes
a. web_browsing_total_bytes
b. video_total_bytes
c. social_ntwrking_bytes
d. cloud_computing_total_bytes
e. web_security_total_bytes
f. gaming_total_bytes
g. health_total_bytes
h. communication_total_bytes
i. file_sharing_total_bytes
j. remote_access_total_bytes
k. photo_sharing_total_bytes
l. software_dwnld_total_bytes
m. marketplace_total_bytes
n. storage_services_total_bytes
o. audio_total_bytes
p. location_services_total_bytes
q. presence_total_bytes
r. advertisement_total_bytes
s. system_total_bytes
t. voip_total_bytes
u. speedtest_total_bytes
v. email_total_bytes
w. weather_total_bytes
x. media_total_bytes
y. mms_total_bytes
z. others_total_bytes
10. beam_direction: Tower beam direction
11. cell_range: Cell tower range
12. tilt: Cell tower tilt
13. ran_vendor: Service Vendor
14. Congestion_Type: Type of congestion observed (Target Variable)
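
As an illustration of how the par_* fields combine, the sketch below rebuilds the aggregation window for one row. It assumes par_min labels the end of its 5-minute bucket (as in the example where 15 covers minutes 10 to 15) and that par_month is stored as a number; the function name bucket_window and the sample values are hypothetical and not part of the dataset.

```python
from datetime import datetime, timedelta


def bucket_window(par_year, par_month, par_day, par_hour, par_min, width_min=5):
    """Return (start, end) datetimes of the window a row was aggregated over.

    Assumption: par_min marks the END of the bucket, so 15 means 10-15.
    """
    hour_start = datetime(par_year, par_month, par_day, par_hour)
    end = hour_start + timedelta(minutes=par_min)
    start = end - timedelta(minutes=width_min)
    return start, end


# Hypothetical row: 3 December 2018, hour 14, minute bucket 15.
start, end = bucket_window(2018, 12, 3, 14, 15)
print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))
```

A single timestamp built this way may be easier to sort and group on than five separate columns.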

Evaluation Criteria:
Evaluation of outputs is based on the Matthews correlation coefficient
(https://en.wikipedia.org/wiki/Matthews_correlation_coefficient)

Scikit Learn: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html
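
To make the metric concrete, here is a minimal dependency-free sketch of the multiclass Matthews correlation coefficient computed from label counts. The official scoring script is not given here, so this is an assumption about how scores are derived; in practice sklearn.metrics.matthews_corrcoef(y_true, y_pred) from the link above is the natural choice. The class labels in the example are hypothetical placeholders.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient for two equal-length label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                     # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))     # correct predictions
    true_counts = Counter(y_true)                       # t_k: times class k occurs
    pred_counts = Counter(y_pred)                       # p_k: times class k is predicted
    labels = set(true_counts) | set(pred_counts)
    numerator = c * s - sum(true_counts[k] * pred_counts[k] for k in labels)
    spread_true = s * s - sum(n * n for n in true_counts.values())
    spread_pred = s * s - sum(n * n for n in pred_counts.values())
    denominator = sqrt(spread_true * spread_pred)
    # Return 0.0 when the score is undefined (e.g. only one class predicted).
    return numerator / denominator if denominator else 0.0


# Hypothetical labels, for illustration only.
y_true = ["NC", "RAN", "NC", "BACKHAUL"]
y_pred = ["NC", "RAN", "RAN", "BACKHAUL"]
print(round(multiclass_mcc(y_true, y_pred), 3))
```

A score of 1 means perfect agreement and 0 means no better than chance, so a model that always predicts the most common class scores 0 however high its accuracy.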


Rules:
1. Use of external data is not allowed
2. Using Cell name as a predictor variable is not allowed
3. Students are encouraged to use Python 3+ modules for all their work
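
One way to respect rule 2 is to drop the tower identifier before any model sees the data. The sketch below assumes each record is a dictionary keyed by the field names in the data dictionary; the helper name split_features_target and the sample values are hypothetical.

```python
ID_COLUMN = "cell_name"             # must not be used as a predictor (rule 2)
TARGET_COLUMN = "Congestion_Type"   # target variable


def split_features_target(row):
    """Split one record into (features, target), excluding the tower id."""
    features = {
        name: value
        for name, value in row.items()
        if name not in (ID_COLUMN, TARGET_COLUMN)
    }
    return features, row[TARGET_COLUMN]


# Hypothetical record with made-up values.
row = {
    "cell_name": "masked_001",
    "subscriber_count": 42,
    "video_total_bytes": 1200,
    "Congestion_Type": "NC",
}
features, target = split_features_target(row)
print(sorted(features), target)
```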

Useful Links:
1. Backhaul: https://en.wikipedia.org/wiki/Backhaul_(telecommunications)
2. RAN: https://en.wikipedia.org/wiki/Radio_access_network
