Abstract:
Telecom networks often encounter a large number of tweets based on users' experience of the network. This
huge amount of raw data can be used for industrial or business purposes by organizing and processing it
according to requirements. The aim of this paper is to address the social media review challenges in telecom
companies. The methods include extracting tweets, analyzing them, and segregating them into various
categories to help the company understand the concerns of its customers. This can help save millions and
prevent customer churn. In order to build a robust model, the dataset was pre-processed by checking for and
removing NaN values. After the pre-processing stage, the cleaned data was tokenized. In the tokenization
process, the text was divided into word tokens for easy training. After the tokenization process, we performed
an exploratory data analysis on the dataset to understand its patterns. After the exploratory data analysis
stage, we trained a random forest classifier to predict/classify customer satisfaction as positive, negative,
or neutral.
2. Related Work
[5] provided a methodology for telecom companies to target different-value customers with appropriate offers
and services. This methodology was implemented and tested using a dataset containing about 127 million
records for training and testing, supplied by Syriatel Corporation. Firstly, customers were segmented based
on the new TFM (Time-Frequency-Monetary) approach, where Time (T) is the total of call durations and
3. DESIGN METHODOLOGY
This section discusses the techniques used and the methods of data collection. The system
architecture of the proposed system is shown in Figure 1.
Figure 1: System architecture of the proposed system (Tele-Communication Data → Pre-processing → Tokenization)
Pre-processing: This involves processing the raw tweet data so that it is fit for training the classification
model. The pre-processing steps are as follows (a minimal sketch in Python is given after this list):
i. Data Cleaning and Noise Removal: One of the key steps in processing language data is to remove
noise so that the machine can easily detect the patterns in the data. Text data contains a lot of noise, in
the form of hashtags, punctuation and numbers.
ii. Filtering by length: It is useful to remove unwanted words in a sentence. Words that are less than two
characters in length usually do not carry a distinct meaning in a sentence. However, such words are not
removed by the previous pre-processing step. Therefore, in this process, only the necessary words are
retained by limiting the length of the words.
iii. Transform cases: In this process, each character in a word is converted to lowercase.
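A minimal sketch of these three pre-processing steps in Python, assuming the tweets are held as plain strings (the function name, regular expressions, and the example tweet are illustrative assumptions, not taken from the paper):

import re

def preprocess(tweet: str, min_len: int = 2) -> str:
    """Clean a single tweet: remove noise, filter short words, lowercase."""
    # i. Noise removal: strip URLs, hashtags, mentions, punctuation and numbers
    tweet = re.sub(r"https?://\S+|www\.\S+", " ", tweet)   # URLs
    tweet = re.sub(r"[#@]\w+", " ", tweet)                  # hashtags / mentions
    tweet = re.sub(r"[^A-Za-z\s]", " ", tweet)              # punctuation and numbers
    # ii. Filtering by length: drop words shorter than min_len characters
    words = [w for w in tweet.split() if len(w) >= min_len]
    # iii. Transform cases: convert every character to lowercase
    return " ".join(words).lower()

print(preprocess("@TelcoNG my 4G is SO slow today!!! #frustrated"))
# -> "my is so slow today"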
Tokenization: Tokenization is the process of splitting the text into smaller pieces called tokens. Words,
numbers and punctuation marks and others can be considered as tokens. Each sentence in the document was
separated by words. To achieve this, the nltk (Natural Language Tool Kit) package in Python programming
language was used.
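As an illustration of this step, a short sketch using nltk's word_tokenize (the sentence is a made-up example, not drawn from the dataset):

import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)  # tokenizer models required by word_tokenize

cleaned_tweet = "network has been down all day in lagos"
tokens = word_tokenize(cleaned_tweet)
print(tokens)
# ['network', 'has', 'been', 'down', 'all', 'day', 'in', 'lagos']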
Exploratory Data Analysis (EDA): This was used to visualize and analyze the data using charts and graphs.
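A small sketch of the kind of chart produced at this stage, assuming the labelled tweets sit in a pandas DataFrame with a "sentiment" column (the column name and the toy data are assumptions for illustration):

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical labelled data; in the paper this would come from the tweet dataset.
df = pd.DataFrame({"sentiment": ["positive", "negative", "neutral", "negative", "positive"]})

# Bar chart of class counts, similar in spirit to the distributions shown in the figures.
df["sentiment"].value_counts().plot(kind="bar", title="Sentiment distribution")
plt.xlabel("Sentiment")
plt.ylabel("Number of tweets")
plt.tight_layout()
plt.show()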
Model Evaluation: The Random Forest algorithm uses multiple decision trees and combines their outputs to
produce a prediction from the training dataset. The mathematical representation of the random forest can be
seen in Table 1.
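For clarity, the aggregation performed by a random forest classifier can be written in the standard form below (shown as a general formulation, not reproduced from Table 1): for an input tweet x and B individual decision trees h_1, ..., h_B, the forest predicts the majority class,

\hat{y}(x) = \operatorname*{mode}_{b = 1,\dots,B} \, h_b(x)

i.e. each tree votes for a class (positive, negative, or neutral) and the most frequent vote is returned.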
4. Implementation/Result
This system presents a model for analyzing customers’ behavior on telecommunication using big data. The
system starts by acquiring unstructured data, to address the social media review challenges for
telecommunication companies and then analyzing the data, and segregating them into various categories to
help the company understand the concerns of their customers. In order to build a robust model, the tweet
dataset was pre-processed by checking and removing NaN values. The cleaned data can be seen in Figure 2.
After the pre-processing stage, the cleaned data was tokenized. During tokenization, the text was divided into
word tokens for easy training and understanding; this is shown in Figure 2. After the tokenization process, we
performed an exploratory data analysis on the dataset in order to understand the dataset’s pattern. This is
displayed in Figures 3, 4, and 5. After the exploratory data analysis, we trained a random forest classifier to
predict/classify customer satisfaction as positive, negative, or neutral. The results of the Random Forest
model can be found in Figures 6 and 7.
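As a hedged illustration of the training step described above (the TF-IDF feature extraction, the toy tweets, and the hyperparameters are assumptions; the paper does not specify its exact vectorizer or settings), a minimal scikit-learn sketch:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Toy stand-ins for the pre-processed, tokenized tweets and their sentiment labels.
texts = [
    "great network speed today", "excellent customer service",
    "no signal all day", "calls keep dropping",
    "data plan renewed", "visited the service centre",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, random_state=42
)

model = Pipeline([
    ("tfidf", TfidfVectorizer()),                                   # turn tweets into numeric features
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
])
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))

TF-IDF is used here only as a common choice for turning tweet text into numeric features; any other vectorization consistent with the paper's pipeline could be substituted.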
References
1. Aluri A., Price B.S., McIntyre N. (2019). "Using machine learning to cocreate value through dynamic
customer engagement in a brand loyalty program". Journal of Hospitality & Tourism Research, 43(1), 78-100.
2. Chung-Min Chen (2016). "Use cases and challenges in telecom big data analytics". APSIPA Transactions
on Signal and Information Processing, 5, e19. Published online by Cambridge University Press.
DOI: https://doi.org/10.1017/ATSIP.2016.20
3. Jeffrey S., Yves T'J., Raluca D., Peter S., Laurent P. (2014). "Using big data to improve customer
experience and business performance". Bell Labs Technical Journal, 18(4), 3-17.
4. Oladapo K., Omotosho O., Adeduro O. (2018). "Predictive analytics for increased loyalty and customer
retention in telecommunication industry". International Journal of Computer Applications, 975:8887.
5. Wissam N., Ramez A., Kamal S., Shadi B. (2020). "Predictive analytics using big data for increased
customer loyalty: Syriatel Telecom Company case study". Journal of Big Data, 7:29, pp. 2-24.
https://doi.org/10.1186/s40537-020-00290-0