Customer Churn Prediction Using Machine Learning: Subscription Renewal on OTT Platforms

The document discusses a study on customer churn prediction for OTT platforms using machine learning techniques, aiming to enhance subscription renewals and retention strategies. It details the methodology, data collection, preprocessing, and various models employed, highlighting the effectiveness of the Random Forest classifier with an accuracy of 93.576%. The findings emphasize the importance of understanding customer behavior to improve retention and revenue in a competitive OTT market.

Proceedings of the Second International Conference on Applied Artificial Intelligence and Computing (ICAAIC 2023)

IEEE Xplore Part Number: CFP23BC3-ART; ISBN: 978-1-6654-5630-2

Customer Churn Prediction using Machine Learning: Subscription Renewal on OTT Platforms
Dr. O. Rama Devi, Sai Krishna Pothini, Mulpuru Prasanna Kumari, Sowjanya. V, Uppalapati Naga Sai Charan
Department of Artificial Intelligence and Data Science
Lakireddy Balireddy College of Engineering (Autonomous)
Mylavaram, AP, India

2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC) | 978-1-6654-5630-2/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICAAIC56838.2023.10140287

Abstract— The goal of predicting subscriptions for OTT (Over-The-Top) platforms using machine learning is to devise a model that can accurately predict whether a customer will continue using the platform or not. This information is important for OTT companies to understand and optimize their marketing and retention efforts. Relevant data, such as customer demographics and viewing habits, is collected and analyzed to train the model. This process involves cleaning the data, selecting important features, and training a machine learning model. The model is then tested and validated using performance metrics. In short, this problem requires a comprehensive understanding of customer behavior and the use of machine learning to predict subscription decisions. The results can provide valuable insights for OTT companies to improve their customer understanding and retention efforts.

Keywords—OTT, Companies, Subscription, Renewal, Prediction, Random Forest, Support Vector Machine, etc.

I. INTRODUCTION
• Customer churn prediction is the process of identifying and forecasting the customers who are likely to discontinue their subscription or services. This technique is widely used in various industries to reduce customer attrition and increase customer retention. In the context of over-the-top (OTT) platforms, which offer video streaming services, predicting customer churn is a critical task as it helps to retain customers and increase revenue.
• The subscription renewal process is the lifeblood of OTT platforms, and the failure to retain subscribers can result in significant financial losses. Therefore, it is essential to analyze customer behavior and develop strategies to prevent churn. Machine learning algorithms can help OTT platforms identify customers who are likely to churn (discontinue) by analyzing their past behavior, preferences, and usage patterns. This can enable the platform to take proactive measures to prevent churn, such as offering personalized recommendations or discounts.
• The study [1] provides valuable insights into the customer lifecycle, highlighting the various steps involved in acquiring and retaining customers. The research shows that the process of acquiring new customers requires more effort and resources, indicating the need for substantial investments in this area.

II. LITERATURE REVIEW:
• The literature on customer churn prediction is discussed in this section, with a focus on prediction work in the OTT (Over-The-Top) platform industry. In this industry, a wide range of techniques are used to increase the models' accuracy. Authors have advocated including new elements such as social considerations. To improve the prediction task and assist businesses with customer churn, they have suggested modified Deep Learning and Machine Learning models [2], [3], [4].
• Several studies have explored the use of machine learning for customer churn prediction in various industries, including OTT platforms.
• [6] In a study by M. A. Bakar and N. B. Ariffin (2020), machine learning algorithms were used to predict customer churn in a Malaysian video streaming service. The study found that using Random Forest and Support Vector Machine algorithms, along with feature selection techniques, achieved a high prediction accuracy of over 90%.
• [7] Another study by Y. Wu, Y. Li, and Y. Li (2020) proposed a multi-layer deep learning model to predict customer churn in Chinese OTT platforms. The study found that the proposed model outperformed traditional machine learning models in terms of accuracy, precision, and recall.
• [8] In a study by N. N. M. Rajan and S. R. M. Prasanna (2021), machine learning algorithms were applied to predict customer churn in an Indian OTT platform. The study used logistic regression, decision tree, and random forest algorithms and found that the random forest algorithm outperformed the other models with an accuracy of 95.38%.


• Overall, these studies demonstrate the potential of machine learning algorithms for predicting customer churn on OTT platforms.

III. PROPOSED SYSTEM.

A. Research Methodology and work.
• The research study focuses on individuals who use paid OTT platforms to stream video content on any device. The study used a questionnaire to gather data from participants of all demographics. The collected data underwent various pre-processing steps to make it suitable for machine learning models.

• To gain insight into the demographics of OTT users and their level of satisfaction with various factors that may cause churn, a 16-question survey was developed. Each demographic question had multiple-choice answers, and a 5-point scale was used for rating satisfaction. Out of 2000 survey responses, 70% indicated that they were likely to renew their subscription.

• The three most commonly used OTT platforms among the survey respondents were Netflix, Amazon Prime, and Disney Hotstar. Finally, the study intends to employ various classification models to predict customer churn based on the collected data.

B. Input Dataset
The dataset used in this study includes a total of 16 variables: 15 independent variables and one dependent variable. The dependent variable, "Churn," is a binary variable indicating whether a customer has churned or not. Therefore, this study involves binary classification analysis. Table 1 provides a detailed description of all the variables included in the study.

The CSV consists of around 2000 rows and 16 columns. Features:
1. Year.
2. Customer_id - unique id.
3. Phone_no - customer phone no.
4. Gender - Male/Female.
5. Age - age of the customer
6. No of days subscribed - the number of days since the subscription
7. Multi-screen - does the customer have a single/multiple screen subscription
8. Mail subscription - customer receives emails or not
9. Weekly mins watched - number of minutes watched weekly
10. Minimum daily mins - minimum minutes watched
11. Maximum daily mins - maximum minutes watched
12. Weekly nights max mins - number of minutes watched at night time
13. Videos watched - total number of videos watched
14. Maximum_days_inactive - days since inactive
15. Customer support calls - number of customer support calls
16. Churn - 1 = Yes, 2 = No

C. DATA PREPROCESSING AND STEPS:
• Four of the sixteen factors are categorical. Because the models need numeric values in order to classify the labels, label encoding is performed to convert the categorical values and weakly dependent characteristics into numeric form. The association (correlation) matrix is depicted in Figure 1, where dark blue represents strong correlations and light blue represents weak correlations. When a factor's correlation coefficient is greater than or equal to 0.7, it is deemed to have a high correlation, and proper action must be taken to address it.

• To understand how one variable responds to changes in the other variables, a correlation matrix is plotted. The correlation matrix helps in identifying strong correlations.

D. KNOWLEDGE ON CHURN FACTORS:
This section of the paper addresses the initial objective. First, it examines the impact of customer churn prediction on the renewal of OTT platform subscriptions. Next, it ranks the variables that affect churn on OTT platforms.

E. VARIABLE RANKING:
• It is crucial to understand the factors that influence the outcome variable, and doing so is a worthwhile endeavor that requires time and effort. Identifying relevant features not only reduces the number of predictors but also enhances model performance and minimizes computational costs. To obtain a more reliable and generalized factor score, the study employed four techniques to measure the feature score, and the final feature value is the average of all the method ratings.

• In this study, four prediction methods were used [2], [3], [4]: LR (logistic regression), DT (decision tree classifier), SVM (support vector machine), and RF (random forest classifier). The decision tree method identifies the best-performing models, SVM provides an accurate description of each feature, and random forests outperform the other methods. Figure 1.3 shows the changes observed in the analysis.
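The averaging idea described under Variable Ranking (score each feature with four techniques, then take the mean) can be illustrated with a short sketch. The paper does not state how each method's score is derived, so the choices below are assumptions: absolute coefficients for logistic regression and a linear SVM, and impurity-based importances for the decision tree and random forest, each normalised to sum to 1 before averaging. The data is synthetic and the helper names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def normalised(scores):
    """Turn raw per-feature scores into non-negative weights that sum to 1."""
    scores = np.abs(np.asarray(scores, dtype=float)).ravel()
    total = scores.sum()
    if total == 0:
        return np.full(scores.shape, 1.0 / scores.size)
    return scores / total


def average_feature_scores(X, y, random_state=0):
    """Average the normalised feature scores of LR, SVM, DT and RF (assumed scoring)."""
    # Coefficient magnitudes are only comparable on standardised inputs.
    X_scaled = StandardScaler().fit_transform(X)
    lr = LogisticRegression(max_iter=1000).fit(X_scaled, y)
    svm = LinearSVC(max_iter=10000, random_state=random_state).fit(X_scaled, y)
    dt = DecisionTreeClassifier(random_state=random_state).fit(X, y)
    rf = RandomForestClassifier(n_estimators=100, random_state=random_state).fit(X, y)

    per_method = np.vstack([
        normalised(lr.coef_),
        normalised(svm.coef_),
        normalised(dt.feature_importances_),
        normalised(rf.feature_importances_),
    ])
    # Final feature value = mean of the four method ratings.
    return per_method.mean(axis=0)


# Synthetic stand-in for the 15 independent variables of the survey dataset.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=15, n_informative=5, random_state=0
)
final_scores = average_feature_scores(X_demo, y_demo)
ranking = np.argsort(final_scores)[::-1]  # feature indices, most important first
print(ranking)
```

Normalising before averaging keeps one method's scale from dominating the others; a rank-based average would be an equally reasonable alternative.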


F. Model implementation:
In this study, we used four distinct models. To obtain a baseline accuracy, we used the Decision Tree classifier, one of the most frequently used models. The other three models are Random Forest (an ensemble method), SVM, and logistic regression [9]. Following preprocessing, we divided the data into training and test sets. We trained each model on 80% of the data and tested its efficacy on the remaining 20%. All of our churn prediction models for OTT platforms are binary classification models. The models are built with scikit-learn (sklearn), a Python library.

IV. RESULTS AND ANALYSIS.

Figure 1.1: Scatterplot for actual values
Sixteen inputs must be supplied, and the model makes its prediction based on them.

Data Cleaning: The dataset contains many categorical variables. Since the machine learning models can only handle numerical data, label encoding is used to transform the data from categorical to numerical form.
Null values: Samples with empty values are simply removed from the dataset, because null-valued samples make up only a small fraction of the data (the ratio is reported as less than 1).
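The cleaning steps described here, together with the 0.7 correlation threshold from the data preprocessing subsection, could be sketched as follows. This is a minimal illustration, not the authors' code: the column names are hypothetical snake_case versions of the listed features, the tiny table is made up, and the two helper functions are assumptions introduced for the example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def preprocess(df, categorical_columns):
    """Drop rows with missing values, then label-encode the categorical columns."""
    cleaned = df.dropna().copy()
    for column in categorical_columns:
        # LabelEncoder maps each distinct category to an integer (sorted order).
        cleaned[column] = LabelEncoder().fit_transform(cleaned[column].astype(str))
    return cleaned


def highly_correlated_pairs(df, threshold=0.7):
    """Return (feature_a, feature_b, |r|) for every pair with |r| >= threshold."""
    corr = df.corr().abs()
    columns = list(corr.columns)
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if corr.iloc[i, j] >= threshold:
                pairs.append((columns[i], columns[j], float(corr.iloc[i, j])))
    return pairs


# Made-up rows; column names are illustrative assumptions.
demo = pd.DataFrame({
    "gender": ["Male", "Female", None, "Female", "Male"],
    "multi_screen": ["no", "yes", "no", "no", "yes"],
    "weekly_mins_watched": [100.0, 200.0, 150.0, 300.0, 400.0],
    "maximum_daily_mins": [10.0, 20.0, 15.0, 30.0, 40.0],
})

encoded = preprocess(demo, ["gender", "multi_screen"])
pairs = highly_correlated_pairs(encoded)
print(encoded)
print(pairs)
```

What "proper action" means for a flagged pair is not specified in the text; dropping one of the two features is a common choice, but that is an assumption.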
Scatter visualization for actual vs predicted values:
To understand the difference between the actual and predicted values, refer to Figure 1.1 and Figure 1.2, which plot the values using scatter plots.

Figure 1.2: Scatterplot for predicted values

Based on the above two plots, we can observe that they are almost identical, which means the model reaches an accuracy of roughly 90%, obtained using the Random Forest Classifier.

Figure 1.3
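The model-fitting and evaluation procedure described in this section (an 80/20 split, four scikit-learn classifiers, and accuracy, precision, recall and F1 as metrics) can be sketched as below. The survey data is not available here, so the example uses synthetic data with roughly 70% of samples in one class; the stratified split, the default hyperparameters and the random seeds are assumptions, and the printed numbers are not expected to match the paper's tables. F1 is the harmonic mean of precision P and recall R, that is 2PR / (P + R).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 2000 rows, 15 independent variables, binary target.
X, y = make_classification(
    n_samples=2000, n_features=15, n_informative=6,
    weights=[0.7, 0.3], random_state=42,
)

# 80% for training, 20% for testing (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Support Vector Machine": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, predicted),
        # Class 1 is treated as the positive (churn) class here.
        "precision": precision_score(y_test, predicted, zero_division=0),
        "recall": recall_score(y_test, predicted, zero_division=0),
        "f1": f1_score(y_test, predicted, zero_division=0),
    }

for name, scores in results.items():
    print(name, {metric: round(value * 100, 3) for metric, value in scores.items()})
```

Reporting precision and recall alongside accuracy matters on imbalanced data like this, since a model can score high accuracy while missing many churners.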


Model Comparisons:

Model Name                  Model accuracy (%)
Random Forest Classifier    93.576
Decision Tree Classifier    85.243
Support Vector Machine      89.062
Logistic Regression         89.062

Table 1.1

Model Name                  Precision   Recall   F-1 Score
RandomForestClassifier      77.083      58.730   66.667
DecisionTreeClassifier      38.542      58.740   46.541
SupportVectorMachine        23.12       43.73    53.87
LogisticRegression          34.45       43.3     23.4

Table 1.2

Input for the model is:

Below is the sample input for the model, based on which the model predicts whether the customer churn is positive or negative.

NOTE:
a. Random forest tops the list with the best accuracy.
b. High reliability is achieved because a large number of samples is available for training the model and the samples cover all the aspects.
c. High accuracy is achieved because of the random forest model. Because the random forest classifier combines multiple decision tree models, more training is possible and high accuracy is achieved.

Positive Output:

Negative Output:

V. CONCLUSION:
In conclusion, customer churn prediction using machine learning is a valuable tool for OTT platforms to improve customer retention rates and increase revenue. By leveraging data and predictive analytics, OTT platforms can gain insights into customer behavior and take proactive measures to prevent churn [14]. The continuous refinement and improvement of predictive models can lead to more accurate predictions, resulting in more effective retention strategies and better business outcomes.
VI. FUTURE SCOPE
With the ever-increasing competition in the OTT
industry, customer retention has become a critical
aspect of business success. Machine learning algorithms
have proved to be effective in predicting customer
churn and developing targeted retention strategies,
leading to improved customer satisfaction and increased
revenue.
To conclude, there is immense potential for
customer churn prediction on OTT platforms using
machine learning. [15] As technology advances and
more data becomes available, predictive models can be
refined and improved, leading to the development of
more effective retention strategies and better business
outcomes.
VII. REFERENCES:
[1] O. Sigurdur, L. Xiaonan and W. Shuning, "Operations research and
data mining," European Journal of Operational Research, 2006.

[2] T. Chih-Fong and L. Yu-Hsin, "Data Mining Techniques in Customer Churn Prediction," Recent Patents on Computer Science, pp. 28-32, 2009.

[3] S. Hergovind and V. S. Harsh, "A Business Intelligence Perspective for Churn Management," Procedia Social And Behavioral Sciences, pp. 51-56, 2014.


[4] H. Benlan, S. Yong, W. Qian and Z. Xi, "Prediction of customer attrition of commercial banks based on SVM," Procedia Computer Science, pp. 423-430, 2014.

[5] B. Michel and d. P. DirkVan, "Customer event history for churn prediction: How long is long enough?," Expert Systems with Applications, pp. 13517-13522, 2012.
[6] Bakar, M. A., & Ariffin, N. B. (2020). Predicting customer churn
using machine learning algorithms: A study of a Malaysian video
streaming service. Journal of Telecommunication, Electronic and
Computer Engineering, 12(1-3), 11-16.

[7] Wu, Y., Li, Y., & Li, Y. (2020). A multi-layer deep learning model for
predicting customer churn in Chinese OTT platforms. IEEE Access, 8,
161562-161570.
[8] Rajan, N. N. M., & Prasanna, S. R. M. (2021). A comparative study of
machine learning algorithms for predicting customer churn in Indian OTT
platform. Journal of Ambient Intelligence and Humanized Computing,
12(5), 4867-4877.
[9] Ahmad, M., & Al-Obeidat, F. (2021). Machine learning-based
prediction models for customer churn in the OTT industry. Journal
of Big Data, 8(1), 1-26.

[10] Kim, D., & Kim, K. (2020). Predicting user churn in subscription-
based services using deep learning models. Sustainability, 12(17), 6833.

[11] Singh, R. P., Singh, P., Singh, R. K., & Kumar, A. (2020). Customer
churn prediction in subscription-based e-commerce platforms using
ensemble machine learning algorithms. Journal of Ambient Intelligence
and Humanized Computing, 11(10), 4385-4397.

[13] S. Hergovind and V. S. Harsh, "A Business Intelligence Perspective for Churn Management," Procedia Social And Behavioral Sciences, pp. 51-56, 2014.

[14] Elghoul, M., & Elghoul, M. A. (2021). Hybrid model for predicting
customer churn in OTT platforms using machine learning algorithms.
Journal of Computational Science, 48, 101258.

[15] B. Michel and d. P. DirkVan, "Customer event history for churn prediction: How long is long enough?," Expert Systems with Applications, pp. 13517-13522, 2012.
