Churn of Telecom Subscribers (1)
As you have learnt by now, predictive modeling – the process of “scoring” and targeting customers for a
marketing campaign – is a significant database marketing tool and an important component of a firm’s
customer relationship management (CRM) effort. The promise of predictive modeling is the ability to
predict what actions customers will take, thereby allowing firms to target their marketing efforts more
effectively. One area of particular importance is customer “churn,” in this case, customer voluntary churn,
when current customers decide to take their business elsewhere or voluntarily terminate their service.
Annual churn rates have been reported to be in the 20% - 40% range for telecommunication and other
technology industries. This puts a premium on developing models that accurately predict which
customers are most likely to churn, so proactive steps (e.g. appropriate communication and treatment
programs) can be taken to prevent customers from churning. The purpose of this assignment is to determine
which method(s) work best for predicting churn, thereby enhancing our overall understanding of
predictive modeling.
The Teradata Center for Customer Relationship Management at Duke University (the Center) has shared
a dataset regarding the churn of telecommunications customers. The data consist of calibration and
validation samples of customers from a major wireless telecommunications company. The calibration
sample includes observed churn and a set of potential predictor variables. The two validation samples
include the same predictor variables, but no churn variable. Your group, which has been hired as a
consultant to the telecommunications company, is required to submit predictions of each customer's
likelihood to churn.
The Wireless Industry: Over the years, the wireless sector has been one of the fastest-growing businesses
in the economy. Driven by a unique value proposition (freedom and connectivity), the number of
subscribers doubled every two years during the 1990s. Wireless stocks grew as fast as those of many
dot-coms, start-ups emerged everywhere, and IPOs raised record amounts of money.
These events shaped the new telecommunications landscape as we know it today.
Industry Turmoil: Despite the vertiginous levels of growth and promise, serious challenges to industry
profitability have recently emerged:
(a) Consolidation: Of the nearly 60 cellular companies in the US, virtually all are now bankrupt, bought
out, or struggling with heavy debts. Only six big players now account for 80% of the wireless pie. We
have seen such consolidation in India too, where there are four major players.
(b) Growth: With 1.2 billion subscribers, India is currently the second-largest telecom market in the
world and has seen rapid expansion over the past years. The industry has grown primarily due to
favourable regulatory conditions, low prices, increased accessibility, and the introduction of Mobile
Number Portability (MNP). The telecom sector is set to grow at a Compound Annual Growth Rate (CAGR) of
9.4% from 2020 to 2025. Within it, the smartphone industry in India is forecast to grow fastest, at a
CAGR of 15.9% over the forecast period. As growth increases, however, so does competition.
(c) Competition: As an obvious result (and to the consumer’s delight), firms engaged in a devastating
price war that not only eroded revenue growth but also endangered their ability to meet their titanic
debts.
(d) Customer Strategy: The industry paradigm has arguably changed from one of “make big networks, get
customers” to “make new services, please customers.” In short, the industry has moved from an
acquisition orientation to a retention orientation.
The Elusive Customer: Until now, firms have been able to acquire customers without much effort.
Demand for wireless services has been such that if a customer decided to drop his service and switch to
another carrier, a new customer was right behind him. The priority was to keep the customer
acquisition rate high, often at the expense of customer retention. But this situation has changed. As the
well of wireless subscribers has begun to run dry, churn (the customer’s decision to end the relationship
and switch to another company) has become a major concern. Last year the industry average churn rate
was 20% - 25% annually, which translates to approximately 2% churn per month. In other words,
companies lose about 2% of their customers every month. Statistics for the third quarter of 2001 (the
data are a bit old, but the pattern is still similar) show even higher annual churn rates, in the range of
28% - 46%.
Churn rates for major carriers - Q3 2001
Carrier            Monthly churn   Annual churn
Nextel             2.1%            28%
VoiceStream        5%              46%
Sprint PCS         3%              31%
AT&T Wireless      3.1%            37%
Cingular           3.2%            34%
Verizon Wireless   2.2%            31%
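The rule of thumb quoted earlier, that 20% - 25% annual churn corresponds to roughly 2% per month, can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes a constant monthly churn rate applied to the customers who remain each month (compounding), and the function names are hypothetical. Reported carrier figures may use a different convention, such as simply multiplying the monthly rate by 12, so they need not match this calculation exactly.

```python
def annual_churn_from_monthly(monthly_rate):
    """Annual churn implied by a constant monthly churn rate.

    Assumption: each month the same fraction of the remaining
    customers leaves, so the surviving share after 12 months is
    (1 - monthly_rate) ** 12.
    """
    return 1.0 - (1.0 - monthly_rate) ** 12


def monthly_churn_from_annual(annual_rate):
    """Constant monthly churn rate that compounds to the given annual rate."""
    return 1.0 - (1.0 - annual_rate) ** (1.0 / 12)


if __name__ == "__main__":
    # 2% per month, compounded over a year
    print(f"annual churn at 2% monthly: {annual_churn_from_monthly(0.02):.1%}")
    # Monthly rates implied by the 20% - 25% annual range
    for annual in (0.20, 0.25):
        print(f"monthly churn at {annual:.0%} annual: "
              f"{monthly_churn_from_annual(annual):.2%}")
```

Under the compounding assumption, 2% per month works out to about 21.5% per year, which falls inside the 20% - 25% range given in the text.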
Data Description
The data have been generously provided to the Center by a major wireless carrier. The data are
organized into three data files: Calibration, Current Score Data, and Future Score Data.
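The Gini Coefficient is calculated as:
Gini = (2/n) × Σ (𝑣𝑖 − 𝑣̂𝑖), summed over customers i = 1, …, n
where:
n = number of customers,
𝑣𝑖 = % of churners whose predicted probability of churn is equal to or higher than that of customer i,
𝑣̂𝑖 = % of customers whose predicted probability of churn is equal to or higher than that of customer i.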
𝑣𝑖 is the height of the method’s cumulative lift curve at the ith most likely predicted-to-churn customer,
and 𝑣̂𝑖 is the height of the random cumulative lift curve. The difference provides the “length” for
calculating the area between the random and method prediction curves. The term 1/n approximates the
“width” on the x-axis. The Gini Coefficient sums these lengths-times-widths across customers, providing
an approximation to the area between the method’s lift curve and the random lift curve. The calculation
is multiplied by “2” to ensure that the maximum possible Gini Coefficient is 1.
The Gini Coefficient for Method A in the above figure is 0.84; the Gini for Method B is 0.69. Random
prediction will achieve a Gini of 0 (as seen in the formula above, since for random prediction 𝑣̂𝑖 = 𝑣𝑖), and
a higher Gini corresponds to more separation between the method’s lift curve and random, which
means better prediction.
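The calculation described above can be sketched in a few lines of Python (the deliverable itself is an R script, and the same logic ports directly). This is a minimal illustration, not an official scoring routine. It reads the description as Gini = (2/n) × Σ (𝑣𝑖 − 𝑣̂𝑖), ranks customers from highest to lowest predicted probability, and treats the customer at rank i as having 𝑣̂𝑖 = i/n. The function name, the argument names, and the arbitrary handling of tied scores are all assumptions.

```python
def gini_coefficient(churned, predicted_prob):
    """Approximate area between a model's cumulative lift curve and random.

    churned        -- list of 0/1 flags, 1 if the customer actually churned
    predicted_prob -- list of predicted churn probabilities, same order

    Assumption: ties in predicted_prob are broken by input order.
    """
    n = len(churned)
    if n == 0 or n != len(predicted_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    total_churners = sum(churned)
    if total_churners == 0:
        raise ValueError("at least one churner is required")

    # Rank customers from most to least likely to churn.
    order = sorted(range(n), key=lambda i: predicted_prob[i], reverse=True)

    churners_so_far = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        churners_so_far += churned[idx]
        v = churners_so_far / total_churners  # height of the method's lift curve
        v_hat = rank / n                      # height of the random lift curve
        total += v - v_hat                    # "length" at this customer
    return 2.0 * total / n                    # each "width" is 1/n, then times 2


if __name__ == "__main__":
    actual = [1, 1, 0, 0]
    good_scores = [0.9, 0.8, 0.2, 0.1]  # churners ranked first
    bad_scores = [0.1, 0.2, 0.8, 0.9]   # churners ranked last
    print(gini_coefficient(actual, good_scores))
    print(gini_coefficient(actual, bad_scores))
```

One consequence of this discrete form is that a perfect ranking scores 1 minus the churn rate in the sample, which is close to 1 only when churners are a small share of customers. Comparisons between methods are therefore most meaningful on the same sample.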
Deliverables:
You, as a group, are required to work on the assignment and prepare a report detailing your
articulation of the problem, including an outline of how you plan to address the issue, the specific tasks
carried out for the purpose (data wrangling, iterative model building, etc.), the usefulness of the
models developed, explained with the help of relevant outputs, and the insights gained from them that
helped you arrive at the final recommendation. This report needs to be uploaded in Word/pdf format. Note
that the report should include the results that you obtained and the plots that you developed, with due
interpretation. In addition, you need to upload an R script file that runs all the code you
used to develop the model. If you use any other tools, like Tableau or Excel, you are required to
upload those files too. You are also required to upload a video presentation detailing the
process that you followed and its outcome, including the interpretation of the various results
that you obtained and your final recommendation of the model. This video should not exceed
15 minutes in duration.
The evaluation will be based on the detailed explanation and interpretation incorporated in the report
and the video presentation. In addition, you will be given due credit for incorporating concepts beyond
those discussed in the classes. Although the data are a bit old, the learnings
are as relevant as ever.
Your submissions will be checked for plagiarism and, if found to be plagiarized, will be heavily penalized.