Applied Predictive Analytics - Caselets
Chapter 1
Caselet
One of the most successful digital marketing methods has been email marketing.
But to get the best ROI from a digital marketing campaign, the email must be sent to
a targeted audience that will be interested in the promotion.
A direct response company that sold books and DVDs had developed a large database
of existing customers, along with their historical purchases and their responses to previous
email campaigns. The company repeatedly faced the challenge of identifying the right target
audience for new promotions. It spent considerable time and money creating multiple samples
of customers, sending each new email campaign to each of the samples, analyzing the
responses, and finding which sample audience responded best. These sample segmentations
were created manually and were not necessarily the best possible. The process also delayed
email campaigns and wasted a considerable amount of money.
The company hired an analyst who applied predictive analytics to find the best
target audience for each campaign. Instead of manually creating sample segments, the emails
were sent to a random subset of the audience. Based on the responses, the analyst identified
the key characteristics of those who responded to the test mailing and, from these, assigned a
score to each existing customer. This score could then be used to determine which customers
to mail.
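A minimal sketch of this scoring approach in Python, using pandas and scikit-learn. The customer data below is synthetic and the column names are illustrative, not the company's actual schema.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Toy stand-in for the customer database (columns are illustrative)
customers = pd.DataFrame({
    "num_past_purchases": rng.integers(0, 20, n),
    "days_since_last_purchase": rng.integers(1, 720, n),
    "got_test_mailing": rng.integers(0, 2, n),
})
# Synthetic response flag for the random test mailing
p_respond = 0.05 + 0.01 * customers["num_past_purchases"]
customers["responded"] = ((customers["got_test_mailing"] == 1)
                          & (rng.random(n) < p_respond)).astype(int)

features = ["num_past_purchases", "days_since_last_purchase"]
test = customers[customers["got_test_mailing"] == 1]

# Learn which characteristics distinguish responders from non-responders
model = LogisticRegression(max_iter=1000).fit(test[features], test["responded"])

# Score every existing customer; a higher score means more likely to respond
customers["response_score"] = model.predict_proba(customers[features])[:, 1]

# Mail only the top-scoring customers
to_mail = customers.sort_values("response_score", ascending=False).head(1000)
```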
Chapter 2
Caselet
A nonprofit organization wants to recover lapsed donors. But what defines a lapsed donor? Is
it a donor who has not given any gift in the past 6 months, 1 year, or 2 years? Only a domain
expert in the organization can define this objective.
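A minimal sketch of how the chosen definition might be turned into a flag, assuming a hypothetical `donations` table with one row per gift; the column names and cutoff are illustrative.

```python
import pandas as pd

donations = pd.DataFrame({
    "donor_id": [1, 1, 2, 3],
    "gift_date": pd.to_datetime(["2023-01-15", "2024-06-01",
                                 "2022-03-10", "2024-09-20"]),
})

as_of = pd.Timestamp("2024-12-31")
last_gift = donations.groupby("donor_id")["gift_date"].max()

# Here the expert chose "no gift in the past year"; the cutoff itself is a
# business decision, not a modeling one
lapsed = last_gift < (as_of - pd.DateOffset(years=1))
print(lapsed)  # donor 2 is lapsed; donors 1 and 3 are not
```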
Do we need to predict whether a lapsed donor can be recovered, or do we need to predict
how much a lapsed donor will donate? If we predict only the likelihood of recovering a donor,
we may end up favoring low-dollar donors. Alternatively, predicting the amount a lapsed donor
will donate would help achieve a higher ROI.
What data is available? Is it live data or archived data? If it is archived data, what is the
process to get access to that data, and how long will it take? The project should be planned
based on realistic answers to these questions.
Different predictive models can be built for this problem. At the beginning of the project, the
evaluation criteria must be decided, that is, the metric that will be used to judge how well a
model meets the business objective.
Last but not least, the deployment team must be informed of the deployment requirements as
early as possible, so that potential obstacles to deploying the final model can be identified
early.
Chapter 3
Caselet
A nonprofit organization wants to recover lapsed donors. It has been decided that all donors
who have not contributed within the last year are lapsed donors.
The data analyst gets access to the required data. He finds that the organization has
migrated from storing data in spreadsheets to a CRM application. The spreadsheets store all
date fields as “MM/DD/YY” (where YY can denote years from 1930 to 2029), while the CRM
stores them as binary date-time values. The data analyst flags this as one of the fields that
requires cleaning during the data preparation task.
The data analyst then tries to gain insight into each of the variables. He notices that some
zip codes have very few donors. On analysis, he finds that some records in the spreadsheets
have the leading zeroes stripped from the zip code; for example, “9263” is stored instead of
“09263”. The data analyst flags this as another cleaning task that will be part of data
preparation.
The data analyst identifies problems in the data, such as missing values, outliers, spikes,
and high cardinality, so that these can be fixed during data preparation.
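The kinds of checks behind these findings can be automated. Below is a minimal profiling sketch with pandas, assuming a hypothetical `donors` DataFrame; the checks and the 3-standard-deviation threshold are illustrative.

```python
import pandas as pd

def profile(df: pd.DataFrame) -> None:
    """Print missing-value rates, cardinality, and rough outlier counts."""
    for col in df.columns:
        missing = df[col].isna().mean()
        cardinality = df[col].nunique()
        print(f"{col}: {missing:.1%} missing, {cardinality} distinct values")
        if pd.api.types.is_numeric_dtype(df[col]):
            # Flag values beyond 3 standard deviations as potential outliers
            z = (df[col] - df[col].mean()) / df[col].std()
            print(f"  potential outliers: {int((z.abs() > 3).sum())}")

# Usage (assuming `donors` has been loaded from the merged sources):
# profile(donors)
```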
Chapter 4
Caselet
A nonprofit organization wants to recover lapsed donors. The data analyst has identified most
of the problems in the data, and now those must be fixed. He fixes the problems as follows:
Dates found in the spreadsheets are converted from “MM/DD/YY” (where YY
can denote years from 1930 to 2029) to “MM/DD/YYYY”.
Zip codes are corrected by prefixing zeroes where they are missing: “9263” is changed to
“09263”.
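A minimal sketch of these two fixes, assuming the spreadsheet fields arrive as plain strings; the year pivot follows the stated 1930-2029 rule.

```python
def fix_two_digit_year(date_str: str) -> str:
    """Convert MM/DD/YY to MM/DD/YYYY, with YY pivoting at 1930-2029."""
    mm, dd, yy = date_str.split("/")
    year = int(yy)
    century = 1900 if year >= 30 else 2000  # 30-99 -> 19xx, 00-29 -> 20xx
    return f"{mm}/{dd}/{century + year}"

def fix_zip(zip_code: str) -> str:
    """Restore leading zeroes stripped by the spreadsheet."""
    return zip_code.zfill(5)

print(fix_two_digit_year("07/04/45"))  # 07/04/1945
print(fix_two_digit_year("07/04/05"))  # 07/04/2005
print(fix_zip("9263"))                 # 09263
```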
Less than 0.1% of the data in the spreadsheets is found to contain non-numeric values for
the donated amount, for example “100 dollars”, “250 USD”, etc. The data analyst computes the
average donation made by the same donor from that donor's other records and replaces the bad
value with this imputed value. Records that cannot be imputed this way are removed.
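A minimal sketch of this amount fix with pandas, on a toy `gifts` table; `pd.to_numeric(..., errors="coerce")` turns the non-numeric strings into NaN so they can be imputed per donor.

```python
import pandas as pd

gifts = pd.DataFrame({
    "donor_id": [1, 1, 1, 2],
    "amount": ["50", "70", "100 dollars", "250 USD"],
})

# Parse what is numeric; non-numeric entries become NaN
gifts["amount_clean"] = pd.to_numeric(gifts["amount"], errors="coerce")

# Impute each donor's bad values with that donor's average from other records
donor_mean = gifts.groupby("donor_id")["amount_clean"].transform("mean")
gifts["amount_clean"] = gifts["amount_clean"].fillna(donor_mean)

# Donors with no usable records at all (like donor 2 here) are dropped
gifts = gifts.dropna(subset=["amount_clean"])
```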
The gender field contains “M”, “F”, “Male”, “Female”, “n/a”, and “”. The data analyst
converts all “Male” values to “M” and all “Female” values to “F”. He considers removing the
records that have “n/a” or “” for gender, but those turn out to be 45% of the data. Changing all
the missing gender values to “M” or “F” would also skew the data. He finally decides to convert
all missing genders to “D” (did not respond) and use that value in the model. Based on the
accuracy of the model, he can always revisit this in the next iteration, if required.
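A minimal sketch of the gender fix, on a toy `donors` DataFrame containing the raw values described above.

```python
import pandas as pd

donors = pd.DataFrame({"gender": ["M", "F", "Male", "Female", "n/a", ""]})

# Normalize spellings, then map the two missing-value forms to "D"
mapping = {"Male": "M", "Female": "F", "n/a": "D", "": "D"}
donors["gender"] = donors["gender"].replace(mapping).fillna("D")

print(donors["gender"].value_counts())  # M: 2, F: 2, D: 2
```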
Similarly, the data analyst corrects all problems in the data.
Chapter 5
Caselet
A nonprofit organization wants to approach previous donors for its current donation drive.
Contacting all existing donors would be a waste of time and money. Also, contacting those who
recently donated will have a negative impact on their future donations. How can the nonprofit
organization identify the donors to contact for its current donation drive?
In this scenario, descriptive modeling (unsupervised learning) methods are used to discover
the best way to segment the data.
The following are some of the questions that can be answered with this type of modeling (a
sketch of the corresponding summaries follows the list):
1. In which age range do a large number of people give their first donation?
2. In which age range do people give their largest donation?
3. How does the size of donation vary by gender?
4. How does the size of donation vary by marital status?
From this analysis, a proper strategy can be created for soliciting donations from the different
segments.
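A minimal sketch of computing the four summaries above with pandas group-bys; the `donors` DataFrame, its columns, and the age bands are hypothetical.

```python
import pandas as pd

donors = pd.DataFrame({
    "age_at_first_gift": [23, 35, 41, 58, 62, 29],
    "largest_gift": [25, 100, 250, 500, 1000, 50],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "marital_status": ["single", "married", "married",
                       "married", "widowed", "single"],
})

# Questions 1-2: how first gifts and largest gifts distribute across age bands
age_bands = pd.cut(donors["age_at_first_gift"], bins=[18, 30, 45, 60, 100])
print(donors.groupby(age_bands, observed=True)["largest_gift"]
      .agg(["count", "mean"]))

# Questions 3-4: donation size by gender and by marital status
print(donors.groupby("gender")["largest_gift"].mean())
print(donors.groupby("marital_status")["largest_gift"].mean())
```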
Chapter 7
Caselet
A nonprofit organization has collected a lot of historical data on its donors and decides to
cluster-analyze the donors into five different groups. The clustering generated groups ranging
in size from single digits to a few thousand.
The input variables included Date of Birth, Gender, Marital Status, Number of
Children, Annual Income, Highest Education, Domicile State and many more. While it is
simple to visualize clusters formed with two inputs, the reality is that many cluster models are
created from dozens of inputs. Interpreting cluster models is a challenge for predictive
modelers, because there are no clear and standard metrics like those used to assess supervised
learning models.
In this case, the predictive modeler used the ANOVA technique to identify the most
important variables.
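A minimal sketch of that idea on synthetic data: a one-way ANOVA F-test per input, grouped by cluster label, ranks the inputs by how strongly they separate the clusters.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
n, k = 500, 5
labels = rng.integers(0, k, n)  # synthetic cluster assignments

# Two hypothetical inputs: income varies by cluster, children does not
annual_income = 40_000 + 5_000 * labels + rng.normal(0, 3_000, n)
num_children = rng.integers(0, 4, n).astype(float)

for name, values in [("annual_income", annual_income),
                     ("num_children", num_children)]:
    groups = [values[labels == c] for c in range(k)]
    f_stat, p_val = f_oneway(*groups)
    print(f"{name}: F = {f_stat:.1f}, p = {p_val:.3g}")
# A large F (small p) means the variable differs sharply across clusters,
# so it is important for interpreting them
```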
Chapter 9
Caselet
A nonprofit organization wants to recover lapsed donors. The business objective is to build a
classification model that predicts the likelihood (as a binary flag) that a lapsed donor will
respond to a mailing. The predictive modeler creates multiple models using the following
algorithms:
1. Decision trees
2. Logistic regression
3. K-nearest neighbor
4. Naïve Bayes
Now, a decision must be made about which model to deploy in production. This is done by
assessing model accuracy. The best model is the one that optimizes the business objective.
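One way such an assessment might be carried out with scikit-learn is to compare the four candidates on a common cross-validated metric; the synthetic data and the choice of ROC AUC below are stand-ins for the organization's real data and its business-driven metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the lapsed-donor data
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=15),
    "naive_bayes": GaussianNB(),
}

# Compare all candidates on the same cross-validated metric
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```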
Chapter 10
Caselet
A nonprofit organization wants to recover lapsed donors. The business objective is to build a
classification model that predicts the likelihood (as a binary flag) that a lapsed donor will
respond to a mailing. The predictive modeler creates multiple models using the following
algorithms:
1. Decision trees
2. Logistic regression
3. K-nearest neighbor
4. Naïve Bayes
Traditionally, we would select the single best model for deployment. In the ensemble approach,
we deploy them all. The reason for this is improved accuracy. In fact, model ensembles not
only improve model accuracy, they can also improve model robustness. By averaging multiple
models into a single prediction, no single model dominates the final predicted value, reducing
the likelihood that a flaky prediction will be made.
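A minimal sketch of such an ensemble, again with scikit-learn on synthetic data: soft voting averages the four models' predicted probabilities, so no single model dominates the final prediction.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# voting="soft" averages predict_proba across the member models
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=15)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)
auc = cross_val_score(ensemble, X, y, cv=5, scoring="roc_auc").mean()
print(f"ensemble: mean AUC = {auc:.3f}")
```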