
W1 Lecture1 May8 2023

The document outlines the objectives and processes of a data mining course, emphasizing the identification of business problems and the application of predictive analytics. It discusses the significance of data mining in modern business, including its role in enhancing customer targeting and improving marketing strategies. Additionally, it highlights the importance of skilled personnel and common software used in data mining, while providing examples of its applications across various industries.

Uploaded by

Almaas Zafar
Copyright
© All Rights Reserved

Lecture 1

Desired Outcomes: Data Mining

From a Top-Line Perspective:

1. How to identify a business problem that is relevant for data mining
2. What process we undertake to solve a given business problem
3. How to develop a discipline and approach in dealing with data to solve business problems
4. What different types of statistics and techniques are used to solve a given business problem
5. What steps are used in building and evaluating a predictive model
6. What the different types of segmentation are and how they are used in data mining
Desired Outcomes: Data Mining

• Exploring the more recent advancements in data science
– Big Data
– Text Mining
– Artificial Intelligence
• An understanding of the significance and role of the future data scientist
– The Hybrid
What This Course Will NOT Do
• Teach you all the statistics you need to do data mining
• Replace real-world experience analyzing databases
• Turn you into an immediate data mining practitioner
What This Course WILL Do
• Help you understand how and when to use data mining
• Assist you in talking to data miners (internally or externally)
• Begin your training and provide an initial foundation of knowledge as
a data miner
What is Data Mining?
• The process of exploration and analysis, by automatic means, of large quantities of data to discover meaningful patterns and rules
• What does this mean from a business standpoint?
– Capitalizing on the above learning to maximize ROI for a given business process
What is Data Mining?
• Data mining is revolutionizing business today
• The old business paradigms are no longer acceptable
• Companies recognize their information as a critical asset
• The most successful companies in the coming millennium will be able to intelligently utilize this information for profit-maximization decisions
Why the Growth in Data Mining?
• Marketers are no longer revenue-driven, but ROI-driven
• Organizations are becoming customer-centric vs. product-centric
• Information is recognized as an asset, one that can be used across all industry sectors and beyond just the marketing domain
• Too much noise and confusion in the marketplace
• Societal changes include:
– Consumers are time-conscious
– Emphasis on quality and value
– Aging population
– Emphasis on “What's in it for me?”
Why the Growth in Data Mining?
• Technological Changes
– Increased storage and processing capacity in an environment of constantly falling costs
– Increased use of statistical tools and software for enhancing business decision-making
• One-to-One Marketing is Becoming the “Norm”
– Increased emphasis on developing customer loyalty programs
– Information represents a critical requirement in developing customer loyalty programs
– Mining the above information intelligently is the key to successful customer loyalty programs
• The Web and Big Data
– Easy and timely access to large volumes of data
Data Mining as a Profession
• The most important asset for successful data mining is people
• Successful hiring factors to look for are:
– Quantitative skills
– Business and problem-solving skills
– Programming skills
– Knowledge of data structures, file structures, system structures and their integration
– Communication skills and the ability to liaise with marketing and systems departments
• Which of the above will become less important in an increasingly automated work environment?
Common Software
• SAS (Enterprise Miner, Base SAS)
• Power BI
• Tableau
• Statistica
• KXEN
• Angoss Knowledge Studio
• R
• Python
• Impala
• DataRobot
• Alteryx
• etc.
Common Applications
• Fraud detection
• Marketing
• Drug testing
• Quality control
• Credit scoring
• Crime prevention/security
• Inventory control and planning
• Sports
• IoT/Predictive Maintenance
Common Marketing Applications
1) Acquisition of new customers
2) Developing up-sell strategies
3) Developing cross-sell strategies
4) Reducing customer defection
5) Creation of target customer groups for existing customer marketing programs
6) Campaign management analysis
7) Identifying high-value and high-potential-value customers
8) Product affinity and bundling analysis
9) Retail site location analysis and product distribution analysis

One of the primary objectives of data mining is to align marketing investment with customer potential.
Data Mining in Today’s Environment

• Use of recommender engines to determine what to sell or offer next

We all know the business value. But what is the other inherent value that Shopify offers?

What is the real value behind Facebook, or how did Facebook really monetize its business proposition?

Data Mining in Today’s Environment

What are the techniques used by these technologies?

How can they respond?

How do they use data?

2 Main Approaches
• Advanced
• Non-Advanced

Data Mining in Today’s Environment

• AI is not new
• Why the popularity of AI?
• How does AI fit in?
• Is it a part of data mining?
• What is more important: data or math?

Data Mining in Today’s Environment

What does the mobile phone do?

Is it capturing data without you clicking on anything?

How can it be used?

Data Mining in Today’s Environment

• ChatGPT
• Massive amounts of data can be analyzed to:
– Conduct massive research
– Write essays, articles, books, and exams
– Create new art/paintings
– Create programming code
– Create almost anything, as long as there is a base or source of data to work with
• Where will the human fit in?

Where is AI versus the brain?
• Time-saving and better
• But is it enough?
• What can the human do that AI cannot?
• Notice the lower parts of the brain
– Look familiar?
Improving Business Results
• Data Mining is about identifying opportunities to improve business results
• This may be achieved by identifying segments of customers that outperform
others based on certain business objectives (an objective function)
• For example, the results from the predictive model below identify customers more or less likely to respond to a particular direct mail (DM) offer
Mass Marketing: Same Investment for All Customers
[Chart: marketing investment ($/customer, low to high) vs. customer value/potential (low to high)]

Align Marketing Investment with Customer Potential
[Chart: marketing investment ($/customer, low to high) vs. customer value/potential (low to high)]
The Big Picture…
• Effective customer segmentation in the analysis phase
drives program planning, execution of communications,
and program measurement
Analyse
· Develop knowledge base
· Define segments
· Rank
Plan
· Develop investment strategy
· Develop proper communications and contact strategies
Interact
· Create communications
· Engage customer via channels
· Solicit feedback
Measure
· Track and measure

[Diagram: Customer Valuation / Segment Creation; Develop Customer Value Strategies; Develop Targeted Communications and Measurement Systems; supported by Customer Knowledge Management, Proposition Development and Customer Contact Management; all resting on Information and Learning]

Applying Predictive Analytics

[Chart: revenue and profit over time across three phases (Efficient Acquisition, Accelerated and Higher Growth, Longer Relationship), with a separate line for data mining and analytics cost]

• Predictive analytics is about reducing cost and increasing profits (red versus blue line)

Four Stages of Data Mining
The Data Mining Process:
Problem Identification Stage
1) Problem Identification

Role of Business:
• Identify overall business strategy
Role of Data Miner:
• Identification and prioritization of business strategy components which can be resolved through predictive analytics
Role of Systems:
• Provide information regarding the current data environment

• Example: Improve retention results. What is the data mining impact?


The Data Mining Process:
Creation of the Analytical File
Role of Data Miner:
• Conduct preliminary data diagnostics:
– Understand sources of data that are used in the data mining project
– Determination of links and keys between files
– Frequency distributions on all fields on all files
Role of Systems:
• Acts as data consultant to the data miner:
– Source file extractions
– Data dumps
– Data dictionary
– File layouts
– Star schema
– Data nuances/interpretations
The Data Mining Process:
Application of Data Mining Techniques

Role of Business:
• Have a clear understanding of the key information within the data mining solution
• Have a clear understanding of how the data mining solution performs from a business perspective
• Have a clear understanding of how to use the data mining solution in a future campaign
Role of Data Miner/Analyst:
• Design appropriate reports to communicate the final data mining solution and its expected performance
• Consult and advise on how the data mining solution should be used and tracked in a future campaign
The Data Mining Process:
Implementation/Tracking
Role of Business:
• Review current results of the solution vs. results of the solution achieved through learning development
• Based on the objectives of the marketing initiative, determine what needs to be tracked
Role of Data Miner:
• Apply the solution to the database for the upcoming campaign
• Validate application of the solution by checking a random dump of 10 records
• Produce results
• Based on marketing needs for tracking, create a tracking matrix and codes that meet the tracking objectives
Role of Systems:
• Assist or run the program to apply the data mining solution and tracking matrix to the database for the upcoming campaign; this will be applicable if solutions are hard coded within the IT infrastructure
What is the Impact of Data Mining?
• First Example: Increase the number of orders from 100,000 to 200,000
– Is this caused by data mining?
• Second Example: Increase the order rate per customer from 1% to 2% with total orders decreasing by 100,000
– Is this caused by data mining?
• A third example, listed below, illustrates the impact of data mining (assume a promotion cost of $1.00 per customer)

Scenario     # of Customers   # of Orders   Order Rate   Cost per Order
Scenario 1   1,000,000        20,000        2%           $50.00
Scenario 2   500,000          20,000        4%           $25.00

And I have saved $500,000 while achieving the same number of orders.
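A minimal Python sketch of the third example's arithmetic, assuming (as the slide does) a flat $1.00 promotion cost per customer contacted. The function name campaign_summary is hypothetical; it only recomputes order rate, total promotion cost and cost per order for each scenario.

```python
# Flat promotion cost per customer contacted, as assumed on the slide.
COST_PER_CUSTOMER = 1.00

def campaign_summary(customers, orders, cost_per_customer=COST_PER_CUSTOMER):
    """Return (order rate, total promotion cost, cost per order)."""
    total_cost = customers * cost_per_customer
    return orders / customers, total_cost, total_cost / orders

s1 = campaign_summary(1_000_000, 20_000)  # scenario 1: contact everyone
s2 = campaign_summary(500_000, 20_000)    # scenario 2: contact a targeted half

for name, (rate, cost, cpo) in (("Scenario 1", s1), ("Scenario 2", s2)):
    print(f"{name}: order rate {rate:.0%}, cost ${cost:,.0f}, cost/order ${cpo:.2f}")

savings = s1[1] - s2[1]
print(f"Savings for the same 20,000 orders: ${savings:,.0f}")
```

Holding orders constant is what makes the comparison fair: the savings come entirely from contacting fewer, better-chosen customers.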


Problem Identification
• How does data mining impact the business?
– Example 1: Promotional campaign to 500,000 customers; promotion cost per piece is $1.00
– Assume data mining can bring a 10% improvement in performance for all campaigns
• What is the potential data mining impact here?
– We need to identify the performance metric


Problem Identification
• How does data mining impact the business?
– Example 1: Direct mail campaign to 500,000 customers; promotion cost per customer is $1.00

Scenarios             # of Customers   Response Rate   # of Responders   Promotion Cost
With Data Mining      500,000          1.1%            5,500             $500,000
Without Data Mining   550,000          1.0%            5,500             $550,000
$ Opportunity                                                            $50,000

– Note: the calculation is an opportunity cost; it calculates the additional promotional cost needed to achieve 5,500 responders without data mining
Problem Identification
– Example 2: Outbound telemarketing campaign to 300,000 customers; promotion cost per person is $6.00

Scenarios             # of Customers   Response Rate   # of Responders   Promotion Cost
With Data Mining      300,000          1.1%            3,300             $1,800,000
Without Data Mining   330,000          1.0%            3,300             $1,980,000
$ Opportunity                                                            $180,000

– Example 3: Email campaign to 1,000,000 customers with a cost per promotion of $0.10

Scenarios             # of Customers   Response Rate   # of Responders   Promotion Cost
With Data Mining      1,000,000        1.1%            11,000            $100,000
Without Data Mining   1,100,000        1.0%            11,000            $110,000
$ Opportunity                                                            $10,000

– Of the three examples, which campaign would you focus your data mining activities on?
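To compare the three campaigns side by side, here is a small Python sketch of the opportunity-cost calculation used in the tables, assuming a 1.0% base response rate and a 10% relative lift from data mining for every campaign, as the slides do. The Campaign class and dollar_opportunity function are hypothetical names for illustration.

```python
from dataclasses import dataclass

@dataclass
class Campaign:
    name: str
    customers: int            # customers contacted with data mining
    cost_per_contact: float   # promotion cost per customer ($)
    base_rate: float = 0.010  # response rate without data mining (1.0%)
    lift: float = 0.10        # assumed relative improvement from data mining

def dollar_opportunity(c: Campaign) -> float:
    """Extra promotion cost needed to reach the same responders without data mining."""
    improved_rate = c.base_rate * (1 + c.lift)    # 1.0% -> 1.1%
    responders = c.customers * improved_rate      # responders with data mining
    customers_without = responders / c.base_rate  # contacts needed at the base rate
    return round((customers_without - c.customers) * c.cost_per_contact, 2)

campaigns = [
    Campaign("Direct mail", 500_000, 1.00),
    Campaign("Telemarketing", 300_000, 6.00),
    Campaign("Email", 1_000_000, 0.10),
]

# Rank campaigns by $ opportunity, largest first.
for c in sorted(campaigns, key=dollar_opportunity, reverse=True):
    print(f"{c.name:<14} ${dollar_opportunity(c):>10,.0f}")
```

Algebraically the extra contacts simplify to customers × lift, so the opportunity is customers × lift × cost per contact: volume of records and cost per effort drive the result, not the response rate itself.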
Problem Identification

• Data mining in business is about efficiency
• Cost per effort and volume of records become the determinants in assessing the $ opportunity of a given business initiative
• What about risk?
– It is still the same mindset
– You have a cost per effort in attempting to reduce risk
– And you have a number of persons to whom you will deploy your strategy, each with its cost per effort
– What is the metric we are trying to optimize?
Problem Identification

• You have the following decile table and are asked to assess the $ opportunity of deploying a data mining strategy against the top 40% of the customer base. The data mining strategy is about targeting the right customers in order to reduce risk in the most effective manner. Cost per effort is $5.00 and the average credit risk rate is 2.5%.

Deciles     Credit Risk   # of Customers
0-10%       6%            10,000
10%-20%     5%            10,000
20%-30%     3%            10,000
30%-40%     2%            10,000
….
90%-100%    0.50%         10,000

What is the $ opportunity?

Problem Identification
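• The $ opportunity is:

Scenarios             # of Customers   Credit Risk   # of Identified Persons (Business Losses)   Total Cost of Effort ($5.00/person)
With Data Mining      40,000           4.00%         1,600                                       $200,000
Without Data Mining   64,000           2.50%         1,600                                       $320,000
$ Opportunity                                                                                    $120,000

• The top 40% has an average credit risk of (6% + 5% + 3% + 2%) / 4 = 4.00%, so 40,000 × 4% = 1,600 persons are identified; reaching the same 1,600 at the 2.5% average rate requires 1,600 / 2.5% = 64,000 customers

But again, what drives the $ opportunity?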
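The same calculation can be sketched in Python directly from the top four rows of the decile table. The slide elides the middle deciles; they are not needed here because only the top 40% is targeted. Variable names are illustrative.

```python
# Top four deciles from the slide: (credit risk rate, # of customers).
top_deciles = [
    (0.06, 10_000),
    (0.05, 10_000),
    (0.03, 10_000),
    (0.02, 10_000),
]
COST_PER_EFFORT = 5.00   # $ per person contacted
AVERAGE_RISK = 0.025     # average credit risk rate across the whole file

targeted = sum(n for _, n in top_deciles)              # 40,000 customers
identified = sum(rate * n for rate, n in top_deciles)  # expected risky customers
targeted_rate = identified / targeted                  # average risk in the top 40%

# Without data mining, customers are reached at the average risk rate,
# so finding the same number of risky customers needs more contacts.
untargeted = identified / AVERAGE_RISK

cost_with = targeted * COST_PER_EFFORT
cost_without = untargeted * COST_PER_EFFORT
opportunity = cost_without - cost_with

print(f"Identified: {identified:,.0f} at {targeted_rate:.2%}")
print(f"Cost with: ${cost_with:,.0f}; without: ${cost_without:,.0f}")
print(f"$ opportunity: ${opportunity:,.0f}")
```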

Identifying Data Mining Opportunities
• Explore the organization's key business challenges
• Determine if improved customer/prospect targeting or segmentation would improve results
• Review the following questions:
– Are the overall business results reasonable?
– Is the product or service in a stable business environment?
– What is the current data environment?
– What type of budgets are available?
– What type of margins does the product or service contribute to the organization?
– How many customers or prospects do you currently target?
– Will the results of your data mining exercise be actionable based on the results you are trying to improve?
– What is Big Data, and how do we utilize it?
The Rule of Thumb

• Based on experience, data mining will not improve consumer behaviour by more than a factor of 5 to 1 from the average
• Company A has 10,000 customers enrolled in a service that is renewed on an annual basis
• Each year only 10% of all customers renew their service
• Their renewal rates for other products and services average 70%

• Should data mining be used to improve retention?
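A short Python sketch of the arithmetic behind the rule of thumb, reading "5 to 1 from the average" as a ceiling: the best segment that data mining can isolate performs at most about 5 times the average rate. That reading, and the best_case_rate helper, are assumptions for illustration.

```python
MAX_LIFT = 5.0  # rule of thumb: best segments rarely beat the average by more than 5:1

def best_case_rate(average_rate, max_lift=MAX_LIFT):
    """Upper bound on the rate a data mining-selected segment can reach."""
    return min(1.0, average_rate * max_lift)

company_a_renewal = 0.10  # Company A's current renewal rate
benchmark_renewal = 0.70  # average renewal rate for other products and services

ceiling = best_case_rate(company_a_renewal)
print(f"Best-case targeted renewal rate: {ceiling:.0%}")  # 50%
print("Ceiling reaches benchmark:", ceiling >= benchmark_renewal)  # False
```

Even the most optimistic targeting stays well short of the 70% seen elsewhere, which suggests the gap lies outside what better targeting alone can close.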

Example 2: Identifying Data Opportunities
• Company B has 1,000,000 customers and has been cross-selling a long distance phone plan for over 2 years
• Over the last 6 months, acquisition results have declined by 20% and the cost per new plan member has increased beyond target levels (30% increase)
• Should data mining be used to improve results?
• What might you do?
• Rule of thumb: data mining should not be expected to lift desired behaviour by more than 5 to 1.
Example 3: Identifying Data Opportunities
Art vs. Science

• A retail company collects no information on its customers
• Market research has indicated that the key drivers of purchase behaviour are high income, female gender and immigrant status
– No individual-level information
– Information is available only at the aggregate or postal code level
– Advantages of using advanced statistical techniques are minimized within this data environment
– Quicker and simpler solutions will suffice
Example 3: Identifying Data Opportunities
The Solution:
• Using an “RFM” index approach, create a postal code index based on three Statistics Canada variables:
– Median taxfiler income of postal code
– % of population female within postal code
– % of population landed immigrants within postal code

                      Income     % Female   % Landed Immig.
Average Postal Code   $40,000    52%        5%
M5A 1J2               $50,000    60%        10%
Index                 1.25       1.15       2.00

• Each variable's index is the postal code's value divided by the average (e.g., $50,000 / $40,000 = 1.25)

The index for M5A 1J2 is (0.33 × 1.25) + (0.33 × 1.15) + (0.33 × 2.00) = 1.45
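A Python sketch of this equal-weight postal code index. The slide rounds each weight of 1/3 to 0.33; the sketch shows both, so the small difference is visible. The function and dictionary key names are hypothetical.

```python
def postal_code_index(area, average, weights=None):
    """Weighted sum of each variable's ratio to the average postal code."""
    keys = list(average)
    if weights is None:
        weights = {k: 1 / len(keys) for k in keys}  # exact equal weights
    return sum(weights[k] * area[k] / average[k] for k in keys)

average = {"income": 40_000, "pct_female": 52, "pct_immigrant": 5}
m5a_1j2 = {"income": 50_000, "pct_female": 60, "pct_immigrant": 10}

slide_weights = {k: 0.33 for k in average}  # the slide rounds 1/3 to 0.33
print(round(postal_code_index(m5a_1j2, average, slide_weights), 2))  # 1.45
print(round(postal_code_index(m5a_1j2, average), 2))                 # 1.47
```

Keeping weights as a parameter makes it easy to test unequal weightings later, for example if research suggests income matters more than the other two drivers.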
Example 3: Identifying Data Opportunities
• This index scheme can then be used to score each postal code
• The 800,000 postal codes in Canada are then ranked into 20 half-deciles (5% intervals) based on descending index score

% of File   # of Postal Codes in Interval   Minimum Index   # of Prospects
0-5%        40,000                          5.50            80,000
5-10%       40,000                          5.00            60,000
10-15%      40,000                          4.80            90,000
…
95-100%     40,000                          0.05            30,000
Total       800,000                                         3,000,000

• How might this retailer use the above tool?
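The ranking step can be sketched in Python: sort postal codes by descending index score, split them into 20 equal groups, and report the count and minimum index per group, mirroring the table's columns. The 40 scores below are hypothetical stand-ins, not Statistics Canada data.

```python
def half_decile_groups(scores):
    """Return a group number per score: 1 = top 5% of index scores, 20 = bottom 5%."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    groups = [0] * n
    for position, i in enumerate(order):
        groups[i] = position * 20 // n + 1  # equal-sized groups when n is a multiple of 20
    return groups

def group_summary(scores, groups):
    """Count and minimum index score per group, like the slide's table."""
    summary = {}
    for score, g in zip(scores, groups):
        count, low = summary.get(g, (0, float("inf")))
        summary[g] = (count + 1, min(low, score))
    return dict(sorted(summary.items()))

# Hypothetical index scores for 40 postal codes (2 per half-decile).
scores = [i / 10 for i in range(40)]
groups = half_decile_groups(scores)
for g, (count, low) in group_summary(scores, groups).items():
    print(f"group {g:>2}: {count} postal codes, minimum index {low:.2f}")
```

A retailer could then, for instance, select the postal codes in the top few groups for a flyer drop, using the minimum index as the cut-off score.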


Example 4
• An SVP of a large bank has spent thousands of dollars
creating a credit card response model
• The predictive model identifies those who are most likely to respond to the bank's next offer
• The model will allow the bank to save considerable money
– By mailing only 20% of the prospects, they will generate 70% of all the responders
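A quick Python sketch of what the 20%/70% claim means in lift terms. The cumulative_lift helper is a hypothetical name; lift here is simply the share of responders captured divided by the share of prospects mailed.

```python
def cumulative_lift(share_contacted, share_of_responders):
    """Ratio of responder capture to contact share; 1.0 means no better than random."""
    return share_of_responders / share_contacted

lift = cumulative_lift(0.20, 0.70)
responders_forgone = 1 - 0.70  # responders left in the 80% not mailed

print(f"Lift at 20% of file: {lift:.1f}")                    # 3.5
print(f"Responders not reached: {responders_forgone:.0%}")  # 30%
```

The model is efficient, but the 30% of responders it leaves behind is exactly what matters if the goal is the maximum number of responders.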
Example 4
• “But I need the maximum number of responders”

• Attaining even 70% of the responders will not meet the campaign expectations
• What is the real problem here?
– Data mining is not always necessary

Example Problems
• A response model was built to acquire new customers. Model was never used.
Given what you have learned, what might have happened here? Assume the
development of the model is sound

• Targeting models have been built and are performing very well. Overall profit is
still declining. What could be causing this?

• A PhD in Mathematics was hired to head up the Data Science Department. His
team quietly developed a number of very sophisticated tools under his sole
direction without input from other areas. None of the developed tools were ever
used and within six months, he left the organization. What might have contributed
to this situation?

• You are asked to build a customer response model. What would be your first three
questions in undertaking this project?
The Key in All These Examples Is Asking the Right Question
• What is a good question?
– Creates the need for further questions
– Identifies other options in identifying the problem
– Digs deep into the situation
– Avoids “whys”
– Avoids short yes or no answers
– Builds confidence between the analyst and the business stakeholder
– Can involve multiple stakeholders in obtaining the right answer

Question types
• Focus
– What concerns do you have?
– What do you think about…
• Feeling
– How have you been affected?
– What is your perception?
• Observation
– What do you see/hear/smell?
• The above feedback can yield insights into the stakeholder’s overall expectations
Question types
• Vision
– What is the overall vision/direction?
– How does this project fit in with the overall vision?
– How will we measure success that aligns with the overall vision?
• Change
– What could be changed?
– How would we change direction?
– How might change result in success?
– What are the pros and cons of change?
• Analysis
– What has been done in the past and what are the results?
– What is your interpretation of the results and insights?
– Have you had the requisite data to support your analysis?
– What is the key overall learning from your historical analysis?
Statistics Review: Mean
• Definition: the sum of all the values in a sample divided by the number of values in the sample

$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$

• It is also referred to as the arithmetic mean, the average or the arithmetic average
• Example: Average monthly credit card spend for 10 customers

Customer   Monthly Spend
1          $150
2          $125
3          $175
4          $100
5          $75
6          $110
7          $90
8          $140
9          $130
10         $1,000
Total      $2,095
Average    $209.50

$2,095 / 10 = $209.50

Statistics Review: Median
• Definition: the value above which half the values lie and below which the other half lie; it is the midpoint of the ordered distribution
• Notice we have rank ordered our sample to get the median

Customer   Monthly Spend
5          $75
7          $90
4          $100
6          $110
2          $125
9          $130
8          $140
1          $150
3          $175
10         $1,000
Sum        $2,095
Average    $209.50

• In our example, we have an even number of sample points, hence the median is the mean of the middle two points: ($125 + $130) / 2 = $127.50
• If we had 11 points, the median would be the 6th ranked point

Statistics Review
• Why do we need to look at median in some cases?
– Looking at the mean alone can give misleading results if there are outlier values
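A small Python sketch using the standard library's statistics module on the 10-customer spend sample, showing how the single $1,000 value pulls the mean far more than the median.

```python
import statistics

spend = [150, 125, 175, 100, 75, 110, 90, 140, 130, 1_000]  # customers 1-10

print(statistics.mean(spend))    # 209.5
print(statistics.median(spend))  # 127.5

# Drop the $1,000 outlier and compare again.
without_outlier = [x for x in spend if x != 1_000]
print(round(statistics.mean(without_outlier), 2))  # 121.67
print(statistics.median(without_outlier))          # 125
```

Removing one customer moves the mean by almost $88 but the median by only $2.50.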

Statistics Review: Mode
• Definition: the value within a distribution which occurs most frequently
• Within our sample data, there is no mode
• That is, there is no value which is repeated; all the values are unique

Rank      Monthly Spend
1         $75
2         $90
3         $100
4         $110
5         $125
6         $130
7         $140
8         $150
9         $175
10        $1,000
Sum       $2,095
Average   $209.50

• A histogram is the graphical representation of a frequency distribution
• The vertical axis represents the frequency (or count), while the horizontal axis represents the class or actual occurrences within a distribution
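A Python sketch that checks for a mode with collections.Counter and builds the frequency distribution a histogram would plot. The $100-wide classes are an illustrative choice, not from the slide.

```python
from collections import Counter

spend = [150, 125, 175, 100, 75, 110, 90, 140, 130, 1_000]

counts = Counter(spend)
top_count = max(counts.values())
# No mode when every value occurs exactly once.
mode = None if top_count == 1 else [v for v, c in counts.items() if c == top_count]
print("Mode:", mode)  # None

# Frequency distribution behind a histogram, using $100-wide classes.
BIN_WIDTH = 100
freq = Counter(v // BIN_WIDTH * BIN_WIDTH for v in spend)
for start in sorted(freq):
    print(f"${start}-${start + BIN_WIDTH - 1}: {'#' * freq[start]}")
```

The text histogram makes the long right tail visible at a glance: seven customers in the $100-$199 class and one isolated at $1,000.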
Basic Distributional Theory
• Distributional theory is the foundation for all advanced statistical analysis
• Consider the following three distributions:
[Figure: a symmetrical distribution and asymmetrical (skewed) distributions]

Symmetrical Distribution
• For a symmetrical distribution, the mean, median and mode are equal
• The normal distribution is a symmetrical distribution with special properties
[Figure: symmetrical curve with the mean, median and mode at the same point]
Notice how these align!

Asymmetrical Distribution
• For an asymmetrical distribution, the mean, median and mode are NOT equal
• We generally refer to these distributions as skewed distributions
• What does this lead us to?
• How is the data dispersed, or how does the data vary?
[Figure: skewed curve with the mode, median and mean at different points]
Notice how these DO NOT align!

Central Tendency
• With the foundation of basic distributional theory, we can look beyond the central tendency of our distribution (mean, median, mode) to how the values are spread around it:
• Range
• Standard Deviation
• Skewness

Range
• Definition: the difference between the largest and smallest values in a data set
– Example: Average monthly credit card spend for 10 customers
range = $1,000 - $75
range = $925

Rank   Monthly Spend
1      $75
2      $90
3      $100
4      $110
5      $125
6      $130
7      $140
8      $150
9      $175
10     $1,000

Standard Deviation
• Definition: a measure of the amount by which the values in a sample differ from their mean
• It is the square root of the variance
– The variance is also referred to as the second moment about the mean

$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$

• Example: Average monthly credit card spend for 10 customers
Standard deviation = $279.32
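A Python sketch computing the range and the sample standard deviation (N - 1 in the denominator, matching the formula above) for the same 10 customers, then cross-checking against statistics.stdev, which also uses the sample formula.

```python
import math
import statistics

spend = [150, 125, 175, 100, 75, 110, 90, 140, 130, 1_000]

value_range = max(spend) - min(spend)

n = len(spend)
mean = sum(spend) / n
variance = sum((x - mean) ** 2 for x in spend) / (n - 1)  # sample variance
std_dev = math.sqrt(variance)

print(value_range, round(std_dev, 2))      # 925 279.32
print(round(statistics.stdev(spend), 2))   # 279.32, same sample formula
```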


Capping of outliers

• In the real world of data mining, all datasets are asymmetric
• This is not really an issue unless the data are continuous and contain extreme outliers
• How do we handle it?
– Sort records by variable value, from lowest to highest, into 100 centiles
– Take the top (100th) centile, calculate its mean and standard deviation, then add 2 standard deviations to that mean to get the capped value
• Ex: with a top-centile mean of 1,000 and a standard deviation of 100: 1,000 + 2 × 100 = 1,200 is the capped value at the high end
– For the lowest centile, where values are negative, calculate its mean and standard deviation, then subtract 2 standard deviations from that mean
• Ex: with a lowest-centile mean of -500 and a standard deviation of 75: -500 - 2 × 75 = -650 is the capped value at the low end
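A minimal Python sketch of the capping rule, under a few assumptions: a centile is taken as len(values) // 100 records (at least 2, so a standard deviation exists), the sample standard deviation is used, and the low-end cap is applied only when the lowest centile contains negative values. The centile_cap name and the test data are hypothetical.

```python
import statistics

def centile_cap(values, k=2.0):
    """Cap values at mean + k std. dev. of the top centile and, when the
    lowest centile has negative values, at mean - k std. dev. of that centile."""
    ordered = sorted(values)
    size = max(2, len(ordered) // 100)  # records per centile; stdev needs >= 2
    top, bottom = ordered[-size:], ordered[:size]

    high_cap = statistics.mean(top) + k * statistics.stdev(top)
    low_cap = None
    if min(bottom) < 0:  # low-end cap only when values are negative
        low_cap = statistics.mean(bottom) - k * statistics.stdev(bottom)

    capped = []
    for v in values:
        v = min(v, high_cap)
        if low_cap is not None:
            v = max(v, low_cap)
        capped.append(v)
    return capped, high_cap, low_cap

# Hypothetical data: 999 ordinary values plus one extreme outlier.
data = list(range(1, 1000)) + [100_000]
capped, high_cap, low_cap = centile_cap(data)
print(round(high_cap, 1), max(capped) == high_cap, low_cap)
```

Only the extreme value is pulled in to the cap; every other record passes through unchanged, which preserves the rank order while limiting the outlier's influence on means and standard deviations.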

Standard Deviation
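• For a binomial distribution, such as response, we must use a different formula:

$\sqrt{\frac{p \times q}{N}}$, where p is the response rate and $q = 1 - p$

1   Responder
0   Non-Responder
0   Non-Responder
1   Responder
0   Non-Responder
1   Responder
0   Non-Responder
0   Non-Responder
0   Non-Responder
0   Non-Responder
0.300   Mean
0.145   St. Dev.

• Strictly, $\sqrt{pq/N} = \sqrt{0.3 \times 0.7 / 10} \approx 0.145$ is the standard error of the response rate; the standard deviation of the individual 0/1 values is $\sqrt{pq} \approx 0.458$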
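A Python sketch reproducing the 0.300 and 0.145 figures from the ten 0/1 responses, alongside the spread of the individual values for comparison.

```python
import math

responses = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]  # 1 = responder, 0 = non-responder

n = len(responses)
p = sum(responses) / n                  # response rate
q = 1 - p
standard_error = math.sqrt(p * q / n)   # formula on the slide
unit_std_dev = math.sqrt(p * q)         # spread of individual 0/1 values

print(round(p, 3), round(standard_error, 3), round(unit_std_dev, 3))  # 0.3 0.145 0.458
```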

Standard Deviation
• What does 2 standard deviations mean?
– For an approximately normal distribution, about 95% of values fall within 2 standard deviations of the mean
• Suppose in a sample of 500 people, we have the following:
– Average age is 42
– Standard deviation of age is 5 years
• What can we communicate to business users about this sample?
– Assuming ages are roughly normally distributed, about 95% of the people in the sample are between 32 and 52 years old (42 ± 2 × 5)
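A Python sketch of the ±2 standard deviation band, using statistics.NormalDist (Python 3.8+). It assumes ages follow a normal distribution with the sample's mean and standard deviation, which real age data may only approximate.

```python
from statistics import NormalDist

mean_age, sd_age = 42, 5
low, high = mean_age - 2 * sd_age, mean_age + 2 * sd_age
print(low, high)  # 32 52

# Share of a normal distribution lying within +/-2 standard deviations.
ages = NormalDist(mu=mean_age, sigma=sd_age)
share = ages.cdf(high) - ages.cdf(low)
print(f"{share:.1%}")  # 95.4%
```

The exact normal figure is slightly above 95%; the 95% multiplier is 1.96 rather than 2, which is why "2 standard deviations" is a convenient approximation.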

Are They the Same?
• Consider the following two distributions ...

           Distribution A   Distribution B
           0                4,500
           1,000            4,600
           2,000            4,700
           3,000            4,800
           4,000            4,900
           5,000            5,000
           6,000            5,100
           7,000            5,200
           8,000            5,300
           9,000            5,400
           10,000           5,500
Mean       5,000            5,000
St. Dev.   3,316.62         331.66

Are They the Same?
• Even though both A and B have the same mean, the standard
deviation of A is 10 times that of B, hence they are NOT the same

[Chart: Distribution A (blue) vs. Distribution B (purple), plotted across observations 1 to 11]
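A Python sketch confirming that the two distributions share a mean but differ tenfold in sample standard deviation.

```python
import statistics

dist_a = list(range(0, 10_001, 1_000))   # 0, 1,000, ..., 10,000
dist_b = list(range(4_500, 5_501, 100))  # 4,500, 4,600, ..., 5,500

for name, d in (("A", dist_a), ("B", dist_b)):
    print(name, statistics.mean(d), round(statistics.stdev(d), 2))

ratio = statistics.stdev(dist_a) / statistics.stdev(dist_b)
print(round(ratio, 1))  # 10.0
```

Reporting the mean alone would make A and B look identical; the standard deviation is what separates them.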
