W1 Lecture1 May8 2023

The document outlines the objectives and processes of a data mining course, emphasizing the identification of business problems and the application of predictive analytics. It discusses the significance of data mining in modern business, including its role in enhancing customer targeting and improving marketing strategies. Additionally, it highlights the importance of skilled personnel and common software used in data mining, while providing examples of its applications across various industries.

Uploaded by

Almaas Zafar

Lecture 1

Desired Outcomes: Data Mining

1. From a Top Line Perspective:

1. How to identify a business problem that is relevant for data mining
2. What is the process we undertake to solve a given business problem.
3. Develop a discipline and approach in dealing with data to solve
business problems
4. What are the different types of statistics and techniques that are used
to solve a given business problem
5. Steps used in building and evaluating a predictive model.
6. What are the different types of segmentation and how are they used
in data mining
Desired Outcomes: Data Mining

• Exploring the more recent advancements in data science

– Big Data
– Text Mining
– Artificial Intelligence

• An understanding of the significance and role of the future data scientist
– The Hybrid
What This Course Will NOT Do
• Teach you all the statistics you need to do data mining
• Replace real-world experience analyzing databases
• Turn you into an immediate data mining practitioner
What This Course WILL Do
• Help you understand how and when to use data mining
• Assist you in talking to data miners (internally or externally)
• Begin your training and provide an initial foundation of knowledge as
a data miner
What is Data Mining?
• The process of exploration and analysis, by automatic means, of large
quantities of data to discover meaningful patterns and rules

• What does this mean from a business standpoint?

– Capitalizing on the above learning to maximize ROI for a given business process
What is Data Mining?
• Data Mining is revolutionizing business today

• The old business paradigms are no longer acceptable

• Companies recognize their information as a critical asset

• The most successful companies in the coming millennium will be able to intelligently utilize this information for profit-maximization decisions
Why the Growth in Data Mining?
• Marketers are no longer revenue-driven, but ROI-driven

• Organizations are becoming customer-centric vs. product-centric

• Recognition of information as an asset, and one that can be used across all industry sectors, beyond just the marketing domain

• Too much noise and confusion in the market place

• Societal changes include:

– Consumers are time conscious
– Emphasis on quality and value
– Aging population
– Emphasis on “What's in it for me?”
Why the Growth in Data Mining?
• Technological Changes
– Increased storage and processing capacity at constantly falling cost
– Increased use of statistical tools and software for enhancing business decision-making

• One-to-One Marketing is Becoming the “Norm”

– Increased emphasis on developing customer loyalty programs
– Information represents a critical requirement in developing customer loyalty
programs
– Mining the above information intelligently is the key to successful
customer loyalty programs

• The Web and Big Data

– Easy and timely access to large volumes of data
Data Mining as a Profession
• The most important asset for successful data mining is people
• Successful hiring factors to look for are:
– Quantitative skills
– Business and problem-solving skills
– Programming skills
– Knowledge of data structure, file structure, system structure and their integration
– Communication skills and ability to liaise with marketing and systems departments

• Which of the above will become less important in an increasingly automated work environment?
Common Software
• SAS (Enterprise Miner, Base SAS)
• Power BI
• Tableau
• Statistica
• KXEN
• Angoss Knowledge Studio
• R
• Python
• Impala
• DataRobot
• Alteryx
• etc
Common Applications
• Fraud detection
• Marketing
• Drug testing
• Quality control
• Credit scoring
• Crime prevention/security
• Inventory control and planning
• Sports
• IOT/Predictive Maintenance
Common Marketing Applications
1) Acquisition of new customers
2) Developing up-sell strategies
3) Developing cross-sell strategies
4) Reducing customer defection
5) Creation of target customer groups for existing customer marketing
programs
6) Campaign management analysis
7) Identifying high value and high potential value customers
8) Product affinity and bundling analysis
9) Retail site location analysis and product distribution analysis

One of the primary objectives of data mining is to align marketing investment with customer potential
Data Mining in Today’s Environment

• Use of recommender engines to determine what to sell or offer

We all know the business value. But what is the other inherent value that Shopify offers?

What is the real value behind Facebook, or how did Facebook really monetize its business proposition?
Data Mining in Today’s Environment

What are the techniques used by these technologies?

How can they respond?

How do they use data?

2 Main Approaches
• Advanced
• Non-Advanced

Data Mining in Today’s Environment

• AI is not new
• Why the popularity of AI?
• How does AI fit in?
• Is it a part of data mining?
• What is more important?
• Data vs. Math
Data Mining in Today’s Environment

What does the mobile phone do?

Is it capturing data without you clicking on anything?

How can it be used?

Data Mining in Today’s Environment

• ChatGPT
• Massive amounts of data can be analyzed
– Conduct massive research
– Write essays, articles, books, and exams
– Create new art/paintings
– Create programming code
– Can create almost anything as long as there is a base or
source of data to work with.
• Where will the human fit in?
Where is AI versus the brain?
• Time saving and better
• But is it enough?
• What can the human do that AI cannot?

• Notice the lower parts of the brain

– Look familiar?
Improving Business Results
• Data Mining is about identifying opportunities to improve business results
• This may be achieved by identifying segments of customers that outperform
others based on certain business objectives (an objective function)
• For example, the results from the predictive model below identify customers
more or less likely to respond to a particular DM offer
Mass Marketing: Same Investment for All Customers
[Figure: marketing investment ($/customer, low to high) versus customer value/potential (low to high)]
Align Marketing Investment with Customer Potential
[Figure: marketing investment ($/customer, low to high) versus customer value/potential (low to high)]
The Big Picture…
• Effective customer segmentation in the analysis phase
drives program planning, execution of communications,
and program measurement
Analyse
· Develop knowledge base
· Define segments
· Rank
Plan
· Develop investment strategy
· Develop communications and contact strategies
Interact
· Create communications
· Engage customer via proper channels
· Solicit feedback
Measure
· Track and measure

[Figure: process diagram; labels include Customer Valuation, Segment Creation, Customer Value Strategies, Targeted Communications, Measurement Systems, Customer Proposition Development, Customer Knowledge Management, Customer Contact Management, and Information and Learning]
Applying Predictive Analytics

[Figure: revenue, profit, and data mining and analytics cost plotted over time, with stages labelled Efficient Acquisition, Accelerated and Higher Growth, and Longer Relationship]

• Predictive analytics is about reducing cost and increasing profits (red versus blue line)
Four Stages of Data Mining
The Data Mining Process:
Problem Identification Stage
1) Problem Identification

Role of Business:
• Identify overall business strategy
Role of Data Miner:
• Identification and prioritization of business strategy components which can be resolved through predictive analytics
Role of Systems:
• Provide information regarding current data environment

• Example: Improve retention results. What is the data mining impact?

The Data Mining Process:
Creation of the Analytical File
Role of Business:
• Understand sources of data that are used in data mining project
Role of Data Miner:
• Conduct preliminary data diagnostics:
– Source file extractions
– Data dumps
– Determination of links and keys between files
– Frequency distributions on all fields on all files
Role of Systems:
• Acts as data consultant to data miner:
– Data Dictionary
– File Layouts
– Star Schema
– Data Nuances/Interpretations
The Data Mining Process:
Application of Data Mining Techniques

Role of Business:
• Have clear understanding of the key information within data mining solution
• Have clear understanding of how data mining solution performs from business perspective
• Have clear understanding of how to use data mining solution in future campaign
Role of Data Miner/Analyst:
• Design appropriate reports to communicate final data mining solution and its expected performance
• Consult and advise on how data mining solution should be used and tracked in future campaign
The Data Mining Process:
Implementation/Tracking
Role of Business:
• Review current results of solution vs. results of solution achieved through development
• Based on objectives of marketing initiative, determine what needs to be tracked
Role of Data Miner:
• Apply solution to database for upcoming campaign
• Validate application of learning by checking random dump of 10 records
• Produce results
• Based on marketing needs for tracking, create tracking matrix and codes that meet the tracking objectives
Role of Systems:
• Assist or run program to apply data mining solution and tracking matrix to database for upcoming campaign; this will be applicable if solutions are hard coded within IT infrastructure
What is the Impact of Data Mining?
• First Example: Increase number of orders from 100,000 to 200,000
– Is this caused by data mining?
• Second Example: Increase the order rate per customer from 1% to
2% with total orders decreasing by 100,000
– Is this caused by data mining?
• A third example, listed below, illustrates the impact of data mining
(assume promotion cost of $1.00 per customer)

Scenario      # of Customers   # of Orders   Order Rate   Cost per Order
Scenario 1         1,000,000        20,000           2%           $50.00
Scenario 2           500,000        20,000           4%           $25.00

Scenario 2 saves $500,000 while achieving the same number of orders
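The arithmetic behind the two scenarios can be checked with a short sketch. The function name `scenario_metrics` is made up for illustration; the inputs are the figures from the slide ($1.00 promotion cost per customer).

```python
def scenario_metrics(customers, orders, cost_per_customer=1.00):
    """Return order rate, total promotion cost, and cost per order."""
    promotion_cost = customers * cost_per_customer
    return {
        "order_rate": orders / customers,
        "promotion_cost": promotion_cost,
        "cost_per_order": promotion_cost / orders,
    }

scenario_1 = scenario_metrics(1_000_000, 20_000)
scenario_2 = scenario_metrics(500_000, 20_000)

# Same number of orders, half the promotion spend.
savings = scenario_1["promotion_cost"] - scenario_2["promotion_cost"]

print(scenario_1)
print(scenario_2)
print("Savings:", savings)
```

The point of the comparison is that the order count is unchanged; the gain shows up only in the cost per order.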

Problem Identification
• How does data mining impact the business?
– Example 1: Promotional Campaign to 500,000 customers; promotion cost per
piece is $1.00

– Assume data mining can bring 10% improvement in performance for all
campaigns
• What is the potential data mining impact here?

– We need to identify the performance metric

Problem Identification
• How does data mining impact the business?
– Example 1: Direct Mail Campaign to 500,000 customers; promotion cost per
customer is $1.00

Scenarios              # of Customers   Response Rate   # of Responders   Promotion Cost
With Data Mining              500,000            1.1%             5,500         $500,000
Without Data Mining           550,000            1.0%             5,500         $550,000
$ Opportunity                                                                    $50,000

– Note: the calculation is an opportunity cost; it calculates the additional
promotional cost to achieve 5,500 responders without data mining
Problem Identification
– Example 2: Outbound telemarketing campaign to 300,000 customers;
promotion cost per person is $6.00

Scenarios              # of Customers   Response Rate   # of Responders   Promotion Cost
With Data Mining              300,000            1.1%             3,300       $1,800,000
Without Data Mining           330,000            1.0%             3,300       $1,980,000
$ Opportunity                                                                   $180,000

– Example 3: Email campaign to 1,000,000 customers with cost per promotion of $0.10

Scenarios              # of Customers   Response Rate   # of Responders   Promotion Cost
With Data Mining            1,000,000            1.1%            11,000         $100,000
Without Data Mining         1,100,000            1.0%            11,000         $110,000
$ Opportunity                                                                    $10,000

– Of the three examples, which campaign would you focus your data mining
activities on?
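The three campaign tables follow the same opportunity-cost calculation, which can be sketched as one function. The name `dollar_opportunity` is hypothetical, and the 1.0% base response rate and 10% improvement are the assumptions stated in the examples.

```python
def dollar_opportunity(customers, cost_per_contact, base_rate=0.01, lift=0.10):
    """Extra promotion cost needed to match the data-mined responder count
    when contacting customers at the base response rate."""
    mined_rate = base_rate * (1 + lift)          # 1.0% -> 1.1%
    responders = customers * mined_rate          # responders with data mining
    customers_without = responders / base_rate   # contacts needed without it
    return round((customers_without - customers) * cost_per_contact, 2)

# (number of customers, promotion cost per contact) from the three examples
campaigns = {
    "direct mail": (500_000, 1.00),
    "telemarketing": (300_000, 6.00),
    "email": (1_000_000, 0.10),
}

opportunity = {name: dollar_opportunity(n, cost) for name, (n, cost) in campaigns.items()}
best = max(opportunity, key=opportunity.get)

for name, value in opportunity.items():
    print(f"{name}: ${value:,.0f}")
print("Largest $ opportunity:", best)
```

Algebraically the result reduces to customers x lift x cost per contact, which is why volume and cost per effort drive the ranking.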
Problem Identification

• Data Mining in business is about efficiency

• Cost per effort and volume of records become the
determinants in assessing the $ opportunity of a given business
initiative
• What about risk?
– It is still the same mindset
– You have a cost per effort that is attempting to reduce risk
– And you have a number of persons against whom you will deploy your strategy,
each at that cost per effort
– What is the metric we are trying to optimize?
Problem Identification

• You have the following decile table, where you are asked to assess the $
opportunity of deploying a data mining strategy against the top
40% of the customer base. The data mining strategy is about
targeting the right customers in order to reduce risk in the most
effective manner. Cost per effort is $5.00 and the average credit risk
rate is 2.5%

Deciles      Credit Risk   # of Customers
0-10%              6%              10,000
10%-20%            5%              10,000
20%-30%            3%              10,000
30%-40%            2%              10,000
….
90%-100%        0.50%              10,000

What is the $ opportunity?
Problem Identification
• The $ opportunity is

Scenarios              # of Customers   Credit Risk   # of Identified Business Losses   Cost (at $5.00 per effort)
With Data Mining               40,000         4.00%                             1,600                     $200,000
Without Data Mining            64,000         2.50%                             1,600                     $320,000
$ Opportunity                                                                                             $120,000

But again, what drives the $ opportunity?
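The credit risk figures can be reproduced with the same opportunity-cost logic. This is a sketch using only the top four deciles shown on the slide; the variable names are illustrative.

```python
# (credit risk rate, number of customers) for the top four deciles
top_deciles = [(0.06, 10_000), (0.05, 10_000), (0.03, 10_000), (0.02, 10_000)]
average_risk = 0.025     # average credit risk rate across the whole base
cost_per_effort = 5.00   # dollars per customer actioned

targeted = sum(n for _, n in top_deciles)
identified_losses = sum(rate * n for rate, n in top_deciles)
targeted_risk = identified_losses / targeted

# Customers that would have to be actioned at the average risk rate
# to identify the same number of losses without data mining.
untargeted_needed = identified_losses / average_risk

cost_with = targeted * cost_per_effort
cost_without = untargeted_needed * cost_per_effort
opportunity = round(cost_without - cost_with, 2)

print(targeted, round(targeted_risk, 4), round(identified_losses))
print(round(untargeted_needed), cost_with, round(cost_without, 2), opportunity)
```

As in the campaign examples, the drivers are the volume of records actioned and the cost per effort.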

Identifying Data Mining Opportunities
• Explore the organization's key business challenges
• Determine if improved customer/prospect targeting or segmentation
would improve results
• Review the following questions:

– Are the overall business results reasonable?
– Is the product or service in a stable business environment?
– What is the current data environment?
– What type of budgets are available?
– What type of margins does the product or service contribute to the organization?
– How many customers or prospects do you currently target?
– Will the results of your data mining exercise be actionable based on the results you are trying to improve?
– How do we utilize Big Data, and what is it?
The Rule of Thumb

• Based on experience, data mining will not improve consumer behaviour by more than a factor of 5 to 1 from the average
• Company A has 10,000 customers enrolled in a service that is
renewed on an annual basis
• Each year only 10% of all customers renew their service
• Renewal rates for its other products and services average 70%

• Should data mining be used to improve retention?
Example 2: Identifying Data Opportunities
• Company B has 1,000,000 customers and has been cross-selling a
long distance phone plan for over 2 years
• Over the last 6 months, acquisition results have declined by 20% and
the cost per new plan member has increased beyond target levels
(30% increase)

• Should data mining be used to improve results?

• What might you do?

• Rule of thumb: Data mining should not be expected to improve desired behaviour by more than 5 to 1.
Example 3: Identifying Data Opportunities
Art vs. Science

• A retail company collects no information on its customers

• Market research has indicated that the key drivers of purchase
behaviour are high income, being female, and being an immigrant
– No individual-level information
– Information is available only at aggregate or postal code level
– Advantages of using advanced statistical techniques are minimized within this
data environment
– Quicker and simpler solutions will suffice
Example 3: Identifying Data Opportunities
The Solution:
• Using an “RFM” index approach, create a postal code index based on three Statistics Canada
variables:
• Median taxfiler income of postal code
• % of population female within postal code
• % of population landed immigrants within postal code

                        Income    % Female   % Landed Immig.
Average Postal Code    $40,000         52%                5%
M5A 1J2                $50,000         60%               10%
Index                     1.25        1.15              2.00

The index for M5A 1J2 is (0.33 x 1.25) + (0.33 x 1.15) + (0.33 x 2) = 1.45
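A minimal sketch of the index calculation, assuming equal weights of 0.33 and component indexes rounded to two decimals as on the slide. The dictionary keys are illustrative names, not Statistics Canada field names.

```python
WEIGHT = 0.33  # equal weight applied to each of the three variables

average_postal_code = {"income": 40_000, "pct_female": 0.52, "pct_immigrant": 0.05}
m5a_1j2 = {"income": 50_000, "pct_female": 0.60, "pct_immigrant": 0.10}

# Each component index is the postal code's value relative to the average.
component_index = {
    key: round(m5a_1j2[key] / average_postal_code[key], 2)
    for key in average_postal_code
}

# Weighted sum of the three component indexes.
postal_code_index = round(sum(WEIGHT * value for value in component_index.values()), 2)

print(component_index)
print(postal_code_index)
```

A component index above 1 means the postal code is above average on that variable, so the combined index ranks postal codes by how closely they match the target profile.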
Example 3: Identifying Data Opportunities
• This index scheme can then be used to score each postal code
• The 800,000 postal codes in Canada are then ranked into
20 half deciles based on descending index score

% of File   # of Postal Codes   Minimum Index in Interval   # of Prospects
0-5%                   40,000                        5.50           80,000
5-10%                  40,000                        5.00           60,000
10-15%                 40,000                        4.80           90,000
…
95-100%                40,000                        0.05           30,000
Total                 800,000                                    3,000,000

• How might this retailer use the above tool?
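The ranking step can be sketched as follows. The function `rank_into_intervals` and the demo data are hypothetical (40 made-up postal codes rather than 800,000 real ones), and the sketch assumes the number of postal codes divides evenly into the intervals.

```python
def rank_into_intervals(scored, n_intervals=20):
    """Rank postal codes by descending index and summarize equal-sized intervals.

    `scored` maps postal code -> (index score, number of prospects).
    Assumes len(scored) is a multiple of n_intervals, for simplicity.
    """
    ranked = sorted(scored.items(), key=lambda item: item[1][0], reverse=True)
    size = len(ranked) // n_intervals
    step = 100 // n_intervals
    rows = []
    for i in range(n_intervals):
        chunk = ranked[i * size:(i + 1) * size]
        rows.append({
            "pct_of_file": f"{i * step}-{(i + 1) * step}%",
            "postal_codes": len(chunk),
            "min_index": min(score for _code, (score, _n) in chunk),
            "prospects": sum(n for _code, (_score, n) in chunk),
        })
    return rows

# Hypothetical data: made-up codes, scores, and prospect counts.
demo = {f"PC{i:02d}": ((i + 1) / 10, 10 + i) for i in range(40)}
table = rank_into_intervals(demo)

for row in table[:3]:
    print(row)
```

Reading the table from the top, a retailer could pick a cut-off interval and target only the postal codes above it.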

Example 4
• An SVP of a large bank has spent thousands of dollars
creating a credit card response model
• The predictive model identifies those who are most likely
to respond to the bank's next offer
• The model will allow the bank to save considerable money
– By mailing only 20% of the prospects, they will generate
70% of all the responders
Example 4
• “But I need the maximum number of responders”

• Attaining even 70% of the responders will not meet the campaign
expectations

• What is the real problem here?

→ Data Mining is not always necessary

Example Problems
• A response model was built to acquire new customers. The model was never used.
Given what you have learned, what might have happened here? Assume the
development of the model is sound.

• Targeting models have been built and are performing very well. Overall profit is
still declining. What could be causing this?

• A PhD in Mathematics was hired to head up the Data Science Department. His
team quietly developed a number of very sophisticated tools under his sole
direction without input from other areas. None of the developed tools were ever
used and within six months, he left the organization. What might have contributed
to this situation?

• You are asked to build a customer response model. What would be your first three
questions in undertaking this project?
The key in all these examples is asking the right question
• What is a good question?
– Creates need for further questions
– Identifies other options in identifying problem
– Digs deep into the situation
– Avoids “whys”
– Avoids short yes or no answers
– Builds confidence between the analyst and the business stakeholder
– Can involve multiple stakeholders in obtaining the right answer
Question types
• Focus
– What concerns do you have?
– What do you think about…
• Feeling
– How have you been affected?
– What is your perception?
• Observation
– What do you see/hear/smell?

• The above feedback can yield insights into the stakeholder’s overall expectations
Question types
• Vision
– What is the overall vision/direction?
– How does this project fit in with the overall vision?
– How will we measure success that aligns with the overall vision?
• Change
– What could be changed?
– How would we change direction?
– How might change result in success?
– What are the pros and cons to change?

• Analysis
– What has been done in the past and what are the results?
– What is your interpretation of results and insights?
– Have you had the requisite data to support your analysis?
– What is the key overall learning from your historical analysis?
Statistics Review: Mean
• Definition: the sum of all the values in a sample divided by the
number of values in the sample
• It is also referred to as the arithmetic mean, the average or the
arithmetic average

\[ \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \]

• Example: Average monthly credit card spend for 10 customers
$2,095 / 10 = $209.50

Customer    Spend
1            $150
2            $125
3            $175
4            $100
5             $75
6            $110
7             $90
8            $140
9            $130
10         $1,000
Total      $2,095
Average   $209.50
Statistics Review: Median
• Definition: the value above which half the values lie and below
which the other half lie; it is the balancing point of the distribution
• In our example, we have an even number of sample points, hence the median is
the mean of the middle two points: ($125 + $130) / 2 = $127.50
• Notice we have rank ordered our sample to get the median
• If we had 11 points, the median would be #6

Customer    Spend
5             $75
7             $90
4            $100
6            $110
2            $125
9            $130
8            $140
1            $150
3            $175
10         $1,000
Sum        $2,095
Average   $209.50
Statistics Review
• Why do we need to look at median in some cases?
– Looking at mean can give misleading results if there are outlier values

Statistics Review: Mode
• Definition: the value within a distribution which occurs most
frequently
• Within our sample data, there is no mode
• That is, there is no value which is repeated; all the values are unique
• A histogram is the graphical representation of a frequency distribution
• The vertical axis represents the frequency (or count), while the horizontal axis
represents the class or actual occurrences within a distribution

Rank        Spend
1             $75
2             $90
3            $100
4            $110
5            $125
6            $130
7            $140
8            $150
9            $175
10         $1,000
Sum        $2,095
Average   $209.50
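The mean, median, and mode of the ten-customer sample can be checked with Python's standard `statistics` module. This is a small sketch; the second mean, with the $1,000 customer left out, is added here only to illustrate how a single outlier pulls the mean.

```python
import statistics

# Monthly credit card spend for the 10 customers in the example
spend = [150, 125, 175, 100, 75, 110, 90, 140, 130, 1000]

mean_spend = statistics.mean(spend)      # sum of values / number of values
median_spend = statistics.median(spend)  # mean of the two middle values (even N)

# multimode returns every value tied for most frequent; when all values
# are unique it returns all of them, which means there is no real mode.
modes = statistics.multimode(spend)
has_mode = len(modes) < len(spend)

# Illustration only: the mean without the $1,000 outlier.
mean_without_outlier = statistics.mean(v for v in spend if v != 1000)

print(mean_spend, median_spend, has_mode, round(mean_without_outlier, 2))
```

The median stays close to the bulk of the customers, while the mean is pulled well above nine of the ten values.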
Basic Distributional Theory
• Distributional Theory is the foundation for all advanced
mathematics
• Consider the following three distributions:

[Figure: example distribution curves, labelled Symmetrical distribution and Asymmetrical distribution]
Symmetrical Distribution
• For a symmetrical distribution, the mean, median and mode are
equal
• The normal distribution is a symmetrical distribution with special
properties

[Figure: symmetrical distribution curve; notice how the mean, median and mode align]
Asymmetrical Distribution
• For an asymmetrical distribution, the mean, median and mode are
NOT equal
• We generally refer to these distributions as skewed distributions

• What does this lead us to?

• How is the data dispersed, or how does the data vary?

[Figure: skewed distribution curve; notice how the mode, median and mean DO NOT align]
Central Tendency
• With the foundation of basic distributional theory we can begin to
consider how the data is spread around the central tendency of our distribution
• Range
• Standard Deviation
• Skewness
Range
• Definition: the difference between the largest and
smallest values of a data set
– Example: Average monthly credit card spend for 10 customers
range = $1,000 - $75
range = $925

Rank        Spend
1             $75
2             $90
3            $100
4            $110
5            $125
6            $130
7            $140
8            $150
9            $175
10         $1,000
Standard Deviation
• Definition: a measure of the amount by which the values in a
sample differ from their mean
• It is the square root of the variance
– The variance is also referred to as the second moment about the mean

\[ s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}} \]

• Example: Average monthly credit card spend for 10 customers

Standard deviation = $279.32
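The range and sample standard deviation for the same ten customers can be reproduced as a quick check. The sketch uses `statistics.stdev`, which divides by N - 1 as in the formula above.

```python
import statistics

spend = [75, 90, 100, 110, 125, 130, 140, 150, 175, 1000]

# Range: largest value minus smallest value
spend_range = max(spend) - min(spend)

# Sample standard deviation: square root of the sample variance (N - 1 divisor)
variance = statistics.variance(spend)
std_dev = statistics.stdev(spend)

print(spend_range)
print(round(std_dev, 2))
```

Both measures are dominated by the single $1,000 customer, which is the motivation for capping outliers.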
Capping of outliers

• In the real world of data mining, virtually all datasets are asymmetric

• Not really an issue unless there are extreme outliers in continuous data
• How do we handle?
– Sort records by variable value from lowest value to highest value into 100
centiles
– Take the top (100th) centile, calculate its mean and std. dev., and add 2 std. dev.
to the mean in the top centile to get the capped value
• Ex: with a mean of 1,000 and a std. dev. of 100 in the top centile: 1,000 + (2 x 100) = 1,200 for capped value at high end
– For the lowest centile where values are negative, calculate the std. dev. and
then subtract 2 std. dev. from the mean in the lowest centile
• Ex: with a mean of -500 and a std. dev. of 75 in the lowest centile: -500 - (2 x 75) = -650 for capped value at low end
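The capping steps above can be sketched as follows. The function names and demo data are hypothetical, and the sketch computes a low-end cap for any variable, whereas the slide applies it where the lowest centile holds negative values.

```python
import statistics

def centile_caps(values, n_sd=2):
    """Return (low_cap, high_cap) from the bottom and top centiles.

    Each cap is the centile's mean moved n_sd standard deviations outward.
    Assumes at least 200 records so each centile holds 2 or more values.
    """
    ordered = sorted(values)
    size = len(ordered) // 100  # records per centile
    if size < 2:
        raise ValueError("need at least 200 records to form usable centiles")
    bottom, top = ordered[:size], ordered[-size:]
    low_cap = statistics.mean(bottom) - n_sd * statistics.stdev(bottom)
    high_cap = statistics.mean(top) + n_sd * statistics.stdev(top)
    return low_cap, high_cap

def apply_caps(values, low_cap, high_cap):
    """Pull any value outside the caps back to the nearest cap."""
    return [min(max(v, low_cap), high_cap) for v in values]

# Hypothetical data: 1,000 ordinary values plus one extreme outlier.
values = list(range(1000)) + [50_000]
low_cap, high_cap = centile_caps(values)
capped = apply_caps(values, low_cap, high_cap)

print(round(low_cap, 2), round(high_cap, 2), max(capped) == high_cap)
```

Capping keeps the record in the dataset while limiting the pull that an extreme value has on means and model coefficients.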
Standard Deviation
• For a binomial distribution, such as response, we must use a
different formula

\[ \sqrt{\frac{p \cdot q}{N}} \]

where p is the proportion of responders, q = 1 - p, and N is the sample size

1   Responder
0   Non-Responder
0   Non-Responder
1   Responder
0   Non-Responder
1   Responder
0   Non-Responder
0   Non-Responder
0   Non-Responder
0   Non-Responder
0.300   Mean
0.145   St. Dev.
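The 0.300 and 0.145 figures can be reproduced directly from the ten response flags. This is a small sketch; the quantity computed is what is commonly called the standard error of a proportion.

```python
import math

# 1 = responder, 0 = non-responder, as listed in the example
responses = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]

n = len(responses)
p = sum(responses) / n  # response rate (the mean of the 0/1 flags)
q = 1 - p               # non-response rate

binomial_sd = math.sqrt(p * q / n)

print(round(p, 3), round(binomial_sd, 3))
```

Because N sits in the denominator, the figure shrinks as the sample grows, so larger test campaigns give tighter estimates of the response rate.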
Standard Deviation
• What does 2 standard deviations mean?
– That we are 95% confident that a result from another sample will be within
that confidence range

• Suppose in a sample of 500 people, we have the following:

– Average age is 42
– Standard deviation of age is 5 years

• What can we communicate to business users about this sample?

– Assuming ages are roughly normally distributed, we are 95% confident that values
will fall within 2 standard deviations of the mean of 42, that is, between 32 and 52 (42 ± 2 x 5)
Are They the Same?
• Consider the following two distributions ...

            Distribution A   Distribution B
                         0            4,500
                     1,000            4,600
                     2,000            4,700
                     3,000            4,800
                     4,000            4,900
                     5,000            5,000
                     6,000            5,100
                     7,000            5,200
                     8,000            5,300
                     9,000            5,400
                    10,000            5,500
Mean                 5,000            5,000
St. Dev.          3,316.62           331.66
Are They the Same?
• Even though both A and B have the same mean, the standard
deviation of A is 10 times that of B, hence they are NOT the same

[Figure: chart comparing Distribution A (blue) and Distribution B (purple) across the 11 data points]
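The comparison of the two distributions can be verified with the standard `statistics` module. This is a brief sketch; the two lists simply regenerate the values from the table.

```python
import statistics

dist_a = list(range(0, 10_001, 1_000))  # 0, 1,000, ..., 10,000
dist_b = list(range(4_500, 5_501, 100)) # 4,500, 4,600, ..., 5,500

mean_a, mean_b = statistics.mean(dist_a), statistics.mean(dist_b)
sd_a, sd_b = statistics.stdev(dist_a), statistics.stdev(dist_b)
ratio = sd_a / sd_b

print(mean_a, mean_b)
print(round(sd_a, 2), round(sd_b, 2), round(ratio, 2))
```

Identical means hide a tenfold difference in spread, which is why a measure of dispersion should be reported alongside the mean.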

18 pages
Vibraloc: - The Intelligent Vibration Monitor
No ratings yet
Vibraloc: - The Intelligent Vibration Monitor
2 pages
ASA Notes
No ratings yet
ASA Notes
28 pages
Unit 2 - Instrumentation and Control
No ratings yet
Unit 2 - Instrumentation and Control
16 pages
Test Bank for Statistics for Business and Economics 8th Edition Newbold Carlson Thorne 0132745658 9780132745659 - Full Version Is Available For Instant Download
100% (19)
Test Bank for Statistics for Business and Economics 8th Edition Newbold Carlson Thorne 0132745658 9780132745659 - Full Version Is Available For Instant Download
57 pages
Central Tendency-Ch 4
No ratings yet
Central Tendency-Ch 4
18 pages
Think Stats
100% (2)
Think Stats
142 pages
MD Under Discrete Series
No ratings yet
MD Under Discrete Series
8 pages
Introduction To Statistics - 2023-2024
No ratings yet
Introduction To Statistics - 2023-2024
38 pages


Lecture 1

Desired Outcomes: Data Mining

1. From a Top Line Perspective:

• Exploring the more recent advancements in data science

• An understanding of the significance and role of the future data

• What does this mean from a business standpoint?

• The old business paradigms are no longer acceptable

• Companies recognize their information as a critical asset

• The most successful companies in the coming millennium will

• Organizations are becoming customer centric vs. product centric

• Recognition of information as an asset but one that can be used

• Too much noise and confusion in the marketplace

• Societal changes include:

• One-to-One Marketing is Becoming the “Norm”

• The Web and Big Data

• Which of the above will become less important in an increasingly

One of the primary objectives of data mining is to align

• Use of recommender engines to determine what to sell or offer

We all know the business value. But what is the

What is the real value behind Facebook or how did Facebook

What are the techniques used by these technologies?

How can they respond

How do they use data?

What does the mobile phone do?

Is it capturing data without you clicking on anything?

How can it be used?

• Notice the lower parts of the brain

· Develop knowledge base · Develop investment · Create communications · Track and

Information and Learning

Efficient Accelerated and Higher

• Predictive analytics is about reducing cost and

Role of Business Role of Data Miner Role of Systems

• Example: Improve retention results. What is the data mining impact?

Role of Business Role of Data Miner/ Analyst

• Have clear understanding • Design appropriate reports to

And I have saved $500000 to achieve the same number of orders

– We need to identify the performance metric

Scenarios # of Customers Response Rate # of Responders Promotion Cost

– Example 3: Email campaign to 1,000,000 customers

Scenarios # of Customers Response Rate # of Responders Promotion Cost

• Data Mining in business is about efficiency

What is the $ opportunity

# of Identified Business Cost per

With Data Mining 40,000 4.00% 1,600 $200,000
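The arithmetic behind the "With Data Mining" row can be sketched as follows. This is a minimal illustration that assumes the four figures map to the columns # of Customers, Response Rate, # of Responders and Promotion Cost shown in the earlier scenario tables; the variable names and the two derived cost ratios are not from the slides.

```python
# Figures from the "With Data Mining" row; the column mapping is an assumption.
customers = 40_000
response_rate = 0.04        # 4.00%
promotion_cost = 200_000    # dollars

# Responders are the customers promoted multiplied by the response rate.
responders = round(customers * response_rate)

# Derived ratios (not on the slide): cost per promoted customer and per responder.
cost_per_customer = promotion_cost / customers
cost_per_responder = promotion_cost / responders

print(f"Responders:         {responders:,}")
print(f"Cost per customer:  ${cost_per_customer:,.2f}")
print(f"Cost per responder: ${cost_per_responder:,.2f}")
```

Cost per responder is a convenient way to compare scenarios, since it combines promotion cost and response rate into one number.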

But again, what drives the $ opportunity

• Are the overall business results reasonable?

• Based on experience, data mining will not improve consumer

• Should data mining be used to improve retention?

• Should data mining be used to improve results?

• Rule of thumb: Data mining should expect to maximize desired

• A retail company collects no information on its customers

                      Income    % Female   % Landed Immig.
Average Postal Code   $40,000   52%        5%
M5A 1J2               $50,000   60%        10%
Index                 1.25      1.15       2
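The index row above is each postal code value divided by the corresponding average postal code value. A minimal sketch of that calculation, using the figures from the table; the dictionary keys and the function name `compute_index` are illustrative and not from the slides.

```python
# Values from the table above; key names are illustrative.
average_postal_code = {"income": 40_000, "pct_female": 52, "pct_landed_immigrant": 5}
m5a_1j2 = {"income": 50_000, "pct_female": 60, "pct_landed_immigrant": 10}


def compute_index(area, baseline):
    """Divide each of the area's values by the matching baseline value."""
    return {name: area[name] / baseline[name] for name in baseline}


index = compute_index(m5a_1j2, average_postal_code)
for name, value in index.items():
    print(f"{name}: {value:.2f}")
```

An index above 1 means the postal code is over-represented on that variable relative to the average postal code.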

# of Postal Minimum Index

0-5% 40,000 5.50 80,000

• How might this retailer use the above tool?

• What is the real problem here?

• Data Mining is not always necessary

• The above feedback can yield insights into the

• What does this lead us to?

Mode Mean Notice how these

• Example: Average monthly credit card

Standard deviation = $279.32

• In the real world of data mining, all datasets are asymmetric
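To see how the mode, median and mean separate on an asymmetric dataset, here is a small sketch using Python's standard `statistics` module. The spend values are hypothetical, chosen only to be right-skewed; they are not the data behind the slide's $279.32 standard deviation and will not reproduce it.

```python
import statistics

# Hypothetical monthly credit card spend in dollars (illustrative values only).
# One large spender pulls the right tail out, as is common in customer data.
monthly_spend = [100, 100, 100, 150, 200, 250, 300, 1200]

mode = statistics.mode(monthly_spend)      # most frequent value
median = statistics.median(monthly_spend)  # middle of the sorted values
mean = statistics.mean(monthly_spend)      # arithmetic average
stdev = statistics.stdev(monthly_spend)    # sample standard deviation

print(f"Mode:   {mode}")
print(f"Median: {median}")
print(f"Mean:   {mean}")
print(f"Sample standard deviation: {stdev:.2f}")
```

In this right-skewed sample the mode sits below the median, which sits below the mean, so reporting the mean alone would overstate what a typical customer spends.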

• Suppose in a sample of 500 people, we have the following:

• What can we communicate to business users about this sample?
