
4 Datamining

This document provides an overview of data mining. It defines data mining as the process of discovering hidden patterns in large data sets. The key sections discuss what data mining is, the typical data mining process, common data mining functions like association, classification, clustering and prediction, technologies used for data mining like statistics, decision trees and neural networks, and how data mining differs from classical statistical analysis in its focus on prediction rather than model fit.

Uploaded by ironchefff


Data Mining

- What is data mining?
- Data mining process
- Data mining functions
- Data mining technologies
- Text mining and Web mining
- Deploying data mining for competitive advantage

Data Mining

What is Data Mining?

Data mining is a process of identifying hidden patterns and relationships within data.

What is Data Mining?

- Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
- Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
- The use of a specific class of tools (data mining techniques) in the analysis of data

What is Data Mining?

Data mining is an analytic process designed to explore data (usually large amounts of data, typically business- or market-related) in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. The ultimate goal of data mining is prediction, and predictive data mining is the most common type of data mining and the one with the most direct business applications.

The DM process Data view

Data mining became feasible

The data warehouses that enterprises have built until now have largely ignored data mining. Factors that make data mining feasible:
- Organizations are gathering more data from on-line transaction processing systems (TPS)
- Storage costs are lower
- High computation power allows the use of complex data mining algorithms

The Use of Data Mining

With data mining, it is possible to better manage product warranties, predict purchases of retail stock, unearth fraud, determine credit risk, and define new products and services.

The importance of data mining

Data mining will become much more important, and companies will throw away nothing about their customers because it will be so valuable. If you're not doing this, you're out of business.
--- Dr. Penzias, a Nobel Prize winner interviewed in ComputerWorld in January 1999

The process of Data Mining

The DM Process - Overview

(Figure: overview of the DM process, with the labels "Reporting", "Different techniques (10%)", and "(90%)")

The steps in Data Mining (1)

1. Develop an understanding of the purpose of the data mining project.
2. Obtain the data set to be used in the analysis (e.g., by random sampling).
3. Explore, clean, and preprocess the data (e.g., handle missing data and outliers).

The steps in Data Mining (2)

4. Reduce the data if necessary (eliminating unnecessary variables, transforming variables, and creating new variables), and partition it into training, validation, and test datasets.
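The random partition in step 4 can be sketched in Python as follows. This is a minimal illustration, not a procedure prescribed by the slides. The function name `partition`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for the example.

```python
import random


def partition(records, train_frac=0.6, valid_frac=0.2, seed=42):
    """Randomly split records into training, validation, and test lists.

    The fractions and the seed are illustrative defaults, not values
    taken from the slides.
    """
    if not (0 < train_frac and 0 <= valid_frac and train_frac + valid_frac < 1):
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(records)               # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle = reproducible random sample
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # whatever remains is held out for testing
    return train, valid, test


train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))  # 60 20 20
```

The model is fitted on `train`, compared or tuned on `valid`, and scored once on `test`. The final performance estimate therefore comes from records the model never saw during development.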

The steps in Data Mining (3)

5. Determine the data mining task (classification, prediction, clustering, etc.).
6. Choose the data mining techniques to be used (regression, neural nets, etc.).
7. Use algorithms to perform the task. This is typically an iterative process.

The steps in Data Mining (4)

8. Interpret the results of the algorithms. Select the best algorithm and test its performance.
9. Deploy the model: integrate it into operational systems and run it on real records to produce decisions or actions.

Data Mining Functions

http://www.almaden.ibm.com/cs/quest/TECH.html

Information obtained from Data Mining

Data mining yields five basic types of information:

Association - occurrences are linked to a single event, e.g., beer purchasers also buy peanuts 70% of the time

Sequences - events are linked over time, e.g., a new carpet purchase is linked to new curtains

Classification - patterns are recognized that describe the characteristics of a group, such as customers who cancel credit cards

Information obtained from Data Mining

Clustering - discovers previously unknown groupings, e.g., "Buyers of expensive sports cars are typically young urban professionals, whereas luxury sedans are bought by elderly wealthy persons."

Forecasting - estimates future values, such as inventory turnover

Association

Given a database of transactions, where each transaction consists of a set of items, discover all associations such that the presence of one set of items in a transaction implies the presence of another set of items.

Association rules

In 80% of the cases when people buy bread, they also buy milk. This tells us about the association between bread and milk. We represent it as:

bread => milk | 80%

This should be read as "bread means, or implies, milk 80% of the time." Here 80% is the "confidence factor" of the rule. Association rules can involve more than two items. For example:

bread, milk => jam | 60%
bread => milk, jam | 40%
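The confidence factor can be computed directly from transaction data. The sketch below is a minimal illustration: the helper names (`count`, `support`, `confidence`) and the baskets are made up for the example. It uses the usual definition of confidence of X => Y, which is the number of transactions containing both X and Y divided by the number containing X.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs => rhs: count(lhs and rhs) / count(lhs)."""
    lhs_count = count(transactions, lhs)
    if lhs_count == 0:
        return 0.0
    return count(transactions, set(lhs) | set(rhs)) / lhs_count


# Hypothetical market baskets, made up for illustration.
baskets = [
    {"bread", "milk", "jam"},
    {"bread", "milk"},
    {"bread", "milk", "jam"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "jam"},
]

print(confidence(baskets, {"bread"}, {"milk"}))         # 0.8
print(confidence(baskets, {"bread", "milk"}, {"jam"}))  # 0.5
print(confidence(baskets, {"bread"}, {"milk", "jam"}))  # 0.4
```

On any single dataset, conf(bread => milk, jam) = conf(bread => milk) × conf(bread, milk => jam). Here that is 0.8 × 0.5 = 0.4. The percentages on the slide (0.8 × 0.6 = 0.48, not 0.40) are therefore best read as separate illustrations rather than figures from one dataset.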

Association Rule Discovery Applications

Supermarket shelf management
Goal: to identify items that are bought together by sufficiently many customers.
Approach: process the point-of-sale data collected with barcode scanners to find dependencies among items.

Sequential Pattern Discovery: Definition

Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events. Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.

Sequential Pattern Discovery: Examples

In point-of-sale transaction sequences:
Computer bookstore: (Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies,Tcl_Tk)
Athletic apparel store: (Shoes) (Racket, Racketball) --> (Sports_Jacket)
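Checking whether a customer's purchase history supports a sequential rule comes down to ordered containment of itemsets. The sketch below illustrates this under stated assumptions. The function names and the three customer histories are hypothetical. The "confidence" of a sequential rule is taken here as the share of customers matching the antecedent whose history continues with the consequent. The timing constraints mentioned in the definition (e.g., maximum gaps) are deliberately ignored.

```python
def contains(sequence, pattern):
    """True if every element of `pattern` appears, in order, in `sequence`.

    Both arguments are lists of itemsets ordered by time. Each pattern element
    must be a subset of a separate, later transaction; matching each element at
    the earliest possible transaction is enough to decide containment.
    Timing constraints are not modeled.
    """
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not set(element) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next element must come from a later transaction
    return True


def sequential_confidence(customers, antecedent, consequent):
    """Share of customers matching `antecedent` whose history continues with `consequent`."""
    matching = [seq for seq in customers.values() if contains(seq, antecedent)]
    if not matching:
        return 0.0
    full = antecedent + consequent
    return sum(contains(seq, full) for seq in matching) / len(matching)


# Hypothetical purchase histories: one time-ordered list of baskets per customer.
customers = {
    "c1": [{"Shoes"}, {"Racket", "Racketball"}, {"Sports_Jacket"}],
    "c2": [{"Shoes"}, {"Racket"}, {"Racketball", "Socks"}],
    "c3": [{"Shoes", "Socks"}, {"Racket", "Racketball", "Towel"}],
}
rule_lhs = [{"Shoes"}, {"Racket", "Racketball"}]
rule_rhs = [{"Sports_Jacket"}]
print(sequential_confidence(customers, rule_lhs, rule_rhs))  # 0.5
```

Customer c2 does not match the antecedent because racket and racketball were bought in different transactions. Of the two matching customers, only c1 went on to buy a sports jacket.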

Classification definition

Given a collection of records (the training set):

Each record contains a set of attributes (predictors) and a categorical variable known as the class. Examples: light/regular Coke, delayed flight/not, competitive eBay bidding/not, fraudulent/not, respondent/not.

Find a model that maps the values of the predictors to class membership. Goal: previously unseen records should be assigned a class as accurately as possible. Classification algorithms: naive rule, naive Bayes, k-nearest neighbors, classification trees, neural nets.
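Of the algorithms listed, k-nearest neighbors is the shortest to sketch. The code below is a minimal illustration that assumes numeric predictors on comparable scales, Euclidean distance, and a majority vote among k = 3 neighbors. The function name and the six training records are hypothetical.

```python
import math
from collections import Counter


def knn_classify(training_set, query, k=3):
    """Assign `query` the majority class among its k nearest training records.

    `training_set` is a list of (predictor_tuple, class_label) pairs. Distance
    is Euclidean, so predictors are assumed numeric and on comparable scales.
    """
    nearest = sorted(training_set, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical records: two numeric predictors and a fraud/fair class.
training_set = [
    ((1.0, 1.0), "fair"), ((1.0, 2.0), "fair"), ((2.0, 1.0), "fair"),
    ((8.0, 8.0), "fraud"), ((8.0, 9.0), "fraud"), ((9.0, 8.0), "fraud"),
]
print(knn_classify(training_set, (2.0, 2.0)))  # fair
print(knn_classify(training_set, (7.5, 8.5)))  # fraud
```

An odd k with two classes avoids tied votes. With real data, predictors are usually rescaled first so that no single attribute dominates the distance.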

Example of Classification

Fraud Detection
Goal: predict fraudulent cases in credit card transactions.
Approach:
- Use credit card transactions and information on the account holder as attributes (when does a customer buy, what does he buy, how often does he pay on time, etc.).
- Label past transactions as fraud or fair transactions. This forms the class attribute.
- Learn a model for the class of the transactions.
- Use this model to detect fraud by observing credit card transactions on an account.

Deviation/Anomaly Detection

Detect significant deviations from normal behavior. Applications:
- Credit card fraud detection
- Network intrusion detection (typical network traffic at the university level may reach over 100 million connections per day)
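One simple way to make "significant deviation from normal behavior" concrete is to compare new observations with the mean and standard deviation of past normal data. The sketch below assumes a 3-standard-deviation threshold. The function name and the spending figures are hypothetical. Real fraud and intrusion detectors use much richer models of normal behavior.

```python
import statistics


def flag_deviations(history, new_values, threshold=3.0):
    """Return the new values lying more than `threshold` sample standard
    deviations away from the mean of the historical ("normal") values."""
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    return [v for v in new_values if abs(v - mean) > threshold * spread]


# Hypothetical daily card spending for one account, then three new days.
normal_days = [52, 48, 50, 47, 53, 51, 49]
print(flag_deviations(normal_days, [51, 45, 120]))  # [120]
```

The baseline is estimated from history that is assumed to be normal, so a single extreme new value cannot inflate the standard deviation it is judged against.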

Clustering

The process of organizing objects into groups whose members are similar in some way. The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. But how do we decide what constitutes a good clustering? There is no absolute best criterion that is independent of the final aim of the clustering. Approaches include distance-based clustering and clustering by fit to descriptive concepts. Clustering is an unsupervised learning problem.

Clustering Definition

Given a set of records (rows), each having a set of attributes, and a similarity measure among them, find clusters such that:
- Data points in one cluster are more similar to one another.
- Data points in separate clusters are less similar to one another.
Similarity measures: Euclidean distance if attributes are continuous; other problem-specific measures.
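A common distance-based method that fits this definition is k-means. The slides do not name a specific algorithm, so k-means is a swapped-in example. The sketch below uses Euclidean distance, hand-chosen starting centroids, and made-up 2-D points, all of which are assumptions. It alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points.

```python
import math


def assign(points, centroids):
    """Group each point with its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centroids, max_iter=100):
    """Basic k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, assign(points, centroids)


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, groups = kmeans(points, [(0, 0), (10, 10)])
print(groups)  # [[(1, 1), (1, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]
```

k-means optimizes only one notion of "good clustering" (compact, roughly spherical groups). This is an instance of the slide's point that no criterion is best independently of the aim.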

Clustering vs. Classification

Possible Use of Clustering

- Marketing: finding groups of customers with similar behavior, given a large database of customer data containing their properties and past buying records
- Biology: categorizing plants and animals given their features
- Libraries: arranging books on shelves
- Insurance: identifying groups of motor insurance policy holders with a high average claim cost; identifying frauds
- City planning: identifying groups of houses according to their house type, value, and geographical location
- Earthquake studies: clustering observed earthquake epicenters to identify dangerous zones
- WWW: document classification; clustering weblog data to discover groups of similar access patterns

Prediction

Predict a value of a given continuous variable based on the values of other variables, assuming a linear or nonlinear model of dependency. Examples:
- Predicting sales amounts of a new product based on advertising expenditure
- Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
- Time-series prediction of stock market indices

http://www.thearling.com/text/dmtechniques/dmtechniques.htm
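The first prediction example (sales from advertising expenditure), in its simplest linear form, is an ordinary least-squares line. The sketch below assumes a single predictor and a linear dependency. The function name and the five data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: advertising spend and sales, both in thousands.
advertising = [1, 2, 3, 4, 5]
sales = [12.5, 13.5, 16.5, 17.5, 20.0]
intercept, slope = fit_line(advertising, sales)
print(round(intercept, 2), round(slope, 2))  # 10.3 1.9
print(round(intercept + slope * 6, 2))       # predicted sales at spend 6: 21.7
```

A nonlinear dependency would need a different model form, but the workflow of fitting on known cases and then predicting the continuous target for new inputs stays the same.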

Data Mining Technologies

Data mining technologies

Technology used for data mining

Visualization, statistical analysis, decision trees, rule induction, neural networks

Statistical analysis

While most well-known statistical packages supplement traditional statistical methods with some elements of data mining, their main data analysis methods remain classical: correlation, regression, factor analysis, and other techniques of that kind. Such systems cannot determine the form of the dependencies hidden in data; they require the user to supply his/her own hypotheses, which the system then tests.

Data Mining Vs. Classical Statistical Analysis

In classical statistical analysis:
- The same data is used for model development and reliability assessment.
- It is good for describing relationships (e.g., regression).
- Over-fitting can be common, which limits predictive ability.
- The focus is on the fit of the data to the model.

In data mining:
- Different datasets are used for model development, calibration, and assessment.
- The objective is prediction.
- The focus is on predictive accuracy: how will the model perform on a new dataset?

The Three Sisters of analysis

The Fit Concept

(Figure: a fitted model, with the error terms (e), or residuals, marked)

Over-fitting

(Figure: credit card spending plotted against level of income)
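The contrast between fit on the development data and accuracy on new data can be illustrated with a toy experiment. It is a sketch under loud assumptions. The data are made up: the training set follows y = 2x plus alternating noise of ±1, and the holdout points lie on the underlying line y = 2x. A 1-nearest-neighbor "memorizer" stands in for an over-fitted model. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(predicted, actual):
    """Mean squared error between two equal-length lists."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical data: noisy training points, holdout points on the true line.
train_x, train_y = [1, 2, 3, 4, 5, 6], [3, 3, 7, 7, 11, 11]
holdout_x = [1.4, 2.6, 3.4, 4.6, 5.4]
holdout_y = [2 * x for x in holdout_x]

b0, b1 = fit_line(train_x, train_y)


def predict_line(xs):
    """Simple model: the least-squares line."""
    return [b0 + b1 * x for x in xs]


def predict_memo(xs):
    """Over-fitted model: copy the y of the closest training x (1-NN)."""
    nearest = lambda x: min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return [train_y[nearest(x)] for x in xs]


for name, model in (("line", predict_line), ("1-NN", predict_memo)):
    print(name, "train MSE:", round(mse(model(train_x), train_y), 3),
          "holdout MSE:", round(mse(model(holdout_x), holdout_y), 3))
```

The memorizer reproduces the training data perfectly (zero training error) because it has learned the noise. Its holdout error is far larger than the line's. Judging the models on the same data used to build them would rank them the wrong way round.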

Time-Series Forecasting

Time-series forecasting is a forecasting method that uses a set of historical values to predict an outcome. These historical values, often referred to as a "time series", are spaced equally over time and can represent anything from monthly sales data to daily electricity consumption to hourly call volumes. Time-series forecasting assumes that a time series is a combination of a pattern and some random error. The goal is to separate the pattern from the error by understanding the pattern's trend (its long-term increase or decrease) and its seasonality (the change caused by seasonal factors such as fluctuations in use and demand).

http://www.decisioneering.com/time-series-forecasting.html
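The "trend plus seasonality plus error" view described above can be turned into a deliberately simple forecaster: a seasonal naive forecast with a drift (trend) term. This method is a swapped-in illustration, not one named by the cited source. It assumes a linear trend, a seasonal pattern with a known period, and made-up quarterly sales. The function name is hypothetical.

```python
def seasonal_drift_forecast(series, period, horizon):
    """Forecast `horizon` future values of an equally spaced series.

    Assumes value = linear trend + repeating seasonal pattern (of length
    `period`) + random error. The trend per step is estimated as the
    average same-season change divided by `period`; each forecast repeats
    the value one season earlier and adds one season's worth of trend.
    """
    if len(series) <= period:
        raise ValueError("need more than one full season of history")
    changes = [series[t] - series[t - period] for t in range(period, len(series))]
    trend_per_step = sum(changes) / len(changes) / period
    history = list(series)
    forecasts = []
    for _ in range(horizon):
        nxt = history[-period] + trend_per_step * period
        forecasts.append(nxt)
        history.append(nxt)  # lets horizons longer than one season build on earlier forecasts
    return forecasts


# Hypothetical quarterly sales: trend of +2 per quarter plus a fixed seasonal pattern.
season = [3, -1, -2, 0]
sales = [10 + 2 * t + season[t % 4] for t in range(12)]
print(seasonal_drift_forecast(sales, period=4, horizon=4))  # [37.0, 35.0, 36.0, 40.0]
```

Comparing values one full season apart cancels the seasonal component, which isolates the trend. Repeating last season's shape then restores the seasonality in the forecast. On real data the random error remains, so forecasts will not be exact.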

Decision Tree

This method can be applied to classification tasks. Applying it to a training set produces a hierarchical structure of classification rules of the type "IF...THEN...". This structure has the form of a tree (similar to the species identification keys used in botany or zoology).

Decision Tree

To decide which class an object or situation should be assigned to, one answers the questions located at the tree nodes, starting from the root. Following this procedure, one eventually reaches one of the final nodes (called leaves), which gives the class to which the object should be assigned.

Decision tree

ID  Debt  Income  Employment     Credit risk
1   High  High    Self-employed  Bad
2   High  High    Salaried       Bad
3   High  Low     Salaried       Bad
4   Low   Low     Salaried       Good
5   Low   Low     Self-employed  Bad
6   Low   High    Self-employed  Good
7   Low   High    Salaried       Good
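The slides do not say how the tree is grown from this table. Information gain, the entropy-reduction criterion used by ID3, is one common choice and is assumed here. The sketch computes the gain of each attribute as a candidate root split over the seven records. The function names are hypothetical.

```python
import math
from collections import Counter

# The seven loan records from the table above.
RECORDS = [
    {"Debt": "High", "Income": "High", "Employment": "Self-employed", "Risk": "Bad"},
    {"Debt": "High", "Income": "High", "Employment": "Salaried", "Risk": "Bad"},
    {"Debt": "High", "Income": "Low", "Employment": "Salaried", "Risk": "Bad"},
    {"Debt": "Low", "Income": "Low", "Employment": "Salaried", "Risk": "Good"},
    {"Debt": "Low", "Income": "Low", "Employment": "Self-employed", "Risk": "Bad"},
    {"Debt": "Low", "Income": "High", "Employment": "Self-employed", "Risk": "Good"},
    {"Debt": "Low", "Income": "High", "Employment": "Salaried", "Risk": "Good"},
]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(records, attribute, target="Risk"):
    """Entropy reduction obtained by splitting `records` on `attribute`."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


for attribute in ("Debt", "Income", "Employment"):
    print(attribute, round(information_gain(RECORDS, attribute), 3))
```

Debt has by far the largest gain (about 0.52 bits, against about 0.02 for the other two), so it becomes the root. Every high-debt record is Bad. Within the low-debt branch, salaried applicants are all Good, while the two self-employed applicants split one Bad and one Good. This matches the induced rules that follow.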

Decision Tree

Rule Induction

If Debt is high, then Risk is high.
If Debt is low and the applicant is salaried, then Risk is low.
If Debt is low and the applicant is self-employed, then Risk is medium.

Processing loan applications

(American Express)

Given: a questionnaire with financial and personal information
Question: should money be lent?
A simple statistical method covers 90% of cases; borderline cases are referred to loan officers.
But: 50% of accepted borderline cases defaulted!
Solution: reject all borderline cases?

No! Borderline cases are the most active customers.

Enter machine learning

1000 training examples of borderline cases, with 20 attributes such as:

age, years with current employer, years at current address, years with the bank, other credit cards possessed

Learned rules: correct on 70% of cases (human experts: only 50%)

Rules could be used to explain decisions to customers

Artificial Neural Networks

Imitates the structure of living neural tissue, which is built from separate neurons. In order to make meaningful predictions, a neural network first has to be trained on data describing previous situations for which both the input parameters and the correct reactions to them are known.

Artificial Neural Networks

http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html

Artificial Neural Networks

An artificial neural network consists of a number of small, primitive processing units linked together via weighted, directed connections. A learning algorithm is used to train neural networks based on sample data.

(Figure: a network with an input layer (x1, x2, x3), a hidden layer, and an output layer (Y1, Y2), connected by weights w1, w2, w3)

Artificial Neural Networks

(Figure: inputs Debt (x1), Income (x2), and Employment (x3), connected through weights wij to the output Y, Risk)
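The mapping from Debt, Income, and Employment to a risk output can be sketched with the smallest possible network: a single artificial neuron with no hidden layer. This is an assumption-heavy illustration. The 0/1 input encoding, the names, and the weights are hypothetical. The weights were picked by hand so that the neuron reproduces the seven records in the credit-risk table. In practice a learning algorithm (e.g., backpropagation) would set them from sample data.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical hand-picked weights (not learned): one per input, plus a bias.
WEIGHTS = (-4.0, 2.0, -2.0)  # x1 = Debt, x2 = Income, x3 = Employment
BIAS = 1.0


def encode(debt, income, employment):
    """Map the table's categories to 0/1 inputs x1, x2, x3."""
    return (1.0 if debt == "High" else 0.0,
            1.0 if income == "High" else 0.0,
            1.0 if employment == "Self-employed" else 0.0)


def neuron(x):
    """Weighted sum of the inputs plus bias, passed through a sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS)


def predict_risk(debt, income, employment):
    """Output above 0.5 is read as 'Good' credit risk, otherwise 'Bad'."""
    return "Good" if neuron(encode(debt, income, employment)) > 0.5 else "Bad"


print(predict_risk("Low", "High", "Salaried"))   # Good
print(predict_risk("High", "High", "Salaried"))  # Bad
```

Even in this tiny case, the "knowledge" is stored only as the numbers -4, 2, -2, and 1, which is much harder to read than the equivalent IF...THEN rules.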

Application of Neural Networks

This approach has proved effective in image recognition problems. However, experience shows that it is not well suited to, say, financial or serious medical applications, because knowledge expressed as the weights of a couple hundred connections between neurons cannot be analyzed and interpreted by a human.

Neural Networks Software

http://www.wardsystems.com

Genetic Algorithm

A genetic algorithm is a search technique used in computing to find true or approximate solutions to optimization and search problems, and is often abbreviated as GA. Genetic algorithms are categorized as global search heuristics. Genetic algorithms are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover (also called recombination).
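The loop of selection, crossover (recombination), and mutation can be sketched on a toy problem. OneMax, which maximizes the number of 1 bits in a bit string, is a standard GA test problem and is chosen here as an assumption. Tournament selection, single-point crossover, keeping the best individual each generation (elitism), and all parameter values are also illustrative choices. None of them is specified in the slides.

```python
import random


def genetic_onemax(n_bits=20, pop_size=30, generations=60, mutation_rate=0.02, seed=1):
    """Toy genetic algorithm for OneMax: evolve a bit string toward all 1s."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    fitness = sum              # OneMax fitness = number of 1 bits
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def select():
        # Selection (tournament of two): the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_gen = [max(population, key=fitness)]  # elitism: keep the best unchanged
        while len(next_gen) < pop_size:
            mother, father = select(), select()
            cut = rng.randrange(1, n_bits)          # single-point crossover position
            child = mother[:cut] + father[cut:]     # inheritance from both parents
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]  # mutation
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


best = genetic_onemax()
print(sum(best), "of", len(best), "bits set")
```

Because the GA is a global search heuristic, it is not guaranteed to reach the optimum. Elitism only guarantees that the best fitness never decreases from one generation to the next.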

Genetic Algorithm

http://www.statsoft.com/textbook/stdatmin.html#Models%20for%20Data%20Mining

Text Mining

Text Mining
The application of data mining to unstructured or less structured text files. It entails generating meaningful numerical indices from the unstructured text and then processing these indices with various data mining algorithms.

Text mining helps organizations

- Find the hidden content of documents, including additional useful relationships
- Relate documents across previously unnoticed divisions
- Group documents by common themes

Applications of text mining

- Automatic detection of e-mail spam or phishing through analysis of the document content
- Automatic processing of messages or e-mails to route each message to the most appropriate party to process it
- Analysis of warranty claims, help desk calls/reports, and so on, to identify the most common problems and relevant responses

Applications of text mining

- Analysis of related scientific publications in journals to create an automated summary view of a particular discipline
- Creation of a relationship view of a document collection
- Qualitative analysis of documents to detect deception

In 2007, Europol's Serious Crime division developed an analysis system in order to track transnational organized crime.

How to mine Text?

1. Eliminate commonly used words (stopwords).
2. Replace words with their stems or roots (stemming algorithms).
3. Consider synonyms and phrases.
4. Calculate the weights of the remaining terms.
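The stopword, stemming, and term-weighting steps can be sketched as a tiny pipeline. The following are all assumptions made for illustration: the stopword list, the crude suffix stripper (a stand-in for a real stemmer such as Porter's), and the three documents. The weighting uses the common TF-IDF variant tf × log(N / df), where df is the number of documents containing the term. The synonym-and-phrase step is skipped.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "and", "is", "of", "on", "my", "to", "when"}


def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Lowercase, tokenize, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Weight each remaining term by tf * log(N / df) in every document."""
    term_lists = [terms(d) for d in docs]
    df = Counter(t for tl in term_lists for t in set(tl))
    n = len(docs)
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(tl).items()} for tl in term_lists]


# Hypothetical help-desk messages.
docs = [
    "The printer jams and printing fails",
    "Billing error on my invoice",
    "Printer jams when printing labels",
]
for weights in tfidf(docs):
    print({t: round(w, 2) for t, w in weights.items()})
```

Terms shared by several documents (printer, jam, print) get low weights, while terms distinctive to one document (fail, bill, label) get high weights. The resulting numbers are the "numerical indices" that later data mining algorithms consume.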

Web Mining

Web Mining
The discovery and analysis of interesting and useful information from the Web, about the Web, and usually through Web-based tools.

Types of Web Mining

Web Mining

- Web content mining: the extraction of useful information from Web pages
- Web structure mining: the development of useful information from the links included in Web documents
- Web usage mining: the extraction of useful information from the data generated through webpage visits, transactions, etc.

Uses for Web mining:

- Determine the lifetime value of clients
- Design cross-marketing strategies across products
- Evaluate promotional campaigns
- Target electronic ads and coupons at user groups
- Predict user behavior
- Present dynamic information to users

Web Mining

Social network analysis

Social network analysis views social relationships in terms of network theory, consisting of nodes (representing individual actors within the network) and ties (which represent relationships between the individuals, such as friendship, kinship, organizational position, sexual relationships, etc.)
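A first step in analyzing such a network of nodes and ties is often to measure how connected each actor is. Normalized degree centrality is one of several standard measures. The slides do not name a specific measure, so it is chosen here as an assumption. The function name and the four-person network are hypothetical. Ties are treated as undirected.

```python
from collections import defaultdict


def degree_centrality(ties):
    """Normalized degree centrality of each node in an undirected network.

    A node's score is its number of distinct ties divided by the largest
    possible number, n - 1, where n is the number of nodes.
    """
    neighbors = defaultdict(set)
    for a, b in ties:
        if a != b:  # ignore self-ties
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical friendship ties among four people.
ties = [("Ann", "Bob"), ("Ann", "Cy"), ("Ann", "Dee"), ("Bob", "Cy")]
print(degree_centrality(ties))
```

Ann is tied to everyone else and scores 1.0, while Dee, with a single tie, scores 1/3. Other measures (betweenness, closeness) capture different notions of an actor's importance.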

Social Network Analysis

Social network analysis has emerged as a key technique in modern sociology. It has also gained a significant following in anthropology, biology, communication studies, economics, geography, history, information science, organizational studies, social psychology, development studies, and sociolinguistics, and is now commonly available as a consumer tool.

Deploying Data Mining for Competitive Advantage

The act of building data-mining models does not, by itself, guarantee any business value. To be used as a competitive weapon, data mining must be part of a larger process that ensures that the information learned by data mining is transformed into actionable results.

A process of deploying data mining for competitive advantage

1. Problem definition
2. Discovery
3. Implementation
4. Taking action
5. Monitoring the results

Problem definition

- Wish to understand and segment the customer base for two product lines: long distance and Internet access service
- Very competitive market
- Limited time to react
- Broad-based marketing programs are inefficient for customer retention and cross-selling
- Cost: $275-$400 for each new subscriber

Discovery

Who are the most important, most profitable customers, based on a lifetime value calculation? A new user type was identified: power users, heavy phone users who are constantly on the phone.

Implementation

- Create marketing campaigns that provide compelling offers to power users.
- Multiple offers may be made; data mining is used to determine which offers are most effective for which types of people at different times.
- Create a customer-loyalty program to retain as many of the power users as possible before they leave.

Taking Action

- Campaigns are best targeted at the time a customer contacts you.
- The point of contact: a call center or a Web site interaction.
- Data-mining models need to be integrated into customer touch points.

Customer interaction process

1. A customer calls for an explanation of a billing item.
2. The operator retrieves the customer's information from the call center program.
3. While the operator explains the bill to the customer, data mining generates campaign targeting based on up-to-date information.
4. A tailored product recommendation and special discount offer are displayed to the operator.
5. The operator relays the offers to the customer, referring to a displayed script.

Monitoring the results

- Check the success of the marketing campaign in real time.
- Capture customer responses for campaign refinement.
- Evaluate the effectiveness of the data mining model.
- Use a dynamic learning engine for fine-tuning.

Integration

- Integrating data mining with business strategies and marketing campaigns
- Integrating data mining with a decision delivery mechanism
- Creating a feedback loop to monitor the success of the campaigns

Data Mining Case studies
