Lecture 1-Introduction To Data Mining - M

The document discusses an introductory data mining course. It outlines the course objectives, topics, organization, schedule and provides an example document to mine. The course aims to provide key concepts of data mining techniques with a focus on classification and clustering algorithms.

Uploaded by

Khizar Shahid


CSD479

Data Mining
Lecture # 1
Administrative Stuff

• Instructor: Dr. Zeeshan Gillani

Room # 37, Faculty Block
Lectures:
• Office Hrs: Mon 10:00 – 11:30; Thur 11:30 – 13:00 hrs
(or by appointment)
• Prerequisite: Knowledge of Statistics and Database/Data
Warehousing is helpful
Course Objectives

This course aims to provide students with the key concepts of the
applications, techniques, and methodologies of Data Mining, with the
primary focus on classification and clustering algorithms.
Course Outline
Mining Methodology, Overview of Data Warehousing,
Overview of OLAP, Applications of Data Mining, Data
cleaning and preparation, Concept Description, Association
Rule Mining, Classification, Classification by Back
Propagation, Prediction, Decision Trees, Bayesian
Classification, Classification Accuracy, Regression for
Classification and Prediction, Distributions, Cluster
Analysis.
Course Organization
Text Book:
1. Han, J. and Kamber, M. (2011) Data Mining Concepts and
Techniques, 3rd Edition, Morgan Kaufmann.
Reference Books:
1. Provost, F. and Fawcett, T. (2013) Data Science for Business: What
you need to know about data mining and data-analytic thinking,
1st edition, O'Reilly Media.
2. Witten I. H., Frank, E. and Hall, M. A. (2011) Data Mining:
Practical Machine Learning Tools and Techniques, 3rd Edition,
Morgan Kaufmann.
Instruments:
There will be 4 assignments, 4 quizzes,
Weights: Assignments 10%
Quizzes 15%
S-I 10%
S-II 15%
Final Exam 50%
Schedule of Lectures
Lect.# Topics/Contents
1 Introduction to Data Mining. Data mining on different kinds of databases. Data mining functionalities.
2 Data objects and attribute types. Some basic statistical descriptions of data: Mean, Median, Mode, S.D., Variance etc. Data similarity and dissimilarity.
3 Non-Euclidean distances for nominal, ordinal and mixed-type attributes.
4 Data preprocessing techniques; data cleaning; data integration.
5 Data integration problems, removing data redundancy using Chi-square and correlation analysis.
6 Data reduction; dimensionality reduction, numerosity reduction, data compression, PCA.
7 Examples of PCA; data normalization.
8 Mining frequent patterns, market basket analysis, frequent itemsets, frequent pattern mining. Mining association rules from frequent itemsets.
9 Finding frequent itemsets using candidate generation, generating association rules from frequent itemsets. Brute force algorithm and the Apriori algorithm.
10 Finding interestingness, strong rules are not necessarily interesting, from association analysis to correlation analysis.
11 Sessional-I
12 Introduction to classification, classification by decision tree, decision tree induction, attribute selection measures.
13 Entropy and Gini measures for tree induction, tree pruning.
14 Tree pruning, pre and post pruning, scalability.
15 Model evaluation methods. Introduction to Weka.
16 Conditional probability and Bayes theorem.
17 Introduction to the Naive Bayes classifier with examples.
18 Rule-based classification: using IF-THEN rules for classification, rule extraction from a decision tree.
19 Rule induction using a sequential covering algorithm. Methods of rule evaluation.
20 Introduction to Artificial Neural Networks. A multilayer feed-forward neural network, backpropagation.
21 Example of ANN. Revision.
22 Sessional-II
23 Discussion on S-II. Introduction to clustering, K-Means clustering.
24 Examples of k-means, k-modes, selecting best k.
25 Clustering: K-Medoids with examples.
26 Clustering: Introduction to CLARA and CLARANS.
27 Introduction to hierarchical clustering. Agglomerative clustering using Single Link.
28 Agglomerative clustering using Complete Link, Average Link and MST. Divisive algorithms.
29 Introduction to BIRCH. Clustering Features. CF Tree.
30 Major tasks of clustering evaluation, extrinsic and intrinsic evaluation methods. Revision.
31 Terminal Exam
Introduction to Data Mining
(Chapter #1 of text book)

Motivation: “Necessity is the
Mother of Invention”
• Data Explosion Problem
1. Automated data collection tools (e.g. web, sensor networks) and mature
database technology lead to tremendous amounts of data stored in databases,
data warehouses and other information repositories.
2. Enterprises are currently facing a data explosion problem.
3. Every minute, YouTube users upload 48 hours of video, Facebook users share
684,478 pieces of content, Instagram users share 3,600 new photos, and Tumblr
sees 27,778 new posts published.
• A full 90% of the world's data was generated over the last two years
(Date: May 22, 2013; Source: SINTEF)
• Solution: Data warehousing and Data mining
Motivation: “Necessity is the
Mother of Invention”

• Electronic Information is an Important Asset for Business Decisions
1. With the growth of electronic information, enterprises began to
realize that the accumulated information can be an important
asset in their business decisions.
2. There is potential business intelligence hidden in the large volume of
data.
3. This intelligence can be the secret weapon on which the success of a
business may depend.
Extracting Business Intelligence
(Solution)
1. It is not a simple matter to discover business intelligence from a
mountain of accumulated data.
2. What is required are techniques that allow the enterprise to extract
the most valuable information.
3. The field of Data Mining provides such techniques.
4. These techniques can find novel (previously unknown) patterns that may
assist an enterprise in understanding the business better and in
forecasting.
What Is Data Mining?

• Data mining (knowledge discovery in databases):
- Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) information or patterns
from data in large databases
• Alternative names:
- Data mining: a misnomer?
- Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis, data
archeology, business intelligence, etc.
Data Mining (Example)
• Random Guessing vs. Potential Knowledge
• Suppose we have to forecast the probability of rain in Islamabad city
for any particular day.
• Without any prior knowledge the probability of rain would be 50%
(pure random guess).
• If we had a lot of weather data, then we could extract potential
rules using Data Mining, which can then forecast the chance of rain
better than random guessing.
• Example: The Rule
if [Temperature = ‘hot’ and Humidity = ‘high’] then there is 66.6%
chance of rain.

Temperature  Humidity  Windy  Rain
hot          high      false  No
hot          high      true   Yes
hot          high      false  Yes
mild         high      false  No
cool         normal    false  No
cool         normal    true   Yes
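
The 66.6% figure can be checked directly from the six records on this slide. The following is a minimal sketch, not part of the original lecture; the names `RECORDS` and `rule_confidence` are illustrative assumptions.

```python
# Weather records from the slide: (temperature, humidity, windy, rain).
RECORDS = [
    ("hot", "high", False, "No"),
    ("hot", "high", True, "Yes"),
    ("hot", "high", False, "Yes"),
    ("mild", "high", False, "No"),
    ("cool", "normal", False, "No"),
    ("cool", "normal", True, "Yes"),
]


def rule_confidence(records, temperature, humidity):
    """Fraction of records matching the rule's condition in which it rained."""
    matching = [r for r in records if r[0] == temperature and r[1] == humidity]
    if not matching:
        return None  # the rule never fires on this data
    rainy = sum(1 for r in matching if r[3] == "Yes")
    return rainy / len(matching)


confidence = rule_confidence(RECORDS, "hot", "high")
print(f"P(rain | hot, high) = {confidence:.3f}")
```

Three records satisfy the condition and two of them are rainy, which gives 2/3, the value the slide rounds to 66.6%.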
Examples: What is (not) Data
Mining?
• What is not Data Mining?
– Look up phone number in phone directory
– Query a Web search engine for information about “Amazon”
• What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien,
O’Rourke, O’Reilly… in Boston area)
– Group together similar documents returned by search engine according
to their context (e.g. Amazon rainforest, Amazon.com)
Data Mining: A KDD Process
• Data mining: the core of the knowledge discovery process.
[Figure: the KDD process as a sequence of stages, from bottom to top:
Databases → Data Cleaning and Data Integration → Data Warehouse →
Task-relevant Data → Data Mining → Pattern Evaluation]
The Data Mining Process
• Step 0: Determine Business Objective/Learning the
application domain
- e.g. Forecasting the probability of rain
- Must have relevant prior knowledge and goals of application.
• Step 1: Creating a Target Data set/Prepare Data
- Data Selection
- Data Cleaning; Noisy and Missing values handling (may take 60% of
the effort!).
- Data Transformation (Normalization/Discretization).
- Attribute/Feature Selection.
• Step 2: Choosing the Function of Data Mining
- Classification, Clustering, Regression, Association Rules
• Step 3: Choosing The Mining Algorithm
- Selection of correct algorithm depending upon the quality of data.
- Selection of correct algorithm depending upon the density of data.
• Step 4: Data Mining
- Search for patterns of interest:- A typical data mining algorithm can
mine millions of patterns.
• Step 5: Visualization/Knowledge Representation
- Visualization/Representation of interesting patterns, etc . and then
Data Mining and Business Intelligence

[Figure: layered pyramid; the potential to support business decisions
increases from the bottom layer to the top. Layers from top to bottom,
with the role shown beside each layer:]
- Making Decisions (End User)
- Data Presentation: Visualization Techniques (Business Analyst)
- Data Mining: Information Discovery (Data Analyst)
- Data Exploration: Statistical Analysis, Querying and Reporting
- Data Warehouses / Data Marts: OLAP, MDA (DBA)
- Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Data Mining: On What Kind of Data?
1. Relational databases
2. Data warehouses
3. Transactional databases
4. Advanced DB and information repositories
- Time-series data and temporal data
- Text databases
- Multimedia databases
- Data streams (sensor network data)
- WWW
Data Mining: Confluence of Multiple
Disciplines

[Figure: Data Mining at the centre, drawing on Database Technology,
Statistics, Machine Learning, Visualization, Information Science and
Other Disciplines]
Data Mining vs SQL, EIS, and OLAP
• SQL. SQL is a query language that is difficult for business people
to use.
• EIS = Executive Information Systems. EIS systems
provide graphical interfaces that give executives a
pre-programmed (and therefore limited) selection of reports,
automatically generating the necessary SQL for each.
• OLAP allows views along multiple dimensions, and drill-down,
therefore giving access to a vast array of analyses.
However, it requires manual navigation through scores of
reports, requiring the user to notice interesting patterns
themselves.
• Data Mining picks out interesting patterns. The user
can then use visualization tools to investigate further.
An Example of OLAP Analysis and its
Limits
• What is driving sales of walking sticks?
• Step 1: View some OLAP graphs: e.g. walking stick sales by city.
• Step 2: Noticing that Islamabad has high sales you decide to investigate
further.
• (Before OLAP, you would have had to write a very complex SQL query
instead of simply clicking to drill down.)
• It seems that old people are responsible for most walking stick sales.
You confirm this by viewing a chart of age distributions by city.
• But imagine if you had to do this manual investigation for all of the
10,000 products in your range! Here, OLAP gives way to Data Mining.
[Chart, Step 1: Walking Sticks Sales by City (Karachi, Lahore, Islamabad);
values shown: 50, 10, 400, with Islamabad the highest]
[Chart, Step 2: Walking Sticks Sales in Islamabad by Age (Less than 20,
20 to 60, Older than 60); values shown: 10, 30, 360, with Older than 60
at 360]
[Chart: Age Distribution by City (Karachi, Lahore, Islamabad) for the
groups Younger than 20, 20 to 60, Older than 60]
Data Mining vs Expert Systems
• Expert Systems = Rule-Driven Deduction
Top-down: From known rules (expertise) and data to
decisions. (To be dealt with in Part 2 of this course)
[Diagram: Rules + Data → Expert System → Decisions]
• Data Mining = Data-Driven Induction
Bottom-up: From data about past decisions to
discovered rules (general rules induced from the data).
[Diagram: Data (including past decisions) → Data Mining → Rules]
Difference b/w Machine Learning and
Data Mining
• Machine Learning techniques were designed to deal with limited
amounts of data (typically artificial intelligence data sets), whereas
Data Mining techniques deal with the large amounts of data held in databases.
• Data Mining (Knowledge Discovery in Databases)
- Extraction of interesting (non-trivial, implicit, previously unknown
and potentially useful) information or patterns from data in large
databases.
• What is not Data Mining?
- (Deductive) query processing.
- Expert systems or small ML/statistical programs
Data Mining Functionalities (1)

• Data Preprocessing
- Handling Missing and Noisy Data (Data Cleaning).
- Techniques we will cover:
• Missing values imputation using Mean, Median and Mode.
• Missing values imputation using K-Nearest Neighbor.
• Missing values imputation using Association Rules Mining.
• Missing values imputation using Fault-Tolerant Patterns.
• Data Binning for Noisy Data.

Example with missing values (shown as ?):

TID  Refund  Country    Taxable Income  Cheat
1    Yes     USA        125K            No
2    ?       UK         100K            No
3    No      Australia  70K             No
4    ?       ?          120K            No
5    No      NZL        95K             Yes
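
As a rough illustration of the first technique in the list above (imputation by mean, median or mode), here is a minimal sketch using only the Python standard library. The function name `impute` and the use of `None` for a missing value are assumptions made for this example; the Refund column comes from the slide's table, while the income column with a gap is hypothetical, since the slide's income column is complete.

```python
from statistics import mean, median, mode


def impute(values, strategy):
    """Replace None entries with the mean, median or mode of the known values."""
    known = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](known)
    return [fill if v is None else v for v in values]


# Refund column from the slide (TIDs 2 and 4 are missing).
refund = ["Yes", None, "No", None, "No"]

# Hypothetical numeric column (in thousands) with one gap, for illustration only.
income_k = [125, 100, None, 120, 95]

refund_filled = impute(refund, "mode")          # categorical: use the mode
income_mean = impute(income_k, "mean")          # numeric: use the mean
income_median = impute(income_k, "median")      # numeric: use the median

print(refund_filled)
print(income_mean)
print(income_median)
```

The mode is the natural choice for a nominal attribute such as Refund, while the mean and median apply only to numeric attributes; the median is less sensitive to extreme values.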
Data Mining Functionalities (1)
• Data Preprocessing
- Data Transformation (Discretization and Normalization).
- With the help of data transformation, rules become more general and
compact.
- General and compact rules can increase the accuracy of classification.

Bins: Child = (0 to 20), Young = (21 to 47), Old = (48 to 120)

Age  Age (discretized)
15   Child
18   Child
40   Young
33   Young
55   Old
48   Old
12   Child
23   Young

Before discretization (one rule per age value):
1. If attribute 1 = value1 & attribute 2 = value2 and Age = 08
then Buy_Computer = No.
2. If attribute 1 = value1 & attribute 2 = value2 and Age = 09
then Buy_Computer = No.
3. If attribute 1 = value1 & attribute 2 = value2 and Age = 10
then Buy_Computer = No.

After discretization (a single, more general rule):
1. If attribute 1 = value1 & attribute 2 = value2 and
Age = Child then Buy_Computer = No.
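
The age bins on this slide can be applied mechanically. The sketch below is illustrative only: `BINS`, `discretize` and `min_max_normalize` are names chosen for this example, and min-max scaling is just one common normalization method, assumed here because the slide does not say which one is meant.

```python
# Bin boundaries taken from the slide: (low, high, label), both ends inclusive.
BINS = [(0, 20, "Child"), (21, 47, "Young"), (48, 120, "Old")]


def discretize(age):
    """Map a numeric age to the label of the bin that contains it."""
    for low, high, label in BINS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside every bin")


def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical; avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


ages = [15, 18, 40, 33, 55, 48, 12, 23]
labels = [discretize(a) for a in ages]
scaled = min_max_normalize(ages)

print(labels)
print([round(s, 3) for s in scaled])
```

After discretization the three separate rules for ages 08, 09 and 10 all map to the single label Child, which is why one rule can replace them.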
Data Mining Functionalities (1)
• Data Preprocessing
- Attribute Selection/Feature Selection
• Selection of those attributes which are most relevant to the data
mining task.
• Advantage 1: Decreases the processing time of the mining task.
• Advantage 2: Generalizes the rules.
- Example
• Suppose our mining goal is to find which countries have more Cheat
cases, and at which Taxable Income.
• Then obviously the Date attribute will not be an important factor
in our mining task.

Date        Refund  Country    Taxable Income  Cheat
11/02/2002  Yes     USA        125K            No
13/02/2002  Yes     UK         100K            No
16/02/2002  No      Australia  120K            Yes
21/03/200   No      Australia  120K            Yes
Data Mining Functionalities (1)
• Data Preprocessing
- We will cover the following Attribute/Feature Selection
Techniques
• Principal Component Analysis
• Wrapper Based
• Filter Based
Data Mining Functionalities (2)
• Association Rule Mining
- In the Association Rule Mining framework we have to find all the
rules in a transactional/relational dataset whose support
(frequency) is at least some minimum support (min_sup)
threshold (provided by the user).
- For example, with min_sup = 50% (i.e. at least 2 of the 4 transactions):

Transaction ID  Items Bought
2000            Bread, Butter, Egg
1000            Bread, Butter, Egg
4000            Bread, Butter, Tea
5000            Butter, Ice cream, Cake

Itemset               Support
{Butter}              4
{Bread}               3
{Egg}                 2
{Bread, Butter}       3
{Bread, Butter, Egg}  2
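
The support counts in the table above can be reproduced with a brute-force count over every subset of every transaction. This is only a sketch for a tiny dataset (the number of subsets grows exponentially with transaction size), and the names `TRANSACTIONS` and `frequent_itemsets` are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Transactions from the slide, keyed by transaction ID.
TRANSACTIONS = {
    2000: {"Bread", "Butter", "Egg"},
    1000: {"Bread", "Butter", "Egg"},
    4000: {"Bread", "Butter", "Tea"},
    5000: {"Butter", "Ice cream", "Cake"},
}


def frequent_itemsets(transactions, min_sup):
    """Count every itemset in every transaction; keep those meeting min_sup."""
    counts = Counter()
    for items in transactions.values():
        for size in range(1, len(items) + 1):
            for combo in combinations(sorted(items), size):
                counts[frozenset(combo)] += 1
    min_count = min_sup * len(transactions)  # 50% of 4 transactions = 2
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


frequent = frequent_itemsets(TRANSACTIONS, min_sup=0.5)
for itemset, n in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), n)
```

Run on these four transactions, the count also reports {Bread, Egg} and {Butter, Egg}, each with support 2, in addition to the five itemsets listed on the slide.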
Data Mining Functionalities (2)
• Association Rule Mining
- Topics we will cover
• Frequent Itemset Mining Algorithms (Apriori, FP-Growth, Bit-vector).
• Fault-Tolerant/Approximate Frequent Itemset Mining.
• N-Most Interesting Frequent Itemset Mining.
• Closed and Maximal Frequent Itemset Mining.
• Incremental Frequent Itemset Mining
• Sequential Patterns.
- Projects
• Mining Fault-Tolerant Using Pattern-Growth.
• An application of Fault-Tolerant Frequent Patterns is missing values
imputation (Course Project).
Data Mining Functionalities (2)
• Classification and Prediction
- Finding models (functions) that describe and distinguish classes or
concepts for future prediction
- Example: Classify rainy/un-rainy cities based on the Temperature,
Humidity and Windy attributes.
- The previous business decisions (class labels) must be known
(Supervised Learning).

City        Temperature  Humidity  Windy  Rain
Lahore      hot          low       false  No
Islamabad   hot          high      true   Yes
Islamabad   hot          high      false  Yes
Multan      mild         low       false  No
Karachi     cool         normal    false  No
Rawalpindi  hot          high      true   Yes

Rule
• If Temperature = Hot & Humidity = High then Rain = Yes.

Prediction of unknown records

City    Temperature  Humidity  Windy  Rain
Murree  hot          high      false  ?
Sibi    mild         low       true   ?
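
One very simple way to turn the labelled records into a predictive model is a lookup table that stores the majority Rain label for each (Temperature, Humidity) combination. This is a sketch for illustration, not the classifier taught later in the course; `TRAINING`, `learn_rules` and `predict` are hypothetical names, and ignoring the Windy attribute is a simplifying assumption.

```python
from collections import Counter, defaultdict

# Labelled records from the slide: (city, temperature, humidity, windy, rain).
TRAINING = [
    ("Lahore", "hot", "low", False, "No"),
    ("Islamabad", "hot", "high", True, "Yes"),
    ("Islamabad", "hot", "high", False, "Yes"),
    ("Multan", "mild", "low", False, "No"),
    ("Karachi", "cool", "normal", False, "No"),
    ("Rawalpindi", "hot", "high", True, "Yes"),
]


def learn_rules(records):
    """Majority Rain label for each (temperature, humidity) pair seen in training."""
    votes = defaultdict(Counter)
    for _city, temperature, humidity, _windy, rain in records:
        votes[(temperature, humidity)][rain] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}


def predict(rules, temperature, humidity):
    """Return the learned label, or None when the combination was never seen."""
    return rules.get((temperature, humidity))


rules = learn_rules(TRAINING)
print("hot/high ->", predict(rules, "hot", "high"))
print("mild/low ->", predict(rules, "mild", "low"))
```

The learned entry for (hot, high) matches the slide's rule, so a hot, high-humidity record is predicted as Rain = Yes. The slide leaves the mild, low-humidity record open; this lookup table would answer No only because the single matching training record (Multan) had no rain, which is weak evidence.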
Data Mining Functionalities (2)
• Cluster Analysis
- Group data to form new classes, based on data without class labels.
- Business decisions are unknown (also called Unsupervised Learning).
- Example: Group rainy/un-rainy cities based on the Temperature,
Humidity and Windy attributes.

City        Temperature  Humidity  Windy  Rain
Lahore      hot          low       false  ?
Islamabad   hot          high      true   ?
Islamabad   hot          high      false  ?
Multan      mild         low       false  ?
Karachi     cool         normal    false  ?
Rawalpindi  hot          high      true   ?

(The slide marks 3 clusters in this data.)
Data Mining Functionalities (3)
• Outlier Analysis
- Outlier: A data object that does not comply with the general behavior
of the data.
- It can be considered as noise or an exception, but is quite useful in
fraud detection and rare events analysis.

City        Temperature  Humidity  Windy  Rain
Lahore      hot          low       false  ?
Islamabad   hot          high      true   ?
Islamabad   hot          high      false  ?
Multan      mild         low       false  ?
Karachi     cool         normal    false  ?
Rawalpindi  hot          high      true   ?

(The slide marks 2 outliers in this data.)
Are All the “Discovered” Patterns
Interesting?
• A data mining system/query may generate thousands of
patterns; not all of them are interesting.
- Suggested approach: query-based, constraint-based mining
• Interestingness Measures: A pattern is interesting if
it is easily understood by humans, valid on new or test
data with some degree of certainty, potentially useful,
novel, or validates some hypothesis that a user seeks to
confirm
Can We Find All and Only Interesting
Patterns?
• Find all the interesting patterns: Completeness
- Can a data mining system find all the interesting patterns?
- Remember that many problems in Data Mining are NP-complete or NP-hard,
so an exhaustive search is often infeasible.
- There is no single globally best solution for every problem.
• Search for only interesting patterns: Optimization
- Can a data mining system find only the interesting patterns?
- Approaches
• First generate all the patterns and then filter out the uninteresting
ones.
• Generate only the interesting patterns: constraint-based mining (give
threshold factors in mining)
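
The two approaches can be contrasted on a small market-basket example. The sketch below is an illustration under stated assumptions: the four transactions and the minimum support count of 2 are example data, the function names are hypothetical, and the second function uses a simplified level-wise search in the spirit of Apriori (it extends only itemsets that already meet the threshold) rather than a full implementation.

```python
from itertools import combinations

# Small example dataset (assumed for illustration).
TRANSACTIONS = [
    {"Bread", "Butter", "Egg"},
    {"Bread", "Butter", "Egg"},
    {"Bread", "Butter", "Tea"},
    {"Butter", "Ice cream", "Cake"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def generate_then_filter(transactions, min_count):
    """Approach 1: enumerate every possible itemset, then filter by support."""
    items = sorted(set().union(*transactions))
    candidates = [
        frozenset(c)
        for size in range(1, len(items) + 1)
        for c in combinations(items, size)
    ]
    frequent = {c for c in candidates if support(c, transactions) >= min_count}
    return frequent, len(candidates)


def constraint_based(transactions, min_count):
    """Approach 2: push the threshold into the search, level by level."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items]
    frequent, examined = set(), 0
    while level:
        examined += len(level)
        kept = [c for c in level if support(c, transactions) >= min_count]
        frequent.update(kept)
        # Build size k+1 candidates only from itemsets that survived at size k.
        level = list({a | b for a, b in combinations(kept, 2) if len(a | b) == len(a) + 1})
    return frequent, examined


all_frequent, n_all = generate_then_filter(TRANSACTIONS, min_count=2)
pruned_frequent, n_pruned = constraint_based(TRANSACTIONS, min_count=2)
print(n_all, "candidates examined without pruning")
print(n_pruned, "candidates examined with the threshold pushed into the search")
```

Both functions return the same frequent itemsets, but the second examines far fewer candidates because any itemset containing an infrequent item is never generated.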
Reading Assignment
• Book Chapter
- Chapter 1 of the book by Jiawei Han and Micheline Kamber,
“Data Mining: Concepts and Techniques”.
Data Mining ------- Where?
• Some Nice Resources
- ACM Special Interest Group on Knowledge Discovery and Data
Mining (SIGKDD) http://www.acm.org/sigs/sigkdd/.
- Knowledge Discovery Nuggets www.kdnuggets.com.
- IEEE Transactions on Knowledge and Data Engineering -
http://www.computer.org/tkde/.
- IEEE Transactions on Pattern Analysis and Machine Intelligence -
http://www.computer.org/tpami/.
- Data Mining and Knowledge Discovery - Publisher: Springer
Science+Business Media B.V., formerly Kluwer Academic
Publishers B.V. http://www.kluweronline.com/issn/1384-5810/.
- Current and previous offerings of Data Mining courses at
Stanford, CMU, MIT and Helsinki.
Text and Reference Material
• The course will be mainly based on research
literature; the following texts may, however, be
consulted:
1. Jiawei Han and Micheline Kamber. “Data Mining: Concepts and
Techniques”, 3rd Ed.
2. Provost, F. and Fawcett, T. (2013) Data Science for Business:
What you need to know about data mining and data-analytic
thinking, 1st edition, O'Reilly Media.
3. Witten I. H., Frank, E. and Hall, M. A. (2011) Data Mining:
Practical Machine Learning Tools and Techniques, 3rd Edition,
Morgan Kaufmann.
4. David Hand, Heikki Mannila and Padhraic Smyth. “Principles of
Data Mining”. Pub. Prentice Hall of India, 2004.
5. Usama M. Fayyad et al. “Advances in Knowledge Discovery and
Data Mining”, The MIT Press, 1996.
