Data Mining Note


Uploaded by

fikru


Data Mining is a technology that uses various techniques to discover hidden knowledge from heterogeneous and distributed historical data stored in large databases, warehouses and other massive information repositories.
Four main reasons why DM now:
• Competitive pressure is very strong
• Massive data collection
• Computing power is now available
• Commercial DM products and machine learning algorithms are available
Why is Data Mining important?
• Customer relationship management
• Credit ratings
• Targeted marketing
• Fraud detection/network intrusion detection
Data Mining Helps Extract Such Useful Information
Query examples:
1. Database
Find all credit applicants with first name ‘Alex’.
Identify customers who have purchased more than Birr 10,000 in the last month.
Find all customers who have purchased Bread.
2. Data Mining
Find all credit applicants who are not credit risks. (classification)
Identify customers with similar buying habits. (clustering)
Find all items which are frequently purchased with Bread. (association rules)
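The contrast between the two kinds of query can be sketched in Python. This is only a minimal illustration, not a real mining algorithm: the transactions, the function names and the 0.5 confidence threshold are assumptions made up for the example.

```python
from collections import Counter

# Hypothetical purchase records: (customer, items bought in one transaction).
transactions = [
    ("Alex", {"Bread", "Butter", "Milk"}),
    ("Sara", {"Bread", "Butter"}),
    ("Alex", {"Milk", "Coffee"}),
    ("Hana", {"Bread", "Milk", "Butter"}),
    ("Sara", {"Coffee"}),
]

def customers_who_bought(item):
    """Database-style query: an exact condition; the answer is a subset of the stored data."""
    return sorted({customer for customer, basket in transactions if item in basket})

def frequently_bought_with(item, min_confidence=0.5):
    """Mining-style query: count co-occurrences and keep the items that appear
    in at least `min_confidence` of the baskets containing `item`."""
    baskets = [basket for _, basket in transactions if item in basket]
    counts = Counter(other for basket in baskets for other in basket if other != item)
    return {other: n / len(baskets)
            for other, n in counts.items()
            if n / len(baskets) >= min_confidence}

print(customers_who_bought("Bread"))
print(frequently_bought_with("Bread"))
```

The first function returns rows that already exist in the data. The second returns a derived pattern (an item and how often it accompanies Bread) that is stored nowhere in the database.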
Data mining vs. database
Data mining
• Queries are poorly defined
• No precise/exact query language
• Works on non-operational data
• Output is not a subset of the database
Database
• Queries are well defined, using Structured Query Language (SQL)
• Works on operational data
• Output is precise and a subset of the database
DM vs. Data Warehouse
• A data warehouse provides the enterprise with a memory
• Data mining provides the enterprise with intelligence
Data warehouse: a relational database management system responsible for the collection and storage of data to support management decision making and problem solving.
- It enables managers and other business professionals to undertake data mining.
Data mart: a subset of a data warehouse for small and medium-size businesses or departments within larger companies.
A data warehouse is an integrated, subject-oriented, time-variant, non-volatile database that provides support for decision making.
A. Integrated: a centralized, consolidated database that integrates data derived from the entire organization.
B. Subject-oriented: the data warehouse contains data organized by topic, e.g. sales, marketing, finance.
C. Time-variant: in contrast to the operational database, which focuses on current transactions, the data warehouse represents the flow of data through time.
D. Non-volatile: once data enter the data warehouse, they are never removed.
Database & data warehouse: differences
- A data warehouse receives its data from operational databases.
- A data warehouse contains historical data over a long time horizon.
- The data warehouse environment is characterized by read-only transactions on very large data sets.
- The operational environment is characterized by numerous update transactions on a few data entities at a time.
Data Processing Technologies
1. OLAP (online analytical processing): an advanced data analysis environment that supports decision making.
2. Data mining: tools that analyze the data to uncover problems or opportunities hidden in the data relationships.
- OLAP provides top-down, query-driven analysis.
- Data mining provides bottom-up, discovery-driven analysis.
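A rough sketch of what "top-down, query-driven" means in practice: the analyst decides in advance which dimensions to aggregate over, then drills down by adding a dimension. The sales rows, the column layout and the roll_up helper are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, amount in Birr).
sales = [
    (2023, "North", "Bread", 120),
    (2023, "North", "Milk", 80),
    (2023, "South", "Bread", 200),
    (2024, "North", "Bread", 150),
    (2024, "South", "Milk", 90),
]
YEAR, REGION, PRODUCT = 0, 1, 2  # column positions of the dimensions

def roll_up(rows, *dims):
    """Sum the amount (last column) grouped by the chosen dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[-1]
    return dict(totals)

by_year = roll_up(sales, YEAR)                 # coarse, top-level view
by_year_region = roll_up(sales, YEAR, REGION)  # drill down one level
print(by_year)
print(by_year_region)
```

Every result here answers a question the analyst posed explicitly. A data mining tool would instead search the same rows for patterns nobody asked about.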
Business Intelligence
• BI takes advantage of data mining and data warehousing to help organizations gather their information in a more timely and valuable manner.
BI:
– keeps the organization informed about market trends,
– alerts it to new market potential,
– helps determine how competitors are doing.
Business Intelligence vs. Data Mining
Business intelligence is information about a company's past performance that is used to help predict the company's future performance.
- It is used to analyze and uncover information about past performance at an aggregate level.
Data mining allows users to sift through the enormous amount of information available in data warehouses.
- Data mining is more intuitive, allowing for increased insight beyond data warehousing.
- An implementation of data mining in an organization serves as a guide to uncovering inherent trends and tendencies in historical information. It also allows for statistical predictions, groupings and classifications of data.
Data Mining vs. Knowledge Discovery in Databases
KDD is often used as a synonym for data mining. Some authors define KDD as the whole process: data selection → data pre-processing (cleaning) → data transformation → mining → result evaluation → visualization.
- KDD is the process of finding useful information and patterns in data.
- Data mining, on the other hand, refers to the modeling step, which uses various techniques to extract useful information/patterns from the data.
- DM is the use of algorithms to extract hidden patterns and knowledge from data.
Stages in DM: The KDD process
• Selection: Obtain data from various heterogeneous sources such as databases, data warehouses, files, non-electronic records, etc.
• Preprocessing: Cleanse inconsistent and incorrect data; fill incomplete records; predict missing values; correct erroneous and anomalous data.
• Transformation: Convert data from different sources into a common new format. Apply data reduction and data categorization/binning to ease data mining.
• Mining: Apply classification or clustering techniques to obtain predictive or descriptive models.
• Interpretation/Evaluation: Present results to the user in a meaningful manner using various visualization and GUI strategies.
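The stages above can be sketched as a small pipeline. This is a toy illustration under stated assumptions: the records, the field names, the income threshold of 5000 and the "most common label per bin" model are all invented for the example, and the evaluation reuses the training records, which a real study would avoid.

```python
from collections import Counter
from statistics import mean

# Hypothetical records from two sources; None marks a missing value.
source_a = [{"age": 25, "income": 3000, "risk": "risky"},
            {"age": None, "income": 1200, "risk": "risky"}]
source_b = [{"age": 47, "income": 9000, "risk": "safe"},
            {"age": 33, "income": -50, "risk": "risky"},  # negative income: erroneous
            {"age": 52, "income": 7000, "risk": "safe"}]

def select(*sources):
    """Selection: gather records from several sources into one list."""
    return [dict(record) for source in sources for record in source]

def preprocess(records):
    """Preprocessing: drop erroneous records, fill missing ages with the mean age."""
    valid = [r for r in records if r["income"] >= 0]
    fill_age = round(mean([r["age"] for r in valid if r["age"] is not None]))
    return [{**r, "age": fill_age if r["age"] is None else r["age"]} for r in valid]

def transform(records, threshold=5000):
    """Transformation: bin the numeric income into two categories."""
    return [{**r, "income_bin": "high" if r["income"] >= threshold else "low"}
            for r in records]

def mine(records):
    """Mining: a trivial model, the most common risk label in each income bin."""
    labels_per_bin = {}
    for r in records:
        labels_per_bin.setdefault(r["income_bin"], Counter())[r["risk"]] += 1
    return {b: counts.most_common(1)[0][0] for b, counts in labels_per_bin.items()}

def evaluate(model, records):
    """Evaluation: fraction of records whose label the model reproduces."""
    hits = sum(model[r["income_bin"]] == r["risk"] for r in records)
    return hits / len(records)

prepared = transform(preprocess(select(source_a, source_b)))
model = mine(prepared)
print(model, evaluate(model, prepared))
```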
Data Mining Metrics
1. Return on Investment (ROI): compares the costs of DM techniques against the savings or benefits from their use
2. Accuracy in classification
– Analyze true positives, false positives and false negatives to calculate the recall and precision of the system
– Measure the percentage of correct classifications
3. Space/Time complexity
– Running time: how fast the algorithm runs
– Storage or memory space requirement
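As a worked illustration of the first two metrics, the sketch below computes precision, recall and accuracy from a list of actual and predicted labels, plus ROI using one common definition, (benefit - cost) / cost. The labels, the monetary figures and the function names are assumptions chosen for the example.

```python
def classification_metrics(actual, predicted, positive="fraud"):
    """Return precision, recall and accuracy for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    correct = sum(a == p for a, p in pairs)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": correct / len(pairs),
    }

def roi(benefit, cost):
    """Return on investment as a fraction of the cost (one common definition)."""
    return (benefit - cost) / cost

# Hypothetical evaluation data for a fraud classifier.
actual    = ["fraud", "fraud", "ok", "ok",    "ok", "fraud", "ok", "ok"]
predicted = ["fraud", "ok",    "ok", "fraud", "ok", "fraud", "ok", "ok"]

metrics = classification_metrics(actual, predicted)
print(metrics)
print(roi(benefit=150_000, cost=100_000))
```

Here 2 of the 3 predicted frauds are real (precision 2/3), 2 of the 3 real frauds are caught (recall 2/3), and 6 of the 8 records are classified correctly (accuracy 0.75).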
Data Mining implementation issues
• Scalability: data mining techniques must perform well on massive real-world data sets.
• Real-world data: real-world data are noisy and have many missing attribute values. Algorithms should be able to work even in the presence of these problems.
• Updates: a database cannot be assumed to be static. The data change frequently.
• High dimensionality: a conventional database schema may be composed of many different attributes. The problem is that not all attributes may be needed to solve a given DM problem.
• Overfitting: the size and representativeness of the dataset determine whether a model built from a given database state also fits future database states.
• Application: determining the intended use for the information obtained from the DM tool is a challenge.
