
UNIT 1 Introduction of Data Mining

1) Large amounts of data are being generated from various sources like e-commerce, social media, sensors, and simulations. 2) Data mining combines traditional data analysis with sophisticated algorithms to discover useful patterns from large datasets. 3) Data mining can help businesses make better decisions by profiling customers and detecting fraud, and help scientists make discoveries through analysis of datasets like gene expression, fMRI images, and climate data.

Uploaded by

prajak


Data Mining: Introduction

Chapter 1

Introduction to Data Mining

6/30/2019 Introduction to Data Mining 1

Large-scale Data is Everywhere!
• There has been enormous data growth in both commercial and scientific databases due to advances in data generation and collection technologies.
• New mantra
– Gather whatever data you can whenever and wherever possible.
• Expectations
– Gathered data will have value either for the purpose collected or for a purpose not envisioned.

Fig: Example sources of large-scale data: E-Commerce, Cyber Security, Social Networking (Twitter), Traffic Patterns, Sensor Networks, Computational Simulations.

Data Mining

Data Mining: a technology that blends traditional analysis methods with sophisticated algorithms for processing large volumes of data.

DM = (Traditional analysis methods + Sophisticated algorithms) applied to large volumes of data

Why Data Mining? Commercial Viewpoint
Business:
• Bar code scanners, RFID, and smart card technology collect up-to-the-minute data about customer purchases.
• Retailers can use this information, together with data from e-commerce websites, to make better business decisions.
• Data mining techniques can be applied for
– Customer profiling
– Marketing
– Store layout
– Fraud detection
• This helps retailers answer important questions like
– “Who are the most profitable customers?”
– “What products can be cross-sold or up-sold?”
– “What is the revenue outlook of the company for next year?”

Why Data Mining? Scientific Viewpoint
Medicine, Science and Engineering
• Researchers accumulate data for new discoveries. Examples:
– NASA EOSDIS archives over petabytes of earth science data per year
– Telescopes scanning the skies (sky survey data)
– High-throughput biological data (gene expression data)
– Scientific simulations: terabytes of data generated in a few hours

Fig: Examples of scientific data: understanding Earth’s climate system, fMRI data from the brain, sky survey data, gene expression data, surface temperature of Earth.

Why Data Mining? Scientific Viewpoint (cont.)

Traditional methods are often not suitable for analyzing these huge amounts of data, so data mining techniques can help answer questions like
• “What is the relationship between the frequency and intensity of ecosystem disturbances, such as droughts and hurricanes, and global warming?”
• “How are land surface precipitation and temperature affected by ocean surface temperature?”
• “How well can we predict the beginning and end of the growing season for a region?”

Great Opportunities to Solve Society’s Major Problems
• Improving health care and reducing costs
• Predicting the impact of climate change
• Finding alternative/green energy sources
• Reducing hunger and poverty by increasing agriculture production
What is Data Mining?
• Many definitions
– Data mining is a technology that blends traditional data analysis methods with sophisticated algorithms for processing large volumes of data.
– Data mining is the process of automatically discovering useful information in large data repositories.
– Data mining is an integral part of knowledge discovery in databases (KDD), which is the overall process of converting raw data into useful information.

Data Mining: A KDD Process

– Data mining is the core of the knowledge discovery process.

Fig: KDD process stages: Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation.

7 Steps of KDD

• Data Integration: data is collected and integrated from different sources.
• Data Cleaning: data may contain errors, missing values, or inconsistencies; cleaning removes these anomalies.
• Data Selection: select only the data considered useful for data mining.
• Data Transformation: transform the cleaned data into forms appropriate for mining, using techniques such as smoothing, aggregation, and normalization.
• Data Mining: apply data mining techniques to discover interesting patterns in the data.
• Pattern Evaluation: includes visualization, transformation, and removal of redundant patterns from the patterns generated.
• Decisions / Use of Discovered Knowledge: use the acquired knowledge to make better decisions.
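The KDD steps above can be sketched as a chain of small functions, one per step. This is a toy illustration under stated assumptions, not the method from these notes: the records, the field names (`status`, `income`), min-max normalization as the transformation, "average value per group" as the mining step, and the 0.6 interestingness threshold are all hypothetical choices made for the example.

```python
from collections import defaultdict

def integrate(*sources):
    """Data integration: combine records from several sources into one list."""
    return [rec for src in sources for rec in src]

def clean(records):
    """Data cleaning: drop records with a missing income, then exact duplicates."""
    seen, cleaned = set(), []
    for rec in records:
        if rec["income"] is None:
            continue
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

def select(records, fields):
    """Data selection: keep only the attributes relevant to the task."""
    return [{f: rec[f] for f in fields} for rec in records]

def transform(records, field):
    """Data transformation: min-max normalize one numeric field to [0, 1]."""
    values = [rec[field] for rec in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # guard against all-equal values
    return [{**rec, field: (rec[field] - lo) / span} for rec in records]

def mine(records, group_field, value_field):
    """Data mining (toy stand-in): average of value_field for each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_field]].append(rec[value_field])
    return {g: sum(vals) / len(vals) for g, vals in groups.items()}

def evaluate(patterns, min_value):
    """Pattern evaluation: keep only patterns that pass an interestingness threshold."""
    return {g: v for g, v in patterns.items() if v >= min_value}

# Hypothetical sources; incomes (in K) borrowed from the tax table in these notes.
source_a = [
    {"tid": 1, "status": "Single", "income": 125},
    {"tid": 2, "status": "Married", "income": 100},
    {"tid": 3, "status": "Single", "income": 70},
]
source_b = [
    {"tid": 3, "status": "Single", "income": 70},      # duplicate of a source_a record
    {"tid": 4, "status": "Married", "income": 120},
    {"tid": 5, "status": "Divorced", "income": None},  # missing value
]

cleaned = clean(integrate(source_a, source_b))
prepared = transform(select(cleaned, ["status", "income"]), "income")
patterns = evaluate(mine(prepared, "status", "income"), min_value=0.6)
print({g: round(v, 3) for g, v in patterns.items()})  # {'Married': 0.727}
```

The final "decision" step is left to the reader of the printed patterns; in a real system it would feed a decision support system.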
The process of knowledge discovery in databases

Fig: The process of knowledge discovery in databases.

• Input data
• Pre-processing:
– Fusing data from multiple sources
– Cleaning data to remove noise and duplicates
– Selecting features or records that are relevant to the data mining task
– Transforming the raw input data into an appropriate format for analysis
• Post-processing:
– “Closing the loop” refers to integrating data mining results into a decision support system. For example, in a business application, data mining results can be integrated with campaign management for effective marketing promotions. This requires a post-processing step to ensure that only valid and useful results are incorporated into the decision support system.
What is (not) Data Mining?

• What is not Data Mining?
– Look up a phone number in a phone directory
– Query a Web search engine for information about “Amazon”
• What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
– Group together similar documents returned by a search engine according to their context (e.g., Amazon rainforest, Amazon.com)
Motivating Challenges

• Scalability:
– Novel data structures, out-of-core algorithms, parallel and distributed algorithms.
• High Dimensionality:
– Ex: data sets with temporal and spatial components tend to have high dimensionality.
• Heterogeneous and Complex Data:
– Collections of web pages containing semi-structured data and hyperlinks; DNA data with three-dimensional structure; climate data with time series measurements.
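To make "out-of-core algorithms" concrete, here is a minimal sketch that computes summary statistics in a single pass over fixed-size chunks, so the whole dataset never has to fit in memory. Welford's online algorithm is one standard choice for this; the `read_in_chunks` helper is a hypothetical stand-in for reading a large file piece by piece.

```python
from itertools import islice

def read_in_chunks(values, chunk_size):
    """Yield lists of at most chunk_size items; a stand-in for reading a large file piece by piece."""
    it = iter(values)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def streaming_mean_variance(chunks):
    """Welford's online algorithm: one pass, constant memory apart from the current chunk."""
    n, mean, m2 = 0, 0.0, 0.0
    for chunk in chunks:
        for x in chunk:
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    variance = m2 / n if n else 0.0  # population variance
    return n, mean, variance

n, mean, variance = streaming_mean_variance(read_in_chunks(range(1, 101), chunk_size=7))
print(n, mean, variance)
```

Only the running count, mean, and sum of squared deviations are kept between chunks, which is what allows this to scale to data larger than main memory.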
Motivating Challenges (cont.)

• Data Ownership and Distribution:
Key challenges faced by distributed data mining algorithms:
– How to reduce the amount of communication needed to perform the distributed computation
– How to effectively consolidate the data mining results obtained from multiple sources
– How to address data security issues
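One common way to address the first two challenges is to let each site compute a compact local summary and send only that summary to a coordinator, which consolidates the results. The sketch below assumes the summary is a per-item transaction count; the sites and items are hypothetical, and data security is not addressed here.

```python
from collections import Counter

def local_summary(transactions):
    """Runs at each site: count transactions containing each item.
    Only these counts leave the site, not the raw transactions."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))  # count an item at most once per transaction
    return counts, len(transactions)

def consolidate(summaries):
    """Runs at a coordinator: merge per-site counts into global support values."""
    total_counts, total_n = Counter(), 0
    for counts, n in summaries:
        total_counts.update(counts)  # Counter.update adds counts
        total_n += n
    return {item: c / total_n for item, c in total_counts.items()}

site_a = [["bread", "milk"], ["milk", "diapers"]]
site_b = [["bread", "diapers", "milk"], ["eggs"]]
support = consolidate([local_summary(site_a), local_summary(site_b)])
print(support["milk"])  # 0.75
```

Sending a few counts per item instead of every transaction reduces communication. Because counts add exactly, the consolidated result matches what a single-site computation would give.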

Motivating Challenges (cont.)

• Non-traditional Analysis:
– The traditional statistical approach is based on a hypothesize-and-test paradigm: a hypothesis is proposed, an experiment is designed to gather the data, and then the data is analyzed with respect to the hypothesis.
– Current data analysis often requires evaluating thousands of hypotheses, so there is a need to automate the process of hypothesis generation and evaluation.
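As a rough illustration of automated hypothesis generation, the sketch below treats every pair of attributes as a candidate hypothesis ("A and B are related") and scores each one by the absolute Pearson correlation. The attribute names and values are hypothetical, and ranking by correlation is a screening step only, not a substitute for proper statistical testing.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_pairwise_hypotheses(columns):
    """Generate one candidate hypothesis per attribute pair, score each by
    |correlation|, and return them strongest first."""
    scored = [((a, b), pearson(columns[a], columns[b])) for a, b in combinations(columns, 2)]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

columns = {
    "ad_spend": [1, 2, 3, 4, 5],
    "sales": [2, 4, 6, 8, 10],
    "temperature": [3, 1, 4, 1, 5],
}
ranked = rank_pairwise_hypotheses(columns)
for pair, r in ranked:
    print(pair, round(r, 3))
```

With d attributes this generates d(d-1)/2 hypotheses automatically, which is the kind of volume the slide says cannot be handled one experiment at a time.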

Origins of Data Mining
• Traditional techniques may be unsuitable because the data is
– Large-scale
– High dimensional
– Heterogeneous
– Complex
– Distributed
• To meet these challenges, researchers began to focus on developing more efficient and scalable tools that could handle diverse types of data.
• Draws ideas from
– Sampling, estimation, and hypothesis testing from statistics
– Search algorithms, modeling techniques, and learning theories from artificial intelligence, pattern recognition, and machine learning
– It has also been quick to adopt ideas from other areas, such as optimization and visualization

Origins of Data Mining
• The figure shows the relationship of data mining to other areas.
• Database systems provide efficient storage, indexing, and query processing.
• High-performance (parallel) computing helps address massive datasets.
• Distributed techniques help address the issue of size when data cannot be gathered in one location.

Data Mining Tasks

Data mining tasks are generally divided into two major categories:
1. Predictive Tasks
2. Descriptive Tasks
• Predictive Tasks:
– The objective is to predict the value of a particular attribute based on the values of other attributes.
– Attribute to be predicted: the target or dependent variable
– Attributes used for making the prediction: the explanatory or independent variables

Data Mining Tasks

• Descriptive Tasks:
– The objective is to derive patterns (correlations, trends, clusters, anomalies) that summarize the underlying relationships in data.
– These tasks are exploratory in nature and frequently require post-processing techniques to validate and explain the results.
– Find human-interpretable patterns that describe the data.

Fig: Four of the core data mining tasks, illustrated on the following data.

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
11   No      Married         60K             No
12   Yes     Divorced        220K            No
13   No      Single          85K             Yes
14   No      Married         75K             No
15   No      Single          90K             Yes
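The predictive-task terminology maps directly onto this table. The sketch below assumes Cheat is the attribute to be predicted (the target variable) and that Tid is only a record identifier, so it is excluded from the explanatory variables. Taxable income is stored as an integer number of thousands; that encoding is also an assumption.

```python
from collections import Counter

# Rows of the table above: (Tid, Refund, Marital Status, Taxable Income in K, Cheat)
ROWS = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
    (11, "No", "Married", 60, "No"),
    (12, "Yes", "Divorced", 220, "No"),
    (13, "No", "Single", 85, "Yes"),
    (14, "No", "Married", 75, "No"),
    (15, "No", "Single", 90, "Yes"),
]

# Explanatory (independent) variables: Refund, Marital Status, Taxable Income.
X = [(refund, status, income) for _tid, refund, status, income, _cheat in ROWS]
# Target (dependent) variable: Cheat.
y = [cheat for *_rest, cheat in ROWS]

print(len(X), Counter(y))
```

A predictive model would then be learned as a function from each tuple in `X` to the matching label in `y`.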

Predictive Modeling: Classification
• Task of building a model for the target variable as a function of the explanatory variables.
• Predictive modeling tasks are of two types:
– Classification
– Regression

Tid  Employed  Level of Education  # years at present address  Credit Worthy
1    Yes       Graduate            5                           Yes
2    Yes       High School         2                           No
3    No        Undergrad           1                           No
4    Yes       High School         10                          Yes
…    …         …                   …                           …

Model for predicting credit worthiness (decision tree):
Employed?
– No: Credit Worthy = No
– Yes: Education?
  – Graduate: Number of years > 3 yr → Yes; < 3 yr → No
  – {High school, Undergrad}: Number of years > 7 yrs → Yes; < 7 yrs → No
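The decision tree above can be written directly as a function of the explanatory variables, which makes the phrase "model for the target variable as a function of the explanatory variables" literal. The slide does not say what happens at exactly 3 or 7 years. This sketch assumes those boundary values go to the "No" branch, and it treats any education level other than Graduate as belonging to {High school, Undergrad}.

```python
def credit_worthy(employed: str, education: str, years_at_address: int) -> str:
    """Hand-coded version of the credit-worthiness tree; returns 'Yes' or 'No'."""
    if employed == "No":
        return "No"
    if education == "Graduate":
        # Assumption: exactly 3 years falls on the "No" side.
        return "Yes" if years_at_address > 3 else "No"
    # {High School, Undergrad}; assumption: exactly 7 years falls on the "No" side.
    return "Yes" if years_at_address > 7 else "No"

# Training records from the table: (Tid, Employed, Education, years at address, Credit Worthy)
TRAINING = [
    (1, "Yes", "Graduate", 5, "Yes"),
    (2, "Yes", "High School", 2, "No"),
    (3, "No", "Undergrad", 1, "No"),
    (4, "Yes", "High School", 10, "Yes"),
]

for tid, emp, edu, years, label in TRAINING:
    print(tid, credit_worthy(emp, edu, years), label)  # predicted vs. actual
```

Every listed training record is classified correctly, which is what one expects from a tree induced from that data.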

Classification Example
• Used for discrete target variables, e.g., a binary-valued target.

Training Set
Tid  Employed  Level of Education  # years at present address  Credit Worthy
1    Yes       Graduate            5                           Yes
2    Yes       High School         2                           No
3    No        Undergrad           1                           No
4    Yes       High School         10                          Yes
…    …         …                   …                           …

Test Set
Tid  Employed  Level of Education  # years at present address  Credit Worthy
1    Yes       Undergrad           7                           ?
2    No        Graduate            3                           ?
3    Yes       High School         2                           ?
…    …         …                   …                           …

Fig: Training Set → Learn Classifier → Model, which is then applied to the Test Set.
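The slides do not name a learning algorithm, so this sketch swaps in one of the simplest: a 1-nearest-neighbour classifier. "Learning" stores the scaled training vectors, and "applying the model" labels each test record with the class of its closest training record. The numeric encoding (Employed as 0/1, an assumed ordering High School < Undergrad < Graduate) and the min-max scaling are assumptions, and a decision tree learner could give different predictions.

```python
from math import dist

EDUCATION_RANK = {"High School": 0, "Undergrad": 1, "Graduate": 2}  # assumed ordering

def encode(employed, education, years):
    """Turn one record into a numeric vector (Employed as 0/1, education as a rank)."""
    return (1.0 if employed == "Yes" else 0.0, float(EDUCATION_RANK[education]), float(years))

def learn(training_set):
    """'Learn' a 1-nearest-neighbour classifier: remember min-max-scaled training vectors."""
    vectors = [encode(e, d, yrs) for e, d, yrs, _label in training_set]
    labels = [label for *_features, label in training_set]
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]

    def scale(v):
        return tuple((x - lo) / ((hi - lo) or 1.0) for x, lo, hi in zip(v, lows, highs))

    scaled = [scale(v) for v in vectors]

    def model(employed, education, years):
        """Apply the model: return the label of the closest training record."""
        query = scale(encode(employed, education, years))
        nearest = min(range(len(scaled)), key=lambda i: dist(query, scaled[i]))
        return labels[nearest]

    return model

training_set = [
    ("Yes", "Graduate", 5, "Yes"),
    ("Yes", "High School", 2, "No"),
    ("No", "Undergrad", 1, "No"),
    ("Yes", "High School", 10, "Yes"),
]
test_set = [("Yes", "Undergrad", 7), ("No", "Graduate", 3), ("Yes", "High School", 2)]

model = learn(training_set)
print([model(*record) for record in test_set])  # ['Yes', 'No', 'No']
```

Scaling matters here. Without it, the years column (range 1 to 10) would dominate the distance over the 0/1 and 0 to 2 columns.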

Examples of Classification Task

• Classifying credit card transactions as legitimate or fraudulent
• Classifying land covers (water bodies, urban areas, forests, etc.) using satellite data
• Categorizing news stories as finance, weather, entertainment, sports, etc.
• Identifying intruders in cyberspace
• Predicting whether tumor cells are benign or malignant
• Classifying secondary structures of proteins as alpha-helix, beta-sheet, or random coil

Classification: Application 1

• Fraud Detection
– Goal: Predict fraudulent cases in credit card transactions.
– Approach:
  • Use credit card transactions and the information on the account holder as attributes.
    – When does a customer buy, what does he buy, how often does he pay on time, etc.
  • Label past transactions as fraud or fair transactions. This forms the class attribute.
  • Learn a model for the class of the transactions.
  • Use this model to detect fraud by observing credit card transactions on an account.
Classification: Application 2

• Churn prediction for telephone customers
– Goal: Predict whether a customer is likely to be lost to a competitor.
– Approach:
  • Use detailed records of transactions with each of the past and present customers to find attributes.
    – How often the customer calls, where he calls, what time of day he calls most, his financial status, marital status, etc.
  • Label the customers as loyal or disloyal.
  • Find a model for loyalty.

From [Berry & Linoff] Data Mining Techniques, 1997

Regression (for continuous target variables)

• Predict the value of a given continuous-valued variable based on the values of other variables.
• Extensively studied in the statistics and neural network fields.
• Examples:
– Predicting sales amounts of a new product based on advertising expenditure.
– Forecasting the future price of a stock.
– Predicting the price of a house based on the values of other variables.
NOTE: The goal of both predictive tasks (classification and regression) is to find a model that minimizes the error between the predicted and actual values of the target variable.
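To connect the NOTE to a concrete procedure, here is a sketch for the "sales vs. advertising expenditure" example with a single explanatory variable. Ordinary least squares picks the line that minimizes the sum of squared errors. Squared error is one common choice of error measure, and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ w*x + b; minimizes the sum of squared errors."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

def sum_squared_error(xs, ys, w, b):
    """Total squared difference between actual and predicted target values."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))

ad_spend = [1, 2, 3, 4, 5]          # hypothetical advertising expenditure
sales = [3.1, 4.9, 7.2, 8.8, 11.0]  # hypothetical sales amounts
w, b = fit_line(ad_spend, sales)
print(round(w, 2), round(b, 2))  # 1.97 1.09
print(round(w * 6 + b, 2))       # predicted sales for ad_spend = 6: 12.91
```

Any other choice of slope or intercept gives a larger sum of squared errors on this data, which is exactly the "minimize the error" goal stated in the NOTE.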
Cluster Analysis

• Finds groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.

Fig: Intra-cluster distances are minimized; inter-cluster distances are maximized.
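The slide does not name a clustering algorithm. This sketch uses k-means, one widely used choice, on hypothetical 2-D points with hand-picked starting centroids so the result is deterministic. It then checks the property in the figure: the average intra-cluster distance is small compared with the distance between cluster centres.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its cluster (an empty cluster keeps its old centroid)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids

def mean_intra_distance(cluster, centroid):
    """Average distance from the cluster's points to its centroid (smaller is tighter)."""
    return sum(dist(p, centroid) for p in cluster) / len(cluster)

points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
clusters, centroids = kmeans(points, centroids=[(1, 1), (9, 9)])
intra = [mean_intra_distance(c, m) for c, m in zip(clusters, centroids)]
inter = dist(centroids[0], centroids[1])
print(clusters)
print(intra, inter)
```

k-means minimizes the within-cluster sum of squared distances. Well-separated groups then also end up far apart, although the algorithm does not directly maximize inter-cluster distance.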

Cluster Analysis - Examples

Examples:
• Group sets of related customers
• Find areas of the ocean that have a significant impact on Earth’s climate

Applications of Cluster Analysis
• Understanding
– Customer profiling for targeted marketing
– Group related documents for browsing
– Group genes and proteins that have similar functionality
– Group stocks with similar price fluctuations
• Summarization
– Reduce the size of large data sets

Fig courtesy: Michael Eisen

Clustering: Application 1

• Market Segmentation:
– Goal: Subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.
– Approach:
  • Collect different attributes of customers based on their geographical and lifestyle-related information.
  • Find clusters of similar customers.
  • Measure the clustering quality by observing the buying patterns of customers in the same cluster vs. those from different clusters.

Clustering: Application 2

• Document Clustering:
– Goal: Find groups of documents that are similar to each other based on the important terms appearing in them.
– Approach: Identify frequently occurring terms in each document, form a similarity measure based on the frequencies of different terms, and use it to cluster.

Document Clustering

Consider the collection of news articles in the table. The articles can be grouped based on their respective topics.
Document Clustering (cont.)

• Each article is represented as a set of word-frequency pairs (w, c), where
  w = word
  c = number of times the word appears in the article.
• There are two natural clusters in the data set:
1. The first 4 articles correspond to news about the economy.
2. The second 4 articles correspond to news about health care.
• A good clustering algorithm should be able to identify these two clusters based on the similarity between the words that appear in the articles.
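One common way to turn word-frequency pairs into a similarity measure is cosine similarity between the count vectors. The sketch below uses it with a simple greedy grouping rule. The articles and their word counts are hypothetical (the slide's own table is not reproduced here), and the 0.5 threshold and single-pass grouping are assumptions rather than the algorithm the slides have in mind.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two word -> count dictionaries."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_by_similarity(docs, threshold):
    """Greedy single pass: put each document in the first group that already
    contains a member at least `threshold` similar to it, else start a new group."""
    groups = []
    for name in docs:
        for group in groups:
            if any(cosine(docs[name], docs[other]) >= threshold for other in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

articles = {
    "econ_1": {"market": 3, "inflation": 2, "bank": 1},
    "econ_2": {"market": 2, "bank": 3, "rate": 1},
    "health_1": {"patient": 3, "drug": 2, "hospital": 1},
    "health_2": {"drug": 3, "patient": 1, "clinic": 2},
}
groups = group_by_similarity(articles, threshold=0.5)
print(groups)  # [['econ_1', 'econ_2'], ['health_1', 'health_2']]
```

Cosine similarity ignores document length, so a long and a short article about the same topic can still score as similar.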
Association Analysis

• Used to discover patterns that describe strongly associated features in the data.
• The discovered patterns are represented in the form of implication rules or feature subsets.
• Because the search space is exponential in size, the goal of association analysis is to extract the most interesting patterns in an efficient manner.
• Examples:
– Identifying web pages that are accessed together
– Understanding the relationships between different elements of Earth’s climate system
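To see why the search space is exponential, suppose the data contains d distinct items (the symbol d is introduced here for illustration). Every non-empty subset of items is a candidate itemset. A candidate rule X → Y needs non-empty, disjoint X and Y: each item goes into X, into Y, or into neither (3^d ways), minus the assignments with X empty (2^d) or Y empty (2^d), plus the one assignment counted twice (both empty).

$$
\#\text{itemsets} = 2^d - 1, \qquad R = 3^d - 2^{d+1} + 1
$$

For example, with d = 6 items there are 2^6 − 1 = 63 candidate itemsets and 3^6 − 2^7 + 1 = 729 − 128 + 1 = 602 candidate rules, so exhaustive enumeration quickly becomes infeasible as d grows.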

Association Rule Discovery: Definition

• Given a set of records, each of which contains some number of items from a given collection,
– produce dependency rules that predict the occurrence of an item based on occurrences of other items.

Association Analysis: Applications

• Market-basket analysis
– Rules are used for sales promotion, shelf management, and inventory management
• Medical Informatics
– Rules are used to find combinations of patient symptoms and test results associated with certain diseases

Market Basket Analysis - Association Analysis

Consider the transactions from sales data collected at a grocery store check-out counter.

Market Basket Analysis - Association Analysis

• Association rules can be applied to find items that are frequently bought together by customers.
• For example, the rule {Diapers} → {Milk}
– suggests that customers who buy diapers also tend to buy milk.
• This type of rule can be used to identify potential cross-selling opportunities among related items.
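A rule like {Diapers} → {Milk} is usually judged with two standard measures that these slides do not define: support (the fraction of transactions that contain all items of the rule) and confidence (how often the consequent appears in transactions that contain the antecedent). The transactions below are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

transactions = [
    {"Bread", "Milk"},
    {"Diapers", "Milk", "Eggs"},
    {"Diapers", "Milk", "Bread"},
    {"Diapers", "Beer"},
    {"Bread", "Eggs"},
]
rule_support = support({"Diapers", "Milk"}, transactions)
rule_confidence = confidence({"Diapers"}, {"Milk"}, transactions)
print(rule_support, round(rule_confidence, 3))  # 0.4 0.667
```

Here diapers appear in 3 of 5 baskets and diapers with milk in 2, so the rule has support 2/5 and confidence 2/3. A cross-selling analysis would keep only rules above chosen thresholds on both measures.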

Deviation/Anomaly/Change Detection
• Identifies observations/objects whose characteristics are significantly different from the rest of the data.
• Such observations are called anomalies or outliers.
• Applications:
– Credit card fraud detection
– Network intrusion detection
– Identifying malicious behavior in network devices such as sensors
– Unusual patterns of disease
– Ecosystem disturbances

Credit card fraud detection

• A credit card company records the transactions made by each card holder, along with personal information such as credit limit, age, annual income, and address.
• An anomaly detection technique can be applied to build a profile of legitimate transactions for each user.
• When a new transaction arrives, it is compared against the user's profile.
• If the characteristics of the transaction are very different from the previously created profile, the transaction is flagged as potentially fraudulent.
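A minimal sketch of the profile-and-compare idea, assuming the profile is just the mean and standard deviation of a user's past legitimate transaction amounts. A new amount is flagged when it lies more than 3 standard deviations from that mean. The single attribute, the z-score rule, the threshold of 3, and the amounts are all assumptions; a real profile would use many more attributes (time, merchant, location, and so on).

```python
from statistics import mean, stdev

def build_profile(amounts):
    """Summarize a user's past legitimate transaction amounts as (mean, sample std dev)."""
    return mean(amounts), stdev(amounts)

def is_suspicious(amount, profile, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the user's mean."""
    mu, sigma = profile
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > threshold

history = [42.0, 38.5, 51.0, 45.0, 40.0, 47.5]  # hypothetical past amounts
profile = build_profile(history)
for amount in (50.0, 900.0):
    print(amount, is_suspicious(amount, profile))
# 50.0 False
# 900.0 True
```

Because the profile is built per user, the same amount can be normal for one card holder and anomalous for another.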
