DataMining and Warehousing - chapter1

Data mining is the process of discovering patterns and insights from large datasets using techniques from statistics, machine learning, and artificial intelligence. Key processes include data cleaning, pattern discovery, prediction, and evaluation, with applications across various fields such as healthcare, finance, and retail. The document also outlines the components of data mining systems, including data source, preprocessing, mining engine, evaluation, and user interface layers.

Uploaded by

Bacha Tariku


Chapter 1

Overview

Brief description of Data Mining

• Data mining is the process of discovering patterns, correlations, trends, and useful knowledge from large sets of data.

• It combines techniques from statistics, machine learning, database systems, and artificial intelligence to uncover hidden insights from vast datasets.

• The ultimate goal is to extract valuable information that can be used for decision-making, prediction, and optimization across various fields.

Key processes in Data Mining

• Data Cleaning: Identifying and handling missing, noisy, or inconsistent data.
• Data Transformation: Structuring data into a suitable format for analysis.
• Pattern Discovery: Identifying patterns, associations, or trends in the data.
• Prediction: Using discovered patterns to make future predictions.
• Evaluation: Assessing the quality and relevance of the discovered patterns.

Data Mining Architecture

Use cases of Data Mining

• Market Basket Analysis
• Fraud Detection
• Customer Segmentation
• Predictive Maintenance
• Network Intrusion Detection

Types of Data Mining

• Descriptive data mining: involves summarizing and describing the characteristics of a data set.

• Predictive data mining: involves using data to build models that can make predictions or forecasts about future events or outcomes.

• Prescriptive data mining: involves using data and models to make recommendations or suggestions about actions or decisions. This type of data mining is often used to optimize processes, allocate resources, or make other decisions that can help organizations achieve their goals.
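
The three types can be illustrated on one small dataset. The sketch below is only an illustration: the sales figures, the stock level, and the function names `describe` and `fit_line` are assumptions made for this example and do not come from the slides. It summarizes the data (descriptive), fits a straight line by least squares to forecast the next month (predictive), and turns the forecast into a reorder suggestion (prescriptive).

```python
from statistics import mean, median, pstdev

# Hypothetical monthly sales figures (illustrative data only).
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def describe(values):
    """Descriptive: summarize the characteristics of the data."""
    return {
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(xs, ys):
    """Predictive: fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return slope, intercept


summary = describe(sales)
slope, intercept = fit_line(months, sales)
forecast_month_7 = slope * 7 + intercept

# Prescriptive: recommend an action based on the forecast.
current_stock = 145.0  # hypothetical stock level
reorder_quantity = max(0.0, forecast_month_7 - current_stock)

print(summary)
print(forecast_month_7, reorder_quantity)
```

A real predictive model would normally be validated on held-out data before its forecasts drive any decision; that step is omitted here for brevity.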

Data Mining vs Statistics

• Statistics is a field of mathematics that focuses on data collection, analysis, interpretation, and presentation using established theories and mathematical models.

• Data Mining is a process in computer science and artificial intelligence that involves automated discovery of patterns, relationships, and insights from large datasets, often using machine learning and algorithms.

• Both data mining and statistics involve analyzing data to find patterns, trends, and relationships. However, they differ in approach, purpose, and methodology.

Aspect | Statistics | Data Mining
Approach | Hypothesis-driven | Data-driven
Data Size | Works best with small to medium datasets | Works well with very large datasets
Process | Starts with a hypothesis, then tests it using data | Finds patterns automatically without prior hypotheses
Techniques | Regression, hypothesis testing, sampling | Clustering, classification, neural networks
Tools | R, SPSS, SAS | Python, Weka, Apache Spark, SQL

When to use...

Situation | Statistics | Data Mining
You have a hypothesis and want to test it. | Yes | No
You need to analyze a small, structured dataset. | Yes | No
You want to explore large datasets for hidden patterns. | No | Yes
You need to generate human-readable results. | Yes | No
You are working with complex, high-volume data. | No | Yes
You need predictions, not just explanations. | No | Yes

Challenges in data mining

• Data Quality
• Data Privacy and Security
• Data Complexity
• Interpretability
• Scalability

Ethical Concerns about Data Mining

Applications of data mining: Healthcare and Medicine

• Disease Prediction and Diagnosis
• Hospital Resource Optimization
• Drug Discovery and Development

Applications of data mining: Finance and Banking

• Fraud Detection
• Risk Management and Credit Scoring
• Stock Market Prediction

Applications of data mining: Retail and E-commerce

• Customer Behavior Analysis and Personalization
• Market Basket Analysis
• Demand Forecasting and Inventory Management

Applications of data mining: Manufacturing and Industry

• Predictive Maintenance
• Quality Control and Defect Detection
• Supply Chain Optimization

Applications of data mining: Education

• Student Performance Prediction
• Adaptive Learning Systems
• Exam Cheating Detection

Applications of data mining: Social Media and Entertainment

• Sentiment Analysis
• Recommendation Systems
• Fake News and Misinformation Detection

Components/layers of data mining systems

1. Data Source Layer (Data Collection & Storage)
• This is the foundation of any data mining system.
• It consists of various data sources such as:
o Databases (MySQL, PostgreSQL, MongoDB)
o Data Warehouses (Amazon Redshift, Snowflake)
o Flat Files (CSV, Excel, Text files)
o Big Data Repositories (Hadoop, Spark)

2. Data Preprocessing Layer (Data Cleaning & Transformation)
Before performing data mining, raw data must be prepared to ensure accuracy and efficiency. This layer consists of:
o Data Cleaning: Removing noise, handling missing values, and correcting inconsistencies.
o Data Integration: Combining data from multiple sources into a unified format.
o Data Transformation: Normalizing or aggregating data for better analysis.
o Data Reduction: Summarizing data to improve efficiency (e.g., dimensionality reduction using PCA).
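
Two of the preprocessing steps above, cleaning and transformation, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the data is made up, missing values are marked with `None`, and mean imputation and min-max scaling are just one common choice for each step, not the method prescribed by the slides. The function names `impute_missing` and `min_max_normalize` are hypothetical.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value (illustrative data only).
raw_ages = [25, None, 35, 45, None, 55]


def impute_missing(values):
    """Data cleaning: replace missing entries with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


cleaned = impute_missing(raw_ages)
normalized = min_max_normalize(cleaned)

print(cleaned)
print(normalized)
```

Mean imputation is simple but shrinks the variance of the column; whether that is acceptable depends on the analysis that follows.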

3. Data Mining Engine (Pattern Extraction & Processing)
This is the core component where actual data mining happens. It consists of various algorithms and techniques used for pattern recognition.
Functions of the Data Mining Engine:
o Classification & Prediction: Assigns data to categories (e.g., fraud detection).
o Clustering: Groups similar data points (e.g., customer segmentation).
o Association Rule Mining: Identifies relationships between items (e.g., “Customers who buy phones often buy earphones”).
o Anomaly Detection: Identifies unusual patterns (e.g., detecting cyber threats).
o Regression Analysis: Predicts numerical values (e.g., forecasting sales revenue).
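
As an illustration of one engine function, association rule mining, the sketch below computes the two standard measures for the "phones and earphones" rule: support (how often the items appear together) and confidence (how often earphones appear given that a phone was bought). The baskets are invented for the example, and the helper names `support` and `confidence` are hypothetical, not part of any particular library.

```python
# Hypothetical shopping baskets (illustrative data only).
transactions = [
    {"phone", "earphones", "case"},
    {"phone", "earphones"},
    {"phone", "charger"},
    {"laptop", "mouse"},
    {"phone", "earphones", "charger"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) over the baskets."""
    antecedent_support = support(antecedent, baskets)
    if antecedent_support == 0:
        return 0.0  # the rule never fires, so report zero confidence
    both = support(set(antecedent) | set(consequent), baskets)
    return both / antecedent_support


rule_support = support({"phone", "earphones"}, transactions)
rule_confidence = confidence({"phone"}, {"earphones"}, transactions)

print(rule_support, rule_confidence)
```

Here three of the five baskets contain both items and four contain a phone, so the rule "phone implies earphones" has support 3/5 and confidence 3/4. Algorithms such as Apriori exist to find such rules efficiently without enumerating every itemset.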

4. Pattern Evaluation & Knowledge Representation Layer
• This layer ensures that only useful, valid, and interesting patterns are retained for decision-making.
o Pattern Validation: Determines whether a discovered pattern is statistically significant.
o Interestingness Measures: Filters out unimportant trends.
o Visualization Tools: Displays patterns in user-friendly formats like charts, graphs, and dashboards.
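
One widely used interestingness measure for association rules is lift, which compares how often two items co-occur with how often they would co-occur if they were independent. The sketch below is an illustration only: the candidate rules, their counts, and the `MIN_LIFT` threshold are assumptions chosen for the example, and lift is one of several possible measures rather than the one the slides prescribe.

```python
def lift(n_total, n_a, n_b, n_both):
    """Lift of the rule A -> B computed from raw basket counts.

    lift > 1: A and B co-occur more often than independence would predict.
    lift = 1: no association. lift < 1: negative association.
    """
    p_a = n_a / n_total
    p_b = n_b / n_total
    p_both = n_both / n_total
    return p_both / (p_a * p_b)


# Hypothetical candidate rules:
# (name, total baskets, baskets with A, baskets with B, baskets with both)
candidates = [
    ("phone -> earphones", 100, 40, 30, 24),
    ("phone -> bread", 100, 40, 50, 20),
    ("phone -> laptop", 100, 40, 20, 4),
]

MIN_LIFT = 1.2  # assumed threshold, chosen only for illustration

interesting = [
    name
    for name, n_total, n_a, n_b, n_both in candidates
    if lift(n_total, n_a, n_b, n_both) >= MIN_LIFT
]

print(interesting)
```

With these counts the three rules have lift 2.0, 1.0, and 0.5, so only the first passes the filter. A rule with lift near 1 can still show high confidence simply because its consequent is popular, which is why confidence alone is a weak filter.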

5. User Interface Layer (Decision Making & Interaction)
• The final layer allows users to interact with the system and interpret results.
• It includes dashboards, query interfaces, and reporting tools.
Common Technologies Used:
o BI Tools (Power BI, Tableau, QlikView)
o Statistical Software (SAS, R, SPSS)
o Machine Learning Libraries (Scikit-learn, TensorFlow)

Data mining functionalities

• Classification
• Clustering
• Regression
• Association Rule Learning
• Anomaly Detection
• Time Series Analysis
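
Of the functionalities listed, anomaly detection is perhaps the easiest to show in a few lines. The sketch below flags values whose z-score (distance from the mean in standard deviations) exceeds a threshold. It is a minimal sketch under stated assumptions: the transaction amounts are invented, the function name `zscore_anomalies` is hypothetical, and the threshold of 2.0 is an arbitrary choice for this example.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [
        (i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold
    ]


# Hypothetical daily transaction amounts; 500 is a planted outlier.
amounts = [52, 48, 50, 51, 49, 500, 50, 47, 53, 50]
anomalies = zscore_anomalies(amounts)

print(anomalies)
```

The z-score approach assumes roughly bell-shaped data, and a large outlier inflates the very mean and standard deviation used to detect it. More robust variants use the median and the median absolute deviation instead.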

Are All Patterns Important?
