⇶Data Mining--2
→Data:
Data are the most elementary descriptions of things, events, activities, and
transactions.
→Information:
Information is processed or organized data that provides meaning or context
for understanding.
→Knowledge:
Knowledge is the understanding and awareness gained through experience,
education, or the acquisition of information.
Business Intelligence:
Business intelligence (BI) refers to the processes, technologies, and tools used to
collect, analyze, and present data to help businesses make informed decisions,
optimize performance, and gain competitive advantages.
What is data mining?
Data mining is the process of discovering patterns, trends, and insights from large
sets of data using statistical techniques, machine learning, and algorithms.
Applications of data mining include:
1. Healthcare
2. E-commerce
3. Telecommunications
4. Manufacturing
Common data mining tasks include:
1. Classification
2. Clustering
3. Association Rule Learning
4. Regression
5. Dimensionality Reduction
Data Warehousing:
Data warehousing is the process of collecting, storing, and managing large volumes
of structured data from different sources in a central repository, known as a data
warehouse, for analysis and reporting.
OLAP (Online Analytical Processing)
● OLAP tools allow users to interactively analyze data stored in the data
warehouse by performing complex queries and generating reports. OLAP
operations like drill-down (viewing detailed data), roll-up (viewing
aggregated data), and slice-and-dice (viewing data from different
perspectives) help in multi-dimensional analysis.
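The OLAP operations above can be sketched on a small in-memory fact table. This is a minimal illustration, not a real OLAP engine; the regions, cities, years, and sales figures are invented for the example.

```python
# A minimal sketch of OLAP-style operations on an in-memory fact table.
# The records, dimensions, and measures here are hypothetical examples.
from collections import defaultdict

# Fact table rows: (region, city, year, sales)
facts = [
    ("North", "Leeds", 2023, 100),
    ("North", "York",  2023, 150),
    ("North", "Leeds", 2024, 120),
    ("South", "Dover", 2023,  80),
    ("South", "Dover", 2024,  90),
]

def roll_up(rows):
    """Roll-up: aggregate sales from city level up to region level."""
    totals = defaultdict(int)
    for region, _city, _year, sales in rows:
        totals[region] += sales
    return dict(totals)

def drill_down(rows, region):
    """Drill-down: view the detailed city-level rows behind one region."""
    return [r for r in rows if r[0] == region]

def slice_by_year(rows, year):
    """Slice: fix one dimension (year) and keep all the others."""
    return [r for r in rows if r[2] == year]

print(roll_up(facts))                   # {'North': 370, 'South': 170}
print(len(drill_down(facts, "North")))  # 3 detailed rows
print(len(slice_by_year(facts, 2023)))  # 3 rows in the 2023 slice
```

Dice would simply apply two or more such filters at once (e.g., one region and one year).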
Data mining tools include:
1. DB Miner
2. Oracle
Data mining systems can be classified according to several criteria:
Database Mined:
Relational Database Mining: Focuses on extracting patterns from structured,
table-based databases.
Object-Oriented Database Mining: Involves mining complex data stored in
object-oriented databases.
Transactional Database Mining: Deals with discovering patterns from
transactional data like sales records.
Spatial, Temporal, and Multimedia Database Mining: Analyzes specialized
databases containing spatial, time-series, or multimedia data.
Knowledge Mined:
Descriptive Data Mining: Focuses on summarizing the data and uncovering
patterns or relationships, like clustering, association rules, and summaries.
Predictive Data Mining: Aims to predict unknown or future outcomes based on
historical data, using techniques like classification, regression, and time-series
analysis.
Techniques Utilized:
Classification: Assigns data into predefined categories (e.g., decision trees, neural
networks).
Clustering: Groups similar data points into clusters without predefined labels (e.g.,
k-means, DBSCAN).
Regression: Predicts continuous values based on input data (e.g., linear regression,
support vector machines).
Association: Identifies relationships or patterns between variables (e.g., Apriori,
FP-Growth).
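Of the techniques listed above, clustering is the simplest to sketch end to end. Below is a toy k-means implementation using only the standard library; the 2-D points, k = 2, and the naive "first k points" initialization are all illustrative choices, not how production libraries initialize.

```python
# A minimal sketch of k-means clustering on 2-D points (toy data, k = 2).
import math

def kmeans(points, k, iters=10):
    """Cluster 2-D points into k groups by alternating assign/update steps."""
    centroids = points[:k]  # naive init: first k points act as centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (Euclidean distance).
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Update each centroid to the mean of its assigned points.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]: the two groups separate
```

No labels are given in advance: the two groups emerge purely from the distances between points, which is what distinguishes clustering from classification.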
Application Adapted:
● Business: market analysis, customer segmentation, and customer
relationship management.
● Healthcare: diagnosis support and discovering patterns in patient records.
● Finance: credit scoring, risk assessment, and fraud detection.
● Retail: market-basket analysis and targeted promotions.
KDD PROCESS:
The KDD (Knowledge Discovery in Databases) process is a series of steps for
discovering useful knowledge from large datasets. It involves the following stages:
1. Data Selection: Identify relevant data from the dataset, focusing on useful
attributes.
2. Data Preprocessing: Clean and preprocess data to handle missing values,
noise, and inconsistencies.
3. Data Transformation: Transform data into a suitable format for mining (e.g.,
normalization, feature extraction).
4. Data Mining: Apply algorithms (e.g., classification, clustering) to discover
patterns and relationships in the data.
5. Interpretation/Evaluation: Interpret the mined results, evaluate their validity,
and extract actionable knowledge.
The KDD process helps in turning raw data into valuable insights for
decision-making.
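The five KDD stages can be walked through on a toy dataset. The records below and the one-rule "model" in step 4 are invented purely to make each stage concrete.

```python
# A toy walk-through of the five KDD stages; the records and the simple
# age-threshold "model" are illustrative assumptions, not a real pipeline.

raw = [
    {"age": 25, "income": 30000, "churn": "no"},
    {"age": 47, "income": None,  "churn": "yes"},  # missing value
    {"age": 52, "income": 82000, "churn": "yes"},
    {"age": 31, "income": 45000, "churn": "no"},
]

# 1. Data Selection: keep only the attributes relevant to the question.
selected = [{"age": r["age"], "income": r["income"], "churn": r["churn"]}
            for r in raw]

# 2. Data Preprocessing: drop records with missing values.
clean = [r for r in selected if r["income"] is not None]

# 3. Data Transformation: min-max normalize income into [0, 1].
incomes = [r["income"] for r in clean]
lo, hi = min(incomes), max(incomes)
for r in clean:
    r["income_norm"] = (r["income"] - lo) / (hi - lo)

# 4. Data Mining: a trivial one-rule classifier stands in for a real model.
def predict(record):
    return "yes" if record["age"] > 40 else "no"

# 5. Interpretation/Evaluation: measure the accuracy of the mined rule.
accuracy = sum(predict(r) == r["churn"] for r in clean) / len(clean)
print(accuracy)  # 1.0 on this tiny cleaned set
```

In practice each stage is far more involved, but the flow — select, clean, transform, mine, evaluate — is the same.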
Advantages of data mining:
1. Better Decision-Making: Extracts valuable insights for informed decisions.
2. Predictive Analytics: Forecasts trends and behaviors for future planning.
3. Cost Efficiency: Reduces operational costs by automating analysis.
4. Fraud Detection: Identifies anomalies to prevent fraud.
5. Personalization: Enables customized services and marketing.
Assignment:
Normalization table (1NF, 2NF, 3NF)
What is Database Normalization?
Normalization is a database design technique that reduces data redundancy and
eliminates undesirable characteristics like Insertion, Update and Deletion Anomalies.
(2NF):
● Rule 1 - Be in 1NF
● Rule 2 - Every non-key attribute must be fully functionally dependent on
the whole primary key; no non-key attribute may depend on only a
subset of a composite candidate key
Itemset
An itemset is a set containing one or more items in the transaction
dataset. For instance, {Milk}, {Milk, Bread}, {Tea, Ketchup}, and
{Milk, Tea, Coffee} are all itemsets.
•An itemset can also be an empty set.
•An itemset may combine items even if those items never appear
together in any single transaction of the dataset.
Support:
Support indicates how frequently an item appears in the data.
Support({milk, bread}) = Number of transactions containing
{milk, bread} / Total number of transactions
= 100 / 1000
= 10%
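The support computation above can be reproduced directly. The 1,000 transactions below are synthetic, constructed so that exactly 100 contain both milk and bread, matching the worked figures.

```python
# Reproducing the support figure from the worked example; the transactions
# are synthetic, built so that 100 of 1,000 contain {milk, bread}.
transactions = [{"milk", "bread"}] * 100 + [{"tea"}] * 900

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(support({"milk", "bread"}, transactions))  # 0.1, i.e. 10%
```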
Confidence:
Confidence is a measure of the likelihood that an itemset will
appear if another itemset appears.
Confidence("If a customer buys milk, they will also buy bread")
= Number of transactions containing
{milk, bread} / Number of transactions containing {milk}
= 100 / 200
= 50%
Lift:
Lift is a measure of the strength of the association between two
items, taking into account the frequency of both items in the
dataset:
Lift(milk → bread) = Confidence(milk → bread) / Support({bread})
A lift above 1 means the items occur together more often than
expected if they were independent; a lift below 1 means less often.
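Confidence and lift can be computed together on one synthetic dataset. The transaction counts below are arranged to match the worked figures above: 1,000 transactions, 200 containing milk, 100 of which also contain bread, plus an assumed 150 bread-only transactions so that bread's own support is defined.

```python
# Confidence and lift for the rule "milk -> bread" on synthetic data
# arranged to match the counts in the examples above (bread-only count
# of 150 is an added assumption).
transactions = ([{"milk", "bread"}] * 100 + [{"milk"}] * 100
                + [{"bread"}] * 150 + [{"tea"}] * 650)

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent) = support(both) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence divided by the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)

print(confidence({"milk"}, {"bread"}))  # 0.5, i.e. 50%
print(lift({"milk"}, {"bread"}))        # 2.0: milk buyers are twice as
                                        # likely as average to buy bread
```

A lift of 2.0 here signals a genuine association: knowing a customer bought milk doubles the expected chance of bread relative to its 25% baseline support.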
Assignment:
Data set that utilizes support, confidence, and lift.