⇶Data Mining--2

The document provides an overview of data mining, including definitions of data, information, and knowledge, as well as its applications in various industries such as marketing, finance, and healthcare. It details the data mining process, techniques, advantages, and disadvantages, along with the KDD process and database normalization. Additionally, it discusses the Apriori algorithm for association rule learning, its steps, advantages, and applications.

Uploaded by

Muhammad Ali
Copyright
© All Rights Reserved

⇶Data Mining:

→Data:
Data is the most elementary description of things, events, activities, and
transactions.

→Information:
Information is processed or organized data that provides meaning or context
for understanding.

→Knowledge:
Knowledge is the understanding and awareness gained through experience,
education, or the acquisition of information.

Business Intelligence:
Business intelligence (BI) refers to the processes, technologies, and tools used to
collect, analyze, and present data to help businesses make informed decisions,
optimize performance, and gain competitive advantages.
What is data mining?
Data mining is the process of discovering patterns, trends, and insights from large
sets of data using statistical techniques, machine learning, and algorithms.

Why data mining?


Decision Making: Data mining helps organizations make informed decisions by
providing insights based on historical and real-time data.

Predictive Analytics: It enables predictive modeling, allowing businesses to
forecast future trends and behaviors, which can improve planning and strategy.

Customer Insights: Companies can analyze customer behavior, preferences, and
purchasing patterns to enhance marketing strategies and improve customer service.

Application of Data mining:


1. Marketing and Retail
2. Finance and Banking
3. Healthcare
4. E-commerce
5. Telecommunications
6. Manufacturing

Steps of Data Mining:

Data Mining Techniques:


1. Classification
2. Clustering
3. Association Rule Learning
4. Regression
5. Anomaly Detection (Outlier Detection)
6. Dimensionality Reduction
7. Sequential Pattern Mining
8. Text Mining (Natural Language Processing)

Data Warehousing:
Data warehousing is the process of collecting, storing, and managing large volumes
of structured data from different sources in a central repository, known as a data
warehouse, for analysis and reporting.
OLAP (Online Analytical Processing)

● OLAP tools allow users to interactively analyze data stored in the data
warehouse by performing complex queries and generating reports. OLAP
operations like drill-down (viewing detailed data), roll-up (viewing
aggregated data), and slice-and-dice (viewing data from different
perspectives) help in multi-dimensional analysis.
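
The OLAP operations above can be sketched on a tiny in-memory table. This is only an illustration, not how an OLAP engine is implemented: the SALES rows, the dimension names, and the helper functions aggregate and slice_rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: (region, quarter, product, sales amount).
SALES = [
    ("North", "Q1", "Milk", 100),
    ("North", "Q2", "Milk", 120),
    ("North", "Q1", "Bread", 80),
    ("South", "Q1", "Milk", 90),
    ("South", "Q2", "Bread", 60),
]
DIMENSIONS = ("region", "quarter", "product")


def aggregate(rows, dims):
    """Sum the sales amount grouped by the given dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[p] for p in positions)
        totals[key] += row[3]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Slice: keep only the rows where one dimension has a fixed value."""
    position = DIMENSIONS.index(dim)
    return [row for row in rows if row[position] == value]


# Roll-up: view aggregated data (totals per region only).
roll_up = aggregate(SALES, ["region"])
# Drill-down: view more detailed data (totals per region and quarter).
drill_down = aggregate(SALES, ["region", "quarter"])
# Slice, then regroup: Q1 sales seen from the product perspective.
q1_by_product = aggregate(slice_rows(SALES, "quarter", "Q1"), ["product"])

print(roll_up)
print(drill_down)
print(q1_by_product)
```

Roll-up and drill-down use the same grouping function; only the number of dimensions kept in the key changes.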

OLTP (Online Transaction Processing)

● OLTP is a class of systems designed to manage and process transactional
data in real time. These systems handle high volumes of small, frequent
transactions, typically in a business environment where speed, reliability,
and accuracy are critical. OLTP systems are commonly used in everyday
applications like banking, retail, and customer management.

Data mining tools:
1. RapidMiner
2. DBMiner
3. Oracle

Classification of data mining:

Database Mined:
Relational Database Mining: Focuses on extracting patterns from structured,
table-based databases.
Object-Oriented Database Mining: Involves mining complex data stored in
object-oriented databases.
Transactional Database Mining: Deals with discovering patterns from
transactional data like sales records.
Spatial, Temporal, and Multimedia Database Mining: Analyzes specialized
databases containing spatial, time-series, or multimedia data.

Knowledge Mined:
Descriptive Data Mining: Focuses on summarizing the data and uncovering
patterns or relationships, like clustering, association rules, and summaries.
Predictive Data Mining: Aims to predict unknown or future outcomes based on
historical data, using techniques like classification, regression, and time-series
analysis.

Techniques Utilized:
Classification: Assigns data into predefined categories (e.g., decision trees, neural
networks).
Clustering: Groups similar data points into clusters without predefined labels (e.g.,
k-means, DBSCAN).
Regression: Predicts continuous values based on input data (e.g., linear regression,
support vector regression).
Association: Identifies relationships or patterns between variables (e.g., Apriori,
FP-Growth).

Application Adapted:
● Business:
● Healthcare:
● Finance:
● Retail:

KDD PROCESS:
The KDD (Knowledge Discovery in Databases) process is a series of steps for
discovering useful knowledge from large datasets. It involves the following stages:

1. Data Selection: Identify relevant data from the dataset, focusing on useful
attributes.
2. Data Preprocessing: Clean and preprocess data to handle missing values,
noise, and inconsistencies.
3. Data Transformation: Transform data into a suitable format for mining (e.g.,
normalization, feature extraction).
4. Data Mining: Apply algorithms (e.g., classification, clustering) to discover
patterns and relationships in the data.
5. Interpretation/Evaluation: Interpret the mined results, evaluate their validity,
and extract actionable knowledge.

The KDD process helps in turning raw data into valuable insights for
decision-making.
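
The five KDD stages can be sketched as a chain of small functions. This is a minimal sketch under stated assumptions: the RAW records and all function names are hypothetical, and the mining step is a placeholder threshold rule standing in for a real algorithm such as classification or clustering.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"customer": "a", "age": 25, "spend": 200.0, "note": "x"},
    {"customer": "b", "age": None, "spend": 900.0, "note": "y"},
    {"customer": "c", "age": 40, "spend": 250.0, "note": "z"},
    {"customer": "d", "age": 35, "spend": 1000.0, "note": "w"},
]


def select(records, attributes):
    """1. Data selection: keep only the relevant attributes."""
    return [{name: rec[name] for name in attributes} for rec in records]


def preprocess(records, field):
    """2. Preprocessing: fill missing values of `field` with the mean."""
    known = [rec[field] for rec in records if rec[field] is not None]
    fill = mean(known)
    return [
        {**rec, field: fill if rec[field] is None else rec[field]}
        for rec in records
    ]


def transform(records, field):
    """3. Transformation: min-max normalize `field` to the range [0, 1].

    Assumes the field takes at least two distinct values.
    """
    values = [rec[field] for rec in records]
    low, high = min(values), max(values)
    return [{**rec, field: (rec[field] - low) / (high - low)} for rec in records]


def mine(records, field, threshold=0.5):
    """4. Mining (placeholder): label each customer with a threshold rule."""
    return {
        rec["customer"]: "high" if rec[field] >= threshold else "low"
        for rec in records
    }


def evaluate(labels):
    """5. Evaluation: count how many customers fall in each group."""
    summary = {}
    for label in labels.values():
        summary[label] = summary.get(label, 0) + 1
    return summary


selected = select(RAW, ["customer", "age", "spend"])
cleaned = preprocess(selected, "age")
scaled = transform(cleaned, "spend")
labels = mine(scaled, "spend")
summary = evaluate(labels)
print(labels, summary)
```

Each stage takes the output of the previous one, which mirrors the order of the numbered steps.
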
Advantages of data mining:
1. Better Decision-Making: Extracts valuable insights for informed decisions.
2. Predictive Analytics: Forecasts trends and behaviors for future planning.
3. Cost Efficiency: Reduces operational costs by automating analysis.
4. Fraud Detection: Identifies anomalies to prevent fraud.
5. Personalization: Enables customized services and marketing.

Disadvantages of data mining:


1. Privacy Issues: Risk of personal data misuse or exposure.
2. Data Security: Vulnerable to breaches and unauthorized access.
3. High Costs: Implementation and maintenance can be expensive.
4. Data Quality: Inaccurate or incomplete data leads to misleading results.
5. Complexity: Requires skilled personnel and sophisticated tools.

Assignment:
Normalization table (1NF, 2NF, 3NF)
What is Database Normalization?
Normalization is a database design technique that reduces data redundancy and
eliminates undesirable characteristics like Insertion, Update and Deletion Anomalies.

⏩Assume a video library maintains a database of movies rented out.
Without any normalization in the database, all information is stored
in one table as shown below.
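First Normal Form (1NF):
● Each table cell should contain a single value.
● Each record needs to be unique.

Second Normal Form (2NF):
● Rule 1- Be in 1NF
● Rule 2- Has no partial dependencies: every non-key attribute depends
on the whole of each candidate key, not on a subset of it. A table in
1NF whose candidate keys are all single-column satisfies this
automatically.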

Third Normal Form (3NF):


● Rule 1- Be in 2NF
● Rule 2- Has no transitive functional dependencies
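
The first two normal forms can be illustrated with a small sketch. The rows below are hypothetical stand-ins for a video library table, the function names are made up for this example, and member names are assumed to be unique.

```python
# Hypothetical unnormalized table: one cell holds several movies.
UNNORMALIZED = [
    {"member": "Alice", "address": "1 First St", "movies": ["Movie A", "Movie B"]},
    {"member": "Bob", "address": "2 Second St", "movies": ["Movie C"]},
]


def to_1nf(rows):
    """1NF: one movie per row, so every cell holds a single value."""
    return [
        {"member": row["member"], "address": row["address"], "movie": movie}
        for row in rows
        for movie in row["movies"]
    ]


def to_2nf(rows_1nf):
    """2NF: the 1NF key is (member, movie), but address depends on member
    alone. Move that partial dependency into its own members table."""
    members = {}  # member name -> member record (names assumed unique)
    rentals = []
    for row in rows_1nf:
        if row["member"] not in members:
            members[row["member"]] = {
                "member_id": len(members) + 1,
                "member": row["member"],
                "address": row["address"],
            }
        rentals.append(
            {"member_id": members[row["member"]]["member_id"], "movie": row["movie"]}
        )
    return list(members.values()), rentals


rows_1nf = to_1nf(UNNORMALIZED)
members, rentals = to_2nf(rows_1nf)
print(members)
print(rentals)
```

After the split, each address is stored once, so changing it touches a single row. Reaching 3NF would follow the same pattern: any attribute that depends on another non-key attribute is moved into a table of its own.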
What is an Association Rule?
An association rule defines the dependency between two sets of items.
It describes how a particular item or set of items is related to
another set of items. For example:
{Bread, Butter} -> {Milk, Coffee}

Itemset
An itemset is a set containing one or more items in the transaction
dataset. For instance, {Milk}, {Milk, Bread}, {Tea, Ketchup}, and
{Milk, Tea, Coffee} are all itemsets.
•An itemset can also be an empty set.
•An itemset can contain certain items even if they are not present
together in the transaction dataset.

Support:
Support indicates how frequently an itemset appears in the data.
Support({milk, bread})
= Number of transactions containing {milk, bread} / Total number of transactions
= 100 / 1000
= 10%

Confidence:
Confidence is a measure of the likelihood that an itemset appears in
a transaction, given that another itemset appears in it.
Confidence({milk} -> {bread})
= Number of transactions containing {milk, bread} / Number of transactions containing {milk}
= 100 / 200
= 50%

Lift:
Lift is a measure of the strength of the association between two
items, taking into account the frequency of both items in the
dataset.
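
Lift is commonly computed as Lift(A -> B) = Confidence(A -> B) / Support(B). A value above 1 suggests the items occur together more often than expected if they were independent, a value near 1 suggests no association, and a value below 1 suggests they tend not to occur together. The sketch below computes all three measures. It assumes 1000 transactions with 200 containing milk and 100 containing both milk and bread; the 150 bread-only and 650 tea-only transactions are invented to complete the data set.

```python
# Hypothetical data set of 1000 transactions.
TRANSACTIONS = (
    [{"milk", "bread"}] * 100   # milk and bread together
    + [{"milk"}] * 100          # milk without bread
    + [{"bread"}] * 150         # assumption: bread without milk
    + [{"tea"}] * 650           # assumption: unrelated transactions
)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of both sides together divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


print(support({"milk", "bread"}, TRANSACTIONS))
print(confidence({"milk"}, {"bread"}, TRANSACTIONS))
print(lift({"milk"}, {"bread"}, TRANSACTIONS))
```

With these assumed counts, bread appears in 250 of 1000 transactions, so the lift of {milk} -> {bread} is 0.5 / 0.25 = 2.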

Assignment:
Data set that utilizes support, confidence, and lift.

Apriori algorithm in data mining:


❖ The Apriori algorithm was proposed by R. Agrawal and R. Srikant
in 1994 for finding frequent itemsets in a dataset in order to
mine Boolean association rules.
❖ The Apriori algorithm is an unsupervised machine learning
algorithm used for association rule learning.

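Steps of the Apriori Algorithm
● Define the minimum support and confidence thresholds.
● Create a list of frequent items (single items that meet minimum support).
● Create candidate itemsets of size k by joining frequent itemsets of size k-1.
● Calculate the support of each candidate.
● Prune the candidates that fall below minimum support.
● Iterate with larger itemsets until no new frequent itemsets are found.
● Generate association rules from the frequent itemsets.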
● Evaluate association rules.
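
The steps above can be sketched in a few dozen lines. This is a simplified illustration rather than an optimized implementation: it rescans every transaction for each candidate, and the BASKETS data, thresholds, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical transaction data set.
BASKETS = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def keep_frequent(candidates):
        # Count each candidate, then drop those below the threshold.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {
            c: n / total for c, n in counts.items() if n / total >= min_support
        }

    # Frequent single items.
    frequent = keep_frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        previous = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c
            for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
        }
        frequent = keep_frequent(candidates)
        result.update(frequent)
        k += 1
    return result


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    rules = []
    for itemset, itemset_support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                # Every subset of a frequent itemset is frequent,
                # so its support is already in the table.
                conf = itemset_support / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


frequent_itemsets = apriori(BASKETS, min_support=0.5)
strong_rules = generate_rules(frequent_itemsets, min_confidence=0.7)
for antecedent, consequent, conf in strong_rules:
    print(set(antecedent), "->", set(consequent), round(conf, 2))
```

The prune step relies on the property that gives the algorithm its efficiency: an itemset can only be frequent if all of its subsets are frequent.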
Flowchart:
Apriori Algorithm

Methods to Improve Apriori Efficiency


❖ Hash-Based Technique
❖ Transaction Reduction
❖ Partitioning
❖ Sampling
❖ Dynamic Itemset Counting

What are the advantages of the apriori algorithm in data mining?

● Simplicity & ease of implementation.


● The rules are human-readable and easy to interpret.
● Works well on unlabelled data.
● Flexibility & customisability.

Disadvantages of the APRIORI algorithm in data mining:


● Computational complexity.
● Time & space overhead.
● Difficulty handling sparse data.
● Limited discovery of complex patterns.
● Higher memory usage.

Applications of Apriori Algorithm


Education
Association rules can be extracted from data on admitted students,
using their traits and specializations.
Medical
Patient databases can be analyzed for associations, for example.
Forestry
Analysis of the frequency and intensity of forest fires using forest
fire data.
Recommendation and Autocomplete Tools
Apriori is reportedly employed by a number of firms, for example in
Amazon's recommender system and Google's autocomplete tool.
