
data mining

Data mining is the process of extracting knowledge from large datasets using statistical and computational techniques to discover hidden patterns and relationships. It can be categorized into descriptive, predictive, and prescriptive types, each serving different purposes such as summarizing data, making forecasts, or providing recommendations. While data mining offers benefits like improved decision-making and increased efficiency, it also faces limitations such as data quality issues, model bias, ethical considerations, and technical challenges.

Uploaded by

tushikasahu5
Copyright
© All Rights Reserved

Exp 1: Overview of Data Mining


Definition
Data mining: the process of extracting information from huge amounts of data.
1. Data mining is the process of extracting knowledge or insights from large amounts
of data using various statistical and computational techniques. The data can be
structured, semi-structured or unstructured, and can be stored in various forms
such as databases, data warehouses, and data lakes.
The primary goal of data mining is to discover hidden patterns and relationships in
the data that can be used to make informed decisions or predictions. This involves
exploring the data using various techniques such as clustering, classification,
regression analysis, association rule mining, and anomaly detection.

2. Data mining is the process of discovering patterns and relationships in large
datasets using techniques such as machine learning and statistical analysis. The goal
of data mining is to extract useful information from large datasets and use it to
make predictions or inform decision-making. Data mining is important because it
allows organizations to uncover insights and trends in their data that would be
difficult or impossible to discover manually.
Types of Data Mining
There are many different types of data mining, but they can generally be grouped into
three broad categories: descriptive, predictive, and prescriptive.
• Descriptive data mining involves summarizing and describing the characteristics of
a data set. This type of data mining is often used to explore and understand the
data, identify patterns and trends, and summarize the data in a meaningful way.
• Predictive data mining involves using data to build models that can make
predictions or forecasts about future events or outcomes. This type of data mining
is often used to identify and model relationships between different variables, and
to make predictions about future events or outcomes based on those relationships.
• Prescriptive data mining involves using data and models to make recommendations
or suggestions about actions or decisions. This type of data mining is often used to
optimize processes, allocate resources, or make other decisions that can help
organizations achieve their goals.
Benefits of Data Mining
Data mining is the process of extracting useful information and insights from large data
sets. It is a powerful and flexible tool that has many benefits, including:
1. Improved decision-making – One of the main benefits of data mining is that it can
help organizations make better decisions. By analyzing data and uncovering hidden
patterns and trends, data mining can provide valuable insights and information that
can be used to inform and improve decision-making.

2. Increased efficiency and productivity – Data mining can also help organizations
increase their efficiency and productivity. By automating and streamlining the data
analysis process, data mining can save time and resources, and help organizations
work more effectively and efficiently.
3. Reduced costs – Data mining can also help organizations reduce their costs. By
identifying and addressing inefficiencies and waste, data mining can help
organizations save money and improve their bottom line.
4. Increased customer satisfaction – Data mining can also be used to improve
customer satisfaction. By analyzing data on customer behavior and preferences,
data mining can help organizations understand their customers better, and provide
more personalized and relevant products and services.
5. Improved risk management – Data mining can also be used to improve risk
management. By analyzing data on potential risks and vulnerabilities, data mining
can help organizations identify and mitigate potential risks, and make more
informed and strategic decisions.

Limitations of Data Mining


Data mining is a powerful and flexible tool for extracting useful information and insights
from large data sets. However, like any other tool, data mining has its limitations and
challenges. Some of the main limitations of data mining include:
1. Data quality – One of the main limitations of data mining is the quality of the data.
Data mining can only be as accurate and reliable as the data that it is based on, and
poor-quality data can lead to inaccurate or misleading results.

2. Model bias – Another limitation of data mining is the potential for bias in the
models that are built from the data. If the data is not representative of the
population, or if there is bias in the way the data is collected or analyzed, the
models that are built from the data may be biased, and may not accurately reflect
the underlying relationships in the data.

3. Ethical considerations – Data mining also raises ethical considerations. The data
that is collected and analyzed may be sensitive or personal, and organizations must
ensure that they handle this data responsibly and in compliance with relevant laws
and regulations.

4. Technical challenges – Data mining can also be technically challenging, especially
when dealing with large and complex data sets. Extracting useful information and
insights from data can require specialized skills and expertise, and can be
time-consuming and resource-intensive.
KDD Process
KDD (Knowledge Discovery in Databases) is a process that involves the extraction of
useful, previously unknown, and potentially valuable information from large datasets.
KDD is iterative: the steps below usually have to be repeated several times before
accurate knowledge can be extracted from the data. The KDD process includes the
following steps:
Data Integration
Data integration is the combining of heterogeneous data from multiple sources into a
common store (a data warehouse). It is carried out using data migration tools, data
synchronization tools, and the ETL (Extract, Transform, Load) process.
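A minimal sketch of integration, assuming two hypothetical sources (a CRM export and a billing export) that describe the same customers under different field names. The field names and the integrate helper are invented for illustration; a real warehouse load would normally go through an ETL tool rather than hand-written code.

```python
# Two hypothetical sources with different schemas for the same customers.
crm_rows = [
    {"cust_id": 1, "name": "Asha"},
    {"cust_id": 2, "name": "Ravi"},
]
billing_rows = [
    {"customer": 1, "total_spent": 250.0},
    {"customer": 3, "total_spent": 75.0},
]

def integrate(crm, billing):
    """Merge both sources into one record per customer id (an outer join)."""
    merged = {}
    for row in crm:
        merged[row["cust_id"]] = {
            "cust_id": row["cust_id"],
            "name": row["name"],
            "total_spent": None,
        }
    for row in billing:
        # Customers seen only in billing still get a record, with no name.
        record = merged.setdefault(
            row["customer"],
            {"cust_id": row["customer"], "name": None, "total_spent": None},
        )
        record["total_spent"] = row["total_spent"]
    return [merged[key] for key in sorted(merged)]

warehouse = integrate(crm_rows, billing_rows)
for record in warehouse:
    print(record)
```

Customers present in only one source keep None in the missing field, which is exactly the kind of gap the cleaning step has to deal with.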
Data Cleaning
Data cleaning is the removal of noisy and irrelevant data from the collection. It covers:
1. Cleaning in case of missing values.
2. Cleaning noisy data, where noise is random error or variance in a measured value.
3. Cleaning with data discrepancy detection and data transformation tools.
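The first two cleaning cases can be sketched as follows. Filling missing values with the mean and smoothing by bin means are only two of several common choices, picked here as assumptions; the function names and sample values are illustrative.

```python
from statistics import mean

def fill_missing(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

ages = [20, None, 30, 40]
filled = fill_missing(ages)

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, 3)

print(filled)
print(smoothed)
```

Mean imputation keeps the column average unchanged but shrinks its variance, so it is a convenience rather than a neutral choice.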
Data Selection
Data selection is the process in which the data relevant to the analysis is identified
and retrieved from the data collection. Methods such as neural networks, decision
trees, Naive Bayes, clustering, and regression can be used for this.
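In its simplest form, selection is a filter on rows plus a projection onto the attributes the task needs. The sketch below shows only that simple form, not the model-based methods named above; the records, field names, and select helper are hypothetical.

```python
# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "age": 25, "city": "Pune", "spend": 120.0, "notes": "n/a"},
    {"id": 2, "age": 41, "city": "Delhi", "spend": 300.0, "notes": "vip"},
    {"id": 3, "age": 33, "city": "Pune", "spend": 80.0, "notes": ""},
]

def select(records, attributes, predicate):
    """Keep the rows that satisfy predicate, projected onto the chosen attributes."""
    return [
        {name: row[name] for name in attributes}
        for row in records
        if predicate(row)
    ]

# Assumed task: study spending of customers in Pune, so only age and spend matter.
task_data = select(records, ["age", "spend"], lambda row: row["city"] == "Pune")
print(task_data)
```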
Data Transformation
Data transformation is the process of converting data into the form required by the
mining procedure. It is a two-step process:
1. Data mapping: assigning elements from the source base to the destination to capture
transformations.
2. Code generation: creating the actual transformation program.
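The two steps can be imitated in a few lines: a mapping table describes how each source field becomes a destination field, and a small builder turns that table into a transformation function. Here a closure stands in for the generated program; the field names, converters, and reference year are assumptions made for the example.

```python
# Step 1, data mapping: source field -> (destination field, conversion function).
mapping = {
    "cust_name": ("name", str.strip),
    "dob_year": ("age", lambda year: 2024 - int(year)),  # assumed reference year
    "spend": ("total_spent", float),
}

def build_transformer(mapping):
    """Step 2, stand-in for code generation: build a function from the mapping."""
    def transform(source):
        return {
            dest: convert(source[src])
            for src, (dest, convert) in mapping.items()
        }
    return transform

transform = build_transformer(mapping)
row = transform({"cust_name": " Asha ", "dob_year": "1994", "spend": "250"})
print(row)
```

Keeping the mapping as data means the transformation can be reviewed or changed without touching the code that applies it.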
Data Mining
Data mining is the application of techniques to extract potentially useful patterns. It
transforms task-relevant data into patterns and decides the purpose of the model
using classification or characterization.
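One concrete instance of turning task-relevant data into patterns is counting item pairs that occur together often. The sketch below is a brute-force pair count over made-up shopping baskets, not a full algorithm such as Apriori; the transactions and the min_count threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def frequent_pairs(transactions, min_count):
    """Count every pair of items bought together and keep the frequent ones."""
    counts = Counter()
    for items in transactions:
        # Sorting gives each pair one canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

patterns = frequent_pairs(transactions, min_count=2)
print(patterns)
```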
Pattern Evaluation
Pattern evaluation is the identification of the truly interesting patterns that represent
knowledge, based on given measures. It finds an interestingness score for each pattern and
uses summarization and visualization to make the data understandable to the user.
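The text does not say which measures are meant. Support and confidence are two widely used interestingness measures for association rules, and the sketch below computes them for one rule over made-up baskets. The data and function names are assumptions.

```python
# Hypothetical shopping baskets (illustrative data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, round(rule_confidence, 3))
```

A pattern would then be kept only if both scores clear thresholds chosen by the analyst.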
Knowledge Representation
This involves presenting the results in a way that is meaningful and can be used to make
decisions.
[Figure: the KDD process. Labels recovered from the diagram: Database, data integration, data clean, data select, data transformation, data reduction, data evaluation, information.]
