Data Mining

The document discusses data mining, which is defined as extracting information from large datasets. It covers topics like knowledge discovery, classification, prediction, decision trees, clustering, and web mining. It also discusses applications of data mining like market analysis, risk management, and fraud detection.

Uploaded by

naackrmu2023

Data Mining

Data Mining is defined as the procedure of extracting information from huge sets of data. In other words,
we can say that data mining is mining knowledge from data. The tutorial starts off with a basic overview
and the terminologies involved in data mining and then gradually moves on to cover topics such as
knowledge discovery, query language, classification and prediction, decision tree induction, cluster
analysis, and how to mine the Web.

The information or knowledge extracted in this way can be used for any of the following applications −
● Market Analysis
● Fraud Detection
● Customer Retention
● Production Control
● Science Exploration

Data Mining Applications


Data mining is highly useful in the following domains −
● Market Analysis and Management
● Corporate Analysis & Risk Management
● Fraud Detection
Apart from these, data mining can also be used in the areas of production control, customer retention,
science exploration, sports, astronomy, and Internet Web Surf-Aid.

Market Analysis and Management


Listed below are the various fields of market where data mining is used −
● Customer Profiling − Data mining helps determine what kind of people buy what kind of products.
● Identifying Customer Requirements − Data mining helps in identifying the best products for different
customers. It uses prediction to find the factors that may attract new customers.
● Cross Market Analysis − Data mining finds associations and correlations between product sales.
● Target Marketing − Data mining helps to find clusters of model customers who share the same
characteristics such as interests, spending habits, income, etc.
● Determining Customer Purchasing Patterns − Data mining helps in determining customer purchasing patterns.
● Providing Summary Information − Data mining provides various multidimensional summary reports.

Corporate Analysis and Risk Management


Data mining is used in the following fields of the Corporate Sector −
● Finance Planning and Asset Evaluation − It involves cash flow analysis and prediction, and contingent claim
analysis to evaluate assets.
● Resource Planning − It involves summarizing and comparing the resources and spending.
● Competition − It involves monitoring competitors and market directions.

Fraud Detection
Data mining is also used in credit card services and telecommunications to detect fraud. For
fraudulent telephone calls, it helps to identify the destination of the call, the duration of the call, the time of day or
week, etc. It also analyzes patterns that deviate from expected norms.

Data mining functions specify the kinds of trends or correlations to be found in data mining
activities.
Broadly, data mining activities can be divided into two categories:
1. Descriptive Data Mining:
It summarizes what is happening within the data without any prior hypothesis,
highlighting the common features of the data set.
For example: count, average, etc.
2. Predictive Data Mining:
It assigns values to attributes that are unlabeled or missing. Based on previously seen
cases, the software estimates the characteristics that are absent.
For example: judging from the findings of a patient’s medical examinations whether he is
suffering from any particular disease.

Data Mining Functionality:


1. Class/Concept Descriptions:
Data can be associated with classes or concepts. It is helpful to describe individual classes and
concepts in summarized, concise, and yet precise terms.
Such descriptions of a class or concept are referred to as class/concept descriptions.
● Data Characterization:
This refers to summarizing the general characteristics or features of the class that is
under study. For example, to study the characteristics of software products whose
sales increased by 15% two years ago, one can collect the data related to
such products by running SQL queries.
● Data Discrimination:
It compares the general features of the class under study with those of one or more
contrasting classes. The output of this process can be represented in many forms,
e.g., bar charts, curves, and pie charts.
2. Mining Frequent Patterns, Associations, and Correlations:
Frequent patterns are patterns that occur most commonly in the data.
Different kinds of frequent patterns can be observed in a dataset.
● Frequent item set:
This refers to a set of items that are regularly seen together, e.g., milk and
sugar.
● Frequent Subsequence:
This refers to a sequence of events that occurs regularly, such as purchasing a phone
followed by a back cover.
● Frequent Substructure:
It refers to different kinds of data structures, such as trees and graphs, that may be
combined with an itemset or subsequence.

Association Analysis:
This process involves uncovering relationships in the data and determining the association
rules. It is a way of discovering relationships between various items. For example, it can
be used to determine which items are frequently purchased together.
Correlation Analysis:
Correlation is a mathematical technique that can show whether and how strongly pairs of
attributes are related to each other. For example, taller people tend to weigh more.
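
As a small illustration of correlation analysis, the sketch below computes the Pearson correlation coefficient for two attributes. Pearson's coefficient is one common choice and is an assumption here, since the text does not name a specific measure. The height and weight values are made-up sample data, and the function name pearson is illustrative only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample: heights in cm and weights in kg.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 88]

r = pearson(heights, weights)
print(round(r, 3))  # a value close to +1 indicates a strong positive relationship
```

A value near +1 means the attributes rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship.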

Data mining deals with the kinds of patterns that can be mined. On the basis of the kind of data to be mined, there
are two categories of functions involved in Data Mining −
● Descriptive
● Classification and Prediction

Descriptive Function
The descriptive function deals with the general properties of data in the database. Here is the list of descriptive
functions −
● Class/Concept Description
● Mining of Frequent Patterns
● Mining of Associations
● Mining of Correlations
● Mining of Clusters

Class/Concept Description
Class/Concept refers to the data to be associated with classes or concepts. For example, in a company, the
classes of items for sale include computers and printers, and concepts of customers include big spenders and
budget spenders. Such descriptions of a class or a concept are called class/concept descriptions. These descriptions
can be derived in the following two ways −
● Data Characterization − This refers to summarizing the data of the class under study. This class under study is
called the Target Class.
● Data Discrimination − It refers to comparing the target class with some predefined contrasting group or
class.
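
The two ways of deriving a description can be sketched as follows, assuming a tiny made-up customer table. The record layout, the segment names, and the helper names characterize and discriminate are all hypothetical. Characterization summarizes one target class, and discrimination places that summary next to the summary of a contrasting class.

```python
from statistics import mean

# Hypothetical customer records: (segment, yearly_spend, visits_per_month).
customers = [
    ("big spender", 5200, 9),
    ("big spender", 4800, 7),
    ("budget spender", 900, 2),
    ("budget spender", 1100, 3),
    ("budget spender", 1000, 1),
]

def characterize(records, target):
    """Summarize the general features of one class (the target class)."""
    rows = [r for r in records if r[0] == target]
    return {
        "count": len(rows),
        "avg_spend": mean(r[1] for r in rows),
        "avg_visits": mean(r[2] for r in rows),
    }

def discriminate(records, target, contrast):
    """Pair each feature of the target class with that of a contrasting class."""
    t = characterize(records, target)
    c = characterize(records, contrast)
    return {key: (t[key], c[key]) for key in t}

print(characterize(customers, "big spender"))
print(discriminate(customers, "big spender", "budget spender"))
```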

Mining of Frequent Patterns


Frequent patterns are those patterns that occur frequently in transactional data. Here is the list of kinds of frequent
patterns −
● Frequent Item Set − It refers to a set of items that frequently appear together, for example, milk and bread.
● Frequent Subsequence − A sequence of patterns that occurs frequently, such as purchasing a camera
followed by a memory card.
● Frequent Sub Structure − Substructure refers to different structural forms, such as graphs, trees, or lattices,
which may be combined with item-sets or subsequences.
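
A frequent item set can be found, in the simplest case, by counting how often each pair of items appears in the same transaction. The sketch below is a brute-force illustration on made-up baskets, not an efficient algorithm such as Apriori. The name frequent_pairs and the min_support parameter are assumptions introduced for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "biscuits"},
    {"milk", "bread", "biscuits"},
    {"milk", "sugar"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (fraction of baskets) is >= min_support."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}

print(frequent_pairs(transactions, min_support=0.5))
```

With these baskets, only the pair bread and milk appears in at least half of the transactions.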

Mining of Association
Associations are used in retail sales to identify items that are frequently purchased together. Mining
associations refers to the process of uncovering relationships among data and determining association
rules.
For example, a retailer generates an association rule that shows that milk is sold with bread 70% of the time,
while biscuits are sold with bread only 30% of the time.
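
The percentages in such a rule correspond to what is usually called the confidence of the rule: among baskets that contain bread, the fraction that also contain the other item. The sketch below reproduces the 70% and 30% figures on made-up baskets constructed for this purpose. The helper names count and confidence are illustrative.

```python
# Hypothetical baskets: 10 contain bread, 7 of those contain milk,
# and 3 of those contain biscuits.
baskets = (
    [{"bread", "milk"}] * 5
    + [{"bread", "milk", "biscuits"}] * 2
    + [{"bread", "biscuits"}]
    + [{"bread"}] * 2
)

def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b)

def confidence(antecedent, consequent, baskets):
    """Confidence of the rule antecedent -> consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)

print(confidence({"bread"}, {"milk"}, baskets))      # bread -> milk
print(confidence({"bread"}, {"biscuits"}, baskets))  # bread -> biscuits
```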

Mining of Correlations
It is a kind of additional analysis performed to uncover interesting statistical correlations between
associated attribute-value pairs or between two item sets, to determine whether they have a positive, negative,
or no effect on each other.
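
One common way to judge whether two items have a positive, negative, or no effect on each other is the lift measure. Lift is an assumption here, since the text does not name a measure. The baskets below are made up: a lift above 1 suggests a positive correlation, a lift below 1 a negative one, and a lift equal to 1 suggests independence.

```python
# Hypothetical baskets.
baskets = (
    [{"coffee", "sugar"}] * 4
    + [{"coffee"}]
    + [{"tea", "sugar"}]
    + [{"tea"}] * 4
)

def lift(a, b, baskets):
    """Lift of items a and b: P(a and b) / (P(a) * P(b))."""
    n = len(baskets)
    n_a = sum(1 for t in baskets if a in t)
    n_b = sum(1 for t in baskets if b in t)
    n_ab = sum(1 for t in baskets if a in t and b in t)
    # Algebraically the same as (n_ab / n) / ((n_a / n) * (n_b / n)).
    return n * n_ab / (n_a * n_b)

print(lift("coffee", "sugar", baskets))  # above 1: bought together more than expected
print(lift("tea", "sugar", baskets))     # below 1: bought together less than expected
```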

Mining of Clusters
A cluster is a group of similar objects. Cluster analysis refers to forming groups of objects
that are very similar to each other but highly different from the objects in other clusters.
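
As an illustration, the sketch below groups one-dimensional values with a very small version of k-means. The text does not name a clustering algorithm, so k-means is an assumption chosen for brevity. The ages, the starting centers, and the function name kmeans_1d are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means for one-dimensional data with given starting centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical customer ages.
ages = [18, 19, 21, 22, 45, 47, 50, 52]
centers, clusters = kmeans_1d(ages, centers=[20, 40])
print(centers)
print(clusters)
```

The values inside each resulting group are close to one another and far from the values in the other group, which is the property cluster analysis aims for.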

Classification and Prediction


Classification is the process of finding a model that describes the data classes or concepts. The purpose is to be
able to use this model to predict the class of objects whose class label is unknown. This derived model is based on
the analysis of sets of training data. The derived model can be presented in the following forms −
● Classification (IF-THEN) Rules
● Decision Trees
● Mathematical Formulae
● Neural Networks
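
To make the idea of a derived model concrete, the sketch below learns a single IF-THEN rule from labeled training data. It is a one-level decision tree on one attribute, which is far simpler than real decision tree induction. The training data, the class labels, and the function names are made up for illustration.

```python
# Hypothetical training data: (yearly_income, class_label).
training = [
    (20000, "budget"), (25000, "budget"), (30000, "budget"),
    (60000, "big"), (75000, "big"), (90000, "big"),
]

def learn_threshold(samples):
    """Pick the income threshold that misclassifies the fewest training samples.

    The learned model is the rule:
    IF income >= threshold THEN "big" ELSE "budget".
    """
    values = sorted(v for v, _ in samples)
    # Candidate thresholds: midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum(
            1 for v, label in samples
            if ("big" if v >= threshold else "budget") != label
        )

    return min(candidates, key=errors)

threshold = learn_threshold(training)

def classify(income):
    """Apply the learned rule to an object whose class label is unknown."""
    return "big" if income >= threshold else "budget"

print(threshold, classify(40000), classify(80000))
```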
The functions involved in these processes are as follows −
● Classification − It predicts the class of objects whose class label is unknown. Its objective is to find a
derived model that describes and distinguishes data classes or concepts. The derived model is based on the
analysis of a set of training data, i.e., data objects whose class labels are known.
● Prediction − It is used to predict missing or unavailable numerical data values rather than class labels.
Regression analysis is generally used for prediction. Prediction can also be used to identify
distribution trends based on available data.
● Outlier Analysis − Outliers may be defined as the data objects that do not comply with the general behavior
or model of the data available.
● Evolution Analysis − Evolution analysis refers to describing and modeling regularities or trends for objects
whose behavior changes over time.
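
Since regression analysis is mentioned as the usual tool for prediction, here is a minimal sketch of simple linear regression fitted by least squares. The advertising and sales figures are made up and deliberately lie on a straight line so the result is easy to check. The name fit_line is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend and resulting sales.
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]

slope, intercept = fit_line(ad_spend, sales)

# Predict the unavailable numerical value for a spend of 6.
predicted = slope * 6 + intercept
print(slope, intercept, predicted)
```

Unlike classification, the output here is a numerical value rather than a class label.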

Data Mining Task Primitives


● We can specify a data mining task in the form of a data mining query.
● This query is input to the system.
● A data mining query is defined in terms of data mining task primitives.
Note − These primitives allow us to communicate in an interactive manner with the data mining system. Here is
the list of Data Mining Task Primitives −
● Set of task relevant data to be mined.
● Kind of knowledge to be mined.
● Background knowledge to be used in discovery process.
● Interestingness measures and thresholds for pattern evaluation.
● Representation for visualizing the discovered patterns.
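
One way to picture a data mining query is as a record with one field per primitive. The sketch below is purely illustrative: the class MiningQuery, its field names, and the example values are hypothetical and do not correspond to any real data mining query language.

```python
from dataclasses import dataclass, field

@dataclass
class MiningQuery:
    """Hypothetical container for the five data mining task primitives."""
    relevant_data: list                  # task-relevant attributes or dimensions
    knowledge_kind: str                  # e.g. "association", "classification"
    background_knowledge: dict = field(default_factory=dict)  # e.g. concept hierarchies
    thresholds: dict = field(default_factory=dict)            # interestingness measures
    presentation: str = "table"          # how discovered patterns are displayed

query = MiningQuery(
    relevant_data=["customer.age", "customer.income", "sales.item"],
    knowledge_kind="association",
    background_knowledge={"city": "country"},
    thresholds={"min_support": 0.05, "min_confidence": 0.7},
    presentation="rules",
)
print(query)
```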

Set of task relevant data to be mined


This is the portion of the database in which the user is interested. This portion includes the following −
● Database Attributes
● Data Warehouse dimensions of interest

Kind of knowledge to be mined


It refers to the kind of functions to be performed. These functions are −
● Characterization
● Discrimination
● Association and Correlation Analysis
● Classification
● Prediction
● Clustering
● Outlier Analysis
● Evolution Analysis

Background knowledge
Background knowledge allows data to be mined at multiple levels of abstraction. For example,
concept hierarchies are one form of background knowledge that allows data to be mined at multiple
levels of abstraction.
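
A concept hierarchy can be sketched as a mapping from a lower-level concept to a higher-level one, which lets the same data be summarized at a more abstract level. The city-to-country mapping, the sales figures, and the function name roll_up below are made up for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country.
city_to_country = {
    "Chicago": "USA",
    "New York": "USA",
    "Toronto": "Canada",
    "Vancouver": "Canada",
}

# Hypothetical sales records at the city level: (city, units_sold).
sales = [("Chicago", 120), ("New York", 200), ("Toronto", 80), ("Vancouver", 50)]

def roll_up(records, hierarchy):
    """Aggregate city-level records to the next level of the hierarchy."""
    totals = Counter()
    for city, units in records:
        totals[hierarchy[city]] += units
    return dict(totals)

print(roll_up(sales, city_to_country))
```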

Interestingness measures and thresholds for pattern evaluation


These are used to evaluate the patterns that are discovered by the process of knowledge discovery. There
are different interestingness measures for different kinds of knowledge.
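
For association rules, support and confidence are typical interestingness measures, and thresholds on them decide which discovered patterns are kept. The sketch below assumes a few made-up rules whose measures are already known. The rule list, the threshold values, and the function name interesting are hypothetical.

```python
# Hypothetical mined rules with their measured support and confidence.
rules = [
    {"rule": "bread -> milk", "support": 0.40, "confidence": 0.70},
    {"rule": "bread -> biscuits", "support": 0.15, "confidence": 0.30},
    {"rule": "tea -> sugar", "support": 0.02, "confidence": 0.90},
]

def interesting(rules, min_support, min_confidence):
    """Keep only the rules that meet both thresholds."""
    return [
        r["rule"]
        for r in rules
        if r["support"] >= min_support and r["confidence"] >= min_confidence
    ]

print(interesting(rules, min_support=0.05, min_confidence=0.5))
```

Here the second rule fails the confidence threshold and the third fails the support threshold, so only the first rule is reported.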

Representation for visualizing the discovered patterns


This refers to the form in which discovered patterns are to be displayed. These representations may include the
following −
● Rules
● Tables
● Charts
● Graphs
● Decision Trees
● Cubes
