Data Mining
Data Mining is defined as the procedure of extracting information from very large sets of data. In other words,
we can say that data mining is mining knowledge from data. The tutorial starts off with a basic overview
of data mining and its terminology, and then gradually moves on to cover topics such as
knowledge discovery, query language, classification and prediction, decision tree induction, cluster
analysis, and how to mine the Web.
The information or knowledge extracted in this way can be used for any of the following applications −
● Market Analysis
● Fraud Detection
● Customer Retention
● Production Control
● Science Exploration
Fraud Detection
Data mining is also used in the fields of credit card services and telecommunication to detect fraud. For
fraudulent telephone calls, it helps to find the destination of the call, the duration of the call, the time of day
or week, etc. It also analyzes patterns that deviate from expected norms.
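"Patterns that deviate from expected norms" can be illustrated with a simple outlier test. The sketch below is only an illustration, not a description of any production fraud system: it assumes the norm is a customer's own call durations and flags calls lying more than a chosen number of standard deviations from their mean (a z-score test). The function name `flag_unusual_calls`, the threshold of 2.0 and the call data are all hypothetical.

```python
from statistics import mean, stdev

def flag_unusual_calls(durations, threshold=2.0):
    """Return the indices of calls whose duration lies more than `threshold`
    sample standard deviations away from the mean duration (z-score test).
    Hypothetical helper for illustration only."""
    mu = mean(durations)
    sigma = stdev(durations)
    if sigma == 0:
        # all calls identical: nothing deviates from the norm
        return []
    return [i for i, d in enumerate(durations) if abs(d - mu) / sigma > threshold]

# Hypothetical call durations in minutes for one customer.
calls = [3, 4, 2, 5, 3, 4, 3, 2, 4, 95]
print(flag_unusual_calls(calls))
```

Here only the 95-minute call is flagged. Real systems combine many attributes (destination, time of day, etc.), but the idea of measuring distance from an expected norm is the same.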
Data mining functions are used to specify the kinds of trends or correlations to be found in data mining
activities.
Broadly, data mining activities can be divided into two categories:
1. Descriptive Data Mining:
It aims to understand what is happening within the data without any prior hypothesis. The common
features of the data set are highlighted.
For example: count, average, etc.
2. Predictive Data Mining:
It helps developers predict the values of attributes that are unlabeled or unknown. Based on previous
tests, the software estimates the characteristics that are absent.
For example: judging from the findings of a patient’s medical examinations whether he is
suffering from a particular disease.
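The contrast between the two categories can be sketched in a few lines. The descriptive part computes a count and an average, as in the example above. For the predictive part, the text names no method, so this sketch assumes a 1-nearest-neighbour rule: a new patient receives the diagnosis of the most similar past record. The features (`blood_sugar`, `bmi`), the records and the labels are invented for illustration, and the features are left unscaled for simplicity.

```python
from math import dist

# Hypothetical past examination results: (blood_sugar, bmi) -> diagnosis.
# Feature names and values are invented for illustration only.
records = [
    ((90, 22.0), "healthy"),
    ((95, 24.5), "healthy"),
    ((160, 31.0), "diabetic"),
    ((175, 29.5), "diabetic"),
]

def describe(values):
    """Descriptive mining: summarize what is already in the data."""
    return {"count": len(values), "average": sum(values) / len(values)}

def predict(features):
    """Predictive mining (assumed technique: 1-nearest neighbour):
    estimate the missing diagnosis by copying the label of the closest
    past record. Features are not normalized in this sketch."""
    _, label = min(records, key=lambda r: dist(r[0], features))
    return label

sugars = [f[0] for f, _ in records]
print(describe(sugars))        # summary of existing data
print(predict((150, 30.0)))    # estimate for a new, unlabeled patient
```

The descriptive function only reports what is in the records; the predictive function fills in an attribute (the diagnosis) that the new record does not have.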
Association Analysis:
The process involves uncovering relationships within the data and deriving association rules. It is
a way of discovering the relationships between various items. For example, it can be used to
identify items that are frequently purchased together.
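As a first step toward association rules, one can count how often pairs of items appear together. The sketch below is a brute-force pair count, not the Apriori algorithm or any specific library: it computes the support of each pair (the fraction of baskets containing both items) and keeps pairs above a threshold. The function name `frequent_pairs` and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions that
    contain both items) is at least `min_support`. Brute-force sketch."""
    counts = Counter()
    for basket in transactions:
        # sorting gives each unordered pair a single canonical form
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
print(frequent_pairs(baskets, 0.5))
```

With a minimum support of 0.5, only the pairs bread/milk and eggs/milk survive. Enumerating every pair is fine for small examples; larger data sets usually need pruning strategies.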
Correlation Analysis:
Correlation is a statistical technique that can show whether, and how strongly, pairs of
attributes are related to each other. For example, taller people tend to weigh more.
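One widely used measure for this is the Pearson correlation coefficient r, which ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship). The text does not name a specific coefficient, so Pearson's r is an assumption here, and the height and weight values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # covariance numerator and the two standard-deviation terms
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

heights = [150, 160, 170, 180, 190]  # cm, hypothetical
weights = [50, 56, 65, 72, 80]       # kg, hypothetical
print(round(pearson(heights, weights), 3))
```

A value close to +1 supports the statement that taller people tend to weigh more in this (invented) sample. Note that r only captures linear relationships.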
Data mining functions determine the kinds of patterns that can be mined. On the basis of the kind of data to be mined, there
are two categories of functions involved in data mining −
● Descriptive
● Classification and Prediction
Descriptive Function
The descriptive function deals with the general properties of data in the database. Here is the list of descriptive
functions −
● Class/Concept Description
● Mining of Frequent Patterns
● Mining of Associations
● Mining of Correlations
● Mining of Clusters
Class/Concept Description
Class/Concept refers to the data to be associated with the classes or concepts. For example, in a company, the
classes of items for sale include computers and printers, and concepts of customers include big spenders and
budget spenders. Such descriptions of a class or a concept are called class/concept descriptions. These descriptions
can be derived in the following two ways −
● Data Characterization − This refers to summarizing the data of the class under study. This class under study is
called the target class.
● Data Discrimination − This refers to comparing the general features of the target class with those of some
predefined contrasting group or class.
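Using the customer concepts from the example above, the two ways can be sketched as follows. Characterization summarizes the target class ("big spender"); discrimination compares that summary with a contrasting class ("budget spender"). The record layout, the attributes `annual_spend` and `visits_per_month`, and all values are hypothetical, and comparing averages is only one simple way of contrasting classes.

```python
from statistics import mean

# Hypothetical customer records: (customer_class, annual_spend, visits_per_month)
customers = [
    ("big spender", 5200, 6),
    ("big spender", 4800, 5),
    ("budget spender", 900, 4),
    ("budget spender", 1100, 2),
]

def characterize(cls):
    """Data characterization: summarize the records of one (target) class."""
    rows = [c for c in customers if c[0] == cls]
    return {
        "count": len(rows),
        "avg_spend": mean(r[1] for r in rows),
        "avg_visits": mean(r[2] for r in rows),
    }

def discriminate(target, contrast):
    """Data discrimination: compare the target class's summary with that
    of a contrasting class, attribute by attribute (difference of averages)."""
    t, c = characterize(target), characterize(contrast)
    return {k: t[k] - c[k] for k in ("avg_spend", "avg_visits")}

print(characterize("big spender"))
print(discriminate("big spender", "budget spender"))
```

The first call describes big spenders on their own; the second shows how they differ from budget spenders (here, they spend 4000 more per year on average and visit 2.5 more times per month).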
Mining of Associations
Associations are used in retail sales to identify items that are frequently purchased together. This
refers to the process of uncovering the relationships among data and determining association
rules.
For example, a retailer may generate an association rule showing that 70% of the time milk is sold with bread,
and only 30% of the time biscuits are sold with bread.
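One common way to read the percentages in this example is as the confidence of a rule: confidence(bread ⇒ milk) is the fraction of transactions containing bread that also contain milk. The sketch below assumes that interpretation; the transactions are hypothetical and were chosen so that the numbers match the 70% / 30% example.

```python
def confidence(transactions, antecedent, consequent):
    """confidence(A => B) = support(A and B) / support(A), i.e. the fraction
    of transactions containing every item of A that also contain B."""
    with_a = [t for t in transactions if antecedent <= t]
    if not with_a:
        return 0.0
    return sum(1 for t in with_a if consequent <= t) / len(with_a)

# Hypothetical transactions: 10 contain bread (7 with milk, 3 with biscuits),
# plus 5 that contain only milk.
transactions = (
    [{"bread", "milk"}] * 7
    + [{"bread", "biscuits"}] * 3
    + [{"milk"}] * 5
)
print(confidence(transactions, {"bread"}, {"milk"}))      # bread => milk
print(confidence(transactions, {"bread"}, {"biscuits"}))  # bread => biscuits
```

Item sets are represented as Python sets, so `antecedent <= t` tests whether every antecedent item appears in the transaction. The milk-only transactions do not affect either rule, because confidence only looks at transactions that contain bread.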
Mining of Correlations
It is a kind of additional analysis performed to uncover interesting statistical correlations between
associated attribute-value pairs or between two item sets, to determine whether they have a positive, negative,
or no effect on each other.
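For two item sets, one common measure of this effect is lift: lift(A, B) = P(A and B) / (P(A) · P(B)). A value above 1 suggests a positive correlation, below 1 a negative one, and exactly 1 independence. The text does not name a measure, so lift is an assumption here (others, such as the chi-square statistic, are also used), and the baskets are hypothetical.

```python
def lift(transactions, a, b):
    """lift(A, B) = P(A and B) / (P(A) * P(B)).
    > 1: positively correlated, < 1: negatively correlated, = 1: independent."""
    n = len(transactions)
    p_a = sum(1 for t in transactions if a <= t) / n
    p_b = sum(1 for t in transactions if b <= t) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("both item sets must occur at least once")
    p_ab = sum(1 for t in transactions if (a | b) <= t) / n
    return p_ab / (p_a * p_b)

# Hypothetical shopping baskets.
baskets = [
    {"tea", "biscuits"},
    {"tea", "biscuits"},
    {"tea"},
    {"coffee"},
    {"coffee", "sugar"},
]
print(lift(baskets, {"tea"}, {"biscuits"}))  # bought together more than chance
print(lift(baskets, {"tea"}, {"coffee"}))    # never bought together
```

In this invented data, tea and biscuits have a lift of about 1.67 (positive correlation), while tea and coffee have a lift of 0 (strongly negative).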
Mining of Clusters
A cluster refers to a group of similar objects. Cluster analysis refers to forming groups of objects
that are very similar to each other but highly different from the objects in other clusters.
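A well-known clustering technique is k-means, shown below as one example; the text does not prescribe a particular algorithm. It alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. This minimal sketch simply uses the first k points as initial centroids and runs a fixed number of iterations; both are simplifying assumptions, and the 2-D points are hypothetical.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Minimal k-means sketch. Initial centroids are the first k points
    (a simplification; real implementations use smarter seeding)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if a cluster ends up empty
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters, centroids

points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
clusters, centroids = kmeans(points, k=2)
print(clusters)
```

The three points near (1, 1) end up in one cluster and the three near (8, 8) in the other: points within a cluster are close to each other and far from those in the other cluster.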
Background Knowledge
Background knowledge allows data to be mined at multiple levels of abstraction. For example, concept
hierarchies are one kind of background knowledge that makes this possible.
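A concept hierarchy can be sketched as a chain of lookup tables, each mapping a value to its more general parent. Climbing one level and aggregating (a "roll-up") lets the same sales data be examined at the city, country, or region level. The hierarchy, the names `CITY_TO_COUNTRY`, `COUNTRY_TO_REGION` and `roll_up`, and the sales figures are all hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country -> region.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada",
                   "Chicago": "USA", "New York": "USA"}
COUNTRY_TO_REGION = {"Canada": "North America", "USA": "North America"}

# Hypothetical sales records at the lowest level (city).
sales = [("Vancouver", 120), ("Toronto", 80), ("Chicago", 150), ("New York", 200)]

def roll_up(records, mapping):
    """Climb one level of the hierarchy and total the amounts at that level."""
    totals = Counter()
    for place, amount in records:
        totals[mapping[place]] += amount
    return dict(totals)

by_country = roll_up(sales, CITY_TO_COUNTRY)
by_region = roll_up(by_country.items(), COUNTRY_TO_REGION)
print(by_country)
print(by_region)
```

The same function works at every level because each level's output has the same (value, amount) shape as its input, which is what allows mining at multiple levels of abstraction.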