Knowledge Mining: Data Patterns
DEFINITION
Knowledge mining is the process of extracting patterns from data. It is
becoming an increasingly important tool for transforming data into information,
and it is commonly used in a wide range of profiling practices, such as
marketing, surveillance, fraud detection and scientific discovery.
INTRODUCTION
In the last several years, the field of data mining has been expanding rapidly,
attracting many new researchers and users. The underlying reason for this rapid
growth is the great need for systems that can automatically derive useful
knowledge from the vast volumes of computer data being accumulated
worldwide. The field of data mining promises to address this need. The
major thrust of research has been to develop a repertoire of tools for discovering
both strong and useful patterns in large databases. The function performed by such
tools can be succinctly characterized as a mapping:
DATA → PATTERNS
An underlying assumption is that the patterns are created solely from the
data, and thus are expressed in terms of attributes and relations appearing in the
data. Determining such patterns can be a problem of significant computational
complexity, but of a relatively low conceptual complexity, and many efficient
algorithms have been developed for this purpose. This approach to the problem of
deriving useful knowledge from databases has, however, some fundamental
limitations, and new research should address several important tasks. To ensure
that patterns are not only strong (i.e., represent frequently occurring relationships)
but also useful to a specific user or group of users, the system cannot rely solely on
the data; it must also be able to represent and understand the user's goals. This
requires a method for goal representation. To derive patterns that are more than
combinations of concepts (attributes and relations) already present in the data,
the system needs background knowledge that allows it to reinterpret and/or
combine concepts in the data into new concepts that can lead to more accurate
and/or simpler patterns. To reuse patterns determined in previous analyses when
updating them in view of new data, the system needs a capability for incremental
knowledge-based pattern discovery. These requirements create a major new
challenge: to integrate a knowledge base within a data mining system, and to
develop methods for applying this knowledge during data mining.
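As a minimal sketch of the distinction drawn above between strong and useful patterns, the following Python example counts how often pairs of items occur together (their support) and then filters the strong pairs by a user goal. The function names, the sample records, the 0.5 threshold and the representation of a goal as a set of items of interest are all assumptions made for illustration; a real system would use far richer goal and knowledge representations.

```python
from collections import Counter
from itertools import combinations


def strong_patterns(records, min_support):
    """Return item pairs whose support (share of records containing both) meets the threshold."""
    counts = Counter()
    for record in records:
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(set(record)), 2):
            counts[pair] += 1
    n = len(records)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def useful_patterns(patterns, goal_items):
    """Keep only patterns that mention at least one item the user cares about (hypothetical goal model)."""
    return {pair: s for pair, s in patterns.items() if set(pair) & goal_items}


# Illustrative data only.
records = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "jam"],
]

strong = strong_patterns(records, min_support=0.5)
useful = useful_patterns(strong, goal_items={"butter"})

print(strong)  # patterns derived solely from the data
print(useful)  # patterns that also match the user's goal
```

The first function corresponds to deriving patterns from data alone; the second shows, in the simplest possible form, how a stated goal narrows the result to what a particular user needs.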
Since there is a vast array of different tasks for which knowledge generated
from data can be used, there are many different knowledge needs. Therefore, a data
mining system has to use advanced knowledge representations and be able to
generate many different types of knowledge from a given data source. This
need is being partially met by the growing inventory of available data
mining programs. These programs are, however, often arranged into toolboxes, and
individual programs have to be invoked manually. Using such toolboxes can
therefore be a very laborious and time-consuming process, and may require
considerable expertise. This problem is being partially addressed by the
development of multistrategy data mining systems that integrate different data
mining tools. To further automate the data mining process, such tools need to be
invokable through a high-level knowledge generation language. Since users want
to understand data mining results, another important research direction is the
development of knowledge visualization methods.
We use the term knowledge mining for the research direction that aims to
accomplish all of the above-mentioned tasks. Knowledge mining can thus
be characterized as concerned with developing and integrating a wide range of data
analysis methods that are able to derive new knowledge, directly or incrementally,
from large (or small) volumes of data using relevant prior knowledge. The process
of deriving new knowledge has to be guided by criteria supplied to the system
that define the type of knowledge a particular user is interested in. Algorithms for
generating new knowledge must be not only efficient but also oriented toward
producing knowledge that satisfies the comprehensibility postulate, that is, knowledge
that is easy for users to understand and interpret. Knowledge mining can be simply
characterized by the following mapping:
DATA + PRIOR KNOWLEDGE + GOAL → NEW KNOWLEDGE
BACKGROUND
Mining data involves several steps, as shown in the picture.
1. Data Integration: First of all the data are collected and integrated from all
the different sources.
2. Data Selection: We may not need all the data collected in the first step,
so in this step we select only those data that we think are useful for data
mining.
3. Data Cleaning: The data we have collected are not clean and may contain
errors, missing values, noisy or inconsistent data. So we need to apply
different techniques to get rid of such anomalies.
4. Data Transformation: Even after cleaning, the data are not ready for
mining; they must first be transformed into forms appropriate for mining. The
techniques used to accomplish this include smoothing, aggregation and
normalization.
5. Data Mining: Now we are ready to apply data mining techniques on the
data to discover the interesting patterns. Techniques like clustering and
association analysis are among the many different techniques used for data
mining.
6. Pattern Evaluation and Knowledge Presentation: This step
involves visualizing and transforming the generated patterns and removing
redundant ones.
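The six steps above can be sketched as a small pipeline. The Python example below is only an illustration under stated assumptions: the record layout, the field names (`id`, `spend`, `note`), the policy of dropping records with missing values, min-max normalization as the transformation, and a one-dimensional two-cluster k-means as the mining technique are all choices made for this sketch, not methods prescribed by the text.

```python
def integrate(*sources):
    """Step 1: combine records from all sources into one list."""
    return [rec for src in sources for rec in src]


def select(records, fields):
    """Step 2: keep only the fields judged useful for mining."""
    return [{f: rec.get(f) for f in fields} for rec in records]


def clean(records):
    """Step 3: drop records with missing values (one simple cleaning policy)."""
    return [rec for rec in records if all(v is not None for v in rec.values())]


def normalize(records, field):
    """Step 4: min-max normalize one numeric field to the range [0, 1]."""
    values = [rec[field] for rec in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**rec, field: (rec[field] - lo) / span} for rec in records]


def cluster_two(values, iterations=10):
    """Step 5: 1-D k-means with k=2, seeded at the minimum and maximum value."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


def summarize(groups):
    """Step 6: present each non-empty cluster by its size and mean."""
    return [{"size": len(g), "mean": round(sum(g) / len(g), 3)} for g in groups if g]


# Illustrative data from two hypothetical sources.
shop_a = [{"id": 1, "spend": 10.0, "note": "x"}, {"id": 2, "spend": None, "note": "y"}]
shop_b = [{"id": 3, "spend": 12.0}, {"id": 4, "spend": 90.0}, {"id": 5, "spend": 110.0}]

integrated = integrate(shop_a, shop_b)
selected = select(integrated, ["id", "spend"])
cleaned = clean(selected)
transformed = normalize(cleaned, "spend")
clusters = cluster_two([rec["spend"] for rec in transformed])
summary = summarize(clusters)

print(summary)
```

Each function maps to one numbered step, so a different cleaning policy or mining technique can be swapped in without touching the other stages.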
NOTABLE USES
Games
Business
In recent years, knowledge mining has been widely used in areas of science
and engineering such as bioinformatics, genetics,
medicine, education and electrical power engineering.
Knowledge mining, the partially automated search for hidden patterns
in large databases, offers great potential benefits for applied GIS-based
decision-making. Recently, the task of integrating knowledge mining with GIS has
become critical, especially as various public and private sector organisations
possessing huge databases of thematic and geographically referenced data begin to
realise the potential of the information hidden there. Among those organisations are:
APPLICATIONS
AGRICULTURE:
CUSTOMER ANALYTICS:
In customer analytics, two categories of data mining are used.
Predictive models use previous customer interactions to predict future
events, while segmentation techniques place customers with similar
behaviors and attributes into distinct groups. This grouping can help marketers
optimize their campaign management and targeting processes.
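The two categories can be illustrated together in a short Python sketch. It rests on several assumptions: the customer fields (`spend`, `visits`, `responded`), the segment thresholds and the sample history are hypothetical, and the "predictive model" is only a per-segment response rate computed from past interactions, a simple stand-in for the statistical or machine-learning models used in practice.

```python
from collections import defaultdict


def segment(customer):
    """Segmentation: group customers by spend and visit counts (illustrative thresholds)."""
    value = "high_value" if customer["spend"] >= 100 else "low_value"
    activity = "frequent" if customer["visits"] >= 5 else "occasional"
    return f"{value}/{activity}"


def fit_response_rates(history):
    """Prediction: estimate, per segment, the share of past customers who responded."""
    totals = defaultdict(int)
    responded = defaultdict(int)
    for customer in history:
        seg = segment(customer)
        totals[seg] += 1
        responded[seg] += customer["responded"]
    return {seg: responded[seg] / totals[seg] for seg in totals}


def predict(customer, rates, default=0.0):
    """Predict a response probability from the customer's segment; unseen segments get the default."""
    return rates.get(segment(customer), default)


# Hypothetical record of past campaign interactions (1 = responded, 0 = did not).
history = [
    {"spend": 150, "visits": 8, "responded": 1},
    {"spend": 120, "visits": 6, "responded": 1},
    {"spend": 200, "visits": 7, "responded": 0},
    {"spend": 20, "visits": 1, "responded": 0},
    {"spend": 35, "visits": 2, "responded": 0},
    {"spend": 40, "visits": 6, "responded": 1},
]

rates = fit_response_rates(history)
new_customer = {"spend": 130, "visits": 9}
probability = predict(new_customer, rates)

print(segment(new_customer), probability)
```

A marketer could use the segment label to choose which campaign to send and the predicted rate to decide whether sending it is worthwhile.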
SURVEILLANCE
OTHER APPLICATIONS: