
KNOWLEDGE MINING

DEFINITION
Knowledge mining is the process of extracting patterns from data. It is becoming an
increasingly important tool for transforming raw data into information, and it is commonly
used in a wide range of profiling practices, such as marketing, surveillance, fraud
detection and scientific discovery.

INTRODUCTION
In the last several years, the field of data mining has been rapidly expanding,
and attracting many new researchers and users. The underlying reason for such a
rapid growth is a great need for systems that can automatically derive useful
knowledge from vast volumes of computer data being accumulated
worldwide. The field of data mining offers promise for addressing this need. The
major thrust of research has been to develop a repertoire of tools for discovering
both strong and useful patterns in large databases. The function performed by such
tools can be succinctly characterized as a mapping:

DATA PATTERNS

An underlying assumption is that the patterns are created solely from the
data, and thus are expressed in terms of attributes and relations appearing in the
data. Determining such patterns can be a problem of significant computational
complexity, but of a relatively low conceptual complexity, and many efficient
algorithms have been developed for this purpose. This approach to the problem of
deriving useful knowledge from databases has, however, some fundamental
limitations, and new research should address several important tasks. To assure
that patterns are not only strong (i.e., represent frequently occurring relationships),
but also useful to a specific user or group of users, the system cannot rely solely on
the data, but must also be able to represent and understand the user's goals. This
requires a method for goal representation. To be able to derive patterns that are not
only combinatorial combinations of concepts (attributes and relations) that are
already present in the data, the system needs background knowledge that will allow
it to reinterpret and/or combine concepts in the data into new concepts that can
lead to more accurate and/or simpler patterns. To be able to re-use patterns
determined in the previous analyses in the process of updating them in view of new
data, the system needs to have a capability for incremental knowledge-based
pattern discovery. The above-listed requirements create a major new challenge: to
integrate a knowledge base within a data mining system, and to develop methods
for applying this knowledge during data mining.

Since there is a vast array of different tasks for which knowledge generated
from data can be used, there are many different knowledge needs. Therefore, a data
mining system has to use advanced knowledge representations and be able to
generate many different types of knowledge from a given data source. This
problem is being partially addressed by the growing inventory of available data
mining programs. These programs are, however, often arranged into toolboxes, and
individual programs have to be invoked manually. Using such toolboxes can,
therefore, be a very laborious and time-consuming process, and may require
considerable expertise. This problem is being partially addressed by the
development of multistrategy data mining systems that integrate different data
mining tools. To automate further a data mining process, such tools need to be
invokable through a high-level knowledge generation language. Since users want
to understand data mining results, an important research direction is also the
development of knowledge visualization methods.
To address the research direction that aims at achieving all the above-
mentioned tasks, we use the term knowledge mining. Knowledge mining can thus
be characterized as concerned with developing and integrating a wide range of data
analysis methods that are able to derive directly or incrementally new knowledge
from large (or small) volumes of data using relevant prior knowledge. The process
of deriving new knowledge has to be guided by criteria inputted to the system
defining the type of knowledge a particular user is interested in. Algorithms for
generating new knowledge must be not only efficient but also oriented toward
producing knowledge satisfying the comprehensibility postulate, that is, easy to
understand and interpret by the users. Knowledge mining can be simply
characterized by the following mapping:

DATA + PRIOR_KNOWLEDGE + GOAL → NEW_KNOWLEDGE

where GOAL is an encoding of the knowledge needs of the user(s), and
NEW_KNOWLEDGE is knowledge satisfying the GOAL. Such knowledge can be
in the form of decision rules, association rules, decision trees, conceptual or
similarity-based clusters, equations, Bayesian nets, statistical summaries,
visualizations, natural language summaries, or other knowledge representations.
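
As a concrete illustration, the mapping can be read as a function signature. The sketch below is purely hypothetical; the names Goal, KnowledgeBase and knowledge_mine are invented for this example and do not refer to any existing system or library.

```python
# Hypothetical sketch of the knowledge mining mapping
# DATA + PRIOR_KNOWLEDGE + GOAL -> NEW_KNOWLEDGE.
from dataclasses import dataclass, field

@dataclass
class Goal:
    """Encodes the knowledge needs of the user(s)."""
    target_attribute: str            # e.g. "churn"
    representation: str = "rules"    # e.g. "rules", "tree", "clusters"
    min_support: float = 0.1         # strength threshold for patterns

@dataclass
class KnowledgeBase:
    """Relevant prior knowledge: background concepts and earlier patterns."""
    concept_definitions: dict = field(default_factory=dict)
    previous_patterns: list = field(default_factory=list)

def knowledge_mine(data, prior: KnowledgeBase, goal: Goal) -> list:
    """Derive NEW_KNOWLEDGE from data, guided by prior knowledge and a goal.

    A real system would dispatch to rule induction, clustering, decision-tree
    learning, etc., according to goal.representation, and would reuse
    prior.previous_patterns for incremental discovery.
    """
    new_knowledge: list = []
    # ... mining algorithms selected and guided by `goal` would run here ...
    return new_knowledge
```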

BACKGROUND

The manual extraction of patterns from data has occurred for centuries.
Early methods of identifying patterns in data include Bayes' theorem (1700s) and
regression analysis (1800s). The proliferation, ubiquity and increasing power of
computer technology have increased data collection, storage and manipulation.
As data sets have grown in size and complexity, direct hands-on data analysis has
increasingly been augmented with indirect, automatic data processing. This has
been aided by other discoveries in computer science, such as neural
networks, clustering, genetic algorithms (1950s), decision trees (1960s)
and support vector machines (1990s). Knowledge mining is the process of
applying these methods to data with the intention of uncovering hidden patterns. It
has been used for many years by businesses, scientists and governments to sift
through volumes of data such as airline passenger trip records, census data and
supermarket scanner data to produce market research reports. (Note, however, that
reporting is not always considered to be data mining.)

A primary reason for using knowledge mining is to assist in the analysis of
collections of observations of behaviour. Such data are vulnerable
to collinearity because of unknown interrelations. An unavoidable fact of
knowledge mining is that the (sub-)set(s) of data being analysed may not be
representative of the whole domain, and therefore may not contain examples of
certain critical relationships and behaviours that exist across other parts of the
domain. To address this sort of issue, the analysis may be augmented using
experiment-based and other approaches, such as Choice Modelling for human-
generated data. In these situations, inherent correlations can be either controlled
for, or removed altogether, during the construction of the experimental design.

Steps of Knowledge Mining

There are various steps involved in mining data, as described below.
1. Data Integration: First of all, the data are collected and integrated from all
the different sources.
2. Data Selection: We may not need all the data we collected in the first step,
so in this step we select only those data we think are useful for data
mining.
3. Data Cleaning: The data we have collected may contain errors, missing
values, noise or inconsistencies, so we need to apply different techniques to
get rid of such anomalies.
4. Data Transformation: Even after cleaning, the data are not ready for
mining; we need to transform them into forms appropriate for mining, using
techniques such as smoothing, aggregation and normalization.
5. Data Mining: Now we are ready to apply data mining techniques to the
data to discover interesting patterns. Techniques like clustering and
association analysis are among the many different techniques used for data
mining.
6. Pattern Evaluation and Knowledge Presentation: This step involves
evaluating the patterns we have generated, removing redundant ones, and
presenting the rest through visualization and transformation.

7. Decisions / Use of Discovered Knowledge: This step helps the user make use of
the acquired knowledge to take better decisions, as sketched below.
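
The following is a minimal sketch of these steps using pandas and scikit-learn. The tiny data set, the column names and the choice of clustering as the mining technique are assumptions made purely for illustration.

```python
# A minimal, illustrative sketch of the knowledge mining steps above.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# 1. Data integration: combine data from different sources.
sales = pd.DataFrame({"customer_id": [1, 2, 3, 4],
                      "spend": [120.0, 95.0, None, 310.0]})
visits = pd.DataFrame({"customer_id": [1, 2, 3, 4],
                       "visits": [5, 3, 8, 12],
                       "notes": ["a", "b", "c", "d"]})
data = sales.merge(visits, on="customer_id")

# 2. Data selection: keep only the attributes relevant to mining.
data = data[["customer_id", "spend", "visits"]]

# 3. Data cleaning: handle missing values.
data["spend"] = data["spend"].fillna(data["spend"].median())

# 4. Data transformation: normalization for distance-based mining.
features = StandardScaler().fit_transform(data[["spend", "visits"]])

# 5. Data mining: clustering to discover customer segments.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# 6. Pattern evaluation and knowledge presentation: score and report.
print("silhouette:", silhouette_score(features, labels))
print(data.assign(segment=labels))
```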

NOTABLE USES
Games

Since the early 1960s, with the availability of oracles for certain combinatorial
games, also called tablebases (e.g. for 3x3-chess with any beginning configuration,
small-board dots-and-boxes, small-board hex, and certain endgames in chess,
dots-and-boxes, and hex), a new area for knowledge mining has been opened up:
the extraction of human-usable strategies from these oracles. Current pattern
recognition approaches do not seem to fully have the
required high level of abstraction in order to be applied successfully. Instead,
extensive experimentation with the tablebases, combined with an intensive study of
tablebase-answers to well designed problems and with knowledge of prior art, i.e.
pre-tablebase knowledge, is used to yield insightful patterns. Berlekamp in dots-
and-boxes etc. and John Nunn in chess endgames are notable examples of
researchers doing this work, though they were not and are not involved in
tablebase generation.

Business

Knowledge mining in customer relationship management applications can
contribute significantly to the bottom line. Rather than randomly contacting a
prospect or customer through a call center or sending mail, a company can
concentrate its efforts on prospects that are predicted to have a high likelihood of
responding to an offer. More sophisticated methods may be used to optimise
resources across campaigns so that one may predict which channel and which offer
an individual is most likely to respond to—across all potential offers. Additionally,
sophisticated applications could be used to automate the mailing. Once the results
from data mining (potential prospect/customer and channel/offer) are determined,
this "sophisticated application" can either automatically send an e-mail or regular
mail. Finally, in cases where many people will take an action without an offer,
uplift modeling can be used to determine which people will have the greatest
increase in responding if given an offer. Data clustering can also be used to
automatically discover the segments or groups within a customer data set.
Businesses employing knowledge mining may see a return on investment,
but they also recognise that the number of predictive models can quickly become
very large. Rather than one model to predict how many customers will churn, a
business could build a separate model for each region and customer type. Then
instead of sending an offer to all customers who are likely to churn, it may want
to send offers only to selected customers. Finally, it may also want to determine
which customers are going to be profitable over a window of time, and send the
offers only to those likely to be profitable. To maintain this quantity of models,
the business needs to manage model versions and move toward automated data mining.
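
As a hedged illustration of maintaining many predictive models (invented data, not a description of any particular product), the sketch below trains one churn model per region.

```python
# One churn model per segment (here, per region), kept in a registry.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "region":  ["east", "east", "west", "west", "east", "west"] * 5,
    "tenure":  [1, 24, 3, 36, 12, 6] * 5,
    "spend":   [20, 80, 15, 120, 60, 25] * 5,
    "churned": [1, 0, 1, 0, 0, 1] * 5,
})

models = {}
for region, group in df.groupby("region"):
    clf = LogisticRegression()
    clf.fit(group[["tenure", "spend"]].values, group["churned"].values)
    models[region] = clf           # model versions would be tracked here

# Score a new customer with the model for their own segment.
print(models["west"].predict_proba([[4, 18]])[0][1])
```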

Knowledge mining can also be helpful to human-resources departments in
identifying the characteristics of their most successful employees. Information
obtained, such as universities attended by highly successful employees, can help
HR focus recruiting efforts accordingly. Additionally, Strategic Enterprise
Management applications help a company translate corporate-level goals, such as
profit and margin share targets, into operational decisions, such as production plans
and workforce levels.

Another example of knowledge mining, often called market basket analysis,
relates to its use in retail sales. If a clothing store records the purchases of
customers, a data-mining system could identify those customers who favour silk
shirts over cotton ones. Although some explanations of such relationships may be
difficult, taking advantage of them is easier. The example deals with association
rules within transaction-based data. Not all data are transaction based and logical
or inexact rules may also be present within a database. In a manufacturing
application, an inexact rule may state that 73% of products which have a specific
defect or problem will develop a secondary problem within the next six months.
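
A small illustrative example of the association-rule measures involved, on invented transactions: support measures how often an itemset occurs, and confidence measures how often the consequent appears when the antecedent does. The rule "silk shirt → tie" below is hypothetical.

```python
# Support and confidence for one candidate rule, without a full Apriori run.
transactions = [
    {"silk shirt", "tie"},
    {"cotton shirt", "jeans"},
    {"silk shirt", "tie", "belt"},
    {"silk shirt", "jeans"},
    {"cotton shirt", "tie"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

antecedent, consequent = {"silk shirt"}, {"tie"}
rule_support = support(antecedent | consequent)
confidence = rule_support / support(antecedent)
print(f"support={rule_support:.2f}, confidence={confidence:.2f}")
# support=0.40, confidence=0.67: two of the three silk-shirt buyers also buy a tie
```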
Market basket analysis has also been used to identify the purchase patterns
of the Alpha consumer. Alpha consumers are people who play a key role in
connecting with the concept behind a product, then adopting that product, and
finally validating it for the rest of society. Analyzing the data collected on this type
of users has allowed companies to predict future buying trends and forecast supply
demands.

Science and engineering

In recent years, knowledge mining has been widely used in areas of science
and engineering, such as bioinformatics, genetics, medicine, education and
electrical power engineering.

In the area of study on human genetics, an important goal is to understand
the mapping relationship between the inter-individual variation in
human DNA sequences and variability in disease susceptibility. In lay terms, it is to
find out how the changes in an individual's DNA sequence affect the risk of
developing common diseases such as cancer. This is very important to help
improve the diagnosis, prevention and treatment of the diseases. The data mining
technique that is used to perform this task is known as multifactor dimensionality
reduction.
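
A deliberately simplified sketch of the core MDR idea follows, on made-up genotype data: genotype combinations of two attributes are pooled into a single high-risk/low-risk dimension by comparing each combination's case-to-control ratio with the overall ratio.

```python
# Simplified illustration of multifactor dimensionality reduction (MDR).
from collections import Counter

# (genotype_snp1, genotype_snp2, is_case) - invented samples
samples = [
    ("AA", "GG", 1), ("AA", "GG", 1), ("AA", "GT", 0),
    ("AG", "GG", 0), ("AG", "GT", 1), ("AG", "GT", 1),
    ("GG", "TT", 0), ("GG", "GT", 0), ("AA", "TT", 0),
    ("AG", "GG", 1),
]

cases = Counter((g1, g2) for g1, g2, y in samples if y == 1)
controls = Counter((g1, g2) for g1, g2, y in samples if y == 0)
n_cases = sum(y for *_, y in samples)
overall_ratio = n_cases / (len(samples) - n_cases)

high_risk = {cell for cell in set(cases) | set(controls)
             if cases[cell] / max(controls[cell], 1) > overall_ratio}
print("high-risk genotype combinations:", high_risk)
```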

In the area of electrical power engineering, knowledge mining techniques
have been widely used for condition monitoring of high-voltage electrical
equipment. The purpose of condition monitoring is to obtain valuable information
on the health status of the equipment's insulation. Data clustering techniques such
as the self-organizing map (SOM) have been applied to the vibration monitoring and
analysis of transformer on-load tap-changers (OLTCs). Using vibration monitoring, it can
be observed that each tap change operation generates a signal that contains
information about the condition of the tap changer contacts and the drive
mechanisms. Obviously, different tap positions will generate different signals.
However, there was considerable variability amongst normal condition signals for
exactly the same tap position. SOM has been applied to detect abnormal conditions
and to estimate the nature of the abnormalities.
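
As a hedged sketch of this kind of analysis, the snippet below trains a SOM on "normal" vibration features and flags signals that map far from any learned unit. It assumes the third-party minisom package, and the feature vectors are synthetic stand-ins, not real OLTC measurements.

```python
# SOM-based anomaly detection on synthetic vibration features.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 4))    # normal-condition features
suspect = rng.normal(4.0, 1.0, size=(5, 4))     # abnormal-looking signals

som = MiniSom(6, 6, 4, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(normal, 1000)                  # learn the "normal" manifold

def anomaly_score(x):
    """Distance from a signal to its best-matching SOM unit."""
    return float(np.linalg.norm(x - som.get_weights()[som.winner(x)]))

threshold = np.quantile([anomaly_score(x) for x in normal], 0.99)
print([anomaly_score(x) > threshold for x in suspect])   # expected: mostly True
```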

Knowledge mining techniques have also been applied to dissolved gas
analysis (DGA) on power transformers. DGA, as a diagnostic for power
transformers, has been available for many years. Data mining techniques such as
SOM have been applied to analyse the data and to determine trends which are not
obvious to standard DGA ratio techniques such as the Duval Triangle.

A fourth area of application for knowledge mining in science/engineering is
within educational research, where knowledge mining has been used to study the
factors leading students to choose to engage in behaviors which reduce their
learning and to understand the factors influencing university student retention. A
similar example of the social application of data mining is its use in expertise
finding systems, whereby descriptors of human expertise are extracted, normalised
and classified so as to facilitate the finding of experts, particularly in scientific and
technical fields.

Spatial knowledge mining

Spatial knowledge mining is the application of data mining techniques to
spatial data. Spatial data mining follows the same general process as data mining,
with the end objective of finding patterns in geography. So far, knowledge mining
and Geographic Information Systems (GIS) have existed as two separate
technologies, each with its own methods, traditions and approaches to visualization
and data analysis. Particularly, most contemporary GIS have only very basic
spatial analysis functionality. The immense explosion in geographically referenced
data occasioned by developments in IT, digital mapping, remote sensing, and the
global diffusion of GIS emphasises the importance of developing data driven
inductive approaches to geographical analysis and modeling.

Knowledge mining, which is the partially automated search for hidden patterns
in large databases, offers great potential benefits for applied GIS-based decision-
making. Recently, the task of integrating these two technologies has become
critical, especially as various public and private sector organisations possessing
huge databases with thematic and geographically referenced data begin to realise
the huge potential of the information hidden there. Among those organisations are:

 offices requiring analysis or dissemination of geo-referenced statistical data
 public health services searching for explanations of disease clusters
 environmental agencies assessing the impact of changing land-use patterns
on climate change
 geo-marketing companies doing customer segmentation based on spatial
location.
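
For instance, the disease-cluster use case above can be approached with density-based spatial clustering. The sketch below uses DBSCAN from scikit-learn on synthetic point coordinates; a real analysis would use properly projected GIS coordinates and domain-specific thresholds.

```python
# Finding spatial clusters of case locations with DBSCAN.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
cluster_a = rng.normal([10.0, 10.0], 0.2, size=(30, 2))   # dense cluster of cases
cluster_b = rng.normal([14.0, 12.0], 0.2, size=(25, 2))   # second cluster
background = rng.uniform([5, 5], [20, 20], size=(40, 2))  # scattered cases
points = np.vstack([cluster_a, cluster_b, background])

labels = DBSCAN(eps=0.6, min_samples=5).fit_predict(points)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters, "| noise points:", int(np.sum(labels == -1)))
```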
Challenges in Spatial knowledge mining

Geospatial data repositories tend to be very large. Moreover, existing GIS
datasets are often splintered into feature and attribute components, which are
conventionally archived in hybrid data management systems. Algorithmic
requirements differ substantially for relational (attribute) data management and for
topological (feature) data management. Related to this is the range and diversity of
geographic data formats, which also presents unique challenges. The digital
geographic data revolution is creating new types of data formats beyond the
traditional "vector" and "raster" formats. Geographic data repositories increasingly
include ill-structured data such as imagery and geo-referenced multi-media.

There are several critical research challenges in geographic knowledge
discovery and data mining. Miller and Han offer the following list of emerging
research topics in the field:

 Developing and supporting geographic data warehouses – Spatial
properties are often reduced to simple aspatial attributes in mainstream data
warehouses. Creating an integrated GDW requires solving issues in spatial and
temporal data interoperability, including differences in semantics, referencing
systems, geometry, accuracy and position.
 Better spatio-temporal representations in geographic knowledge
discovery – Current geographic knowledge discovery (GKD) techniques
generally use very simple representations of geographic objects and spatial
relationships. Geographic data mining techniques should recognise more
complex geographic objects (lines and polygons) and relationships (non-
Euclidean distances, direction, connectivity and interaction through attributed
geographic space such as terrain). Time needs to be more fully integrated into
these geographic representations and relationships.
 Geographic knowledge discovery using diverse data types – GKD
techniques should be developed that can handle diverse data types beyond the
traditional raster and vector models, including imagery and geo-referenced
multimedia, as well as dynamic data types (video streams, animation).

APPLICATIONS
AGRICULTURE:

Knowledge mining in agriculture is a very recent research topic. It consists
of the application of data mining techniques to agriculture. Modern technologies are
able to provide a lot of information on agriculture-related activities,
which can then be analyzed in order to find important information. A related, but
not equivalent term is precision agriculture.

 Prediction of problematic wine fermentations
 Detection of diseases from sounds issued by animals
 Sorting apples by watercores
 Optimizing pesticide usage by knowledge mining
 Explaining pesticide abuse by knowledge mining

CUSTOMER ANALYTICS:

In customer analytics, there are two main categories of data mining.
Predictive models use previous customer interactions to predict future
events, while segmentation techniques are used to place customers with similar
behaviors and attributes into distinct groups. This grouping can help marketers to
optimize their campaign management and targeting processes.
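
A minimal sketch of both categories on synthetic data follows; the features (recency, frequency, spend), the data and the model choices are assumptions for illustration only.

```python
# Prediction (will the customer respond?) and segmentation (similar groups).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))       # past interactions per customer
y = (X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=300) > 0).astype(int)

# Predictive model: learn from previous interactions to predict response.
predictor = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Segmentation: group customers with similar behaviour for targeting.
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("predicted responders among first 10:", predictor.predict(X[:10]).sum())
print("segment sizes:", np.bincount(segments))
```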

As customer prediction techniques continue to improve, the use of customer
analytics will become a necessity rather than a convenience for businesses. With
this valuable information there is an opportunity to fine-tune retail operations and
store manager decisions. Decision making will increase in speed and effectiveness
in the future as tools and information become more easily accessible. The
possibilities are still emerging, but political races, jury selection, and the
development of clinical trial communities are areas in which customer analytics
could be used in the future.

SURVEILLANCE

Knowledge mining has been used in programs intended to stop terrorism.
Examples include the Total Information Awareness (TIA) program, Secure Flight
(formerly known as the Computer-Assisted Passenger Prescreening System),
Analysis, Dissemination, Visualization, Insight, Semantic Enhancement (ADVISE),
and the Multi-state Anti-Terrorism Information Exchange (MATRIX). These
programs have been discontinued due to controversy over whether they violate the
US Constitution's Fourth Amendment, although many programs that were formed
under them continue to be funded by different organisations, or under different names.

Two plausible data mining techniques in the context of combating terrorism
include "pattern mining" and "subject-based data mining".

OTHER APPLICATIONS:

 National Security Agency
 Quantitative structure-activity relationship
 Police-enforced ANPR in the UK
 Stellar wind (code name)
 Educational Data Mining
RECENT RESEARCH IN KNOWLEDGE MINING USING NLP

Mining Event-based Commonsense Knowledge from Web Using NLP Techniques:


Real-life intelligent applications such as agents, expert systems, dialog
understanding systems, weather forecasting systems, and robotics mainly rely on
commonsense knowledge, and they basically work on a knowledge base which
contains a large amount of commonsense knowledge. The main intention of this
work is to create a commonsense knowledge base by using an effective
methodology to retrieve commonsense knowledge from large amounts of web data.
In order to achieve the best results, it makes use of different natural language
processing techniques such as semantic role labeling, lexical and syntactic
analysis.
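
As a hedged, much-simplified sketch of the syntactic-analysis step only (assuming spaCy and its en_core_web_sm model are installed; this is not the paper's actual implementation), rough subject-verb-object event triples can be pulled from dependency parses:

```python
# Rough (subject, verb, object) event-triple extraction with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")

def event_triples(text):
    """Extract rough (subject, verb, object) event triples from text."""
    triples = []
    for sent in nlp(text).sents:
        for token in sent:
            if token.pos_ == "VERB":
                subjects = [c for c in token.children if c.dep_ == "nsubj"]
                objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
                for s in subjects:
                    for o in objects:
                        triples.append((s.lemma_, token.lemma_, o.lemma_))
    return triples

print(event_triples("People drink water when they feel thirst."))
# e.g. [('people', 'drink', 'water'), ('they', 'feel', 'thirst')]
```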

Text analysis and knowledge mining using NLP:


As the use of IT systems expands, growing amounts of textual data are being
generated, stored, and searched. This trend is widely believed to be causing
information overload. Although the increase of accessible data is intended to
increase our knowledge and yield insights for better actions, the data glut is
making it hard to find meaning. Natural Language Processing (NLP) is a key
technology to exploit text data, so applications for NLP are increasing rapidly.
Such applications often exploit text mining, but they involve a broad range of NLP
technologies as the applications develop. This new trend is generating new
demands for NLP that require more research.

Knowledge mining for supporting learning processes:

AI technologies for knowledge mining are commonly used in technical
environments. Their application to social processes such as learning processes is
quite a new challenge, which is characterized by having "humans in the loop".
Humans' desires, preferences and decisions may be unpredictable and thus, at first
glance, not appropriate for modeling. However, in learning processes didactic
variants can be anticipated and can become a subject of AI technologies. A
semi-formal modeling approach called storyboarding is
outlined here. A storyboard represents various opportunities for composing a
learning process according to individual circumstances, such as topical
prerequisites (educational history), mental prerequisites (preferred learning styles,
etc.), performance prerequisites (a requested success level in former learning
activities, etc.), and personal aspects (needs, wishes, talents, aims). By
storyboarding, various didactic variants can be validated by considering the
average learning success associated with the different paths through a storyboard in
a case study. Based on validation results, success chances can be derived for the
different paths. Here, a concept and an implementation to pre-estimate success
chances of intended (future) learning paths through a storyboard are introduced.
They are based on a data mining technology, and construct a decision tree by
analyzing former learners' paths and their degrees of success. Furthermore, this
technology generates a supplement to a submitted path, which is optimal according
to the success chances. This technology has been tested at a Japanese university, in
which students had to compose their individual plan (subject sequences) in
advance, and the technology helped them by predicting success chances and
suggesting alternatives.
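
A minimal sketch of the decision-tree idea described above follows; the path features, data and outcomes are invented purely for illustration.

```python
# Former learners' paths and outcomes train a tree that pre-estimates
# success chances for a planned path.
from sklearn.tree import DecisionTreeClassifier

# Each row: [took_prerequisite, learning_style_matches, prior_score_band]
past_paths = [
    [1, 1, 2], [1, 0, 2], [0, 1, 1], [0, 0, 0],
    [1, 1, 1], [0, 1, 2], [1, 0, 0], [0, 0, 1],
]
succeeded = [1, 1, 1, 0, 1, 1, 0, 0]

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(past_paths, succeeded)

planned_path = [[1, 1, 0]]   # an intended future learning path
print("estimated success chance:", tree.predict_proba(planned_path)[0][1])
```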

Knowledge management and mining for marketing:


Due to the proliferation of information systems and technology, businesses
increasingly have the capability to accumulate huge amounts of customer data in
large databases. However, many of the useful marketing insights into customer
characteristics and their purchase patterns remain largely hidden and untapped. Current
emphasis on customer relationship management makes the marketing function an
ideal application area to greatly benefit from the use of data mining tools for
decision support. A systematic methodology that uses data mining and knowledge
management techniques is proposed to manage the marketing knowledge and
support marketing decisions. This methodology can be the basis for enhancing
customer relationship management.

Text analysis and knowledge mining system:

Large text databases potentially contain a great wealth of knowledge.
However, text represents factual information (and information about the author's
communicative intentions) in a complex, rich, and opaque manner. Consequently,
unlike numerical and fixed field data, it cannot be analyzed by standard statistical
data mining methods. Relying on human analysis results in either huge workloads
or the analysis of only a tiny fraction of the database. We are working on text mining
technology to extract knowledge from very large amounts of textual data. Unlike
information retrieval technology that allows a user to select documents that meet
the user's requirements and interests, or document clustering technology that
organizes documents, we focus on finding valuable patterns and rules in text that
indicate trends and significant features about specific topics. By applying a prototype
system named TAKMI (Text Analysis and Knowledge MIning) to textual
databases in PC help centers, we can automatically detect product failures; determine
issues that have led to rapid increases in the number of calls and their underlying
reasons; and analyze help center productivity and changes in customers' behavior
involving a particular product, without reading any of the text.
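
A loose, hypothetical sketch of one such analysis (not the TAKMI system itself): count topic terms in help-center call logs per period and flag terms whose counts rise sharply. The call texts below are invented.

```python
# Flag topic terms whose frequency rises sharply between two periods.
from sklearn.feature_extraction.text import CountVectorizer

calls_by_month = {
    "2024-01": ["screen flicker on startup", "battery drains fast"],
    "2024-02": ["screen flicker after update", "screen flicker and crash",
                "battery drains fast", "screen flicker again"],
}

vectorizer = CountVectorizer(stop_words="english")
vectorizer.fit([text for texts in calls_by_month.values() for text in texts])
vocab = vectorizer.get_feature_names_out()

monthly = {month: vectorizer.transform(texts).toarray().sum(axis=0)
           for month, texts in calls_by_month.items()}

# Flag terms whose count at least tripled between the two months.
rising = [term for i, term in enumerate(vocab)
          if monthly["2024-02"][i] >= 3 * max(monthly["2024-01"][i], 1)]
print("rapidly increasing topics:", rising)
```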
