Pal, Sankar K.
Pattern recognition algorithms for data mining : scalability, knowledge discovery, and
soft granular computing / Sankar K. Pal and Pabitra Mitra.
p. cm.
Includes bibliographical references and index.
ISBN 1-58488-457-6 (alk. paper)
1. Data mining. 2. Pattern recognition systems. 3. Computer algorithms. 4. Granular computing. I. Mitra, Pabitra. II. Title.
QA76.9.D343P38 2004
006.3'12—dc22 2004043539
This book contains information obtained from authentic and highly regarded sources. Reprinted material
is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable
efforts have been made to publish reliable data and information, but the author and the publisher cannot
assume responsibility for the validity of all materials or for the consequences of their use.
Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, microfilming, and recording, or by any information storage or
retrieval system, without prior permission in writing from the publisher.
The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for
creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC
for such copying.
Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation, without intent to infringe.
Foreword
Preface
1 Introduction
  1.1 Introduction
  1.2 Pattern Recognition in Brief
    1.2.1 Data acquisition
    1.2.2 Feature selection/extraction
    1.2.3 Classification
  1.3 Knowledge Discovery in Databases (KDD)
  1.4 Data Mining
    1.4.1 Data mining tasks
    1.4.2 Data mining tools
    1.4.3 Applications of data mining
  1.5 Different Perspectives of Data Mining
    1.5.1 Database perspective
    1.5.2 Statistical perspective
    1.5.3 Pattern recognition perspective
    1.5.4 Research issues and challenges
  1.6 Scaling Pattern Recognition Algorithms to Large Data Sets
    1.6.1 Data reduction
    1.6.2 Dimensionality reduction
    1.6.3 Active learning
    1.6.4 Data partitioning
    1.6.5 Granular computing
    1.6.6 Efficient search algorithms
  1.7 Significance of Soft Computing in KDD
  1.8 Scope of the Book
References
Index
Indian Statistical Institute (ISI), the home base of Professors S.K. Pal and P. Mitra, has long been recognized as the world's premier center of fundamental research in probability, statistics and, more recently, pattern recognition and machine intelligence. The halls of ISI are adorned with the names of P.C. Mahalanobis, C.R. Rao, R.C. Bose, D. Basu, J.K. Ghosh, D. Dutta Majumder, K.R. Parthasarathi and other great intellects of the past century: great intellects who have contributed so much and in so many ways to the advancement of science and technology. The work of Professors Pal and Mitra, "Pattern Recognition Algorithms for Data Mining," or PRDM for short, reflects this illustrious legacy. The importance of PRDM is hard to exaggerate. It is a treatise that is an exemplar of authority, deep insights, encyclopedic coverage and high expository skill.
The primary objective of PRDM, as stated by the authors, is to provide
a unified framework for addressing pattern recognition tasks which are es-
sential for data mining. In reality, the book accomplishes much more; it
develops a unified framework and presents detailed analyses of a wide spec-
trum of methodologies for dealing with problems in which recognition, in one
form or another, plays an important role. Thus, the concepts and techniques
described in PRDM are of relevance not only to problems in pattern recog-
nition, but, more generally, to classification, analysis of dependencies, system
identification, authentication, and ultimately, to data mining. In this broad
perspective, conventional pattern recognition becomes a specialty: a specialty with deep roots and a large store of working concepts and techniques.
Traditional pattern recognition is subsumed by what may be called recog-
nition technology. I take some credit for arguing, some time ago, that de-
velopment of recognition technology should be accorded a high priority. My
arguments may be found in the foreword, "Recognition Technology and Fuzzy Logic," Special Issue on Recognition Technology, IEEE Transactions on Fuzzy Systems, 2001. A visible consequence of my arguments was the addition of the subtitle "Soft Computing in Recognition and Search" to the title of the journal Approximate Reasoning. What is important to note is that recognition technology is based on soft computing, a coalition of methodologies which
collectively provide a platform for the conception, design and utilization of in-
telligent systems. The principal constituents of soft computing are fuzzy logic,
neurocomputing, evolutionary computing, probabilistic computing, rough set
theory and machine learning. These are the methodologies which are de-
scribed and applied in PRDM with a high level of authority and expository skill.
Foreword
This is the latest in a series of volumes by Professor Sankar Pal and his col-
laborators on pattern recognition methodologies and applications. Knowledge
discovery and data mining, the recognition of patterns that may be present in
very large data sets and across distributed heterogeneous databases, is an ap-
plication of current prominence. This volume provides a very useful, thorough
exposition of the many facets of this application from several perspectives.
The chapters provide overviews of pattern recognition and data mining, outline some of the research issues, and carefully take the reader through the many
steps that are involved in reaching the desired goal of exposing the patterns
that may be embedded in voluminous data sets. These steps include prepro-
cessing operations for reducing the volume of the data and the dimensionality
of the feature space, clustering, segmentation, and classification. Search al-
gorithms and statistical and database operations are examined. Attention is
devoted to soft computing algorithms derived from the theories of rough sets,
fuzzy sets, genetic algorithms, multilayer perceptrons (MLP), and various hy-
brid combinations of these methodologies.
A valuable expository appendix describes various soft computing method-
ologies and their role in knowledge discovery and data mining (KDD). A sec-
ond appendix provides the reader with several data sets for experimentation
with the procedures described in this volume.
As has been the case with previous volumes by Professor Pal and his col-
laborators, this volume will be very useful to both researchers and students
interested in the latest advances in pattern recognition and its applications in
KDD.
I congratulate the authors of this volume and I am pleased to recommend
it as a valuable addition to the books in this field.
Preface

Sankar K. Pal
Pabitra Mitra
September 13, 2003
8.1 Rough set dependency rules for Vowel data along with the input fuzzification parameter values
8.2 Comparative performance of different models
8.3 Comparison of the performance of the rules extracted by various methods for Vowel, Pat and Hepatobiliary data
8.4 Rules extracted from trained networks (Model S) for Vowel data along with the input fuzzification parameter values
8.5 Rules extracted from trained networks (Model S) for Pat data along with the input fuzzification parameter values
8.6 Rules extracted from trained networks (Model S) for Hepatobiliary data along with the input fuzzification parameter values
8.7 Crude rules obtained via rough set theory for staging of cervical cancer
8.8 Rules extracted from the modular rough MLP for staging of cervical cancer
4.4 Variation of a_test with CPU time for (a) cancer, (b) ionosphere, (c) heart, (d) twonorm, and (e) forest cover type data
4.5 Variation of confidence factor c and distance D for (a) cancer, (b) ionosphere, (c) heart, and (d) twonorm data
4.6 Variation of confidence factor c with iterations of the StatQSVM algorithm for (a) cancer, (b) ionosphere, (c) heart, and (d) twonorm data
4.7 Margin distribution obtained at each iteration by the StatQSVM algorithm for the Twonorm data. The bold line denotes the final distribution obtained
4.8 Margin distribution obtained by some SVM design algorithms for the Twonorm data set
7.1 The basic network structure for the Kohonen feature map
7.2 Neighborhood N_c, centered on unit c (x_c, y_c). Three different neighborhoods are shown at distance d = 1, 2, and 3
7.3 Mapping of reducts in the competitive layer of RSOM
7.4 Variation of quantization error with iteration for Pat data
7.5 Variation of quantization error with iteration for vowel data
7.6 Plot showing the frequency of winning nodes using random weights for the Pat data
7.7 Plot showing the frequency of winning nodes using rough set knowledge for the Pat data
1.1 Introduction
Pattern recognition (PR) is an activity that we humans normally excel
in. We do it almost all the time, and without conscious effort. We receive
information via our various sensory organs, which is processed instantaneously
by our brain so that, almost immediately, we are able to identify the source
of the information, without having made any perceptible effort. What is
even more impressive is the accuracy with which we can perform recognition
tasks even under non-ideal conditions, for instance, when the information that
needs to be processed is vague, imprecise or even incomplete. In fact, most
of our day-to-day activities are based on our success in performing various
pattern recognition tasks. For example, when we read a book, we recognize the
letters, words and, ultimately, concepts and notions, from the visual signals
received by our brain, which processes them speedily and probably does a
neurobiological implementation of template-matching! [189]
The discipline of pattern recognition (or pattern recognition by machine)
essentially deals with the problem of developing algorithms and methodolo-
gies/devices that can enable the computer-implementation of many of the
recognition tasks that humans normally perform. The motivation is to per-
form these tasks more accurately, or faster, and perhaps more economically
than humans and, in many cases, to release them from drudgery resulting from
performing routine recognition tasks repetitively and mechanically. The scope
of PR also encompasses tasks humans are not good at, such as reading bar
codes. The goal of pattern recognition research is to devise ways and means of
automating certain decision-making processes that lead to classification and
recognition.
Machine recognition of patterns can be viewed as a two-fold task, consisting
of learning the invariant and common properties of a set of samples charac-
terizing a class, and of deciding that a new sample is a possible member of
the class by noting that it has properties common to those of the set of sam-
ples. The task of pattern recognition by a computer can be described as a
transformation from the measurement space M to the feature space F and
finally to the decision space D; i.e.,
M → F → D.
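To make this transformation concrete, the following is a minimal, hypothetical sketch (not from the book): a raw measurement vector in M is mapped to a two-element feature vector in F, which is then mapped to a decision in D. The particular features (mean and spread) and the threshold rule are illustrative assumptions only.

```python
import numpy as np

def to_features(signal: np.ndarray) -> np.ndarray:
    """M -> F: map a raw measurement vector to a small feature vector.
    Mean and standard deviation serve as illustrative features."""
    return np.array([signal.mean(), signal.std()])

def decide(features: np.ndarray, threshold: float = 1.0) -> int:
    """F -> D: map a feature vector to a class decision.
    Hypothetical rule: class 1 if the feature norm exceeds a threshold."""
    return 1 if np.linalg.norm(features) > threshold else 0

# One pass through M -> F -> D for a single measurement
measurement = np.array([0.9, 1.1, 1.4, 0.8])
label = decide(to_features(measurement))
```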
Data mining is that part of knowledge discovery which deals with the pro-
cess of identifying valid, novel, potentially useful, and ultimately understand-
able patterns in data, and excludes the knowledge interpretation part of KDD.
Therefore, as it stands now, data mining can be viewed as applying PR and
machine learning principles in the context of voluminous, possibly heteroge-
neous data sets [189].
The objective of this book is to provide some results of investigations,
both theoretical and experimental, addressing certain pattern recognition
tasks essential for data mining. Tasks considered include data condensation,
feature selection, case generation, clustering, classification and rule genera-
tion/evaluation. Various methodologies based on both classical and soft com-
puting approaches (integrating fuzzy logic, artificial neural networks, rough
sets, genetic algorithms) have been presented. The emphasis of these method-
ologies is given on (a) handling data sets which are large (both in size and
dimension) and involve classes that are overlapping, intractable and/or having
nonlinear boundaries, and (b) demonstrating the significance of granular com-
puting in soft computing paradigm for generating linguistic rules and dealing
with the knowledge discovery aspect. Before we describe the scope of the
book, we provide a brief review of pattern recognition, knowledge discovery
in databases, data mining, challenges in the application of pattern recognition
algorithms to data mining problems, and some of the possible solutions.
Section 1.2 briefly describes the basic concepts, features and techniques of pattern recognition. Next, we define the KDD process and
describe its various components. In Section 1.4 we elaborate upon the data
mining aspects of KDD, discussing its components, tasks involved, approaches
and application areas. The pattern recognition perspective of data mining is
introduced next and related research challenges are mentioned. The problem
of scaling pattern recognition algorithms to large data sets is discussed in Sec-
tion 1.6. Some broad approaches to achieving scalability are listed. The role
of soft computing in knowledge discovery is described in Section 1.7. Finally,
Section 1.8 discusses the plan of the book.
two categories – feature selection in the measurement space and feature selec-
tion in a transformed space. The techniques in the first category generally
reduce the dimensionality of the measurement space by discarding redundant
or least information carrying features. On the other hand, those in the sec-
ond category utilize all the information contained in the measurement space
to obtain a new transformed space, thereby mapping a higher dimensional
pattern to a lower dimensional one. This is referred to as feature extraction.
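The distinction between the two categories can be illustrated with a short, hypothetical sketch (not taken from the book): selection in the measurement space keeps a subset of the original features, here scored by variance, while extraction (e.g., principal component analysis) transforms all measurements into a new, lower-dimensional space. The variance criterion and the use of PCA are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 patterns, 5 measurements

# Category 1 -- feature selection: keep the k highest-variance features,
# discarding those judged to carry the least information.
k = 2
keep = np.argsort(X.var(axis=0))[-k:]
X_selected = X[:, keep]                # a subset of original measurements

# Category 2 -- feature extraction: project onto the top-k principal
# components, so every original measurement contributes to each new feature.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_extracted = Xc @ Vt[:k].T            # points in the new transformed space
```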
1.2.3 Classification
The problem of classification is basically one of partitioning the feature
space into regions, one region for each category of input. Thus it attempts to
assign every data point in the entire feature space to one of the possible classes
(say, M). In real life, the complete description of the classes is not known.
We have instead a finite and usually small number of samples, which often provide only partial information for the optimal design of the feature selector/extractor or the classifying/clustering system. Under such circumstances, it is assumed that
these samples are representative of the classes. Such a set of typical patterns
is called a training set. On the basis of the information gathered from the
samples in the training set, the pattern recognition systems are designed; i.e.,
we decide the values of the parameters of various pattern recognition methods.
Design of a classification or clustering scheme can be made with labeled or
unlabeled data. When the computer is given a set of objects with known
classifications (i.e., labels) and is asked to classify an unknown object based
on the information acquired by it during training, we call the design scheme
supervised learning; otherwise we call it unsupervised learning. Supervised
learning is used for classifying different objects, while clustering is performed
through unsupervised learning.
Pattern classification, by its nature, admits many approaches, sometimes
complementary, sometimes competing, to provide solution of a given problem.
These include decision theoretic approach (both deterministic and probabilis-
tic), syntactic approach, connectionist approach, fuzzy and rough set theoretic
approach and hybrid or soft computing approach.
In the decision theoretic approach, once a pattern is transformed, through
feature evaluation, to a vector in the feature space, its characteristics are ex-
pressed only by a set of numerical values. Classification can be done by using
deterministic or probabilistic techniques [55, 59]. In the deterministic classification approach, it is assumed that there exists only one unambiguous pattern class corresponding to each of the unknown pattern vectors. The nearest neighbor classifier (NN rule) [59] is an example of this category.
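As an illustration, a minimal 1-NN classifier can be written directly from its definition: assign to an unknown vector the label of its closest training sample. This sketch assumes Euclidean distance and NumPy arrays; it is illustrative, not the book's implementation.

```python
import numpy as np

def nn_classify(X_train: np.ndarray, y_train: np.ndarray,
                x: np.ndarray) -> int:
    """NN rule: label x with the class of its nearest training sample."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    return int(y_train[np.argmin(distances)])

# Toy training set with two classes
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(nn_classify(X_train, y_train, np.array([0.8, 0.9])))  # -> 1
```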
In most practical problems, the features are usually noisy and the
classes in the feature space are overlapping. In order to model such systems,
the features x1 , x2 , . . . , xi , . . . , xp are considered as random variables in the
probabilistic approach. The most commonly used classifier in such probabilis-
tic systems is the Bayes maximum likelihood classifier [59].
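A common concrete instance of the Bayes maximum likelihood classifier models each class as a multivariate Gaussian; an unknown pattern is assigned to the class that maximizes the log likelihood weighted by the class prior. The Gaussian assumption and the parameter estimation below are illustrative, not the book's implementation.

```python
import numpy as np

def fit_gaussian(X: np.ndarray):
    """Estimate the mean and covariance of one class's training samples."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def log_score(x, mean, cov, prior):
    """Log of prior times Gaussian density (class-independent constants dropped)."""
    diff = x - mean
    return (np.log(prior)
            - 0.5 * np.log(np.linalg.det(cov))
            - 0.5 * diff @ np.linalg.inv(cov) @ diff)

def bayes_classify(x, class_params):
    """Assign x to the class with the maximum score.
    class_params is a list of (mean, cov, prior) tuples, one per class."""
    scores = [log_score(x, m, c, p) for m, c, p in class_params]
    return int(np.argmax(scores))
```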
Although both these concepts are important, it has often been observed that
actionability and unexpectedness are correlated. In the literature, unexpectedness
is often defined in terms of the dissimilarity of a discovered pattern from a
vocabulary provided by the user.
As an example, consider a database of student evaluations of different
courses offered at some university. This can be defined as EVALUATE (TERM,
YEAR, COURSE, SECTION, INSTRUCTOR, INSTRUCT RATING, COURSE RATING). We
describe two patterns that are interesting in terms of actionability and unex-
pectedness respectively. The pattern that “Professor X is consistently getting
the overall INSTRUCT RATING below the overall COURSE RATING” can be of in-
terest to the chairperson because this shows that Professor X has room for
improvement. If, on the other hand, in most of the course evaluations the
overall INSTRUCT RATING is higher than the COURSE RATING and it turns out
that in most of Professor X’s ratings overall the INSTRUCT RATING is lower
than the COURSE RATING, then such a pattern is unexpected and hence inter-
esting. ✸
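The unexpected pattern above can be tested for mechanically. The following hypothetical sketch (pandas; the underscored column names and the data values are assumptions, not from the book) flags instructors whose mean INSTRUCT_RATING falls below their mean COURSE_RATING while the opposite relation holds overall.

```python
import pandas as pd

# Hypothetical extract of the EVALUATE relation
evaluate = pd.DataFrame({
    "INSTRUCTOR":      ["X", "X", "Y", "Y", "Z"],
    "INSTRUCT_RATING": [3.1, 3.3, 4.8, 4.7, 4.6],
    "COURSE_RATING":   [3.8, 3.9, 4.0, 4.1, 4.0],
})

# Overall tendency: is the instructor rating higher than the course rating?
overall_higher = (evaluate["INSTRUCT_RATING"].mean()
                  > evaluate["COURSE_RATING"].mean())

# Per-instructor means; flag those who run against the overall tendency
means = evaluate.groupby("INSTRUCTOR")[["INSTRUCT_RATING", "COURSE_RATING"]].mean()
unexpected = means[means["INSTRUCT_RATING"] < means["COURSE_RATING"]]
print(overall_higher, list(unexpected.index))  # True, ['X'] is unexpected
```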
Data mining is a step in the KDD process that consists of applying data
analysis and discovery algorithms which, under acceptable computational lim-
itations, produce a particular enumeration of patterns (or generate a model)
over the data. It uses historical information to discover regularities and im-
prove future decisions [161].
The overall KDD process is outlined in Figure 1.1. It is interactive and
iterative involving, more or less, the following steps [65, 66]:
FIGURE 1.1: The KDD process. Huge heterogeneous raw data is preprocessed by data cleaning, data condensation, dimensionality reduction and data wrapping; machine learning and mathematical models then extract knowledge (patterns) through classification, clustering and rule generation; finally, knowledge interpretation and evaluation yield useful knowledge.
Thus, KDD refers to the overall process of turning low-level data into high-
level knowledge. Perhaps the most important step in the KDD process is data
mining. However, the other steps are also important for the successful appli-
cation of KDD in practice. For example, steps 1, 2 and 3, mentioned above,
have been the subject of widespread research in the area of data warehousing.
We now focus on the data mining component of KDD.
that contain it. Businesses can use knowledge of these patterns to im-
prove placement of items in a store or for mail-order marketing. The
huge size of transaction databases and the exponential increase in the
number of potential frequent itemsets with increase in the number of at-
tributes (items) make the above problem a challenging one. The Apriori algorithm [3] provided one early solution, which was improved by subsequent algorithms using partitioning, hashing, sampling and dynamic itemset counting; a minimal Apriori sketch appears after this list.
2. Clustering: maps a data item into one of several clusters, where clusters
are natural groupings of data items based on similarity metrics or prob-
ability density models. Clustering is used in several exploratory data
analysis tasks, customer retention and management, and web mining.
The clustering problem has been studied in many fields, including statis-
tics, machine learning and pattern recognition. However, large data
considerations were absent in these approaches. Recently, several new
algorithms with greater emphasis on scalability have been developed, in-
cluding those based on summarized cluster representation called cluster
feature (Birch [291], ScaleKM [29]), sampling (CURE [84]) and density
joins (DBSCAN [61]).
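The sketch below illustrates the Apriori idea referred to in item 1 (an assumption-laden illustration, not the cited implementation [3]): frequent k-itemsets are joined into candidate (k+1)-itemsets, and candidates are pruned using the fact that every subset of a frequent itemset must itself be frequent.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all itemsets contained in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {frozenset([i]) for t in transactions for i in t}

    def support(itemset):
        return sum(itemset <= t for t in transactions)

    frequent = {}
    current = {s for s in items if support(s) >= min_support}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        # Join step: candidate (k+1)-itemsets from unions of frequent k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune step: every k-subset of a surviving candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"bread", "eggs"}]
print(apriori(baskets, min_support=2))
```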
Some other tasks required in some data mining applications are outlier/anomaly detection, link analysis, optimization and planning.
5. Example-based methods (e.g., nearest neighbor [7], lazy learning [5] and case-based reasoning [122, 208] methods)
The data mining algorithms determine both the flexibility of the model in
representing the data and the interpretability of the model in human terms.
Typically, the more complex models may fit the data better but may also
be more difficult to understand and to fit reliably. Also, each representation
suits some problems better than others. For example, decision tree classifiers
can be very useful for finding structure in high dimensional spaces and are
also useful in problems with mixed continuous and categorical data. However,
they may not be suitable for problems where the true decision boundaries are
nonlinear multivariate functions.
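As a small illustration of this tradeoff (an assumed setup, not an example from the book), an axis-parallel decision tree can represent an oblique linear boundary only through a fine staircase of splits, so a deeper, harder-to-interpret tree is needed to fit it well.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))

# Oblique decision boundary: class 1 above the line x0 + x1 = 0
y = (X[:, 0] + X[:, 1] > 0).astype(int)

shallow = DecisionTreeClassifier(max_depth=2).fit(X, y)
deep = DecisionTreeClassifier(max_depth=12).fit(X, y)
# The shallow axis-parallel tree underfits the oblique boundary;
# the deep tree fits it only via many staircase-like splits.
print(shallow.score(X, y), deep.score(X, y))
```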
FIGURE: Distribution of data mining applications by domain: Banking (17%), eCommerce/Web (15%), Telecom (11%), Other (11%), Biology/Genetics (8%), Fraud Detection (8%), Retail (6%), Insurance (6%), Pharmaceuticals (5%), Investment/Stocks (4%).
• The World Wide Web: Information retrieval, resource location [62, 210].
8. Integration. Data mining tools are often only a part of the entire decision
making system. It is desirable that they integrate smoothly, both with
the database and the final decision-making procedure.
In the next section we discuss the issues related to the large size of the data
sets in more detail.
of the original massive data set [18]. The reduced representation should be
as faithful to the original data as possible, for its effective use in different
mining tasks. At present the following categories of reduced representations
are mainly used:
• Sampling/instance selection: Various random, deterministic and den-
sity biased sampling strategies exist in statistics literature. Their use
in machine learning and data mining tasks has also been widely stud-
ied [37, 114, 142]. Note that merely generating a random sample from a large database stored on disk may itself be a non-trivial task from a computational viewpoint; a single-pass reservoir sampling sketch is given below. Several aspects of instance selection, e.g.,
instance representation, selection of interior/boundary points, and in-
stance pruning strategies, have also been investigated in instance-based
and nearest neighbor classification frameworks [279]. Challenges in de-
signing an instance selection algorithm include accurate representation
of the original data distribution, making fine distinctions at different
scales and noticing rare events and anomalies.
• Data squashing: It is a form of lossy compression where a large data
set is replaced by a small data set and some accompanying quantities,
while attempting to preserve its statistical information [60].
• Indexing data structures: Systems such as kd-trees [22], R-trees, hash
tables, AD-trees, multiresolution kd-trees [54] and cluster feature (CF)-
trees [29] partition the data (or feature space) into buckets recursively,
and store enough information regarding the data in the bucket so that
many mining queries and learning tasks can be achieved in constant or
linear time.
• Frequent itemsets: They are often applied in supermarket data analysis
and require that the attributes are sparsely valued [3].
• DataCubes: Use a relational aggregation database operator to represent
chunks of data [82].
The last four techniques fall into the general class of representation called
cached sufficient statistics [177]. These are summary data structures that lie
between the statistical algorithms and the database, intercepting the kinds of
operations that have the potential to consume a large amount of time if they were answered by direct reading of the data set. Case-based reasoning [122] also involves a
related approach where salient instances (or descriptions) are either selected
or constructed and stored in the case base for later use.
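As a concrete note on the sampling point above, a single sequential pass with reservoir sampling draws a uniform random sample from a data set of unknown size without loading it into memory. The sketch below follows the classic Algorithm R; the iterable stands in for a sequential scan of records on disk, and is an illustrative assumption rather than any method described in this book.

```python
import random

def reservoir_sample(records, k, seed=None):
    """Uniformly sample k records in one sequential pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for n, record in enumerate(records):
        if n < k:
            reservoir.append(record)      # fill the reservoir first
        else:
            j = rng.randint(0, n)         # inclusive bounds
            if j < k:
                reservoir[j] = record     # replace with decreasing probability
    return reservoir

# e.g., sample 100 lines from a large file without reading it into memory:
# with open("huge.db") as f:
#     sample = reservoir_sample(f, k=100, seed=0)
```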