Pattern Recognition Algorithms for Data Mining: Scalability, Knowledge Discovery, and Soft Granular Computing, 1st Edition, by Sankar K. Pal and Pabitra Mitra
Pal, Sankar K.
Pattern recognition algorithms for data mining : scalability, knowledge discovery, and
soft granular computing / Sankar K. Pal and Pabitra Mitra.
p. cm.
Includes bibliographical references and index.
ISBN 1-58488-457-6 (alk. paper)
1. Data mining. 2. Pattern recognition systems. 3. Computer algorithms. 4. Granular
computing / Sankar K. Pal and Pabitra Mitra.
QA76.9.D343P38 2004
006.3'12—dc22 2004043539
This book contains information obtained from authentic and highly regarded sources. Reprinted material
is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable
efforts have been made to publish reliable data and information, but the author and the publisher cannot
assume responsibility for the validity of all materials or for the consequences of their use.
Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, microfilming, and recording, or by any information storage or
retrieval system, without prior permission in writing from the publisher.
The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for
creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC
for such copying.
Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation, without intent to infringe.
Foreword
Preface
1 Introduction
1.1 Introduction
1.2 Pattern Recognition in Brief
1.2.1 Data acquisition
1.2.2 Feature selection/extraction
1.2.3 Classification
1.3 Knowledge Discovery in Databases (KDD)
1.4 Data Mining
1.4.1 Data mining tasks
1.4.2 Data mining tools
1.4.3 Applications of data mining
1.5 Different Perspectives of Data Mining
1.5.1 Database perspective
1.5.2 Statistical perspective
1.5.3 Pattern recognition perspective
1.5.4 Research issues and challenges
1.6 Scaling Pattern Recognition Algorithms to Large Data Sets
1.6.1 Data reduction
1.6.2 Dimensionality reduction
1.6.3 Active learning
1.6.4 Data partitioning
1.6.5 Granular computing
1.6.6 Efficient search algorithms
1.7 Significance of Soft Computing in KDD
1.8 Scope of the Book
© 2004 by Taylor & Francis Group, LLC
References
Index
Indian Statistical Institute (ISI), the home base of Professors S.K. Pal and P.
Mitra, has long been recognized as the world’s premier center of fundamental
research in probability, statistics and, more recently, pattern recognition and
machine intelligence. The halls of ISI are adorned with the names of P.C.
Mahalanobis, C.R. Rao, R.C. Bose, D. Basu, J.K. Ghosh, D. Dutta Majumder,
K.R. Parthasarathi and other great intellects of the past century, great
intellects who have contributed so much and in so many ways to the advancement
of science and technology. The work of Professors Pal and Mitra, "Pattern
Recognition Algorithms for Data Mining," or PRDM for short, reflects this
illustrious legacy. The importance of PRDM is hard to exaggerate. It is a
treatise that is an exemplar of authority, deep insights, encyclopedic coverage
and high expository skill.
The primary objective of PRDM, as stated by the authors, is to provide
a unified framework for addressing pattern recognition tasks which are
essential for data mining. In reality, the book accomplishes much more; it
develops a unified framework and presents detailed analyses of a wide
spectrum of methodologies for dealing with problems in which recognition, in one
form or another, plays an important role. Thus, the concepts and techniques
described in PRDM are of relevance not only to problems in pattern
recognition, but, more generally, to classification, analysis of dependencies, system
identification, authentication, and ultimately, to data mining. In this broad
perspective, conventional pattern recognition becomes a specialty, a specialty
with deep roots and a large store of working concepts and techniques.
Traditional pattern recognition is subsumed by what may be called
recognition technology. I take some credit for arguing, some time ago, that
development of recognition technology should be accorded a high priority. My
arguments may be found in the foreword, "Recognition Technology and Fuzzy
Logic," Special Issue on Recognition Technology, IEEE Transactions on Fuzzy
Systems, 2001. A visible consequence of my arguments was an addition of
the subtitle "Soft Computing in Recognition and Search," to the title of the
journal "Approximate Reasoning." What is important to note is that
recognition technology is based on soft computing, a coalition of methodologies which
collectively provide a platform for the conception, design and utilization of
intelligent systems. The principal constituents of soft computing are fuzzy logic,
neurocomputing, evolutionary computing, probabilistic computing, rough set
theory and machine learning. These are the methodologies which are
described and applied in PRDM with a high level of authority and expository
Foreword
This is the latest in a series of volumes by Professor Sankar Pal and his
collaborators on pattern recognition methodologies and applications. Knowledge
discovery and data mining, the recognition of patterns that may be present in
very large data sets and across distributed heterogeneous databases, is an
application of current prominence. This volume provides a very useful, thorough
exposition of the many facets of this application from several perspectives.
The chapters provide overviews of pattern recognition and data mining, outline
some of the research issues, and carefully take the reader through the many
steps that are involved in reaching the desired goal of exposing the patterns
that may be embedded in voluminous data sets. These steps include
preprocessing operations for reducing the volume of the data and the dimensionality
of the feature space, clustering, segmentation, and classification. Search
algorithms and statistical and database operations are examined. Attention is
devoted to soft computing algorithms derived from the theories of rough sets,
fuzzy sets, genetic algorithms, multilayer perceptrons (MLP), and various
hybrid combinations of these methodologies.
A valuable expository appendix describes various soft computing
methodologies and their role in knowledge discovery and data mining (KDD). A
second appendix provides the reader with several data sets for experimentation
with the procedures described in this volume.
As has been the case with previous volumes by Professor Pal and his
collaborators, this volume will be very useful to both researchers and students
interested in the latest advances in pattern recognition and its applications in
KDD.
I congratulate the authors of this volume and I am pleased to recommend
it as a valuable addition to the books in this field.
Preface
September 13, 2003
Sankar K. Pal
Pabitra Mitra
8.1 Rough set dependency rules for Vowel data along with the input fuzzification parameter values
8.2 Comparative performance of different models
8.3 Comparison of the performance of the rules extracted by various methods for Vowel, Pat and Hepatobiliary data
8.4 Rules extracted from trained networks (Model S) for Vowel data along with the input fuzzification parameter values
8.5 Rules extracted from trained networks (Model S) for Pat data along with the input fuzzification parameter values
8.6 Rules extracted from trained networks (Model S) for Hepatobiliary data along with the input fuzzification parameter values
8.7 Crude rules obtained via rough set theory for staging of cervical cancer
8.8 Rules extracted from the modular rough MLP for staging of cervical cancer
4.4 Variation of a_test with CPU time for (a) cancer, (b) ionosphere, (c) heart, (d) twonorm, and (e) forest cover type data.
4.5 Variation of confidence factor c and distance D for (a) cancer, (b) ionosphere, (c) heart, and (d) twonorm data.
4.6 Variation of confidence factor c with iterations of StatQSVM algorithm for (a) cancer, (b) ionosphere, (c) heart, and (d) twonorm data.
4.7 Margin distribution obtained at each iteration by the StatQSVM algorithm for the Twonorm data. The bold line denotes the final distribution obtained.
4.8 Margin distribution obtained by some SVM design algorithms for the Twonorm data set.
Language: French
BY
GASTON LEMAY
PARIS
G. CHARPENTIER, BOOKSELLER-PUBLISHER
13, RUE DE GRENELLE-SAINT-GERMAIN, 13
1879
All rights reserved.
PARIS. PRINTED BY Vve P. LAROUSSE ET Cie
19, RUE MONTPARNASSE, 19
To my travelling companions
To the officers
and crew of the JUNON
Gaston Lemay.
TO MR. GEORGES BIARD
NAVAL LIEUTENANT
G. L.
FROM MARSEILLE TO GIBRALTAR
At sea, August 2.
August 3.
August 4.