Pattern Recognition
Algorithms for
Data Mining
Scalability, Knowledge Discovery and Soft
Granular Computing

Sankar K. Pal and Pabitra Mitra


Machine Intelligence Unit
Indian Statistical Institute
Calcutta, India

CHAPMAN & HALL/CRC


A CRC Press Company
Boca Raton London New York Washington, D.C.

Cover art provided by Laura Bright (http://laurabright.com).
http://www.ciaadvertising.org/SA/sping_03/391K/lbright/paper/site/report/introduction.html


Library of Congress Cataloging-in-Publication Data

Pal, Sankar K.
Pattern recognition algorithms for data mining : scalability, knowledge discovery, and
soft granular computing / Sankar K. Pal and Pabitra Mitra.
p. cm.
Includes bibliographical references and index.
ISBN 1-58488-457-6 (alk. paper)
1. Data mining. 2. Pattern recognition systems. 3. Computer algorithms. 4. Granular
computing. I. Mitra, Pabitra. II. Title.

QA76.9.D343P38 2004
006.3'12—dc22 2004043539

This book contains information obtained from authentic and highly regarded sources. Reprinted material
is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable
efforts have been made to publish reliable data and information, but the author and the publisher cannot
assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, microfilming, and recording, or by any information storage or
retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for
creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC
for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2004 by CRC Press LLC

No claim to original U.S. Government works


International Standard Book Number 1-58488-457-6
Library of Congress Card Number 2004043539
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper

To our parents

Contents

Foreword xiii

Preface xxi

List of Tables xxv

List of Figures xxvii

1 Introduction 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Pattern Recognition in Brief . . . . . . . . . . . . . . . . . . 3
1.2.1 Data acquisition . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Feature selection/extraction . . . . . . . . . . . . . . . 4
1.2.3 Classification . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Knowledge Discovery in Databases (KDD) . . . . . . . . . . 7
1.4 Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.1 Data mining tasks . . . . . . . . . . . . . . . . . . . . 10
1.4.2 Data mining tools . . . . . . . . . . . . . . . . . . . . 12
1.4.3 Applications of data mining . . . . . . . . . . . . . . . 12
1.5 Different Perspectives of Data Mining . . . . . . . . . . . . . 14
1.5.1 Database perspective . . . . . . . . . . . . . . . . . . . 14
1.5.2 Statistical perspective . . . . . . . . . . . . . . . . . . 15
1.5.3 Pattern recognition perspective . . . . . . . . . . . . . 15
1.5.4 Research issues and challenges . . . . . . . . . . . . . 16
1.6 Scaling Pattern Recognition Algorithms to Large Data Sets . 17
1.6.1 Data reduction . . . . . . . . . . . . . . . . . . . . . . 17
1.6.2 Dimensionality reduction . . . . . . . . . . . . . . . . 18
1.6.3 Active learning . . . . . . . . . . . . . . . . . . . . . . 19
1.6.4 Data partitioning . . . . . . . . . . . . . . . . . . . . . 19
1.6.5 Granular computing . . . . . . . . . . . . . . . . . . . 20
1.6.6 Efficient search algorithms . . . . . . . . . . . . . . . . 20
1.7 Significance of Soft Computing in KDD . . . . . . . . . . . . 21
1.8 Scope of the Book . . . . . . . . . . . . . . . . . . . . . . . . 22


2 Multiscale Data Condensation 29


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2 Data Condensation Algorithms . . . . . . . . . . . . . . . . . 32
2.2.1 Condensed nearest neighbor rule . . . . . . . . . . . . 32
2.2.2 Learning vector quantization . . . . . . . . . . . . . . 33
2.2.3 Astrahan’s density-based method . . . . . . . . . . . . 34
2.3 Multiscale Representation of Data . . . . . . . . . . . . . . . 34
2.4 Nearest Neighbor Density Estimate . . . . . . . . . . . . . . 37
2.5 Multiscale Data Condensation Algorithm . . . . . . . . . . . 38
2.6 Experimental Results and Comparisons . . . . . . . . . . . . 40
2.6.1 Density estimation . . . . . . . . . . . . . . . . . . . . 41
2.6.2 Test of statistical significance . . . . . . . . . . . . . . 41
2.6.3 Classification: Forest cover data . . . . . . . . . . . . 47
2.6.4 Clustering: Satellite image data . . . . . . . . . . . . . 48
2.6.5 Rule generation: Census data . . . . . . . . . . . . . . 49
2.6.6 Study on scalability . . . . . . . . . . . . . . . . . . . 52
2.6.7 Choice of scale parameter . . . . . . . . . . . . . . . . 52
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

3 Unsupervised Feature Selection 59


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . 60
3.3 Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.3.1 Filter approach . . . . . . . . . . . . . . . . . . . . . . 63
3.3.2 Wrapper approach . . . . . . . . . . . . . . . . . . . . 64
3.4 Feature Selection Using Feature Similarity (FSFS) . . . . . . 64
3.4.1 Feature similarity measures . . . . . . . . . . . . . . . 65
3.4.2 Feature selection through clustering . . . . . . . . . . 68
3.5 Feature Evaluation Indices . . . . . . . . . . . . . . . . . . . 71
3.5.1 Supervised indices . . . . . . . . . . . . . . . . . . . . 71
3.5.2 Unsupervised indices . . . . . . . . . . . . . . . . . . . 72
3.5.3 Representation entropy . . . . . . . . . . . . . . . . . 73
3.6 Experimental Results and Comparisons . . . . . . . . . . . . 74
3.6.1 Comparison: Classification and clustering performance 74
3.6.2 Redundancy reduction: Quantitative study . . . . . . 79
3.6.3 Effect of cluster size . . . . . . . . . . . . . . . . . . . 80
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

4 Active Learning Using Support Vector Machine 83


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.2 Support Vector Machine . . . . . . . . . . . . . . . . . . . . 86
4.3 Incremental Support Vector Learning with Multiple Points . 88
4.4 Statistical Query Model of Learning . . . . . . . . . . . . . . 89
4.4.1 Query strategy . . . . . . . . . . . . . . . . . . . . . . 90
4.4.2 Confidence factor of support vector set . . . . . . . . . 90


4.5 Learning Support Vectors with Statistical Queries . . . . . . 91


4.6 Experimental Results and Comparison . . . . . . . . . . . . 94
4.6.1 Classification accuracy and training time . . . . . . . 94
4.6.2 Effectiveness of the confidence factor . . . . . . . . . . 97
4.6.3 Margin distribution . . . . . . . . . . . . . . . . . . . 97
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5 Rough-fuzzy Case Generation 103


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2 Soft Granular Computing . . . . . . . . . . . . . . . . . . . . 105
5.3 Rough Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.3.1 Information systems . . . . . . . . . . . . . . . . . . . 107
5.3.2 Indiscernibility and set approximation . . . . . . . . . 107
5.3.3 Reducts . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3.4 Dependency rule generation . . . . . . . . . . . . . . . 110
5.4 Linguistic Representation of Patterns and Fuzzy Granulation 111
5.5 Rough-fuzzy Case Generation Methodology . . . . . . . . . . 114
5.5.1 Thresholding and rule generation . . . . . . . . . . . . 115
5.5.2 Mapping dependency rules to cases . . . . . . . . . . . 117
5.5.3 Case retrieval . . . . . . . . . . . . . . . . . . . . . . . 118
5.6 Experimental Results and Comparison . . . . . . . . . . . . 120
5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

6 Rough-fuzzy Clustering 123


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.2 Clustering Methodologies . . . . . . . . . . . . . . . . . . . . 124
6.3 Algorithms for Clustering Large Data Sets . . . . . . . . . . 126
6.3.1 CLARANS: Clustering large applications based upon
randomized search . . . . . . . . . . . . . . . . . . . . 126
6.3.2 BIRCH: Balanced iterative reducing and clustering us-
ing hierarchies . . . . . . . . . . . . . . . . . . . . . . 126
6.3.3 DBSCAN: Density-based spatial clustering of applica-
tions with noise . . . . . . . . . . . . . . . . . . . . . . 127
6.3.4 STING: Statistical information grid . . . . . . . . . . 128
6.4 CEMMiSTRI: Clustering using EM, Minimal Spanning Tree
and Rough-fuzzy Initialization . . . . . . . . . . . . . . . . . 129
6.4.1 Mixture model estimation via EM algorithm . . . . . 130
6.4.2 Rough set initialization of mixture parameters . . . . 131
6.4.3 Mapping reducts to mixture parameters . . . . . . . . 132
6.4.4 Graph-theoretic clustering of Gaussian components . . 133
6.5 Experimental Results and Comparison . . . . . . . . . . . . 135
6.6 Multispectral Image Segmentation . . . . . . . . . . . . . . . 139
6.6.1 Discretization of image bands . . . . . . . . . . . . . . 141
6.6.2 Integration of EM, MST and rough sets . . . . . . . . 141
6.6.3 Index for segmentation quality . . . . . . . . . . . . . 141


6.6.4 Experimental results and comparison . . . . . . . . . . 141


6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

7 Rough Self-Organizing Map 149


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.2 Self-Organizing Maps (SOM) . . . . . . . . . . . . . . . . . . 150
7.2.1 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.2.2 Effect of neighborhood . . . . . . . . . . . . . . . . . . 152
7.3 Incorporation of Rough Sets in SOM (RSOM) . . . . . . . . 152
7.3.1 Unsupervised rough set rule generation . . . . . . . . 153
7.3.2 Mapping rough set rules to network weights . . . . . . 153
7.4 Rule Generation and Evaluation . . . . . . . . . . . . . . . . 154
7.4.1 Extraction methodology . . . . . . . . . . . . . . . . . 154
7.4.2 Evaluation indices . . . . . . . . . . . . . . . . . . . . 155
7.5 Experimental Results and Comparison . . . . . . . . . . . . 156
7.5.1 Clustering and quantization error . . . . . . . . . . . . 157
7.5.2 Performance of rules . . . . . . . . . . . . . . . . . . . 162
7.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

8 Classification, Rule Generation and Evaluation using Modular Rough-fuzzy MLP 165
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.2 Ensemble Classifiers . . . . . . . . . . . . . . . . . . . . . . . 167
8.3 Association Rules . . . . . . . . . . . . . . . . . . . . . . . . 170
8.3.1 Rule generation algorithms . . . . . . . . . . . . . . . 170
8.3.2 Rule interestingness . . . . . . . . . . . . . . . . . . . 173
8.4 Classification Rules . . . . . . . . . . . . . . . . . . . . . . . 173
8.5 Rough-fuzzy MLP . . . . . . . . . . . . . . . . . . . . . . . . 175
8.5.1 Fuzzy MLP . . . . . . . . . . . . . . . . . . . . . . . . 175
8.5.2 Rough set knowledge encoding . . . . . . . . . . . . . 176
8.6 Modular Evolution of Rough-fuzzy MLP . . . . . . . . . . . 178
8.6.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 178
8.6.2 Evolutionary design . . . . . . . . . . . . . . . . . . . 182
8.7 Rule Extraction and Quantitative Evaluation . . . . . . . . . 184
8.7.1 Rule extraction methodology . . . . . . . . . . . . . . 184
8.7.2 Quantitative measures . . . . . . . . . . . . . . . . . . 188
8.8 Experimental Results and Comparison . . . . . . . . . . . . 189
8.8.1 Classification . . . . . . . . . . . . . . . . . . . . . . . 190
8.8.2 Rule extraction . . . . . . . . . . . . . . . . . . . . . . 192
8.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199


A Role of Soft-Computing Tools in KDD 201


A.1 Fuzzy Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
A.1.1 Clustering . . . . . . . . . . . . . . . . . . . . . . . . . 202
A.1.2 Association rules . . . . . . . . . . . . . . . . . . . . . 203
A.1.3 Functional dependencies . . . . . . . . . . . . . . . . . 204
A.1.4 Data summarization . . . . . . . . . . . . . . . . . . . 204
A.1.5 Web application . . . . . . . . . . . . . . . . . . . . . 205
A.1.6 Image retrieval . . . . . . . . . . . . . . . . . . . . . . 205
A.2 Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . 206
A.2.1 Rule extraction . . . . . . . . . . . . . . . . . . . . . . 206
A.2.2 Clustering and self organization . . . . . . . . . . . . . 206
A.2.3 Regression . . . . . . . . . . . . . . . . . . . . . . . . . 207
A.3 Neuro-fuzzy Computing . . . . . . . . . . . . . . . . . . . . . 207
A.4 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . 208
A.5 Rough Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
A.6 Other Hybridizations . . . . . . . . . . . . . . . . . . . . . . 210

B Data Sets Used in Experiments 211

References 215

Index 237

About the Authors 243

Foreword

Indian Statistical Institute (ISI), the home base of Professors S.K. Pal and P.
Mitra, has long been recognized as the world’s premier center of fundamental
research in probability, statistics and, more recently, pattern recognition and
machine intelligence. The halls of ISI are adorned with the names of P.C. Ma-
halanobis, C.R. Rao, R.C. Bose, D. Basu, J.K. Ghosh, D. Dutta Majumder,
K.R. Parthasarathi and other great intellects of the past century–great intel-
lects who have contributed so much and in so many ways to the advancement
of science and technology. The work of Professors Pal and Mitra, “Pattern
Recognition Algorithms for Data Mining,” or PRDM for short, reflects this
illustrious legacy. The importance of PRDM is hard to exaggerate. It is a
treatise that is an exemplar of authority, deep insights, encyclopedic coverage
and high expository skill.
The primary objective of PRDM, as stated by the authors, is to provide
a unified framework for addressing pattern recognition tasks which are es-
sential for data mining. In reality, the book accomplishes much more; it
develops a unified framework and presents detailed analyses of a wide spec-
trum of methodologies for dealing with problems in which recognition, in one
form or another, plays an important role. Thus, the concepts and techniques
described in PRDM are of relevance not only to problems in pattern recog-
nition, but, more generally, to classification, analysis of dependencies, system
identification, authentication, and ultimately, to data mining. In this broad
perspective, conventional pattern recognition becomes a specialty–a specialty
with deep roots and a large store of working concepts and techniques.
Traditional pattern recognition is subsumed by what may be called recog-
nition technology. I take some credit for arguing, some time ago, that de-
velopment of recognition technology should be accorded a high priority. My
arguments may be found in the foreword, “Recognition Technology and Fuzzy
Logic,” Special Issue on Recognition Technology, IEEE Transactions on Fuzzy
Systems, 2001. A visible consequence of my arguments was the addition of
the subtitle “Soft Computing in Recognition and Search” to the title of the
journal “Approximate Reasoning.” What is important to note is that recogni-
tion technology is based on soft computing–a coalition of methodologies which
collectively provide a platform for the conception, design and utilization of in-
telligent systems. The principal constituents of soft computing are fuzzy logic,
neurocomputing, evolutionary computing, probabilistic computing, rough set
theory and machine learning. These are the methodologies which are de-
scribed and applied in PRDM with a high level of authority and expository
skill. Particularly worthy of note is the exposition of methods in which rough
set theory and fuzzy logic are used in combination.
Much of the material in PRDM is new and reflects the authors’ extensive
experience in dealing with a wide variety of problems in which recognition and
analysis of dependencies play essential roles. Such is the case in data mining
and, in particular, in the analysis of both causal and non-causal dependencies.
A pivotal issue–which subsumes feature selection and feature extraction–
and which receives a great deal of attention in PRDM, is that of feature
analysis. Feature analysis has a position of centrality in recognition, and
its discussion in PRDM is an order of magnitude more advanced and more
insightful than what can be found in the existing literature. And yet, it
cannot be claimed that the basic problem of feature selection–especially in
the context of data mining–has been solved or is even close to solution. Why?
The reason, in my view, is the following. To define what is meant by a feature
it is necessary to define what is meant by relevance. Conventionally, relevance
is defined as a bivalent concept, that is, if q is a query and p is a proposition or
a collection of propositions, then either p is relevant to q or p is not relevant
to q, with no shades of gray allowed. But it is quite obvious that relevance is a
matter of degree, which is consistent with the fact that in a natural language
we allow expressions such as quite relevant, not very relevant, highly relevant,
etc. In the existing literature, there is no definition of relevance which makes
it possible to answer the question: To what degree is p relevant to q? For
example, if q is: How old is Carol? and p is: Carol has a middle-aged mother,
then to what degree is the knowledge that Carol has a middle-aged mother,
relevant to the query: How old is Carol? As stated earlier, the problem is that
relevance is not a bivalent concept, as it is frequently assumed to be; rather,
relevance is a fuzzy concept which does not lend itself to definition within the
conceptual structure of bivalent logic. However, what can be found in PRDM
is a very thorough discussion of a related issue, namely, methods of assessment
of relative importance of features in the context of pattern recognition and
data mining.
A difficult problem which arises both in assessment of the degree of relevance
of a proposition, p, and in assessment of the degree of importance of a feature,
f, relates to combination of such degrees. More concretely, if we have two
propositions p1 and p2 with respective degrees of relevance r1 and r2, then
all that can be said about the relevance of (p1 , p2 ) is that it is bounded
from below by max(r1 , r2 ). This makes it possible for both p1 and p2 to be
irrelevant (r1 = r2 = 0), and yet the degree of relevance of (p1 , p2 ) may be
close to 1.
The point I am trying to make is that there are many basic issues in pattern
recognition–and especially in relation to its role in data mining–whose reso-
lution lies beyond the reach of methods based on bivalent logic and bivalent–
logic-based probability theory. The issue of relevance is a case in point. An-
other basic issue is that of causality. But what is widely unrecognized is that
even such familiar concepts as cluster and edge are undefinable within the
conceptual structure of bivalent logic. This assertion is not contradicted by
the fact that there is an enormous literature on cluster analysis and edge de-
tection. What cannot be found in this literature are formalized definitions of
cluster and edge.
How can relevance, causality, cluster, edge and many other familiar concepts
be defined? In my view, what is needed for this purpose is the methodology
of computing with words. In this methodology, the objects of computation
are words and propositions drawn from a natural language. I cannot be more
detailed in a foreword.
Although PRDM does not venture into computing with words directly, it
does lay the groundwork for it, especially through extensive exposition of
granular computing and related methods of computation. It does so through
an exceptionally insightful discussion of advanced methods drawn from fuzzy
logic, neurocomputing, probabilistic computing, rough set theory and machine
learning.
In summary, “Pattern Recognition Algorithms for Data Mining” is a book
that commands admiration. Its authors, Professors S.K. Pal and P. Mitra are
foremost authorities in pattern recognition, data mining and related fields.
Within its covers, the reader finds an exceptionally well-organized exposition
of every concept and every method that is of relevance to the theme of the
book. There is much that is original and much that cannot be found in the
literature. The authors and the publisher deserve our thanks and congrat-
ulations for producing a definitive work that contributes so much and in so
many important ways to the advancement of both the theory and practice of
recognition technology, data mining and related fields. The magnum opus of
Professors Pal and Mitra is must reading for anyone who is interested in the
conception, design and utilization of intelligent systems.

March 2004 Lotfi A. Zadeh


University of California
Berkeley, CA, USA

Foreword

Data mining offers techniques of discovering patterns in voluminous databases.
In other words, data mining is a technique of discovering knowledge from
large data sets (KDD). Knowledge is usually presented in the form of decision
rules easy to understand and used by humans. Therefore, methods for rule
generation and evaluation are of utmost importance in this context.
Many approaches to accomplish this have been developed and explored in
recent years. The prominent scientist Prof. Sankar K. Pal and his student
Dr. Pabitra Mitra present in this valuable volume, in addition to classi-
cal methods, various recently emerged methodologies for data mining,
such as rough sets, rough fuzzy hybridization, granular computing, artificial
neural networks, genetic algorithms, and others. In addition to theoretical
foundations, the book also includes experimental results. Many real life and
nontrivial examples given in the book show how the new techniques work and
can be used in reality and what advantages they offer compared with classical
methods (e.g., statistics).
This book covers a wide spectrum of problems related to data mining, data
analysis, and knowledge discovery in large databases. It should be recom-
mended reading for any researcher or practitioner working in these areas.
Graduate students in AI will also find it a very well-organized presentation of
modern concepts and tools used in this domain.
In the appendix various basic computing tools and data sets used in exper-
iments are supplied. A complete bibliography on the subject is also included.
The book presents an unbeatable combination of theory and practice and
gives a comprehensive view on methods and tools in modern KDD.
The authors deserve the highest appreciation for this excellent monograph.

January 2004 Zdzislaw Pawlak


Polish Academy of Sciences
Warsaw, Poland

Foreword

This is the latest in a series of volumes by Professor Sankar Pal and his col-
laborators on pattern recognition methodologies and applications. Knowledge
discovery and data mining, the recognition of patterns that may be present in
very large data sets and across distributed heterogeneous databases, is an ap-
plication of current prominence. This volume provides a very useful, thorough
exposition of the many facets of this application from several perspectives.
The chapters provide overviews of pattern recognition and data mining, outline
some of the research issues, and carefully take the reader through the many
steps that are involved in reaching the desired goal of exposing the patterns
that may be embedded in voluminous data sets. These steps include prepro-
cessing operations for reducing the volume of the data and the dimensionality
of the feature space, clustering, segmentation, and classification. Search al-
gorithms and statistical and database operations are examined. Attention is
devoted to soft computing algorithms derived from the theories of rough sets,
fuzzy sets, genetic algorithms, multilayer perceptrons (MLP), and various hy-
brid combinations of these methodologies.
A valuable expository appendix describes various soft computing method-
ologies and their role in knowledge discovery and data mining (KDD). A sec-
ond appendix provides the reader with several data sets for experimentation
with the procedures described in this volume.
As has been the case with previous volumes by Professor Pal and his col-
laborators, this volume will be very useful to both researchers and students
interested in the latest advances in pattern recognition and its applications in
KDD.
I congratulate the authors of this volume and I am pleased to recommend
it as a valuable addition to the books in this field.

February 2004 Laveen N. Kanal


University of Maryland
College Park, MD, USA

Preface

In recent years, government agencies and scientific, business and commercial
organizations are routinely using computers not just for computational pur-
poses but also for storage, in massive databases, of the immense volumes of
data that they routinely generate or require from other sources. We are in the
midst of an information explosion, and there is an urgent need for method-
ologies that will help us bring some semblance of order into the phenomenal
volumes of data. Traditional statistical data summarization and database
management techniques are just not adequate for handling data on this scale,
and for extracting intelligently information or knowledge that may be useful
for exploring the domain in question or the phenomena responsible for the data
and providing support to decision-making processes. This quest has thrown
up some new phrases, for example, data mining and knowledge discovery in
databases (KDD).
Data mining deals with the process of identifying valid, novel, potentially
useful, and ultimately understandable patterns in data. It may be viewed as
applying pattern recognition (PR) and machine learning principles in the con-
text of voluminous, possibly heterogeneous data sets. Two major challenges
in applying PR algorithms to data mining problems are those of “scalability”
to large/huge data sets and of “discovering knowledge” which is valid and
comprehensible to humans. Research is going on in these lines for developing
efficient PR methodologies and algorithms, in different classical and modern
computing frameworks, as applicable to various data mining tasks with real
life applications.
The present book is aimed at providing a treatise in a unified framework,
with both theoretical and experimental results, addressing certain pattern
recognition tasks essential for data mining. Tasks considered include data
condensation, feature selection, case generation, clustering/classification, rule
generation and rule evaluation. Various theories, methodologies and algo-
rithms using both a classical approach and hybrid paradigm (e.g., integrating
fuzzy logic, artificial neural networks, rough sets, genetic algorithms) have
been presented. The emphasis is on (a) handling data sets that are
large (both in size and dimension) and involve classes that are overlapping,
intractable and/or have nonlinear boundaries, and (b) demonstrating the sig-
nificance of granular computing in soft computing frameworks for generating
linguistic rules and dealing with the knowledge discovery aspect, besides re-
ducing the computation time.
It is shown how several novel strategies based on multi-scale data con-
densation, dimensionality reduction, active support vector learning, granular
computing and efficient search heuristics can be employed for dealing with
the issue of scaling up in large scale learning problem. The tasks of encoding,
extraction and evaluation of knowledge in the form of human comprehensible
linguistic rules are addressed in a soft computing framework by different in-
tegrations of its constituting tools. Various real life data sets, mainly large in
dimension and/or size, taken from varied domains, e.g., geographical informa-
tion systems, remote sensing imagery, population census, speech recognition
and cancer management, are considered to demonstrate the superiority of
these methodologies with statistical significance.
Examples are provided, wherever necessary, to make the concepts more
clear. A comprehensive bibliography on the subject is appended. Major
portions of the text presented in the book are from the published work of the
authors. Some references in the related areas might have been inadvertently
omitted because of oversight or ignorance.
This volume, which is unique in its character, will be useful to graduate
students and researchers in computer science, electrical engineering, system
science, and information technology both as a text and a reference book for
some parts of the curriculum. The researchers and practitioners in industry
and research and development laboratories working in fields such as system
design, pattern recognition, data mining, image processing, machine learning
and soft computing will also benefit. For convenience, brief descriptions of
the data sets used in the experiments are provided in the Appendix.
The text is organized in eight chapters. Chapter 1 describes briefly ba-
sic concepts, features and techniques of PR and introduces data mining and
knowledge discovery in light of PR, different research issues and challenges,
the problems of scaling of PR algorithms to large data sets, and the signifi-
cance of soft computing in knowledge discovery.
Chapters 2 and 3 deal with the (pre-processing) tasks of multi-scale data
condensation and unsupervised feature selection or dimensionality reduction.
After providing a review in the respective fields, a methodology based on a
statistical approach is described in detail in each chapter along with experi-
mental results. The method of k-NN density estimation and the concept of
representation entropy, used therein, are explained in their respective chap-
ters. The data condensation strategy preserves the salient characteristics of
the original data at different scales by representing the underlying probability
density. The unsupervised feature selection algorithm is based on computing
the similarity between features and then removing the redundancy therein
without requiring any search. These methods are scalable.
Chapter 4 concerns the problem of learning with support vector machine
(SVM). After describing the design procedure of SVM, two active learning
strategies for handling the large quadratic problem in a SVM framework are
presented. In order to reduce the sample complexity, a statistical query model
is employed incorporating a trade-off between the efficiency and robustness in
performance.

Chapters 5 to 8 highlight the significance of granular computing for dif-
ferent mining tasks in a soft paradigm. While the rough-fuzzy framework is
used for case generation in Chapter 5, the same is integrated with expectation
maximization algorithm and minimal spanning trees in Chapter 6 for cluster-
ing large data sets. The role of rough sets is to use information granules for
extracting the domain knowledge which is encoded in different ways. Since
computation is made using the granules (clump of objects), not the individual
points, the methods are fast. The cluster quality, envisaged on a multi-spectral
image segmentation problem, is also improved owing to the said integration.
In Chapter 7, design procedure of a rough self-organizing map (RSOM) is
described for clustering and unsupervised linguistic rule generation with a
structured network.
The problems of classification, and rule generation and evaluation in a su-
pervised mode are addressed in Chapter 8 with a modular approach through a
synergistic integration of four soft computing tools, namely, fuzzy sets, rough
sets, neural nets and genetic algorithms. A modular evolutionary rough-fuzzy
multi-layered perceptron is described which results in accelerated training,
compact network, unambiguous linguistic rules and improved accuracy. Dif-
ferent rule evaluation indices are used to reflect the knowledge discovery as-
pect.
Finally, we take this opportunity to thank Mr. Robert B. Stern of Chapman
& Hall/CRC Press, Florida, for his initiative and encouragement. Financial
support to Dr. Pabitra Mitra from the Council of Scientific and Industrial
Research (CSIR), New Delhi in the form of Research Associateship (through
Grant # 22/346/02-EMR II) is also gratefully acknowledged.

Sankar K. Pal
September 13, 2003 Pabitra Mitra

List of Tables

2.1 Comparison of k-NN density estimation error of condensation
algorithms (lower CR) . . . . . . . . . . . . . . . . . . . . . . 43
2.2 Comparison of k-NN density estimation error of condensation
algorithms (higher CR) . . . . . . . . . . . . . . . . . . . . . 44
2.3 Comparison of kernel (Gaussian) density estimation error of
condensation algorithms (lower CR, same condensed set as Ta-
ble 2.1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.4 Comparison of kernel (Gaussian) density estimation error of
condensation algorithms (higher CR, same condensed set as
Table 2.2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.5 Classification performance for Forest cover type data . . . . . 58
2.6 β value and CPU time of different clustering methods . . . . 58
2.7 Rule generation performance for the Census data . . . . . . . 58

3.1 Comparison of feature selection algorithms for large dimen-
sional data sets . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.2 Comparison of feature selection algorithms for medium dimen-
sional data sets . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.3 Comparison of feature selection algorithms for low dimensional
data sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.4 Comparison of feature selection algorithms for large data sets
when search algorithms use FFEI as the selection criterion . . 79
3.5 Representation entropy H_R^s of subsets selected using some al-
gorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.6 Redundancy reduction using different feature similarity mea-
sures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

4.1 Comparison of performance of SVM design algorithms . . . . 96

5.1 Hiring: An example of a decision table . . . . . . . . . . . . . 107


5.2 Two decision tables obtained by splitting the Hiring table S
(Table 5.1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.3 Discernibility matrix MAccept for the split Hiring decision table
SAccept (Table 5.2(a)) . . . . . . . . . . . . . . . . . . . . . . 112
5.4 Rough dependency rules for the Iris data . . . . . . . . . . . 121
5.5 Cases generated for the Iris data . . . . . . . . . . . . . . . . 121
5.6 Comparison of case selection algorithms for Iris data . . . . . 121


5.7 Comparison of case selection algorithms for Forest cover type
data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.8 Comparison of case selection algorithms for Multiple features
data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

6.1 Comparative performance of clustering algorithms . . . . . . 139


6.2 Comparative performance of different clustering methods for
the Calcutta image . . . . . . . . . . . . . . . . . . . . . . . . 144
6.3 Comparative performance of different clustering methods for
the Bombay image . . . . . . . . . . . . . . . . . . . . . . . . 146

7.1 Comparison of RSOM with randomly and linearly initialized
SOM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
7.2 Comparison of rules extracted from RSOM and FSOM . . . . 162

8.1 Rough set dependency rules for Vowel data along with the input
fuzzification parameter values . . . . . . . . . . . . . . . . . . 191
8.2 Comparative performance of different models . . . . . . . . . 193
8.3 Comparison of the performance of the rules extracted by vari-
ous methods for Vowel, Pat and Hepatobiliary data . . . . . . 195
8.4 Rules extracted from trained networks (Model S) for Vowel
data along with the input fuzzification parameter values . . . 196
8.5 Rules extracted from trained networks (Model S) for Pat data
along with the input fuzzification parameter values . . . . . . 196
8.6 Rules extracted from trained networks (Model S) for Hepato-
biliary data along with the input fuzzification parameter values 197
8.7 Crude rules obtained via rough set theory for staging of cervical
cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
8.8 Rules extracted from the modular rough MLP for staging of
cervical cancer . . . . . . . . . . . . . . . . . . . . . . . . . . 199

List of Figures

1.1 The KDD process [189]. . . . . . . . . . . . . . . . . . . . . . 9


1.2 Application areas of data mining. . . . . . . . . . . . . . . . . 13

2.1 Multiresolution data reduction. . . . . . . . . . . . . . . . . . 31


2.2 Representation of data set at different levels of detail by the
condensed sets. ‘.’ is a point belonging to the condensed set;
the circles about the points denote the discs covered that point.
The two bold circles denote the boundaries of the data set. . 36
2.3 Plot of the condensed points (of the Norm data) for the mul-
tiscale algorithm and Astrahan’s method, for different sizes of
the condensed set. Bold dots represent a selected point and the
discs represent the area of F1 − F2 plane covered by a selected
point at their center. . . . . . . . . . . . . . . . . . . . . . . . 46
2.4 IRS images of Calcutta: (a) original Band 4 image, and seg-
mented images using (b) k-means algorithm, (c) Astrahan’s
method, (d) multiscale algorithm. . . . . . . . . . . . . . . . . 50
2.5 Variation in error in density estimate (log-likelihood measure)
with the size of the Condensed Set (expressed as percentage of
the original set) with the corresponding, for (a) the Norm data,
(b) Vowel data, (c) Wisconsin Cancer data. . . . . . . . . . . 53
2.6 Variation of condensation ratio CR (%) with k. . . . . . . . . 54

3.1 Nature of errors in linear regression, (a) Least square fit (e),
(b) Least square projection fit (λ2 ). . . . . . . . . . . . . . . . 68
3.2 Feature clusters. . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.3 Variation in classification accuracy with size of the reduced
subset for (a) Multiple features, (b) Ionosphere, and (c) Cancer
data sets. The vertical dotted line marks the point for which
results are reported in Tables 3.1−3.3. . . . . . . . . . . . . . 78
3.4 Variation in size of the reduced subset with parameter k for (a)
multiple features, (b) ionosphere, and (c) cancer data. . . . . 81

4.1 SVM as maximum margin classifier (linearly separable case). 87


4.2 Incremental support vector learning with multiple points (Al-
gorithm 1). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3 Active support vector learning with statistical queries (Algo-
rithm 2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93


4.4 Variation of atest with CPU time for (a) cancer, (b) ionosphere,
(c) heart, (d) twonorm, and (e) forest cover type data. . . . . 98
4.5 Variation of confidence factor c and distance D for (a) cancer,
(b) ionosphere, (c) heart, and (d) twonorm data. . . . . . . . 99
4.6 Variation of confidence factor c with iterations of StatQSVM
algorithm for (a) cancer, (b) ionosphere, (c) heart, and (d)
twonorm data. . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.7 Margin distribution obtained at each iteration by the StatQSVM
algorithm for the Twonorm data. The bold line denotes the fi-
nal distribution obtained. . . . . . . . . . . . . . . . . . . . . 101
4.8 Margin distribution obtained by some SVM design algorithms
for the Twonorm data set. . . . . . . . . . . . . . . . . . . . . 102

5.1 Rough representation of a set with upper and lower approxi-
mations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.2 π−Membership functions for linguistic fuzzy sets low (L), medium
(M) and high (H) for each feature axis. . . . . . . . . . . . . . 114
5.3 Generation of crisp granules from linguistic (fuzzy) represen-
tation of the features F1 and F2 . Dark region (M1 , M2 ) indi-
cates a crisp granule obtained by 0.5-cuts on the µ1medium and
µ2medium functions. . . . . . . . . . . . . . . . . . . . . . . . . 115
5.4 Schematic diagram of rough-fuzzy case generation. . . . . . . 116
5.5 Rough-fuzzy case generation for a two-dimensional data. . . . 119

6.1 Rough-fuzzy generation of crude clusters for two-dimensional
data (a) data distribution and rough set rules, (b) probability
density function for the initial mixture model. . . . . . . . . . 133
6.2 Using minimal spanning tree to form clusters. . . . . . . . . . 134
6.3 Scatter plot of the artificial data Pat. . . . . . . . . . . . . . . 137
6.4 Scatter plot of points belonging to four different component
Gaussians for the Pat data. Each Gaussian is represented by a
separate symbol (+, o, , and ). . . . . . . . . . . . . . . . . 138
6.5 Variation of log-likelihood with EM iterations for the Pat data. 138
6.6 Final clusters obtained using (a) hybrid algorithm (b)
k-means algorithm for the Pat data (clusters are marked by
‘+’ and ‘o’). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.7 Block diagram of the image segmentation algorithm. . . . . . 142
6.8 Segmented IRS image of Calcutta using (a) CEMMiSTRI, (b)
EM with MST (EMMST), (c) fuzzy k-means algorithm (FKM),
(d) rough set initialized EM (REM), (e) EM with k-means ini-
tialization (KMEM), (f) rough set initialized k-means (RKM),
(g) EM with random initialization (EM), (h) k-means with ran-
dom initialization (KM). . . . . . . . . . . . . . . . . . . . . . 145
6.9 Segmented IRS image of Bombay using (a) CEMMiSTRI, (b)
k-means with random initialization (KM). . . . . . . . . . . . 146


6.10 Zoomed images of a bridge on the river Ganges in Calcutta


for (a) CEMMiSTRI, (b) k-means with random initialization
(KM). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
6.11 Zoomed images of two parallel airstrips of Calcutta airport
for (a) CEMMiSTRI, (b) k-means with random initialization
(KM). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

7.1 The basic network structure for the Kohonen feature map. . . 151
7.2 Neighborhood Nc , centered on unit c (xc , yc ). Three different
neighborhoods are shown at distance d = 1, 2, and 3. . . . . . 153
7.3 Mapping of reducts in the competitive layer of RSOM. . . . . 154
7.4 Variation of quantization error with iteration for Pat data. . . 160
7.5 Variation of quantization error with iteration for vowel data. 160
7.6 Plot showing the frequency of winning nodes using random
weights for the Pat data. . . . . . . . . . . . . . . . . . . . . . 161
7.7 Plot showing the frequency of winning nodes using rough set
knowledge for the Pat data. . . . . . . . . . . . . . . . . . . . 161

8.1 Illustration of adaptive thresholding of membership functions. 177


8.2 Intra- and inter-module links. . . . . . . . . . . . . . . . . . . 179
8.3 Steps for designing a sample modular rough-fuzzy MLP. . . . 181
8.4 Chromosome representation. . . . . . . . . . . . . . . . . . . . 182
8.5 Variation of mutation probability with iteration. . . . . . . . 183
8.6 Variation of mutation probability along the encoded string (chro-
mosome). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
8.7 (a) Input π-functions and (b) data distribution along F1 axis
for the Vowel data. Solid lines represent the initial functions
and dashed lines represent the functions obtained finally after
tuning with GAs. The horizontal dotted lines represent the
threshold level. . . . . . . . . . . . . . . . . . . . . . . . . . . 191
8.8 Histogram plot of the distribution of weight values with (a)
Model S and (b) Model F for Vowel data. . . . . . . . . . . . 194
8.9 Positive connectivity of the network obtained for the Vowel
data, using Model S. (Bold lines indicate weights greater than
P T hres2 , while others indicate values between P T hres1 and
P T hres2 .) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

Chapter 1
Introduction

1.1 Introduction
Pattern recognition (PR) is an activity that we humans normally excel
in. We do it almost all the time, and without conscious effort. We receive
information via our various sensory organs, which is processed instantaneously
by our brain so that, almost immediately, we are able to identify the source
of the information, without having made any perceptible effort. What is
even more impressive is the accuracy with which we can perform recognition
tasks even under non-ideal conditions, for instance, when the information that
needs to be processed is vague, imprecise or even incomplete. In fact, most
of our day-to-day activities are based on our success in performing various
pattern recognition tasks. For example, when we read a book, we recognize the
letters, words and, ultimately, concepts and notions, from the visual signals
received by our brain, which processes them speedily and probably does a
neurobiological implementation of template-matching! [189]
The discipline of pattern recognition (or pattern recognition by machine)
essentially deals with the problem of developing algorithms and methodolo-
gies/devices that can enable the computer-implementation of many of the
recognition tasks that humans normally perform. The motivation is to per-
form these tasks more accurately, or faster, and perhaps more economically
than humans and, in many cases, to release them from drudgery resulting from
performing routine recognition tasks repetitively and mechanically. The scope
of PR also encompasses tasks humans are not good at, such as reading bar
codes. The goal of pattern recognition research is to devise ways and means of
automating certain decision-making processes that lead to classification and
recognition.
Machine recognition of patterns can be viewed as a two-fold task, consisting
of learning the invariant and common properties of a set of samples charac-
terizing a class, and of deciding that a new sample is a possible member of
the class by noting that it has properties common to those of the set of sam-
ples. The task of pattern recognition by a computer can be described as a
transformation from the measurement space M to the feature space F and
finally to the decision space D; i.e.,
M → F → D.


Here the mapping δ : F → D is the decision function, and the elements
d ∈ D are termed as decisions.
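
As an illustrative sketch, the composition of these mappings can be written out in code; the particular feature map and decision rule below are arbitrary placeholders, assumed only to make the M → F → D pipeline concrete:

```python
import numpy as np

# A minimal sketch of the M -> F -> D transformation described above.
# The concrete feature map and decision rule are illustrative
# assumptions, not methods prescribed by the book.

def feature_map(measurement):
    """Map a raw measurement vector in M to a vector in feature space F.
    Here: standardize the measurement and keep its first two components."""
    z = (measurement - measurement.mean()) / (measurement.std() + 1e-12)
    return z[:2]

def delta(feature, prototypes):
    """The decision function delta: F -> D, realized here as a
    nearest-prototype rule over a dictionary of class prototypes."""
    return min(prototypes,
               key=lambda c: np.linalg.norm(feature - prototypes[c]))

# Hypothetical two-class example with hand-picked prototypes in F.
prototypes = {"class_1": np.array([-0.5, -0.5]),
              "class_2": np.array([0.5, 0.5])}
m = np.array([4.2, 1.0, 3.3, 0.7])   # a point in measurement space M
print(delta(feature_map(m), prototypes))  # -> "class_2"
```
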
PR has been a thriving field of research for the past few decades, as is amply
borne out by the numerous books [55, 59, 72, 200, 204, 206] devoted to it.
In this regard, mention must be made of the seminal article by Kanal [104],
which gives a comprehensive review of the advances made in the field until
the early 1970s. More recently, a review article by Jain et al. [101] provides
an engrossing survey of the advances made in statistical pattern recognition
till the end of the twentieth century. Though the subject has attained a
very mature level during the past four decades or so, it remains green to the
researchers due to continuous cross-fertilization of ideas from disciplines such
as computer science, physics, neurobiology, psychology, engineering, statistics,
mathematics and cognitive science. Depending on the practical need and
demand, various modern methodologies have come into being, which often
supplement the classical techniques [189].
In recent years, the rapid advances made in computer technology have en-
sured that large sections of the world population have been able to gain easy
access to computers on account of falling costs worldwide, and their use is now
commonplace in all walks of life. Government agencies and scientific, busi-
ness and commercial organizations are routinely using computers, not just
for computational purposes but also for storage, in massive databases, of the
immense volumes of data that they routinely generate or require from other
sources. Large-scale computer networking has ensured that such data has
become accessible to more and more people. In other words, we are in the
midst of an information explosion, and there is an urgent need for methodologies
that will help us bring some semblance of order into the phenomenal volumes
of data that can readily be accessed by us with a few clicks of the keys of our
computer keyboard. Traditional statistical data summarization and database
management techniques are just not adequate for handling data on this scale
and for intelligently extracting information, or rather, knowledge that may
be useful for exploring the domain in question or the phenomena responsible
for the data, and providing support to decision-making processes. This quest
has thrown up some new phrases, for example, data mining and knowledge
discovery in databases (KDD) [43, 65, 66, 88, 89, 92].
The massive databases that we are talking about are generally character-
ized by the presence of not just numeric, but also textual, symbolic, pictorial
and aural data. They may contain redundancy, errors, imprecision, and so on.
KDD is aimed at discovering natural structures within such massive and often
heterogeneous data. Therefore PR plays a significant role in the KDD process.
However, KDD is visualized as being capable not only of knowledge discovery
using generalizations and magnifications of existing and new pattern recogni-
tion algorithms, but also of the adaptation of these algorithms to enable them
to process such data, the storage and accessing of the data, its preprocessing
and cleaning, interpretation, visualization and application of the results, and
the modeling and support of the overall human-machine interaction.


Data mining is that part of knowledge discovery which deals with the pro-
cess of identifying valid, novel, potentially useful, and ultimately understand-
able patterns in data, and excludes the knowledge interpretation part of KDD.
Therefore, as it stands now, data mining can be viewed as applying PR and
machine learning principles in the context of voluminous, possibly heteroge-
neous data sets [189].
The objective of this book is to provide some results of investigations,
both theoretical and experimental, addressing certain pattern recognition
tasks essential for data mining. Tasks considered include data condensation,
feature selection, case generation, clustering, classification and rule genera-
tion/evaluation. Various methodologies based on both classical and soft com-
puting approaches (integrating fuzzy logic, artificial neural networks, rough
sets, genetic algorithms) have been presented. The emphasis of these method-
ologies is on (a) handling data sets that are large (both in size and
dimension) and involve classes that are overlapping, intractable and/or have
nonlinear boundaries, and (b) demonstrating the significance of granular com-
puting in the soft computing paradigm for generating linguistic rules and dealing
with the knowledge discovery aspect. Before we describe the scope of the
book, we provide a brief review of pattern recognition, knowledge discovery
in databases, data mining, challenges in the application of pattern recognition
algorithms to data mining problems, and some of the possible solutions.
Section 1.2 briefly describes the basic concepts, features and tech-
niques of pattern recognition. Next, we define the KDD process and
describe its various components. In Section 1.4 we elaborate upon the data
mining aspects of KDD, discussing its components, tasks involved, approaches
and application areas. The pattern recognition perspective of data mining is
introduced next and related research challenges are mentioned. The problem
of scaling pattern recognition algorithms to large data sets is discussed in Sec-
tion 1.6. Some broad approaches to achieving scalability are listed. The role
of soft computing in knowledge discovery is described in Section 1.7. Finally,
Section 1.8 discusses the plan of the book.

1.2 Pattern Recognition in Brief


A typical pattern recognition system consists of three phases, namely, data
acquisition, feature selection/extraction and classification/clustering. In the
data acquisition phase, depending on the environment within which the ob-
jects are to be classified/clustered, data are gathered using a set of sensors.
These are then passed on to the feature selection/extraction phase, where
the dimensionality of the data is reduced by retaining/measuring only some
characteristic features or properties. In a broader perspective, this stage
significantly influences the entire recognition process. Finally, in the classi-
fication/clustering phase, the selected/extracted features are passed on to
the classifying/clustering system that evaluates the incoming information and
makes a final decision. This phase basically establishes a transformation be-
tween the features and the classes/clusters. Different forms of transformation
can be a Bayesian rule for computing a posteriori class probabilities, the nearest
neighbor rule, linear discriminant functions, perceptron rule, nearest proto-
type rule, etc. [55, 59].

1.2.1 Data acquisition


Pattern recognition techniques are applicable in a wide domain, where the
data may be qualitative, quantitative, or both; they may be numerical, linguis-
tic, pictorial, or any combination thereof. The collection of data constitutes
the data acquisition phase. Generally, the data structures that are used in
pattern recognition systems are of two types: object data vectors and relational
data. Object data, a set of numerical vectors, are represented in the sequel
as Y = {y1, y2, . . . , yn}, a set of n feature vectors in the p-dimensional mea-
surement space ΩY. The sth object, s = 1, 2, . . . , n, observed in the process
has vector ys as its numerical representation; ysi is the ith (i = 1, 2, . . . , p)
feature value associated with the sth object. Relational data is a set of n^2
numerical relationships, say {rsq}, between pairs of objects. In other words,
rsq represents the extent to which the sth and qth objects are related in the sense
of some binary relationship ρ. If the objects that are pairwise related by ρ
are denoted O = {o1, o2, . . . , on}, then ρ : O × O → IR.
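To make the two data structures concrete, the following minimal sketch (our own illustration in Python with NumPy; all names are ours) builds a small object-data set Y and derives relational data {rsq} from it by taking the binary relation ρ to be Euclidean distance:

    import numpy as np

    # Object data: n = 4 feature vectors in a p = 3 dimensional measurement space.
    Y = np.array([[5.1, 3.5, 1.4],
                  [4.9, 3.0, 1.4],
                  [6.7, 3.1, 4.4],
                  [6.3, 2.5, 5.0]])
    n, p = Y.shape

    # Relational data: n^2 numerical relationships r_sq between pairs of objects.
    # Here the binary relation rho is Euclidean distance, so r_sq measures how
    # strongly the s-th and q-th objects are related (smaller = more related).
    R = np.zeros((n, n))
    for s in range(n):
        for q in range(n):
            R[s, q] = np.linalg.norm(Y[s] - Y[q])

    print(R.round(2))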

1.2.2 Feature selection/extraction


Feature selection/extraction is a process of selecting a map of the form
X = f(Y), by which a sample y (= [y1, y2, . . . , yp]) in a p-dimensional mea-
surement space ΩY is transformed into a point x (= [x1, x2, . . . , xp′]) in a p′-
dimensional feature space ΩX, where p′ < p.
[55] is to retain/generate the optimum salient characteristics necessary for
the recognition process and to reduce the dimensionality of the measurement
space ΩY so that effective and easily computable algorithms can be devised
for efficient classification. The problem of feature selection/extraction has
two aspects – formulation of a suitable criterion to evaluate the goodness
of a feature set and searching the optimal set in terms of the criterion. In
general, those features are considered to have optimal saliencies for which
interclass/intraclass distances are maximized/minimized. The criterion of a
good feature is that it should be invariant to any possible variation
within a class, while emphasizing differences that are important in discrimi-
nating between patterns of different types.
The major mathematical measures so far devised for the estimation of fea-
ture quality are mostly statistical in nature, and can be broadly classified into
two categories – feature selection in the measurement space and feature selec-
tion in a transformed space. The techniques in the first category generally
reduce the dimensionality of the measurement space by discarding redundant
or least information carrying features. On the other hand, those in the sec-
ond category utilize all the information contained in the measurement space
to obtain a new transformed space, thereby mapping a higher dimensional
pattern to a lower dimensional one. This is referred to as feature extraction.
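A minimal sketch of the first category, assuming a two-class problem (this is our own illustrative filter criterion in the spirit of the interclass/intraclass idea above, not any specific published algorithm):

    import numpy as np

    def interclass_intraclass_score(X, y):
        """Score each feature by (distance between the two class means) /
        (pooled within-class spread); higher means better separation.
        Assumes exactly two classes, for simplicity."""
        classes = np.unique(y)
        means = np.array([X[y == c].mean(axis=0) for c in classes])
        stds = np.array([X[y == c].std(axis=0) for c in classes])
        between = np.abs(means[0] - means[1])   # interclass distance
        within = stds.mean(axis=0) + 1e-12      # intraclass spread
        return between / within

    rng = np.random.default_rng(0)
    # Two classes in 5 dimensions; only features 0 and 2 are discriminative.
    X0 = rng.normal(0, 1, (100, 5)); X0[:, 0] += 3; X0[:, 2] += 2
    X1 = rng.normal(0, 1, (100, 5))
    X = np.vstack([X0, X1])
    y = np.array([0] * 100 + [1] * 100)

    scores = interclass_intraclass_score(X, y)
    selected = np.argsort(scores)[::-1][:2]   # retain the two best features
    print("scores:", scores.round(2), "selected features:", selected)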

1.2.3 Classification
The problem of classification is basically one of partitioning the feature
space into regions, one region for each category of input. Thus it attempts to
assign every data point in the entire feature space to one of the possible classes
(say, M). In real life, the complete description of the classes is not known.
We have instead a finite and usually smaller number of samples, which often
provide only partial information for the optimal design of the feature selector/extractor
or the classifying/clustering system. Under such circumstances, it is assumed that
these samples are representative of the classes. Such a set of typical patterns
is called a training set. On the basis of the information gathered from the
samples in the training set, the pattern recognition systems are designed; i.e.,
we decide the values of the parameters of various pattern recognition methods.
Design of a classification or clustering scheme can be made with labeled or
unlabeled data. When the computer is given a set of objects with known
classifications (i.e., labels) and is asked to classify an unknown object based
on the information acquired by it during training, we call the design scheme
supervised learning; otherwise we call it unsupervised learning. Supervised
learning is used for classifying different objects, while clustering is performed
through unsupervised learning.
Pattern classification, by its nature, admits many approaches, sometimes
complementary, sometimes competing, to provide a solution to a given problem.
These include decision theoretic approach (both deterministic and probabilis-
tic), syntactic approach, connectionist approach, fuzzy and rough set theoretic
approach and hybrid or soft computing approach.
In the decision theoretic approach, once a pattern is transformed, through
feature evaluation, to a vector in the feature space, its characteristics are ex-
pressed only by a set of numerical values. Classification can be done by using
deterministic or probabilistic techniques [55, 59]. In the deterministic classifi-
cation approach, it is assumed that there exists only one unambiguous pattern
class corresponding to each of the unknown pattern vectors. The nearest neighbor
classifier (NN rule) [59] is an example of this category.
In most of the practical problems, the features are usually noisy and the
classes in the feature space are overlapping. In order to model such systems,
the features x1 , x2 , . . . , xi , . . . , xp are considered as random variables in the
probabilistic approach. The most commonly used classifier in such probabilis-
tic systems is the Bayes maximum likelihood classifier [59].
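The two families can be illustrated side by side. The sketch below (purely illustrative; all names are ours) implements the deterministic 1-NN rule and a Bayes maximum likelihood decision under Gaussian class-conditional densities estimated from data:

    import numpy as np

    def nn_classify(x, X_train, y_train):
        """Deterministic approach: assign x the label of its nearest neighbor."""
        d = np.linalg.norm(X_train - x, axis=1)
        return y_train[np.argmin(d)]

    def bayes_ml_classify(x, means, covs, priors):
        """Probabilistic approach: pick the class maximizing p(x|c) P(c)
        under Gaussian class-conditional densities."""
        scores = []
        for m, S, P in zip(means, covs, priors):
            diff = x - m
            logp = (-0.5 * diff @ np.linalg.inv(S) @ diff
                    - 0.5 * np.log(np.linalg.det(S)) + np.log(P))
            scores.append(logp)
        return int(np.argmax(scores))

    rng = np.random.default_rng(1)
    X0 = rng.normal([0, 0], 1.0, (50, 2))
    X1 = rng.normal([3, 3], 1.0, (50, 2))
    X, y = np.vstack([X0, X1]), np.array([0] * 50 + [1] * 50)

    x = np.array([2.5, 2.0])
    print("1-NN:", nn_classify(x, X, y))
    print("Bayes ML:", bayes_ml_classify(
        x,
        means=[X0.mean(0), X1.mean(0)],
        covs=[np.cov(X0.T), np.cov(X1.T)],
        priors=[0.5, 0.5]))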
When a pattern is rich in structural information (e.g., picture recognition,
character recognition, scene analysis), i.e., the structural information plays an
important role in describing and recognizing the patterns, it is convenient to
use syntactic approaches [72] which deal with the representation of structures
via sentences, grammars and automata. In the syntactic method [72], the
ability to select and classify the simple pattern primitives and their
relationships, represented by the composition operations, is the vital criterion
for making a system effective.
into patterns are usually governed by the formal language theory, the approach
is often referred to as a linguistic approach. An introduction to a variety of
approaches based on this idea can be found in [72].
A good pattern recognition system should possess several characteristics.
These are on-line adaptation (to cope with the changes in the environment),
handling nonlinear class separability (to tackle real life problems), handling
of overlapping classes/clusters (for discriminating almost similar but different
objects), real-time processing (for making a decision in a reasonable time),
generation of soft and hard decisions (to make the system flexible), verification
and validation mechanisms (for evaluating its performance), and minimizing
the number of parameters in the system that have to be tuned (for reducing
the cost and complexity). Moreover, the system should be made artificially
intelligent in order to emulate some aspects of the human processing system.
Connectionist approaches (or artificial neural network based approaches) to
pattern recognition are attempts to achieve these goals and have drawn the
attention of researchers because of their major characteristics such as adap-
tivity, robustness/ruggedness, speed and optimality.
All these approaches to pattern recognition can again be fuzzy set theo-
retic [24, 105, 200, 285] in order to handle uncertainties, arising from vague,
incomplete, linguistic, overlapping patterns, etc., at various stages of pattern
recognition systems. Fuzzy set theoretic classification approach is developed
based on the realization that a pattern may belong to more than one class,
with varying degrees of class membership. Accordingly, fuzzy decision theo-
retic, fuzzy syntactic, fuzzy neural approaches are developed [24, 34, 200, 204].
More recently, the theory of rough sets [209, 214, 215, 261] has emerged as
another major mathematical approach for managing uncertainty that arises
from inexact, noisy, or incomplete information. It is turning out to be method-
ologically significant to the domains of artificial intelligence and cognitive sci-
ences, especially in the representation of and reasoning with vague and/or
imprecise knowledge, data classification, data analysis, machine learning, and
knowledge discovery [227, 261].
Investigations have also been made in the area of pattern recognition using
genetic algorithms [211]. Like neural networks, genetic algorithms (GAs) [80]
are also based on powerful metaphors from the natural world. They mimic
some of the processes observed in natural evolution, which include cross-over,
selection and mutation, leading to a stepwise optimization of organisms.
There have been several attempts over the last decade to evolve new ap-
proaches to pattern recognition and deriving their hybrids by judiciously
combining the merits of several techniques [190, 204]. Recently, a consolidated
effort is being made in this regard to integrate mainly fuzzy logic, artificial
neural networks, genetic algorithms and rough set theory, for developing an
efficient new paradigm called soft computing [287]. Here integration is done in
a cooperative, rather than a competitive, manner. The result is a more intel-
ligent and robust system providing a human-interpretable, low cost, approxi-
mate solution, as compared to traditional techniques. Neuro-fuzzy approach
is perhaps the most visible hybrid paradigm [197, 204, 287] in soft computing
framework. Rough-fuzzy [209, 265] and neuro-rough [264, 207] hybridizations
are also proving to be fruitful frameworks for modeling human perceptions
and providing means for computing with words. Significance of the recently
proposed computational theory of perceptions (CTP) [191, 289] may also be
mentioned in this regard.

1.3 Knowledge Discovery in Databases (KDD)


Knowledge discovery in databases (KDD) is defined as [65]:
The nontrivial process of identifying valid, novel, potentially use-
ful, and ultimately understandable patterns in data.
In this definition, the term pattern goes beyond its traditional sense to in-
clude models or structure in data. Data is a set of facts F (e.g., cases in a
database), and a pattern is an expression E in a language L describing the
facts in a subset FE (or a model applicable to that subset) of F . E is called a
pattern if it is simpler than the enumeration of all facts in FE . A measure of
certainty, measuring the validity of discovered patterns, is a function C map-
ping expressions in L to a partially or totally ordered measure space MC . An
expression E in L about a subset FE ⊂ F can be assigned a certainty measure
c = C(E, F ). Novelty of patterns can be measured by a function N (E, F ) with
respect to changes in data or knowledge. Patterns should potentially lead to
some useful actions, as measured by some utility function u = U (E, F ) map-
ping expressions in L to a partially or totally ordered measure space MU . The
goal of KDD is to make patterns understandable to humans. This is measured
by a function s = S(E, F ) mapping expressions E in L to a partially or totally
ordered measure space MS .
Interestingness of a pattern combines validity, novelty, usefulness, and un-
derstandability and can be expressed as i = I(E, F, C, N, U, S) which maps
expressions in L to a measure space MI . A pattern E ∈ L is called knowledge
if for some user-specified threshold i ∈ MI, I(E, F, C, N, U, S) > i [65]. One
can select some thresholds c ∈ MC, s ∈ MS, and u ∈ MU and term a pattern
E knowledge
iff C(E, F) > c, and S(E, F) > s, and U(E, F) > u.        (1.1)
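Operationally, equation (1.1) is a conjunction of threshold tests over the discovered patterns. A toy illustration (the numeric measures below are stand-ins for C, S and U, not outputs of any real system):

    # A pattern E is reported as knowledge iff its certainty, understandability
    # and utility all exceed user-chosen thresholds, mirroring equation (1.1).
    patterns = [
        {"rule": "buys(bread) -> buys(butter)", "C": 0.92, "S": 0.80, "U": 0.60},
        {"rule": "buys(milk)  -> buys(nails)",  "C": 0.55, "S": 0.90, "U": 0.10},
    ]

    c, s, u = 0.8, 0.7, 0.5   # user-specified thresholds

    knowledge = [p for p in patterns
                 if p["C"] > c and p["S"] > s and p["U"] > u]
    for p in knowledge:
        print("knowledge:", p["rule"])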


The role of interestingness is to threshold the huge number of discovered
patterns and report only those that may be of some use. There are two ap-
proaches to designing a measure of interestingness of a pattern, viz., objective
and subjective. The former uses the structure of the pattern and is generally
used for computing rule interestingness. However, often it fails to capture all
the complexities of the pattern discovery process. The subjective approach,
on the other hand, depends additionally on the user who examines the pat-
tern. Two major reasons why a pattern is interesting from the subjective
(user-oriented) point of view are as follows [257]:

• Unexpectedness: when it is “surprising” to the user.

• Actionability: when the user can act on it to her/his advantage.

Although both these concepts are important, it has often been observed that
actionability and unexpectedness are correlated. In literature, unexpectedness
is often defined in terms of the dissimilarity of a discovered pattern from a
vocabulary provided by the user.
As an example, consider a database of student evaluations of different
courses offered at some university. This can be defined as EVALUATE (TERM,
YEAR, COURSE, SECTION, INSTRUCTOR, INSTRUCT RATING, COURSE RATING). We
describe two patterns that are interesting in terms of actionability and unex-
pectedness respectively. The pattern that “Professor X is consistently getting
the overall INSTRUCT RATING below the overall COURSE RATING” can be of in-
terest to the chairperson because this shows that Professor X has room for
improvement. If, on the other hand, in most of the course evaluations the
overall INSTRUCT RATING is higher than the COURSE RATING and it turns out
that in most of Professor X’s ratings overall the INSTRUCT RATING is lower
than the COURSE RATING, then such a pattern is unexpected and hence inter-
esting. ✸
Data mining is a step in the KDD process that consists of applying data
analysis and discovery algorithms which, under acceptable computational lim-
itations, produce a particular enumeration of patterns (or generate a model)
over the data. It uses historical information to discover regularities and im-
prove future decisions [161].
The overall KDD process is outlined in Figure 1.1. It is interactive and
iterative involving, more or less, the following steps [65, 66]:

1. Data cleaning and preprocessing: includes basic operations, such as noise
removal and handling of missing data. Data from real-world sources are
often erroneous, incomplete, and inconsistent, perhaps due to operation
error or system implementation flaws. Such low quality data needs to
be cleaned prior to data mining.

[Figure 1.1 here: a block diagram of the KDD process. Huge heterogeneous raw
data passes through data cleaning, data condensation, dimensionality reduction
and data wrapping to give preprocessed data; machine learning (classification,
clustering, rule generation) then builds a mathematical model of the data
(patterns); knowledge interpretation, extraction and evaluation finally yield
useful knowledge. The data mining (DM) stage covers preprocessing through
model building, while the full pipeline constitutes knowledge discovery in
databases (KDD).]

FIGURE 1.1: The KDD process [189].

2. Data condensation and projection: includes finding useful features and
samples to represent the data (depending on the goal of the task) and
using dimensionality reduction or transformation methods.

3. Data integration and wrapping: includes integrating multiple, heterogeneous
data sources and providing their descriptions (wrappings) for ease of future use.

4. Choosing the data mining function(s) and algorithm(s): includes deciding
the purpose (e.g., classification, regression, summarization, clustering,
discovering association rules and functional dependencies, or a combination
of these) of the model to be derived by the data mining algorithm and
selecting methods (e.g., neural networks, decision trees, statistical models,
fuzzy models) to be used for searching patterns in data.

5. Data mining: includes searching for patterns of interest in a particular
representational form or a set of such representations.

6. Interpretation and visualization: includes interpreting the discovered
patterns, as well as the possible visualization of the extracted patterns.
One can analyze the patterns automatically or semiautomatically to
identify the truly interesting/useful patterns for the user.

7. Using discovered knowledge: includes incorporating this knowledge into
the performance system, taking actions based on knowledge.

Thus, KDD refers to the overall process of turning low-level data into high-
level knowledge. Perhaps the most important step in the KDD process is data
mining. However, the other steps are also important for the successful appli-
cation of KDD in practice. For example, steps 1, 2 and 3, mentioned above,
have been the subject of widespread research in the area of data warehousing.
We now focus on the data mining component of KDD.

1.4 Data Mining


Data mining involves fitting models to or determining patterns from ob-
served data. The fitted models play the role of inferred knowledge. Deciding
whether the model reflects useful knowledge or not is a part of the overall
KDD process for which subjective human judgment is usually required. Typ-
ically, a data mining algorithm constitutes some combination of the following
three components [65].

• The model: The function of the model (e.g., classification, clustering)
and its representational form (e.g., linear discriminants, neural networks).
A model contains parameters that are to be determined from the data.

• The preference criterion: A basis for preference of one model or set
of parameters over another, depending on the given data. The criterion
is usually some form of goodness-of-fit function of the model to the
data, perhaps tempered by a smoothing term to avoid overfitting, or
generating a model with too many degrees of freedom to be constrained
by the given data.

• The search algorithm: The specification of an algorithm for finding
particular models and parameters, given the data, model(s), and a
preference criterion.

A particular data mining algorithm is usually an instantiation of the
model/preference/search components.
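A toy rendering of this decomposition (entirely our own illustration): the model is a one-dimensional threshold classifier with parameter θ, the preference criterion is training error, and the search is an exhaustive scan over candidate thresholds:

    import numpy as np

    rng = np.random.default_rng(5)
    x = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
    y = np.array([0] * 100 + [1] * 100)

    # Model: predict class 1 iff x > theta (parameter theta to be determined).
    def model(theta, x):
        return (x > theta).astype(int)

    # Preference criterion: training error (a simple goodness-of-fit measure).
    def error(theta):
        return (model(theta, x) != y).mean()

    # Search algorithm: exhaustive scan over candidate parameter values.
    candidates = np.linspace(x.min(), x.max(), 200)
    best = candidates[np.argmin([error(t) for t in candidates])]
    print("best threshold:", round(best, 2), "error:", round(error(best), 3))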

1.4.1 Data mining tasks


The more common model tasks/functions in current data mining practice
include:

1. Association rule discovery: describes association relationships among
different attributes. The origin of association rules is in market basket
analysis. A market basket is a collection of items purchased by a cus-
tomer in an individual customer transaction. One common analysis task
in a transaction database is to find sets of items, or itemsets, that fre-
quently appear together. Each pattern extracted through the analysis
consists of an itemset and its support, i.e., the number of transactions
that contain it. Businesses can use knowledge of these patterns to im-
prove placement of items in a store or for mail-order marketing. The
huge size of transaction databases and the exponential increase in the
number of potential frequent itemsets with increase in the number of at-
tributes (items) make the above problem a challenging one. The Apriori
algorithm [3] provided one early solution, which was improved by sub-
sequent algorithms using partitioning, hashing, sampling and dynamic
itemset counting. (A toy support-counting sketch is given at the end of
this subsection.)

2. Clustering: maps a data item into one of several clusters, where clusters
are natural groupings of data items based on similarity metrics or prob-
ability density models. Clustering is used in several exploratory data
analysis tasks, customer retention and management, and web mining.
The clustering problem has been studied in many fields, including statis-
tics, machine learning and pattern recognition. However, large data
considerations were absent in these approaches. Recently, several new
algorithms with greater emphasis on scalability have been developed, in-
cluding those based on summarized cluster representation called cluster
feature (Birch [291], ScaleKM [29]), sampling (CURE [84]) and density
joins (DBSCAN [61]).

3. Classification: classifies a data item into one of several predefined
categorical classes. It is used for the purpose of predictive data mining in
several fields, e.g., in scientific discovery, fraud detection, atmospheric
data mining and financial engineering. Several classification methodolo-
gies have already been discussed earlier in Section 1.2.3. Some typical
algorithms suitable for large databases are based on Bayesian techniques
(AutoClass [40]), and decision trees (Sprint [254], RainForest [75]).
4. Sequence analysis [85]: models sequential patterns, like time-series data
[130]. The goal is to model the process of generating the sequence or
to extract and report deviation and trends over time. The framework
is increasingly gaining importance because of its application in bioinfor-
matics and streaming data analysis.
5. Regression [65]: maps a data item to a real-valued prediction variable.
It is used in different prediction and modeling applications.
6. Summarization [65]: provides a compact description for a subset of data.
A simple example would be mean and standard deviation for all fields.
More sophisticated functions involve summary rules, multivariate visu-
alization techniques and functional relationship between variables. Sum-
marization functions are often used in interactive data analysis, auto-
mated report generation and text mining.

7. Dependency modeling [28, 86]: describes significant dependencies among
variables.

Some other tasks required in data mining applications are outlier/anomaly
detection, link analysis, optimization and planning.
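The support-counting sketch promised in task 1 above follows; it enumerates candidate itemsets of size one and two over a toy transaction database and reports those meeting a minimum support (this conveys only the counting idea, not the full Apriori algorithm with candidate pruning):

    from itertools import combinations

    transactions = [
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"beer", "bread"},
        {"butter", "milk"},
    ]
    min_support = 2          # minimum number of supporting transactions
    items = sorted(set().union(*transactions))

    # Support of an itemset = number of transactions containing it.
    for size in (1, 2):
        for itemset in combinations(items, size):
            support = sum(set(itemset) <= t for t in transactions)
            if support >= min_support:
                print(set(itemset), "support =", support)

And the cluster-feature sketch promised in task 2: a CF triple (N, LS, SS) of count, linear sum and sum of squared norms summarizes a cluster so that its centroid and radius remain computable and two summaries can be merged, without retaining the raw points. This is our own minimal illustration of the idea behind BIRCH-style CF vectors, not the BIRCH algorithm itself:

    import numpy as np

    class ClusterFeature:
        """CF triple (N, LS, SS): enough statistics to summarize a cluster
        without storing its points, as in BIRCH-style methods."""
        def __init__(self, dim):
            self.N = 0
            self.LS = np.zeros(dim)   # linear sum of points
            self.SS = 0.0             # sum of squared norms

        def add(self, x):
            self.N += 1
            self.LS += x
            self.SS += float(x @ x)

        def merge(self, other):
            self.N += other.N
            self.LS += other.LS
            self.SS += other.SS

        def centroid(self):
            return self.LS / self.N

        def radius(self):
            # Root mean squared distance of the points from the centroid.
            return np.sqrt(max(self.SS / self.N - (self.centroid() ** 2).sum(), 0.0))

    cf = ClusterFeature(dim=2)
    for x in np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]):
        cf.add(x)
    print("centroid:", cf.centroid(), "radius:", round(cf.radius(), 3))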

1.4.2 Data mining tools


A wide variety and number of data mining algorithms are described in the
literature – from the fields of statistics, pattern recognition, machine learning
and databases. They represent a long list of seemingly unrelated and often
highly specific algorithms. Some representative groups are mentioned below:

1. Statistical models (e.g., linear discriminants [59, 92])

2. Probabilistic graphical dependency models (e.g., Bayesian networks [102])

3. Decision trees and rules (e.g., CART [32])

4. Inductive logic programming based models (e.g., PROGOL [180] and
FOIL [233])

5. Example based methods (e.g., nearest neighbor [7], lazy learning [5] and
case based reasoning [122, 208] methods)

6. Neural network based models [44, 46, 148, 266]

7. Fuzzy set theoretic models [16, 23, 43, 217]

8. Rough set theory based models [137, 123, 227, 176]

9. Genetic algorithm based models [68, 106]

10. Hybrid and soft computing models [175]

The data mining algorithms determine both the flexibility of the model in
representing the data and the interpretability of the model in human terms.
Typically, the more complex models may fit the data better but may also
be more difficult to understand and to fit reliably. Also, each representation
suits some problems better than others. For example, decision tree classifiers
can be very useful for finding structure in high dimensional spaces and are
also useful in problems with mixed continuous and categorical data. However,
they may not be suitable for problems where the true decision boundaries are
nonlinear multivariate functions.

1.4.3 Applications of data mining


A wide range of organizations including business companies, scientific labo-
ratories and governmental departments have deployed successful applications
of data mining. While early adopters of this technology have tended to be
in information-intensive industries such as financial services and direct mail
marketing, the technology is applicable to any company looking to leverage a
large data warehouse to better manage their operations. Two critical factors
for success with data mining are: a large, well-integrated data warehouse and
a well-defined understanding of the process within which data mining is to be
applied. Several domains where large volumes of data are stored in centralized
or distributed databases include the following.

• Financial Investment: Stock indices and prices, interest rates, credit
card data, fraud detection [151].

• Health Care: Diagnostic information stored by hospital management
systems [27].

• Manufacturing and Production: Process optimization and trouble
shooting [94].

• Telecommunication network: Calling patterns and fault management
systems [246].

• Scientific Domain: Astronomical object detection [64], genomic and
biological data mining [15].

• The World Wide Web: Information retrieval, resource location [62, 210].

The results of a recent poll conducted at the www.kdnuggets.com web site
regarding the usage of data mining algorithms in different domains are presented
in Figure 1.2.

FIGURE 1.2: Application areas of data mining — Banking (17%), eCommerce/Web (15%),
Telecom (11%), Other (11%), Biology/Genetics (8%), Science Data (8%), Fraud
Detection (8%), Retail (6%), Insurance (6%), Pharmaceuticals (5%),
Investment/Stocks (4%).

1.5 Different Perspectives of Data Mining


In the previous section we discussed the generic components of a data min-
ing system, common data mining tasks/tools and related principles and issues
that appear in designing a data mining system. At present, the goal of the
KDD community is to develop a unified framework of data mining which
should be able to model typical data mining tasks, be able to discuss the
probabilistic nature of the discovered patterns and models, be able to talk
about data and inductive generalizations of the data, and accept the presence
of different forms of data (relational data, sequences, text, web). Also, the
framework should recognize that data mining is an interactive and iterative
process, where comprehensibility of the discovered knowledge is important
and where the user has to be in the loop [153, 234].
Pattern recognition and machine learning algorithms seem to be the most
suitable candidates for addressing the above tasks. It may be mentioned in this
context that historically the subject of knowledge discovery in databases has
evolved, and continues to evolve, from the intersection of research from such
fields as machine learning, pattern recognition, statistics, databases, artificial
intelligence, reasoning with uncertainties, expert systems, data visualization,
and high-performance computing. KDD systems incorporate theories, algo-
rithms, and methods from all these fields. Therefore, before elaborating the
pattern recognition perspective of data mining, we describe briefly two other
prominent frameworks, namely, the database perspective and the statistical
perspective of data mining.

1.5.1 Database perspective


Since most business data resides in industrial databases and warehouses,
commercial companies view mining as a sophisticated form of database query-
ing [88, 99]. Research based on this perspective seeks to enhance the ex-
pressiveness of query languages (rule query languages, meta queries, query
optimizations), enhance the underlying model of data and DBMSs (the log-
ical model of data, deductive databases, inductive databases, rules, active
databases, semistructured data, etc.) and improve integration with data
warehousing systems (online analytical processing (OLAP), historical data,
meta-data, interactive exploring). The approach also has close links with
search-based perspective of data mining, exemplified by the popular work on
association rules [3] at IBM Almaden.
The database perspective has several advantages including scalability to
large databases present in secondary and tertiary storage, generic nature of
the algorithms (applicability to a wide range of tasks and domains), capability
to handle heterogeneous data, and easy user interaction and visualization of
mined patterns. However, it is still ill-equipped to address the full range of
knowledge discovery tasks because of its inability to mine complex patterns
and model non-linear relationships (the database models being of limited rich-
ness), unsuitability for exploratory analysis, lack of induction capability, and
restricted scope for evaluating the significance of mined patterns [234].

1.5.2 Statistical perspective


The statistical perspective views data mining as computer automated ex-
ploratory data analysis of (usually) large complex data sets [79, 92]. The
term data mining existed in statistical data analysis literature long before its
current definition in the computer science community. However, the abun-
dance and massiveness of data have provided impetus to the development of al-
gorithms which, though rooted in statistics, lay more emphasis on compu-
tational efficiency. Presently, statistical tools are used in all the KDD tasks
like preprocessing (sampling, outlier detection, experimental design), data
modeling (clustering, expectation maximization, decision trees, regression,
canonical correlation, etc.), model selection, evaluation and averaging (robust
statistics, hypothesis testing) and visualization (principal component analysis,
Sammon’s mapping).
The advantages of the statistical approach are its solid theoretical back-
ground, and ease of posing formal questions. Tasks such as classification and
clustering fit easily into this approach. What seems to be lacking are ways
for taking into account the iterative and interactive nature of the data min-
ing process. Also scalability of the methods to very large, especially tertiary
memory data, is still not fully achieved.

1.5.3 Pattern recognition perspective


At present, pattern recognition and machine learning provide the most
fruitful framework for data mining [109, 161]. Not only do they provide
a wide range of models (linear/non-linear, comprehensible/complex, predic-
tive/descriptive, instance/rule based) for data mining tasks (clustering, clas-
sification, rule discovery); methods for modeling uncertainties (probabilistic,
fuzzy) in the discovered patterns also form part of PR research. Another
aspect that makes pattern recognition algorithms attractive for data mining
is their capability of learning or induction. As opposed to many statisti-
cal techniques that require the user to have a hypothesis in mind first, PR
algorithms automatically analyze data and identify relationships among at-
tributes and entities in the data to build models that allow domain experts
to understand the relationship between the attributes and the class. Data
preprocessing tasks like instance selection, data cleaning, dimensionality re-
duction, handling missing data are also extensively studied in pattern recog-
nition framework. Besides these, other data mining issues addressed by PR
methodologies include handling of relational, sequential and symbolic data
(syntactic PR, PR in arbitrary metric spaces), human interaction (knowledge
encoding and extraction), knowledge evaluation (description length principle)
and visualization.
Pattern recognition is at the core of data mining systems. However, pat-
tern recognition and data mining are not equivalent considering their original
definitions. There exists a gap between the requirements of a data mining
system and the goals achieved by present day pattern recognition algorithms.
Development of new generation PR algorithms is expected to encompass more
massive data sets involving diverse sources and types of data that will sup-
port mixed-initiative data mining, where human experts collaborate with the
computer to form hypotheses and test them. The main challenges to PR as a
unified framework for data mining are mentioned below.

1.5.4 Research issues and challenges


1. Massive data sets and high dimensionality. Huge data sets create combi-
natorially explosive search spaces for model induction which may make
the process of extracting patterns infeasible owing to space and time
constraints. They also increase the chances that a data mining algo-
rithm will find spurious patterns that are not generally valid.
2. Overfitting and assessing the statistical significance. Data sets used for
mining are usually huge and available from distributed sources. As a
result, often the presence of spurious data points leads to overfitting of
the models. Regularization and resampling methodologies need to be
emphasized for model design.
3. Management of changing data and knowledge. Rapidly changing data,
in a database that is modified/deleted/augmented, may make the previ-
ously discovered patterns invalid. Possible solutions include incremental
methods for updating the patterns.
4. User interaction and prior knowledge. Data mining is inherently an
interactive and iterative process. Users may interact at various stages,
and domain knowledge may be used either in the form of a high level
specification of the model, or at a more detailed level. Visualization of
the extracted model is also desirable.
5. Understandability of patterns. It is necessary to make the discoveries
more understandable to humans. Possible solutions include rule struc-
turing, natural language representation, and the visualization of data
and knowledge.

6. Nonstandard and incomplete data. The data can be missing and/or noisy.

7. Mixed media data. Learning from data that is represented by a combination
of various media, e.g., numeric, symbolic, images and text.

8. Integration. Data mining tools are often only a part of the entire decision-
making system. It is desirable that they integrate smoothly, both with
the database and the final decision-making procedure.

In the next section we discuss the issues related to the large size of the data
sets in more detail.

1.6 Scaling Pattern Recognition Algorithms to Large Data Sets
Organizations are amassing very large repositories of customer, operations,
scientific and other sorts of data of gigabyte or even terabyte size. KDD
practitioners would like to be able to apply pattern recognition and machine
learning algorithms to these large data sets in order to discover useful knowl-
edge. The question of scalability asks whether the algorithm can process large
data sets efficiently, while building from them the best possible models.
From the point of view of complexity analysis, for most scaling problems
the limiting factor of the data set has been the number of examples and
their dimension. A large number of examples introduces potential problems
with both time and space complexity. For time complexity, the appropriate
algorithmic question is what is the growth rate of the algorithm’s run time as
the number of examples and their dimensions increase? As may be expected,
time-complexity analysis does not tell the whole story. As the number of
instances grows, space constraints become critical, since almost all existing
implementations of learning algorithms operate with the training set entirely in
main memory. Finally, the goal of a learning algorithm must be considered.
Evaluating the effectiveness of a scaling technique becomes complicated if
degradation in the quality of the learning is permitted. Effectiveness of a
technique for scaling pattern recognition/learning algorithms is measured in
terms of the above three factors, namely, time complexity, space complexity
and quality of learning.
Many diverse techniques, both general and task specific, have been proposed
and implemented for scaling up learning algorithms. An excellent survey of
these methods is provided in [230]. We discuss here some of the broad cat-
egories relevant to the book. Besides these, other hardware-driven (parallel
processing, distributed computing) and database-driven (relational represen-
tation) methodologies are equally effective.

1.6.1 Data reduction


The simplest approach for coping with the infeasibility of learning from
a very large data set is to learn from a reduced/condensed representation
of the original massive data set [18]. The reduced representation should be
as faithful to the original data as possible, for its effective use in different
mining tasks. At present the following categories of reduced representations
are mainly used:
• Sampling/instance selection: Various random, deterministic and den-
sity-biased sampling strategies exist in the statistics literature. Their use
in machine learning and data mining tasks has also been widely stud-
ied [37, 114, 142]. Note that merely generating a random sample from
a large database stored on disk may itself be a non-trivial task from
a computational viewpoint. Several aspects of instance selection, e.g.,
instance representation, selection of interior/boundary points, and in-
stance pruning strategies, have also been investigated in instance-based
and nearest neighbor classification frameworks [279]. Challenges in de-
signing an instance selection algorithm include accurate representation
of the original data distribution, making fine distinctions at different
scales and noticing rare events and anomalies. (A one-pass sampling
sketch is given at the end of this subsection.)
• Data squashing: It is a form of lossy compression where a large data
set is replaced by a small data set and some accompanying quantities,
while attempting to preserve its statistical information [60].
• Indexing data structures: Systems such as kd-trees [22], R-trees, hash
tables, AD-trees, multiresolution kd-trees [54] and cluster feature (CF)-
trees [29] partition the data (or feature space) into buckets recursively,
and store enough information regarding the data in the bucket so that
many mining queries and learning tasks can be achieved in constant or
linear time.
• Frequent itemsets: They are often applied in supermarket data analysis
and require that the attributes are sparsely valued [3].
• DataCubes: Use a relational aggregation database operator to represent
chunks of data [82].
The last four techniques fall into the general class of representation called
cached sufficient statistics [177]. These are summary data structures that lie
between the statistical algorithms and the database, intercepting the kinds of
operations that have the potential to consume a large amount of time if they
were answered by direct reading of the data set.
related approach where salient instances (or descriptions) are either selected
or constructed and stored in the case base for later use.
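The one-pass sampling sketch referred to above is reservoir sampling, a standard technique that maintains a uniform random sample of fixed size k over a stream of unknown length:

    import random

    def reservoir_sample(stream, k, seed=0):
        """Return k items drawn uniformly at random from a stream of unknown
        length, using O(k) memory and a single pass."""
        rng = random.Random(seed)
        reservoir = []
        for i, item in enumerate(stream):
            if i < k:
                reservoir.append(item)
            else:
                j = rng.randint(0, i)        # inclusive on both ends
                if j < k:
                    reservoir[j] = item      # replace with decreasing probability
        return reservoir

    # The "stream" could be rows read lazily from a huge file or database cursor.
    print(reservoir_sample(range(1_000_000), k=5))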

1.6.2 Dimensionality reduction


An important problem related to mining large data sets, both in dimension
and size, is that of selecting a subset of the original features [141]. Preprocess-
ing the data to obtain a smaller set of representative features, retaining the
optimal/salient characteristics of the data, not only decreases the processing
time but also leads to more compact learned models and better
generalization.
Dimensionality reduction can be done in two ways, namely, feature selec-
tion and feature extraction. As mentioned in Section 1.2.2 feature selection
refers to reducing the dimensionality of the measurement space by discarding
redundant or least information carrying features. Different methods based
on indices like divergence, Mahalanobis distance, Bhattacharya coefficient are
available in [30]. On the other hand, feature extraction methods utilize all the
information contained in the measurement space to obtain a new transformed
space, thereby mapping a higher dimensional pattern to a lower dimensional
one. The transformation may be either linear, e.g., principal component anal-
ysis (PCA) or nonlinear, e.g., Sammon’s mapping, multidimensional scaling.
Methods in soft computing using neural networks, fuzzy sets, rough sets and
evolutionary algorithms have also been reported for both feature selection and
extraction in supervised and unsupervised frameworks. Some other methods
including those based on Markov blankets [121], wrapper approach [117], and
Relief [113], which are applicable to data sets with large size and dimension,
have been explained in Section 3.3.
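As a concrete instance of linear feature extraction, the following bare-bones sketch (our own illustration, not a library-grade implementation) projects patterns from p dimensions to p′ < p dimensions via the leading eigenvectors of the covariance matrix, i.e., PCA:

    import numpy as np

    def pca_transform(Y, p_prime):
        """Map n patterns from p dimensions to p_prime < p dimensions by
        projecting onto the leading eigenvectors of the covariance matrix."""
        Yc = Y - Y.mean(axis=0)                    # center the data
        cov = np.cov(Yc.T)
        eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
        top = eigvecs[:, ::-1][:, :p_prime]        # leading components
        return Yc @ top

    rng = np.random.default_rng(2)
    # 200 patterns in a 5-dimensional measurement space with correlated features.
    Y = rng.normal(size=(200, 5))
    Y[:, 1] = 0.9 * Y[:, 0] + 0.1 * Y[:, 1]

    X = pca_transform(Y, p_prime=2)
    print("reduced shape:", X.shape)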

1.6.3 Active learning


Traditional machine learning algorithms deal with input data consisting
of independent and identically distributed (iid) samples. In this framework,
the number of samples required (sample complexity) by a class of learning
algorithms to achieve a specified accuracy can be theoretically determined [19,
275]. In practice, as the amount of data grows, the increase in accuracy slows,
forming the learning curve. One can hope to avoid this slow-down in learning
by employing selection methods for sifting through the additional examples
and filtering out a small non-iid set of relevant examples that contain essential
information. Formally, active learning studies the closed-loop phenomenon of
a learner selecting actions or making queries that influence what data are
added to its training set. When actions/queries are selected properly, the
sample complexity for some problems decreases drastically, and some NP-
hard learning problems become polynomial in computation time [10, 45].
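The closed-loop structure described above can be conveyed by a toy uncertainty-sampling loop: at each round the learner queries the label of the unlabeled example it is least sure about. The base learner is a deliberately simple nearest-centroid rule; everything in the sketch is our own illustration:

    import numpy as np

    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
    y = np.array([0] * 200 + [1] * 200)      # oracle labels (queried on demand)

    labeled = [0, 1, 200, 201]               # tiny seed set with both classes

    def centroids(idx):
        return [X[[i for i in idx if y[i] == c]].mean(axis=0) for c in (0, 1)]

    for _ in range(10):                      # active learning rounds
        c0, c1 = centroids(labeled)
        # Uncertainty = closeness of the two class distances (near the boundary).
        d = np.abs(np.linalg.norm(X - c0, axis=1) - np.linalg.norm(X - c1, axis=1))
        d[labeled] = np.inf                  # don't re-query labeled points
        query = int(np.argmin(d))            # most uncertain example
        labeled.append(query)                # "ask the oracle" for y[query]

    c0, c1 = centroids(labeled)
    pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
    print("labels used:", len(labeled), "accuracy:", (pred == y).mean())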

1.6.4 Data partitioning


Another approach to scaling up is to partition the data, avoiding the need
to run algorithms on very large data sets. The models learned from individ-
ual partitions are then combined to obtain the final ensemble model. Data
partitioning techniques can be categorized based on whether they process sub-
sets sequentially or concurrently. Several model combination strategies also
exist in literature [77], including boosting, bagging, ARCing classifiers, com-
mittee machines, voting classifiers, mixture of experts, stacked generalization,

© 2004 by Taylor & Francis Group, LLC


Exploring the Variety of Random
Documents with Different Content
— Allons, Benoît, et vous Baptiste, laissez-moi et gagnez vos
logements personnels… Quant à vous, Benoît, soyez
particulièrement attentif à mon service, et tâchez d’éviter les écarts
de langage.
Sanplan et Bernard sortirent avec dignité.
On présenta cérémonieusement à M. de Paulac l’un des
gentilshommes présents, lequel, à son tour, présenta deux ou trois
des jeunes beautés qui s’approchaient, toutes pimpantes.
La marquise de La Gaillarde, occupée à quelque bavardage,
n’avait pas reconnu Gaspard ; mais, lui, l’avait déjà aperçue.
On lui nomma ensuite quelques gentilshommes ; puis on lui
désigna, avec un affecté dédain, La Trébourine et Leteur comme
gros négociants en grains, à qui leur fortune donnait accès dans une
si fastueuse maison.
— C’est un singulier maître d’hôtel que celui-ci, vint lui dire
Montvert à qui on avait fait la leçon. C’est une sorte de maniaque qui
se fait un amusement de sa profession. Il entend ne pas la quitter,
malgré l’étendue de sa fortune qu’il a, dit-on, faite aux Indes, en sa
jeunesse. On le soupçonne, avec raison, je crois, d’être né, de
s’appeler La Galinière, et de voiler décemment ses titres de haute
noblesse qui sont des plus authentiques ; il doit être le descendant,
dégénéré en quelque manière, spirituel pourtant, d’un hobereau
distingué ; et finalement il a su faire de sa maison un lieu de réunion
aimable et facile, pour la meilleure noblesse d’Aix. Nous tolérons de
sa part une certaine familiarité qui ne dépasse jamais les limites
supportables…
— On m’avait conté tout cela, affirma Gaspard ; et c’est pourquoi,
vous me voyez ici, et ravi d’y être en si belle et bonne compagnie.
On ne sait ce qu’entendit le sourd, mais, à la grande stupéfaction
de Gaspard il répliqua :
— Je ne suis pas veuf.
L’aimable soirée avait un air de fête. La présence d’un
personnage aussi considérable que M. de Paulac mettait un éclat
inaccoutumé dans les yeux des femmes, car elles aiment plaire aux
puissants, et exciter les hommes à la lutte contre le rival de passage.
Tous reconnaissaient que l’étranger était beau, élégant, spirituel,
séduisant.
Beaucoup, hommes ou femmes, regrettaient tout bas qu’on se fût
engagé dans une galégeade qui, lorsqu’elle lui serait dénoncée,
pourrait déplaire à un tel homme, si charmant ! Il était en vérité
dommage de le « gaber » si insolemment ! Bah ! il était homme à
imaginer une réplique qui serait drôle sans être méchante ; et la
farce continuait. Il le fallait bien, et ne plus se préoccuper de savoir si
elle était ou non de bon goût.
Marin disait à Gaspard, c’est-à-dire à M. de Paulac :
— Oui, monsieur, je suis petit neveu de Vatel ; et j’ai hérité son
épée. La voici à mon côté, telle qu’on la retira du corps de ce
gentilhomme qui, n’étant pas responsable du retard de la marée,
aurait dû dire comme François Ier après Pavie : « Tout est perdu,
c’est-à-dire l’ordonnance du repas, mais non pas mon honneur, ni
celui de la France. » Il mourut victime d’un scrupule absurde autant
que respectable !
Et ce disant, Marin, tirant son épée, en faisait remarquer à
Gaspard, la finesse, la souplesse et, en un mot, la beauté.
— Comme petit neveu de Vatel, monsieur, j’ai hérité, outre son
épée, sa passion de la bonne cuisine, art éminemment français. J’ai
parcouru l’Italie et l’Espagne, qui sont pays d’une sobriété savante et
vraiment gracieuse. L’Angleterre, au contraire, comprend la
nécessité d’une cuisine plantureuse, vu son climat, et c’est la patrie
des rôtisseurs ; de même la Hollande et la Flandre ; mais
l’Allemagne, mais la Prusse, monsieur ! la Prusse est à proprement
parler, la patrie du porc. Ah ! monsieur ! pour comble d’inconscience,
on y appelle délicatesses toutes les lourdes bagatelles de la
cochonnerie. Lourdes bagatelles, pesante nourriture, indigeste
boisson. Ces gens-là mangent comme la bête fauve qui, au fond des
bois, se gorge et se gonfle de proie sanguinolente. Il faut se méfier
d’un pareil peuple, monsieur ; il voudra quelque jour dévorer
l’Europe, engloutir le monde, digérer l’univers. Ah ! monsieur, on sait
manger dans les autres pays ; en Prusse on engloutit, on dévore, on
absorbe, on goinfre, on avale, on engouffre, on bâfre, on gloutonne,
on se gave, on se bourre, on se gonfle, on s’empiffre, résultat : un
empâtement charnel, qui étouffe l’esprit cérébral sous la violence
des appétits ou esprits gastriques, et qui anéantit tout sentiment
élevé ou charitable ; en sorte qu’un peuple si affamé ne cultive son
intelligence qu’en vue seulement de fabriquer des cochonailles ; ou
bien des engins de mort, c’est-à-dire des pièges destinés à duper la
proie, à la prendre et à l’enfourner toute vive dans le gaster
pantagruélique d’un Pantagruel sans ironie ni gaîté, qui s’en
crèvera… ouf !… Nous seuls, monsieur, savons exécuter une
omelette élégante, crémeuse, dorée et légère ; nous seuls avons le
secret d’une odorante grillade obtenue sur un feu de vigne aromatisé
de romarin. En cela, monsieur, nous sommes inimitables !
Marin venait d’achever son essoufflant discours, lorsque la petite
comtesse, toute gentille dans son rôle de Lisette, s’approcha de
Gaspard, superbe dans son rôle de Paulac, et lui présenta
coquettement un plateau sur lequel étincelaient un flacon de cristal
empli de vin d’Espagne, et des verres de Venise, en forme de lys.
Gaspard, ayant regardé le plateau, le flacon de cristal ciselé, et
les verres pareils à des fleurs, leva les yeux sur la comtesse et,
devinant sans peine la femme de qualité sous un costume de
chambrière, il lui prit le menton…, ce geste étant selon la tradition
des gentilshommes en séjour dans les hôtelleries.
Puis, ayant bu, il dit, verre en main :
— Que voilà, malepeste, un minois de mon goût ! Vertubleu !
monsieur l’hôte ! rien que pour voir cette frimousse-là, on deviendrait
volontiers le client de votre maison, surtout la nuit !
Mais le mari de la prétendue Lisette, oubliant de rester à son
rang de Frontin, s’était approché, poussé par la jalousie ; et, ayant
entendu ces paroles qui le mirent en colère :
— Monsieur, dit-il à Gaspard, c’est ma femme !
Gaspard, se retournant et flairant le gentilhomme et le mari sous
l’habit du valet :
— Que me veut ce maraud ? fit-il avec rudesse.
Insulté, le comte, trop bien déguisé, s’oublia, et faisant le geste
involontaire de chercher son épée absente, il mit de la hauteur
offensée dans ce simple mot :
— Marquis !
Gaspard répliqua :
— Jocrisse ?
Et lui donna du pied au bas du dos, avec une grâce inimitable.
Frontin allait éclater en cris de rage, quand Marin lui saisissant le
bras :
— Chut ! et patience !… C’est si drôle !
— Un peu trop ! grinça Frontin qui s’éloigna en se frottant les
fesses.
Gaspard s’amusait fort, songeant qu’il n’était pas venu pour faire
autrement ni mieux que donner du pied aux parlementaires et à
leurs amis.
Lisette attendait que Gaspard remît le verre sur le plateau. La
joie courait dans l’assistance.
— Ne vous étonnez pas trop de… l’insistance de ce valet,
monsieur, dit Marin ; il est très véritablement, comme il l’assure, le
mari de Lisette ; il n’est pas bête au fond ; et, s’il a pris le ton d’un
gentilhomme offensé, c’est par badinage, et pour faire passer, non
sans esprit, sa protestation conjugale.
— Monsieur mon hôte, dit Gaspard, la vanité d’un chef de police
va jusqu’à prétendre qu’il sait comprendre ces choses simples, sans
qu’il soit besoin qu’un sot les lui explique.
A ce mot, la gaîté des assistants fut portée au comble ; Leteur, et
La Trébourine surtout, exultaient.
— Il faut que vous sachiez, monsieur, déclara Marin d’un air
hautain, que les plus grands noms de l’armorial de Provence forment
la liste de mes invités, et que nul de ces gentilshommes, familiers de
ma maison, ne me traite sans quelque courtoisie. J’aime à croire que
vous vous rangerez à suivre un si galant exemple, dès que vous
aurez jeté un coup d’œil sur ma liste de ce soir.
Il la montra. Gaspard l’examina :
— Grands noms, en effet… Ah ! Ah ?… Un Cocarel ? Sera-ce le
père ou le fils ? ou tous les deux ?
— Le père, s’est fait excuser. Nous n’aurons que le fils.
— Bien ; j’ai à lui faire une communication secrète.
— Pour être tout à lui, dit Marin, vous pourrez vous retirer dans
les appartements qu’on vous a désignés à votre arrivée, et dont
monsieur votre intendant… s’est montré satisfait… mais vous ne
paraissez pas remarquer, sur ma liste, ce nom-ci, le plus beau peut-
être : Mirabeau !
— Le fils ?
— Non, le père… Et, ici, voyez un beau nom encore : La
marquise de la Gaillarde, dont le mari est un… débris de Fontenoy.
— Je sais, je sais, fit Gaspard.
— La voici qui s’approche.
— Je suis déjà charmé.
— En ce cas, mon devoir, monsieur, est de vous laisser en sa
délicieuse compagnie.
Il s’inclina et s’éloigna ; et la marquise :
— Ai-je mal entendu ? dit-elle à Gaspard sans le reconnaître ; et
ne parliez-vous pas de Fontenoy, — c’est-à-dire de moi, monsieur
de Paulac ?
— Si vous aviez vu Fontenoy, madame, les flammes de vos yeux
seraient à demi-éteintes, les lys et les roses de votre visage seraient
à demi fanés, — et monsieur votre époux n’aurait plus besoin de
soigner l’heureuse goutte qui nous permet de vous voir sans lui.
Elle le regarda attentivement. Le son de cette voix lui était
connu… Elle éprouva un léger frisson : crainte ? ou volupté ?… Elle
ne pouvait admettre qu’une singulière ressemblance…
— Madame, murmura alors Gaspard d’un air mystérieux,
l’homme est d’étoupe ; la femme est de feu… le diable souffle.
— Ah ! mon Dieu ! fit-elle.
— Vous brûlez !…
Elle chuchota :
— Quoi ! Vous ? C’est vous !… Quelle imprudence ! mais alors…,
le vrai monsieur de Paulac ?
— … est en sûreté ; j’ai appris qu’on lui préparait une
mystification, et j’ai désiré lui en épargner le ridicule… J’arrête le
courrier de temps en temps, comme vous savez.
— Hélas ! fit-elle ; puis, éclatant de rire :
— Mais c’est charmant !
Aussi insolent qu’un valet véritable, Frontin, de nouveau, s’était
approché, — et il cherchait à entendre.
Gaspard, se retournant brusquement vers lui :
— Que me veut encore ce maroufle ?
Et, pour la seconde fois, son pied visita les chausses de M. le
comte, qui se jugea décidément trop bien camouflé.
Et comme, en s’éloignant, le faux valet lui montrait le dos :
— Voilà, dit Gaspard, un derrière fort amoureux de ma botte !
Marin était accouru :
— Décidément, Frontin…, je vous chasserai ! Tenez-vous mieux.
Lisette, indulgente à son mari, et surtout prompte à se rapprocher
du gentil Paulac, accourut aussi :
— Pardonnez-lui, monsieur ! dit-elle au prétendu chef de police.
— Soit, pour amour de tes beaux yeux, friponne ! mais je veux te
baiser dix fois, afin qu’il enrage !
Frontin revint, comme attiré par la botte de Gaspard, et ne put
s’empêcher de dire en frappant du pied :
— Ah ! c’est assez, à la fin, monsieur !
— Assez ? Non, ma foi ! En voici encore ! dit Gaspard.
Et, pour la troisième fois, son pied atteignit au bas des reins le
gémissant et soi-disant valet.
Marin prit la main du comte, et, la lui serrant :
— C’est fini ! du courage, Frontin, mon ami !
"Vertubleu!" Gaspard exclaimed. "There you have the new spirit! The lowest of lackeys flaunts his impertinence, and can no longer suffer a good gentleman to tease his wife! It is intolerable! It is true, then, that an abominable revolt is brewing in the heart of the peoples! I watch with grief this incredible evil grow!"
And as they gathered around him the better to listen:
"Do you recall that a certain Voltaire, rather too… libertine, saw in Mandrin… a hero! Is it any wonder, after that, that your people of Provence love and favor a Gaspard de Besse!"
"Very true!" approved La Trébourine.
Then Marin, leaning to M. de Paulac's ear:
"Between us, strictly between us, M. de Voltaire was not wrong, and our Gaspard has his good points."
At this, Gaspard, feigning indignation, turned on Marin with a movement of the leg that threatened the president-cook with a punishment like the one Frontin was groaning under.
"What's that you say, rogue?" Gaspard exclaimed.
Marin insisted, courageously:
"I beg your pardon, but…"
And he whispered, a little loudly, into Gaspard's ear:
"Our parliament has its faults; and on the Parliament's failings, President Marin shares the opinion of the famous Gaspard."
"What do I hear," Leteur confided in a low voice to La Trébourine; "the president is betraying us!"
The Marquise de la Gaillarde, intervening:
"Know, monsieur, that this Gaspard is adored by the women."
"And why is that, marquise?"
Then each vied to recount some trait of gallantry attributed to Gaspard de Besse; and, of course, the first story cited was that of the bandit who, meaning to cut off a woman's finger to secure possession of a precious ring, was killed by Gaspard with a pistol shot…
Unexpectedly, M. de Paulac replied to this in a cold and severe tone:
"I know that story; and you would do better, messieurs, and you, mesdames, never to speak of it, for such traits are apt to make people love this gallows hero!"
Mademoiselle de Malherbe dared to protest:
"Consider, monsieur, that he has no murder to reproach himself with, for the act just recounted is that of a dispenser of justice, a protector of women."
"This Gaspard," added Madame de La Gaillarde, "is a bold adventurer, by no means a vulgar thief. His enterprises have a character that is, how shall I say? political."
"Indeed!" sneered the false Paulac; "well, even if that is so, above all if that is so, we shall act against him with the utmost energy."
"Oh! Monsieur," begged Mlle de Malherbe, "tell us that in your reports to His Majesty you will be clement toward our favorite bandit?"
And all the women in chorus:
"Promise, for pity's sake, that he shall not be hanged."
"What I am hearing is unheard of, beyond belief!" growled the pseudo-envoy of the lieutenant general of police. "Your favorite bandit deserves the wheel, mesdames! He will come to it."
"What!" said Mlle de Malherbe again, veiling her beautiful eyes with a trembling hand; "they would break his arms?"
"And his legs!" Paulac declared brutally.
"Monsieur," said Leteur, "do not listen to the women. They dote on this scoundrel, who is all the more dangerous in that he appears to be a sorcerer. A good pyre would suit him well; but the fashion for it has nearly passed, unfortunately."
"I am of the same opinion as Leteur," insisted La Trébourine.
The women, discouraged by the threatening tone of the supposed Paulac, and seeing Leteur speak to him with an air of mystery, had withdrawn.
Marin observed everything, was everything to everyone.
"Messieurs," said Gaspard to Leteur and La Trébourine, "who are you, then, to give me advice without being asked? For after all, we are in a hostelry here, in a public place?"
"But, monsieur, we had the honor of being presented to you just now. Have you forgotten?"
"Completely."
"We are, monsieur, we are judges…" Leteur confessed thoughtlessly.
"Judges?"
"Good judges of grain, barley, wheat, and oats, monsieur," La Trébourine hastened to add, "which is to say that we are rich merchants… very rich merchants. And we take the liberty of informing you that there is, in Aix, a man perhaps more dangerous than Gaspard de Besse… a certain man with a poisoned tongue and a poisoned tooth!… a viper!"
"And this man… Who is he? Is he here?"
"He is not here…," said Leteur in a whisper; "it is the president of the Parliament, Monsieur Marin."
"The president himself!" confirmed La Trébourine.
The two judges were speaking together, one placed at Gaspard's right, the other at his left, and they had not finished before he, suddenly spreading wide and throwing back his arms, slapped both judges with the backs of his open hands.
It made a double crack like a summons.
"Coming, coming!" cried Marin, who, having seen the gesture, came running gaily. "What is happening, then, monsieur?"
"Monsieur," said Gaspard, "can your inn be a hotbed of sedition, a center of conspiracy? Here are two oat merchants who permit themselves to speak to me, Paulac, of President Marin, and consequently of the Parliament itself, in insufferable terms! The Parliaments, messieurs, are at once the supports, the monitors, and the critics of the throne. President Marin has understood this, and you are nothing but dolts. If it happens that the Parliaments are sometimes mistaken, we shall cover them against all comers; for the principle to which we are attached, and which we shall defend to the death, do you hear, is the principle of authority; justice comes only after… when it comes at all… To call the president a viper, before me! That borders on the crime of lèse-majesté!"
"They will not do it again, will you, dear colleagues?" said Marin, who was decidedly congratulating himself on the success of his evening.
And throughout the Hôtel des marins there was but one cry: "What energy, this Paulac! What decisiveness!…"
"All the same," the women murmured among themselves, "he must be made to understand that our Gaspard is a good devil!"
A valet announced: "M. le comte Séraphin de Cocarel."
CHAPTER XXII

In which the reader will witness Gaspard's second encounter with Séraphin de Cocarel, and the royal remonstrances of M. de Paulac to the Parliament of Aix-en-Provence.

A young beauty had seated herself at the harpsichord. She declared, in song, that she was dying of love.
Gaspard, not having heard Cocarel announced, asked Marin, indicating the newcomer with a glance:
"Who is that personage over there, limping about with an air of importance?"
"That is M. de Cocarel, Séraphin, the son of the judge at the Parliament. You told me you wished to speak with him in secret. I will bring him to you."
"But, monsieur," said Gaspard, stopping Marin by the sleeve of his white jacket, "when do you find time to make your hats?"
"I understand you, monsieur; you are sending me back to my kitchen? Know, then, that I appear there only as an army commander; great chiefs must go under fire but rarely; their business is to direct it; and one can only see the whole from a distance… My orders are given. Princes, monsieur, have the duty of letting themselves be tied to the shore while the army fights, seeing that victory depends on their command, which is to say, on their existence. For myself, I should obviously run no risk before my roasting spits; but if I absent myself from the kitchens, it is advisedly, and when I am sure of my lieutenants… My supper this evening will be a triumph; and you will agree, the very first, in an hour, at table. I am in my salons only because you are here, monsieur, and because my duty, as I understand it, is to see to it that in my own salon everything may suit and please a new guest of distinction who honors me with his presence."
He bowed, withdrew, and soon brought Cocarel to Paulac; then, having presented them to each other, he declared:
"A thousand pardons; I am going to oversee my field of battle."
Then, without preamble of any sort, the envoy of the lieutenant general of police said abruptly to Cocarel:
"It was you, monsieur, who, in merry company, on a fine summer evening, hanged a peasant from the branches of an olive tree?"
Cocarel drew himself up, to defend himself first by his bearing, but did not find on the spot the clever answer he was seeking.
"Do not forget, Monsieur Cocarel, that I am in the exercise of my office… Your crime, monsieur, had vexing consequences. The Parliament having hushed up the affair, for gold, it is said, the people were stirred by so much impunity on one side and so much corruption on the other; and avengers rose up against you and against our magistrates; for the so-called band of Gaspard de Besse was recruited by him only to wage a relentless war on the Parliament and obtain the punishment of Teisseire's assassins: your punishment first, and then that of the corrupt judges. Now, monsieur le lieutenant général sends me to shed light on this altogether murky affair; and I must deal with you before all else. We shall decide afterward concerning this Gaspard; yes, we shall decide only when we know whether your crime, once proven, is not for this bandit a kind of excuse, or at the very least an explanation that earns him some political consideration."
"Eh! Monsieur!" Cocarel said at last. "What are you driving at? This Gaspard is a vulgar thief, and easy to catch!"
"Not so easy to catch as you believe; and the proof is that he is still at large! Be assured, monsieur, that he will not be taken without me!… It is, moreover, fairly certain that the Parliament does not want him taken, and that you do not want it either."
"And why should that be?"
"Because, once he is taken, your crime will have to be revisited, and that is perhaps what Gaspard himself desires. However, he will not let himself be captured, it is said, before carrying off as hostages enough of your friends for his trial to stir France and Europe. Believe me, monsieur, I am at the source of the intelligence."
"Monsieur," protested Cocarel, "be convinced that the Parliament (I know it through my father) has done everything in its power to seize Gaspard."
"To have him assassinated, perhaps; captured, no! His defenses are feared too much. You know very well that he cannot be caught. Did he not dare to challenge you yourself? Did he not wound you, in a duel?…"
"He? Me? Monsieur?"
"You see that our police are well informed."
"I have never seen this Gaspard's face, monsieur."
"His face, perhaps not; but the man?… That masked duelist to whom you owe your slight and so graceful limp…"
"He! It was he?"
"It was he, monsieur; and I have proof of it."
"He!… I have suspected it for some time!" cried Cocarel, as quietly as possible…
And between his teeth:
"I shall find a means of vengeance!"
"Avenge yourself, if you can, monsieur; it will be serving His Majesty; but I repeat, as for taking Gaspard, you could never take him without me!"
"Monsieur," said Cocarel, "I assure you that I shall apply all my strength to the enterprise of a capture that so deeply concerns the security of the State."
"Excellent! Do you see any means of helping us in it?"
Cocarel appeared to reflect; but Gaspard did not intend that this Cocarel should escape him without leaving a few feathers from his wing in his hands; that is to say, without the comedy of this memorable evening yielding some happy result for Bernard's cause.
Seeing that Cocarel remained silent:
"I must warn you formally that you will soon have to defend yourself against an accusation of murder… It would take quite extraordinary reasons, such as I cannot foresee, for me to alter my report to His Majesty where you are concerned."
Another silence.
Seeing Paulac and Cocarel in such intimate conclave, most of the "guests" absorbed themselves in the gaming or in their personal conversations. The harpsichord still sounded.
"Monsieur," Cocarel finally murmured with a prudent look, "this affair… at least as far as I am concerned, could it not really… be arranged… a little?"
"How do you mean?" said Gaspard.
He had already understood that he was about to face an assault.
Cocarel, insinuating and feeling his ground, resumed:
"You are in a position to serve me… but have you… a thousand pardons… a fortune worthy of your station?"
"My fortune is nil. I am the soldier who has only his pay."
"My father's fortune," declared Cocarel with a knowing air, "is considerable."
"Hm! There, unless I am much mistaken, is an attempt at bribery?" observed Paulac.
As no indignation pierced through M. de Paulac's tone, Cocarel felt encouraged. He said, proceeding by insinuation:
"A question about your fortune, set beside a confidence about my own, cannot constitute an attempt at bribery, monsieur; you are, I am certain, too good a jurist not to know it. There is therefore no attempt… so far."
Gaspard went on smiling.
"That 'so far' is eloquent! Well then, what have you to add?"
Cocarel concluded that he had won the game. He was not surprised. Bribery, he thought, was a commonly accepted thing.
"Come, monsieur," he continued, "do not be too severe with me. I believe, all things considered, that the King himself, should the case arise, would not hesitate to grant you a pension, to thank you for having saved, in my person, the honor of his nobility of the robe. You seem too ready to forget, monsieur, that reason of State must take precedence over justice in a well-governed State."
"Precisely," said Gaspard, without reflecting that for the moment his name was Paulac, "precisely, it is my opinion that it ought to be otherwise."
Cocarel persuaded himself that Paulac meant to sell his conscience at a higher price.
Gaspard added very quickly:
"That is why I should think little of myself if, for your sake, I failed my personal principles along with the duties of my office. As for your principles, I know them. You are of those who, in the la Cadière trial, would have condemned innocence, on the pretext that by saving the guilty they were saving religion itself, as if religion were not above such calculations and such shameful maneuvers."
"Be that as it may," Cocarel dared to say, growing arrogant in manner out of impatience, "…what price do you set on your complaisance?"
Gaspard, being Paulac, sincerely wanted to slap the insolent fellow… He replied, being Gaspard:
"Set it yourself… politely. And consider that the price of your conscience and that of mine, both together, can only be very high. You must pay me for both!"
Gaspard was reflecting that twenty or thirty thousand livres more would round out handsomely the dowry of Bernard, Thérèse's husband. He leaned toward thirty thousand. The saddle holsters of three horses would easily carry that sum in fine gold louis. Marin must have it in his house. Cocarel had only to borrow it from him that very evening… A gaming debt, on his word, payable before midnight… He explained himself plainly to Cocarel, except that he pretended, of course, to believe the fable of the rich amateur innkeeper in whose house high play goes on, and to be ignorant of the true name of the master of the house.
"Agreed," said Cocarel, joyous.
"I have here," Gaspard told him, "a private apartment where we can go presently, for I intend to give you a receipt in due form, for your complete security. You might suspect the corrupt official I have now become of being a man to deny one day that he ever received this sum from you. You therefore need a weapon against me. It is above all among rogues, monsieur, that precautions are necessary. Go; I shall await you."
Cocarel ran off in search of Marin.
He found him escorting, in order to present them to M. de Paulac, a group of parliamentarians he had invited to his mad evening.
"I must speak with you," Cocarel breathed mysteriously.
"In a moment I shall be at your service."
Cocarel was forced to wait where he was.
The presentations made, Gaspard declared:
"I must say, messieurs, in His Majesty's name, a personal word to each of you."
Gaspard, in the days of his Aixois love affairs, had frequented the city of Aix enough to learn to put a name to the faces of all the parliamentarians.
And drawing aside the man nearest him:
"Monsieur," he said, calling him by name, "I regret to address a grave reproach to you. We are not unaware that your wife, when she knows that one of your plaintiffs is a married man, arranges, while that husband's case awaits judgment, to lure his wife or daughter into your house. There she puts into the hands of the daughter or the wife a distaff, which she makes her spin for weeks. In the long run, that makes you fine, good linen; the people tell each other of it and murmur. Do not defend yourself; you would be lying. You compromise the dignity of the Parliament, monsieur, and thus you justify the revolt of a Gaspard de Besse. Not a word. Go."
He likewise took aside a second magistrate:
"Monsieur, you are fond of game. When a peasant has a case pending, you give him to understand that you are partial to hares and partridges, even those snared on the lands of your colleagues. You thus arrogate to yourself the power of granting commoners the right to hunt and to steal. That must change. You compromise justice, the dignity of the Parliament, and thus you justify the revolt of a Gaspard de Besse. Go, go, monsieur. You have nothing to say…"
In this tone he made confidential remonstrances to four or five parliamentarians, and said to the others, in a loud voice:
"As for you, messieurs, I have nothing but praise for you on His Majesty's behalf. If he had, as parliamentarians, only men of integrity such as you, a Gaspard de Besse would not rise up against the Parliament with the people's approval."
The magistrates were confounded. Marin laughed up his sleeve.
"I regret, messieurs," continued Gaspard, "that your president, M. Marin, is not among you. No doubt I shall see him tomorrow; but I should have taken pleasure in addressing to him, here, this evening, a few compliments, as to you."
Leteur and La Trébourine drew nearer to the group. Marin was speaking; he was saying to the pseudo-Paulac:
"President Marin honors me with his friendship, monsieur…"
Gaspard interrupted him:
"Messieurs, I am not only the envoy of monsieur le lieutenant of police. H.M. the King of France in person, having done me the honor of acquainting me with his wishes concerning you, has entrusted me with a special mission regarding the Parliament of Aix. That is what obliges me to speak to you as I do. I shall tell you, then, that monsieur le président Marin is a man of great wit, as judicious as he is just. Hence he wages, against his own Parliament, a campaign of epigrams and witticisms whose inspiration one cannot help praising; yet the fact is regrettable in one sense, because these epigrams, fallen from such a height, are picked up by the people, who gloat over them; and we know that Gaspard de Besse's troop uses these sparks of wit to fan the fire of popular grudges and discontents. Personally, I applaud President Marin's sarcasms. As a high officer of the police, I deplore them, for one must know, in certain cases, how not to be too much in the right."
He concluded:
"Strive, messieurs, no longer to deserve in future either the satirical whip of your president or the avenging threat of a Gaspard. You have long had the right to present your remonstrances to the king. I have brought you here the remonstrances of the king; they are those of the people."
The magistrates bowed as one. Never had Gaspard so proudly felt his power, his own royalty, ephemeral, but big with the future.
At that moment he caught sight, among newly arrived faces, of the fine profile of Mme de Lizerolles.
She found a way to say to him, in a low voice, passing near him and without appearing to know him:
"That is well done; I am pleased, Monsieur de Paulac."
That was a great moment for Gaspard de Besse.
Cocarel was leading Marin, whom he had taken by the arm, out of the salons.
At that moment all the guests rose. In an old gentleman who had just entered, they saluted the Marquis de Mirabeau, the father of the one Gaspard knew.
A thousand compliments were exchanged; everyone vied to fête the marquis, who ended by saying:
"For pity's sake, you are smothering me, messieurs!… My friends, a little room, if you please! I am dripping with sweat."
He fanned himself with his handkerchief.
Gaspard advanced toward him:
"Suffer me, monsieur, to present myself: the Marquis de Paulac."
"I salute a fine name," said the marquis; "and I came to salute it, Monsieur de Paulac."
"And your son, monsieur?" said Gaspard.
"But… I have two, monsieur."
"I know it, monsieur, and both cut a great figure; the vicomte, I am not unaware, distinguished himself in that American war which presages new destinies for the whole world; but, marquis, it is the comte I am after."
"Humph!" said the marquis. "That one is a kind of wild bull; he was at first a mere runaway colt; he has turned into an untamed bull. He has the devil in his body, and eloquence, the monster! They say he speaks in thunder; that his eloquence is a storm, or a gust of mistral on the Rhône. All that is very pretty, but he gives me no end of trouble!… The prison of Ré, that of Manosque, those of the Château d'If and the Fort de Joux: he has worn out all the prisons which, to my great satisfaction, the king has been able to place at my disposal; but my incarnate demon wears down the jailers and seduces all their daughters. He has, it seems, an engaging ugliness… he worries me night and day, at night above all. That is all I can say of him; he is not a man; he is a revolution, that rascal!"
"Monsieur," said Gaspard, "become indulgent toward him, I beg you. Before long we shall doubtless need rascals of his stature to set, or keep, on the right path rebels of another caste."
The marquis, astonished, raised a piercing gaze to M. de Paulac.
"But suffer me, I pray you," Gaspard went on, "to leave you for a moment; the duties of my mission are sometimes importunate… and… I have a rather serious matter to discuss with M. Séraphin de Cocarel."
Gaspard had caught sight of Cocarel who, from the threshold, was signaling to him, giving him to understand that he was in a position to conclude their bargain.
Gaspard left the Marquis de Mirabeau there, somewhat pensive; and, led by Cocarel, he made his way to his apartments.
In the gallery he met Sanplan, who was awaiting his orders.
"Do go into my rooms, Monsieur Cocarel; one order to give my majordomo, and I am all yours."
"You, wait for me there a moment, majordomo."
And, moving away, Gaspard first tried to find Mme de Lizerolles again. Driven by a thoroughly feminine curiosity, she had made only a brief appearance in the president's house, just long enough to see Gaspard's triumph.
The false Paulac came back to speak to the false majordomo.