DATA MINING
A Tutorial-Based Primer
SECOND EDITION
Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series

SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.

AIMS AND SCOPE


This series aims to capture new developments and applications in data mining and knowledge
discovery, while summarizing the computational tools and techniques useful in data analysis. This
series encourages the integration of mathematical, statistical, and computational methods and
techniques through the publication of a broad range of textbooks, reference works, and handbooks.
The inclusion of concrete examples and applications is highly encouraged. The scope of the
series includes, but is not limited to, titles in the areas of data mining and knowledge discovery
methods and applications, modeling, algorithms, theory and foundations, data and knowledge
visualization, data mining systems and tools, and privacy and security issues.

PUBLISHED TITLES
ACCELERATING DISCOVERY: MINING UNSTRUCTURED INFORMATION FOR
HYPOTHESIS GENERATION
Scott Spangler
ADVANCES IN MACHINE LEARNING AND DATA MINING FOR ASTRONOMY
Michael J. Way, Jeffrey D. Scargle, Kamal M. Ali, and Ashok N. Srivastava
BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi
COMPUTATIONAL BUSINESS ANALYTICS
Subrata Das
COMPUTATIONAL INTELLIGENT DATA ANALYSIS FOR SUSTAINABLE
DEVELOPMENT
Ting Yu, Nitesh V. Chawla, and Simeon Simoff
COMPUTATIONAL METHODS OF FEATURE SELECTION
Huan Liu and Hiroshi Motoda
CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY,
AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff
CONTRAST DATA MINING: CONCEPTS, ALGORITHMS, AND APPLICATIONS
Guozhu Dong and James Bailey
DATA CLASSIFICATION: ALGORITHMS AND APPLICATIONS
Charu C. Aggarwal
DATA CLUSTERING: ALGORITHMS AND APPLICATIONS
Charu C. Aggarwal and Chandan K. Reddy
DATA CLUSTERING IN C++: AN OBJECT-ORIENTED APPROACH
Guojun Gan
DATA MINING: A TUTORIAL-BASED PRIMER, SECOND EDITION
Richard J. Roiger
DATA MINING FOR DESIGN AND MARKETING
Yukio Ohsawa and Katsutoshi Yada
DATA MINING WITH R: LEARNING WITH CASE STUDIES, SECOND EDITION
Luís Torgo
EVENT MINING: ALGORITHMS AND APPLICATIONS
Tao Li
FOUNDATIONS OF PREDICTIVE ANALYTICS
James Wu and Stephen Coggeshall
GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY,
SECOND EDITION
Harvey J. Miller and Jiawei Han
GRAPH-BASED SOCIAL MEDIA ANALYSIS
Ioannis Pitas
HANDBOOK OF EDUCATIONAL DATA MINING
Cristóbal Romero, Sebastian Ventura, Mykola Pechenizkiy, and Ryan S.J.d. Baker
HEALTHCARE DATA ANALYTICS
Chandan K. Reddy and Charu C. Aggarwal
INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
Vagelis Hristidis
INTELLIGENT TECHNOLOGIES FOR WEB APPLICATIONS
Priti Srinivas Sajja and Rajendra Akerkar
INTRODUCTION TO PRIVACY-PRESERVING DATA PUBLISHING: CONCEPTS
AND TECHNIQUES
Benjamin C. M. Fung, Ke Wang, Ada Wai-Chee Fu, and Philip S. Yu
KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND
LAW ENFORCEMENT
David Skillicorn
KNOWLEDGE DISCOVERY FROM DATA STREAMS
João Gama
MACHINE LEARNING AND KNOWLEDGE DISCOVERY FOR
ENGINEERING SYSTEMS HEALTH MANAGEMENT
Ashok N. Srivastava and Jiawei Han
MINING SOFTWARE SPECIFICATIONS: METHODOLOGIES AND APPLICATIONS
David Lo, Siau-Cheng Khoo, Jiawei Han, and Chao Liu
MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO
CONCEPTS AND THEORY
Zhongfei Zhang and Ruofei Zhang
MUSIC DATA MINING
Tao Li, Mitsunori Ogihara, and George Tzanetakis
NEXT GENERATION OF DATA MINING
Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar
RAPIDMINER: DATA MINING USE CASES AND BUSINESS ANALYTICS
APPLICATIONS
Markus Hofmann and Ralf Klinkenberg
RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS,
AND APPLICATIONS
Bo Long, Zhongfei Zhang, and Philip S. Yu
SERVICE-ORIENTED DISTRIBUTED KNOWLEDGE DISCOVERY
Domenico Talia and Paolo Trunfio
SPECTRAL FEATURE SELECTION FOR DATA MINING
Zheng Alan Zhao and Huan Liu
STATISTICAL DATA MINING USING SAS APPLICATIONS, SECOND EDITION
George Fernandez
SUPPORT VECTOR MACHINES: OPTIMIZATION BASED THEORY,
ALGORITHMS, AND EXTENSIONS
Naiyang Deng, Yingjie Tian, and Chunhua Zhang
TEMPORAL DATA MINING
Theophano Mitsa
TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
Ashok N. Srivastava and Mehran Sahami
TEXT MINING AND VISUALIZATION: CASE STUDIES USING OPEN-SOURCE
TOOLS
Markus Hofmann and Andrew Chisholm
THE TOP TEN ALGORITHMS IN DATA MINING
Xindong Wu and Vipin Kumar
UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX
DECOMPOSITIONS
David Skillicorn
DATA MINING
A Tutorial-Based Primer
SECOND EDITION

Richard J. Roiger
This book was previously published by Pearson Education, Inc.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2017 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20161025

International Standard Book Number-13: 978-1-4987-6397-4 (Pack - Book and Ebook)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity
of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized
in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For
organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

Contents

List of Figures, xvii
List of Tables, xxix
Preface, xxxi
Acknowledgments, xxxix
Author, xli

SECTION I Data Mining Fundamentals

CHAPTER 1 ◾ Data Mining: A First View 3


CHAPTER OBJECTIVES 3
1.1 DATA SCIENCE, ANALYTICS, MINING, AND KNOWLEDGE DISCOVERY IN DATABASES 4
1.1.1 Data Science and Analytics 4
1.1.2 Data Mining 5
1.1.3 Data Science versus Knowledge Discovery in Databases 5
1.2 WHAT CAN COMPUTERS LEARN? 6
1.2.1 Three Concept Views 6
1.2.1.1 The Classical View 6
1.2.1.2 The Probabilistic View 7
1.2.1.3 The Exemplar View 7
1.2.2 Supervised Learning 8
1.2.3 Supervised Learning: A Decision Tree Example 9
1.2.4 Unsupervised Clustering 11
1.3 IS DATA MINING APPROPRIATE FOR MY PROBLEM? 14
1.3.1 Data Mining or Data Query? 14
1.3.2 Data Mining versus Data Query: An Example 15
1.4 DATA MINING OR KNOWLEDGE ENGINEERING? 16
1.5 A NEAREST NEIGHBOR APPROACH 18
1.6 A PROCESS MODEL FOR DATA MINING 19
1.6.1 Acquiring Data 20
1.6.1.1 The Data Warehouse 20
1.6.1.2 Relational Databases and Flat Files 21
1.6.1.3 Distributed Data Access 21
1.6.2 Data Preprocessing 21
1.6.3 Mining the Data 23
1.6.4 Interpreting the Results 23
1.6.5 Result Application 24
1.7 DATA MINING, BIG DATA, AND CLOUD COMPUTING 24
1.7.1 Hadoop 24
1.7.2 Cloud Computing 24
1.8 DATA MINING ETHICS 25
1.9 INTRINSIC VALUE AND CUSTOMER CHURN 26
1.10 CHAPTER SUMMARY 27
1.11 KEY TERMS 28

CHAPTER 2 ◾ Data Mining: A Closer Look 33


CHAPTER OBJECTIVES 33
2.1 DATA MINING STRATEGIES 34
2.1.1 Classification 34
2.1.2 Estimation 35
2.1.3 Prediction 36
2.1.4 Unsupervised Clustering 39
2.1.5 Market Basket Analysis 40
2.2 SUPERVISED DATA MINING TECHNIQUES 41
2.2.1 The Credit Card Promotion Database 41
2.2.2 Rule-Based Techniques 42
2.2.3 Neural Networks 44
2.2.4 Statistical Regression 46
2.3 ASSOCIATION RULES 47
2.4 CLUSTERING TECHNIQUES 48
2.5 EVALUATING PERFORMANCE 49
2.5.1 Evaluating Supervised Learner Models 50
2.5.2 Two-Class Error Analysis 52
2.5.3 Evaluating Numeric Output 53
2.5.4 Comparing Models by Measuring Lift 53
2.5.5 Unsupervised Model Evaluation 55
2.6 CHAPTER SUMMARY 56
2.7 KEY TERMS 57

CHAPTER 3 ◾ Basic Data Mining Techniques 63


CHAPTER OBJECTIVES 63
3.1 DECISION TREES 64
3.1.1 An Algorithm for Building Decision Trees 64
3.1.2 Decision Trees for the Credit Card Promotion Database 70
3.1.3 Decision Tree Rules 73
3.1.4 Other Methods for Building Decision Trees 73
3.1.5 General Considerations 74
3.2 A BASIC COVERING RULE ALGORITHM 74
3.3 GENERATING ASSOCIATION RULES 80
3.3.1 Confidence and Support 80
3.3.2 Mining Association Rules: An Example 82
3.3.3 General Considerations 84
3.4 THE K-MEANS ALGORITHM 85
3.4.1 An Example Using K-means 86
3.4.2 General Considerations 89
3.5 GENETIC LEARNING 90
3.5.1 Genetic Algorithms and Supervised Learning 91
3.5.2 General Considerations 95
3.6 CHOOSING A DATA MINING TECHNIQUE 95
3.7 CHAPTER SUMMARY 97
3.8 KEY TERMS 98

SECTION II Tools for Knowledge Discovery

CHAPTER 4 ◾ Weka—An Environment for Knowledge Discovery 105


CHAPTER OBJECTIVES 105
4.1 GETTING STARTED WITH WEKA 106
4.2 BUILDING DECISION TREES 109
4.3 GENERATING PRODUCTION RULES WITH PART 117
4.4 ATTRIBUTE SELECTION AND NEAREST NEIGHBOR CLASSIFICATION 122
4.5 ASSOCIATION RULES 127
4.6 COST/BENEFIT ANALYSIS (OPTIONAL) 131
4.7 UNSUPERVISED CLUSTERING WITH THE K-MEANS ALGORITHM 137
4.8 CHAPTER SUMMARY 141

CHAPTER 5 ◾ Knowledge Discovery with RapidMiner 145


CHAPTER OBJECTIVES 145
5.1 GETTING STARTED WITH RAPIDMINER 146
5.1.1 Installing RapidMiner 146
5.1.2 Navigating the Interface 146
5.1.3 A First Process Model 149
5.1.4 A Decision Tree for the Credit Card Promotion Database 156
5.1.5 Breakpoints 158
5.2 BUILDING DECISION TREES 159
5.2.1 Scenario 1: Using a Training and Test Set 160
5.2.2 Scenario 2: Adding a Subprocess 165
5.2.3 Scenario 3: Creating, Saving, and Applying the Final Model 167
5.2.3.1 Saving a Model to an Output File 167
5.2.3.2 Reading and Applying a Model 168
5.2.4 Scenario 4: Using Cross-Validation 168
5.3 GENERATING RULES 173
5.3.1 Scenario 1: Tree to Rules 173
5.3.2 Scenario 2: Rule Induction 176
5.3.3 Scenario 3: Subgroup Discovery 178
5.4 ASSOCIATION RULE LEARNING 181
5.4.1 Association Rules for the Credit Card Promotion Database 182
5.4.2 The Market Basket Analysis Template 183
5.5 UNSUPERVISED CLUSTERING WITH K-MEANS 187
5.6 ATTRIBUTE SELECTION AND NEAREST NEIGHBOR CLASSIFICATION 191
5.7 CHAPTER SUMMARY 194

CHAPTER 6 ◾ The Knowledge Discovery Process 199


CHAPTER OBJECTIVES 199
6.1 A PROCESS MODEL FOR KNOWLEDGE DISCOVERY 199
6.2 GOAL IDENTIFICATION 201
6.3 CREATING A TARGET DATA SET 202
6.4 DATA PREPROCESSING 203
6.4.1 Noisy Data 203
6.4.1.1 Locating Duplicate Records 204
6.4.1.2 Locating Incorrect Attribute Values 204
6.4.1.3 Data Smoothing 204
6.4.1.4 Detecting Outliers 205
6.4.2 Missing Data 207
6.5 DATA TRANSFORMATION 208
6.5.1 Data Normalization 208
6.5.2 Data Type Conversion 209
6.5.3 Attribute and Instance Selection 209
6.5.3.1 Wrapper and Filtering Techniques 210
6.5.3.2 More Attribute Selection Techniques 211
6.5.3.3 Genetic Learning for Attribute Selection 211
6.5.3.4 Creating Attributes 212
6.5.3.5 Instance Selection 213
6.6 DATA MINING 214
6.7 INTERPRETATION AND EVALUATION 214
6.8 TAKING ACTION 215
6.9 THE CRISP-DM PROCESS MODEL 215
6.10 CHAPTER SUMMARY 216
6.11 KEY TERMS 216

CHAPTER 7 ◾ Formal Evaluation Techniques 221


CHAPTER OBJECTIVES 221
7.1 WHAT SHOULD BE EVALUATED? 222
7.2 TOOLS FOR EVALUATION 223
7.2.1 Single-Valued Summary Statistics 224
7.2.2 The Normal Distribution 225
7.2.3 Normal Distributions and Sample Means 226
7.2.4 A Classical Model for Hypothesis Testing 228
7.3 COMPUTING TEST SET CONFIDENCE INTERVALS 230
7.4 COMPARING SUPERVISED LEARNER MODELS 232
7.4.1 Comparing the Performance of Two Models 233
7.4.2 Comparing the Performance of Two or More Models 234
7.5 UNSUPERVISED EVALUATION TECHNIQUES 235
7.5.1 Unsupervised Clustering for Supervised Evaluation 235
7.5.2 Supervised Evaluation for Unsupervised Clustering 235
7.5.3 Additional Methods for Evaluating an Unsupervised Clustering 236
7.6 EVALUATING SUPERVISED MODELS WITH NUMERIC OUTPUT 236
7.7 COMPARING MODELS WITH RAPIDMINER 238
7.8 ATTRIBUTE EVALUATION FOR MIXED DATA TYPES 241
7.9 PARETO LIFT CHARTS 244
7.10 CHAPTER SUMMARY 247
7.11 KEY TERMS 248

SECTION III Building Neural Networks

CHAPTER 8 ◾ Neural Networks 253


CHAPTER OBJECTIVES 253
8.1 FEED-FORWARD NEURAL NETWORKS 254
8.1.1 Neural Network Input Format 254
8.1.2 Neural Network Output Format 255
8.1.3 The Sigmoid Evaluation Function 256
8.2 NEURAL NETWORK TRAINING: A CONCEPTUAL VIEW 258
8.2.1 Supervised Learning with Feed-Forward Networks 258
8.2.1.1 Training a Neural Network: Backpropagation Learning 258
8.2.1.2 Training a Neural Network: Genetic Learning 259
8.2.2 Unsupervised Clustering with Self-Organizing Maps 259
8.3 NEURAL NETWORK EXPLANATION 260
8.4 GENERAL CONSIDERATIONS 262
8.5 NEURAL NETWORK TRAINING: A DETAILED VIEW 263
8.5.1 The Backpropagation Algorithm: An Example 263
8.5.2 Kohonen Self-Organizing Maps: An Example 266
8.6 CHAPTER SUMMARY 268
8.7 KEY TERMS 269

CHAPTER 9 ◾ Building Neural Networks with Weka 271


CHAPTER OBJECTIVES 271
9.1 DATA SETS FOR BACKPROPAGATION LEARNING 272
9.1.1 The Exclusive-OR Function 272
9.1.2 The Satellite Image Data Set 273
9.2 MODELING THE EXCLUSIVE-OR FUNCTION: NUMERIC OUTPUT 274
9.3 MODELING THE EXCLUSIVE-OR FUNCTION: CATEGORICAL OUTPUT 280
9.4 MINING SATELLITE IMAGE DATA 282
9.5 UNSUPERVISED NEURAL NET CLUSTERING 287
9.6 CHAPTER SUMMARY 288
9.7 KEY TERMS 289

CHAPTER 10 ◾ Building Neural Networks with RapidMiner 293


CHAPTER OBJECTIVES 293
10.1 MODELING THE EXCLUSIVE-OR FUNCTION 294
10.2 MINING SATELLITE IMAGE DATA 301
10.3 PREDICTING CUSTOMER CHURN 306
10.4 RAPIDMINER’S SELF-ORGANIZING MAP OPERATOR 311
10.5 CHAPTER SUMMARY 313

SECTION IV Advanced Data Mining Techniques

CHAPTER 11 ◾ Supervised Statistical Techniques 317


CHAPTER OBJECTIVES 317
11.1 NAÏVE BAYES CLASSIFIER 317
11.1.1 Naïve Bayes Classifier: An Example 318
11.1.2 Zero-Valued Attribute Counts 321
11.1.3 Missing Data 321
11.1.4 Numeric Data 322
11.1.5 Implementations of the Naïve Bayes Classifier 324
11.1.6 General Considerations 324
11.2 SUPPORT VECTOR MACHINES 324
11.2.1 Linearly Separable Classes 332
11.2.2 The Nonlinear Case 336
11.2.3 General Considerations 337
11.2.4 Implementations of Support Vector Machines 340
11.3 LINEAR REGRESSION ANALYSIS 340
11.3.1 Simple Linear Regression 344
11.3.2 Multiple Linear Regression 344
11.3.2.1 Linear Regression—Weka 344
11.3.2.2 Linear Regression—RapidMiner 345
11.4 REGRESSION TREES 349
11.5 LOGISTIC REGRESSION 350
11.5.1 Transforming the Linear Regression Model 350
11.5.2 The Logistic Regression Model 351
11.6 CHAPTER SUMMARY 352
11.7 KEY TERMS 352

CHAPTER 12 ◾ Unsupervised Clustering Techniques 357


CHAPTER OBJECTIVES 357
12.1 AGGLOMERATIVE CLUSTERING 358
12.1.1 Agglomerative Clustering: An Example 358
12.1.2 General Considerations 360
12.2 CONCEPTUAL CLUSTERING 360
12.2.1 Measuring Category Utility 361
12.2.2 Conceptual Clustering: An Example 362
12.2.3 General Considerations 364
12.3 EXPECTATION MAXIMIZATION 364
12.3.1 Implementations of the EM Algorithm 365
12.3.2 General Considerations 365
12.4 GENETIC ALGORITHMS AND UNSUPERVISED CLUSTERING 371
12.5 CHAPTER SUMMARY 374
12.6 KEY TERMS 374

CHAPTER 13 ◾ Specialized Techniques 377


CHAPTER OBJECTIVES 377
13.1 TIME-SERIES ANALYSIS 377
13.1.1 Stock Market Analytics 378
13.1.2 Time-Series Analysis—An Example 379
13.1.2.1 Creating the Target Data Set—Numeric Output 380
13.1.2.2 Data Preprocessing and Transformation 380
13.1.2.3 Creating the Target Data Set—Categorical Output 382
13.1.2.4 Mining the Data—RapidMiner 382
13.1.2.5 Mining the Data—Weka 387
13.1.2.6 Interpretation, Evaluation, and Action 390
13.1.3 General Considerations 390
13.2 MINING THE WEB 391
13.2.1 Web-Based Mining: General Issues 391
13.2.1.1 Identifying the Goal 391
13.2.2 Preparing the Data 392
13.2.2.1 Mining the Data 393
13.2.2.2 Interpreting and Evaluating Results 393
13.2.2.3 Taking Action 394
13.2.3 Data Mining for Website Evaluation 395
13.2.4 Data Mining for Personalization 395
13.2.5 Data Mining for Website Adaptation 396
13.2.6 PageRank and Link Analysis 396
13.2.7 Operators for Web-Based Mining 398
13.3 MINING TEXTUAL DATA 398
13.3.1 Analyzing Customer Reviews 399
13.4 TECHNIQUES FOR LARGE-SIZED, IMBALANCED, AND STREAMING DATA 404
13.4.1 Large-Sized Data 404
13.4.2 Dealing with Imbalanced Data 405
13.4.2.1 Methods for Addressing Rarity 406
13.4.2.2 Receiver Operating Characteristics Curves 406
13.4.3 Methods for Streaming Data 412
13.5 ENSEMBLE TECHNIQUES FOR IMPROVING PERFORMANCE 413
13.5.1 Bagging 413
13.5.2 Boosting 414
13.5.3 AdaBoost—An Example 414
13.6 CHAPTER SUMMARY 417
13.7 KEY TERMS 418

CHAPTER 14 ◾ The Data Warehouse 423


CHAPTER OBJECTIVES 423
14.1 OPERATIONAL DATABASES 424
14.1.1 Data Modeling and Normalization 424
14.1.2 The Relational Model 425
14.2 DATA WAREHOUSE DESIGN 426
14.2.1 Entering Data into the Warehouse 427
14.2.2 Structuring the Data Warehouse: The Star Schema 429
14.2.2.1 The Multidimensionality of the Star Schema 430
14.2.2.2 Additional Relational Schemas 431
14.2.3 Decision Support: Analyzing the Warehouse Data 432
14.3 ONLINE ANALYTICAL PROCESSING 434
14.3.1 OLAP: An Example 435
14.3.2 General Considerations 438
14.4 EXCEL PIVOT TABLES FOR DATA ANALYTICS 438
14.5 CHAPTER SUMMARY 445
14.6 KEY TERMS 446

APPENDIX A—SOFTWARE AND DATA SETS FOR DATA MINING, 451

APPENDIX B—STATISTICS FOR PERFORMANCE EVALUATION, 455

BIBLIOGRAPHY, 461

INDEX, 465
List of Figures

Figure 1.1 A decision tree for the data in Table 1.1. 10
Figure 1.2 Data mining versus expert systems. 17
Figure 1.3 A process model for data mining. 20
Figure 1.4 A perfect positive correlation (r = 1). 22
Figure 1.5 A perfect negative correlation (r = −1). 23
Figure 1.6 Intrinsic versus actual customer value. 26
Figure 2.1 A hierarchy of data mining strategies. 34
Figure 2.2 A fully connected multilayer neural network. 45
Figure 2.3 An unsupervised clustering of the credit card database. 49
Figure 2.4 Targeted versus mass mailing. 54
Figure 3.1 A partial decision tree with root node = income range. 68
Figure 3.2 A partial decision tree with root node = credit card insurance. 69
Figure 3.3 A partial decision tree with root node = age. 70
Figure 3.4 A three-node decision tree for the credit card database. 71
Figure 3.5 A two-node decision tree for the credit card database. 71
Figure 3.6 Domain statistics for the credit card promotion database. 76
Figure 3.7 Class statistics for life insurance promotion = yes. 77
Figure 3.8 Class statistics for life insurance promotion = no. 78
Figure 3.9 Statistics for life insurance promotion = yes after removing five instances. 79
Figure 3.10 A coordinate mapping of the data in Table 3.6. 86
Figure 3.11 A K-means clustering of the data in Table 3.6 (K = 2). 89
Figure 3.12 Supervised genetic learning. 91
Figure 3.13 A crossover operation. 94
Figure 4.1 Weka GUI Chooser. 107
Figure 4.2 Explorer four graphical user interfaces (GUIs). 107
Figure 4.3 Weka install folder. 108
Figure 4.4 Sample data sets. 108
Figure 4.5 Instances of the contact-lenses file. 109
Figure 4.6 Loading the contact-lenses data set. 110
Figure 4.7 Navigating the Explorer interface. 111
Figure 4.8 A partial list of attribute filters. 112
Figure 4.9 Command line call for J48. 112
Figure 4.10 Parameter setting options for J48. 113
Figure 4.11 Decision tree for the contact-lenses data set. 113
Figure 4.12 Weka’s tree visualizer. 114
Figure 4.13 Decision tree output for the contact-lenses data set. 115
Figure 4.14 Classifier output options. 116
Figure 4.15 Actual and predicted output. 116
Figure 4.16 Customer churn data. 118
Figure 4.17 A decision list for customer churn data. 118
Figure 4.18 Customer churn output generated by PART. 119
Figure 4.19 Loading the customer churn instances of unknown outcome. 120
Figure 4.20 Predicting customers likely to churn. 121
Figure 4.21 Nearest neighbor output for the spam data set. 123
Figure 4.22 Weka’s attribute selection filter. 124
Figure 4.23 Options for the attribute selection filter. 124
Figure 4.24 Parameter settings for ranker. 125
Figure 4.25 Most predictive attributes for the spam data set. 126
Figure 4.26 IBk output after removing the 10 least predictive attributes. 126
Figure 4.27 Association rules for the contact-lenses data set. 128
Figure 4.28 Parameters for the Apriori algorithm. 129
Figure 4.29 The supermarket data set. 130
Figure 4.30 Instances of the supermarket data set. 130
Figure 4.31 Ten association rules for the supermarket data set. 131
Figure 4.32 A J48 classification of the credit card screening data set. 132
Figure 4.33 Invoking a cost/benefit analysis. 133
Figure 4.34 Cost/benefit output for the credit card screening data set. 133
Figure 4.35 Cost/benefit analysis set to match J48 classifier output. 134
Figure 4.36 Invoking a cost/benefit analysis. 135
Figure 4.37 Minimizing total cost. 135
Figure 4.38 Cutoff scores for credit card application acceptance. 136
Figure 4.39 Classes to clusters evaluation for SimpleKMeans. 137
Figure 4.40 Include standard deviation values for SimpleKMeans. 138
Figure 4.41 Classes to clusters output. 139
Figure 4.42 Partial list of attribute values for the K-means clustering in Figure 4.41. 140
Figure 4.43 Additional attribute values for the SimpleKMeans clustering in Figure 4.41. 140
Figure 5.1 An introduction to RapidMiner. 147
Figure 5.2 Creating a new blank process. 147
Figure 5.3 A new blank process with helpful pointers. 148
Figure 5.4 Creating and saving a process. 150
Figure 5.5 Importing the credit card promotion database. 150
Figure 5.6 Selecting the cells to import. 151
Figure 5.7 A list of allowable data types. 152
Figure 5.8 Changing the role of Life Ins Promo. 152
Figure 5.9 Storing a file in the data folder. 153
Figure 5.10 The credit card promotion database. 153
Figure 5.11 A successful file import. 154
Figure 5.12 Connecting the credit card promotion database to an output port. 154
Figure 5.13 Summary statistics for the credit card promotion database. 155
Figure 5.14 A bar graph for income range. 155
Figure 5.15 A scatterplot comparing age and life insurance promotion. 156
Figure 5.16 A decision tree process model. 157
Figure 5.17 A decision tree for the credit card promotion database. 158
Figure 5.18 A decision tree in descriptive form. 158
Figure 5.19 A list of operator options. 159
Figure 5.20 Customer churn—A training and test set scenario. 160
Figure 5.21 Removing instances of unknown outcome from the churn data set. 161
Figure 5.22 Partitioning the customer churn data. 162
Figure 5.23 The customer churn data set. 163
Figure 5.24 Filter Examples has removed all instances of unknown outcome. 163
Figure 5.25 A decision tree for the customer churn data set. 164
Figure 5.26 Output of the Apply Model operator. 164
Figure 5.27 A performance vector for the customer churn data set. 165
Figure 5.28 Adding a subprocess to the main process window. 166
Figure 5.29 A subprocess for data preprocessing. 167
Figure 5.30 Creating and saving a decision tree model. 168
Figure 5.31 Reading and applying a saved model. 169
Figure 5.32 An Excel file stores model predictions. 169
Figure 5.33 Testing a model using cross-validation. 170
Figure 5.34 A subprocess to read and filter customer churn data. 171
Figure 5.35 Nested subprocesses for cross-validation. 171
Figure 5.36 Performance vector for a decision tree tested using cross-validation. 172
Figure 5.37 Subprocess for the Tree to Rules operator. 174
Figure 5.38 Building a model with the Tree to Rules operator. 174
Figure 5.39 Rules generated by the Tree to Rules operator. 175
Figure 5.40 Performance vector for the customer churn data set. 175
Figure 5.41 A process design for rule induction. 176
Figure 5.42 Adding the Discretize by Binning operator. 177
Figure 5.43 Covering rules for customer churn data. 177
Figure 5.44 Performance vector for the covering rules of Figure 5.43. 178
Figure 5.45 Process design for subgroup discovery. 179
Figure 5.46 Subprocess design for subgroup discovery. 179
Figure 5.47 Rules generated by the Subgroup Discovery operator. 180
Figure 5.48 Ten rules identifying likely churn candidates. 181
Figure 5.49 Generating association rules for the credit card promotion database. 182
Figure 5.50 Preparing data for association rule generation. 183
Figure 5.51 Interface for listing association rules. 184
Figure 5.52 Association rules for the credit card promotion database. 184
Figure 5.53 Market basket analysis template. 185
Figure 5.54 The pivot operator rotates the example set. 186
Figure 5.55 Association rules for the Market Basket Analysis template. 186
Figure 5.56 Process design for clustering gamma-ray burst data. 188
Figure 5.57 A partial clustering of gamma-ray burst data. 189
Figure 5.58 Three clusters of gamma-ray burst data. 189
Figure 5.59 Decision tree illustrating a gamma-ray burst clustering. 190
Figure 5.60 A descriptive form of a decision tree showing a clustering of gamma-ray burst data. 190
Figure 5.61 Benchmark performance for nearest neighbor classification. 192
Figure 5.62 Main process design for nearest neighbor classification. 192
Figure 5.63 Subprocess for nearest neighbor classification. 193
Figure 5.64 Forward selection subprocess for nearest neighbor classification. 193
Figure 5.65 Performance vector when forward selection is used for choosing attributes. 194
Figure 5.66 Unsupervised clustering for attribute evaluation. 197
Figure 6.1 A seven-step KDD process model. 200
Figure 6.2 The Acme credit card database. 203
Figure 6.3 A process model for detecting outliers. 205
Figure 6.4 Two outlier instances from the diabetes patient data set. 206
Figure 6.5 Ten outlier instances from the diabetes patient data set. 207
Figure 7.1 Components for supervised learning. 222
Figure 7.2 A normal distribution. 225
Figure 7.3 Random samples from a population of 10 elements. 226
Figure 7.4 A process model for comparing three competing models. 239
Figure 7.5 Subprocess for comparing three competing models. 240
Figure 7.6 Cross-validation test for a decision tree with maximum depth = 5. 240
Figure 7.7 A matrix of t-test scores. 241
Figure 7.8 ANOVA comparing three competing models. 241
Figure 7.9 ANOVA operators for comparing nominal and numeric attributes. 242
Figure 7.10 The grouped ANOVA operator comparing class and maximum heart rate. 243
Figure 7.11 The ANOVA matrix operator for the cardiology patient data set. 243
Figure 7.12 A process model for creating a lift chart. 244
Figure 7.13 Preprocessing the customer churn data set. 245
Figure 7.14 Output of the Apply Model operator for the customer churn data set. 245
Figure 7.15 Performance vector for customer churn. 246
Figure 7.16 A Pareto lift chart for customer churn. 247
Figure 8.1 A fully connected feed-forward neural network. 254
Figure 8.2 The sigmoid evaluation function. 257
Figure 8.3 A 3 × 3 Kohonen network with two input-layer nodes. 260
Figure 8.4 Connections for two output-layer nodes. 266
Figure 9.1 Graph of the XOR function. 272
Figure 9.2 XOR training data. 273
Figure 9.3 Satellite image data. 274
Figure 9.4 Weka’s four graphical user interfaces (GUIs) for XOR training. 275
Figure 9.5 Backpropagation learning parameters. 276
Figure 9.6 Architecture for the XOR function. 278
Figure 9.7 XOR training output. 278
Figure 9.8 Network architecture with associated connection weights. 279
Figure 9.9 XOR network architecture without a hidden layer. 280
Figure 9.10 Confusion matrix for XOR without a hidden layer. 281
Figure 9.11 XOR with hidden layer and categorical output. 281
Figure 9.12 XOR confusion matrix and categorical output. 282
Figure 9.13 Satellite image data network architecture. 284
Figure 9.14 Confusion matrix for satellite image data. 284
Figure 9.15 Updated class assignment for instances 78 through 94 of the satellite image data set. 286
Figure 9.16 Initial classification for pixel instances 78 through 94 of the satellite image data set. 286
Figure 9.17 Parameter settings for Weka’s SelfOrganizingMap. 287
Figure 9.18 Applying Weka’s SelfOrganizingMap to the diabetes data set. 288
Figure 10.1 Statistics for the XOR function. 295
Figure 10.2 Main process for learning the XOR function. 295
Figure 10.3 Default settings for the hidden layers parameter. 296
Figure 10.4 A single hidden layer of five nodes. 297
Figure 10.5 Performance parameters for neural network learning. 298
Figure 10.6 Neural network architecture for the XOR function. 298
Figure 10.7 Hidden-to-output layer connection weights. 299
Figure 10.8 Prediction confidence values for the XOR function. 299
Figure 10.9 Performance vector for the XOR function. 300
Figure 10.10 Absolute error value for the XOR function. 300
Figure 10.11 Attribute declarations for the satellite image data set. 302
Figure 10.12 Main process for mining the satellite image data set. 302
Figure 10.13 Subprocesses for satellite image data set. 303
Figure 10.14 Network architecture for the satellite image data set. 304
Figure 10.15 Performance vector for the satellite image data set. 304
Figure 10.16 Removing correlated attributes from the satellite image data set. 305
Figure 10.17 Green and red have been removed from the satellite image data set. 305
Figure 10.18 Correlation matrix for the satellite image data set. 306
Figure 10.19 Neural network model for predicting customer churn. 307
Figure 10.20 Preprocessing the customer churn data. 308
Figure 10.21 Cross-validation subprocess for customer churn. 308
Figure 10.22 Performance vector for customer churn. 309
Figure 10.23 Process for creating and saving a neural network model. 309
Figure 10.24 Process model for reading and applying a neural network model. 310
Figure 10.25 Neural network output for predicting customer churn. 310
Figure 10.26 SOM process model for the cardiology patient data set. 312
Figure 10.27 Clustered instances of the cardiology patient data set. 312
Figure 11.1 RapidMiner’s naïve Bayes operator. 325
Figure 11.2 Subprocess for applying naïve Bayes to customer churn data. 326
Figure 11.3 Naïve Bayes Distribution Table for customer churn data. 326
Figure 11.4 Naïve Bayes performance vector for customer churn data. 327
Figure 11.5 Life insurance promotion by gender. 328
Figure 11.6 Naïve Bayes model with output attribute = LifeInsPromo. 329
Figure 11.7 Predictions for the life insurance promotion. 329
Figure 11.8 Hyperplanes separating the circle and star classes. 330
Figure 11.9 Hyperplanes passing through their respective support vectors. 331
Figure 11.10 Maximal margin hyperplane separating the star and circle classes. 335
Figure 11.11 Loading the nine instances of Figure 11.8 into the Explorer. 338
Figure 11.12 Invoking SMO model. 339
Figure 11.13 Disabling data normalization/standardization. 339
Figure 11.14 The SMO-created MMH for the data shown in Figure 11.8. 340
Figure 11.15 Applying mySVM to the cardiology patient data set. 341
Figure 11.16 Normalized cardiology patient data. 342
Figure 11.17 Equation of the MMH for the cardiology patient data set. 342
Figure 11.18 Actual and predicted output for the cardiology patient data. 343
Figure 11.19 Performance vector for the cardiology patient data. 343
Figure 11.20 A linear regression model for the instances of Figure 11.8. 345
Figure 11.21 Main process window for applying RapidMiner’s linear regression operator to the gamma-ray burst data set. 346
Figure 11.22 Subprocess windows for the gamma-ray burst experiment. 346
Figure 11.23 Linear regression—actual and predicted output for the gamma-ray burst data set. 347
Figure 11.24 Summary statistics and the linear regression equation for the gamma-ray burst data set. 347
Figure 11.25 Scatterplot diagram showing the relationship between t90 and t50. 348
Figure 11.26 Performance vector resulting from the application of linear regression to the gamma-ray burst data set. 348
Figure 11.27 A generic model tree. 349
Figure 11.28 The logistic regression equation. 351
Figure 12.1 A Cobweb-created hierarchy. 363
Figure 12.2 Applying EM to the gamma-ray burst data set. 366
Figure 12.3 Removing correlated attributes from the gamma-ray burst data set. 367
Figure 12.4 An EM clustering of the gamma-ray burst data set. 367
Figure 12.5 Summary statistics for an EM clustering of the gamma-ray burst data set. 368
Figure 12.6 Decision tree representing a clustering of the gamma-ray burst data set. 368
Figure 12.7 The decision tree of Figure 12.6 in descriptive form. 369
Figure 12.8 Classes of the sensor data set. 370
Figure 12.9 Generic object editor allows us to specify the number of clusters. 370
Figure 12.10 Classes to clusters summary statistics. 371
Figure 12.11 Unsupervised genetic clustering. 372
Figure 13.1 A process model for extracting historical market data. 380
Figure 13.2 Historical data for XIV. 381
Figure 13.3 Time-series data with numeric output. 382
Figure 13.4 Time-series data with categorical output. 383
Figure 13.5 Time-series data for processing with RapidMiner. 383
Figure 13.6 A 3-month price chart for XIV. 384
Figure 13.7 A process model for time-series analysis with categorical output. 385
Figure 13.8 Predictions and confidence scores for time-series analysis. 385
Figure 13.9 Performance vector—time-series analysis for XIV. 386
Figure 13.10 Predicting the next-day closing price of XIV. 386
Figure 13.11 Time-series data formatted for Weka—categorical output. 387
Figure 13.12 Time-series data formatted for Weka—numeric output. 388
Figure 13.13 Time-series analysis with categorical output. 388
Figure 13.14 Time-series analysis with numeric output. 389
Figure 13.15 Cluster analysis using time-series data. 389
Figure 13.16 A generic Web usage data mining model. 392
Figure 13.17 Creating usage profiles from session data. 395
Figure 13.18 Hypertext link recommendations from usage profiles. 396
Figure 13.19 A page link structure. 397
Figure 13.20 A main process model for mining textual data. 401
Figure 13.21 A template to enter folder names used for textual data. 401
Figure 13.22 Class and folder names containing textual data. 402
Figure 13.23 Subprocess for tokenizing and stemming textual data. 402
Figure 13.24 A tokenized positive evaluation. 403
Figure 13.25 A tokenized and stemmed positive evaluation. 403
Figure 13.26 Textual data reduced to two dimensions. 403
Figure 13.27 Rules defining the three product evaluation classes. 404
Figure 13.28 An ROC graph for four competing models. 407
Figure 13.29 The PART algorithm applied to the spam data set. 408
Figure 13.30 An ROC curve created by applying PART to the spam data set. 409
Figure 13.31 Locating the true- and false-positive rate position in the ROC curve. 410
Figure 13.32 Confidence scores for predicted values with the spam data set. 410
Figure 13.33 Sorted confidence scores for the spam data set. 411
Figure 13.34 A main process for testing the AdaBoost operator. 415
Figure 13.35 Subprocess using a decision tree without AdaBoost. 415
Figure 13.36 Subprocess using AdaBoost, which builds several decision trees. 416
Figure 13.37 T-test results for testing AdaBoost. 416
Figure 13.38 Results of the ANOVA with the AdaBoost operator. 417
Figure 14.1 A simple entity-relationship diagram. 424
Figure 14.2 A data warehouse process model. 428
Figure 14.3 A star schema for credit card purchases. 429
Figure 14.4 Dimensions of the fact table shown in Figure 14.3. 431
Figure 14.5 A constellation schema for credit card purchases and promotions. 433
Figure 14.6 A multidimensional cube for credit card purchases. 435
Figure 14.7 A concept hierarchy for location. 436
Figure 14.8 Rolling up from months to quarters. 437
Figure 14.9 Creating a pivot table. 439
Figure 14.10 A blank pivot table. 440
Figure 14.11 A comparison of credit card insurance and income range. 440
Figure 14.12 A chart comparing credit card insurance and income range. 441
Figure 14.13 A credit card promotion cube. 442
Figure 14.14 A pivot table for the cube shown in Figure 14.13. 442
Figure 14.15 Pivot table position corresponding to the highlighted cell in Figure 14.13. 443
Figure 14.16 Drilling down into the cell highlighted in Figure 14.15. 444
Figure 14.17 Highlighting female customers with a slice operation. 444
Figure 14.18 A second approach for highlighting female customers. 445
Figure A.1 A successful installation. 452
Figure A.2 Locating and installing a package. 452
Figure A.3 List of installed packages. 452
List of Tables

Table 1.1 Hypothetical training data for disease diagnosis 9
Table 1.2 Data instances with an unknown classification 10
Table 1.3 Acme Investors Incorporated 12
Table 2.1 Cardiology patient data 37
Table 2.2 Typical and atypical instances from the cardiology domain 38
Table 2.3 Credit card promotion database 41
Table 2.4 Neural network training: actual and computed output 46
Table 2.5 A three-class confusion matrix 51
Table 2.6 A simple confusion matrix 52
Table 2.7 Two confusion matrices each showing a 10% error rate 52
Table 2.8 Two confusion matrices: no model and an ideal model 54
Table 2.9 Two confusion matrices for alternative models with lift equal to 2.25 55
Table 3.1 The credit card promotion database 67
Table 3.2 Training data instances following the path in Figure 3.4 to credit card insurance = no 72
Table 3.3 A subset of the credit card promotion database 82
Table 3.4 Single-item sets 83
Table 3.5 Two-item sets 83
Table 3.6 K-means input values 86
Table 3.7 Several applications of the K-means algorithm (K = 2) 89
Table 3.8 An initial population for supervised genetic learning 92
Table 3.9 Training data for genetic learning 93
Table 3.10 A second-generation population 94
Table 6.1 Initial population for genetic attribute selection 212
Table 7.1 A confusion matrix for the null hypothesis 229
Table 7.2 Absolute and squared error (output attribute = life insurance promotion) 237
Table 8.1 Initial weight values for the neural network shown in Figure 8.1 257
Table 8.2 A population of weight elements for the network in Figure 8.1 259
Table 9.1 Exclusive-OR function 272
Table 11.1 Data for Bayes classifier 318
Table 11.2 Counts and probabilities for attribute gender 318
Table 11.3 Addition of attribute age to the Bayes classifier data set 323
Table 12.1 Five instances from the credit card promotion database 358
Table 12.2 Agglomerative clustering: first iteration 358
Table 12.3 Agglomerative clustering: second iteration 359
Table 12.4 Data for conceptual clustering 364
Table 12.5 Instances for unsupervised genetic learning 373
Table 12.6 A first-generation population for unsupervised clustering 373
Table 13.1 Daily closing price for XIV, January 4, 2016 to January 8, 2016 381
Table 13.2 Daily closing price for XIV, January 5, 2016 to January 11, 2016 381
Table 14.1 Relational table for vehicle-type 425
Table 14.2 Relational table for customer 426
Table 14.3 Join of Tables 14.1 and 14.2 426
Preface

Data mining is the process of finding interesting patterns in data. The objective of data
mining is to use discovered patterns to help explain current behavior or to predict future
outcomes. Several aspects of the data mining process can be studied. These include

• Data gathering and storage
• Data selection and preparation
• Model building and testing
• Interpreting and validating results
• Model application

A single book cannot concentrate on all areas of the data mining process. Although
we furnish some detail about all aspects of data mining and knowledge discovery, our
primary focus is on model building and testing, as well as on interpreting and
validating results.

OBJECTIVES
I wrote the text to facilitate the following student learning goals:

• Understand what data mining is and how data mining can be employed to solve real
problems
• Recognize whether a data mining solution is a feasible alternative for a specific
problem
• Step through the knowledge discovery process and write a report about the results of
a data mining session
• Know how to apply data mining software tools to solve real problems
• Apply basic statistical and nonstatistical techniques to evaluate the results of a data
mining session
• Recognize several data mining strategies and know when each strategy is appropriate
• Develop a comprehensive understanding of how several data mining techniques
build models to solve problems
• Develop a general awareness of the structure of a data warehouse and how a data
warehouse can be used
• Understand what online analytical processing (OLAP) is and how it can be applied
to analyze data

UPDATED CONTENT AND SOFTWARE CHANGES
The most obvious difference between the first and second editions of the text is the change
in data mining software. Here is a short list of the major changes seen with the second
edition:

• In Chapter 4, I introduce the Waikato Environment for Knowledge Analysis (Weka),
an easy-to-use, publicly available tool for data mining. Weka contains a wealth of
preprocessing and data mining techniques, graphical features, and visualization
capabilities.

• Chapter 5 is all about data mining using RapidMiner Studio, a powerful open-source
and code-free version of RapidMiner’s commercial product. RapidMiner uses a
drag-and-drop workflow paradigm for building models to solve complex problems.
RapidMiner’s intuitive user interface, visualization capabilities, and assortment of
operators for preprocessing and mining data are second to none.

• This edition covers what are considered to be the top 10 data mining algorithms
(Wu and Kumar, 2009). Nine of the algorithms are used in one or more tutorials.

• Tutorials have been added for attribute selection, dealing with imbalanced data, outlier
analysis, time-series analysis, and mining textual data.

• Over 90% of the tutorials are presented using both Weka and RapidMiner. This
allows readers maximum flexibility for their hands-on data mining experience.

• Selected new topics include

• A brief introduction to big data and data analytics

• Receiver operating characteristic (ROC) curves

• Methods for handling large-sized, streaming, and imbalanced data

• Extended coverage of textual data mining

• Added techniques for attribute and outlier analysis

DATA SETS FOR DATA MINING
All data sets used for tutorials, illustrations, and end-of-chapter exercises are described in
the text. The data sets come from several areas including business, health and medicine, and
science. The data sets together with screenshots in PowerPoint and PDF format showing
what you will see as you work through the tutorials can be downloaded from two locations:

• The CRC website: https://www.crcpress.com/Data-Mining-A-Tutorial-Based-Primer-Second-Edition/Roiger/p/book/9781498763974, under the Downloads tab
• https://krypton.mnsu.edu/~sa7379bt

INTENDED AUDIENCE
I developed most of the material for this book while teaching a one-semester data mining
course open to students majoring or minoring in business or computer science. In writing
this text, I directed my attention toward four groups of individuals:

• Educators in the areas of decision science, computer science, information systems,
and information technology who wish to teach a unit, workshop, or entire course on
data mining and knowledge discovery
• Students who want to learn about data mining and desire hands-on experience with
a data mining tool
• Business professionals who need to understand how data mining can be applied to
help solve their business problems
• Applied researchers who wish to add data mining methods to their problem-solving
and analytics tool kit

CHAPTER FEATURES
I take the approach that model building is both an art and a science best understood from
the perspective of learning by doing. My view is supported by several features found within
the pages of the text. The following is a partial list of these features.

• Simple, detailed examples. I remove much of the mystery surrounding data mining
by presenting simple, detailed examples of how the various data mining techniques
build their models. Because of its tutorial nature, the text is appropriate as a self-study
guide as well as a college-level textbook for a course about data mining and knowledge
discovery.
• Overall tutorial style. All examples in Chapters 4, 5, 9, and 10 are tutorials. Selected
sections in Chapters 6, 7, 11, 12, 13, and 14 offer easy-to-follow, step-by-step tutorials
for performing data analytics. These section tutorials are highlighted to set them apart
from the regular text.

• Data sets for data mining. A variety of data sets from business, medicine, and science
are ready for data mining.

• Key term definitions. Each chapter introduces several key terms. A list of definitions
for these terms is provided at the end of each chapter.

• End-of-chapter exercises. The end-of-chapter exercises reinforce the techniques
and concepts found within each chapter. The exercises are grouped into three
categories: review questions, data mining questions, and computational
questions.

• Review questions ask basic questions about the concepts and content found
within each chapter. The questions are designed to help determine whether the reader
understands the major points conveyed in each chapter.

• Data mining questions require the reader to use one or several data mining tools
to perform data mining sessions.

• Computational questions have a mathematical flavor in that they require the
reader to perform one or several calculations. Many of the computational questions
are appropriate for challenging the more advanced student.

CHAPTER CONTENT
The ordering of the chapters and the division of the book into separate parts are based
on several years of experience in teaching courses on data mining. Section I introduces
material that is fundamental to understanding the data mining process. The presentation
is informal and easy to follow. Basic data mining concepts, strategies, and techniques
are introduced. Students learn about the types of problems that can be solved
with data mining.
Once the basic concepts are understood, Section II provides the tools for knowledge
discovery with detailed tutorials taking you through the knowledge discovery process.
The fact that data preprocessing is fundamental to successful data mining is emphasized.
Also, special attention is given to formal data mining evaluation techniques.
Section III is all about neural networks. A conceptual and detailed presentation is offered
for feed-forward networks trained with backpropagation learning and self-organizing
maps for unsupervised clustering. Section III contains several tutorials for neural network
learning with Weka and RapidMiner.
Section IV focuses on several specialized techniques. Topics of current interest, such as
time-series analysis, textual data mining, imbalanced and streaming data, and Web-based
data mining, are described.

Section I: Data Mining Fundamentals

• Chapter 1 offers an overview of data analytics and all aspects of the data mining process.
Special emphasis is placed on helping the student determine when data mining
is an appropriate problem-solving strategy.
• Chapter 2 presents a synopsis of several common data mining strategies and techniques.
Basic methods for evaluating the outcome of a data mining session are described.
• Chapter 3 details a decision tree algorithm, the Apriori algorithm for producing association
rules, a covering rule algorithm, the K-means algorithm for unsupervised
clustering, and supervised genetic learning. Tools are provided to help determine
which data mining techniques should be used to solve specific problems.

Section II: Tools for Knowledge Discovery

• Chapter 4 presents a tutorial introduction to Weka’s Explorer. Several tutorials provide
a hands-on experience using the algorithms presented in the first three chapters.
• Chapter 5 introduces RapidMiner Studio 7, an open-source, code-free version of
RapidMiner’s commercial product. The chapter parallels the tutorials presented in
Chapter 4 for building, testing, saving, and applying models.
• Chapter 6 introduces the knowledge discovery in databases (KDD) process model
as a formal methodology for solving problems with data mining. This chapter offers
tutorials for outlier detection.
• Chapter 7 describes formal statistical and nonstatistical methods for evaluating the
outcome of a data mining session. The chapter illustrates how to create and read
Pareto lift charts and how to apply RapidMiner’s ANOVA, Grouped ANOVA, and
T-Test statistical operators for model evaluation.

Section III: Building Neural Networks

• Chapter 8 presents two popular neural network models. A detailed explanation of
neural network training is offered for the more technically inclined reader.
• Chapter 9 offers tutorials for applying Weka’s MultilayerPerceptron neural network
function for supervised learning and Weka’s SelfOrganizingMap for unsupervised
clustering.
• Chapter 10 presents tutorials on applying RapidMiner’s Neural Network operator for
supervised learning and its Self-Organizing Map operator for unsupervised clustering.

Section IV: Advanced Data Mining Techniques

• Chapter 11 details several supervised statistical techniques, including the naive Bayes
classifier, support vector machines, linear regression, logistic regression, regression
trees, and model trees. The chapter contains several examples and tutorials.
• Chapter 12 presents several unsupervised clustering techniques, including agglomerative
clustering, hierarchical conceptual clustering, and expectation maximization (EM).
Tutorials on using supervised learning for unsupervised cluster evaluation are presented.
• Chapter 13 introduces techniques for performing time-series analysis, Web-based
mining, and textual data mining. Methods for dealing with large-sized, imbalanced,
and streaming data are offered. Bagging and boosting are described as methods for
improving model performance. Tutorials and illustrations for time-series analysis,
textual data mining, and ensemble learning are presented. A detailed example using
receiver operating characteristic (ROC) curves is offered.
• Chapter 14 provides a gentle introduction to data warehouse design and OLAP. A
tutorial on using Excel pivot tables for data analysis is included.

INSTRUCTOR SUPPLEMENTS
The following supplements are provided to help the instructor organize lectures and write
examinations:

• PowerPoint slides. Each figure and table in the text is part of a PowerPoint presentation.
These slides are also offered in PDF format.
• A second set of slides containing the screenshots seen as you work through the
tutorials in Chapters 4 through 14.
• All RapidMiner processes used in the tutorials, demonstrations, and end-of-chapter
exercises are readily available together with simple installation instructions.
• Test questions. Several test questions are provided for each chapter.
• Answers to selected exercises. Answers are given for most of the end-of-chapter
exercises.
• Lesson planner. The lesson planner contains ideas for lecture format and points for
discussion. The planner also provides suggestions for using selected end-of-chapter
exercises in a laboratory setting.

Please note that these supplements are available to qualified instructors only. Contact
your CRC sales representative or get help by visiting https://www.crcpress.com/contactus
to access this material. Supplements will be updated as needed.

USING WEKA AND RAPIDMINER
Students are likely to benefit most by developing a working knowledge of both tools. This
is best accomplished by beginning the data mining experience with Weka’s
Explorer interface. The Explorer is easy to navigate and makes several of the more difficult
preprocessing tasks transparent to the user. Missing data are automatically handled
by most data mining algorithms, and data type conversions are automatic. The format
for model evaluation, be it a training/test set scenario or cross-validation, is implemented
with a simple click of the mouse. This transparency allows the beginning student to
immediately experience the data mining process with a minimum of frustration. The
Explorer is a great starting point, yet it still provides the resources for a complete data
mining experience.
Once students become familiar with the data mining process, they are ready to advance
to Chapter 5 and RapidMiner. RapidMiner Studio’s drag-and-drop workflow environment
gives students complete control over their model building experience. Among RapidMiner’s
features are an intuitive user interface, excellent graphics, and over 1500
operators for data visualization and preprocessing, data mining, and result evaluation.
RapidMiner also offers an interface for cloud computing and extensions for mining textual
data, financial data, and the Web, and a large user community is available to help answer
your data mining questions.
Here are a few suggestions for using both tools:

• Cover the following sections to gain enough knowledge to understand the tutorials
presented in later chapters.
• If Weka is your choice, at a minimum, work through Sections 4.1, 4.2, and 4.7 of
Chapter 4.
• If you are focusing on RapidMiner, cover at least Sections 5.1 and 5.2 of Chapter 5.
• Here is a summary of the tutorials given in Chapters 6, 7, 11, 12, 13, and 14.
• Chapter 6: RapidMiner is used to provide a tutorial on outlier analysis.
• Chapter 7: Tutorials are presented using RapidMiner’s T-Test and ANOVA operators
for comparing model performance.
• Chapter 11: Both tools are used for tutorials highlighting the naive Bayes classifier
and support vector machines.
• Chapter 12: RapidMiner and Weka are used to illustrate unsupervised clustering
with the EM (Expectation Maximization) algorithm.
• Chapter 13: Both RapidMiner and Weka are employed for time-series analysis.
RapidMiner is used for a tutorial on textual data mining. Weka is employed for
a tutorial on ROC curves. RapidMiner is used to give an example of ensemble
learning.
• Chapter 14: Tutorials are given for creating simple and multidimensional MS
Excel pivot tables.
• Chapter 9 is about neural networks using Weka. Chapter 10 employs RapidMiner
to cover the same material. There are advantages to examining at least some of the
material in both chapters. Weka’s neural network function is able to mine data having
a numeric output attribute, and RapidMiner’s self-organizing map operator can
perform dimensionality reduction as well as unsupervised clustering.

SUGGESTED COURSE OUTLINES
The text is appropriate for the undergraduate information systems or computer science
student. It can also provide assistance for the graduate student who desires a working
knowledge of data mining and knowledge discovery. The text is designed to be covered in
a single semester.

A Data Mining Course for Information Systems Majors or Minors
Cover Chapters 1, 2, and 3 in detail. However, Section 3.5 of Chapter 3 may be omitted or
lightly covered. Spend enough time on Chapters 4 and 5 for students to feel comfortable
working with the software tools.
If your students lack a course in basic statistics, Sections 7.1 through 7.7 can be lightly
covered. The tutorial material in Sections 7.8 through 7.10 is instructive. Section 8.5 can be
excluded, but cover all of either Chapter 9 or Chapter 10. Cover topics from Chapters 11
through 13 as appropriate for your class. For Chapter 13, all students need some exposure
to time-series analysis as well as Web-based and textual data mining.

A Data Mining Course for Undergraduate Computer Science Majors or Minors
Cover Chapters 1 through 13 in detail. Spend a day or two on the material in Chapter 14 to
provide students with a basic understanding of online analytical processing and data warehouse
design. Spend extra time covering material in Chapter 13. For a more intense course,
the material in Appendix B, “Statistics for Performance Evaluation,” can be covered as part
of the regular course.

A Data Mining Short Course
The undergraduate or graduate student interested in quickly developing a working knowledge
of data mining should devote time to Chapters 1, 2, and 3, and Chapter 4 or 5. A
working knowledge of neural networks can be obtained through the study of Chapter 8
(Sections 8.1 through 8.4) and Chapter 9 or Chapter 10.
Acknowledgments

I am indebted to my editor Randi Cohen for the confidence she placed in me and for allowing
me the freedom to make critical decisions about the content of the text. I am very grateful
to Dr. Mark Polczynski, whose constructive comments were particularly helpful
during revisions of the manuscript. Finally, I am most deeply indebted to my wife Suzanne
for her extreme patience, helpful comments, and consistent support.

Author

Richard J. Roiger, PhD, is a professor emeritus at Minnesota State University, Mankato,
where he taught and performed research in the Computer Information Science Department
for 27 years. Dr. Roiger earned his PhD degree in computer and information sciences at the
University of Minnesota. Dr. Roiger has presented conference papers and written several
journal articles about topics in data mining and knowledge discovery. Since retiring,
Dr. Roiger has continued to serve as a part-time faculty member teaching courses in data mining,
artificial intelligence, and research methods. He is a board member of the Retired
Education Association of Minnesota, where he serves as its financial advisor.

I
Data Mining Fundamentals

Another Random Scribd Document
with Unrelated Content
Prediction of coming events was practiced by the priests in North
America, as it was elsewhere. They persuaded the multitude, says
Charlevoix, that they suffered from ecstatic transports. During these
conditions, they said that their spirits gave them a large
acquaintance with remote things, and with the future (N. F., vol. iii.
p. 347). Moreover, they practiced magic, and with such effect that
Charlevoix felt himself compelled to ascribe their performances to
their alliance with the devil. They even pretended to be born in a
supernatural manner, and found believers ready to think that only by
some sort of enchantment and illusion had they formerly imagined
that they had come into the world like other people. When they
went into the state of ecstasy, they resembled the Pythoness on the
tripod; they assumed tones of voice and performed actions which
seemed beyond human capacity. On these occasions they suffered
so much that it was hard to induce them, even by handsome
payment, thus to yield themselves to the spirit. So often did they
prophesy truly, that Charlevoix can only resort again to his
hypothesis of a real intercourse between them and the "father of
seduction and of lies," who manifested his connection with them by
telling them the truth. Thus, a lady named Madame de Marson, by
no means an "esprit faible" (a feeble mind), was anxious about her husband, who
was commanding at a French outpost in Acadia, and who had stayed
away beyond the time fixed for his return. A native woman, having
ascertained the reason of her trouble, told her not to be distressed,
for that her husband would return on a certain day at a certain hour,
wearing a grey hat. Seeing that the lady did not believe in her, she
returned on the day and at the hour named, and asked her if she
would not come to meet her husband. After much pressing, she
induced the lady to accompany her to the bank of the river. Scarcely
had they arrived, when M. de Marson appeared in a canoe, wearing
a grey hat upon his head. The writer was informed of this fact by
Madame de Marson's son-in-law, at that time Governor-General of
the French dominions in America, who had heard it from herself (N.
F., vol. iii. p. 359-363). The priests of the Tartars are also their
diviners. They predict eclipses, and announce lucky and unlucky
days for all sorts of business (Bergeron, Voyage de Rubruquis, ch.
47).
Among the Buddhist priesthood of Thibet, there is a class of
Lamas who are astrologers, distinguished by a peculiar dress, and
making it their business to tell fortunes, exorcise evil spirits, and so
forth. The astrologers "are considered to have intercourse with
Sadag," a spirit who is supposed to be "lord of the ground," in which
bodies are interred, and who, along with other spirits, requires to be
pacified by charms and rites known only to these priests. To prevent
them from injuring the dead, the relations offer a price in cattle or
money to Sadag; and the astrologers, when satisfied with the
amount, undertake the necessary conjuration (B. T., pp. 156, 271).
In the Old Testament, this class of unofficial priests is mentioned
with the reprobation inspired by rivalry. The Hebrew legislator is at
one with the Roman Senate in his desire to expel them from the
land. "There shall not be found among you any one that ... useth
divination, or an observer of times, or an enchanter, or a witch, or a
charmer, or a consultor with familiar spirits, or a wizard, or a
necromancer. For all that do these things are an abomination unto
the Lord: and because of these abominations the Lord thy God doth
drive them out from before thee" (Deut. xviii. 10-12). The very
prohibition evinces the existence of the objects against whom it is
aimed; and proves that, along with the recognized worship of
Jehovah, there existed an unrecognized resort to practices which the
sterner adherents of that worship would not permit.
In addition to their claim to be in possession of special means of
ascertaining the occult causes of phenomena (as in illness), and of
special contrivances for penetrating the future (as in astrology or
fortune-telling), priesthoods pretend to a more direct inspiration
from on high, qualifying them either to announce the will of their
god on exceptional occasions, or to intimate his purpose in matters
of more ordinary occurrence. This inspiration was granted to the
native North American priests at the critical age of puberty. "It was
revealed to its possessor by the character of the visions he perceived
at the ordeal he passed through on arriving at puberty; and by the
northern nations was said to be the manifestation of a more potent
personal spirit than ordinary. It was not a faculty, but an inspiration;
not an inborn strength, but a spiritual gift" (M. N. W., p. 279). So in
India; among the several meanings of the word Brahman, is that of
a person "elected by special divine favor to receive the gift of
inspiration" (O. S. T., vol. i. p. 259). The missionary Turner, who has
an eye for parallels, observes, among other just reflections, that "the
way in which the Samoan priests declared that the gods spoke by
them, strikingly reminds us of the mode by which God of old made
known his will to man by the Hebrew prophets" (N. Y., p. 349).
Although the Levites were said to be the Lord's, and to have been
hallowed by him instead of all the first-born of Israel, yet it does not
appear that they were in general endowed with any high order of
inspiration. The high-priest no doubt received communications from
God by the Urim and Thummim. Priests were also the judges whom
the Lord chose, and whose sentence in court was to be obeyed on
penalty of death; but the inspiration that was fitted to guide the
Israelites was supplied not so much by them as by the prophets, a
kind of supplementary priesthood of which the members, sometimes
priests, sometimes consecrated by other prophets, were as a rule
unconsecrated, deriving their appointment directly from Jehovah.
While, therefore, it was attained in a somewhat unusual way, the
general need of an inspired order was supplied no less perfectly
among the Israelites than elsewhere. Christian priests enjoy two
kinds of inspiration. In the first place, they are inspired specially
when assembled in general councils, to declare the truth in matters
of doctrine, or in other words, to issue supplementary revelations; in
the second place, they are inspired generally to remit or retain
offenses, their sentence being—according to the common doctrine of
Catholics and Episcopalian Protestants—always ratified in the Court
above.
Consistently with this exalted conception of their authority, priestly
orders threaten punishment to offenders, and announce the future
destiny of souls. Thus the Mexican priests warned their penitents
after confession not to fall again into sin, holding out the prospect of
the torments of hell if they should neglect the admonition (A. M.,
vol. v. p. 370). The priests in some parts of Africa know the fate of
each soul after death, and can say whether it has gone to God or to
the evil spirit (G. d. M., p. 335).
Sometimes the priests are held to be protected against injury by
the especial care of heaven. To take away a Brahman's wife is an
offense involving terrible calamities, while kings who restore her to
the Brahman enjoy "the abundance of the earth" (O. S. T., vol. i. p.
257). A king who should eat a Brahman's cow is warned in solemn
language of the dreadful consequences of such conduct, both in this
world and the next (Ibid., vol. i. p. 285). The sacred volumes declare
that "whenever a king, fancying himself mighty, seeks to devour a
Brahman, that kingdom is broken up, in which a Brahman is
oppressed" (Ibid., vol. i. p. 287). "No one who has eaten a
Brahman's cow continues to watch (i.e., to rule) over a country." The
Indian gods, moreover, "do not eat the food offered by a king who
has no ... Purohita," or domestic chaplain (A. B., p. 528). The
murder of a king who had honored and enriched the Buddhist
priesthood, is said to have entailed the destruction of the power and
strength of the kingdom of Thibet, and to have extinguished the
happiness and welfare of its people (G. O. M., p. 362). And Jewish
history affords abundant instances of the manner in which the
success or glory of the rulers was connected, by the sacerdotal class,
with the respect shown towards themselves as the ministers of
Jehovah, and with the rigor evinced in persecuting or putting down
the ministers of every other creed. That the same bias has been
betrayed by the Christian priesthood and their adherents in the
interpretation of history needs no proof.
The presence of a priest or priests at important rites is held to be
indispensable by all religions. With the negroes visited by Oldendorp,
the priest was in requisition at burials; for he only could help the
soul to get to God, and keep off the evil spirit who would seek to
obtain possession of it (G. d. M., p. 327). "For most of the
ceremonies" (in Thibet) "the performance by a Lama is considered
indispensable to its due effect; and even where this is not so, the
efficacy of the rite is increased by the Lama's assistance" (B. T., p.
247). Much the same thing may be said here. For certain
ceremonies, such as confirmation, the administration of the
sacrament, the conduct of divine service on Sundays, the priest is a
necessary official. For others, such as marriage, the majority of the
people prefer to employ him, and no doubt believe that "the efficacy
of the rite is increased" by the fact that he reads the words of the
service. Nor is this surprising when we consider that, until within
very recent times, no legitimate child could be produced in England
without the assistance of a priest.
Not only is the ecclesiastical caste required to render religious rites
acceptable to the deity, but they are often credited with the
ability to modify the course of nature. In Tanna, one of the
Fiji group, "there are rain-makers and thunder-makers, and fly and
musquito makers, and a host of other 'sacred men;'" and in another
island "there is a rain-making class of priests" (N. Y., pp. 89, 428). In
Christian countries all priests are rain-makers, the reading of prayers
for fine or wet weather being a portion of their established duties.
Naturally, the members of a class whose functions are of this high
value to the community enjoy great power, are regarded as
extremely sacred, and above all, are well rewarded. First, as to the
power they enjoy. This is accorded to them alike by savage tribes
and by cultivated Europeans. According to Brinton, all North
American tribes "appear to have been controlled" by secret societies
of priests. "Withal," says the same authority, "there was no class of
persons who so widely and deeply influenced the culture, and
shaped the destiny of the Indian tribes, as their priests" (M. N. W., p.
285). Over the negroes of the Caribbean Islands the priests and
priestesses exercised an almost unlimited dominion, being regarded
with the greatest reverence. No negro would have ventured to
transgress the arrangements made by a priest (G. d. M., p. 327). On
the coast of Guinea there exists, or existed, an institution by which
certain women became priestesses; and such women, even though
slaves before, enjoyed, on receiving this dignity, a high position and
even exercised absolute authority precisely in the quarter where it
must have been sweetest to their minds, namely, over their
husbands (D. C. G., p. 363). Writing of the Talapoins in Siam,
Gervaise says, that they are exempted from all public charges; they
salute nobody, while everybody prostrates himself before them; they
are maintained at the public expense, and so forth (H. N. S.,
troisième partie, chs. 5, 6). Of the enormous power wielded by the
clerical order in Europe, especially during the Middle Ages, it is
unnecessary to speak. The humiliation of Theodosius by Ambrose
was one of the most conspicuous, as it was one of the most
beneficent, exercises of their extensive rights.
Secondly, the sanctity attached to their persons is usually
considerable, and may often, to ambitious minds, afford a large
compensation for the loss (if such be required) of some kinds of
secular enjoyment. The African priestesses just mentioned are "as
much respected as the priest, or rather more," and call themselves
by the appellation of "God's children." When certain Buddhist
ecclesiastics were executed for rebellion in Ceylon, the utmost
astonishment was expressed by the people at the temerity of the
king in so treating "such holy and reverend persons. And none
heretofore," adds the reporter of the fact, "have been so served;
being reputed and called sons of Boddon" (H. R. C., p. 75), or
Buddha; a title exactly corresponding to that of God's children
bestowed upon the priestesses. In Siam the "Talapoins," or priests,
are of two kinds: secular, living in the world; and regular, living in
the forest without intercourse with men. There is no limit to the
veneration given by the Siamese to these last, whom they look upon
as demigods (H. N. S., troisième partie, p. 184). "The Brahman
caste," according to the sacred books of the Hindus, "is sprung from
the gods" (O. S. T., vol. i. p. 21); and the exceptional honor always
accorded to them is in harmony with this theory of their origin. The
title "Reverend" (one to be revered), given to the clergy in Europe,
implies the existence, at least originally, of a similar sentiment of
respect.
Lastly, the services of priests are generally well rewarded, and
they themselves take every care to encourage liberality towards their
order. Payment is made to them either in the shape of direct
remuneration, or in that of exceptional pecuniary privileges, or in
that of exemptions from burdens. Direct remuneration may be, and
often is, given in the shape of a fixed portion abstracted from the
property of the laity for the benefit of the clergy. Such are the tithes
bestowed by law upon the latter among the Jews, the Parsees, and
the Christians. Or, direct remuneration may consist in fees for
services rendered, and in voluntary gifts. Such fees and gifts are
always represented by the priesthood as highly advantageous to the
givers. If the relatives of a deceased Parsee do not give the priest
who officiates at the funeral four new robes, the dead will appear
naked before the throne of God at the resurrection, and will be put
to shame before the whole assembly (Av., vol. ii. p. xli.; iii. p. xliv).
Moreover, those Parsees who wish to live happily, and have children
who will do them honor, must pay four priests, who during three
days and three nights perform the Yasna for them (Z. A., vol. ii. p.
564). In Thibet there is great merit in consecrating a domestic
animal to a certain god, the animal being after a certain time
"delivered to the Lamas, who may eat it" (B. T., p. 158). Giving alms
to the monks is a duty most sedulously inculcated by Buddhism, and
the Buddhist writings abound in illustrations of the advantages
derived from the practice. Similar benefits accrue to the clergy from
the custom, prevailing in Ceylon, of making offerings in the temples
for recovery from sickness; for when the Singhalese have left their
gift on the altar, "the priest presents it with all due ceremony to the
god; and after its purpose is thus served, very prudently converts it
to his own use" (A. I. C., p. 205). Of the Levites it is solemnly
declared in Deuteronomy that they have "no part nor inheritance
with Israel," and that "the Lord is their inheritance." But "the Lord" is
soon seen to be a very substantial inheritance indeed. From those
that offer an ox or a sheep the priests are to receive "the shoulder,
the two cheeks, and the maw;" while the first-fruits of corn, wine,
and oil, and the first of the sheep's fleeces are to be given to them
(Deut. xviii. 1-5). Moreover, giving to the priest is declared to be the
same thing as giving to the Lord (Num. v. 8). A similar notion,
always fostered by ecclesiastical influence, has led to the vast
endowments bestowed by pious monarchs and wealthy individuals
upon the Christian clergy.
Occasionally, the priests enjoy exemptions from the taxes, or other
burdens levied upon ordinary people. A singular instance of this is
found in the privilege of the Parsee priests, of not paying their
doctors (J. A., vol. ii. p. 555). Large immunities used to be enjoyed
by ecclesiastics among ourselves, especially that of exemption from
the jurisdiction of the ordinary courts of law.
While the life of a priest often entails certain privations, he is
nevertheless frequently sustained by the thought that there is merit
in the sacrifices he makes. Thus, it is held by a Buddhist authority,
that the merit obtained by entering the spiritual order is very great;
and that his merit is immeasurable who either permits a son, a
daughter, or a slave, to enter it, or enters it himself (W. u. T., p.
107).
Priesthoods may either be hereditary or selected. The Brahmins in
India, and the Levites in Judæa, are remarkable types of hereditary,
the Buddhist and the Christian clergy of selected, sacerdotal orders.
Curious modifications of the hereditary principle were found among
the American Indians. Thus, "among the Nez Percés of Oregon," the
priestly office "was transmitted in one family from father to son and
daughter, but, always with the proviso that the children at the proper
age reported dreams of a satisfactory character." The Shawnees
"confined it to one totem:" but just as the Hebrew prophets need
not be Levites, "the greatest of their prophets ... was not a member
of this clan." The Cherokees "had one family set apart for the
priestly office," and when they "abused their birthright" and were all
massacred, another family took their places. With another tribe, the
Choctaws, the office of high-priest remained in one family, passing
from father to son; "and the very influential piaches of the Carib
tribes very generally transmitted their rank and position to their
children." A more important case of hereditary priesthood is that of
the Incas of Peru, who monopolized the highest offices both in
Church and State. "In ancient Anahuac" there existed a double
system of inheritance and selection. The priests of Huitzilopochtli,
"and perhaps a few other gods," were hereditary; and the high-
priest of that god, towards whom the whole order was required to
observe implicit obedience, was the "hereditary pontifex maximus."
But the rest were dedicated to ecclesiastical life from early
childhood, and were carefully educated for the profession (M. N. W.,
p. 281-291).
Christianity entirely abandoned the hereditary principle prevalent
among its spiritual ancestors, the Jews, and selected for its ministers
of religion those who felt, or professed to feel, an internal vocation
for this career. Doubtless this is the most effectual plan for securing
a powerful priesthood. Those who belong to it have their heart far
more thoroughly in their work than can possibly be the case when it
falls to them by right of birth. Just the most priestly-minded of the
community become priests; and a far greater air of zeal and of
sanctity attaches to an order thus maintained, than to one of which
many of the members possess no qualification but that of family,
tribe, or caste.
Nothing can be more irrational than the denunciation of priests
and priestcraft which is often indulged in by Liberal writers and
politicians. If it be true that priests have shown considerable
cunning, it is also true that the people have fostered that cunning by
credulity. And if the clergy have put forth very large pretensions to
inspiration, divine authority, and hidden knowledge, it is equally the
fact that the laity have demanded such qualifications at their hands.
An order can scarcely be blamed if it seeks to satisfy the claims
which the popular religion makes upon it. Enlightenment from
heaven has in all ages and countries been positively demanded.
Sacrifices have always had to be made; and when it was found more
convenient to delegate the function of offering them to a class apart,
that class naturally established ritualistic rules of their own, and as
naturally asserted (and no doubt believed) that all sacrifices not
offered according to these rules were displeasing to God. And they
could not profess the inspiration which they were expected to
manifest without also requiring obedience to divine commands.
Priests are, in fact, the mere outcome of religious belief as it
commonly exists; and partly minister to that belief by deliberate
trickery, partly share it themselves, and honestly accept the
accredited view of their own lofty commission.
Divine inspiration leads by a very logical process to infallibility. A
Church founded on revelation needs living teachers to preserve the
correct interpretation of that revelation. Without such living
teachers, revealed truth itself becomes (as it always has done
among Protestants) an occasion of discord and of schism. But the
interpreters of revelation in their turn must be able to appeal to
some sole and supreme authority, as the arbiter between varying
opinions, and the guide to be followed through all the intricacies of
dogma. Nowhere can such an arbiter and such a guide be found
more naturally than in the head of the Church himself. If God speaks
to mankind through his Church, it is only a logical conclusion that
within that Church there must be one through whom he speaks with
absolute certainty, and whose prophetic voice must therefore be
infallible. There cannot be a more consistent application of the
general theory of priesthood; and there is no more fatal sign for the
prospects of Christianity than the inability of many of its supporters
to accept so useful a doctrine, and the thoughtless indignation of
some among them against the single Church which has had the
wisdom to proclaim it.
CHAPTER V.
HOLY PERSONS.
Although for the ordinary and regular communications from the
divine Being to man the established priesthoods might suffice, yet
occasions arise when there is need of a plenipotentiary with higher
authority and more extensive powers. What is required of these
exceptional ambassadors is not merely to repeat the doctrines of the
old religion, but to establish a new one. In other words, they are the
original founders of the great religions of the world. Of such
founders there is but a very limited number.
Beginning with China, and proceeding from East to West, we find
six:

1. Confucius, or Khung-fu-tsze, the founder of Confucianism.
2. Laò-tsé, the founder of Taouism.
3. Sakyamuni, or Gautama Buddha, the founder of Buddhism.
4. Zarathustra, or Zoroaster, the founder of Parseeism.
5. Mohammed, or Mahomet, the founder of Islamism.
6. Jesus Christ, the founder of Christianity.

All these men, whom for convenience' sake I propose to call
prophets, occupy an entirely exceptional position in the history of
the human race. The characteristics, or marks, by which they may
be distinguished from other great men, are partly external,
belonging to the views of others about them; partly internal,
belonging to their own view about themselves.
1. The first external mark by which they are distinguished is, that
within his own religion each of these is recognized as the highest
known authority. They alone are thought of as having the right to
change what is established. While all other teachers appeal to them
for the sanction of their doctrines, there is no appeal from them to
any one beyond. What they have said is final. They are in perfect
possession of the truth. Others are in possession of it only in so far
as they agree with them. No doubt, the sacred books are equally
infallible with the prophets; but the sacred books of religions
founded by prophets derive their authority in the last resort from
them, and are always held to be only a written statement of their
teaching. Thus, the sacred books of China are partly of direct
Confucian authorship; partly by others who recognize him as their
head. The only sacred book of the Tao-tsé is by their founder
himself. The sacred books of the Buddhists are supposed discourses
of the Buddha. The Avesta is the reputed work of Zarathustra. The
Koran is the actual work of Mahomet. And lastly, the New Testament
is all of it written in express subordination to the authority of Christ,
to which it constantly appeals. These books, then, are infallible,
because they contain the doctrines of their founders.
The same thing is true where there is an infallible Church. The
Church never claims the same absolute authority as it concedes to
its prophet. Its infallibility consists in its power to interpret correctly
the mind of him by whom it was established. He it is who brought
the message from above which no human power could have
discovered. It is the Church's function to explain that message to the
world; and, where needed, to deduce such inferences therefrom as
by its supernatural inspiration it perceives to be just. Beyond this,
the power of the Church does not extend.
A second external mark, closely related to the first, is, that the
prophet of each religion is, within the limits of that religion, the
object of a more or less mythical delineation of his personality. His
historical form is, to some extent, superseded by the form bestowed
upon him by a dogmatic legend. According to that legend there was
something about his nature that was more than human. He was in
some way extraordinary. The myths related vary from a mere
exaltation of the common features of humanity, to the invention of
completely supernatural attributes. But their object is the same: to
represent their prophet as more highly endowed than other mortals.
Even where there is little of absolute myth, the representation we
receive is one-sided; we know nothing of the prophet's faults, except
in so far as we may discover them against the will of the
biographers. To them he appears all-virtuous. These remarks will be
abundantly illustrated when we come to consider the life of Jesus,
and to compare it with that of his compeers.
2. The internal mark corresponds to the first external mark, of
which it is indeed the subjective counterpart. These prophets
conceive themselves deputed to teach a faith, and they virtually
recognize in the performance of this mission no human authority
superior to their own. In words, perhaps, they do acknowledge some
established authority; but in fact they set it aside. No Church or
priesthood has the smallest weight with them, as opposed to that
intense internal conviction which appears to them an inspiration.
Hence it was observed of Jesus, that he taught with authority, and
not as the scribes. Without being able themselves to give any
explanation of the fact, they feel themselves endowed with plenary
power to reform. And it is not, like other reformers, in the name of
another that they do this; they reform in their own right, and with
no other title than their own profound consciousness of being not
only permitted, but charged to do it.
Nevertheless, it must not be imagined that the prophets sweep
away everything they find in the existing religion. On the contrary, it
will be found on examination that they always retain some important
element or elements of the older faith. Without this, they would
have no hold on the popular mind of their country, from which they
would be too far removed to make themselves understood. Thus,
Allah was already recognized as God by the Arabians in the time of
Mahomet, whose reform consisted in teaching that he was the only
God. Thus, the Messiah was already expected by the Jews in the
time of Jesus, whose reform consisted in applying the expectation to
himself. Prophets take advantage of a faith already in existence, and
making that the foundation of a new religion, erect upon it the more
special truths they are inspired to proclaim.
No prophet can construct a religion entirely from his own brain.
Were he to do so, he would be unable to show any reason why it
should be accepted. There would be no feeling in the minds of his
hearers to which he could appeal. A religion to be accepted by any
but an insignificant fraction, must find a response not only in the
intellects, but in the emotions of those for whom it is designed.
This, it appears to me, is the weak point of Positivism. Auguste
Comte, having abolished all that in the general mind constitutes
religion at all, attempted to compose a faith for his disciples by the
merely arbitrary exercise of his own ingenuity. He perhaps did not
consider that in all history there is no example of a religion being
invented by an individual thinker. It is like attempting to sell a
commodity for which there is no demand. Even if his philosophical
principles should be accepted by the whole of Europe, there can be
no reason why the special observances he recommends should be
adopted, or the special saints whom he places in the calendar be
adored. Those who receive his philosophy will have no need for his
ceremonies. And even if ceremonies cannot be entirely dispensed
with, the mere fact that a solitary thinker has planned a ritual in his
own mind can never ensure its adoption.
Very different has been the procedure of the prophets of whom
we are now to speak. Intellectually, they were no doubt far inferior
to the founder of the Positive Philosophy. But emotionally, they were
fitted for the part which he unsuccessfully endeavored to play. They
entered into the religious feelings of their countrymen, and gave
those feelings a higher expression than had yet been found for
them. Instinctively fixing on some conspicuous part of the old
religion, they made that the starting-point for the development of
the new. They reformed, but the reformation linked itself to some
conviction that was already deeply rooted in the nature of their
converts. They assumed boundless authority; but it was authority to
proclaim a pre-existing truth, not to spin out of their purely personal
ideas of fitness a system altogether disconnected from the past
evolution of religion, and to impose that system upon the remainder
of mankind.

Section I.—Confucius.[11]

The life of the prophet of China is not eventful. It has neither the
charm of philosophic placidity and retirement from the world which
belongs to that of Laò-tsé, nor the romantic interest of the more
varied careers of Sakyamuni, Christ, or Mahomet. For Confucius,
though a philosopher, did not object, indeed rather desired, to take
some share in the government of his country, but his wishes
received very little gratification. Rulers refused to acquiesce in his
principles of administration, and he was compelled to rely for their
propagation mainly on the oral instruction imparted to his disciples.
His life, therefore, bears to some extent the aspect of a failure,
though for this appearance he himself is not to blame. Another
cause, which somewhat diminishes the interest we might otherwise
take in him, is his excessive attention to proprieties, ceremonies, and
rites. We cannot but feel that a truly great man, even in China,
would have emancipated himself from the bondage of such trifles.
Nevertheless, after all deductions are made, enough remains to
render the career and character of Confucius deserving of attention,
and in many respects of admiration.
Descended from a family which had formerly been powerful and
noble, but was now in comparatively modest circumstances, he was
born in B.C. 551, his father's name being Shuh-leang Heih, and his
mother's Ching-Tsae. The legends related of his nativity I pass over
for the present. His father, who was an old man when he was born,
died when the child was in his third year; and his mother in B.C. 528.
At nineteen, Confucius was married; and at twenty-one he came
forward as a teacher. Disciples attached themselves to him, and
during his long career as a philosopher, we find him constantly
attended by some faithful friends, who receive all he says with
unbounded deference, and propose questions for his decision as to
an authority against whom there can be no appeal. The maxims of
Confucius did not refer solely to ethics or to religion; they bore
largely upon the art of government, and he was desirous if possible
of putting them in actual practice in the administration of public
affairs. China, however, was in a state of great confusion in his days;
there were rebellions and wars in progress: and the character of the
rulers from whom he might have obtained employment was such,
that he could not, consistently with the high standard of honor on
which he always acted, accept favors at their hands. One of them
proposed to grant him a town with its revenues; but Confucius said:
"A superior man will only receive reward for services which he has
done. I have given advice to the duke Ting (see below), but he has
not obeyed it, and now he would endow me with this place! very far
is he from understanding me" (C. C., vol. i., Prolegomena, p. 68). In
the year 500 the means were at length put within his reach of
carrying his views into practice. He was made "chief magistrate of a
town" in the state of Loo; and this first appointment was followed by
that of "assistant-superintendent of works," and subsequently by
that of "minister of crime." In this office he is said to have put an
end to crime altogether; but Dr. Legge rightly warns us against
confiding in the "indiscriminating eulogies" of his disciples. A more
substantial service attributed to him is that of procuring the
dismantlement of two fortified towns which were the refuge of
dangerous and warlike chiefs. But his reforming government was
brought to an end after a few years by the weakness of his
sovereign, duke Ting, who was captivated by a present of eighty
beautiful and accomplished girls, and one hundred and twenty
horses, from a neighboring State. Engrossed by this present, the
duke neglected public affairs, and the philosopher felt bound to
resign.
We need not follow him during the long wanderings through
various parts of China which followed upon this disappointment.
After traveling from State to State for many years, he returned in his
sixty-ninth year to Loo, but not to office. In the year 478 his sad and
troubled life was closed by death.
Our information respecting the character of Confucius is ample.
From the book which Dr. Legge has entitled the "Confucian
Analects," a collection of his sayings made (as he believes) by the
disciples of his disciples, we obtain the most minute particulars both
as to his personal habits and as to the nature of his teaching. The
impression derived from these accounts is that of a gentle, virtuous,
benevolent, and eminently honorable man; a man who, like
Socrates, was indifferent to the reward received for his tuition,
though not refusing payment altogether; who would never sacrifice
a single principle for the sake of his individual advantage; yet who
was anxious, if possible, to benefit the kingdom by the
establishment of an administration penetrated with those ethical
maxims which he conceived to be all-important. Yet, irreproachable
as his moral character was, there is about him a deficiency of that
bold originality which has characterized the greatest prophets of
other nations. Sakyamuni revolted against the restrictions of caste
which dominated all minds in India. Jesus boldly claimed for moral
conduct a rank far superior to that of every ceremonial obligation,
even those which were held the most sacred by his countrymen.
Mahomet, morally far below the Chinese sage, evinced a far more
independent genius by his attack on the prevalent idolatry of Mecca.
Confucius did nothing of this kind. His was a mind which looked back
longingly to antiquity, and imagined that it discovered in the ancient
rulers and the ancient modes of action, the models of perfection
which all later times should strive to follow. Nor was this all. He was
so profoundly under the influence of Chinese ways of thinking, as to
attach an almost ludicrous importance to a precise conformity to
certain rules of propriety, and to regard the exactitude with which
ceremonies were performed as matter of the highest concern. In
fact, he could not emancipate himself from the traditions of his
country; and his principles would have resulted rather in making his
followers perfect Chinamen than perfect men.
A far more serious charge is indeed brought against him by Dr.
Legge, that of insincerity (C. C., vol. i., Prolegomena, p. 101). I
hesitate to impugn the opinion of so competent a scholar; yet the
evidence he has produced does not seem to me sufficient to sustain
the indictment. Granting that he gave an unwelcome visitor the
excuse of sickness, which was untrue, still, as we are ignorant of the
reasons which led him to decline seeing the person in question, we
cannot estimate the force of the motives that induced him to put
forward a plea in conformity with the polite customs of his country.
It does not appear, moreover, that he practiced an intentional deceit.
And though on one occasion he may have violated an oath extorted
by rebels who had him in their power, therein acting wrongly (as I
think), it is always an open question how far promises made under
such circumstances are binding on the conscience. Whatever
failings, however, it may be necessary to admit, there can be no
question of the preëminent purity alike of his life and doctrine. His is
a character which, be its imperfections what they may, we cannot
help loving; and there have been few, indeed, who would not have
been benefited by the attempt to reach even that standard of virtue
which he held up to the admiration of his disciples.
A few quotations from the works in which his words and actions
are preserved, will illustrate these remarks. In the tenth Book of the
Analects (C. C., vol. i. pp. 91-100), his manners, his garments, his
mode of behavior under various circumstances, are elaborately
described. There are not many personages in history of whom we
have so minute a knowledge. We learn that "in his village" he
"looked simple and sincere, and as if he were not able to speak." His
reverence for his superiors seems to have been profound. "When the
prince was present, his manner displayed respectful uneasiness; it
was grave, but self-possessed." When going to an audience of the
prince, "he ascended the dais, holding up his robe with both his
hands, and his body bent; holding in his breath also, as if he dared
not breathe. When he came out from the audience (the italics, here
and elsewhere, are in Legge), as soon as he had descended one
step, he began to relax his countenance, and had a satisfied look.
When he had got to the bottom of the steps, he advanced rapidly to
his place, with his arms like wings, and on occupying it, his manner
still showed respectful uneasiness." He was rather particular about
his food, rejecting meat unless "cut properly," and with "its proper
sauce."
Whatever he might be eating, however, "he would offer a little of
it in sacrifice." "When any of his friends died, if the deceased had no
relations who could be depended on for the necessary offices, he
would say, 'I will bury him.'" "In bed, he did not lie like a corpse."
And it is satisfactory to learn of one who was such a respecter of
formalities, that "at home he did not put on any formal deportment."
Notwithstanding this, he does not appear to have been on very
intimate terms with his son, to whom he is reported to have said
that unless he learned "the odes" he would not be fit to converse
with; and that unless he learned "the rules of propriety" his
character could not be established. The disciple, who was informed
by the son himself that he had never heard from his father any other
special doctrine, was probably right in concluding that "the superior
man maintains a distant reserve towards his son" (Lun Yu, xvi. 13).
But with his beloved disciples Confucius was on terms of
affectionate intimacy which does not seem to have been marred by
"the rules of propriety." For the death of one of them at least he
mourned so bitterly as to draw down upon himself the expostulation
of those who remained (Ibid., xi. 9). The picture of the Master,
accompanied at all times by his faithful friends, who hang upon his
lips, and eagerly gather up his every utterance, is on the whole a
pleasant one. "Do you think, my disciples," he asks, "that I have any
concealments? I conceal nothing from you. There is nothing that I
do which is not shown to you, my disciples;—that is my way" (Ibid.,
vii. 23). And with all the homage he is constantly receiving,
Confucius is never arrogant. He never speaks like a man who wishes
to enforce his views in an authoritative style on others; never
threatens punishment either here or hereafter to those who dissent
from him.
"There were four things," his disciples tell us, "from which the
Master was entirely free. He had no foregone conclusions, no
arbitrary predeterminations, no obstinacy, and no egoism" (Lun Yu,
ix. 4). And his conduct is entirely in harmony with this statement. It
is as a learner, rather than a teacher, that he regards himself. "The
Master said, 'When I walk along with two others, they may serve me
as my teachers. I will select their good qualities, and follow them;
their bad qualities, and avoid them'" (Ibid., vii. 21). Or again: "The
sage and the man of perfect virtue, how dare I rank myself with
them? It may simply be said of me, that I strive to become such
without satiety, and teach others without weariness" (Ibid., vii. 33).
"In letters I am perhaps equal to other men, but the character of the
superior man, carrying out in his conduct what he professes, is what
I have not yet attained to" (Ibid., vii. 32).
Notwithstanding this modesty, there are traces—few indeed, but
not obscure—of that conviction of a peculiar mission which all great
prophets have entertained, and without which even Confucius would
scarcely have been ranked among them. The most distinct of these
is the following passage:—"The Master was put in fear in K'wang. He
said, 'After the death of king Wan, was not the cause of truth lodged
here in me? If Heaven had wished to let this cause of truth perish,
then I, a future mortal, should not have got such a relation to that
cause. While Heaven does not let the cause of truth perish, what can
the people of K'wang do to me?'" (Lun Yu, ix. 5). These remarkable
words would be conclusive, if they stood alone. But they do not
stand alone. In another place we find him thus lamenting the pain of
being generally misunderstood, which is apt to be so keenly felt by
exalted and sensitive natures. "The Master said, 'Alas! there is no
one that knows me.' Tse-kung said, 'What do you mean by thus
saying—that no one knows you?' The Master replied, 'I do not
murmur against Heaven. I do not grumble against men. My studies
lie low, and my penetration rises high. But there is Heaven;—that
knows me!'" (Ibid., xiv. 37). Men might reject his labors and despise
his teaching, but he would complain neither against Heaven nor
against them. If he was not known by men, he was known by
Heaven, and that was enough. On another occasion, "the Master
said, 'Heaven produced the virtue that is in me, Hwan T'uy—what
can he do to me?'"[12]
These passages are the more remarkable, because Confucius was
not in the ordinary sense a believer in God. That is, he never,
throughout his instructions, says a single word implying
acknowledgment of a personal Deity; a Creator of the world; a Being
whom we are bound to worship as the author of our lives and the
ruler of our destinies. He has even been suspected of omitting from
his edition of the Shoo-king and the She-king everything that could
support the comparatively theistic doctrine of his contemporary, Laò-tsé
(by V. von Strauss, T. T. K., p. xxxviii). That his high respect for
antiquity would have permitted such a procedure is, to say the least,
very improbable; and Dr. Legge is no doubt right in acquitting him of
any willful suppression of, or addition to, the ancient articles of
Chinese faith (C. C., vol. i. Prolegomena, p. 99). For our present
purpose it is enough to note that he avoided all discussion on the
higher problems of religion; and contented himself with speaking,
and that but rarely, of a vague, and hardly personal Being which he
called Heaven. Thus, in a book attributed (perhaps erroneously) to
his grandson, he is reported as saying, "Sincerity is the very way of
Heaven" (Chung Yung, xx. 18). Of king Woo and the duke of Chow,
two ancient worthies, he says: "By the ceremonies of the sacrifices
to Heaven and Earth they served God" (where he seems to
distinguish between Heaven and God, whom I believe he never
mentions but here); "and by the ceremonies of the ancestral temple
they sacrificed to their ancestors. He who understands the
ceremonies of the sacrifices to Heaven and Earth, and the meaning
of the several sacrifices to ancestors, would find the government of
a kingdom as easy as to look into his palm" (Ibid., xix. 6).
Elsewhere, he remarks that "he who is greatly virtuous will be sure
to receive the appointment of heaven" (Ibid., xvii. 5). Again:
"Heaven, in the production of things, is surely bountiful to them,
according to their qualities" (Ibid., xvii. 3). Nothing very definite can
be gathered from these passages, as to his opinions concerning the
nature of the power of which he spoke thus obscurely. Yet it would
be rash to find fault with him on that account. His language may
have been, and in all probability was, the correct expression of his
feelings. His mind was not of the dogmatic type; and if he does not
teach his disciples any very intelligible principles concerning spiritual
matters, it is simply because he is honestly conscious of having none
to teach.
There are, indeed, indications which might be taken to imply the
existence of an esoteric doctrine. "To those," he says, "whose talents
are above mediocrity, the highest subjects may be announced. To
those who are below mediocrity, the highest subjects may not be
announced" (Lun Yu, vi. 19). We are further told that Tsze-kung
said, "the Master's personal displays of his principles, and ordinary
descriptions of them may be heard. His discourses about man's
nature, and the way of Heaven, cannot be heard" (Ibid., v. 12). This
last passage appears to mean that they were not open to the
indiscriminate multitude, nor perhaps to all of the disciples. But we
may reasonably suppose that the intimate friends who recorded his
sayings were considered by him to be above mediocrity, and were
the depositaries of all he had to tell them on religious matters.
Yet this, little as it was, may not always have been rightly
understood. Once, for example, he says to a disciple, "Sin, my
doctrine is that of an all-pervading unity." This is interpreted by the
disciple (in the Master's absence) to mean only that his doctrine is
"to be true to the principles of our nature, and the benevolent
exercise of them to others" (Ibid., iv. 15). I can hardly believe that
Confucius would have taught so simple a lesson under so obscure a
figure; and it is possible that the reserve that he habitually practiced
with regard to his religious faith may have prevented a fuller
explanation. "The subjects on which the Master did not talk were—
extraordinary things, feats of strength, disorder, and spiritual beings"
(Lun Yu, vii. 20). And although, in the Doctrine of the Mean (a work
which is perhaps less authentic than the Analects) we find him
discoursing freely on spiritual beings, which, he says, "abundantly
display the powers that belong to them" (Chung Yung, 16), there are
portions of the Analects which confirm the impression that he did
not readily venture into these extra-mundane regions. Heaven itself,
he once pointed out to an over-curious disciple, preserves an
unbroken silence (Lun Yu, xvii. 19). Interrogated "about serving the
spirits of the dead," he gave this striking answer: "While you are not
able to serve men, how can you serve their spirits?" And when "Ke
Loo added, 'I venture to ask about death?' he was answered, 'While
you do not know life, how can you know about death?'" (Ibid., xi.
11). Another instance of a similar reticence is presented by his
conduct during an illness. "The Master being very sick, Tsze-Loo
asked leave to pray for him. He said, 'May such a thing be done?'
Tsze-Loo replied, 'It may. In the prayers it is said, Prayer has been
made to the spirits of the upper and lower worlds.' The Master said,
'My praying has been for a long time'" (Ibid., vii. 34). I am unable to
see "the satisfaction of Confucius with himself," which Dr. Legge
discovers in this reply. To me it appears simply to indicate the devout
attitude of his mind, which is evinced by many other passages in his
conversation. In short, though we may complain of the indefinite
character of the faith he taught, and wish that he had expressed
himself more fully, there can scarcely be a doubt that Confucius had
a deeply religious mind; and that he looked with awe and reverence
upon that power which he called by the name of "Heaven," which
controlled the progress of events, and would not suffer the cause of
truth to perish altogether.
It is true, however, that he confined himself chiefly, and indeed
almost entirely, to moral teaching. His main object undoubtedly was
to inculcate upon his friends, and if possible to introduce among the
people at large, those great principles of ethics which he thought
would restore the virtue and well-being of ancient times. Those
principles are aptly summarized in the following verse: "The duties
of universal obligation are five, and the virtues wherewith they are
practiced are three. The duties are those between sovereign and
minister, between father and son, between husband and wife,
between elder brother and younger, and those belonging to the
intercourse of friends. Those five are the duties of universal
obligation. Knowledge, magnanimity, and energy, these three are the
virtues universally binding; and the means by which they carry the
duties into practice is singleness" (Chung Yung, xx. 7). In the
Analects, "Gravity, generosity of soul, sincerity, earnestness, and
kindness," are said to constitute perfect virtue (Lun Yu, xvii. 6).
It is as an earnest and devoted teacher, both by example and by
precept, of these and other virtues, that Confucius must be judged.
And in order to assist the formation of such a judgment, let us take
his doctrine of Reciprocity, to which I shall return in another place.
"Tsze-kung asked, saying, 'Is there one word which may serve as a
rule of practice for all one's life?' The Master said, 'Is not Reciprocity
such a word? What you do not want done to yourself, do not do to
others'" (Lun Yu, xv. 23). On a kindred topic he thus delivered his
opinion: "Some one said, 'What do you say concerning the principle
that injury should be recompensed with kindness?' The Master said,
'With what, then, will you recompense kindness? Recompense injury
with justice, and recompense kindness with kindness'" (Ibid., xiv.
26).
If in the above sentence he may be thought to fall short of the
highest elevation, there are some among his apothegms, the point
and excellence of which have, perhaps, never been surpassed. Take
for instance these:—"The superior man is catholic and no partizan.
The mean man is a partizan and not catholic." "Learning without
thought is labor lost; thought without learning is perilous" (Ibid., ii.
14, 15). Or these:—"I will not be afflicted at men's not knowing me;
I will be afflicted that I do not know men" (Ibid., i. 16). "A scholar,
whose mind is set on truth, and who is ashamed of bad clothes and
bad food, is not fit to be discoursed with" (Ibid., iv. 9). "The superior
man is affable, but not adulatory; the mean is adulatory, but not
affable" (Ibid., xiii. 23). "Where the solid qualities are in excess of
accomplishments, we have rusticity; where the accomplishments are
in excess of the solid qualities, we have the manners of a clerk.
When the accomplishments and solid qualities are equally blended,
we then have the man of complete virtue" (Lun Yu, vi. 16). Lastly, I
will quote one which, with a slight change of terms, might have
emanated from the pen of Thomas Carlyle: "There are three things
of which the superior man stands in awe:—He stands in awe of the
ordinances of heaven; he stands in awe of great men; he stands in
awe of the words of sages. The mean man does not know the
ordinances of heaven, and consequently does not stand in awe of
them. He is disrespectful to great men. He makes sport of the words
of sages" (Ibid., xvi. 8).
These, and various other recorded sayings, go far to explain, if not
to justify, the unbounded admiration of his faithful follower, Tsze-
kung: "Our Master cannot be attained to, just in the same way as
the heavens cannot be gone up to by the steps of a stair. Were our
Master in the position of the prince of a State, or the chief of a
family, we should find verified the description which has been given
of a sage's rule:—he would plant the people, and forthwith they
would be established; he would lead them on, and forthwith they
would follow him; he would make them happy, and forthwith
multitudes would resort to his dominions; he would stimulate them,
and forthwith they would be harmonious. While he lived, he would
be glorious. When he died, he would be bitterly lamented. How is it
possible for him to be attained to?" (Ibid., xix. 25.)

Section II.—Laò-tsé.[13]

Concerning the life of Laò-tsé, the founder of the smallest of the
three sects of China (Confucians, Buddhists, and Taouists), we have
only the most meagre information. Scarcely anything is known either
of his personal character or of his doctrine, except through his book.
His birth-year is unknown to us, and can only be approximately
determined by means of the date assigned to his famous interview
with his great contemporary, Confucius. This occurred in b.c. 517,
when Laò-tsé was very old. He may, therefore, have been born
about the year b.c. 600.[14] All we can say of his career is, that he
held an office in the State of Tseheu, that of "writer (or historian) of
the archives." When visited by Confucius, who was the master of a
rival school, he is said to have addressed him in these terms:
—"Those whom you talk about are dead, and their bones are
mouldered to dust; only their words remain. When the superior man
gets his time, he mounts aloft; but when the time is against him, he
moves as if his feet were entangled. I have heard that a good
merchant, though he has rich treasures deeply stored, appears as if
he were poor; and that the superior man, whose virtue is complete,
is yet to outward seeming stupid. Put away your proud air and many
desires; your insinuating habit and wild will. These are of no
advantage to you. This is all which I have to tell you." After this
interview, Confucius thus expressed his opinion of the older
philosopher to his disciples:—"I know how birds can fly, how fishes
can swim, and how animals can run. But the runner may be snared,
the swimmer may be hooked, and the flyer may be shot by the
arrow. But there is the dragon. I cannot tell how he mounts on the
wind through the clouds and rises to heaven. To-day I have seen
Laò-tsé, and can only compare him to the dragon" (C. C., vol. i.
Proleg. p. 65; T. T. K., p. liii; L. T., p. iv).
Troubles in the State in which he held office induced him to retire,
and to seek the frontier. Here the officer in command requested him
to write a book, the result of which request was the Taò-tĕ-Kīng. "No
one knows," says the Chinese historian, "where he died. Laò-tsé was
a hidden sage" (T. T. K., p. lvi).
To this very scanty historical information we may add such
indications as Laò-tsé himself has given us of his personality. One of
these is contained in the twentieth chapter of his work, in which he
tells us that while other men are radiant with pleasure, he is calm,
like a child that does not yet smile. He wavers to and fro, as one
who knows not where to turn. Other men have abundance; he is as
it were deprived of all. He is like a stupid fellow, so confused does he
feel. Ordinary men are enlightened; he is obscure and troubled in
mind. Like the sea he is forgotten, and driven about like one who
has no certain resting-place. All other men are of use; he alone is
clownish like a peasant. He alone is unlike other men, but he honors
the nursing mother (T. T. K., ch. xx).