Statistical
Data Mining
Using SAS
Applications
Second Edition

© 2010 by Taylor and Francis Group, LLC



Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series

SERIES EDITOR
Vipin Kumar
University of Minnesota
Department of Computer Science and Engineering
Minneapolis, Minnesota, U.S.A.

AIMS AND SCOPE


This series aims to capture new developments and applications in data mining and knowledge
discovery, while summarizing the computational tools and techniques useful in data analysis. This
series encourages the integration of mathematical, statistical, and computational methods and
techniques through the publication of a broad range of textbooks, reference works, and
handbooks. The inclusion of concrete examples and applications is highly encouraged. The scope of the
series includes, but is not limited to, titles in the areas of data mining and knowledge discovery
methods and applications, modeling, algorithms, theory and foundations, data and knowledge
visualization, data mining systems and tools, and privacy and security issues.

PUBLISHED TITLES
UNDERSTANDING COMPLEX DATASETS: DATA MINING WITH MATRIX DECOMPOSITIONS
David Skillicorn

COMPUTATIONAL METHODS OF FEATURE SELECTION
Huan Liu and Hiroshi Motoda

CONSTRAINED CLUSTERING: ADVANCES IN ALGORITHMS, THEORY, AND APPLICATIONS
Sugato Basu, Ian Davidson, and Kiri L. Wagstaff

KNOWLEDGE DISCOVERY FOR COUNTERTERRORISM AND LAW ENFORCEMENT
David Skillicorn

MULTIMEDIA DATA MINING: A SYSTEMATIC INTRODUCTION TO CONCEPTS AND THEORY
Zhongfei Zhang and Ruofei Zhang

NEXT GENERATION OF DATA MINING
Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, and Vipin Kumar

DATA MINING FOR DESIGN AND MARKETING
Yukio Ohsawa and Katsutoshi Yada

THE TOP TEN ALGORITHMS IN DATA MINING
Xindong Wu and Vipin Kumar

GEOGRAPHIC DATA MINING AND KNOWLEDGE DISCOVERY, SECOND EDITION
Harvey J. Miller and Jiawei Han

TEXT MINING: CLASSIFICATION, CLUSTERING, AND APPLICATIONS
Ashok N. Srivastava and Mehran Sahami

BIOLOGICAL DATA MINING
Jake Y. Chen and Stefano Lonardi

INFORMATION DISCOVERY ON ELECTRONIC HEALTH RECORDS
Vagelis Hristidis

TEMPORAL DATA MINING
Theophano Mitsa

RELATIONAL DATA CLUSTERING: MODELS, ALGORITHMS, AND APPLICATIONS
Bo Long, Zhongfei Zhang, and Philip S. Yu

KNOWLEDGE DISCOVERY FROM DATA STREAMS
João Gama

STATISTICAL DATA MINING USING SAS APPLICATIONS, SECOND EDITION
George Fernandez



Chapman & Hall/CRC
Data Mining and Knowledge Discovery Series

Statistical
Data Mining
Using SAS
Applications
Second Edition

George Fernandez



CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2010 by Taylor and Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper


10 9 8 7 6 5 4 3 2 1

International Standard Book Number-13: 978-1-4398-1076-7 (Ebook-PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com

and the CRC Press Web site at
http://www.crcpress.com

Contents

Preface
Acknowledgments
About the Author

1. Data Mining: A Gentle Introduction
1.1 Introduction
1.2 Data Mining: Why It Is Successful in the IT World
1.2.1 Availability of Large Databases: Data Warehousing
1.2.2 Price Drop in Data Storage and Efficient Computer Processing
1.2.3 New Advancements in Analytical Methodology
1.3 Benefits of Data Mining
1.4 Data Mining: Users
1.5 Data Mining: Tools
1.6 Data Mining: Steps
1.6.1 Identification of Problem and Defining the Data Mining Study Goal
1.6.2 Data Processing
1.6.3 Data Exploration and Descriptive Analysis
1.6.4 Data Mining Solutions: Unsupervised Learning Methods
1.6.5 Data Mining Solutions: Supervised Learning Methods
1.6.6 Model Validation
1.6.7 Interpret and Make Decisions
1.7 Problems in the Data Mining Process
1.8 SAS Software: The Leader in Data Mining
1.8.1 SEMMA: The SAS Data Mining Process
1.8.2 SAS Enterprise Miner for Comprehensive Data Mining Solution
1.9 Introduction of User-Friendly SAS Macros for Statistical Data Mining
1.9.1 Limitations of These SAS Macros


1.10 Summary
References

2. Preparing Data for Data Mining
2.1 Introduction
2.2 Data Requirements in Data Mining
2.3 Ideal Structures of Data for Data Mining
2.4 Understanding the Measurement Scale of Variables
2.5 Entire Database or Representative Sample
2.6 Sampling for Data Mining
2.6.1 Sample Size
2.7 User-Friendly SAS Applications Used in Data Preparation
2.7.1 Preparing PC Data Files before Importing into SAS Data
2.7.2 Converting PC Data Files to SAS Datasets Using the SAS Import Wizard
2.7.3 EXLSAS2 SAS Macro Application to Convert PC Data Formats to SAS Datasets
2.7.4 Steps Involved in Running the EXLSAS2 Macro
2.7.5 Case Study 1: Importing an Excel File Called “Fraud” to a Permanent SAS Dataset Called “Fraud”
2.7.6 SAS Macro Applications - RANSPLIT2: Random Sampling from the Entire Database
2.7.7 Steps Involved in Running the RANSPLIT2 Macro
2.7.8 Case Study 2: Drawing Training (400), Validation (300), and Test (All Left-Over Observations) Samples from the SAS Data Called “Fraud”
2.8 Summary
References

3. Exploratory Data Analysis
3.1 Introduction
3.2 Exploring Continuous Variables
3.2.1 Descriptive Statistics
3.2.1.1 Measures of Location or Central Tendency
3.2.1.2 Robust Measures of Location
3.2.1.3 Five-Number Summary Statistics
3.2.1.4 Measures of Dispersion
3.2.1.5 Standard Errors and Confidence Interval Estimates
3.2.1.6 Detecting Deviation from Normally Distributed Data
3.2.2 Graphical Techniques Used in EDA of Continuous Data


3.3 Data Exploration: Categorical Variable
3.3.1 Descriptive Statistical Estimates of Categorical Variables
3.3.2 Graphical Displays for Categorical Data
3.4 SAS Macro Applications Used in Data Exploration
3.4.1 Exploring Categorical Variables Using the SAS Macro FREQ2
3.4.1.1 Steps Involved in Running the FREQ2 Macro
3.4.2 Case Study 1: Exploring Categorical Variables in a SAS Dataset
3.4.3 EDA Analysis of Continuous Variables Using SAS Macro UNIVAR2
3.4.3.1 Steps Involved in Running the UNIVAR2 Macro
3.4.4 Case Study 2: Data Exploration of a Continuous Variable Using UNIVAR2
3.4.5 Case Study 3: Exploring Continuous Data by a Group Variable Using UNIVAR2
3.4.5.1 Data Descriptions
3.5 Summary
References

4. Unsupervised Learning Methods
4.1 Introduction
4.2 Applications of Unsupervised Learning Methods
4.3 Principal Component Analysis
4.3.1 PCA Terminology
4.4 Exploratory Factor Analysis
4.4.1 Exploratory Factor Analysis versus Principal Component Analysis
4.4.2 Exploratory Factor Analysis Terminology
4.4.2.1 Communalities and Uniqueness
4.4.2.2 Heywood Case
4.4.2.3 Cronbach Coefficient Alpha
4.4.2.4 Factor Analysis Methods
4.4.2.5 Sampling Adequacy Check in Factor Analysis
4.4.2.6 Estimating the Number of Factors
4.4.2.7 Eigenvalues
4.4.2.8 Factor Loadings
4.4.2.9 Factor Rotation
4.4.2.10 Confidence Intervals and the Significance of Factor Loading Converge
4.4.2.11 Standardized Factor Scores


4.5 Disjoint Cluster Analysis
4.5.1 Types of Cluster Analysis
4.5.2 FASTCLUS: SAS Procedure to Perform Disjoint Cluster Analysis
4.6 Biplot Display of PCA, EFA, and DCA Results
4.7 PCA and EFA Using SAS Macro FACTOR2
4.7.1 Steps Involved in Running the FACTOR2 Macro
4.7.2 Case Study 1: Principal Component Analysis of 1993 Car Attribute Data
4.7.2.1 Study Objectives
4.7.2.2 Data Descriptions
4.7.3 Case Study 2: Maximum Likelihood FACTOR Analysis with VARIMAX Rotation of 1993 Car Attribute Data
4.7.3.1 Study Objectives
4.7.3.2 Data Descriptions
4.7.4 Case Study 3: Maximum Likelihood FACTOR Analysis with VARIMAX Rotation Using Multivariate Data in the Form of a Correlation Matrix
4.7.4.1 Study Objectives
4.7.4.2 Data Descriptions
4.8 Disjoint Cluster Analysis Using SAS Macro DISJCLS2
4.8.1 Steps Involved in Running the DISJCLS2 Macro
4.8.2 Case Study 4: Disjoint Cluster Analysis of 1993 Car Attribute Data
4.8.2.1 Study Objectives
4.8.2.2 Data Descriptions
4.9 Summary
References

5. Supervised Learning Methods: Prediction
5.1 Introduction
5.2 Applications of Supervised Predictive Methods
5.3 Multiple Linear Regression Modeling
5.3.1 Multiple Linear Regressions: Key Concepts and Terminology
5.3.2 Model Selection in Multiple Linear Regression
5.3.2.1 Best Candidate Models Selected Based on AICC and SBC
5.3.2.2 Model Selection Based on the New SAS PROC GLMSELECT
5.3.3 Exploratory Analysis Using Diagnostic Plots
5.3.4 Violations of Regression Model Assumptions
5.3.4.1 Model Specification Error


5.3.4.2 Serial Correlation among the Residual
5.3.4.3 Influential Outliers
5.3.4.4 Multicollinearity
5.3.4.5 Heteroscedasticity in Residual Variance
5.3.4.6 Nonnormality of Residuals
5.3.5 Regression Model Validation
5.3.6 Robust Regression
5.3.7 Survey Regression
5.4 Binary Logistic Regression Modeling
5.4.1 Terminology and Key Concepts
5.4.2 Model Selection in Logistic Regression
5.4.3 Exploratory Analysis Using Diagnostic Plots
5.4.3.1 Interpretation
5.4.3.2 Two-Factor Interaction Plots between Continuous Variables
5.4.4 Checking for Violations of Regression Model Assumptions
5.4.4.1 Model Specification Error
5.4.4.2 Influential Outlier
5.4.4.3 Multicollinearity
5.4.4.4 Overdispersion
5.5 Ordinal Logistic Regression
5.6 Survey Logistic Regression
5.7 Multiple Linear Regression Using SAS Macro REGDIAG2
5.7.1 Steps Involved in Running the REGDIAG2 Macro
5.8 Lift Chart Using SAS Macro LIFT2
5.8.1 Steps Involved in Running the LIFT2 Macro
5.9 Scoring New Regression Data Using the SAS Macro RSCORE2
5.9.1 Steps Involved in Running the RSCORE2 Macro
5.10 Logistic Regression Using SAS Macro LOGIST2
5.11 Scoring New Logistic Regression Data Using the SAS Macro LSCORE2
5.12 Case Study 1: Modeling Multiple Linear Regressions
5.12.1 Study Objectives
5.12.1.1 Step 1: Preliminary Model Selection
5.12.1.2 Step 2: Graphical Exploratory Analysis and Regression Diagnostic Plots
5.12.1.3 Step 3: Fitting the Regression Model and Checking for the Violations of Regression Assumptions
5.12.1.4 Remedial Measure: Robust Regression to Adjust the Regression Parameter Estimates to Extreme Outliers


5.13 Case Study 2: If–Then Analysis and Lift Charts
5.13.1 Data Descriptions
5.14 Case Study 3: Modeling Multiple Linear Regression with Categorical Variables
5.14.1 Study Objectives
5.14.2 Data Descriptions
5.15 Case Study 4: Modeling Binary Logistic Regression
5.15.1 Study Objectives
5.15.2 Data Descriptions
5.15.2.1 Step 1: Best Candidate Model Selection
5.15.2.2 Step 2: Exploratory Analysis/Diagnostic Plots
5.15.2.3 Step 3: Fitting Binary Logistic Regression
5.16 Case Study 5: Modeling Binary Multiple Logistic Regression
5.16.1 Study Objectives
5.16.2 Data Descriptions
5.17 Case Study 6: Modeling Ordinal Multiple Logistic Regression
5.17.1 Study Objectives
5.17.2 Data Descriptions
5.18 Summary
References

6. Supervised Learning Methods: Classification
6.1 Introduction
6.2 Discriminant Analysis
6.3 Stepwise Discriminant Analysis
6.4 Canonical Discriminant Analysis
6.4.1 Canonical Discriminant Analysis Assumptions
6.4.2 Key Concepts and Terminology in Canonical Discriminant Analysis
6.5 Discriminant Function Analysis
6.5.1 Key Concepts and Terminology in Discriminant Function Analysis
6.6 Applications of Discriminant Analysis
6.7 Classification Tree Based on CHAID
6.7.1 Key Concepts and Terminology in Classification Tree Methods
6.8 Applications of CHAID
6.9 Discriminant Analysis Using SAS Macro DISCRIM2
6.9.1 Steps Involved in Running the DISCRIM2 Macro
6.10 Decision Tree Using SAS Macro CHAID2
6.10.1 Steps Involved in Running the CHAID2 Macro


6.11 Case Study 1: Canonical Discriminant Analysis and Parametric Discriminant Function Analysis
6.11.1 Study Objectives
6.11.2 Case Study 1: Parametric Discriminant Analysis
6.11.2.1 Canonical Discriminant Analysis (CDA)
6.12 Case Study 2: Nonparametric Discriminant Function Analysis
6.12.1 Study Objectives
6.12.2 Data Descriptions
6.13 Case Study 3: Classification Tree Using CHAID
6.13.1 Study Objectives
6.13.2 Data Descriptions
6.14 Summary
References

7. Advanced Analytics and Other SAS Data Mining Resources
7.1 Introduction
7.2 Artificial Neural Network Methods
7.3 Market Basket Analysis
7.3.1 Benefits of MBA
7.3.2 Limitations of Market Basket Analysis
7.4 SAS Software: The Leader in Data Mining
7.5 Summary
References

Appendix I: Instruction for Using the SAS Macros
Appendix II: Data Mining SAS Macro Help Files
Appendix III: Instruction for Using the SAS Macros with Enterprise Guide Code Window
Index



Preface

Objective
The objective of the second edition of this book is to introduce statistical data mining
concepts, describe methods in statistical data mining from sampling to decision
trees, demonstrate the features of user-friendly data mining SAS tools and, above
all, allow readers to download compiled data mining SAS (Version 9.0 and
later) macro files that help them perform complete data mining. The user-friendly
SAS macro approach integrates the statistical and graphical analysis tools available
in SAS systems and provides complete statistical data mining solutions without
writing SAS program code or using the point-and-click approach. Step-by-step
instructions for using SAS macros and interpreting the results are emphasized in
each chapter. Thus, by following the step-by-step instructions and downloading
the user-friendly SAS macros described in the book, data analysts can perform
complete data mining analysis quickly and effectively.

Why Use SAS Software?


The SAS Institute, the industry leader in analytical and decision support solutions,
offers a comprehensive data mining solution that allows you to explore large
quantities of data and discover relationships and patterns that lead to intelligent
decision-making. Enterprise Miner, SAS Institute’s data mining software, offers
an integrated environment for businesses that need to conduct comprehensive data
mining. However, if the Enterprise Miner software is not licensed at your organization,
but you have a license to use the SAS BASE, STAT, and GRAPH modules,
you can still use the power of SAS to perform complete data mining by using the
SAS macro applications included in this book.
Including complete SAS code in a data mining book is not very effective
because a majority of business and statistical analysts are not experienced
SAS programmers. Quick results from data mining are not feasible if analysts
must work with SAS program code directly, since code modification and
debugging take many hours. An alternative to the point-and-click menu interface
modules is the set of user-friendly SAS macro applications for performing
several data mining tasks, which are included in this book. This macro approach
integrates statistical and graphical tools available in the latest SAS systems
(version 9.2) and provides user-friendly data analysis tools, which allow data
analysts to complete data mining tasks quickly, without writing SAS programs,
by running the SAS macros in the background.
SAS Institute also released a learning edition (LE) of SAS software in recent years,
and readers who have no access to SAS software can buy a personal edition of
SAS LE and enjoy the benefits of these powerful SAS macros (see Appendix III for
instructions for using these macros with SAS EG and LE).

Coverage:
The following types of analyses can be performed using the user-friendly SAS macros.

◾◾ Converting PC databases to SAS data
◾◾ Sampling techniques to create training and validation samples
◾◾ Exploratory graphical techniques:
−− Univariate analysis of continuous response
−− Frequency data analysis for categorical data
◾◾ Unsupervised learning:
−− Principal component
−− Factor and cluster analysis
−− k-means cluster analysis
−− Biplot display
◾◾ Supervised learning: Prediction
−− Multiple regression models
• Partial and VIF plots, plots for checking data and model problems
• Lift charts
• Scoring
• Model validation techniques
−− Logistic regression
• Partial delta logit plots, ROC curves, false positive/negative plots
• Lift charts
• Model validation techniques
◾◾ Supervised learning: Classification
−− Discriminant analysis
• Canonical discriminant analysis: biplots
• Parametric discriminant analysis
• Nonparametric discriminant analysis
• Model validation techniques
−− CHAID: decision tree methods
• Model validation techniques

Why Do I Believe the Book Is Needed?


During the last decade, there has been an explosion in the field of data warehousing
and data mining for knowledge discovery. The challenge of understanding data has
led to the development of new data mining tools. Data mining books that are currently
available mainly address data mining principles but provide no instructions
or explanations for carrying out a data mining project. Also, many data analysts
are interested in expanding their expertise in the field of data mining and are
looking for how-to books on data mining that use the power of the SAS STAT and
GRAPH modules. Business school and health science instructors teaching in MBA
or MPH programs are currently incorporating data mining into their curricula and
are looking for how-to books on data mining using the available software. Therefore,
this second edition on statistical data mining, using SAS macro applications,
fills the gap and complements the existing data mining book market.

Key Features of the Book


No SAS programming experience required: This is an essential how-to guide,
especially suitable for data analysts to practice data mining techniques for
knowledge discovery. Thirteen unique user-friendly SAS macros to perform
statistical data mining are described in the book. Instructions are given in the
book for downloading the compiled SAS macro files and the macro-call files
from the book’s Web site and for running the macros. No experience in
modifying SAS macros or programming with SAS is needed to run these macros.
Complete analysis in less than 10 min.: After preparing the data, complete
predictive modeling, including data exploration, model fitting, assumption checks,
validation, and scoring new data, can be performed on SAS datasets in less
than 10 min.
SAS Enterprise Miner not required: The user-friendly macros work with the
standard SAS modules: BASE, STAT, GRAPH, and IML. Neither additional
SAS modules nor SAS Enterprise Miner is required.
No experience in SAS ODS required: Options are available in the SAS macros
included in the book to save data mining output and graphics in RTF,
HTML, and PDF format using the new SAS ODS features.
More than 150 figures included in this second edition: These statistical data mining
techniques stress the use of visualization to thoroughly study the structure
of data and to check the validity of statistical models fitted to data. This
allows readers to visualize the trends and patterns present in their database.

Textbook or a Supplementary Lab Guide


This book is suitable for adoption as a textbook for a statistical methods course in
statistical data mining and research methods. This book provides instructions and
tools for quickly performing complete exploratory statistical analysis, regression
analysis, logistic regression, multivariate methods, and classification analysis. Thus,
it is ideal for graduate level statistical methods courses that use SAS software.
Some examples of potential courses:

◾◾ Biostatistics
◾◾ Research methods in public health
◾◾ Advanced business statistics
◾◾ Applied statistical methods
◾◾ Research methods
◾◾ Advanced data analysis

What Is New in the Second Edition?


◾◾ Active internet connection is no longer required to run these macros: After
downloading the compiled SAS macros and the macro-call files and installing them
in the C:\ drive, users can access these macros directly from their desktop.
◾◾ Compatible with version 9: All the SAS macros are compatible with SAS
versions 9.1.3 and 9.2 for Windows (32 bit and 64 bit).
◾◾ Compatible with SAS EG: Users can run these SAS macros in the SAS Enterprise
Guide (4.1 and 4.2) code window and in SAS Learning Edition 4.1 by using
the special macro-call files and special macro files included in the downloadable
zip file. (See Appendixes I and III for more information.)
◾◾ Convenient help file location: The help files for all 13 included macros are now
separated from the chapters and included in Appendix II.
◾◾ Publication quality graphics: Vector graphics formats such as EMF can be
generated when output file format TXT is chosen. Interactive ActiveX graphics
can be produced when Web output format is chosen.
◾◾ Macro-call error check: The macro-call input values are copied to the first 10
title statements on the first page of the output files. This helps to track
macro input errors quickly.

Additionally, the following new features are included in the specific SAS macro
applications:

I. Chapter 2
a. Converting PC data files to SAS data (EXLSAS2 macro)
−− All numeric (m) and categorical (n) variables in the Excel file are converted to
X1-Xm and C1-Cn, respectively. However, the original column names will be
used as the variable labels in the SAS data. This new feature helps to maximize
the power of the user-friendly SAS macro applications included in the book.
−− Options for renaming any X1-Xm or C1-Cn variables in a SAS data step are
available in the EXLSAS2 macro application.
−− Using the SAS ODS graphics features in version 9.2, frequency distribution
displays of all categorical variables will be generated when WORD,
HTML, PDF, or TXT format is selected as the output file format.
b. Randomly splitting data (RANSPLIT2)
−− Many different sampling methods such as simple random sampling, stratified
random sampling, systematic random sampling, and unrestricted random
sampling are implemented using the SAS SURVEYSELECT procedure.

II. Chapter 3
a. Frequency analysis (FREQ2)
−− For one-way frequency analysis, the Gini and Entropy indexes are
reported automatically.
−− Confidence interval estimates for percentages in frequency tables are
automatically generated using the SAS SURVEYFREQ procedure. If
survey weights are specified, then these confidence interval estimates are
adjusted for survey sampling and design structures.
b. Univariate analysis (UNIVAR2)
−− If survey weights are specified, then the reported confidence interval
estimates are adjusted for survey sampling and design structures using
the SURVEYMEANS procedure.

III. Chapter 4
a. PCA and factor analysis (FACTOR2)
−− PCA and factor analysis can be performed using the covariance matrix.
−− Estimation of Cronbach coefficient alpha and its 95% confidence interval
when performing latent factor analysis.
−− Factor pattern plots (New 9.2: statistical graphics feature) before and
after rotation.
−− Assessing the significance and the nature of factor loadings (New 9.2:
statistical graphics feature).
−− Confidence interval estimates for factor loading when ML factor analysis
is used.
b. Disjoint cluster analysis (DISJCLUS2)

IV. Chapter 5
a. Multiple linear regressions (REGDIAG2)
−− Variable screening step using GLMSELECT and best candidate model
selection using AICC and SBC.
−− Interaction diagnostic plots for detecting significant interaction between
two continuous variables or between a categorical and continuous
variable.
−− Options are implemented to run robust regression using SAS
ROBUSTREG when extreme outliers are present in the data.
−− Options are implemented to run survey regression using SAS
SURVEYREG when the data come from a survey and the
design weights are available.

b. Logistic regression (LOGIST2)
−− Best candidate model selection using AICC and SBC criteria by comparing
all possible combinations of models within an optimum number of
subsets determined by the sequential stepwise selection using AIC.
−− Interaction diagnostic plots for detecting significant interaction between two
continuous variables or between a categorical and continuous variable.
−− LIFT charts for assessing the overall model fit are automatically generated.
−− Options are implemented to run survey logistic regression using SAS
PROC SURVEYLOGISTIC when the data come from a survey
and the design weights are available.

V. Chapter 6
a. CHAID analysis (CHAID2)
−− Large data (>1000 obs) can be used.
−− Variable selection using forward and stepwise selection and backward
elimination methods.
−− New SAS SGPLOT graphics are used in data exploration.

Potential Audience
◾◾ This book is suitable for SAS data analysts who need to apply data mining
techniques using existing SAS modules, without investing in new software
products or spending time learning additional software.
◾◾ Graduate students in business, health sciences, biological sciences, engineering,
and social sciences can successfully complete data analysis projects quickly using
these SAS macros.
◾◾ Big business enterprises can use data mining SAS macros in pilot studies
that assess the feasibility of a successful data mining endeavor
before investing big bucks in full-scale data mining using SAS EM.
◾◾ Finally, any SAS users who want to impress their boss can do so with quick and
complete data analysis, including fancy reports in PDF, RTF, or HTML format.

Additional Resources
Book’s Web site: A Web site has been set up at http://www.cabnr.unr.edu/gf/dm.
Users can find information about downloading the sample data files used in
the book, as well as additional reading materials. Users are also encouraged to visit this
page for information on any errors in the book, SAS macro updates, and links to
additional resources.


Acknowledgments

I am indebted to many individuals who have directly and indirectly contributed
to the development of this book. I am grateful to my professors, colleagues,
and my former and present students who have presented me with consulting
problems over the years that have stimulated me to develop this book and
the accompanying SAS macros. I would also like to thank the University of
Nevada–Reno and the Center for Research Design and Analysis faculty and
staff for their support during the time I spent on writing the book and in revis-
ing the SAS macros.
I have received constructive comments about this book from many CRC Press
anonymous reviewers, whose advice has greatly improved this edition. I would like
to acknowledge the contribution of the CRC Press staff from the conception to the
completion of this book. I would also like to thank the SAS Institute for providing
me with an opportunity to continuously learn about this powerful software for the
past 23 years and allowing me to share my SAS knowledge with other users.
I owe a great debt of gratitude to my family for their love and support as well
as their great sacrifice during the last 12 months while I was working on this book.
I cannot forget to thank my late dad, Pancras Fernandez, and my late grandpa,
George Fernandez, for their love and support, which helped me to take on challenging
projects and succeed. Finally, I would like to thank the most important person
in my life, my wife Queency Fernandez, for her love, support, and encouragement
that gave me the strength to complete this book project within the deadline.

George Fernandez
University of Nevada-Reno
[email protected]


About the Author

George Fernandez, Ph.D., is a professor of applied statistical methods and serves
as the director of the Center for Research Design and Analysis, University of
Nevada, Reno. His publications include an applied statistics book, a CD-ROM, 60 journal
papers, and more than 30 conference proceedings. Dr. Fernandez has more than 23
years of experience teaching applied statistics courses and SAS programming.
He has won several best-paper and poster presentation awards at regional and
international conferences. He has presented several invited full-day workshops on
applications of user-friendly statistical methods in data mining for the American
Statistical Association, including the joint meeting in Atlanta (2001); Western SAS*
users conference in Arizona (2000), in San Diego (2002) and San Jose (2005); and
at the 56th Deming’s conference, Atlantic City (2003). He was keynote speaker
and workshop presenter for the 16th Conference on Applied Statistics, Kansas State
University, and full-day workshop presenter at the 57th session of the International
Statistical Institute conference at Durban, South Africa (2009). His recent paper,
“A new and simpler way to calculate body’s Maximum Weight Limit–BMI made
simple,” has received worldwide recognition.

* This was originally an acronym for statistical analysis system. Since its founding and adoption
of the term as its trade name, the SAS Institute, headquartered in North Carolina, has consid-
erably broadened its scope.


Chapter 1

Data Mining: A Gentle Introduction

1.1 Introduction
Data mining, or knowledge discovery in databases (KDD), is a powerful infor-
mation technology tool with great potential for extracting previously unknown
and potentially useful information from large databases. Data mining automates
the process of finding relationships and patterns in raw data and delivers results
that can either be utilized in an automated decision support system or assessed by
decision makers. Many successful enterprises practice data mining for intelligent
decision making [1]. Data mining allows the extraction of nuggets of knowledge
from business data that can help enhance customer relationship management
(CRM) [2] and can help estimate the return on investment (ROI) [3]. Using powerful
advanced analytical techniques, data mining enables institutions to turn raw
data into valuable information and thus gain a critical competitive advantage.
With data mining, the possibilities are endless. Although data mining appli-
cations are popular among forward-thinking businesses, other disciplines that
maintain large databases could reap the same benefits from properly carried out
data mining. Some of the potential applications of data mining include charac-
terizations of genes in animal and plant genomics, clustering and segmentations
in remote sensing of satellite image data, and predictive modeling in wildfire inci-
dence databases.
The purpose of this chapter is to introduce data mining concepts, provide some
examples of data mining applications, list the most commonly used data min-
ing techniques, and briefly discuss the data mining applications available in the
SAS software. For a thorough discussion of data mining concepts, methods, and
applications, see the following publications [4–6].

1.2 Data Mining: Why It Is Successful in the IT World

In today’s world, we are overwhelmed with data and information from various sources.
Advances in the field of IT make the collection of data easier than ever before. A business
enterprise has various systems, such as a transaction processing system, an HR
management system, and an accounting system, and each of these systems collects huge
piles of data every day. Data mining is an important part of business intelligence, which
deals with how an organization uses, analyzes, manages, and stores the data it collects
from various sources to make better decisions. Businesses that have already invested in
business intelligence solutions will be in a better position to take the right measures
to survive and continue their growth. Data mining solutions provide analytical insight
into the performance of an organization based on historical data, but the economic
impact on an organization is linked to many issues and, in many cases, to external
forces and unscrupulous activities. The failure to predict these influences does not
undermine the role of data mining for organizations but, on the contrary, makes it more
important, especially for government regulatory bodies, to predict and identify such
practices in advance and take the necessary measures to avoid such circumstances in the
future. The main components of data mining success are described in the following
subsections.

1.2.1 Availability of Large Databases: Data Warehousing


Data mining derives its name from the fact that analysts search for valuable informa-
tion in gigabytes of huge databases. For the past two decades, we have seen a dramatic
increase—at an explosive rate—in the amount of data being stored in electronic
format. The increase in the use of electronic data-gathering devices such as point-
of-sale, Web logging, or remote sensing devices has contributed to this explosion of
available data. The amount of data accumulated each day by various businesses and
scientific and governmental organizations around the world is daunting. With data
warehousing, business enterprises can collect data from any source within or outside
the organization, reorganize the data, and place it in new dynamic storage for efficient
utilization. Business enterprises of all kinds now computerize their business
activities and strengthen their ability to manage valuable data resources. Databases
of one hundred gigabytes are now common, and terabyte (1000 GB) databases are now
feasible in enterprises. Data warehousing techniques enable forward-thinking businesses
to collect, save, maintain, and retrieve data in a more productive way.
Data warehousing (DW) collects data from many different sources, reorganizes
it, and stores it within a readily accessible repository. A DW should support
relational, hierarchical, and multidimensional database management systems and
be designed specifically to meet the needs of data mining. A DW can be loosely
defined as any centralized data repository that makes it possible to extract archived
operational data and overcome inconsistencies between different data formats.
Thus, data mining and knowledge discovery from large databases become feasible
and productive with the development of cost-effective data warehousing.
A successful data warehousing operation should have the potential to integrate
data wherever it is located and whatever its format. It should provide the busi-
ness analyst with the ability to quickly and effectively extract data tables, resolve
data quality problems, and integrate data from different sources. If the quality of
the data is questionable, then business users and decision makers cannot trust the
results. In order to fully utilize data sources, data warehousing should allow you
to make use of your current hardware investments, as well as provide options for
growth as your storage needs expand. Data warehousing systems should not limit
customer choices, but instead should provide a flexible architecture that accommo-
dates platform-independent storage and distributed processing options.
Data quality is a critical factor for the success of data warehousing projects.
If business data is of an inferior quality, then the business analysts who query the
database and the decision makers who receive the information cannot trust the
results. High-quality individual records are necessary to ensure that the data is
accurate, up to date, and consistently represented in the data warehouse.

1.2.2 Price Drop in Data Storage and Efficient Computer Processing
Data warehousing became easier, more efficient, and cost-effective as the cost of
data processing and database development dropped. The need for improved and
effective computer processing can now be met in a cost-effective manner with par-
allel multiprocessor computer technology. In addition to the recent enhancement
of exploratory graphical statistical methods, the introduction of new machine-learning
methods based on logic programming, artificial intelligence, and genetic
algorithms has opened the doors for productive data mining. When data mining
tools are implemented on high-performance parallel-processing systems, they can
analyze massive databases in minutes. Faster processing means that users can auto-
matically experiment with more models to understand complex data. High speed
makes it practical for users to analyze huge quantities of data.

1.2.3 New Advancements in Analytical Methodology


Data mining algorithms embody techniques that have existed for at least 10 years,
but have only recently been implemented as mature, reliable, understandable tools
that consistently outperform older methods. Advanced analytical models and algo-
rithms, including data visualization and exploration, segmentation and cluster-
ing, decision trees, neural networks, memory-based reasoning, and market basket
analysis, provide superior analytical depth. Thus, quality data mining is now fea-
sible with the availability of advanced analytical solutions.

1.3 Benefits of Data Mining


For businesses that use data mining effectively, the payoffs can be huge. By applying
data mining effectively, businesses can fully utilize data about customers’ buying
patterns and behavior, and can gain a greater understanding of customers’ motiva-
tions to help reduce fraud, forecast resource use, increase customer acquisition, and
halt customer attrition. After a successful implementation of data mining, one can
sweep through databases and identify previously hidden patterns in one step. An
example of pattern discovery is the analysis of retail sales data to identify seem-
ingly unrelated products that are often purchased together. Other pattern discov-
ery problems include detecting fraudulent credit card transactions and identifying
anomalous data that could represent data entry keying errors. Some of the specific
benefits associated with successful data mining are listed here:

◾◾ Increase customer acquisition and retention.
◾◾ Uncover and reduce fraud (by determining whether a particular transaction is outside the
normal range of a person’s activity and flagging that transaction for verification).
◾◾ Improve production quality, and minimize production losses in manufacturing.
◾◾ Increase upselling (offering customers a higher level of services or products
such as a gold credit card versus a regular credit card) and cross-selling (selling
customers more products based on what they have already bought).
◾◾ Sell products and services in combinations based on market-basket analysis (by
determining what combinations of products are purchased at a given time).
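
The fraud item in the list above describes flagging a transaction that falls outside the normal range of a person's activity. One simple way to make that idea concrete is sketched below in Python, purely for illustration; it is not one of the book's SAS macros. The function name flag_unusual, the three-standard-deviation threshold k, and the sample amounts are all assumptions introduced here, and real fraud screening would use far richer rules.

```python
from statistics import mean, stdev


def flag_unusual(history, amount, k=3.0):
    """Return True when `amount` lies more than k sample standard
    deviations from the mean of the customer's past amounts.

    `history`, `amount`, and `k` are hypothetical names for this sketch.
    """
    if len(history) < 2:
        # Not enough history to define a normal range.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts identical: any different amount is unusual.
        return amount != mu
    return abs(amount - mu) > k * sigma


# Made-up past transaction amounts for one customer.
past = [42.0, 38.5, 51.0, 47.25, 40.0, 44.75]
typical_flag = flag_unusual(past, 45.0)    # close to the usual spending level
extreme_flag = flag_unusual(past, 900.0)   # far outside the usual range
print(typical_flag, extreme_flag)
```

A flagged transaction would then be passed on for verification rather than rejected outright, which matches the wording of the list item.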

1.4 Data Mining: Users


A wide range of companies have deployed successful data mining applications recently [1].
While the early adopters of data mining belong mainly to information-intensive indus-
tries such as financial services and direct mail marketing, the technology is applicable
to any institution looking to leverage a large data warehouse to extract information
that can be used in intelligent decision making. Data mining applications reach across
industries and business functions. For example, telecommunications, stock exchanges,
credit card, and insurance companies use data mining to detect fraudulent use of their
services; the medical industry uses data mining to predict the effectiveness of surgical
procedures, diagnostic medical tests, and medications; and retailers use data mining
to assess the effectiveness of discount coupons and sales promotions. Data mining has
many varied fields of application, some of which are listed as follows:

◾◾ Retail/Marketing: An example of pattern discovery in retail sales is to identify
seemingly unrelated products that are often purchased together. Market-basket
analysis is a technique that examines a long list of transactions in
order to determine which items are most frequently purchased together. The
results can be useful to any company that sells products, whether it is in a
store, a catalog, or directly to the customer.
◾◾ Banking: A credit card company can leverage its customer transaction database
to identify customers most likely to be interested in a new credit product.
Using a small test mailing, the characteristics of customers with an affinity
for the product can be identified. Data mining tools can also be used to
detect patterns of fraudulent credit card use, including detecting fraudulent
credit card transactions and identifying anomalous data that could represent
data entry keying errors. Data mining also identifies “loyal” customers, predicts
customers likely to change their credit card affiliation, determines credit card
spending by customer groups, finds hidden correlations between different financial
indicators, and identifies stock trading rules from historical market data.
◾◾ Insurance and health care: Data mining supports claims analysis, that is, identifying
which medical procedures are claimed together. It predicts which customers will buy
new policies, identifies behavior patterns of risky customers, and identifies
fraudulent behavior.
◾◾ Transportation: State departments of transportation and federal highway
institutes can develop performance and network optimization models to pre-
dict the life-cycle cost of road pavement.
◾◾ Product manufacturing companies: They can apply data mining to improve
their sales process to retailers. Data from consumer panels, shipments, and
competitor activity can be applied to understand the reasons for brand
and store switching. Through this analysis, manufacturers can select promotional
strategies that best reach their target customer segments. Loading patterns
can be analyzed, and the distribution schedules among outlets can be
determined.
◾◾ Health care and pharmaceutical industries: Pharmaceutical companies can
analyze their recent sales records to improve their targeting of high-value
physicians and determine which marketing activities will have the greatest
impact in the next few months. The ongoing, dynamic analysis of the data
warehouse allows the best practices from throughout the organization to be
applied in specific sales situations.
◾◾ Internal Revenue Service (IRS) and Federal Bureau of Investigation (FBI): The
IRS uses data mining to track federal income tax frauds. The FBI uses data
mining to detect any unusual pattern or trends in thousands of field reports
to look for any leads in terrorist activities.
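
The Retail/Marketing item above describes market-basket analysis as scanning a list of transactions for items that are frequently purchased together. A minimal sketch of that counting step is given below in Python, for illustration only; it is not the book's SAS approach. The function name frequent_pairs, the min_support cutoff, and the sample baskets are assumptions made for this example, and production tools usually go further and report association rules with confidence and lift.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support=0.5):
    """Return item pairs whose support (share of transactions that
    contain both items) is at least `min_support`.

    All names here are hypothetical and chosen for this sketch.
    """
    n = len(transactions)
    counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each pair is counted once per basket
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Made-up transactions for illustration.
baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]
pairs = frequent_pairs(baskets, min_support=0.5)
for pair, support in sorted(pairs.items()):
    print(pair, support)
```

Pairs that pass the support cutoff are the candidates a retailer might consider for bundling or joint promotion.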


1.5 Data Mining: Tools


All data mining methods used now have evolved from the advances in computer
engineering, statistical computation, and database research. Data mining methods
are not meant to replace traditional statistical methods but to extend the
use of statistical and graphical techniques. Once it was thought that automated
data mining tools would eliminate the need for statistical analysts to build pre-
dictive models. However, the value that an analyst provides cannot be automated
out of existence. Analysts will still be needed to assess model results and validate
the plausibility of the model predictions. Since data mining software lacks the
human experience and intuition to recognize the difference between a relevant
and irrelevant correlation, statistical analysts will remain in great demand.

1.6 Data Mining: Steps


1.6.1 Identification of Problem and Defining the Data Mining Study Goal
One of the main causes of data mining failure is not defining the study goals based
on short- and long-term problems facing the enterprise. The data mining specialist
should define the study goal in clear and sensible terms of what the enterprise hopes
to achieve and how data mining can help. Well-identified study problems lead to
well-formulated data mining goals and to data mining solutions geared toward
measurable outcomes.4

1.6.2 Data Processing
The key to successful data mining is using the right data. Preparing data for mining
is often the most time-consuming aspect of any data mining endeavor. A typical
data structure suitable for data mining should contain observations (e.g., custom-
ers and products) in rows and variables (demographic data and sales history) in
columns. Also, the measurement levels (interval or categorical) of each variable in
the dataset should be clearly defined. The steps involved in preparing the data for
data mining are as follows:

Preprocessing: This is the data-cleansing stage, where information that is
deemed unnecessary and may slow down queries is removed. The data is also
checked to ensure that formats (for dates, zip codes, currency, units of
measurement, etc.) are consistent. Inconsistent formats are always possible
in the database because the data is drawn from several sources. Data entry
errors and extreme outliers should be removed from the dataset since
influential outliers can affect the modeling results and subsequently limit
the usability of the predicted models.


Data integration: Combining variables from many different data sources is an
essential step since some of the most important variables are stored in
different data marts (customer demographics, purchase data, and business
transactions). The uniformity in variable coding and the scale of measurements
should be verified before combining different variables and observations from
different data marts.
Variable transformation: Sometimes, expressing continuous variables in stan-
dardized units, or in log or square-root scale, is necessary to improve the
model fit that leads to improved precision in the fitted models. Missing value
imputation is necessary if some important variables have large proportions of
missing values in the dataset. Identifying the response (target) and the predic-
tor (input) variables and defining their scale of measurement are important
steps in data preparation since the type of modeling is determined by the
characteristics of the response and the predictor variables.
Splitting database: Sampling is recommended in extremely large databases
because it significantly reduces the model training time. Randomly splitting
the data into “training,” “validation,” and “testing” datasets is very important
in calibrating the model fit and validating the model results. Trends and patterns
observed in the training dataset can be expected to generalize to the complete
database if the training sample sufficiently represents the database.
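
The variable transformation and database-splitting steps above can be sketched in a few lines. The example below is in Python rather than SAS, purely for illustration; it is not one of the SAS macros described in this book. The sales figures, the 60/20/20 split fractions, the fixed random seed, and the function names standardize and split_rows are all assumptions made for the sketch.

```python
import math
import random
import statistics


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def split_rows(rows, fractions=(0.6, 0.2, 0.2), seed=42):
    """Randomly partition rows into training, validation, and testing lists.

    The fractions and the seed are illustrative assumptions, not values
    recommended by the text.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_valid = int(len(shuffled) * fractions[1])
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_valid]
    testing = shuffled[n_train + n_valid:]  # whatever remains
    return training, validation, testing


# Hypothetical, right-skewed sales amounts; a log scale may improve the fit.
sales = [120.0, 95.0, 4300.0, 210.0, 180.0, 75.0, 990.0, 160.0, 240.0, 310.0]
log_sales = [math.log(s) for s in sales]
z_sales = standardize(log_sales)

training, validation, testing = split_rows(list(zip(sales, z_sales)))
print(len(training), len(validation), len(testing))
```

Every row lands in exactly one of the three lists, so the validation and testing rows play no part in fitting the model.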

1.6.3 Data Exploration and Descriptive Analysis


Data exploration includes a set of descriptive and graphical tools that allow explora-
tion of data visually both as a prerequisite to more formal data analysis and as an
integral part of formal model building. It facilitates discovering the unexpected as
well as confirming the expected. The purpose of data visualization is pretty simple:
let the user understand the structure and dimension of the complex data matrix.
Since data mining usually involves extracting “hidden” information from a data-
base, the understanding process can get a bit complicated. The key is to put users
in a context they feel comfortable in, and then let them poke and prod until they
understand what they did not see before. Understanding is undoubtedly the most
fundamental motivation to visualizing the model.
Simple descriptive statistics and exploratory graphics displaying the distribution
pattern and the presence of outliers are useful in exploring continuous variables.
Descriptive statistical measures such as the mean, median, range, and standard
deviation of continuous variables provide information regarding their distribu-
tional properties and the presence of outliers. Frequency histograms display the
distributional properties of the continuous variable. Box plots provide an excellent
visual summary of many important aspects of a distribution. The box plot is based
on the five-number summary: the median, the quartiles, and the extreme values.
One-way and multiway frequency tables of categorical data are useful in
summarizing group distributions, relationships between groups, and checking for
rare events. Bar charts show frequency information for categorical variables and
display differences among the groups in them. Pie charts compare the levels
or classes of a categorical variable to each other and to the whole. They use the size
of pie slices to graphically represent the value of a statistic for a data range.
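
As a rough illustration of the five-number summary behind a box plot, the Python sketch below (Python is used only for illustration; it is not one of the book's SAS macros) computes the summary for a small hypothetical sample and flags extreme values. The 1.5 x IQR cutoff is a common box-plot convention assumed here, not a rule stated in the text, and the function names are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    k = 1.5 is an assumed convention, not a threshold from the text.
    """
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical ages with one extreme value.
ages = [23, 25, 27, 29, 31, 33, 35, 37, 95]
print(five_number_summary(ages))
print(flag_outliers(ages))
```

Whether a flagged value is a data entry error or a genuine observation remains a judgment call for the analyst.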

1.6.4 Data Mining Solutions: Unsupervised Learning Methods


Unsupervised learning methods are used in many fields under a wide variety of
names. No distinction between the response and predictor variable is made in unsu-
pervised learning methods. The most commonly practiced unsupervised methods
are latent variable models (principal component and factor analyses), disjoint clus-
ter analyses, and market-basket analysis.

◾◾ Principal component analysis (PCA): In PCA, the dimensionality of
multivariate data is reduced by transforming the correlated variables into
linearly transformed uncorrelated variables.
◾◾ Factor analysis (FA): In FA, a few uncorrelated hidden factors that explain the
maximum amount of common variance and are responsible for the observed
correlation among the multivariate data are extracted.
◾◾ Disjoint cluster analysis (DCA): DCA is used for combining cases into groups
or clusters such that each group or cluster is homogeneous with respect to
certain attributes.
◾◾ Association and market-basket analysis: Market-basket analysis is one of the
most common and useful types of data analysis for marketing. Its purpose
is to determine what products customers purchase together. Knowing what
products consumers purchase as a group can be very helpful to a retailer or
to any other company.
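
To make the market-basket idea concrete, the Python sketch below (illustrative only; it is not SAS code from this book) counts how often pairs of products are purchased together. The baskets are hypothetical, and the two measures computed, support and confidence, are standard association-rule quantities that the text above does not name.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))


def support(pair):
    """Fraction of all baskets that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)


def confidence(antecedent, consequent):
    """Share of baskets containing the antecedent that also hold the consequent."""
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    return both / item_counts[antecedent]


print(support(("bread", "milk")))
print(confidence("butter", "bread"))
```

Confidence is directional: in this toy data every basket with butter also contains bread, but only half of the baskets with bread contain butter.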

1.6.5 Data Mining Solutions: Supervised Learning Methods


The supervised predictive models include both classification and regression models.
Classification models use a categorical response, whereas regression models use
continuous and binary variables as targets. In regression, we want to approximate the
regression function, while in classification problems, we want to approximate the
probability of class membership as a function of the input variables. Predictive mod-
eling is a fundamental data mining task. It is an approach that reads training data
composed of multiple input variables and a target variable. It then builds a model that
attempts to predict the target on the basis of the inputs. After this model is developed,
it can be applied to new data that is similar to the training data, but that does not
contain the target.
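
The train-then-score cycle described above can be sketched with the simplest possible regression model. The Python code below is illustrative only and is not one of the book's SAS macros; the data, the single input variable, and the function names fit_line and score are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x on training data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def score(model, xs):
    """Apply a fitted model to new inputs that carry no target value."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


# Hypothetical training data: advertising spend (input) and sales (target).
train_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
train_sales = [3.1, 4.9, 7.2, 8.8, 11.0]

model = fit_line(train_spend, train_sales)

# New data similar to the training data, but without the target.
new_spend = [6.0, 7.0]
predicted_sales = score(model, new_spend)
print(model, predicted_sales)
```

The predictions are only as trustworthy as the similarity between the new inputs and the training data; here the new spend values extrapolate beyond the training range.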

◾◾ Multiple linear regressions (MLRs): In MLR, the association between the two
sets of variables is described by a linear equation that predicts the continuous
response variable from a function of predictor variables.
◾◾ Logistic regression: This allows a binary or an ordinal variable as the response
variable and allows the construction of more complex models than
straight linear models.
◾◾ Neural net (NN) modeling: This can be used for both prediction and classification.
NN models enable the construction, training, and validation of multilayer
feed-forward network models for modeling large data and complex interactions
with many predictor variables. NN models usually contain more parameters
than a typical statistical model, the results are not easily interpreted,
and no explicit rationale is given for the prediction. All variables are treated
as numeric, and all nominal variables are coded as binary. Relatively more
training time is needed to fit the NN models.
◾◾ Classification and regression tree (CART): These models are useful in
generating binary decision trees by splitting the subsets of the dataset
using all predictor variables to create two child nodes repeatedly, begin-
ning with the entire dataset. The goal is to produce subsets of the data
that are as homogeneous as possible with respect to the target variable.
Continuous, binary, and categorical variables can be used as response
variables in CART.
◾◾ Discriminant function analysis: This is a classification method used to deter-
mine which predictor variables discriminate between two or more natu-
rally occurring groups. Only categorical variables are allowed to be the
response variable, and both continuous and ordinal variables can be used as
predictors.
◾◾ CHAID decision tree (Chi-square Automatic Interaction Detector): This is a
classification method used to study the relationships between a categorical
response measure and a large series of possible predictor variables, which may
interact among one another. For qualitative predictor variables, a series of chi-
square analyses are conducted between the response and predictor variables
to see if splitting the sample based on these predictors leads to a statistically
significant discrimination in the response.
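
The chi-square comparison that CHAID applies to each candidate predictor can be illustrated with a small Python sketch (again illustrative only, not SAS code from this book). It computes only the Pearson chi-square statistic for a contingency table of predictor level by response class; a full CHAID implementation also converts the statistic to an adjusted p-value and merges similar categories, which is not shown. The counts and variable names are hypothetical.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if predictor and response were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are predictor levels, columns are the
# response classes (responded, did not respond).
by_region = [[30, 70], [60, 40]]
by_gender = [[45, 55], [45, 55]]

print(chi_square(by_region))
print(chi_square(by_gender))
```

In this toy data the response rate differs sharply between the two regions but not between the two genders, so region yields the larger statistic and would be the more promising variable to split on.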

1.6.6 Model Validation
Validating models obtained from training datasets by independent validation data-
sets is an important requirement in data mining to confirm the usability of the
developed model. Model validation assesses the quality of the model fit and protects
against overfitted or underfitted models. Thus, it could be considered as the most
important step in the model-building sequence.
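
One way to see what validation protects against is to compare a model's error on the training data with its error on an independent validation set. The Python sketch below is illustrative only (it is not one of the book's SAS macros); the data, the root mean squared error measure, and the two deliberately poor models are assumptions chosen to make the contrast visible.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def memorizer(train_x, train_y):
    """Overfit-prone model: return the target of the nearest training input."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_y[nearest]
    return predict


def mean_model(train_y):
    """Underfit-prone model: always return the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


# Hypothetical training and validation data.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.2, 3.9, 6.3, 7.8]
valid_x, valid_y = [1.5, 2.5, 3.5], [3.0, 5.0, 7.0]

for name, model in [("memorizer", memorizer(train_x, train_y)),
                    ("mean", mean_model(train_y))]:
    train_error = rmse(train_y, [model(x) for x in train_x])
    valid_error = rmse(valid_y, [model(x) for x in valid_x])
    print(name, round(train_error, 3), round(valid_error, 3))
```

The memorizer reproduces the training targets exactly yet errs on the validation data, the typical signature of overfitting; the mean model shows a large error on both sets, which points to underfitting.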


1.6.7 Interpret and Make Decisions


Decision making is one of the most critical steps for any successful business. No
matter how good you are at making decisions, you know that making an intel-
ligent decision is difficult. The patterns identified by the data mining solutions
can be interpreted into knowledge, which can then be used to support business
decision making.

1.7 Problems in the Data Mining Process


Many of the so-called data mining solutions currently available on the market
today either do not integrate well, are not scalable, or are limited to one or two
modeling techniques or algorithms. As a result, highly trained quantitative experts
spend more time trying to access, prepare, and manipulate data from disparate
sources, and less time modeling data and applying their expertise to solve busi-
ness problems. And the data mining challenge is compounded even further as the
amount of data and complexity of the business problems increase. Databases are
often designed for purposes other than data mining, so properties
or attributes that would simplify the learning task are not present, nor can they
be requested from the real world.
Data mining solutions rely on databases to provide the raw data for modeling,
and this raises problems in that databases tend to be dynamic, incomplete, noisy,
and large. Other problems arise as a result of the adequacy and relevance of the
information stored. Databases are usually contaminated by errors, so it cannot be
assumed that the data they contain is entirely correct. Attributes that rely on
subjective judgments or on measurements can give rise to errors, so that
some examples may even be misclassified. Errors in either the values of attributes
or class information are known as noise. Obviously, where possible, it is desirable to
eliminate noise from the classification information as this affects the overall accu-
racy of the generated rules. Therefore, adopting a software system that provides a
complete data mining solution is crucial in the competitive environment.

1.8 SAS Software: The Leader in Data Mining


SAS Institute,7 the industry leader in analytical and decision-support solutions,
offers a comprehensive data mining solution that allows you to explore large quanti-
ties of data and discover relationships and patterns that lead to proactive decision
making. The SAS data mining solution provides business technologists and quan-
titative experts the necessary tools to obtain the enterprise knowledge for helping
their organizations to achieve a competitive advantage.


1.8.1 SEMMA: The SAS Data Mining Process


The SAS data mining solution is considered a process rather than a set of analytical
tools. The acronym SEMMA8 refers to a methodology that clarifies this process.
Beginning with a statistically representative sample of your data, SEMMA makes it
easy to apply exploratory statistical and visualization techniques, select and trans-
form the most significant predictive variables, model the variables to predict out-
comes, and confirm a model’s accuracy. The steps in the SEMMA process include
the following:

Sample your data by extracting a portion of a large dataset big enough to contain
the significant information, and yet small enough to manipulate quickly.
Explore your data by searching for unanticipated trends and anomalies in order
to gain understanding and ideas.
Modify your data by creating, selecting, and transforming the variables to focus
on the model selection process.
Model your data by allowing the software to search automatically for a combina-
tion of data that reliably predicts a desired outcome.
Assess your data by evaluating the usefulness and reliability of the findings from
the data mining process.

By assessing the results gained from each stage of the SEMMA process, you can
determine how to model new questions raised by the previous results, and thus pro-
ceed back to the exploration phase for additional refinement of the data. The SAS
data mining solution integrates everything you need for discovery at each stage of
the SEMMA process. These data mining tools indicate patterns or exceptions and
mimic human abilities for comprehending spatial, geographical, and visual infor-
mation sources. Complex mining techniques are carried out in a totally code-free
environment, allowing you to concentrate on the visualization of the data, discov-
ery of new patterns, and new questions to ask.

1.8.2 SAS Enterprise Miner for Comprehensive
Data Mining Solution
Enterprise Miner,9,10 SAS Institute’s enhanced data mining software, offers an inte-
grated environment for businesses that need to conduct comprehensive data mining.
Enterprise Miner combines a rich suite of integrated data mining tools, empower-
ing users to explore and exploit huge databases for strategic business advantages.
In a single environment, Enterprise Miner provides all the tools needed to match
robust data mining techniques to specific business problems, regardless of the
amount or source of data, or complexity of the business problem. However, many
small businesses, nonprofit institutions, and universities do not currently use
SAS Enterprise Miner, although they are licensed to use the SAS BASE, STAT,
and GRAPH modules. Thus, these user-friendly SAS macro applications for data
mining are targeted at this group of customers. Also, providing the complete SAS
code for performing comprehensive data mining solutions is not very effective
because a majority of the business and statistical analysts are not experienced SAS
programmers. Quick results from data mining are not feasible if analysts must
work directly with SAS program code, since many hours of code modification and
debugging are then required.

1.9 Introduction of User-Friendly SAS
Macros for Statistical Data Mining
As an alternative to the point-and-click menu interface modules, the user-friendly
SAS macro applications for performing several data mining tasks are included in
this book. This macro approach integrates the statistical and graphical tools avail-
able in SAS systems and provides user-friendly data analysis tools that allow the
data analysts to complete data mining tasks quickly without writing SAS programs
by running the SAS macros in the background. Detailed instructions and help files
for using the SAS macros are included in each chapter. Using this macro approach,
analysts can effectively and quickly perform complete data analysis and spend more
time exploring data and interpreting graphs and output rather than debugging
their program errors, etc. The main advantages of using these SAS macros for data
mining are as follows:

◾◾ Users can perform comprehensive data mining tasks by inputting the macro
parameters in the macro-call window and by running the SAS macro.
◾◾ SAS code required for performing data exploration, model fitting, model
assessment, validation, prediction, and scoring is included in each macro.
Thus, complete results can be obtained quickly by using these macros.
◾◾ Experience in SAS output delivery system (ODS) is not required because
options for producing SAS output and graphics in RTF, WEB, and PDF are
included within the macros.
◾◾ Experience in writing SAS program code or SAS macros is not required to
use these macros.
◾◾ SAS-enhanced data mining software Enterprise Miner is not required to run
these SAS macros.
◾◾ All SAS macros included in this book use the same simple user-friendly format.
Thus, minimum training time is needed to master the usage of these macros.
◾◾ Regular updates to the SAS macros will be posted on the book Web site. Thus,
readers can always use the updated features in the SAS macros by download-
ing the latest versions.


1.9.1 Limitations of These SAS Macros


These SAS macros do not use SAS Enterprise Miner. Thus, SAS macros are not
included for performing neural net, CART, and market-basket analysis since these
data mining tools require the SAS special data mining software SAS Enterprise
Miner.

1.10 Summary
Data mining is a journey—a continuous effort to combine your enterprise knowl-
edge with the information you extracted from the data you have acquired. This
chapter briefly introduces the concept and applications of data mining techniques;
that is, the secret and intelligent weapon that unleashes the power in your data.
SAS Institute, the industry leader in analytical and decision support solutions,
provides the powerful software called Enterprise Miner to perform complete data
mining solutions. However, many small businesses and academic institutions do not have
the license to use the application, but they have the license for SAS BASE, STAT,
and GRAPH. As an alternative to the point-and-click menu interface modules,
user-friendly SAS macro applications for performing several statistical data mining
tasks are included in this book. Instructions are given in the book for downloading
and applying these user-friendly SAS macros for producing quick and complete
data mining solutions.

References
1. SAS Institute Inc., Customer success stories at http://www.sas.com/success/ (last
accessed 10/07/09).
2. SAS Institute Inc., Customer relationship management (CRM) at
http://www.sas.com/solutions/crm/index.html (last accessed 10/07/09).
3. SAS Institute Inc., SAS Enterprise Miner product review at
http://www.sas.com/products/miner/miner_review.pdf (last accessed 10/07/09).
4. Two Crows Corporation, Introduction to Data Mining and Knowledge Discovery, 3rd
ed., 1999 at http://www.twocrows.com/intro-dm.pdf.
5. Berry, M. J. A. and Linoff, G. S., Data Mining Techniques: For Marketing, Sales, and
Customer Support, John Wiley & Sons, New York, 1997.
6. Berry, M. J. A. and Linoff, G. S., Mastering Data Mining: The Art and Science of Customer
Relationship Management, 2nd ed., John Wiley & Sons, New York, 1999.
7. SAS Institute Inc., The Power to Know at http://www.sas.com.
8. SAS Institute Inc., Data Mining Using Enterprise Miner Software: A Case Study Approach,
1st ed., Cary, NC, 2000.
9. SAS Institute Inc., The Enterprise Miner, http://www.sas.com/products/miner/index.html
(last accessed 10/07/09).
10. SAS Institute Inc., The Enterprise Miner standalone tutorial,
http://www.cabnr.unr.edu/gf/dm/em.pdf (last accessed 10/07/09).

reinstatement or to obtain another situation for him, and was
successful. There may be other motives; but here is a point that
must go far to confirm Butler’s declaration that he is the victim of a
conspiracy.’
I listened greedily. I kept my eyes, smarting and burning, fastened
upon my uncle’s face.
‘What is scuttling a ship?’ I asked.
‘Did I not explain? It is boring a hole in her so that she may sink.’
‘Who says that Tom bored a hole in his ship?’
‘Rotch and Nodder and two seamen.’
‘Did they see him bore the hole?’
‘They affirm that they saw the holes which he had bored, and
discovered a tree-nail auger in his cabin.’
‘Oh, he would not do it!’ I cried. ‘It is a lie! He is innocent!’
Here my aunt advised me to go to bed, and said that she herself
could sit up no longer. But I detained my uncle for another half hour
with many feverish, impassioned questions, before I could force
myself from the room, and a church bell struck one through the
stillness of the snowing night as I went to the bedroom that had
been prepared for me.
My uncle was to see Tom next morning at Newgate, and told me he
would inquire the rules and bring about a meeting between my
sweetheart and me as speedily as possible. After breakfast, my box
was put into a coach, and I drove to my house in Stepney. Mr.
Stanford came into the hall to speak to me. I forced a wild smile and
a hurried bow and pushed past. I could not address him nor listen to
what he had to say. When I went upstairs and sat down in my own
room, the room in which Tom and Will had dined with me, where I
had passed hours in sweet musings upon my lover, where there
were many little things he had given me—a picture I had admired, a
screen, a little French chimney clock, above all, his miniature—I
believed my heart was breaking. I wept and wept; I could not stay
my tears. My maid stood beside me, caressed and tried to control
me, then drew off and stood looking at me, afraid.
By-and-by I rallied, and since activity was life to me—for sitting still
and thinking were heart-breaking and soul-withering to one situated
as I was, without a father or a mother to carry her grief to, without
an intimate friend to open herself to—I considered what I should do;
and then I reflected that all the money which I could scrape
together might be needful for Tom’s defence. Thereupon I went
straight to the bank into which my trustees paid my money, and
ascertained how my account stood. I saw the manager of the bank
and asked him to what amount he would allow me to overdraw,
should the need arise, and he told me that I was at liberty to
overdraw to a considerable sum against the security of the title-
deeds of my house, which were in his possession, and which had
been originally lodged at the bank by my father.
This and other errands I went upon helped to kill the day, and the
distraction did me a little good. In the afternoon, before it was dusk,
I walked as far as Ludgate Hill, and turned into the Old Bailey, and
went a little distance up Newgate Street, and continued walking
there that I might be near Tom. I crossed the street and looked at
the horrible walls, dark with the grime of London, and at the spiked
gates, and at a huddle of miserable, tattered wretches at one of
those gates, as though they yearned in their starvation and misery
for the prison food and the shelter of the cells within; and I
wondered in what part behind those fortress-like walls my
sweetheart was, what his thoughts were, what he was doing, if he
was thinking of me as I was of him, until I stamped the pavement in
a sudden agony of mind, and crossed the street to the walls, and
went along the pavement close beside them, to and fro, to and fro.
The dusk drove me away at last, and being very weary, I called a
coach and went to my aunt’s, that I might get the latest news of
Tom. My uncle had had a long interview with my sweetheart in the
morning.
‘He is fairly cheerful and hopeful,’ said he. ‘You will scarcely know
him, though. His anxiety during the long voyage home in the man-
of-war has pinched and wrinkled and shrunk him. You’ll see him to-
morrow. We will go together.’
‘Uncle, you will employ the very best people on his side.’ He named a
well-known Old Bailey pleader of those days. ‘Do not stint in money,
uncle. All that I have in the world is Tom’s,’ I said.
‘The deuce of it is,’ exclaimed my uncle, thumping his knee, ‘we have
no witnesses to call except as to character. It’s four-tongued positive
swearing on one side, and single-tongued negative swearing on the
other.’
So ran our talk. It was all about Tom. As on the previous evening so
now again I kept my kind-hearted uncle up till past midnight with
my feverish questions. My aunt had asked me to sleep in their
house, and I gladly consented, partly that I might be instantly ready
to accompany my uncle to Newgate at the appointed time, and
partly because I dreaded the loneliness of my home, the long and
dismal solitude of the evening and the night in a scene crowded with
memories of my father and my mother and my sweetheart, of my
childhood, of the sunny hours of my holiday rambling and of careless
merry days of independence. I could not sleep, through thinking of
the morrow’s meeting. It was seven months since Tom and I had
kissed and parted. He had sailed away full of hope. He had written
in high spirits. And now he was a prisoner in Newgate; his ship
taken from him; the prospects of the voyage ruined; his innocent,
manly heart infamously shamed and degraded, charged with a crime
which might banish him for ever from England!
‘Do not be shocked,’ said my uncle, in the morning, ‘because you will
not be suffered to speak to him face to face. You will presently see
what I mean. It is mere prison routine—a quite necessary discipline.
There’s nothing in it.’
After all these years I but vaguely remember as much of this horrible
jail as we traversed. My heart beat with a pulse of fever; my sight
fell dim in the gloom after the whiteness of the day outside. I
seemed to see nothing, but I looked always for my sweetheart as we
advanced. I recollect little more than the door of Newgate jail, with
its flanking of huge, black, fortress-like wall, the iron-grated
windows, the heavy, open doors faced with iron, the dark passages,
in one of which hung an oil lamp, and the strange sight beyond this
gloomy passage of stone floor touched with barred sunlight flowing
through an iron grating. Many structural changes have been made in
the interior of Newgate since those days. We entered a passage
walled on either hand by gratings and wirework. Some warders in
high hats and blue coats—warders or constables, I know not which—
stood outside this passage. My uncle was at my side, and we waited
for my sweetheart to appear. There was but one prisoner then
present. He was conversing through the grating with a dark-skinned,
black-eyed woman of about forty, immensely stout and dressed in
many bright colours. He was clothed in the garb of the felon, and
was enormously thick-set and powerfully built; you saw the muscles
of his arms tighten the sleeves of his jacket as he gesticulated with
Hebraic demonstrativeness to the woman whose voice was as harsh
as a parrot’s. His hair was cropped close; where his whiskers and
beard were shaved his skin was a dark coarse blue; he was deeply
pitted with small-pox; his nose lay somewhat flat upon his face with
very thick nostrils; his brows were black and heavily thatched, and
the eyes they protected were coal black as the Indian’s, but
amazingly darting. My uncle looked at him with interest, and
whispered:
‘I was at that man’s trial. He was sentenced to the hulks and to
transportation for life for receiving stolen goods and keeping a
notorious house. He is a Jew prize-fighter, and one of the very best
that ever stood up in a ring. Three years ago he beat the Scotch
champion Sandy Toomer into pulp. He’s a terrible ruffian, and a
villain of the deepest dye, but a noble prize-fighter, and I am sorry
for Barney Abram.’
The felon took no notice of us spite of my uncle staring at him, as
though he had been one of the greatest of living men. I glanced at
the horrid creature, but thought only of Tom.
I was glad of the delay in his coming. I had time to collect myself
and to force an expression of calmness into my face. On a sudden
he appeared! He came in by the side of a warder from the direction
of a yard, in which my uncle afterwards told me prisoners who had
not yet had their trials took the air. He was dressed in his own
clothes, in seafaring apparel somewhat soiled by wear. I had feared
to see him in the vile attire of a convict, and was spared a dreadful
shock, when I looked and beheld my dear one as I remembered
him! But oh! not as I remembered him! He had let his beard grow;
he was shaggy and scarce recognisable with it, and his hair was
longer than formerly. His cheeks were sunk, his eyes dull, like the
eyes of one who has not slept for weeks, his lips pale, his
complexion strange and hardly describable, owing to the pallor that
had sifted through, so to speak, and mottled the sun-brown of his
skin. But his old beauty was there to my love; my heart gave a great
leap when I saw him; and I cried his name and extended my arms
against the wire of the grating.
He looked at me steadfastly for some moments with his teeth hard
set upon his under lip, as though he dared not attempt to speak
until he had conquered his emotion and mastered such tears as burn
like fire in the brain of a man. My uncle gently saluted him through
the bars, and then motioned with his hand, and, taking me by the
arm, led me down to the extremity of this jail meeting-place, and
Tom walked on the opposite side until he was abreast. My uncle then
moved some distance away and stood watching the Jew prize-
fighter. A warder walked leisurely to and fro; and others at a little
distance stood like sentinels.
My sweetheart’s first words were:
‘Marian, before God I am innocent.’
‘Tom, I know it—I know it, dearest, and your innocence shall be
proved.’
‘Before God I am innocent,’ he repeated softly and without passion
in his tones or posture. ‘It is a devilish plot of Rotch to ruin me. I
don’t know why the carpenter Nodder should swear against me. I
had no quarrel with the man. But he’d go to the gallows for drink,
and in that Rotch found his opportunity since he needed a witness.’
‘You will be able to prove your innocence.’
‘Rotch,’ he continued, still speaking softly and without temper, ‘bored
holes in the lazarette; then plugged the lining and hid the auger in
my cabin. Nodder swears that I borrowed the auger from him. A lie,
Marian—a wicked, horrible lie. Why should I borrow an auger? Why
should I, as captain, handle such a tool as that when there is a
carpenter in the ship? Rotch brought some of the men aft to listen to
the water running into the lazarette. He says that he went below to
break out stores and heard it. A hellish lie, Marian. He swears that
he plugged the holes to stop the leaks and came up with the men to
search my cabin. I was in my cabin when they entered, and on the
scoundrel Rotch charging me with attempting to scuttle the barque
and imperilling the lives of the crew, I pulled a pistol out of my
drawer and would have shot him. They threw themselves upon me,
and Rotch called to them to search the cabin, and they found the
auger in the place where the villain had hidden it. But this was not
all. Rotch swore before the Consul at Rio that he had seen me go
into the lazarette, and that he had mentioned the circumstance to
Nodder, but that neither suspected what I was doing until Rotch
himself went below for some boatswain’s stores, and then he heard
the water running in. Marian,’ and here he slightly raised his voice, ‘it
is a conspiracy, artfully planned, artfully executed, artfully related,
with the accursed accident of the over-insured venture to make it
significant as death, and God alone knows how it may go with me.’
A warder paused and looked at us, then passed on.
‘Don’t say that,’ I cried; ‘it breaks my heart to hear you say that. You
are innocent. My uncle will employ clever men. They will question
and question and prove the wretches liars, and our turn will come.’
‘I blundered by over-insuring, but I blundered more fearfully still
when in a moment of confidence I told the villain Rotch what money
I had embarked in this voyage, and to what extent I had protected
myself.’
‘Tom, whatever happens I am with you. Oh, if it should come to
their killing you they shall kill me too, Tom.’
He pressed his hands to his heart and then sobbed twice or thrice.
My love, my grief, my misery raged in me; I felt that I had strength
to tear down the strong iron grating which separated us, that I
might get to him, clasp him to me, give him the comfort of my
bosom, the tenderness of my caressing cheek. It worked like
madness in my soul to be held apart from him, to see him and not
be able to fling my arms around him.
We looked at each other in silence. I was about to speak when a bell
rang, and a strong voice called out: ‘Time’s up!’ The prize-fighter
was gone. A warder marched quickly along to Tom and touched him
on the shoulder, and my uncle called to me: ‘Come, Marian.’ Tom
cried: ‘God bless you, dear,’ but my vision was blind with tears, a
sudden swooning headache made me stagger, and until I was in the
street I was scarcely sensible of more than of being led through the
passages and out through the gate by my uncle.
CHAPTER X
SHE ATTENDS HER SWEETHEART’S TRIAL

Down to the date of the trial, suspense and expectation lay in so
crushing a burden upon me that life was hardly supportable. In this
time I ceased to wonder that people had the courage to perish by
their own hands. Twice after that first visit I saw Tom in Newgate,
but those interviews were restricted by the rules of the place to a
quarter of an hour, and always the bell sounded and the rude voice
of the warder broke in at the moment when I had most to say and
most to hearken to.
The trial of my sweetheart took place at the Central Criminal Court
on April 17th. The judge was the stony-hearted Maule—memory may
deceive me, but I am almost sure it was Mr. Justice Maule. For Tom’s
defence my uncle had secured the services of the celebrated Mr.
Sergeant Shee, with whom were Mr. Doane and Mr. C. Jones. I drove
down to the Old Bailey with my aunt early in the morning. The court
was not inconveniently crowded. It was one of those cases which do
not excite much attention. A Cash-man or a Bishop would have
blocked the court with eager spectators of both sexes, but the perils
and crimes of the ocean do not appeal to the land-going public.
The judge took his seat at ten o’clock, and Tom was brought in and
placed at the bar, charged by indictment that ‘he endeavoured,
feloniously and maliciously, to cast away and destroy a certain vessel
called the Arab Chief on the high sea, within the jurisdiction of the
Admiralty of England, and also of the Central Criminal Court, with
intent to prejudice divers persons as part owners of or underwriters
to the same vessel.’ He pleaded ‘Not guilty.’ He spoke very low, but
his tones were steady. He looked ill, haggard, and wasted. A great
number of persons who were to appear as witnesses were in court,
and I searched the many faces with burning eyes for the two
wretches who had brought my sweetheart and me to this horrible
pass. But my aunt did not know them, and there was no one at hand
to tell me which among those men were Rotch and Nodder.
The case against Tom, as stated at the opening of the prosecution,
was merely an elaborate version of the narrative of the facts which
he had himself briefly related to me in Newgate. Though nobody had
been defrauded, since the ship had not been sunk and no money
claimed or paid, yet as much emphasis was laid by the prosecution
upon the number of offices in which Tom had insured as though my
sweetheart’s guilt were beyond question, as though the prosecution
indeed had seen him make holes in the ship and sink her, as though
he had then arrived in England and received three or four thousand
pounds in excess of the worth of the property.
The person who addressed the Court for the prosecution had a very
clear, musical voice; he had handsome eyes, and would pause at
every pointed passage of his opening with an eloquent, appealing,
concerned look at the jury. His sweet, persuasive tones and looks
doubled to my fear the horrible significance of his statements, and I
abhorred him whilst I watched him and listened, and could have
killed him in my concealed fright and rage for his cool and coaxing
and polished utterance of what I knew to be hellish lies. Often would
I watch the jury with a devouring gaze. They were in two rows, six
in a row, in a box, and one or another who was above would
sometimes lean over and whisper, and one would take a note, and
one would sit for ten minutes at a time motionless, with his eyes
upon the person speaking. The counsel and gentlemen in wigs and
gowns sat around a big table loaded with books and papers. A
crowd of people hung about outside this sort of well, formed by the
table and its circular benches and backs, and whispered and stared
and grinned and took snuff. The judge sat, stern and heavily wigged,
not far from the jury. Sometimes he took notes; sometimes his chin
sank upon his breast. He seemed to see nothing, and if ever he
spoke he appeared to address a vision in midair.
I’ll not trouble you with the particulars of this trial. I am passing
rapidly now into another scene of life. One witness after another
stepped into the box to prove the several insurances which had been
effected by Tom; others to testify to the value of the Arab Chief and
her lading. The name of Samuel Rotch was then pronounced, and
the man came out of a group of people and briskly ascended to give
evidence. The hot blood stung in my cheeks when I saw him. My
heart beat as though I was stricken with fever. Tom looked at him
and kept his eyes upon him all the while that the wretch was
answering questions and giving his evidence, but I never once
observed that he even so much as glanced at my sweetheart.
I had expected—nay, indeed, I had prayed—to behold an ill-looking
villain, and I believe it told heavily against us that he was an
exceedingly good-looking man. His features were regular; his eyes of
dark blue, bright and steadfast in their gaze. His white and regular
teeth shone like light when he parted his lips. He was coloured by
the sun to the manly complexion of the seaman, and he was about
Tom’s height, well built, but without my sweetheart’s fine, upright,
commanding carriage. His voice had a frank note. His replies were
quickly delivered, and there was not the least stammer or hesitation
in his statements. Added to all this, he spoke with an educated
accent.
He told his story plainly, and was not to be shaken. He gave a
reason for going into the lazarette which my sweetheart’s counsel
seemed unable to challenge. It was shown through his evidence that
the size of the holes (an inch and a quarter) which were found
plugged in the inner skin exactly corresponded with the diameter of
the tree-nail auger which had been discovered in Tom’s cabin. His
evidence was that whilst in the lazarette he had heard the sound of
water running into the ship betwixt the lining and the side; he took
his lantern to the place of the noise and saw the plugged holes. He
went on deck and called to Benjamin Nodder, who acted as second
mate and carpenter; he likewise summoned others of the crew and
they all went into the lazarette and saw the plugged holes and heard
the water coming in. Then to preserve their lives and save the ship
from sinking they ripped up the plank and plugged the outer holes,
thus stopping the leaks, and afterwards repaired in a body to the
captain’s cabin. Captain Butler threatened to shoot the witness. He
was secured, and the cabin searched and the auger found. They
proceeded to Rio, and on their arrival Rotch called upon the British
Consul, who on the evidence sworn before him thought proper to
give the charge of the ship to a new captain and send home the
prisoner, together with Rotch, Nodder, and two of the seamen who
had descended into the lazarette.
The witness was asked why he suspected the captain of attempting
to scuttle the ship instead of any other of the crew.
He answered:
‘Because I had seen the captain go into the lazarette.’
‘Was it unusual for a captain to enter the lazarette of his own
vessel?’
‘No captain,’ the fellow answered, ‘would think of entering a
lazarette.’
‘What other grounds for suspicion had he?’
The man replied, the captain had told him that his share in the ship,
together with his venture in the cargo and freight, were heavily
insured; also, on one occasion, the captain had talked to him about
a ship whose master had been sentenced and executed for casting
her away; and he had added significantly that it was a good job the
law had been changed, and that a man might now venture for a
fortune without jeopardising his life.
Tom steadfastly regarded Rotch whilst he gave his evidence; and I
knew by the look in my sweetheart’s face that the villain in the
witness-box fiendishly lied in every syllable he uttered.
Many questions in cross-examination were asked, and all of them
Rotch answered steadily, bowing respectfully whenever the judge
put a question; and he always looked very straight, with a fine air of
candour and honesty, at the person who interrogated him. He was
asked if he had not quarrelled with Captain Butler at Valparaiso. He
answered yes. The particulars of that quarrel were dramatically
related by Sergeant Shee. Rotch said that every word was true, but
that he and Captain Butler had long ago shaken hands over that
affair and dismissed it from their memory. He was asked if the
prisoner had not reported him on one occasion for insubordination
and neglect of duty, and if he had not been dismissed in
consequence, though subsequently another berth had been procured
for him by the prisoner? He answered yes, it was quite true. He was
asked if it was the fact that one of the owners of the Arab Chief had
promised him the berth of captain of that ship in any case, since,
whether guilty or innocent, Captain Butler would not, after this
accusation, be again employed? He replied it was true; but then the
other side qualified what was to me a damning admission by saying
that the fellow was distantly connected with the owner aforesaid.
The next witness was Benjamin Nodder. This fellow was a rough
seaman of a commonplace type, hunched about the shoulders and
bandy-legged, with red hair falling about his ears in coarse raw
streaks, like slices of carrot; he was wall-eyed, that is, one eye
looked away when the other gazed straight. His voice was harsh as
the noise of an axe sharpened on a grindstone, and when he stood
up in the box he leered unsteadily around him with an effort to
stand with dignity, as though he was tipsy. His examination was little
more than a repetition of what had been gone through with Rotch.
He was followed by two seamen who had no further evidence to
give than that they had helped to stop the leaks and had seen the
captain draw a pistol upon Rotch in his cabin; they also testified to
the discovery of the auger, one of them saying that he recollected
Mr. Nodder telling the men that Captain Butler had come forward
and borrowed an auger.
‘Mr. Nodder,’ said this witness, ‘told us men that he couldn’t imagine
what the capt’n wanted an auger for; two days after the hole was
found bored in the lazarette.’
Thus ran the questions and the answers. Tom looked steadily at the
witnesses as they spoke; but he made no sign; his arms lay
motionless, folded upon his breast. Twice or thrice I saw his
eyebrows faintly lift, and his lips part as though to a deep breath of
irrepressible horror and amazement.
The Court adjourned for lunch after the two seamen had given their
evidence; I remained in the court with my aunt. Mr. Johnstone came
to us, and I asked him what he thought the verdict would be.
‘Wait for it! Wait for it!’ he exclaimed, petulant with worry and
doubts. ‘Did not I tell Butler that he had heavily blundered in over-
insuring? And how well Rotch gave his evidence! How frank were the
devil’s admissions! Never a wink or a stutter with him from
beginning to end! But the twelve have yet to hear the sergeant.
Keep up your spirits, Marian!’ And he abruptly left us, but not
without exchanging a look with his wife. I caught that look, and my
heart sank and turned cold, as though the hand of death had
grasped it.
When the Court reassembled, five witnesses were called to speak to
Tom’s character. It was shortly before four when the judge had
finished summing up. I had followed Sergeant Shee’s address with
impassioned attention, eagerly watching the faces of the jurymen as
he spoke, and detesting the judge for the sleepy air with which he
listened and the barristers at the table and the people round about
for their inattention and frequent whispers and passing of papers
one to another on business of their own, as though the drama of life
or death to me which had nearly filled the day had grown tiresome,
and they were waiting for the curtain. Then I had followed with a
maddening conflict of emotion, but with an ever-gaining feeling of
sickness and faintness, like to the sense of a poisoned and killing
conviction slowly creeping to the heart against its maddest current
of hopes and protests—thus had I listened to the address of the
counsel for the prosecution who replied upon the whole case; and
now I listened to Mr. Justice Maule’s summing-up, a tedious and
inconclusive address. He made little of the points which I believed he
would have insisted upon. He talked like a tired man, he retold the
testimony, and I seemed to find a prejudice against Tom throughout
his delivery.
Then it was left to the jury, and the jury, after an absence of twenty
minutes, returned with the verdict of ‘Guilty’ against the prisoner.
My aunt clutched my hand. I felt a shock as though the blood in my
veins had been arrested in ice in its course. Mr. Justice Maule
proceeded to pass sentence. He spoke in a sing-song voice, as
though at every instant he must interrupt himself with a yawn. He
said that the prisoner had been found guilty, after a fair and
impartial trial, of the offence of having feloniously and wilfully
attempted to destroy the ship Arab Chief for the purpose of
defrauding the underwriters. That was the conclusion the jury had
arrived at, and he was perfectly satisfied with this verdict. And then
he pointed out the gravity of the offence, and how such acts tended
to check the spirit of mercantile adventure, and how impossible it
would be for insurance companies to exist if they were not protected
by the law. He rejoiced that the penalty applied to this crime was no
longer capital. At the same time it was his duty to inflict a severe
punishment. The sentence of the Court was that the prisoner should
be transported beyond the seas for the term of fourteen years.
My aunt sprang to her feet and shrieked aloud when this awful
sentence was delivered. I sat dumb and motionless. Never once
throughout the day had Tom looked in our direction. Now, on my
aunt shrieking, he turned his head, saw me, and pointed upward, as
though surrendering our love to God. The next moment he had
stepped out of sight.
My uncle came to us. He was white and terribly agitated and
shocked.
‘Come!’ he exclaimed. ‘Come along out of this now. We have had
enough of it.’
He took me by the hand, and I arose, but I could not speak; I
seemed to have been deprived of sensation in the limbs; indeed, I
do not know what had come to me. I looked towards the bar where
Tom had been standing and sighed, and then walked with my uncle,
my aunt following. We passed out of the court and got into the Old
Bailey; and when in Ludgate Hill, my uncle called a coach, and we
were driven to his home. Nothing was said saving that my uncle
once asked, ‘Who cried out?’ My aunt answered:
‘I did.’
I sat rigid, looking with blind eyes at the passing show of the streets.
But how am I to describe my feelings! Ask a mother whose child has
suddenly died upon her lap; ask a wife whose husband has fallen
dead at her feet; ask an adoring lover whose sweetheart, taking
refuge with him from a summer thunder-cloud, is slain by a bolt; ask
such people so smitten to tell you what they feel! Nor can my
tongue utter what was in me as we drove to my uncle’s home after
the trial.
When we were arrived my manner frightened my aunt; she feared
I’d do myself a mischief and would not lose sight of me. I sat in a
chair and never spoke, though I answered when I was addressed,
and obeyed mechanically; as, for example, if my aunt entreated me
to come to the table and eat I quitted my chair and took up the
knife and fork, but without eating. My gaze was fixed! I saw nothing
but Tom standing at the bar of the Old Bailey, hearkening to his
sentence, lifting up his hand to me and looking upward. If I turned
my eyes toward my aunt, Tom was behind her. If my uncle sat
before me and addressed me, the vision of Tom painted in bright
colours receiving sentence and lifting his hand was behind him.
Once during the evening of the day of the trial, when my uncle came
into the parlour, my aunt turned to him and said:
‘If she would only cry!’
She took me to her bed that night, and I lay without speech, seeing
Tom as in a vision, and hearing the sentence over and over again
repeated. I may have slept; I cannot tell. My aunt wished me to
remain in bed next morning, but when she was dressed I got up and
followed her to the parlour.
My uncle sat by a glowing fire; he was deeply interested in a
newspaper and was probably reading a report of the trial.
‘Aunt,’ I said, speaking for the first time, and in a voice so harsh and
unmusical that my uncle, not knowing that I had entered, looked up
with gesture of surprise and dropped the newspaper, ‘I wish to go
home.’
‘No, dear, not yet.’
I was about to speak, to say that I believed my going to the house
where my father and mother had lived—to the house that was full of
old associations, where I had thought to dwell with Tom when we
were married—would soothe and do me good. I was about to tell
her this, but could not for giving way; and, hiding my face in my
hands, I bowed my head upon the table, neither of them speaking
nor attempting in any way to arrest the passion of tears.
I felt better after this dreadful outbreak; it seemed to have cleansed
my brain and to give room for my heart to beat and for my spirits to
stir in. I looked at the good things upon the table, the eggs and
bacon, the ham and the rest, and said:
‘How do they feed prisoners in jail?’
‘Now, don’t trouble about that, Marian,’ said my uncle. ‘Captain
Butler has been a sailor, and he has been bred up on food compared
to which the worst fare in the worst jail in England is delicious.’
‘What will they do with him?’
‘Until they despatch him across the seas they’ll keep him in prison at
Newgate, perhaps, or they’ll send him to Millbank or to the Hulks.
No man can tell.’
‘Don’t fret yourself now with these inquiries, Marian,’ said my aunt.
‘How do they treat convicts in jail, uncle?’
‘Very well, indeed. Better than the majority of them deserve. They
feed them, clothe them, and teach them trades to enable them to
live honestly by-and-by.’
‘In what sort of ships do the convicts sail?’
‘Oh, in average merchantmen. Owners tender, and a ship is hired.
There were twenty-one of them chartered last year at about four
p’un’ ten a ton.’
‘Twenty-one!’ cried my aunt. ‘I wonder there are any rascals left in
England. Twenty-one! Only think! And perhaps two hundred rogues
in each ship.’
‘At least,’ exclaimed my uncle.
‘Are they passenger ships?’ I asked.
‘Many of them.’
‘Could one take one’s passage in a convict ship?’
‘Love you, no! No more than one could take one’s passage in a man-
of-war.’
‘Marian, you are making no breakfast,’ said my aunt.
‘What do they do with the convicts when they arrive at their
destination?’ I inquired.
‘Why,’ said my uncle, passing his cup for more tea, ‘I can only tell
you what I have read. The convicts are lent out as servants to
persons in want of labour on their farms, houses, shops, and so on;
some of them are sent up country to make roads. I don’t know
whether they are paid for their work. They are well fed. It commonly
ends in their setting up in business for themselves; and ninety-nine
out of every hundred felons, after they have been out in the colonies
for a few years, wouldn’t come home—to stay at home, I mean—on
any account whatever. If I were a poor man, I should not at all
object to being transported.’
‘Don’t say such things!’ exclaimed my aunt.
‘I shall follow Tom wherever he is sent,’ said I, pushing my chair
from the table.
‘What! To Norfolk Island, for instance? What would you do there?’
said my uncle. ‘Far better wait in this country, my dear, until Captain
Butler returns. They’ll be giving him a ticket-of-leave before long.
He’s bound to behave himself well.’
I stepped to the window and looked out. There had been a note of
coldness in my uncle’s pronunciation of the words, ‘Captain Butler.’ I
had also caught a startled look, which was nearly horror, in my aunt
when I said that I would follow my sweetheart wherever he was
sent. I turned presently and said:
‘When shall I be able to see Tom?’
‘Once only every three months, I am afraid,’ answered my uncle.
‘The rules vary with the prisons, but I think you will find that letters
and visits are allowed once every three months only. I’ll inquire.’
‘Shall we hear if he is sent to another place?’
‘We shall always be able to learn where he is.’
He was growing tired of my questions and left the table, having
finished his breakfast.
‘I shall want to know what his defence has cost,’ said I; ‘I wish to
pay.’
He nodded, and, pulling out his watch, said that he must go to
business downstairs. I ran after him as he was leaving the room,
and, grasping him by the arm, cried impetuously: ‘Uncle, do you
believe Tom guilty?’
‘I’d not say so if I thought so,’ he answered looking at me, and I
guessed by my feelings that my eyes sparkled and my cheeks were
red. ‘Let me go, my girl. Everything passes, and to all of us comes a
day when we discover that there is nothing under the sun which is
worth a tear.’
I dropped my hand, and he walked out of the room. My aunt eyed
me strenuously as I paced the floor. I could not sit, my heart was full
of rage, and all the while a resolution was forming and hardening in
me; indeed I caught myself thinking aloud, and often I’d halt with
my hand clenched like one distraught. My aunt presently said:
‘Why not sit down, dear, and nurse your strength a little? You have
been sorely tried. Cannot we arrange for another trip to the
seaside?’
‘And leave——’ I cried, and broke short off and forced myself to say
softly: ‘No, aunt.’
‘But what do you mean to do? I wish to act as a mother to you,
Marian. I thank God you are not his wife.’
‘Don’t say that!’
‘But I must say it!’ she exclaimed, bridling. ‘It’s through me that you
are not his wife, and I rejoice heartily that I advised you as I did.
What! Would you, with your means and your beauty and your
opportunities, be the wife of a convict?’
I felt the temper in me swelling into madness. I durst not stay, for I
dreaded myself then, and flung out of the room, leaving her talking.
I ran upstairs to put on my outdoor clothes, and when I returned my
aunt was on the landing. She exclaimed that she had not meant
what she said. I looked her earnestly in the face, for I did not
believe her; but already my temper was gone. Ill-temper lives but a
short time when there is great misery. I kissed her and thanked her
for her kindness and love, and, telling her I must go home to look
after things, I left the house.
CHAPTER XI
SHE VISITS H.M.S. ‘WARRIOR’

I remained at home several days, seeing nobody, waited upon by my
maid and denying myself to everybody. My aunt sent to inquire after
me, and my maid’s answers satisfied her. I pulled the blinds down
and sat alone in my grief, with Tom’s miniature upon my knee. But
always at dusk I stole forth and walked in the Old Bailey, close
against the walls of Newgate Prison, that I might be near my dear
one. I wrote to him and took my chance of the letter reaching his
hands. I told him that no man was ever more truly loved by his
sweetheart; that wherever he went I would go; and let them send
him where they would, he would find me there; and I swore to him
that he was innocent, the victim of a monstrous, transparent
conspiracy, and I said I prayed every night to God to punish the
villains who had brought us to this miserable state.
It was about a fortnight after the trial that one of my trustees,
Captain Galloway, asked me by letter for an appointment; he
presented himself with Captain Fairman, the other trustee. They
were both bluff, hearty seamen of the old school, somewhat
resembling each other, though not connected. The motive of their
visit was to get me to give up Tom. Captain Galloway had not
forgotten my treatment of his son, and talked with ill-advised heat.
He did not deny that he considered Captain Butler guilty. I listened
with contempt at first, but this gave way to temper which rose into
wrath, and I fairly gave the devil they had aroused within me his
way. When they had gone I caught sight of myself in a mirror, and I
looked as flaming and red and swelling and breathless as any mad
murderess in a padded cell.
I guessed my aunt was at the bottom of these captains’ visits. She
must have asked Mr. Stanford to talk to me too; otherwise I doubt if
he had dared venture it. Yet I listened to the fellow patiently till he
told me that he spoke as the representative of my mother on earth;
that made me think of my father and I started up. I meant no
physical violence though I was capable of it then, but my manner of
jumping up was so menacing that he instantly started from his chair
and hastened out of the room, slamming the door after him.
I would not trust my uncle to obtain news of Tom. I knew that all
interested in me wished me to break off with my sweetheart, and
would hoodwink me if they could by keeping me in ignorance that
Tom had been sent out of the country. A clerk named Woolfe who
had been in my uncle’s employ had started for himself; he was a
shrewd, unscrupulous young dog. I bargained with him to get me
news of Tom, and to work all methods of communication practicable
by bribery. From him I learned that my sweetheart had been
removed from Newgate to Millbank. The fellow took a hundred
guineas from me in all, but did no more for the money than discover
where Tom was; and one day, about four months after Tom’s
conviction, this young rogue of a lawyer called upon me at Stepney
to say that Tom had been transferred from Millbank to H.M.S.
Warrior hulk, moored off Woolwich Dockyard.
‘Are you sure?’ I cried.
‘I am now from Millbank,’ said he.
‘And what will happen next?’ I demanded.
‘They’ll keep him at forced labour at the dockyard,’ he answered, ‘till
a transport hauls alongside the hulk for a cargo.’
‘When will that be?’
‘Impossible to say, miss.’
‘Will you get me the rules of the hulk?’
‘They are the same as the jails.’
‘But I have not seen Captain Butler since his conviction, nor heard
from him, nor know whether he has received my letters.’
He answered that he would make inquiries and call. He was
intelligibly punctual, because he had to receive ten guineas, but he
brought me what I wanted to know, and to my joy I learned that I
was at liberty to visit Tom next day, and that he would be brought
on board to see me if he was ashore when I arrived.
The morning following I dressed with care. I wore black clothes. I
had worn black ever since my sweetheart was taken from me. I put
on a black veil, and going into the street, walked till I met with a
coach, and drove to Blackwall. I had not visited those parts since
Tom and I and the others had seen Will Johnstone off, and I dared
not glance in the direction of the hotel in which my sweetheart had
made love to me and asked me to marry him. Indeed, my heart
needed all the fortitude my spirit could give it.
It was a bright, hot day. The sky was high with delicate, frostlike
cloud, and the running river blue with the reflection of the heavens.
The wind was a light summer breeze and blew from London, and
many ships of many rigs floated before it, some of them lifting lofty
fabrics of swelling breasts of canvas, some of them dark with a
weather-stained look, like my father’s coasters. Here at Blackwall I
took a boat, and told the man to row me to the Warrior hulk.
‘You know her?’ said I.
He was an elderly man, dressed in a tall hat and jersey; he exposed
a few yellow fangs as he lay back on his oars and said:
‘Know her? Yes. Know the Warrior! Yah might as well ask me if I
know St. Paul’s. Going aboard?’
‘Yes.’
‘Friend aboard?’
I inclined my head.
‘I had a nevvey locked up in that there hulk,’ said the man. ‘He had
six year. Now’s out and doon well. He drove a light cart drawn by a
nag as could trot, and called hisself a pig-dealer. Do ’spectable pig-
dealers break into houses o’ night? The Warrior cured my nevvey. He
ain’t above talking of that ship. Get him in the mood, and he’ll spin
yah some queer yarns about her.’
‘How are the prisoners treated?’
‘Sights o’ stone-breaking and stacking o’ timber. They put my nevvey
to draw carts. They sunk his name and caa’d him a number. A man
doan’ feel a man when he’s a number. But the job my nevvey least
enjoyed was scraping shot.’
‘How are they fed?’
‘By contract. Yah knows what that means. Beef all veins. Ever heard
of “smiggins,” miss?’
‘No.’
‘It’s hulk soup: convicts’ name for greasy warm water. Call it twenty
year ago, I was passing a hulk stationed afore the Defence came up;
a boat was ’longside with provisions for the day; what d’ye think?
With my own eyes I see the prisoners as was hoisting the grub out
of the boat chuck it overboard. Was they flogged?’
He shook his head, grinning horribly.
His manners and answers shocked and depressed me, and I asked
him no more questions.
‘Ain’t it rather sing’ler,’ said he, after a few minutes’ pause, ‘that
there’s only one flower as ’ll grow upon a convict’s grave?’
‘Is that so?’
‘Ay. And what flower d’ye think it is, miss?’ said he, again showing
his fangs.
‘I don’t know.’
‘It’s a nettle. If yah should care to visit the burial-ground yonder,’ he
continued, with a backward nod of his head in the direction of
Woolwich, ‘yah ’ll see for yourself. As if nothen would blow ower a
convict but that! Of course the finger o’ nater’s in it. The finger o’