Introduction to Machine Learning
Second Edition
Adaptive Computation and Machine Learning

Thomas Dietterich, Editor


Christopher Bishop, David Heckerman, Michael Jordan, and Michael
Kearns, Associate Editors

A complete list of books published in The Adaptive Computation and Machine Learning series appears at the back of this book.
Introduction to Machine Learning
Second Edition

Ethem Alpaydın

The MIT Press


Cambridge, Massachusetts
London, England
© 2010 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any
electronic or mechanical means (including photocopying, recording, or informa-
tion storage and retrieval) without permission in writing from the publisher.

For information about special quantity discounts, please email


[email protected].

Typeset in 10/13 Lucida Bright by the author using LaTeX 2ε.


Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Information

Alpaydin, Ethem.
Introduction to machine learning / Ethem Alpaydin. — 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-262-01243-0 (hardcover : alk. paper)
1. Machine learning. I. Title
Q325.5.A46 2010
006.3’1—dc22 2009013169
CIP

10 9 8 7 6 5 4 3 2 1
Brief Contents

1 Introduction 1
2 Supervised Learning 21
3 Bayesian Decision Theory 47
4 Parametric Methods 61
5 Multivariate Methods 87
6 Dimensionality Reduction 109
7 Clustering 143
8 Nonparametric Methods 163
9 Decision Trees 185
10 Linear Discrimination 209
11 Multilayer Perceptrons 233
12 Local Models 279
13 Kernel Machines 309
14 Bayesian Estimation 341
15 Hidden Markov Models 363
16 Graphical Models 387
17 Combining Multiple Learners 419
18 Reinforcement Learning 447
19 Design and Analysis of Machine Learning Experiments 475
A Probability 517
Contents

Series Foreword xvii

Figures xix

Tables xxix

Preface xxxi

Acknowledgments xxxiii

Notes for the Second Edition xxxv

Notations xxxix

1 Introduction 1
1.1 What Is Machine Learning? 1
1.2 Examples of Machine Learning Applications 4
1.2.1 Learning Associations 4
1.2.2 Classification 5
1.2.3 Regression 9
1.2.4 Unsupervised Learning 11
1.2.5 Reinforcement Learning 13
1.3 Notes 14
1.4 Relevant Resources 16
1.5 Exercises 18
1.6 References 19

2 Supervised Learning 21
2.1 Learning a Class from Examples 21

2.2 Vapnik-Chervonenkis (VC) Dimension 27


2.3 Probably Approximately Correct (PAC) Learning 29
2.4 Noise 30
2.5 Learning Multiple Classes 32
2.6 Regression 34
2.7 Model Selection and Generalization 37
2.8 Dimensions of a Supervised Machine Learning Algorithm 41
2.9 Notes 42
2.10 Exercises 43
2.11 References 44

3 Bayesian Decision Theory 47


3.1 Introduction 47
3.2 Classification 49
3.3 Losses and Risks 51
3.4 Discriminant Functions 53
3.5 Utility Theory 54
3.6 Association Rules 55
3.7 Notes 58
3.8 Exercises 58
3.9 References 59

4 Parametric Methods 61
4.1 Introduction 61
4.2 Maximum Likelihood Estimation 62
4.2.1 Bernoulli Density 63
4.2.2 Multinomial Density 64
4.2.3 Gaussian (Normal) Density 64
4.3 Evaluating an Estimator: Bias and Variance 65
4.4 The Bayes’ Estimator 66
4.5 Parametric Classification 69
4.6 Regression 73
4.7 Tuning Model Complexity: Bias/Variance Dilemma 76
4.8 Model Selection Procedures 80
4.9 Notes 84
4.10 Exercises 84
4.11 References 85

5 Multivariate Methods 87
5.1 Multivariate Data 87

5.2 Parameter Estimation 88


5.3 Estimation of Missing Values 89
5.4 Multivariate Normal Distribution 90
5.5 Multivariate Classification 94
5.6 Tuning Complexity 99
5.7 Discrete Features 102
5.8 Multivariate Regression 103
5.9 Notes 105
5.10 Exercises 106
5.11 References 107

6 Dimensionality Reduction 109


6.1 Introduction 109
6.2 Subset Selection 110
6.3 Principal Components Analysis 113
6.4 Factor Analysis 120
6.5 Multidimensional Scaling 125
6.6 Linear Discriminant Analysis 128
6.7 Isomap 133
6.8 Locally Linear Embedding 135
6.9 Notes 138
6.10 Exercises 139
6.11 References 140

7 Clustering 143
7.1 Introduction 143
7.2 Mixture Densities 144
7.3 k-Means Clustering 145
7.4 Expectation-Maximization Algorithm 149
7.5 Mixtures of Latent Variable Models 154
7.6 Supervised Learning after Clustering 155
7.7 Hierarchical Clustering 157
7.8 Choosing the Number of Clusters 158
7.9 Notes 160
7.10 Exercises 160
7.11 References 161

8 Nonparametric Methods 163


8.1 Introduction 163
8.2 Nonparametric Density Estimation 165

8.2.1 Histogram Estimator 165


8.2.2 Kernel Estimator 167
8.2.3 k-Nearest Neighbor Estimator 168
8.3 Generalization to Multivariate Data 170
8.4 Nonparametric Classification 171
8.5 Condensed Nearest Neighbor 172
8.6 Nonparametric Regression: Smoothing Models 174
8.6.1 Running Mean Smoother 175
8.6.2 Kernel Smoother 176
8.6.3 Running Line Smoother 177
8.7 How to Choose the Smoothing Parameter 178
8.8 Notes 180
8.9 Exercises 181
8.10 References 182

9 Decision Trees 185


9.1 Introduction 185
9.2 Univariate Trees 187
9.2.1 Classification Trees 188
9.2.2 Regression Trees 192
9.3 Pruning 194
9.4 Rule Extraction from Trees 197
9.5 Learning Rules from Data 198
9.6 Multivariate Trees 202
9.7 Notes 204
9.8 Exercises 207
9.9 References 207

10 Linear Discrimination 209


10.1 Introduction 209
10.2 Generalizing the Linear Model 211
10.3 Geometry of the Linear Discriminant 212
10.3.1 Two Classes 212
10.3.2 Multiple Classes 214
10.4 Pairwise Separation 216
10.5 Parametric Discrimination Revisited 217
10.6 Gradient Descent 218
10.7 Logistic Discrimination 220
10.7.1 Two Classes 220

10.7.2 Multiple Classes 224


10.8 Discrimination by Regression 228
10.9 Notes 230
10.10 Exercises 230
10.11 References 231

11 Multilayer Perceptrons 233


11.1 Introduction 233
11.1.1 Understanding the Brain 234
11.1.2 Neural Networks as a Paradigm for Parallel
Processing 235
11.2 The Perceptron 237
11.3 Training a Perceptron 240
11.4 Learning Boolean Functions 243
11.5 Multilayer Perceptrons 245
11.6 MLP as a Universal Approximator 248
11.7 Backpropagation Algorithm 249
11.7.1 Nonlinear Regression 250
11.7.2 Two-Class Discrimination 252
11.7.3 Multiclass Discrimination 254
11.7.4 Multiple Hidden Layers 256
11.8 Training Procedures 256
11.8.1 Improving Convergence 256
11.8.2 Overtraining 257
11.8.3 Structuring the Network 258
11.8.4 Hints 261
11.9 Tuning the Network Size 263
11.10 Bayesian View of Learning 266
11.11 Dimensionality Reduction 267
11.12 Learning Time 270
11.12.1 Time Delay Neural Networks 270
11.12.2 Recurrent Networks 271
11.13 Notes 272
11.14 Exercises 274
11.15 References 275

12 Local Models 279


12.1 Introduction 279
12.2 Competitive Learning 280

12.2.1 Online k-Means 280


12.2.2 Adaptive Resonance Theory 285
12.2.3 Self-Organizing Maps 286
12.3 Radial Basis Functions 288
12.4 Incorporating Rule-Based Knowledge 294
12.5 Normalized Basis Functions 295
12.6 Competitive Basis Functions 297
12.7 Learning Vector Quantization 300
12.8 Mixture of Experts 300
12.8.1 Cooperative Experts 303
12.8.2 Competitive Experts 304
12.9 Hierarchical Mixture of Experts 304
12.10 Notes 305
12.11 Exercises 306
12.12 References 307

13 Kernel Machines 309


13.1 Introduction 309
13.2 Optimal Separating Hyperplane 311
13.3 The Nonseparable Case: Soft Margin Hyperplane 315
13.4 ν-SVM 318
13.5 Kernel Trick 319
13.6 Vectorial Kernels 321
13.7 Defining Kernels 324
13.8 Multiple Kernel Learning 325
13.9 Multiclass Kernel Machines 327
13.10 Kernel Machines for Regression 328
13.11 One-Class Kernel Machines 333
13.12 Kernel Dimensionality Reduction 335
13.13 Notes 337
13.14 Exercises 338
13.15 References 339

14 Bayesian Estimation 341


14.1 Introduction 341
14.2 Estimating the Parameter of a Distribution 343
14.2.1 Discrete Variables 343
14.2.2 Continuous Variables 345
14.3 Bayesian Estimation of the Parameters of a Function 348

14.3.1 Regression 348


14.3.2 The Use of Basis/Kernel Functions 352
14.3.3 Bayesian Classification 353
14.4 Gaussian Processes 356
14.5 Notes 359
14.6 Exercises 360
14.7 References 361

15 Hidden Markov Models 363


15.1 Introduction 363
15.2 Discrete Markov Processes 364
15.3 Hidden Markov Models 367
15.4 Three Basic Problems of HMMs 369
15.5 Evaluation Problem 369
15.6 Finding the State Sequence 373
15.7 Learning Model Parameters 375
15.8 Continuous Observations 378
15.9 The HMM with Input 379
15.10 Model Selection in HMM 380
15.11 Notes 382
15.12 Exercises 383
15.13 References 384

16 Graphical Models 387


16.1 Introduction 387
16.2 Canonical Cases for Conditional Independence 389
16.3 Example Graphical Models 396
16.3.1 Naive Bayes’ Classifier 396
16.3.2 Hidden Markov Model 398
16.3.3 Linear Regression 401
16.4 d-Separation 402
16.5 Belief Propagation 402
16.5.1 Chains 403
16.5.2 Trees 405
16.5.3 Polytrees 407
16.5.4 Junction Trees 409
16.6 Undirected Graphs: Markov Random Fields 410
16.7 Learning the Structure of a Graphical Model 413
16.8 Influence Diagrams 414

16.9 Notes 414


16.10 Exercises 417
16.11 References 417

17 Combining Multiple Learners 419


17.1 Rationale 419
17.2 Generating Diverse Learners 420
17.3 Model Combination Schemes 423
17.4 Voting 424
17.5 Error-Correcting Output Codes 427
17.6 Bagging 430
17.7 Boosting 431
17.8 Mixture of Experts Revisited 434
17.9 Stacked Generalization 435
17.10 Fine-Tuning an Ensemble 437
17.11 Cascading 438
17.12 Notes 440
17.13 Exercises 442
17.14 References 443

18 Reinforcement Learning 447


18.1 Introduction 447
18.2 Single State Case: K-Armed Bandit 449
18.3 Elements of Reinforcement Learning 450
18.4 Model-Based Learning 453
18.4.1 Value Iteration 453
18.4.2 Policy Iteration 454
18.5 Temporal Difference Learning 454
18.5.1 Exploration Strategies 455
18.5.2 Deterministic Rewards and Actions 456
18.5.3 Nondeterministic Rewards and Actions 457
18.5.4 Eligibility Traces 459
18.6 Generalization 461
18.7 Partially Observable States 464
18.7.1 The Setting 464
18.7.2 Example: The Tiger Problem 465
18.8 Notes 470
18.9 Exercises 472
18.10 References 473

19 Design and Analysis of Machine Learning Experiments 475


19.1 Introduction 475
19.2 Factors, Response, and Strategy of Experimentation 478
19.3 Response Surface Design 481
19.4 Randomization, Replication, and Blocking 482
19.5 Guidelines for Machine Learning Experiments 483
19.6 Cross-Validation and Resampling Methods 486
19.6.1 K-Fold Cross-Validation 487
19.6.2 5×2 Cross-Validation 488
19.6.3 Bootstrapping 489
19.7 Measuring Classifier Performance 489
19.8 Interval Estimation 493
19.9 Hypothesis Testing 496
19.10 Assessing a Classification Algorithm’s Performance 498
19.10.1 Binomial Test 499
19.10.2 Approximate Normal Test 500
19.10.3 t Test 500
19.11 Comparing Two Classification Algorithms 501
19.11.1 McNemar’s Test 501
19.11.2 K-Fold Cross-Validated Paired t Test 501
19.11.3 5 × 2 cv Paired t Test 502
19.11.4 5 × 2 cv Paired F Test 503
19.12 Comparing Multiple Algorithms: Analysis of Variance 504
19.13 Comparison over Multiple Datasets 508
19.13.1 Comparing Two Algorithms 509
19.13.2 Multiple Algorithms 511
19.14 Notes 512
19.15 Exercises 513
19.16 References 514

A Probability 517
A.1 Elements of Probability 517
A.1.1 Axioms of Probability 518
A.1.2 Conditional Probability 518
A.2 Random Variables 519
A.2.1 Probability Distribution and Density Functions 519
A.2.2 Joint Distribution and Density Functions 520
A.2.3 Conditional Distributions 520
A.2.4 Bayes’ Rule 521

A.2.5 Expectation 521


A.2.6 Variance 522
A.2.7 Weak Law of Large Numbers 523
A.3 Special Random Variables 523
A.3.1 Bernoulli Distribution 523
A.3.2 Binomial Distribution 524
A.3.3 Multinomial Distribution 524
A.3.4 Uniform Distribution 524
A.3.5 Normal (Gaussian) Distribution 525
A.3.6 Chi-Square Distribution 526
A.3.7 t Distribution 527
A.3.8 F Distribution 527
A.4 References 527

Index 529
Series Foreword

The goal of building systems that can adapt to their environments and
learn from their experience has attracted researchers from many fields,
including computer science, engineering, mathematics, physics, neuro-
science, and cognitive science. Out of this research has come a wide
variety of learning techniques that are transforming many industrial and
scientific fields. Recently, several research communities have converged
on a common set of issues surrounding supervised, semi-supervised, un-
supervised, and reinforcement learning problems. The MIT Press Series
on Adaptive Computation and Machine Learning seeks to unify the many
diverse strands of machine learning research and to foster high-quality
research and innovative applications.
The MIT Press is extremely pleased to publish this second edition of
Ethem Alpaydın’s introductory textbook. This book presents a readable
and concise introduction to machine learning that reflects these diverse
research strands while providing a unified treatment of the field. The
book covers all of the main problem formulations and introduces the
most important algorithms and techniques encompassing methods from
computer science, neural computation, information theory, and statis-
tics. The second edition expands and updates coverage of several areas,
particularly kernel machines and graphical models, that have advanced
rapidly over the past five years. This updated work continues to be a
compelling textbook for introductory courses in machine learning at the
undergraduate and beginning graduate level.
Figures

1.1 Example of a training dataset where each circle corresponds


to one data instance with input values in the corresponding
axes and its sign indicates the class. 6
1.2 A training dataset of used cars and the function fitted. 10

2.1 Training set for the class of a “family car.” 22


2.2 Example of a hypothesis class. 23
2.3 C is the actual class and h is our induced hypothesis. 25
2.4 S is the most specific and G is the most general hypothesis. 26
2.5 We choose the hypothesis with the largest margin, for best
separation. 27
2.6 An axis-aligned rectangle can shatter four points. 28
2.7 The difference between h and C is the sum of four
rectangular strips, one of which is shaded. 30
2.8 When there is noise, there is not a simple boundary
between the positive and negative instances, and zero
misclassification error may not be possible with a simple
hypothesis. 31
2.9 There are three classes: family car, sports car, and luxury
sedan. 33
2.10 Linear, second-order, and sixth-order polynomials are fitted
to the same set of points. 36
2.11 A line separating positive and negative instances. 44

3.1 Example of decision regions and decision boundaries. 54



4.1 θ is the parameter to be estimated. 67


4.2 (a) Likelihood functions and (b) posteriors with equal priors
for two classes when the input is one-dimensional. 71
4.3 (a) Likelihood functions and (b) posteriors with equal priors
for two classes when the input is one-dimensional. 72
4.4 Regression assumes 0 mean Gaussian noise added to the
model; here, the model is linear. 74
4.5 (a) Function, f (x) = 2 sin(1.5x), and one noisy (N (0, 1))
dataset sampled from the function. 78
4.6 In the same setting as that of figure 4.5, using one hundred
models instead of five, bias, variance, and error for
polynomials of order 1 to 5. 79
4.7 In the same setting as that of figure 4.5, training and
validation sets (each containing 50 instances) are generated. 81
4.8 In the same setting as that of figure 4.5, polynomials of
order 1 to 4 are fitted. 83

5.1 Bivariate normal distribution. 91


5.2 Isoprobability contour plot of the bivariate normal
distribution. 92
5.3 Classes have different covariance matrices. 96
5.4 Covariances may be arbitary but shared by both classes. 97
5.5 All classes have equal, diagonal covariance matrices, but
variances are not equal. 98
5.6 All classes have equal, diagonal covariance matrices of
equal variances on both dimensions. 99
5.7 Different cases of the covariance matrices fitted to the same
data lead to different boundaries. 101

6.1 Principal components analysis centers the sample and then


rotates the axes to line up with the directions of highest
variance. 115
6.2 (a) Scree graph. (b) Proportion of variance explained is given
for the Optdigits dataset from the UCI Repository. 117
6.3 Optdigits data plotted in the space of two principal
components. 118
6.4 Principal components analysis generates new variables that
are linear combinations of the original input variables. 121

6.5 Factors are independent unit normals that are stretched,


rotated, and translated to make up the inputs. 122
6.6 Map of Europe drawn by MDS. 126
6.7 Two-dimensional, two-class data projected on w. 129
6.8 Optdigits data plotted in the space of the first two
dimensions found by LDA. 132
6.9 Geodesic distance is calculated along the manifold as
opposed to the Euclidean distance that does not use this
information. 134
6.10 Local linear embedding first learns the constraints in the
original space and next places the points in the new space
respecting those constraints. 136

7.1 Given x, the encoder sends the index of the closest code
word and the decoder generates the code word with the
received index as x′. 147
7.2 Evolution of k-means. 148
7.3 k-means algorithm. 149
7.4 Data points and the fitted Gaussians by EM, initialized by
one k-means iteration of figure 7.2. 153
7.5 A two-dimensional dataset and the dendrogram showing
the result of single-link clustering is shown. 159

8.1 Histograms for various bin lengths. 166


8.2 Naive estimate for various bin lengths. 167
8.3 Kernel estimate for various bin lengths. 168
8.4 k-nearest neighbor estimate for various k values. 169
8.5 Dotted lines are the Voronoi tesselation and the straight
line is the class discriminant. 173
8.6 Condensed nearest neighbor algorithm. 174
8.7 Regressograms for various bin lengths. ‘×’ denote data
points. 175
8.8 Running mean smooth for various bin lengths. 176
8.9 Kernel smooth for various bin lengths. 177
8.10 Running line smooth for various bin lengths. 178
8.11 Kernel estimate for various bin lengths for a two-class
problem. 179
8.12 Regressograms with linear fits in bins for various bin lengths. 182

9.1 Example of a dataset and the corresponding decision tree. 186


9.2 Entropy function for a two-class problem. 189
9.3 Classification tree construction. 191
9.4 Regression tree smooths for various values of θr . 195
9.5 Regression trees implementing the smooths of figure 9.4
for various values of θr . 196
9.6 Example of a (hypothetical) decision tree. 197
9.7 Ripper algorithm for learning rules. 200
9.8 Example of a linear multivariate decision tree. 203

10.1 In the two-dimensional case, the linear discriminant is a


line that separates the examples from two classes. 213
10.2 The geometric interpretation of the linear discriminant. 214
10.3 In linear classification, each hyperplane Hi separates the
examples of Ci from the examples of all other classes. 215
10.4 In pairwise linear separation, there is a separate hyperplane
for each pair of classes. 216
10.5 The logistic, or sigmoid, function. 219
10.6 Logistic discrimination algorithm implementing gradient
descent for the single output case with two classes. 222
10.7 For a univariate two-class problem (shown with ‘◦’ and ‘×’ ),
the evolution of the line w x + w0 and the sigmoid output
after 10, 100, and 1,000 iterations over the sample. 223
10.8 Logistic discrimination algorithm implementing gradient
descent for the case with K > 2 classes. 226
10.9 For a two-dimensional problem with three classes, the
solution found by logistic discrimination. 226
10.10 For the same example in figure 10.9, the linear
discriminants (top), and the posterior probabilities after the
softmax (bottom). 227

11.1 Simple perceptron. 237


11.2 K parallel perceptrons. 239
11.3 Perceptron training algorithm implementing stochastic
online gradient descent for the case with K > 2 classes. 243
11.4 The perceptron that implements AND and its geometric
interpretation. 244
11.5 XOR problem is not linearly separable. 245
11.6 The structure of a multilayer perceptron. 247

11.7 The multilayer perceptron that solves the XOR problem. 249
11.8 Sample training data shown as ‘+’, where x^t ∼ U(−0.5, 0.5),
and y^t = f(x^t) + N(0, 0.1). 252
11.9 The mean square error on training and validation sets as a
function of training epochs. 253
11.10 (a) The hyperplanes of the hidden unit weights on the first
layer, (b) hidden unit outputs, and (c) hidden unit outputs
multiplied by the weights on the second layer. 254
11.11 Backpropagation algorithm for training a multilayer
perceptron for regression with K outputs. 255
11.12 As complexity increases, training error is fixed but the
validation error starts to increase and the network starts to
overfit. 259
11.13 As training continues, the validation error starts to increase
and the network starts to overfit. 259
11.14 A structured MLP. 260
11.15 In weight sharing, different units have connections to
different inputs but share the same weight value (denoted
by line type). 261
11.16 The identity of the object does not change when it is
translated, rotated, or scaled. 262
11.17 Two examples of constructive algorithms. 265
11.18 Optdigits data plotted in the space of the two hidden units
of an MLP trained for classification. 268
11.19 In the autoassociator, there are as many outputs as there
are inputs and the desired outputs are the inputs. 269
11.20 A time delay neural network. 271
11.21 Examples of MLP with partial recurrency. 272
11.22 Backpropagation through time. 273

12.1 Shaded circles are the centers and the empty circle is the
input instance. 282
12.2 Online k-means algorithm. 283
12.3 The winner-take-all competitive neural network, which is a
network of k perceptrons with recurrent connections at the
output. 284
12.4 The distance from x a to the closest center is less than the
vigilance value ρ and the center is updated as in online
k-means. 285

12.5 In the SOM, not only the closest unit but also its neighbors,
in terms of indices, are moved toward the input. 287
12.6 The one-dimensional form of the bell-shaped function used
in the radial basis function network. 289
12.7 The difference between local and distributed representations. 290
12.8 The RBF network where ph are the hidden units using the
bell-shaped activation function. 292
12.9 (-) Before and (- -) after normalization for three Gaussians
whose centers are denoted by ‘*’. 296
12.10 The mixture of experts can be seen as an RBF network
where the second-layer weights are outputs of linear models. 301
12.11 The mixture of experts can be seen as a model for
combining multiple models. 302

13.1 For a two-class problem where the instances of the classes


are shown by plus signs and dots, the thick line is the
boundary and the dashed lines define the margins on either
side. 314
13.2 In classifying an instance, there are four possible cases. 316
13.3 Comparison of different loss functions for r^t = 1. 318
13.4 The discriminant and margins found by a polynomial
kernel of degree 2. 322
13.5 The boundary and margins found by the Gaussian kernel
with different spread values, s². 323
13.6 Quadratic and ε-sensitive error functions. 329
13.7 The fitted regression line to data points shown as crosses
and the ε-tube are shown (C = 10, ε = 0.25). 331
13.8 The fitted regression line and the ε-tube using a quadratic
kernel are shown (C = 10, ε = 0.25). 332
13.9 The fitted regression line and the ε-tube using a Gaussian
kernel with two different spreads are shown
(C = 10, ε = 0.25). 332
13.10 One-class support vector machine places the smoothest
boundary (here using a linear kernel, the circle with the
smallest radius) that encloses as much of the instances as
possible. 334
13.11 One-class support vector machine using a Gaussian kernel
with different spreads. 336

13.12 Instead of using a quadratic kernel in the original space (a),


we can use kernel PCA on the quadratic kernel values to
map to a two-dimensional new space where we use a linear
discriminant (b); these two dimensions (out of five) explain
80 percent of the variance. 337

14.1 The generative graphical model. 342


14.2 Plots of beta distributions for different sets of (α, β). 346
14.3 20 data points are drawn from p(x) ∼ N(6, 1.5²), prior is
p(μ) ∼ N(4, 0.8²), and posterior is then
p(μ|X) ∼ N(5.7, 0.3²). 347
14.4 Bayesian linear regression for different values of α and β. 351
14.5 Bayesian regression using kernels with one standard
deviation error bars. 354
14.6 Gaussian process regression with one standard deviation
error bars. 357
14.7 Gaussian process regression using a Gaussian kernel with
s² = 0.5 and varying number of training data. 359

15.1 Example of a Markov model with three states. 365


15.2 An HMM unfolded in time as a lattice (or trellis) showing all
the possible trajectories. 368
15.3 Forward-backward procedure. 371
15.4 Computation of arc probabilities, ξt (i, j). 375
15.5 Example of a left-to-right HMM. 381

16.1 Bayesian network modeling that rain is the cause of wet


grass. 388
16.2 Head-to-tail connection. 390
16.3 Tail-to-tail connection. 391
16.4 Head-to-head connection. 392
16.5 Larger graphs are formed by combining simpler subgraphs
over which information is propagated using the implied
conditional independencies. 394
16.6 (a) Graphical model for classification. (b) Naive Bayes’
classifier assumes independent inputs. 397
16.7 Hidden Markov model can be drawn as a graphical model
where q t are the hidden states and shaded O t are observed. 398

16.8 Different types of HMM model different assumptions about


the way the observed data (shown shaded) is generated
from Markov sequences of latent variables. 399
16.9 Bayesian network for linear regression. 401
16.10 Examples of d-separation. 403
16.11 Inference along a chain. 404
16.12 In a tree, a node may have several children but a single parent. 406
16.13 In a polytree, a node may have several children and several
parents, but the graph is singly connected; that is, there is a
single chain between Ui and Yj passing through X. 407
16.14 (a) A multiply connected graph, and (b) its corresponding
junction tree with nodes clustered. 410
16.15 (a) A directed graph that would have a loop after
moralization, and (b) its corresponding factor graph that is
a tree. 412
16.16 Influence diagram corresponding to classification. 415
16.17 A dynamic version where we have a chain of graphs to
show dependency in weather in consecutive days. 416

17.1 Base-learners are dj and their outputs are combined using


f (·). 424
17.2 AdaBoost algorithm. 432
17.3 Mixture of experts is a voting method where the votes, as
given by the gating system, are a function of the input. 434
17.4 In stacked generalization, the combiner is another learner
and is not restricted to being a linear combination as in
voting. 436
17.5 Cascading is a multistage method where there is a sequence
of classifiers, and the next one is used only when the
preceding ones are not confident. 439

18.1 The agent interacts with an environment. 448


18.2 Value iteration algorithm for model-based learning. 453
18.3 Policy iteration algorithm for model-based learning. 454
18.4 Example to show that Q values increase but never decrease. 457
18.5 Q learning, which is an off-policy temporal difference
algorithm. 458
18.6 Sarsa algorithm, which is an on-policy version of Q learning. 459
18.7 Example of an eligibility trace for a value. 460

18.8 Sarsa(λ) algorithm. 461


18.9 In the case of a partially observable environment, the agent
has a state estimator (SE) that keeps an internal belief state
b and the policy π generates actions based on the belief
states. 465
18.10 Expected rewards and the effect of sensing in the Tiger
problem. 468
18.11 Expected rewards change (a) if the hidden state can change,
and (b) when we consider episodes of length two. 470
18.12 The grid world. 472

19.1 The process generates an output given an input and is


affected by controllable and uncontrollable factors. 479
19.2 Different strategies of experimentation with two factors
and five levels each. 480
19.3 (a) Typical ROC curve. (b) A classifier is preferred if its ROC
curve is closer to the upper-left corner (larger AUC). 491
19.4 (a) Definition of precision and recall using Venn diagrams.
(b) Precision is 1; all the retrieved records are relevant but
there may be relevant ones not retrieved. (c) Recall is 1; all
the relevant records are retrieved but there may also be
irrelevant records that are retrieved. 492
19.5 95 percent of the unit normal distribution lies between
−1.96 and 1.96. 494
19.6 95 percent of the unit normal distribution lies before 1.64. 496

A.1 Probability density function of Z, the unit normal


distribution. 525
Tables

2.1 With two inputs, there are four possible cases and sixteen
possible Boolean functions. 37

5.1 Reducing variance through simplifying assumptions. 100

11.1 Input and output for the AND function. 244


11.2 Input and output for the XOR function. 245

17.1 Classifier combination rules. 425


17.2 Example of combination rules on three learners and three
classes. 425

19.1 Confusion matrix for two classes. 489


19.2 Performance measures used in two-class problems. 490
19.3 Type I error, type II error, and power of a test. 497
19.4 The analysis of variance (ANOVA) table for a single factor
model. 507
Preface

Machine learning is programming computers to optimize a performance


criterion using example data or past experience. We need learning in
cases where we cannot directly write a computer program to solve a given
problem, but need example data or experience. One case where learning
is necessary is when human expertise does not exist, or when humans
are unable to explain their expertise. Consider the recognition of spoken
speech—that is, converting the acoustic speech signal to an ASCII text;
we can do this task seemingly without any difficulty, but we are unable
to explain how we do it. Different people utter the same word differently
due to differences in age, gender, or accent. In machine learning, the ap-
proach is to collect a large collection of sample utterances from different
people and learn to map these to words.
Another case is when the problem to be solved changes in time, or
depends on the particular environment. We would like to have general-
purpose systems that can adapt to their circumstances, rather than ex-
plicitly writing a different program for each special circumstance. Con-
sider routing packets over a computer network. The path maximizing
the quality of service from a source to destination changes continuously
as the network traffic changes. A learning routing program is able to
adapt to the best path by monitoring the network traffic. Another ex-
ample is an intelligent user interface that can adapt to the biometrics of
its user—namely, his or her accent, handwriting, working habits, and so
forth.
Already, there are many successful applications of machine learning
in various domains: There are commercially available systems for rec-
ognizing speech and handwriting. Retail companies analyze their past
sales data to learn their customers’ behavior to improve customer relationship
management. Financial institutions analyze past transactions


to predict customers’ credit risks. Robots learn to optimize their behav-
ior to complete a task using minimum resources. In bioinformatics, the
huge amount of data can only be analyzed and knowledge extracted us-
ing computers. These are only some of the applications that we—that
is, you and I—will discuss throughout this book. We can only imagine
what future applications can be realized using machine learning: Cars
that can drive themselves under different road and weather conditions,
phones that can translate in real time to and from a foreign language,
autonomous robots that can navigate in a new environment, for example,
on the surface of another planet. Machine learning is certainly an exciting
field to be working in!
The book discusses many methods that have their bases in different
fields: statistics, pattern recognition, neural networks, artificial intelli-
gence, signal processing, control, and data mining. In the past, research
in these different communities followed different paths with different
emphases. In this book, the aim is to incorporate them together to give a
unified treatment of the problems and the proposed solutions to them.
This is an introductory textbook, intended for senior undergraduate
and graduate-level courses on machine learning, as well as engineers
working in the industry who are interested in the application of these
methods. The prerequisites are courses on computer programming, prob-
ability, calculus, and linear algebra. The aim is to have all learning algo-
rithms sufficiently explained so it will be a small step from the equations
given in the book to a computer program. For some cases, pseudocode
of algorithms is also included to make this task easier.
The book can be used for a one-semester course by sampling from the
chapters, or it can be used for a two-semester course, possibly by dis-
cussing extra research papers; in such a case, I hope that the references
at the end of each chapter are useful.
The Web page is http://www.cmpe.boun.edu.tr/~ethem/i2ml/ where I
will post information related to the book that becomes available after the
book goes to press, for example, errata. I welcome your feedback via
email to [email protected].
I very much enjoyed writing this book; I hope you will enjoy reading it.
Acknowledgments

The way you get good ideas is by working with talented people who are
also fun to be with. The Department of Computer Engineering of Boğaziçi
University is a wonderful place to work, and my colleagues gave me all the
support I needed while working on this book. I would also like to thank
my past and present students on whom I have field-tested the content
that is now in book form.
While working on this book, I was supported by the Turkish Academy
of Sciences, in the framework of the Young Scientist Award Program (EA-
TÜBA-GEBİP/2001-1-1).
My special thanks go to Michael Jordan. I am deeply indebted to him
for his support over the years and last for this book. His comments on
the general organization of the book, and the first chapter, have greatly
improved the book, both in content and form. Taner Bilgiç, Vladimir
Cherkassky, Tom Dietterich, Fikret Gürgen, Olcay Taner Yıldız, and anony-
mous reviewers of the MIT Press also read parts of the book and provided
invaluable feedback. I hope that they will sense my gratitude when they
notice ideas that I have taken from their comments without proper ac-
knowledgment. Of course, I alone am responsible for any errors or short-
comings.
My parents believe in me, and I am grateful for their enduring love
and support. Sema Oktuğ is always there whenever I need her, and I will
always be thankful for her friendship. I would also like to thank Hakan
Ünlü for our many discussions over the years on several topics related to
life, the universe, and everything.
This book is set using LaTeX macros prepared by Chris Manning for
which I thank him. I would like to thank the editors of the Adaptive Com-
putation and Machine Learning series, and Bob Prior, Valerie Geary, Kathleen
Caruso, Sharon Deacon Warne, Erica Schultz, and Emily Gutheinz
from the MIT Press for their continuous support and help during the
completion of the book.
Notes for the Second Edition

Machine learning has seen important developments since the first edition
appeared in 2004. First, application areas have grown rapidly. Internet-
related technologies, such as search engines, recommendation systems,
spam filters, and intrusion detection systems are now routinely using ma-
chine learning. In the field of bioinformatics and computational biology,
methods that learn from data are being used more and more widely. In
natural language processing applications—for example, machine transla-
tion—we are seeing a faster and faster move from programmed expert
systems to methods that learn automatically from very large corpora of
example text. In robotics, medical diagnosis, speech and image recogni-
tion, biometrics, finance, sometimes under the name pattern recognition,
sometimes disguised as data mining, or under one of its many cloaks,
we see more and more applications of the machine learning methods we
discuss in this textbook.
Second, there have been supporting advances in theory. Especially, the
idea of kernel functions and the kernel machines that use them allow
a better representation of the problem and the associated convex opti-
mization framework is one step further than multilayer perceptrons with
sigmoid hidden units trained using gradient-descent. Bayesian meth-
ods through appropriately chosen prior distributions add expert know-
ledge to what the data tells us. Graphical models allow a representa-
tion as a network of interrelated nodes and efficient inference algorithms
allow querying the network. It has thus become necessary that these
three topics—namely, kernel methods, Bayesian estimation, and graphi-
cal models—which were sections in the first edition, be treated in more
length, as three new chapters.
Another development hugely significant for the field has been the realization
that machine learning experiments need to be designed better. We


have gone a long way from using a single test set to methods for cross-
validation to paired t tests. That is why, in this second edition, I have
rewritten the chapter on statistical tests as one that includes the design
and analysis of machine learning experiments. The point is that testing
should not be a separate step done after all runs are completed (despite
the fact that this new chapter is at the very end of the book); the whole
process of experimentation should be designed beforehand, relevant fac-
tors defined, proper experimentation procedure decided upon, and then,
and only then, the runs should be done and the results analyzed.

It has long been believed, especially by older members of the scientific


community, that for machines to be as intelligent as us, that is, for ar-
tificial intelligence to be a reality, our current knowledge in general, or
computer science in particular, is not sufficient. People largely are of
the opinion that we need a new technology, a new type of material, a
new type of computational mechanism or a new programming methodol-
ogy, and that, until then, we can only “simulate” some aspects of human
intelligence and only in a limited way but can never fully attain it.
I believe that we will soon prove them wrong. First we saw this in
chess, and now we are seeing it in a whole variety of domains. Given
enough memory and computation power, we can realize tasks with rela-
tively simple algorithms; the trick here is learning, either learning from
example data or learning from trial and error using reinforcement learn-
ing. It seems as if tasks such as machine translation will soon be possible
using supervised and mostly unsupervised learning algorithms.
The same holds for many other domains, for example, unmanned navi-
gation in robotics using reinforcement learning. I believe that this will
continue for many domains in artificial intelligence, and the key is learn-
ing. We do not need to come up with new algorithms if machines can
learn themselves, assuming that we can provide them with enough data
(not necessarily supervised) and computing power.

I would like to thank all the instructors and students of the first edition,
from all over the world, including the reprint in India and the German
translation. I am grateful to those who sent me words of appreciation
and errata or who provided feedback in any other way. Please keep those
emails coming. My email address is [email protected].
The second edition also provides more support on the Web. The book’s
Web site is http://www.cmpe.boun.edu.tr/~ethem/i2ml.

I would like to thank my past and present thesis students, Mehmet Gönen,
Esma Kılıç, Murat Semerci, M. Aydın Ulaş, and Olcay Taner Yıldız, and also
those who have taken CmpE 544, CmpE 545, CmpE 591, and CmpE 58E
during these past few years. The best way to test your knowledge of a
topic is by teaching it.
It has been a pleasure working with the MIT Press again on this second
edition, and I thank Bob Prior, Ada Brunstein, Erin K. Shoudy, Kathleen
Caruso, and Marcy Ross for all their help and support.
Notations

x Scalar value
x Vector
X Matrix
x^T Transpose
X^−1 Inverse

X Random variable
P (X) Probability mass function when X is discrete
p(X) Probability density function when X is continuous
P (X|Y ) Conditional probability of X given Y
E[X] Expected value of the random variable X
Var(X) Variance of X
Cov(X, Y ) Covariance of X and Y
Corr(X, Y ) Correlation of X and Y

μ Mean
σ² Variance
Σ Covariance matrix
m Estimator to the mean
s² Estimator to the variance
S Estimator to the covariance matrix

N(μ, σ²) Univariate normal distribution with mean μ and variance σ²
Z Unit normal distribution: N(0, 1)
N_d(μ, Σ) d-variate normal distribution with mean vector μ and covariance matrix Σ

x Input
d Number of inputs (input dimensionality)
y Output
r Required output
K Number of outputs (classes)
N Number of training instances
z Hidden value, intrinsic dimension, latent factor
k Number of hidden dimensions, latent factors
C_i Class i
X Training sample
{x^t}_{t=1}^N Set of x with index t ranging from 1 to N
{x^t, r^t}_t Set of ordered pairs of input and desired output with index t

g(x|θ) Function of x defined up to a set of parameters θ


arg max_θ g(x|θ) The argument θ for which g has its maximum value
arg min_θ g(x|θ) The argument θ for which g has its minimum value
E(θ|X) Error function with parameters θ on the sample X
l(θ|X) Likelihood of parameters θ on the sample X
L(θ|X) Log likelihood of parameters θ on the sample X

1(c) 1 if c is true, 0 otherwise


#{c} Number of elements for which c is true
δij Kronecker delta: 1 if i = j, 0 otherwise
1 Introduction

1.1 What Is Machine Learning?

To solve a problem on a computer, we need an algorithm. An algo-


rithm is a sequence of instructions that should be carried out to trans-
form the input to output. For example, one can devise an algorithm for
sorting. The input is a set of numbers and the output is their ordered
list. For the same task, there may be various algorithms and we may be
interested in finding the most efficient one, requiring the least number of
instructions or memory or both.
For some tasks, however, we do not have an algorithm—for example,
to tell spam emails from legitimate emails. We know what the input is:
an email document that in the simplest case is a file of characters. We
know what the output should be: a yes/no output indicating whether the
message is spam or not. We do not know how to transform the input
to the output. What can be considered spam changes in time and from
individual to individual.
What we lack in knowledge, we make up for in data. We can easily
compile thousands of example messages some of which we know to be
spam, and what we want is to “learn” what constitutes spam from them.
In other words, we would like the computer (machine) to extract auto-
matically the algorithm for this task. There is no need to learn to sort
numbers, we already have algorithms for that; but there are many ap-
plications for which we do not have an algorithm but do have example
data.
With advances in computer technology, we currently have the ability to
store and process large amounts of data, as well as to access it from phys-
ically distant locations over a computer network. Most data acquisition
devices are digital now and record reliable data. Think, for example, of a
supermarket chain that has hundreds of stores all over a country selling
thousands of goods to millions of customers. The point of sale terminals
record the details of each transaction: date, customer identification code,
goods bought and their amount, total money spent, and so forth. This
typically amounts to gigabytes of data every day. What the supermarket
chain wants is to be able to predict who are the likely customers for a
product. Again, the algorithm for this is not evident; it changes in time
and by geographic location. The stored data becomes useful only when
it is analyzed and turned into information that we can make use of, for
example, to make predictions.
We do not know exactly which people are likely to buy this ice cream
flavor, or the next book of this author, or see this new movie, or visit this
city, or click this link. If we knew, we would not need any analysis of the
data; we would just go ahead and write down the code. But because we
do not, we can only collect data and hope to extract the answers to these
and similar questions from data.
We do believe that there is a process that explains the data we observe.
Though we do not know the details of the process underlying the gener-
ation of data—for example, consumer behavior—we know that it is not
completely random. People do not go to supermarkets and buy things
at random. When they buy beer, they buy chips; they buy ice cream in
summer and spices for Glühwein in winter. There are certain patterns in
the data.
We may not be able to identify the process completely, but we believe
we can construct a good and useful approximation. That approximation
may not explain everything, but may still be able to account for some part
of the data. We believe that though identifying the complete process may
not be possible, we can still detect certain patterns or regularities. This
is the niche of machine learning. Such patterns may help us understand
the process, or we can use those patterns to make predictions: Assuming
that the future, at least the near future, will not be much different from
the past when the sample data was collected, the future predictions can
also be expected to be right.
Application of machine learning methods to large databases is called
data mining. The analogy is that a large volume of earth and raw ma-
terial is extracted from a mine, which when processed leads to a small
amount of very precious material; similarly, in data mining, a large vol-
ume of data is processed to construct a simple model with valuable use,
for example, having high predictive accuracy. Its application areas are
abundant: In addition to retail, in finance banks analyze their past data
to build models to use in credit applications, fraud detection, and the
stock market. In manufacturing, learning models are used for optimiza-
tion, control, and troubleshooting. In medicine, learning programs are
used for medical diagnosis. In telecommunications, call patterns are an-
alyzed for network optimization and maximizing the quality of service.
In science, large amounts of data in physics, astronomy, and biology can
only be analyzed fast enough by computers. The World Wide Web is huge;
it is constantly growing, and searching for relevant information cannot be
done manually.
But machine learning is not just a database problem; it is also a part
of artificial intelligence. To be intelligent, a system that is in a changing
environment should have the ability to learn. If the system can learn and
adapt to such changes, the system designer need not foresee and provide
solutions for all possible situations.
Machine learning also helps us find solutions to many problems in vi-
sion, speech recognition, and robotics. Let us take the example of rec-
ognizing faces: This is a task we do effortlessly; every day we recognize
family members and friends by looking at their faces or from their pho-
tographs, despite differences in pose, lighting, hair style, and so forth.
But we do it unconsciously and are unable to explain how we do it. Be-
cause we are not able to explain our expertise, we cannot write the com-
puter program. At the same time, we know that a face image is not just a
random collection of pixels; a face has structure. It is symmetric. There
are the eyes, the nose, the mouth, located in certain places on the face.
Each person’s face is a pattern composed of a particular combination
of these. By analyzing sample face images of a person, a learning pro-
gram captures the pattern specific to that person and then recognizes by
checking for this pattern in a given image. This is one example of pattern
recognition.
Machine learning is programming computers to optimize a performance
criterion using example data or past experience. We have a model defined
up to some parameters, and learning is the execution of a computer pro-
gram to optimize the parameters of the model using the training data or
past experience. The model may be predictive to make predictions in the
future, or descriptive to gain knowledge from data, or both.
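
As a minimal sketch of this idea (the model form, toy data, and learning rate below are hypothetical illustrations, not taken from the text), consider a model defined up to a single parameter whose value is optimized on training data by gradient descent on a squared-error criterion:

# A minimal sketch: a model defined up to one parameter w (y = w * x),
# fitted by minimizing squared error on hypothetical training data.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (input, observed output) pairs

w = 0.0
learning_rate = 0.01
for _ in range(1000):
    # Gradient of sum((w*x - y)^2) with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data)
    w -= learning_rate * grad

print(w)  # about 2.04: the parameter value that best fits this training set

Once fitted, the same model can be used predictively (to estimate y for a new x) or descriptively (the value of w itself summarizes the relation in the data).
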
Machine learning uses the theory of statistics in building mathematical
models, because the core task is making inference from a sample. The
role of computer science is twofold: First, in training, we need efficient


algorithms to solve the optimization problem, as well as to store and pro-
cess the massive amount of data we generally have. Second, once a model
is learned, its representation and algorithmic solution for inference needs
to be efficient as well. In certain applications, the efficiency of the learn-
ing or inference algorithm, namely, its space and time complexity, may
be as important as its predictive accuracy.
Let us now discuss some example applications in more detail to gain
more insight into the types and uses of machine learning.

1.2 Examples of Machine Learning Applications

1.2.1 Learning Associations

In the case of retail—for example, a supermarket chain—one application


of machine learning is basket analysis, which is finding associations be-
tween products bought by customers: If people who buy X typically also
buy Y , and if there is a customer who buys X and does not buy Y , he
or she is a potential Y customer. Once we find such customers, we can
target them for cross-selling.
In finding an association rule, we are interested in learning a conditional
probability of the form P (Y |X) where Y is the product we would like to
condition on X, which is the product or the set of products which we
know that the customer has already purchased.
Let us say, going over our data, we calculate that P (chips|beer) = 0.7.
Then, we can define the rule:

70 percent of customers who buy beer also buy chips.
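
As a minimal sketch (the basket data below are hypothetical, not from the text), such a conditional probability can be estimated directly from past transactions as the fraction of baskets containing X that also contain Y:

# Minimal sketch: estimating P(Y | X) from hypothetical basket data.
transactions = [
    {"beer", "chips", "salsa"},
    {"beer", "chips"},
    {"beer", "diapers"},
    {"milk", "chips"},
    {"beer", "chips", "milk"},
]

def confidence(baskets, x, y):
    # Fraction of baskets containing x that also contain y, i.e., P(y | x).
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)

print(confidence(transactions, "beer", "chips"))  # 0.75 on this toy data
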

We may want to make a distinction among customers and toward this,


estimate P (Y |X, D) where D is the set of customer attributes, for exam-
ple, gender, age, marital status, and so on, assuming that we have access
to this information. If this is a bookseller instead of a supermarket, prod-
ucts can be books or authors. In the case of a Web portal, items corre-
spond to links to Web pages, and we can estimate the links a user is likely
to click and use this information to download such pages in advance for
faster access.

1.2.2 Classification

A credit is an amount of money loaned by a financial institution, for


example, a bank, to be paid back with interest, generally in installments.
It is important for the bank to be able to predict in advance the risk
associated with a loan, which is the probability that the customer will
default and not pay the whole amount back. This is both to make sure
that the bank will make a profit and also to not inconvenience a customer
with a loan over his or her financial capacity.
In credit scoring (Hand 1998), the bank calculates the risk given the
amount of credit and the information about the customer. The informa-
tion about the customer includes data we have access to and is relevant in
calculating his or her financial capacity—namely, income, savings, collat-
erals, profession, age, past financial history, and so forth. The bank has
a record of past loans containing such customer data and whether the
loan was paid back or not. From this data of particular applications, the
aim is to infer a general rule coding the association between a customer’s
attributes and his risk. That is, the machine learning system fits a model
to the past data to be able to calculate the risk for a new application and
then decides to accept or refuse it accordingly.
This is an example of a classification problem where there are two
classes: low-risk and high-risk customers. The information about a cus-
tomer makes up the input to the classifier whose task is to assign the
input to one of the two classes.
After training with the past data, a classification rule learned may be
of the form

IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk

for suitable values of θ1 and θ2 (see figure 1.1). This is an example of
a discriminant; it is a function that separates the examples of different
classes.
Having a rule like this, the main application is prediction: Once we have
a rule that fits the past data, if the future is similar to the past, then we
can make correct predictions for novel instances. Given a new application
with a certain income and savings, we can easily decide whether it is low-
risk or high-risk.
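A hedged sketch of such a discriminant in Python follows; the thresholds θ1 and θ2 and the applicant records are invented for illustration, whereas in practice they would be learned from the past loan data.

# A minimal sketch with assumed thresholds: the rule
# IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk.
THETA1 = 30_000   # assumed income threshold a learner might have fitted
THETA2 = 5_000    # assumed savings threshold

def classify(income, savings):
    """Apply the learned discriminant to a new loan application."""
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

applications = [(45_000, 12_000), (28_000, 9_000), (60_000, 2_000)]  # invented applicants
for income, savings in applications:
    print(income, savings, "->", classify(income, savings))
# prints low-risk, high-risk, high-risk for these made-up cases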
In some cases, instead of making a 0/1 (low-risk/high-risk) type de-
cision, we may want to calculate a probability, namely, P(Y|X), where
X are the customer attributes and Y is 0 or 1 respectively for low-risk
and high-risk.
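As a hedged illustration of this probabilistic alternative, the sketch below scores an applicant with a logistic model over (income, savings); the weights are assumptions picked only to make the example run, whereas a real system would fit them to the past loan data.

# A minimal sketch with assumed parameters: output P(Y = 1 | x), the estimated
# probability that an applicant is high-risk, instead of a hard 0/1 decision.
import math

W_INCOME, W_SAVINGS, BIAS = -0.00008, -0.0003, 4.0   # invented model parameters

def prob_high_risk(income, savings):
    """Return an estimated probability that the applicant is high-risk (Y = 1)."""
    z = W_INCOME * income + W_SAVINGS * savings + BIAS
    return 1.0 / (1.0 + math.exp(-z))       # logistic (sigmoid) link

print(round(prob_high_risk(45_000, 12_000), 3))  # small value: looks low-risk
print(round(prob_high_risk(20_000, 1_000), 3))   # larger value: looks high-risk

Thresholding such a probability at 0.5 recovers the hard decision above, while keeping the probability itself lets the bank weigh profit against risk case by case.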
Scandinavian upon the scene will mean two cotton patches instead
of one, one for him and one for the negro.
True, there are many of our ambitious farmers who can’t secure
enough laborers to make enough cotton to get rich as fast as they
would, but they’ll get rich quick enough in using our present
available labor, thereby raising less cotton, getting more for it, and,
at the same time, either resting their lands or sowing it in small
grain. And when these wealthy farmers plant less cotton, they
benefit every poor fellow, standing between the plow handles, in the
entire South.
And this one or two horse farmer is the ideal citizen, anyway. The
best communities are those made up of small farmers. The
churches, schools and social conditions in a community, in which one
man owns all the land and runs forty plows, are not as good as one
with forty land owners. Oglethorpe county would, no doubt, be a
better county today, if Jim Smith had never been born, and this is
not said with the spirit of criticising a single act of his life, either.
It’s all right for immigrants to build our cities, our railroads, our
manufacturing enterprises or to labor in the cities, on the railroads,
in the factories or even take servants’ places, but the South doesn’t
need any more cotton raisers.—Gwinnett (Ga.) Journal.

The Railroad Power.

The railroad magnates have divided up the lines in this country


among nine families of plutocrats, who by controlling transportation
of passengers and freight, can control the Government. They are
divided as follows:
Harriman 22,276
Vanderbilt 20,493
Pennsylvania 20,138
Hill 19,407
Morgan 18,789
Gould 13,789
Moore 13,028
Rockefeller 10,293
Santa Fe 7,809
Total 146,112
That is three-fourths of the mileage of the country and the control
of the main lines in every state and territory. It puts into the hands
of these men a greater power than was ever exercised by any group
of kings, lords and dukes who ever formed a community of interests.
In all past history, to overthrow such a power as that, a resort to
long and bloody wars was the only recourse. It remains to be seen
whether the great peace movements of the last few years, for which
Andrew Carnegie has built a temple at The Hague, will produce a
sentiment strong enough to settle this question peaceably. Would
Andrew Carnegie encourage anarchistic disorders if he thought there
was a danger of a reduction of the tariff on steel? Railroad
combination and robber tariffs are only another manifestation of
what we once called the “money power.”—Omaha (Neb.,)
Investigator.

Populism Forever.

Whatever may be the future of the People’s party, whether it is


doomed to pass away and give place to some other party that will
present its principles, or whether it may yet rise as it deserves to do
and get control of the Government, remains yet to be seen.
One thing is certain. It has already accomplished more in this
Government in the last fifteen years in the way of creating public
sentiment and political conviction than both of the old parties.
Fifteen years ago the two old parties were discussing nothing but
Tariff. The Money question, the Railroad question and the Trust
question were entirely ignored by them. Not because the leaders of
the two old parties did not know the magnitude and importance of
these questions; but they got their campaign boodle from these rich
corporations and they were willing to accept the money and let the
common people perish.
When the Populists came on the scene they began to “cry aloud
and spare not.”
They showed that the same laws of supply and demand that
regulated the prices of other commodities, also regulated the price
of money, that when money was plentiful, prices were high, all kinds
of business prosperous and labor fully employed and well paid. While
on the other hand when money was scarce it was high; all industries
paralyzed, men out of work and their families suffering for bread.
They showed that the great railroad corporations had secured the
public franchises and were taxing the people without their consent
and without mercy.
They showed that their great corporations, growing enormously
rich, were combining together and forming “Trusts” and that they will
eventually control all prices, and as completely own and control the
country as did the Barons under the Feudal system.
Their cry and their plea was invincible.
Their arguments could not be answered.
Ridicule and abuse might serve to keep them down for awhile, but
the just indictment against the two old parties was destined some
day to be sustained.
Today the Populist looks on with pleasure and sees his principles
growing in public favor every day.
The Republican president and the Democratic leader are endorsing
the very doctrines that were fifteen years ago considered the most
radical.
All honor to the Populists.—Nevada County (Ark.) Picayune.
Sense.

Can there be such a thing as a radical conservative? John Temple


Graves thinks Col. Pleasant Stovall, of the Savannah Press, fits that
kind of a job. We believe that a trimmer—if that is what Col. Graves
means to call Col. Stovall—is the most decent citizen that afflicts
human society. He stops rushing things when a sense of the
proprieties tells him that a thing has been rushed far enough to
make it coarse or common (use which word you prefer), and the
result is that his hair does not grow too long, nor his ears too
puritanically short.
Yes, sir, the medium grade takes the cake. Except in an occasional
storm, the radicals may overrun all opposition; but it don’t last, and
the fellow that wins on an extravagant moral issue may be found in
the ditch dead drunk as soon as public sentiment gets normal and
resorts for a season to common sense arrangements of its ethics
and politics.
Yes, the men who make an over-display of honesty for the season
always get left as soon as the folks get back to their normal
qualities. Common sense controls when the excitement has passed.
—Cordele Rambler.

A Lesson in Fusion.

Hearst got beat for Governor of New York while the balance of the
State ticket he was on got elected. A few of the successful
candidates are Independence League men, but most of them are
straight Democrats. Thus Hearst’s reform work was turned to the
benefit of corrupt and foul Tammany. We hope this lesson in fusion
will be enough for the League. Hearst was defeated by about
60,000, while the other State candidates on the League-Democratic
fusion ticket were elected by small pluralities. Tammany scratched
Hearst. The Wall Street element of the Democratic party either
scratched him or voted the Republican ticket. We are inclined to
think well of Hearst because of those who scratched him. Hearst
says the fight for the rights of the people is still on. With his great
daily papers he can do a vast work toward overthrowing the rule of
the money power, if he gets into the middle of the road and stays
there. But if he endeavors to work within the old party he will do
more to prevent the success of the people than forty Clevelands
could do. Maybe the thing wasn’t hardly ripe and he had to back out,
which he did by fusing with the Democrats after the League had
nominated a straight ticket. We are guessing that Hearst will be in
the middle of the road supporting Tom Watson for President in 1908.
—Missouri World.

A Significant Vote.

Whatever may be said of W. R. Hearst’s individual sincerity and


integrity of purpose, the vote which almost landed him in the
governor’s chair of New York State—a position which is next to the
presidency—is the vote which is dissatisfied with corporation
conditions. It is a significant vote. And the strong anti-Hearst
sentiment among the “upper ten” all over the South is also
significant. Everywhere men are, consciously or unconsciously,
taking their positions along lines of economics.—Farmers’ Journal,
Abilene, Texas.

Where He Belongs.

Tom Watson is back in Georgia, where he belongs. He is too


warm, too impulsive, too frank and too honest for New York—cold,
calculating, deceitful, hateful New York.—Farmers’ Journal, Abilene,
Tex.
Letters From the People.

Half Jackal and Half Hyena.


Lucian L. Knight, Los Angeles, Cal.
I cannot silence without gratifying the impulse which prompts me to write
you at once after reading what you have to say about that superlative
scoundrel, Mann. As a friend who holds you in loyal and affectionate
admiration, I resent with all my heart the treatment which you have received
at the hands of one who is man in name only and who in nature is half jackal
and half hyena. I am aware how true it is that the most insignificant of
insects may vex even the noble lion. Mann may be a millionaire, but I
warrant his shillings are dirty. How miserably poor the wretch is when the
only assets he has in the world are the millions which lie in the bank!
I know how little things have annoyed me at times and I know how much I
have appreciated an impulsive word of sympathy, even from “the least of
these.”
I am sorry I could not get the extracts you were to send me in time for
Vol. I of the Reminiscences which will be ready for publication within the next
few weeks; but I hope to get them in time for Vol. II. The work will require
two volumes. I have met with so much encouragement that I am warranted
in beginning at once upon Vol. II. Disappointed in other cherished plans and
prospects, I am now putting my life’s ambition into this work which I trust to
be able to make of literary and historic value to Georgia.

An Anonymous Slander Rebuked.


W. H. Eddy, Los Angeles, Cal.
Can you explain what quality there is in human nature, that prompts some
specimens of the race to the doing of things which are admirably adapted to
the hindrance of all that they profess to be in earnest support of? Now, for
instance: in The Appeal to Reason of October 27th, is an unsigned article
headed, “Populism and the Pap.” Being unsigned would indicate that it came
from the pen of J. A. Wayland, or that of F. D. Warren, the present Managing
Editor. Now, both of these men have been earnest workers for several years,
or at least for what they conceive to be the cause of socialism. It would
seem as though, when two men are associated in a work which each of them
has advocated and written signed articles in favor of, lauding it as a
movement opposed to dishonesty, deceit and all the baser tendencies of the
depraved mind, that when he takes a notion to stultify himself, descend to
the lowest practices of the gutter blackguard, he would have consideration
enough for his associate to sign his name, so that decent people who may be
so unfortunate as to peruse it, and such of the patrons of the paper as have
both decency and brains enough to resent it, might not blame the innocent
party inadvertently.
If socialism is anything, or is to be anything in human history which is to
make for the betterment of humanity, then it must rest upon the
fundamental principles of honesty, justice and truth. All those who would not
be cursed by its adoption to-morrow, by reason of their lack of development
and their consequent lack of capacity to appreciate its meaning and the
obligations inherent in it, assert that brotherly love, the golden rule, and the
Sermon on the Mount, are also corner-stones in the foundation of its most
noble structure.
But let us leave the latter out of consideration for the present. It is not
conceivable that there can be dug up in the office of the Appeal one
individual so low in the scale of human development as not to concede
honor, truth and justice as being the beginning, the very A B C of socialism.
What must we think, then, of the estimate that the one who penned the
abominable article referred to, places upon the intelligence of his readers, to
say nothing of the hundreds who have drawn from the little stock of their
earnings which are really needed for the comfort of themselves and family, to
assist this poor degenerate in distributing his venom and flaunting his idiocy
in the faces of a nation of intelligent people, to the disheartenment of
thousands of advocates of socialism, and the great glee of those who aver
that the animus of socialism and of all socialists is such as “Breathes the hot
breath of brutal hate, and riots as it runs” through the two columns of the
“Appeal?” To those who have not been unfortunate enough to read it, suffice
it to say that it is a wholly uncalled for, unsocialistic, and, from every point of
view, rascally, assault upon Hon. Thomas E. Watson, and to make it, if
possible, more pusillanimous, it is given publicity just at a time when, on
account of Mr. Watson’s having been grievously misused by his former
business associates in the Watson’s Magazine enterprise, Mr. Watson is
deserving of the sympathy of every person who possesses a spark of
decency. It is cowardly in the extreme to strike a person when he is down or
crippled.
Mr. Watson’s life history is now an open book to anyone, not an absolute
ignoramus, in this broad country. He has his prejudices. He has his own
political ideas, which, at the test of the ballot box, have been shown to be
largely in the minority, but, to his honor be it said, the fact that they were
not the winning card, has never caused him for one moment to falter in
devoting time, money and energy in their advocacy, a fact which of itself
would give the lie to the baseless, senseless and hypocritical charge of
treachery and double dealing, which, by their own statements, finds 250,000
duplications in this issue of the Appeal.
It is doing too much honor to quote from it, but the readers of progress
will excuse the presentation of some short samples. “Tom appears to have a
grudge against whatever tends toward progress.” Think of that in reference
to the father of the rural free delivery postal system which carries tens of
thousands of copies of this very diatribe of lies to the farmers of the country,
in whose interest Watson succeeded in having this system established, as the
Congressional Record will bear witness. Again, “But he has been repudiated
by the respectable democratic press of his state—as witness the merciless
exposure of his methods by the Atlanta Constitution and the Macon
Telegraph!” Respectable! The Atlanta Constitution and the Macon Telegraph!
Socialists of intelligence, what have you to say of the creature so lost to
decency as to, in the columns of the leading socialist weekly paper of
America, if not of the world, so far as circulation goes, laud the Atlanta
Constitution and the Macon Telegraph, notoriously the most mercenary and
most thoroughly corporation-serving papers of the entire South, and for no
other reason—for he can plead no other—than because they are fighting
Tom Watson, who happens to be under the ban of his displeasure?
And why are the ultra corporation journals fighting Tom Watson? Because
honest Tom Watson is sacrificing his private interests in a determined effort
to defeat the machinations of the Walter Parkers, the Herrins and the Abe
Ruefs of his beloved state.
Again: “That Watson received the price for his perfidy is not for a moment
to be doubted.” Whoever penned those lines either knew that he was
penning a most villainous lie or he is too ignorant to be worthy of the
contempt of a chimpanzee. There isn’t a person with intelligence enough to
write connectedly on truth, or any part of the scurrilous rot this creature did,
but knows perfectly well that if Tom Watson had been corruptible, he could
have received ten times more to have sold himself to the very forces this
creature is supposed to be fighting, than it has ever been claimed he did get.
That is just as true of Tom Watson as it is of ’Gene V. Debs. Everyone,
including the writer of that malicious screed, knows that they both could be
rolling luxuriously in wealth if they had but followed the course of these very
papers which he is pleased to declare “respectable.”
But the last quotation is manly as compared with this one: “It is said that
he has been up for sale before, and was knocked down to the highest
bidder,” etc., “It is said!” The language of the conscienceless gossip, the
method of the footpad, with the sand-bag, or the gas-pipe who strikes you
out of the dark. Again: “John M. Barnes, a man for whose veracity many
stand ready to vouch, etc.” Very good, Mr. No-name. Mr. Barnes is good
enough authority to use in an effort to injure your brother man, Mr. Thos.
Watson. Would you accept Mr. John M. Barnes’ statement as to the offices
you, your associates and the Appeal to Reason are performing in America,
and would you abide by and endorse them in your own case? Never! And
that very fact impeaches your honesty in quoting, as against Tom Watson,
the slanders of corporation hirelings and political hacks, whom you know are
fighting him for what there is in it. No! no! When you go straining a point,
you always prove too much.
Be something like a man, and bid him God speed in his task of awakening
the people to their dangers, even if he does leave them short of being full-
fledged socialists.
Tom Watson’s opinions are not in all respects mine. In fact, there are many
points on which we do not agree. But if I have outgrown the tenets of
Populism, and he has not, or if he sees so many falling away from the mere
party organization of populism as to be heart-sore and discouraged, and
chooses to advocate the same principles under the name of Jeffersonian
Democracy, he is still entitled to the respect of friend and foe alike, until he
sacrifices principle to greed or puerile hatred.

Wants to Follow Watson’s Pen.


L. A. Benson, Clay Center, Kan.
I write to inquire as to the truth or falsity of the rumor that you have
severed your connection with the Magazine which bears your name. I have
been a voting Prohibitionist since 1885. I bought the first issue of “Tom
Watson’s” and read it, and have hungered for its appearance ever since. I
have read every line of Editorial and other matter which came from your pen.
I was beginning to think myself so much of a Populist that I could “keep
step.” From 1894 until 1901 I lived in Philadelphia, Pa., and judging the
Populists from the caricatures appearing in Eastern papers I felt surprised to
find that I have all along possessed just such views as constitute the essence
of Populism. I find myself unwilling to give up the opportunity to follow your
pen. I will regard it as a matter of genuine kindness to me if you will put me
in connection with the Magazine which takes your copy and spreads it among
your many disciples and admirers. I regard the work which you are doing as
fundamental, and I am aware that you, like all leaders in reforms which
touch the money-king, will suffer. If it be in the power of “Old Plute” to
crucify you, he will not be too tender. He will not be lacking in heartless
cruelty. But while you are bidding high for the hate and vengeance of “Old
Plute” you are winning the glorious title of “friend” and “brother” to those
who are crushed ’neath the heel of this heartless, greedy foe. To oppose him
and to stand the loving helper of men, is to trace the footsteps of the Man of
Galilee, up a modern Calvary. In plain language, it is the essence of pure and
undefiled religion. Here’s my hand, brother, and may God bless and prosper
you.

Hedging on Human Life.


N. B. McDowell, Ronceverte, W. Va.
In reply to your question in Watson’s Magazine of August: “Is it true that
railroad corporations insure the lives of the railroad mail clerks?” I cannot
speak for the railroad corporations, but it was developed in court here that
the St. Lawrence Broom & Manufacturing Co., the largest corporation in this
section, has the lives of its employes insured for its benefit. This company
employs a large number of men and boys and has never made any
provisions for their protection against the inclemency of the weather, or the
many dangers of machinery that might be averted.
I was an employe of this company for fifteen years and have seen a
number of men and boys mangled and maimed for life, but it was not known
until quite recently that the corporation received insurance for every employe
that got crippled. A boy got his hand cut off and sued the company for
damages and it was clearly proven that the employes were insured for the
benefit of the company.

Must Have the “Jeffersonian.”


D. H. Chamberlain, Harriston, Miss.
A few days ago I saw in the Memphis Commercial-Appeal that you had
severed your connection with Watson’s Magazine. I am a subscriber and
have taken the Magazine solely on account of your Editorials, which I regard
as the finest and most forceful I ever read in any Magazine. My subscription
is about run out and if you are no longer connected with the publication I do
not care to renew. This is my reason for making this inquiry, and I will be
glad to hear the report is untrue. If it is correct let me know if it will be your
purpose to edit a similar periodical. In that event you can count on me as a
subscriber, even if the subscription price should be increased to $5.00 per
annum.
I think you are doing a great and necessary work in your attempts to
arouse the people to the dangers that now menace the liberties of this
unhappy land, “to hastening ills a prey.”
There is one point on which I am constrained to criticise you and that is
your ill-advised attacks on W. J. Bryan, which is something I am utterly
unable to understand.
Why do you do this when you both stand for the same things? It seems to
me unfortunate, to say the least, that soldiers of the hosts of Reform should
turn their artillery upon each other when so much ammunition is needed to
fight the cohorts of Plutocracy, and in this connection nothing will ever be
accomplished in the way of bringing this Government again into the
possession of the people if any such suicidal policy is pursued. The reformers
must get together if this Republic is to be preserved, if it is not even now too
late to save it. Of this I am certain: we have no time left us for internal
dissensions, and I hope that so splendid a soldier of the common good as
yourself, will, in the future, refrain from stirring up discord in the ranks of
Reform, and reserve your ammunition entirely for our enemies.

As to Gins.
R. W. Barkley, New York City. November 12, 1906.
I note that you are proposed as President of the Cotton Association. I have
read your Magazine from the first number until Mann got it, and I know your
desire to benefit the South. I control the patent rights on a cotton gin which
works on a new principle and which leaves the cotton in natural lengths,
thereby enhancing the price to the planter by one to five cents per pound.
The gin can be run by hand, or by power, and a few farmers can own one in
common and thereby earn money by ginning their own cotton. The gin
consists of “mechanism for gradually opening and loosening the cotton fibres
while still attached to the seeds, with means for thereafter removing the
seeds.” Just take a little cotton and gradually pull the fibres apart, without,
however, separating them from the seed, until you have a large puff ball and
then see how easily they come off at the seed. Well, that is what this
machine does. No “gin cut” cotton in it. Seed practically unhurt, also. Am
looking for money wherewith to build a large machine, (the inventor made
the working model by hand himself); it does the work fairly well, but it is
getting to be ram-shackle for demonstration purposes, and then for capital
wherewith to work the gin commercially. Such a gin ought to interest you
and also the Cotton Association.
Editor’s Note.—Having just been run through one new and improved gin—
known as Town Topics—and having been badly “gin cut” myself, have but
slight inclination for new inventions of the gin variety.

Getting Used to It.


S. R. Sikes, Ocilla, Georgia.
I have your card of November 10th, advising me of your withdrawal from
Watson’s Magazine, and of your intention of publishing in the near future
Watson’s Jeffersonian. I desire to express my sympathy for you in your recent
trouble with the New York publication, and to assure you of my friendship
and best wishes for you in your new enterprise, “The Jeffersonian.”
I feel sure that you have been treated very unfairly by those New York
people, and I feel a spirit of resentment for you, and I am to-day writing
them to discontinue mailing Watson’s Magazine to me, and to erase my
name from their list of subscribers. (Copy of letter enclosed.)
I would feel worse for you over this transaction than I do if it were not for
the fact you have been unfairly treated and falsely accused so many times
during the last ten or fifteen years, until I suppose you have to some extent
become toughened so that you can stand such treatment better than the
average man, and I see very plainly now, and have seen for quite a while
past, that the current of public sentiment is rapidly drifting your way. I desire
to offer you all the encouragement I possibly can in the noble work you are
doing—educating the common people of the country on the public issues
that are now facing the American people, and in this connection I will state
to you that I have been with you, so far as my ability extends, in this battle,
and on some occasions have been severely criticised for taking your part and
standing by the principles of original democracy in the days when the
Democratic Party was seeking to destroy the principles upon which our
government was founded. Of course I will subscribe for the Jeffersonian. I
want the first copy that is printed and each succeeding issue. Mail me a few
sample copies, and I think I can induce some others to subscribe.

(Copy.)
November 14, 1906.
Editor Watson’s Magazine, New York.
Dear Sir:—After reading and carefully considering the recent differences
between you and the Honorable Thos. E. Watson, I wish to say to you that I
think Mr. Watson has been treated very unfairly. I am a great admirer of Mr.
Watson and his writings, and this led me to subscribe to the Magazine in its
beginning. I have been highly pleased with it, and especially so with Mr.
Watson’s editorials, but as he has been forced to sever his connection with
the Magazine, and as his writings were the principal things which induced me
to subscribe to the Magazine, I write to request that you erase my name
from your list of subscribers. If I remember correctly, my subscription is paid
up to March 1st, 1907, but under the circumstances I do not wish another
copy mailed to my address.
Very respectfully,
S. R. Sikes.

Watson Was Its Strength.


V. L. Anthony, Jr., Hurtsboro, Ala.
I subscribed for Watson’s Magazine on account of your connection with it.
Now, as you are no longer with it, I wish your new Magazine when you start
it.

“The Gang” Insults the Readers.


D. J. Henderson, Sr., Ocilla, Ga.
When the stockholders of the Watson’s Magazine attempted to restrict you
as Editor and Manager, causing you to sever your connection with it, they
struck, what I call, a death blow to the Magazine. All of its readers who
believe in pure Jeffersonian Democracy felt the insult as keenly as you. I
enclose you copy of a letter I sent last week to DeFrance, ordering mine
discontinued. I am a subscriber to the Weekly Jeffersonian and will be to the
Magazine you contemplate starting in Atlanta as soon as the first issue is out.
The editor of the Ocilla Star, whom I asked you some time back to
exchange your Magazine with, has, since that time, “passed over the River to
rest in the shade.” The paper will be continued by his two young sons, who, I
know, if not doing so, will be pleased to exchange with The Jeffersonian.
May the blessings of Heaven be upon you and yours.

(Copy.)
Ocilla, Ga., Oct. 24, 1906.
Mr. C. Q. DeFrance, New York City.
Dear Sir:—Please strike my name from the list of subscribers to the Watson
Magazine. I learn the stockholders endeavored to place restrictions on Mr.
Watson as Editor and Manager, and he, for that reason, severed his
connection with it. Thank God for that. I am glad to know he had so much
manhood about him. Tom Watson is one among the greatest statesmen the
United States has. It is a source of satisfaction to know that he will neither
speak nor write with a corporation muzzle on. When the stockholders
attempted to restrict Mr. Watson in his Editorials for the Magazine, they didn’t
only insult him, but they insulted every reader of it who believes in the pure
Jeffersonian principles which Mr. Watson so ably advocates and defends. I
would be proud of Tom Watson were he from any other section of the Union.
He being a Southern man and a Georgian at that, I am exceedingly proud of
him. I fear somebody has been taken upon the Mount and shown the
glorious things the railroads will do if they will only fall down and worship
them. If no Watson is with the Magazine then no Magazine for me.
Respectfully,
D. J. Henderson, Sr.

“It Would Be a Noble Charity.”


Chas. D. Hunt, Gueydan, La.
Reading with interest your valuable editorials in the October number and
the most striking and interesting subject, “It Would be a Noble Charity”—
here you have treated a subject in a light that any person could not help
from shielding with an honest heart, with a strong desire in mind to spread
the cause of charity, but you have almost been selfish with your subject.
What of the territory bordering along the Gulf of Mexico? That is, the
extreme portion.
Here we have settlers of almost ancient times. They are not altogether
uncivilized, but are not able to meet the demands of our educated
requirements. Hence are we to still keep them back or are we to give them a
helping hand? These people know nothing of education and its help in life,
but toil with an earnest heart to maintain merely a scant living and to bring
the younger class up in their own path.
I think if the Humane Society would stop and think deeply in regard to the
young boys and girls that spend their school days in hard labor out of school
there would be something done to protect them and give them a chance for
a better future than is now before them.
It would be surprising to anyone who has had the advantages of education
and really felt its real value in life to stroll along the prairies and see just how
many bright young boys and girls are out of touch with the educated world.
Why? Their parents are not able to aid them to secure an education, but are
more than willing.
Do you not think much could be done in both mountain and prairie
territories?

Wants Only the Real Thing.


Burton H. Jeffers, Rose, N. Y.
Having read in the Missouri World that you had ceased writing for Watson’s
Magazine, I was greatly surprised. I had supposed that you owned the
Magazine, and had taken great pleasure in securing subscriptions for it in this
vicinity. Some of those subscriptions have just expired, and the subscribers
say they don’t want it again if you are not going to write for it. That would
seem to sound the death knell of the now so-called Watson’s Magazine.
When you start your new magazine send me a sample copy, and I will
endeavor to give your subscription list a little boost. I would also like a few
sample copies of your weekly paper.

You Shall Hear Again.


J. J. Hunt, El Paso, Tex.
Looking forward and hoping to get the November number of your
Magazine, I hear, verbally, that you have severed your connection with that
periodical, and I don’t care now whether it ever appears again, for what I
read in it was what you wrote, little else. I know you quit your association
there for good cause and that your work will not end, and believe I shall
hear from you again. I dare say two thirds of the readers of Watson’s
Magazine, like myself, will care little for it now.
While I’m traveling, please know that I’m still a citizen of good old Georgia,
living at 257 South Pryor Street, Atlanta.

Samples of the Replies DeFrance May Expect to Get to Begging Letters.


(Copy.)
Boaz, Ala., Nov. 16, 1906.
C. Q. DeFrance.
Dear Sir:—In this paragraph you say too much to get my help; you say
that Mr. Watson’s backers cared nothing for Mr. Watson’s “ideas.” But the
money which you hoped to get out of those who do care for Mr. Watson and
his “ideas” was the object in view. Had it not been for Mr. Watson and the
principles he advocated I would not have been a subscriber. This and the fact
that Mr. Watson was forced to resign because of non-payment of his salary
and what you say in the fourth paragraph forever settles it with me. When
Mr. Watson betrays the people’s cause or trust for any cause I am done, for I
would never confide in another man as I have in him, but he will never
forsake the people.
Yours truly,
T. B. Mosley.

(Copy.)
Mr. C. Q. DeFrance, Business Manager, New York City, N. Y.
Dear Sir:—I received the November number of the Watson’s Magazine a
few days ago, and your circular letter and subscription blanks today, and in
reply would say that I am one of those who much prefer the play of Hamlet
with the Prince of Denmark left in.
Further comments are unnecessary. My subscription expires February,
1907. Please discontinue same with the November number received.
Yours very truly,
A. A. DeLong.

(Copy.)
Watson’s Magazine, 121 W. 42nd Street, New York, N. Y.
Gentlemen:—You will please discontinue my subscription to Watson’s
Magazine.
I subscribed to this periodical in order to read Mr. Watson’s editorials; and,
inasmuch as he is no longer identified with this publication, it is useless to
send it to me any longer.
Very truly,
Burgess Smith.

Wants the Jeffersonian.


Chas. E. Harris, Alton, Ill.
I was very much surprised on buying Watson’s Magazine for November to
find that you were no longer connected with it as editor. Of course I will have
no further use for the Magazine as I only bought it in the first place for your
writings.
I saw in one of the St. Louis papers that you intended to start another
Magazine and call it Watson’s Jeffersonian. Please advise me if this is true and
when it will be ready for publication, as I want you to put me down on your
subscription list.
I hope to cast my first vote for you and Bryan in 1908.

No Sham for Him.


Benjamin H. Hill, West Point, Ga.
Some time ago I subscribed for your splendid Magazine solely and only to
get to read your articles therein and I notice to-day’s number, with one
exception, contains nothing from your able pen. Without your articles I
would not give ten cents a year for it, in fact don’t want it at any price. I
desire to read after you, but don’t want this other trash.
I regret your trouble and hope it will yet prove for your benefit and help.

The Jeffersonian is the Answer.


Chas. Buttlar, Oakland, Cal.
I have been a regular purchaser of your Magazine from its beginning, and
it is with the deepest regret that I learn that you have withdrawn from the
Magazine. I presume that the enemies of truth have destroyed its publication
as they have done heretofore with others.
Will you kindly inform me, also, whether you will start a new Magazine or
paper by which we may enjoy the education that you have given us? I wish
you success and strength to overcome all opposition.

Hamlet Without the Dane.


H. G. Sumner, Passaic, N. J.
Watson’s Magazine without Watson is of course no longer Watson’s
Magazine. I haven’t seen the November number, but it must be like “Hamlet”
with Hamlet left out.
I am proud of having been one of the faithful in 1904, when I heard you
speak at Jersey City and again in that matchless “old fashioned stump
speech” at the Grand Central Palace in New York, where I managed to jam
myself through the crowd on the platform and get hold of your hand for a
second.
The monthly visits of your Magazine were like those of a dear friend
dropping in for an evening to discuss matters which should be of the gravest
concern to every true American. I have the numbers all bound in volumes,
but now my set is complete much sooner than I had anticipated.
A few minutes ago I took up a copy of The Public, in which I saw a
paragraph to the effect that you would soon start a new Magazine. I hope
this may be true, and I want to be one of the first subscribers, for I am
anxious for the continuation of the “Life of Jackson” and for more of your
Editorials.

From a Constant Reader.


Jas. E. Dillon, Otwell, Ind.
Too late I learned the sad story of Watson’s Magazine. I have been a
subscriber to it from the first number, and I did not want to miss a number.
I sent a long list of names to it a few days before I found it out, for sample
copies. But it has lost its attraction to me and I hope such men as DeFrance
and Mann will soon be relegated.
I have enclosed a few names that might subscribe to the new Jeffersonian
Magazine.
I hope you will have success in spreading the truth and nothing but the
truth.

Wants a “Watson’s” With a Watson in It


H. C. Britt, Sparta, Ga.
I have been informed that you would send to the present subscribers to
Watson’s Magazine, if they so desired, your new publication, free of cost, for
the time for which they had paid their subscriptions to the Magazine. I am a
subscriber to the Magazine, and have been from the date of its very first
issue, and my subscription is paid to the corresponding date in 1907.
I took the Magazine because of your connection with it. I would appreciate
the opportunity of getting acquainted with your new Magazine.
Returning Thanks to My Friends

Shouldering the responsibilities and the financial burden of a new


magazine, is a serious matter. As nearly everybody will understand, it
involves tens of thousands of dollars in the way of necessary expense, and
whether you will ever see that money again depends entirely upon
circumstances over which you yourself have nothing like absolute control.
There are already so many brilliant and beautiful magazines circulating
throughout the Union, that establishing another is a venture that borders
upon temerity.
But in my case there was no alternative. It had to be done. Flesh and
blood could not bear the infamous treatment which was being handed out to
me by that fat rascal, Col. W. D. Mann, and that lean sneak, C. Q. DeFrance.
Out of consideration for the subscribers, as well as in justice to myself, it was
absolutely necessary that I should establish a magazine of my own, which
should extend to the subscribers of the New York Magazine the privilege of
securing the remainder of their terms from a magazine which was, in fact,
what the name of the New York Magazine had led subscribers to believe it to
be.
To have Mann and DeFrance publishing, in New York, a Watson’s
Magazine, and securing money from thousands of innocent people, who
would subscribe upon the faith of my name, and would then be told
falsehoods as to why I was no longer writing for it, would have been an
intolerable situation.
To remain silent and acquiescent under those circumstances, would have
been to make myself a party to the fraud. I understand that Col. Mann and
C. Q. DeFrance are using, for themselves, the money sent to the Watson’s
Magazine by those who are not aware of the fact that there is no Watson
connected with that Magazine. If they do not extend to the subscriber the
option of getting his money back, or of having it sent to the genuine
Watson’s Magazine, they will be cheats and swindlers; and they ought to be
made to plead on the criminal side of the Court, where the appearance of
Col. Mann would not be considered extraordinary.
Not wanting to be a party to a fraud by making no effort to defeat it, and
not having a disposition to lie down quietly while those two rascals trampled
upon me, I announced the purpose of establishing Watson’s Jeffersonian
Magazine. Of course it was hoped that my friends would stand by me. It was
hoped that those subscribers who had gone to the Magazine in New York
would follow my Magazine in Atlanta.
Did the subscribers of the New York Magazine want a real Watsonian
Magazine, or was it just any old magazine that they were after? Were those
subscribers men and women who had faith in me, and who were attached to
myself, my work and my message? Would they have sufficient interest in the
matter to sympathize with me, and follow me? These were the questions.
They could not be answered until the opportunity was offered for the
subscribers themselves to act.
With grateful heart, I hereby return profound thanks to those steadfast
and earnest comrades who have already enrolled themselves with Watson’s
Jeffersonian Magazine.
These friends did not wait for a sample copy; they did not wait for the day
of publication. They had faith. They knew in advance what my Magazine
would be. They had confidence. They knew perfectly well that their money
would be safe in my hands. Therefore, from California to North Carolina and
from Florida to Michigan, they have poured in upon me their letters of
sympathy and encouragement. And together with these letters they sent
remittances to cover their subscriptions in advance.
* * *
From so great a number it is difficult, and perhaps not quite fair, to single
out individuals, but as it is my intention to carry in the Magazine from month
to month, a Department in which those who are most active in their support
of the Magazine will be mentioned by name, a few will be mentioned now.
Others will be mentioned later.
I want every one of my friends who have written me, to feel that their
encouragement and support is profoundly appreciated.
You naturally inquire, who was the very first subscriber to Watson’s
Jeffersonian Magazine?
Dr. Cicero Gibson, Thomson, Ga.
Following closely after Dr. Gibson, came a good many others who arrived
so nearly together that it would perhaps be unjust to some to say which was
literally the first-comer, yet I cannot refrain from selecting a few for special
mention.
There was my gallant and loyal friend of LaGrange, Ga., Dr. Frank Ridley,
whose letter I am going to print in full:
“Please know that I am in thorough sympathy with you in the matter of
the Magazine contention, and all other matters.
“I have written to the New York office of the Watson’s Magazine,
instructing them to discontinue sending me their paper. I am enclosing check
for $1.50 for Watson’s Jeffersonian Magazine. With sentiments of my continued
warm regard and friendship.
“P. S. Please see that my subscription begins with the first issue.”
* * *
Then there was the very warm-hearted H. J. Mullins, Franklin, Tenn.; there
was J. J. Gordy of Richland, Ga., who had rendered such faithful service for
the New York Magazine, and who transferred his zeal and influence
immediately to the Jeffersonian; there was whole-souled Frank Burkett, of
Okolona, Miss.; and there was that gray-haired but warm-hearted veteran,
Thos. H. Tibbles, of Nebraska.
From the Empire State, Texas, came the cheering response of sturdy
Milton Park. From Salem, Va., W. H. Tinsley spoke words of encouragement.
And my good old friend, Allison W. Smith, of North Georgia, went to work as
earnestly and as promptly for the new Magazine as he had done for the old.
And how can I fail to mention Paul Dixon of Chilicothe, Mo? A truer man
does not live. Of the many who have stood by me at this juncture and shown
a willingness to co-operate, none has been more emphatic than Dixon &
Lankford, who enjoy the distinction of publishing one of the three Mid-Road
Populist papers which stood the storm, and did not go down in consequence
of the awful mistake and of Fusion.
From Los Angeles, Cal., came a heart-warming letter from Lucian L.
Knight, which you will find elsewhere in this number of the Magazine. From
Athens, Ga., came a most welcome letter from A. D. Cheney; and his bright
boy, Jean Cheney, took up the work of canvassing his community, with
results so extremely helpful to me that I mention his name in grateful
recognition of his service.
From A. G. Thurman Zabel, of Petersburg, Mich., comes the following:
“I saw a notice in the Missouri World, that you were about to publish a
new Magazine. Enclosed find remittance for which send me your Magazine as
long as that pays for, and then let me know, and I will remit for it for a
longer term. I am glad to learn that you will continue your good work.”
From Kentucky comes a cordial word from that veteran editor and gifted
gentleman, Hon. Henry Watterson, who, on the eve of his departure for
Europe, drops a line to the Atlanta management of the Magazine to say:
“Mr. Watson has few greater admirers or better friends than I am and
whenever the Courier-Journal can do anything to advance his personal
interests, it is always at his hands.”
Judge John J. Hunt, one of those level-headed Democratic leaders who did
his best to prevent the awful mistake that was made by the Men in Control,
in 1896, was swift with his assurances of hearty co-operation and support.
And my old college friend, Alex. Keese, of Atlanta, was not behind anybody
in the warmth and vigor of his protestations against the wrong which had
been done me by those knaves in New York.
Nor should I forget stanch W. S. Morgan, of Hardy, Ark., nor J. M. Mallett
of Cleburne, Tex., both of whom were emphatic in their denunciation of the
New York outrage:—Nor yet sturdy Jo. A. Parker.
From far-off Seattle, State of Washington, the voice of The Patriarch, was
heard in scathing condemnation of what had been done by Col. Mann and
DeFrance; and from New Jersey, Dr. Geo. H. Cromie was equally emphatic.
My good friend, C. E. Parker, of Bainbridge, not only enlisted under my
banner, turning his back upon those New York knaves, but he remitted the
largest individual check that was sent—$13.80—and the largest number of
subscribers received in any one remittance.
From the Hawaiian Islands, came a cordial hand-shake from that veteran
of the Reform Wars, John M. Horner. From Paris, France, spoke the
sympathetic voice of John Adams Thayer—the brainy, nervy man who achieved
such a wonderful success for Everybody’s Magazine.
Nor must I omit from the Roll of Honor the name of Prof. M. W. Parks,
President of the Georgia Normal and Industrial School, and President also of
the Georgia Educational Association. His letter was a noble tribute which I
highly value.
Taylor J. Shields, of Vineland, Ala., has my sincere thanks for his generous
words.
Bay City, Michigan, is the home of an ardent, personally unknown friend
whose hand I hope some day to shake—his name is Francis F. McGinniss.
And I must find room to mention my untiring friend, Col. W. A. Huff, of
Macon; R. E. Thompson, of Toomsuber, Miss.; J. S. Ward, Jr., of Thomasville;
Ben Hill, of West Point, Ga., and Clarence Cunningham, of Waterloo, S. C.;
Rev. R. L. Benson, Clay Center, Kan.; H. G. Sumner, Passaic, N. J.; Chas.
Butler, Oakland, Cal.; Theron Fisk, Sioux Falls, S. D.; and Prof. Z. I.
Fitzpatrick, of Madison, Ga.
And then there is W. F. Smith, of Flovilla, Ga., who has never flickered in
his loyal comradeship any more than has that noble old Roman, Gen. William
Phillips, of Marietta.
Here is a specimen of the way they are writing to me and below it a
sample of how they are writing to the bogus Watson’s Magazine:
Dixie, Ga., Nov. 23, 1906.
Hon. T. E. Watson, Thomson, Ga.
Dear Sir:—I am writing the New York office today cancelling my
subscription to the counterfeit Watson’s Magazine. Old man Mann, and
DeFrance are a set of fools, if for one moment they entertained the thought
that they could detain the followers of the real Tom Watson on their
subscription list. The thing can’t be done. When the readers of the New York
Watson’s Magazine find out the truth about the manner in which they treated
the genuine Tom Watson, you will see them leaving like rats leaving a sinking
ship. The idea of Watson’s Magazine with Watson left out! Might as well try
to run a locomotive steam engine without steam. Tom Watson was the steam
—the electricity—the spirit—yes, the very life of the Magazine, and without
him its name is Dennis. It is a burning shame, the way they have treated
you.
I am sending my check for $1.50 for your Jeffersonian Magazine, and wish
for it the success that you so justly deserve. I hope to be able to get others
interested in the new publication. You have thousands and thousands of true
and tried friends in old Georgia, and in fact, in every State in the United
States, and the numbers are growing all the time, and every effort of the
enemies of truth to put you in the background only brings you more
prominently before the masses as the friend of Good Government.
May God abundantly bless and prosper you and yours, is the sincere
prayer of your friend and brother,
(Signed) G. B. Crane.

(Copy.)
Dixie, Ga., Nov. 23, 1906.
Watson’s Magazine, 2 West 40th St., New York City.
Gentlemen:—I hereby cancel my subscription to Watson’s Magazine, and
ask you to refund balance that you are due me on same. I do not care to
read your slanderous vaporing about Tom Watson. You will soon find out
that, bad as you try to make him out to be, he was really the Magazine, and
without him it will sink in the cesspool of public contempt—as it should do.
Yours very truly,
(Signed) G. B. Crane.
Here are others, clean, clear-cut and business-like:
(Copy.)
Honaker, Va., Nov. 14, 1906.
Watson’s Magazine Co., New York.
Gentlemen:—The November number of Watson’s Magazine is at hand. As
Mr. Watson is no longer the Magazine, will you please discontinue my
subscription and return to me the three month’s unexpired subscription price,
and oblige,
Yours truly,
J. L. Kibler.
Dearing, Ga., Nov. 14, 1906.
Hon. Thos. E. Watson.
I received a card from you yesterday concerning the Magazine. I noticed
your proposition to make good the subscription to the Watson Magazine. I
think mine will be out in June, but I got it at club rates and don’t want to be
a burden to you, but I don’t want a Watson’s without a Watson in it, so you
send me the Watson’s Jeffersonian Magazine, and I’ll see you and pay for six
months at least, as I have great confidence in you as a reform leader and
want to help what little I can.
Yours truly,
J. J. Pennington.
G. M. Stembridge, of Milledgeville, is good enough to say, in subscribing,
“you are doing more for the Reform cause than any other man in the United
States.” “If ever anybody wants to whip you,” writes friend M. S. Chiles, of
Macon, in remitting his subscription, “I will be pleased to push you aside and
say, ‘Whip me first.’” “I would not carry the New York publication from the
postoffice,” says W. W. Shamhart, of Newton, Ill. Dr. R. R. Smith, of Burtons,
Miss., doesn’t “like the jingle of the editorials” of the bogus Watson’s
Magazine for November and would like to see any response that I may make.
Verily, he shall see it. J. K. Sears, of McCoy, Oregon, wants a Magazine
“published at Atlanta by Tom Watson and not by Col. Mann at New York.”
“Please enter my name from now till doomsday,” writes Prof. J. H. Camp, of
Chicago, who at the same time cancels his subscription to what he calls “the
New York dummy.” Dr. J. D. Allen, of Milledgeville, enrolls himself and says,
“send me the first copy.” “I shall always be a subscriber,” writes W. W.
Bennett, Esq., of Baxley. “The reason I subscribed to the other Magazine,”
says W. W. Arendell, of Gause, Tex., “was that you were the editor,” so of
course he wants the genuine Watson’s Jeffersonian. B. L. Milling, of Neal, Ga.,
was a subscriber to Watson’s Magazine of New York from the first issue, “and
would continue to be, had it not been that ‘the gang’ tried to impose upon
you,” he writes. Likewise C. W. King, of Rome, Ga., “only subscribed to the
New York publication on account of your colors flying at the mast-head, so”—
he writes—“of course I wish to enter my name as a subscriber to your new
venture.” H. Gillabaugh, of Missoula, Montana, thinks the bogus Watson’s
Magazine as at present conducted, is “like a church with the devil as pastor.”