
Data Mining With Python Theory Application And Case Studies 1st Edition Di Wu pdf download

The document is about the book 'Data Mining with Python: Theory, Application, and Case Studies' by Di Wu, which focuses on practical data mining techniques using Python. It covers the data mining pipeline, including data collection, integration, analysis, and visualization, with tutorials and case studies for hands-on learning. The book is aimed at students, data scientists, and business analysts to help them apply data mining concepts effectively.


Data Mining with Python

Data is everywhere and it’s growing at an unprecedented rate. But making sense of all that data
is a challenge. Data Mining is the process of discovering patterns and knowledge from large data
sets, and Data Mining with Python focuses on the hands-on approach to learning Data Mining.
It showcases how to use Python Packages to fulfil the Data Mining pipeline, which is to collect,
integrate, manipulate, clean, process, organize, and analyze data for knowledge.

The contents are organized based on the Data Mining pipeline, so readers can naturally
progress step by step through the process. Topics, methods, and tools are explained in three aspects:
“What it is” as a theoretical background, “why we need it” as an application orientation, and
“how we do it” as a case study.

This book is designed to give students, data scientists, and business analysts an understanding of
Data Mining concepts in an applicable way. Through interactive tutorials that can be run,
modified, and used for a more comprehensive learning experience, this book will help its readers gain
practical skills to implement Data Mining techniques in their work.

Dr. Di Wu is an Assistant Professor in the Finance, Information Systems, and Economics department
of the Business School, Lehman College. He obtained a Ph.D. in Computer Science from the Graduate
Center, CUNY. Dr. Wu's research interests include temporal extensions to RDF and the semantic
web, applied data science, and experiential learning and pedagogy in business education.
Dr. Wu has developed and taught courses including Strategic Management, Databases, Business
Statistics, Management Decision Making, Programming Languages (C++, Java, and Python),
Data Structures and Algorithms, Data Mining, Big Data, and Machine Learning.
Chapman & Hall/CRC

The Python Series


About the Series

Python has been ranked as the most popular programming language, and it is widely used in
education and industry. This book series will offer a wide range of books on Python for students
and professionals. Titles in the series will help users learn the language at an introductory and
advanced level, and explore its many applications in data science, AI, and machine learning.
Series titles can also be supplemented with Jupyter notebooks.

Image Processing and Acquisition using Python, Second Edition
Ravishankar Chityala, Sridevi Pudipeddi

Python Packages
Tomas Beuzen and Tiffany-Anne Timbers

Statistics and Data Visualisation with Python
Jesús Rogel-Salazar

Introduction to Python for Humanists
William J.B. Mattingly

Python for Scientific Computation and Artificial Intelligence
Stephen Lynch

Learning Professional Python Volume 1: The Basics
Usharani Bhimavarapu and Jude D. Hemanth

Learning Professional Python Volume 2: Advanced
Usharani Bhimavarapu and Jude D. Hemanth

Learning Advanced Python from Open Source Projects
Rongpeng Li

Foundations of Data Science with Python
John Mark Shea

Data Mining with Python: Theory, Application, and Case Studies
Di Wu

For more information about this series please visit: https://www.crcpress.com/Chapman--Hall-CRC/book-series/PYTH
Data Mining with Python
Theory, Application, and Case Studies

Di Wu
First edition published 2024
by CRC Press
2385 Executive Center Drive, Suite 320, Boca Raton, FL 33431

and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2024 Di Wu

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have
attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders
if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please
write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including
photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the
Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not
available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for
identification and explanation without intent to infringe.

ISBN: 978-1-032-61264-5 (hbk)
ISBN: 978-1-032-59890-1 (pbk)
ISBN: 978-1-003-46278-1 (ebk)

DOI: 10.1201/9781003462781

Typeset in Latin Modern font
by KnowledgeWorks Global Ltd.

Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.

Access the Support Material: https://www.routledge.com/9781032598901


Dedication
Students, Staff, and Colleagues
at University of Colorado Boulder and Lehman College
Contents

List of Figures xi

Foreword xix

Preface xxi

Author Bios xxiii

Section I Data Wrangling

Chapter 1 ■ Data Collection 3


1.1 COLLECT DATA FROM FILES 4
1.1.1 Tutorial – Collect Data from Files 5
1.1.2 Documentation 13
1.2 COLLECT DATA FROM THE WEB 14
1.2.1 Tutorial – Collect Data from Web 15
1.2.2 Case Study – Collect Weather Data from Web 20
1.3 COLLECT DATA FROM SQL DATABASES 23
1.3.1 Tutorial – Collect Data from SQLite 24
1.3.2 Case Study – Collect Shopping Data from SQLite 28
1.4 COLLECT DATA THROUGH APIS 31
1.4.1 Tutorial – Collect Data from Yahoo 32

Chapter 2 ■ Data Integration 37


2.1 DATA INTEGRATION 37
2.1.1 Tutorial – Data Integration 38
2.1.2 Case Study – Data Science Salary 44


Chapter 3 ■ Data Statistics 53


3.1 DESCRIPTIVE DATA ANALYSIS 53
3.1.1 Tutorial – Statistical Understanding 54
3.1.2 Case Study – Statistical Understanding of YouTube and Spotify 59

Chapter 4 ■ Data Visualization 66


4.1 DATA VISUALIZATION WITH PANDAS 67
4.1.1 Tutorial – Data Visualization with Pandas 67
4.2 DATA VISUALIZATION WITH MATPLOTLIB 76
4.2.1 Tutorial – Data Visualization with Matplotlib 77
4.3 DATA VISUALIZATION WITH SEABORN 106
4.3.1 Tutorial – Data Visualization with Seaborn 106

Chapter 5 ■ Data Preprocessing 131


5.1 DEALING WITH MISSING VALUES 131
5.1.1 Tutorial – Handling Missing Values 132
5.2 DEALING WITH OUTLIERS 139
5.2.1 Tutorial – Detect Outliers Using IQR 139
5.2.2 Tutorial – Detect Outliers Using Statistics 144
5.3 DATA REDUCTION 146
5.3.1 Tutorial – Dimension Elimination 146
5.3.2 Tutorial – Sampling 148
5.4 DATA DISCRETIZATION AND SCALING 150
5.4.1 Tutorial – Data Discretization 151
5.4.2 Tutorial – Data Scaling 154
5.5 DATA WAREHOUSE 157
5.5.1 Tutorial – Data Cube 158
5.5.2 Tutorial – Pivot Table 162

Section II Data Analysis

Chapter 6 ■ Classification 171


6.1 NEAREST NEIGHBOR CLASSIFIERS 172
6.1.1 Tutorial – Iris Binary Classification Using KNN 172
6.1.2 Tutorial – Iris Multiclass Classification Using KNN 177
6.1.3 Tutorial – Iris Binary Classification Using RNN 182
6.1.4 Tutorial – Iris Multiclass Classification Using RNN 188
6.1.5 Case Study – Breast Cancer Classification Using Nearest Neighbor Classifiers 193
6.2 DECISION TREE CLASSIFIERS 196
6.2.1 Tutorial – Iris Binary Classification Using Decision Tree 197
6.2.2 Tutorial – Iris Multiclass Classification Using Decision Tree 204
6.2.3 Case Study – Breast Cancer Classification Using Decision Tree 212
6.3 SUPPORT VECTOR MACHINE CLASSIFIERS 215
6.3.1 Tutorial – Iris Binary Classification Using SVM 215
6.3.2 Tutorial – Iris Multiclass Classification Using SVM 218
6.3.3 Case Study – Breast Cancer Classification Using SVM 220
6.4 NAIVE BAYES CLASSIFIERS 222
6.4.1 Tutorial – Iris Binary Classification Using Naive Bayes 222
6.4.2 Tutorial – Iris Multiclass Classification Using Naive Bayes 225
6.4.3 Case Study – Breast Cancer Classification Using Naive Bayes 227
6.5 LOGISTIC REGRESSION CLASSIFIERS 229
6.5.1 Tutorial – Iris Binary Classification Using Logistic Regression 229
6.5.2 Tutorial – Iris Multiclass Classification Using Logistic Regression 231
6.5.3 Case Study – Breast Cancer Classification Using Logistic Regression 234
6.6 CLASSIFICATION METHODS’ COMPARISON 236
6.6.1 Case Study – Wine Classification Using Multiple Classifiers 236

Chapter 7 ■ Regression 242


7.1 SIMPLE REGRESSION 242
7.1.1 Tutorial – California Housing Price 243
7.1.2 Tutorial – California Housing Price 249
7.2 MULTIPLE REGRESSION 254
7.2.1 Tutorial – California Housing Price 255
7.3 REGULARIZATION 259
7.3.1 Tutorial – Regularization 259
7.3.2 Case Study – California Housing Price 263
7.4 CROSS-VALIDATION 270
7.4.1 Tutorial – Cross-Validation 270
7.4.2 Case Study – California Housing Price 273
7.5 ENSEMBLE METHODS 275
7.5.1 Tutorial – Iris Binary Classification Using Random Forests 276
7.5.2 Tutorial – Iris Multi Classification Using Random Forests 278
7.5.3 Case Study – California Housing Price 280
7.6 REGRESSION METHODS’ COMPARISON 288
7.6.1 Case Study – Diabetes 288

Chapter 8 ■ Clustering 298


8.1 PARTITION CLUSTERING 298
8.1.1 Tutorial 299
8.1.2 Case Study 309
8.2 HIERARCHICAL CLUSTERING 313
8.2.1 Tutorial 313
8.2.2 Case Study 316
8.3 DENSITY-BASED CLUSTERING 318
8.3.1 Tutorial 318
8.3.2 Case Study 321
8.4 GRID-BASED CLUSTERING 324
8.4.1 Tutorial 324
8.4.2 Case Study 327
8.5 PRINCIPAL COMPONENT ANALYSIS 331
8.5.1 Tutorial 332
8.5.2 Case Study 344
8.6 CLUSTERING METHODS’ COMPARISON 351
8.6.1 Case Study 351

Chapter 9 ■ Frequent Patterns 356


9.1 FREQUENT ITEMSET AND ASSOCIATION RULES 356
9.1.1 Tutorial – Finding Frequent Itemset 357
9.1.2 Tutorial – Detecting Association Rules 358
9.2 APRIORI AND FP-GROWTH ALGORITHMS 361
9.2.1 Tutorial – Apriori Algorithm 361
9.2.2 Tutorial – FP-Growth Algorithm 364
9.2.3 Case Study – Online Retail 366

Chapter 10 ■ Outlier Detection 370


10.1 OUTLIER DETECTION 371
10.1.1 Tutorial 371
10.1.2 Case Study 379

Index 389
List of Figures

4.1 A Scatter Plot 69


4.2 A Line Plot 69
4.3 Another Line Plot 70
4.4 An Area Plot 70
4.5 Another Area Plot 71
4.6 A Bar Plot 71
4.7 A Horizontal Bar Plot 72
4.8 A Histogram 72
4.9 Another Histogram Plot 73
4.10 Another Histogram Plot 73
4.11 Another Histogram Plot with Density 74
4.12 A Box Plot 74
4.13 Another Box Plot 75
4.14 A Pie Plot 75
4.15 A Color Map 76
4.16 A Simple Plot 78
4.17 A Scatter Plot with Marker o 78
4.18 A Scatter Plot with Marker * 79
4.19 A Scatter Plot with Marker . 79
4.20 A Scatter Plot with Marker , 80
4.21 A Scatter Plot with Marker x 80
4.22 A Scatter Plot with Marker X 81
4.23 A Scatter Plot with Marker + 81
4.24 A Scatter Plot with Marker P 82
4.25 A Scatter Plot with Marker s 82
4.26 A Scatter Plot with Marker D 83
4.27 A Scatter Plot with Marker d 83
4.28 A Scatter Plot with Marker p 84

4.29 A Scatter Plot with Marker H 84
4.30 A Scatter Plot with Marker h 85
4.31 A Scatter Plot with Marker o 85
4.32 A Scatter Plot with Marker ^ 86
4.33 A Scatter Plot with Marker < 86
4.34 A Scatter Plot with Marker > 87
4.35 A Scatter Plot with Marker 1 87
4.36 A Scatter Plot with Marker 2 88
4.37 A Scatter Plot with Marker 3 88
4.38 A Scatter Plot with Marker 4 89
4.39 A Scatter Plot with Marker | 89
4.40 A Scatter Plot with Marker - 90
4.41 A Line Plot 90
4.42 A Line Plot 91
4.43 A Line Plot 91
4.44 A Line Plot 92
4.45 A Line Plot 92
4.46 A Line Plot 93
4.47 A Line Plot 93
4.48 A Line Plot 94
4.49 A Line Plot 94
4.50 A Line Plot 95
4.51 A Line Plot 95
4.52 A Line Plot 96
4.53 A Line Plot 96
4.54 A Line Plot 97
4.55 A Scatter Plot 97
4.56 A Colorbar Plot 98
4.57 A Scatter Plot with Different Dot-Sizes 98
4.58 A Scatter Plot with Colorbar and Different Dot-Sizes 99
4.59 A Bar Plot 100
4.60 A Histogram Plot 100
4.61 Another Histogram Plot 101
4.62 A Pie Plot 101
4.63 An Explode Pie Plot 102
4.64 A Box Plot 102
4.65 A Violin Plot 103
4.66 A Multi-Plot 104
4.67 A Multi-Plot with Legend and Grid 104
4.68 A Multi-Plot as Stacks 105
4.69 A Multi-Plot as Columns 106
4.70 A Default Relational Plot 107
4.71 A Default Relational Plot with Gender Differentiation 107
4.72 A Default Relational Plot with Day Differentiation 108
4.73 A Default Relational Plot with Time Differentiation 108
4.74 A Default Relational Plot with Time Differentiation in Multicolumns 109
4.75 A Default Relational Plot with Size Differentiation 109
4.76 A Default Relational Plot with Size Differentiation and Different
Dot-Sizes 110
4.77 A Default Relational Plot with Large Size Differentiation 110
4.78 A Default Relational Plot with Large Size Differentiation
and Transparency 111
4.79 A Default Relational Plot with Categorical Xs 111
4.80 A Line Relational Plot 112
4.81 A Line Relational Plot with Gender Differentiation 112
4.82 A Line Relational Plot with Gender Differentiation in Multicolumns 113
4.83 A Default Distribution Plot 113
4.84 A Default Distribution Plot in Multicolumns 114
4.85 A Default Distribution Plot with Gender Differentiation 114
4.86 A Default Distribution Plot with Gender Differentiation
in Multicolumns 115
4.87 A KDE Distribution Plot with Gender Differentiation 115
4.88 A KDE Distribution Plot with Gender Differentiation and Stacking 116
4.89 A KDE Distribution Plot with Gender Differentiation, Stacking in
Multicolumns 116
4.90 A KDE Distribution Plot with Two Attributes 117
4.91 A KDE Distribution Plot with Two Attributes and Gender
Differentiation 117
4.92 A KDE Distribution Plot with Two Attributes and Rug 118
4.93 An ECDF Distribution Plot 118
4.94 An ECDF Distribution Plot with Gender Differentiation 119
4.95 An ECDF Distribution Plot with Gender Differentiation in Multicolumns 119
4.96 A Default Categorical Plot 120
4.97 A Default Categorical Plot with Gender Differentiation 120
4.98 A Box Categorical Plot 121
4.99 A Box Categorical Plot with Gender Differentiation 121
4.100 A Violin Categorical Plot 122
4.101 A Violin Categorical Plot with Gender Differentiation 122
4.102 Another Violin Categorical Plot with Gender Differentiation 123
4.103 A Violin Plot with Gender Differentiation and Quartile 123
4.104 A Bar Categorical Plot 124
4.105 A Bar Categorical Plot with Gender Differentiation 124
4.106 A Joint Plot 125
4.107 Another Joint Plot 125
4.108 Another Joint Plot 126
4.109 Another Joint Plot 126
4.110 Another Joint Plot 127
4.111 Another Joint Plot 127
4.112 Another Joint Plot 128
4.113 Another Joint Plot 128
4.114 A Pair Plot 129
4.115 A Pair Plot with Gender Differentiation 130

5.1 The Distribution Plot before Removing Outliers 141


5.2 The Box Plot before Removing Outliers 141
5.3 The Distribution Plot after Removing Outliers 143
5.4 The Box Plot after Removing Outliers 143
5.5 The Box Plot before Removing Outliers 145
5.6 The Box Plot after Removing Outliers 145

6.1 A Scatter Plot of Sepal Length VS Sepal Width with Species Differentiation 173
6.2 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 178
6.3 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 184
6.4 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 189
6.5 Accuracy of KNN Models 196
6.6 Accuracy of RNN Models 196
6.7 A Scatter Plot of Sepal Length VS Sepal Width with Species Differentiation 198
6.8 A Default Decision Tree 200
6.9 A Decision Tree Trained with Entropy 201
6.10 A Decision Tree Trained with Max Depth as 1 202
6.11 A Decision Tree Trained with Max Depth as 3 203
6.12 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 206
6.13 A Default Decision Tree 208
6.14 A Decision Tree Trained with Entropy 209
6.15 A Decision Tree Trained with Max Depth as 1 210
6.16 A Decision Tree Trained with Max Depth as 3 211
6.17 Accuracy VS Max Depth for Different Splitting Criteria 215
6.18 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 217
6.19 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 219
6.20 Accuracy VS Regularization Parameter (C) for SVM 222
6.21 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 224
6.22 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 226
6.23 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 230
6.24 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 233
6.25 Accuracy VS Regularization Parameter (C) 236
6.26 Accuracy Comparison Among Classification Methods 241

7.1 A Scatter Plot of Total Rooms VS Total Bedrooms 244


7.2 A Comparison with Predicted and True Values 245
7.3 A Scatter Plot of Median Income VS Median House Value 246
7.4 A Comparison with Predicted and True Values 247
7.5 A Scatter Plot of Households VS Population 248
7.6 A Comparison with Predicted and True Values 249
7.7 A Scatter Plot of X VS Y 250
7.8 A Scatter Plot of X VS Y 251
7.9 A Comparison with Predicted and True Values 253
7.10 A Comparison with Predicted and True Values 254
7.11 A Comparison with Predicted and True Values 257
7.12 A Comparison with Predicted and True Values 258
7.13 A Scatter Plot of X VS Y 260
7.14 Performance Comparison between Polynomial Regression
and Regularization 270
7.15 A Scatter Plot of X VS Y 271
7.16 Mean Squared Error for Different Cross-Validation Techniques 273
7.17 Mean Squared Error for Different Cross-Validation Techniques 275
7.18 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 277
7.19 A Scatter Plot of Sepal Length VS Sepal Width with Species
Differentiation 279
7.20 R-Squared Scores for Different Models 287
7.21 Comparison of Regression Methods on Diabetes Dataset 297

8.1 A Scatter Plot of X VS Y 300


8.2 K-Means Result with Cluster Differentiation 300
8.3 KMedoids Result with Cluster Differentiation 301
8.4 A Scatter Plot of X VS Y 302
8.5 K-Means Result with Cluster Differentiation 302
8.6 K-Medoids Result with Cluster Differentiation 303
8.7 A Scatter Plot of X VS Y 304
8.8 K-Means Result with Cluster Differentiation 304
8.9 K-Medoids Result with Cluster Differentiation 305
8.10 A Scatter Plot of X VS Y 306
8.11 K-Means Result with Cluster Differentiation 306
8.12 K-Medoids Result with Cluster Differentiation 307
8.13 A Scatter Plot of X VS Y 308
8.14 K-Means Result with Cluster Differentiation 308
8.15 K-Medoids Result with Cluster Differentiation 309
8.16 A Comparison with K-Means and K-Medoids Clustering 313
8.17 A Scatter Plot of Feature1 VS Feature2 315
8.18 DBSCAN Result with Three Clusters 320
8.19 Comparison among DBSCAN Results 323
8.20 A Scatter Plot of Feature1 VS Feature2 325
8.21 STING Clustering Result 326
8.22 CLIQUE Clustering Result 327
8.23 CLIQUE Clustering Result 327
8.24 A Scatter Plot of Feature1 VS Feature2 328
8.25 STING Clustering Result 329
8.26 OPTICS Clustering Result 330
8.27 DBSCAN Clustering Result 331
8.28 Digit 0 333
8.29 Digit 0 333
8.30 Digit 1 334
8.31 Digit 2 334
8.32 Digit 3 335
8.33 Digit 4 335
8.34 Digit 5 336
8.35 Digit 6 336
8.36 Digit 7 337
8.37 Digit 8 337
8.38 Digit 9 338
8.39 K-Means Clustering 353
8.40 Agglomerative Clustering 354
8.41 DBSCAN Clustering 355

10.1 A Scatter Plot of Feature1 VS Feature2 with Colorbar 372


10.2 Outlier Detection by Z-Score 373
10.3 Outlier Detection by IQR 374
10.4 Outlier Detection by One-Class SVM 375
10.5 Outlier Detection by Isolation Forest 377
10.6 Outlier Detection by DBSCAN 378
10.7 Outlier Detection by LOF 379
10.8 Outlier Detection by Z-Score 383
10.9 Outlier Detection by IQR 384
10.10 Outlier Detection by One-Class SVM 385
10.11 Outlier Detection by Isolation Forest 386
10.12 Outlier Detection by DBSCAN 387
10.13 Outlier Detection by LOF 388
Foreword

WHY WE NEED THIS BOOK


Data is everywhere and it’s growing at an unprecedented rate. But making sense
of all that data is a challenge. Data Mining is the process of discovering patterns
and knowledge from large data sets. This book takes a hands-on approach
to learning Data Mining. It is designed to give you an understanding of Data
Mining concepts in an applicable way. The tutorials in this book will help you to gain
practical skills to implement Data Mining techniques in your work. Whether you are
a student, a data scientist, or a business analyst, this book is a must-read for you.

Preface

HOW TO USE THIS BOOK


This book serves as a complement to a theoretical Data Mining course. We intend
to keep the introductions brief and simple and concentrate on detailed tutorials. The
book is divided into two parts: Part 1 covers the preparation of data or Data Wrangling.
Part 2 covers the analysis of data or Data Analysis. For readers’ convenience, besides
including all tutorials within pages, we also provide the .ipynb files with associated
data sets through links. When you run the .ipynb files, please make sure the data
path is updated in your local/cloud environment.
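
One way to keep the data path adjustable is to define it once at the top of a notebook. The snippet below is only a sketch: the environment variable name DM_DATA_DIR and the helper data_path are hypothetical and do not come from the book, and /content is assumed as the fallback because the tutorials read files from that folder.

```python
import os
from pathlib import Path

# DM_DATA_DIR is a hypothetical variable name chosen for this sketch.
# If it is not set, fall back to /content, the folder used in the tutorials.
DATA_DIR = Path(os.environ.get("DM_DATA_DIR", "/content"))


def data_path(filename: str) -> Path:
    """Return the full path of a data file inside DATA_DIR."""
    return DATA_DIR / filename


# Pass the result to a reader, e.g. pd.read_csv(data_path("ds_salaries.csv"))
print(data_path("ds_salaries.csv"))
```

With this layout, moving between a local machine and a cloud environment only requires changing one value rather than every file path in the notebook.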

WHY THIS BOOK IS DIFFERENT


While there are many books, websites, and online courses on the topic, our book
differs in several ways:
• We organized the contents based on the Data Mining pipeline, so readers can
naturally gain the formal process from raw data to knowledge step by step.
Readers can have a full stack of consistent learning, rather than learning from
pieces from multiple sources.
• For the topics, methods, and tools we cover in the book, we explain them in
three aspects: “What it is” as a theoretical background, “Why we need it” as
an application orientation, and “How we do it” as a case study.
• Our book is “LIVE”. All tutorials are runnable interactive Python notebooks in
.ipynb format. Students can run them, modify them, and use them.

Author Bios

Dr. Di Wu is an Assistant Professor in the Finance, Information Systems, and Economics
department of the Business School, Lehman College. He obtained a Ph.D. in Computer
Science from the Graduate Center, CUNY. Dr. Wu’s research interests are 1) Temporal
extensions to RDF and the semantic web, 2) Applied Data Science, and 3) Experiential
Learning and Pedagogy in business education. Dr. Wu has developed and taught courses
including Strategic Management, Databases, Business Statistics, Management Decision
Making, Programming Languages (C++, Java, and Python), Data Structures and
Algorithms, Data Mining, Big Data, and Machine Learning.

Section I Data Wrangling

Chapter 1 ■ Data Collection

Data collection is a crucial step in the process of obtaining valuable insights
and making informed decisions. In today’s interconnected world, data can be
found in a multitude of sources, ranging from traditional files such as .csv,
.txt, .xlsx, .json, .xml, and .html, to databases powered by SQL, websites hosting relevant
information, and APIs (Application Programming Interfaces) offered by companies.
To efficiently gather data from these diverse sources, various tools can be employed.
These tools encompass an array of technologies, including web scraping frameworks,
database connectors, data extraction libraries, and specialized APIs, all designed to
facilitate the collection and extraction of data from different sources. By leveraging
these tools, organizations can harness the power of data and gain valuable insights to
drive their decision-making processes.
Python offers a rich ecosystem of packages for data collection. Some commonly used
Python packages for data collection include:
• Pandas: Pandas is a powerful library for data manipulation and analysis. It
provides data structures and functions to efficiently work with structured data,
making it suitable for data collection from CSV files, Excel spreadsheets, and
SQL databases.
• BeautifulSoup: Beautiful Soup is a Python library for web scraping. It helps
parse HTML and XML documents, making it useful for extracting data from
websites.
• Requests: Requests is a versatile library for making HTTP requests. It simplifies
the process of interacting with web services and APIs, allowing data retrieval
from various sources.
• mysql-connector-python, psycopg2, and sqlite3: These libraries are Python
connectors for MySQL, PostgreSQL, and SQLite databases, respectively. They
enable data collection by establishing connections to these databases, executing
queries, and retrieving data.
• Yahoo Finance: The Yahoo Finance library provides an interface to access
financial data from Yahoo Finance. It allows you to fetch historical stock prices,
company information, and other financial data.

DOI: 10.1201/9781003462781-1

These are just a few examples of Python packages commonly used for data collection.
We will cover them in detail with tutorials and case studies. Depending on the specific
data sources and requirements, there are many more packages available to facilitate
data collection in Python.
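
The connect, query, and retrieve pattern described for the database connectors can be sketched with sqlite3, which ships with Python. This is a minimal illustration only: the purchases table and its rows are made up for the example and are not the book's shopping data.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; the table name
# and rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (item TEXT, price REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("notebook", 3.5), ("pen", 1.2), ("backpack", 24.0)],
)
conn.commit()

# Execute a parameterized query and retrieve the matching rows as tuples.
rows = conn.execute(
    "SELECT item, price FROM purchases WHERE price > ? ORDER BY price",
    (2.0,),
).fetchall()
conn.close()

print(rows)
```

The other connectors follow the same general shape (open a connection, run SQL, fetch rows), although their connection arguments differ.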

1.1 COLLECT DATA FROM FILES

Storing data in different file formats allows for versatility and compatibility with
various applications and tools.
• CSV (Comma-Separated Values): CSV files store tabular data in plain text
format, where each line represents a row, and values are separated by commas (or
other delimiters). CSV files are simple, human-readable, and widely supported.
They can be easily opened and edited using spreadsheet software or text editors.
However, CSV files may not support complex data structures, and there is no
standardized format for metadata or data types. Pandas provides the read_csv()
function, allowing you to read CSV files into a DataFrame object effortlessly.
It assumes a comma delimiter by default (another can be set with the sep parameter),
recognizes common missing-value markers, and returns a DataFrame with
convenient methods for data manipulation and analysis.
• TXT (Plain Text): TXT files contain unformatted text with no specific structure
or metadata. TXT files are lightweight, widely supported, and can be easily
opened with any text editor. However, TXT files lack a standardized structure or
format, making it challenging to handle data that requires specific organization
or metadata. Pandas offers the read_csv() function with customizable delimiters
to read text files with structured data. By specifying the appropriate delimiter,
you can read text files into a DataFrame for further analysis.
• XLSX (Microsoft Excel): XLSX is a file format used by Microsoft Excel to
store spreadsheet data with multiple sheets, formatting, formulas, and metadata.
XLSX files support complex spreadsheets with multiple tabs, cell formatting,
and formulas. They are widely used in business and data analysis scenarios.
However, XLSX files can be large, and manipulating them directly can be
memory-intensive. Additionally, XLSX files require software like Microsoft Excel
to view and edit. Pandas provides the read_excel() function, enabling the
reading of XLSX files into DataFrames. It allows you to specify the sheet name,
range of cells, and other parameters to extract data easily.
• JSON (JavaScript Object Notation): JSON is a lightweight, human-readable
data interchange format that represents structured data as key-value pairs, lists,
and nested objects. JSON is easy to read and write, supports complex nested
structures, and is widely used for data interchange between systems. However,
JSON files can be larger than their equivalent CSV representations, and handling
complex nested structures may require additional processing. Pandas provides
the read_json() function to read JSON data directly into a DataFrame. It
handles both simple and nested JSON structures, allowing for convenient data
exploration and analysis.
• XML (eXtensible Markup Language): XML files store structured data using
tags that define elements and their relationships. XML is designed to be self-
descriptive and human-readable. XML files provide a flexible and extensible
format for storing structured data. They are widely used for data interchange
and can represent complex hierarchical structures. However, XML files can be
verbose and have larger file sizes compared to other formats. Parsing XML files
can be more complex due to the nested structure and the need for specialized
parsing libraries. Pandas provides the read_xml() function to directly read XML
files into a DataFrame. It provides several options for handling different XML
structures, such as extracting data from specific tags, handling attributes, and
parsing nested elements.
• HTML (Hypertext Markup Language): HTML files are primarily used for
structuring and presenting content on the web. They consist of tags that define
the structure and formatting of the data. HTML files provide a rich structure for
representing web content and can include images, links, and other multimedia
elements. However, HTML files are designed for web display, so extracting
structured data from them can be more complex due to the presence of non-
tabular content and formatting tags. Pandas provides the read_html() function,
which can extract tabular data from HTML tables into a DataFrame.
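
As a minimal sketch of the customizable delimiter mentioned for TXT files above, the snippet below reads pipe-separated and tab-separated text with read_csv(). The sample rows are hypothetical and are held in memory with io.StringIO, so the example runs without any file on disk; in practice you would pass a file path instead.

```python
import io
import pandas as pd

# Hypothetical pipe-delimited text; normally this would be the path of a .txt file.
pipe_text = (
    "work_year|job_title|salary_in_usd\n"
    "2020|Data Scientist|79833\n"
    "2020|Big Data Engineer|109024\n"
)

# sep tells read_csv which character separates the fields.
df_pipe = pd.read_csv(io.StringIO(pipe_text), sep="|")

# The same rows, tab-separated.
tab_text = pipe_text.replace("|", "\t")
df_tab = pd.read_csv(io.StringIO(tab_text), sep="\t")

print(df_pipe)
print(df_pipe.equals(df_tab))
```

Both calls should yield the same DataFrame, since only the separator character differs.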

1.1.1 Tutorial – Collect Data from Files


We may have data stored in several types of files, such as TXT, CSV, Excel, JSON,
XML, and HTML. Pandas can load each of them into a DataFrame.
import pandas as pd

1.1.1.1 CSV
We have done this when we learned pandas. Get the path of your CSV file and pass
it to the read_csv function.

Default setting In many cases, the default settings will do the job.


df = pd.read_csv('/content/ds_salaries.csv')

df.head()

Unnamed: 0 work_year experience_level employment_type \


0 0 2020 MI FT
1 1 2020 SE FT
2 2 2020 SE FT
3 3 2020 MI FT
4 4 2020 SE FT

job_title salary salary_currency salary_in_usd \


0 Data Scientist 70000 EUR 79833
1 Machine Learning Scientist 260000 USD 260000
2 Big Data Engineer 85000 GBP 109024
3 Product Data Analyst 20000 USD 20000
4 Machine Learning Engineer 150000 USD 150000

employee_residence remote_ratio company_location company_size


0 DE 0 DE L
1 JP 0 JP S
2 GB 50 GB M
3 HN 0 HN S
4 US 50 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 607 non-null int64
1 work_year 607 non-null int64
2 experience_level 607 non-null object
3 employment_type 607 non-null object
4 job_title 607 non-null object
5 salary 607 non-null int64
6 salary_currency 607 non-null object
7 salary_in_usd 607 non-null int64
8 employee_residence 607 non-null object
9 remote_ratio 607 non-null int64
10 company_location 607 non-null object
11 company_size 607 non-null object
dtypes: int64(5), object(7)
memory usage: 57.0+ KB

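Customize setting You can adjust the arguments to suit your specific CSV file. For
example, header=None tells pandas that the file has no header row, so the header
line is read as data: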
df = pd.read_csv('/content/ds_salaries.csv', header = None)
df.head()

0 1 2 3 \
0 NaN work_year experience_level employment_type
1 0.0 2020 MI FT
2 1.0 2020 SE FT
3 2.0 2020 SE FT
4 3.0 2020 MI FT

4 5 6 7 \
0 job_title salary salary_currency salary_in_usd
1 Data Scientist 70000 EUR 79833
2 Machine Learning Scientist 260000 USD 260000
3 Big Data Engineer 85000 GBP 109024
4 Product Data Analyst 20000 USD 20000

8 9 10 11
0 employee_residence remote_ratio company_location company_size
1 DE 0 DE L
2 JP 0 JP S
3 GB 50 GB M
4 HN 0 HN S

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 608 entries, 0 to 607
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 607 non-null float64
1 1 608 non-null object
2 2 608 non-null object
3 3 608 non-null object
4 4 608 non-null object
5 5 608 non-null object
6 6 608 non-null object
7 7 608 non-null object
8 8 608 non-null object
9 9 608 non-null object
10 10 608 non-null object
11 11 608 non-null object
dtypes: float64(1), object(11)
memory usage: 57.1+ KB

df = pd.read_csv('/content/ds_salaries.csv', header = None, skiprows=1)


df.head()

0 1 2 3 4 5 6 7 8 9 \
0 0 2020 MI FT Data Scientist 70000 EUR 79833 DE 0
1 1 2020 SE FT Machine Learning Scientist 260000 USD 260000 JP 0
2 2 2020 SE FT Big Data Engineer 85000 GBP 109024 GB 50
3 3 2020 MI FT Product Data Analyst 20000 USD 20000 HN 0
4 4 2020 SE FT Machine Learning Engineer 150000 USD 150000 US 50

10 11
0 DE L
1 JP S
2 GB M
3 HN S
4 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 607 non-null int64
1 1 607 non-null int64
2 2 607 non-null object
3 3 607 non-null object
4 4 607 non-null object
5 5 607 non-null int64
6 6 607 non-null object
7 7 607 non-null int64
8 8 607 non-null object
9 9 607 non-null int64
10 10 607 non-null object
11 11 607 non-null object
dtypes: int64(5), object(7)
memory usage: 57.0+ KB

# skipfooter is only supported by the Python parsing engine
df = pd.read_csv('/content/ds_salaries.csv', header = None,
                 skiprows=1, skipfooter=300, engine='python')
df.head()

0 1 2 3 4 5 6 7 8 9 \
0 0 2020 MI FT Data Scientist 70000 EUR 79833 DE 0
1 1 2020 SE FT Machine Learning Scientist 260000 USD 260000 JP 0
2 2 2020 SE FT Big Data Engineer 85000 GBP 109024 GB 50
3 3 2020 MI FT Product Data Analyst 20000 USD 20000 HN 0
4 4 2020 SE FT Machine Learning Engineer 150000 USD 150000 US 50

10 11
0 DE L
1 JP S
2 GB M
3 HN S
4 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307 entries, 0 to 306
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 307 non-null int64
1 1 307 non-null int64
2 2 307 non-null object
3 3 307 non-null object
4 4 307 non-null object
5 5 307 non-null int64
6 6 307 non-null object
7 7 307 non-null int64
8 8 307 non-null object
9 9 307 non-null int64
10 10 307 non-null object
11 11 307 non-null object
dtypes: int64(5), object(7)
memory usage: 28.9+ KB
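
Beyond header, skiprows, and skipfooter, a few other read_csv arguments are often useful. The sketch below is illustrative only: it uses a handful of in-memory rows shaped like ds_salaries.csv (not the real file) to show index_col, usecols, and nrows.

```python
import io
import pandas as pd

# A few rows shaped like ds_salaries.csv. The first column has no header,
# which is why pandas labels it "Unnamed: 0" by default.
csv_text = """,work_year,experience_level,job_title,salary_in_usd
0,2020,MI,Data Scientist,79833
1,2020,SE,Machine Learning Scientist,260000
2,2020,SE,Big Data Engineer,109024
"""

# index_col=0 uses the first column as the row index instead of a data column.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# usecols keeps only the named columns; nrows limits how many rows are read.
small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["job_title", "salary_in_usd"],
    nrows=2,
)

print(df.columns.tolist())
print(small)
```

With index_col=0, the "Unnamed: 0" column seen in the earlier output no longer appears among the data columns.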

1.1.1.2 TXT
If the TXT file follows the CSV format, it can be read in the same way as a CSV file:
df = pd.read_csv('/content/ds_salaries.txt')
df

Unnamed: 0 work_year experience_level employment_type \


0 0 2020 MI FT
1 1 2020 SE FT
2 2 2020 SE FT
3 3 2020 MI FT
4 4 2020 SE FT
.. ... ... ... ...
602 602 2022 SE FT
603 603 2022 SE FT
604 604 2022 SE FT
605 605 2022 SE FT
606 606 2022 MI FT

job_title salary salary_currency salary_in_usd \


0 Data Scientist 70000 EUR 79833
1 Machine Learning Scientist 260000 USD 260000
2 Big Data Engineer 85000 GBP 109024
3 Product Data Analyst 20000 USD 20000
4 Machine Learning Engineer 150000 USD 150000
.. ... ... ... ...
602 Data Engineer 154000 USD 154000
603 Data Engineer 126000 USD 126000
604 Data Analyst 129000 USD 129000
605 Data Analyst 150000 USD 150000
606 AI Scientist 200000 USD 200000

employee_residence remote_ratio company_location company_size


0 DE 0 DE L
1 JP 0 JP S
2 GB 50 GB M
3 HN 0 HN S
4 US 50 US L
.. ... ... ... ...
602 US 100 US M
603 US 100 US M
604 US 0 US M
605 US 100 US M
606 IN 100 US L

[607 rows x 12 columns]


1.1.1.3 Excel
df = pd.read_excel('/content/ds_salaries.xlsx')

df.head()

Unnamed: 0 work_year experience_level employment_type \


0 0 2020 MI FT
1 1 2020 SE FT
2 2 2020 SE FT
3 3 2020 MI FT
4 4 2020 SE FT

job_title salary salary_currency salary_in_usd \


0 Data Scientist 70000 EUR 79833
1 Machine Learning Scientist 260000 USD 260000
2 Big Data Engineer 85000 GBP 109024
3 Product Data Analyst 20000 USD 20000
4 Machine Learning Engineer 150000 USD 150000

employee_residence remote_ratio company_location company_size


0 DE 0 DE L
1 JP 0 JP S
2 GB 50 GB M
3 HN 0 HN S
4 US 50 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 607 non-null int64
1 work_year 607 non-null int64
2 experience_level 607 non-null object
3 employment_type 607 non-null object
4 job_title 607 non-null object
5 salary 607 non-null int64
6 salary_currency 607 non-null object
7 salary_in_usd 607 non-null int64
8 employee_residence 607 non-null object
9 remote_ratio 607 non-null int64
10 company_location 607 non-null object
11 company_size 607 non-null object
dtypes: int64(5), object(7)
memory usage: 57.0+ KB

1.1.1.4 JSON

df = pd.read_json('/content/ds_salaries.json')
df.head()

FIELD1 work_year experience_level employment_type \


0 0 2020 MI FT
1 1 2020 SE FT
2 2 2020 SE FT
3 3 2020 MI FT
4 4 2020 SE FT

job_title salary salary_currency salary_in_usd \


0 Data Scientist 70000 EUR 79833
1 Machine Learning Scientist 260000 USD 260000
2 Big Data Engineer 85000 GBP 109024
3 Product Data Analyst 20000 USD 20000
4 Machine Learning Engineer 150000 USD 150000

employee_residence remote_ratio company_location company_size


0 DE 0 DE L
1 JP 0 JP S
2 GB 50 GB M
3 HN 0 HN S
4 US 50 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 FIELD1 607 non-null int64
1 work_year 607 non-null int64
2 experience_level 607 non-null object
3 employment_type 607 non-null object
4 job_title 607 non-null object
5 salary 607 non-null int64
6 salary_currency 607 non-null object
7 salary_in_usd 607 non-null int64
8 employee_residence 607 non-null object
9 remote_ratio 607 non-null int64
10 company_location 607 non-null object
11 company_size 607 non-null object
dtypes: int64(5), object(7)
memory usage: 57.0+ KB
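
The file above is flat, so read_json() is enough. For nested JSON, one possible approach is pandas' json_normalize(), sketched below on hypothetical records in which each entry carries a nested "company" object. The field names are made up for illustration.

```python
import pandas as pd

# Hypothetical nested records: each entry has a nested "company" object.
records = [
    {"job_title": "Data Scientist", "salary_in_usd": 79833,
     "company": {"location": "DE", "size": "L"}},
    {"job_title": "Big Data Engineer", "salary_in_usd": 109024,
     "company": {"location": "GB", "size": "M"}},
]

# json_normalize flattens nested objects into columns named parent.child.
flat = pd.json_normalize(records)

print(flat.columns.tolist())
print(flat)
```

The nested keys become ordinary columns such as company.location, which can then be filtered and aggregated like any other column.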

1.1.1.5 XML
df = pd.read_xml('/content/ds_salaries.xml')
df.head()

FIELD1 work_year experience_level employment_type \


0 0 2020 MI FT
1 1 2020 SE FT
2 2 2020 SE FT
3 3 2020 MI FT
4 4 2020 SE FT

job_title salary salary_currency salary_in_usd \


0 Data Scientist 70000 EUR 79833
1 Machine Learning Scientist 260000 USD 260000
2 Big Data Engineer 85000 GBP 109024
3 Product Data Analyst 20000 USD 20000
4 Machine Learning Engineer 150000 USD 150000

employee_residence remote_ratio company_location company_size


0 DE 0 DE L
1 JP 0 JP S
2 GB 50 GB M
3 HN 0 HN S
4 US 50 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 FIELD1 607 non-null int64
1 work_year 607 non-null int64
2 experience_level 607 non-null object
3 employment_type 607 non-null object
4 job_title 607 non-null object
5 salary 607 non-null int64
6 salary_currency 607 non-null object
7 salary_in_usd 607 non-null int64
8 employee_residence 607 non-null object
9 remote_ratio 607 non-null int64
10 company_location 607 non-null object
11 company_size 607 non-null object
dtypes: int64(5), object(7)
memory usage: 57.0+ KB
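
When the records sit deeper inside the XML tree, the xpath argument of read_xml() can point at the repeating element. The following is a small sketch on a hypothetical in-memory document; it also assumes parser="etree" (the standard-library parser) so that lxml is not required.

```python
import io
import pandas as pd

# Hypothetical XML with one <row> element per record.
xml_text = """<data>
  <row><work_year>2020</work_year><job_title>Data Scientist</job_title><salary_in_usd>79833</salary_in_usd></row>
  <row><work_year>2020</work_year><job_title>Big Data Engineer</job_title><salary_in_usd>109024</salary_in_usd></row>
</data>"""

# xpath selects the repeating elements that become rows;
# parser="etree" uses the standard-library XML parser instead of lxml.
df = pd.read_xml(io.StringIO(xml_text), xpath=".//row", parser="etree")

print(df)
```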

1.1.1.6 HTML
df = pd.read_html('/content/ds_salaries.htm')[0]
df.head()

FIELD1 work_year experience_level employment_type \


0 0 2020 MI FT
1 1 2020 SE FT
2 2 2020 SE FT
3 3 2020 MI FT
4 4 2020 SE FT

job_title salary salary_currency salary_in_usd \


0 Data Scientist 70000 EUR 79833
1 Machine Learning Scientist 260000 USD 260000
2 Big Data Engineer 85000 GBP 109024
3 Product Data Analyst 20000 USD 20000
4 Machine Learning Engineer 150000 USD 150000

employee_residence remote_ratio company_location company_size


0 DE 0 DE L
1 JP 0 JP S
2 GB 50 GB M
3 HN 0 HN S
4 US 50 US L

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 FIELD1 607 non-null int64
1 work_year 607 non-null int64
2 experience_level 607 non-null object
3 employment_type 607 non-null object
4 job_title 607 non-null object
5 salary 607 non-null int64
6 salary_currency 607 non-null object
7 salary_in_usd 607 non-null int64
8 employee_residence 607 non-null object
9 remote_ratio 607 non-null int64
10 company_location 607 non-null object
11 company_size 607 non-null object
dtypes: int64(5), object(7)
memory usage: 57.0+ KB

1.1.2 Documentation
It is always good to keep a reference for the pandas file-reading functions at hand.
You can find it at https://pandas.pydata.org/docs/reference/io.html

1.2 COLLECT DATA FROM THE WEB

Collecting data from the web is essential for various reasons:


• Access to vast amounts of information: The web contains an immense amount
of data on diverse topics. By collecting data from the web, you can tap into
this vast information pool and gain insights that can inform decision-making,
research, analysis, and more.
• Real-time and up-to-date data: The web provides a platform for the dissemina-
tion of real-time and up-to-date information. By collecting data from the web,
you can stay informed about the latest news, trends, market updates, social
media activity, and other dynamic sources of information.
• Competitive intelligence: Collecting data from the web allows you to monitor
your competitors, track their activities, analyze their strategies, and gain insights
into the market landscape. This can help you make informed decisions and stay
ahead in a competitive environment.
• Research and analysis: Web data collection is crucial for research, analysis,
and data-driven insights. By collecting data from diverse sources, you can
validate hypotheses, perform statistical analysis, conduct sentiment analysis,
and uncover patterns or trends that can enhance understanding and drive
informed decision-making.
Websites can be structured, semi-structured, or unstructured, and they differ in
their organization and consistency.
• Structured Websites: Structured websites have a well-defined and organized
format, making it easy to locate specific information. They often follow a
consistent layout and have clearly defined sections. Structured websites generally
pose fewer challenges for data collection as the information is neatly organized.
However, occasional variations in page layouts or changes in website structure
can introduce some level of complexity. To collect data from structured websites,
you can utilize libraries like Beautiful Soup or lxml in Python. These libraries
enable you to parse the HTML structure of the web pages and extract desired
data using specific tags or CSS selectors.
• Semi-Structured Websites: Semi-structured websites contain a mixture of struc-
tured and unstructured data. While certain sections might be organized, others
may have varying formats or lack consistent organization. The main challenge
with semi-structured websites is the inconsistency in data presentation. The lack
of uniformity in structure and formatting requires additional effort to identify
and extract the relevant data. Similar to structured websites, libraries like
Beautiful Soup or lxml can help parse and extract data from semi-structured
websites. However, you may need to employ additional techniques such as
regular expressions or data cleaning procedures to handle variations in data
presentation.
• Unstructured Websites: Unstructured websites lack a clear organization or
predefined structure. They may have free-form text, multimedia content, and
unorganized data scattered across multiple pages. Unstructured websites pose
the most significant challenges for data collection due to the absence of consis-
tent structure. The data may be embedded within paragraphs, images, or other
non-tabular formats, requiring sophisticated techniques for extraction. For un-
structured websites, natural language processing (NLP) techniques and machine
learning algorithms can be employed to extract relevant information. These
methods involve parsing the web content, identifying patterns, and applying
text processing algorithms to extract structured data.
In summary, structured websites provide a clear structure, making data collection rel-
atively straightforward. Semi-structured websites introduce some variability, requiring
careful handling of inconsistencies. Unstructured websites present the most significant
challenges, necessitating advanced techniques such as NLP and machine learning to
extract structured information. Python libraries like Beautiful Soup, lxml, and NLP
frameworks can assist in parsing and extracting data from these different types of
websites, adapting to their specific characteristics and complexities.
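
To make the role of regular expressions mentioned above more concrete, here is a small sketch that pulls email addresses and prices out of free-form text using only the standard library. The text and both patterns are simplified assumptions for illustration; real pages usually need more careful patterns.

```python
import re

# Hypothetical free-form text, as might be found on an unstructured page.
text = (
    "Contact sales@example.com for pricing. The basic plan costs $19.99 "
    "per month, while the team plan is $49 (billing: accounts@example.org)."
)

# Simplified patterns for illustration; real-world email and price
# formats are more varied than these expressions cover.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
prices = [float(p) for p in re.findall(r"\$(\d+(?:\.\d+)?)", text)]

print(emails)
print(prices)
```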

1.2.1 Tutorial – Collect Data from Web


import pandas as pd

1.2.1.1 Wiki
Some websites maintain structured data, which is easy to read:
table = pd.read_html('https://en.wikipedia.org/wiki/'
                     'List_of_countries_by_GDP_(nominal)#Table')

for i in table:
    print(type(i))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>

for i in table:
    print(i.columns)

Int64Index([0], dtype='int64')
Int64Index([0, 1, 2], dtype='int64')
MultiIndex([( 'Country/Territory', 'Country/Territory'),
( 'UN Region', 'UN Region'),
( 'IMF[1][13]', 'Estimate'),
( 'IMF[1][13]', 'Year'),
( 'World Bank[14]', 'Estimate'),
( 'World Bank[14]', 'Year'),
('United Nations[15]', 'Estimate'),
('United Nations[15]', 'Year')],
)
...
Int64Index([0, 1], dtype='int64')

df = table[2]
df.head()

Country/Territory UN Region IMF[1][13] World Bank[14] \


Country/Territory UN Region Estimate Year Estimate Year
0 World — 101560901 2022 96513077 2021
1 United States Americas 25035164 2022 22996100 2021
2 China Asia 18321197 [n 1]2022 17734063 [n 3]2021
3 Japan Asia 4300621 2022 4937422 2021
4 Germany Europe 4031149 2022 4223116 2021

United Nations[15]
Estimate Year
0 85328323 2020
1 20893746 2020
2 14722801 [n 1]2020
3 5057759 2020
4 3846414 2020

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 217 entries, 0 to 216
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 (Country/Territory, Country/Territory) 217 non-null object
1 (UN Region, UN Region) 217 non-null object
2 (IMF[1][13], Estimate) 217 non-null object
3 (IMF[1][13], Year) 217 non-null object
4 (World Bank[14], Estimate) 217 non-null object
5 (World Bank[14], Year) 217 non-null object
6 (United Nations[15], Estimate) 217 non-null object
7 (United Nations[15], Year) 217 non-null object
dtypes: object(8)
memory usage: 13.7+ KB
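
Every column in the scraped table has dtype object, and the column labels form a two-level MultiIndex with footnote markers such as [1][13]. One possible clean-up, sketched on a tiny hand-built stand-in rather than the live Wikipedia table, is to flatten the labels, strip the bracketed footnotes, and convert the values to numbers. The cleaned column names are choices made for this illustration.

```python
import re
import pandas as pd

# A small stand-in for the scraped table, with two-level column labels
# and a footnote marker such as "[n 1]" embedded in a Year value.
columns = pd.MultiIndex.from_tuples([
    ("Country/Territory", "Country/Territory"),
    ("IMF[1][13]", "Estimate"),
    ("IMF[1][13]", "Year"),
])
df = pd.DataFrame(
    [["United States", "25035164", "2022"],
     ["China", "18321197", "[n 1]2022"]],
    columns=columns,
)

footnote = r"\[.*?\]"  # matches bracketed markers like [1] or [n 1]

# Flatten the two header levels into one string and drop the footnotes.
df.columns = [
    re.sub(footnote, "", a if a == b else f"{a} {b}").strip()
    for a, b in df.columns
]

# Remove footnotes from the values, then convert the text to numbers.
for col in ["IMF Estimate", "IMF Year"]:
    df[col] = pd.to_numeric(df[col].str.replace(footnote, "", regex=True))

print(df.dtypes)
print(df)
```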

1.2.1.2 Web Scraping


Some websites are semi-structured: their pages carry metadata such as labels and
classes, so we can inspect their source code and scrape the data we need.
Note: You need a basic understanding of HTML and XML in order to
read the source code and collect data from these websites.
Note: Some websites prohibit scraping or limit how quickly requests may be sent.
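
Many sites publish their crawling rules in a robots.txt file. As a hedged sketch of how such rules could be checked before scraping, the snippet below feeds hypothetical rules to the standard library's urllib.robotparser. The example.com URLs and the rules themselves are made up; for a real site you would load its robots.txt with set_url() and read().

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at <site>/robots.txt
# and can be loaded with set_url(...) followed by read().
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# can_fetch reports whether the given user agent may request the URL.
allowed = parser.can_fetch("*", "https://example.com/public/page.html")
blocked = parser.can_fetch("*", "https://example.com/private/data.html")
# crawl_delay returns the requested pause between requests, if any.
delay = parser.crawl_delay("*")

print(allowed, blocked, delay)
```

A scraper could skip disallowed URLs and sleep for the reported delay between requests.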


The first thing we’ll need to do to scrape a web page is to download the page. We can
download pages using the Python requests library.
The requests library will make a GET request to a web server, which will download
the HTML contents of a given web page for us. There are several different types of
requests we can make using requests, of which GET is just one. If you want to learn
more, check out our API tutorial.
Let’s try downloading a simple sample website,
https://dataquestio.github.io/web-scraping-pages/simple.html.

Download by requests We’ll need to first import the requests library, and then
download the page using the requests.get method:
import requests

page = requests.get("https://dataquestio.github.io/"
                    "web-scraping-pages/simple.html")
page

<Response [200]>

After running our request, we get a Response object. This object has a status_code
property, which indicates if the page was downloaded successfully:
page.status_code

200

A status_code of 200 means that the page downloaded successfully. We won’t fully
dive into status codes here, but a status code starting with a 2 generally indicates
success, and a code starting with a 4 or a 5 indicates an error.
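
The leading-digit rule described above can be written as a tiny helper. This is an illustrative function of our own, not part of the requests library; requests itself also offers Response.raise_for_status(), which raises an exception for 4xx and 5xx responses.

```python
def status_category(code: int) -> str:
    """Map an HTTP status code to its broad category by leading digit."""
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return categories.get(code // 100, "unknown")


for code in (200, 301, 404, 503):
    print(code, status_category(code))
```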
We can print out the HTML content of the page using the content property:
page.content

b'<!DOCTYPE html>\n<html>\n <head>\n <title>A simple example


page</title>\n </head>\n <body>\n <p>Here is some
simple content for this page.</p>\n </body>\n</html>'

Parsing by BeautifulSoup As you can see above, we now have downloaded an HTML
document.
We can use the BeautifulSoup library to parse this document, and extract the text
from the p tag.
from bs4 import BeautifulSoup


soup = BeautifulSoup(page.content, 'html.parser')

We can now print out the HTML content of the page, formatted nicely, using the
prettify method on the BeautifulSoup object.
print(soup.prettify())

<!DOCTYPE html>
<html>
<head>
<title>
A simple example page
</title>
</head>
<body>
<p>
Here is some simple content for this page.
</p>
</body>
</html>

This step isn’t strictly necessary, and we won’t always bother with it, but it can be
helpful to look at prettified HTML to make the structure of the page and the nesting
of its tags easier to see.

Finding Tags Finding all instances of a tag at once: navigating a page one tag at a
time takes a lot of commands to do something fairly simple. To extract every
occurrence of a tag, we can instead use the find_all method, which finds all the
instances of a tag on a page.
If we are looking for the title, we can look for the <title> tag:
soup.find_all('title')

[<title>A simple example page</title>]

for t in soup.find_all('title'):
    print(t.get_text())

A simple example page

If we are looking for text, we can look for the <p> tag:
for t in soup.find_all('p'):
    print(t.get_text())

Here is some simple content for this page.


If you instead only want to find the first instance of a tag, you can use the find method,
which returns a single Tag object (or None if nothing matches):
soup.find('p').get_text()

'Here is some simple content for this page.'

Searching for tags by class and id:


Classes and ids are used by CSS to determine which HTML elements to apply certain
styles to. But when we’re scraping, we can also use them to specify the elements we
want to scrape.
Let’s try another page.
page = requests.get("https://dataquestio.github.io/"
                    "web-scraping-pages/ids_and_classes.html")
soup = BeautifulSoup(page.content, 'html.parser')
soup

<html>
<head>
<title>A simple example page</title>
</head>
<body>
<div>
<p class="inner-text first-item" id="first">
First paragraph.
</p>
<p class="inner-text">
Second paragraph.
</p>
</div>
<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>
<p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>
</body>
</html>

Now, we can use the find_all method to search for items by class or by id. In the
below example, we’ll search for any p tag that has the class outer-text:
soup.find_all('p', class_='outer-text')

[<p class="outer-text first-item" id="second">


<b>
First outer paragraph.


</b>
</p>, <p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>]

In the below example, we’ll look for any tag that has the class outer-text:
soup.find_all(class_="outer-text")

[<p class="outer-text first-item" id="second">


<b>
First outer paragraph.
</b>
</p>, <p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>]

We can also search for elements by id:


soup.find_all(id="first")

[<p class="inner-text first-item" id="first">


First paragraph.
</p>]

1.2.2 Case Study – Collect Weather Data from Web


1.2.2.1 Downloading Weather Data
We now know enough to proceed with extracting information about the local weather
from the National Weather Service website!
The local weather forecast for Boulder, CO is at:
https://forecast.weather.gov/MapClick.php?lat=40.0466&lon=-105.2523#.YwpRBy2B1f0
Time to Start Scraping!
We now know enough to download the page and start parsing it. In the below code,
we will:
• Download the web page containing the forecast.
• Create a BeautifulSoup object to parse the page.
• Find the div with id seven-day-forecast, and assign it to seven_day.
• Inside seven_day, find each individual forecast item. Extract and print the first
forecast item.

import requests
from bs4 import BeautifulSoup

page = requests.get("https://forecast.weather.gov/"
                    "MapClick.php?lat=40.0466&lon=-105.2523#.YwpRBy2B1f0")
soup = BeautifulSoup(page.content, 'html.parser')
seven_day = soup.find(id="seven-day-forecast")
forecast_items = seven_day.find_all(class_="tombstone-container")
print(forecast_items)

[<div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img alt="Today: Sunny...>
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Mostly clear...>
...

tonight = forecast_items[0]
print(tonight.prettify())

<div class="tombstone-container">
<p class="period-name">
Today
<br/>
<br/>
</p>
<p>
<img alt="Today: Sunny, with a high near 88.
Northwest wind 9 to 13 mph,
with gusts as high as 21 mph. "
class="forecast-icon" src="newimages/medium/few.png"
title="Today: Sunny, with a high near 88.
Northwest wind 9 to 13 mph,
with gusts as high as 21 mph. "/>
</p>
<p class="short-desc">
Sunny
</p>
<p class="temp temp-high">
High: 88 °F
</p>
</div>

1.2.2.2 Extracting Information of Tonight


As we can see, the forecast item stored in tonight (which, in this run, holds the
Today period) contains all the information we want. There are four pieces of
information we can extract:
• The name of the forecast item – in this case, Today.
• The description of the conditions – this is stored in the title property of img.
• A short description of the conditions – in this case, Sunny.
• The temperature high – in this case, 88 degrees.
We’ll extract the name of the forecast item, the short description, and the temperature
first, since they’re all similar:
period = tonight.find(class_="period-name").get_text()
short_desc = tonight.find(class_="short-desc").get_text()
temp = tonight.find(class_="temp").get_text()
print(period)
print(short_desc)
print(temp)

Today
Sunny
High: 88 °F

Now, we can extract the title attribute from the img tag. To do this, we just treat the
Tag object like a dictionary, and pass in the attribute we want as a key:
img = tonight.find("img")
desc = img['title']
print(desc)

Today: Sunny,
with a high near 88.
Northwest wind 9 to 13 mph,
with gusts as high as 21 mph.

1.2.2.3 Extract all Nights!


Now that we know how to extract each individual piece of information, we can combine
our knowledge with CSS selectors and list comprehensions to extract everything at
once.
In the below code, we will:
• Select all items with the class period-name inside an item with the class
tombstone-container in seven_day.
• Use a list comprehension to call the get_text method on each BeautifulSoup
object.
period_tags = seven_day.select(".tombstone-container .period-name")
periods = [pt.get_text() for pt in period_tags]
periods

['Today',
'Tonight',
'Sunday',
'SundayNight',
'Monday',
'MondayNight',
'Tuesday',
'TuesdayNight',
'Wednesday']
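
Names such as 'SundayNight' in the output above are glued together because the words are separated only by a <br/> tag in the page. The sketch below uses a trimmed, hypothetical imitation of the forecast markup (not the live page) to show how the descendant CSS selector works and how get_text can be given a separator so the words stay apart.

```python
from bs4 import BeautifulSoup

# A trimmed, hypothetical imitation of the forecast markup.
html = """
<div id="seven-day-forecast">
  <div class="tombstone-container">
    <p class="period-name">Today<br/><br/></p>
    <p class="short-desc">Sunny</p>
  </div>
  <div class="tombstone-container">
    <p class="period-name">Sunday<br/>Night</p>
    <p class="short-desc">Mostly Clear</p>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
seven_day = soup.find(id="seven-day-forecast")

# "A B" in a CSS selector means: elements matching B nested inside A.
tags = seven_day.select(".tombstone-container .period-name")

# Default get_text() glues the pieces around <br/> together.
glued = [t.get_text() for t in tags]
# A separator plus strip=True keeps the words apart and trims whitespace.
spaced = [t.get_text(separator=" ", strip=True) for t in tags]

print(glued)
print(spaced)
```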

As we can see above, our technique gets us each of the period names, in order.
We can apply the same technique to get the other three fields:
short_descs = [sd.get_text() for sd in seven_day.select(
".tombstone-container .short-desc")]
temps = [t.get_text() for t in seven_day.select(
".tombstone-container .temp")]
descs = [d["title"] for d in seven_day.select(
".tombstone-container img")]

print(short_descs)
print(temps)
print(descs)

['Sunny', 'Mostly Clear', 'Sunny thenSlight ChanceT-storms',...]


['High: 88 °F', 'Low: 59 °F', 'High: 88 °F', 'Low: 57 °F', ...]
['Today: Sunny, with a high near 88. Northwest wind 9 to 13 mph...]

1.2.2.4 Deal with Data


We can now combine the data into a Pandas DataFrame and analyze it. A DataFrame
is an object that can store tabular data, making data analysis easy.
In order to do this, we’ll call the DataFrame class, and pass in each list of items that
we have. We pass them in as part of a dictionary.
Each dictionary key will become a column in the DataFrame, and each list will become
the values in the column:
import pandas as pd
weather = pd.DataFrame({
"period": periods,
"short_desc": short_descs,
"temp": temps,
"desc":descs
})
weather

Now let’s save it to CSV.


weather.to_csv('data/Boulder_Weather_7_Days.csv')
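
Before analysis, the temperature strings usually need to become numbers. A possible next step is sketched below on two rows shaped like the scraped forecast; the column names temp_num and is_night are our own choices for illustration.

```python
import pandas as pd

# Two rows shaped like the scraped forecast.
weather = pd.DataFrame({
    "period": ["Today", "Tonight"],
    "temp": ["High: 88 °F", "Low: 59 °F"],
})

# Pull the digits out of strings such as "High: 88 °F" and store them as integers.
weather["temp_num"] = (
    weather["temp"].str.extract(r"(-?\d+)", expand=False).astype(int)
)

# Flag the overnight periods, which report a low rather than a high.
weather["is_night"] = weather["temp"].str.startswith("Low")

print(weather)
```

With a numeric column in place, summaries such as the mean daytime high become one-line operations.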

1.3 COLLECT DATA FROM SQL DATABASES

Storing data in SQL databases offers several advantages:
• Structured Storage: SQL databases provide a structured storage model with
tables, rows, and columns, allowing for efficient organization and retrieval of
data.
• Data Integrity and Consistency: SQL databases enforce data integrity through
constraints, such as primary keys, unique keys, and referential integrity, ensuring
the accuracy and consistency of the stored data.
• Querying and Analysis: SQL databases offer a powerful query language, SQL,
that enables complex data retrieval, filtering, aggregation, and analysis
operations.
• ACID Compliance: SQL databases adhere to ACID (Atomicity, Consistency,
Isolation, Durability) properties, ensuring reliable and transactional data opera-
tions.
To collect data from a SQL database, you need to establish a connection to the database
server. This typically involves providing connection details such as server address,
port, username, and password. Once connected, you can use SQL queries to extract
data from the database. Queries can range from simple retrieval of specific records to
complex joins, aggregations, and filtering operations. Python provides several libraries
for interacting with SQL databases, such as sqlite3, psycopg2, pymysql, and pyodbc.
These libraries allow you to establish connections, execute SQL queries, and retrieve
the query results into Python data structures for further processing.

1.3.1 Tutorial – Collect Data from SQLite


1.3.1.1 What is SQLite
A file with the .sqlite extension is a lightweight SQL database file created with
the SQLite software. The whole database lives in that single file, and SQLite
implements a self-contained, full-featured, highly reliable SQL database engine.
We use SQLite to demonstrate how to access SQL databases. Other SQL databases
follow similar steps; you just need to supply your account credentials when connecting
so that you can reach the server.

1.3.1.2 Read an SQLite Database in Python


We use a Python package, sqlite3, to deal with SQLite databases. Once the sqlite3
package is imported, the general steps are:
1. Create a connection object that connects to the SQLite database.
2. Create a cursor object.
3. Create a query statement.
4. Execute the query statement.
5. Fetch the query result into results.
6. If all work is done, close the connection.
We use the SQLite database file ds_salaries.sqlite as the example here. We connect
to the database and list all the tables it contains.
import sqlite3

connection = sqlite3.connect('/content/ds_salaries.sqlite')
cursor = connection.cursor()

query = '''
SELECT name FROM sqlite_master
WHERE type='table';
'''

cursor.execute(query)
results = cursor.fetchall()
results

[('ds_salaries',)]
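
The six general steps can also be tried end to end without any file. This is a minimal sketch that assumes a temporary in-memory database (":memory:") and a small hypothetical salaries table; it also shows a "?" placeholder for passing a value into the query.

```python
import sqlite3

# 1. Connect; ":memory:" creates a temporary database instead of opening a file.
connection = sqlite3.connect(":memory:")
# 2. Create a cursor.
cursor = connection.cursor()

# Set up a small hypothetical table so there is something to query.
cursor.execute("CREATE TABLE salaries (job_title TEXT, salary_in_usd INTEGER)")
cursor.executemany(
    "INSERT INTO salaries VALUES (?, ?)",
    [("Data Scientist", 79833), ("Big Data Engineer", 109024)],
)
connection.commit()

# 3. Write the query; "?" is a placeholder that sqlite3 fills in safely.
query = "SELECT job_title FROM salaries WHERE salary_in_usd > ?"
# 4. Execute it, and 5. fetch the result.
cursor.execute(query, (100000,))
results = cursor.fetchall()
# 6. Close the connection when all work is done.
connection.close()

print(results)
```

Placeholders keep values separate from the SQL text, which avoids quoting mistakes when the value comes from user input.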

1.3.1.3 Play with the SQLite Databases


Using SQL statements, you can play with the SQLite Databases and get the data you
need.
query = '''SELECT *
FROM ds_salaries'''

cursor.execute(query)
results = cursor.fetchall()
results

[(None,
'work_year',
'experience_level',
'employment_type',
'job_title',
'salary',
'salary_currency',
'salary_in_usd',
'employee_residence',
'remote_ratio',
'company_location',
'company_size'),
(0,
'2020',
'MI',
'FT',
'Data Scientist',
'70000',
'EUR',
'79833',
'DE',
'0',
'DE',
'L'),
“’Course,” says Catty.
“I’m going to lend it to you, and I’ll tell you why,” says the captain.
“In the first place, it is good business for me and the bank. A bank
makes its money by lending to folks at interest. The more it lends
where it knows it will be paid back the more it makes. Then, a bank
has to help a community to grow and develop. Nothing like a good
bank to make a town. We furnish the capital, and men build houses
with it, and start stores and factories. Then they come and deposit
the money they make in the bank. Then the bank has more to lend
to somebody else, and it makes more money. Kind of an endless
chain. Really, our business is helping other folks to build up their
businesses. See?”
“Yes,” says Catty.
“I’ve had my eye on you and your father. I know about this young
Phillips that’s gone in with you. It was a good move. I know what
the women of this town are trying to do to you, and how you’ve
acted. I think you are honest. You are going at things right.
Somehow, tramps or no tramps, I’ve got confidence in you folks. Is
it true,” he says, “that you are teaching your father manners so he’ll
be equipped to meet folks when he’s a successful business man?”
“Yes,” says Catty.
“Learning them yourself, too?”
“Yes, sir.”
“Good!” says the captain. “It’s the kind of thing banks take into
consideration. Didn’t know that, did you?... Um!... It shows you
mean to succeed. The will to succeed is a fine asset, my boy, and
I’m proving it to you. Because it is one of the reasons I’m loaning
you this money. Get your father and Phillips to come and sign a note
and the money will be here for them.... Good luck to you.”
“Thank you, sir,” says Catty.
“Thank you,” says the captain, just like he was talking to another
business man.
We hustled back, you can bet.
“Git over to the bank quick and sign a note ’fore he changes his
mind,” says Catty, calm as a puddle of rain-water. “The money’s
there waitin’ fur you.”
“You’re joking,” says Jack.
“You’ll find the money hain’t no joke,” says Catty. “You and Dad—
git!”
Mr. Atkins shook his head sadlike. “’Tain’t no use,” says he. “I’m
fated to be a business man. There hain’t no hope, nohow.”
“Not a mite,” says Catty. “And, Jack, think up all the manners you kin
to teach us. Captain Winton says they’re an asset, whatever an asset
is. Anyhow, it’s somethin’ that helps you borrow money.”
“Catty,” says Jack, “I’ll cram the pair of you so full of manners that
you’ll look like a busy day in a crowded dancing-school.”
“Guess we kin manage to use all you got to spare,” says Catty, as
sober as a judge.
CHAPTER XII

Just on the edge of town was a big stock-farm where a company
raised Holstein cattle. There were half a dozen big barns—bigger
than any barns in our county, all painted white. Everything inside the
barns was white, too. The way those folks kept their cattle you
would have thought they were made out of diamonds instead of
beef just like any other cow. There was a bull there that we heard
had cost more than fifty thousand dollars. Catty and I were talking
about that bull and we figured that a steak off of him would cost
about a hundred dollars a bite. Eating that bull would be sort of like
as if cannibals was to capture Mr. Rockefeller and eat him.
All at once Catty says, “They must be paintin’ on them barns all the
time to keep ’em so white.”
“Dunno,” says I.
“Bet they do,” says he. “I’m goin’ out to find out about it. Comin’?”
We set out to walk. It took us ’most an hour to get there, and when
we did get there those barns looked even bigger than I had
remembered them. We went right through the gate and up to a little
house marked “Office.” There were two men in there, one in overalls
and another slim man, not exactly dressed up, but not looking as if
he spent much time washing out stalls. Catty rapped on the door,
and the man in overalls looked up and said to come in.
“I’m looking for the boss,” says Catty.
“I’m him,” says the man in overalls. “What is it?”
“You got fine, white, clean-lookin’ barns,” says Catty.
The man that didn’t have overalls sort of grinned, and the boss he
kind of grinned, too. “Much ’bleeged,” says he. “Is that what you
come to say?”
“No,” says Catty. “What I come to say was that that biggest barn
with Number Three painted on it don’t look as good as the rest.
Seems like it needed paintin’.”
Both men laughed. “Now I call that neighborly,” says the boss.
“Wasn’t figgerin’ on offerin’ to paint it for us, was you?”
“I was,” says Catty, and both the men laughed again.
“Fetch your brush?” says the boss.
Catty looked at him kind of solemn. “I come to talk business,” says
he, firm but polite. “If you’re figgerin’ on havin’ that barn painted, I’d
like to git the chance to bid on the job.”
The men laughed again, but this time the man that was in his
regular clothes says: “Let me have a word with this kid. He’s got
something on his mind, I guess.” The other man nodded.
“What makes you think you could paint that barn?” says the man.
“Well,” says Catty, “we painted Mr. Manning’s big warehouse, and we
done a good, satisfactory job. Mr. Manning said so. I kin refer you to
him.”
“Who is we?” says the man.
“Dad and me—and Jack Phillips. Jack’s a partner now. We calc’late to
be engineers, architects, contractors, painters, and interior
decorators,” says he.
“Is that all?” says the overall man. “Don’t you do plain and fancy
cookin’ and crochet lace?”
Catty looked at him full in the eye for a minute and then he says,
without a smile, “If you kin show me where there’s any money in it
for the firm, we’ll tackle it,” says he.
“By Jing!” says the man in the clothes, and he leaned forward a
little. “Tell me some more. Are you the outside man for the firm? Do
you bring in the business?”
“I’ve got most of it so far. We started in to do paintin’ and
paperhangin’ alone, but the folks in town took a dislike to us and the
wimmin got in another painter and paper-hanger that’s underbiddin’
us. We hain’t gettin’ much in that line. There wa’n’t nothin’ for us to
do but branch out. So we went into buildin’ and architecture and
sich.”
“Why didn’t the folks like you?”
Catty told him, and the man listened like he was interested.
“And you’re going to stick?” he asked. “You figure you can beat
public opinion?” Catty’s mouth shut tight a minute and his eyes got
bright.
“We’ll stick or bust,” says he. “We’re respectable. We hain’t shiftless
any more, leastways most of the time we hain’t. Dad has his
shiftless days, but they’re gettin’ fewer and fewer, and I keep my eye
on him sharp. Pretty soon he’ll be respectable all the way through.
We calc’late to give everybody a square deal, and if we kin jest keep
on gittin’ work until everybody sees we hain’t tramps, but
respectable folks, I don’t see but what we’ll git where I want to git.”
“Where’s that?” says the man.
“I want Dad should be the most respectable business man in this
county,” says Catty.
“Did you ever hear the like!” says the man to the boss.
“Never did,” says the boss. “That barn’s been needin’ paintin’ for
months. Can’t spare the men to do it off the regular work, and
couldn’t git anybody in town to tackle it. Just had to let it slide.”
“Can you get men to do this job?” asked the boss.
“Yes.”
“How many?”
“As many as Dad thinks is necessary to do it right. Dad he knows his
business well.”
“Give us a figger, then,” says the boss.
Catty thought a minute.
“Mister,” says he, “don’t all these buildin’s have to be painted about
once a year—inside and out?”
“Mostly whitewash inside,” says the boss.
“Then,” says Catty, “why
hain’t it good business for both of us if I was to give you a figger on
doin’ all the work by the year? Doin’ all the paintin’ and whitewashin’
necessary, and takin’ all that worry off of your hands.”
“Young man,” says the man with the clothes, “you have ideas. You
see where you’re going, and I’m going to make a bet that you get
there. That is a business-like proposition. You make a proposition
along that line.”
“Thankee, sir. Good mornin’. I’ll have that bid in so quick you’ll be
s’prised.”
We hustled back, and Catty got his father to hire a rig and drive right
out to the stock-farm. Mr. Atkins spent all the rest of the day there,
and Catty spent the rest of the day there sort of moving his father
along from one thing to another and seeing to it he didn’t lie down
in any shady spots or take any strolls back into the woods. Mr. Atkins
made heaps of measurements, and talked a lot to the boss, and
when he got to talking business and got really interested he acted
like he was another man. He spoke kind of sharp and brisk, and he
give you the idea that he knew what he was about. It was funny the
way he was changing. You couldn’t notice it much every day, but if
you looked at him as he was now and like he was when he first
came to town, you wouldn’t believe what you saw.
That night he and Jack Phillips sat up late going over figures, and
early the next morning they had things ready to show to the stock-
farm company.
“Catty ought to take the figures out,” says Jack. “He landed the job.”
Mr. Atkins looked at Catty and heaved a big breath.
“He done so,” says he, and his eyes sort of twinkled. “Catty’s a
terror. He’s a-ruinin’ my life. Fust I know he’ll make a rich man out of
me, and I’ll have to buy me one of them silk hats like he was talking
about, and nobody knows but what I’ll have to git me a cane to
wear Sundays.”
“Catty,” says Jack, “is the best man in this firm.”
Well, we walked out to the farm and showed those men the figures,
and Catty had listened so he was able to explain anything they didn’t
understand. The upshot of it was that the boss signed the contract
Jack had written, and Atkins & Phillips had landed the job of doing
all the painting for the Greenfields Holstein Farms for a year. It was a
whaling big contract, too. Catty figured they would make a fair living
out of it, even if they didn’t get another stroke of work to do.
“We’re growin’,” says he. “Now we got to save out money so as to git
a lot ahead to branch out with. What we need most right now is
money.”
“It’s what most folks needs,” says I. “I need a little myself. ’Most
always do.”
“Now,” says Catty, “we got to give a little time to Jim Bockers. We
got all the work we need this minute, so I kind of figger to git some
for Jim. The more he gits the more money he loses. I don’t calc’late
runnin’ us out of town ’s goin’ to be very profitable for some folks.”
We walked up the street to take a look at the cellar of Mr.
Witherspoon’s house, and just in front of the bank we saw Mr. Arthur
Peabody Kinderhook talking to Captain Winton. I heard Captain
Winton say:
“Don’t you think it would be advisable, Mr. Kinderhook, to interest a
certain amount of local capital in your enterprises? I’m sure a
number of our citizens would be willing to invest.”
“No.... No....” says Mr. Kinderhook. “I’m in business to make money
for myself. The profits from this manufacturing operation will be
handsome. I have the money to swing it without outside help. Why
should I let in anybody else?”
“There really doesn’t seem to be any reason,” said Captain Winton,
disappointed-like. “But I wish you would think it over.” Somehow it
had got out that what Mr. Kinderhook was going to manufacture was
a patent churn that got more butter out of cream than any other,
and did it so easy that it wasn’t any work at all. It was a patent
dingus that Mr. Kinderhook had the secret of, and folks was talking it
about that he would make millions of dollars out of it. About a dozen
times that week I heard one man or another sayin’ that he wished
Kinderhook would let him stick in a little of his savings.
The rumor was around town that day that Kinderhook had bought a
big piece of land along the railroad and was going to start in pretty
soon to build a factory. Folks said there would be maybe three or
four hundred people hired to work there, and everybody was getting
excited. I heard one man say it would double the population of our
town and make everybody’s property worth double what it had
been, and that if every one there didn’t get rich out of it, why, it
would be their own fault.
Catty told me that all sounded good to him. If lots of folks moved
there, then there would be houses to build and paint and paper, and
so Atkins & Phillips would make a lot more money. He was always
thinking about Atkins & Phillips and making money and getting so
respectable folks would be afraid to set down to the table with him.
It seemed like he didn’t have anything else in his mind. Why, he
even got to worrying about the way he talked and his father talked,
and said it wasn’t the way respectable folks used words. He said
they didn’t speak correct.
“Neither do I,” says I.
“But you’re goin’ to school to learn,” says he. “They teach you how
to talk in school, don’t they?”
“Yes,” says I.
“Why?” says he.
“Because,” says I, “they have to have school from half past eight in
the morning till half past three in the afternoon, and if they didn’t
think up a lot of different things to teach, why, they wouldn’t know
what to do with all their time.”
“Rats!” says he. “They teach you everything on purpose. They got a
reason for it. You learn figgerin’ so’s to be able to count money and
do business. That’s that. They teach you geography so’s you’ll know
where to find places in the world if you want to git to ’em or sell
things to ’em. They teach you writin’ and readin’ so’s you’ll be able
to write letters about business and read letters and printed things
about buyin’ and sellin’ goods. That’s why. All business. They run
schools just so’s you can learn how to make a livin’—with the
exception of teachin’ you how to talk. There hain’t but one reason
for that. Bein’ able to talk right is a mark of bein’ respectable.
There’s a certain way the best kind of folks talk, and if you kin talk
that way, why, right off everybody believes you’re one of them....
And that’s good business, too. Bein’ respectable is useful in business,
as Dad and me has found out. If we’d been respectable we wouldn’t
’a’ had all this trouble here.... So I’m goin’ to git after Dad to make
him talk right.”
You see every word he said had something to do with business or
being respectable. He had ’em on the brain. Table manners and
clothes and talking right—nothing but the idea of being respectable,
and so being able to do business the way it ought to be done, and
the more business you done, why, the more respectable you was.
That was Catty’s idea. Maybe he was right. I dunno.
Well, sir, a couple of weeks after that Mr. Kinderhook came into the
store and says, “Can I have a sign printed here?”
“You kin,” says Catty, and he called his father. “This gentleman,” says
he, “wants to have a sign painted.”
“I want a very large sign, sir,” says Mr. Kinderhook, beaming at Mr.
Atkins like he wanted to kiss him. “I want it erected on a piece of
property I have arranged to purchase as the site of my factory. The
sign is to be ten feet high and thirty feet long, and I wish to have it
white with enormous black letters—do you get the idea?”
“Want the letters to spell anythin’?” says Mr. Atkins, interested-like,
“or was you jest figgerin’ on any letters at all put on helter-skelter?”
Mr. Kinderhook looked at him kind of funny a minute, and then he
says: “I want the following words lettered: ‘This Is the Site of the
Kinderhook Farm Utilities Corporation. Our Enormous Factory Will Be
Completed January First.’ Can you manage it, my good man?”
“I kin,” said Mr. Atkins. “I calc’late I could put ’most anything onto
sich a sign. I kin put that on easy. If you want, I kin put on
somethin’ real hard.”
“That will do very well,” says Mr. Kinderhook, and he turned to walk
toward the door.
“Was you calc’latin’ on payin’ for it?” says Mr. Atkins.
“Certainly—certainly.”
“Um!... Int’rested to know how much it ’ll cost you?”
“To be sure.”
“Then why didn’t you ask?” says Mr. Atkins. “When folks gives an
order, and don’t worry none about how much they got to pay for it, I
always git a sneakin’ idea it’s because they don’t calc’late they’ll ever
have to pay. Funny notion, hain’t it?”
“Very,” says Mr. Kinderhook, with a funny kind of a grin. “But you
must know me, sir. My name is Kinderhook.”
“Seen you ’round town,” says Mr. Atkins. “Been sort of lookin’ you
over once or twice. Int’restin’ feller, you be, I sh’u’d say. Got
int’restin’ and everythin’. Always wear that high hat?”
“I have done so for years.”
“Thought so. Habit, hain’t it? Wa’n’t born with it on, was you?”
Mr. Kinderhook laughed like he saw a mighty good joke. “No,” he
said, “but my mother gave it to me soon after.”
“Price of that sign ’ll be a even hunderd dollars,” says Mr. Atkins.
“Perfectly satisfactory,” says Mr. Kinderhook, and he started for the
door again.
“If it’s so doggone satisfactory,” says Mr. Atkins, “jest suppose you
plunk down the money—now?”
“Before the sign’s completed? Why, that isn’t my way of doing
business, sir.”
“It’s mine—in some cases,” says Mr. Atkins. “One hunderd dollars—in
advance. No hunderd—no sign.”
“Don’t you trust me—me? I tell you I am Arthur Peabody
Kinderhook.”
“Heard you say so. Tell you how it is: ’Tain’t that I mistrust you exact
—and ’tain’t that I trust you. I dunno nothin’ about you. If I was to
build that sign and spend money for lumber and paint, and put a lot
of work onto it, I might worry about whether I was a-goin’ to git
paid—if I hadn’t got paid in advance. Worryin’ upsets my stummick
and puts me off’n my meals. That’s the idee, mister.”
Mr. Kinderhook laughed again, and with a pompous kind of gesture
took out his pocket-book and threw five twenty-dollar bills onto the
counter. “There you are,” he says, in a grand kind of way. “That
shows you I’ve got the money.”
“What it shows,” says Mr. Atkins, “is that I got the money. That’s
what int’rests me.... Afternoon, mister.”
Catty was staring at his father and so was I. The whole business
wasn’t like Mr. Atkins at all. There was something shrewd about it
that didn’t seem like Catty’s father. And he seemed like he was
interested in getting money—which gen’ally he wasn’t. It sort of
showed what he could be like if he wanted to—the kind of a man
folks wouldn’t smouge very often.
“What ails you, Dad?” says Catty. “I never seen you act so before.”
“Um!... Keep your eye peeled, Catty, and maybe you’ll see me act
like it ag’in. Somethin’ about that feller that don’t set right,
somehow. There’s somethin’ about that feller—somethin’ about that
feller—” He scratched his head and bit his thumb and rapped his
knuckles on the counter. “Now did I ever see that feller before, or
didn’t I?... And if I did, where did I?... And if I didn’t, what makes
me think I did?... Um!... If ever I seen him it was some place and
doin’ somethin’ that kind of set me ag’in’ him.... Kind of funny. Set
my teeth on edge, that feller did.”
“But he’s a millionaire, Dad. Maybe we kin make lots of money out of
him.”
“Millionaire, hey? Don’t say. Wa-al, I swan to man!... I’m a-goin’ to
set down and think about that man, and remember if I kin
remember him. I’ll call him to mind if I have to set and remember
every man I ever seen since I was knee-high to a milkin’-stool. I’ll
check ’em off one by one, I will.... It’s made me itch, I’m that
curious.... Catty, I hain’t goin’ to do another tap of work till I
remember who that feller is—if he’s anybody.”
And, just as he always did, Mr. Atkins kept his word to the letter.
CHAPTER XIII

It seemed like the town got more and more excited every day about
Arthur Peabody Kinderhook and his factory. Nobody talked about
much of anything else, and every afternoon you could see a dozen
men out walking around the field where the factory was to be,
pacing off distances and fooling themselves into thinking they knew
just where the buildings were to be, and how big they would look
and everything. And Mr. Atkins was on strike.
Yes, sir, since the day he sold the sign to Mr. Kinderhook he hadn’t
done a tap of work, but had just sat around thinking and thinking
and thinking, laying to remember when he had seen the man before
and where he had seen him.
Then the news got around that Kinderhook had agreed to sell
Captain Winton some stock in his factory, and folks were almost
crazy. They thought it meant that maybe they could get some, and if
they did get some they would get rich and never have to work any
more, but just sit around and draw dividends and travel and smoke
five-cent cigars. But Kinderhook wouldn’t sell to anybody else.
And then, one morning, I went down to the store and Mr. Atkins was
at work again. I knew right off he had thought where he had seen
Mr. Kinderhook, because he was the kind of man who kept his word,
and he had said he wouldn’t work till he remembered.
“Mornin’!” says I. “I see you’ve found out about the Kinderhook
feller.”
“Uh!...” says Mr. Atkins, and Catty grinned.
“Dad’s remembered,” says he, “but he’s a-thinkin’ it over. He won’t
tell us till he’s got it figgered out to suit himself. I don’t care so long
as he sticks at work and keeps on tryin’ to be respectable. I got him
now so he kin eat pie with a fork. It was a chore to teach him, but
it’s done. Next he’ll be eatin’ soup without makin’ a noise like a horse
runnin’ through a mud-puddle.”
Catty said all this as sober as a judge. He wasn’t poking fun at his
father nor being impudent. He was just saying what was so in the
best way he knew how.
All of a sudden Mr. Atkins spoke up.
“Wee-wee,” says he, “what’s your notions about medicine-shows?”
“Medicine-shows?” says I. “What about ’em?”
“Regard ’em as proper and respectable?” says he.
“They’re lots of fun,” says I, “especially when they pull folks’s teeth
free and have real Injuns doin’ war-dances and things.”
“You like ’em, then?”
“Sure,” says I.
“But s’posin’ the feller sells medicine for a dollar a bottle that he
guarantees to cure up rheumatics and cramps in the stummick and
chills and warts and corns and freckles and backache and earache—
and supposin’ that medicine hain’t worth the speck on a toad’s ear to
cure anythin’? How about that?”
“Did the doctor know it?” says I.
“Yes,” says he.
“Then he was a cheat,” says I.
“To be sure,” says he. “A cheat. I calc’late that’s what he was—and
maybe worse. How’d you look at sich a feller—as bein’ what Catty
calls respectable, or not?”
“Not,” says I.
“Um!... My way of thinkin’, too. If you seen sich a feller runnin’ some
other business and aimin’ to get aholt of folks’s money, what would
your notions be about how he was goin’ to treat ’em?”
“I’d guess,” says I, “that he was goin’ to cheat ’em, too.”
“My idee,” says Mr. Atkins, and he went into the back room and
stirred around for half an hour without saying another word. Catty
and I talked about lots of things and told what we was going to do
when we got rich and grown up and all that. Catty was going to own
some kind of a business that was the most respectable business
there was in the world. He hadn’t picked out what the business
would be yet, because he couldn’t figure what was the most
respectable. I told him being a minister looked awful respectable to
me, but he says that wasn’t a business, but only marrying folks on
week-days and talking on Sundays, and that there wasn’t any money
in it, anyhow, so far as he could see. He thought some about being
a judge or a Senator. I didn’t care for either of those ways of earning
a living, myself. My leaning was toward something better than either
of them. I aimed to be a clown in a circus or else a cowboy and
discover a gold-mine and all that. I’d changed my mind some. Once
I was going to be a circus performer—one of the trapeze kind—and I
set some angleworms to stewing on top of the barn. Everybody
knows circus fellers git so supple by rubbing angleworm oil onto
themselves. But when my worms was stewed out and I went
anywheres near them I made up my mind I didn’t care about
trapeze-performing if I had to butter myself with that kind of
perfume.
Just when we were arguing hardest Mr. Atkins came back and says,
sudden as a thunderclap:
“This here Kinderhook man used to run one of them snide medicine-
shows. Wore a silk hat and pulled teeth and had tame Injuns and
all.”
You could have knocked me down with a puff-ball. Why, this
Kinderhook man looked as if he’d never owned anything less than a
national bank, and he was the kind of a fellow that you would pick
to be the boss deacon of a church and all that. And him pulling teeth
and selling snide medicine!
Catty slid down off the counter. “Then,” says he, “he aims to cheat
the folks of this town out of their money.”
“And serve ’em right,” says Mr. Atkins.
“That hain’t no way to talk, Dad, and you’d know it if you was
respectable. But you’re gettin’ respectabler every day. It ’ll come if
you jest have patience.”
“Don’t want it to come too hard,” says Mr. Atkins.
“We got to stop it,” says Catty.
“Codfish!” says Mr. Atkins. “Wouldn’t lift my hand for folks that’s
acted like these.”
“Dunno’s I care so much about the folks,” says Catty, “but the idea
of anybody gettin’ cheated sort of riles me. I’m goin’ to tell folks who
Kinderhook is.”
“Think they’ll b’lieve you?” says Mr. Atkins. “Not much. Who be you?
You’re a young tramp that folks wants to run out of town, and I’m
an old tramp that they’re tryin’ to put out of business. If we was to
step in and interfere, what you s’pose would happen? They’d put us
in jail, most like. They wouldn’t b’lieve our word ag’in’ Kinderhook’s.
Better keep your mouth shut, Catty.”
Catty stood and thought a few minutes, and then he shook his head
and said he guessed his father was right. “But we know what’s true,”
he says, “and it’s our duty as respectable folks to put a stop to it....
And I’m a-goin’ to.”
“How?” says I.
“Hain’t but one way,” says he, “and that is to git proof that folks ’ll
have to believe. We kin do that, Wee-wee, and we’ll go to work and
watch Kinderhook, and foller him and nose out jest what he’s up to.
It’s our job. We kin do it between-times while I’m helpin’ to run our
own business and make Dad respectable. Want to help, Wee-wee?”
“Be reg’lar detectives?” says I.
“Sure,” says he.
“You bet,” says I. “But why not tell folks right out?”
Catty looked at me like he was sorry for anybody that didn’t have
any more brains than to ask that.
“Because,” says he, “folks is all het up over this here man
Kinderhook. They think he’s the greatest man in the world, and
anybody would git in trouble that said a word ag’in’ him. Anybody
would, but what would folks do to us? Lemme ask you that. They
want to run us out anyhow, but if we was to spread a story about
Kinderhook they’d ride us out on a rail.”
“Guess you’re right,” says I, “but how’ll we go about provin’ it? And
when we’ve got it proved, what ’ll we do?”
“I dunno,” says Catty. “That ’ll have to come when it comes. Main
thing is for us to tend to our business and watch our chance. We kin
ketch Kinderhook at it if he’s meanin’ some snide game. He’ll be
showin’ it somehow.”
“Whatever you say,” says I.
From that minute I was a heap more interested in Kinderhook than I
had been before. As soon as you find out something like that about
a man you begin to notice things, and to watch, and to figure out
what he means when he says anything. It’s a lot of fun, and I didn’t
want to do anything else but just trail ’round after him, but Catty
wouldn’t have that. He wouldn’t neglect his regular business, and he
wouldn’t let down on learning manners and then teaching his father
what he had learned. At the rate Catty was going I figured out he
would be the most respectable and the politest man in the world by
the time he was old enough to vote. Most folks get manners sort of
by accident. They just sop manners up, anyhow, as they go along,
and never notice it, but Catty made a business of it same as he’d
make a business of learning to pull teeth or cut off legs like a doctor.
There was quite a lot of talk around town about Jack Phillips coming
into business with the Atkinses and about how they managed to find
something to do in spite of what the women thought about it. It
made the folks that didn’t like Catty and his father more determined
than ever to get rid of them, and you could hear women and men
talking it over almost any time if you were to listen. More than one
woman came to my mother to complain about my going around with
Catty so much, and a couple of men spoke to Dad, but they never
did it more than once. I heard Dad say to one man:
“Look here, Mr. Withers, my son plays with the Atkins boy because I
want him to. I’ve studied that boy, and if he isn’t worth half a dozen
of the ordinary kids in this town then I’m willing to pick up and move
away. Catty’s got brains and ambition and he’s aboveboard, with
nothing sneaking about him. You say you won’t let your boy play
with mine if the Atkins boy is around. Well, I’m satisfied. If anything
is wrong with Catty Atkins, then I hope Wee-wee catches it.”
I guess Jim Bockers was beginning to get sick of his bargain about
this time. I know of a dozen painting or paperhanging jobs that
Catty worked up and made a bid on. His bid every time was just a
little less than it would actually cost to do the work—and then the
folks would go to Jim and Jim would have to live up to his
advertisement and do the work for even less. It was rotten business,
and Catty said he couldn’t last long.
One day Catty says to me: “I wisht you’d drop in to Jim Bockers’s
when you get a chance and sort of find out how he’s gettin’ along.
We’re making some money off his rent, but he’s ruined the
paperhangin’ business. If it wasn’t for that stock-farm and a few
outside jobs that part of our business would be dead. We ought to
be makin’ twice what we are, and if anything comes of this
Kinderhook boom we ought to almost git rich. Jest kind of sound Jim
out.”
So I dropped in that afternoon. Jim had a nice shop with lots of wall-
paper in rolls all put away in little square pigeonholes, and shelves of
paint and brushes, and a shop full of ladders and things. It was a
high-toned place, all right, but Jim didn’t look very happy.
“Howdy, Mr. Bockers?” says I. “How’s business?”
“Lots of business,” says Jim, as gloomy as an undertaker.
“You ought to be grinnin’, then,” says I.
“Hain’t no money into it,” says Jim.
“How’s that?” says I.
“Them Atkins fellers,” says he.
“But you’re gittin’ all the work,” says I.
“The more I git the more I lose,” says he.
“How’s that?”
“Why, my sister-in-law, she got me to open this shop to run them
folks out. She says they didn’t have no capital and that I could
underbid ’em and bust ’em in a couple of weeks. That looked all
right to me, ’cause she lent me some money to put with what I’d
saved, and I started in.”
“Sounds good,” says I.
“Sounded too good,” says he. “I figgered they’d bid so as to make
money, and that I could underbid ’em down to cost and break even.
I could ’a’ stood that—just to break even for a while till they was got
rid of, and then I’d have all the business to myself.”
“Didn’t it work?”
“Work nothin’! Them Atkinses done me. They’re sharpers. They
cheated me.”
“How?” says I, gettin’ interested.
“They bid too low. They bid below cost themselves, and then I had
to take the business for less ’n that. It cost me money every time I
done a job. Calc’late I’ve lost a couple hundred dollars since I come,
and no outlook for doin’ better.”
“Why don’t you git out?” says I.
“Can’t afford it. Hain’t got the money to move. Got all this stock;
besides, my sister-in-law’s so dead set on runnin’ them fellers out of
town that I dassen’t quit.”
“Run away,” says I.
“I’d lose my stock,” says he, “and I’ve lost more ’n enough already.”
“Um!...” I says, thinking it over. “What if you could sell your stock?”
“Got a lease on this store, or rather that sister-in-law of mine has. It
runs for a year. The rent’s got to be paid.”
“That’s her lookout, hain’t it? She got you into this mess, didn’t
she?”
“Calc’late she did.”
“Well?” says I.
“Wa-al,” says he, after a few minutes, “I wisht I could find
somebody that would pay me somethin’ for this stock. I kin lose
money on it and still be ahead. I’d sell and scoot if I could git cash
money.”
“You stay where you be,” says I, and off I ran to find Catty.
I found him in the store, lecturing his father about clothes and
telling him how he ought to buy a good suit, with a dress-up hat for
Sundays, and how he had to do it with the first money they could
spare. “It means a lot. You go around lookin’ swell, and folks won’t
remember how you used to look. First you know you’ll be as
respectable as anybody. You’ll be gettin’ elected a director in the
bank.”
“Catty,” said I, busting right in on him, “Jim Bockers is ready to quit.
He’ll sell out for cash, and scoot.”
“Honest?” says he.
“Honest Injun,” says I. “Come on.”
I looked around Catty’s shop. They didn’t commence to have the
stock Jim did. It would be fine if they could get Jim’s and move it in.
Catty and I hustled over to Jim’s.
“Hear you’re willin’ to sell,” says Catty.
“For cash,” says Jim.
“And sign an agreement sayin’ you won’t go into business in this
town again for ten years?” says Catty.
“You bet. I got all the business here that I want for a hundred
years.”
“How much you want for everything?”
“Dunno till I take inventory.”
“Let’s take it,” says Catty, and in a minute we went at it hammer and
tongs. It took us till late that night, but when we were through we
knew exactly what that stock and stuff of Jim’s had cost.
“Set a price,” says Catty.
Jim did, and Catty just laughed. Right off he told Jim what he would
pay, and it was a lot less. “I’m lookin’ for a bargain,” Catty says.
“That’s my price, cash. You kin take it or leave it. I’ll give you ten
minutes to think it over, and if you don’t take it then the offer is all
off and we don’t make a deal.”
“You’re robbin’ me,” says Jim.
“You tried to rob Dad and me,” says Catty. “You’re gittin’ what’s
comin’ to you.”
Jim he argued and fussed and hollered and haggled. But Catty just
kept looking at the clock. “Time’s up,” he says. “What’s your
answer?”
Jim he goggled and strangled, but there wasn’t anything for it. He
had to take his medicine.
“All right,” says he. “Cash.”
“Cash,” says Catty, “as soon as the bank opens.”
Early in the morning Catty and I went to Mr. Wade in his office full of
Napoleons, and had him draw up what papers we ought to have,
and then we took Mr. Atkins and Jack Phillips to the bank and got
the money. Jim Bockers signed the paper that Mr. Wade said was a
bill of sale, and hustled for the train. He wanted to get away before
his sister-in-law found out.
Catty was tickled. “Now we’re all right,” says he. “I figger we made
close to two hunderd dollars on this deal, and we got the paintin’
and paperhangin’ business of this town right by the ear. Anybody
that wants some done has got to come to us. I guess maybe this
hain’t a move toward gettin’ respectable.”
They set to work and moved Jim’s stock over to their own store and
put the ladders and scaffolds and things in the shed. It was the first
time they had really been in shape to do business. Even Mr. Atkins
acted kind of tickled. He hated to show it, but every once in a while
you could see he was really getting interested in the business and
that work wasn’t as disagreeable for him as it used to be.
Catty was moving along toward where he wanted to be.
CHAPTER XIV

A day or two after that Catty and I were sitting on the platform of
the station, waiting for the train to come in with some things Jack
Phillips had ordered. Along came Captain Winton, the president of
the bank, and Mr. Moss, the hardware-man. They sat down a little
ways from us and began to talk.
“We’ve got to get ready for it,” says Captain Winton. “It won’t be
long before mill-hands will be moving in here with their families, and
we haven’t any places for them to live. I’ve been thinking it over,
and it looks to me like some of us could get together and build a
dozen houses or so and pick up a nice profit—or make a good
income from rents.”
“I’ve been thinking about that, too. You own a piece of land down
the new factory way, don’t you?”
“Ten acres,” says Captain Winton. “We could run streets through and
start in by building a dozen cottages. Then, if the thing went as I
expect, we could put up more.”
“How much would we have to put into it?”
“Well, my guess is that we could put up the houses for a couple of
thousand apiece—maybe twenty-five hundred. The bank would loan
on each house and lot fifteen hundred. A dozen, including land and
everything, would stand us in thirty-six thousand, and we would
have to raise half of that.”
“I’ve got a few thousand lying loose,” says Mr. Moss. “I wouldn’t
want to put everything into this building thing, because I’m still
hoping to persuade Mr. Kinderhook to sell me a block of stock—say
five thousand dollars. He’s pretty friendly with me.”
“I don’t know. He seems to want to hang onto it.”
“You got some,” says Mr. Moss.
“That was on account of the bank, I guess. He wanted to have us
interested.... But I think we can get four or five men here to go into
this building thing. We could form a company. I’ll put in my land at
five thousand and take another five thousand besides.”
“You can count on me for two or three thousand, and Gage ’ll come
in for some, and so will Gordon and Piddlecomb and Bockers.”
“Tell any of them you see to meet at the bank this afternoon. We
want to go at it as quickly as we can.”
Then the train came in. We didn’t hear any more, but there didn’t
seem to be any more to hear. On the way back to the store Catty
was pretty quiet. As soon as we got there he hollered for Jack
Phillips.
“Jack,” he says, “there’s goin’ to be a dozen houses built here all in a
bunch, and we got to land the job.”
“Tell me about it,” says Jack.
Catty told him all we had heard, and Jack got quite excited. “I
wonder how they’ll let the contract. On bids, probably.”
“With Mr. Gage and Mr. Bockers mixed up in it we won’t have much
of a chance,” says Catty.
“That’s right,” says Jack, and he looked discouraged, but Catty spoke
right up and says: “We got to have a chance. We got to land that
job. There’s big money in it.”
“Pretty big. We ought to make five thousand dollars, anyhow, and
maybe more.”
“If they know we’re biddin’ we’ll never land it,” says Catty, “so we
got to fool ’em. It’s fair. We’ll do ’em as good a job as anybody if we
get a chance, and it hain’t right for them to act like they will toward
us.... I guess I got an idee. You’ll have to do it, Jack. We’ll git up a
company and call it by a fancy name. It ’ll be a company over to
Harleyville. You’ll have to go over there and have letter-heads
printed and kind of make believe have an office, and we kin do all
the business by mail. Then, when the contracts are all signed up,
we’ll be the folks that do the work. How’s that?”
“Bully,” says Jack.
I guess Catty was right about the chance the Atkinses would have
had to land the contract, because I heard Mr. Gage and Mr. Bockers
talking in Gage’s back yard, and they both said that it didn’t matter
what kind of a bid the Atkinses made they wouldn’t let them do the
work.
“My wife’s dead set on getting those people out of town,” says Mr.
Gage.
“So’s mine. If they have the nerve to make a bid, why, we’ll just
throw it out.”
I told this to Catty and he grinned a little and then squared up his
jaw. In a day or two there was an advertisement for plans and bids
in the paper, and Jack went over to Harleyville. He had been working
on plans and specifications, and he had had letter-paper printed with
“North American Construction Company” on it. He signed his letters
that way, with only an initial “P” under it in pen and ink. They were
fine letters, too, and guaranteed the kind of work that would be
done—and it would be the best work, Jack said. He said he wanted
Atkins & Phillips to get to be known everywhere as a firm that did
better work than anybody else and always did what it guaranteed to
do.
Atkins & Phillips didn’t make any bid at all. Mr. Wade was appointed
by the North American Construction Company to be its agent in
town, and it gave him quite a reputation, because the name
sounded as if it was a whopping big company. Mr. Wade knew all
about it, and the way he laughed was enough to make your sides
ache. He said it was regular Napoleon tactics, fooling the enemy and
hitting them hard where they weren’t looking for it. He got right on
the job and kept after Captain Winton, who wouldn’t care himself
who got the job, and he kept after Mr. Gage and Mr. Bockers, until
they thought the North American Construction Company was about
right. He said the company would put up a bond to do the work
right.
There were three other bids from out-of-town companies, but
between Jack’s letters and Mr. Wade the North American landed the
job and the contracts were signed by Captain Winton, president of
the building company, and by Mr. Wade as agent for the North
American, and the bond was made and everything. Nobody said a
word, and then the lumber began to come and carpenters from out
of town—and the work started.
Well, maybe you think there wasn’t a row then when folks found out
Jack Phillips was in charge of the job and Mr. Atkins was a kind of
foreman, and that the whole work was actually being done by Atkins
& Phillips. Mr. Gage got a lawyer and Bockers got one and they tried
every which way to break the contract, but it was no go. Captain
Winton sort of grinned and wouldn’t have anything to do with it. He
said if the Atkins folks were smart enough to get the contract he
guessed they were smart enough to carry it out, and he told them to
come to the bank if they needed any money—which they did.
It was right after this that Catty made his father go to the clothing-
store and buy two suits of clothes, one for business and one for
Sundays, and the right kind of hats to go with them. Well, sir, when
Mr. Atkins got dressed up in those duds you wouldn’t have known
him, and I guess he hardly knew himself. He kept his hair cut now,
and his beard trimmed down into a point, and if he wasn’t as good-
looking a man as we had in town, I’ll eat him. He didn’t look any
more like a tramp than Mr. Rockefeller did.
Those clothes seemed to make quite a difference in him, too. He
acted different. He didn’t act so much like Mr. Atkins any more, but
like another man that wasn’t shiftless at all and really liked to work.
That is, he acted that way part of the time, and when he was feeling
shiftless he sort of kept out of sight so folks wouldn’t see it. Catty
said his father was really getting interested in the work, and he was
hoping he would get interested in being respectable.
From that day nobody ever saw Mr. Atkins in any clothes but good
ones. He didn’t wear his painter’s suit, though he wanted to. Catty
wouldn’t let him. Every morning before Mr. Atkins could get out of
his room Catty looked him all over to see he was dressed right. It
was funny. It was almost as if Catty had taken a jack-knife and
whittled out a man, his father was getting to be so different to what
he used to be.
Catty and I began trailing around after Mr. Kinderhook whenever we
got a chance, but we hadn’t even seen anything that looked
suspicious. He just looked rich and important, and he acted rich and
important and puffed up. To see him sitting on the hotel porch like
he owned the whole state, and being kind to folks and behaving
toward them just like he thought they were as good as he was, was
a sight. He never talked about his factory and what he was going to
manufacture unless somebody started it first and then urged him on,
and then he acted sort of like the subject tired him and he didn’t
want to be bothered with it. We listened to him a dozen times, and
couldn’t see how he was planning to gouge anybody.
“Maybe he’s reformed,” says I.
“Bet he hain’t. He don’t look reformed,” says Catty. “If he was the
kind of man that was willin’ to make money sellin’ cheat medicine to
old women with the rheumatiz that wouldn’t help ’em a bit and
maybe made ’em worse, why, he’s bad yet. But I can’t see how he’s
plannin’ to be bad.”
“It’s sure he hain’t tryin’ to sell any shares in his factory.”
“Looks that way,” says Catty. “’Course he sold some to Captain
Winton.”
“But not to anybody else, and everybody is crazy to buy.”
“I heard him say this mornin’ that his company was all incorporated,
whatever that is, and he expected to start in buildin’ soon,” says
Catty. “I wonder what ‘incorporated’ is?”
“Haven’t any idee,” says I. “Maybe it means somethin’ like planned
out.”
“Maybe. I heard him tell Mr. Gage that he didn’t have any patent on
this churn of his, because if he was to patent it he would have to
give away the secret and somebody would sell it. He says there’s a
secret part, and nobody kin find out how to make it, so he hain’t
goin’ to git a patent at all, but just go to work and manufacture and
prevent anybody from findin’ out how it’s done.”
“Sounds kind of fishy,” says I.
“Everybody swallers it down,” says Catty, “but if there’s any cheatin’
in this I’ll bet it’s got somethin’ to do with that secret.”
That very afternoon we didn’t have anything
else to do, so we fussed around close to Mr. Kinderhook, keeping
watch of him and listening to what he had to say. After a while he
got up and walked down the street, and we trailed after him until he
got to the station. He went into the telegraph-office and wrote out a
message. We waited till he was gone and then we went right in
where Tom Purvis was clicking the keys. We could do that because I
knew Tom mighty well and he didn’t mind. We stood right back of
Tom’s chair, making believe we were interested in what he was doing
and how he sent messages, but really we wanted to get a sight of
what Mr. Kinderhook had written. Pretty soon Tom came to it and
began clacking away. I could read it over his shoulders. It was
addressed to a man by the name of Matthew Binger in New York,
and it said:

Come at once. Crop ripe.

Now that was a funny message, it seemed to me, because there
weren’t any crops ripe just then, and Mr. Kinderhook wasn’t
interested in crops if they had been. Catty and I went off after that,
but we couldn’t make any head nor tail of it. It just looked silly, but
anyhow we made up our minds we would keep meeting trains till
this man Binger came, and we would see what he was up to and
what crop Mr. Kinderhook had in mind.
Two days later a stranger got off of the train. He was short and fat,
and he looked almost as rich as Kinderhook did.
“Bet that’s Binger,” says Catty.
“Bet it is, too,” says I.
So we rode with Pazzy Bills back to the hotel and saw the man
register. Sure enough, his name was Matthew Binger and he asked if
a man named Kinderhook was stopping there. The clerk allowed
there was, and Mr. Binger asked the clerk if he would take up his
card. The clerk done so, and pretty soon down comes Mr.
Kinderhook, peering around like he was looking for somebody. He
didn’t recognize Mr. Binger any more than as if he had never heard
of him till the clerk says, “That’s the gentleman that wanted to see
you, Mr. Kinderhook,” and Kinderhook walked over, holding the card
in his hand and reading it.
“Mr. Binger?” says he, looking at the card again, as if he was making
sure he had the name right.
“Matthew Binger—yes. And is this Mr. Arthur Peabody Kinderhook?”
“It is. What can I do for you?”
“I have come down from New York to talk business with you. Where
can we go and be quiet?”
“Is your business secret, Mr. Binger? Because if it is, we can’t talk. I
don’t do secret business. There’s nothing about my business that
any of my good friends in this town can’t hear. Whatever you’ve got
to say to me can be said right out on the porch—or it can’t be said
at all.”
Mr. Binger he acted sulky, but it looked like there wasn’t anything he
could do about it, so they went out and sat in red rocking-chairs,
and Catty and I sat on the steps close by.