Machine Learning Pocket Reference
Working with Structured Data in Python

Matt Harrison
Machine Learning Pocket Reference
by Matt Harrison
Copyright © 2019 Matt Harrison. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North,
Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales
promotional use. Online editions are also available for most titles
(http://oreilly.com). For more information, contact our
corporate/institutional sales department:
800-998-9938 or [email protected].

Acquisitions Editor: Rachel Roumeliotis


Development Editor: Nicole Tache
Production Editor: Christopher Faucher
Copyeditor: Sonia Saruba
Proofreader: Christina Edwards
Indexer: WordCo Indexing Services, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
September 2019: First Edition
Revision History for the First Edition
2019-08-27: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781492047544 for release
details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Machine
Learning Pocket Reference, the cover image, and related trade dress are
trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the author, and do not represent
the publisher’s views. While the publisher and the author have used good faith
efforts to ensure that the information and instructions contained in this work
are accurate, the publisher and the author disclaim all responsibility for errors
or omissions, including without limitation responsibility for damages resulting
from the use of or reliance on this work. Use of the information and
instructions contained in this work is at your own risk. If any code samples or
other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to
ensure that your use thereof complies with such licenses and/or rights.

978-1-492-04754-4
[LSI]
Table of Contents

Preface

Chapter 1: Introduction
Libraries Used
Installation with Pip
Installation with Conda

Chapter 2: Overview of the Machine Learning Process

Chapter 3: Classification Walkthrough: Titanic Dataset
Project Layout Suggestion
Imports
Ask a Question
Terms for Data
Gather Data
Clean Data
Create Features
Sample Data
Impute Data
Normalize Data
Refactor
Baseline Model
Various Families
Stacking
Create Model
Evaluate Model
Optimize Model
Confusion Matrix
ROC Curve
Learning Curve
Deploy Model

Chapter 4: Missing Data
Examining Missing Data
Dropping Missing Data
Imputing Data
Adding Indicator Columns

Chapter 5: Cleaning Data
Column Names
Replacing Missing Values

Chapter 6: Exploring
Data Size
Summary Stats
Histogram
Scatter Plot
Joint Plot
Pair Grid
Box and Violin Plots
Comparing Two Ordinal Values
Correlation
RadViz
Parallel Coordinates

Chapter 7: Preprocess Data
Standardize
Scale to Range
Dummy Variables
Label Encoder
Frequency Encoding
Pulling Categories from Strings
Other Categorical Encoding
Date Feature Engineering
Add col_na Feature
Manual Feature Engineering

Chapter 8: Feature Selection
Collinear Columns
Lasso Regression
Recursive Feature Elimination
Mutual Information
Principal Component Analysis
Feature Importance

Chapter 9: Imbalanced Classes
Use a Different Metric
Tree-based Algorithms and Ensembles
Penalize Models
Upsampling Minority
Generate Minority Data
Downsampling Majority
Upsampling Then Downsampling

Chapter 10: Classification
Logistic Regression
Naive Bayes
Support Vector Machine
K-Nearest Neighbor
Decision Tree
Random Forest
XGBoost
Gradient Boosted with LightGBM
TPOT

Chapter 11: Model Selection
Validation Curve
Learning Curve

Chapter 12: Metrics and Classification Evaluation
Confusion Matrix
Metrics
Accuracy
Recall
Precision
F1
Classification Report
ROC
Precision-Recall Curve
Cumulative Gains Plot
Lift Curve
Class Balance
Class Prediction Error
Discrimination Threshold

Chapter 13: Explaining Models
Regression Coefficients
Feature Importance
LIME
Tree Interpretation
Partial Dependence Plots
Surrogate Models
Shapley

Chapter 14: Regression
Baseline Model
Linear Regression
SVMs
K-Nearest Neighbor
Decision Tree
Random Forest
XGBoost Regression
LightGBM Regression

Chapter 15: Metrics and Regression Evaluation
Metrics
Residuals Plot
Heteroscedasticity
Normal Residuals
Prediction Error Plot

Chapter 16: Explaining Regression Models
Shapley

Chapter 17: Dimensionality Reduction
PCA
UMAP
t-SNE
PHATE

Chapter 18: Clustering
K-Means
Agglomerative (Hierarchical) Clustering
Understanding Clusters

Chapter 19: Pipelines
Classification Pipeline
Regression Pipeline
PCA Pipeline

Index


Preface

Machine learning and data science are very popular right now
and are fast-moving targets. I have worked with Python and
data for most of my career and wanted to have a physical book
that could provide a reference for the common methods that I
have been using in industry and teaching during workshops to
solve structured machine learning problems.
This book is what I believe is the best collection of resources
and examples for attacking a predictive modeling task if you
have structured data. There are many libraries that perform a
portion of the tasks required, and I have tried to incorporate
those that I have found useful as I have applied these techniques
in consulting or industry work.
Many may lament the lack of deep learning techniques. Those
could be a book by themselves. I also prefer simpler techniques,
and others in industry seem to agree: use deep learning for
unstructured data (video, audio, images), and powerful tools
like XGBoost for structured data.
I hope this book serves as a useful reference for you to solve
pressing problems.

What to Expect
This book gives in-depth examples of solving common structured
data problems. It walks through various libraries and
models, their trade-offs, how to tune them, and how to interpret
them.
The code snippets are meant to be sized such that you can use
and adapt them in your own projects.

Who This Book Is For


If you are just learning machine learning, or have worked with
it for years, this book should serve as a valuable reference. It
assumes some knowledge of Python, and doesn’t delve at all
into syntax. Rather it shows how to use various libraries to
solve real-world problems.
This will not replace an in-depth course, but should serve as a
reference of what an applied machine learning course might
cover. (Note: The author uses it as a reference for the data
analytics and machine learning courses he teaches.)

Conventions Used in This Book


The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames,
and file extensions.
Constant width
Used for program listings, as well as within paragraphs to
refer to program elements such as variable or function
names, databases, data types, environment variables,
statements, and keywords.

TIP
This element signifies a tip or suggestion.

NOTE
This element signifies a general note.

WARNING
This element indicates a warning or caution.

Using Code Examples


Supplemental material (code examples, exercises, etc.) is available
at https://github.com/mattharrison/ml_pocket_reference.
This book is here to help you get your job done. In general, if
example code is offered with this book, you may use it in your
programs and documentation. You do not need to contact us
for permission unless you’re reproducing a significant portion
of the code. For example, writing a program that uses several
chunks of code from this book does not require permission.
Selling or distributing a CD-ROM of examples from O’Reilly
books does require permission. Answering a question by citing
this book and quoting example code does not require permission.
Incorporating a significant amount of example code from
this book into your product’s documentation does require
permission.
We appreciate, but do not require, attribution. An attribution
usually includes the title, author, publisher, and ISBN. For
example: “Machine Learning Pocket Reference by Matt Harrison
(O’Reilly). Copyright 2019 Matt Harrison, 978-1-492-04754-4.”

If you feel your use of code examples falls outside fair use or
the permission given above, feel free to contact us at
[email protected].

O’Reilly Online Learning


For almost 40 years, O’Reilly Media has provided technology
and business training, knowledge, and insight to help
companies succeed.
Our unique network of experts and innovators share their
knowledge and expertise through books, articles, conferences,
and our online learning platform. O’Reilly’s online learning
platform gives you on-demand access to live training courses,
in-depth learning paths, interactive coding environments, and
a vast collection of text and video from O’Reilly and 200+ other
publishers. For more information, please visit http://oreilly.com.

How to Contact Us
Please address comments and questions concerning this book
to the publisher:

O’Reilly Media, Inc.


1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples,
and any additional information. You can access this page
at http://www.oreilly.com/catalog/9781492047544.
To comment or ask technical questions about this book, send
email to [email protected].
For more information about our books, courses, conferences,
and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments
Much thanks to my wife and family for their support. I’m
grateful to the Python community for providing a wonderful
language and toolset to work with. Nicole Tache has been
lovely to work with and provided excellent feedback. My
technical reviewers, Mikio Braun, Natalino Busa, and Justin
Francis, kept me honest. Thanks!

CHAPTER 1
Introduction

This is not so much an instructional manual as a collection of
notes, tables, and examples for machine learning. It was created by the
author as an additional resource during training, meant to be
distributed as a physical notebook. Participants (who favor the
physical characteristics of dead-tree material) could add their
own notes and thoughts and have a valuable reference of curated
examples.
We will walk through classification with structured data. Other
common machine learning applications include predicting a
continuous value (regression), creating clusters, or trying to
reduce dimensionality, among others. This book does not discuss
deep learning techniques. While those techniques work
well for unstructured data, most practitioners recommend the
techniques in this book for structured data.
We assume knowledge of and familiarity with Python. Learning
how to manipulate data using the pandas library is useful. We
have many examples using pandas, and it is an excellent tool
for dealing with structured data. However, some of the indexing
operations may be confusing if you are not familiar with
numpy. Full coverage of pandas could be a book in itself.

Libraries Used
This book uses many libraries. This can be a good thing and a
bad thing. Some of these libraries may be hard to install or
conflict with other library versions. Do not feel like you need to
install all of these libraries. Use “JIT installation” and only
install the libraries that you want to use as you need them.
>>> import autosklearn, catboost, \
...     category_encoders, dtreeviz, eli5, fancyimpute, \
...     fastai, featuretools, glmnet_py, graphviz, \
...     hdbscan, imblearn, janitor, lime, matplotlib, \
...     missingno, mlxtend, numpy, pandas, pandas_profiling, \
...     pdpbox, phate, pydotplus, rfpimp, scikitplot, scipy, \
...     seaborn, shap, sklearn, statsmodels, tpot, \
...     treeinterpreter, umap, xgbfir, xgboost, yellowbrick

>>> for lib in [
...     autosklearn,
...     catboost,
...     category_encoders,
...     dtreeviz,
...     eli5,
...     fancyimpute,
...     fastai,
...     featuretools,
...     glmnet_py,
...     graphviz,
...     hdbscan,
...     imblearn,
...     janitor,
...     lime,
...     matplotlib,
...     missingno,
...     mlxtend,
...     numpy,
...     pandas,
...     pandas_profiling,
...     pdpbox,
...     phate,
...     pydotplus,
...     rfpimp,
...     scikitplot,
...     scipy,
...     seaborn,
...     shap,
...     sklearn,
...     statsmodels,
...     tpot,
...     treeinterpreter,
...     umap,
...     xgbfir,
...     xgboost,
...     yellowbrick,
... ]:
...     try:
...         print(lib.__name__, lib.__version__)
...     except AttributeError:  # library has no __version__
...         print("Missing", lib.__name__)
catboost 0.11.1
category_encoders 2.0.0
Missing dtreeviz
eli5 0.8.2
fancyimpute 0.4.2
fastai 1.0.28
featuretools 0.4.0
Missing glmnet_py
graphviz 0.10.1
hdbscan 0.8.22
imblearn 0.4.3
janitor 0.16.6
Missing lime
matplotlib 2.2.3
missingno 0.4.1
mlxtend 0.14.0
numpy 1.15.2
pandas 0.23.4
Missing pandas_profiling
pdpbox 0.2.0
phate 0.4.2
Missing pydotplus
rfpimp
scikitplot 0.3.7
scipy 1.1.0
seaborn 0.9.0
shap 0.25.2
sklearn 0.21.1
statsmodels 0.9.0
tpot 0.9.5
treeinterpreter 0.1.0
umap 0.3.8
xgboost 0.81
yellowbrick 0.9
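
If you follow the "JIT installation" advice and have only some of these libraries installed, a single import statement for all of them will fail. One possible alternative is to check each name separately. This is only a sketch under stated assumptions: library_version is a hypothetical helper (not from the book), and importlib.metadata requires Python 3.8 or later.

```python
import importlib
from importlib import metadata


def library_version(module_name, dist_name=None):
    """Return a version string, "unknown" if none is recorded, or None if not installed.

    module_name is the import name; dist_name is the pip package name
    when it differs (hypothetical parameter, for illustration only).
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # the library is not installed
    version = getattr(module, "__version__", None)
    if version is None:
        # Fall back to the installed package metadata.
        try:
            version = metadata.version(dist_name or module_name)
        except metadata.PackageNotFoundError:
            version = "unknown"
    return str(version)


# Check a couple of libraries without failing when one is absent.
for module_name, dist_name in [("pandas", None), ("sklearn", "scikit-learn")]:
    found = library_version(module_name, dist_name)
    print(module_name, found if found is not None else "is not installed")
```

The optional second argument exists because the import name and the pip package name can differ, as with umap and umap-learn.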

NOTE
Most of these libraries are easily installed with pip or
conda. With fastai I need to use pip install
--no-deps fastai. The umap library is installed with pip
install umap-learn. The janitor library is installed
with pip install pyjanitor. The autosklearn library is
installed with pip install auto-sklearn.
I usually use Jupyter for doing an analysis. You can use
other notebook tools as well. Note that some, like Google
Colab, have preinstalled many of the libraries (though they
may be outdated versions).

There are two main options for installing libraries in Python.
One is to use pip (an acronym for Pip Installs Python), a tool
that comes with Python. The other option is to use Anaconda.
We will introduce both.

Installation with Pip
Before using pip, we will create a sandbox environment to
install our libraries into. This is called a virtual environment;
here we name it env:
$ python -m venv env

NOTE
On Macintosh and Linux, the interpreter is often installed as
python3, so you may need to type python3 instead of python; on
Windows, use python (or the py launcher). If Windows doesn’t
recognize the command from the command prompt, you may need to
reinstall or fix your install and make sure you check the
“Add Python to my PATH” checkbox.

Then you activate the environment so that when you install
libraries, they go in the sandbox environment and not in the
global Python installation. As many of these libraries change
and are updated, it is best to lock down versions on a per-project
basis so you know that your code will run.
Here is how we activate the virtual environment on Linux and
Macintosh:
$ source env/bin/activate
You will notice that the prompt is updated, indicating that we
are using the virtual environment:
(env) $ which python
env/bin/python
On Windows, you will need to activate the environment by
running this command:
C:> env\Scripts\activate.bat

Again, you will notice that the prompt is updated, indicating
that we are using the virtual environment:
(env) C:> where python
env\Scripts\python.exe
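
The same check can be made from inside Python instead of relying on the prompt. The snippet below is a minimal sketch: in_virtual_environment is a hypothetical helper name, and the comparison only detects environments created with venv (or virtualenv), not conda environments.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment.

    Inside such an environment, sys.prefix points at the environment
    directory while sys.base_prefix points at the base installation.
    """
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```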
On all platforms, you can install packages using pip. To install
pandas, type:
(env) $ pip install pandas
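Some of the package names are different than the library
names. Historically, you could search for packages using:
(env) $ pip search libraryname
PyPI has since disabled the search API that this command relies
on, so look up the package name on the pypi.org website instead.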
Once you have your packages installed, you can create a file
with all of the versions of the packages using pip:
(env) $ pip freeze > requirements.txt
With this requirements.txt file you can easily install the
packages into a new virtual environment:
(other_env) $ pip install -r requirements.txt
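
To get a feel for what a requirements file contains, the pinned name==version lines can be approximated from Python. This is a rough sketch and not a replacement for pip freeze, which handles cases (such as editable installs) that this ignores; frozen_requirements is a hypothetical helper name, and importlib.metadata requires Python 3.8 or later.

```python
from importlib import metadata


def frozen_requirements():
    """Return sorted "name==version" lines for the installed packages."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


# Print the first few pinned requirements of the current environment.
for line in frozen_requirements()[:5]:
    print(line)
```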

Installation with Conda


The conda tool comes with Anaconda and lets us create
environments and install packages.
To create an environment named env, run:
$ conda create --name env python=3.6
To activate this environment, run:
$ conda activate env
This will update the prompt on both Unix and Windows
systems. Now you can search for packages using:
(env) $ conda search libraryname
To install a package, like pandas, run:
(env) $ conda install pandas

To create a file with the package requirements in it, run:
(env) $ conda env export > environment.yml
To create a new environment from these requirements, run:
$ conda env create -f environment.yml

WARNING
Some of the libraries mentioned in this book are not available
to install from Anaconda’s repository. Don’t fret. It
turns out you can use pip inside of a conda environment
(no need to create a new virtual environment), and install
these using pip.



CHAPTER 2
Overview of the Machine Learning Process

Cross-Industry Standard Process for Data Mining (CRISP-DM)
is a process for doing data mining. It has several steps that
can be followed for continuous improvement. They are:

• Business understanding
• Data understanding
• Data preparation
• Modeling
• Evaluation
• Deployment

Figure 2-1 shows my workflow for creating a predictive model
that expands on the CRISP-DM methodology. The walkthrough
in the next chapter will cover these basic steps.

Figure 2-1. Common workflow for machine learning.


CHAPTER 3
Classification Walkthrough: Titanic Dataset

This chapter will walk through a common classification problem
using the Titanic dataset. Later chapters will dive into and
expand on the common steps performed during an analysis.

Project Layout Suggestion
An excellent tool for performing exploratory data analysis is
Jupyter. Jupyter is an open-source notebook environment that
supports Python and other languages. It allows you to create
cells of code or Markdown content.
I tend to use Jupyter in two modes. One is for exploratory data
analysis and quickly trying things out. The other is more of a
deliverable style where I format a report using Markdown cells
and insert code cells to illustrate important points or discoveries.
If you aren’t careful, your notebooks might need some
refactoring and application of software engineering practices
(remove globals, use functions and classes, etc.).
The cookiecutter data science package suggests a layout to create
an analysis that allows for easy reproduction and sharing of
code.

Imports
This example is based mostly on pandas, scikit-learn, and
Yellowbrick. The pandas library gives us tooling for easy data
munging. The scikit-learn library has great predictive modeling,
and Yellowbrick is a visualization library for evaluating
models:
>>> import matplotlib.pyplot as plt
>>> import pandas as pd
>>> from sklearn import (
...     ensemble,
...     preprocessing,
...     tree,
... )
>>> from sklearn.metrics import (
...     auc,
...     confusion_matrix,
...     roc_auc_score,
...     roc_curve,
... )
>>> from sklearn.model_selection import (
...     train_test_split,
...     StratifiedKFold,
... )
>>> from yellowbrick.classifier import (
...     ConfusionMatrix,
...     ROCAUC,
... )
>>> from yellowbrick.model_selection import (
...     LearningCurve,
... )



WARNING
You might find documentation and examples online that
include star imports like:
from pandas import *
Refrain from using star imports. Being explicit makes your
code easier to understand.
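
One way to see the risk of star imports is to list the names that two modules share, since a second star import silently replaces any same-named objects from the first. The sketch below uses the standard library's math and cmath modules purely as an illustration; star_import_names is a hypothetical helper.

```python
import cmath
import math


def star_import_names(module):
    """Return the names that `from module import *` would bring in."""
    # A module may declare its public names in __all__; otherwise every
    # name that does not start with an underscore is imported.
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return set(declared)
    return {name for name in dir(module) if not name.startswith("_")}


# Names defined by both modules: star-importing both would make the
# second import overwrite the first for every one of these.
shared = star_import_names(math) & star_import_names(cmath)
print(sorted(shared))

# With explicit module prefixes there is no ambiguity about which
# sqrt is being called.
print(math.sqrt(4), cmath.sqrt(-4))
```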

Ask a Question
In this example, we want to create a predictive model to answer
a question. It will classify whether an individual survives the
Titanic ship catastrophe based on individual and trip
characteristics. This is a toy example, but it serves as a pedagogical
tool for showing many steps of modeling. Our model should be
able to take passenger information and predict whether that
passenger would survive on the Titanic.
This is a classification question, as we are predicting a label for
survival; either they survived or they died.

Terms for Data


We typically train a model with a matrix of data. (I prefer to use
pandas DataFrames because it is very nice to have column
labels, but numpy arrays work as well.)
For supervised learning, such as regression or classification,
our intent is to have a function that transforms features into a
label. If we were to write this as an algebra formula, it would
look like this:
y = f(X)
X is a matrix. Each row represents a sample of data or information
about an individual. Every column in X is a feature. The
output of our function, y, is a vector that contains labels (for
classification) or values (for regression) (see Figure 3-1).
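
The shapes involved can be illustrated with plain Python lists. This is a toy sketch, not a trained model: the feature names, the four rows, and the hand-written rule inside f are all made up for illustration.

```python
# Hypothetical feature names: one per column of X.
feature_names = ["age", "is_female", "fare"]

# X: one row per passenger (sample), one column per feature.
X = [
    [22.0, 0, 7.25],
    [38.0, 1, 71.28],
    [26.0, 1, 7.92],
    [35.0, 0, 8.05],
]


def f(X):
    """Stand-in for a model: label a passenger 1 (survived) if female, else 0."""
    column = feature_names.index("is_female")
    return [1 if row[column] == 1 else 0 for row in X]


# y: one label per row of X.
y = f(X)
print(y)  # [0, 1, 1, 0]
```

In practice f is learned from training data rather than written by hand, but the contract is the same: a matrix of features goes in, and a vector with one label per row comes out.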

Aquilla Rose stands proudly with his mowing machine
outside his home near Eagle Creek. He didn’t stand
that still when revenuers came around.
National Park Service
All the wonders of the Great Smoky Mountains—the nature, the
people, the stories, and the battles and the jests—affected Horace
Kephart mightily. This man whose own life had been “saved” by the
Smokies began to think in terms of repaying this mountain area in
kind. For during his years on Hazel Creek and Deep Creek and in
Bryson City, he saw the results of the “loggers’ steel,” results that
caused him to lament in a single phrase, “slash, crash, go the
devastating forces.” In 1923 he summarized his feelings about the
lumber industry:
“When I first came into the Smokies the whole region was one
superb forest primeval. I lived for several years in the heart of it. My
sylvan studio spread over mountain after mountain, seemingly
without end, and it was always clean and fragrant, always vital,
growing new shapes of beauty from day to day. The vast trees met
overhead like cathedral roofs.... Not long ago I went to that same
place again. It was wrecked, ruined, desecrated, turned into a
thousand rubbish heaps, utterly vile and mean.”
Kephart began to think in terms of a national park. He and a
Japanese photographer friend, George Masa, trekked the Smokies
and gathered concrete experience and evidence of the mountains’
wild splendor. At every opportunity, Kephart advocated the park idea
in newspapers, in brochures, and by word of mouth. He proudly
acknowledged that “I owe my life to these mountains and I want
them preserved that others may profit by them as I have.”
The concept of a national park for these southern mountains was not
a new one in 1920. Forty years earlier, a retired minister and former
state geologist, Drayton Smith, of Franklin, North Carolina, had
proposed “a national park in the mountains.” In 1885, Dr. Henry O.
Marcy of Boston, Massachusetts, had discussed future health
resorts in America and had considered “the advisability of securing
under state control a large reservation of the higher range as a
park.” By the turn of the century, the Appalachian National Park
Association was formed in Asheville, North Carolina, and publicized
the idea of a national park somewhere in the region, not specifically
the Great Smokies. When the Federal Government seemed to rule out
this possibility, the Association devoted the bulk of its time and
effort to the creation of national forest reserves.
When the Civilian Conservation Corps moved into the Smokies in
the 1930s, young men from the cities saw moonshine stills
firsthand. Here one pretends to be a moonshiner and hangs his head
low for the photographer.
Edouard E. Exline
But people like Horace Kephart knew the difference between a national
park that safeguarded trees and a national forest that allowed logging.
In 1923, a group supporting a genuine Great Smokies park formed in
Knoxville, Tennessee. Mr. and Mrs. Willis P. Davis,
of the Knoxville Iron Company, in the summer of that year had
enjoyed a trip to some of the country’s western parks. As they
viewed the wonders preserved therein, Mrs. Davis was reminded of
the natural magnificence near her own home. “Why can’t we have a
national park in the Great Smokies?” she asked her husband.
Back in Knoxville, Mr. Davis began to ask that question of friends
and associates. One of these was Col. David C. Chapman, a
wholesale druggist, who listened but did not heed right away:
Grace Newman sits enraptured as Jim Proffit plays the
guitar.
Burton Wolcott
“Not until I accidentally saw a copy of President Theodore
Roosevelt’s report on the Southern Appalachians did I have any idea
of just what we have here. In reading and rereading this report I
learned for the first time that the Great Smokies have some truly
superlative qualities. After that I became keenly interested in Mr.
Davis’ plan and realized that a national park should be a possibility.”
The Davises and Chapman led the formation of the Great Smoky
Mountains Conservation Association. Congressmen and Secretary
of the Interior Hubert Work were contacted. Work endorsed the
project, and two years later Congress passed an act authorizing
associations in Tennessee and North Carolina to buy lands and deed
them to the U.S. Government.
Problems immediately presented themselves. The citizens would
have to buy this park. Unlike Yellowstone and other previous land
grants from the Federal Government, the Smokies were owned by
many private interests and therefore presented a giant challenge to
hopeful fund raisers. To further complicate matters, no group had the
power to condemn lands; any property, if secured at all, would have
to be coaxed from its owner at an appropriately high price. Finally,
and most discouragingly, park enthusiasts faced an area of more
than 6,600 separate tracts and thousands of landowners.
Yet events conspired to give the park movement a sustaining drive.
The lumber companies had made the people of the Smokies more
dependent on money for additional food, modern-day clothing, and
new forms of recreation. World War I and the coming of the
highways had instilled a restlessness in the mountain people, a
yearning for new sights and different ways of living. Some began to
echo the sentiments of one farmer who, after realizing meager
returns for his hard labor on rocky fields, looked around him and
concluded, “Well, I reckon a park is about all this land is fit for.”
Determined leadership overcame obstacles large and small. Behind
Chapman’s professorial appearance—his wire-rimmed glasses and
three-piece suits and unkempt hair—was a man who had been a
colonel in World War I, a man who had resolved to make the dream
of a national park into a reality. Along with Chapman as the driving
force, associate director of the National Park Service Arno B.
Cammerer provided the steering and the gears. Cammerer’s marked
enthusiasm for incorporating the Great Smokies into the national
park system added a well-placed, influential spokesman to the
movement. By spring of 1926, groups in North Carolina and
Tennessee had raised more than a million dollars. Within another
year, the legislatures of the two states each had donated twice that
amount.
With $5 million as a nest egg, park advocates turned to the actual
buying of lands. Cammerer himself defined a boundary which
included the most suitable territory and which, as it turned out,
conformed closely to the final boundary. Chapman and his
associates approached individual homeowners. Sometimes they
received greetings similar to one on a homemade sign:
“Col. Chapman. You and Hoast are notify. Let the Cove People
Alone. Get Out. Get Gone. 40 m. Limit.”
The older mountain people clung desperately to what they had. Even
though the buyers were prepared to issue lifetime leases for those
who wanted to stay, they found it difficult to remove this resolute
band from their homeland.
Many of the Smokies’ residents—the younger, more mobile, more
financially oriented ones—accepted the coming of the park with a
combination of fatalism and cautious hope. Gradually they
acknowledged the fact that a park and its tourist trade might be a
continuing asset, whereas the prosperity from logging had proved at
best only temporary. After John D. Rockefeller, Jr., through the Laura
Spelman Rockefeller Memorial Fund, doubled the park fund with a
much-needed gift of an additional $5 million, renewed offers of cash
completely melted many icy objections.
The lumber companies followed suit, but for higher stakes.
Champion Fibre, Little River, Suncrest, Norwood, and Ritter were
among the 18 timber and pulpwood companies that owned more
than 85 percent of the proposed park area. They fought to stay for
obvious economic reasons, yet they were prepared to leave if the
price was right. Little River Lumber Company, after considerable
negotiation with the state of Tennessee and the city of Knoxville, sold
its 30,345 hectares (75,000 acres) for only $8.80 per hectare ($3.57
per acre).

George A. Grant
An early morning fog cloaks the dense vegetation and
rolling hills at Cove Creek Gap. Such scenes inspired
many people to rally around the idea of purchasing
land for a park.
National Park Service
Those attending a meeting March 6, 1928, when a $5
million gift from the Laura Spelman Rockefeller
Memorial was announced, included (front from left)
former Tennessee Gov. Ben W. Hooper, Willis P. Davis,
E. E. Conner, David C. Chapman, Gov. Henry H.
Horton, John Nolan, Knoxville Mayor James A. Fowler,
(back from left) Kenneth Chorley, Arno B. Cammerer,
Wiley Brownlee, J. M. Clark, Margaret Preston, Ben A.
Morton, Frank Maloney, Cary Spence, and Russell
Hanlon.
The vast holdings of Champion Fibre Company were at the very
heart of the park, however, and the results of the company’s
resistance to a national park were central to success or failure of the
whole movement. Champion’s 36,400 hectares (90,000 acres)
included upper Greenbrier, Mt. Guyot, Mt. LeConte, the Chimneys,
and a side of Clingmans Dome, crowned by extensive forests of
virgin spruce. This splendid domain was the cause of hot tempers,
torrid accusations, rigid defenses, and a hard-fought condemnation
lawsuit. In the end, however, on March 30, 1931, Champion Fibre
agreed to sell for a total of $3 million, a sum which took on added
appeal during the slump of the disastrous Depression.
Four days after this agreement, Horace Kephart died in an
automobile accident near Cherokee, North Carolina. An 8-ton
boulder was later brought from the hills above Smokemont to mark
his grave in Bryson City.
Only a few years earlier Kephart had said:
“Here to-day is the last stand of primeval American forest at its best.
If saved—and if saved at all it must be done at once—it will be a joy
and a wonder to our people for all time. The nation is summoned by
a solemn duty to preserve it.”
And it was, indeed, preserved. The Federal Government in 1933
contributed a final $2 million to the cause, establishing the figure of
$12 million as the grand total of money raised for the park. On
September 2, 1940, with land acquisition almost completed,
President Franklin D. Roosevelt dedicated the Great Smoky
Mountains National Park “for the permanent enjoyment of the
people.”
The park movement’s greatest victory, coming as it did at Kephart’s
death, lent a special significance to his life. For his experience
symbolized the good effects that a national park in the Great Smoky
Mountains could create. These mountains and their people inspired
him to write eloquently of their truth and endurance; his own health
seemed to thrive in the rugged, elemental environment of the
Smokies. Perhaps most important of all, he discovered here the
impact of what it can mean to know a real home. Having found a
home for himself, he labored tirelessly for a national park to give to
his fellow countrymen the same opportunity for wonder and renewal
and growth.
John Walker, the patriarch of a large self-reliant family,
admires cherries he raised at his home in Little
Greenbrier.
Jim Shelton
The Past Becomes Present
As early as 1930, citizens and officials across the United States had
begun to realize that a new park would indeed encompass
and preserve the Great Smoky Mountains. Hard-working Maj. J.
Ross Eakin, the first superintendent of the park, arrived at the
beginning of the next year from his previous post in Montana’s
Glacier National Park and was quickly introduced to the cold, mid-
January winds of the Great Smokies and some of the controversies
that had arisen during establishment of the park.
At first, Eakin and his few assistants limited their duties to the basics;
they marked boundaries, prevented hunting, fought and forestalled
fire. But as the months passed, as the park grew in size and its staff
increased in number, minds and muscles alike tackled the real
problem of shaping a sanctuary which all the people of present and
future generations could enjoy.
Help came from an unexpected quarter. The economic depression
that had gripped the country in 1930 tightened its stranglehold as the
decade progressed. In the famous “Hundred Days” spring of 1933, a
special session of Congress passed the first and most sweeping
series of President Roosevelt’s New Deal legislation. The Civilian
Conservation Corps, created in April, established work for more than
two million young men. CCC camps, paying $30 a month for work in
conservation, flood control, and wilderness projects, sprang up.
As far as the young, struggling Great Smoky Mountains National
Park was concerned, this new CCC program could not have come at
a better time. Through the Corps, much-needed manpower
converged by the hundreds on the Smokies from such places as
New Jersey, Ohio, and New York City. Supervised by Park Service
officials and reserve officers from the U.S. Army, college-age men
first set up their own camps—17 in all—and then went about that old
familiar labor in the Smokies, landscaping and building roads. In
addition, they constructed trails, shelters, powerlines, fire towers,
and bridges.
Some of their tent-strewn camps were pitched on old logging sites
with familiar names like Smokemont and Big Creek. Others, such as
Camp No. 413 on Forney Creek, were more remote but no less
adequate. Ingenuity, sparked by necessity, created accommodations
which made full use of all available resources. At Camp Forney, for
instance, there was a barracks, a messhall, a bathhouse, and an
officers’ quarters. Water from clear, cold Forney Creek was piped
into the kitchen; food was stored in a homemade ice chest. The
residents of the camp, seeing no reason why they should rough it
more than necessary, added a library, a post office, and a
commissary in their spare time.
The CCC men, their ages between 18 and 25, did not forget
recreation. As teams organized for football, baseball, boxing,
wrestling, and soccer, the hills resounded with unfamiliar calls of
scores and umpires’ decisions, while the more familiar tussles of
boxing and wrestling raised echoes of old partisan matches
throughout the hills. At times, these young workers answered the
urge to ramble, too. One of them later recalled his days as a radio
man on the top of Mt. Sterling:
“It was seven miles steep up there, and sometimes I’d jog down
about sundown and catch a truck for Newport. That’s where we went
to be with people. The last truck brought us back after midnight.”
A minor problem sometimes arose when the CCC “outsiders” began
dating local girls; farming fathers sometimes set fires to give the
boys something else to do during the weekends. The conflict of
cultures was thrown into a particularly sharp light when a Corps
participant shot a farmer’s hog one night and shouted that he had
killed a bear!
On the whole, however, the Civilian Conservation Corps program in
the Great Smoky Mountains was a major success. In one or two
extremely rugged areas of the park, retired loggers were hired in 10-
day shifts to hack out or even drill short trail lengths. The rest of the
965-kilometer (600-mile) trail system, together with half a dozen fire
towers and almost 480 kilometers (300 miles) of fire roads and
tourist highways, was the product of the CCC. When Superintendent
Eakin evaluated the work of only the first two years of the CCC’s
operation, he equated it with a decade of normal accomplishment.
Through these and similar efforts, which included almost 110
kilometers (70 miles) of the famous Appalachian Trail, the natural
value of the Great Smoky Mountains became a recognized and
established lure for thousands, eventually millions, of visitors. But
there was another resource that remained untapped, a challenge to
the national park purpose and imagination. This resource was first
overlooked, then neglected, and finally confronted with respect. The
resource was the people and their homes.
Many previous owners of park land had received lifetime leases that
allowed them to live on in their dwellings, work their fields, and cut
dead timber even while tourists streamed through the Smokies.
Some of the lessees, such as those living near Gatlinburg, saw a
new era coming, thrusting back the street-ends until motels and
restaurants and craft shops pushed against an abandoned apple
orchard or a 10-plot cemetery or a deserted backyard laced with
lilacs. These rememberers of an earlier time relinquished their lands
in the park, more often than not resettling within sight of the
mountain range and the homeland they had just left.
Yet a few lessees, those living further up the valleys, deeper into the
mountains, or isolated from the well-traveled paths, these few folks
stayed on. The Walker sisters of Little Greenbrier Cove were
representative of this small group.
John Walker, their father, was himself the eldest of his parents’ 15
children. In 1860, at the age of 19, he became engaged to 14-year-
old Margaret Jane King. The Civil War postponed their wedding, and
John, an ardent Unionist who had enlisted in the First Tennessee
Light Artillery, spent three months in a Confederate prison and lost
45 kilograms (100 pounds) before he was exchanged and provided
with a pension. In 1866, they were finally married. After Margaret
Jane’s father died, the young couple moved into the King homestead
in Little Greenbrier.
They had eleven children: four boys, seven girls. John remained a
strong Republican and Primitive Baptist; he liked to boast that in a
long and fruitful lifetime he had spent a total of 50 cents on health
care for his family (two of his sons had once required medicine for
the measles). Margaret Jane was herself an “herb doctor” and a
midwife, talents which complemented John’s skills as a blacksmith,
carpenter, miller, farmer. Once, as Margaret Jane was chasing a
weasel from her hens, the reddish-brown animal bit her thumb and held
on; she calmly thrust her hand into a full washtub, where the weasel
drowned in water stained by her blood.

Joseph S. Hall
Columbus “Clum” Cardwell of Hills Creek, Tennessee, worked in the
CCC garage at Smokemont. That experience led to a 23-year career as
an auto-mechanic at the national park.

Edouard E Exline
Little Greenbrier Cove was known to some people as
Five Sisters Cove because of the Walker sisters’ place
just above the schoolhouse. The Walkers had their
garden and grape arbors close to the house for handy
tending.
Edouard E Exline
Inside, everything was neat as a pin with coats, hats,
baskets, guns, and what-have-you hanging on the
newspaper-covered walls.

Edouard E Exline
Sitting on the front porch are (from left) Polly, Louisa,
and Martha. Also on the porch is a loom made by their
father (see page 120) and a spinning wheel.
The children grew up. The three older boys married and moved
away. The youngest, Giles Daniel, left for Iowa and fought in World
War I. Sarah Caroline, the only one of the daughters ever to marry,
began her life with Jim Shelton in 1908. Hettie Rebecca worked for a
year or two in a Knoxville hosiery mill, but the Depression sent her
back home. When Nancy Melinda died in 1931, the original home
place was left in the hands of five sisters: Hettie, Margaret Jane,
Polly, Louisa Susan, and Martha Ann.
They lived the self-sufficiency of their ancestors. They stated simply
that “our land produces everything we need except sugar, soda,
coffee, and salt.” Their supplies came from the grape arbor, the
orchard, the herb and vegetable garden; the sheep, hogs, fowl, and
milch cows; the springhouse crocks of pickled beets and sauerkraut;
the dried food and the seed bags and the spice racks that hung from
nails hammered into the newspaper-covered walls of the main
house. The material aspects of their surroundings represented fully
the fabric of life as it had been known in the hundreds of abandoned
cabins and barns and outbuildings that dotted the landscape of the
Great Smoky Mountains National Park. And the Walker sisters were
not about to give up their way of life without a struggle. In a poem,
“My Mountain Home,” Louisa expressed the family’s feelings:

“There is an old weather bettion house
That stands near a wood
With an orchard near by it
For all most one hundred years it has stood

“It was my home in infency
It sheltered me in youth
When I tell you I love it
I tell you the truth

“For years it has sheltered
By day and night
From the summer sun’s heat
And the cold winter blight.

“But now the park commesser
Comes all dressed up so gay
Saying this old house of yours
We must now take away

“They coax they wheedle
They fret they bark
Saying we have to have this place
For a National park

“For us poor mountain people
They dont have a care
But must a home for
The wolf the lion and the bear

“But many of us have a title
That is sure and will hold
To the City of peace
Where the streets are pure gold

“There no lion in its fury
Those pathes ever trod
It is the home of the soul
In the presence of God

“When we reach the portles
Of glory so fair
The Wolf cannot enter
Neather the lion or bear

“And no park Commissioner
Will ever dar
To desturbe or molest
Or take our home from us there.”

In January of 1941, however, the Walker sisters relented a little and
sold their 50 hectares (123 acres) to the United States for $4,750
and a lifetime lease. Partly because of this unique situation, this
special lifestyle, park officials delayed any well-defined program to
recreate and present a vanishing culture. When the Saturday Evening
Post “discovered” the Walker sisters in 1946, tourists in the Smokies
flocked to the Walker home as if it were a museum of Appalachia. The
sisters themselves tolerated the visitors, even sold mountain
“souvenirs.” But the years passed, three of the sisters died, and in
1953 Margaret Jane and Louisa wrote to the park superintendent:
“I have a request to you Will you please have the Sign a bout the
Walker Sisters taken down the one on High Way 73 especially the
reason I am asking this there is just 2 of the sister lives at the
old House place one is 70 years of age the other is 82 years of age
and we

Joseph S. Hall
Before leaving for Lufty Baptist Church, Alfred Dowdle and his
family of Collins Creek pose for Joseph S. Hall, who was studying
linguistics in the Smokies for the Park Service.