Handbook of Statistical Analysis and Data Mining Applications, 2nd Edition
CONTENTS
    Data Mining With STATISTICA
    Conclusion
    References
    Further Reading

Tutorial I Data Prep 1–2: Data Description
ROBERTA BORTOLOTTI, MSIS, CBAP

    The Problem of Significance in Traditional P-Value Statistical Analysis
    USUAL Data Mining/Predictive Analytic Performance Measures: Terminology
    Unique Ways to Test Accuracy (“Significance”) of Machine Learning Predictive Models
    Compare Predictive Model Performance Against Random Results With Lift Charts and Decile Tables
    Evaluate the Validity of Your Discovery With Target Shuffling
    Test Predictive Model Consistency With Bootstrap Sampling
    Postscript
    References

21. Ethics and Data Analytics
ANDY PETERSON
    Preamble
    The Birthday Party: A Practical Example for Ethical Action
    Academic Secular Ethics
    Ethics and Data Science for the Norms of Government (Deontological-Normative)
    Ethics and Data Science for the Goals in Business (Situational-Teleological)
    Ethics and Data Science for the Virtues of Personal Life (Existential-Motivational)
    Combination: Right Standards, Right Goals, and Personal Virtue (Normative, Situational, Existential)
    Michael Sandel on “Doing The Right Thing” With Data Analytics
    Discovering Data Ethics in an “Alignment Methodology”
    References
    Further Reading

22. IBM Watson
    Preamble
    Introduction
    What Exactly Is Watson?
    Jeopardy!
    Internal Features of Watson
    Application Programming Interfaces (APIs)
    Software Development Kits (SDKs)
    Some Existing Applications of Watson Technology
    Ushering in the Cognitive Era
    Postscript
    Reference

Index
List of Tutorials on the Elsevier Companion Web Page
Note: This list includes all the extra tutorials published with the 1st edition of this handbook (2009). These can be considered “enrichment” tutorials for readers of this 2nd edition. Since the 1st edition of the handbook will not be available after the release of the 2nd edition, these extra tutorials are carried over in their original format/versions of software, as they are still very useful in learning and understanding data mining and predictive analytics, and many readers will want to take advantage of them.
List of Extra Enrichment Tutorials that are only on the ELSEVIER COMPANION web page, with data sets as appropriate, for downloading and use by readers of this 2nd edition of handbook:
1. TUTORIAL “O”: Boston Housing Using Regression Trees [Field: Demographics]
2. TUTORIAL “P”: Cancer Gene [Field: Medical Informatics & Bioinformatics]
3. TUTORIAL “Q”: Clustering of Shoppers [Field: CRM, Clustering Techniques]
4. TUTORIAL “R”: Credit Risk using Discriminant Analysis [Field: Financial, Banking]
5. TUTORIAL “S”: Data Preparation and Transformation [Field: Data Analysis]
6. TUTORIAL “T”: Model Deployment on New Data [Field: Deployment of Predictive Models]
7. TUTORIAL “V”: Heart Disease Visual Data Mining Methods [Field: Medical Informatics]
8. TUTORIAL “W”: Diabetes Control in Patients [Field: Medical Informatics]
9. TUTORIAL “X”: Independent Component Analysis [Field: Separating Competing Signals]
10. TUTORIAL “Y”: NTSB Aircraft Accidents Reports [Field: Engineering, Air Travel, Text Mining]
11. TUTORIAL “Z”: Obesity Control in Children [Field: Preventive Health Care]
12. TUTORIAL “AA”: Random Forests Example [Field: Statistics, Data Mining]
13. TUTORIAL “BB”: Response Optimization [Field: Data Mining, Response Optimization]
14. TUTORIAL “CC”: Diagnostic Tooling and Data Mining: Semiconductor Industry [Field: Industry, Quality Control]
15. TUTORIAL “DD”: Titanic: Survivors of Ship Sinking [Field: Sociology]
16. TUTORIAL “EE”: Census Data Analysis [Field: Demography, Census]
17. TUTORIAL “FF”: Linear & Logistic Regression: Ozone Data [Field: Environment]
18. TUTORIAL “GG”: R-Language Integration: DISEASE SURVIVAL ANALYSIS Case Study [Field: Survival Analysis, Medical Informatics]
19. TUTORIAL “HH”: Social Networks Among Community Organizations [Field: Social Networks, Sociology & Medical Informatics]
20. TUTORIAL “II”: Nairobi, Kenya Baboon Project: Social Networking
Foreword 1 for 1st Edition
This book will help the novice user become familiar with data mining. Basically, data mining is doing data analysis (or statistics) on data sets (often large) that have been obtained from potentially many sources. As such, the miner may not have control of the input data, but must rely on sources that have gathered the data. Consequently, there are problems that every data miner must be aware of as he or she begins (or completes) a mining operation. I strongly resonated with the material on “The Top 10 Data Mining Mistakes,” which gives a worthwhile checklist:
• Ensure you have a response variable and predictor variables, and that they are correctly measured.
• Beware of overfitting. With scads of variables, it is easy with most statistical programs to fit incredibly complex models, but they cannot be reproduced. It is good to save part of the sample to use to test the model. Various methods are offered in this book.
• Don't use only one method. Using only linear regression can be a problem. Try dichotomizing the response or categorizing it to remove nonlinearities in the response variable. Often, there are clusters of values at zero, which messes up any normality assumption. This, of course, loses information, so you may want to categorize a continuous response variable and use an alternative to regression. Similarly, predictor variables may need to be treated as factors rather than linear predictors. A classic example is using marital status or race as a linear predictor when there is no order.
• Asking the wrong question: when looking for a rare phenomenon, it may be helpful to identify the most common pattern. These may lead to complex analyses, as in item 3, but they may also be conceptually simple. Again, you may need to take care that you don't overfit the data.
• Don't become enamored with the data. There may be a substantial history from earlier data or from domain experts that can help with the modeling.
• Be wary of using an outcome variable (or one highly correlated with the outcome variable) and becoming excited about the result. The predictors should be “proper” predictors in the sense that they (a) are measured prior to the outcome and (b) are not a function of the outcome.
• Do not discard outliers without solid justification. Just because an observation is out of line with others is insufficient reason to ignore it. You must check the circumstances that led to the value. In any event, it is useful to conduct the analysis with the observation(s) included and excluded to determine the sensitivity of the results to the outlier.
• Extrapolating is a fine way to go broke; the best example is the stock market. Stick within your data, and if you must go outside, add plenty of caveats. Better still, restrain the impulse to extrapolate. Beware that pictures are often far too simple and we can be misled. Political campaigns oversimplify complex problems (“my opponent wants to raise taxes”; “my opponent will take us to war”) when the realities may imply we have some infrastructure needs that can be handled only with new funding or we have been attacked by some bad guys.
• Be wary of your data sources. If you are combining several sets of data, they need to meet a few standards:
• The definitions of variables that are being merged should be identical. Often, they are close but not exact (especially in meta-analysis, where clinical studies may have somewhat different definitions due to different medical institutions or laboratories).
• Be careful about missing values. Often, when multiple data sets are merged, missing values can be induced: one variable isn't present in another data set; what you thought was a unique variable name was slightly different in the two sets, so you end up with two variables that both have a lot of missing values.
• How you handle missing values can be crucial. In one example, I used complete cases and lost half of my sample; all variables had at least 85% completeness, but when put together, the sample lost half of the data. The residual sum of squares from a stepwise regression was about 8. When I included more variables when using mean replacement, almost the same set of predictor variables surfaced, but the residual sum of squares was 20. I then used multiple imputation and found approximately the same set of predictors but had a residual sum of squares (median of 20 imputations) of 25. I find that mean replacement is rather optimistic but surely better than relying on only complete cases. Using stepwise regression, I find it useful to replicate it with a bootstrap or with multiple imputations. However, with large data sets, this approach may be expensive computationally.
To conclude, there is a wealth of material in this handbook that will repay study.

Peter A. Lachenbruch
Oregon State University, Corvallis, OR, United States
American Statistical Association, Alexandria, VA, United States
Johns Hopkins University, Baltimore, MD, United States
UCLA, Los Angeles, CA, United States
University of Iowa, Iowa City, IA, United States
University of North Carolina, Chapel Hill, NC, United States
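
The complete-case example in Foreword 1 (every variable at least 85% complete, yet half the sample lost) can be illustrated with a minimal sketch. This is not the author's analysis: it assumes four variables with simulated Gaussian values, each missing independently with probability 0.15. Under that assumption the expected share of fully observed rows is 0.85^4, about 0.52. The function names (simulate, complete_case_fraction, mean_impute) and all parameters are hypothetical. The sketch also shows one reason mean replacement can look optimistic: filling gaps with the column mean shrinks the spread of the variable.

```python
import random
import statistics


def simulate(n_rows=2000, n_vars=4, p_missing=0.15, seed=0):
    """Return rows of Gaussian values; each cell is independently None with p_missing."""
    rng = random.Random(seed)
    return [
        [None if rng.random() < p_missing else rng.gauss(0.0, 1.0)
         for _ in range(n_vars)]
        for _ in range(n_rows)
    ]


def complete_case_fraction(rows):
    """Share of rows that contain no missing cell."""
    return sum(all(v is not None for v in row) for row in rows) / len(rows)


def mean_impute(column):
    """Replace missing cells in one column with the mean of its observed cells."""
    observed = [v for v in column if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in column]


rows = simulate()

# Assumption: missingness is independent across the four variables.
expected_fraction = 0.85 ** 4
observed_fraction = complete_case_fraction(rows)

first_column = [row[0] for row in rows]
observed_sd = statistics.pstdev([v for v in first_column if v is not None])
imputed_sd = statistics.pstdev(mean_impute(first_column))

print(f"expected complete-case fraction:  {expected_fraction:.3f}")
print(f"simulated complete-case fraction: {observed_fraction:.3f}")
print(f"spread of observed values:        {observed_sd:.3f}")
print(f"spread after mean replacement:    {imputed_sd:.3f}")
```

With real data, missingness is rarely independent across variables, so the share of complete cases can be higher or lower than this simple product suggests.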
Foreword 2 for 1st Edition
However, the book is best read a few chapters at a time while actively doing the data mining rather than read cover to cover (a daunting task for a book this size). Practitioners will appreciate tutorials that match their business objectives and choose to ignore other tutorials. They may choose to read sections on a particular algorithm to increase insight into that algorithm and then decide to add a second algorithm after the first is mastered. For those new to a particular software tool highlighted in the tutorials section, the step-by-step approach will operate much like a user's manual. Many chapters stand well on their own, such as the excellent “History of Statistics and Data Mining” chapter and chapters 16, 17, and 18. These are broadly applicable and should be read by even the most experienced data miners.
The Handbook of Statistical Analysis and Data Mining Applications is an exceptional book that should be on every data miner's bookshelf or, better yet, found lying open next to their computer.

Dean Abbott
Abbott Analytics, San Diego, CA, United States
Preface
turn the key in the ignition, step on the gas and the brake at the right times, and turn the wheel to change direction in a safe manner, and voilà, you are an expert user of the very complex technology under the hood. The other half of the story is the instruction manual and the driver's education course that help you to learn how to drive.
This book provides the instruction manual and a series of tutorials to train you how to do data mining in many subject areas. We provide both the right tools and the right intuitive explanations (rather than formal mathematical definitions) of the data mining process and algorithms, which will enable even beginner data miners to understand the basic concepts necessary to understand what they are doing. In addition, we provide many tutorials in many different industries and businesses (using many of the most common data mining tools) to show how to do it.

OVERALL ORGANIZATION OF THIS BOOK
We have divided the chapters in this book into four parts to guide you through the aspects of predictive analytics. Part I covers the history and process of predictive analytics. Part II discusses the algorithms and methods used. Part III is a group of tutorials, which serve in principle as Rome served: as the central governing influence. Part IV presents some advanced topics. The central theme of this book is the education and training of beginning data mining practitioners, not the rigorous academic preparation of algorithm scientists. Hence, we located the tutorials in the middle of the book in Part III, flanked by topical chapters in Parts I, II, and IV.
This approach is “a mile wide and an inch deep” by design, but there is a lot packed into that inch. There is enough here to stimulate you to take deeper dives into theory, and there is enough here to permit you to construct “smart enough” business operations with a relatively small amount of the right information. James Taylor developed this concept for automating operational decision-making in the area of enterprise decision management (Raden and Taylor, 2007). Taylor recognized that companies need decision-making systems that are automated enough to keep up with the volume and time-critical nature of modern business operations. These decisions should be deliberate, precise, and consistent across the enterprise; smart enough to serve immediate needs appropriately; and agile enough to adapt to new opportunities and challenges in the company. The same concept can be applied to nonoperational systems for customer relationship management (CRM) and marketing support. Even though a CRM model for cross sell may not be optimal, it may enable several times the response rate in product sales following a marketing campaign. Models like this are “smart enough” to drive companies to the next level of sales. When models like this are proliferated throughout the enterprise to lift all sales to the next level, more refined models can be developed to do even better. This enterprise-wide “lift” in intelligent operations can drive a company through evolutionary rather than revolutionary changes to reach long-term goals. Companies can leverage “smart enough” decision systems to do likewise in their pursuit of optimal profitability in their business.
Clearly, the use of this book and these tools will not make you experts in data mining. Nor will the explanations in the book permit you to understand the complexity of the theory behind the algorithms and methodologies so necessary for the academic student. But we will conduct you through a relatively thin slice across the wide practice of data mining in many industries and disciplines. We can show you how to create powerful predictive models in your own organization in a relatively short period of time. In addition, this book can function as a springboard to launch you into higher-level studies of the theory behind the practice of data mining. If we can accomplish those goals, we will have succeeded in taking a significant step in bringing the practice of data mining into the mainstream of business analysis.
The three coauthors could not have done this book completely by themselves, and we wish to thank the following individuals, with the disclaimer that we apologize if, by our neglect, we have left out of this “thank-you list” anyone who contributed.
Foremost, we would like to thank acquisitions editor (name to use?) and others (names). Bob Nisbet would like to honor and thank his wife, Jean Nisbet, PhD, who blasted him off in his technical career by retyping his PhD dissertation five times (before word processing) and assumed much of the family's burdens during the writing of this book. Bob also thanks Dr. Daniel B. Botkin, the famous global ecologist, for introducing him to the world of modeling and exposing him to the distinction between viewing the world as machine and viewing it as organism. And thanks are due to Ken Reed, PhD, for inducting Bob into the practice of data mining.
Coauthor Gary Miner wishes to thank his wife, Linda A. Winters-Miner, PhD, who has been working with Gary on similar books over the past 30 years and wrote several of the tutorials included in this book, using real-world data. Gary also wishes to thank the following people from his office who helped in various ways, including Angela Waner, Jon Hillis, Greg Sergeant, and Dr. Thomas Hill, who gave permission to use and also edited a group of the tutorials that had been written over the years by some of the people listed as guest authors in this book. Dr. Dave Dimas, of the University of California, Irvine, has also been very helpful in providing suggestions for enhancements for this second edition. THANK YOU DAVE !!!
Without all the help of the people mentioned here and maybe many others we failed to specifically mention, this book would never have been completed. Thanks to you all!

Bob Nisbet
Gary Miner
Ken Yale

Reference
Raden, N., Taylor, J., 2007. Smart Enough Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions. Prentice Hall, NJ, ISBN: 9780132713061.
The Project Gutenberg eBook of Nacha
Regules: Novela
This ebook is for the use of anyone anywhere in the United States
and most other parts of the world at no cost and with almost no
restrictions whatsoever. You may copy it, give it away or re-use it
under the terms of the Project Gutenberg License included with this
ebook or online at www.gutenberg.org. If you are not located in the
United States, you will have to check the laws of the country where
you are located before using this eBook.
Language: Spanish
The spelling rules of Spanish when this work was first published differed from those in force when the transcription was made.
For example, "vió", "fué", and "dió" were written with an accent mark at that time, whereas words that now carry an accent mark, such as "reír" and "oír", did not carry one when the work was published.
The criterion followed in this transcription has been to respect the spelling rules in force at the time the work was published. Only obvious errors of spelling, printing, and/or punctuation have been corrected.
The book cover was modified by the Transcriber and has been placed in the public domain.
The index of chapters has been added by the Transcriber.
MANUEL GÁLVEZ
NACHA REGULES
A NOVEL
EDITORIAL PAX
BUENOS AIRES
1919
I DEDICATE THIS BOOK
TO WOMEN OF HEART,
SO THAT THEY MAY NOT BE UNAWARE OF HOW SAD IS THE LIFE
OF THEIR SISTERS WHO HAVE FALLEN,
AND MAY TAKE PITY ON THEM AND OFFER THEM THEIR HANDS
TO LIFT THEM FROM THE TERRIBLE ABYSS.
INDEX OF CHAPTERS
CHAPTER
I
II
III
IV
V
VI
VII
VIII
IX
X
XI
XII
XIII
XIV
XV
XVI
XVII
XVIII
XIX
XX
XXI
XXII
XXIII
XXIV
EPILOGUE
I