MAKING SENSE OF
DATA I
A Practical Guide to Exploratory
Data Analysis and Data Mining

Second Edition

GLENN J. MYATT
WAYNE P. JOHNSON
Copyright © 2014 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey


Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as
permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee
to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400,
fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission
should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street,
Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts
in preparing this book, they make no representations or warranties with respect to the accuracy or
completeness of the contents of this book and specifically disclaim any implied warranties of
merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives or written sales materials. The advice and strategies contained herein may not be
suitable for your situation. You should consult with a professional where appropriate. Neither the
publisher nor author shall be liable for any loss of profit or any other commercial damages, including
but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact
our Customer Care Department within the United States at (800) 762-2974, outside the United States
at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print
may not be available in electronic formats. For more information about Wiley products, visit our web
site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Myatt, Glenn J., 1969–
[Making sense of data]
Making sense of data I : a practical guide to exploratory data analysis and data mining /
Glenn J. Myatt, Wayne P. Johnson. – Second edition.
pages cm
Revised edition of: Making sense of data. c2007.
Includes bibliographical references and index.
ISBN 978-1-118-40741-7 (paper)
1. Data mining. 2. Mathematical statistics. I. Johnson, Wayne P. II. Title.
QA276.M92 2014
006.3′12–dc23
2014007303

Printed in the United States of America

ISBN: 9781118407417

10 9 8 7 6 5 4 3 2 1
CONTENTS

PREFACE ix

1 INTRODUCTION 1
1.1 Overview / 1
1.2 Sources of Data / 2
1.3 Process for Making Sense of Data / 3
1.4 Overview of Book / 13
1.5 Summary / 16
Further Reading / 16

2 DESCRIBING DATA 17
2.1 Overview / 17
2.2 Observations and Variables / 18
2.3 Types of Variables / 20
2.4 Central Tendency / 22
2.5 Distribution of the Data / 24
2.6 Confidence Intervals / 36
2.7 Hypothesis Tests / 40
Exercises / 42
Further Reading / 45

3 PREPARING DATA TABLES 47
3.1 Overview / 47
3.2 Cleaning the Data / 48
3.3 Removing Observations and Variables / 49
3.4 Generating Consistent Scales Across Variables / 49
3.5 New Frequency Distribution / 51
3.6 Converting Text to Numbers / 52
3.7 Converting Continuous Data to Categories / 53
3.8 Combining Variables / 54
3.9 Generating Groups / 54
3.10 Preparing Unstructured Data / 55
Exercises / 57
Further Reading / 57

4 UNDERSTANDING RELATIONSHIPS 59
4.1 Overview / 59
4.2 Visualizing Relationships Between Variables / 60
4.3 Calculating Metrics About Relationships / 69
Exercises / 81
Further Reading / 82

5 IDENTIFYING AND UNDERSTANDING GROUPS 83
5.1 Overview / 83
5.2 Clustering / 88
5.3 Association Rules / 111
5.4 Learning Decision Trees from Data / 122
Exercises / 137
Further Reading / 140

6 BUILDING MODELS FROM DATA 141
6.1 Overview / 141
6.2 Linear Regression / 149
6.3 Logistic Regression / 161
6.4 k-Nearest Neighbors / 167
6.5 Classification and Regression Trees / 172
6.6 Other Approaches / 178
Exercises / 179
Further Reading / 182

APPENDIX A ANSWERS TO EXERCISES 185

APPENDIX B HANDS-ON TUTORIALS 191
B.1 Tutorial Overview / 191
B.2 Access and Installation / 191
B.3 Software Overview / 192
B.4 Reading in Data / 193
B.5 Preparation Tools / 195
B.6 Tables and Graph Tools / 199
B.7 Statistics Tools / 202
B.8 Grouping Tools / 204
B.9 Models Tools / 207
B.10 Apply Model / 211
B.11 Exercises / 211

BIBLIOGRAPHY 227
INDEX 231
PREFACE

An unprecedented amount of data is being generated at increasingly rapid
rates in many disciplines. Every day retail companies collect data on sales
transactions, organizations log mouse clicks made on their websites, and
biologists generate millions of pieces of information related to genes.
It is practically impossible to make sense of data sets containing more
than a handful of data points without the help of computer programs.
Many free and commercial software programs exist to sift through data,
such as spreadsheet applications, data visualization software, statistical
packages and scripting languages, and data mining tools. Deciding what
software to use is just one of the many questions that must be considered
in exploratory data analysis or data mining projects. Translating the raw
data collected in various ways into actionable information requires an
understanding of exploratory data analysis and data mining methods and
often an appreciation of the subject matter, business processes, software
deployment, project management methods, change management issues,
and so on.
The purpose of this book is to describe a practical approach for making
sense out of data. A step-by-step process is introduced, which is designed
to walk you through the steps and issues that you will face in data analysis
or data mining projects. It covers the more common tasks relating to
the analysis of data, including (1) how to prepare data prior to analysis,
(2) how to generate summaries of the data, (3) how to identify non-trivial
facts, patterns, and relationships in the data, and (4) how to create models
from the data to better understand the data and make predictions.
The process outlined in the book starts by understanding the problem
you are trying to solve: what data will be used and how, who will use
the information generated and how it will be delivered to them, and the
specific and measurable success criteria against which the project will be
evaluated.
The type of data collected and the quality of this data will directly impact
the usefulness of the results. Ideally, the data will have been carefully
collected to answer the specific questions defined at the start of the project. In
practice, you are often dealing with data generated for an entirely different
purpose. In this situation, it is necessary to thoroughly understand and
prepare the data for the new questions being posed. This is often one of the
most time-consuming parts of the data mining process, where many issues
need to be carefully addressed.
The analysis can begin once the data has been collected and prepared.
The choice of methods used to analyze the data depends on many factors,
including the problem definition and the type of data that has been
collected. Although many methods might solve your problem, you may
not know which one works best until you have experimented with the
alternatives. Throughout the technical sections, the book discusses when
you would apply the different methods and how you could optimize
the results.
After the data is analyzed, it needs to be delivered to your target audience.
This might be as simple as issuing a report or as complex as implementing
and deploying new software to automatically reapply the analysis as new
data becomes available. Beyond the technical challenges, if the solution
changes the way its intended audience operates on a daily basis, that change
will need to be managed. It will also be important to understand how well the
solution implemented in the field actually solves the original business problem.
Larger projects are increasingly implemented by interdisciplinary teams
involving subject matter experts, business analysts, statisticians or data
mining experts, IT professionals, and project managers. This book is aimed
at the entire interdisciplinary team and addresses issues and technical
solutions relating to data analysis or data mining projects. The book also
serves as an introductory textbook for students of any discipline, both
undergraduate and graduate, who wish to understand exploratory data
analysis and data mining processes and methods.
The book covers a series of topics relating to the process of making sense
of data, including the data mining process and how to describe data table
elements (i.e., observations and variables), preparing data prior to analysis,
visualizing and describing relationships between variables, identifying and
making statements about groups of observations, extracting interesting
rules, and building mathematical models that can be used to understand
the data and make predictions.
The book focuses on practical approaches and covers information on
how the techniques operate as well as suggestions for when and how to use
the different methods. Each chapter includes a “Further Reading” section
that highlights additional books and online resources that provide
background as well as more in-depth coverage of the material. Selected
chapters end with a set of exercises designed to help in understanding
the chapter’s material. Appendix B covers a series of practical tutorials
that make use of the freely available Traceis software developed to
accompany the book, which is available from the book’s website:
http://www.makingsenseofdata.com; however, the tutorials could be used
with other available software. Finally, a deck of slides has been developed
to accompany the book’s material and is available on request from the
book’s authors.
The authors wish to thank Chelsey Hill-Esler, Dr. McCullough, and
Vinod Chandnani for their help with the book.
CHAPTER 1

INTRODUCTION

1.1 OVERVIEW

Almost every discipline, from biology and economics to engineering and
marketing, measures, gathers, and stores data in some digital form. Retail
companies store information on sales transactions, insurance companies
keep track of insurance claims, and meteorological organizations measure
and collect data concerning weather conditions. Timely and well-founded
decisions need to be made using the information collected. These
decisions will be used to maximize sales, improve research and development
projects, and trim costs. Retail companies must determine which
products in their stores are under- or over-performing as well as understand
the preferences of their customers; insurance companies need to identify
activities associated with fraudulent claims; and meteorological organizations
attempt to predict future weather conditions.
Data are being produced at faster rates due to the explosion of internet-
related information and the increased use of operational systems to collect
business, engineering and scientific data, and measurements from sensors
or monitors. It is a trend that will continue into the foreseeable future. The
challenges of handling and making sense of this information are significant

Making Sense of Data I: A Practical Guide to Exploratory Data Analysis and Data Mining,
Second Edition. Glenn J. Myatt and Wayne P. Johnson.
© 2014 John Wiley & Sons, Inc. Published 2014 by John Wiley & Sons, Inc.
