An Introduction to Spatial
Data Science with GeoDa
Volume 2 – Clustering Spatial Data

This book is the second in a two-volume series that introduces the field of spatial data sci-
ence. It moves beyond pure data exploration to the organization of observations into meaningful
groups, i.e., spatial clustering. This constitutes an important component of so-called unsuper-
vised learning, a major aspect of modern machine learning.
The distinctive aspects of the book are the spatialization of classic clustering methods through
linked maps and graphs, as well as the explicit introduction of spatial contiguity
constraints into clustering algorithms. Leveraging a large number of real-world empirical il-
lustrations, readers will gain an understanding of the main concepts and techniques and their
relative advantages and disadvantages. The book also constitutes the definitive user’s guide for
these methods as implemented in the GeoDa open-source software for spatial analysis.
It is organized into three major parts, dealing with dimension reduction (principal components,
multidimensional scaling, stochastic network embedding), classic clustering methods (hierar-
chical clustering, k-means, k-medians, k-medoids and spectral clustering) and spatially con-
strained clustering methods (both hierarchical and partitioning). It closes with an assessment of
spatial and non-spatial cluster properties.
The book is intended for readers interested in going beyond simple mapping of geographical
data to gain insight into interesting patterns as expressed in spatial clusters of observations.
Familiarity with the material in Volume 1 is assumed, especially the analysis of local spatial au-
tocorrelation and the full range of visualization methods.

Luc Anselin is the Founding Director of the Center for Spatial Data Science at the University
of Chicago, where he is also Stein-Freiler Distinguished Service Professor of Sociology and the
College, as well as a member of the Committee on Data Science. He is the creator of the GeoDa
software and an active contributor to the PySAL Python open-source software library for spatial
analysis. He has written widely on topics dealing with the methodology of spatial data analysis,
including his classic 1988 text on Spatial Econometrics. His work has been recognized by many
awards, such as his election to the U.S. National Academy of Sciences and the American Academy
of Arts and Sciences.
An Introduction to Spatial
Data Science with GeoDa
Volume 2 – Clustering Spatial Data

Luc Anselin
Designed cover image: © Luc Anselin

First edition published 2024


by CRC Press
2385 NW Executive Center Drive, Suite 320, Boca Raton FL 33431

and by CRC Press


4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2024 Luc Anselin

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot as-
sume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have
attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders
if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please
write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including pho-
tocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the
Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are
not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for iden-
tification and explanation without intent to infringe.

ISBN: 978-1-032-71302-1 (hbk)


ISBN: 978-1-032-71316-8 (pbk)
ISBN: 978-1-032-71317-5 (ebk)

DOI: 10.1201/9781032713175

Typeset in Latin Modern font


by KnowledgeWorks Global Ltd.
Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.
To Emily
Contents

List of Figures xi

Preface xvii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii

About the Author xix

1 Introduction 1
1.1 Overview of Volume 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Sample Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

I Dimension Reduction 5
2 Principal Component Analysis (PCA) 7
2.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Matrix Algebra Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . . . . 9
2.2.2 Matrix decompositions . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Principal Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3.2 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Visualizing principal components . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4.1 Scatter plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4.2 Multivariate decomposition . . . . . . . . . . . . . . . . . . . . . . . 20
2.5 Spatializing Principal Components . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.1 Principal component map . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.2 Univariate cluster map . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.3 Principal components as multivariate cluster maps . . . . . . . . . . 22

3 Multidimensional Scaling (MDS) 25


3.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Classic Metric Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.1 Mathematical Details . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 SMACOF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.1 Mathematical Details . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 Visualizing MDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4.1 MDS and Parallel Coordinate Plot . . . . . . . . . . . . . . . . . . . 39
3.4.2 MDS Scatter Plot with Categories . . . . . . . . . . . . . . . . . . . 39
3.4.3 3-D MDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.5 Spatializing MDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41


3.5.1 MDS and Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41


3.5.2 MDS Spatial Weights . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.5.3 MDS Neighbor Match Test . . . . . . . . . . . . . . . . . . . . . . . 44
3.5.4 HDBSCAN and MDS . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4 Stochastic Neighbor Embedding (SNE) 47


4.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.2 Basics of Information Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2.1 Stochastic Neighbors . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.3 t-SNE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.3.1 Cost Function and Optimization . . . . . . . . . . . . . . . . . . . . 50
4.3.2 Large Data Applications (Barnes-Hut) . . . . . . . . . . . . . . . . . . 51
4.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4.1 Animation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.4.2 Tuning the Optimization . . . . . . . . . . . . . . . . . . . . . . . . 57
4.4.3 Interpretation and Spatialization . . . . . . . . . . . . . . . . . . . . 59
4.5 Comparing Distance Preserving Methods . . . . . . . . . . . . . . . . . . . 60
4.5.1 Comparing t-SNE Options . . . . . . . . . . . . . . . . . . . . . . . . 60
4.5.2 Local Fit with Common Coverage Percentage . . . . . . . . . . . . . . 61

II Classic Clustering 63
5 Hierarchical Clustering Methods 65
5.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2 Dissimilarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Agglomerative Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.3.1 Linkage and Updating Formula . . . . . . . . . . . . . . . . . . . . . 69
5.3.2 Dendrogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.4.1 Variable Settings Dialog . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.4.2 Ward’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.4.3 Single linkage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.4.4 Complete linkage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.4.5 Average linkage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4.6 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

6 Partitioning Clustering Methods 85


6.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.2 The K Means Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.2.1 Iterative Relocation . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.2.2 The Choice of K . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.2.3 K-means++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.3.1 Digression: Clustering with Dimension Reduction . . . . . . . . . . . 94
6.3.2 Cluster Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.3.3 Cluster Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.3.4 Options and Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . 98
6.4 Cluster Categories as Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.4.1 Conditional Box Plot . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.4.2 Aggregation by Cluster . . . . . . . . . . . . . . . . . . . . . . . . . 102

7 Advanced Clustering Methods 105


7.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
7.2 K-Medians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
7.2.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.2.2 Options and Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . 108
7.3 K-Medoids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.3.1 The PAM Algorithm for K-Medoids . . . . . . . . . . . . . . . . . . 109
7.3.2 Improving on the PAM Algorithm . . . . . . . . . . . . . . . . . . . 115
7.3.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.3.4 Options and Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . 120

8 Spectral Clustering 121


8.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
8.2 Spectral Clustering Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.3 Clustering as a Graph Partitioning Problem . . . . . . . . . . . . . . . . . . 123
8.3.1 Graph Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
8.4 The Spectral Clustering Algorithm . . . . . . . . . . . . . . . . . . . . . . . 125
8.4.1 Adjacency matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
8.4.2 Clustering on the Eigenvectors of the Graph Laplacian . . . . . . . . 126
8.4.3 Spectral Clustering Parameters . . . . . . . . . . . . . . . . . . . . . 126
8.5 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.5.1 Cluster results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.5.2 Options and Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . 129

III Spatial Clustering 131


9 Spatializing Classic Clustering Methods 133
9.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
9.2 Clustering on Geographic Coordinates . . . . . . . . . . . . . . . . . . . . . 134
9.2.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
9.3 Including Geographical Coordinates in the Feature Set . . . . . . . . . . . . 136
9.3.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
9.4 Weighted Optimization of Geographical and Attribute Similarity . . . . . . 139
9.4.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
9.4.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
9.5 Constructing a Spatially Contiguous Solution . . . . . . . . . . . . . . . . . 144
9.5.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

10 Spatially Constrained Clustering – Hierarchical Methods 149


10.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
10.2 Spatially Constrained Hierarchical Clustering (SCHC) . . . . . . . . . . . . 150
10.2.1 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
10.2.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
10.3 SKATER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
10.3.1 Pruning the Minimum Spanning Tree . . . . . . . . . . . . . . . . . 157
10.3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
10.4 REDCAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
10.4.1 Illustration – FullOrder-CompleteLinkage . . . . . . . . . . . . . . . 163
10.4.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
10.5 Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

11 Spatially Constrained Clustering – Partitioning Methods 167


11.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
11.2 Automatic Zoning Procedure (AZP) . . . . . . . . . . . . . . . . . . . . . . 168
11.2.1 AZP Heuristic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
11.2.2 Tabu Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
11.2.3 Simulated Annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
11.2.4 ARiSeL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
11.2.5 Using the Outcome from Another Cluster Routine as the Initial
Feasible Region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
11.2.6 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
11.2.7 Search Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
11.2.8 Initialization Options . . . . . . . . . . . . . . . . . . . . . . . . . . 178
11.3 Max-P Region Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
11.3.1 Max-p Heuristic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
11.3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
11.3.3 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

IV Assessment 189
12 Cluster Validation 191
12.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
12.2 Internal Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
12.2.1 Traditional Measures of Fit . . . . . . . . . . . . . . . . . . . . . . . 193
12.2.2 Balance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
12.2.3 Join Count Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
12.2.4 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
12.2.5 Connectedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
12.2.6 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
12.3 External Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
12.3.1 Classic Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
12.3.2 Visualizing Cluster Match . . . . . . . . . . . . . . . . . . . . . . . . 200
12.4 Beyond Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

Bibliography 205

Index 211
List of Figures

2.1 Clusters > PCA | MDS | t-SNE . . . . . . . . . . . . . . . . . . . . . . . . 8


2.2 Vectors in Two-Dimensional Space . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Italy Bank Characteristics Descriptive Statistics . . . . . . . . . . . . . . . 14
2.4 PCA Settings Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 PCA Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Principal Components in the Data Table . . . . . . . . . . . . . . . . . . . 16
2.7 Variable Loadings Using the EIGEN Method . . . . . . . . . . . . . . . . 18
2.8 Principal Component Calculation . . . . . . . . . . . . . . . . . . . . . . . 18
2.9 Principal Component Scatter Plot . . . . . . . . . . . . . . . . . . . . . . . 19
2.10 SVD versus EIGEN Results . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.11 Principal Components and PCP . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.12 Box Map of Second Principal Component . . . . . . . . . . . . . . . . . . . 22
2.13 Principal Component Local Geary Cluster Map and PCP . . . . . . . . . . 23
2.14 Principal Component and Multivariate Local Geary Cluster Map . . . . . 23

3.1 Clusters > PCA | MDS | t-SNE . . . . . . . . . . . . . . . . . . . . . . . . 26


3.2 MDS Main Dialog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 MDS Scatter Plot – Default Settings . . . . . . . . . . . . . . . . . . . . . . 31
3.4 MDS and PCA Scatter Plot . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.5 MDS Dimensions and PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.6 MDS Power Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.7 Iterative Majorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.8 MDS Scatter Plot for SMACOF . . . . . . . . . . . . . . . . . . . . . . . . 36
3.9 MDS Scatter Plot for SMACOF with Manhattan Distance . . . . . . . . . 37
3.10 MDS Classic vs SMACOF Dimensions . . . . . . . . . . . . . . . . . . . . 37
3.11 MDS and Parallel Coordinate Plot . . . . . . . . . . . . . . . . . . . . . . 38
3.12 MDS with a Categorical Variable . . . . . . . . . . . . . . . . . . . . . . . 39
3.13 3D MDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.14 3D MDS after Rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.15 Linked Local Moran Cluster Map and MDS . . . . . . . . . . . . . . . . . 42
3.16 Common Connectivity K-6 Neighbors . . . . . . . . . . . . . . . . . . . . . 43
3.17 Local Neighbor Match Test for MDS . . . . . . . . . . . . . . . . . . . . . 45
3.18 Multivariate Local Neighbor Match Test . . . . . . . . . . . . . . . . . . . 45
3.19 HDBSCAN for MDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.1 Clusters > PCA | MDS | t-SNE . . . . . . . . . . . . . . . . . . . . . . . . 48


4.2 t-SNE Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.3 t-SNE after full iteration – default settings . . . . . . . . . . . . . . . . . . 54
4.4 t-SNE 2D Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.5 Inspecting iterations of t-SNE algorithm: 250 and 300 . . . . . . . . . . 56
4.6 Inspecting iterations of t-SNE algorithm: 1000 and 3000 . . . . . . . . . . 56
4.7 t-SNE 2D Coordinates for theta = 0 . . . . . . . . . . . . . . . . . . . . . 57


4.8 t-SNE 2D Coordinates for perplexity=50 . . . . . . . . . . . . . . . . . . . 58


4.9 t-SNE 2D Coordinates for perplexity=50, momentum=100 . . . . . . . . . 58
4.10 Nearest Neighbor Match Test for t-SNE . . . . . . . . . . . . . . . . . . . 59
4.11 Common Coverage Percentage measures of fit for different methods . . . . . 61

5.1 Clusters > K Means | K Medians | K Medoids | Spectral | Hierarchical . . 67


5.2 Single Linkage Hierarchical Clustering Toy Example . . . . . . . . . . . . . 72
5.3 Coordinate Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.4 Single Linkage – Step 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.5 Smallest nearest neighbors . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.6 Single Linkage – Step 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.7 Single Linkage – Step 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.8 Single Linkage – Step 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.9 Single Linkage – Step 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.10 Single Linkage – Step 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.11 Single linkage iterations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.12 Single linkage dendrogram . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.13 CMAP SDOH Variables Descriptive Statistics . . . . . . . . . . . . . . . . 76
5.14 Hierarchical Clustering Variable Settings . . . . . . . . . . . . . . . . . . . 77
5.15 Initial Dendrogram – Ward’s Method . . . . . . . . . . . . . . . . . . . . . 77
5.16 Dendrogram – Ward’s Method, k=5 . . . . . . . . . . . . . . . . . . . . . . 78
5.17 Summary – Ward’s Method, k=5 . . . . . . . . . . . . . . . . . . . . . . . 78
5.18 Cluster Map – Ward’s Method, k=5 . . . . . . . . . . . . . . . . . . . . . . 79
5.19 Dendrogram – Single Linkage, k=5 . . . . . . . . . . . . . . . . . . . . . . 80
5.20 Summary – Single Linkage, k=5 . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.21 Cluster Map – Single Linkage, k=5 . . . . . . . . . . . . . . . . . . . . . . . 81
5.22 Dendrogram – Complete Linkage, k=5 . . . . . . . . . . . . . . . . . . . . 82
5.23 Summary – Complete Linkage, k=5 . . . . . . . . . . . . . . . . . . . . . . 82
5.24 Cluster Map – Complete Linkage, k=5 . . . . . . . . . . . . . . . . . . . . 83
5.25 Dendrogram – Average Linkage, k=5 . . . . . . . . . . . . . . . . . . . . . 83
5.26 Summary – Average Linkage, k=5 . . . . . . . . . . . . . . . . . . . . . . . 84
5.27 Cluster Map – Average Linkage, k=5 . . . . . . . . . . . . . . . . . . . . . 84

6.1 Clusters > K Means | K Medians | K Medoids | Spectral | Hierarchical . . 86


6.2 K-means toy example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.3 Worked example – basic data . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.4 Steps in the K-means algorithm . . . . . . . . . . . . . . . . . . . . . . . . 89
6.5 Squared distance to seeds . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.6 Step 1 – Summary characteristics . . . . . . . . . . . . . . . . . . . . . . . 90
6.7 Squared distance to Step 1 centers . . . . . . . . . . . . . . . . . . . . . . . 90
6.8 Step 2 – Summary characteristics . . . . . . . . . . . . . . . . . . . . . . . 90
6.9 Squared distance to Step 2 centers . . . . . . . . . . . . . . . . . . . . . . . 90
6.10 Step 3 – Summary characteristics . . . . . . . . . . . . . . . . . . . . . . . . 91
6.11 SDOH Census Tract Variables Descriptive Statistics . . . . . . . . . . . . . 93
6.12 Principal Components Variable Loadings . . . . . . . . . . . . . . . . . . . 94
6.13 K-Means Clustering Variable Settings . . . . . . . . . . . . . . . . . . . . . 95
6.14 Summary – K-Means Method, k=8 . . . . . . . . . . . . . . . . . . . . . . 96
6.15 PCP Plot of K-Means Cluster Centers . . . . . . . . . . . . . . . . . . . . 96
6.16 Cluster Map – K-Means Method, k=8 . . . . . . . . . . . . . . . . . . . . 97
6.17 K-Means – Evolution of fit with k . . . . . . . . . . . . . . . . . . . . . . . 98
6.18 K-Means – Elbow Plot WSS . . . . . . . . . . . . . . . . . . . . . . . . . . 99

6.19 K-Means – Elbow Plot BSS/TSS . . . . . . . . . . . . . . . . . . . . . . . 99


6.20 Summary – K-Means Method, population bound 200,000, k=8 . . . . . . . 100
6.21 Cluster Map – K-Means Method, population bound 200,000, k=8 . . . . . 100
6.22 Conditional Box Plot by Cluster Category . . . . . . . . . . . . . . . . . . . 101
6.23 Dissolved Cluster Categories . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.24 Aggregated Cluster Categories . . . . . . . . . . . . . . . . . . . . . . . . . 102

7.1 Summary – K-Medians Method, k=8 . . . . . . . . . . . . . . . . . . . . . 106


7.2 Cluster Map – K-Medians Method, k=8 . . . . . . . . . . . . . . . . . . . 107
7.3 Manhattan Inter-Point Distance Matrix . . . . . . . . . . . . . . . . . . . . 109
7.4 BUILD – Step 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.5 Cluster Assignment at end of BUILD stage . . . . . . . . . . . . . . . . . . 110
7.6 PAM SWAP Case 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.7 PAM SWAP Case 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.8 PAM SWAP Cost Changes – Step 1 . . . . . . . . . . . . . . . . . . . . . . 112
7.9 PAM SWAP Cost Changes – Step 2 . . . . . . . . . . . . . . . . . . . . . . 113
7.10 PAM SWAP Cost Changes – Step 3 . . . . . . . . . . . . . . . . . . . . . . 114
7.11 PAM SWAP Cost Changes – Step 4 . . . . . . . . . . . . . . . . . . . . . . 115
7.12 PAM SWAP Final Assignment . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.13 Summary, K-Medoids, k=8 . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
7.14 K-Medoids Cluster Map, k=8 . . . . . . . . . . . . . . . . . . . . . . . . . 119

8.1 Spirals Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121


8.2 K-Means and K-Medoids for Spirals Data . . . . . . . . . . . . . . . . . . 122
8.3 Spectral Clustering of Spirals Data . . . . . . . . . . . . . . . . . . . . . . 123
8.4 Cluster Characteristics for Spectral Clustering of Spirals Data – Default
Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.5 Cluster Characteristics for Spectral Clustering of Spirals Data – Mutual
KNN=3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.6 Cluster Characteristics for Spectral Clustering of Spirals Data – Mutual
KNN=3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.7 Cluster Characteristics for Spectral Clustering of Spirals Data – Gaussian
Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.8 Cluster Characteristics for Spectral Clustering of Spirals Data – Gaussian
Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

9.1 K-Means and K-Medoids on X-Y Coordinates . . . . . . . . . . . . . . . . 135


9.2 Spectral and Hierarchical on X-Y Coordinates . . . . . . . . . . . . . . . . 135
9.3 K-Means without and with X-Y Coordinates included . . . . . . . . . . . . 136
9.4 K-Means Cluster Characteristics . . . . . . . . . . . . . . . . . . . . . . . . 137
9.5 K-Means Cluster Characteristics with X-Y Coordinates . . . . . . . . . . . 138
9.6 Connectivity Check for Cluster Members . . . . . . . . . . . . . . . . . . . . 141
9.7 Cluster Map, K-Means with Weighted Optimization . . . . . . . . . . . . . 142
9.8 Cluster Characteristics, K-Means with Weighted Optimization . . . . . . . 143
9.9 Cluster Characteristics, K-Means with Weights at 0.25 . . . . . . . . . . . 144
9.10 Make Spatial Cluster 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
9.11 Make Spatial Cluster 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
9.12 Make Spatial Cluster Map . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
9.13 Make Spatial Cluster Characteristics . . . . . . . . . . . . . . . . . . . . . 147

10.1 Clusters > SCHC | skater | redcap . . . . . . . . . . . . . . . . . . . . . . 150



10.2 Arizona counties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152


10.3 County identifiers and standardized variable . . . . . . . . . . . . . . . . . 152
10.4 AZ county queen contiguity (GAL) . . . . . . . . . . . . . . . . . . . . . . 152
10.5 SCHC complete linkage step 1 . . . . . . . . . . . . . . . . . . . . . . . . . 152
10.6 SCHC complete linkage step 2 . . . . . . . . . . . . . . . . . . . . . . . . . 153
10.7 SCHC complete linkage dendrogram, k=4 . . . . . . . . . . . . . . . . . . 153
10.8 SCHC complete linkage cluster map, k=4 . . . . . . . . . . . . . . . . . . 154
10.9 SCHC variable selection dialog . . . . . . . . . . . . . . . . . . . . . . . . . 154
10.10 SCHC cluster map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
10.11 SCHC cluster characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . 156
10.12 SKATER minimum spanning tree . . . . . . . . . . . . . . . . . . . . . . . 157
10.13 SKATER step 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
10.14 SKATER minimum spanning tree – first split . . . . . . . . . . . . . . . . 158
10.15 SKATER minimum spanning tree – final split . . . . . . . . . . . . . . . . 159
10.16 SKATER cluster map, k=4 . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
10.17 Ceará, first cut in MST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
10.18 Ceará, SKATER cluster map, k=12 . . . . . . . . . . . . . . . . . . . . . . 160
10.19 Ceará, SKATER cluster characteristics, k=12 . . . . . . . . . . . . . . . . . 161
10.20 Ceará, constrained SKATER cluster map, k=7 with min population size . 162
10.21 Ceará, constrained SKATER cluster map, k=12 with min cluster size 10 . 162
10.22 REDCAP FullOrder-CompleteLinkage spanning tree . . . . . . . . . . . . 164
10.23 REDCAP step 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
10.24 Ceará, REDCAP cluster map, k=12 . . . . . . . . . . . . . . . . . . . . . . 165
10.25 Ceará, REDCAP cluster characteristics, k=12 . . . . . . . . . . . . . . . . 166

11.1 Clusters > AZP | max-p . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168


11.2 Arizona AZP initial feasible solution . . . . . . . . . . . . . . . . . . . . . 170
11.3 Arizona AZP initial Total Within SSD . . . . . . . . . . . . . . . . . . . . 170
11.4 Initial neighbor list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
11.5 Step 1 neighbor selection (2) . . . . . . . . . . . . . . . . . . . . . . . . . . 170
11.6 Arizona AZP step 2 neighbor selection – 14 . . . . . . . . . . . . . . . . . . 171
11.7 Arizona AZP step 2 Total Within SSD . . . . . . . . . . . . . . . . . . . . . 171
11.8 Arizona AZP step 3 neighbor selection – 3 . . . . . . . . . . . . . . . . . . 172
11.9 Arizona AZP step 3 Total Within SSD . . . . . . . . . . . . . . . . . . . . 172
11.10 Ceará AZP clusters for p=12, local search . . . . . . . . . . . . . . . . . . 176
11.11 Ceará AZP clusters for p=12, tabu search . . . . . . . . . . . . . . . . . . 176
11.12 Ceará AZP clusters for p=12, simulated annealing search . . . . . . . . . . 177
11.13 Ceará AZP clusters for p=12, simulated annealing search with ARiSeL . . 178
11.14 Ceará AZP clusters for p=12, simulated annealing search with SCHC initial
regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
11.15 Arizona counties population . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
11.16 Arizona max-p growth, start with 6 . . . . . . . . . . . . . . . . . . . . . . . 181
11.17 Arizona max-p growth phase – 6 and 2 . . . . . . . . . . . . . . . . . . . . 182
11.18 Arizona max-p growth phase – region 1 . . . . . . . . . . . . . . . . . . . . 182
11.19 Arizona max-p growth phase – regions 1 and 2 . . . . . . . . . . . . . . . . 182
11.20 Arizona max-p growth phase – pick 3 . . . . . . . . . . . . . . . . . . . . . 183
11.21 Arizona max-p growth phase – join 3 and 14 . . . . . . . . . . . . . . . . . 183
11.22 Arizona max-p growth phase – region 1, 2, 3 . . . . . . . . . . . . . . . . . 183
11.23 Arizona max-p growth phase – pick 12 . . . . . . . . . . . . . . . . . . . . 184
11.24 Arizona max-p growth phase – join 12 and 4 . . . . . . . . . . . . . . . . . 184
11.25 Arizona max-p enclaves and initial regions 1, 2, 3, 4 . . . . . . . . . . . . . 184

11.26 Arizona max-p assign enclaves . . . . . . . . . . . . . . . . . . . . . . . . . 185


11.27 Arizona max-p – feasible initial regions . . . . . . . . . . . . . . . . . . . . 186
11.28 Ceará max-p regions with population bound 422,619 . . . . . . . . . . . . 186
11.29 Ceará max-p regions with population bound 253,571 . . . . . . . . . . . . 187

12.1 Clusters > Cluster Match Map | Make Spatial | Validation . . . . . . . . 192
12.2 Hierarchical Clustering – Ward’s method, Ceará . . . . . . . . . . . . . . . 196
12.3 Internal Validation Measures . . . . . . . . . . . . . . . . . . . . . . . . . . 196
12.4 Internal Validation Result – Hierarchical Clustering . . . . . . . . . . . . . 197
12.5 Internal Validation Result – AZP with Initial Region . . . . . . . . . . . . 198
12.6 Adjusted Rand Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
12.7 Normalized Information Distance . . . . . . . . . . . . . . . . . . . . . . . 200
12.8 K-Means and SKATER overlap . . . . . . . . . . . . . . . . . . . . . . . . . 201
12.9 SCHC and REDCAP overlap . . . . . . . . . . . . . . . . . . . . . . . . . . 201
12.10 Cluster Match Map – SKATER and K-MEANS . . . . . . . . . . . . . . . 202
Preface

In contrast to the materials covered in Volume 1, this second volume has no precedent in
an earlier workbook. Much of its contents have been added in recent years to the GeoDa
documentation pages, as the topics were gradually included into my Introduction to Spatial
Data Science course and implemented in GeoDa. At one point, the material became too much
to constitute a single course and was split off into a separate Spatial Clustering course. The
division of the content between the two volumes follows this organization.
In contrast to the first volume, where the focus is almost exclusively on data exploration,
here attention switches to the delineation of groupings of observations, i.e., clusters. Both
traditional and spatially constrained methods are considered. Again, the emphasis is on how
a spatial perspective can contribute to additional insight, both by considering the spatial
aspects explicitly (as in spatially constrained clustering) as well as through spatializing
classic techniques.
Compared to Volume 1, the treatment is slightly more mathematical and familiarity with the
methods covered in the first volume is assumed. As before, extensive references are provided.
However, in contrast to the first volume, several methods included here are new and have
not been treated extensively in earlier publications. They were typically introduced as part
of the documentation of new features in GeoDa.
The empirical illustrations use the same sample data sets as in Volume 1. These are included
in the software.
All applications are based on Version 1.22 of the software, available in Summer 2023. Later
versions may include slight changes as well as additional features, but the treatment provided
here should remain valid. The software is free, cross-platform and open source, and can be
downloaded from https://geodacenter.github.io/download.html.

Acknowledgments
This second volume is based on enhancements in the GeoDa software implemented in the
past five or so years, with Xun Li as the lead software engineer and Julia Koschinsky as
a constant source of inspiration and constructive comments. The software development
received institutional support by the University of Chicago to the Center for Spatial Data
Science.
Help and suggestions with the production process from Lara Spieker of Chapman & Hall are
greatly appreciated.
As for the first volume, Emily has been patiently living with my GeoDa obsession for many
years. This volume is also dedicated to her.
Shelby, MI, Summer 2023

About the Author

Luc Anselin is the Founding Director of the Center for Spatial Data Science at the University
of Chicago, where he is also Stein-Freiler Distinguished Service Professor of Sociology and the
College. He previously held faculty appointments at Arizona State University, the University
of Illinois at Urbana-Champaign, the University of Texas at Dallas, the Regional Research
Institute at West Virginia University, the University of California, Santa Barbara, and The
Ohio State University. He also was a visiting professor at Brown University and MIT. He
holds a PhD in Regional Science from Cornell University.
Over the past four decades, he has developed new methods for exploratory spatial data
analysis and spatial econometrics, including the widely used local indicators of spatial
autocorrelation. His 1988 Spatial Econometrics text has been cited some 17,000 times. He
has implemented these methods into software, including the original SpaceStat software, as
well as GeoDa, and as part of the Python PySAL library for spatial analysis.
His work has been recognized by several awards, including election to the U.S. National
Academy of Sciences and the American Academy of Arts and Sciences.

1
Introduction

This second volume in the Introduction to Spatial Data Science is devoted to the topic
of spatial clustering. More specifically, it deals with the grouping of observations into a
smaller number of clusters, which are designed to be representative of their members. The
techniques considered constitute an important part of so-called unsupervised learning in
modern machine learning. Purely statistical methods to discover spatial clusters in data are
beyond the scope of this volume.
In contrast to Volume 1, which assumed very little prior (spatial) knowledge, the current
volume is somewhat more advanced. At a minimum, it requires familiarity with the scope
of the exploratory toolbox included in the GeoDa software. In that sense, it clearly builds
upon the material covered in Volume 1. Important principles that are a main part of the
discussion in Volume 1 are assumed known. This includes linking and brushing, the various
types of maps and graphs, spatial weights and spatial autocorrelation statistics.
Much of the material covered in this volume pertains to methods that have been incorporated
into the GeoDa software only in the past few years, so as to support the second part of an
Introduction to Spatial Data Science course sequence. The particular perspective offered is the
tight integration of the clustering results with a spatial representation, through customized
cluster maps and by exploiting linking and brushing.
The treatment is slightly more technical than in the previous volume, but the mathematical
details can readily be skipped if the main interest is in application and interpretation.
Necessarily, the discussion relies on somewhat more formal concepts. Some examples are the
treatment of matrix eigenvalues and matrix decomposition, the concept of graph Laplacian,
essentials of information theory, elements of graph theory, advanced spatial data structures
such as quadtree and vantage point tree, and optimization algorithms like gradient search,
iterative greedy descent, simulated annealing and tabu search. These concepts are not
assumed known but will be explained in the text.
While many of the methods covered constitute part of mainstream data science, the perspec-
tive offered here is rather unique, with an enduring attempt at spatializing the respective
methods. In addition, the treatment of spatially constrained clustering introduces contiguity
as an additional element into clustering algorithms.
Most methods discussed are familiar from the literature, but some are new. Examples
include the common coverage percentage (a local measure of goodness of fit between
distance-preserving dimension reduction methods); two new spatial measures to assess cluster
quality, the join count ratio and the cluster match map; a heuristic to obtain contiguous results
from classic clustering results; and a hybrid approach toward spatially constrained clustering,
whereby the outcome of a given method is used as the initial feasible region in a second
method. The techniques are the result of refinements in the software and the presentation
of cluster results, and have not been published previously. In addition, the various methods
to spatialize cluster results are mostly also unique to the treatment in this volume.


As in Volume 1, the coverage here also constitutes the definitive user’s guide to the GeoDa
software, complementing the previous discussion.
In the remainder of this introduction, I provide a broad overview of the organization of
Volume 2, followed by a listing of the sample data sets used. As was the case for Volume 1,
these data sets are included as part of the GeoDa software and do not need to be downloaded
separately. For a quick tour of the GeoDa software, I refer to the Introduction of Volume 1.

1.1 Overview of Volume 2


Volume 2 is organized into four parts:
• Dimension reduction
• Classic clustering
• Spatial clustering
• Assessment
The first part reviews classic dimension reduction techniques, divided into three chapters,
devoted to principal components, multidimensional scaling and stochastic neighbor embed-
ding. In addition to the discussion of the classic properties, specific attention is paid to
spatializing these techniques, i.e., bringing out interesting spatial aspects of the results.
Part II covers classic clustering methods, in contrast to spatially constrained clusters, which
are the topic of Part III. Four chapters deal with, respectively, hierarchical clustering methods,
partitioning clustering methods (K-Means), advanced methods (K-Medians and K-Medoids)
and spectral clustering.
The chapters in Part III deal with methods to include an explicit spatial constraint of
contiguity into the clustering routine. The first chapter outlines techniques to spatialize
classic clustering methods, which involve soft spatial constraints. These techniques do not
guarantee a spatially compact (contiguous) solution. In contrast, the methods discussed in
the next two chapters impose hard spatial constraints. One chapter deals with hierarchical
approaches (spatially constrained hierarchical clustering, SKATER and REDCAP) and the
other with partitioning methods (AZP and max-p).
Part IV deals with assessment and includes a final chapter outlining a range of approaches
to validate the cluster results, both in terms of internal validity and external validity. It
closes with some concluding remarks.
As before, in addition to the material covered in this volume, the GeoDaCenter Github
site (https://geodacenter.github.io) contains an extensive support infrastructure. This
includes detailed documentation and illustrations, as well as a large collection of sample data
sets, cookbook examples and links to a YouTube channel containing lectures and tutorials.
Specific software support is provided by means of a list of frequently asked questions and
answers to common technical questions, as well as by the community through the Google
Groups Openspace list.

1.2 Sample Data Sets


As in Volume 1, the methods and software are illustrated by means of empirical examples
that are available directly from inside the GeoDa software. In Volume 2, only a subset of the
full slate of sample data are used.
These are
• Italian community banks (n=261)
– bank performance indicators for 2011-17 (used by Algeri et al., 2022)
– see Chapters 2 through 4
• Chicago CCA Profiles (n=77)
– socio-economic snapshot for Chicago Community Areas in 2020 (American Commu-
nity Survey from the Chicago Metropolitan Agency for Planning – CMAP – data
portal)
– see Chapter 5
• Chicago Census Tracts (n=791)
– socio-economic determinants of health in 2014 (a subset of the data used in Kolak
et al., 2020)
– see Chapters 6 and 7
• Spirals (n=300)
– canonical data set to test spectral clustering
– see Chapter 8
• Municipalities in the State of Ceará, Brazil (n=184)
– Zika and Microcephaly infections and socio-economic profiles for 2013-2016 (adapted
from Amaral et al., 2019)
– see Chapters 9 through 12
Further details are provided in the context of specific methods.
Part I

Dimension Reduction
2
Principal Component Analysis (PCA)

The familiar curse of dimensionality affects analysis across two dimensions. One is the
number of observations (big data) and the other is the number of variables considered.
The methods included in Volume 2 address this problem by reducing the dimensionality,
either in the number of observations (clustering) or in the number of variables (dimension
reduction). The three chapters in Part I address the latter problem. This chapter covers
principal components analysis (PCA), a core method of both multivariate statistics and
machine learning. Dimension reduction is particularly relevant in situations where many
variables are available that are highly intercorrelated. In essence, the original variables are
replaced by a smaller number of proxies that represent them well in terms of their statistical
properties.
Before delving into the formal derivation of principal components, a brief review is included
of some basic concepts from matrix algebra, focusing in particular on matrix decomposition.
Next follows a discussion of the mathematical properties of principal components and their
implementation and interpretation.
A distinct characteristic of this chapter is the attention paid to spatializing the inherently
non-spatial concept of principal components. This is achieved by exploiting geovisualization,
linking and brushing to represent the dimension reduction in geographic space. Of particular
interest are principal component maps and the connection between univariate local cluster
maps for principal components and their multivariate counterpart.
The methods are illustrated using the Italy Community Banks sample data set.

2.1 Topics Covered


• Understand the mathematics behind principal component analysis
• Compute principal components for a set of variables
• Interpret the characteristics of a principal component analysis
• Spatialize the principal components
• Investigate the connection between clustering of principal components and multivariate
clustering
GeoDa Functions
• Clusters > PCA
– select variables
– PCA parameters
– PCA summary statistics
– saving PCA results


Toolbar Icons

Figure 2.1: Clusters > PCA | MDS | t-SNE

2.2 Matrix Algebra Review


Before moving on to the mathematics of principal components analysis, a brief review of
some basic matrix algebra concepts is included here. Readers already familiar with this
material can easily skip this section.
Vectors and matrices are ways to collect a lot of information and manipulate it in a concise
mathematical way. One can think of a matrix as a table with rows and columns, and a
vector as a single row or column. In two dimensions, i.e., for two values, a vector can be
visualized as an arrow between the origin of the coordinate system (0,0) and a given point.
The first value corresponds to the x-axis and the second value corresponds to the y-axis. In
Figure 2.2, this is illustrated for the vector:
\[
v = \begin{bmatrix} 1 \\ 2 \end{bmatrix}.
\]
The arrow in the figure connects the origin of an x−y scatter plot to the point (x = 1, y = 2).
A central application in matrix algebra is the multiplication of vectors and matrices. The
simplest case is the multiplication of a vector by a scalar (i.e., a single number). Graphically,
multiplying a vector by a scalar just moves the end point further or closer on the same slope.
For example, multiplying the vector v by the scalar 2 gives:
\[
2 \times \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}
\]
This is equivalent to moving the arrow over on the same slope from (1,2) to the point (2,4)
further from the origin, as shown by the dashed red line in Figure 2.2.

Figure 2.2: Vectors in Two-Dimensional Space

Multiplying a matrix by a vector is slightly more complex, but again corresponds to a simple
geometric transformation. For example, consider the 2 × 2 matrix A:
\[
A = \begin{bmatrix} 1 & 3 \\ 3 & 2 \end{bmatrix}.
\]
The result of a multiplication of a 2 × 2 matrix by a 2 × 1 column vector is a 2 × 1 column
vector. The first element of this vector is obtained as the product of the matching elements
of the first row with the vector, the second element similarly as the product of the matching
elements of the second row with the vector. In the example, this boils down to:
\[
Av = \begin{bmatrix} (1 \times 1) + (3 \times 2) \\ (3 \times 1) + (2 \times 2) \end{bmatrix} = \begin{bmatrix} 7 \\ 5 \end{bmatrix}.
\]
Geometrically, this consists of a combination of rescaling and rotation. For example, in
Figure 2.2, first the slope of the vector is changed, followed by a rescaling to the point (7,5),
as shown by the blue dashed arrows.
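These small calculations are easy to verify by hand or with a few lines of code. The sketch below uses Python with numpy, purely as an illustration outside the GeoDa workflow, to check the scalar multiplication and the matrix-vector product from the example.

```python
import numpy as np

v = np.array([1, 2])            # the example vector (1, 2)
A = np.array([[1, 3],
              [3, 2]])          # the example 2 x 2 matrix

print(2 * v)    # [2 4] -- rescaling only: the point moves out along the same slope
print(A @ v)    # [7 5] -- matrix-vector product: a combination of rescaling and rotation
```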
A case of particular interest is for any matrix A to find a vector v, such that when post-
multiplied by that vector, there is only rescaling and no rotation. In other words, instead of
finding what happens to the point (1,2) after pre-multiplying by the matrix A, the interest
focuses on finding a particular vector that just moves a point up or down on the same slope
for that particular matrix. As it turns out, there are several such solutions. This problem
is known as finding eigenvectors and eigenvalues for a matrix. It has a broad range of
applications, including in the computation of principal components.

2.2.1 Eigenvalues and eigenvectors


The eigenvectors and eigenvalues of a square symmetric matrix A are a special scalar-vector
pair, such that Av = λv, where λ is the eigenvalue and v is the eigenvector. In addition, the
different eigenvectors are such that they are orthogonal to each other. This means that the
product of two different eigenvectors is zero, i.e., vᵤ′vₖ = 0 (for u ≠ k).¹ Also, the sum of
squares of the eigenvector elements equals one. In vector notation, vᵤ′vᵤ = 1.
¹ The product of a row vector with a column vector is a scalar. The symbol ′ stands for the transpose of a
vector, in this case, a column vector that is turned into a row vector.

What does this mean? For an eigenvector (i.e., arrow from the origin), the transformation
by A does not rotate the vector, but simply rescales it (i.e., moves it further or closer to the
origin), by exactly the factor λ.
For the example matrix A, the two eigenvectors turn out to be [0.6464 0.7630] and [-0.7630
0.6464], with associated eigenvalues 4.541 and -1.541. Each square matrix has as many
eigenvectors and matching eigenvalues as its rank, in this case 2 – for a 2 by 2 nonsingular
matrix. The actual computation of eigenvalues and eigenvectors is rather complicated, and
is beyond the scope of this discussion.
To further illustrate this concept, consider post-multiplying the matrix A with its eigenvector
[0.6464 0.7630]:
\[
\begin{bmatrix} (1 \times 0.6464) + (3 \times 0.7630) \\ (3 \times 0.6464) + (2 \times 0.7630) \end{bmatrix} = \begin{bmatrix} 2.935 \\ 3.465 \end{bmatrix}
\]
The eigenvector rescaled by the matching eigenvalue gives the same result:
\[
4.541 \times \begin{bmatrix} 0.6464 \\ 0.7630 \end{bmatrix} = \begin{bmatrix} 2.935 \\ 3.465 \end{bmatrix}
\]

In other words, for the point (0.6464 0.7630), a pre-multiplication by the matrix A just
moves it by a multiple of 4.541 to a new location on the same slope, without any rotation.
With the eigenvectors stacked in a matrix V, it is easy to verify that they are orthogonal and that the sums of squares of the coefficients equal one, i.e., V′V = I (with I as the identity matrix):
\[
\begin{bmatrix} 0.6464 & 0.7630 \\ -0.7630 & 0.6464 \end{bmatrix} \begin{bmatrix} 0.6464 & -0.7630 \\ 0.7630 & 0.6464 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]
In addition, it is easily verified that V V′ = I as well. This means that the transpose of V is also its inverse (per the definition of an inverse matrix, i.e., a matrix for which the product with the original matrix yields the identity matrix), or V⁻¹ = V′.
Eigenvectors and eigenvalues are central in many statistical analyses, but it is important
to realize they are not as complicated as they may seem at first sight. On the other hand,
computing them efficiently is complicated, and best left to specialized programs.
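As a minimal illustration of relying on such a specialized routine, the sketch below uses NumPy (an assumption; GeoDa carries out these computations internally and no code is required) to reproduce the eigenvalues and eigenvectors of the example matrix A:

```python
import numpy as np

# the example 2 x 2 symmetric matrix
A = np.array([[1.0, 3.0],
              [3.0, 2.0]])

# eigh is designed for symmetric matrices and returns real eigenvalues
eigenvalues, eigenvectors = np.linalg.eigh(A)

print(eigenvalues)    # approximately [-1.541, 4.541]
print(eigenvectors)   # columns are unit-length eigenvectors (signs may differ)

# the defining property A v = lambda v, for the largest eigenvalue
v = eigenvectors[:, 1]
print(A @ v)                  # approximately [2.935, 3.465], up to sign
print(eigenvalues[1] * v)     # the same vector

# the sum of the eigenvalues equals the trace, their product the determinant
print(eigenvalues.sum(), np.trace(A))        # both 3.0
print(eigenvalues.prod(), np.linalg.det(A))  # both -7.0
```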
Finally, a couple of useful properties of eigenvalues are worth mentioning.
The sum of the eigenvalues equals the trace of the matrix. The trace is the sum of the
diagonal elements. For the matrix A in the example, the trace is 1 + 2 = 3. The sum of the
two eigenvalues is 4.541 − 1.541 = 3.
In addition, the product of the eigenvalues equals the determinant of the matrix. For a
2 × 2 matrix with elements a, b in the first row and c, d in the second, the determinant is ad − bc, or the product of the diagonal elements minus the
product of the off-diagonal elements. In the example, that is (1 × 2) − (3 × 3) = −7. The
product of the two eigenvalues is 4.541 × −1.541 = −7.0.

2.2.2 Matrix decompositions


In many applications in statistics and data science, a lot is gained by representing the
original matrix by a product of special matrices, typically related to the eigenvectors and
eigenvalues. These are so-called matrix decompositions. Two cases are particularly relevant
for principal component analysis, as well as many other applications: spectral decomposition
and singular value decomposition.
2.2.2.1 Spectral decomposition


For each eigenvector v of a square symmetric matrix A, it holds that Av = λv. This can
be written compactly for all the eigenvectors and eigenvalues by organizing the individual
eigenvectors as columns in a k × k matrix V, with k as the dimension of the matrix A.
Similarly, the matching eigenvalues can be organized as the diagonal elements of a k × k
diagonal matrix, say G.
The basic eigenvalue expression can then be written as

AV = V G.

Note that V goes first in the matrix multiplication on the right hand side to ensure that
each column of V is multiplied by the corresponding eigenvalue on the diagonal of G to yield
λv. Taking advantage of the fact that the eigenvectors are orthogonal, namely that V V′ = I, means that post-multiplying each side of the equation by V′ yields AV V′ = V GV′, or

A = V GV′.

This is the so-called eigenvalue decomposition or spectral decomposition of the square symmetric matrix A.
For any n × k matrix of standardized observations X (i.e., n observations on k variables),
the square matrix X′X corresponds to the correlation matrix. The spectral decomposition of this matrix yields:
X′X = V GV′,
with V as a matrix with the eigenvectors as columns, and G as a diagonal matrix containing
the eigenvalues. This property can be used to construct the principal components of the
matrix X.
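A minimal sketch of this property, assuming NumPy and a small synthetic data matrix (the numbers are made up purely for illustration; dividing the cross-product by n turns it into the correlation matrix exactly):

```python
import numpy as np

rng = np.random.default_rng(12345)
n, k = 200, 4
X = rng.normal(size=(n, k))                  # synthetic data, illustration only

# standardize each column: mean zero, variance one
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# cross-product of standardized data, scaled by n, is the correlation matrix
C = (Z.T @ Z) / n

# spectral decomposition C = V G V'
eigenvalues, V = np.linalg.eigh(C)
G = np.diag(eigenvalues)

print(np.allclose(V @ G @ V.T, C))        # True: the matrix is reconstructed
print(np.allclose(V.T @ V, np.eye(k)))    # True: eigenvectors are orthonormal
```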

2.2.2.2 Singular value decomposition (SVD)


The spectral matrix decomposition only applies to square matrices. A more general de-
composition is the so-called singular value decomposition, which applies to any rectangular
matrix, such as the n × k matrix X with (standardized) observations directly, rather than
its correlation matrix.
For a full-rank n × k matrix X (there are many more general cases where SVD can be
applied), the decomposition takes the following form:

X = U DV′,

where U is an n × k orthonormal matrix (i.e., U′U = I), D is a k × k diagonal matrix and V is a k × k orthonormal matrix (i.e., V′V = I).
While SVD is very general, the full generality is not needed for the PCA case. It turns out
that there is a direct connection between the eigenvalues and eigenvectors of the (square)
correlation matrix X′X and the SVD of X. Using the SVD decomposition above, and exploiting the orthonormal properties of the various matrices, the product X′X can be written as:
\[
X'X = (UDV')'(UDV') = V D U'U D V' = V D^2 V',
\]
since U′U = I.
It thus turns out that the columns of V from the SVD decomposition contain the eigenvectors of X′X. In addition, the squares of the diagonal elements of D in the SVD are the eigenvalues
of the correlation matrix. Or, equivalently, the diagonal elements of the matrix D are the
square roots of the eigenvalues of X′X. This property can be exploited to derive the principal
components of the matrix X.
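The correspondence between the SVD of X and the eigenstructure of X′X can be checked numerically. The sketch below again assumes NumPy and synthetic standardized data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardized observations

# singular value decomposition X = U D V'
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# eigen-decomposition of the cross-product X'X
eigenvalues, V = np.linalg.eigh(X.T @ X)

# squared singular values equal the eigenvalues of X'X
print(np.allclose(np.sort(d ** 2), np.sort(eigenvalues)))        # True

# the right singular vectors match the eigenvectors, up to ordering and sign
print(np.allclose(np.abs(Vt.T), np.abs(V[:, ::-1])))             # True
```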

2.3 Principal Components


Principal components analysis has an eminent historical pedigree, going back to pioneering
work in the early twentieth century by the statistician Karl Pearson (Pearson, 1901) and the
economist Harold Hotelling (Hotelling, 1933). The technique is also known as the Karhunen-
Loève transform in probability theory, and as empirical orthogonal functions or EOF in
meteorology (see, for example, in applications of space-time statistics in Cressie and Wikle,
2011; Wikle et al., 2019).
The derivation of principal components can be approached from a number of different
perspectives, all leading to the same solution. Perhaps the most common treatment considers
the components as the solution of a problem of finding new variables that are constructed as
a linear combination of the original variables, such that they maximize the explained variance.
In a sense, the principal components can be interpreted as the best linear approximation to
the multivariate point cloud of the data.
The point of departure is to organize n observations on k variables x_h, with h = 1, . . . , k, as an n × k matrix X (each variable is a column in the matrix). In practice, each of the
variables is typically standardized, such that its mean is zero and variance equals one. This
avoids problems with (large) scale differences between the variables (i.e., some are very small
numbers and others very large). For such standardized variables, the k × k cross product
matrix X′X corresponds to the correlation matrix (without standardization, this would be
the variance-covariance matrix).2
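A short sketch of the standardization step, assuming NumPy; the small data matrix is a placeholder, not the Italian bank data used later in the chapter:

```python
import numpy as np

def standardize(X):
    """Return z-scores: each column gets mean zero and variance one."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# placeholder n x k matrix of raw observations
X = np.array([[10.0, 200.0, 3.0],
              [12.0, 180.0, 2.5],
              [ 9.0, 240.0, 4.1],
              [11.0, 210.0, 3.6]])

Z = standardize(X)
n = Z.shape[0]

# for standardized variables, Z'Z (scaled by n) is the correlation matrix
print(np.allclose(Z.T @ Z / n, np.corrcoef(X, rowvar=False)))   # True
```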
The goal is to find a small number of new variables, the so-called principal components, that
explain the bulk of the variance (or, in practice, the correlation) in the original variables. If
this can be accomplished with a much smaller number of variables than in the original set,
the objective of dimension reduction will have been achieved.
Each principal component z_u is a linear combination of the original variables x_h, with h going from 1 to k, such that:
\[
z_u = a_1 x_1 + a_2 x_2 + \cdots + a_k x_k
\]

The mathematical problem is to find the coefficients ah such that the new variables maximize
the explained variance of the original variables. In addition, to avoid an indeterminate
solution, the coefficients are scaled such that the sum of their squares equals 1.
A full mathematical treatment of the derivation of the optimal solution to this problem
is beyond the current scope (for details, see, e.g., Lee and Verleysen, 2007, Chapter 2).
Nevertheless, obtaining a basic intuition for the mathematical principles involved is useful.
The coefficients by which the original variables need to be multiplied to obtain each principal
component can be shown to correspond to the elements of the eigenvectors of X′X, with the
2 The standardization should not be done mechanically, since there are instances where the variance differences between the variables are actually meaningful, e.g., when the scales on which they are measured have a strong substantive meaning (e.g., in psychology).
associated eigenvalue giving the explained variance. Even though the original data matrix X
is typically not square (of dimension n × k), the cross-product matrix X′X is of dimension
k × k, so it is square and symmetric. As a result, all the eigenvalues are real numbers, which
avoids having to deal with complex numbers.
Operationally, the principal component coefficients are obtained by means of a matrix
decomposition. One option is to compute the spectral decomposition of the k × k matrix X′X, i.e., of the correlation matrix. As shown in Section 2.2.2.1, this yields:

X′X = V GV′,

where V is a k × k matrix with the eigenvectors as columns (the coefficients needed to construct the principal components) and G a k × k diagonal matrix of the associated eigenvalues (the explained variance).
A principal component for each observation is obtained by multiplying the row of standardized observations by the corresponding column of eigenvector coefficients, i.e., a column of the matrix V. More formally, all
the principal components are obtained concisely as:

XV.

A second and computationally preferred way to approach this is as a singular value decom-
position (SVD) of the n × k matrix X, i.e., the matrix of (standardized) observations. From
Section 2.2.2.2, this follows as
X = U DV′,
where again V (the transpose of the k × k matrix V′) is the matrix with the eigenvectors of X′X as columns, and D is a k × k diagonal matrix containing the square roots of the eigenvalues of X′X on the diagonal.3 Note that the number of eigenvalues used in the
spectral decomposition and in SVD is the same, and equals k, the column dimension of X.
Since V′V = I, the following result obtains when both sides of the SVD decomposition are post-multiplied by V:
XV = U DV′V = U D.
In other words, the principal components XV can also be obtained as the product of the
orthonormal matrix U with a diagonal matrix containing the square root of the eigenvalues,
D. This result is important in the context of multidimensional scaling, considered in
Chapter 3.
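A sketch of this equivalence, assuming NumPy and synthetic standardized data: the scores obtained as XV (projection on the eigenvectors) coincide with those obtained as UD from the SVD.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD route: scores are U D
U, d, Vt = np.linalg.svd(X, full_matrices=False)
scores_svd = U @ np.diag(d)

# projection route: scores are X V
scores_proj = X @ Vt.T

print(np.allclose(scores_svd, scores_proj))   # True: identical component scores
```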
It turns out that the SVD approach is the solution to viewing the principal components
explicitly as a dimension reduction problem, originally considered by Karl Pearson. The
observed vector on the k variables x can be expressed as a function of a number of unknown
latent variables z, such that there is a linear relationship between them:

x = Az,

where x is a k × 1 vector of the observed variables, and z is a p × 1 vector of the (unobserved) latent variables, ideally with p much smaller than k. The matrix A is of dimension k × p and contains the coefficients of the transformation. Again, in order to avoid indeterminate solutions, the coefficients are scaled such that A′A = I, which ensures that the sum of squares of the coefficients associated with a given component equals one.
3 Since the eigenvalues equal the variance explained by the corresponding component, the diagonal elements of D are thus the standard deviation explained by the component.
Figure 2.3: Italy Bank Characteristics Descriptive Statistics

Instead of maximizing explained variance, the objective is now to find A and z such that
the so-called reconstruction error is minimized.4
Importantly, different computational approaches to obtain the eigenvalues and eigenvectors
(there is no analytical solution) may yield opposite signs for the elements of the eigenvectors.
However, the eigenvalues will be the same. The sign of the eigenvectors will affect the sign
of the resulting component, i.e., positives become negatives. For example, this can be the
difference between results based on a spectral decomposition versus SVD.
In a principal component analysis, the interest typically focuses on three main results. First,
the principal component scores are used as a replacement for the original variables. This
is particularly relevant when a small number of components explain a substantial share of
the original variance. Second, the relative contribution of each of the original variables to
each principal component is of interest. Finally, the variance proportion explained by each
component in and of itself is also important.

2.3.1 Implementation
Principal components are invoked from the drop-down list created by the toolbar Clusters
icon (Figure 2.1) as the top item (more precisely, the first item in the dimension reduction
category). Alternatively, from the main menu, Clusters > PCA gets the process started.
The illustration uses ten variables that characterize the efficiency of community banks, based
on the observations for 2013 from the Italy Community Bank sample data set (see Algeri
et al., 2022):
• CAPRAT: ratio of capital over risk weighted assets
• Z: z score of return on assets (ROA) + leverage over the standard deviation of ROA
• LIQASS: ratio of liquid assets over total assets
• NPL: ratio of non performing loans over total loans
• LLP: ratio of loan loss provision over customer loans
• INTR: ratio of interest expense over total funds
• DEPO: ratio of total deposits over total assets
• EQLN: ratio of total equity over customer loans
4 The concept of reconstruction error is somewhat technical. If A were a square matrix, one could solve for z as z = A⁻¹x, where A⁻¹ is the inverse of the matrix A. However, due to the dimension reduction, A is not square, so something called a pseudo-inverse or Moore-Penrose inverse must be used. This is the p × k matrix (A′A)⁻¹A′, such that z = (A′A)⁻¹A′x. Furthermore, because A′A = I, this simplifies to z = A′x (of course, so far the elements of A are unknown). Since x = Az, if A were known, x could be found as Az, or, as AA′x. The reconstruction error is then the squared difference between x and AA′x. The objective is to find the coefficients for A that minimize this expression. For an extensive technical discussion, see Lee and Verleysen (2007), Chapter 2.
Figure 2.4: PCA Settings Menu

• SERV: ratio of net interest income over total operating revenues
• EXPE: ratio of operating expenses over total assets
Some descriptive statistics are listed in Figure 2.3. An analysis of individual box plots (not
shown) reveals that most distributions are quite skewed, with only NPL, INTR and DEPO
not showing any outliers. SERV is the only variable with outliers at the low end of the
distribution. All the other variables have a considerable number of outlying observations at
the high end (see Algeri et al., 2022, for a substantive discussion of the variables).
The correlation matrix (not shown) includes both very strong linear relations between pairs
of variables as well as very low ones. For example, NPL is highly correlated with SERV
(−0.90) and LLP (0.64), as is CAPRAT with EQLN (0.87), but LIQASS is essentially
uncorrelated with NPL (−0.004) and SERV (0.01).
Selection of PCA brings up the PCA Settings Menu, which is the main interface to
specify all the settings. This interface has a similar structure for all multivariate analyses
and is shown in Figure 2.4.
The top dialog is the interface to Select Variables. The default Method to compute the
various coefficients is SVD. The other option is Eigen, which uses spectral decomposition.
By default, all variables are transformed using Standardize (Z), such that the mean is zero and the standard deviation is one.5
The Run button computes the principal components and brings up the results in the
right-hand panel, as shown in Figure 2.5.
The result summary is evaluated in more detail in Section 2.3.2.
5 A full list of the standardization options in GeoDa is given in Chapter 2 of Volume 1.
Figure 2.5: PCA Results

Figure 2.6: Principal Components in the Data Table

2.3.1.1 Saving the principal components


Once the computation is finished, the resulting principal components become available to
be added to the Data Table as new variables. The Components drop-down list suggests
the number of components based on the 95% variance criterion (see Section 2.3.2). In the
example, this is 7.
Invoking the Save button brings up a dialog to specify the variable names for the principal
components, with as default PC1, PC2, etc. These variables are added to the Data Table
and become available for any subsequent analysis or visualization. This is illustrated in
Figure 2.6 for seven components based on the ten original bank variables.

2.3.1.2 Saving the result summary


The listing of the PCA results including eigenvalues, loadings and squared correlations can
be saved to a text file by a right-click in the window and selecting Save. The resulting text
file is an exact replica of the result listing.
2.3.2 Interpretation
The panel with summary results (Figure 2.5) provides several statistics pertaining to the
variance decomposition, the eigenvalues, the variable loadings and the contribution of each
of the original variables to the respective components.

2.3.2.1 Explained variance


After listing the PCA method (here the default SVD), the first item in the results panel
gives the Standard deviation explained by each of the components. It corresponds to
the square root of the Eigenvalues (each eigenvalue equals the variance explained by the
corresponding principal component), which are listed as well. In the example, the first
eigenvalue is 2.98675, which is thus the variance of the first component. Consequently, the
standard deviation is the square root of this value, i.e., 1.728221, given as the first item in
the list.
The sum of all the eigenvalues is 10, which equals the number of variables, or, more precisely,
the rank of the matrix X  X. Therefore, the proportion of variance explained by the first
component is 2.98675/10 = 0.2987, as reported in the list. Similarly, the proportion explained
by the second component is 0.1763, so that the cumulative proportion of the first and second
components amounts to 0.2987 + 0.1763 = 0.4750. In other words, the first two components
explain a little less than half of the total variance.
The fraction of the total variance explained is listed both as a separate Proportion and as
a Cumulative proportion. The latter is typically used to choose a cut-off for the number
of components. A common convention is to take a threshold of 95%, which would suggest 7
components in the example.
An alternative criterion to select the number of components is the so-called Kaiser criterion
(Kaiser, 1960), which suggests to take the components for which the eigenvalue exceeds 1.
In the example, this would yield 3 components (they explain slightly more than 60% of the
total variance).
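The mechanics of both criteria follow directly from the eigenvalues. In the sketch below (NumPy assumed), only the first two eigenvalues reflect values quoted or implied in the text; the remaining eight are hypothetical placeholders chosen so that the totals behave like the example:

```python
import numpy as np

# first two eigenvalues as reported/implied above; the rest are placeholders
eigenvalues = np.array([2.98675, 1.763, 1.31, 0.95, 0.90,
                        0.85, 0.80, 0.20, 0.15, 0.09025])

proportion = eigenvalues / eigenvalues.sum()     # variance share per component
cumulative = np.cumsum(proportion)               # basis for the 95% rule

n_95pct = int(np.searchsorted(cumulative, 0.95)) + 1   # components needed for 95%
n_kaiser = int((eigenvalues > 1.0).sum())              # Kaiser: eigenvalue > 1

print(proportion[:2])     # ~[0.2987, 0.1763]
print(cumulative[:2])     # ~[0.2987, 0.4750]
print(n_95pct, n_kaiser)  # 7 and 3 with these placeholder values
```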
The bottom part of the results panel is occupied by two tables that have the original variables
as rows and the components as columns.

2.3.2.2 Variable loadings


The first table that relates principal components to the original variables shows the Vari-
able Loadings. For each principal component (column), this lists the elements of the
corresponding eigenvector. The eigenvectors are standardized such that the sum of the
squared coefficients equals one. The elements of the eigenvector are the coefficients by which
the (standardized) original variables need to be multiplied to construct each component (see
Section 2.3.2.3).
It is important to keep in mind that the signs of the loadings may switch, depending on the
algorithm that is used in the computation. However, the absolute value of the coefficients
remains the same. In the example, setting Method to Eigen yields the loadings shown in
Figure 2.7.
For PC2, PC6, PC7 and PC9, the signs for the loadings are the opposite of those reported
in Figure 2.5. This needs to be kept in mind when interpreting the actual value (and sign)
of the components and when using the components as variables in subsequent analyses (e.g.,
regression) and visualization (see Section 2.4).
Figure 2.7: Variable Loadings Using the EIGEN Method

Figure 2.8: Principal Component Calculation

When the original variables are all standardized, each eigenvector coefficient gives a measure
of the relative contribution of a variable to the component in question.

2.3.2.3 Variable loadings and principal components


The detailed computation of the principal components is illustrated for the first two obser-
vations on the first component in Figure 2.8. The names of the standardized variables are
listed in the leftmost column, followed by the principal component loadings (the values in
the PC1 column in Figure 2.5). The next column shows the standardized values for each
variable for the first observation. These are multiplied by the matching loadings, with the products shown in column four and their sum listed at the bottom. The value of 2.2624 matches the entry in the first row under
PC1 in Figure 2.6.
Similarly, the value of 1.1769 for the second row under PC1 is obtained at the bottom of
column 6. Similar calculations can be carried out to verify the other entries.
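In other words, each score is simply the inner product of the loadings with the standardized values of one observation. A sketch with NumPy; both arrays are placeholders, not the actual numbers from Figure 2.8:

```python
import numpy as np

# placeholder loadings of the ten variables on PC1
loadings_pc1 = np.array([0.42, -0.15, 0.10, -0.33, -0.28,
                         0.45, -0.12, 0.41, 0.30, -0.37])

# placeholder standardized values for one observation
z_scores_obs1 = np.array([1.2, -0.4, 0.3, -1.1, -0.8,
                          1.5, 0.2, 1.3, 0.9, -1.0])

# the principal component score is the sum of the element-wise products
score = float(loadings_pc1 @ z_scores_obs1)
print(score)
```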

2.3.2.4 Substantive interpretation – squared correlation


The interpretation and substantive meaning of the principal components can be a challenge.
In factor analysis, a number of rotations are applied to clarify the contribution of each
variable to the different components. The latter are then imbued with meaning such as
“social deprivation,” “religious climate,” etc. Principal component analysis tends to stay
away from this, but nevertheless, it is useful to consider the relative contribution of each
variable to the respective components.
The table labeled as Squared correlations lists those statistics between each of the
original variables in a row and the principal component listed in the column. Each row of
the table shows how much of the variance in the original variable is explained by each of the
components. As a result, the values in each row sum to one.
More insightful is the analysis of each column, which indicates which variables are important
in the computation of the matching component. In the example, INTR (61.7%), EQLN
(57.3%) and CAPRAT (51.8%) dominate the contributions to the first principal component.
In the second component, the main contributor is NPL (52.6%), as well as Z (31.1%). This
Figure 2.9: Principal Component Scatter Plot

provides a way to interpret how the multiple dimensions along the ten original variables are
summarized into the main principal components.
Since the correlations are squared, they do not depend on the sign of the eigenvector elements,
unlike the loadings.
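The squared correlations table can be reproduced by correlating each standardized variable with each component score and squaring the result. A sketch assuming NumPy and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(120, 5))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# component scores via the SVD of the standardized data
U, d, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U @ np.diag(d)

k = Z.shape[1]
sq_corr = np.empty((k, k))
for i in range(k):        # original variable (row)
    for j in range(k):    # principal component (column)
        r = np.corrcoef(Z[:, i], scores[:, j])[0, 1]
        sq_corr[i, j] = r ** 2

# with all k components retained, each row sums to one
print(np.allclose(sq_corr.sum(axis=1), 1.0))   # True
```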

2.4 Visualizing principal components


Once the principal components are added to the data table, they are available for analysis
in any of the many statistical graphs in GeoDa (see Chapters 7–8 in Volume 1). Two use
cases warrant some special attention: a scatter plot of any pair of principal components, and
the contribution to a component as visualized by means of a parallel coordinate plot.

2.4.1 Scatter plot


A useful graph is a scatter plot of any pair of principal components. For example, such
a graph is shown for the first two components (based on the SVD method) in Figure
2.9. By construction, the principal component variables are uncorrelated, which yields
the characteristic circular cloud plot. A regression line fit to this scatter plot results in a
horizontal line (with slope zero). Through linking and brushing, any point or subset of points
in the scatter plot can be associated with locations on the map.
In addition, this type of bivariate plot is sometimes employed to visually identify clusters
in the data. Such clusters would be suggested by distinct high-density clumps of points in
the graph. Such points are close in a multivariate sense, since they correspond to a linear
combination of the original variables that maximizes explained variance.6
6 For an example, see, e.g., Chapter 2 of Everitt et al. (2011).
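That the component scores are mutually uncorrelated, and hence yield the circular cloud and a zero-slope fit, can be verified numerically; a small sketch with NumPy and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

U, d, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U @ np.diag(d)

# the correlation matrix of the scores is (numerically) the identity
print(np.round(np.corrcoef(scores, rowvar=False), 6))

# the least-squares slope of PC2 on PC1 is zero
slope = np.polyfit(scores[:, 0], scores[:, 1], 1)[0]
print(abs(slope) < 1e-10)   # True
```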
Figure 2.10: SVD versus EIGEN Results

For example, the density cluster methods from Chapter 20 in Volume 1 could be employed
to identify clusters among the points in the PCA scatter plot. To achieve this, geographical
coordinates would be replaced by coordinates along the two principal component dimensions.
This provides an alternative perspective on multivariate local clusters.
To illustrate the effect of the choice of eigenvalue computation, Figure 2.10 shows a scatter
plot of the second principal component using the SVD method (PC2) and the Eigen method
(PC2e). The sign change is reflected in the perfect negative correlation in the scatter plot.

2.4.2 Multivariate decomposition


Further insight into the connection between a principal component and its constituent
variables can be gained from an investigation of a parallel coordinate plot. In the left-hand
panel of Figure 2.11, the observations from the bottom 10% of the first principal component
are selected (26 observations). The three main contributing variables are shown in the parallel
coordinate plot on the right, in standardized form (s_intr, s_eqln, and s_caprat). The
selected observations suggest a clustering, with low scores on the principal component
corresponding to community banks with a high ratio of interest expenses over total funds, a
low ratio of total equity over customer loans and a low ratio of capital over risk-weighted
assets, all indicators of rather poor performance.
Again, this illustrates how a univariate principal component can be used as a proxy for
multivariate relationships, especially when a high percent of the variance of the original
variables is explained, with a distinct and small number of contributing variables to the
component.
Figure 2.11: Principal Components and PCP

2.5 Spatializing Principal Components


A distinct perspective toward principal component analysis is taken by spatializing the visu-
alization, i.e., by directly connecting the value for a principal component to its geographical
location. As pointed out before, care must be taken in interpreting the results, since the
sign for the component may depend on the method used to compute it. Two special forms
of visualization are considered.
One is a thematic map, which may suggest patterns in the values obtained for the components.
Mapping of principal components was pioneered in botany and numerical ecology, going
back as far as an early paper by Goodall (1954), where principal component scores were
mapped using contour lines. A recent review is given in Dray and Jombart (2011).
Dray and Jombart (2011) also consider the global spatial autocorrelation among individual
principal components, specifically Moran’s I, as well as the principal components associated
with its extension to multivariate analysis (Dray et al., 2008).7 However, they do not consider
local indicators of spatial autocorrelation. The application of those local cluster methods to
principal components provides a univariate alternative to the multivariate cluster analysis
considered in Chapter 18 of Volume 1.
7 A related literature pertains to so-called spatial principal components or spatial factors, but with a focus on global spatial autocorrelation, e.g., Jombart et al. (2008), Frichot et al. (2012).
Figure 2.12: Box Map of Second Principal Component

2.5.1 Principal component map


Figure 2.12 illustrates the caution needed when interpreting a map of principal component
values. In the left-hand panel, a box map is shown of the second principal component
obtained through the SVD method, PC2. On the right is a box map of the same component,
but now computed using the Eigen method, PC2e. Clearly, what is high on the left, is low on
the right. Specifically, the two upper outliers on the left (observations in the red rectangle)
become two lower outliers on the right (observations in the blue rectangle). As a result,
what is high or low is less important than the notion of multivariate similarity. Observations
in the same category share a multivariate similarity that is summarized in the principal
component.

2.5.2 Univariate cluster map


A principal component can be treated as any other variable in a local cluster analysis. For
example, in the left-hand panel of Figure 2.13, the 19 observations identified as the cores of
Low-Low clusters are selected, based on the Local Geary statistic, using queen contiguity for
the point locations, 99,999 permutations and p < 0.01.8 The matching observations in the
parallel coordinate plot on the right illustrate a clustering along multivariate dimensions.
The univariate local cluster map for a principal component can thus be used as a proxy for
multivariate clustering of the variables that are the main contributors to the component.

2.5.3 Principal components as multivariate cluster maps


An even closer look at the correspondence between univariate clustering for a principal
component and its multivariate counterpart is offered by Figure 2.14. On the left is the same
Local Geary cluster map as in Figure 2.13, but now linked to a multivariate Local Geary
cluster map for the three main contributing variables (s_intr, s_eqln, and s_caprat). The
latter is also based on queen contiguity, with 99,999 permutations and p < 0.01. The total
number of significant locations in both maps is very similar: 39 in the univariate map and
8 See Chapter 17 in Volume 1 for technical details.
Figure 2.13: Principal Component Local Geary Cluster Map and PCP

Figure 2.14: Principal Component and Multivariate Local Geary Cluster Map

43 in the multivariate map. Interestingly, the number of spatial outliers is almost identical,
with two of them identified for the same locations on the island of Sicily, highlighted by the
blue rectangle.
There is also close correspondence between several cluster locations. For example, the High-
High cluster in the north-east Trentino region and the Low-Low cluster in the region of
Marche are shared by both maps (highlighted within a green rectangle). While these maps
may give similar impressions, it should be noted that in the multivariate Local Geary each
variable receives the same weight, whereas the principal component is based on different
contributions by each variable.
These findings again suggest that in some instances, a local spatial autocorrelation analysis
for one or a few dominant principal components may provide a viable alternative to a
full-fledged multivariate analysis. This constitutes a spatial aspect of principal components
analysis that is absent in standard treatments of the method.
Taylor & Francis
Taylor & Francis Group
http://taylorandfrancis.com
Another Random Scribd Document
with Unrelated Content
the officers of order and decorum—could such a purpose be
supposed to be thought of? She dresses with neatness, according to
the established order, but always with such modesty that nothing is
offensive to the chastest eye. She understands the range of her
activity and of her affections. It is within the circle of family and
relatives. All her accomplishments are to make her home pleasing.
Duties and places are settled. She lives for those to whom she
belongs, and who also belong to her. Her smiles are for her husband,
and for her children, and her relations. She has no thought of going
abroad to shine, nor to waste the time and money which belong to
her family upon strangers. She never dreams that she has any
mission which calls her away from her home. She has no call to
"clothe the ragged," wash other people's dirty children, reform evil-
doers, "convert the heathen," nor support "Society!" (These are
some of the phrases which you will hear among the Barbarian
women).
Where women have not husbands, none the less they have relatives,
and their home is with them. They have a right to this home, and
are bound to do their duty in it, submissively, usefully, and quietly.
If the Western Barbarians would see to it that all women, married or
unmarried, were duly cared for in homes of relatives, as of right, and
that they also made themselves welcome there by their usefulness
and obedience, they would find an end of that agitation as to
Women's Rights existing among them. Rights would be as
indisputable as duties—and the first of these would be a quiet,
modest, and rational obedience to their natural protectors, who, in
turn, would be bound to respect and protect them. And if by any
strange chance a woman was absolutely without relatives (a thing
nearly impossible in our Flowery Land), then the State should see to
it that she had a suitable home.
The education of woman, in a well-ordered Society, is also fixed and
clear. It has immediate relation to her position and her duties.
She is from the first never disturbed in the natural order. She sees
her relatives always quiet, modest, obedient. She never thinks this
state of things to be wrong. She perceives the manner of female life;
its seclusion, its devotion to the family, its purpose, and end. There
is no complexity about it, no outside glitter, no field for show, no
seeking for excitement and display. All her duties are at home—her
happiness is there; there she is to be attractive, and there she is to
attract—the love and respect of her husband, the regard of her
relatives, the affection and obedience of her children!
So, her education needs no straining after effect. It looks directly to
her duties, to her natural function and place; and to those
accomplishments, of mind and of person, which shall enable her to
be happy with books, with music, and the like; and shall add to the
pleasures of her home.
All these things are common-place with us—so simple as to appear
trivial. Our Illustrious wives and mothers could not understand the
reasons for their elaboration—they have never seen the women of
the Western Barbarians!
The position of women in the Social system of the West, on the
whole, is the most remarkable thing in it.
I have made sufficiently suggestive remarks in the progress of these
Observations; and only now have to add a word or two upon the
general effect.
It gives a wonderful life, restlessness, and colour to the whole
aspect of Barbarian life. Think of all the women in our Illustrious
Land, at once leaving their homes, the seclusion of their orderly
houses and lives, and rushing everywhere with the men, over the
Land! And, not only so, dressed in splendid gaiety of colour, and
adorned with gems and feathers, crowding into all places of
amusement and of travel!
Nor this only, but showing themselves, in public places, with men,
where paintings and sculpture, and things here only seen by men
alone, are exhibited! And, often, so dressed as to cause even the
man to blush!
Why, the face of social life is completely altered. Instead of gravity,
dignity, and an undivided attention to the duties of daily life,
everything is rendered restless, confused; there seems to be no
natural order, nor scarcely natural (cultured) decorum.
But we must not be misled. Nature is too strong to be pushed aside
—and with cultivation, even though imperfect, the moral instinct
lives and saves. Habit, too, "is a second nature;" (as our divine
Confutzi says); and what would be so overwhelming, if at once
done, being usual, necessarily has been subordinated to some rule—
and made, at least, tolerable.
And now, in drawing these Observations to an end, perhaps, I may
add, in respect of my poor and unworthy thoughts, that if I have
said amiss, and which offends, I beg our Illustrious will pardon. To
our Literati, exalted in wisdom, there is but little to which they may
curiously look—but to our people, if any there be with whom some
discontent may have been caused by too close intimacy with
Missionaries in our ports; by these let my poor Observations be
studiously pondered—that they may praise the Sovereign Lord of
Heaven, who has given them to live in the Central and Illustrious
Kingdom; where a true morality and a true worship are known; and
where due ORDER AND PEACE, resting upon the unchangeable
Heavenly order and peace, are established!
Here, are no brutal worship of Force, and admiration of bloody
plunders. Content to the due ordering of affairs, and with peace
within, our Illustrious Realm seeks no aggrandisement, dreams of no
conquests; and wishes to do nothing but good. It has no fears for its
own position, nor jealousy of others. It is simply calm, strong, wise,
and self-poised. It demands no more from others abroad than that it
may peacefully live; and be treated with that respect which it
accords to those who practise moderation and virtue.
FINIS.
Barrett, Sons & Co., Printers, 21, Seething Lane, London, E.C.
*** END OF THE PROJECT GUTENBERG EBOOK SOME
OBSERVATIONS UPON THE CIVILIZATION OF THE WESTERN
BARBARIANS, PARTICULARLY OF THE ENGLISH ***

Updated editions will replace the previous one—the old editions will
be renamed.

Creating the works from print editions not protected by U.S.


copyright law means that no one owns a United States copyright in
these works, so the Foundation (and you!) can copy and distribute it
in the United States without permission and without paying
copyright royalties. Special rules, set forth in the General Terms of
Use part of this license, apply to copying and distributing Project
Gutenberg™ electronic works to protect the PROJECT GUTENBERG™
concept and trademark. Project Gutenberg is a registered trademark,
and may not be used if you charge for an eBook, except by following
the terms of the trademark license, including paying royalties for use
of the Project Gutenberg trademark. If you do not charge anything
for copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such as
creation of derivative works, reports, performances and research.
Project Gutenberg eBooks may be modified and printed and given
away—you may do practically ANYTHING in the United States with
eBooks not protected by U.S. copyright law. Redistribution is subject
to the trademark license, especially commercial redistribution.

START: FULL LICENSE


THE FULL PROJECT GUTENBERG LICENSE
PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK

To protect the Project Gutenberg™ mission of promoting the free


distribution of electronic works, by using or distributing this work (or
any other work associated in any way with the phrase “Project
Gutenberg”), you agree to comply with all the terms of the Full
Project Gutenberg™ License available with this file or online at
www.gutenberg.org/license.

Section 1. General Terms of Use and


Redistributing Project Gutenberg™
electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand, agree
to and accept all the terms of this license and intellectual property
(trademark/copyright) agreement. If you do not agree to abide by all
the terms of this agreement, you must cease using and return or
destroy all copies of Project Gutenberg™ electronic works in your
possession. If you paid a fee for obtaining a copy of or access to a
Project Gutenberg™ electronic work and you do not agree to be
bound by the terms of this agreement, you may obtain a refund
from the person or entity to whom you paid the fee as set forth in
paragraph 1.E.8.

1.B. “Project Gutenberg” is a registered trademark. It may only be


used on or associated in any way with an electronic work by people
who agree to be bound by the terms of this agreement. There are a
few things that you can do with most Project Gutenberg™ electronic
works even without complying with the full terms of this agreement.
See paragraph 1.C below. There are a lot of things you can do with
Project Gutenberg™ electronic works if you follow the terms of this
agreement and help preserve free future access to Project
Gutenberg™ electronic works. See paragraph 1.E below.
1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright law
in the United States and you are located in the United States, we do
not claim a right to prevent you from copying, distributing,
performing, displaying or creating derivative works based on the
work as long as all references to Project Gutenberg are removed. Of
course, we hope that you will support the Project Gutenberg™
mission of promoting free access to electronic works by freely
sharing Project Gutenberg™ works in compliance with the terms of
this agreement for keeping the Project Gutenberg™ name associated
with the work. You can easily comply with the terms of this
agreement by keeping this work in the same format with its attached
full Project Gutenberg™ License when you share it without charge
with others.

1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside the
United States, check the laws of your country in addition to the
terms of this agreement before downloading, copying, displaying,
performing, distributing or creating derivative works based on this
work or any other Project Gutenberg™ work. The Foundation makes
no representations concerning the copyright status of any work in
any country other than the United States.

1.E. Unless you have removed all references to Project Gutenberg:

1.E.1. The following sentence, with active links to, or other


immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project Gutenberg™
work (any work on which the phrase “Project Gutenberg” appears,
or with which the phrase “Project Gutenberg” is associated) is
accessed, displayed, performed, viewed, copied or distributed:
This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it away
or re-use it under the terms of the Project Gutenberg License
included with this eBook or online at www.gutenberg.org. If you
are not located in the United States, you will have to check the
laws of the country where you are located before using this
eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is derived


from texts not protected by U.S. copyright law (does not contain a
notice indicating that it is posted with permission of the copyright
holder), the work can be copied and distributed to anyone in the
United States without paying any fees or charges. If you are
redistributing or providing access to a work with the phrase “Project
Gutenberg” associated with or appearing on the work, you must
comply either with the requirements of paragraphs 1.E.1 through
1.E.7 or obtain permission for the use of the work and the Project
Gutenberg™ trademark as set forth in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is posted


with the permission of the copyright holder, your use and distribution
must comply with both paragraphs 1.E.1 through 1.E.7 and any
additional terms imposed by the copyright holder. Additional terms
will be linked to the Project Gutenberg™ License for all works posted
with the permission of the copyright holder found at the beginning
of this work.

1.E.4. Do not unlink or detach or remove the full Project


Gutenberg™ License terms from this work, or any files containing a
part of this work or any other work associated with Project
Gutenberg™.

1.E.5. Do not copy, display, perform, distribute or redistribute this


electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the Project
Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if you
provide access to or distribute copies of a Project Gutenberg™ work
in a format other than “Plain Vanilla ASCII” or other format used in
the official version posted on the official Project Gutenberg™ website
(www.gutenberg.org), you must, at no additional cost, fee or
expense to the user, provide a copy, a means of exporting a copy, or
a means of obtaining a copy upon request, of the work in its original
“Plain Vanilla ASCII” or other form. Any alternate format must
include the full Project Gutenberg™ License as specified in
paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,


performing, copying or distributing any Project Gutenberg™ works
unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or providing


access to or distributing Project Gutenberg™ electronic works
provided that:

• You pay a royalty fee of 20% of the gross profits you derive
from the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”

• You provide a full refund of any money paid by a user who


notifies you in writing (or by e-mail) within 30 days of receipt
that s/he does not agree to the terms of the full Project
Gutenberg™ License. You must require such a user to return or
destroy all copies of the works possessed in a physical medium
and discontinue all use of and all access to other copies of
Project Gutenberg™ works.

• You provide, in accordance with paragraph 1.F.3, a full refund of


any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.

• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project Gutenberg™


electronic work or group of works on different terms than are set
forth in this agreement, you must obtain permission in writing from
the Project Gutenberg Literary Archive Foundation, the manager of
the Project Gutenberg™ trademark. Contact the Foundation as set
forth in Section 3 below.

1.F.

1.F.1. Project Gutenberg volunteers and employees expend


considerable effort to identify, do copyright research on, transcribe
and proofread works not protected by U.S. copyright law in creating
the Project Gutenberg™ collection. Despite these efforts, Project
Gutenberg™ electronic works, and the medium on which they may
be stored, may contain “Defects,” such as, but not limited to,
incomplete, inaccurate or corrupt data, transcription errors, a
copyright or other intellectual property infringement, a defective or
damaged disk or other medium, a computer virus, or computer
codes that damage or cannot be read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except for


the “Right of Replacement or Refund” described in paragraph 1.F.3,
the Project Gutenberg Literary Archive Foundation, the owner of the
Project Gutenberg™ trademark, and any other party distributing a
Project Gutenberg™ electronic work under this agreement, disclaim
all liability to you for damages, costs and expenses, including legal
fees. YOU AGREE THAT YOU HAVE NO REMEDIES FOR
NEGLIGENCE, STRICT LIABILITY, BREACH OF WARRANTY OR
BREACH OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH
1.F.3. YOU AGREE THAT THE FOUNDATION, THE TRADEMARK
OWNER, AND ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL
NOT BE LIABLE TO YOU FOR ACTUAL, DIRECT, INDIRECT,
CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES EVEN IF
YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you


discover a defect in this electronic work within 90 days of receiving
it, you can receive a refund of the money (if any) you paid for it by
sending a written explanation to the person you received the work
from. If you received the work on a physical medium, you must
return the medium with your written explanation. The person or
entity that provided you with the defective work may elect to provide
a replacement copy in lieu of a refund. If you received the work
electronically, the person or entity providing it to you may choose to
give you a second opportunity to receive the work electronically in
lieu of a refund. If the second copy is also defective, you may
demand a refund in writing without further opportunities to fix the
problem.

1.F.4. Except for the limited right of replacement or refund set forth
in paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO
OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied


warranties or the exclusion or limitation of certain types of damages.
If any disclaimer or limitation set forth in this agreement violates the
law of the state applicable to this agreement, the agreement shall be
interpreted to make the maximum disclaimer or limitation permitted
by the applicable state law. The invalidity or unenforceability of any
provision of this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the Foundation,


the trademark owner, any agent or employee of the Foundation,
anyone providing copies of Project Gutenberg™ electronic works in
accordance with this agreement, and any volunteers associated with
the production, promotion and distribution of Project Gutenberg™
electronic works, harmless from all liability, costs and expenses,
including legal fees, that arise directly or indirectly from any of the
following which you do or cause to occur: (a) distribution of this or
any Project Gutenberg™ work, (b) alteration, modification, or
additions or deletions to any Project Gutenberg™ work, and (c) any
Defect you cause.

Section 2. Information about the Mission


of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new computers.
It exists because of the efforts of hundreds of volunteers and
donations from people in all walks of life.

Volunteers and financial support to provide volunteers with the


assistance they need are critical to reaching Project Gutenberg™’s
goals and ensuring that the Project Gutenberg™ collection will
remain freely available for generations to come. In 2001, the Project
Gutenberg Literary Archive Foundation was created to provide a
secure and permanent future for Project Gutenberg™ and future
generations. To learn more about the Project Gutenberg Literary
Archive Foundation and how your efforts and donations can help,
see Sections 3 and 4 and the Foundation information page at
www.gutenberg.org.

Section 3. Information about the Project


Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-profit
501(c)(3) educational corporation organized under the laws of the
state of Mississippi and granted tax exempt status by the Internal
Revenue Service. The Foundation’s EIN or federal tax identification
number is 64-6221541. Contributions to the Project Gutenberg
Literary Archive Foundation are tax deductible to the full extent
permitted by U.S. federal laws and your state’s laws.

The Foundation’s business office is located at 809 North 1500 West,


Salt Lake City, UT 84116, (801) 596-1887. Email contact links and up
to date contact information can be found at the Foundation’s website
and official page at www.gutenberg.org/contact

Section 4. Information about Donations to


the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission of
increasing the number of public domain and licensed works that can
be freely distributed in machine-readable form accessible by the
widest array of equipment including outdated equipment. Many
small donations ($1 to $5,000) are particularly important to
maintaining tax exempt status with the IRS.

The Foundation is committed to complying with the laws regulating


charities and charitable donations in all 50 states of the United
States. Compliance requirements are not uniform and it takes a
considerable effort, much paperwork and many fees to meet and
keep up with these requirements. We do not solicit donations in
locations where we have not received written confirmation of
compliance. To SEND DONATIONS or determine the status of
compliance for any particular state visit www.gutenberg.org/donate.

While we cannot and do not solicit contributions from states where


we have not met the solicitation requirements, we know of no
prohibition against accepting unsolicited donations from donors in
such states who approach us with offers to donate.

International donations are gratefully accepted, but we cannot make


any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.

Please check the Project Gutenberg web pages for current donation
methods and addresses. Donations are accepted in a number of
other ways including checks, online payments and credit card
donations. To donate, please visit: www.gutenberg.org/donate.

Section 5. General Information About


Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could be
freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose network of
volunteer support.
Project Gutenberg™ eBooks are often created from several printed
editions, all of which are confirmed as not protected by copyright in
the U.S. unless a copyright notice is included. Thus, we do not
necessarily keep eBooks in compliance with any particular paper
edition.

Most people start at our website which has the main PG search
facility: www.gutenberg.org.

This website includes information about Project Gutenberg™,


including how to make donations to the Project Gutenberg Literary
Archive Foundation, how to help produce our new eBooks, and how
to subscribe to our email newsletter to hear about new eBooks.
Welcome to Our Bookstore - The Ultimate Destination for Book Lovers
Are you passionate about books and eager to explore new worlds of
knowledge? At our website, we offer a vast collection of books that
cater to every interest and age group. From classic literature to
specialized publications, self-help books, and children’s stories, we
have it all! Each book is a gateway to new adventures, helping you
expand your knowledge and nourish your soul
Experience Convenient and Enjoyable Book Shopping Our website is more
than just an online bookstore—it’s a bridge connecting readers to the
timeless values of culture and wisdom. With a sleek and user-friendly
interface and a smart search system, you can find your favorite books
quickly and easily. Enjoy special promotions, fast home delivery, and
a seamless shopping experience that saves you time and enhances your
love for reading.
Let us accompany you on the journey of exploring knowledge and
personal growth!

ebookgate.com

You might also like