Lecture 6
(GIST 4302/5302)
Guofeng Cao
Department of Geosciences
Texas Tech University
Outline of This Week
• Last week, we learned:
– spatial point pattern analysis (PPA)
– focus on location distribution of ‘events’
• This week, we will learn:
– spatial autocorrelation
– global measures of spatial autocorrelation
– local measures of spatial autocorrelation
Spatial Autocorrelation
• Tobler’s first law of geography
• Spatial auto/cross correlation
Spatial Autocorrelation of Areal
Data
Positive spatial autocorrelation
– high values surrounded by nearby high values
– example: 2002 population density
Negative spatial autocorrelation
– high values surrounded by nearby low values
– arises from competition for space
– example: grocery store density
Source: Ron Briggs of UT Dallas
Spatial Neighbors
• Contiguity-based neighbors
– Zones i and j are neighbors if zone i is contiguous with
(adjacent to) zone j
– But what constitutes contiguity?
• Distance-based neighbors
– Zones i and j are neighbors if the distance between them
is less than a threshold distance
– But what distance do we use?
Contiguity-based Spatial Neighbors
• Sharing a border or boundary
– Rook: sharing a border
– Queen: sharing a border or a point
Which one to use?
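The rook and queen rules are easiest to compare on a regular grid. The sketch below is only an illustration under that assumption (real polygon layers need a geometric test for shared borders or points); the function name `grid_neighbors` and the row-by-row cell numbering are hypothetical choices made for this example.

```python
def grid_neighbors(n_rows, n_cols, rule="rook"):
    """Return {cell_index: sorted neighbor indices} for a regular grid.

    Cells are numbered row by row, starting at 0 (an assumption of this sketch).
    """
    if rule == "rook":
        # Rook: cells that share an edge (border)
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        # Queen: cells that share an edge or a corner point
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")

    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            found = []
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    found.append(rr * n_cols + cc)
            neighbors[r * n_cols + c] = sorted(found)
    return neighbors


rook = grid_neighbors(2, 3, "rook")
queen = grid_neighbors(2, 3, "queen")
print(rook[0], queen[0])
```

On a 2 × 3 grid the corner cell gains one extra (diagonal) neighbor under the queen rule, which is why the choice of rule changes the weight matrix.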
Example
[Figure: first-order neighbors and nearest neighbors]
Distance-based Neighbors
• How to measure distance between
polygons?
• Distance metrics
– 2D Cartesian distance (projected data)
– 3D spherical distance/great-circle distance
(lat/long data)
• Haversine formula
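A minimal sketch of the haversine formula for great-circle distance between two lat/long points. The mean Earth radius of 6371 km and the function name `haversine_km` are assumptions of this example, not values given in the lecture.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# A quarter of the way around the equator
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
print(quarter_equator)
```

For projected data, the ordinary 2D Cartesian distance is used instead.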
Distance-based Neighbors
• k-nearest neighbors
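A small sketch of k-nearest neighbors, assuming each zone is represented by a single point (for polygons, a centroid is a common but not the only choice) in projected coordinates. The function name `knn_neighbors` is hypothetical.

```python
import numpy as np


def knn_neighbors(coords, k):
    """Return an (n, k) array: row i holds the indices of the k nearest points to i."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))  # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)           # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


points = [(0, 0), (1, 0), (3, 0), (7, 0)]
nearest = knn_neighbors(points, k=1)
print(nearest.ravel())
```

Note that k-nearest-neighbor relations need not be symmetric: in this example the last point's nearest neighbor is the third point, but the third point's nearest neighbor is the second.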
Spatial Weight Matrix
• Spatial weights can be seen as a list of
weights indexed by a list of neighbors
• If zone j is not a neighbor of zone i, the weight
$w_{ij}$ is set to zero
– The weight matrix can be
illustrated as an image
– The weight matrix is usually sparse
A Simple Example for Rook case
• Matrix contains a:
– 1 if share a border
– 0 if do not share a border
Row Standardization
Example zones (rook case):
A B C
D E F
Binary matrix: the row sum is the total number of neighbors (some zones have more than others)
    A    B    C    D    E    F    Row Sum
A   0    1    0    1    0    0    2
B   1    0    1    0    1    0    3
C   0    1    0    0    0    1    2
D   1    0    0    0    1    0    2
E   0    1    0    1    0    1    3
F   0    0    1    0    1    0    2
Row standardized: divide each number by the row sum (usually use this)
    A    B    C    D    E    F    Row Sum
A   0.0  0.5  0.0  0.5  0.0  0.0  1
B   0.3  0.0  0.3  0.0  0.3  0.0  1
C   0.0  0.5  0.0  0.0  0.0  0.5  1
D   0.5  0.0  0.0  0.0  0.5  0.0  1
E   0.0  0.3  0.0  0.3  0.0  0.3  1
F   0.0  0.0  0.5  0.0  0.5  0.0  1
(0.3 is 1/3 rounded to one decimal place)
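Row standardization for the six-zone rook example (zones A to F) can be sketched with NumPy as follows. The sketch assumes every zone has at least one neighbor; a zone with no neighbors would have a zero row sum and would need separate handling.

```python
import numpy as np

zones = ["A", "B", "C", "D", "E", "F"]

# Binary rook contiguity matrix for the 2 x 3 layout  A B C / D E F
W_binary = np.array([
    [0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)

row_sums = W_binary.sum(axis=1, keepdims=True)  # number of neighbors per zone
W_std = W_binary / row_sums                     # each row now sums to 1

print(np.round(W_std, 2))
```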
General Spatial Weights Based on
Distance
• Decay functions of distance
– Most common choice is the inverse (reciprocal) of the distance
between locations i and j ($w_{ij} = 1/d_{ij}$)
– Other functions also used
• inverse of squared distance ($w_{ij} = 1/d_{ij}^2$), or
• negative exponential ($w_{ij} = e^{-d_{ij}}$ or $w_{ij} = e^{-d_{ij}^2}$)
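The distance-decay choices can be sketched as one helper that turns a distance matrix into a weight matrix. The function name `decay_weights` and its `kind` labels are hypothetical; the sketch assumes a square distance matrix with no coincident locations (all off-diagonal distances positive) and sets the diagonal to zero so a location does not weight itself.

```python
import numpy as np


def decay_weights(dist, kind="inverse"):
    """Turn a square distance matrix into distance-decay spatial weights."""
    d = np.asarray(dist, dtype=float)
    w = np.zeros_like(d)
    off = ~np.eye(d.shape[0], dtype=bool)  # off-diagonal entries only
    if kind == "inverse":
        w[off] = 1.0 / d[off]
    elif kind == "inverse_square":
        w[off] = 1.0 / d[off] ** 2
    elif kind == "neg_exp":
        w[off] = np.exp(-d[off])
    else:
        raise ValueError("unknown kind: " + kind)
    return w


dist = np.array([[0.0, 2.0], [2.0, 0.0]])
w_inv = decay_weights(dist, "inverse")
w_inv2 = decay_weights(dist, "inverse_square")
w_exp = decay_weights(dist, "neg_exp")
print(w_inv[0, 1], w_inv2[0, 1], w_exp[0, 1])
```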
Example
• Compare three different weight matrices shown as images
Measure of Spatial
Autocorrelation
Global Measures and Local Measures
• Global Measures
– A single value which applies to the entire data set
• The same pattern or process occurs over the entire
geographic area
• An average for the entire area
• Local Measures
– A value calculated for each observation unit
• Different patterns or processes may occur in different
parts of the region
• A unique number for each location
• Global measures usually can be decomposed
into a combination of local measures
Global Measures and Local Measures
• Global Measures
– Join Count
– Moran’s I, Geary’s C, Getis-Ord’s G
• Local Measures
– Local Moran’s I, Geary’s C, Getis-Ord’s G
Join (or Joint or Joins) Count Statistic
Gore/Bush Presidential Election 2000
Join type         Actual
Jbb (Bush/Bush)   60
Jgg (Gore/Gore)   21
Jbg (Bush/Gore)   28
Total             109
Join Count Statistic for Gore/Bush 2000 by State
• The expected number of joins is calculated from the proportion of votes each
candidate received in the election (for Bush/Bush joins: 109 × 0.499 × 0.499 ≈ 27.1)
• There are far more Bush/Bush joins (actual = 60) than would be expected (27)
– Positive autocorrelation
• There are far fewer Bush/Gore joins (actual = 28) than would be expected (54)
– Positive autocorrelation
• No strong clustering evidence for Gore (actual = 21, slightly less than the expected 27.375)
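The expected join counts can be reproduced with a few lines of arithmetic. This sketch assumes that each state is assigned to a candidate independently with probability equal to the vote share, which is what the 109 × p × p calculation implies. It uses the rounded share 0.499 for Bush, so the results differ slightly from the slide values, which appear to use unrounded shares. The function name `expected_joins` is hypothetical.

```python
def expected_joins(total_joins, p_a):
    """Expected AA, BB and AB join counts when categories are assigned independently."""
    p_b = 1.0 - p_a
    return {
        "aa": total_joins * p_a ** 2,      # same-category joins for A
        "bb": total_joins * p_b ** 2,      # same-category joins for B
        "ab": 2 * total_joins * p_a * p_b, # mixed joins (either order)
    }


# A = Bush, B = Gore; 109 joins in total
observed = {"aa": 60, "bb": 21, "ab": 28}
expected = expected_joins(109, 0.499)

for key in ("aa", "bb", "ab"):
    print(key, observed[key], round(expected[key], 1))
```

Comparing observed with expected counts gives the direction of the pattern; a formal test would also need the standard errors of the join counts, which this sketch does not compute.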
Moran’s I
• The most common measure of Spatial Autocorrelation
• Use for points or polygons
– Join Count statistic only for polygons
• Use for a continuous variable (any value)
– Join Count statistic only for binary variable (1,0)
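• Formula:
$$I = \frac{N \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\left( \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} \right) \sum_{i=1}^{N} (x_i - \bar{x})^2}$$
• Where:
$N$ is the number of observations (points or polygons)
$\bar{x}$ is the mean of the variable
$x_i$ is the variable value at a particular location
$x_j$ is the variable value at another location
$w_{ij}$ is a weight indexing location of i relative to j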
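A minimal NumPy sketch of global Moran's I, following the definitions above: N observations, deviations from the mean, and a weight for each pair of locations. The function name `morans_i` and the four-zone chain used as test data are assumptions of this example.

```python
import numpy as np


def morans_i(x, W):
    """Global Moran's I for values x and an (N, N) spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()            # deviations from the mean
    cross = z @ W @ z           # sum_i sum_j w_ij * z_i * z_j
    return len(x) * cross / (W.sum() * (z @ z))


# Four zones in a row; each zone neighbors the zones beside it (rook case)
W_chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

i_trend = morans_i([1, 2, 3, 4], W_chain)        # similar values side by side
i_alternating = morans_i([1, 0, 1, 0], W_chain)  # high next to low
print(i_trend, i_alternating)
```

The smooth trend gives a positive value and the alternating pattern gives a negative one, matching the interpretation of positive and negative spatial autocorrelation.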
Moran’s I
• Expectation of Moran’s I under no spatial
autocorrelation
$$E(I) = -\frac{1}{N - 1}$$
Moran’s I and Correlation Coefficient
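• Correlation coefficient between two variables x and y:
$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) / N}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 / N} \; \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2 / N}}$$
– Note the similarity of the numerator (top) to the measures of spatial association discussed earlier if we view $y_i$ as being the $x_i$ for the neighboring polygon
• Moran’s I (spatial autocorrelation), where the spatial weights $w_{ij}$ select the neighbors and $y_i$ is the $x_i$ for the neighboring polygon:
$$I = \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} (x_i - \bar{x})(x_j - \bar{x}) / \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij}}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 / N} \; \sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 / N}} = \frac{N \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\left( \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} \right) \sum_{i=1}^{N} (x_i - \bar{x})^2}$$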
Moran Scatter Plots
We can draw a scatter diagram between these two variables (in
standardized form): X and lag-X (or W_X)
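Quadrants (horizontal axis: X, vertical axis: lag-X; SA = spatial autocorrelation):
– Upper right: High/High, positive SA
– Upper left: Low/High, negative SA
– Lower left: Low/Low, positive SA
– Lower right: High/Low, negative SA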
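The two axes of a Moran scatter plot and the quadrant of each zone can be computed as sketched below. The sketch assumes row-standardized weights for the spatial lag, a variable that is not constant, and zones that each have at least one neighbor; the function name `moran_quadrants` and the treatment of exact zeros as "High" are choices made for this example.

```python
import numpy as np


def moran_quadrants(x, W):
    """Return standardized values, their spatial lag, and a quadrant label per zone."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = (x - x.mean()) / x.std()              # X in standardized form
    W_std = W / W.sum(axis=1, keepdims=True)  # row standardization
    lag = W_std @ z                           # lag-X: mean of neighbors' z
    labels = np.where(
        z >= 0,
        np.where(lag >= 0, "High/High", "High/Low"),
        np.where(lag >= 0, "Low/High", "Low/Low"),
    )
    return z, lag, labels


# Four zones in a row, each neighboring the zones beside it
W_chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

_, _, clustered = moran_quadrants([1, 2, 9, 10], W_chain)
_, _, alternating = moran_quadrants([1, 0, 1, 0], W_chain)
print(clustered.tolist())
print(alternating.tolist())
```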
Moran Scatterplot: Example
Moran’s I for rate-based data
• Moran’s I is often calculated for rates, such as crime
rates (e.g. number of crimes per 1,000 population) or
infant mortality rates (e.g. number of deaths per 1,000
births)
• An adjustment should be made, especially if the
denominator in the rate (population or number of births)
varies greatly (as it usually does)
• The adjustment is known as the EB (Empirical Bayes) adjustment:
– see Assunção and Reis, Empirical Bayes standardization,
Statistics in Medicine, 1999
• GeoDa software includes an option for this adjustment
Geary’s C
• Calculation is similar to Moran’s I,
– For Moran, the cross-product is based on the deviations from the mean
for the two location values
– For Geary, the cross-product uses the squared difference between the actual
values at the two locations
– Covariance vs. variogram
$$C = \frac{(N - 1) \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} (x_i - x_j)^2}{2 \left( \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} \right) \sum_{i=1}^{N} (x_i - \bar{x})^2}$$
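A minimal NumPy sketch of Geary's C using the standard definition with the (N - 1) factor, under which the expected value with no autocorrelation is 1. The function name `gearys_c` and the four-zone chain are assumptions of this example.

```python
import numpy as np


def gearys_c(x, W):
    """Global Geary's C for values x and an (N, N) spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    n = len(x)
    sq_diff = (x[:, None] - x[None, :]) ** 2   # (x_i - x_j)^2 for every pair
    z = x - x.mean()
    return (n - 1) * (W * sq_diff).sum() / (2.0 * W.sum() * (z @ z))


# Four zones in a row, each neighboring the zones beside it
W_chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

c_trend = gearys_c([1, 2, 3, 4], W_chain)        # similar neighbors: C below 1
c_alternating = gearys_c([1, 0, 1, 0], W_chain)  # dissimilar neighbors: C above 1
print(c_trend, c_alternating)
```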
Geary’s C vs. Moran’s I
• Interpretation is very different, essentially the opposite!
• Geary’s C varies on a scale from 0 to 2
– 0 indicates perfect positive autocorrelation/clustered
– 1 indicates no autocorrelation/random
– 2 indicates perfect negative autocorrelation/dispersed
• Can convert to a −1 to +1 scale by calculating C* = 1 − C
• Moran’s I is more often used
Statistical Significance Tests for Geary’s C
• Similar to Moran
• Again, based on the normal frequency distribution with
$$Z = \frac{C - E(C)}{S_{error}(C)}$$
Where: $C$ is the calculated value for Geary’s C from the sample
$E(C)$ is the expected value if there is no autocorrelation
$S_{error}(C)$ is the standard error
• However, $E(C) = 1$
Hot Spots and Cold Spots
• What is a hot spot?
– A place where high values
cluster together
• What is a cold spot?
– A place where low values
cluster together
Getis-Ord General/Global G-Statistic
• The G statistic distinguishes between hot spots and cold spots. It
identifies spatial concentrations.
– G is relatively large if high values cluster together
– G is relatively low if low values cluster together
• The General G statistic is interpreted relative to its expected value
– The value for which there is no spatial association
– G > (larger than) expected value potential “hot spots”
– G < (smaller than) expected value potential “cold spots”
• A Z test statistic is used to test if the difference is statistically
significant
• Calculation of G is based on a neighborhood distance within which
clustering is expected to occur
Getis, A. and Ord, J.K. (1992). The analysis of spatial association by use of
distance statistics. Geographical Analysis, 24(3): 189-206
Formulae of General G
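As a sketch of the usual definition from Getis and Ord (1992), the General G statistic is $G(d) = \sum_i \sum_{j \neq i} w_{ij}(d) x_i x_j / \sum_i \sum_{j \neq i} x_i x_j$, with expected value $W / (n(n-1))$ under no spatial association, where $W$ is the sum of all weights. The NumPy code below follows that definition; the function name `general_g` and the four-zone chain are assumptions of this example, and the values are assumed to be positive ratio-scale data.

```python
import numpy as np


def general_g(x, W):
    """Return (G, expected G) for values x and a weight matrix W (diagonal ignored)."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(W, 0.0)               # G excludes i == j
    n = len(x)
    cross = np.outer(x, x)                 # x_i * x_j for every pair
    off = ~np.eye(n, dtype=bool)
    g = (W * cross).sum() / cross[off].sum()
    expected = W.sum() / (n * (n - 1))
    return g, expected


# Four zones in a row, each neighboring the zones beside it
W_chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

g_hot, g_expected = general_g([10, 10, 1, 1], W_chain)  # high values adjacent
g_apart, _ = general_g([10, 1, 1, 10], W_chain)         # high values separated
print(g_hot, g_apart, g_expected)
```

When the two high values are adjacent, G exceeds its expected value; when they are separated, G falls below it.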
Comments on General G
• General G will not show negative spatial autocorrelation
• Should only be calculated for ratio scale data
– data with a “natural” zero such as crime rates, birth rates
• Although it was defined using a contiguity (0,1) weights
matrix, any type of spatial weights matrix can be used
– ArcGIS gives multiple options
• There are two global versions: G and G*
– G does not include the value of Xi itself, only “neighborhood”
values
– G* includes Xi as well as “neighborhood” values
• The significance test for General G and G* follows a procedure
similar to that used for Moran’s I
Local Measures of
Spatial Autocorrelation
Local Indicators of Spatial Association (LISA)
• Local versions of Moran’s I, Geary’s C, and the Getis-
Ord G statistic
• Moran’s I is most commonly used, and the local version
is often called Anselin’s LISA, or just LISA
See:
Anselin, L. (1995). Local Indicators of Spatial
Association: LISA. Geographical Analysis, 27: 93-115
Calculating Anselin’s LISA
• The local Moran statistic for areal unit i is:
$$I_i = z_i \sum_{j} w_{ij} z_j$$
[Map: the seven provinces used in the example, numbered 1 to 7]
Binary Contiguity Matrix
Province  Code  Anhui  Zhejiang  Jiangxi  Jiangsu  Henan  Hubei  Shanghai  Sum
Anhui     1     0      1         1        1        1      1      0         5
Zhejiang  2     1      0         1        1        0      0      1         4
Jiangxi   3     1      1         0        0        0      1      0         3
Jiangsu   4     1      1         0        0        0      0      1         3
Henan     5     1      0         0        0        0      1      0         2
Hubei     6     1      0         1        0        1      0      0         3
Shanghai  7     0      1         0        1        0      0      0         2
Row Standardized Spatial Weights Matrix (row for Shanghai)
Province  Code  Anhui  Zhejiang  Jiangxi  Jiangsu  Henan  Hubei  Shanghai
Shanghai  7     0.00   0.50      0.00     0.50     0.00   0.00   0.00
Z-Scores for row Province ($z_i$) and its potential neighbors ($z_j$)
Province  z_i     Anhui  Zhejiang  Jiangxi  Jiangsu  Henan   Hubei   Shanghai
Anhui      2.101  2.101  0.387     -0.572   -0.051   -0.281  -0.171  -1.414
Zhejiang   0.387  2.101  0.387     -0.572   -0.051   -0.281  -0.171  -1.414
Jiangxi   -0.572  2.101  0.387     -0.572   -0.051   -0.281  -0.171  -1.414
Jiangsu   -0.051  2.101  0.387     -0.572   -0.051   -0.281  -0.171  -1.414
Henan     -0.281  2.101  0.387     -0.572   -0.051   -0.281  -0.171  -1.414
Hubei     -0.171  2.101  0.387     -0.572   -0.051   -0.281  -0.171  -1.414
Shanghai  -1.414  2.101  0.387     -0.572   -0.051   -0.281  -0.171  -1.414
$$I_i = z_i \sum_{j} w_{ij} z_j$$
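The local Moran calculation for the seven provinces can be sketched with NumPy, using the contiguity matrix and z-scores from the example (parenthesized z-scores are read as negative values). The variable names are choices made for this sketch.

```python
import numpy as np

provinces = ["Anhui", "Zhejiang", "Jiangxi", "Jiangsu", "Henan", "Hubei", "Shanghai"]

# Binary contiguity matrix, rows and columns in the order of `provinces`
contiguity = np.array([
    [0, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
], dtype=float)

z = np.array([2.101, 0.387, -0.572, -0.051, -0.281, -0.171, -1.414])

W = contiguity / contiguity.sum(axis=1, keepdims=True)  # row standardization
lag = W @ z                                             # sum_j w_ij * z_j
local_i = z * lag                                       # I_i = z_i * sum_j w_ij z_j

for name, value in zip(provinces, local_i):
    print(name, round(value, 3))
```

For Shanghai, the two neighbors (Zhejiang and Jiangsu) each get weight 0.5, so the lag is the average of their z-scores.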
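Local Getis-Ord statistic $G_i(d)$:
$$G_i(d) = \frac{\sum_{j} w_{ij}(d) \, x_j}{\sum_{j} x_j}, \quad j \neq i$$
Interpreted relative to its expected value if values are randomly distributed:
$$E(G_i(d)) = \frac{\sum_{j} w_{ij}(d)}{n - 1}$$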
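A short sketch of the local Getis-Ord statistic for every zone, using the form $G_i(d) = \sum_{j \neq i} w_{ij}(d) x_j / \sum_{j \neq i} x_j$ with expected value $\sum_j w_{ij}(d) / (n - 1)$. The function name `local_g` and the four-zone chain are assumptions of this example, and the values are assumed to be positive.

```python
import numpy as np


def local_g(x, W):
    """Return (G_i, expected G_i) for every zone; zone i itself is excluded."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float).copy()
    np.fill_diagonal(W, 0.0)              # exclude j == i
    total_excluding_i = x.sum() - x       # sum of x_j over j != i
    g = (W @ x) / total_excluding_i
    expected = W.sum(axis=1) / (len(x) - 1)
    return g, expected


# Four zones in a row, each neighboring the zones beside it
W_chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

g_i, g_i_expected = local_g([10, 10, 1, 1], W_chain)
print(np.round(g_i, 3), np.round(g_i_expected, 3))
```

The first zone, whose only neighbor has a high value, has $G_i$ above its expected value (a potential hot spot); the last zone, whose only neighbor has a low value, has $G_i$ below it (a potential cold spot).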
Bivariate LISA
• Moran’s I is the correlation between X
and Lag-X, the same variable but in
nearby areas
– Univariate Moran’s I
• Bivariate Moran’s I is a correlation
between X and a different variable in
nearby areas.
[Figures: Moran scatter plot for GDI vs. AL; Moran significance map for GDI vs. AL]
Bivariate LISA
and the Correlation Coefficient
• Correlation Coefficient is the
relationship between two
different variables in the same
area
• Bivariate LISA is a correlation
between two different
variables in an area and in
nearby areas.
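One common formulation of the global bivariate Moran's I, offered here as a sketch rather than as the formula used on the slides, standardizes both variables and averages the product of X in each area with the spatial lag of Y in nearby areas (row-standardized weights). The function name `bivariate_morans_i` and the test data are assumptions of this example; both variables are assumed to be non-constant.

```python
import numpy as np


def bivariate_morans_i(x, y, W):
    """Bivariate Moran's I: standardized x against the spatial lag of standardized y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    W = np.asarray(W, dtype=float)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    W_std = W / W.sum(axis=1, keepdims=True)  # row standardization
    lag_y = W_std @ zy                        # y in nearby areas
    return (zx @ lag_y) / len(x)


# Four zones in a row, each neighboring the zones beside it
W_chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

x = [1, 2, 3, 4]
i_same = bivariate_morans_i(x, x, W_chain)          # reduces to the univariate case
i_opposite = bivariate_morans_i(x, [4, 3, 2, 1], W_chain)
print(i_same, i_opposite)
```

When the second variable is the same as the first, the statistic reduces to the univariate Moran's I for row-standardized weights.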
Bivariate Moran Scatter Plot
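Quadrants (horizontal axis: X, vertical axis: lag of the second variable; SA = spatial autocorrelation):
– Upper right: High/High, positive SA
– Upper left: Low/High, negative SA
– Lower left: Low/Low, positive SA
– Lower right: High/Low, negative SA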
Summary
• Spatial autocorrelation of areal data
• Spatial weight matrix
• Measures of spatial autocorrelation
– Global measures: Moran’s I/Geary’s C/General G and G*
– Local measures (LISA): Moran’s I/Geary’s C/General G and G*
– Bivariate LISA
– Significance test
• End of this topic