Spatial Autocorrelation

Diego LEGROS

2022-10-23

Spatial autocorrelation
Introduction
In the previous classes, we defined how observations can be linked to one another in space. In this
lesson we begin to explore the analysis of spatial autocorrelation statistics. Because we are now able to
measure the proximity between spatial units, we can define a measure that describes both the form and
the intensity of the relation between them. This measure is called spatial autocorrelation. Spatial
autocorrelation is the correlation among data values that is strictly due to the relative proximity of the
objects that the data refer to. It is defined as the positive or negative correlation of a
variable with itself due to the spatial location of the observations.
Remember Tobler’s first law of geography from the previous lessons: “Everything is
related to everything else, but near things are more related than distant things”. Spatial autocorrelation is
the measure of this correlation between near things. It can be the result of:
• unobservable or difficult-to-quantify processes;
• in the context of an econometric specification, the omission of explanatory variables that are spatially
correlated, errors in the choice of the scale at which the spatial phenomenon is analysed, etc.
There are two kinds of indices for spatial autocorrelation:
• Global autocorrelation
– It is a measure of overall clustering in the data.
– It yields only one statistic to summarize the whole study area (homogeneity).
• Local autocorrelation
– Global statistics are based on the assumption of a spatially stationary process: spatial autocorrelation
would be the same throughout the space.
– However, this hypothesis becomes less realistic as the number of observations grows.
In this section, the focus is on global measures of spatial autocorrelation.

Spatial autocorrelation
• In the presence of spatial autocorrelation, we observe that the value of a variable for a spatial unit is
linked to the values of the same variable for the neighbouring observations.
• Spatial autocorrelation is positive when similar values of the variable under study are grouped
geographically.
• Spatial autocorrelation is negative when dissimilar values of the variable under study are grouped
geographically.
• If observations are randomly distributed over space, we say that there is no spatial autocorrelation.

Moran’s diagram
• Moran’s diagram allows a quick reading of the spatial structure of observations.
• It is a scatter plot which represents the spatially lagged variable $Wy$ on the y-axis as a function of the
variable $y$ on the x-axis.
• The scatter plot divides the space into 4 quadrants. Each quadrant corresponds to a particular type of
spatial association.

[Figure 1: Moran diagram. x-axis: $y_i$; y-axis: $W y_i$. Quadrants: LH (upper left), HH (upper right), LL (lower left), HL (lower right).]

The High-High (HH) quadrant corresponds to the case where a high value of the demeaned variable
$y_i^* = y_i - \bar{y}$ is surrounded by high values in the neighborhood. High (respectively low) means above
(respectively below) the arithmetic average. The Low-Low (LL) quadrant corresponds to the opposite case, where
a low value of $y_i^*$ is surrounded by low values in the neighborhood. If a large number of observations fall
within these two quadrants, we observe positive spatial autocorrelation.
The High-Low (HL) quadrant corresponds to the case where a high value of $y_i^*$ is surrounded by low values in
the neighborhood. The Low-High (LH) quadrant corresponds to the opposite case, where a low value of $y_i^*$ is
surrounded by high values in the neighborhood. If a large number of observations fall within these two
quadrants, we observe negative spatial autocorrelation.
Figure 2: Police

moran.plot(police$INC,queen.wl)

[Moran scatter plot. x-axis: police$INC (about 5000 to 10000); y-axis: spatially lagged police$INC (about 6500 to 8000). Labelled observations include 32, 41, 47, 48, 51, 55, 78, 79, 80 and 81.]

Global autocorrelation indices


Among the most famous statistical indices for global autocorrelation are Moran’s I and Geary’s C. Moran’s I
(1950) is the most widely used measure of global spatial autocorrelation. It was developed by Patrick Alfred
Pierce Moran, an Australian statistician. It can be applied to detect departures from spatial randomness.
Departures from randomness indicate spatial patterns such as clusters or trends over space. Moran’s I is based
on cross-products to measure spatial autocorrelation. It measures the degree of linear association between a
vector of realizations, noted $x$, of a geo-referenced variable $X$ and its spatial lag $LX$, i.e. a weighted average
of the neighbouring values.

Moran’s I
The Moran’s I statistic is given by:

$$ I = \frac{\sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{S_0 \sum_{i=1}^{n} (x_i - \bar{x})^2 / n} \tag{1} $$

The formula you see may look intimidating but it is nothing but the formula for standard correlation expanded
to incorporate the spatial weight matrix.
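$$ I = \frac{n}{S_0} \times \frac{\sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{2} $$

$$ I = \frac{n}{S_0} \times \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \sum_{j=1, j \neq i}^{n} w_{ij} (x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{3} $$

where $S_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}$ and $w_{ij}$ is an element of the spatial weight matrix that measures
spatial distance or connectivity between regions $i$ and $j$.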

$$ I = \frac{n}{S_0} \times \frac{(x - \bar{x})' W (x - \bar{x})}{(x - \bar{x})' (x - \bar{x})} \tag{4} $$

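If the $W$ matrix is row-standardized, then $S_0 = n$ and Moran’s I is written:

$$ I = \frac{\sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} w_{ij}^{*} (x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{5} $$

$$ I = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \sum_{j=1, j \neq i}^{n} w_{ij}^{*} (x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{6} $$

In matrix notation, we have:

$$ I = \frac{(x - \bar{x})' W^{*} (x - \bar{x})}{(x - \bar{x})' (x - \bar{x})} \tag{7} $$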

Values of Moran’s I usually range from -1 (perfect dispersion) to 1 (perfect correlation). A value close to zero
indicates a random spatial pattern. Note that the OLS slope of a regression of $y$ on $x$ is:

$$ \hat{\beta}_{OLS} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{8} $$

With a row-standardized $W$, the Moran index I is equivalent to the slope of a linear regression of the spatial
lag $Wy$ on the observation vector $y$, both measured in deviation from their means. It is, however, not
equivalent to the slope of $y$ on $Wy$, which would be a more natural way.
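The equivalence between the matrix form of Moran's I and the regression slope can be checked numerically. The following is a minimal sketch in base R, assuming a made-up vector `x` of five values on a line where each unit's neighbours are the adjacent units; the data and the object names (`W_std`, `I_matrix`, `I_slope`) are illustrative and do not come from the Police data.

```r
# Toy example: 5 units on a line, neighbours are the adjacent units.
# The values in x are made up for illustration only.
x <- c(2, 4, 5, 7, 11)
n <- length(x)

# Binary contiguity matrix: w_ij = 1 if |i - j| = 1, 0 otherwise
# (the diagonal stays at zero, so the j != i condition holds).
W <- matrix(0, n, n)
W[abs(row(W) - col(W)) == 1] <- 1

# Row-standardize so that each row sums to 1 (hence S0 = n).
W_std <- W / rowSums(W)

z  <- x - mean(x)   # deviations from the mean
S0 <- sum(W_std)    # sum of all weights

# Moran's I from the matrix form, equation (4).
I_matrix <- (n / S0) * as.numeric(t(z) %*% W_std %*% z) / sum(z^2)

# Slope of the regression of the spatial lag Wx on x.
lag_x   <- as.numeric(W_std %*% x)
I_slope <- unname(coef(lm(lag_x ~ x))[2])

print(c(matrix_form = I_matrix, ols_slope = I_slope))
```

The two numbers coincide because the weights are row-standardized; with unstandardized weights the slope and Moran's I generally differ.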

Significance test of Moran’s I


Under the normality assumption, the statistic

$$ \frac{I - E(I)}{\sqrt{V_{Norm}(I)}} $$

will be approximately normally distributed with mean zero and variance one, so that p-values may be obtained
by comparison with the standard normal distribution.

Randomisation hypothesis
The other form of the test is a more formal working of the randomisation idea. Inference on Moran’s I
is usually conducted under the randomisation hypothesis. In this case, no assumption is made about the
distribution of the $y_i$’s, but it is assumed that any permutation of the $y_i$’s against the polygons is equally
likely. This hypothesis means that each value is equally likely to be observed at each location. Thus, the
null hypothesis is still one of “no spatial pattern”, but it is conditional on the observed data. The variance of
Moran’s I obtained under the permutation hypothesis is a little more complicated, as we will see in a few
moments.
The distribution of the statistic

$$ \frac{I - E(I)}{\sqrt{V_{Rand}(I)}} $$

is also close to the normal distribution, so this quantity can also be compared to the
normal distribution with mean zero and variance one to obtain p-values. Both kinds of test are available in
R via the moran.test function shown later.
We first present the significance test of Moran’s I for the normal approximation, making use of a
row-standardized weights matrix.

• The global Moran’s I can be standardized using its expected value $E(I)$ and
its variance $V(I)$. The test statistic is:

$$ Z(I) = \frac{I - E(I)}{\sqrt{V(I)}} \to N(0, 1) \tag{9} $$

Because $Z(I)$ asymptotically follows the standard normal distribution $N(0, 1)$, it is possible to
conduct hypothesis testing using the null hypothesis that spatial autocorrelation does not exist under a given
$W$. When normality of $y_i$ is assumed, the expected value and variance are given by (Cliff and Ord, 1981):

$$ E(I) = -\frac{1}{n - 1} \tag{10} $$

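The variance of the Moran I index under the normal approximation is:

$$ V(I) = \frac{n^2 S_1 - n S_2 + 3 S_0^2}{(n^2 - 1) S_0^2} - E(I)^2 \tag{11} $$

with $S_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}$, $S_1 = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (w_{ij} + w_{ji})^2$ and
$S_2 = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} w_{ij} + \sum_{j=1}^{n} w_{ji} \right)^2 = \sum_{i=1}^{n} (w_{i.} + w_{.i})^2$,
with $w_{i.} = \sum_{j=1}^{n} w_{ij}$ and $w_{.i} = \sum_{j=1}^{n} w_{ji}$.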

The test decision (right-sided test) is: $z(I) > Z_{1-\alpha} \Rightarrow$ reject $H_0$ (positive spatial autocorrelation). We are
interested in the distribution of the following statistic:

$$ T_i = \frac{I - E(I)}{\sqrt{V(I)}} \tag{12} $$

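The following theorem gives the moments of Moran’s I under randomization. Under permutation, we have:

$$ E(I) = -\frac{1}{n - 1} \tag{13} $$

and

$$ E(I^2) = \frac{n \left[ (n^2 - 3n + 3) S_1 - n S_2 + 3 S_0^2 \right] - b_2 \left[ (n^2 - n) S_1 - 2 n S_2 + 6 S_0^2 \right]}{(n - 1)(n - 2)(n - 3) S_0^2} \tag{14} $$

In this expression, $b_2 = n \sum_{i=1}^{n} (x_i - \bar{x})^4 / \left( \sum_{i=1}^{n} (x_i - \bar{x})^2 \right)^2$ is the sample kurtosis of the variable.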

• where $S_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}$, $S_1 = \sum_{i=1}^{n} \sum_{j=1}^{n} (w_{ij} + w_{ji})^2 / 2$ and
$S_2 = \sum_{i=1}^{n} (w_{i.} + w_{.i})^2$, with $w_{i.} = \sum_{j=1}^{n} w_{ij}$ and $w_{.i} = \sum_{j=1}^{n} w_{ji}$. Then:

$$ V(I) = E(I^2) - E(I)^2 \tag{15} $$

It is important to note that the expected value of Moran’s I is the same under normality and randomization.
Inference:
1. If $I > E(I)$, then a spatial unit tends to be surrounded by locations with similar attributes: spatial
clustering (low/low or high/high). The strength of positive spatial autocorrelation tends to increase
with $I - E(I)$.
2. If $I < E(I)$, observations tend to have dissimilar values from their neighbors: negative spatial
autocorrelation (low/high or high/low).
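To make the moment formulas concrete, here is a minimal base R sketch that evaluates equations (10), (11), (14) and (15) for a small example. The helper `moran_moments()` is hypothetical (it is not an spdep function), the 4 x 4 rook lattice and the values in `x` are made up, and the randomisation moment is coded in the usual Cliff and Ord form with the term $(n^2 - n) S_1$.

```r
# Hypothetical helper (not part of spdep): analytical moments of Moran's I
# for a numeric vector x and an n x n weight matrix W with zero diagonal.
moran_moments <- function(x, W) {
  n  <- length(x)
  z  <- x - mean(x)
  S0 <- sum(W)
  S1 <- sum((W + t(W))^2) / 2
  S2 <- sum((rowSums(W) + colSums(W))^2)
  b2 <- n * sum(z^4) / sum(z^2)^2                 # sample kurtosis

  EI     <- -1 / (n - 1)                          # equations (10) and (13)
  V_norm <- (n^2 * S1 - n * S2 + 3 * S0^2) /
            ((n^2 - 1) * S0^2) - EI^2             # equation (11)
  EI2    <- (n * ((n^2 - 3 * n + 3) * S1 - n * S2 + 3 * S0^2) -
             b2 * ((n^2 - n) * S1 - 2 * n * S2 + 6 * S0^2)) /
            ((n - 1) * (n - 2) * (n - 3) * S0^2)  # equation (14)
  V_rand <- EI2 - EI^2                            # equation (15)

  I <- (n / S0) * sum(W * outer(z, z)) / sum(z^2) # Moran's I
  list(I = I, E = EI, V_norm = V_norm, V_rand = V_rand,
       z_norm = (I - EI) / sqrt(V_norm),
       z_rand = (I - EI) / sqrt(V_rand))
}

# Made-up example: 4 x 4 lattice, rook neighbours, row-standardized weights.
g <- expand.grid(i = 1:4, j = 1:4)
W <- (as.matrix(dist(g, method = "manhattan")) == 1) * 1
W <- W / rowSums(W)
x <- g$i + g$j          # smooth trend, so similar values are neighbours

res <- moran_moments(x, W)
print(unlist(res))
```

In this example the expectation is -1/15 for both assumptions, while the two variances (and therefore the two standardized statistics) differ.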

Spatial autocorrelation with R


• Remember that in the presence of spatial autocorrelation, the value of a variable for an
observation is linked to the values of the same variable for the neighbouring observations.
– Spatial autocorrelation is positive when similar values of the variable of interest are grouped
geographically.
– Spatial autocorrelation is negative when dissimilar values of the variable of interest come together
geographically.
• When observations are randomly distributed, there is no spatial autocorrelation.

Moran I index using R


• Moran’s I statistic for spatial autocorrelation is implemented in the spdep package.
• Remember, for example, that the null hypothesis we are testing states that “the GDP values are randomly
distributed across counties (countries) following a completely random process”.
• In R, there are two methods for testing this hypothesis: an analytical method and a Monte Carlo
method.
• There are two separate functions:
– Analytical method: the function moran.test implements this approach, where the inference is
based
∗ on a normality assumption for the $y_i$
∗ or on a randomization assumption for the $y_i$
– Monte Carlo method: the function moran.mc is based on a Monte Carlo approach to compute the
p-value of the I statistic.
• The test is really simple:
1. Keep the map units (polygons) constant.
2. Randomly reassign values to the map units.
3. Calculate Moran’s I and save the value.
4. Return to step 2 and repeat through step 3 until the user-specified number of iterations is reached.
• The result is a list of Moran’s I values. These values represent observations of “randomness”.
• Then sort the list from lowest to highest.

• If our observed value of Moran’s I is near the beginning or the end of the list, we call it “a significant
departure from randomness”.
• The p-value is obtained by counting the number of simulated test statistic values more extreme than the
one observed. We identify the side of the distribution of simulated values closest to our observed statistic,
count the number of simulated values more extreme than the observed statistic, then compute p as
follows:

$$ p = \frac{n_{extreme} + 1}{n + 1} $$

• where $n_{extreme}$ is the number of simulated values more extreme than our observed statistic and $n$ is the
total number of simulations.
• Note that this is for a one-sided test.
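The permutation procedure and the pseudo p-value can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the helper `moran_i()` is hypothetical (it is not the spdep implementation), the 5 x 5 lattice and the values in `x` are simulated, and the upper tail is used because the made-up data have a positive spatial trend.

```r
set.seed(1)  # fixed only so that the sketch is reproducible

# Hypothetical helper (not an spdep function): Moran's I for values x
# and a weight matrix W with zero diagonal.
moran_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Toy map: 5 x 5 lattice, rook neighbours, row-standardized weights.
g <- expand.grid(i = 1:5, j = 1:5)
W <- (as.matrix(dist(g, method = "manhattan")) == 1) * 1
W <- W / rowSums(W)
x <- g$i + g$j + rnorm(nrow(g))   # made-up values with a spatial trend

nsim  <- 999
I_obs <- moran_i(x, W)

# Keep the map (W) fixed, shuffle the values, recompute I, and repeat.
I_sim <- replicate(nsim, moran_i(sample(x), W))

# One-sided pseudo p-value (upper tail): (n_extreme + 1) / (n + 1).
n_extreme <- sum(I_sim >= I_obs)
p_value   <- (n_extreme + 1) / (nsim + 1)

print(c(I = I_obs, p = p_value))
```

With 999 permutations the smallest attainable pseudo p-value is 1/1000, which is reached when no permuted statistic is at least as large as the observed one.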

• If our observed value were the last element in the sorted list of, say, 999 simulated values plus the
observed one, the p-value would be 1/(999 + 1) = 0.001.
• Both functions (moran.test and moran.mc) take a variable name or numeric vector and a spatial
weights list object (listw), in that order, as mandatory parameters.
• The permutation test also requires the number of permutations as a third mandatory parameter.
• The different parameters and options for moran.test are revealed by a call to help(moran.test).
• Of the optional parameters, two are very important.

• The randomisation option is set to TRUE by default, which implies that in order to get inference based
on a normal approximation, it must be explicitly set to FALSE.
• Similarly, the default is a one-sided test, so that in order to obtain the results for the (more commonly
used) two-sided test, the option alternative must be set to "two.sided".
• Note also that the zero.policy option is set to FALSE by default, which means that islands result in a
missing value code (NA).
• Setting this option to TRUE will set the spatial lag for islands to the customary zero value.
• To illustrate this, use the variable CRIME and the weights list queen.wl with the normal approximation
in a two-sided test.

Moran I index using R with Police Data


##
## Moran I test under normality
##
## data: Police$CRIME

## weights: queen.wl
##
## Moran I statistic standard deviate = 1.7201, p-value = 0.08542
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.103588680 -0.012345679 0.004542775

• Note how, unlike previous practice, the object created by moran.test function was not assigned to a
specific variable.
• If you simply want to get the results, this is not necessary.
• By entering the test this way, you indirectly invoke the print function for the object.
• If you want to access the individual results of the test for further processing, you should assign the
moran.test to an object and then print that object.

• For example, using the INC variable and assigning the result to an object named MoranINC, which is then printed:
##
## Moran I test under normality
##
## data: Police$INC
## weights: queen.wl
##
## Moran I statistic standard deviate = 1.3288, p-value = 0.1839
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.077215210 -0.012345679 0.004542775

• Note how the expectation and variance of the Moran’s I test statistic are the same for both CRIME and
INC.
• In fact, under the normal approximation, these moments depend only on the spatial weights, and not
on the variable under consideration.
• The MoranINC object belongs to the class htest, which is a generic class in R designed to hold the
results of test statistics.
• Technically, it is a list and all its items can be accessed individually (provided you know what they are
called; check the help).

• For example,
class(MoranINC)

## [1] "htest"
MoranINC$statistic

## Moran I statistic standard deviate
## 1.328794

• This shows the information stored in the statistic element of the list.
• When the null hypothesis considered is based on the randomization distribution, the randomisation
option does not need to be set.

Moran I index using R with Police Data and CRIME variable


moran.test(Police$CRIME,queen.wl,randomisation=TRUE,alternative="two.sided")

##
## Moran I test under randomisation
##
## data: Police$CRIME
## weights: queen.wl
##
## Moran I statistic standard deviate = 2.2072, p-value = 0.0273
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.103588680 -0.012345679 0.002758842
• Note how the value of the statistic and its expectation do not change relative to the normal case, only
the variance is different (and thus the z-value and associated p-values)
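The same comparison can be reproduced on data that anyone can generate. The sketch below assumes the spdep package is installed and uses a synthetic 6 x 6 lattice with simulated values as a stand-in for the Police data; the object names (`nb`, `lw`, `m_norm`, `m_rand`) are illustrative.

```r
library(spdep)

set.seed(42)  # only to make the made-up data reproducible

# Synthetic stand-in for the Police data: a 6 x 6 lattice with queen
# contiguity and row-standardized weights.
nb <- cell2nb(6, 6, type = "queen")
lw <- nb2listw(nb, style = "W")
x  <- rnorm(36)

# Same test under the two assumptions, both two-sided.
m_norm <- moran.test(x, lw, randomisation = FALSE, alternative = "two.sided")
m_rand <- moran.test(x, lw, randomisation = TRUE,  alternative = "two.sided")

# Same statistic and expectation, different variance (and thus z and p).
print(rbind(normality = m_norm$estimate, randomisation = m_rand$estimate))
print(c(z_norm = unname(m_norm$statistic), z_rand = unname(m_rand$statistic)))
```

Both results are htest objects, so their elements (estimate, statistic, p.value) can be extracted in the same way as for MoranINC.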

Moran I index using R: Monte Carlo Approach


• A Moran’s I test statistic with inference based on random permutation (Monte Carlo approach) is
contained in the function moran.mc.
• As in moran.test, it takes the variable name (or vector) and the weights list (listw object) as the
first two mandatory arguments.
• It also needs the number of permutations as the third argument (nsim).
• Since the rank of the observed statistic is computed relative to the reference distribution of statistics
for the permuted data sets, it is good practice to set this number to something ending in 9 (such as 99
or 999).

• This will yield nicely rounded pseudo p-values (like 0.01 or 0.001).
• The other options are identical to those for the moran.test function.
• For example, using the CRIME variable with the queen.wl weights, setting the number of permutations
to 99 and leaving everything else at its default (e.g., a one-sided test):
MoranCrimePermtutations<-moran.mc(Police$CRIME,queen.wl,99)
print(MoranCrimePermtutations)

##
## Monte-Carlo simulation of Moran I
##
## data: Police$CRIME
## weights: queen.wl
## number of simulations + 1: 100
##
## statistic = 0.10359, observed rank = 99, p-value = 0.01
## alternative hypothesis: greater

• Note how the number of simulations + 1 is a round 100.
• The observed value of 0.1036 has rank 99 among these 100 values (the 99 permuted statistics plus the
observed one), and the reported pseudo p-value is 0.01.
• This equals (0 + 1)/(99 + 1), the smallest pseudo p-value that can be obtained with 99 permutations.

• In this example, the results of the procedure (MoranCrimePermtutations) were assigned to an object of
class htest, which must be printed to reveal its contents.
• One item in the list MoranCrimePermtutations is a vector with the computed statistics for each of the
permuted data sets.
• This is contained in the item $res, as in:
MoranCrimePermtutations$res

## [1] -9.115539e-02 -1.525690e-02 -4.712817e-03 8.320093e-02 1.101499e-01


## [6] -8.194625e-02 -1.956049e-02 -4.438371e-02 2.343791e-02 2.527896e-02
## [11] -1.971019e-02 2.853401e-02 4.351357e-02 8.646284e-02 -6.186105e-04
## [16] 7.308508e-03 -4.337899e-03 3.664944e-02 -1.668983e-02 -7.778469e-02
## [21] -6.596509e-02 -3.574770e-02 2.176723e-02 3.013287e-02 -1.403317e-02
## [26] 2.957833e-02 5.770656e-02 -1.109370e-01 -9.042071e-02 -1.432679e-02
## [31] 6.104100e-02 -5.562834e-02 -1.003989e-02 -5.900202e-02 4.073617e-02
## [36] -6.175813e-02 -1.054571e-01 1.733101e-02 -5.225752e-02 -3.490510e-02
## [41] -3.800283e-02 3.022623e-02 -1.547060e-02 -1.551195e-02 -2.457990e-02
## [46] -1.604290e-02 -4.432586e-02 -9.204234e-02 -5.801599e-02 -4.139917e-02
## [51] 1.573387e-02 -4.910870e-03 -8.718079e-02 9.500405e-05 -2.301926e-02
## [56] 2.177221e-02 -1.345404e-02 2.621641e-03 -2.972168e-02 -2.451109e-02
## [61] 9.769040e-02 -7.441407e-02 2.364356e-02 -1.198700e-01 -3.974274e-02
## [66] 7.120890e-03 -1.598318e-02 -2.877198e-02 -2.285292e-02 -5.736709e-03
## [71] -4.490725e-02 -1.071864e-01 1.611401e-02 -2.597461e-02 5.486298e-02
## [76] -5.976755e-02 5.114327e-02 8.965969e-02 -2.394054e-03 4.319693e-02
## [81] -2.302465e-02 -9.202599e-02 -4.175777e-02 -1.392044e-02 -5.216781e-02
## [86] 7.256476e-02 5.675837e-02 -3.541930e-02 -6.134406e-02 3.431858e-02
## [91] -2.241537e-02 -4.664778e-02 -6.102050e-02 8.854444e-03 -1.000187e-02
## [96] -4.268418e-02 -6.896745e-02 -6.478341e-02 -7.936929e-02 1.035887e-01
• Note that the observed value of Moran’s I (0.1035887) is included in this vector as the last element.
• By default, the contents of this vector will differ slightly from run to run, since they are based on
random permutations.
• The default random number seed is determined from the current time, so no two sets of random
permutations will be identical.
• To control the seed, use the R function set.seed(seed, kind = NULL) right before invoking the moran.mc
command, and set the same value each time.

• For example, try this using set.seed(123456):
set.seed(123456)
MoranCrimePermtutations <- moran.mc(Police$CRIME,queen.wl,99)
MoranCrimePermtutations$res

## [1] 1.558810e-02 -6.236174e-02 -2.454199e-02 1.521673e-02 -3.342999e-02

## [6] -3.658711e-02 -9.885310e-03 1.913200e-03 -5.205790e-02 -8.659591e-04
## [11] 7.641221e-02 5.925944e-02 1.164173e-02 -1.286995e-02 -9.150634e-03
## [16] -2.943162e-03 -1.155581e-02 4.159631e-02 6.355551e-02 3.963826e-02
## [21] -5.676721e-03 -1.349112e-02 -3.005129e-02 4.519048e-02 -5.194294e-02
## [26] 1.232976e-01 3.538739e-03 -5.456543e-02 -1.270799e-02 1.642366e-02
## [31] 2.226327e-02 3.216281e-02 -2.105236e-02 -3.478252e-02 -7.460455e-02
## [36] -2.306669e-02 2.716858e-02 1.529558e-02 4.622271e-02 7.885572e-02
## [41] 3.851046e-02 2.825124e-02 -1.742498e-02 -3.469730e-02 -3.888761e-02
## [46] 7.423914e-03 2.472966e-02 -1.040876e-01 -2.179414e-02 -1.617477e-02
## [51] -2.309804e-02 3.415871e-02 -1.494746e-02 2.688014e-03 4.993817e-02
## [56] 4.111695e-03 1.255441e-01 1.524413e-02 -7.290068e-02 -5.552675e-05
## [61] 4.702426e-03 5.162509e-02 -2.662026e-02 -5.339628e-02 -2.915053e-02
## [66] -6.167387e-02 5.558801e-02 -9.595228e-02 -2.514605e-02 -4.939776e-02
## [71] -2.463602e-02 -4.901517e-02 2.464438e-02 -6.786941e-02 -4.280186e-02
## [76] 4.356271e-02 -2.252887e-02 -4.585242e-02 -2.597473e-02 -1.020191e-01
## [81] -5.145040e-02 -6.241276e-02 1.280840e-02 -9.292512e-03 -5.530533e-02
## [86] -2.924032e-02 5.679331e-02 -4.371198e-02 -2.270892e-02 6.353031e-03
## [91] 3.744825e-02 -1.160099e-02 -6.015619e-02 -4.276017e-02 3.489650e-02
## [96] -3.726210e-02 1.786244e-01 2.652582e-02 -2.590808e-02 1.035887e-01
• Unless you want to replicate results exactly, it is typically better to let the clock set
the seed.

• The full vector of Moran statistics for the permuted data sets lends itself well to a histogram or density
plot.
• R has very sophisticated plotting functions, and this example will only scratch the surface.
• Consult the help files for further details and specific options.
• In order to construct a density plot, we first need to create a density object.

• To make sure that the reference distribution plotted is for the randomly permuted data sets only,
we must remove the last element from MoranCrimePermtutations$res.
morp <- MoranCrimePermtutations$res[1:(length(MoranCrimePermtutations$res) - 1)]

• Next, we must pass the new vector (morp) to the function density.
• This function has many options, and in this example they are all kept at their default settings (check
out help(density) for further details), so that only the vector of statistics is specified.
• The result is stored in the object zz:
zz <- density(morp)

• Next, we will plot three graphs on top of each other: a (continuous) density function (based on zz), a
histogram for the reference distribution, and a line indicating the observed Moran’s I.
• The latter is contained in the statistic element of the moran.mc object, MoranCrimePermtutations$statistic.
• In the plotting code, the density plot is drawn first, along with the titles (main on top and xlab
under the x-axis).
• Several, but not all, of the plotting functions in R support an add=T argument, which adds a graph to an
existing plot.
• Since the current version of plot.density does not support this argument, while the other two plots do,
it has to be drawn first.
• Also, to make the plot more distinct, a double line width is set (lwd=2) and the plot is in red (col=2).
• Finally, to make sure that the default settings for the plot do not make it too small, we explicitly set the
horizontal (xlim) and vertical (ylim) ranges (typically, this takes some trial and error to get right).
• Next, you add the histogram (hist) and a vertical line (abline).
• The histogram is drawn in regular-thickness black lines, while the vertical line is double thickness
(lwd=2) and drawn in blue (col=4).
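The plotting commands themselves do not appear in this text, so the following is only a sketch that is consistent with the description above. It assumes simulated stand-in values for `morp` and a hard-coded `observed` value (in the Police example these would come from MoranCrimePermtutations$res and MoranCrimePermtutations$statistic), and the xlim and ylim ranges are guesses read off the figure's axes.

```r
set.seed(123456)

# Stand-in values: in the text, morp holds the permuted Moran's I values and
# the observed statistic comes from MoranCrimePermtutations$statistic.
morp     <- rnorm(99, mean = -0.012, sd = 0.05)
observed <- 0.1036

zz <- density(morp)

# Density first (red, double line width), with explicit axis ranges.
plot(zz, main = "Moran's I Permutation Test", xlab = "Reference Distribution",
     xlim = c(-0.2, 0.6), ylim = c(0, 10), lwd = 2, col = 2)

# Histogram on the density scale, added to the existing plot.
hist(morp, freq = FALSE, add = TRUE)

# Observed Moran's I as a blue vertical line of double thickness.
abline(v = observed, lwd = 2, col = 4)
```

Calling plot() on a density object dispatches to the density plotting method, so the method does not need to be named explicitly.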
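[Figure: “Moran's I Permutation test”. Density curve and histogram of the reference distribution, with a vertical line at the observed Moran's I. x-axis: Reference Distribution (about −0.2 to 0.6); y-axis: Density (0 to 10).]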

• In order to create hard copy output, you need to specify an output device.
• R supports several devices, but by far the most commonly used are the postscript and pdf devices.
• You must first open the device explicitly.
• For example,
postscript(file="filename")

for a ps file.
pdf(file="filename")

for a pdf file.

• All plotting commands that follow will be written to that device, until you close it with a dev.off()
command.
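As an illustration of the open, plot, close sequence, here is a minimal base R sketch. The file name is hypothetical (a temporary file is used so that nothing is overwritten) and the plot is a placeholder.

```r
# Write a simple plot to a PDF file in a temporary location.
out_file <- tempfile(fileext = ".pdf")   # hypothetical file name

pdf(file = out_file)                     # open the device
plot(1:10, main = "Example plot")        # drawn to the file, not the screen
dev.off()                                # close the device and finish the file

file.exists(out_file)
```

The same pattern applies to postscript(); only the function that opens the device changes.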
• For example,## Significance test of Moran’s I
• So the statistic

I − E(I)
VNorm(I)
• will be approximately normally distributed with mean zero and variance one, so that p-values may be
obtained by comparison with the standard normal distribution.

12
– Randomisation hypothesis: the other form of the test is a more formal working of the ran-
domisation idea the inference over Moran’s I is usually conducted under the randomisation
hypothesis.
∗ In this case, no assumption is made about the distribution of the yi ’s - but it is assumed that
any permutation of the yi ’s against the polygons is equally likely. This hypothesis means that
each value can be equally likely be observed at each location.
∗ Thus, the null hypothesis is still one of “no spatial pattern”, but it is conditional on the
observed data.

Significance test of Moran’s I


• The variance of the Moran’s I obtained under the permutation hypothesis is a little more complicated
as we will see in a few moment
• The distribution of the statistic
I − E(I)
VRand(I)

• is also close to the Normal distribution - and the quantity in this expression can also be compared to
the normal distribution with mean zero and variance one, to obtain p-values.
• Both kinds of test are available in R via the moran.test function shown later.
• We first present the significance test of Moran’s I for the normal approximation making use of a
row-standardized weights matrix.

Significance test of Moran’s I


• The global Moran can be standardized as the following equation using the expectation value E(I) and
the variance V (I)
• Test statistic
I − E(I)
Z(I) = p → N (0, 1) (16)
V (I)
• Because Z(I) follows the standard normal distribution N (0, 1) in an asymptotical manner, it is possible
to conduct hypothesis testing using the null hypothesis that sptial autocorrelation does not exist under
a given W
• When normality of yi is assumed, the expected value and variance may be given by (Cliff and Ord,
1981)

1
E(I) = − (17)
n−1

Variance for normal approximation


n2 S1 − nS2 + 3S02
V (I) = − E(I)2 (18)
(n2 − 1)S02

Pn Pn Pn Pn Pn Pn Pn 2
1 2
with S0 = i=1 j=1 wij , S1 = 2 i=1 j=1 (w ij + wji ) , S2 = i=1 j=1 w ij + j=1 wji =
Pn 2
Pn
i=1 (wi. + w.i ) with Wi. = j=1 wij .

• Test decision (right-sided test) : z(I) > Z1−α ⇒ Rejet H0 (positive spatial autocorrelation)
• We are intereted in the distribution of the following statistic :

I − E(I)
Ti = p (19)
V (I)

13
Significance test of Moran’s I
• The next theorem gives the moments of Moran’s I under randomization.
• Under permutation, we have :

1
E(I) = − (20)
n−1

and
   
2 n (n2 − 3n + 3)S1 − nS2 + 3S02 − b2 (n2 − 2)S1 − 2nS2 + 6S02
E(I ) = (21)
(n − 1)(n − 2)(n − 3)S02

Significance test of Moran’s I


Pn Pn Pn Pn Pn
• where S0 = i=1 j=1 wij , S1 = i=1 j=1 (wij + wji )2 /2, S2 = i=1 (wi. + w.i )2 , where wi. =
Pn Pn
j=1 wij , and w.i = j=1 wji . Then: Then:

V (I) = E(I 2 ) − E(I)2 (22)

It is important to note that the expected value of Moran’s I under normality and randomization is the same.
1. If I ≥ E(I), then a spatial unit tends to be connected by locations with similar attributes: Spatial
clustering (low/low or high/high). The strength of positive spatial autocorrelation tends to increase
with I − E(I).
2. If I ≤ E(I), observations will tend to have dissimilar values from their neighbors: Negative spatial
autocorrelation (low/high or high/low).

Spatial autocorrelation with R


Remember that in the presence of spatial autocorrelation, we observed that value of a variable for an observation
is linked to the values of the same variable for the neighboring observations. - Spatial autocorrelation is
positive when similar values of the interest variable are grouped geographically. - Spatial autocorrelation is
negative when the dissimilar values of the interest variable come together geographically.
When observations are randomly distributed, there is no spatial autocorrelation.
Moran’s I statistic for spatial autocorrelation is implemented in spdep package. Remember, for example, that
we are testing states that “the GDP values are randomly distributed across counties (countries) following
completely a random process”. In R, there are two methods to testing this hypothesis: an analytical
method and a Monte Carlo method. There are two separate functions, - Analytical method : the function
moran.test implements this approach where the inference is based - on a normality assumption of the yi - or
randomization assumption of the yi
Monte Carlo method: the function moran.mc, is based on a Monte Carlo appraoch to compute the p-value
of the I statistic. The test is really simple:
1. Keep the map units (polygons) constant.
2. Randomly reassign values to the map units.
3. Calculate Moran’s I and save the value.
4. Return to step 2 and repeat until the user-specified number of iterations is reached.
The result is a list of Moran’s I values. These values represent observations of “randomness”. Then sort the list from lowest to highest. If our observed Moran’s I is near the beginning or the end of the list, we call it “a significant departure from randomness”. The p-value is obtained by counting the number of simulated test statistic values more extreme than the one observed. To find the probability of obtaining simulated values more extreme than ours, we identify the side of the distribution of simulated values closest to our observed statistic, count the number of simulated values more extreme than the observed statistic, then compute p as follows:

$$p = \frac{n_{extreme} + 1}{n + 1}$$

where $n_{extreme}$ is the number of simulated values more extreme than our observed statistic and n is the total number of simulations. Note that this is for a one-sided test. For example, if our observed value were the last element of the sorted list after 999 simulations, the p-value would be (0 + 1)/(999 + 1) = 0.001.
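The permutation steps and the p-value formula above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper moran_i, the line-shaped neighbourhood structure and the values in x are made up, and moran.mc in spdep remains the function to use in practice.

```r
# Moran's I for a numeric vector x and an n x n spatial weights matrix W
# (illustrative helper, not the spdep implementation)
moran_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Step 1: the map units stay fixed (hypothetical: 10 units on a line)
n <- 10
W <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}
x <- 1:10  # made-up values with a smooth spatial trend

set.seed(42)  # only to make the sketch reproducible
nsim  <- 999
i_obs <- moran_i(x, W)

# Steps 2-4: shuffle the values over the fixed units and recompute I each time
i_sim <- replicate(nsim, moran_i(sample(x), W))

# One-sided (upper tail) pseudo p-value
n_extreme <- sum(i_sim >= i_obs)
p_value   <- (n_extreme + 1) / (nsim + 1)
print(c(I = i_obs, p = p_value))
```

Because the observed statistic is counted in both numerator and denominator, the smallest attainable p-value is 1/(nsim + 1).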
Both functions (moran.test and moran.mc) take a variable name or numeric vector and a spatial weights list object (listw), in that order, as mandatory parameters. The permutation test also requires the number of permutations as a third mandatory parameter. The different parameters and options for moran.test are revealed by a call to help(moran.test).
Of the optional parameters, two are very important. The randomisation option is set to TRUE by default, which implies that in order to get inference based on a normal approximation, it must be explicitly set to FALSE. Similarly, the default is a one-sided test, so that in order to obtain the results for the (more commonly used) two-sided test, the option alternative must be set to "two.sided". Note also that the zero.policy option is set to FALSE by default, which means that islands result in a missing value code (NA). Setting this option to TRUE will set the spatial lag for islands to the customary zero value.
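The effect of these two options can be tried on a small artificial example. The sketch below assumes that the spdep package is installed (the guard skips the example otherwise) and uses a hypothetical 5 x 5 rook-contiguity grid with made-up random values in place of the police data; the object names nb, lw, x, res_default and res_normal are illustrative.

```r
# Assumes the spdep package is installed; the guard skips the example otherwise
if (requireNamespace("spdep", quietly = TRUE)) {
  library(spdep)

  # Hypothetical data: a 5 x 5 rook-contiguity grid with made-up values
  nb <- cell2nb(5, 5, type = "rook")
  lw <- nb2listw(nb, style = "W")
  set.seed(1)
  x <- rnorm(25)

  # Default call: randomisation assumption, one-sided ("greater") test
  res_default <- moran.test(x, lw)

  # Normal approximation and two-sided test must be requested explicitly
  res_normal <- moran.test(x, lw, randomisation = FALSE,
                           alternative = "two.sided")

  # Statistic, expectation and variance under each assumption
  print(res_default$estimate)
  print(res_normal$estimate)
}
```

Comparing the two printed vectors should show the same statistic and expectation, with only the variance differing between the two assumptions.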
To illustrate this, use the variable CRIME and the weights list queen.wl with the normal approximation in a two-sided test:

Moran I index using R with Police Data

moran.test(police$CRIME, queen.wl, randomisation=FALSE, alternative="two.sided")


##
## Moran I test under normality
##
## data: police$CRIME
## weights: queen.wl
##
## Moran I statistic standard deviate = 1.7201, p-value = 0.08542
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.103588680 -0.012345679 0.004542775
Note how, unlike previous practice, the object created by the moran.test function was not assigned to a specific variable. If you simply want to get the results, this is not necessary. By entering the test this way, you indirectly invoke the print function for the object. If you want to access the individual results of the test for further processing, you should assign the result of moran.test to an object and then print that object.
For example, using the INC variable:
MoranINC <- moran.test(police$INC, queen.wl, randomisation=FALSE, alternative="two.sided")
print(MoranINC)

##
## Moran I test under normality
##
## data: police$INC
## weights: queen.wl
##
## Moran I statistic standard deviate = 1.3288, p-value = 0.1839
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.077215210 -0.012345679 0.004542775
Note how the expectation and variance of the Moran’s I test statistic are the same for both CRIME and INC. In fact, under the normal approximation, these moments depend only on the spatial weights, and not on the variable under consideration. The MoranINC object belongs to the class htest, which is a generic class in R designed to hold the results of test statistics. Technically, it is a list and all its items can be accessed individually (provided you know what they are called; check the help).
For example,
class(MoranINC)

## [1] "htest"
MoranINC$statistic

## Moran I statistic standard deviate
## 1.328794
shows the information stored in the statistic element of the list. When the null hypothesis considered is based on the randomization distribution, the randomisation option does not need to be set, since TRUE is its default; the call below sets it explicitly for clarity.
moran.test(Police$CRIME,queen.wl,randomisation=TRUE,alternative="two.sided")

##
## Moran I test under randomisation
##
## data: Police$CRIME
## weights: queen.wl
##
## Moran I statistic standard deviate = 2.2072, p-value = 0.0273
## alternative hypothesis: two.sided
## sample estimates:
## Moran I statistic Expectation Variance
## 0.103588680 -0.012345679 0.002758842
Note how the value of the statistic and its expectation do not change relative to the normal case; only the variance is different (and thus the z-value and the associated p-value).
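To see how the variance alone drives the change in the z-value and the p-value, the reported standard deviates can be recomputed by hand as z = (I − E(I)) / sqrt(V(I)). The short sketch below copies the numbers from the two CRIME outputs above; the variable names are illustrative.

```r
# Values copied from the two moran.test outputs for CRIME
i_obs   <- 0.103588680   # Moran I statistic
e_i     <- -0.012345679  # expectation
var_nor <- 0.004542775   # variance under normality
var_ran <- 0.002758842   # variance under randomisation

# Standard deviates: same numerator, different variance
z_nor <- (i_obs - e_i) / sqrt(var_nor)
z_ran <- (i_obs - e_i) / sqrt(var_ran)

# Two-sided p-values from the standard normal distribution
p_nor <- 2 * pnorm(abs(z_nor), lower.tail = FALSE)
p_ran <- 2 * pnorm(abs(z_ran), lower.tail = FALSE)

print(round(c(z_nor = z_nor, p_nor = p_nor, z_ran = z_ran, p_ran = p_ran), 4))
```

The results agree, up to rounding, with the standard deviates (1.7201 and 2.2072) and p-values (0.08542 and 0.0273) reported by moran.test.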

Moran I index using R: Monte Carlo Approach


A Moran’s I test statistic with inference based on random permutation (Monte Carlo approach) is contained in the function moran.mc. As in moran.test, it takes the variable name (or vector) and the weights list (listw object) as the first two mandatory arguments. It also needs the number of permutations as the third argument (nsim). Since the rank of the observed statistic is computed relative to the reference distribution of statistics for the permuted data sets, it is good practice to set this number to something ending in 9 (such as 99 or 999).
This will yield nicely rounded pseudo p-values (like 0.01 or 0.001). The other options are identical to those for the moran.test function. For example, using the CRIME variable with the queen.wl weights, setting the number of permutations to 99 and leaving everything else at its default (e.g., a one-sided test):
MoranCrimePermtutations<-moran.mc(police$CRIME,queen.wl,99)
print(MoranCrimePermtutations)

##
## Monte-Carlo simulation of Moran I
##
## data: police$CRIME
## weights: queen.wl
## number of simulations + 1: 100
##
## statistic = 0.10359, observed rank = 98, p-value = 0.02
## alternative hypothesis: greater
Note how the number of simulations + 1 is a round 100. Two of the permuted data sets yielded a Moran’s I greater than the observed value of 0.1036, so the observed statistic has rank 98 among the 100 values and the reported pseudo p-value is 0.02. This is consistent with the ratio (number of simulations − observed rank + 1) over (number of simulations + 1). In the example, this is (99 − 98 + 1)/(99 + 1).
In this example, the results of the procedure were assigned to the object MoranCrimePermtutations, of class htest, which must be printed to reveal its contents. One item in the list MoranCrimePermtutations is a vector with the computed statistics for each of the permuted data sets. This is contained in the item res, as in:
MoranCrimePermtutations$res

## [1] -4.940029e-02 -7.929607e-02 -2.553588e-02 3.837100e-02 1.898299e-01
## [6] -3.932243e-03 6.475930e-02 8.221389e-02 -5.496016e-02 7.570407e-03
## [11] 1.109606e-01 1.018491e-02 -1.030564e-02 6.807391e-02 -2.344995e-02
## [16] -9.200904e-04 -4.324680e-02 7.922515e-03 -3.345347e-02 -6.701318e-03
## [21] 5.584619e-03 -6.548612e-02 2.360931e-02 -6.564012e-02 -2.846319e-02
## [26] 2.661844e-02 6.024655e-03 -6.580024e-02 -3.779682e-02 8.278397e-03
## [31] -5.589463e-03 6.740081e-02 -8.395147e-02 -5.986557e-02 1.861713e-02
## [36] -1.170816e-02 1.011418e-01 5.075654e-02 -9.744001e-02 -2.915201e-02
## [41] -1.490501e-02 -6.275837e-05 -1.467008e-02 8.606274e-03 9.665527e-04
## [46] -3.265777e-02 -1.024935e-01 -4.149035e-02 -1.592150e-02 -1.776486e-02
## [51] 3.668502e-02 2.407140e-02 -3.709238e-02 -6.841204e-02 -1.482552e-02
## [56] -9.855861e-02 4.914810e-02 -8.423151e-03 1.216023e-02 4.093953e-02
## [61] -6.648635e-02 5.071734e-02 7.572572e-03 -5.929631e-02 -3.108655e-02
## [66] -2.553859e-03 7.523372e-02 -2.460712e-02 2.481103e-03 -2.291627e-02
## [71] -1.574426e-02 -7.077753e-03 -1.110176e-01 -5.291158e-02 -4.009538e-02
## [76] 2.297543e-02 3.844516e-02 -3.482171e-02 5.409428e-02 -8.334177e-02
## [81] -6.733309e-02 2.622762e-02 -5.592140e-02 -9.201068e-02 -1.063081e-01
## [86] -5.386345e-02 2.914123e-02 4.570724e-02 -1.453302e-02 1.938488e-02
## [91] -6.073388e-02 -4.065206e-02 -7.983720e-03 7.141790e-02 -2.716635e-02
## [96] 7.265866e-02 -1.060114e-01 -8.554338e-03 9.031500e-02 1.035887e-01
Note that the observed value of Moran’s I (0.10359) is included in this list as the last element. By default, the contents of this vector will differ slightly from run to run, since they are based on random permutations. The default random number seed is determined from the current time, so no two sets of random permutations will be identical. To control the seed, use the R function set.seed(seed, kind = NULL) right before invoking the moran.mc command, and set the same value each time.
For example, try this using set.seed(123456):
set.seed(123456)
MoranCrimePermtutations <- moran.mc(police$CRIME,queen.wl,99)
MoranCrimePermtutations$res

## [1] 1.558810e-02 -6.236174e-02 -2.454199e-02 1.521673e-02 -3.342999e-02
## [6] -3.658711e-02 -9.885310e-03 1.913200e-03 -5.205790e-02 -8.659591e-04
## [11] 7.641221e-02 5.925944e-02 1.164173e-02 -1.286995e-02 -9.150634e-03
## [16] -2.943162e-03 -1.155581e-02 4.159631e-02 6.355551e-02 3.963826e-02
## [21] -5.676721e-03 -1.349112e-02 -3.005129e-02 4.519048e-02 -5.194294e-02
## [26] 1.232976e-01 3.538739e-03 -5.456543e-02 -1.270799e-02 1.642366e-02
## [31] 2.226327e-02 3.216281e-02 -2.105236e-02 -3.478252e-02 -7.460455e-02
## [36] -2.306669e-02 2.716858e-02 1.529558e-02 4.622271e-02 7.885572e-02
## [41] 3.851046e-02 2.825124e-02 -1.742498e-02 -3.469730e-02 -3.888761e-02
## [46] 7.423914e-03 2.472966e-02 -1.040876e-01 -2.179414e-02 -1.617477e-02
## [51] -2.309804e-02 3.415871e-02 -1.494746e-02 2.688014e-03 4.993817e-02
## [56] 4.111695e-03 1.255441e-01 1.524413e-02 -7.290068e-02 -5.552675e-05
## [61] 4.702426e-03 5.162509e-02 -2.662026e-02 -5.339628e-02 -2.915053e-02
## [66] -6.167387e-02 5.558801e-02 -9.595228e-02 -2.514605e-02 -4.939776e-02
## [71] -2.463602e-02 -4.901517e-02 2.464438e-02 -6.786941e-02 -4.280186e-02
## [76] 4.356271e-02 -2.252887e-02 -4.585242e-02 -2.597473e-02 -1.020191e-01
## [81] -5.145040e-02 -6.241276e-02 1.280840e-02 -9.292512e-03 -5.530533e-02
## [86] -2.924032e-02 5.679331e-02 -4.371198e-02 -2.270892e-02 6.353031e-03
## [91] 3.744825e-02 -1.160099e-02 -6.015619e-02 -4.276017e-02 3.489650e-02
## [96] -3.726210e-02 1.786244e-01 2.652582e-02 -2.590808e-02 1.035887e-01
Unless you want to replicate results exactly, it is typically better to let the randomness of the clock set the
seed.
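The effect of fixing the seed can be checked without any spatial data. The minimal sketch below uses base R's sample function on made-up values; the object names are illustrative.

```r
# Same seed, same "random" permutation
set.seed(123456)
perm_a <- sample(1:10)
set.seed(123456)
perm_b <- sample(1:10)
identical(perm_a, perm_b)   # TRUE: the permutation is reproduced

# Without resetting the seed, the next draw continues the random stream
perm_c <- sample(1:10)
print(rbind(perm_a, perm_b, perm_c))
```

The same logic applies to moran.mc: calling set.seed with the same value immediately before each call reproduces the same set of permutations.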

Moran I index using R: Monte Carlo Approach


The full vector of Moran statistics for the permuted data sets lends itself well to a histogram or density plot.
R has very sophisticated plotting functions, and this example will only scratch the surface. Consult the help
files for further details and specific options. In order to construct a density plot, we first need to create a
density object.
To make sure that the reference distribution is plotted for the randomly permuted data sets only, we must remove the last element (the observed statistic) from MoranCrimePermtutations$res.
morp <- MoranCrimePermtutations$res[1:(length(MoranCrimePermtutations$res) - 1)]

Next, we must pass the new vector (morp) to the function density. This function has many options, and in this example they are all kept at their default settings (check out help(density) for further details), so that only the vector with statistics is specified. The result is stored in the object zz:
zz <- density(morp)

Next, we will plot three graphs on top of each other: a (continuous) density function (based on zz), a histogram for the reference distribution, and a line indicating the observed Moran’s I. The latter is contained in the statistic element of the moran.mc object, MoranCrimePermtutations$statistic. In the code that follows, the density plot is drawn first, along with the titles (main on top and xlab under the x-axis).
Several, but not all, of the plotting functions in R support an add=T argument, which adds a graph to an existing plot. Since the current version of plot.density does not support this argument, while the other two plots do, it has to be drawn first. Also, to make the plot more distinct, a double line width is set (lwd=2) and the plot is in red (col=2). Finally, to make sure that the default settings for the plot do not make it too small, we explicitly set the horizontal (xlim) and vertical (ylim) ranges (typically, this takes some trial and error to get it right).
Next, you add the histogram (hist) and a vertical line (abline). The histogram is drawn in regular thickness
black lines, while the vertical line is double thickness (lwd=2) and drawn in blue (col=4):
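A sketch of the three plotting steps described above is given below. So that it runs on its own, it uses a placeholder reference distribution (99 made-up normal draws) and the observed value 0.1036 typed in by hand; with the real objects, morp and MoranCrimePermtutations$statistic would take their place. The axis limits are read off the figure and may need adjusting.

```r
# Placeholder reference distribution: 99 made-up values standing in for morp
set.seed(1)
morp <- rnorm(99, mean = -0.0123, sd = 0.0525)
obs  <- 0.1036   # observed Moran's I for CRIME, from the moran.mc output

zz <- density(morp)

# Density first, since the histogram and the line are added on top of it
plot(zz, main = "Moran's I Permutation Test", xlab = "Reference Distribution",
     xlim = c(-0.2, 0.6), ylim = c(0, 10), lwd = 2, col = 2)
hist(morp, freq = FALSE, add = TRUE)   # histogram on the density scale
abline(v = obs, lwd = 2, col = 4)      # observed statistic as a vertical line
```

Setting freq = FALSE puts the histogram on the same density scale as the curve, which is what allows the two to be overlaid.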

[Figure: Moran's I permutation test. Density plot of the reference distribution; x-axis "Reference Distribution" (−0.2 to 0.6), y-axis "Density" (0 to 10).]

In order to create hard copy output, you need to specify an output device. R supports several devices, but by
far the most commonly used are the postscript and pdf devices. You must first open the device explicitly.
For example,
postscript(file="filename")

for a ps file, or
pdf(file="filename")

for a pdf file. All plotting commands that follow will be written to that device, until you close it with a
dev.off() command. For example,
pdf(file="moran.pdf") # file name for pdf file
... # plotting commands
dev.off() # close pdf device
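
As a complete, runnable illustration of the open, plot, close sequence, the sketch below writes a small density plot to a pdf file in R's temporary directory. The file name and the plotted values are made up for the example.

```r
# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "moran_example.pdf")

pdf(file = out_file)   # open the pdf device
plot(density(c(-0.05, -0.02, 0.00, 0.01, 0.03, 0.06)),  # made-up values
     main = "Example density")
dev.off()              # close the device so the file is written

file.exists(out_file)  # check that the file was created
```

Nothing appears on screen while the device is open; the plot only exists in the file once dev.off() has been called.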
