Geostatistical Analysis

This document provides an overview of geostatistical analysis and inverse distance weighted (IDW) interpolation. It discusses exploring spatial data using exploratory spatial data analysis graphs and tools. It also describes using the Geostatistical Wizard to create surfaces using deterministic and geostatistical interpolation methods like IDW. IDW assumes nearby points are more influential than distant points, and it assigns weights to input points that diminish with distance. The optimal power value and search neighborhood can be configured to improve IDW interpolation results.


Course Title: Data Analysis in GIS
Course Code: GeES 612

TOPIC 3: Geostatistical Analysis (Mapping Patterns)

Postgraduate Programme in GIS and Remote Sensing


Department of Geography & Environmental Studies
College of Social Sciences & Humanities
Adigrat University

Compiled By
Dr. Zubairul Islam
Associate Professor
GIS and Remote Sensing
Department of Geography and Environmental Sciences
Adigrat University, Ethiopia
E-mail: [email protected] , Contact no.: +251-967490505

Source: ArcGIS Help


Topic 3.1 - Geostatistical Analysis
The following topics will be covered in this part:

1. Creating a surface using parameters


2. Exploring data
3. Mapping Pattern
1. Components of Geostatistical Analyst

There are three main components of Geostatistical Analyst:

● A set of exploratory spatial data analysis (ESDA) graphs


● The Geostatistical Wizard
● The Geostatistical Analyst toolbox, which houses geoprocessing tools
specifically designed to extend the capabilities of the Geostatistical Wizard
and allow further analysis of the surfaces it generates
Exploring data
Exploratory spatial data analysis graphs
Before using the interpolation techniques, you should explore your data using the
exploratory spatial data analysis tools. These tools allow you to gain insights into your
data and to select the most appropriate method and parameters for the interpolation
model. For example, when using ordinary kriging to produce a quantile map, you should
examine the distribution of the input data because this particular method assumes that
the data is normally distributed. If your data is not normally distributed, you should
include a data transformation as part of the interpolation model. A second example is that
you might detect a spatial trend in your data using the ESDA tools and want to include a
step to model it independently as part of the prediction process.
● Histogram—Examine the distribution and summary statistics of a dataset.
● Normal QQ Plot and General QQ Plot—Assess whether a dataset is normally distributed and
explore whether two datasets have similar distributions, respectively.
● Voronoi Maps—Visually examine the spatial variability and stationarity of a dataset.
● Trend Analysis—Visualize and examine spatial trends in a dataset.
● Semivariogram/Covariance Cloud—Evaluate the spatial dependence (semivariogram and
covariance) in a dataset.
● Crosscovariance Cloud—Assess the spatial dependence (covariance) between two datasets.
The ESDA graphs are shown below.

Tools for exploring a single dataset

The following graphic illustrates the ESDA tools used for analyzing one dataset at a time:

Tools for exploring relationships between datasets

The following graphic depicts the two tools that are designed to examine relationships between two datasets:
Geostatistical Wizard
The Geostatistical Wizard provides access to a number of interpolation techniques, which are divided into
two main types: deterministic and geostatistical.

Deterministic methods
Deterministic techniques have parameters that control either (1) the extent of similarity (for example, inverse
distance weighted) of the values or (2) the degree of smoothing (for example, radial basis functions) in the
surface. These techniques are not based on a random spatial process model, and there is no explicit
measurement or modeling of spatial autocorrelation in the data.

Deterministic methods include the following:

● Global polynomial interpolation


● Local polynomial interpolation
● Inverse distance weighted
● Radial basis functions
● Interpolation with barriers (using impermeable or semipermeable barriers in the interpolation process)
○ Diffusion kernel
○ Kernel smoothing
Geostatistical methods
Geostatistical techniques assume that at least some of the spatial variation observed in natural phenomena
can be modeled by random processes with spatial autocorrelation and require that the spatial autocorrelation
be explicitly modeled. Geostatistical techniques can be used to describe and model spatial patterns
(variography), predict values at unmeasured locations (kriging), and assess the uncertainty associated with a
predicted value at the unmeasured locations (kriging).

The Geostatistical Wizard offers several types of kriging, which are suitable for different types of data and
have different underlying assumptions:

● Ordinary
● Simple
● Universal
● Indicator
● Probability
● Disjunctive
● Areal interpolation
● Empirical Bayesian
These methods can be used to produce the following surfaces:

● Maps of kriging predicted values


● Maps of kriging standard errors associated with predicted values
● Maps of probability, indicating whether a predefined critical level was exceeded
● Maps of quantiles for a predetermined probability level

There are exceptions to this:

● Indicator and probability kriging produce the following:


○ Maps of probability, indicating whether a predefined critical level was exceeded
○ Maps of standard errors of indicators
● Areal interpolation produces the following:
○ Maps of predicted values
○ Maps of standard errors associated with predicted values
Creating a surface using parameters under inverse distance weighted interpolation

Geostatistical Wizard

The Geostatistical Wizard is accessed through the Geostatistical Analyst toolbar, as shown below:
The Geostatistical Wizard is a dynamic set of pages that is designed to guide you through the process of constructing and evaluating the performance of an interpolation model. Choices made on one page determine which options will be available on the following pages and how you interact with the data to develop a suitable model. The wizard guides you from the point when you choose an interpolation method all the way to viewing summary measures of the model's expected performance. A simple version of this workflow (for inverse distance weighted interpolation) is represented below.

During construction of an interpolation model, the wizard allows changes in parameter values, suggests or provides optimized parameter values, and allows you to move forward or backward in the process to assess the cross-validation results to see whether the current model is satisfactory or some of the parameter values should be modified. This flexibility, in addition to dynamic data and surface previews, makes the wizard a powerful environment in which to build interpolation models.
How inverse distance weighted interpolation works

Inverse distance weighted (IDW) interpolation explicitly makes the assumption that
things that are close to one another are more alike than those that are farther
apart. To predict a value for any unmeasured location, IDW uses the measured
values surrounding the prediction location. The measured values closest to the
prediction location have more influence on the predicted value than those farther
away. IDW assumes that each measured point has a local influence that
diminishes with distance. It gives greater weights to points closest to the prediction
location, and the weights diminish as a function of distance, hence the name
inverse distance weighted.
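The weighting scheme just described can be stated compactly. With N measured values Z(s_i) in the neighborhood and d_i0 the distance from measured location s_i to the prediction location s_0:

```latex
\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i \, Z(s_i),
\qquad
\lambda_i = \frac{d_{i0}^{-p}}{\sum_{j=1}^{N} d_{j0}^{-p}},
\qquad
\sum_{i=1}^{N} \lambda_i = 1
```

Because the weights are nonnegative and sum to 1, every IDW prediction is a weighted average of the measured values, which is why the surface cannot rise above the largest or fall below the smallest measured value.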
Weights assigned to data points are illustrated in the following example. The Weights window contains the list of weights assigned to each data point that is used to generate a predicted value at the location marked by the crosshair.
The Power function

As mentioned above, weights are proportional to the inverse of the distance (between the data point and the prediction location) raised to the power value p. As a result, as the distance increases, the weights decrease rapidly. The rate at which the weights decrease depends on the value of p. If p = 0, there is no decrease with distance, and because each weight λi is the same, the prediction will be the mean of all the data values in the search neighborhood. As p increases, the weights for distant points decrease rapidly. If the p value is very high, only the immediate surrounding points will influence the prediction. Geostatistical Analyst uses power values greater than or equal to 1. When p = 2, the method is known as inverse distance squared weighted interpolation. The default value is p = 2, although there is no theoretical justification to prefer this value over others, and the effect of changing p should be investigated by previewing the output and examining the cross-validation statistics.
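The effect of p on the weights can be illustrated numerically. This is a minimal sketch, not ArcGIS code; `idw_weights` is a hypothetical helper name:

```python
import numpy as np

def idw_weights(distances, p):
    """Normalized IDW weights for a set of neighbor distances (hypothetical helper)."""
    d = np.asarray(distances, dtype=float)
    raw = d ** -p           # inverse distance raised to the power p
    return raw / raw.sum()  # normalize so the weights sum to 1

distances = [1.0, 2.0, 4.0]
print(idw_weights(distances, 0))  # p = 0: equal weights, prediction is the neighborhood mean
print(idw_weights(distances, 2))  # p = 2: ~ [0.762, 0.190, 0.048], the closest point dominates
```

Raising p sharpens the weighting toward the nearest points; p = 0 reproduces the neighborhood mean described in the text.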

An optimal power value can be determined by minimizing the root mean square prediction error (RMSPE). The RMSPE is a statistic that is calculated during cross-validation; it quantifies the error of the prediction surface. Geostatistical Analyst will evaluate several different power values to identify the one that produces the lowest RMSPE. The diagram below illustrates how Geostatistical Analyst calculates the optimal power: the RMSPE is plotted for several different power values (using the same dataset), a curve is fit to the points (a quadratic local polynomial interpolation), and from the curve, the power that provides the smallest RMSPE is determined as the optimal power.
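The search for an optimal power can be sketched with leave-one-out cross-validation. This is a simplified illustration, assuming a global (unrestricted) neighborhood and comparing candidate powers directly rather than fitting a curve; `idw_predict` and `rmspe` are hypothetical helpers:

```python
import numpy as np

def idw_predict(xy, z, target, p):
    """IDW prediction at a target location from sample points (illustrative helper)."""
    d = np.linalg.norm(xy - target, axis=1)
    if np.any(d == 0):             # exact interpolator: return the coincident sample value
        return z[np.argmin(d)]
    w = d ** -p
    return np.sum(w * z) / np.sum(w)

def rmspe(xy, z, p):
    """Leave-one-out root mean square prediction error for a given power p."""
    errs = []
    for i in range(len(z)):
        mask = np.arange(len(z)) != i          # hold out point i
        pred = idw_predict(xy[mask], z[mask], xy[i], p)
        errs.append(pred - z[i])
    return np.sqrt(np.mean(np.square(errs)))

rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(40, 2))
z = np.sin(xy[:, 0]) + 0.1 * rng.normal(size=40)   # a smooth field plus noise
powers = [1, 1.5, 2, 2.5, 3]
scores = {p: rmspe(xy, z, p) for p in powers}
best = min(scores, key=scores.get)                 # power with the lowest RMSPE
print(best, scores[best])
```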
The search neighborhood

Because things that are close to one another are more alike than those that are farther away, as the locations get farther away, the measured values will have little relationship to the value of the prediction location. To speed calculations, you can exclude the more distant points that will have little influence on the prediction. As a result, it is common practice to limit the number of measured values by specifying a search neighborhood. The shape of the neighborhood restricts how far and where to look for the measured values to be used in the prediction. Other neighborhood parameters restrict the locations that will be used within that shape. In the following image, five measured points (neighbors) will be used when predicting a value for the location without a measurement (the yellow point).
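Restricting the prediction to a circular search neighborhood can be sketched as follows; `search_neighbors` is an illustrative helper, not the ArcGIS interface:

```python
import numpy as np

def search_neighbors(points, target, radius, max_n=5, min_n=2):
    """Return the indices of up to max_n nearest points inside a circular
    search neighborhood of the given radius (illustrative sketch)."""
    d = np.linalg.norm(np.asarray(points, dtype=float) - np.asarray(target, dtype=float), axis=1)
    inside = np.where(d <= radius)[0]        # restrict the search to the circle
    inside = inside[np.argsort(d[inside])]   # order candidates nearest-first
    if len(inside) < min_n:
        raise ValueError("not enough measured points inside the search radius")
    return inside[:max_n]                    # keep at most max_n neighbors

pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
print(search_neighbors(pts, (0.1, 0.0), radius=6.0, max_n=3))  # [0 1 2]
```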
The shape of the neighborhood is influenced by the input data and the surface you are trying to create. If there
are no directional influences in your data, you'll want to consider points equally in all directions. To do so, you
will define the search neighborhood as a circle. However, if there is a directional influence in your data, such as
a prevailing wind, you may want to adjust for it by changing the shape of the search neighborhood to an ellipse
with the major axis parallel with the wind. The adjustment for this directional influence is justified because you
know that locations upwind from a prediction location are going to be more similar at remote distances than
locations that are perpendicular to the wind but located closer to the prediction location.

Once a neighborhood shape has been specified, you can restrict which data locations within the shape should
be used. You can define the maximum and minimum number of locations to use, and you can divide the
neighborhood into sectors. If you divide the neighborhood into sectors, the maximum and minimum constraints
will be applied to each sector. There are several different sectors that can be used and are displayed below.
The points highlighted in the data view show the locations and the weights that will be used for predicting a value at the location at the center of the ellipse (the location of the crosshair). The search neighborhood is limited to the interior of the ellipse. In the example shown below, the two red points will be given weights of more than 10 percent. In the eastern sector, one point (brown) will be given a weight between 5 and 10 percent. The rest of the points in the search neighborhood will receive lower weights.
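The sector logic described above can be sketched in Python. This is an illustrative sketch under the assumption of a circular neighborhood split into equal angular sectors; `sector_neighbors` is a hypothetical helper, not part of the ArcGIS API:

```python
import numpy as np

def sector_neighbors(points, target, n_sectors=4, max_per_sector=2):
    """Divide the neighborhood into angular sectors and keep the nearest
    max_per_sector points in each sector (illustrative sketch)."""
    off = np.asarray(points, dtype=float) - np.asarray(target, dtype=float)
    d = np.linalg.norm(off, axis=1)
    ang = np.arctan2(off[:, 1], off[:, 0]) % (2 * np.pi)   # angle in [0, 2*pi)
    sector = (ang // (2 * np.pi / n_sectors)).astype(int)  # sector index 0..n_sectors-1
    chosen = []
    for s in range(n_sectors):
        idx = np.where(sector == s)[0]
        idx = idx[np.argsort(d[idx])]        # nearest-first within the sector
        chosen.extend(idx[:max_per_sector].tolist())
    return sorted(chosen)

pts = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
print(sector_neighbors(pts, (0.0, 0.0)))  # [0, 1, 3, 4, 5]: the farthest NE point is dropped
```

Sectoring guarantees that neighbors are drawn from several directions rather than from a single cluster on one side of the prediction location.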
When to use IDW
A surface calculated using IDW depends on the selection of the power value (p) and the search
neighborhood strategy. IDW is an exact interpolator, where the maximum and minimum values (see diagram
below) in the interpolated surface can only occur at sample points.

The output surface is sensitive to clustering and the presence of outliers. IDW assumes that the
phenomenon being modeled is driven by local variation, which can be captured (modeled) by defining an
adequate search neighborhood. Since IDW does not provide prediction standard errors, justifying the use of
this model may be problematic.
Topic 3.2 - Mapping Pattern
Identifying geographic patterns is important for understanding how geographic phenomena
behave.

Although you can get a sense of the overall pattern of features and their associated values
by mapping them, calculating a statistic quantifies the pattern. This makes it easier to
compare patterns for different distributions or different time periods. Often the tools in the
Analyzing Patterns toolset are a starting point for more in-depth analyses. Using the
Incremental Spatial Autocorrelation tool to identify distances where the processes promoting
spatial clustering are most pronounced, for example, might help you select an appropriate
distance (scale of analysis) to use for investigating hot spots (Hot Spot Analysis).
The tools in the Analyzing Patterns toolset are inferential statistics; they start with the null
hypothesis that your features, or the values associated with your features, exhibit a spatially
random pattern. They then compute a p-value representing the probability that the null
hypothesis is correct (that the observed pattern is simply one of many possible versions of
complete spatial randomness). Calculating a probability may be important if you need to have a
high level of confidence in a particular decision. If there are public safety or legal implications
associated with your decision, for example, you may need to justify your decision using
statistical evidence.

The Analyzing Patterns tools provide statistics that quantify broad spatial patterns. These tools
answer questions such as, "Are the features in the dataset, or the values associated with the
features in the dataset, spatially clustered?" and "Is the clustering becoming more or less
intense over time?". The following table lists the tools available and provides a brief description
of each.
● Average Nearest Neighbor: Calculates a nearest neighbor index based on the average distance from each feature to its nearest neighboring feature.

● High/Low Clustering: Measures the degree of clustering for either high values or low values using the Getis-Ord General G statistic.

● Incremental Spatial Autocorrelation: Measures spatial autocorrelation for a series of distances and optionally creates a line graph of those distances and their corresponding z-scores. Z-scores reflect the intensity of spatial clustering, and statistically significant peak z-scores indicate distances where spatial processes promoting clustering are most pronounced. These peak distances are often appropriate values to use for tools with a Distance Band or Distance Radius parameter.

● Spatial Autocorrelation: Measures spatial autocorrelation based on feature locations and attribute values using the Global Moran's I statistic.

● Multi-Distance Spatial Cluster Analysis (Ripley's k-function): Determines whether features, or the values associated with features, exhibit statistically significant clustering or dispersion over a range of distances.
Average Nearest Neighbor
Introduction
Calculates a nearest neighbor index based on the average distance from each feature to
its nearest neighboring feature.
Uses

The Average Nearest Neighbor tool returns five values: Observed Mean Distance,
Expected Mean Distance, Nearest Neighbor Index, z-score, and p-value. These
values are accessible from the Results window and are also passed as derived
output values for potential use in models or scripts. Optionally, this tool will create
an HTML file with a graphical summary of results. Double-clicking on the HTML
entry in the Results window will open the HTML file in the default Internet browser.
Right-clicking on the Messages entry in the Results window and selecting View
will display the results in a Message dialog box.
● The z-score and p-value results are measures of statistical significance which tell you
whether or not to reject the null hypothesis. Note, however, that the statistical significance
for this method is strongly impacted by study area size (see below). For the Average
Nearest Neighbor statistic, the null hypothesis states that features are randomly distributed.

● The Nearest Neighbor Index is expressed as the ratio of the Observed Mean Distance to
the Expected Mean Distance. The expected distance is the average distance between
neighbors in a hypothetical random distribution. If the index is less than 1, the pattern
exhibits clustering; if the index is greater than 1, the trend is toward dispersion or
competition.

● The average nearest neighbor method is very sensitive to the Area value (small changes in
the Area parameter value can result in considerable changes in the z-score and p-value
results). Consequently, the Average Nearest Neighbor tool is most effective for comparing
different features in a fixed study area. The picture below is a classic example of how
identical feature distributions can be dispersed or clustered depending on the study area
specified.
● If an Area parameter value is not specified, then the area of the minimum enclosing
rectangle around the input features is used. Unlike the extent, a minimum enclosing
rectangle will not necessarily align with the x- and y-axes.

● When the Input Feature Class is not projected (that is, when coordinates are given in
degrees, minutes, and seconds) or when the output coordinate system is set to a
Geographic Coordinate System, distances are computed using chordal measurements.
Chordal distance measurements are used because they can be computed quickly and
provide very good estimates of true geodesic distances, at least for points within
about thirty degrees of each other. Chordal distances are based on an oblate spheroid.
Given any two points on the earth's surface, the chordal distance between them is the
length of a line, passing through the three-dimensional earth, to connect those two
points. Chordal distances are reported in meters.
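The chordal computation can be sketched by converting geodetic coordinates to Earth-centered 3-D coordinates and taking the straight-line distance. This is a standard geodetic-to-ECEF conversion, assuming the WGS84 spheroid; the helper names are illustrative:

```python
import math

WGS84_A = 6378137.0                 # WGS84 semi-major axis, meters
WGS84_F = 1 / 298.257223563         # WGS84 flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared

def to_ecef(lat_deg, lon_deg):
    """Geodetic (lat, lon) on the spheroid surface to Earth-centered 3-D coordinates."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)  # prime vertical radius
    return (n * math.cos(lat) * math.cos(lon),
            n * math.cos(lat) * math.sin(lon),
            n * (1 - WGS84_E2) * math.sin(lat))

def chordal_distance(p1, p2):
    """Straight-line distance through the earth between two (lat, lon) points, in meters."""
    return math.dist(to_ecef(*p1), to_ecef(*p2))

print(chordal_distance((0.0, 0.0), (0.0, 1.0)))  # ~ 111.3 km for 1 degree at the equator
```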
How Average Nearest Neighbor works
The Average Nearest Neighbor tool measures the distance between each feature
centroid and its nearest neighbor's centroid location. It then averages all these nearest
neighbor distances. If the average distance is less than the average for a hypothetical
random distribution, the distribution of the features being analyzed is considered
clustered. If the average distance is greater than a hypothetical random distribution, the
features are considered dispersed. The average nearest neighbor ratio is calculated as
the observed average distance divided by the expected average distance (with expected
average distance being based on a hypothetical random distribution with the same
number of features covering the same total area).
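The ratio described above can be sketched as follows; `average_nearest_neighbor` is a hypothetical helper, using the expected mean distance 0.5 * sqrt(area / n) for a hypothetical random distribution:

```python
import numpy as np

def average_nearest_neighbor(points, area):
    """Nearest neighbor index: observed mean nearest neighbor distance divided
    by the expected mean distance under complete spatial randomness."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)  # pairwise distances
    np.fill_diagonal(d, np.inf)          # exclude each point's distance to itself
    observed = d.min(axis=1).mean()      # mean distance to the nearest neighbor
    expected = 0.5 * np.sqrt(area / n)
    return observed / expected           # < 1 suggests clustering, > 1 dispersion

grid = [(float(i), float(j)) for i in range(5) for j in range(5)]  # regular 5 x 5 grid
print(average_nearest_neighbor(grid, area=16.0))  # ~ 2.5, i.e. > 1: dispersed
```

As the text notes, the result is sensitive to the Area value: passing a much larger area for the same points shrinks the expected distance's denominator and pushes the same pattern toward "clustered".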
Interpretation

If the index (average nearest neighbor ratio) is less than 1, the pattern exhibits clustering. If the index is greater than 1, the trend is toward dispersion.
Thanks
