
Tutorial: Trend surfaces in R

— Ordinary Least Squares


— Generalized Additive Models
— Thin-plate splines
— Generalized Least Squares
— GLS-Regression Kriging
— Universal Kriging

D G Rossiter
Cornell University, Section of Soil & Crop Sciences
ISRIC–World Soil Information

February 23, 2021

Contents

1 Introduction
2 Preparing for the exercise
   2.1 Computing environment
   2.2 Loading R packages
   2.3 Adjusting processing options
3 Dataset
   3.1 Loading and adjusting the dataset
   3.2 Making a spatial object
4 Exploratory analysis
5 Trend surface analysis by Ordinary Least Squares
   5.1 First-order trend surface
   5.2 Second-order trend surface
6 Trend surface prediction
   6.1 Creating a prediction grid
   6.2 Mapping the trend surface
7 Generalized Additive Models
   7.1 Fitting a Generalized Additive Model
   7.2 GAM prediction over the study area
8 Thin-plate spline interpolation
9 Spatial correlation of trend surface residuals
   9.1 The empirical variogram
10 Trend surface analysis by Generalized Least Squares
   10.1 Computing the GLS trend surface
   10.2 Predicting from the GLS trend surface
11 Local interpolation of the residuals
   11.1 Visualizing the residuals
   11.2 Variogram modelling
      11.2.1 Ordinary Kriging
      11.2.2 OK predictions
12 GLS-Regression Kriging
13 Universal Kriging (UK)
   13.1 Residual variogram
   13.2 The Universal Kriging system
   13.3 Prediction
14 Discussion
15 Answers
A Derivation of the OLS solution to the linear model
B Standardized residuals
C Theory of thin-plate splines
D Theory of GLS and REML
   D.1 GLS
   D.2 GLS with spatially-correlated residuals
   D.3 REML estimation of the covariance parameters
References

Version 2.4 Copyright © 2017, 2020-21 D G Rossiter. All rights reserved. Reproduction and dissemination of the work as a whole (not parts) freely permitted if this original copyright notice is included. Sale or placement on a web site where payment must be made to access this document is strictly prohibited. To adapt or translate please contact the author ([email protected]).
1 Introduction

This exercise shows how to compute trend surfaces using the R environ-
ment for statistical computing [8, 18].
A trend surface is a map of some continuous variable, computed as a
function of the coördinates. This corresponds to the concept of a geo-
graphic trend, where the variable changes its value along a geographic
gradient.
The trend can be modelled as a linear trend, i.e., the variable increases
or decreases a fixed amount for each unit change in the coördinates in
some direction. This is called a first-order trend surface (§5.1). It can
also be modelled as a polynomial trend, i.e., a linear model of some poly-
nomials of the coördinates, for example, a quadratic, which is called a
second-order trend surface (§5.2). It can also be modelled as an empirical
smooth function of the coördinates, for example a generalized additive
model (§7 ) or a minimum-curvature surface (thin-plate spline) (§8).
The residuals from any of the above approaches may have spatial structure (§9). This has several implications:
1. The OLS fit may not be optimal, and a Generalized Least Squares (GLS) trend should be fit (§10).
2. The OLS or GLS trend surfaces can be modified by (1) interpolating the residuals from the trend-surface fit (§11) and (2) adding these to the trend.
3. The trend and local deviations can be modelled together with Universal Kriging (UK) (§13).
In this exercise we compare these different approaches.

2 Preparing for the exercise

2.1 Computing environment

The code to complete this tutorial can be executed in any R environment.


A good choice is the RStudio (https://www.rstudio.org) integrated development environment (IDE) for R.

2.2 Loading R packages

Task 1 : Load the sp “spatial data structures”, the gstat “geostatistics”,


and the lattice “Trellis graphics” packages into your search path. •

Note: You can also load this via checkboxes in the RStudio “Packages”
pane.
The require or library functions are used to load R packages.
require(sp)
require(gstat)
require(lattice)

2.3 Adjusting processing options

R has many options, which can be listed with the options function. Here we use this function to change the default option of showing the so-called “significance stars” in model summaries. These stars (*, **, ***) for various levels of the “significance level” α (interpreted as the probability of incorrectly rejecting a true null hypothesis of no effect) have been widely criticized because they are a lazy way to assess the success of models and the importance of predictors; this is part of a major debate about how statistics should be used to draw conclusions about the “real world”, see for example [6].
options(show.signif.stars=FALSE)

3 Dataset

We use an example dataset that is well-suited to illustrate the concept of a trend surface: a set of observations of the elevation above mean sea level of the top of an aquifer in western Kansas, USA, measured in 161 wells.

Note: This aquifer is in Miocene–Pliocene sedimentary rocks, the Ogallala formation, and is an important source of irrigation water, especially for centre-pivot irrigation systems.

This dataset is used as an example in the well-known geology statistics text of Davis [3, pp. 435-438]; the datasets for this book are available at http://www.kgs.ku.edu/Mathgeo/Books/Stat/index.html. The practical task is to map the elevation of the top of the aquifer over the study area.

Q1 : What is the purpose of producing a map of the elevation of the top of the aquifer over the study area? In other words, who would use the map and for what purpose? Jump to A1 •

Note: More information on the aquifer monitoring network from which this dataset is taken is available from the Kansas Geological Survey (http://www.kgs.ku.edu), for example Olea and Davis [14, 15]. The water-level logs are also available on-line at http://www.kgs.ku.edu/Magellan/WaterLevels/.

Figure 1 is taken from the original report [14]. It shows the location of wells, the boundary of the aquifer, and the well IDs. The example dataset uses a small portion of this, in the SE corner of the study area (portions of Pratt, Kingman, Stafford and Reno counties). Figure 2 is a Google Earth view of part of the study area, with the location of several of the wells as placemarks.

Figure 1: Location of aquifer monitoring wells, SE Kansas (USA). Source: [14], plate 1

3.1 Loading and adjusting the dataset

The dataset is a plain-text file, AQUIFER.TXT.

Task 2 : Obtain this dataset, either from the instructor or by download from the KGS at http://www.kgs.ku.edu/Mathgeo/Books/Stat/ASCII/AQUIFER.TXT. •

Task 3 : Change R’s working directory to where you have downloaded


the text file AQUIFER.TXT. •
You can do this with the RStudio menu command Tools | Change direc-
tory. . . , or with the setwd function.
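For example (with a hypothetical path; substitute the directory where you saved the file):

setwd("~/data/aquifer")   # hypothetical directory containing AQUIFER.TXT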

Task 4 : Examine the contents of file AQUIFER.TXT. •


Figure 2: Google Earth view of part of the study area, with the location of several of the
wells as placemarks

You can view this file from within RStudio, by opening it from the Files
pane.
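You can also inspect the first few lines from within R, e.g.:

head(readLines("AQUIFER.TXT"), n=5)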
The first few lines look like this:
UTM easting UTM northing Water Table, ft.
569464.5 4172114.75 1627.66
573151.25 4167192.75 1588.83
559973.94 4169585 1675.72
553514.44 4174584.5 1689.52
The field names are self-explanatory.
The Coördinate Reference System (CRS) is not specified, although we can guess from the field headers that the projection is Universal Transverse Mercator (UTM). The UTM zone is 14N (see Davis [3, Fig. 5-100 caption]) and the coördinates are defined in the UTM system as meters North from the equator and East from a false origin of 500 000 m at the central meridian of the UTM zone. For zone 14 this is 99° W.
However, to fully know the CRS, we must know the datum on which the projection is developed. This is not explained in the several Kansas Geological Survey reports; however, a more comprehensive report covering the entire High Plains [11] states that the datum is the North American Datum of 1983 (NAD 83).
The aquifer elevation is in US feet (1 foot = 0.3048 m exactly) above mean sea level, according to an unspecified vertical datum (probably NAVD 88).

Task 5 : Read text file AQUIFER.TXT into an R data frame, rename the
columns to shorter names, and examine its structure. •
The read.table function can read many kinds of tabular data. It has
many arguments, to adjust to different text formats. See the R Data
Import/Export Manual [17] for details.
By default the data fields in the text file are assumed to be separated by
white space (tabs, spaces), as is the case here. Another optional argument
is skip; we use it here because the header line of AQUIFER.TXT has more
spaces than the other lines, so if we try to use the header for the variable
names, R thinks the other lines are incomplete. One solution would be
to place quotes around the variable names, or rename the variables, in
the text file. What we do here is skip the first line and assign variable
names ourselves in R.
We name the R data frame aq:
aq <- read.table("AQUIFER.TXT", skip=1)

str(aq)

## 'data.frame': 161 obs. of 3 variables:


## $ V1: num 569464 573151 559974 553514 550350 ...
## $ V2: num 4172115 4167193 4169585 4174584 4171337 ...
## $ V3: num 1628 1589 1676 1690 1691 ...

names(aq) <- c("UTM.E", "UTM.N", "z")


str(aq)

## 'data.frame': 161 obs. of 3 variables:


## $ UTM.E: num 569464 573151 559974 553514 550350 ...
## $ UTM.N: num 4172115 4167193 4169585 4174584 4171337 ...
## $ z : num 1628 1589 1676 1690 1691 ...

Next, we manipulate this dataset to make it more suitable for analysis. First, the elevation should be converted to meters, to conform to international standards.

Task 6 : Convert the elevation in feet above sea level to elevation in me-
ters above sea level (m.a.s.l.), and add it as a new field in the dataframe.

ft.to.m <- 0.3048
aq$zm <- aq$z * ft.to.m

Second, the E and N coördinates give the location in UTM zone 14N, but for numerical stability it’s useful to reduce these to local coördinates, with the (0, 0) point in the middle of the range, and, because the numbers are large, convert them from m to km. This will make the equations easier to read.

Task 7 : Subtract the median E and N coördinates of the dataset from


the UTM 14N E and N coördinates, convert these from m to km, and add
these as new fields to the dataframe. •
The median function computes the median of a vector.
aq$e <- (aq$UTM.E - median(aq$UTM.E))/1000
aq$n <- (aq$UTM.N - median(aq$UTM.N))/1000

3.2 Making a spatial object

For this section we need to make the dataset into an explicitly spatial
data structure. A spatial object, for the sp package, is one that has
explicit coördinates. The aq dataframe does have coördinates, but “hid-
den” as attributes. These in fact have a special status. To continue the
analysis, we identify these explicitly as being spatial.

Task 8 : Make an explicitly-spatial version of the point dataset. •


The coordinates method specifies coördinates, thus converting a dataframe or matrix into an explicitly spatial object. We use the local coördinates we created just above for the spatial object; note that the real-world coördinates are still in the data frame as fields.
aq.sp <- aq
coordinates(aq.sp) <- c("e","n")

str(aq)

## 'data.frame': 161 obs. of 6 variables:


## $ UTM.E: num 569464 573151 559974 553514 550350 ...
## $ UTM.N: num 4172115 4167193 4169585 4174584 4171337 ...
## $ z : num 1628 1589 1676 1690 1691 ...
## $ zm : num 496 484 511 515 516 ...
## $ e : num 36.1 39.8 26.6 20.1 17 ...
## $ n : num -25.1 -30 -27.7 -22.7 -25.9 ...

str(aq.sp)

## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots


## ..@ data :'data.frame': 161 obs. of 4 variables:
## .. ..$ UTM.E: num [1:161] 569464 573151 559974 553514 550350 ...
## .. ..$ UTM.N: num [1:161] 4172115 4167193 4169585 4174584 4171337 ...
## .. ..$ z : num [1:161] 1628 1589 1676 1690 1691 ...
## .. ..$ zm : num [1:161] 496 484 511 515 516 ...
## ..@ coords.nrs : int [1:2] 5 6
## ..@ coords : num [1:161, 1:2] 36.1 39.8 26.6 20.1 17 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:2] "e" "n"
## ..@ bbox : num [1:2, 1:2] -33 -47 41.1 51.1
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:2] "e" "n"
## .. .. ..$ : chr [1:2] "min" "max"
## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
## .. .. ..@ projargs: chr NA

This structure display is quite different from the previous one. The
object now is of class SpatialPointsDataFrame and has five slots,
marked with the @ symbol.
The information in the original dataframe is now clearly split into two
kinds:

Geographic space : Coördinates; the location of the observation in some coördinate reference system, here UTM 14N.
Feature space : Also called attribute space; the properties of the observation. Here there is only one, the aquifer elevation.
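The slot names can also be listed directly (an optional check):

slotNames(aq.sp)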

Q2 : Looking at the names of the slots, which likely refer to geographic


space? Which slot contains the feature-space information? Jump to A2

Note: The structure shows the CRS as the proj4string field; this refers to the CRS specifications of the PROJ (https://proj.org/) generic coordinate transformation software. We can display this directly with the proj4string function.
proj4string(aq.sp)

## [1] NA

This is now listed as NA, “not available”, because there was no CRS infor-
mation in the text file from which we obtained the coördinates. In the
analysis of this tutorial we do not need to specify a CRS, since we just
work with the numbers as independent (predictor) variables.

However, if we later want to use this dataset with other spatial data, we should specify the CRS. Above we determined the CRS is UTM 14N on the NAD83 datum. To specify this we find the EPSG code for this CRS in the EPSG database (https://epsg.org/) and discover this is code 26914. See Figure 3; note that UTM 14N has been defined on several datums.

Then we update the CRS using the CRS function, with a string describing
the CRS, in this case an initialization based on the EPSG code.
proj4string(aq.sp) <- CRS("+init=epsg:26914")
print(proj4string(aq.sp))

## Warning in proj4string(aq.sp): CRS object has comment, which is


lost in output

## [1] "+proj=utm +zone=14 +datum=NAD83 +units=m +no_defs"

We’ve done some work to get this data set into proper form for spatial
analysis; so we save it in this format.

Task 9 : Save the spatial object as an R Data file. •


save(aq, aq.sp, file="aquifer.rda")

This can be read into a later R session with the load method.
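For example, in a later session:

load("aquifer.rda")   # restores the objects aq and aq.sp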
Figure 3: Finding a code in the EPSG database

4 Exploratory analysis

As with any unfamiliar dataset, the first step is to examine its contents.
In the case of spatially-explicit datasets, that includes visualizing its ge-
ography.

Task 10 : Summarize the dataset. •


summary(aq.sp)

## Object of class SpatialPointsDataFrame


## Coordinates:
## min max
## e -33.00497 41.06325
## n -46.99025 51.07400
## Is projected: TRUE
## proj4string :
## [+proj=utm +zone=14 +datum=NAD83 +units=m +no_defs]
## Number of points: 161
## Data attributes:
## UTM.E UTM.N z zm
## Min. :500361 Min. :4150248 Min. :1560 Min. :475.5
## 1st Qu.:518465 1st Qu.:4176120 1st Qu.:1721 1st Qu.:524.6
## Median :533366 Median :4197238 Median :1814 Median :552.8
## Mean :535668 Mean :4198439 Mean :1808 Mean :551.0
## 3rd Qu.:553569 3rd Qu.:4220405 3rd Qu.:1901 3rd Qu.:579.4
## Max. :574430 Max. :4248312 Max. :2045 Max. :623.2

Q3 : How many observations are there? What was recorded at each


point? Jump to A3 •

Q4 : What are the geographic limits of the study area? What is its area,
in km2? Jump to A4 •

The range function computes the range of a numeric variable; the diff function computes the difference between two numeric values.

range(aq$UTM.E)

## [1] 500361.3 574429.6

range(aq$UTM.N)

## [1] 4150248 4248312

diff(range(aq$UTM.E)) *diff(range(aq$UTM.N)) / 10^6

## [1] 7263.444

From the spatial version of the dataset we can compute the bounding
box with the bbox function.
bbox(aq.sp)

## min max
## e -33.00497 41.06325
## n -46.99025 51.07400

Task 11 : Find the location of this sample area in the large study area,
shown in Fig. 1. •

Q5 : What is the range of elevations in the sample set? Jump to A5 •


range(aq$zm); diff(range(aq$zm))

## [1] 475.5032 623.2428


## [1] 147.7396

We now try three different visualizations of the distribution of the data


values (i.e. aquifer elevations); these are known as postplots. To keep
the geographic reference, we use the original UTM 14N coördinates.

Task 12 : Display a text postplot of the data values, showing the eleva-
tions, rounded to the nearest foot, as text labels centred at the observa-
tion point. •
We use the two coördinates as plot axes, so this looks like a map:
plot(aq$UTM.N ~ aq$UTM.E, pch=20, cex=0.2, col="blue", asp=1,
xlab="UTM 14N E", ylab="UTM 14N N")
grid()
text(aq$UTM.E, aq$UTM.N, round(aq$zm), adj=c(0.5,0.5))
title("Elevation of aquifer, m")

The aquifer elevation is clearly higher in the west (towards the Rocky
Mountains about 650 km to the west, where it outcrops).

Note: Parameter cex is an expansion factor; here we plot a very small


blue dot and then add the data value at each point with the text method.
The adj argument centres the text at the point. The asp=1 argument
makes the two axes have the same scale. This is necessary to get a true
map when the study area is not square.

Another visualization is with the symbol size proportional to the data value.

Task 13 : Display a graphical postplot of the data values, with size


proportional to the data value. •
plot(aq$UTM.N ~ aq$UTM.E,
cex=1.8*aq$zm/max(aq$zm),
col="blue", bg="red", pch=21, asp=1,
xlab="UTM 14N E", ylab="UTM 14N N")
grid()
title("Elevation of aquifer, m.a.s.l.")

Note: Print character (pch) 21 has both a symbol (col) and fill (bg) colour.

A final visualization combines both size and colour:

Task 14 : Display a graphical postplot of the data values, with size and
colour proportional to the data value •
Notice the use of the rank function to give the rank order of the elevations; these are then used as indices into a vector of colours, created with the bpy.colors function, of the same length as the vector of elevation values.
The ~ formula operator shows the functional relation between two variables; here it is the North coördinate for the y-axis, depending on the East coördinate for the x-axis.
plot(aq$UTM.N ~ aq$UTM.E, pch=21,
xlab="UTM 14N E", ylab="UTM 14N N",
bg=bpy.colors(length(aq$zm))[rank(aq$zm)],
cex=1.8*aq$zm/max(aq$zm), asp=1)
grid()
title("Elevation of aquifer, m.a.s.l.")

Q6 : Describe the spatial pattern of the elevations. Do nearby points
have similar values? Is there a trend across the whole area? Are there
local exceptions to the trend? Jump to A6 •

Q7 : Discuss the relative advantages of the three types of postplot.


Jump to A7 •

5 Trend surface analysis by Ordinary Least Squares

The visualizations suggest a trend surface, i.e., the aquifer elevation is some smooth function of the coördinates. This is modelled as a polynomial function of the coördinates of some degree (1st, 2nd, 3rd etc.), which is called the order of the surface.
The higher the degree, the more the surface can match the points, but
the degree should also be chosen to match a plausible process, in this
case, the structure of the aquifer. Also, the higher the degree, the more
extreme are extrapolations, i.e., predictions outside the convex hull of
the calibration points.

5.1 First-order trend surface

We begin with a first-order trend: a plane defined by the two coördinates


and an intercept that sets the overall level, here the aquifer elevation.

Q8 : What is the geological interpretation of a first-order trend surface


of the aquifer? Jump to A8 •

A trend surface has the same form as a standard linear model, using
the coördinates as regression predictors. The first-order trend surface
model has the form:
z = β0 + β1 E + β2 N + ε   (1)

where ε ∼ N(0, σ²), i.e., independently and normally distributed. This assumption allows us to fit the trend surface with Ordinary Least Squares
(OLS).
In the linear model, with any number of predictors, there is an n × p design matrix of predictor values, usually written as X, with one row per observation (data point), i.e., n rows, and one column per predictor, i.e., p columns. In the first-order trend surface case, it is an n × 3 matrix with three columns: (1) a column of 1s representing the intercept, to center the response, (2) a column of predictor values e_i from the Easting, and (3) a column of predictor values n_i from the Northing.
The predictand (response variable), here the aquifer elevation, is an n × 1 column vector y, one row per observation. The coefficient vector β is a p × 1 column vector, i.e., one row per predictor (here, 3). This multiplies the design matrix to produce the response (the dimensions of the matrix multiplication are n × 1 = (n × p)(p × 1)):

y = Xβ + ε (2)

where ε is a n × 1 column vector of residuals, also called errors, i.e., the


lack of fit. We know the values in the predictor matrix X and the response
vector y from our observations, so the task is to find the optimum values
of the coefficient vector β. This can be found directly; see Appendix A for the derivation. The OLS solution is:

β̂_OLS = (Xᵀ X)⁻¹ Xᵀ y   (3)


where X is the design matrix.
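As an illustration (a sketch, not needed for the rest of the tutorial), these coefficients can be computed directly from Equation 3, using the aq data frame built above:

X <- cbind(1, aq$n, aq$e)   # design matrix: intercept, N, E
beta.hat <- solve(t(X) %*% X) %*% t(X) %*% aq$zm
beta.hat                    # matches coef(lm(zm ~ n + e, data=aq))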
The term “first-order” refers to the power to which each coördinate is
raised; here it is the first power, so it’s a first-order trend surface.

Note: This assumption of uncorrelated residuals is in fact not true in this


case; we prove this in §9 below. So the trend surface should in fact be fit
not by OLS but by Generalized Least Squares (GLS), taking into account
the spatial auto-correlation of the residuals. We pursue this further in
§10.

For this dataset, with many observations well-spread in space, the GLS result will be similar to the OLS estimate.

Task 15 : Fit a first–order trend surface (i.e. linear in the E and N


coördinates) to the elevations. Summarize the model and evaluate its
goodness-of-fit. •
The lm “linear model” function fits linear models.
model.ts1 <- lm(zm ~ n + e, data=aq)
summary(model.ts1)

##
## Call:
## lm(formula = zm ~ n + e, data = aq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.3550 -5.8267 0.2674 7.1062 16.7349
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 554.77509 0.68478 810.15 <2e-16
## n -0.03336 0.02528 -1.32 0.189
## e -1.61714 0.03201 -50.51 <2e-16
##
## Residual standard error: 8.629 on 158 degrees of freedom
## Multiple R-squared: 0.9417,Adjusted R-squared: 0.941
## F-statistic: 1276 on 2 and 158 DF, p-value: < 2.2e-16

In this summary the proportion of variability explained is given by the adjusted R². This is (1 − RSS/TSS), i.e., a perfect fit less the ratio of the residual sum-of-squares RSS = Σ_i (z_i − ẑ_i)² to the total sum of squares TSS = Σ_i (z_i − z̄)², adjusted for the degrees of freedom and number of observations, where z_i is the observed value and ẑ_i is the value predicted by the fitted linear model.
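This can be verified from the model object (a quick check; 158 and 160 are the residual and total degrees of freedom for n = 161 observations and 3 coefficients):

RSS <- sum(residuals(model.ts1)^2)
TSS <- sum((aq$zm - mean(aq$zm))^2)
1 - (RSS/158)/(TSS/160)   # adjusted R-squared, matches the summary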

Q9 : What is the equation of the trend surface? How does elevation vary
with the E and N coördinates? Is the relation statistically-significant?
How much of the total variability does it explain? Are all the coefficients
statistically-significant? Jump to A9 •

Task 16 : Summarize the residuals (lack of fit) from the trend surface
both numerically and graphically, in feature space. Express this in terms
of the median elevation. •
The residuals function extracts the residuals from a linear model ob-
ject. The hist function displays a histogram of a numeric vector.
res.ts1 <- residuals(model.ts1); summary(res.ts1)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## -25.3550 -5.8267 0.2674 0.0000 7.1062 16.7349

hist(res.ts1, main="Residuals from 1st-order trend",


xlab="residual elevation (m)")
rug(res.ts1)
range(res.ts1)

## [1] -25.35498 16.73489

max(abs(res.ts1))/median(aq$zm)*100

## [1] 4.586769

Q10 : What is the range of residuals? How does this compare with the target variable? How are the residuals distributed in feature space? Jump to A10 •

Task 17 : Show the diagnostic plots of the linear model. •


The plot method applied to a linear model object produces some diagnostic plots. We will display the most important: (1) residuals vs. fitted values; (2) quantile-quantile (“Q-Q”) plot of the standardized residuals (these are explained in §B).
The Q-Q plot shows (1) on the y-axis, the standardized residuals, (2) on the x-axis, the standardized residuals that would be expected if the residuals were from a normal distribution with the mean and standard deviation computed from the actual standardized residuals. These two should match exactly on the 1:1 line.

Note: The par “parameters” function here sets up a 1 row by 2 column


matrix of plots, because the plot.lm function here will display two plots
as requested by the which optional argument.
par(mfrow=c(1,2))
plot(model.ts1, which=1:2)
par(mfrow=c(1,1))

Q11 : Does this model meet the feature-space requirements for a valid
linear model?
1. No relation between the fitted values and the residuals;
2. Equal spread of residuals across the range of fitted values;
3. Normally-distributed standardized residuals.
Jump to A11 •

Task 18 : Display the residuals as a postplot. •

Note: In the following code, the expression for the cex “character ex-
pansion” optional argument sets the size of the circle as each residual’s
proportion of the maximum residual, so that the larger absolute values
of the residuals show larger circles. This way we can visualize where the largest over- and under-predictions occur. The ifelse statement applied to
the col “color” optional argument sets the color of the circle according
to whether the residual is positive (under-prediction) or negative (over-
prediction).
plot(aq$n ~ aq$e, cex=3*abs(res.ts1)/max(abs(res.ts1)),
col=ifelse(res.ts1 > 0, "green", "red"),
xlab="E", ylab="N",
main="Residuals from 1st-order trend",
sub="Positive: green; negative: red", asp=1)
grid()

Q12 : Is there a spatial pattern to the residuals? Is there local spatial
correlation without an overall pattern? What does this imply about the
suitability of a first-order trend surface? Jump to A12 •

5.2 Second-order trend surface

We see from the pattern of residuals from the first-order surface that
there is still structure, in particular clear bands of positive and negative
residuals. These suggest that a higher-order trend surface might fit bet-
ter. A higher-order trend might also fix the problems with the regression
diagnostics.
A second-order trend includes linear and quadratic (squared) functions
of the coördinates.

Q13 : What is the geological interpretation of a second-order trend


surface of the aquifer? Jump to A13 •

A full second-order surface uses the coördinates, their squares, and their
cross-products.

z = β0 + β1 E + β2 N + β3 E² + β4 N² + β5 (E · N) + ε   (4)

Note: This can also be expressed in matrix notation of Equation 2, where


the design matrix has six columns, one per predictor.

Task 19 : Fit a second-order trend surface to the aquifer elevations. •

model.ts2 <- lm(zm ~ n + e + I(n^2) + I(e^2) + I(e*n),
data=aq)
summary(model.ts2)

##
## Call:
## lm(formula = zm ~ n + e + I(n^2) + I(e^2) + I(e * n), data = aq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.847 -3.366 0.822 3.538 14.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.611e+02 7.999e-01 701.411 < 2e-16
## n -1.655e-02 1.664e-02 -0.995 0.321
## e -1.621e+00 2.212e-02 -73.274 < 2e-16
## I(n^2) -7.500e-03 6.435e-04 -11.655 < 2e-16
## I(e^2) -1.648e-03 1.074e-03 -1.534 0.127
## I(e * n) 6.700e-03 7.781e-04 8.610 7.74e-15
##
## Residual standard error: 5.598 on 155 degrees of freedom
## Multiple R-squared: 0.9759,Adjusted R-squared: 0.9751
## F-statistic: 1256 on 5 and 155 DF, p-value: < 2.2e-16

Note: The I “identity” function must be used for the squares and cross-product terms, so that the ^ and * symbols are interpreted with their usual mathematical meanings. If this function is not used, lm interprets the ^ and * symbols as formula operators.
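The six-column design matrix of Equation 4 can be displayed explicitly (an optional check):

head(model.matrix(model.ts2))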

Q14 : How much of the variance does the second-order surface explain?
Jump to A14 •

Task 20 : Compare the second-order model statistically with the first-


order model. •
The anova “analysis of variance” method compares the residual sums-of-squares of two or more models and computes the probability that the more complicated model (with more predictors, and thus fewer degrees of freedom) is not better than the less complicated model. We list the simpler model first, i.e., the one with fewer predictors and more degrees of freedom, and the more complex model second. We expect the second model to have a lower value of the residual sum-of-squares, since it will explain more of the variance. But, because we have used more degrees of freedom, perhaps the F-ratio of the two variances adjusted for their degrees of freedom will be low, showing that the more complex model is not in fact an improvement.
Let’s see:
anova(model.ts1, model.ts2)

## Analysis of Variance Table
##
## Model 1: zm ~ n + e
## Model 2: zm ~ n + e + I(n^2) + I(e^2) + I(e * n)
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)
## 1    158 11763.5
## 2    155  4858.2  3    6905.3 73.438 < 2.2e-16

Q15 : Is the second-order surface statistically superior to the first-order


surface? Jump to A15 •

Task 21 : Summarize the residuals from the second-order trend surface


both numerically and graphically, in feature space. Express this in terms
of the median elevation. •
res.ts2 <- residuals(model.ts2)
summary(res.ts2)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## -19.847 -3.366 0.822 0.000 3.538 14.807

hist(res.ts2); rug(res.ts2)
max(abs(res.ts2))/median(aq$zm)

## [1] 0.03590352

Q16 : What is the range of residuals? How does this compare with the
target variable? How are they distributed in feature space? How do these
compare with the residuals from the first-order surface? Jump to A16 •

Task 22 : Show the diagnostic plots of the residuals, as for the first-
order trend surface residuals. •

par(mfrow=c(1,2))
plot(model.ts2, which=1:2)
par(mfrow=c(1,1))

Q17 : Does this model meet the feature-space requirements for a valid
linear model? How do these diagnostics compare to those from the first-
order surface?
1. No relation between the fitted values and the residuals;
2. Equal spread of residuals across the range of fitted values;
3. Normally-distributed standardized residuals.
Jump to A17 •

Task 23 : Display the residuals as a postplot; compare to the postplot


from the first-order trend surface. •
plot(aq$n ~ aq$e, cex=3*abs(res.ts2)/max(abs(res.ts2)),
col=ifelse(res.ts2 > 0, "green", "red"),
xlab="E", ylab="N",
main="Residuals from 2nd-order trend",
sub="Positive: green; negative: red", asp=1)
grid()

Q18 : Is there an overall pattern to the residuals? Is there local spa-
tial correlation without an overall pattern? Does there seem to be any
anisotropy (stronger spatial dependence in one direction than the or-
thogonal direction)? Jump to A18

Since this second-order trend surface is much better than the first-order
trend surface, we will use it for subsequent modelling.

6 Trend surface prediction

We now use the trend surface model of the previous section to predict
over the study area, discretized as an interpolation grid at some resolu-
tion that we choose.

6.1 Creating a prediction grid

We first make a grid over which to predict.

Task 24 : Create a grid of equally-spaced (1 x 1 km) points across the


study area, beginning with UTM (500 000E, 4150 000N) in the lower-left
corner, as in Davis [3, Fig. 5-100, 5-101, 5-102], but adjusted for the
reduced coördinates. •

Note: The choice of grid resolution depends on (1) the support of the
observations used to model the trend surface, (2) the resolution needed
by the map user; (3) for larger areas, computer memory and processing
time.

Here the observations are essentially point support, so there is no minimum grid size. The map user will use this to decide on whether to drill a well into the aquifer, based on cost, which depends on the depth from the surface. In this area the surface elevation is quite uniform and does not vary much. Also, the aquifer in this area does not have sharp changes in structure; it is gently dipping to the E, with a slight doming. We know from the 1st-order OLS model that for each km E (direction of maximum dip) the aquifer elevation decreases by about 1.62 m, which we suppose is hardly significant to the well driller. Indeed, a coarser grid would probably be sufficient.
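This slope can be read directly from the fitted model (a quick check):

coef(model.ts1)["e"]   # change in elevation (m) per km E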

The seq function creates a regular sequence of numbers; the expand.grid


function makes a grid from two sequences.
First, find the bounding box:
range(aq$e); range(aq$n)

## [1] -33.00497 41.06325


## [1] -46.99025 51.07400

Then use these, rounded below and above to the nearest kilometer, as
the limits of the two axes:
seq.e <- seq(-33, 42, by=1)
seq.n <- seq(-47, 52, by=1)

Finally, use expand.grid to make a complete grid from these sequences:


grid <- expand.grid(e=seq.e, n=seq.n)
plot(grid$n ~ grid$e, cex=0.2, asp=1)
(Figure: plot of the prediction grid points, grid$e on the x-axis and grid$n on the y-axis)

6.2 Mapping the trend surface

Task 25 : Interpolate the second-order trend surface onto this grid. Compute both the best fit and a 95% prediction interval for each point on the grid. •
The predict.lm function, applied to a linear model object, computes
the predicted values at new locations, in this case the regular grid. The
optional interval argument specifies that a prediction interval, as well
as the best fits, should also be computed. The optional level argu-
ment specifies the (1 − α) probability, where α is the probability that, on
repeated calculation from a similar sample, the true value at the point
would not be included in the computed prediction interval.
pred.ts2 <- predict.lm(model.ts2, newdata=grid,
interval="prediction", level=0.95)
summary(pred.ts2)

## fit lwr upr


## Min. :461.1 Min. :449.0 Min. :473.2
## 1st Qu.:516.4 1st Qu.:505.2 1st Qu.:527.7
## Median :547.3 Median :536.1 Median :558.6
## Mean :546.7 Mean :535.4 Mean :558.0
## 3rd Qu.:577.0 3rd Qu.:565.7 3rd Qu.:588.2
## Max. :614.6 Max. :603.2 Max. :626.0

The predict.lm function produces three fields in the resulting object: fit (the best fit value), lwr (the value at the lower 2.5% limit) and upr (the value at the upper 97.5% limit).
The prediction interval is a range in which future observations are expected to fall, with a given probability specified by the analyst. It is based on the known observations and the regression model.
There are two sources of prediction error:
1. The uncertainty of fitting the best regression parameters from the
available data;
2. The uncertainty in the prediction, even with perfect regression pa-
rameters, because of uncertainty in the process which is revealed
by the regression, i.e., the inherent noise in the process.
The prediction interval is computed from the prediction variance, which
is then assumed to represent the variance of a t-distribution.
The prediction variance s²_Y0 for predictand x_0 depends on the variance of the regression s²_Y·x, but also on the distance of the predictor x_0 from the value of the predictor at the centroid of the regression, x̄. The further from the centroid, the more any error in estimating the slope of the line will affect the prediction:

s²_Y0 = s²_Y·x [ 1 + 1/n + (x_0 − x̄)² / Σ_{i=1}^n (x_i − x̄)² ]   (5)

where x refers to both coördinates.
The variance of the regression s²_Y·x is computed from the squared deviations of the actual (y_i) and estimated (ŷ_i) values:

s²_Y·x = 1/(n − 2) Σ_{i=1}^n (y_i − ŷ_i)²   (6)
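Equation 5 can be illustrated with predict.lm (a sketch: with se.fit=TRUE it returns se.fit, the standard error of the fitted mean, and residual.scale, the estimate of s_Y·x):

p <- predict.lm(model.ts2, newdata=data.frame(e=0, n=0), se.fit=TRUE)
sqrt(p$se.fit^2 + p$residual.scale^2)   # standard error of prediction at the centre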
To display a map of the interpolated surface, it’s easiest to format the grid as a spatial object, so that the spplot “spatial plot” plotting method can be used.

Task 26 : Convert the grid to a spatial object: a SpatialGridDataFrame.



The gridded function specifies that the spatial object is points on a reg-
ular grid. Since this is a complete grid, we can improve computational
efficiency by using the fullgrid function to specify that the grid is com-
plete (“full”). We also ensure that its CRS matches the well points CRS,
using the proj4string function.
coordinates(grid) <- c("e","n")
sp.grid <- SpatialPointsDataFrame(coords=coordinates(grid),
data=as.data.frame(pred.ts2))
gridded(sp.grid) <- TRUE; fullgrid(sp.grid) <- TRUE
proj4string(sp.grid) <- proj4string(aq.sp)

## Warning in proj4string(aq.sp): CRS object has comment, which is lost in output

summary(sp.grid)

## Object of class SpatialGridDataFrame


## Coordinates:
## min max
## e -33.5 42.5
## n -47.5 52.5
## Is projected: TRUE
## proj4string :
## [+proj=utm +zone=14 +datum=NAD83 +units=m +no_defs]
## Grid attributes:
## cellcentre.offset cellsize cells.dim
## e -33 1 76
## n -47 1 100
## Data attributes:
## fit lwr upr
## Min. :461.1 Min. :449.0 Min. :473.2
## 1st Qu.:516.4 1st Qu.:505.2 1st Qu.:527.7
## Median :547.3 Median :536.1 Median :558.6
## Mean :546.7 Mean :535.4 Mean :558.0
## 3rd Qu.:577.0 3rd Qu.:565.7 3rd Qu.:588.2
## Max. :614.6 Max. :603.2 Max. :626.0

Task 27 : Display the best-fit interpolation, with the data points super-
imposed. •
The spplot “spatial plot” method plots spatial objects, i.e., those in one
of the sp classes.
The fit field of the prediction object contains the trend surface fits.
We save this plot for comparison later with the Generalized Least Squares
(GLS) trend surface (§10).
ts.plot.breaks <- seq(440, 640, by=5)
p.ols <- spplot(sp.grid, zcol="fit",
sub="2nd-order trend, OLS fit",
main="Aquifer elevation, m.a.s.l.",
xlab="East", ylab="North",
at=ts.plot.breaks,

col.regions = topo.colors(length(ts.plot.breaks)),
panel=function(x, ...) {
panel.levelplot(x, ...);
panel.points(coordinates(aq.sp), pch=1,
col=ifelse(res.ts2 < 0, "red", "black"),
cex=2*abs(res.ts2)/max(abs(res.ts2)))
})

print(p.ols)

(Figure: map of the predicted surface, “Aquifer elevation, m.a.s.l.”, 2nd-order trend, OLS fit; East/North axes, colour scale from about 450 to 600 m)

In this plot the residual from the model at each observation point is shown (1) in colour: red = negative (actual < predicted), black = positive (actual > predicted); and (2) in size: proportional to the absolute value of the residual. If a prediction is exactly on the trend surface it will not appear. This gives a nice visualization of the fit of the trend surface to the sample points.

Note: The spplot method in the sp package makes use of the levelplot
method of the lattice graphics package. Unlike base graphics, in lattice
all plotting must be done at once; you can’t start a plot and add more
later. Graphical elements are added with a panel function, introduced
with the panel argument. This function may contain many methods to
draw graphic elements. In this case there is panel.levelplot to draw
the levelplot (trend surface) and panel.points to place a set of points
on top of it.

Q19 : How well does the trend surface fit the points? Are there obvious
problems? Jump to A19 •

Task 28 : Summarize the uncertainty of the trend surface predictions, as the absolute differences between the upper and lower prediction limits, and then express this as a percentage of the best fit value. •

summary(sp.grid$lwr)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 449.0 505.2 536.1 535.4 565.7 603.2

summary(sp.grid$diff.range <- sp.grid$upr - sp.grid$lwr)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 22.30 22.34 22.46 22.58 22.73 24.36

summary(100*sp.grid$diff.range/sp.grid$fit)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 3.706 3.900 4.101 4.150 4.358 5.252

The lwr “lower” and upr “upper” fields of the prediction object contain the lower and upper limits of the 95% prediction interval for each point on the grid. Their difference is the range of uncertainty, approximately four standard deviations of prediction; dividing this by the fit expresses the uncertainty relative to the predicted value.

Task 29 : Display the prediction interval of the trend surface as a map,


showing also the location of the observation points. •
spplot(sp.grid, zcol="diff.range", col.regions=terrain.colors(64),
main="Range of 95% prediction interval",
sub="2nd order trend, OLS fit",
panel=function(x, ...) {
panel.levelplot(x, ...);
panel.points(coordinates(aq.sp),
pch=20, col="white")}
)

Note: Here we only show the locations of the observations, because their
values do not affect the prediction variance, once the model is fit.

Q20 : What are the units of prediction interval? How large are they?
How does this compare to the variable we are trying to predict? Jump
to A20 •

Q21 : Describe the spatial pattern of the prediction interval. Jump to


A21 •

7 Generalized Additive Models

Generalized Additive Models (GAM) are similar to multiple linear regres-


sion, except that each term in the linear sum of predictors need not be
the predictor variable itself, but can be an empirical smooth function of
it. So instead of the linear model of k predictors:
y_i = β0 + Σ_k βk x_k,i + ε_i   (7)

we allow functions f_k of these:

y_i = β0 + Σ_k f_k(x_k,i) + ε_i   (8)

The advantage is that non-linear relations in nature can be fit, without


any need to try transformations or to fit piecewise regressions. If this
is a better model fit, it should result in better predictions. The model
is additive, so the marginal contribution of each predictor to the model
fit can be determined. The disadvantage is that it is just an empirical fit
and can not be extrapolated beyond the range of calibration. A further
disadvantage is that the choice of function is arbitrary; it is generally
some smooth function of the predictor, with the degree of smoothness
determined by cross-validation.

Note: The GAM should never be extrapolated (there is no data to support


it), whereas a polynomial can, with caution, be extrapolated, on the theory
that the data used to fit the model extends outside the range. This is of
course very dangerous for higher-order polynomials, which are a main
competitor to GAM.

Hastie et al. [5, §9.1] give a thorough explanation of GAM; a simplified


explanation of the same material is given in James et al. [9, §7.7]. In a
geostatistical setting, we can choose the coördinates as the predictors
(as in a trend surface) but fit these with smooth functions, rather than
polynomials. We can also fit any other predictor this way, e.g., in this
example, the elevation.
The smooth functions can be chosen in many ways; the most common
are cubic splines with knots at each value of the predictor.
But we first examine whether a smooth curve, rather than a single line (as in linear regression), better matches the dependence of the aquifer elevation on the two coördinates.

Task 30 : Display a scatterplot of the aquifer elevation against each of the two coördinates, with an empirical smoother. •
We use the ggplot2 graphics package to produce the scatterplot and
show a smoother with standard error. A simple way to visualize the
trend is with a local polynomial regression, provided with the loess
function, and incorporated into the scatterplot with the geom_smooth
function.

Note: The loess function has a span argument, which controls the degree of smoothing by setting the neighbourhood for the local fit as a proportion of the number of points. The default span=0.75 thus uses the 3/4 of the total points closest to each point. These are then weighted so that closer points have more weight; see ?loess for details. The default works well in most situations, and here we only want a visual impression, not a “best fit” in a statistical sense.

We use the gridExtra package, which includes a function grid.arrange


to arrange saved plots in a grid.

require(ggplot2)
g1 <- ggplot(aq, aes(x=UTM.N, y=zm)) +
geom_point() +
geom_smooth(method="loess")
g2 <- ggplot(aq, aes(x=UTM.E, y=zm)) +
geom_point() +
geom_smooth(method="loess")
require(gridExtra)
grid.arrange(g1, g2, ncol = 2)

## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

Q22 : Do these marginal relations appear to be linear in the predictors? Jump to A22 •

7.1 Fitting a Generalized Additive Model

GAM can be fit in R with the gam function of the mgcv “Mixed GAM Com-
putation Vehicle” package. This specifies the model with a formula, as
with lm, but terms can now be arbitrary functions of predictor variables,
not just the variables themselves or simple transformations that apply
to the whole range of the variable, e.g. sqrt or log. Smooth functions
of one or more variables are specified with the s function of the mgcv
package.

Task 31 : Load the mgcv package into the workspace. •


require(mgcv)

Common practice in GAM for models using coördinates is to smooth


them together with a bivariate smoother, by default a thin plate regres-
sion spline. These control the degree of smoothness by penalizing in-
creasingly complex models, i.e., those with more curvature; see the help
text ?s “defining smooths in GAM formulae” for details. In practice the
default parameters work well.

Task 32 : Fit a GAM to the aquifer elevation at the observation sta-


tions, with the predictors being a two-dimensional thin-plate spline of
the coördinates. •
We bring the adjusted coördinates into the data frame as regular fields,
because the s function does not understand spatial objects.
aq$E <- coordinates(aq.sp)[,1]
aq$N <- coordinates(aq.sp)[,2]
model.gam <- gam(zm ~ s(E, N), data=aq)
summary(model.gam)

##
## Family: gaussian
## Link function: identity
##
## Formula:
## zm ~ s(E, N)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 551.0134 0.2817 1956 <2e-16
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(E,N) 27.35 28.83 542.2 <2e-16
##
## R-sq.(adj) = 0.99 Deviance explained = 99.2%
## GCV = 15.506 Scale est. = 12.776 n = 161

Q23 : How well does this model fit the calibration observations? Jump
to A23 •

Task 33 : Compare the residuals from the GAM with those from the
best linear trend surface, i.e., the quadratic. •
summary(residuals(model.gam))

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## -14.51830 -1.15766 0.08794 0.00000 1.51223 8.56727

summary(residuals(model.ts2))

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## -19.847 -3.366 0.822 0.000 3.538 14.807

par(mfrow=c(1,2))
hist(residuals(model.gam), xlim=c(-20, 20),
breaks=seq(-20, 20, by=4), main="Residuals from GAM")
rug(residuals(model.gam))
hist(residuals(model.ts2), xlim=c(-20, 20),
breaks=seq(-20,20,by=4), main="Residuals from 2nd-order OLS trend")
rug(residuals(model.ts2))
par(mfrow=c(1,1))

Q24 : Which histogram shows the narrowest spread? Jump to A24 •

An important consideration is whether the residuals have any spatial


structure.

Task 34 : Plot the residuals as a bubble plot. •


aq.sp@data$resid.model.gam <- residuals(model.gam)
bubble(aq.sp, zcol="resid.model.gam", pch=1, main="Residuals from GAM")

vr <- variogram(resid.model.gam ~ 1, loc=aq.sp)
plot(vr, pl=T)

Q25 : Does there appear to be any local spatial correlation of the resid-
uals? Does the empirical variogram support your conclusion? Jump to
A25 •

The plot.gam function of the mgcv package displays the marginal smooth fit. For the 2D surface (model term s(E,N)), this is shown as a wireframe plot if the optional scheme argument is set to 1. The select argument selects which model term to display. We orient it to see the lowest elevation towards the viewer, using the theta argument:
plot.gam(model.gam, rug=T, se=T, select=1,
scheme=1, theta=30+130, phi=30)

Q26 : How does the GAM trend differ from a polynomial trend surface?
Jump to A26 •

This surface can also be shown with the vis.gam function of the mgcv
package, also showing ± 1 standard error of fit:
vis.gam(model.gam, plot.type="persp", color="terrain",
theta=160, zlab="elevation", se=1.96)

The fit is very good, and the standard error is quite small.

Task 35 : Compute the RMSE of the GAM model, i.e., comparing the fitted with the actual values. •
(rmse.gam <- sqrt(sum(residuals(model.gam)^2)/length(residuals(model.gam))))

## [1] 3.244445

The RMSE is 3.24; this is a very small error.

7.2 GAM prediction over the study area

Since we now have a model which uses the coördinates, which are known
across the prediction grid, we can use the model to predict over the grid.

Task 36 : Predict the aquifer elevation, and the standard error of pre-
diction, across the prediction grid, using the fitted GAM, and display the
predictions. •
The predict.gam function predicts from a fitted GAM. The se.fit optional argument specifies that the standard error of prediction should also be computed.
We first make a data.frame version of the grid, to get the coordinates
in the same form as the model.
grid.df <- as.data.frame(grid)
names(grid.df)

## [1] "e" "n"

names(grid.df) <- c("E", "N")


summary(grid.df)

## E N
## Min. :-33.00 Min. :-47.00
## 1st Qu.:-14.25 1st Qu.:-22.25
## Median : 4.50 Median : 2.50
## Mean : 4.50 Mean : 2.50
## 3rd Qu.: 23.25 3rd Qu.: 27.25
## Max. : 42.00 Max. : 52.00

We then predict onto this grid into a temporary object, because it will
have two fields (columns): the prediction and its standard error.
tmp <- predict.gam(object=model.gam, newdata=grid.df, se.fit=TRUE)
summary(tmp$fit)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 478.9 516.4 546.6 546.8 576.0 621.6

summary(tmp$se.fit)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## 1.182 1.323 1.411 1.540 1.635 3.763

We then add these to the spatial grid.


sp.grid$pred.gam <- tmp$fit
sp.grid$pred.gam.se <- tmp$se.fit

Task 37 : Display the map of the predicted surface. •


res.gam <- residuals(model.gam)
p.gam <- spplot(sp.grid, zcol="pred.gam",
sub="GAM fit",
main="Aquifer elevation, m.a.s.l.",
xlab="East", ylab="North",
at=ts.plot.breaks,
col.regions = topo.colors(length(ts.plot.breaks)),
panel=function(x, ...) {
panel.levelplot(x, ...);
panel.points(coordinates(aq.sp), pch=1,
col=ifelse(res.gam < 0, "red", "black"),
cex=2*abs(res.gam)/max(abs(res.gam)))
})

print(p.gam)

(Figure: map of the GAM-predicted surface, “Aquifer elevation, m.a.s.l.”, GAM fit; East/North axes, same colour scale as the OLS map)

As with the OLS map, in this plot the residual from the model at each observation point is shown in colour (red = negative: actual < predicted; black = positive: actual > predicted) and in size (proportional to the absolute residual). If a prediction is exactly on the trend surface it will not appear. This gives a nice visualization of the fit of the trend surface to the sample points.

Q27 : How well does the GAM trend surface fit the points? Are there
obvious problems? Jump to A27 •

Task 38 : Display the map of the standard errors of prediction. •


p.gam.se <- spplot(sp.grid, zcol="pred.gam.se",
sub="GAM fit",
main="s.d. Aquifer elevation, m.a.s.l.",
xlab="East", ylab="North",
col.regions = terrain.colors(length(ts.plot.breaks)))

print(p.gam.se)

(Figure: map of the standard error of the GAM prediction, “s.d. Aquifer elevation, m.a.s.l.”, GAM fit; East/North axes, colour scale from about 1.5 to 3.5 m)

Consistent with the marginal plots, we see that the standard error is
highest at the edges, but there is some local pattern due to the local
adjustments of the GAM.
An obvious question is where this map differs from the parametric trend
surfaces.

Task 39 : Compute the differences between the trend surfaces pro-


duced by the GAM and the 2nd-order OLS trend surface predictions over
the grid, summarize numerically, and display as a difference map. •
summary(sp.grid$diff.gam.ols <-
sp.grid$pred.gam -sp.grid$fit)

## Min. 1st Qu. Median Mean 3rd Qu. Max.


## -28.18563 -6.52483 -0.75114 0.07674 6.73482 32.31116

spplot(sp.grid, zcol="diff.gam.ols", sub="GAM - 2nd order OLS fits",


main="difference, m", xlab="East", ylab="North",
col.regions = terrain.colors(64))

Q28 : Where are the largest differences between the GAM and 2nd order
OLS trend surface predictions? Explain why, considering how the two
surfaces are computed. Jump to A28 •

8 Thin-plate spline interpolation

A quick way to see the distribution of a variable in space as a surface is


with an empirical method that adjusts locally to the data. A common em-
pirical method is thin-plate splines (TPS), also referred to as “minimum
curvature” surfaces, which are implemented in the fields package. The
theory of thin-plate splines is explained in the Appendix, §C.
TPS is the mathematical equivalent of a thin (so, flexible) plate that is
warped to fit the data. This can range from very “rigid”, i.e., just a sin-
gle surface (the usual least-squares plane of a first-order trend surface)
to very “flexible”, i.e., perfectly fitting every observation. In general we
want something in between: if we think there is an overall surface we
just fit it as one polynomial (first, second . . . order polynomials on the
coördinates), but if we want to fit more locally, we must expect local
noise which should be somehow locally averaged-out.

Task 40 : Set up for thin-plate splines and compute the minimum-


curvature spline, subject to roughness constraint determined by gener-
alized cross-validation. •
The Tps function of the fields package computes this; however, the coördinates must be formatted as a matrix field in the dataframe, using the matrix function.
require(fields)
aq.tps <- aq[, c("E","N", "zm")]
aq.tps$coords <- matrix(c(aq.tps$E, aq.tps$N), byrow=F, ncol=2)
str(aq.tps$coords)

## num [1:161, 1:2] 36.1 39.8 26.6 20.1 17 ...

surf.1 <-Tps(aq.tps$coords, aq.tps$zm)

## Warning:
## Grid searches over lambda (nugget and sill variances) with minima at the endpoints:
## (GCV) Generalized Cross-Validation
## minimum at right endpoint lambda = 8.53564e-06 (eff. df=
## 152.95 )

summary(surf.1)

## CALL:
## Tps(x = aq.tps$coords, Y = aq.tps$zm)
##
## Number of Observations: 161
## Number of unique points: 161
## Number of parameters in the null space 3
## Parameters for fixed spatial drift 3
## Effective degrees of freedom: 152.9
## Residual degrees of freedom: 8.1
## MLE sigma 0.6747
## GCV sigma 0.7098
## MLE rho 53330
## Scale passed for covariance (rho) <NA>
## Scale passed for nugget (sigma^2) <NA>
## Smoothing parameter lambda 8.536e-06
##
## Residual Summary:
## min 1st Q median 3rd Q max
## -0.6250000 -0.0701000 -0.0007448 0.0793500 0.5199000
##
## Covariance Model: Rad.cov
## Names of non-default covariance arguments:
## p
##
## DETAILS ON SMOOTHING PARAMETER:
## Method used: GCV Cost: 1
## lambda trA GCV GCV.one GCV.model shat
## 8.536e-06 1.529e+02 1.008e+01 1.008e+01 NA 7.098e-01
##
## Summary of all estimates found for lambda
## lambda trA GCV shat -lnLike Prof converge
## GCV 8.536e-06 152.9 10.08 0.7098 462.5 NA
## GCV.model NA NA NA NA NA NA
## GCV.one 8.536e-06 152.9 10.08 0.7098 NA NA
## RMSE NA NA NA NA NA NA
## pure error NA NA NA NA NA NA
## REML 7.654e-05 117.7 11.24 1.7383 461.6 3

Task 41 : Predict over the study area grid using the fitted thin-plate
spline. •
The predict.Krig method of the fields package computes the pre-
diction. Again, the coördinates must be a matrix. We have these im-
plicitly in the grid, but we need to list them explicitly by converting to
SpatialPoints, and then convert to a matrix.
grid.coords <- as(sp.grid, "SpatialPoints")
summary(grid.coords)

## Object of class SpatialPoints
## Coordinates:
## min max
## e -33 42
## n -47 52
## Is projected: TRUE
## proj4string :
## [+proj=utm +zone=14 +datum=NAD83 +units=m +no_defs]
## Number of points: 7600

grid.coords.m <- as.matrix(grid.coords@coords)

surf.1.pred <- predict.Krig(surf.1, grid.coords.m)
str(surf.1.pred)

## num [1:7600, 1] 577 575 574 573 572 ...

summary(sp.grid$pred.tps <- as.vector(surf.1.pred))

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 475.3 515.9 546.0 546.8 575.8 623.8

Now we have a matrix of predictions.

Task 42 : Compute and summarize the residuals. •


The predict.Krig function does not compute residuals. So, to determine
the predicted value at any location, here the data points, the kriged
surface must be appended to the SpatialPixelsDataFrame version of
the grid, and then the value at each location extracted with the over
"spatial overlay" method of the sp package. This queries its second
argument (the layer from which attributes are queried) and assigns
attribute values to the geometries in the first argument. We compute the
difference of this predicted value from the actual value at that point, and
add this as a field in the points spatial object, to display on the map of
thin-plate spline predictions.
summary(aq.sp$resid.tps <- (over(aq.sp, sp.grid)$pred.tps - aq$zm))

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.084026 -0.358348 0.041853 0.008678 0.405389 1.622130

hist(aq.sp$resid.tps, main="Thin-plate spline residuals", breaks=12)
rug(aq.sp$resid.tps)

Task 43 : Display the gridded prediction, with the residuals over-
printed. •
spplot(sp.grid, zcol="pred.tps",
xlab="East", ylab="North",
at=ts.plot.breaks,
main="Aquifer elevation, m.a.s.l.",
col.regions = topo.colors(length(ts.plot.breaks)),
panel=function(x, ...) {
panel.levelplot(x, ...);
panel.points(coordinates(aq.sp), pch=1,
col=ifelse(aq.sp$resid.tps < 0, "red", "black"),
cex=2*abs(aq.sp$resid.tps)/max(abs(aq.sp$resid.tps)))
})

This adjusts very closely to the data points.

Task 44 : Compute and display the difference between the thin-plate
spline and the GAM predictions. •

summary(sp.grid$diff.tps.gam <-
          sp.grid$pred.tps - sp.grid$pred.gam)

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -24.09195 -6.67143 0.48857 0.03135 7.13101 27.91866

spplot(sp.grid, zcol="diff.tps.gam", sub="Thin-plate spline -- GAM fits",
       main="difference, m", xlab="East", ylab="North",
       col.regions = terrain.colors(64))

There are some very large differences, even though GAM allows some
local deviations from a trend.

9 Spatial correlation of trend surface residuals

In trend surface analysis of §5.1 (1st order) and §5.2 (2nd order) we
showed that the residuals from the OLS trend surface models are not
spatially independent – there are local clusters of similar values, as
revealed by the bubble plot. This implies the trend surface should in fact
be fit not by OLS but by Generalized Least Squares (GLS), taking into
account the spatial auto-correlation of the residuals. We pursue this
further in §10. Here we investigate the spatial structure of the residuals.
This structure can be modelled with a variogram, which can then be used
to adjust the trend surface with GLS (§10.1, below). In this section we
examine the empirical variogram of the residuals, and later use it to
initialize the GLS estimate.

Task 45 : Add the second-order trend-surface predictions and residuals
as fields to the aq data frame. •
The fitted method extracts fitted values from a linear model object; the
residuals method extracts the residuals.
aq$fit.ts2 <- fitted(model.ts2)
aq.sp$fit.ts2 <- fitted(model.ts2)
aq$res.ts2 <- residuals(model.ts2)
aq.sp$res.ts2 <- residuals(model.ts2)

9.1 The empirical variogram

To visualize the spatial autocorrelation of the trend surface residuals,
we compute an empirical variogram, which shows the relation between
the separation distance in geographic space between pairs of points and
a measure of their separation in attribute (feature) space. This measure
is called the semivariance γ of one point-pair $(s_i, s_j)$:

$$\gamma(s_i, s_j) \equiv \frac{1}{2}\left[z(s_i) - z(s_j)\right]^2 \qquad (9)$$

Because there are a large number ($n(n-1)/2$) of point-pairs, the
separation distances are usually grouped into ranges, and the average
semivariance $\gamma(h)$ is computed as:

$$\gamma(h) = \frac{1}{2\,m(h)} \sum_{i=1}^{m(h)} \left[z(s_i) - z(s_i + h)\right]^2 \qquad (10)$$

where:
• $m(h)$ is the number of point-pairs separated by vector h, in practice
  some range of separations ("bin");
• these are indexed by i;
• the notation $z(s_i + h)$ means the "tail" of point-pair i, i.e., separated
  from the "head" $s_i$ by the separation vector h.
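As a minimal illustration of Equation 9 (the attribute values here are invented, not taken from the dataset), the semivariance of a single point-pair can be computed directly:

gamma.pair <- function(z1, z2) 0.5 * (z1 - z2)^2
gamma.pair(560, 552)  # two wells differing by 8 m: semivariance 32 m^2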

Task 46 : Compute and plot the empirical variogram of the residuals
from the second-order surface, with a cutoff of 40 km. •

The variogram function of the gstat package computes the empirical
variogram. We show both the variogram cloud and the summarized
variogram, which averages the points in the variogram cloud over some
separation ranges; these are called variogram bins.

vr.c <- variogram(res.ts2 ~ 1, loc=aq.sp, cutoff=40, cloud=T)
vr <- variogram(res.ts2 ~ 1, loc=aq.sp, cutoff=40)
p1 <- plot(vr.c, col="blue", pch=20, cex=0.5)
p2 <- plot(vr, plot.numbers=T, col="blue", pch=20, cex=1.5)
print(p1, split=c(1,1,2,1), more=T)
print(p2, split=c(2,1,2,1), more=F)

Note: The code to print two variograms side-by-side uses the split and
more optional arguments to the print method for Lattice graphics plots.

The empirical variogram can be characterized by three parameters; here
they are just estimated by eye. These will be defined precisely as part of
variogram modelling, see below.

• $c_0$, the nugget parameter: the semivariance at zero separation; this
  represents a combination of measurement error, sampling error, and
  spatial variation at a range shorter than the sample support;
• c is the sill parameter, i.e., the maximum variance in the attribute
  when point-pairs are widely separated;
• a is the range parameter, the separation at which there is no more
  spatial autocorrelation, so that the semivariance reaches the sill.

Q29 : What are the estimated sill, range, and nugget of this variogram?
Jump to A29 •

In later sections (§11.2) we will see how to model the variogram and
use it in spatial prediction. For now, we continue with a method that
takes the spatial correlation of the residuals into account when computing
the trend.

10 Trend surface analysis by Generalized Least Squares

As explained in §5, the OLS solution is only valid for independent residuals.
The previous § shows that in this case the residuals are not spatially
independent, and we were able to model that dependence with a variogram
model. Thus, using OLS may result in an incorrect trend surface equation,
although the OLS estimate is unbiased. A large number of close-by points
with similar values will "pull" a trend surface towards them. Furthermore,
the OLS R² (goodness-of-fit) may be over-optimistic. This is discussed by
Fox [4, §14.1].
The solution is to use Generalised Least Squares (GLS) to estimate the
trend surface. This allows a covariance structure between residuals
to be included directly in the least-squares solution of the regression
equation.
The GLS estimate of the regression coefficients is [2]:

$$\hat{\beta}_{\mathrm{gls}} = (X^T C^{-1} X)^{-1} X^T C^{-1} y \qquad (11)$$

where X is the design matrix, C the covariance matrix of the (spatially-correlated) residuals, and y the vector of observations. If there is no spatial dependence among the errors, C reduces to $I\sigma^2$ and the estimate to OLS as in Equation 3.
The covariance matrix C gives the covariance between the residuals at
each pair of points used to determine $\hat{\beta}_{\mathrm{gls}}$. Clearly, there is no way to
know the covariance between all the point-pairs, since we only have one
realization of the random field. So we model the covariance as a function
of the separation (usually the distance) between point pairs, similar to
what we did in §9.1 to fit a variogram model; however, we instead fit
a spatial covariance model. This leads us to a further difficulty: the
covariance structure refers to the residuals, but we can't compute these
until we fit the trend . . . but we need the covariance structure to fit the
trend . . . and so on. This is a classic "which came first: the chicken or the
egg?" problem.
One method to compute the GLS model is iterative:

1. make a first estimate of the trend surface with OLS;
2. compute the residuals;
3. model the covariance structure of the OLS residuals as a function
   of their separation;
4. use this covariance structure to determine the weights to compute
   the GLS trend surface;
5. repeat steps (2)–(4) until the covariance structure does not change
   between iterations.

In many cases only one iteration is necessary. However, theoretically this
is not optimal, because the estimates of the covariance parameters are
biased.
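A minimal sketch of one iteration of this procedure, using gstat and nlme functions that appear elsewhere in this tutorial; the object names aq and aq.sp follow the earlier sections. This is an illustration only, not the method we use below, which estimates both parts at once:

require(nlme)  # for gls() and corExp()
m.ols <- lm(zm ~ n + e + I(n^2) + I(e^2) + I(e * n), data = aq)   # step 1
aq.sp$res.ols <- residuals(m.ols)                                 # step 2
v <- variogram(res.ols ~ 1, loc = aq.sp)                          # step 3
v.f <- fit.variogram(v, vgm(35, "Exp", 10, 0))
m.gls <- gls(zm ~ n + e + I(n^2) + I(e^2) + I(e * n), data = aq,  # step 4
             correlation = corExp(value = v.f$range[2], form = ~ e + n))
# step 5: extract residuals(m.gls), refit the variogram, and repeat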
A more elegant solution is to fit the covariance structure at the same
time the trend surface coefficients are computed. The theory and math-
ematics of this are explained in §D.

10.1 Computing the GLS trend surface

GLS trend surfaces can be computed in several R packages. The trend
and the covariance must be computed at the same time. This is
implemented in the gls function of the nlme package, using Residual
Maximum Likelihood (REML); see §D.3 for the theory and mathematics of
REML.

Task 47 : Compute the coefficients of a full second-order trend, using
GLS. •
The gls function fits two models at once: the linear model, specified
with the model argument, and the autocorrelation structure, specified
with the correlation argument.

The model argument is the same model formula we use for OLS.

The correlation argument specifies the form of the correlation, and
its initial parameters. Here we specify an exponential correlation with
the corExp function. The form argument is two-dimensional in the
coördinates, and from examination of the empirical variogram we see
there is no nugget, so we specify nugget as FALSE. The value argument to
corExp specifies starting values for this correlation structure, here the
estimated range parameter. Recall that this is 1/3 of the separation at
which there is no longer any spatial autocorrelation. From the empirical
variogram (§9.1) we estimate this as 30/3 = 10.
require(nlme)
model.ts2.gls <- gls(
model = zm ~ n + e + I(n^2) + I(e^2) + I(e * n),
data = aq,
method="ML",
correlation=corExp(form=~e + n,
nugget=FALSE,
value=10) # initial value of the range parameter
)
class(model.ts2.gls)

## [1] "gls"

summary(model.ts2.gls)

## Generalized least squares fit by maximum likelihood


## Model: zm ~ n + e + I(n^2) + I(e^2) + I(e * n)
## Data: aq
## AIC BIC logLik
## 939.389 964.0403 -461.6945
##
## Correlation Structure: Exponential spatial correlation
## Formula: ~e + n
## Parameter estimate(s):
## range
## 14.42157
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 559.3066 3.362267 166.34805 0.0000
## n -0.0511 0.054687 -0.93440 0.3516
## e -1.5499 0.067526 -22.95261 0.0000
## I(n^2) -0.0055 0.001690 -3.23652 0.0015

## I(e^2) -0.0013 0.002451 -0.51108 0.6100
## I(e * n) 0.0045 0.001948 2.32555 0.0213
##
## Correlation:
## (Intr) n e I(n^2) I(e^2)
## n 0.017
## e 0.057 -0.038
## I(n^2) -0.620 -0.083 0.024
## I(e^2) -0.571 0.029 -0.233 0.099
## I(e * n) 0.031 -0.048 0.012 -0.026 -0.066
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.4096926 -0.5485337 0.1276973 0.6513998 2.0752494
##
## Residual standard error: 6.385512
## Degrees of freedom: 161 total; 155 residual

Notice that the gls method also estimates the range of spatial correla-
tion.

Q30 : What is the range of spatial correlation of the exponential model,
as estimated by gls? Jump to A30 •

This gives different coefficients than the OLS fit.

Task 48 : Compare the coefficients from the GLS and OLS fits, as abso-
lute differences and as percentages of the OLS fit. •
The generic coef method extracts coefficients from model objects.
coef(model.ts2.gls) - coef(model.ts2)

## (Intercept) n e I(n^2) I(e^2)
## -1.7509493085 -0.0345445195 0.0708594856 0.0020310694 0.0003950789
## I(e * n)
## -0.0021705328

round(100*(coef(model.ts2.gls) - coef(model.ts2))
      /coef(model.ts2),1)

## (Intercept) n e I(n^2) I(e^2)
## -0.3 208.7 -4.4 -27.1 -24.0
## I(e * n)
## -32.4

Q31 : Why are the GLS coefficients different than the OLS coefficients?
Jump to A31 •

Task 49 : Display the 90% confidence intervals for the GLS model pa-
rameters. •
The generic intervals method has a specific method for a fitted GLS
model; internally this is the intervals.gls function of the nlme pack-
age.

intervals(model.ts2.gls, level=0.90)

## Approximate 90% confidence intervals


##
## Coefficients:
## lower est. upper
## (Intercept) 553.742944672 559.306635752 564.870326832
## n -0.141592157 -0.051099363 0.039393431
## e -1.661627308 -1.549889672 -1.438152036
## I(n^2) -0.008264555 -0.005468607 -0.002672660
## I(e^2) -0.005307723 -0.001252487 0.002802750
## I(e * n) 0.001306510 0.004529404 0.007752298
## attr(,"label")
## [1] "Coefficients:"
##
## Correlation structure:
## lower est. upper
## range 8.14054 14.42157 25.54889
## attr(,"label")
## [1] "Correlation structure:"
##
## Residual standard error:
## lower est. upper
## 5.006231 6.385512 8.144803

10.2 Predicting from the GLS trend surface

Task 50 : Predict over the grid with the GLS trend. •

The predict generic method has a specific method for a fitted GLS
model; internally this is the predict.gls function of the nlme package.

pred.ts2.gls <- predict(model.ts2.gls, newdata=grid)
summary(pred.ts2.gls)

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 473.4 517.8 547.5 547.0 576.3 610.9

Task 51 : Display the GLS 2nd-order trend surface, with the data points
superimposed, side-by-side with the OLS 2nd-order trend surface com-
puted in §6.2. •
First we need to compute and store the residuals, to be displayed on the
trend surface:
res.ts2.gls <- residuals(model.ts2.gls)

The spplot “spatial plot” method plots spatial objects, i.e., those in one
of the sp classes.
The fit field of the prediction object contains the trend surface fits.

sp.grid$gls.fit <- pred.ts2.gls
p.gls <- spplot(sp.grid, zcol="gls.fit",
sub="2nd-order trend, GLS fit",
main="Aquifer elevation, m.a.s.l.",
xlab="East", ylab="North",
at=ts.plot.breaks,
col.regions = topo.colors(length(ts.plot.breaks)),
panel=function(x, ...) {
panel.levelplot(x, ...);
panel.points(coordinates(aq.sp), pch=1,
col=ifelse(res.ts2.gls < 0, "red", "black"),
cex=2*abs(res.ts2.gls)/max(abs(res.ts2.gls)))
})
print(p.ols, split=c(1,1,2,1), more=T)
print(p.gls, split=c(2,1,2,1), more=F)

Q32 : Describe the differences between these surfaces. Jump to A32 •

An obvious question is how much these differ, and where.

Task 52 : Compute the difference between the OLS and GLS trend
surfaces, and map them. •
sp.grid$diff.gls.ols <- sp.grid$gls.fit - sp.grid$fit
spplot(sp.grid, zcol="diff.gls.ols", sub="GLS-OLS fits",
main="difference, m", xlab="East", ylab="North",
col.regions = terrain.colors(64))

Q33 : Where are the largest differences between the OLS and GLS trend
surfaces? Explain why. Jump to A33 •

11 Local interpolation of the residuals

The trend surface fits an overall trend, but of course does not fit every
observation exactly. This lack of fit can be pure noise, but it can also
have a spatially-correlated component which can be modelled and used
to improve the predictions.

11.1 Visualizing the residuals

Task 53 : Display the residuals from the GLS trend surface as a post-
plot. •
summary(res.ts2.gls)

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -21.77263 -3.50267 0.81541 -0.05721 4.15952 13.25153

plot(aq$n ~ aq$e, cex=3*abs(res.ts2.gls)/max(abs(res.ts2.gls)),
     col=ifelse(res.ts2.gls > 0, "green", "red"),
     xlab="E", ylab="N",
     main="Residuals from 2nd-order trend, GLS fit",
     sub="Positive: green; negative: red", asp=1)
grid()

We can see from this post-plot of the residuals that there is local spatial
correlation. The GLS fit optimized the estimates of the trend surface co-
efficients, and correctly estimated the spatial correlation of the residuals,
but did not correct for this in mapping.

Task 54 : Compute the empirical variogram of the residuals from the
GLS trend surface model. •
First extract the residuals into the point observations object, compute
the empirical variogram, and display it to estimate the variogram model
parameters.
aq$res.ts2.gls <- residuals(model.ts2.gls)
vr.gls <- variogram(res.ts2.gls ~ 1, loc=aq.sp)
plot(vr.gls, plot.numbers=T,
main="Residuals from second-order GLS trend")

Q34 : What are the approximate variogram parameters? Jump to A34 •

11.2 Variogram modelling

In the kriging formula (see below, §11.2.1), we need to compute the
semivariance at any separation distance. Therefore, we need to fit a
variogram function to the empirical variogram. This function represents
the structure of the spatial autocorrelation of the attribute, in this case
the trend surface residual.

There are many authorized variogram functions that ensure the kriging
system can be solved. One of the most common is the exponential
function:

 
$$\gamma(h) = c\left(1 - e^{-h/a}\right) \qquad (12)$$

Where:
• h is the separation distance between a point-pair; this is the argument
  to the function, which changes with each point-pair;
• c is the fitted sill parameter, i.e., the maximum variance in the
  attribute when point-pairs are widely separated;
• a is the fitted range parameter.

The effective range 3a is the separation distance at which γ = 0.95c.
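A one-line check of Equation 12 and the effective range (the sill and range values here are arbitrary, for illustration only):

gamma.exp <- function(h, c, a) c * (1 - exp(-h / a))
gamma.exp(h = 3 * 10, c = 40, a = 10) / 40  # 1 - exp(-3) = 0.95 of the sill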

Task 55 : Fit an exponential variogram function and display the fitted
model on the empirical variogram. •

To fit a variogram function to an empirical variogram, one method is
to estimate the parameters by eye, and adjust them until they seem to
match the empirical variogram. A more objective way is to use the initial
estimates as a starting point for the fit.variogram function, which
adjusts the parameters by weighted least-squares.

Note: A commonly-used empirical weighting is proportional to the number
of point-pairs in a bin (giving more weight to bins with more evidence)
and inversely proportional to the average separation in the bin (giving
more weight to the close-range portion of the variogram, where most of
the kriging weights are determined).

The vgm function specifies a variogram model and its parameters. We
then adjust it with fit.variogram. Recall, we estimated the parameters
by eye in the previous answer. We must divide our estimate of the range
by 3 to obtain the a parameter of the exponential model.
vr.m.estimate <- vgm(psill=40, model="Exp", range=30/3, nugget=0)
(vr.gls.m.f <- fit.variogram(vr.gls, vr.m.estimate))

## model psill range
## 1 Nug 0.00000 0.00000
## 2 Exp 43.75809 14.16883

plot(vr.gls, model=vr.gls.m.f, plot.numbers=T)

Task 56 : Compare the range parameter of this fitted variogram with
the range parameter estimate from the GLS fit. •

print(vr.gls.m.f)

## model psill range
## 1 Nug 0.00000 0.00000
## 2 Exp 43.75809 14.16883

intervals(model.ts2.gls)$corStruct[2]

## [1] 14.42157

Q35 : Does the range parameter of this fitted model agree with the
estimate from the GLS fit? Jump to A35 •

11.2.1 Ordinary Kriging

Once we know the structure of the residuals, their values, and their
locations, we can predict their values at all locations (e.g., over the grid),
by Ordinary Kriging interpolation.

To do this, we first need to understand OK.

Kriging is a form of linear prediction of the attribute value at an unknown
point $\hat{z}(s_0)$, as a weighted sum of the attribute values at the known
points $z(s_i)$:

$$\hat{z}(s_0) = \sum_{i=1}^{N} \lambda_i z(s_i) \qquad (13)$$

The weights λi must sum to 1, and are determined by solving the krig-
ing system of equations. This system ensures that the prediction has the
least possible prediction variance, i.e., uncertainty, among all the possi-
ble weights. Therefore OK is called the Best Linear Unbiased Predictor
(BLUP).
Here we do not derive the system, but present it. The weights are the
solution of the linear equation Aλ = b where:

$$A = \begin{bmatrix}
\gamma(s_1,s_1) & \gamma(s_1,s_2) & \cdots & \gamma(s_1,s_N) & 1 \\
\gamma(s_2,s_1) & \gamma(s_2,s_2) & \cdots & \gamma(s_2,s_N) & 1 \\
\vdots & \vdots & \cdots & \vdots & \vdots \\
\gamma(s_N,s_1) & \gamma(s_N,s_2) & \cdots & \gamma(s_N,s_N) & 1 \\
1 & 1 & \cdots & 1 & 0
\end{bmatrix}
\qquad
\lambda = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_N \\ \psi \end{bmatrix}
\qquad
b = \begin{bmatrix} \gamma(s_1,s_0) \\ \gamma(s_2,s_0) \\ \vdots \\ \gamma(s_N,s_0) \\ 1 \end{bmatrix}$$

All of the semivariances γ in these formulas are computed from the
fitted variogram model, by substituting the separation between the known
points $(s_i, s_j)$ in the A matrix, and between the known points and the
prediction point $(s_i, s_0)$ in the b vector.
The kriging system is solved by matrix inversion and multiplication as:

$$\lambda = A^{-1} b$$

These weights can then be used in the prediction formula, Equation 13.
They can also be used to compute the prediction variance as $\hat{\sigma}^2 = b^T \lambda$.
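To make the system concrete, here is a minimal sketch that builds and solves the OK system for three invented points and an exponential variogram; the sill and range values are illustrative, not the fitted model of this exercise. The same structure, with many more rows, is essentially what krige solves at each prediction location.

gamma.exp <- function(h, c = 40, a = 14) c * (1 - exp(-h / a))
pts <- rbind(c(0, 0), c(10, 0), c(0, 12))   # known points (km)
s0  <- c(4, 4)                              # prediction point
A <- rbind(cbind(gamma.exp(as.matrix(dist(pts))), 1), c(1, 1, 1, 0))
b <- c(gamma.exp(sqrt(colSums((t(pts) - s0)^2))), 1)
(lambda <- solve(A, b))   # three weights, then the Lagrange multiplier psi
sum(lambda[1:3])          # the weights sum to 1
sum(b * lambda)           # the prediction variance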

11.2.2 OK predictions

Task 57 : Predict over the grid by OK, using the fitted variogram model. •

The krige function computes kriging predictions and their variances.
kr <- krige(res.ts2.gls ~ 1, loc=aq.sp, newdata=sp.grid, model=vr.gls.m.f)

## Warning in proj4string(d$data): CRS object has comment, which is lost in output

## Warning in proj4string(newdata): CRS object has comment, which is lost in output

## [using ordinary kriging]

## Warning in proj4string(newdata): CRS object has comment, which is lost in output

summary(kr)

## Object of class SpatialGridDataFrame


## Coordinates:
## min max
## e -33.5 42.5
## n -47.5 52.5
## Is projected: TRUE
## proj4string :
## [+proj=utm +zone=14 +datum=NAD83 +units=m +no_defs]
## Grid attributes:
## cellcentre.offset cellsize cells.dim
## e -33 1 76
## n -47 1 100
## Data attributes:
## var1.pred var1.var
## Min. :-20.82445 Min. : 0.1369
## 1st Qu.: -3.14113 1st Qu.: 7.9602
## Median : -0.03776 Median :10.2017
## Mean : -0.37273 Mean :10.8357
## 3rd Qu.: 2.78306 3rd Qu.:12.4351
## Max. : 12.83216 Max. :33.7259

Note: Notice that the mean kriging prediction is not zero.

Task 58 : Display the kriging predictions and their prediction standard
deviations. •

p1 <- spplot(kr, zcol="var1.pred", col.regions=bpy.colors(64),
             main="Residuals from GLS trend, m")
print(p1)

kr$var1.sd <- sqrt(kr$var1.var)
p2 <- spplot(kr, zcol="var1.sd",
             col.regions=cm.colors(64),
             main="Kriging prediction standard deviation, m")
print(p2)

Q36 : Which areas were most changed by interpolating the residuals?
Why? Jump to A36 •

Q37 : Which areas have the most and least uncertainty? Why? Jump to
A37 •

12 GLS-Regression Kriging

Now we have both parts of a universal model: a global trend and local
deviations from it. We can combine these for a “best” prediction.
To understand this, we introduce the so-called universal model of spa-
tial variation:

$$Z(s) = Z^*(s) + \varepsilon(s) + \varepsilon'(s) \qquad (14)$$


where:
• s is a location in space, designated by a vector of coördinates;
• Z(s) is the true (unknown) value of some property at the location;
• $Z^*(s)$ is the deterministic component, due to some non-stochastic
  process, i.e., the trend surface;
• ε(s) is the spatially-autocorrelated stochastic component of the
  deviations from the trend;
• ε′(s) is the pure ("white") noise with no structure; this cannot be
  modelled.
We have seen how to model $Z^*(s) + \varepsilon'(s)$ with an OLS polynomial trend
surface in §5. We have seen how to model the local spatial structure as
$\varepsilon(s) + \varepsilon'(s)$ by Ordinary Kriging (OK) in §11. Both are modelled together
in GLS (§10), but that trend surface still leaves residuals $\varepsilon(s) + \varepsilon'(s)$ to
be accounted for.
When these two are combined, the method is called Generalized Least
Squares Trend – Regression Kriging (GLS-RK).

Task 59 : Add the OK predictions of the GLS residuals to the prediction
grid object, and then add this to the GLS trend surface prediction to
obtain a final prediction. •

The kriging prediction object was built from the spatial grid, so it has
the same dimensions.
The kriging prediction object was built from the spatial grid, so it has
the same dimensions.
sp.grid$ok <- kr$var1.pred
sp.grid$rk.gls <- sp.grid$fit + sp.grid$ok

Task 60 : Plot the final prediction. •

The GAM prediction (§7.2) also considered both global and local
components of the spatial variation. An obvious question is how similar
they are.

Task 61 : Compare this predicted surface with the GAM prediction. •
p.rk <- spplot(sp.grid, zcol="rk.gls",
sub="GLS-RK prediction",
main="Aquifer elevation, m.a.s.l.",
xlab="East", ylab="North",
at=ts.plot.breaks,
col.regions = topo.colors(length(ts.plot.breaks)))
print(p.rk)

summary(sp.grid$diff.gam.rk.gls <-
          sp.grid$pred.gam - sp.grid$rk.gls)

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -30.93181 -7.58851 -0.01386 0.44947 8.60389 30.18723

spplot(sp.grid, zcol="diff.gam.rk.gls", sub="GAM - RK-GLS fits",
       main="difference, m", xlab="East", ylab="North",
       col.regions = terrain.colors(64))

Q38 : Where are the largest differences between these two trend sur-
face predictions? Explain why, considering how the two surfaces are
computed. Jump to A38 •

13 Universal Kriging (UK)

We showed in §9 that the residuals from the OLS fit are not spatially
independent. We used this fact in §10 to produce a correct trend surface
by GLS. We then modelled the spatial structure of the residuals from the
GLS (not OLS) surface and interpolated these (§11), to make a final map of
both the trend and local variations with GLS-RK (§12).
There is another method of fitting the trend and the local deviations
from it in one step, called “Universal Kriging”, abbreviated as UK. This is
not completely correct theoretically, as we will explain, but if observation
points are well-distributed over the area (as is the case in this exercise)
so that the GLS and OLS trend surfaces are not too different, it provides
a very similar map to GLS-RK, and in one step.

Note: If spatially-complete covariates are used instead of coördinates
in the OLS model, this is called "Kriging with External Drift", abbreviated
KED.

Recall the universal model of spatial variation:

$$Z(s) = Z^*(s) + \varepsilon(s) + \varepsilon'(s) \qquad (15)$$


where:
• s is a location in space, designated by a vector of coördinates;
• Z(s) is the true (unknown) value of some property at the location;
• $Z^*(s)$ is the deterministic component, due to some non-stochastic
  process, i.e., the trend surface;
• ε(s) is the spatially-autocorrelated stochastic component of the
  deviations from the trend;
• ε′(s) is the pure ("white") noise with no structure; this cannot be
  modelled.
We have seen how to model $Z^*(s) + \varepsilon'(s)$ with a polynomial trend surface
in §5. We have seen how to model the local spatial structure as $\varepsilon(s) +
\varepsilon'(s)$ by Ordinary Kriging (OK) in §11. UK is an extension of OK that
models the entire Equation 15 in one step.

13.1 Residual variogram

The residual variogram is computed from the OLS surface as in §9, but
directly from the definition of the trend:

Note: The variogram function requires an sp spatial object with the
locations as coördinates and the target variable.
vr <- variogram(zm ~ n + e + I(n^2) + I(e^2) + I(e*n), locations=aq.sp)
plot(vr, plot.numbers=TRUE,
xlab="separation (km)",
ylab="semivariance (m^2)")

[Figure: empirical variogram of the 2nd-order OLS trend surface residuals; x-axis separation (km), y-axis semivariance (m²); the number of point-pairs is printed at each bin]

This variogram of the OLS trend surface residuals is then modelled as
before (§11.2):
(vr.m.f <- fit.variogram(vr, vgm(35, "Exp", 30/3, 0)))

## model psill range


## 1 Nug 0.00000 0.00000
## 2 Exp 35.02372 10.09537

plot(vr, plot.numbers=TRUE,
xlab="separation (km)",
ylab="semivariance (m^2)",
model=vr.m.f,
main="Fitted residual variogram model")

[Figure: "Fitted residual variogram model": the empirical variogram of the OLS trend surface residuals with the fitted exponential model; x-axis separation (km), y-axis semivariance (m²)]

Task 62 : Compare this fitted variogram with that from the OLS trend
surface residuals, and with the estimate from the GLS fit. •
print(vr.m.f) # OLS trend residuals

## model psill range
## 1 Nug 0.00000 0.00000
## 2 Exp 35.02372 10.09537

print(vr.gls.m.f) # GLS trend residuals

## model psill range
## 1 Nug 0.00000 0.00000
## 2 Exp 43.75809 14.16883

Q39 : How do these fitted variogram parameters compare to those from
the GLS trend surface residuals (§11.2)? Why are they different? Jump
to A39 •

13.2 The Universal Kriging system

Recall from §11.2.1 that Kriging is a form of linear prediction of the
attribute value at an unknown point $s_0$, as a weighted sum of the attribute
values at the known points $z(s_i)$:

$$\hat{z}(s_0) = \sum_{i=1}^{N} \lambda_i z(s_i) \qquad (16)$$

where the weights λi must sum to 1, and are determined by solving
the kriging system of equations, which ensures that the prediction has
the least possible prediction variance, i.e., uncertainty, among all the
possible weights.
In OK the weights only take into account local spatial autocorrelation. In
UK the weights λi take into account both the global trend and the local
spatial autocorrelation of the trend residuals.
Here we do not derive the system, but present it.
The weights are the solution of the linear equation AU λU = bU where:

 
$$A_U = \begin{bmatrix}
\gamma(s_1,s_1) & \cdots & \gamma(s_1,s_N) & 1 & f_1(s_1) & \cdots & f_k(s_1) \\
\vdots & \cdots & \vdots & \vdots & \vdots & \cdots & \vdots \\
\gamma(s_N,s_1) & \cdots & \gamma(s_N,s_N) & 1 & f_1(s_N) & \cdots & f_k(s_N) \\
1 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
f_1(s_1) & \cdots & f_1(s_N) & 0 & 0 & \cdots & 0 \\
\vdots & \cdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
f_k(s_1) & \cdots & f_k(s_N) & 0 & 0 & \cdots & 0
\end{bmatrix}$$

$$\lambda_U = \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_N \\ \psi_0 \\ \psi_1 \\ \vdots \\ \psi_k \end{bmatrix}
\qquad
b_U = \begin{bmatrix} \gamma(s_1,s_0) \\ \vdots \\ \gamma(s_N,s_0) \\ 1 \\ f_1(s_0) \\ \vdots \\ f_k(s_0) \end{bmatrix}$$

All of the semivariances γ in these formulas are computed from the
fitted variogram model, by substituting the separation between the known
points $(s_i, s_j)$ in the $A_U$ matrix, and between the known points and the
prediction point $(s_i, s_0)$ in the $b_U$ vector.
Then the kriging system is solved by matrix inversion and multiplication
as:

$$\lambda_U = A_U^{-1} b_U$$

These weights can then be used in the prediction formula, Equation 16.
They can also be used to compute the prediction variance as $\hat{\sigma}^2 = b_U^T \lambda_U$.
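Extending the OK sketch of §11.2.1 to UK, with a first-order trend (k = 2, f1 = e, f2 = n); all values are invented, for illustration only:

gamma.exp <- function(h, c = 40, a = 14) c * (1 - exp(-h / a))
pts <- rbind(c(0, 0), c(10, 0), c(0, 12)); s0 <- c(4, 4)
G  <- gamma.exp(as.matrix(dist(pts)))
Fm <- pts                                  # N x k matrix of f_j(s_i)
A.u <- rbind(cbind(G, 1, Fm),
             c(rep(1, 3), 0, 0, 0),
             cbind(t(Fm), 0, matrix(0, 2, 2)))
b.u <- c(gamma.exp(sqrt(colSums((t(pts) - s0)^2))), 1, s0)
(lambda.u <- solve(A.u, b.u))              # three weights, then psi_0, psi_1, psi_2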

13.3 Prediction

Now we have the model, the krige function can compute the UK pre-
diction at any location, for example at all the grid points. Note that the
formula given in krige must match that given in variogram.

Task 63 : Predict over the grid with the UK model. •

k.uk <- krige(zm ~ n + e + I(n^2) + I(e^2) + I(e*n), locations=aq.sp,
newdata=sp.grid, model=vr.m.f)

## Warning in proj4string(d$data): CRS object has comment, which is lost in output

## Warning in proj4string(newdata): CRS object has comment, which is lost in output

## [using universal kriging]

## Warning in proj4string(newdata): CRS object has comment, which is lost in output

summary(k.uk)

## Object of class SpatialGridDataFrame


## Coordinates:
## min max
## e -33.5 42.5
## n -47.5 52.5
## Is projected: TRUE
## proj4string :
## [+proj=utm +zone=14 +datum=NAD83 +units=m +no_defs]
## Grid attributes:
## cellcentre.offset cellsize cells.dim
## e -33 1 76
## n -47 1 100
## Data attributes:
## var1.pred var1.var
## Min. :472.9 Min. : 0.1537
## 1st Qu.:515.6 1st Qu.: 8.8429
## Median :546.2 Median :11.2689
## Mean :546.6 Mean :11.9611
## 3rd Qu.:576.0 3rd Qu.:13.6506
## Max. :623.3 Max. :45.3279

Task 64 : Display the UK predictions. •


p1 <- spplot(k.uk, zcol="var1.pred",
col.regions=topo.colors(64),
main="UK prediction, m")
plot(p1)

[Figure: map of the UK predictions, "UK prediction, m"; colour scale approx. 480–620 m]
The obvious question is how close is this one-step procedure to the two-
step procedure of GLS-RK (§12). Both methods take both global and local
structure into account.

Task 65 : Compute the difference between the UK and GLS-RK predictions,
and display as a histogram and on a map. •

summary(sp.grid$diff.uk.rk.gls <- k.uk$var1.pred - sp.grid$rk.gls)

## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -4.1558 -1.4263 -0.1576 0.3349 1.4565 9.7464

hist(sp.grid$diff.uk.rk.gls, main="UK - GLS-RK prediction differences",
     freq=FALSE, xlab="difference, UK - GLS-RK")

[Figure: histogram "UK − GLS-RK prediction differences"; x-axis difference, UK − GLS-RK (m); y-axis density]

spplot(sp.grid, zcol="diff.uk.rk.gls", sub="UK - GLS-RK predictions",
       main="difference, m", xlab="East", ylab="North",
       col.regions = terrain.colors(64))

[Figure: map of the UK − GLS-RK prediction differences, "difference, m"; axes East/North]

Q40 : How large are differences between the UK and GLS-RK trend
surface predictions? Where are the largest differences? Explain why
there is a difference. Jump to A40 •

14 Discussion

In this exercise we have compared several methods of predicting an
attribute over space, from a set of geo-referenced observations.
Discussion question: In this study area, which of the prediction meth-
ods would you recommend, and why?
Discussion question: For each method introduced, in what situations
would you prefer it to the other methods?

15 Answers

A1 : The map of aquifer elevations, along with a map of the elevation of the
land surface, can be used by well-drillers, to estimate the cost of drilling a well
to reach the aquifer at any location. Return to Q1 •

A2 : There are several slots in the object that refer to geographic space:

1. bbox for the bounding box (extreme values of coördinates)
2. coords storing the coördinates of each observation
3. proj4string for the map projection, not used here

The attribute data are in slot data, which is a data frame, like the original
(non-spatial) dataset. In this case there is only one attribute: the elevation
of the aquifer at the location. Return to Q2 •

A3 : There are 161 observations (wells); for each we know the coördinates (E
and N) and the elevation of aquifer (z); we also have the transformed elevation
in meters and the reduced coördinates. Return to Q3 •

A4 : UTM East from 5.003613 × 10^5 m . . . 5.744296 × 10^5 m (range 74.068 km);
UTM North from 4.1502482 × 10^6 m . . . 4.2483125 × 10^6 m (range 98.064 km);
total area 7263 km². Return to Q4 •

A5 : Elevations are from 476 to 623 m.a.s.l., a range of 148 m. Return to Q5 •

A6 : Nearby points tend to be similar; there appears to be a trend from E to W,
but there are portions of the map that do not follow this strictly. Return to
Q6 •

A7 : (1) The text postplot has the advantage of showing the actual values,
but it is not very graphical and difficult to read; (2) the size postplot clearly
shows the relative data values; (3) the size and colour postplot gives two ways
to visualize; it seems especially good for seeing the E–W increasing first-order
trend. Return to Q7 •

A8 : The aquifer has a flat surface, tilted towards some direction, by some
regional uplift. In this case, the uplift of the Rocky Mountains about 650 km
to the west has tilted the aquifer. Return to Q8 •

A9 : The trend surface equation is: z = 555 − 1.617135 e − 0.033361 n.
The intercept term gives the estimated aquifer elevation at the centroid of the
area. Then the two coefficients give the change in elevation per unit change
in each coördinate: for each km E the elevation decreases by 1.62 m,
for each km N it decreases by 0.03 m. The relation is highly-significant; it
explains 94.1% of the variability in the observations; however the N coördinate
is not needed – it is not statistically different from zero. Return to Q9 •

A10 : Residuals range from -25.4 to 16.7 m; compare this to the median
elevation 552.8 m; the maximum calibration error is 4.6%. Return to Q10 •

A11 :

1. No relation between fitted values and residuals; but . . .
2. a slight relation between spread of residuals and the fitted values: a
   "cone" shape with a wider spread of residuals at the higher fitted values;
   but more seriously . . .
3. the residuals are not normally-distributed, especially in the high tail.
   That is, the largest positive residuals (under-predictions) are not as
   extreme as would be expected. The largest negative residuals
   (over-predictions) are a bit too extreme.

Conclusion: this OLS fit does not satisfy the assumptions of independent
residuals. Return to Q11 •

A12 : There is a spatial pattern. Large residuals tend to be near each other,
and vice-versa. Positive residuals (above the trend surface) are found almost
exclusively in the middle third of the map. Dependence seems to be stronger
along a SW-NE axis (range about 50 to 70 km) than the NW-SE axis (range about
10 to 20 km). This implies a higher-order trend surface or a periodic surface
superimposed on the linear trend. Return to Q12 •

A13 : The tilted structure has local warping as either a dome or a basin.
Return to Q13 •

A14 : The model explains 97.5% of the variance in the observations, compared
to 94.1% for the first-order surface. Return to Q14 •

A15 : The probability that the higher-order surface is this much better just by
chance is almost zero, so the second-order surface is statistically superior to
the first-order surface. Return to Q15 •

A16 : Residuals range from -19.8 to 14.8 m; compare this to the median ele-
vation 552.8 m; the maximum calibration error is 3.6%. This range is narrower
than for the first-order surface: -25.4 to 16.7 m. Return to Q16 •

A17 :

1. No relation between fitted values and residuals;
2. no slight relation between spread of residuals and the fitted values; but
   . . .
3. the residuals are closer to normally-distributed. The largest negative
   residuals (over-predictions) are still a bit too extreme. However, the
   problem with the largest positive residuals from the first-order surface
   has been solved.

Conclusion: this 2nd-order OLS fit is much closer to being valid than the
1st-order OLS fit. Return to Q17 •

A18 : These residuals form local clusters of positive, negative, and near-zero;
there does not appear to be any overall spatial pattern. So, a higher-order trend
surface is not indicated. Instead, some local interpolation of the residuals
would seem to improve the model. Return to Q18 •

A19 : The fit is generally good but some clusters of points stand out from the
background; their values are not that well matched. Return to Q19 •

A20 : The prediction errors are from 22.3 to 24.4 m; this is about 4.1% of
the predicted value. This much uncertainty in the prediction corresponds to
uncertainty in the expense of drilling a well at the location. Return to Q20 •

A21 : They are least at the centre of gravity of the regression in both E and N;
they increase away from this in both directions; the largest uncertainties are
in the corners of the grid. Return to Q21 •

A22 : The relation with East seems almost linear, and a very tight relation.
However, the relation with North is much more scattered, and seems to have
higher elevations towards the middle of the range. Return to Q22 •

A23 : Extremely well. Return to Q23 •

A24 : The GAM has a much smaller spread of residuals, much more
concentrated towards zero. Return to Q24 •

A25 : There is short-range spatial autocorrelation, to about 10 km. This is
about 1/3 the range of the spatial autocorrelation of the OLS trend surface
residuals. Return to Q25 •

A26 : The GAM trend shows local warping. Return to Q26 •

A27 : The fit is very good; the residuals are smaller than for OLS or GLS.
A few large negative residuals (over-predictions) are in the south-central and
southeast. Return to Q27 •

A28 : The GAM predicts higher elevations in the NE and especially the SW (up
to 32 m), OLS predicts higher elevations in the NW and SE. The predictions are
the same at the map centroid. Return to Q28 •

A29 : The sill is estimated as 35 m², there is no nugget, and the sill is reached
at a range of about 20 km. Return to Q29 •

A30 : The range parameter of spatial correlation of the exponential model, as
estimated by gls, is 14.4 km. Return to Q30 •

A31 : The GLS coefficients take into account the spatial correlation of the
trend surface residuals, i.e., the fit uses a variance-covariance matrix of the
residuals to adjust the least-squares fit. Return to Q31 •

A32 : The GLS surface has a narrower area of medium values at the N side of
the map, and a wider one at the S side. The main axes of the 2nd-order surfaces
are not at the same angle. Return to Q32 •

A33 : The GLS surface is substantially higher than the OLS surface in the
NW and SE, and substantially lower in the NE and SW. This shows that some
clustered observations affected the OLS fit. Return to Q33 •

A34 : Sill about 38 m², no nugget, range about 20 km. Return to Q34 •

A35 : Yes, this range parameter here is 14.2 km; the range estimated by gls,
is 14.4 km. These are very close, despite being fit in two very different ways.
Return to Q35 •

A36 : The largest adjustments towards lower aquifer elevations are in a NE-SW
band towards the SE of the map. The largest adjustments towards higher
aquifer elevations are in a large spot at the SW-center side of the map. These
correspond to local warping of the overall aquifer structure at the scale
revealed by the variogram model. Return to Q36 •

A37 : The prediction uncertainty is least near observation points, especially
near clusters of them. It is most away from points. This is because the
semivariance between observation points and prediction points increases with
separation. Return to Q37 •


A38 : The differences here are mainly because in GLS-RK the kriging of the
residuals finds local highs and lows that can not be captured by the smooth
function of the GAM. Return to Q38 •

A39 : The range parameters are 14.2 (GLS) and 10.1 (OLS), so the GLS
range is about 40% longer. The partial sills are 43.8 (GLS) and 35.0 (OLS), so
the GLS sill is about 25% higher. This implies that GLS has removed less of
the local variation in the residuals than OLS, or in other words, OLS incorrectly
removed this variation. Return to Q39 •

A40 : The differences are quite small, almost all < ±2.5 m, with a range of
≈ -4 . . . 10 m. These are much smaller differences than between GAM and
GLS-RK. Differences are skewed to the positive, i.e., UK > GLS-RK.
This difference comes about because UK uses the fitted variogram from the
2nd-order OLS trend residuals, not from the 2nd-order GLS residuals. Because
the two trend surfaces are different, so are the residuals, and so are the fitted
models. Return to Q40 •

A Derivation of the OLS solution to the linear model

Recall the equation for OLS from §5.1:

$$y = X\beta + \varepsilon \qquad (17)$$

To solve Equation 17 we need an optimization criterion, i.e., what makes
a particular solution (values of β) better than any other. The obvious
criterion is to minimize the total error (lack of fit) as some function of
ε = y − Xβ; the goodness-of-fit is then measured by the size of this
error. A common way to measure the total error is by the sum of vector
norms; in the simplest case the Euclidean distance from the expected
value, which we take to be 0 in order to have an unbiased estimate. If we
decide that both positive and negative residuals are equally important,
and that larger errors are more serious than smaller, the vector norm is
expressed as the sum of squared errors, which in matrix algebra can be
written as:

$$S = (y - X\beta)^T (y - X\beta) \qquad (18)$$
which expands to:

$$S = y^T y - \beta^T X^T y - y^T X\beta + \beta^T X^T X\beta$$
$$S = y^T y - 2\beta^T X^T y + \beta^T X^T X\beta \qquad (19)$$

Note: $y^T X\beta$ is a 1 × 1 matrix, i.e., a scalar (the dimensions of the
matrix multiplication are (1 × n)(n × p)(p × 1)), so it is equivalent to its
transpose: $y^T X\beta = [y^T X\beta]^T = \beta^T X^T y$. So we can collect the two
identical 1 × 1 matrices (scalars) into one term.

This is minimized by finding the partial derivative with respect to the
unknown coefficients β, setting this equal to 0, and solving:

$$\frac{\partial}{\partial \beta^T} S = -2X^T y + 2X^T X\beta$$
$$0 = -X^T y + X^T X\beta$$
$$(X^T X)\beta = X^T y$$
$$(X^T X)^{-1}(X^T X)\beta = (X^T X)^{-1} X^T y$$
$$\hat{\beta}_{\mathrm{OLS}} = (X^T X)^{-1} X^T y \qquad (20)$$

which is the OLS solution.
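A quick numerical check of Equation 20 against lm, on invented data:

set.seed(42)
x <- runif(20)
y <- 2 + 3 * x + rnorm(20, sd = 0.2)
X <- cbind(1, x)                           # design matrix with intercept column
beta.ols <- solve(t(X) %*% X, t(X) %*% y)  # (X^T X)^{-1} X^T y
cbind(beta.ols, coef(lm(y ~ x)))           # the two columns agree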

B Standardized residuals

Standardized residuals (this is the term used by plot.lm; some authors
call these "studentized" residuals) adjust the residuals from a linear
regression model to residuals which should be distributed as N(0, 1) with
equal variance. These can then be compared to residuals drawn from that
theoretical distribution, for example in a quantile-quantile ("QQ") plot of
the standardized residuals.

The standardized residuals are computed as $r_i / (s \cdot \sqrt{1 - h_{ii}})$, where $r_i$
are the unstandardized residuals, s is the sample standard deviation of
the residuals, and the $h_{ii}$ are the diagonal entries of the so-called "hat"
matrix $V = X(X'X)^{-1}X'$.

The sample standard deviation of the residuals s is computed as the
square root of the estimated variance of the random error:

$$s = \sqrt{\frac{1}{(n-p)} \sum r_i^2}$$

where n is the number of observations and p the number of predictors.
It is shown in the linear model summary as "Residual standard error";
it can be extracted as summary(model_name)$sigma. This is an overall
measure of the variability of the residuals, and so can be used to
standardize the residuals to N(0, 1).
The "hat" matrix V is another way to look at linear regression. This
matrix multiplies the observed values to compute the fitted values. The
hat value for an observation gives the overall leverage (i.e., importance
when computing the fit) of that observation. So the term $\sqrt{1 - h_{ii}}$ in the
denominator shows that with low influence (small $h_{ii}$) the ratio $r_i/s$ (a
simple standardization) is not affected much, but with a high influence
(large $h_{ii}$) the denominator is smaller and so the standardized residual
is increased. Thus the standardized residuals are higher for points with
high influence on the regression coefficients.
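A sketch computing standardized residuals by hand, checked against the built-in rstandard function; the data are invented, for illustration only:

set.seed(42)
x <- runif(20); y <- 2 + 3 * x + rnorm(20, sd = 0.2)
m <- lm(y ~ x)
h <- hatvalues(m)                  # diagonal entries h_ii of the hat matrix
s <- summary(m)$sigma              # residual standard error
r.std <- residuals(m) / (s * sqrt(1 - h))
all.equal(unname(r.std), unname(rstandard(m)))   # TRUE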

C Theory of thin-plate splines

Hastie et al. [5, §5.7] explain the mathematics of multi-dimensional
smoothing splines. A more thorough mathematical treatment is given
by Wood [19] and Mitasova and Mitas [13]; these are developments from
the "minimum curvature" methods of Briggs [1]. Applications include
Hutchinson [7] and Mitasova and Hofierka [12].

Fitting a TPS depends on the k data points with known coördinates and
attribute values. They can be described by 2(k + 3) parameters, six of
which are overall affine transformation parameters (to center the function
in 2D) and 2k of which link to the control points.
The general method is to minimize the residual sum of squares (RSS) of
the fitted function, subject to a constraint that the function be "smooth"
in some sense; this is expressed by a roughness penalty which balances
the fit to the observations with smoothness. This is a minimization
problem. If $x_i$ is one point in 2D space (i.e., it has two coördinates) and
$y_i$ is the attribute value at the same point, the aim is to minimize:

$$\min_f \sum_{i=1}^{N} \{y_i - f(x_i)\}^2 + \lambda J[f] \qquad (21)$$

where J is the penalty function and λ controls how important it is; λ = 0
means there is no roughness penalty and the data will be fit exactly; as
λ → ∞ the solution approaches the least-squares plane, i.e., the trend
surface averaged over all the points.
In 2D an appropriate penalty is:

$$J[f] = \iint_{\mathbb{R}^2} \left[ \left(\frac{\partial^2 f(x)}{\partial x_1^2}\right)^2 + 2\left(\frac{\partial^2 f(x)}{\partial x_1 \partial x_2}\right)^2 + \left(\frac{\partial^2 f(x)}{\partial x_2^2}\right)^2 \right] dx_1\, dx_2 \qquad (22)$$

where (x1 , x2 ) are the two coördinates of the vector x. In practice the
double integral is discretized over some grid known as knots; these
may be defined by the observations or may be a different set, maybe
an evenly-spaced grid.
This penalty can be interpreted as the "bending energy" of a thin plate
represented by the function f(x); by minimizing this energy the spline
function over the 2D plane is a thin (flexible) plate which, according
to the first term of Equation 21, would be forced to pass through the data
points, with minimum bending. However, the second term of Equation
21 allows some smoothing: the plate does not have to bend so much,
since it is allowed to pass "close to" but not necessarily through the data
points. The higher the λ, the less exact is the fit.

This has two purposes: (1) it allows for measurement error; the data
points are not taken as exact; (2) it results in a smoother surface. So
cross-validation is used to determine the degree of smoothness.
The solution to this minimization (Equation 21, with the penalty of
Equation 22) is a linear function:

$$f(x) = \beta_0 + \beta^T x + \sum_{j=1}^{N} \alpha_j h_j(x) \qquad (23)$$

where the β account for the overall trend and the α are the coefficients
of the warping.
The set of functions $h_j(x)$ is the basis kernel, also called a radial basis
function (RBF), for thin-plate splines:

$$h_j(x) = \|x - x_j\|^2 \log \|x - x_j\| \qquad (24)$$

where the norm distance $r = \|x - x_j\|$ is also called the radius of the basis
function. The norm is usually the Euclidean (straight-line) distance.

D Theory of GLS and REML

Here we present the theory of Generalized Least Squares (GLS) estimation
of the parameters of the linear model (§D.1), the specific case of GLS
with spatially-correlated residuals (§D.2), as well as the estimation of the
covariance structure of the linear model residuals (§D.3).

D.1 GLS

The difference between the Generalized Least Squares (GLS) and Ordinary
Least Squares (OLS) solutions to the linear model is that in the linear
model fit by OLS, the residuals ε are assumed to be independently and
identically distributed with the same variance σ²:

$$y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I) \qquad (25)$$

Whereas, in GLS the residuals are themselves considered to be a random
variable η that has a covariance structure:

$$y = X\beta + \eta, \qquad \eta \sim \mathcal{N}(0, V) \qquad (26)$$

where V is a positive-definite variance-covariance matrix of the model
residuals.
In modelling terminology, the coefficients β are called fixed effects, be-
cause their effect on the response variable is fixed once the parameters
are known. By contrast the covariance parameters η are called random
effects, because their effect on the response variable is stochastic, de-
pending on a random variable with these parameters. Models with the
form of Equation 26 are called mixed models: some effects are fixed (in
the example of this tutorial, the relation between coördinates and the
aquifer elevation) and others are random (here, the error variances) but
follow a known structure. These models have many applications and are
extensively discussed in Pinheiro and Bates [16].
Lark and Cullis [10, Appendix] point out that the error vectors can now
not be assumed to be spherically distributed in feature space around the
0 expected value, but rather that error vectors in some directions are
longer than in others. So, the measure of distance (the vector norm) is
now a so-called "generalized" distance (closely related to the Mahalanobis
distance), taking into account the covariance between error vectors:

$$S = (y - X\beta)^T V^{-1} (y - X\beta) \qquad (27)$$

The OLS equivalent is simpler:

$$S = (y - X\beta)^T (y - X\beta) \qquad (28)$$

Comparing these equations, we see that the GLS formulation of Equation
27 includes the variance-covariance matrix of the residuals V = σ²C,
where σ² is the variance of the residuals and C is the correlation matrix.
This reduces to the OLS formulation of Equation 28 if there is no
covariance, i.e., C = I (so V = σ²I).
The covariances can be based on any relation, for example, time
dependence (e.g., correlation between measurements that are close in time),
repeated measures of the same object, or spatial dependence (the case
here; see the next sub-section for details).
Expanding Equation 27, taking the partial derivative with respect to the
parameters, setting equal to zero, and solving, we obtain:

$$\frac{\partial}{\partial \beta} S = -2X^T V^{-1} y + 2X^T V^{-1} X\beta$$
$$0 = -X^T V^{-1} y + X^T V^{-1} X\beta$$
$$\hat{\beta}_{\mathrm{GLS}} = (X^T V^{-1} X)^{-1} X^T V^{-1} y \qquad (29)$$

This reduces to the OLS estimate $\hat{\beta}_{\mathrm{OLS}}$ if there is no covariance, i.e., V = σ²I.
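A sketch of Equation 29 on invented data, with an arbitrary (AR(1)-like) correlation structure standing in for V:

set.seed(1)
n <- 10
x <- runif(n); X <- cbind(1, x)
V <- 0.5^abs(outer(1:n, 1:n, "-"))            # illustrative covariance matrix
y <- X %*% c(2, 3) + t(chol(V)) %*% rnorm(n)  # errors with covariance V
Vi <- solve(V)
(beta.gls <- solve(t(X) %*% Vi %*% X, t(X) %*% Vi %*% y))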

D.2 GLS with spatially-correlated residuals

In the case that the residuals are spatially-correlated, the covariances
(off-diagonals) in the V matrix are typically based on the distance between
observations, using some model of spatial correlation. We ensure
positive-definiteness (i.e., always a real-valued solution) by using an
authorized spatial covariance function C and assuming that the entries are
completely determined by the vector distance between points $x_i - x_j$ (so
this distance measure can take into account angles, i.e., anisotropy):

$$C_{i,j} = C(x_i - x_j) \qquad (30)$$

In this formulation C has a three-parameter vector θ, as does the
corresponding variogram model: the range a, the total sill σ², and the
proportion of total sill due to pure error, not spatial correlation, s (in
variogram terms, this is the nugget variance $c_0$ as a proportion of the
total sill $(c_0 + c_1)$).

Here the random effect η represents both the spatial structure of the
residuals from the fixed-effects model, and the unexplainable (short-range)
noise. This latter corresponds to the noise σ² of the linear model
of Equation 25.
To solve Equation 29 we first need to compute V, i.e., estimate the
covariance parameters θ = [σ², s, a], use these to compute C with Equation
30 and from this V, after which we can use Equation 29 to estimate the
fixed effects β. But θ is estimated from the residuals of the fixed-effects
regression, which has not yet been computed. How can this "chicken-and-egg"
computation be solved?

The answer is to use residual (sometimes called "restricted") maximum
likelihood (REML) to maximize the likelihood of the random effects θ
independently of the fixed effects β.

Here we fit the fixed effects (regression coefficients) at the same time as
we estimate the spatial correlation.
Lark and Cullis [10, Eq. 12] show that the likelihood of the parameters
in Equation 25 can be expanded to include the spatial dependence implicit
in the variance-covariance matrix V, rather than a single residual
variance σ². The log-likelihood is then:

$$\ell(\beta, \theta \mid y) = c - \frac{1}{2}\log|V| - \frac{1}{2}(y - X\beta)^T V^{-1} (y - X\beta) \qquad (31)$$
where c is a constant (and so does not vary with the parameters) and V
is built from the variance parameters θ and the distances between the
observations. By assuming second-order stationarity (that is, that the
covariance structure is the same over the entire field, and only depends
on the vector distance between pairs of points), the structure can be
summarized by the covariance parameters θ = [σ², s, a], i.e., the total sill,
nugget proportion, and range.
However, maximizing this likelihood for the random-effects covariance
parameters θ also requires maximizing in terms of the fixed-effects re-
gression parameters β, which in this context are called nuisance parame-
ters since at this point we don’t care about their values; we will compute
them after determining the covariance structure.

D.3 REML estimation of the covariance parameters

Both the covariance and the nuisance parameters β must be estimated,
it seems at the same time (the "chicken and egg" problem), but in fact the
technique of REML can be used to first estimate θ without having to
know the nuisance parameters. Then we can use these to compute C
with Equation 30 and from this V, after which we can use Equation 29 to
estimate the fixed effects β.
The maximum likelihood estimate of θ is thus called "restricted", because
it only estimates the covariance parameters (random effects). Conceptually,
REML estimation of the covariance parameters θ is ML estimation
of both these and the nuisance parameters β, with the latter integrated
out:

$$\ell(\theta \mid y) = \int \ell(\beta, \theta \mid y)\, d\beta \qquad (32)$$

Pinheiro and Bates [16, §2.2.5] show how this is achieved, given a
likelihood function, by a change of variable to a statistic sufficient for β.

References

[1] I. C. Briggs. Machine contouring using minimum curvature. Geophysics, 39(1):39–48, January 1974. ISSN 0016-8033, 1942-2156. doi: 10.1190/1.1440410.

[2] N. Cressie. Statistics for spatial data. John Wiley & Sons, revised edition, 1993.

[3] J. C. Davis. Statistics and data analysis in geology. John Wiley & Sons, New York, 3rd edition, 2002.

[4] J. Fox. Applied regression, linear models, and related methods. Sage, Newbury Park, 1997.

[5] T. Hastie, R. Tibshirani, and J. H. Friedman. The elements of statistical learning: data mining, inference, and prediction. Springer series in statistics. Springer, New York, 2nd edition, 2009. ISBN 9780387848587.

[6] Jan M. Hoem. The reporting of statistical significance in scientific journals: A reflexion. Demographic Research, 18(15):437–442, Jun 2008. ISSN 1435-9871. doi: 10.4054/DemRes.2008.18.15.

[7] M. F. Hutchinson. Interpolating mean rainfall using thin plate smoothing splines. International Journal of Geographical Information Science, 9(4):385–403, 1995.

[8] R. Ihaka and R. Gentleman. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314, 1996.

[9] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An introduction to statistical learning: with applications in R. Number 103 in Springer texts in statistics. Springer, 2013. ISBN 9781461471370.

[10] R. M. Lark and B. R. Cullis. Model based analysis using REML for inference from systematically sampled data on soil. European Journal of Soil Science, 55(4):799–813, 2004.

[11] Virginia L. McGuire. Water-level and recoverable water in storage changes, High Plains aquifer, predevelopment to 2015 and 2013–15. USGS Numbered Series 2017-5040, U.S. Geological Survey, Reston, VA, 2017.

[12] H. Mitasova and J. Hofierka. Interpolation by regularized spline with tension: II. Application to terrain modeling and surface geometry analysis. Mathematical Geology, 25(6):657–669, 1993. doi: 10.1007/BF00893172.

[13] H. Mitasova and L. Mitas. Interpolation by regularized spline with tension: I. Theory and implementation. Mathematical Geology, 25(6):641–655, 1993. doi: 10.1007/BF00893171.

[14] R. A. Olea and J. C. Davis. Sampling analysis and mapping of water levels in the High Plains aquifer of Kansas. Technical Report KGS Open File Report 1999-11, Kansas Geological Survey, May 1999. URL http://www.kgs.ku.edu/Hydro/Levels/OFR99_11/.

[15] R. A. Olea and J. C. Davis. Optimization of the high plains aquifer water-level observation network. Technical Report KGS Open File Report 1999-15, Kansas Geological Survey, May 1999. URL http://www.kgs.ku.edu/Hydro/Levels/OFR99_15/.

[16] J. C. Pinheiro and D. M. Bates. Mixed-effects models in S and S-PLUS. Springer, 2000. ISBN 0387989579.

[17] R Development Core Team. R Data Import/Export. The R Foundation for Statistical Computing, version 3.6.2 (2019-12-12) edition, 2015. URL http://cran.r-project.org/doc/manuals/R-data.pdf.

[18] W. N. Venables, D. M. Smith, and R Development Core Team. An introduction to R; notes on R: a programming environment for data analysis and graphics; Version 3.6.2. R Foundation for Statistical Computing, Dec 2019. ISBN 3-900051-12-7. URL http://www.R-project.org.

[19] S. N. Wood. Thin plate regression splines. Journal of the Royal Statistical Society Series B (Statistical Methodology), 65:95–114, 2003. doi: 10.1111/1467-9868.00374.
