ex_TrendSurface
D G Rossiter
Cornell University, Section of Soil & Crop Sciences
ISRIC–World Soil Information
Contents
1 Introduction
3 Dataset
3.1 Loading and adjusting the dataset
3.2 Making a spatial object
4 Exploratory analysis
12 GLS-Regression Kriging
14 Discussion
15 Answers
B Standardized residuals
References
1 Introduction
This exercise shows how to compute trend surfaces using the R environ-
ment for statistical computing [8, 18].
A trend surface is a map of some continuous variable, computed as a
function of the coördinates. This corresponds to the concept of a geo-
graphic trend, where the variable changes its value along a geographic
gradient.
The trend can be modelled as a linear trend, i.e., the variable increases
or decreases a fixed amount for each unit change in the coördinates in
some direction. This is called a first-order trend surface (§5.1). It can
also be modelled as a polynomial trend, i.e., a linear model of some poly-
nomials of the coördinates, for example, a quadratic, which is called a
second-order trend surface (§5.2). It can also be modelled as an empirical
smooth function of the coördinates, for example a generalized additive
model (§7 ) or a minimum-curvature surface (thin-plate spline) (§8).
The residuals from any of the above approaches may have spatial struc-
ture (§9). This has two implications:
1. The OLS fit may not be optimal, and a Generalized Least Squares
(GLS) trend should be fit (§10).
2. The OLS or GLS trend surfaces can be modified by (1) interpolating
the residuals from the trend-surface fit (§11) and (2) adding these
to the trend.
3. The trend and local deviations can be modelled together with Uni-
versal Kriging (UK) (§13).
In this exercise we compare these different approaches.
The require or library functions are used to load R packages.
Note: You can also load these via checkboxes in the "Packages" pane of RStudio (https://www.rstudio.org).
require(sp)
require(gstat)
require(lattice)
R has many options, which can be listed with the options function. Here
we use this function to change the default option of showing the so-called
"significance stars" in model summaries. These stars (*, **, ***) for various
levels of the "significance level" α (interpreted as the probability of incorrectly
rejecting a true null hypothesis of no effect) have been widely criticized because
they are a lazy way to assess the success of models and the importance of
predictors; this is part of a major debate about how statistics should be used
to draw conclusions about the "real world", see for example [6].
options(show.signif.stars=FALSE)
3 Dataset

Figure 1 is taken from the original report [14]. It shows the location of
wells, the boundary of the aquifer, and the well IDs. The example dataset
uses a small portion of this, in the SE corner of the study area (portions of
Pratt, Kingman, Stafford and Reno counties). Figure 2 is a Google Earth
view of part of the study area, with the location of several of the wells as
placemarks.

The datasets for this book are available at http://www.kgs.ku.edu/Mathgeo/Books/Stat/index.html.
See also the Kansas Geological Survey (http://www.kgs.ku.edu) and its water-level
monitoring pages (http://www.kgs.ku.edu/Magellan/WaterLevels/).
Figure 1: Location of aquifer monitoring wells, SE Kansas (USA). Source: [14], plate 1
Figure 2: Google Earth view of part of the study area, with the location of several of the
wells as placemarks
You can view the data file (AQUIFER.TXT) from within RStudio, by opening it
from the Files pane.
The first few lines look like this:
UTM easting UTM northing Water Table, ft.
569464.5 4172114.75 1627.66
573151.25 4167192.75 1588.83
559973.94 4169585 1675.72
553514.44 4174584.5 1689.52
The field names are self-explanatory.
The Coördinate Reference System (CRS) is not specified, although we can
guess from the field headers that the projection is Universal Transverse
Mercator (UTM). The UTM zone is 14N (see Davis [3, Fig. 5-100 caption]) and
the coördinates are defined in the UTM system as meters North from the
equator and East from a false origin of 500 000 m at the central meridian
of the UTM zone. For zone 14 this is 99° W.
However to fully know the CRS, we must know the datum on which the
projection is developed. This is not explained in the several Kansas Ge-
ological Survey reports; however a more comprehensive report covering
the entire High Plains [11], states that the datum is the North American
Datum of 1983 (NAD 83).
The aquifer elevation is in US feet (1 foot = 0.3048 m exactly) above mean sea
level according to an unspecified vertical datum (probably NAVD 88).
Task 5 : Read text file AQUIFER.TXT into an R data frame, rename the
columns to shorter names, and examine its structure. •
The read.table function can read many kinds of tabular data. It has
many arguments, to adjust to different text formats. See the R Data
Import/Export Manual [17] for details.
By default the data fields in the text file are assumed to be separated by
white space (tabs, spaces), as is the case here. Another optional argument
is skip; we use it here because the header line of AQUIFER.TXT has more
spaces than the other lines, so if we try to use the header for the variable
names, R thinks the other lines are incomplete. One solution would be
to place quotes around the variable names, or rename the variables, in
the text file. What we do here is skip the first line and assign variable
names ourselves in R.
We name the R data frame aq:
aq <- read.table("AQUIFER.TXT", skip=1,
                 col.names=c("UTM.E", "UTM.N", "z"))  # names reconstructed to match their later use
str(aq)
Task 6 : Convert the elevation in feet above sea level to elevation in me-
ters above sea level (m.a.s.l.), and add it as a new field in the dataframe.
•
ft.to.m <- 0.3048
aq$zm <- aq$z * ft.to.m
Second, the E and N coördinates give the location in UTM zone 14N,
but for numerical stability it is useful to reduce these to local coördinates,
with the (0, 0) point near the middle of the range, and, because the numbers
are large, to convert to km. This will make the equations easier to read.
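The code that computes these local coördinates is not shown in the extract. A minimal sketch, assuming the offsets are taken from the mean of each coördinate (which approximately reproduces the ranges shown later), and noting that later code refers to these both by the lower-case names e, n and the upper-case names E, N:

## local coördinates in km, centred near the middle of the study area
aq$e <- (aq$UTM.E - mean(aq$UTM.E))/1000
aq$n <- (aq$UTM.N - mean(aq$UTM.N))/1000
## upper-case copies, since some later code uses E and N
aq$E <- aq$e; aq$N <- aq$n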
For this section we need to make the dataset into an explicitly spatial
data structure. A spatial object, for the sp package, is one that has
explicit coördinates. The aq dataframe does have coördinates, but “hid-
den” as attributes. These in fact have a special status. To continue the
analysis, we identify these explicitly as being spatial.
str(aq)
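The conversion itself is not shown in the extract. A minimal sketch using the sp coordinates() replacement function; the object name aq.sp matches its later use:

## make a copy and declare which fields are the coördinates;
## this promotes the data.frame to a SpatialPointsDataFrame
aq.sp <- aq
coordinates(aq.sp) <- ~ e + n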
str(aq.sp)
This structure display is quite different from the previous one. The
object now is of class SpatialPointsDataFrame and has five slots,
marked with the @ symbol.
The information in the original dataframe is now clearly split into two
kinds: the coördinates (in the coords slot) and the attributes (in the data slot).
Note: The structure shows the CRS as the proj4string field; this refers
to the PROJ (https://proj.org/) generic coordinate transformation software's
specification of the CRS. We can display this directly with the proj4string function.
proj4string(aq.sp)
## [1] NA
This is now listed as NA, “not available”, because there was no CRS infor-
mation in the text file from which we obtained the coördinates. In the
analysis of this tutorial we do not need to specify a CRS, since we just
work with the numbers as independent (predictor) variables.
However, if we want to later use this dataset with other spatial data, we should
specify the CRS. Above we determined the CRS is UTM zone 14N on the NAD83
datum. To specify this we find the EPSG code for this CRS in the EPSG database
(https://epsg.org/) and discover this is code 26914. See Figure 3; note that
UTM 14N has been defined on several datums.
Then we update the CRS using the CRS function, with a string describing
the CRS, in this case an initialization based on the EPSG code.
proj4string(aq.sp) <- CRS("+init=epsg:26914")
print(proj4string(aq.sp))
We’ve done some work to get this data set into proper form for spatial
analysis; so we save it in this format.
This can be read into a later R session with the load method.
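The save step itself is not shown; a minimal sketch (the file name is an assumption):

save(aq, aq.sp, file="aq_trend.RData")
## in a later session:
## load("aq_trend.RData")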
Figure 3: Finding a code in the EPSG database
4 Exploratory analysis
As with any unfamiliar dataset, the first step is to examine its contents.
In the case of spatially-explicit datasets, that includes visualizing its ge-
ography.
Q4 : What are the geographic limits of the study area? What is its area,
in km2? Jump to A4 •
The range function computes the range of a numeric variable; the diff
function computes the difference between two numeric values.
range(aq$UTM.E)
range(aq$UTM.N)
## area of the bounding box, km^2 (computation reconstructed, not shown in the original extract)
diff(range(aq$UTM.E))/1000 * diff(range(aq$UTM.N))/1000
## [1] 7263.444
From the spatial version of the dataset we can compute the bounding
box with the bbox function.
bbox(aq.sp)
## min max
## e -33.00497 41.06325
## n -46.99025 51.07400
Task 11 : Find the location of this sample area in the large study area,
shown in Fig. 1. •
Task 12 : Display a text postplot of the data values, showing the eleva-
tions, rounded to the nearest foot, as text labels centred at the observa-
tion point. •
We use the two coördinates as plot axes, so this looks like a map:
plot(aq$UTM.N ~ aq$UTM.E, pch=20, cex=0.2, col="blue", asp=1,
xlab="UTM 14N E", ylab="UTM 14N N")
grid()
text(aq$UTM.E, aq$UTM.N, round(aq$zm), adj=c(0.5,0.5))
title("Elevation of aquifer, m")
The aquifer elevation is clearly higher in the west (towards the Rocky
Mountains about 650 km to the west, where it outcrops).
Note: Plotting character (pch) 21 has both a border (col) and fill (bg) colour.
Task 14 : Display a graphical postplot of the data values, with size and
colour proportional to the data value •
Notice the use of the rank function to give the rank order of the eleva-
tions; these are then used as indices into a vector of colours, created with
the bpy.colors function, of the same length as the vector of elevation
values.
The ~ formula operator shows the functional relation between two variables;
here it is the North coördinate for the y-axis, depending on the
East coördinate for the x-axis.
plot(aq$UTM.N ~ aq$UTM.E, pch=21,
xlab="UTM 14N E", ylab="UTM 14N N",
bg=bpy.colors(length(aq$zm))[rank(aq$zm)],
cex=1.8*aq$zm/max(aq$zm), asp=1)
grid()
title("Elevation of aquifer, m.a.s.l.")
Q6 : Describe the spatial pattern of the elevations. Do nearby points
have similar values? Is there a trend across the whole area? Are there
local exceptions to the trend? Jump to A6 •
A trend surface has the same form as a standard linear model, using
the coördinates as regression predictors. The first-order trend surface
model has the form:
z = β0 + β1 E + β2 N + ε (1)
where ε ∼ N (0, σ 2 ), i.e., independently and normally distributed. This
assumption allows us to fit the trend surface with Ordinary Least Squares
(OLS).
In the linear model, with any number of predictors, there is an n × p
design matrix of predictor values, usually written as X, with one row per
observation (data point), i.e., n rows, and one column per predictor, i.e.,
p columns. In the first-order trend surface case, it is an n × 3 matrix with
three columns: (1) a column of 1s representing the intercept, to center
the response, (2) a column of predictor values e_i from the Easting, and
(3) a column of predictor values n_i from the Northing.
The predictand (response variable), here the aquifer elevation, is an n × 1
column vector y, one row per observation. The coefficient vector β is a
p × 1 column vector, i.e., one row per predictor (here, 3). This multiplies
the design matrix to produce the response:
y = Xβ + ε (2)
For this dataset with many observations well-spread in space, the result
will be similar to the OLS estimate.
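As an aside, the OLS solution derived in Appendix A, β̂ = (Xᵀ X)⁻¹ Xᵀ y, can be computed directly from this design matrix; a small sketch (not part of the original exercise) that can be checked against the lm fit shown just below:

## build the design matrix [1, n, e] and solve the normal equations
X <- model.matrix(~ n + e, data=aq)
beta.ols <- solve(t(X) %*% X, t(X) %*% aq$zm)
beta.ols     # should match coef(model.ts1) from the lm fit below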
model.ts1 <- lm(zm ~ n + e, data=aq)
summary(model.ts1)
##
## Call:
## lm(formula = zm ~ n + e, data = aq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.3550 -5.8267 0.2674 7.1062 16.7349
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 554.77509 0.68478 810.15 <2e-16
## n -0.03336 0.02528 -1.32 0.189
## e -1.61714 0.03201 -50.51 <2e-16
##
## Residual standard error: 8.629 on 158 degrees of freedom
## Multiple R-squared: 0.9417,Adjusted R-squared: 0.941
## F-statistic: 1276 on 2 and 158 DF, p-value: < 2.2e-16
Q9 : What is the equation of the trend surface? How does elevation vary
with the E and N coördinates? Is the relation statistically-significant?
How much of the total variability does it explain? Are all the coefficients
statistically-significant? Jump to A9 •
Task 16 : Summarize the residuals (lack of fit) from the trend surface
both numerically and graphically, in feature space. Express this in terms
of the median elevation. •
The residuals function extracts the residuals from a linear model ob-
ject. The hist function displays a histogram of a numeric vector.
res.ts1 <- residuals(model.ts1); summary(res.ts1)
max(abs(res.ts1))/median(aq$zm)*100
## [1] 4.586769
Q10 : What is the range of residuals? How does this compare with
the target variable? How are the residuals distributed in feature space?
Jump to A10 •
par(mfrow=c(1,2))
plot(model.ts1, which=1:2)
par(mfrow=c(1,1))
Q11 : Does this model meet the feature-space requirements for a valid
linear model?
1. No relation between the fitted values and the residuals;
2. Equal spread of residuals across the range of fitted values;
3. Normally-distributed standardized residuals.
Jump to A11 •
Note: In the following code, the expression for the cex "character expansion"
optional argument sets the size of each circle as that residual's proportion
of the maximum absolute residual, so that larger absolute residuals show as
larger circles. This way we can visualize where the largest over- and
under-predictions are. The ifelse call for the col "color" optional argument
sets the colour of the circle according to whether the residual is positive
(under-prediction) or negative (over-prediction).
plot(aq$n ~ aq$e, cex=3*abs(res.ts1)/max(abs(res.ts1)),
col=ifelse(res.ts1 > 0, "green", "red"),
xlab="E", ylab="N",
main="Residuals from 1st-order trend",
sub="Positive: green; negative: red", asp=1)
grid()
Q12 : Is there a spatial pattern to the residuals? Is there local spatial
correlation without an overall pattern? What does this imply about the
suitability of a first-order trend surface? Jump to A12 •
We see from the pattern of residuals from the first-order surface that
there is still structure, in particular clear bands of positive and negative
residuals. These suggest that a higher-order trend surface might fit bet-
ter. A higher-order trend might also fix the problems with the regression
diagnostics.
A second-order trend includes linear and quadratic (squared) functions
of the coördinates.
A full second-order surface uses the coördinates, their squares, and their
cross-products.
z = β0 + β1 E + β2 N + β3 E² + β4 N² + β5 (E·N) + ε (4)
model.ts2 <- lm(zm ~ n + e + I(n^2) + I(e^2) + I(e*n),
data=aq)
summary(model.ts2)
##
## Call:
## lm(formula = zm ~ n + e + I(n^2) + I(e^2) + I(e * n), data = aq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.847 -3.366 0.822 3.538 14.807
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.611e+02 7.999e-01 701.411 < 2e-16
## n -1.655e-02 1.664e-02 -0.995 0.321
## e -1.621e+00 2.212e-02 -73.274 < 2e-16
## I(n^2) -7.500e-03 6.435e-04 -11.655 < 2e-16
## I(e^2) -1.648e-03 1.074e-03 -1.534 0.127
## I(e * n) 6.700e-03 7.781e-04 8.610 7.74e-15
##
## Residual standard error: 5.598 on 155 degrees of freedom
## Multiple R-squared: 0.9759,Adjusted R-squared: 0.9751
## F-statistic: 1256 on 5 and 155 DF, p-value: < 2.2e-16
Note: The I "identity" function must be used for the squared and cross-product
terms, because inside a model formula the ^ and * symbols have special
meanings (crossing and interaction); wrapping the terms in I() makes R treat
them as the usual mathematical operators.
Q14 : How much of the variance does the second-order surface explain?
Jump to A14 •
We compare the two trend surfaces with an analysis of variance; note that the
second-order model has more predictors, and thus fewer residual degrees of
freedom. (The call is not shown in the original extract; it is reconstructed here.)
anova(model.ts1, model.ts2)
## Model 1: zm ~ n + e
## Model 2: zm ~ n + e + I(n^2) + I(e^2) + I(e * n)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 158 11763.5
## 2 155 4858.2 3 6905.3 73.438 < 2.2e-16
res.ts2 <- residuals(model.ts2)   # extract the residuals (assignment not shown in the original extract)
hist(res.ts2); rug(res.ts2)
max(abs(res.ts2))/median(aq$zm)
## [1] 0.03590352
Q16 : What is the range of residuals? How does this compare with the
target variable? How are they distributed in feature space? How do these
compare with the residuals from the first-order surface? Jump to A16 •
Task 22 : Show the diagnostic plots of the residuals, as for the first-
order trend surface residuals. •
par(mfrow=c(1,2))
plot(model.ts2, which=1:2)
par(mfrow=c(1,1))
Q17 : Does this model meet the feature-space requirements for a valid
linear model? How do these diagnostics compare to those from the first-
order surface?
1. No relation between the fitted values and the residuals;
2. Equal spread of residuals across the range of fitted values;
3. Normally-distributed standardized residuals.
Jump to A17 •
Q18 : Is there an overall pattern to the residuals? Is there local spa-
tial correlation without an overall pattern? Does there seem to be any
anisotropy (stronger spatial dependence in one direction than the or-
thogonal direction)? Jump to A18
•
Since this second-order trend surface is much better than the first-order
trend surface, we will use it for subsequent modelling.
We now use the trend surface model of the previous section to predict
over the study area, discretized as an interpolation grid at some resolu-
tion that we choose.
Note: The choice of grid resolution depends on (1) the support of the
observations used to model the trend surface, (2) the resolution needed
by the map user; (3) for larger areas, computer memory and processing
time.
Here the observations are essentially point support, so there is no mini-
mum grid size. The map user will use this to decide on whether to drill
a well into the aquifer, based on cost which depends on the depth from
the surface. In this area the surface elevation is quite uniform and does
not vary much. Also, the aquifer in this area does not have sharp changes
in structure: it is gently dipping to the E, with a slight doming. We know
from the 1st-order OLS model that for each km E (direction of maximum
dip) the aquifer elevation decreases by only about 1.6 m, which we suppose is
hardly significant to the well driller. Indeed, a coarser grid would probably
be sufficient.
Then use these, rounded below and above to the nearest kilometer, as
the limits of the two axes:
seq.e <- seq(-33, 42, by=1)
seq.n <- seq(-47, 52, by=1)
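The construction of the prediction grid from these two sequences is not shown in the extract; a minimal sketch with expand.grid (the object name grid matches its later use):

## all combinations of the e and n sequences: 76 x 100 = 7600 points
grid <- expand.grid(e=seq.e, n=seq.n)
str(grid)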
[Figure: the prediction grid points, plotted on the local coördinates (grid$e on the horizontal axis)]
Compute both the best fit and a 95% prediction interval for each point
on the grid. •
The predict.lm function, applied to a linear model object, computes
the predicted values at new locations, in this case the regular grid. The
optional interval argument specifies that a prediction interval, as well
as the best fits, should also be computed. The optional level argu-
ment specifies the (1 − α) probability, where α is the probability that, on
repeated calculation from a similar sample, the true value at the point
would not be included in the computed prediction interval.
pred.ts2 <- predict.lm(model.ts2, newdata=grid,
interval="prediction", level=0.95)
summary(pred.ts2)
The predict.lm function produces three fields in the resulting object: fit (the
best-fit value), lwr (the value at the lower 2.5% limit) and upr (the value
at the upper 97.5% limit).
The prediction interval is a range in which future observations are expected
to fall, with a given probability specified by the analyst. It is
based on the known observations and the regression model.
There are two sources of prediction error:
1. The uncertainty of fitting the best regression parameters from the
available data;
2. The uncertainty in the prediction, even with perfect regression pa-
rameters, because of uncertainty in the process which is revealed
by the regression, i.e., the inherent noise in the process.
The prediction interval is computed from the prediction variance, which
is then assumed to represent the variance of a t-distribution.
The prediction variance s_{Y_0}^2 for a prediction at x_0 depends on the variance of
the regression s_{Y \cdot x}^2 but also on the distance of the predictor x_0 from the
value of the predictor at the centroid of the regression, \bar{x}. The further
from the centroid, the more any error in estimating the slope of the line
will affect the prediction:

s_{Y_0}^2 = s_{Y \cdot x}^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]    (5)

where x refers to both coördinates.
The variance of the regression s_{Y \cdot x}^2 is computed from the squared deviations
of the actual (y_i) and estimated (\hat{y}_i) values:

s_{Y \cdot x}^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2    (6)
To display a map of the interpolated surface, it is easiest to format the
grid as a spatial object, so that the spplot "spatial plot" method
can be used.
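The construction of sp.grid is not shown in the extract. A minimal sketch that combines the grid coördinates with the three prediction columns and promotes the result to a gridded sp object:

## combine coördinates and predictions, then declare the spatial structure
sp.grid <- data.frame(grid, pred.ts2)    # columns e, n, fit, lwr, upr
coordinates(sp.grid) <- ~ e + n
gridded(sp.grid) <- TRUE                 # a regular grid (SpatialPixelsDataFrame)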
summary(sp.grid)
Task 27 : Display the best-fit interpolation, with the data points super-
imposed. •
The spplot “spatial plot” method plots spatial objects, i.e., those in one
of the sp classes.
The fit field of the prediction object contains the trend surface fits.
We save this plot for comparison later with the Generalized Least Squares
(GLS) trend surface (§10).
ts.plot.breaks <- seq(440, 640, by=5)
p.ols <- spplot(sp.grid, zcol="fit",
sub="2nd-order trend, OLS fit",
main="Aquifer elevation, m.a.s.l.",
xlab="East", ylab="North",
at=ts.plot.breaks,
col.regions = topo.colors(length(ts.plot.breaks)),
panel=function(x, ...) {
panel.levelplot(x, ...);
panel.points(coordinates(aq.sp), pch=1,
col=ifelse(res.ts2 < 0, "red", "black"),
cex=2*abs(res.ts2)/max(abs(res.ts2)))
})
print(p.ols)
[Figure: map of the OLS 2nd-order trend surface (Aquifer elevation, m.a.s.l.; 2nd-order trend, OLS fit); East vs. North]
In this plot the residual from the model at each observation point is
shown in colour (red = negative, actual < predicted; black = positive,
actual > predicted) and with symbol size proportional to its absolute value.
If a prediction is exactly on the trend surface it will
not appear. This gives a nice visualization of the fit of the trend surface
to the sample points.
Note: The spplot method in the sp package makes use of the levelplot
method of the lattice graphics package. Unlike base graphics, in lattice
all plotting must be done at once; you can’t start a plot and add more
later. Graphical elements are added with a panel function, introduced
with the panel argument. This function may contain many methods to
draw graphic elements. In this case there is panel.levelplot to draw
the levelplot (trend surface) and panel.points to place a set of points
on top of it.
Q19 : How well does the trend surface fit the points? Are there obvious
problems? Jump to A19 •
sp.grid$diff.range <- sp.grid$upr - sp.grid$lwr   # width of the prediction interval (assignment reconstructed)
summary(sp.grid$lwr)
summary(100*sp.grid$diff.range/sp.grid$fit)
The lwr "lower" and upr "upper" fields of the prediction object contain
the lower and upper limits of the 95% prediction interval for each point
on the grid. Their difference is the width of the interval, i.e., the range of
uncertainty; here it is expressed as a percentage of the fitted value.
Note: Here we only show the locations of the observations, because their
values do not affect the prediction variance, once the model is fit.
Q20 : What are the units of the prediction interval? How large is it?
How does this compare to the variable we are trying to predict? Jump
to A20 •
Task 30 : Display scatterplots of the two coördinates against the
aquifer elevation, with an empirical smoother. •
We use the ggplot2 graphics package to produce the scatterplot and
show a smoother with standard error. A simple way to visualize the
trend is with a local polynomial regression, provided with the loess
function, and incorporated into the scatterplot with the geom_smooth
function.
Note: The loess function has a span argument, which controls the
degree of smoothing by setting the neighbourhood for the local fit as a
proportion of the number of points. The default span=0.75 thus uses
the 3/4 of the points closest to each point. These are then weighted so
that closer points have more weight; see ?loess for details. The default
works well in most situations, and here we only want a visual impression,
not a "best fit" in a statistical sense.
require(ggplot2)
g1 <- ggplot(aq, aes(x=UTM.N, y=zm)) +
geom_point() +
geom_smooth(method="loess")
g2 <- ggplot(aq, aes(x=UTM.E, y=zm)) +
geom_point() +
geom_smooth(method="loess")
require(gridExtra)
grid.arrange(g1, g2, ncol = 2)
Q22 : Describe the relation of the aquifer elevation to each of the coördinates. Jump to A22 •
GAM can be fit in R with the gam function of the mgcv “Mixed GAM Com-
putation Vehicle” package. This specifies the model with a formula, as
with lm, but terms can now be arbitrary functions of predictor variables,
not just the variables themselves or simple transformations that apply
to the whole range of the variable, e.g. sqrt or log. Smooth functions
of one or more variables are specified with the s function of the mgcv
package.
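The model-fitting call itself is not shown in the extract; a minimal sketch, with the formula taken from the model summary below (E and N are assumed to be the local km coördinates):

require(mgcv)
model.gam <- gam(zm ~ s(E, N), data=aq)
summary(model.gam)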
##
## Family: gaussian
## Link function: identity
##
## Formula:
## zm ~ s(E, N)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 551.0134 0.2817 1956 <2e-16
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(E,N) 27.35 28.83 542.2 <2e-16
##
## R-sq.(adj) = 0.99 Deviance explained = 99.2%
## GCV = 15.506 Scale est. = 12.776 n = 161
Q23 : How well does this model fit the calibration observations? Jump
to A23 •
Task 33 : Compare the residuals from the GAM with those from the
best linear trend surface, i.e., the quadratic. •
summary(residuals(model.gam))
summary(residuals(model.ts2))
par(mfrow=c(1,2))
hist(residuals(model.gam), xlim=c(-20, 20),
breaks=seq(-20, 20,by=4), main="Residuals from GAM")
rug(residuals(model.gam))
hist(residuals(model.ts2), xlim=c(-20, 20),
breaks=seq(-20,20,by=4), main="Residuals from 2nd-order OLS trend")
rug(residuals(model.ts2))
par(mfrow=c(1,1))
aq.sp$resid.model.gam <- residuals(model.gam)  # store the GAM residuals in the spatial object (assignment reconstructed)
vr <- variogram(resid.model.gam ~ 1, loc=aq.sp)
plot(vr, pl=T)
Q25 : Does there appear to be any local spatial correlation of the resid-
uals? Does the empirical variogram support your conclusion? Jump to
A25 •
The plot.gam function of the mgcv package displays the marginal smooth
fit. For the 2D surface (model term s(E,N)), this is shown as a wireframe
plot if the optional scheme argument is set to 1. The select argument
selects which model term to display. We orient it to see the lowest elevation
towards the viewer, using the theta argument:
plot.gam(model.gam, rug=T, se=T, select=1,
scheme=1, theta=30+130, phi=30)
Q26 : How does the GAM trend differ from a polynomial trend surface?
Jump to A26 •
This surface can also be shown with the vis.gam function of the mgcv
package, here also showing ±1.96 standard errors of the fit:
vis.gam(model.gam, plot.type="persp", color="terrain",
theta=160, zlab="elevation", se=1.96)
The fit is very good, the standard error is quite small.
Task 35 : Compute the RMSE of the GAM model, i.e., comparing the fitted
values with the actual values. •
(rmse.gam <- sqrt(sum(residuals(model.gam)^2)/length(residuals(model.gam))))
## [1] 3.244445
Since we now have a model which uses the coördinates, which are known
across the prediction grid, we can use the model to predict over the grid.
Task 36 : Predict the aquifer elevation, and the standard error of pre-
diction, across the prediction grid, using the fitted GAM, and display the
predictions. •
The predict.gam function predicts from a fitted GAM. The se.fit op-
tional argument specifies that the standard error of prediction should
also be computed.
We first make a data.frame version of the grid, to get the coordinates
in the same form as the model.
grid.df <- as.data.frame(grid)
names(grid.df) <- c("E", "N")   # rename to match the GAM formula (rename reconstructed)
summary(grid.df)
## E N
## Min. :-33.00 Min. :-47.00
## 1st Qu.:-14.25 1st Qu.:-22.25
## Median : 4.50 Median : 2.50
## Mean : 4.50 Mean : 2.50
## 3rd Qu.: 23.25 3rd Qu.: 27.25
## Max. : 42.00 Max. : 52.00
We then predict onto this grid into a temporary object, because it will
have two fields (columns): the prediction and its standard error.
tmp <- predict.gam(object=model.gam, newdata=grid.df, se.fit=TRUE)
summary(tmp$fit)
summary(tmp$se.fit)
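The code that stores these predictions in the prediction grid and builds the plot objects p.gam and p.gam.se (printed below) is not shown in the extract; a sketch, following the pattern of the OLS map p.ols:

sp.grid$pred.gam <- as.vector(tmp$fit)
sp.grid$pred.gam.se <- as.vector(tmp$se.fit)
resid.gam <- residuals(model.gam)
p.gam <- spplot(sp.grid, zcol="pred.gam",
                sub="GAM fit",
                main="Aquifer elevation, m.a.s.l.",
                xlab="East", ylab="North",
                at=ts.plot.breaks,
                col.regions = topo.colors(length(ts.plot.breaks)),
                panel=function(x, ...) {
                  panel.levelplot(x, ...);
                  panel.points(coordinates(aq.sp), pch=1,
                               col=ifelse(resid.gam < 0, "red", "black"),
                               cex=2*abs(resid.gam)/max(abs(resid.gam)))
                })
p.gam.se <- spplot(sp.grid, zcol="pred.gam.se",
                   sub="GAM fit",
                   main="s.d. Aquifer elevation, m.a.s.l.",
                   xlab="East", ylab="North",
                   col.regions = cm.colors(64))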
print(p.gam)
[Figure: map of the GAM predictions (Aquifer elevation, m.a.s.l.; GAM fit); East vs. North]
As with the OLS and GLS maps, in this plot the residual from the model
at each observation point is shown in colour (red = negative, actual <
predicted; black = positive, actual > predicted) and with symbol size
proportional to its absolute value. If a prediction is exactly
on the trend surface it will not appear. This gives a nice visualization of
the fit of the trend surface to the sample points.
Q27 : How well does the GAM trend surface fit the points? Are there
obvious problems? Jump to A27 •
print(p.gam.se)
[Figure: map of the standard error of the GAM fit (s.d. Aquifer elevation, m.a.s.l.; GAM fit); East vs. North]
Consistent with the marginal plots, we see that the standard error is
highest at the edges, but there is some local pattern due to the local
adjustments of the GAM.
An obvious question is where this map differs from the parametric trend
surfaces.
Q28 : Where are the largest differences between the GAM and 2nd order
OLS trend surface predictions? Explain why, considering how the two
surfaces are computed. Jump to A28 •
require(fields)
aq.tps <- aq[, c("E","N", "zm")]
aq.tps$coords <- matrix(c(aq.tps$E, aq.tps$N), byrow=F, ncol=2)
str(aq.tps$coords)
## fit the thin-plate spline (call reconstructed from the CALL shown in the summary below)
surf.1 <- Tps(x=aq.tps$coords, Y=aq.tps$zm)
## Warning:
## Grid searches over lambda (nugget and sill variances) with minima at the endpoints:
## (GCV) Generalized Cross-Validation
## minimum at right endpoint lambda = 8.53564e-06 (eff. df=
## 152.95 )
summary(surf.1)
## CALL:
## Tps(x = aq.tps$coords, Y = aq.tps$zm)
##
## Number of Observations: 161
## Number of unique points: 161
## Number of parameters in the null space 3
## Parameters for fixed spatial drift 3
## Effective degrees of freedom: 152.9
## Residual degrees of freedom: 8.1
## MLE sigma 0.6747
## GCV sigma 0.7098
## MLE rho 53330
## Scale passed for covariance (rho) <NA>
## Scale passed for nugget (sigma^2) <NA>
## Smoothing parameter lambda 8.536e-06
##
## Residual Summary:
## min 1st Q median 3rd Q max
## -0.6250000 -0.0701000 -0.0007448 0.0793500 0.5199000
##
## Covariance Model: Rad.cov
## Names of non-default covariance arguments:
## p
##
## DETAILS ON SMOOTHING PARAMETER:
## Method used: GCV Cost: 1
## lambda trA GCV GCV.one GCV.model shat
## 8.536e-06 1.529e+02 1.008e+01 1.008e+01 NA 7.098e-01
##
## Summary of all estimates found for lambda
## lambda trA GCV shat -lnLike Prof converge
## GCV 8.536e-06 152.9 10.08 0.7098 462.5 NA
## GCV.model NA NA NA NA NA NA
## GCV.one 8.536e-06 152.9 10.08 0.7098 NA NA
## RMSE NA NA NA NA NA NA
## pure error NA NA NA NA NA NA
## REML 7.654e-05 117.7 11.24 1.7383 461.6 3
Task 41 : Predict over the study area grid using the fitted thin-plate
spline. •
The predict.Krig method of the fields package computes the pre-
diction. Again, the coördinates must be a matrix. We have these im-
plicitly in the grid, but we need to list them explicitly by converting to
SpatialPoints, and then convert to a matrix.
grid.coords <- as(sp.grid, "SpatialPoints")
summary(grid.coords)
## Object of class SpatialPoints
## Coordinates:
## min max
## e -33 42
## n -47 52
## Is projected: TRUE
## proj4string :
## [+proj=utm +zone=14 +datum=NAD83 +units=m +no_defs]
## Number of points: 7600
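The prediction step itself is not shown in the extract; a minimal sketch, assuming the fitted object surf.1 and the field names pred.tps and resid.tps used below:

## predict the thin-plate spline surface at the grid points
grid.matrix <- coordinates(grid.coords)
sp.grid$pred.tps <- as.vector(predict(surf.1, x=grid.matrix))
## residuals at the observation points, for the over-printed post-plot
aq.sp$resid.tps <- surf.1$residuals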
Task 43 : Display the gridded prediction, with the residuals over-
printed. •
spplot(sp.grid, zcol="pred.tps",
xlab="East", ylab="North",
at=ts.plot.breaks,
main="Aquifer elevation, m.a.s.l.",
col.regions = topo.colors(length(ts.plot.breaks)),
panel=function(x, ...) {
panel.levelplot(x, ...);
panel.points(coordinates(aq.sp), pch=1,
col=ifelse(aq.sp$resid.tps < 0, "red", "black"),
cex=2*abs(aq.sp$resid.tps)/max(abs(aq.sp$resid.tps)))
})
This adjusts very closely to the data points. Compared to the GAM surface
there are some very large differences, even though the GAM allows some
local deviations from a trend.
In trend surface analysis of §5.1 (1st order) and §5.2 (2nd order) we
showed that the residuals from the OLS trend surface models are not
spatially independent – there are local clusters of similar values as re-
vealed by the bubble plot. This implies the trend surface should in fact
be fit not by OLS but by Generalized Least Squares (GLS), taking into
account the spatial auto-correlation of the residuals. We pursue this fur-
ther in §10. Here we investigate the spatial structure of the residuals.
The spatial structure of the residuals can be modelled with a variogram;
this structure can then be used to adjust the trend surface with GLS
(§10.1, below). In this section we examine the empirical variogram of the
residuals, and later use it to initialize the GLS estimate.
\gamma(s_i, s_j) \equiv \frac{1}{2} [z(s_i) - z(s_j)]^2    (9)

\gamma(h) = \frac{1}{2\,m(h)} \sum_{i=1}^{m(h)} [z(s_i) - z(s_i + h)]^2    (10)
where:
• m(h) is the number of point-pairs separated by vector h, in practice
some range of separations (“bin”);
• these are indexed by i;
• the notation z(si +h) means the “tail” of point-pair i, i.e., separated
from the “head” si by the separation vector h.
aq.sp$res.ts2 <- res.ts2   # store the 2nd-order OLS residuals in the spatial object (assignment reconstructed)
vr.c <- variogram(res.ts2 ~ 1, loc=aq.sp, cutoff=40, cloud=T)
vr <- variogram(res.ts2 ~ 1, loc=aq.sp, cutoff=40)
p1 <- plot(vr.c, col="blue", pch=20, cex=0.5)
p2 <- plot(vr, plot.numbers=T, col="blue", pch=20, cex=1.5)
print(p1, split=c(1,1,2,1), more=T)
print(p2, split=c(2,1,2,1), more=F)
Note: The code to print two variograms side-by-side uses the split and
more optional arguments to the print method for Lattice graphics plots.
Q29 : What are the estimated sill, range, and nugget of this variogram?
Jump to A29 •
In later sections (§11.2) we will see how to model the variogram and
use it in spatial prediction. For now, we continue with a method that
takes spatial correlation of the residuals into account when computing
the trend.
As explained in §5, the OLS solution is only valid for independent residu-
als. The previous § shows that in this case the residuals are not spatially
independent, and we were able to model that dependence with a vari-
ogram model. Thus, using OLS may result in an incorrect trend surface
equation, although the OLS estimate is unbiased. A large number of
close-by points with similar values will “pull” a trend surface towards
them. Furthermore, the OLS R² (goodness-of-fit) may be over-optimistic.
This is discussed by Fox [4, §14.1].
The solution is to use Generalised Least Squares (GLS) to estimate the
trend surface. This allows a covariance structure between residuals
to be included directly in the least-squares solution of the regression
equation.
The GLS estimate of the regression coefficients is [2]:

\hat{\beta}_{GLS} = (X^T V^{-1} X)^{-1} X^T V^{-1} y

where V is the variance-covariance matrix of the (spatially-correlated) residuals; see Appendix D.1 for the derivation.
10.1 Computing the GLS trend surface
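The model-fitting call is not shown in the extract. A sketch of how the 2nd-order trend surface could be fit with gls from the nlme package, assuming an exponential spatial correlation structure in the local coördinates (consistent with the range parameter reported below) and REML estimation:

require(nlme)
model.ts2.gls <- gls(zm ~ n + e + I(n^2) + I(e^2) + I(e*n),
                     data=aq,
                     correlation=corExp(form = ~ e + n),
                     method="REML")
class(model.ts2.gls)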
## [1] "gls"
summary(model.ts2.gls)
Note: see §D.3 for the theory and mathematics of REML.
## I(e^2) -0.0013 0.002451 -0.51108 0.6100
## I(e * n) 0.0045 0.001948 2.32555 0.0213
##
## Correlation:
## (Intr) n e I(n^2) I(e^2)
## n 0.017
## e 0.057 -0.038
## I(n^2) -0.620 -0.083 0.024
## I(e^2) -0.571 0.029 -0.233 0.099
## I(e * n) 0.031 -0.048 0.012 -0.026 -0.066
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -3.4096926 -0.5485337 0.1276973 0.6513998 2.0752494
##
## Residual standard error: 6.385512
## Degrees of freedom: 161 total; 155 residual
Notice that the gls method also estimates the range of spatial correla-
tion.
Task 48 : Compare the coefficients from the GLS and OLS fits, as abso-
lute differences and as percentages of the OLS fit. •
The generic coef method extracts coefficients from model objects.
coef(model.ts2.gls) - coef(model.ts2)
round(100*(coef(model.ts2.gls) - coef(model.ts2))
/coef(model.ts2),1)
Q31 : Why are the GLS coefficients different than the OLS coefficients?
Jump to A31 •
Task 49 : Display the 90% confidence intervals for the GLS model pa-
rameters. •
The generic intervals function has a specific method for a fitted GLS
model; internally this is the intervals.gls function of the nlme package.
intervals(model.ts2.gls, level=0.90)
Task 51 : Display the GLS 2nd-order trend surface, with the data points
superimposed, side-by-side with the OLS 2nd-order trend surface com-
puted in §6.2. •
First we need to compute and store the residuals, to be displayed on the
trend surface:
res.ts2.gls <- residuals(model.ts2.gls)
The spplot "spatial plot" method plots spatial objects, i.e., those in one
of the sp classes.
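The computation of the GLS predictions over the grid is not shown; a minimal sketch with the predict method for gls objects (the object name pred.ts2.gls matches its use just below):

pred.ts2.gls <- predict(model.ts2.gls, newdata=grid)
summary(pred.ts2.gls)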
sp.grid$gls.fit <- pred.ts2.gls
p.gls <- spplot(sp.grid, zcol="gls.fit",
sub="2nd-order trend, GLS fit",
main="Aquifer elevation, m.a.s.l.",
xlab="East", ylab="North",
at=ts.plot.breaks,
col.regions = topo.colors(length(ts.plot.breaks)),
panel=function(x, ...) {
panel.levelplot(x, ...);
panel.points(coordinates(aq.sp), pch=1,
col=ifelse(res.ts2.gls < 0, "red", "black"),
cex=2*abs(res.ts2.gls)/max(abs(res.ts2.gls)))
})
print(p.ols, split=c(1,1,2,1), more=T)
print(p.gls, split=c(2,1,2,1), more=F)
Task 52 : Compute the difference between the OLS and GLS trend
surfaces, and map them. •
sp.grid$diff.gls.ols <- sp.grid$gls.fit - sp.grid$fit
spplot(sp.grid, zcol="diff.gls.ols", sub="GLS-OLS fits",
main="difference, m", xlab="East", ylab="North",
col.regions = terrain.colors(64))
Q33 : Where are the largest differences between the OLS and GLS trend
surfaces? Explain why. Jump to A33 •
The trend surface fits an overall trend, but of course does not fit every
observation exactly. This lack of fit can be pure noise, but it can also
have a spatially-correlated component which can be modelled and used
to improve the predictions.
Task 53 : Display the residuals from the GLS trend surface as a post-
plot. •
summary(res.ts2.gls)
We can see from this post-plot of the residuals that there is local spatial
correlation. The GLS fit optimized the estimates of the trend surface co-
efficients, and correctly estimated the spatial correlation of the residuals,
but did not correct for this in mapping.
Q34 : What are the approximate variogram parameters? Jump to A34 •
In the kriging formula (see below, §11.2.1), we need to compute the semi-
variance at any separation distance. Therefore, we need to fit a vari-
ogram function to the empirical variogram. This function represents
the structure of the spatial autocorrelation of the attribute, in this case
the trend surface residual.
There are many authorized variogram functions, that will ensure that
the kriging system can be solved. One of the most common is the expo-
nential function:
\gamma(h) = c \left[ 1 - e^{-h/a} \right]    (12)

Where:
• h is the separation distance between a point-pair; this is the argument
to the function, which changes with each point-pair;
• c is the fitted sill parameter, i.e., the maximum variance in the attribute
when point-pairs are widely separated;
• a is the fitted range parameter.
The effective range 3a is the separation distance at which γ = 0.95c.
These estimates can be used as a starting point for the fit.variogram function,
which adjusts the parameters by weighted least-squares.
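The code that computes and models this variogram is not shown in the extract. A minimal sketch, with eyeball initial values taken from the answer to Q34 (sill ≈ 38 m², no nugget, effective range ≈ 20 km, i.e., exponential range parameter ≈ 20/3 km) and object names (vr.gls, vr.gls.m.f) assumed to match their later use:

## store the GLS residuals in the spatial object, compute the empirical
## variogram, and fit an exponential model by weighted least squares
aq.sp$res.ts2.gls <- res.ts2.gls
vr.gls <- variogram(res.ts2.gls ~ 1, loc=aq.sp, cutoff=40)
vr.gls.m <- vgm(psill=38, model="Exp", range=20/3, nugget=0)
(vr.gls.m.f <- fit.variogram(vr.gls, vr.gls.m))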
intervals(model.ts2.gls)$corStruct[2]
## [1] 14.42157
Q35 : Does the range parameter of this fitted model agree with the
estimate from the GLS fit? Jump to A35 •
11.2.1 Ordinary Kriging
Once we know the structure of the residuals, their values, and their loca-
tions, we can predict their values at all locations (e.g., over the grid), by
Ordinary Kriging interpolation.
To do this, we first need to understand OK.
Kriging is a form of linear prediction of the attribute value at an unknown
point, \hat{z}(s_0), as a weighted sum of the attribute values at the
known points z(s_i):

\hat{z}(s_0) = \sum_{i=1}^{N} \lambda_i z(s_i)    (13)
The weights λi must sum to 1, and are determined by solving the krig-
ing system of equations. This system ensures that the prediction has the
least possible prediction variance, i.e., uncertainty, among all the possi-
ble weights. Therefore OK is called the Best Linear Unbiased Predictor
(BLUP).
Here we do not derive the system, but present it. The weights are the
solution of the linear equation Aλ = b where:
\lambda = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_N \\ \psi \end{bmatrix},
\qquad
b = \begin{bmatrix} \gamma(s_1, s_0) \\ \gamma(s_2, s_0) \\ \vdots \\ \gamma(s_N, s_0) \\ 1 \end{bmatrix}

where \psi is a Lagrange multiplier enforcing the unbiasedness constraint, and A is the
matrix of semivariances \gamma(s_i, s_j) between the observation points, bordered by a
final row and column of 1s (with 0 in the corner); see the analogous, fuller UK system
in §13. The weights are then:

\lambda = A^{-1} b

These weights can then be used in the prediction formula Equation 13.
They also can be used to compute the prediction variance as \hat{\sigma}^2 = b^T \lambda.
11.2.2 OK predictions
Task 57 : Predict over the grid by OK, using the fitted variogram model.
•
The krige function computes kriging predictions and their variances.
kr <- krige(res.ts2.gls ~ 1, loc=aq.sp, newdata=sp.grid, model=vr.gls.m.f)
summary(kr)
kr$var1.sd <-sqrt(kr$var1.var)
p2 <- spplot(kr, zcol="var1.sd",
col.regions=cm.colors(64),
main="Kriging prediction standard deviation, m")
print(p2)
Q37 : Which areas have the most and least uncertainty? Why? Jump to A37 •
12 GLS-Regression Kriging

Now we have both parts of a universal model: a global trend and local
deviations from it. We can combine these for a "best" prediction.
To understand this, we introduce the so-called universal model of spatial
variation, in which the observed value is the sum of a deterministic trend,
a spatially-correlated random component, and uncorrelated noise; it is
written out as Equation 15 in §13 below.
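The regression-kriging prediction itself is not shown in the extract. A minimal sketch, adding the kriged GLS residuals (kr, from §11.2.2) to the GLS trend surface (sp.grid$gls.fit, from §10); the field name rk.gls matches its later use:

## GLS-RK = GLS trend surface + OK-interpolated GLS residuals
sp.grid$rk.gls <- sp.grid$gls.fit + kr$var1.pred
summary(sp.grid$rk.gls)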
Task 61 : Compare this predicted surface with the GAM prediction. •
p.rk <- spplot(sp.grid, zcol="rk.gls",
sub="GLS-RK prediction",
main="Aquifer elevation, m.a.s.l.",
xlab="East", ylab="North",
at=ts.plot.breaks,
col.regions = topo.colors(length(ts.plot.breaks)))
print(p.rk)
summary(sp.grid$diff.gam.rk.gls <-
sp.grid$pred.gam -sp.grid$rk.gls)
Q38 : Where are the largest differences between these two trend sur-
face predictions? Explain why, considering how the two surfaces are
computed. Jump to A38 •
We showed in §9 that the residuals from the OLS fit are not spatially
independent. We used this fact in §10 to produce a correct trend surface
by GLS. We then modelled the spatial structure of the residuals from the
GLS (not OLS) surface and interpolated these, to make a final map of both
the trend and local variations with GLS-RK (§11).
There is another method of fitting the trend and the local deviations
from it in one step, called “Universal Kriging”, abbreviated as UK. This is
not completely correct theoretically, as we will explain, but if observation
points are well-distributed over the area (as is the case in this exercise)
so that the GLS and OLS trend surfaces are not too different, it provides
a very similar map to GLS-RK, and in one step.
The universal model of spatial variation decomposes the value of the target variable at a location s as

Z(s) = Z^*(s) + \varepsilon(s) + \varepsilon'(s)    (15)

where:
• Z(s) is the true (unknown) value of some property at the location;
• Z*(s) is the deterministic component, due to some non-stochastic
process, i.e., the trend surface;
• ε(s) is the spatially-autocorrelated stochastic component of the
deviations from the trend;
• ε′(s) is the pure ("white") noise with no structure; this can not be
modelled.
We have seen how to model Z*(s) + ε′(s) with a polynomial trend surface
in §5. We have seen how to model the local spatial structure as ε(s) +
ε′(s) by Ordinary Kriging (OK) in §11. UK is an extension of OK that
models the entire Equation 15 in one step.
The residual variogram is computed from the OLS surface as in §9, but
directly from the definition of the trend:
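The computation of this variogram, and the fit of the exponential model used below as vr.m.f, are not shown in the extract; a minimal sketch, with rough initial values and object names matching their later use:

## empirical variogram of the residuals from the 2nd-order trend,
## specified directly via the trend formula
vr <- variogram(zm ~ n + e + I(n^2) + I(e^2) + I(e*n),
                loc=aq.sp, cutoff=40)
vr.m <- vgm(psill=35, model="Exp", range=15, nugget=0)
(vr.m.f <- fit.variogram(vr, vr.m))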
[Figure: empirical variogram of the 2nd-order trend residuals; semivariance (m^2) vs. separation (km), with the number of point-pairs per bin shown]
plot(vr, plot.numbers=TRUE,
xlab="separation (km)",
ylab="semivariance (m^2)",
model=vr.m.f,
main="Fitted residual variogram model")
[Figure: fitted residual variogram model; semivariance (m^2) vs. separation (km), with the number of point-pairs per bin shown]
Task 62 : Compare this fitted variogram with that from the OLS trend
surface residuals, and with the estimate from the GLS fit. •
print(vr.m.f) # OLS trend residuals
Recall from §11.2.1 that kriging is a form of linear prediction of the attribute
value at an unknown point s_0, as a weighted sum of the attribute
values at the known points z(s_i):

\hat{z}(s_0) = \sum_{i=1}^{N} \lambda_i z(s_i)    (16)
where the weights λi must sum to 1, and are determined by solving
the kriging system of equations, which ensures that the prediction has
the least possible prediction variance, i.e., uncertainty, among all the
possible weights.
In OK the weights only take into account local spatial autocorrelation. In
UK the weights λi take into account both the global trend and the local
spatial autocorrelation of the trend residuals.
Here we do not derive the system, but present it.
The weights are the solution of the linear equation AU λU = bU where:
A_U = \begin{bmatrix}
\gamma(s_1,s_1) & \cdots & \gamma(s_1,s_N) & 1 & f_1(s_1) & \cdots & f_k(s_1) \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
\gamma(s_N,s_1) & \cdots & \gamma(s_N,s_N) & 1 & f_1(s_N) & \cdots & f_k(s_N) \\
1 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
f_1(s_1) & \cdots & f_1(s_N) & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
f_k(s_1) & \cdots & f_k(s_N) & 0 & 0 & \cdots & 0
\end{bmatrix}

\lambda_U = [\lambda_1, \ldots, \lambda_N, \psi_0, \psi_1, \ldots, \psi_k]^T,
\qquad
b_U = [\gamma(s_1,s_0), \ldots, \gamma(s_N,s_0), 1, f_1(s_0), \ldots, f_k(s_0)]^T

where the f_j, j = 1 \ldots k, are the trend-surface basis functions (here the
coördinates, their squares, and their cross-product) and the \psi_j are
Lagrange multipliers. The weights are then:

\lambda_U = A_U^{-1} b_U

These weights can then be used in the prediction formula Equation 16.
They also can be used to compute the prediction variance as \hat{\sigma}^2 = b_U^T \lambda_U.
13.3 Prediction
Now we have the model, the krige function can compute the UK pre-
diction at any location, for example at all the grid points. Note that the
formula given in krige must match that given in variogram.
k.uk <- krige(zm ~ n + e + I(n^2) + I(e^2) + I(e*n), locations=aq.sp,
newdata=sp.grid, model=vr.m.f)
summary(k.uk)
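The map shown below can be produced with spplot; a minimal sketch:

spplot(k.uk, zcol="var1.pred", col.regions=topo.colors(64),
       main="UK prediction, m")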
[Figure: map of the UK predictions (UK prediction, m); colour scale from about 480 to 620 m]
The obvious question is how close is this one-step procedure to the two-
step procedure of GLS-RK (§12). Both methods take both global and local
structure into account.
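The comparison itself (the histogram and difference map below) is not shown as code in the extract; a minimal sketch, with field names assumed:

## difference between the two predictions, summarized and mapped
sp.grid$uk <- k.uk$var1.pred
sp.grid$diff.uk.rkgls <- sp.grid$uk - sp.grid$rk.gls
hist(sp.grid$diff.uk.rkgls, main="", xlab="difference, UK - GLS-RK")
spplot(sp.grid, zcol="diff.uk.rkgls", main="difference, m",
       sub="UK - GLS-RK predictions", xlab="East", ylab="North",
       col.regions=terrain.colors(64))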
[Figure: histogram of the differences, UK − GLS-RK (m)]
[Figure: map of the differences, UK − GLS-RK predictions (difference, m); East vs. North]
Q40 : How large are differences between the UK and GLS-RK trend
surface predictions? Where are the largest differences? Explain why
there is a difference. Jump to A40 •
14 Discussion
15 Answers
A1 : The map of aquifer elevations, along with a map of the elevation of the
land surface, can be used by well-drillers, to estimate the cost of drilling a well
to reach the aquifer at any location. Return to Q1 •
A2 : There are several slots in the object that refer to geographic space:
The attribute data are in slot data, which is a data frame, like the original
(non-spatial) dataset. In this case there is only one attribute: the elevation of
the aquifer at the location. Return to Q2 •
A3 : There are 161 observations (wells); for each we know the coördinates (E
and N) and the elevation of aquifer (z); we also have the transformed elevation
in meters and the reduced coördinates. Return to Q3 •
A4 : UTM East from 500 361.3 m to 574 429.6 m (range 74.068 km);
UTM North from 4 150 248.2 m to 4 248 312.5 m (range 98.064 km);
total area 7263 km2. Return to Q4 •
A7 : (1) The text postplot has the advantage of showing the actual values,
but it is not very graphical and difficult to read; (2) the size postplot clearly
shows the relative data values; (3) the size and colour postplot gives two ways
to visualize; it seems especially good for seeing the E–W increasing first-order
trend. Return to Q7 •
A8 : The aquifer has a flat surface, tilted towards some direction, by some
regional uplift. In this case, the uplift of the Rocky Mountains about 650 km
to the west has tilted the aquifer. Return to Q8 •
A9 : The fitted surface is zm = 554.78 − 1.62 e − 0.03 n: for each km E the
aquifer elevation decreases by about 1.62 m, and for each km N it decreases by
0.03 m. The relation is highly-significant; it explains 94.1% of the variability
in the observations; however the N coördinate is not needed – it is not
statistically different from zero. Return to Q9 •
A10 : Residuals range from -25.4 to 16.7 m; compare this to the median
elevation 552.8 m; the maximum calibration error is 4.6%. Return to Q10 •
A11 :
Conclusion: this OLS fit does not satisfy the assumptions of independent resid-
uals. Return to Q11
•
A12 : There is a spatial pattern. Large residuals tend to be near each other,
and vice-versa. Positive residuals (above the trend surface) are found almost
exclusively in the middle third of the map. Dependence seems to be stronger
along a SW-NE axis (range about 50 to 70 km) than the NW-SE axis (range about
10 to 20 km). This implies a higher-order trend surface or a periodic surface
superimposed on the linear trend. Return to Q12 •
A13 : The tilted structure has local warping as either a dome or a basin.
Return to Q13 •
A14 : The model explains 97.5% of the variance in the observations, compared
to 94.1% for the first-order surface. Return to Q14 •
A15 : The probability that the higher-order surface is this much better just by
chance is almost zero, so the second-order surface is statistically superior to
the first-order surface. Return to Q15 •
A16 : Residuals range from -19.8 to 14.8 m; compare this to the median ele-
vation 552.8 m; the maximum calibration error is 3.6%. This range is narrower
than for the first-order surface: -25.4 to 16.7 m. Return to Q16 •
A17 :
2. little or no relation between the spread of residuals and the fitted values; but
...
3. The residuals are closer to normally-distributed. The largest negative
residuals (over-predictions) are still a bit too extreme. However, the problem
with the largest positive residuals from the first-order surface has been
solved.
Conclusion: this 2nd-order OLS fit is much closer to being valid than the
1st-order OLS fit. Return to Q17 •
A18 : These residuals form local clusters of positive, negative, and near-zero;
there does not appear to be any overall spatial pattern. So, a higher-order trend
surface is not indicated. Instead, some local interpolation of the residuals
would seem to improve the model. Return to Q18 •
A19 : The fit is generally good but some clusters of points stand out from the
background; their values are not that well matched. Return to Q19 •
A20 : The prediction errors are from 22.3 to 24.4 m; this is about 4.1% of
the predicted value. This much uncertainty in the prediction corresponds to
uncertainty in the expense of drilling a well at the location. Return to Q20 •
A21 : They are least at the centre of gravity of the regression in both E and N;
they increase away from this in both directions; the largest uncertainties are
in the corners of the grid. Return to Q21 •
A22 : The relation with East seems almost linear, and a very tight relation.
However, the relation with North is much more scattered, and seems to have
higher elevations towards the middle of the range. Return to Q22 •
A24 : The GAM has a much smaller spread of residuals, much more concen-
trated towards zero. Return to Q24
•
A27 : The fit is very good, the residuals are smaller than for OLS or GLS.
A few large negative residuals (over-predictions) are in the south-central and
southeast. Return to Q27 •
A28 : The GAM predicts higher elevations in the NE and especially the SW (up
to 32 m), OLS predicts higher elevations in the NW and SE. The predictions are
the same at the map centroid. Return to Q28 •
A29 : The sill is estimated as 35 m2, there is no nugget, the sill is reached at
a range of about 20 km. Return to Q29 •
A31 : The GLS coefficients take into account the spatial correlation of the
trend surface residuals, i.e., the fit uses a variance-covariance matrix of the
residuals to adjust the least-squares fit. Return to Q31 •
A32 : The GLS surface has a narrower area of medium values at the N side of
the map, and wider at the S side. The main axes of the 2nd-order are not at the
same angle. Return to Q32 •
A33 : The GLS surface is substantially higher than the OLS surface in the
NW and SE, and substantially lower in the NE and SW. This shows that some
clustered observations affected the OLS fit. Return to Q33 •
A34 : Sill about 38 m2, no nugget, range about 20 km. Return to Q34 •
A35 : Yes: the range parameter here is 14.2 km; the range estimated by gls
is 14.4 km. These are very close, despite being fit in two very different ways.
Return to Q35 •
A36 : The largest adjustments towards lower aquifer elevations are in a NE-
SW band towards the SE of the map. The largest adjustments towards higher
aquifer elevations are in a large spot at the SW-center side of the map. These
correspond to local warping of the overall aquifer structure at the scale re-
vealed by the variogram model. Return to Q36 •
A38 : The differences here are mainly because in GLS-RK the kriging of the
residuals finds local highs and lows that can not be captured by the smooth
function of the GAM. Return to Q38 •
A39 : The range parameters are 14.2 (GLS) and 14.2 (OLS), so that the GLS
range is about 40% longer. The partial sills are 43.8 (GLS) and 43.8 (OLS), so
that the GLS sill is about 25% higher. This implies that GLS has removed less of
the local variation in the residuals than OLS, or in other words, OLS incorrectly
removed this variation. Return to Q39 •
A40 : The differences are quite small, almost all < ±2.5 m, with a range of
≈ -4 . . . 10 m. These are much smaller differences than between GAM and
GLS-RK. Differences are skewed to the positive, i.e., UK > GLS-RK.
This difference comes about because UK uses the fitted variogram from the
2nd-order OLS trend residuals, not from the 2nd-order GLS trend residuals. Because
the two trend surfaces are different, so are the residuals, and so are the fitted
models. Return to Q40 •
A Derivation of the OLS solution to the linear model

y = X\beta + \varepsilon    (17)

The OLS estimate \hat{\beta}_{OLS} minimizes the sum of the squared residuals:

S = (y - X\beta)^T (y - X\beta)    (18)

S = y^T y - \beta^T X^T y - y^T X\beta + \beta^T X^T X\beta
  = y^T y - 2\beta^T X^T y + \beta^T X^T X\beta    (19)

This is minimized by finding the partial derivative with respect to the
unknown coefficients \beta, setting this equal to 0, and solving:

\frac{\partial S}{\partial \beta^T} = -2 X^T y + 2 X^T X\beta

0 = -X^T y + X^T X\beta
(X^T X)\beta = X^T y
(X^T X)^{-1}(X^T X)\beta = (X^T X)^{-1} X^T y
\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y    (20)

B Standardized residuals
B Standardized residuals
The standardized residuals are computed as r_i / (s \sqrt{1 - h_{ii}}), where the r_i
are the unstandardized residuals, s is the sample standard deviation of
the residuals, and the h_{ii} are the diagonal entries of the so-called "hat"
matrix V = X(X^T X)^{-1} X^T.
The sample standard deviation of the residuals s is computed as the
square root of the estimated variance of the random error:

s = \sqrt{ \frac{1}{n - p} \sum_i r_i^2 }
where n is the number of observations and p the number of predictors.
It is shown in the linear model summary as “Residual standard error”;
it can be extracted as summary(model_name)$sigma. This is an overall
measure of the variability of the residuals, and so can be used to stan-
dardize the residuals to N (0, 1).
The "hat" matrix V is another way to look at linear regression. This
matrix multiplies the observed values to compute the fitted values. The
hat value for an observation gives the overall leverage (i.e., importance
when computing the fit) of that observation. So the term \sqrt{1 - h_{ii}} in the
denominator shows that with low influence (small h_{ii}) the ratio r_i/s (a
simple standardization) is not affected much, but with a high influence
(large h_{ii}) the denominator is smaller and so the standardized residual
is increased. Thus the standardized residuals are higher for points with
high influence on the regression coefficients.
As λ → ∞ the solution approximates the least-squares plane, i.e., the trend
surface averaged over all the points.
In 2D an appropriate penalty is:

J[f] = \int\!\!\int_{\mathbb{R}^2} \left[ \left( \frac{\partial^2 f(x)}{\partial x_1^2} \right)^2 + 2 \left( \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} \right)^2 + \left( \frac{\partial^2 f(x)}{\partial x_2^2} \right)^2 \right] dx_1 \, dx_2    (22)
where (x1 , x2 ) are the two coördinates of the vector x. In practice the
double integral is discretized over some grid known as knots; these
may be defined by the observations or may be a different set, maybe
an evenly-spaced grid.
This penalty can be interpreted as the “bending energy” of a thin plate
represented by the function f(x); by minimizing this energy the spline
function over the 2D plane behaves as a thin (flexible) plate which, according
to the first term of Equation 21 would be forced to pass through data
points, with minimum bending. However the second term of Equation
21 allows some smoothing: the plate does not have to bend so much,
since it is allowed to pass “close to” but not necessarily through the data
points. The higher the λ, the less exact is the fit.
This has two purposes: (1) it allows for measurement error; the data
points are not taken as exact; (2) it results in a smoother surface. So
cross-validation is used to determine the degree of smoothness.
The solution to Equation 22 is a linear function:

f(x) = \beta_0 + \beta^T x + \sum_{j=1}^{N} \alpha_j h_j(x)    (23)
where the β account for the overall trend and the α are the coefficients
of the warping.
The set of functions h_j(x) is the basis kernel, also called a radial basis
function (RBF); for thin-plate splines it is (as in Hastie et al. [5]):

h_j(x) = r^2 \log r    (24)

where the norm distance r = \lVert x - x_j \rVert is also called the radius of the basis
function. The norm is usually the Euclidean (straight-line) distance.
D.1 GLS
y = Xβ + ε, ε ∼ N (0, σ 2 I) (25)
y = Xβ + η, η ∼ N (0, V) (26)
The off-diagonal entries of V allow for correlation among the residuals, for example from
repeated measures of the same object, or spatial dependence (the case
here; see the next sub-section for details).
Expanding Equation 27 (the generalized sum of squares S = (y - X\beta)^T V^{-1} (y - X\beta)),
taking the partial derivative with respect to the parameters, setting it
equal to zero and solving, we obtain:

\frac{\partial S}{\partial \beta} = -2 X^T V^{-1} y + 2 X^T V^{-1} X\beta

0 = -X^T V^{-1} y + X^T V^{-1} X\beta

\hat{\beta}_{GLS} = (X^T V^{-1} X)^{-1} X^T V^{-1} y    (29)
Lark and Cullis [10, Eq. 12] show that the likelihood of the parameters
in Equation 25 can be expanded to include the spatial dependence implicit
in the variance-covariance matrix V, rather than a single residual
variance \sigma^2. The log-likelihood is then:

\ell(\beta, \theta \mid y) = c - \frac{1}{2}\log|V| - \frac{1}{2}(y - X\beta)^T V^{-1} (y - X\beta)    (31)

where c is a constant (and so does not vary with the parameters) and V
is built from the variance parameters \theta and the distances between the
observations. By assuming second-order stationarity (that is, the covariance
structure is the same over the entire field, and only depends on the vector
distance between pairs of points), the structure can be summarized by the
covariance parameters \theta = [\sigma^2, s, a], i.e., the
total sill, nugget proportion, and range.
However, maximizing this likelihood for the random-effects covariance
parameters θ also requires maximizing in terms of the fixed-effects re-
gression parameters β, which in this context are called nuisance parame-
ters since at this point we don’t care about their values; we will compute
them after determining the covariance structure.
Pinheiro and Bates [16, §2.2.5] show how this is achieved, given a likeli-
hood function, by a change of variable to a statistic sufficient for β.
References
[1] I. C. Briggs. Machine contouring using minimum curvature. Geophysics, 39(1):39–48, January 1974. doi: 10.1190/1.1440410.
[2] N. Cressie. Statistics for spatial data. John Wiley & Sons, revised edition, 1993.
[3] J. C. Davis. Statistics and data analysis in geology. John Wiley & Sons, New York, 3rd edition, 2002.
[4] J. Fox. Applied regression, linear models, and related methods. Sage, Newbury Park, 1997.
[5] T. Hastie, R. Tibshirani, and J. H. Friedman. The elements of statistical learning: data mining, inference, and prediction. Springer series in statistics. Springer, New York, 2nd edition, 2009. ISBN 9780387848587.
[6] Jan M. Hoem. The reporting of statistical significance in scientific journals: A reflexion. Demographic Research, 18(15):437–442, Jun 2008. doi: 10.4054/DemRes.2008.18.15.
[7] M. F. Hutchinson. Interpolating mean rainfall using thin plate smoothing splines. International Journal of Geographical Information Science, 9(4):385–403, 1995.
[8] R. Ihaka and R. Gentleman. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314, 1996.
[9] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An introduction to statistical learning: with applications in R. Number 103 in Springer texts in statistics. Springer, 2013. ISBN 9781461471370.
[10] R. M. Lark and B. R. Cullis. Model based analysis using REML for inference from systematically sampled data on soil. European Journal of Soil Science, 55(4):799–813, 2004.
[11] Virginia L. McGuire. Water-level and recoverable water in storage changes, High Plains aquifer, predevelopment to 2015 and 2013–15. USGS Numbered Series 2017-5040, U.S. Geological Survey, Reston, VA, 2017.
[12] H. Mitasova and J. Hofierka. Interpolation by regularized spline with tension: II. Application to terrain modeling and surface geometry analysis. Mathematical Geology, 25(6):657–669, 1993. doi: 10.1007/BF00893172.
[13] H. Mitasova and L. Mitas. Interpolation by regularized spline with tension: I. Theory and implementation. Mathematical Geology, 25(6):641–655, 1993. doi: 10.1007/BF00893171.
[14] R. A. Olea and J. C. Davis. Sampling analysis and mapping of water levels in the High Plains aquifer of Kansas. Technical Report KGS Open File Report 1999-11, Kansas Geological Survey, May 1999. URL http://www.kgs.ku.edu/Hydro/Levels/OFR99_11/.
[15] R. A. Olea and J. C. Davis. Optimization of the High Plains aquifer water-level observation network. Technical Report KGS Open File Report 1999-15, Kansas Geological Survey, May 1999. URL http://www.kgs.ku.edu/Hydro/Levels/OFR99_15/.
[16] J. C. Pinheiro and D. M. Bates. Mixed-effects models in S and S-PLUS. Springer, 2000. ISBN 0387989579.
[17] R Development Core Team. R Data Import/Export. The R Foundation for Statistical Computing, version 3.6.2 (2019-12-12) edition, 2015. URL http://cran.r-project.org/doc/manuals/R-data.pdf.
[18] W. N. Venables, D. M. Smith, and R Development Core Team. An introduction to R; notes on R: a programming environment for data analysis and graphics; Version 3.6.2. R Foundation for Statistical Computing, Dec 2019. ISBN 3-900051-12-7. URL http://www.R-project.org.
[19] S. N. Wood. Thin plate regression splines. Journal of the Royal Statistical Society Series B (Statistical Methodology), 65:95–114, 2003. doi: 10.1111/1467-9868.00374.