
GMT 454

Hacettepe University Department of Geomatics Engineering


Field Analysis
Field is a continuous representation of an attribute value over a region. Scalar Fields may be described by a continuous single valued function
𝑧 = 𝑓(𝑥, 𝑦)

where 𝑧 is the attribute value, 𝑥 and 𝑦 are coordinates inside the region. Thus, 𝑓(𝑥, 𝑦) represents the variation of the attribute value depending on location
over the region.

Vector fields are represented by a vector-valued function of location. For example, a wind dataset from meteorology may be represented by a
vector field in which both speed and direction are captured by a single vector quantity.

Field representation types


The continuous function 𝑓(𝑥, 𝑦) may be represented with different models in computers.

A Matrix of Attribute Values: for example, the height values over a region are stored in a matrix, where the region is represented as a grid of equal-area quads (generally known as a Digital Elevation Model).
Triangulated Irregular Network: every triangle represents a facet of the surface. The vertices of the TIN are generally selected to represent characteristic points (Very Important Points) of the surface such as peaks or valleys.
Digitized Contours: a set of level curves of constant attribute value.
Mathematical Models: the variation of attribute values is represented by a function.
Point Clouds: the surface is represented by dense point cloud data, mostly acquired by a scanning device such as LIDAR or by photogrammetric methods.

(book: Geographic Information Analysis)
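As a small illustration of the first two models, here is a minimal sketch using a hypothetical synthetic height grid and a handful of hypothetical sample points: a matrix-based DEM and a TIN built with a Delaunay triangulation.

import numpy as np
from scipy.spatial import Delaunay

# Matrix representation: a tiny synthetic DEM stored as a grid of equal-area cells
dem = np.array([[1100 + 10*i + 5*j for j in range(5)] for i in range(5)], dtype=float)
print(dem.shape)  # (5, 5)

# TIN representation: triangulate a set of irregular sample points (Very Important Points)
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.3, 0.8], [1.2, 1.1], [0.1, 1.4]])
heights = np.array([1100.0, 1120.0, 1150.0, 1135.0, 1110.0])  # height value carried by each vertex
tin = Delaunay(points)
print(tin.simplices)  # each row lists the vertex indices of one triangular facet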

In order to derive a field representation of an attribute over a region, we need to collect measurements of attribute values over the field at known locations.

Due to many practical reasons (can you mention a few?) we may not be able to collect measurements everywhere in the field.
Fields may also have time dependency, thus the samples from different times may not be used together.
Most of the time we cannot control the data collection procedure.

Thus, we need tools and procedures to obtain continuous field representation from available field observations.

Deterministic Spatial Interpolation


Interpolation is a method to obtain unknown field values for locations where no observation is available.

Generally, interpolated values are equal to observed field values at locations where samples are collected (interpolation passes through sample values at
observation locations).
Some interpolation techniques (e.g. Least Squares interpolation) result in interpolators which may not pass through the sample points; at those locations we get an
approximate value.

Proximity Polygons
The most general and straightforward method for deterministic interpolation is the nearest neighbour interpolator, which assigns to a location the
value of the nearest observation. The general approach is to construct a proximity polygon mesh over the region using the observed locations
and assign the observed attribute value to each polygonal region.

Mostly used for nominal data.


The resulting field is not continuous (what do we mean by continuous?)

Local spatial Average


Local spatial average methods assign the attribute value to a location by taking the weighted average of the neighbouring observations.
$$s_j = \frac{\sum_i^{n_j} w_{i,j} z_i}{n_j}, \quad \text{where } (x_i, y_i) \in \text{neighborhood}(x_j, y_j).$$

k-nearest neighbor (kNN)


Contiguity based neighbor
Adjacency based neighbor
Kernel Density Estimation or Moving Average Window based neighbors, where weights are assigned by a kernel function depending on location: 𝑤𝑖,𝑗 = 𝐾(𝑥𝑗 , 𝑦𝑗 )
Inverse Distance Weighting, where the weights are assigned depending on the inverse of distances between points.

The weights are generally normalized to have their sum equal to 1.


$$w_{i,j} = \frac{w_{i,j}}{\sum_i^{n_j} w_{i,j}}$$

Different distance metrics can be applied for 𝑑𝑖,𝑗 .


Different neighborhood definitions can be applied.

Inverse distance weights can be specified as

power of inverse distances: $w_{i,j} = \frac{1}{d_{i,j}^{z}}$
exponentials: $w_{i,j} = e^{-k d_{i,j}}$

(book: Geographic Information Analysis)
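As a minimal numeric sketch of these weighting schemes (with hypothetical distances and neighbor values), normalized inverse-distance and exponential weights could be computed as follows:

import numpy as np

# Hypothetical distances (in metres) from an interpolation point to its neighbors
d = np.array([10.0, 25.0, 40.0, 80.0])
z = np.array([1100.0, 1120.0, 1150.0, 1135.0])  # hypothetical neighbor field values

# Power-of-inverse-distance weights (power 2), normalized to sum to 1
w_idw = 1.0 / d**2
w_idw = w_idw / w_idw.sum()

# Exponential weights with an assumed decay constant k, also normalized
k = 0.02
w_exp = np.exp(-k * d)
w_exp = w_exp / w_exp.sum()

# Weighted local averages of the neighboring field values
print(w_idw @ z, w_exp @ z)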

In addition, nearest neighbor, bilinear or cubic spline interpolations are commonly used for up and down-sampling of raster datasets.

(wikipedia)
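As a minimal sketch of such resampling (using a hypothetical small raster rather than the DEM loaded below), scipy.ndimage.zoom selects the interpolator through its order parameter:

import numpy as np
from scipy import ndimage

# Hypothetical small raster of height values
raster = np.arange(16, dtype=float).reshape(4, 4)

# Up-sample by a factor of 2 with different interpolators
nearest = ndimage.zoom(raster, 2, order=0)   # nearest neighbor
bilinear = ndimage.zoom(raster, 2, order=1)  # bilinear
cubic = ndimage.zoom(raster, 2, order=3)     # cubic spline

print(nearest.shape, bilinear.shape, cubic.shape)  # (8, 8) for each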

Derived information from Fields

Slope
The slope of a surface is the maximum rate of change in the attribute value (for example elevation).
$$\tan(\theta) = \frac{\Delta z}{\|(\Delta x, \Delta y)\|}$$
It can also be specified by a gradient vector, which holds the partial derivatives of the surface with respect to 𝑥 and 𝑦:
$$\nabla z = \begin{bmatrix} \dfrac{\partial z}{\partial x} \\[4pt] \dfrac{\partial z}{\partial y} \end{bmatrix}$$

A numerical approximation of the partial derivatives can be obtained when the field is represented as a regular grid by

(book: Geographic Information Analysis)


$$\frac{\partial z}{\partial x} = \frac{z_{i,j+1} - z_{i,j-1}}{2g} \quad \text{and} \quad \frac{\partial z}{\partial y} = \frac{z_{i+1,j} - z_{i-1,j}}{2g}$$

where $g$ is the grid spacing (cell size).
The gradient magnitude can then be calculated by
$$\mathrm{grad} = \sqrt{\left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2}$$

The aspect of the surface at that point is represented by

$$\mathrm{aspect} = 90 - \operatorname{arctan2}\left(\frac{\partial z}{\partial y}, \frac{\partial z}{\partial x}\right)$$

In [1]: import numpy as np


import matplotlib.pyplot as plt
import rasterio as rio
from rasterio.windows import Window

In [2]: ## Sampling over the longitude


dataset = rio.open('Data/dted.tif')
print (dataset.crs)
p1 = dataset.xy(dataset.height,0)
p2 = dataset.xy(0,dataset.width)

## Read band 1 of the dataset


dted = dataset.read(1)
#dted = dted*10 ## Use for exaggeration

print (dted.shape)
print (p1, p2)

EPSG:3857
(451, 442)
(3840621.9500124436, 4847796.4621421285) (3854289.4850124437, 4865859.425142129)

In [3]: plt.imshow(dted,cmap='cool')
plt.colorbar()

Out[3]: <matplotlib.colorbar.Colorbar at 0x126bdcc10>

In [4]: print(dted.min(), dted.max())

1060 1556

In [5]: from scipy import signal

partial_x = signal.convolve2d(dted, np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1]
]), boundary='symm', mode='same') / (8*30.0)

plt.imshow(partial_x)

Out[5]: <matplotlib.image.AxesImage at 0x136f67ed0>

In [6]: from scipy import signal

partial_y = signal.convolve2d(dted, np.array([
    [1, 2, 1],
    [0, 0, 0],
    [-1, -2, -1]
]), boundary='symm', mode='same') / (8*30.0) # meters per cell is 30 m

plt.imshow(partial_y)

Out[6]: <matplotlib.image.AxesImage at 0x13701f650>

In [7]: aspect = np.pi/2 - np.arctan2(partial_y, partial_x)

aspect[aspect<0] +=(2*np.pi)
print (aspect.min(), aspect.max())

plt.imshow(aspect,cmap='hot')
plt.colorbar()

0.0 6.27705041482171

Out[7]: <matplotlib.colorbar.Colorbar at 0x1370bf5d0>

In [8]: slope = np.arctan(np.sqrt(partial_x**2 + partial_y**2))

plt.imshow(np.degrees(slope), cmap='hot')
plt.colorbar()

Out[8]: <matplotlib.colorbar.Colorbar at 0x1371724d0>

In [9]: #Lambertian Surface illumination

sun_elevation = np.radians(45)
sun_azimuth = np.radians(315)

shade = (np.cos(sun_azimuth-aspect)*np.sin(slope)*np.cos(sun_elevation)
         + np.cos(slope)*np.sin(sun_elevation))

shade = 255*(shade + 1)/2

plt.figure(figsize=(10,10))
plt.imshow(shade,cmap='Greys_r')
plt.colorbar()

Out[9]: <matplotlib.colorbar.Colorbar at 0x137230c90>

In [10]: #We can also blend with elevation dependent coloring

plt.figure(figsize=(10,10))
plt.imshow(dted,cmap='gist_earth')
plt.colorbar()
plt.imshow(shade,cmap='Greys_r', alpha=.40)

Out[10]: <matplotlib.image.AxesImage at 0x1372f6a10>

In [11]: #Lets also add contours

sun_elevation = np.radians(45)
sun_azimuth = np.radians(315)

shade = (np.cos(sun_azimuth-aspect)*np.sin(slope)*np.cos(sun_elevation)
         + np.cos(slope)*np.sin(sun_elevation))

shade = 255*(shade + 1)/2

plt.figure(figsize=(10,10))
plt.imshow(dted,extent=(p1[0],p2[0],p1[1],p2[1]),cmap='gist_earth')
plt.colorbar()
plt.imshow(shade,extent=(p1[0],p2[0],p1[1],p2[1]),cmap='Greys_r', alpha=.40)

## create a meshgrid for plotting contour.


x_m = np.arange(0,dataset.width,1)
y_m = np.arange(0,dataset.height,1)

X = [dataset.xy(0,i)[0] for i in range(dataset.width)]


Y = [dataset.xy(i,0)[1] for i in range(dataset.height)]

X,Y = np.meshgrid(X,Y)

CS1 = plt.contour(X, Y, dted, cmap='hot')


plt.clabel(CS1, CS1.levels[::2], inline=True, fontsize=10)

Out[11]: <a list of 8 text.Text objects>

Deterministic Spatial Interpolation


Here we will have an exercise using the terrain surface loaded above. First of all we will resample the terrain height by random sampling to simulate
observations in the field. Thus we will have observations from the field having coordinates and height values,
$$o_i = (east_i, north_i, height_i)$$

where the eastings will be selected by uniform random sampling along the easting extent. A similar procedure will be applied for the northings. Then we will use the map to get
the height values.

In [12]: e_p = np.random.uniform(0, dataset.width, 100).astype(int)
n_p = np.random.uniform(0, dataset.height, 100).astype(int)

## Coordinates in dataset coordinate system


points = [dataset.xy(p[1],p[0]) for p in zip(e_p, n_p)]
## height values corresponding to coordinates
heights = np.array([dted[p[1],p[0]] for p in zip(e_p, n_p)])

## numpy array for coordinates


eastings = np.array([p[0] for p in points])
northings = np.array([p[1] for p in points])

## Lets plot the height values and associated measurements from the field
plt.figure(figsize=(10,10))
plt.imshow(dted,extent=(p1[0],p2[0],p1[1],p2[1]),cmap='gist_earth')
plt.imshow(shade,extent=(p1[0],p2[0],p1[1],p2[1]),cmap='Greys_r', alpha=.40)
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()

Out[12]: <matplotlib.colorbar.Colorbar at 0x1374ebd90>

In [13]: ## We will utilize a KDtree to search for nearest neighbors

import scipy.spatial

data_points = np.vstack((eastings, northings)).T


print (data_points.shape)
tree = scipy.spatial.KDTree(data_points)

(100, 2)

In [14]: ## Lets create a new raster for interpolation

knn_raster = np.zeros_like(dted)
k = 5 # five nearest neighbors

## For each pixel in the new dataset we will search for k nearest neighbors and inverse distance weight

for row in range(knn_raster.shape[0]):
    for col in range(knn_raster.shape[1]):
        p = dataset.xy(row, col)
        distances, neighbors = tree.query(p, k)

        ## distance of zero will have maximum weight
        distances[distances < 1] = 1
        idw = 1/distances

        idw = idw / np.sum(idw) # Normalize weights

        h_neighbors = np.array([heights[i] for i in neighbors])

        height = np.sum(h_neighbors*idw) ## Weighted local sum of height values

        knn_raster[row, col] = int(height)

plt.figure(figsize=(10,10))
plt.imshow(knn_raster, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()

Out[14]: <matplotlib.colorbar.Colorbar at 0x1375bb650>

When using the k nearest neighbors, increasing the number of neighbors used (𝑘) produces smoother and smoother fields. In fact, if 𝑘 = 𝑁 , we use
all the data in the dataset for the local average, which is then equal to the global average of the data.

(book: Geographic Information Analysis)

In [15]: knn_raster_avg = np.zeros_like(dted)

k = 5 # five nearest neighbors

## For each pixel in the new dataset we will search for k nearest neighbors and take their average

for row in range(knn_raster_avg.shape[0]):
    for col in range(knn_raster_avg.shape[1]):
        p = dataset.xy(row, col)
        distances, neighbors = tree.query(p, k)

        h_neighbors = np.array([heights[i] for i in neighbors])

        height = np.sum(h_neighbors)/len(neighbors)

        knn_raster_avg[row, col] = int(height)

plt.figure(figsize=(10,10))
plt.imshow(knn_raster_avg, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()

Out[15]: <matplotlib.colorbar.Colorbar at 0x137684510>


In [16]: ## The first nearest neighbor is somewhat equivalent to the voronoi interpolation

voronoi = scipy.spatial.Voronoi(data_points)

knn_raster_voronoi = np.zeros_like(dted)
k = 1 # first nearest neighbor

## For each pixel in the new dataset we will search for the nearest neighbor and assign its value

for row in range(knn_raster_voronoi.shape[0]):
    for col in range(knn_raster_voronoi.shape[1]):
        p = dataset.xy(row, col)
        distances, neighbors = tree.query(p, k)

        h_neighbor = heights[neighbors]

        knn_raster_voronoi[row, col] = int(h_neighbor)

plt.figure(figsize=(10,10))
ax = plt.subplot(111)
plt.imshow(knn_raster_voronoi, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()

scipy.spatial.voronoi_plot_2d(voronoi, show_vertices=False, line_colors='orange', line_width=2, line_alpha=0.6, point_size=2, ax=ax)

Out[16]:

Trend Surface Analysis


In many cases the measurements collected in the field contain measurement errors which may result in wrong interpretations when ignored. In addition, the
measurement errors may also have some kind of spatial autocorrelation, so their stochastic nature should also be considered for better interpretations.

Trend surface analysis is a kind of multivariate regression analysis where the independent variables are locations and the dependent variable is the field value.
In practice, we may also add attribute values as independent variables to the multivariate regression analysis for a better field representation.

For a given location 𝑠𝑗 = (𝑥𝑗 , 𝑦𝑗 ) , the field is represented as a mathematical function of location as
𝑓(𝑠𝑗 )

The measurements collected in the field are assumed to follow the measurement equation
𝑧𝑗 = 𝑓(𝑠𝑗 ) + 𝑒𝑗

where 𝑒𝑗 are random errors.

Trend surface is generally expressed as a collection of basis functions with associated unknown parameters.
$$f(s) = \sum_i^{M} \beta_i g_i(s)$$

where M is the number of basis functions in the model. The basis functions 𝑔𝑖 are generally selected depending on the field characteristics. The most commonly used basis
function set is the polynomial basis, which may be represented as
$$g_i(s) = x^{m(i)} y^{n(i)},$$

where 𝑚(𝑖) and 𝑛(𝑖) represent the degree of the 𝑖th term. For example, for a first degree polynomial
$$f(s) = \beta_{0,0} x^0 y^0 + \beta_{1,0} x^1 y^0 + \beta_{0,1} x^0 y^1 + \beta_{1,1} x^1 y^1$$

given N observations from the field


ℎ𝑒𝑖𝑔ℎ𝑡𝑖 , 𝑥𝑖 , 𝑦𝑖 , for 𝑖 ∈ 1, 2, . . , 𝑁

we can construct the observation equations in matrix form as


$$\begin{bmatrix} height_1 \\ height_2 \\ \vdots \\ height_N \end{bmatrix} = \begin{bmatrix} 1 & x_1 & y_1 & x_1 y_1 \\ 1 & x_2 & y_2 & x_2 y_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_N & y_N & x_N y_N \end{bmatrix} \begin{bmatrix} \beta_{0,0} \\ \beta_{1,0} \\ \beta_{0,1} \\ \beta_{1,1} \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{bmatrix}$$

or
𝑦 = 𝑋𝜷 + 𝑒

where the solution can be obtained by least squares estimation as


𝜷̂ = (𝑋 𝑇 𝑋 )−1 𝑋 𝑇 𝑦

In addition, we can also construct an estimate of measurement error variance assuming that the measurement errors are normally distributed.
$$\hat{\sigma}^2 = \frac{(y - X\hat{\boldsymbol{\beta}})^T (y - X\hat{\boldsymbol{\beta}})}{N - M} = \frac{e^T e}{N - M}$$
where 𝑁 is the number of measurements, and 𝑀 is the number of parameters. Additionally, variance and covariance of parameters can be estimated as
$$D(\hat{\beta}) = \hat{\sigma}^2 (X^T X)^{-1}$$

Using the estimated parameters 𝛽̂ we can approximate field values at a given location (𝑥, 𝑦) by using the mathematical expression of our field representation as
$$\hat{h} = \hat{f}(x, y) = \hat{\beta}_{0,0} x^0 y^0 + \hat{\beta}_{1,0} x^1 y^0 + \hat{\beta}_{0,1} x^0 y^1 + \hat{\beta}_{1,1} x^1 y^1$$

Generally we will calculate the height values over a regular grid, as we did with the previous weighted-average interpolation techniques. In this case we
can construct a new 𝑋 matrix that contains a row for each pixel in the grid and use
$$h_{grid} = X_{grid} \hat{\boldsymbol{\beta}}$$

However, note that our estimated parameters are also random variables having mean 𝛽̂ and variance 𝐷(𝛽̂). Thus the predicted height values will also have
some uncertainty associated with them. Using the error propagation law we can calculate the uncertainty in the calculated height values as
$$D(h_{grid}) = X_{grid}\, D(\hat{\beta})\, X_{grid}^T$$

We can use the diagonals of 𝐷(ℎ𝑔𝑟𝑖𝑑 ) as the estimated uncertainty in the grid values. This will give a kind of error map which shows the uncertainty in the
approximated height values.

In [17]: ## Trend surface fitting

Y = heights

x = eastings
y = northings

## Important notice, using large x and y values may reveal ugly numerical problems with powers or multiplications
## Generally x and y values are scaled between -1 and 1 or 0 and 1.

x = (x - dataset.bounds.left) / (dataset.bounds.right - dataset.bounds.left)


y = (y - dataset.bounds.bottom) / (dataset.bounds.top - dataset.bounds.bottom )

X = np.vstack(
(
np.ones_like(heights),
x,
y,
x*y
# ,
# x**2,
# x**3,
# y**2,
# y**3,
# x**2 * y,
# y**2*x
)
).T

betas = np.linalg.lstsq(X,Y,rcond=None)[0]

residuals = Y - np.dot(X,betas)

var = np.sum(residuals**2)/(len(Y)-4)

var_beta = var*np.linalg.inv(np.dot(X.T,X))

print ('measurement variance ', var, 'Estimated parameters ', betas, ' And associated uncertainty ', np.sqrt(np.diag(var_beta)))

measurement variance 9998.918325950443 Estimated parameters [1225.93201013 146.01285134 171.19041223 -199.09469632] And associated uncertainty [ 38.04355401 70.75997171 69.08485891 120.01991436]

In [18]: ## Now we can construct grid

ts_raster = np.zeros_like(dted)

grid_e = []
grid_n = []
## Calculate map coordinates for grid points
for row in range(ts_raster.shape[0]):
    for col in range(ts_raster.shape[1]):
        p = dataset.xy(row, col)
        grid_e.append(p[0])
        grid_n.append(p[1])

## Actually it is simpler to use a meshgrid to create this.

grid_e = np.array(grid_e)
grid_n = np.array(grid_n)

print (grid_e.shape, grid_n.shape)

grid_e_norm = (grid_e - dataset.bounds.left) / (dataset.bounds.right - dataset.bounds.left)


grid_n_norm = (grid_n - dataset.bounds.bottom) / (dataset.bounds.top - dataset.bounds.bottom )

X_grid = np.vstack(
(
np.ones_like(grid_e_norm),
grid_e_norm,
grid_n_norm,
grid_e_norm*grid_n_norm
# ,
# grid_e**2,
# grid_e**3,
# grid_n**2,
# grid_n**3,
# grid_e**2 * grid_n,
# grid_n**2 * grid_e
)
).T

print (X_grid.shape)

h_grid = np.dot(X_grid,betas)

ts_raster[:] = h_grid.reshape(ts_raster.shape[0],ts_raster.shape[1])

plt.figure(figsize=(10,10))
ax = plt.subplot(111)
plt.imshow(ts_raster, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()

(199342,) (199342,)
(199342, 4)

Out[18]: <matplotlib.colorbar.Colorbar at 0x1384784d0>

In [19]: #We can also construct an error map according to error propagation

d_h = np.sqrt(
np.dot(X_grid**2, np.diag(var_beta))
)

error_map = d_h.reshape(ts_raster.shape[0], ts_raster.shape[1])

plt.figure(figsize=(10,10))
ax = plt.subplot(111)
plt.imshow(error_map, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=residuals, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()

/Users/muratd/Programs/tflow/lib/python3.7/site-packages/matplotlib/collections.py:857: RuntimeWarning: invalid value encountered in sqrt
  scale = np.sqrt(self._sizes) * dpi / 72.0 * self._factor

Out[19]: <matplotlib.colorbar.Colorbar at 0x137890f50>

In [30]: ## We can also use more complicated basis expansions for interpolation
## Here we use radial basis functions to interpolate the data

from scipy.interpolate import Rbf

x = (eastings - dataset.bounds.left) / (dataset.bounds.right - dataset.bounds.left)


y = (northings - dataset.bounds.bottom) / (dataset.bounds.top - dataset.bounds.bottom )

W = (heights.max() - heights.min())
Min = heights.min()
h = (heights - heights.min())/W

z = np.zeros_like(eastings)

rbf = Rbf(x,y,z,h,function='thin_plate', smooth=0.00001)

z_grid = np.zeros_like(grid_e_norm)

d_rbf = rbf(grid_e_norm, grid_n_norm, z_grid)

d_rbf = d_rbf*W+Min

d_rbf = d_rbf.reshape(ts_raster.shape[0],ts_raster.shape[1])

plt.figure(figsize=(10,10))
ax = plt.subplot(111)
plt.imshow(d_rbf, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()

Out[30]: <matplotlib.colorbar.Colorbar at 0x1401e3ad0>

Steps of Deterministic Interpolation

Sample The Surface


Sampling a surface is generally done by field work to obtain surface values at specific sampling locations or control points. If we denote the control point
as 𝑠𝑖 = (𝑥𝑖 , 𝑦𝑖 ) then the task is to measure the surface value 𝑧𝑖 at control point 𝑠𝑖 .

The choice of sampling locations is an important part of a field survey. Most of the time we do not have control over the sampling locations. But if we do,
we need to take care of the spatial variation over the study region. Thus, prior knowledge of the spatial auto-correlation of the surface in the study region may
help. Generally we need to take denser samples in those regions where the spatial variation is high.

Choose a Continuous Surface Description


After having the sampled data 𝑧𝑖 , 𝑠𝑖 we need to establish a continuous surface description 𝑓 such that 𝑧𝑖 ≈ 𝑓(𝑠𝑖 ).

This representation can be stored as:

Contour lines of a continuous surface representation, where each contour line holds a constant field value.
A mathematical expression of the continuous field 𝑧𝑖 = 𝑓(𝑥𝑖 , 𝑦𝑖 ) .
The surface can be represented as a point cloud, for example LIDAR data. In addition, most digital elevation models are represented as a grid of sample
locations, generally stored as raster data.
Triangulated Irregular Networks: a triangle mesh is created from control points to represent the surface. The nodes of the triangles are chosen wisely
(Delaunay Triangulation) and every triangle represents a facet of the surface.

Predicting The Missing Values


Most of the time the sample locations do not cover the whole region. We need to predict the continuous surface at unobserved locations. If there is spatial
auto-correlation in the data we can say that the surface value at an unobserved location shall be somewhat similar to the surface values of nearby control points.
Thus we may use local spatial averaging to obtain a surface value at unknown locations.

Spatial Interpolation: Considering spatial auto-correlation, we may apply weighted averaging of nearby control points to obtain surface values at unknown
locations such that $z_j = \sum_i^M w_{j,i} z_i$.

Proximity Polygons: Use nearest neighbor to assign surface value to an unknown location.
Use k nearest neighbors (kNN): Use the local spatial average of k nearest neighbors.
Inverse Distance Weighting: Use a weighted local spatial average with weights $w_{i,j} \sim \frac{1}{d_{i,j}^{p}}$, where $p$ is the power of the distance.
Use Kernel Density Estimators (KDE) which calculate the weight based on a kernel function for example Gaussian Kernel.
Use local regression such as GWR.
Use Least Squares to obtain a global surface representation model. We can use polynomials of the coordinates, B-splines, Thin-plate Splines, NURBS,
Spherical Harmonics etc. to represent continuous fields and obtain coefficients of the associated basis functions.

Assess the quality of interpolation


All of the methods above have some tuning parameter associated with them. Generally we need to choose kernel width for KDE or the number of nearest
neighbors for kNN. In addition we may need a measure of the error associated with the interpolation. In this case we may:

Separate some of the control points for validation (validation set approach).
Randomly generate the validation set K times (K-fold cross-validation).
Each time leave one measurement out and test the resulting surface with this separated validation point (leave-one-out cross-validation), as sketched below.
In addition, if we are using a trend surface analysis method based on Least Squares, we may also produce an error map associated with our predicted
map.
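A minimal sketch of leave-one-out cross-validation for the kNN/IDW interpolator used earlier (assuming the eastings, northings and heights arrays from the random sampling step; the function name, the power and the distance clipping are illustrative choices) could look like this:

import numpy as np
import scipy.spatial

def loocv_idw_rmse(eastings, northings, heights, k=5, power=2):
    pts = np.vstack((eastings, northings)).T
    errors = []
    for i in range(len(heights)):
        mask = np.arange(len(heights)) != i              # hold out point i
        tree = scipy.spatial.KDTree(pts[mask])
        distances, neighbors = tree.query(pts[i], k)
        distances[distances < 1] = 1                     # avoid division by zero, as above
        w = 1.0 / distances**power
        w = w / w.sum()
        prediction = np.sum(w * heights[mask][neighbors])
        errors.append(prediction - heights[i])
    return np.sqrt(np.mean(np.array(errors)**2))

# e.g. compare two neighborhood sizes:
# print(loocv_idw_rmse(eastings, northings, heights, k=5),
#       loocv_idw_rmse(eastings, northings, heights, k=15))

Comparing the resulting RMSE values for different values of k (or different powers) is one simple way to choose the tuning parameters.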

Review Questions
1) List which kind of derived information can be obtained from fields.

2) List methods for deterministic spatial interpolation. Which of them pass through samples (interpolates data) and which do not (approximates).

3) What is the effect of increasing the number of neighbors in kNN or IDW interpolations?

4) What is the effect of sampling (observation locations) on the resulting interpolated fields?

5) Could you suggest a better sampling for the digital elevation model generation problem above? Where would you collect points if you had been in the field?

6) Does your selection of points in Q5 depend on the statistics of the field (mean and variations - geostatistical properties like variance)? Describe how.

Geostatistics Based Interpolation


We have seen in the previous section that, given the control data 𝑧𝑖 , 𝑠𝑖 , we could represent the continuous field by a trend surface. This
surface approximates the continuous field, thus there will be an error 𝑒𝑖 associated with the field value at each control point. In addition we may also have some
measurement error associated with the field value at the sample points. If we apply Least Squares for estimating the trend surface these errors are also called
residuals.
𝑧𝑖 = 𝑓(𝑠𝑖 ) + 𝑒𝑖

For example if we choose a polynomial model to represent the field

𝑓(𝑥𝑖 , 𝑦𝑖 ) = 𝛽0 + 𝛽1 𝑥𝑖 + 𝛽2 𝑦𝑖

then we can estimate the parameters 𝛽̂ = [𝛽0 , 𝛽1 , 𝛽2 ] by


𝛽̂ = (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑦

where the matrix 𝑋 is the design matrix given in the previous sections. We have discussed various ways of assessing the quality of a least squares
estimate, such as p-values or 𝑅2 . However, we still need to consider some important aspects of this analysis:

Is the mathematical model used to represent the surface complete? Are we missing some important variations in the field values?
Do the residuals show another spatial signal? As you may recall from our discussions about spatial auto-correlation, if the residuals only contain
measurement error, then they should be distributed IID (Independent and Identically Distributed) and they will not have spatial auto-correlation. If they show
some kind of spatial auto-correlation, this may indicate that we are missing a spatial signal when representing the surface with this trend surface (see the sketch after this list).
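A rough sketch of such a check computes Moran's I of the trend-surface residuals with a simple distance-based binary weight matrix; the residuals, eastings and northings arrays are assumed from the trend surface fit above, and the distance cutoff is an arbitrary illustrative choice:

import numpy as np

def morans_i(values, x, y, max_dist=2000.0):
    coords = np.vstack((x, y)).T
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    W = ((d > 0) & (d < max_dist)).astype(float)   # binary neighborhood weights
    dev = values - values.mean()
    num = (W * np.outer(dev, dev)).sum()
    den = (dev**2).sum()
    return len(values) / W.sum() * num / den

# Values near 0 suggest no spatial auto-correlation in the residuals;
# clearly positive values suggest the trend surface is missing a spatial signal.
# print(morans_i(residuals, eastings, northings))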

Measuring Spatial Variation


If we consider the field as a stochastic process and the measurements as its outcomes, we should consider the statistics of the field over the region. Generally, we
will consider the mean and the variance of the field as its main statistics. A stochastic process is called stationary if the mean and variance of the
field are constant. A weaker form of stationarity (wide-sense stationarity) can be stated as: the mean is constant and the autocovariance (autocorrelation)
depends only on the distance between measurement locations.

mean is constant over the region.


𝐸{𝑧𝑖 } = 𝜇

Auto correlation depends only on the distance between control points.


𝑅(𝑧𝑖 , 𝑧𝑗 ) = 𝑅(𝑑𝑖,𝑗 )

Another important concept with the above formulation is that the variation 𝑅 in the field values does not depend on the direction. Thus the auto correlation
structure is the same in all directions within the field. In this case we say that there is isotropy in the dataset. If there is a dependency between direction and
autocorrelation structure then it is an indicator of anisotropy.

But generally we do not know the auto-correlation structure of the field before obtaining measurements. Thus we need to construct it from the measurements.
This may be established by a variogram analysis of the dataset.

The variogram (generally called the experimental variogram since it depends on the data) is a scatter plot of the variation in field values at a specific distance.
$$2\gamma(d) = \frac{1}{n(d)} \sum_{d_{i,j} = d} (z_i - z_j)^2$$

where 𝑑 is the distance and 𝑛(𝑑) is the number of pairs 𝑧𝑖 , 𝑧𝑗 which have a distance of 𝑑𝑖,𝑗 between each other. However, you may directly observe that most of the
time we may not have enough point pairs at exactly the distance 𝑑 . Thus generally, we choose bins of distance 𝑑 ± Δ𝑑 , and choose pairs with distances that fall
into these bins.
$$2\gamma(d) = \frac{1}{n(d \pm \Delta d)} \sum_{d - \Delta d < d_{i,j} < d + \Delta d} (z_i - z_j)^2$$
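As a minimal numpy sketch of this binned estimator (assuming the eastings, northings and heights arrays from the sampling step above; the number of bins is an arbitrary choice), the experimental semi-variances could be computed directly before turning to the skgstat package below:

import numpy as np
from scipy.spatial.distance import pdist

coords = np.vstack((eastings, northings)).T
d_pairs = pdist(coords)                           # pairwise distances d_ij (condensed form)
z_pairs = pdist(heights.reshape(-1, 1))**2        # (z_i - z_j)^2 for the same pairs

bins = np.linspace(0, d_pairs.max(), 11)          # 10 distance bins
lags, semivariances = [], []
for lo, hi in zip(bins[:-1], bins[1:]):
    in_bin = (d_pairs >= lo) & (d_pairs < hi)
    if in_bin.any():
        lags.append(0.5 * (lo + hi))
        semivariances.append(0.5 * z_pairs[in_bin].mean())   # gamma(d): half the mean squared difference

print(list(zip(lags, semivariances)))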

In [21]: from skgstat import Variogram, OrdinaryKriging

V1 = Variogram(np.vstack((eastings, northings)).T, heights, normalize=False)


V1.n_lags = 10
V1.plot(show=False);

Warning: 'harmonize' is deprecated and will be removedwith the next release. You can add a 'SKG_SUPPRESS' environment
variable to suppress this warning.
Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.

Using the semi-variances computed from the dataset (blue dots in the figure) we can establish a mathematical model, depending on distance, for the variation in
the dataset, which we call a semi-variogram model. The general shape of a semi-variogram may be given as:

(https://gisgeography.com/semi-variogram-nugget-range-sill/)

In the figure semi-variance is given in the vertical axis and distance lag is given in the horizontal axis. When a mathematical model is fit to the semi-variances
calculated from the dataset the general shape of the model may be represented by the following terms:

Sill: the semi-variance value at which the semi-variogram curve flattens.
Range: the distance at which the semi-variogram flattens. This extent can also be used to limit the neighborhood taken for averaging.
Nugget: the intercept (the semi-variance of the model at zero distance). Ideally we expect the semi-variance to be zero at
zero distance lag, but we almost always observe a nugget since we are using distance binning and fitting a mathematical model to the observed semi-variances.

The mathematical models describing a semi-variogram depend on the dataset and its spatial variation. The following figure shows some of the mathematical
models used to represent semi-variograms.

(book:Geographic Information Analysis)

Depending on the variance in the dataset a corresponding semi-variogram shape will be generated. The figure given below shows potential semi-variograms that
represent the spatial variation of a field (e.g. a digital elevation model over a region) represented by a profile. Please also observe the change of the Range, Sill and
Nugget in the resulting variograms. For example, in the last figure, where we have a very fast-varying spatial structure in the dataset, we have a very small Range,
whereas in the first figure we have a very smooth surface and the resulting semi-variogram has a longer Range.

(book: Geographic Information Analysis)

In [22]: fig, _a = plt.subplots(2,3, figsize=(18, 10), sharex=True, sharey=True)

axes = _a.flatten()
for i, model in enumerate(('spherical', 'exponential', 'gaussian', 'matern', 'stable', 'cubic')):
    V1.model = model
    V1.plot(axes=axes[i], hist=False, show=False)
    axes[i].set_title('Model: %s; RMSE: %.2f' % (model, V1.rmse))

Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.
Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.
Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.
Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.
Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.
Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.
Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.
Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.
Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.
Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.
Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.
Warning: compiled_model is deprecated and will be removed. Use Variogram.fitted_model instead. You can add an SKG_SUP
PRESS environment variable to supress this warning.

/Users/muratd/Programs/tflow/lib/python3.7/site-packages/skgstat/models.py:17: RuntimeWarning: invalid value encountered in double_scalars
  return func(*args, **kwargs)
Ordinary Kriging
Having defined stationarity, isotropy and the semi-variogram, we may now define a geostatistics-based interpolation technique called Ordinary Kriging. Under the
assumption of stationarity, we can assume that the field has a constant mean over the region.
𝑧=𝜇+𝜖

where 𝜇 is the constant mean over the region and 𝜖 is a spatially autocorrelated process. Then the following conditions may be assumed for this stationary
process.

Unbiasedness
𝐸{𝑧(𝑠) − 𝑧(𝑠 + 𝑑)} = 0

and the variance between field values is expressed by a semi-variogram model 𝛾(𝑑):

$$Var(z(s), z(s + d)) = 2\gamma(d)$$

Then we may compute an interpolated value for a given location by a weighted sum of neighboring observations:
$$z_s = \sum_i w_i z_i$$

However, unlike our previous discussion of deterministic interpolation, the weights are chosen depending on location to satisfy both unbiasedness and
variance structure in the dataset. The weights that satisfy both of these conditions may be estimated pointwise by solving the following linear equation
𝐴𝑤 = 𝑏

where
$$A = \begin{bmatrix} \gamma(d_{11}) & \dots & \gamma(d_{1n}) & 1 \\ \gamma(d_{21}) & \dots & \gamma(d_{2n}) & 1 \\ \vdots & & \vdots & \vdots \\ \gamma(d_{n1}) & \dots & \gamma(d_{nn}) & 1 \\ 1 & \dots & 1 & 0 \end{bmatrix}$$
and
$$b = [\, \gamma(d_{1p}) \;\; \dots \;\; \gamma(d_{np}) \;\; 1 \,]$$

with
$$w = [\, w_1, w_2, w_3, \dots, \lambda \,]$$

where 𝜆 is a Lagrange multiplier. The equation system estimates weights such that the sum of weights is equal to 1 for unbiasedness, ∑𝑖 𝑤𝑖 = 1 , and
the variance structure of the interpolated point is similar to the variance structure of the data. There are different variants of Kriging in the literature with different assumptions
about the mean. For example, Universal Kriging assumes the mean is a trend surface.

In addition we can also estimate an error map for our predictions by


𝜎𝑝2 = 𝑤𝑇 𝑏
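As a minimal sketch of solving this system for a single prediction location (using a hypothetical exponential semi-variogram model and a few hypothetical control points, not the terrain data above), the ordinary kriging weights, prediction and variance could be obtained as follows:

import numpy as np

# Hypothetical exponential semi-variogram model with assumed sill, range and nugget
def gamma(d, sill=5000.0, rng=2000.0, nugget=0.0):
    return nugget + sill * (1.0 - np.exp(-d / rng))

# Hypothetical control points (x, y), their field values, and a prediction location
pts = np.array([[0.0, 0.0], [1000.0, 200.0], [300.0, 900.0], [1200.0, 1100.0]])
z = np.array([1100.0, 1150.0, 1120.0, 1180.0])
p = np.array([600.0, 600.0])

n = len(pts)
d_ij = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)  # distances between control points
d_ip = np.linalg.norm(pts - p, axis=-1)                            # distances to the prediction point

# Build the augmented system A w = b; the last row/column enforces sum(weights) = 1
A = np.ones((n + 1, n + 1))
A[:n, :n] = gamma(d_ij)
A[-1, -1] = 0.0
b = np.append(gamma(d_ip), 1.0)

sol = np.linalg.solve(A, b)
weights, lam = sol[:n], sol[-1]   # kriging weights and the Lagrange multiplier

z_p = weights @ z                 # interpolated value
sigma2_p = sol @ b                # prediction variance, sigma_p^2 = w^T b
print(z_p, sigma2_p, weights.sum())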

In [23]: def interpolate(V, ax, grid_e, grid_n):
    xx, yy = np.meshgrid(grid_e, grid_n)
    print (xx.min(), xx.max())
    ok = OrdinaryKriging(V, min_points=5, max_points=15, mode='exact')
    field = ok.transform(xx.ravel(), yy.ravel())

    field = field.reshape(len(grid_e), len(grid_n))

    art = ax.imshow(field, extent=(p1[0],p2[0],p1[1],p2[1]), cmap='hot')

    ax.set_title('%s model' % V.model.__name__)
    plt.colorbar(art, ax=ax)
    return field, ok.sigma.reshape(len(grid_e), len(grid_n))

print (dted.shape)

grid_e = []
grid_n = []
## Calculate map coordinates for grid points
dx = 4
dy = 4
for row in range(0, 400//dy):
    p = dataset.xy(row*dy, 0)
    grid_n.append(p[1])
for col in range(0, 400//dx):
    p = dataset.xy(0, col*dx)
    grid_e.append(p[0])

## Actually it is simpler to use a meshgrid to create this.

grid_e = np.array(grid_e)
grid_n = np.array(grid_n)

print (grid_e.shape, grid_n.shape, grid_e.min(), grid_n.min(), grid_e.max(), grid_n.max())

fields = []
fig, ax = plt.subplots(1, 1, figsize=(18, 12), sharex=True, sharey=True)
V1.model = 'exponential'
field, sigma = interpolate(V1, ax, grid_e, grid_n)

(451, 442)
(100,) (100,) 3840621.9500124436 4849999.262507983 3852867.071867647 4865859.425142129
3840621.9500124436 3852867.071867647

In [24]: plt.figure(figsize=(10,10))
ax = plt.subplot(111)
plt.imshow(field, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()

Out[24]: <matplotlib.colorbar.Colorbar at 0x13fb4d450>

In [25]: plt.figure(figsize=(10,10))
ax = plt.subplot(111)
plt.imshow(sigma, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.colorbar()

Out[25]: <matplotlib.colorbar.Colorbar at 0x13f9d0710>

General steps to apply Ordinary Kriging


Produce a description of the variance in the dataset. Check for isotropy! You may check for isotropy by plotting the variances not only for distance bins
but also for direction bins of Δ𝜃 . In case of anisotropy you may also want to use 𝜃 as a variable for representing the variation.
Fit a mathematical model to the spatial variation to represent the semi-variogram, for example spherical or exponential.
Create an empty grid of the study region.
For every grid location, use the semi-variogram model and the Kriging equations to obtain optimal weights.
For every grid location, using the optimal weights, calculate the interpolated field value.
For every grid location, using the optimal weights, calculate the error of the prediction.
Plot the grid as a raster representation of the field.
