Field Analysis
A scalar field is represented by a function of location, $z = f(x, y)$, where $z$ is the attribute value and $x$ and $y$ are coordinates inside the region. Thus, $f(x, y)$ represents the variation of the attribute value with location over the region.
Vector fields are represented by a vector-valued function of location. For example, a wind dataset from meteorology may be represented by a vector field in which both speed and direction are encoded as a vector quantity.
A Matrix of Attribute Values For example, the height values over a region are represented by a matrix, where the region is modelled as a grid of equal-area quads (generally known as a Digital Elevation Model).
Triangulated Irregular Network Every triangle represents a facet of the surface. The vertices of the TIN are generally selected to represent characteristic points (Very Important Points) of the surface, such as peaks or valleys.
Digitized Contours A set of level curves of constant attribute value.
Mathematical Models The variation of the attribute values is represented by a function.
Point Clouds The surface is represented by a dense point cloud, mostly acquired by a scanning device such as LIDAR or by photogrammetric methods.
In order to derive a field representation of an attribute over a region, we need to collect measurements of attribute values at known locations in the field.
Due to many practical reasons (can you mention a few?) we may not be able to collect measurements everywhere in the field.
Fields may also have time dependency, so samples from different times may not be usable together.
Most of the time we cannot control the data collection procedure.
Thus, we need tools and procedures to obtain continuous field representation from available field observations.
Generally, interpolated values are equal to the observed field values at the locations where samples are collected (the interpolator passes through the sample values at the observation locations).
Some interpolation techniques (e.g. Least Squares interpolation) result in interpolators which do not necessarily pass through the sample points; at those locations we obtain an approximate value instead.
Proximity Polygons
The most general and straightforward method for deterministic interpolation is the nearest neighbour interpolator, which assigns to a location the value of the nearest observation. The general method is to construct a proximity polygon (Voronoi) mesh over the region using the observed locations and assign the observed attribute value to each polygonal region.
More generally, we can interpolate by a weighted average of nearby observations, where the weights are a decreasing function of distance, for example a power of the inverse distances
$$w_{i,j} = \frac{1}{d_{i,j}^{p}},$$
or exponentials
$$w_{i,j} = e^{-k\, d_{i,j}}.$$
In addition, nearest neighbour, bilinear, or cubic spline interpolation is commonly used for up- and down-sampling of raster datasets.
(wikipedia)
Slope
The slope of a surface is the maximum rate of change in the attribute value (for example elevation).
$$\tan(\theta) = \frac{\Delta z}{\lVert (\Delta x, \Delta y) \rVert}$$
It can also be specified by the gradient vector, which consists of the partial derivatives of the surface with respect to $x$ and $y$:
$$\nabla z = \begin{bmatrix} \dfrac{\partial z}{\partial x} \\[4pt] \dfrac{\partial z}{\partial y} \end{bmatrix}$$
When the field is represented as a regular grid, a numerical approximation of the partial derivatives can be obtained by finite (e.g. central) differences, $\partial z/\partial x \approx (z_{i,j+1} - z_{i,j-1})/(2\Delta x)$ and $\partial z/\partial y \approx (z_{i+1,j} - z_{i-1,j})/(2\Delta y)$.
print (dted.shape)
print (p1, p2)
EPSG:3857
(451, 442)
(3840621.9500124436, 4847796.4621421285) (3854289.4850124437, 4865859.425142129)
In [3]: plt.imshow(dted,cmap='cool')
plt.colorbar()
1060 1556
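The code that produces the partial_x, partial_y, slope and aspect arrays used below is not visible in this extract; a minimal sketch of how they might be computed with NumPy is given here. The pixel sizes are derived from the extent printed above, and the aspect sign convention is an assumption that depends on the raster orientation.
import numpy as np

## approximate pixel sizes from the raster extent (p1, p2 are the corner coordinates printed above)
pix_x = (p2[0] - p1[0]) / dted.shape[1]
pix_y = (p2[1] - p1[1]) / dted.shape[0]

## np.gradient applies central differences in the interior; axis 0 is rows, axis 1 is columns
partial_y, partial_x = np.gradient(dted.astype(float), pix_y, pix_x)

## slope angle in radians: arctan of the gradient magnitude
slope = np.arctan(np.hypot(partial_x, partial_y))

## aspect in radians in (-pi, pi]; one possible convention, shifted to [0, 2*pi) below
aspect = np.arctan2(-partial_x, partial_y)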
plt.imshow(partial_x)
plt.imshow(partial_y)
aspect[aspect<0] +=(2*np.pi)
print (aspect.min(), aspect.max())
plt.imshow(aspect,cmap='hot')
plt.colorbar()
0.0 6.27705041482171
plt.imshow(np.degrees(slope), cmap='hot')
plt.colorbar()
sun_elevation = np.radians(45)
sun_azimuth = np.radians(315)
shade = (np.cos(sun_azimuth-aspect)*np.sin(slope)*np.cos(sun_elevation)
         + np.cos(slope)*np.sin(sun_elevation))
plt.figure(figsize=(10,10))
plt.imshow(shade,cmap='Greys_r')
plt.colorbar()
plt.figure(figsize=(10,10))
plt.imshow(dted,cmap='gist_earth')
plt.colorbar()
plt.imshow(shade,cmap='Greys_r', alpha=.40)
plt.figure(figsize=(10,10))
plt.imshow(dted,extent=(p1[0],p2[0],p1[1],p2[1]),cmap='gist_earth')
plt.colorbar()
plt.imshow(shade,extent=(p1[0],p2[0],p1[1],p2[1]),cmap='Greys_r', alpha=.40)
X,Y = np.meshgrid(X,Y)
To simulate field observations we will select eastings by uniform random sampling along the easting extent; a similar procedure will be applied for the northings. Then we will use the elevation map to look up the height values at the sampled locations.
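The sampling code itself is not shown in the extract; a minimal sketch, assuming rasterio's dataset.index is available to convert map coordinates to pixel indices and using 100 samples (consistent with the (100, 2) shape printed further below):
rng = np.random.default_rng(42)                      ## hypothetical fixed seed
n_samples = 100
eastings = rng.uniform(p1[0], p2[0], n_samples)
northings = rng.uniform(p1[1], p2[1], n_samples)
heights = []
for e, n in zip(eastings, northings):
    row, col = dataset.index(e, n)                   ## rasterio: map coords -> (row, col)
    row = min(row, dted.shape[0] - 1)                ## guard against the raster edge
    col = min(col, dted.shape[1] - 1)
    heights.append(dted[row, col])
heights = np.array(heights)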
## Let's plot the height values and the associated measurements from the field
plt.figure(figsize=(10,10))
plt.imshow(dted,extent=(p1[0],p2[0],p1[1],p2[1]),cmap='gist_earth')
plt.imshow(shade,extent=(p1[0],p2[0],p1[1],p2[1]),cmap='Greys_r', alpha=.40)
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()
import scipy.spatial
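The construction of the sample point array and the spatial index is not visible here; a sketch, assuming a scipy.spatial.cKDTree is used for the neighbour queries below:
data_points = np.column_stack((eastings, northings))     ## one (easting, northing) pair per sample
tree = scipy.spatial.cKDTree(data_points)                ## k-d tree for fast nearest-neighbour queries
print(data_points.shape)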
(100, 2)
## For each pixel in the new dataset we will search for k nearest neighbors and inverse distance weight
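The interpolation loop itself does not appear in the extract; a minimal sketch of an inverse-distance-weighted k nearest neighbour interpolation (k = 5 and a distance power of 1 are hypothetical choices) might look like this:
knn_raster = np.zeros_like(dted)
k = 5
## a straightforward but slow double loop; vectorised queries over all pixels are also possible
for row in range(knn_raster.shape[0]):
    for col in range(knn_raster.shape[1]):
        e, n = dataset.xy(row, col)                     ## pixel centre in map coordinates
        dists, idx = tree.query([e, n], k=k)            ## k nearest sample points
        dists = np.maximum(dists, 1e-9)                 ## avoid division by zero at sample locations
        w = 1.0 / dists
        knn_raster[row, col] = np.sum(w * heights[idx]) / np.sum(w)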
plt.figure(figsize=(10,10))
plt.imshow(knn_raster, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()
When using the k nearest neighbours, increasing the number of neighbours used ($k$) produces smoother and smoother fields. In fact, if $k = N$ we use all of the data in the local average, which then equals the global average of the data.
## For each pixel in the new dataset we will search for the k nearest neighbors and take their average
## (knn_raster_avg is assumed to be initialised like knn_raster; the surrounding loop and the
##  neighbour query are analogous to the IDW sketch above)
height = np.sum(h_neighbors)/len(neighbors)
knn_raster_avg[row,col] = int(height)
plt.figure(figsize=(10,10))
plt.imshow(knn_raster_avg, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()
voronoi = scipy.spatial.Voronoi(data_points)
knn_raster_voronoi = np.zeros_like(dted)
k = 1 # first nearest neighbor
## For each pixel, assign the height of its single nearest neighbour (k = 1); the surrounding loop is
## analogous to the ones above, and the result reproduces the proximity (Voronoi) polygons
knn_raster_voronoi[row,col] = int(h_neighbor)
plt.figure(figsize=(10,10))
ax = plt.subplot(111)
plt.imshow(knn_raster_voronoi, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()
Trend Surface Analysis
Trend surface analysis is a kind of multivariate regression analysis in which the independent variables are the locations and the dependent variable is the field value.
In practice, we may also add other attribute values as independent variables to the multivariate regression for a better field representation.
For a given location 𝑠𝑗 = (𝑥𝑗 , 𝑦𝑗 ) , the field is represented as a mathematical function of location as
𝑓(𝑠𝑗 )
The measurements collected from the field are assumed to follow the measurement equation
𝑧𝑗 = 𝑓(𝑠𝑗 ) + 𝑒𝑗
Trend surface is generally expressed as a collection of basis functions with associated unknown parameters.
$$f(s) = \sum_{i=1}^{M} \beta_i\, g_i(s)$$
where $M$ is the number of basis functions in the model. The basis functions $g_i$ are generally selected depending on the field characteristics. The most commonly used basis set is the polynomial basis, which may be represented as
$$g_i(s) = x^{m(i)}\, y^{n(i)},$$
where $m(i)$ and $n(i)$ are the degrees of the $i$-th term in $x$ and $y$. For example, for a first degree polynomial
$$f(s) = \beta_{0,0}\, x^{0} y^{0} + \beta_{1,0}\, x^{1} y^{0} + \beta_{0,1}\, x^{0} y^{1} + \beta_{1,1}\, x^{1} y^{1}$$
or, in matrix form,
$$y = X\boldsymbol{\beta} + e,$$
where $X$ is the design matrix whose rows contain the basis functions evaluated at the observation locations. The least squares estimate of the parameters is $\hat{\boldsymbol{\beta}} = (X^{T} X)^{-1} X^{T} y$.
In addition, we can also construct an estimate of measurement error variance assuming that the measurement errors are normally distributed.
$$\hat{\sigma}^{2} = \frac{(y - X\hat{\boldsymbol{\beta}})^{T}(y - X\hat{\boldsymbol{\beta}})}{N - M} = \frac{\hat{e}^{T}\hat{e}}{N - M}$$
where 𝑁 is the number of measurements, and 𝑀 is the number of parameters. Additionally, variance and covariance of parameters can be estimated as
$$D(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^{2} (X^{T} X)^{-1}$$
Using the estimated parameters $\hat{\boldsymbol{\beta}}$ we can approximate the field value at a given location $(x, y)$ by using the mathematical expression of our field representation:
$$\hat{h} = \hat{f}(x, y) = \hat{\beta}_{0,0}\, x^{0} y^{0} + \hat{\beta}_{1,0}\, x^{1} y^{0} + \hat{\beta}_{0,1}\, x^{0} y^{1} + \hat{\beta}_{1,1}\, x^{1} y^{1}$$
Generally we will evaluate the surface on a regular grid, as we did with the previous weighted-average interpolation techniques. In this case we can construct a new $X$ matrix that contains a row for each pixel of the grid and use
$$h_{grid} = X_{grid}\, \hat{\boldsymbol{\beta}}$$
However, note that our estimated parameters are also random variables with mean $\hat{\boldsymbol{\beta}}$ and variance $D(\hat{\boldsymbol{\beta}})$. Thus the predicted height values will also have some uncertainty associated with them. Using the error propagation law we can calculate the uncertainty of the calculated height values as
$$D(h_{grid}) = X_{grid}\, D(\hat{\boldsymbol{\beta}})\, X_{grid}^{T}$$
We can use the diagonal of $D(h_{grid})$ as the estimated uncertainty of the grid values. This gives a kind of error map which shows the uncertainty in the approximated height values.
Y = heights
x = eastings
y = northings
## Important notice, using large x and y values may reveal ugly numerical problems with powers or multiplications
## Generally x and y values are scaled between -1 and 1 or 0 and 1.
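The scaling step itself is not shown in the extract; a minimal sketch, assuming a simple min-max normalisation of the sample coordinates to [0, 1] (the same bounds are reused for the grid coordinates further below):
eastings = np.asarray(eastings, dtype=float)
northings = np.asarray(northings, dtype=float)
e_min, e_max = eastings.min(), eastings.max()
n_min, n_max = northings.min(), northings.max()
x = (eastings - e_min) / (e_max - e_min)    ## normalised sample eastings
y = (northings - n_min) / (n_max - n_min)   ## normalised sample northings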
X = np.vstack(
(
np.ones_like(heights),
x,
y,
x*y
# ,
# x**2,
# x**3,
# y**2,
# y**3,
# x**2 * y,
# y**2*x
)
).T
betas = np.linalg.lstsq(X,Y,rcond=None)[0]
residuals = Y - np.dot(X,betas)
var = np.sum(residuals**2)/(len(Y)-4)
var_beta = var*np.linalg.inv(np.dot(X.T,X))
print ('measurement variance', var, 'estimated parameters', betas, 'and associated uncertainty', np.sqrt(np.diag(var_beta)))
ts_raster = np.zeros_like(dted)
## Evaluate the fitted trend surface at every pixel of the grid
grid_e = []
grid_n = []
## Calculate map coordinates for grid points
for row in range(ts_raster.shape[0]):
for col in range(ts_raster.shape[1]):
p = dataset.xy(row, col)
grid_e.append(p[0])
grid_n.append(p[1])
grid_e = np.array(grid_e)
grid_n = np.array(grid_n)
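The grid coordinates are assumed to be normalised with the same bounds as the sample coordinates; a sketch producing the *_norm arrays used below:
grid_e_norm = (grid_e - e_min) / (e_max - e_min)
grid_n_norm = (grid_n - n_min) / (n_max - n_min)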
X_grid = np.vstack(
(
np.ones_like(grid_e_norm),
grid_e_norm,
grid_n_norm,
grid_e_norm*grid_n_norm
# ,
# grid_e**2,
# grid_e**3,
# grid_n**2,
# grid_n**3,
# grid_e**2 * grid_n,
# grid_n**2 * grid_e
)
).T
print (X_grid.shape)
h_grid = np.dot(X_grid,betas)
ts_raster[:] = h_grid.reshape(ts_raster.shape[0],ts_raster.shape[1])
plt.figure(figsize=(10,10))
ax = plt.subplot(111)
plt.imshow(ts_raster, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()
In [19]: # We can also construct an error map according to error propagation
## note: this uses only the parameter variances (the diagonal of var_beta) and neglects their
## covariances; the full propagation would take the diagonal of X_grid @ var_beta @ X_grid.T
d_h = np.sqrt(np.dot(X_grid**2, np.diag(var_beta)))
error_map = d_h.reshape(ts_raster.shape[0], ts_raster.shape[1])
plt.figure(figsize=(10,10))
ax = plt.subplot(111)
plt.imshow(error_map, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.abs(residuals), c=heights, cmap='hot', edgecolor='black')  ## marker sizes must be non-negative
plt.colorbar()
In [30]: ## We can also use more complicated basis expansions for interpolation
## Here we use radial basis functions to interpolate the data
W = (heights.max() - heights.min())
Min = heights.min()
h = (heights - heights.min())/W
z = np.zeros_like(eastings)
z_grid = np.zeros_like(grid_e_norm)
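The radial basis expansion itself is not visible in the extract; a minimal sketch using Gaussian radial basis functions centred at the sample points (the shape parameter epsilon is a hypothetical choice) that produces the d_rbf values rescaled below:
## Gaussian RBF design matrix between the (normalised) sample points
epsilon = 5.0
d_samples = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
Phi = np.exp(-(epsilon * d_samples) ** 2)
coeffs = np.linalg.lstsq(Phi, h, rcond=None)[0]          ## fit coefficients to the normalised heights

## evaluate the expansion at every (normalised) grid location
## (builds a len(grid) x N matrix; chunking may be needed for larger grids)
d_grid = np.hypot(grid_e_norm[:, None] - x[None, :],
                  grid_n_norm[:, None] - y[None, :])
Phi_grid = np.exp(-(epsilon * d_grid) ** 2)
d_rbf = np.dot(Phi_grid, coeffs)                          ## still in normalised units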
d_rbf = d_rbf*W+Min
d_rbf = d_rbf.reshape(ts_raster.shape[0],ts_raster.shape[1])
plt.figure(figsize=(10,10))
ax = plt.subplot(111)
plt.imshow(d_rbf, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()
The choice of sampling locations is an important part of a field survey. Most of the time we do not have control over the sampling locations, but when we do, we need to take the spatial variation over the study region into account. Thus, prior knowledge of the spatial auto-correlation of the surface in the study region may help. Generally we need to take denser samples in regions where the spatial variation is high.
Contour lines of a continuous surface representation, where each contour line holds a constant field value.
A mathematical expression of the continuous field, $z_i = f(x_i, y_i)$.
The surface can be represented as a point cloud, for example LIDAR data. In addition, most digital elevation models are represented as a grid of sample locations, generally stored as raster data.
Triangulated Irregular Networks: a triangle mesh is created from control points to represent the surface. The nodes of the triangles are chosen wisely (Delaunay triangulation) and every triangle represents a facet of the surface.
Spatial Interpolation: Considering spatial auto-correlation, we may apply weighted averaging of nearby control points to obtain surface values at unknown locations, such that $z_j = \sum_{i}^{M} w_{j,i}\, z_i$.
Proximity Polygons: Use nearest neighbor to assign surface value to an unknown location.
Use k nearest neighbors (kNN): Use the local spatial average of k nearest neighbors.
Inverse Distance Weighting: Use a weighted local spatial average with weights $w_{i,j} \sim \frac{1}{d_{i,j}^{p}}$, where $p$ is the power of the distance.
Use Kernel Density Estimators (KDE), which calculate the weights based on a kernel function, for example a Gaussian kernel.
Use local regression such as GWR.
Use Least Squares to obtain a global surface representation model. We can use polynomials of the coordinates, B-splines, thin-plate splines, NURBS, spherical harmonics, etc. to represent continuous fields and estimate the coefficients of the associated basis functions.
To assess the quality of an interpolated surface we can:
Separate some of the control points for validation (validation set approach).
Randomly generate the validation set K times (K-fold cross validation).
Each time leave one measurement out and test the resulting surface with this separated validation point (leave-one-out cross validation).
In addition, if we are using a trend surface analysis method based on Least Squares, we may also produce an error map associated with our predicted map.
Review Questions
1) List which kind of derived information can be obtained from fields.
2) List methods for deterministic spatial interpolation. Which of them pass through samples (interpolates data) and which do not (approximates).
3) What is the effect of increasing the number of neighbors in kNN or IDW interpolation?
4) What is the effect of sampling (observation locations) on the resulting interpolated fields?
5) Could you suggest a better sampling for the digital elevation model generation problem above? Where would you collect points if you had been in the field?
6) Does your selection of points in Q5 depend on the statistics of the field (mean and variation, i.e. geostatistical properties like variance)? Describe how.
Recall the trend surface model $y = X\boldsymbol{\beta} + e$, where the matrix $X$ is the design matrix given in the previous sections. We have discussed various ways of assessing the quality of a least squares estimate, such as p-values or $R^2$. However, we still need to consider some important aspects of this analysis:
Is the mathematical model used to represent the surface complete? Are we missing some important variations in the field values?
Do the residuals show another spatial signal? As you may recall from our discussions about spatial auto-correlation, if the residuals contain only measurement error, then they will potentially be distributed IID (independent and identically distributed) and they will not have spatial auto-correlation. If they do show some kind of spatial auto-correlation, this may indicate that we are missing a spatial signal when representing the surface with this trend surface.
Another important concept in the above formulation is that the variation $R$ in the field values does not depend on direction; the auto-correlation structure is the same in all directions within the field. In this case we say there is isotropy in the dataset. If the auto-correlation structure depends on direction, this is an indicator of anisotropy.
But generally we do not know the auto-correlation structure of the field before obtaining measurements, so we need to construct it from the measurements. This may be established by a variogram analysis of the dataset.
The variogram (generally called the experimental variogram, since it depends on the data) is a scatter plot of the variation in field values at a given distance:
$$2\gamma(d) = \frac{1}{n(d)} \sum_{d_{i,j} = d} (z_i - z_j)^2$$
Here $d$ is the distance and $n(d)$ is the number of pairs $z_i, z_j$ which have a distance $d_{i,j} = d$ between each other. However, you may directly observe that most of the time we do not have enough point pairs at exactly the distance $d$. Thus, generally, we choose bins of distance $d \pm \Delta d$ and collect the pairs whose distances fall into these bins:
$$2\gamma(d) = \frac{1}{n(d \pm \Delta d)} \sum_{d - \Delta d < d_{i,j} < d + \Delta d} (z_i - z_j)^2$$
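A minimal sketch of computing such a binned experimental semi-variogram directly from the sample points (the number of bins is a hypothetical choice; libraries such as scikit-gstat, used further below, do this and the model fitting for you):
from scipy.spatial.distance import pdist, squareform

dists = squareform(pdist(data_points))                   ## pairwise distances between samples
sq_diff = (heights[:, None] - heights[None, :]) ** 2     ## squared value differences

bins = np.linspace(0, dists.max(), 15)                   ## hypothetical distance binning
gamma = []
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (dists > lo) & (dists <= hi)
    ## each pair appears twice in the symmetric matrices, which does not change the mean
    gamma.append(0.5 * sq_diff[mask].mean() if mask.any() else np.nan)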
Using the semi-variances computed from the dataset (blue dots in the figure) we can establish a mathematical model of the variation as a function of distance, which we call a semi-variogram model. The general shape of a semi-variogram may be given as follows:
(https://gisgeography.com/semi-variogram-nugget-range-sill/)
In the figure the semi-variance is given on the vertical axis and the distance lag on the horizontal axis. When a mathematical model is fit to the semi-variances calculated from the dataset, the general shape of the model may be described by the following terms:
Sill: the semi-variance value at which the semi-variogram curve flattens.
Range: the distance at which the semi-variogram flattens. This extent can also be used to limit the neighbourhood taken for averaging.
Nugget: the intercept (the semi-variance of the semi-variogram model at zero distance). In principle we expect the variance to be zero at zero distance lag, but we almost always observe a nugget since we use distance binning and fit a mathematical model to the observed semi-variances.
The mathematical model describing a semi-variogram depends on the dataset and its spatial variation. The following figure shows some of the mathematical models used to represent semi-variograms.
Depending on the variation in the dataset, a corresponding semi-variogram shape will be generated. The figure below shows potential semi-variograms that represent the spatial variation of a field (e.g. a digital elevation model over a region) represented by a profile. Please also observe the change of the Range, Sill and Nugget in the resulting variograms. For example, in the last figure, where we have a very rapidly varying spatial structure in the dataset, we get a very small Range, whereas in the first figure we have a very smooth surface and the resulting semi-variogram has a longer Range.
Kriging
In Kriging the field is modelled as $z(s) = \mu + \epsilon(s)$, where $\mu$ is the constant mean over the region and $\epsilon$ is a spatially auto-correlated process. Then the following conditions may be assumed for this stationary process.
Unbiasedness
$$E\{z(s) - z(s + d)\} = 0$$
Stationary variance: the variance of the difference depends only on the separation $d$ (through the semi-variogram), not on the location,
$$\mathrm{Var}\{z(s) - z(s + d)\} = 2\gamma(d)$$
Then we may compute an interpolated value for a given location by a weighted sum of neighboring observations.
$$z_s = \sum_{i} w_i\, z_i$$
However, unlike in our previous discussion of deterministic interpolation, the weights are chosen depending on the location so as to satisfy both unbiasedness and the variance structure of the dataset. The weights that satisfy both conditions may be estimated pointwise by solving the linear system
𝐴𝑤 = 𝑏
where
$$A = \begin{bmatrix} \gamma(d_{11}) & \cdots & \gamma(d_{1n}) & 1 \\ \gamma(d_{21}) & \cdots & \gamma(d_{2n}) & 1 \\ \vdots & & \vdots & \vdots \\ \gamma(d_{n1}) & \cdots & \gamma(d_{nn}) & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix}$$
and
$$b = \begin{bmatrix} \gamma(d_{1p}) & \cdots & \gamma(d_{np}) & 1 \end{bmatrix}^{T}$$
with
$$w = \begin{bmatrix} w_1 & w_2 & \cdots & w_n & \lambda \end{bmatrix}^{T}$$
where $\lambda$ is a Lagrange multiplier and $d_{ip}$ is the distance between observation $i$ and the prediction location $p$. The system estimates weights such that the sum of the weights equals one, $\sum_i w_i = 1$, for unbiasedness, and such that the variance structure of the interpolated point is consistent with the variance structure of the data. There are different variants of Kriging in the literature with different assumptions about the mean; for example, Universal Kriging assumes the mean is a trend surface.
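A minimal sketch of solving this system for a single prediction point, assuming gamma is a fitted semi-variogram model (a callable of the lag distance) and data_points / heights are the observations used earlier:
def krige_point(p, points, values, gamma):
    ## Ordinary kriging estimate and variance at location p (a sketch)
    n = len(values)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)   ## pairwise distances
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)                                  ## semi-variances between observations
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(points - p, axis=1))     ## semi-variances to the prediction point
    sol = np.linalg.solve(A, b)                           ## weights w_1..w_n and the Lagrange multiplier
    estimate = np.dot(sol[:n], values)
    variance = np.dot(sol, b)                             ## kriging (estimation) variance
    return estimate, variance
Solving one (n+1) x (n+1) system per prediction point is fine for small datasets; libraries such as scikit-gstat or PyKrige do this more efficiently.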
print (dted.shape)
grid_e = []
grid_n = []
## Calculate map coordinates for grid points
dx = 4
dy = 4
for row in range(0,400//dy,):
p = dataset.xy(row*dy, 0)
grid_n.append(p[1])
for col in range(0,400//dx):
p = dataset.xy(0, col*dx)
grid_e.append(p[0])
grid_e = np.array(grid_e)
grid_n = np.array(grid_n)
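The interpolate helper used below is not part of the extract; a sketch of what it might do, assuming V1 is a scikit-gstat Variogram fitted to the samples (its fitted_model attribute is assumed to evaluate the fitted semi-variogram at a given lag distance) and reusing krige_point from the sketch above:
def interpolate(V, ax, grid_e, grid_n):
    ## Ordinary kriging of the sample heights onto the grid (a sketch)
    gamma = np.vectorize(V.fitted_model)        ## wrapped so it accepts arrays of lag distances (assumption)
    ee, nn = np.meshgrid(grid_e, grid_n)
    pts = np.column_stack((ee.ravel(), nn.ravel()))
    est = np.empty(len(pts))
    var = np.empty(len(pts))
    for i, p in enumerate(pts):
        est[i], var[i] = krige_point(p, data_points, heights, gamma)
    ax.plot(V.bins, V.experimental, '.')        ## assumed: show the experimental variogram on the given axes
    return est, var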
fields = []
fig, ax = plt.subplots(1,1, figsize=(18, 12), sharex=True, sharey=True)
V1.model = 'exponential'
field, sigma = interpolate(V1, ax, grid_e, grid_n)
field = field.reshape(len(grid_e),len(grid_n))      ## reshape the flat result onto the grid
sigma = sigma.reshape(len(grid_e),len(grid_n))      ## assumed: the kriging variance is reshaped the same way
(451, 442)
(100,) (100,) 3840621.9500124436 4849999.262507983 3852867.071867647 4865859.425142129
3840621.9500124436 3852867.071867647
In [24]: plt.figure(figsize=(10,10))
ax = plt.subplot(111)
plt.imshow(field, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.scatter(eastings, northings, s=np.array(heights)-1000, c=heights, cmap='hot',edgecolor='black')
plt.colorbar()
In [25]: plt.figure(figsize=(10,10))
ax = plt.subplot(111)
plt.imshow(sigma, extent=(p1[0],p2[0],p1[1],p2[1]),cmap='hot')
plt.colorbar()