Part III - Geographic Information and Spatial Data Types
Introduction
• Geographic phenomena exist in the real world: for true examples, one
has to look outside the window.
• In using GIS software, we first obtain computer representations
of these phenomena, stored in memory as bits and bytes, that are as
faithful as possible.
• The representation of the real world phenomena in GIS or any other
system is called modelling.
• Reality is so complex that one can never succeed in representing it in
every detail on a map or in a GIS.
• As a result one has to work with a spatial model (simplification) of
reality.
…cont’d
• The model allows us to operate on the model instead of the real
world.
• Using models, it is possible to test what happens under various
conditions.
• ‘What if’ questions can also be answered.
• Modelling is the process of producing an abstraction of the ‘real
world’ so that some part of it can be more easily handled.
2. Geographic phenomena
2.1. Definition of geographic phenomena
• We might define a geographic phenomenon as something of interest that:
• can be named or described,
• can be georeferenced, and
• can be assigned a time (interval) at which it is/was present.
• What the relevant phenomena are for one’s current use of GIS depends entirely
on the objectives that one has.
• For instance, in water management, the objects of study can be river basins,
agro-ecologic units, measurements of actual evapotranspiration,
meteorological data, ground water levels, irrigation levels, water budgets and
measurements of total water use. Observe that all of these can be
named/described, georeferenced and provided with a time interval at which
each exists.
2.1.2 Different types of geographic phenomena
• The definition of geographic phenomena given above is necessarily
abstract, and therefore perhaps somewhat difficult to grasp.
• The main reason for this is that geographic phenomena come in so
many different ‘flavours’. We will now try to categorize the different
‘flavours’ of geographic phenomena.
• To represent a geographic phenomenon in a GIS, we must state
what it is and where it is.
• We must provide a description or at least a name on the one hand,
and a georeference on the other hand.
• Some phenomena manifest themselves essentially everywhere in the
study area, while others only do so in certain localities.
• Hence, we can classify geographic phenomena as geographic fields
and geographic objects.
2.1.2 Different types of geographic phenomena
• If we define our study area as the equatorial Pacific Ocean, we can say that
Sea Surface Temperature can be measured anywhere in the study area.
Therefore, it is a typical example of a (geographic) field.
2.1.3 Geographic fields
• A field is a geographic phenomenon that has a value ‘everywhere’ in the
study space.
• We can therefore think of a field f as a function from any position in the
study space to the domain of values of the field. If (x, y) is a position in
the study area then f(x, y) stands for the value of the field f at locality (x,
y).
• Fields can be discrete or continuous, and if they are continuous, they can
even be differentiable.
• In a continuous field, the underlying function is assumed to be
continuous, such as is the case for temperature, barometric pressure or
elevation.
2.1.3 Geographic fields
• Continuity means that all changes in field values are gradual.
• A continuous field can even be differentiable.
• In a differentiable field we can determine a measure of change (in the
field value) per unit of distance anywhere and in any direction.
• If the field is elevation, this measure would be slope, i.e., the change
of elevation per metre distance; if the field is soil salinity, it would be
salinity gradient, i.e., the change of salinity per metre distance.
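The idea of a measure of change per unit of distance can be sketched numerically. The example below is a minimal illustration, assuming a hypothetical elevation field sampled on a regular grid with a 10 m spacing; the names `elevation` and `cell_size` and the tilted-plane values are invented for illustration.

```python
import numpy as np

# Hypothetical elevation field (metres) on a regular grid; values are illustrative.
cell_size = 10.0  # assumed grid spacing in metres
y, x = np.mgrid[0:5, 0:5] * cell_size
elevation = 100.0 + 0.2 * x + 0.1 * y  # a tilted plane

# np.gradient returns the change per unit distance along each axis
# (rows, i.e. y, first; columns, i.e. x, second).
dz_dy, dz_dx = np.gradient(elevation, cell_size)

# Slope: magnitude of the steepest change in elevation per metre,
# and the same value expressed as an angle in degrees.
slope = np.hypot(dz_dx, dz_dy)
slope_degrees = np.degrees(np.arctan(slope))
```

For a field that is not differentiable everywhere (such as elevation with vertical cliffs) the finite differences would still return numbers, but they would no longer be a meaningful slope at the discontinuity.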
2.1.3 Geographic fields
• There are many variations of non-continuous fields, the simplest
example being elevation in a study area with perfectly vertical cliffs.
• At the cliffs there is a sudden change in elevation values.
• An important class of non-continuous fields are the discrete fields.
• Discrete fields cut up the study space in mutually exclusive, bounded
parts, with all locations in one part having the same field value.
• Typical examples are land classifications, for instance, using either
geological classes, soil type, land use type, crop type or natural
vegetation type.
2.1.3 Geographic fields
• One may note that discrete fields are a step from continuous fields
towards geographic objects: discrete fields as well as objects make
use of ‘bounded’ features.
• Observe, however, that a discrete field still assigns a value to every
location in the study area, something that is not typical of geographic
objects.
• A field-based model consists of a finite collection of geographic fields:
we may be interested in elevation, barometric pressure, mean annual
rainfall, and maximum daily evapotranspiration, and thus use four
different fields.
…cont’d
– The field-based model is mostly applied to describe
phenomena that are characterized by a continuous spatial
variation (e.g. temperature, elevation, ...).
– Attribute values are often sampled for an irregular
set of locations. The characteristics of the
sampling (sampling scheme, sampling density)
primarily depend on the sampling technique, the
expected spatial correlation between attribute
values and the accuracy required.
– Measured values for an irregular set of locations
are often transformed into a regular grid by means
of an interpolation method.
What is interpolation?
• Interpolation is the estimation of surface values at
unsampled points based on known surface values of
surrounding points.
• It is a technique of combining sampled values and
positions to estimate values at unmeasured
locations.
• Interpolation can be used to estimate elevation,
rainfall, temperature, chemical dispersion, or other
spatially-based phenomena.
• Interpolation is commonly a raster operation, but it
can also be done in a vector environment using a TIN
surface model.
Inverse Distance Weighted
Interpolation (IDW)
• Accounts for “vicinity/nearness” by
(1) selecting points within a kernel radius, or
(2) selecting a fixed number of “near” (known) points.
• The contribution of a point decreases the more distant it is
from the unmeasured location.
• The weight of each sample point is inversely proportional to
its distance from the unmeasured location.
• This is an exact interpolator: where d = 0, the surface takes
the value of the data point.
IDW Computation
$$Z_j = \frac{\sum_i Z_i / d_{ij}^{\,n}}{\sum_i 1 / d_{ij}^{\,n}}$$
• $Z_j$ - estimated value for the unknown point at location j
• $d_{ij}$ - distance between known point i and unknown point j
• $Z_i$ - the value at known point i
• $n$ - user-defined exponent for weighting
• Fixed number of points normally
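The IDW computation can be sketched as a short function. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `idw`, its parameters (`power` for the exponent n, `k` for the fixed number of nearest points) and the sample values are all invented here.

```python
import numpy as np

def idw(known_xy, known_z, target_xy, power=2.0, k=None):
    """Estimate the value at target_xy by inverse distance weighting."""
    known_xy = np.asarray(known_xy, dtype=float)
    known_z = np.asarray(known_z, dtype=float)
    # Distance d_ij from every known point i to the unknown point j.
    d = np.hypot(*(known_xy - np.asarray(target_xy, dtype=float)).T)

    # Exact interpolator: at a sample location, return the sample value.
    hit = d == 0
    if hit.any():
        return float(known_z[hit][0])

    # Optionally keep only the k nearest known points.
    if k is not None:
        nearest = np.argsort(d)[:k]
        d, known_z = d[nearest], known_z[nearest]

    w = 1.0 / d ** power                      # weights 1 / d_ij^n
    return float(np.sum(w * known_z) / np.sum(w))

# Hypothetical sample points and values.
points = [(0.0, 0.0), (2.0, 0.0)]
values = [10.0, 20.0]
midpoint_estimate = idw(points, values, (1.0, 0.0))
```

A kernel radius could be handled in the same way as `k`, by filtering on `d <= radius` before the weights are computed.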
Characteristics of IDW
• Exact interpolator
• Interpolated values equal the sample point values at the
sample locations
• Reduction of the formula at sample point locations:
$$Z_j = \frac{Z_i / d_{ij}^{\,n}}{1 / d_{ij}^{\,n}} = Z_i$$
2.1.4 Geographic objects
• When the geographic phenomenon is not present everywhere in the
study area, but somehow ‘sparsely’ populates it, we look at it in terms
of geographic objects.
• Such objects are usually easily distinguished and named.
• Their position in space is determined by a combination of one or more
of the following parameters:
– location (where is it?),
– shape (what form is it?),
– size (how big is it?), and
– orientation (in which direction is it facing?).
2.1.4 Geographic objects
• Objects are defined on the terrain. For each object geometric as well as
thematic characteristics are determined.
– For each object class one has to decide how objects belonging to that
class will be spatially represented: by means of points, lines or areas.
The choice depends on:
• the nature of the objects;
• the scale one is working on;
• the purpose of the analysis.
2.1.4 Geographic objects
• The object-based approach is mostly used to represent phenomena that occur on
the earth’s surface as a collection of objects with clear boundaries, in
other words phenomena that are characterised by a discrete spatial
variation (e.g. parcels, houses, roads,...).
– In the treatment and interpretation of data one will often move
from a data model, based on a field approach, to a data model
based on objects (model transformation):
• Production of a soil map based on soil profile analysis
• Interpretation of land use or vegetation characteristics, based
on aerial photographs
• Construction of a DEM based on triangulation (see further)
Figure. Visual interpretation of land use, based on a SPOT satellite image
Digital representation of a spatial data model
Creation of a raster
Figure. Rasterizing error for central point (c, d) and dominant unit rasterizing (e, f)
Vector Data
• The vector data model is one of the most popular ways to
store geographic data.
• This model is ideally suited to representing discrete
objects; it can only accurately represent discrete objects.
• It uses points and edges to represent three basic
types of spatial features: points, lines, and polygons.
• All of these types are capable of storing attribute
data about the particular feature they represent.
Point Data
• Point data are data that can be represented as a single
location on a map.
• Point data can be used to represent house locations on
a street.
• This is by far the simplest data type and is very good for
storing data when all we are concerned with is the
location of a feature and not its length or width.
• Point data are zero-dimensional and have no width,
length or height.
Line Data
• Features that have a location, a length, but no width are
represented in the vector model by lines.
• Examples of some features well represented by lines are
contours, administrative boundaries, roads, rivers, and
sewers.
• It is important to mention that while some line features
such as rivers and roads, may have an area, we use lines to
represent them at scales where their width cannot be
accurately reflected.
• Line features are stored using a collection of points called
nodes and vertices which each have their own unique
coordinate pair.
• Nodes are the endpoints of a line, while vertices are
intermediate points located between the two endpoints.
Polygon Data
• Many of the features we may wish to represent in a GIS have
a width and an area associated with them; these are represented
by polygons.
…cont’d
• In the vector approach spatial structures are
represented by means of objects. The objects are
described by three types of data:
– A unique identification code (ID), which allows
each object in the database to be identified (name,
number).
– A set of thematic characteristics (attributes) that
are linked to a specific object class.
– The geometry of the objects.
• The geometry of the objects is defined by means of a
number of so-called graphic primitives (geometric
building elements): points, nodes, segments, chains,
polygons.
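The three types of data that describe an object (identification code, thematic attributes, geometry) map naturally onto a small record type. The sketch below is illustrative only: the class name `Feature`, its field names and the sample features are assumptions, not the storage layout of any particular GIS.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    feature_id: int                 # unique identification code (ID)
    geometry_type: str              # "point", "line" or "polygon"
    coordinates: list               # geometry as a list of (x, y) tuples
    attributes: dict = field(default_factory=dict)  # thematic characteristics

# Hypothetical features of each basic type.
well = Feature(1, "point", [(350.0, 120.0)], {"depth_m": 42})
road = Feature(2, "line", [(0.0, 0.0), (50.0, 10.0), (120.0, 15.0)],
               {"class": "primary"})
parcel = Feature(3, "polygon", [(0, 0), (40, 0), (40, 30), (0, 30), (0, 0)],
                 {"land_use": "cropland"})

# For a line, the first and last coordinates are the nodes;
# everything in between is a vertex.
nodes = (road.coordinates[0], road.coordinates[-1])
vertices = road.coordinates[1:-1]
```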
…cont’d
• Different vector models are used to represent object
geometry, depending on the GIS software one works
with. Vector data are stored in GIS software in one of two
ways: the spaghetti (or ring) model and the
topological model:
– Spaghetti model or ring model:
• All point, line and area objects are represented
by separate geometric elements, without explicit
definition of topology.
• This leads to data redundancy and complicates
editing work (risk for inconsistencies).
• All spatial relations need to be analyzed “on the
fly”.
Topological model
• GIS analysis answers many questions:
• Where is it?
• What is it next to?
• Is it inside or outside?
• How far is it from something else?
• The corresponding terms for these questions are:
• Where is it? (location)
• What is it next to? (adjacency)
• Is it inside or outside? (containment)
• How far is it? (connectivity)
• Topology represents the structuring of coordinate data which
clearly describes adjacency, containment, and connectivity.
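The difference between storing plain rings and storing explicit topology can be illustrated with two adjacent square parcels. This is a simplified sketch: the dictionary layouts, the arc names and the helper `neighbours` are hypothetical, and real topological data structures carry more information (node tables, arc direction rules, and so on).

```python
# Two adjacent square parcels, A and B, sharing the edge from (1, 0) to (1, 1).

# Spaghetti (ring) model: each polygon stores its full boundary, so the
# shared edge is stored twice and adjacency must be derived from coordinates.
spaghetti = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
}

# Topological model: each arc is stored once, with the polygon on its left
# and right side recorded explicitly (None stands for the outside).
arcs = {
    "a1": {"coords": [(1, 0), (1, 1)], "left": "A", "right": "B"},
    "a2": {"coords": [(1, 1), (0, 1), (0, 0), (1, 0)], "left": "A", "right": None},
    "a3": {"coords": [(1, 0), (2, 0), (2, 1), (1, 1)], "left": "B", "right": None},
}

def neighbours(polygon_id):
    """Read adjacency directly from the arc table, without any geometry test."""
    found = set()
    for arc in arcs.values():
        sides = {arc["left"], arc["right"]}
        if polygon_id in sides:
            found |= sides - {polygon_id, None}
    return found

adjacent_to_a = neighbours("A")
```

In the ring version the same question would require comparing the coordinates of every pair of rings, which is the “on the fly” analysis the spaghetti model forces.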
Figure. Kinds of topological relationships
Figure. Ring model
Figure. Topological model
Raster Data
• A raster model uses a grid of square cells to
store spatial data.
• The most familiar rasters are the images shown on
web pages and in computer graphics.
• A raster is defined by:
– The co-ordinates of its origin
– The resolution (size) of the cells
– The dimension of the raster: number of columns
(x-direction) and rows (y-direction)
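Given these three elements, the link between a cell index and a map position is simple arithmetic. The sketch below assumes the origin is the upper-left corner of the raster and that rows are counted downwards; that convention, the coordinate values and the function names are assumptions for illustration, since the slide does not fix them.

```python
import numpy as np

# Assumed raster definition (hypothetical values).
origin_x, origin_y = 500000.0, 4200000.0  # upper-left corner, map units
cell_size = 30.0                           # resolution of the square cells
n_cols, n_rows = 4, 3                      # dimension of the raster

grid = np.zeros((n_rows, n_cols))          # one attribute value per cell

def cell_centre(row, col):
    """Map coordinates of the centre of cell (row, col)."""
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y

def cell_index(x, y):
    """Row and column of the cell that contains map position (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return row, col
```

For example, `cell_centre(0, 0)` lies half a cell to the right of and below the origin, and `cell_index` inverts `cell_centre` for any cell of the grid.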
…cont’d
• When choosing the resolution of the raster a proper
balance has to be achieved between the level of
detail of the terrain description and the amount of
data one will have to treat.
• Each cell in a raster can only have one attribute
value, corresponding to a particular theme
(attribute).
• To combine different themes one has to define
several rasters, one raster for each selected attribute.
• This leads to a typical layer structure, where each
layer corresponds to a single theme. This layer
structure is characteristic for geographical
information systems that are based on the raster
model.
…cont’d
• The raster model makes it computationally easy to
combine several themes that are represented by
different layers (overlay analysis).
• This explains the initial success of the model with
planners and landscape architects, who started
experimenting with it in the early 1960s.
• They invented a technique which is presently known
as map algebra, and which is essential to spatial
analysis in a raster environment (see further).
Figure. Basic idea of map algebra
Vector Data Analysis
• The term “Data Analysis” is used here to describe the
collection of methods, techniques and approaches to
extract meaningful information from sets of data,
represented in geospatial form in modern GIS packages.
• In other words, the role of analysis in GIS is to turn data
into information and create new data by manipulating
collected data.
• Spatial Analysis has several levels of sophistication:
manipulation, queries, statistics and modelling.
• Spatial data manipulation is one of the classic GIS
capabilities. This includes spatial queries and
measurements, buffering and map layer overlay.
Vector data properties
• Vector analysis is based on vector data properties:
geometry and structure.
• Vector data models use mathematical primitives
(points and their x- and y-coordinates) to construct
fundamental geometric spatial features such as points,
lines and polygons.
• A polygon is built from point and line geometric
primitives that compose its boundary, which needs at
least three line segments.
…cont’d
• The lengths of these line segments sum to the perimeter of
the polygon, and the coordinates of their end points
determine its area.
• It is important to mention here that as a geospatial
feature, polygons have attributes which allow their
identification and manipulation.
• The location of a polygon in any given space is
often summarised by its centroid.
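Perimeter, area and centroid can all be computed from the boundary coordinates alone. The sketch below uses the standard shoelace formula for a simple (non-self-intersecting) polygon; the function name `polygon_measures` and the sample rectangle are illustrative assumptions.

```python
import math

def polygon_measures(ring):
    """Return (perimeter, area, centroid) of a simple polygon.

    `ring` is a closed list of (x, y) tuples: the first point is repeated
    at the end.
    """
    perimeter = 0.0
    twice_area = 0.0
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        perimeter += math.hypot(x1 - x0, y1 - y0)   # length of this segment
        cross = x0 * y1 - x1 * y0                   # shoelace term
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area = twice_area / 2.0                         # signed area
    centroid = (cx / (6.0 * area), cy / (6.0 * area))
    return perimeter, abs(area), centroid

# Hypothetical 4 x 3 rectangle.
rectangle = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]
perimeter, area, centroid = polygon_measures(rectangle)
```

Note that for a concave polygon the centroid may fall outside the polygon, which is one reason it is only a summary of the polygon's location.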
…cont’d
• Basic vector analysis is primarily based on proximity
operations and tools that are used to implement the
following fundamental spatial concepts:
– Buffering
– Overlay
– Distance measurement
– Pattern analysis
– Map manipulation
Buffering
• Buffering creates new polygons by expanding or
shrinking existing polygons or by creating polygons
from points and lines.
• Buffers are based on the concept of distance from
the neighbouring features.
• Buffers are generated for spatial analysis to address
proximity, connectivity and adjacency of features in
geographic space.
• A buffer is a spatial zone around a point, line or
polygon feature.
Figure. Point, line and polygon (area) buffers
…cont’d
• There are many variations of buffers. The shape and
size of buffers can be defined by variable distance
(distance based on a feature’s attribute), buffers can
be defined by multiple zones and can have dissolved
or merged boundaries.
• How does a buffer process work? Buffer processes
use mathematical algorithms to identify the space
around a selected landscape feature.
• First, features are selected for buffering through a
variety of selection processes. Then a buffer distance
is specified.
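One way to see what the buffer distance does is to test whether a location falls inside the buffer of a line feature, which only requires the shortest distance to the line. This is a simplified sketch rather than how a GIS constructs buffer polygons; the function names and the sample road are hypothetical, and the line is assumed to have at least two points.

```python
import math

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:                       # a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def in_buffer(p, line, distance):
    """True if p lies within `distance` of any segment of the line."""
    return any(distance_to_segment(p, a, b) <= distance
               for a, b in zip(line, line[1:]))

# Hypothetical road and a 5-unit buffer distance.
road = [(0.0, 0.0), (10.0, 0.0)]
inside = in_buffer((5.0, 3.0), road, 5.0)
outside = in_buffer((5.0, 6.0), road, 5.0)
```

A variable-distance buffer would simply take `distance` from an attribute of the feature instead of using a constant.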
Figure. Variations of buffering
…cont’d
• Map Algebra
– Operand: rasters
– Operations: local, focal, zonal and global
• Image Algebra
– Operand: images
– Operations: crop, zoom, rotate
• There are four types of raster operations.
– Local: only the pixels that overlap a particular
pixel are used to calculate that pixel’s value (one or
more input rasters).
– Focal: all pixels in a predetermined neighbourhood
are used to calculate a pixel’s value.
…cont’d
– Zonal: use zones defined in one layer to make
calculations on another (variable shaped and sized
neighbourhoods).
– Global: all cells in a raster are used as inputs to
calculate the value of a single pixel.
Figure. Four types of raster operations
Local operations
• Local operations:
– Perform calculation on single cell at a time
– Surrounding cells do not affect the calculation
– Can be applied to one raster layer or several
Figure. The exponential function ‘Exp’ (base e). Syntax: output raster = Exp(Inlayer1).
Local operations …cont’d
• A distinction can be made between local operations that work with one input
layer, and local operations that work with several input layers. A few
examples:
– On one layer:
• Recoding of values (reclass, assign, recode)
• Arithmetic operations on one layer (e.g. calculating the square, dividing
cell values by a constant factor)
– On several layers:
• Combining values (each unique combination of values in the input
layers gets a unique value in the output layer) (crosstab, combine)
• Arithmetic operations on several layers = typical overlay (add, subtract,
multiply, ratio, minimize, maximize,...)
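The local operations listed above can be sketched with array arithmetic, since each output cell depends only on the input cells at the same position. The layer names and values below are hypothetical, and the NumPy expressions stand in for the GIS commands named in parentheses; they are not those commands.

```python
import numpy as np

# Two hypothetical input layers covering the same cells.
land_use = np.array([[1, 1, 2],
                     [2, 3, 3],
                     [1, 2, 3]])
slope_class = np.array([[1, 2, 2],
                        [1, 1, 2],
                        [2, 2, 1]])

# One layer, recoding: class 3 becomes 1, all other classes become 0.
reclass = np.where(land_use == 3, 1, 0)

# One layer, arithmetic: square every cell value.
squared = land_use ** 2

# Several layers, arithmetic overlay: cell-by-cell sum and maximum.
total = land_use + slope_class
highest = np.maximum(land_use, slope_class)

# Several layers, combine: a unique code for each pair of input values
# (valid here because slope_class values stay below 10).
combined = land_use * 10 + slope_class
```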
Focal operations:
– Perform a calculation on a single cell and its neighbouring cells
– Also known as local neighbourhood functions
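A focal operation can be sketched as a moving window. The example below computes a 3 x 3 focal mean; the window size, the treatment of edge cells (using only the neighbours that exist) and the function name `focal_mean` are assumptions chosen for illustration.

```python
import numpy as np

def focal_mean(raster):
    """3 x 3 focal mean; edge cells use only the neighbours that exist."""
    rows, cols = raster.shape
    out = np.empty((rows, cols), dtype=float)
    for r in range(rows):
        for c in range(cols):
            # Slice the neighbourhood, clipped at the raster border.
            window = raster[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            out[r, c] = window.mean()
    return out

# Hypothetical 3 x 3 input raster with values 0..8.
raster = np.arange(9).reshape(3, 3)
smoothed = focal_mean(raster)
```

Replacing `window.mean()` with `window.max()`, `window.sum()` or another summary gives the other common focal functions.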
Zonal operations:
• A zone is a collection of cells within one layer that all have
the same attribute value, regardless of whether or not they are
contiguous.
• A zone represents one thematic class, for example ‘buildings’
of thematic raster dataset ‘land use’.
• Zonal functions are similar to focal functions except that the
definition of the neighborhood in a zonal function is the
configuration of the zones of the input dataset and not a
specified neighborhood shape.
– Perform a calculation on a zone, which is a set of cells with a
common value
– Cells in a zone can be non-contiguous
…cont’d
• Zonal operations calculate a new value for a location
based on a specific characteristic of the zone to
which the location belongs. Some examples:
– Calculation of the area or the perimeter of a zone
(area, count / perim)
– Calculation of a summary value for a zone for a
specific attribute, based on an extra layer that
contains local values for the attribute (total,
average, standard deviation, minimum,...) (extract,
score)
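Both kinds of zonal calculation can be sketched with boolean masks: one mask per zone value selects the cells that belong to the zone. The layers, the cell area and the variable names below are hypothetical.

```python
import numpy as np

# Zone layer: cells with the same value form one zone (zone 1 is not contiguous).
zones = np.array([[1, 1, 2],
                  [1, 2, 2],
                  [3, 3, 1]])
# Extra layer holding local values of the attribute to be summarised.
rainfall = np.array([[10.0, 20.0, 30.0],
                     [30.0, 50.0, 40.0],
                     [5.0, 15.0, 60.0]])
cell_area = 100.0  # assumed area of one cell, in square map units

zone_ids = np.unique(zones)

# Area of each zone: number of cells times the cell area.
zonal_area = {int(z): float((zones == z).sum() * cell_area) for z in zone_ids}

# Summary value per zone, taken from the extra layer.
zonal_mean = {int(z): float(rainfall[zones == z].mean()) for z in zone_ids}

# Write each zone's mean back to every cell of that zone.
mean_raster = np.zeros_like(rainfall)
for z, m in zonal_mean.items():
    mean_raster[zones == z] = m
```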
Global operations
• Global functions compute an output raster dataset in which the output
value at each cell location is potentially a function of all the cells
of the input raster datasets.
• There are two main groups of global functions: the Euclidean and cost
(or weighted distance) functions
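A Euclidean distance function is a convenient example of a global operation, because the value of every output cell depends on where the source cells are anywhere in the raster. The brute-force sketch below is for illustration only; the function name, the boolean `sources` layer and the cell size are assumptions, and at least one source cell is assumed to exist.

```python
import numpy as np

def euclidean_distance(sources, cell_size=1.0):
    """Distance from every cell to the nearest source cell.

    `sources` is a boolean raster in which True marks a source cell.
    """
    src_rows, src_cols = np.nonzero(sources)
    rows, cols = np.indices(sources.shape)
    # Distance from each cell to each source, then the minimum over sources.
    d = np.hypot(rows[..., None] - src_rows, cols[..., None] - src_cols)
    return d.min(axis=-1) * cell_size

# Hypothetical 3 x 3 raster with a single source in the upper-left cell.
sources = np.zeros((3, 3), dtype=bool)
sources[0, 0] = True
distance = euclidean_distance(sources, cell_size=10.0)
```

A cost (weighted distance) function differs in that each step is multiplied by the cost of the cells it crosses, so it needs a shortest-path search rather than a straight-line formula.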
Quiz
• Suppose you wish to produce a final product that
shows those areas with slopes greater than 20
degrees.
– What data are necessary to produce such a map?
– Show the procedures needed to reach the final product.
• Suppose you wish to compute the percentage change
in the forest coverage of Ethiopia between 1950 and
2010.
– What data are necessary to execute the task?
– Show the procedures needed to reach the final product.
Cartographic modelling
• By logically combining local, focal and zonal operations in
such a way that the output of one operation becomes
the input of another, relatively complex spatial
problems can be analyzed.
• A specific sequence of operations which allows one to
solve a particular spatial problem is called a cartographic
model.
• The process itself, which consists of defining a flow
chart, is called cartographic modelling.
• Cartographic modelling is often applied in projects
related to land evaluation and land allocation, where the
objective mostly is to define an optimal use of space,
based on multi-criteria analysis.
Implementing a cartographic model
1. Identify the map layers or spatial data sets which are
required.
2. Use logic and natural language to develop the process
of moving from the available data to a solution.
3. Set up a flow chart with steps to graphically represent
the above process. In the context of map algebra this
flow chart represents a series of equations you must
solve in order to produce the solution.
4. Annotate this flow chart with the commands necessary
to perform these operations within the GIS you are
using.
Figure. Cartographic modelling (flow chart)
…cont’d
• To explore the stages of cartographic modelling, let us
consider a supermarket siting example. We can
complete stage one of the cartographic modelling
process by identifying four data layers:
• land_use
• site_status
• river_map
• roads_map
• Stage two is completed by describing, in natural
language, a scheme of spatial operations required to
identify potential sites for the supermarket.
…cont’d
– Constraints (restrictions): define the locations where a
certain objective cannot be reached.
• Examples:
– Locations above a certain slope gradient are not
suitable for development (Boolean criterion)
– A zone should have an area of at least 20 ha to be
suited for development or exploitation (goal, target)
– Areas designated as wildlife reserves are excluded
from development
• Evaluating the suitability of a location based on a set of
criteria is done based on one or more decision rules,
which rely on the scores of a location for each of the
criteria.
…cont’d
• If only constraints are applied, they are combined with a
Boolean “AND” (multiplication of the 0/1 layers).
• If factors are used, then the evaluation will mostly be
based on a combined suitability index that measures
the effect of different criteria. Such an index can be
defined in different ways:
– Determination of the maximal or minimal factor
score (worst-case scenario): the most restrictive
factor determines the result
– Calculation of a weighted linear combination of
factor scores (trade-off scenario): each factor score is
multiplied by its weight and the products are summed
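The decision rules above can be sketched on small rasters. A weighted linear combination is commonly written as S = Σ wᵢ·xᵢ, with factor scores xᵢ and weights wᵢ summing to 1; the factor layers, the constraint layer and the weights below are invented for illustration and are not taken from the source.

```python
import numpy as np

# Hypothetical factor scores, already standardised to a 0-1 scale.
slope_score = np.array([[0.9, 0.4],
                        [0.7, 0.2]])
road_score = np.array([[0.5, 0.8],
                       [0.6, 1.0]])
# Hypothetical Boolean constraint: 1 = allowed, 0 = excluded
# (for example a wildlife reserve).
allowed = np.array([[1, 1],
                    [0, 1]])

weights = {"slope": 0.6, "road": 0.4}  # assumed weights, summing to 1

# Trade-off scenario: weighted linear combination of the factor scores.
wlc = weights["slope"] * slope_score + weights["road"] * road_score

# Worst-case scenario: the most restrictive (minimum) factor score decides.
worst_case = np.minimum(slope_score, road_score)

# Constraints act as a Boolean AND: multiplying by 0 removes excluded cells.
suitability = wlc * allowed
```

In the trade-off scenario a poor score on one factor can be compensated by a good score on another, which is exactly what the worst-case rule forbids.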
…cont’d
• For determining the factor weights one often uses
the method of Saaty (Saaty, 1977).
• The method is based on a pairwise estimation of the
relative importance of the different factors based on
a scale with 9 classes.
• From the obtained matrix an optimal set of weights is
derived (principal eigenvector of the matrix).
• The consistency of pairwise comparisons can be
evaluated by means of an overall consistency ratio,
which according to Saaty should be smaller than 0.10
(Saaty, 1977).
• It is also possible to identify specific inconsistencies
in the matrix.
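The derivation of weights from a pairwise comparison matrix can be sketched as follows. This is a minimal illustration, assuming the usual formulation in which the weights are the normalised principal eigenvector and the consistency ratio is CR = CI / RI with CI = (λmax − n) / (n − 1); the function name `ahp_weights`, the example matrix and the random index table (the commonly tabulated values) are assumptions not given in the source.

```python
import numpy as np

# Commonly tabulated random consistency index for matrices of order 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(matrix):
    """Return (weights, consistency_ratio) for a pairwise comparison matrix."""
    a = np.asarray(matrix, dtype=float)
    n = a.shape[0]
    eigenvalues, eigenvectors = np.linalg.eig(a)
    k = np.argmax(eigenvalues.real)              # index of the principal eigenvalue
    principal = np.abs(eigenvectors[:, k].real)
    weights = principal / principal.sum()        # normalise so the weights sum to 1
    lambda_max = eigenvalues[k].real
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return weights, cr

# Hypothetical, perfectly consistent comparison of three factors:
# factor 1 is twice as important as factor 2 and six times as important as factor 3.
comparisons = [[1.0, 2.0, 6.0],
               [1 / 2, 1.0, 3.0],
               [1 / 6, 1 / 3, 1.0]]
weights, consistency_ratio = ahp_weights(comparisons)
```

With real judgements the matrix is rarely perfectly consistent, and the ratio is then compared with the 0.10 limit mentioned above.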
…cont’d
• A global suitability map, based on a weighted
combination of factor scores, can be recoded into a
map with qualitative suitability classes and/or,
through the definition of a threshold value (often
based on the area that should be allocated), can be
transformed into a Boolean suitability map.
• The method makes it possible to deal with different
priorities (ecological, social, economic, ...) and also
allows one to study the impact of assigning more or
less weight to a particular criterion on the outcome
of the allocation process.
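Deriving a threshold from the area that should be allocated can be sketched by ranking the cells of the suitability map. The map values and the area target (expressed as a number of cells) are hypothetical, and ties at the threshold are not resolved here, so they could allocate slightly more than the target.

```python
import numpy as np

# Hypothetical global suitability map (0 = unsuitable, 1 = ideal).
suitability = np.array([[0.74, 0.56, 0.10],
                        [0.00, 0.52, 0.91],
                        [0.33, 0.68, 0.45]])
cells_needed = 3  # assumed area target, expressed as a number of cells

# The threshold is the score of the cells_needed-th best cell.
threshold = np.sort(suitability, axis=None)[-cells_needed]

# Boolean suitability map: True where the score reaches the threshold.
boolean_map = suitability >= threshold
```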
…cont’d
• When dealing with spatially conflicting objectives
(when one location fits several objectives) two
approaches for allocation are possible:
– Hierarchical approach: objective 1 has priority
over objective 2.
• Iterative increase/decrease of the threshold
value for the suitability map of objective 1 until
sufficient area is allocated to objective 1
• The same is then done for objective 2 in the remaining area
– Conflict approach: looking for a compromise
based on a decision heuristic
…cont’d
• Conflict approach:
– Identification of the best x hectares for objective 1 and y
hectares for objective 2 based on the two suitability maps
– Partitioning of the conflict areas based on distance in the
decision space to the ideal conditions for both objectives
– Iterative decrease of the threshold values for both
suitability maps and repeating of the allocation process
until the required area for each type of land use is
obtained
• The method allows more weight to be given to one of the two objectives
• Correct application of the procedure requires the “ranking”
(histogram equalization) of both suitability maps
Advantages …cont’d
• Flexible method that allows easy testing of “what if...?”
scenarios (e.g. by modifying the content of one or
more input layers or by changing some of the model
parameters)
• Further refinement/expansion of an existing model is
easy (by adding extra input layers and/or relations)
Disadvantages of cartographic modelling
• Strong deterministic assumptions of the method,
especially if only constraints are used (Boolean overlay)
• Recently a lot of research has been carried out to
define techniques that allow us to quantify the impact
of errors and uncertainties in input data and model
parameters on the outcome of the analysis.