
Principles and Applications of GIS-1-1

1.0 GEOGRAPHIC INFORMATION SYSTEM (GIS)


Background
Originally, geographic information (geoinformation) was obtained from hand-drawn
maps; later, in the 1960s and 1970s, computers were introduced for computer-assisted
cartography. The discipline that deals with all aspects of handling spatial data and
geoinformation is called Geographic Information Science.

Geographical Information Science developed from several disciplines, namely
surveying, cartography (map drawing), geography, engineering, remote sensing and
photogrammetry.

In a narrow sense, a Geographical Information System is defined in terms of its
functions: a computerized system that facilitates the phases of data entry, data
management, data analysis and data presentation, specifically for
georeferenced data.

Geographical Information Systems arose when it was realized that manual methods
were deficient in aiding prediction: people were limited to showing things as they
were, rather than possible happenings, which made decision making difficult.

1.1 Definitions
Geographical Information System
"…a system of computer software and procedures designed to support the capture,
management, manipulation, analysis, and display of spatially referenced data for
solving complex planning and management problems."

OR: GIS is a computerized system that facilitates the phases of data entry, data
analysis and data presentation, especially when dealing with geo-referenced
data. Geo-referenced data is data showing the distribution of different
things in space on the earth's surface.

OR: A Geographic Information System (GIS) is a computer-based system designed to
collect, store, query, retrieve, manipulate, analyze and display data referenced by
their geographic locations.

Principles of GIS, Compiled by Lubwama Raymond 0773709336; [email protected]


1.1.1 CATEGORIES/CLASSIFICATION OF DATA IN A GIS
Data vs Information.
Data are representations that can be operated upon by a computer. More specifically,
by spatial data we mean data that contain positional values, such as (x, y)
coordinates.

Information is data that has been interpreted by a human being. Humans work
with and act upon information, not data. Human perception and mental processing
lead to information and, hopefully, to understanding and knowledge. Geoinformation is
a specific type of information resulting from the interpretation of spatial data.

Categories of data explained;

(i) Spatial data:

Data referenced to a given location on the earth's surface (geo-referenced data).
Geo-referencing uses coordinates, e.g. latitudes and longitudes, or Northings
and Eastings on a map.

(ii) Non-spatial data/Attribute data:

This is data that describes the characteristics of spatial features. These
characteristics can be quantitative and/or qualitative in nature. Attribute data may be
tabular or textual data describing the geographical characteristics of features.

(iii) Temporal data: data that shows change over time.
Conceptually, the basic objective of any temporal database is to record or portray
change over time. Change is normally described as an event or collection of events.
Perhaps the most encompassing definition of an event is 'something of significance
that happens'.

For the purpose of space–time modeling a better definition might be 'a change in
state of one or more locations, entities, or both'. For example, a change in the
dominant species within a forest, a forest fire, a change of ownership of the land, or
the building of a road would all be events.


Change, and therefore also events, can be distinguished in terms of their temporal
pattern into four types:
a) continuous – going on throughout some interval of time
b) majorative – going on most of the time
c) sporadic – occurring some of the time
d) unique – occurring only once.
This means that duration and frequency become important characteristics in describing
a temporal pattern.

1.1.3 Some of the Applications of GIS


a) GIS may be used for urban planning, which involves the study of man-made
things: roads and sidewalks and, at a larger scale, suburbs and
transportation routes. These entities often have clear-cut
boundaries.
b) Geomorphologists, ecologists and soil scientists often have natural phenomena
as their study objects. These include rock formations, plate tectonics, and the
distribution of natural vegetation or soil units. These entities do not have
clear-cut boundaries; there are transition zones where one vegetation type, for
instance, is gradually replaced by another, and these can be studied in a GIS.
c) GIS may be used to study the effect of human activity on the environment
(change detection of man-made developments), e.g. railway or road construction.
Planning in such an area may involve land to be reclaimed by government and
Environmental Impact Assessments, and will usually be constrained by many
restrictions, such as not crossing seasonally flooded lands.
d) GIS may also be used in suitability studies. Such projects usually have defined
feasibility studies, e.g. site stability, and simulation studies such as erosion modeling.
e) Institutional GIS applications provide basic data to other bodies, e.g. monitoring
systems such as early warning systems for food or water scarcity, or systems that
keep track of weather patterns. For instance, national agencies such as
National Topographic Surveys and National Census Bureaus see it as their task to
administer geographic changes; they stay up to date and provide data to
others either in the form of printed materials such as maps or in the form of digital
data.
1.2 Basic Components of a GIS.
GIS is an integration of five basic components:
• Hardware
• Software
• People/organizational structure
• Procedures/methods
• Data
(a) Hardware component
This comprises the computer and its accessories: devices that convert graphics into
digital form and store them, and devices for drawing maps.

Main hardware used in GIS includes:

• Printers
• Digitizer
• Central Processing Unit (CPU)
• Plotters
• CD Drives
• Scanners
• Hard drives
(b) Software component
These are programmes that have been developed specifically for analyzing
geographical data. The software has specific sub-programmes called modules, each of
which does a specific thing.
These basic modules are subsystems for:
• Data input and preparation, which involves data capture and input into
the computer.
• Data storage and Database Management Systems, which involve
storage of non-redundant data in a structured way.
• Analysis and manipulation, which aid analysis, manipulation and
querying of the system.
• Data output and presentation/display, which prints maps, tables,
graphics etc.
• Data transformation, which may involve several formats of presenting
data.
• Interaction with the user, which involves updating and querying the
system.
Examples of Software Packages used
– ArcView GIS from ESRI (e.g. 3.1, 3.2, and 3.3)
– ArcGIS from ESRI (e.g. 9.1, 9.2, 9.3, 10, 10.1 series)
– ILWIS from ITC
– MapInfo
– Quantum GIS

Main GIS Software Classification


GIS software contains a powerful set of tools for collecting, storing, retrieving at will,
transforming and displaying spatial data from the real world. (GIS has spatial analysis
capabilities.)

A GIS can provide up-to-date, accurate and timely information in a readily
interpretable form in order to support decision making, and it can be made available
on computers through several types of software.
The following are examples:
(i) Desktop GIS
This gives you the power to manage and integrate your data, perform advanced
analysis, model and automate operational processes, and display your results on
professional-quality maps.

The following are examples of desktop GIS software, most of them open source:

• GRASS GIS – Originally developed by the U.S. Army Corps of Engineers: a
complete GIS.
• gvSIG – Written in Java. Runs on Linux, Unix, Mac OS X and Windows.
• ILWIS (Integrated Land and Water Information System) – Integrates image,
vector and thematic data.
• ArcGIS by ESRI (proprietary, not open source)
• JUMP GIS / OpenJUMP ((Open) Java Unified Mapping Platform)
• MapWindow GIS – Free desktop application and programming component.
• QGIS (previously known as Quantum GIS) – Runs on Linux, Unix, Mac OS X and
Windows.
• SAGA GIS (System for Automated Geoscientific Analyses) – A hybrid GIS
software. Has a unique Application Programming Interface (API) and a
fast-growing set of geoscientific methods, bundled in exchangeable Module
Libraries.
• uDig – API and source code (Java) available

(ii) Server GIS/Web GIS:

This enables you to distribute your maps, models, and tools to others in your
organization in a way that fits well into their workflows. Staff in other departments and
in the field can query accurate, up-to-date data with minimal training. It may be
internet based or intranet based.
Examples of software, i.e. web map servers:
• GeoServer – Written in Java and relies on GeoTools. Allows users to share and
edit geospatial data.
• MapGuide Open Source – Runs on Linux or Windows, supports Apache and IIS
web servers, and has APIs (PHP, .NET, Java, and JavaScript) for application
development.
• Mapnik – C++/Python library for rendering, used by OpenStreetMap.
• MapServer – Written in C. Developed by the University of Minnesota.

(iii) Mobile GIS:

Mobile mapping and field data collection solutions that provide database access,
mapping, GIS and Global Positioning System (GPS) integration via handheld and
mobile devices.


(c) Organizational structure/People/Institutional component: This is
the institutional arrangement and the procedures by which the GIS is developed and
managed, i.e. human and financial resources. GIS technology is of limited
value without the people who manage the system and develop plans for
applying it to real world problems. GIS users range from technical specialists
who design and maintain the system to those who use it to help them perform
their everyday work. The identification of GIS specialists versus end users is
often critical to the proper implementation of GIS technology.
(d) Procedures/methods: A successful GIS operates according to a well-designed
implementation plan and business rules, which are the models and
operating practices unique to each organization. As in all organizations dealing
with sophisticated technology, new tools can only be used effectively if they are
properly integrated into the entire business strategy and operation.
(e) Data: Perhaps the most important component of a GIS is the data. A
GIS can integrate spatial data with other existing data resources, often stored in
a corporate Database Management System (DBMS). The integration of spatial data
(often proprietary to the GIS software) and tabular data stored in a DBMS is a
key functionality afforded by GIS. The availability and accuracy of data can affect
the results of any query or analysis.

1.3 DOMAINS/ MAIN FUNCTIONAL COMPONENTS OF A GIS


i) Data capture and preparation
This involves the input and verification of geospatial data: transforming data
from maps, field notes, remote sensing, aerial photos, satellite imagery, text files
and recording instruments/data loggers into a 'compatible'
digital form.

Data input and verification are done with hardware such as:

(i) Visual Display Units/monitors
(ii) Digitizers
(iii) Scanners
(iv) Satellites scanning directly
(v) Sensors
(vi) Cameras etc.
ii) Data storage and data maintenance
This covers the ways in which spatial data is stored (in a database), structured,
organized and perceived by GIS users. It is handled by a computer program called a
Database Management System (DBMS).

iii) Data analysis and manipulation

This covers how spatial data can be modified to meet user needs and how data can be
retrieved at will to support decisions. It involves interaction with users, e.g. querying
the database (with Structured Query Language, SQL). It also involves data
transformation, which is done to remove errors, bring data up to date, and match them
to other datasets.
iv) Data output and presentation
This covers the ways in which data are displayed and the results of analyses are
reported to users, e.g. maps, tables, figures, graphs and charts, ranging from ephemeral
images on CRTs to hard copy maps.

v) Organizational Context
• For effective GIS use, an appropriate organizational context is required.
• Training of human resources is essential if GIS is to be applied in management.
• Personnel and managers with knowledge of GIS help the whole system run.

Other GIS functionalities come with these tools including; supporting various
coordinate systems and transformations between them, many different kinds of
computing with geo-referenced data and the freedom of choice in presentation of
results.
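
The querying step mentioned under data analysis and manipulation can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name, column names and sample records are all made up for the example, and a real GIS would normally sit on a full spatial DBMS rather than an in-memory table.

```python
import sqlite3

# In-memory database standing in for the GIS attribute store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels ("
    "id INTEGER PRIMARY KEY, owner TEXT, land_use TEXT, x REAL, y REAL)"
)

# Made-up sample records: an attribute row plus one (x, y) location each.
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?, ?, ?)",
    [
        (1, "owner A", "residential", 120.0, 340.0),
        (2, "owner B", "commercial", 300.0, 80.0),
        (3, "owner C", "residential", 900.0, 410.0),
    ],
)

# Attribute condition (land use) combined with a simple positional condition (x range).
rows = conn.execute(
    "SELECT id, owner FROM parcels "
    "WHERE land_use = ? AND x BETWEEN ? AND ? ORDER BY id",
    ("residential", 0, 500),
).fetchall()

print(rows)
```

The query returns only the residential parcel whose x coordinate falls inside the requested range, which is the kind of combined attribute and location question a GIS user typically asks.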

Comparison of GIS and computer graphics

Computer graphics systems are for manipulation and display only, while a GIS handles
all of this plus non-graphic attributes. However, good computer graphics are essential
for a geographical information system.

Comparison of GIS and CAD

Both have a frame of reference (coordinate system).

Both handle non-graphic attributes.

Both describe topological relations.

However, they differ in that a GIS takes a much greater volume and density of input
data and performs much more analysis.

Common Uses of a GIS

• Allowing for overlay of different information
• Generating thematic maps
• Creating buffer areas around features for analysis
• Carrying out specific operations, e.g. census
• Calculating distances quickly (on screen)
• Querying, on maps and tables
• Providing a range of extrapolation techniques
• Forecasting epidemics
• Mapping populations at risk
• Weather forecasting
• Determining prevalence of epidemics
• Analysis of optimum placement of power lines
• Identification of vegetation types
• Monitoring impact of humans on environments
• Disaster management
• Modeling what-if scenarios
• Investigation of transport routes

1.4 GIS Primitive Elements


All geographical data can be reduced to three basic topological concepts/entities:
• Points.
• Lines.
• Polygons (areas).

Every geographical phenomenon can be represented by one of the above three plus a
'label' describing what it is, e.g. an oil well is represented by a "point" entity
consisting of a single (x, y) coordinate and a label explaining what it is.

A section of a railway line could be represented by a line entity/feature class consisting
of a starting (x, y) coordinate, an ending (x, y) coordinate and the label
"railway".

A lake could be represented by an area entity covering a set of (x, y) coordinates
and its label "lake".
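
The three primitives and their labels can be sketched in Python as follows. The class name, helper functions and coordinate values are assumptions made for illustration only; they do not come from any particular GIS package.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coordinate = Tuple[float, float]  # an (x, y) pair


@dataclass
class Feature:
    kind: str                 # "point", "line" or "polygon"
    label: str                # what the entity is, e.g. "oil well"
    coords: List[Coordinate]  # one pair for a point, more for lines and areas


def make_point(label: str, x: float, y: float) -> Feature:
    return Feature("point", label, [(x, y)])


def make_line(label: str, coords: List[Coordinate]) -> Feature:
    if len(coords) < 2:
        raise ValueError("a line needs a start and an end coordinate")
    return Feature("line", label, list(coords))


def make_polygon(label: str, coords: List[Coordinate]) -> Feature:
    if len(coords) < 3:
        raise ValueError("a polygon needs at least three coordinates")
    ring = list(coords)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the boundary back onto its first vertex
    return Feature("polygon", label, ring)


# Hypothetical coordinates for the three examples in the text.
well = make_point("oil well", 512.0, 340.0)
railway = make_line("railway", [(100.0, 200.0), (450.0, 380.0)])
lake = make_polygon("lake", [(0.0, 0.0), (40.0, 0.0), (40.0, 30.0), (0.0, 30.0)])
```

Each entity carries only its geometry and its label, which mirrors the oil well, railway and lake examples above.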

GIS DATA TYPES/FORMATS


GIS recognizes two data types/formats:
1. Vector data type/format
2. Raster data type/format
All spatial data models are approaches to storing the spatial location of a geographic
feature, which may be in raster or vector type/format.

The translation of GIS data (spatial or attribute) from one type/format to another is
referred to as data conversion.


Vector data type/format


This implies the use of directional lines (vectors) to represent a geographic feature. It is
characterized by the use of sequential points or vertices to define a linear segment. Each
vertex consists of an X and a Y coordinate. The coordinate space is continuous, not
quantized, so objects are represented as exactly as possible.

Vector lines are often referred to as arcs. An arc consists of a string of vertices
terminated by a node, a node being a vertex that starts or ends an arc segment.
Point features are defined by one coordinate pair, a vertex. Polygonal features are
defined by a closed set of coordinate pairs.

In vector representation, the storage of the vertices for each feature is important, as is
the connectivity between features, i.e. the sharing of common vertices where
features intersect. Several data models build on the vector type/format, among them
the topologic data model and the Computer Aided Drafting (CAD) data model.
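
Because a vector feature is just an ordered list of vertices, simple measurements follow directly from the coordinates. The sketch below, with made-up coordinates and hypothetical function names, computes the length of an arc and the area of a closed polygon (using the standard shoelace formula).

```python
import math


def arc_length(vertices):
    """Sum of the straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(ring):
    """Area of a closed ring (first vertex repeated at the end), shoelace formula."""
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2


# An arc with two nodes (first and last vertex) and one intermediate vertex.
arc = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

# A rectangular polygon stored as a closed set of coordinate pairs.
ring = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]

print(arc_length(arc), polygon_area(ring))
```

For these values the arc is 11 units long and the polygon covers 12 square units.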

Topological vs. CAD models


The most popular method of retaining spatial relationships among features is to
explicitly record adjacency information in what is known as the topologic data
model. Topology is a mathematical concept that has its basis in the principles of
feature adjacency and connectivity.

Computer Aided Drafting data models consist of lists of elements, not features,
each defined by a string of vertices, which together define geographic features, e.g.
points, lines, or areas. There is considerable redundancy in this data model, since the
boundary segment between two polygons can be stored twice, once for each feature.
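
The difference between the two models can be sketched in Python. The polygon names, coordinates and dictionary layout are assumptions for illustration; real arc/node structures hold more bookkeeping than this.

```python
# CAD-style storage: every polygon lists its own complete boundary,
# so the edge shared by A and B is held twice.
cad_polygons = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)],
}


def edges(ring):
    """Undirected edges of a closed ring."""
    return {frozenset((a, b)) for a, b in zip(ring, ring[1:])}


shared = edges(cad_polygons["A"]) & edges(cad_polygons["B"])

# Topological storage: each arc is stored once and records the polygon
# on its left and right side (None means the outside).
arcs = [
    {"id": 1, "vertices": [(2, 0), (2, 2)], "left": "A", "right": "B"},
    {"id": 2, "vertices": [(2, 2), (0, 2), (0, 0), (2, 0)], "left": "A", "right": None},
    {"id": 3, "vertices": [(2, 0), (4, 0), (4, 2), (2, 2)], "left": "B", "right": None},
]


def neighbours(polygon_id):
    """Polygons adjacent to polygon_id, read straight from the arc table."""
    result = set()
    for arc in arcs:
        sides = {arc["left"], arc["right"]}
        if polygon_id in sides:
            result |= sides - {polygon_id, None}
    return result


print(len(shared), neighbours("A"))
```

In the CAD-style lists adjacency has to be discovered by comparing coordinates, whereas in the arc table it is recorded explicitly and the shared boundary exists only once.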

The diagram illustrates vector data type/format/structure.


Figure 1: Vector data type/format

Raster data type/format


This uses an array of grid cells (pixels) as the data structure: the
geographic area is divided into cells identified by row and column. This data type is
commonly referred to as a raster.

The size of the cells depends on the data accuracy and resolution required by the user.
No explicit coding of geographic coordinates is required, since position is implicit in
the layout of the cells. Points are represented by one pixel/grid cell, lines by a
number of pixels in a given direction, and areas by an aggregation of
pixels.
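
How position is implied by the cell layout can be sketched as follows. The grid values, origin and cell size are invented for the example, and the convention used (origin at the bottom left corner, row 0 at the top) is an assumption; other software may number rows differently.

```python
# A 4 x 5 raster: 3 marks a point (one cell), 2 a line (a run of cells),
# 1 an area (a block of cells), 0 is background.
grid = [
    [0, 0, 0, 0, 3],
    [2, 2, 2, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
]

ORIGIN_X, ORIGIN_Y = 1000.0, 2000.0  # bottom left corner of the grid (assumed)
CELL_SIZE = 10.0                     # ground units per cell side (assumed)
N_ROWS = len(grid)


def cell_centre(row, col):
    """Ground coordinates of a cell centre, derived only from row and column."""
    x = ORIGIN_X + (col + 0.5) * CELL_SIZE
    y = ORIGIN_Y + (N_ROWS - row - 0.5) * CELL_SIZE
    return x, y


def cell_of(x, y):
    """Row and column of the cell containing ground position (x, y)."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = N_ROWS - 1 - int((y - ORIGIN_Y) // CELL_SIZE)
    return row, col


print(cell_centre(0, 4), cell_of(1015.0, 2005.0))
```

Only the origin and the cell size are stored; every other coordinate is computed from the cell's place in the array.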

Raster data type involves a division of spatial data into regularly spaced cells each
having the same shape and size, the most utilized shape being a square. Most raster
systems require that a raster cell contain only a single discrete value; hence a data
layer may consist of a series of raster maps, each representing one attribute, e.g. a
height map or a density map.

The use of raster data types allows for sophisticated mathematical modeling processes,
while vector-based systems are often constrained by the capabilities and language of a
relational DBMS.

The diagram explains the raster data type


Figure 2: Raster data type/format

Rasterisation
In many cases vector data may be converted to raster data in a process called vector
to raster data conversion (rasterisation). First, a digitizer is used to
encode the polygons by digitizing their arcs (the line segments forming polygon borders
or individual linear features).

These sets of arcs can then be converted into raster form at any required resolution
using programs or application software packages. Most GIS software allows the
user to define the raster grid cell size for the conversion. It is imperative that the
original scale, i.e. the accuracy of the data, be known prior to conversion.

The accuracy of the data is often referred to as resolution, and this should determine
the cell size of the output raster map during conversion. Rasterisation leads to loss
of information because cells or pixels near the digitized boundaries are miscoded. The
loss in accuracy is proportional to both the size of the grid cell and the wiggliness of the
boundaries.
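
The effect of cell size on accuracy can be illustrated with a small Python sketch. It assumes one simple rule, namely that a cell is coded as belonging to the polygon when its centre falls inside it; actual GIS packages offer other rules as well. The triangle and the function names are made up for the example.

```python
import math


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a closed list of (x, y) vertices."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise(ring, cell_size):
    """Grid of 0/1 values: 1 where the cell centre lies inside the polygon."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    min_x, min_y = min(xs), min(ys)
    n_cols = math.ceil((max(xs) - min_x) / cell_size)
    n_rows = math.ceil((max(ys) - min_y) / cell_size)
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            cx = min_x + (c + 0.5) * cell_size
            cy = min_y + (r + 0.5) * cell_size
            row.append(1 if point_in_polygon(cx, cy, ring) else 0)
        grid.append(row)
    return grid


def raster_area(grid, cell_size):
    return sum(map(sum, grid)) * cell_size ** 2


# A triangle whose true area is 12 * 9 / 2 = 54 square units.
triangle = [(0.0, 0.0), (12.0, 0.0), (0.0, 9.0), (0.0, 0.0)]

coarse = raster_area(rasterise(triangle, 4.0), 4.0)
fine = raster_area(rasterise(triangle, 1.0), 1.0)
print(coarse, fine)
```

With 4-unit cells the raster covers 48 square units against a true area of 54, because cells along the sloping boundary are miscoded; with 1-unit cells the raster area for this particular triangle comes out at 54.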

Vectorization
This is data conversion from a raster to a vector format. Algorithms are
needed to convert arrays of pixels/cells into line data. This makes it possible
to convert data from scanners and digitizers into lines and text, and is also needed
where raster data are output to devices such as pen plotters.

The vectorisation process


It consists mainly of threading a line through a swarm of pixels using what are
collectively known as thinning algorithms, since the swarm of pixels is always thinned
to a line. The resulting thinned lines will contain more coordinate points than
necessary, i.e. they are over-defined and so take up a lot of storage space.
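
One simple way to reduce such over-defined lines is to drop vertices that lie on a straight run between their neighbours. The sketch below is not a thinning algorithm itself and is not taken from any specific GIS package; it is a minimal collinearity filter, with a hypothetical function name and tolerance parameter, applied to the output of vectorisation.

```python
def weed(points, tolerance=0.0):
    """Drop intermediate vertices that are collinear with their neighbours."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for candidate, following in zip(points[1:], points[2:]):
        ax, ay = kept[-1]
        bx, by = candidate
        cx, cy = following
        # Twice the area of triangle a-b-c; zero when the three points are collinear.
        cross = abs((bx - ax) * (cy - ay) - (by - ay) * (cx - ax))
        if cross > tolerance:
            kept.append(candidate)
    kept.append(points[-1])
    return kept


# A line threaded through pixel centres: many points, but only one real corner.
thinned_line = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print(weed(thinned_line))
```

Here seven stored points reduce to three (the two ends and the corner) without changing the shape of the line.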

It should be noted that the raster and vector data types each come with advantages and
disadvantages as used in GIS, some of which are given below.

Advantages of raster data


• It is a simple data structure.
• The geographic location of each cell is implied by its position in the
cell matrix. Accordingly, other than an origin point, e.g. the bottom left
corner, no geographic coordinates are stored.
• Due to the nature of the data storage technique, data analysis is usually easy
to program and quick to perform.
• Some spatial analysis methods are simple to perform; simulation is easy because
cells have the same size and shape.
• Discrete data, e.g. forestry stands, is accommodated equally well as
continuous data, e.g. elevation data, which facilitates the integration of the two
data types.
• The technology is cheap, as grid-cell systems are very compatible with
raster-based output devices, e.g. electrostatic plotters, graphic terminals.
• Overlay and combination of maps and remotely sensed images is easy.

Disadvantages of raster data type/formats


• The cell size determines the resolution at which the data is represented.
• It is especially difficult to adequately represent linear features, depending
on the cell resolution. Accordingly, network linkages are difficult to
establish.
• Processing of associated attribute data may be cumbersome if a large
amount of data exists. Raster maps inherently reflect only one attribute or
characteristic for an area.
• Since most input data is in vector form, data must undergo vector-to-raster
conversion. Besides increased processing requirements, this may introduce data
integrity concerns due to generalization and choice of inappropriate cell size.
• Most output maps from grid-cell systems do not conform to high-quality
cartographic needs.

Advantages of vector data type/formats


• Graphic output is usually more aesthetic (attractive) and accurate.
• It is a compact structure.
• Since most data, e.g. hard copy maps, is in vector form, no data conversion is
required.
• Accurate geographic location of data is maintained.
• It allows for efficient encoding of topology and, as a result, more efficient
operations that require topological information, e.g. proximity and network
analysis.
• Retrieval, updating and generalization of graphics and attributes are possible.

Disadvantages of vector data type/formats

• It is a complex data structure.
• For effective analysis, vector data must be converted into a topological structure.
This is often processing intensive and usually requires extensive data cleaning.
• The location of each vertex needs to be stored explicitly.
• Algorithms for manipulative and analysis functions are complex and may be
processing intensive. Often, this inherently limits the functionality for large
data sets, e.g. a large number of features.
• Display and plotting can be expensive, particularly for high quality color.
• The technology is expensive, particularly for the more sophisticated software
and hardware.
• Combination of several vector polygon maps through overlay creates
difficulties.
• Continuous data, such as elevation data, is not effectively represented in
vector form. Usually substantial data generalization or interpolation is required
for these data layers.
• Spatial analysis and filtering within polygons are impossible, and simulation is
difficult because each unit has a different topological form.

1.5 DATA INPUT PROCESS

Sub-items

(a) Sources of Data


-Spatial data sources
-Non-spatial data sources

(b) Data Input Techniques

(c) Data transfer/exchange formats

(d) Data Editing and Quality Assurance

(e) Data verification.

(A) DATA SOURCES


(i) Spatial data sources;
hard copy maps;
aerial photographs;
remotely-sensed imagery;
point data samples from surveys; and
Existing digital data files.


(ii) Attribute data sources.

• Any textual or tabular data that can be referenced to a geographic feature, e.g.
a point, line, or area, can be input into a GIS.

Attribute data is usually input by manual keying or via a bulk loading utility of the
DBMS software. ASCII format is a de facto standard for the transfer and conversion
of attribute information.

(B) DATA INPUT TECHNIQUES


In this section, data input techniques will be limited to spatial data only. There is no
single method of entering spatial data into a GIS. Rather, there are several
mutually compatible methods that can be used singly or in combination.

The choice of data input method is governed largely by the application, the
available budget, and the type and complexity of the data being input.

There are at least four basic procedures for inputting spatial data into a GIS. These
are:
(i) Manual digitizing;

(ii) Automatic scanning;

(iii) Entry of coordinates using coordinate geometry (from survey
measurements);

(iv) Conversion of existing digital data.

1. Manual Digitizing
The majority of GIS spatial data entry is done by manual digitizing.
A digitizer is an electronic device consisting of a table upon which the map or drawing
is placed. The user traces the spatial features with a hand-held magnetic pen or
pointing device, often called a mouse, cursor or digitizing puck. While the features are
traced, the coordinates of selected points (e.g. vertices) are sent to the computer and
stored.


All points that are recorded are registered against positional control points,
usually at the map corners, which are keyed (typed) in by the user at the beginning of
the digitizing session. The coordinates are recorded in a user-defined coordinate
system or map projection. Latitudes and longitudes (geographic coordinates) and
UTM coordinates (planar coordinates) are most often used.

The ability to adjust or transform data during digitizing from one projection or
coordinate system to another is a desirable function of the GIS software. Numerous
functional techniques exist to aid the operator in the digitizing process.
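
Registration against control points can be sketched as a coordinate transformation from digitizer table units to map units. The example below is a simplification and an assumption on my part: it fits a similarity transformation (shift, rotation and uniform scale) from just two control points using complex arithmetic, whereas digitizing software typically uses four or more control points and reports a residual error. The function names and coordinates are hypothetical.

```python
def fit_similarity(table_pts, map_pts):
    """Fit w = a*z + b from two control points given in table and map coordinates."""
    z1, z2 = complex(*table_pts[0]), complex(*table_pts[1])
    w1, w2 = complex(*map_pts[0]), complex(*map_pts[1])
    a = (w2 - w1) / (z2 - z1)  # encodes rotation and uniform scale
    b = w1 - a * z1            # encodes the shift
    return a, b


def to_map(point, a, b):
    """Convert a digitized table coordinate into map coordinates."""
    w = a * complex(*point) + b
    return w.real, w.imag


# Two control points keyed in at the start of the session (made-up values):
# table positions in centimetres and their known map coordinates in metres.
table_control = [(0.0, 0.0), (10.0, 0.0)]
map_control = [(500000.0, 100000.0), (501000.0, 100000.0)]

a, b = fit_similarity(table_control, map_control)
print(to_map((5.0, 5.0), a, b))
```

Every vertex traced afterwards is passed through the same transformation, so the stored coordinates are in the map's coordinate system rather than in table units.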

Digitizing methods
Point vs. stream digitizing.
Digitizing can be done in point mode, where single points are recorded one at a
time, or in stream mode, where a point is collected at regular intervals of time or
distance, measured by X and Y movement, e.g. a point is recorded every 3 metres.
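
Stream mode by distance can be sketched as a filter over the cursor positions reported by the digitizer. The function name and the sample track are assumptions for illustration.

```python
import math


def stream_digitize(cursor_positions, min_distance):
    """Record a point each time the cursor has moved at least min_distance."""
    if not cursor_positions:
        return []
    recorded = [cursor_positions[0]]
    for position in cursor_positions[1:]:
        if math.dist(recorded[-1], position) >= min_distance:
            recorded.append(position)
    return recorded


# Cursor reported every 1 unit along a straight feature; record every 3 units.
track = [(x, 0) for x in range(11)]
print(stream_digitize(track, 3))
```

A smaller interval follows curved features more closely but stores more points, which is the trade-off the operator sets when choosing the stream interval.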
Blind vs. on-screen digitizing
Digitizing can also be done 'blindly' or with a 'graphics terminal'. Blind digitizing
implies that the graphic result is not immediately viewable to the person digitizing. Most
systems display the digitized linework as it is being digitized on an accompanying
graphics terminal.

Spaghetti mode of digitizing: This allows the user to simply digitize lines by
indicating a start point and an end point. Data can be captured in point or stream
mode. However, some systems do allow the user to capture the data in an arc/node
topological data structure. The arc/node data structure requires that the person
digitizing identify nodes.

Data capture in an arc/node approach helps to build a topologic data structure
immediately. This lessens the amount of post-processing required to clean and build
the topological definitions. However, digitizing with an arc/node approach most often
does not negate the requirement for editing and cleaning of the digitized linework
before a complete topological structure can be obtained.

Advantages of manual digitizing


Low capital cost, e.g. digitizing tables are cheap;

Low cost of labour;

Flexibility and adaptability to different data types and sources;

Easily taught in a short amount of time, i.e. it is an easily mastered skill;

Generally the quality of data is high;

Digitizing devices are very reliable and most often offer greater precision than the
data warrants; and

Ability to easily register and update existing data.


variation in scale

For raster-based GIS software, data is still commonly digitized in a vector format and
converted to a raster structure after the building of a clean topological structure. The
procedure usually differs minimally from digitizing with vector-based software, other
than that some raster systems allow the user to define the resolution size of the grid-cell.
Conversion to the raster structure may occur on-the-fly or afterwards as a separate
conversion process.

2. Automatic Scanning
A variety of scanning devices exist for the automatic capture of spatial data. Their
advantage is the ability to capture spatial features from a map at a rapid
rate.

Disadvantages
• Scanners are generally expensive to acquire and operate.
• Most scanning devices have limitations with respect to the capture of selected
features, e.g. text and symbol recognition.
• Most scanned data requires a substantial amount of manual editing to create a
clean data layer.

Practical limitations of scanners for organizations include:

hard copy maps often cannot be moved to where a scanning device
is available, e.g. most companies or agencies cannot afford their own
scanning device and therefore must send their maps to a private firm for
scanning;

hard copy data may not be in a form that is viable for effective scanning,
e.g. maps are of poor quality or in poor condition; for instance, most
cadastral maps scanned into the LIS were in a poor, unreadable state;

geographic features may be too few on a single map to make it practical, or
cost-justifiable, to scan, e.g. old hard copy maps of cadastral sheets;

often on congested maps a scanner may be unable to distinguish the
features to be captured from the surrounding graphic information, e.g.
dense contours with labels;

with raster scanning it is difficult to read unique labels (text) for a
geographic feature effectively; and

scanning is much more expensive than manual digitizing, considering all
the cost/performance issues.

NOTE:
Consensus within the GIS community indicates that scanners work best when the
information on a map is kept very clean, very simple, and uncluttered (not congested)
with graphic symbology.

The sheer cost of scanning usually eliminates the possibility of using scanning methods
for data capture in most GIS implementations. Large data capture shops and
government agencies are those most likely to be using scanning technology.


3. Coordinate Geometry from survey measurements


A third technique for the input of spatial data involves the calculation and entry of
coordinates using coordinate geometry (COGO) procedures.

This involves entering, from survey data, the explicit measurement of features from
some known survey control.
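
As a rough illustration of the COGO idea, the sketch below computes the coordinates of a new
point from a known control point, a whole-circle bearing (measured clockwise from north) and
a horizontal distance. The function name `cogo_forward` and the sample values are assumptions
made for this example and do not come from any particular GIS package.

```python
import math

def cogo_forward(easting, northing, bearing_deg, distance):
    """Return (E, N) of a point at a given bearing and distance from a control point.

    bearing_deg is a whole-circle bearing in degrees, clockwise from north.
    """
    bearing = math.radians(bearing_deg)
    new_e = easting + distance * math.sin(bearing)
    new_n = northing + distance * math.cos(bearing)
    return new_e, new_n

# Hypothetical control point (E=1000, N=2000) and an observation due east, 50 m away
e, n = cogo_forward(1000.0, 2000.0, 90.0, 50.0)
print(round(e, 3), round(n, 3))  # 1050.0 2000.0
```

Each surveyed point computed this way can in turn serve as the start of the next leg, which is
how a traverse builds up a precise parcel boundary.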

Disadvantage
This input technique is obviously very costly and labour intensive. In fact, it is rarely
used for natural resource applications in GIS.

Advantage
This method is useful for creating very precise cartographic definitions of property, and
accordingly is more appropriate for land records management at the cadastral or
municipal scale. It is currently used to input data in the LIS at the district MZOs.

4. Conversion of Existing Digital Data


This technique is becoming increasingly popular for data input. A variety of spatial
data, including digital maps, are openly available from a wide range of government
and private sources.

The most common digital data to be used in a GIS is data from CAD (computer aided
drafting) systems. However, a number of data conversion programs exist, mostly from
GIS software vendors, to transform data from CAD formats to a raster or topological
GIS data format. Several specific standards for data exchange have been established
in the market place.

(C) DATA TRANSFER AND EXCHANGE FORMATS


Given the wide variety of data formats that exist, most GIS vendors have developed and
provide data exchange/conversion software to go from their format to those
considered common.


E.g. Most GIS software vendors provide an ASCII data exchange format specific to
their product, and a programming subroutine library that will allow users to write their
own data conversion routines to fulfill their own specific needs.

Some of the data formats common to the GIS marketplace are listed below. Most
formats are only utilized for graphic data. Attribute data is usually handled as ASCII
text files.

IGDS - Interactive Graphics Design Software (Intergraph / MicroStation)
This binary format is a standard in the turnkey CAD market and has become a de
facto standard in Canada's mapping industry. It is a proprietary format; however,
most GIS software vendors provide DGN translators.

DLG - Digital Line Graph (US Geological Survey)
This ASCII format is used by the USGS as a distribution standard and consequently is
well utilized in the United States. It is not used very much in Canada even though
most software vendors provide two-way conversion to DLG.

DXF - Drawing Exchange Format (AutoCAD)
This ASCII format is used primarily to convert to/from the AutoCAD drawing
format and is a standard in the engineering discipline. Most GIS software vendors
provide a DXF translator.

GENERATE - ARC/INFO Graphic Exchange Format
A generic ASCII format for spatial data used by the ARC/INFO software to accommodate
generic spatial data.

EXPORT - ARC/INFO Export Format
An exchange format that includes both graphic and attribute data. This format is
intended for transferring ARC/INFO data from one hardware platform, or site, to
another. It is also often used for archiving ARC/INFO data. This is not a published
data format; however, some GIS and desktop mapping vendors provide translators.
EXPORT format can come in either uncompressed, partially compressed, or fully
compressed format.

(e) DATA EDITING AND DATA QUALITY ASSURANCE


Data editing and verification is in response to the errors that arise during the encoding
of spatial and non-spatial data.

The editing of spatial data is a time consuming, interactive process that can take as
long, if not longer, than the data input process itself.

Classification of errors to be corrected under data input

(i) Spatial data input errors:

Incompleteness of the spatial data. This includes missing points, line
segments, and/or polygons.

Locational placement errors of spatial data. These types of errors usually are
the result of careless digitizing or poor quality of the original data source.

Distortion of the spatial data. This kind of error is usually caused by base maps
that are not scale-correct over the whole image, e.g. aerial photographs, or from
material stretch, e.g. paper documents.


(ii) Non spatial data/attribute data input errors.

Incorrect linkages between spatial and attribute data. This type of error is
commonly the result of incorrect unique identifiers (labels) being assigned during
manual key in (Typing) or digitizing. This may involve the assigning of an entirely
wrong label to a feature, or more than one label being assigned to a feature.

Attribute data is wrong or incomplete. Often the attribute data does not
match exactly with the spatial data. This is because they are frequently from
independent sources and often different time periods. Missing data records or too
many data records are the most common problems.

The identification of errors in spatial and attribute data is often difficult. Most spatial
errors become evident during the topological building process. The use of check plots
to clearly determine where spatial errors exist is a common practice.

SPATIAL DATA PROBLEMS ENCOUNTERED


A variety of common data problems occur in converting data into a topological
structure. These come from the original quality of the source data and the
characteristics of the data capture process.

Usually data is input by digitizing. Digitizing allows a user to trace spatial data from a
hard copy product, e.g. a map, and have it recorded by the computer software. Most
GIS software has utilities to clean the data and build a topologic structure.

If the data is unclean to start with, for whatever reason, the cleaning process can be
very lengthy.

Interactive editing of data is a distinct reality in the data input process. Experience
indicates that in the course of any GIS project 60 to 80 % of the time required to
complete the project is involved in the input, cleaning, linking, and verification of the
data.


The most common problems that occur in converting data into a topological
structure include:
1. slivers/flakes and gaps in the line work;

2. dead ends/improper junctions, also called dangling arcs, resulting from
overshoots and undershoots in the line work; and

3. weird polygons (bow ties) from inappropriate closing of connecting features.

Note:
 Topological errors only exist with linear and polygon features.
 They are most evident with polygonal features.

SLIVERS, DEAD ENDS, DUPLICATE LINES


Slivers/flakes are the most common problem when cleaning data. Slivers frequently
occur when coincident boundaries are digitized separately, e.g. once each for
adjacent forest stands, once for a lake and once for the stand boundary, or after
polygon overlay. Slivers often appear when combining data from different sources, e.g.
forest inventory, soils, and hydrography.

SOLUTION
 It is advisable to digitize data layers one at a time with respect to an existing
data layer, e.g. hydrography, rather than attempting to match data layers later.
 A proper plan and definition of priorities for inputting data layers will save many
hours of interactive editing and cleaning.

Dead ends usually occur when data has been digitized in a spaghetti mode, or without
snapping to existing nodes. Most GIS software will clean up undershoots and
overshoots based on a user defined tolerance, e.g. distance.

SOLUTION
The definition of a proper tolerance for cleaning requires an understanding of the scale
and accuracy of the data set.
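
The effect of a cleaning tolerance can be sketched as follows: an end point is moved onto the
nearest existing node only if that node lies within the tolerance. The function name
`snap_to_nodes` and the coordinates are hypothetical; real GIS packages apply the same idea
with their own commands and settings.

```python
import math

def snap_to_nodes(point, nodes, tolerance):
    """Return the nearest existing node if it lies within tolerance, else the point itself."""
    nearest = min(nodes, key=lambda node: math.dist(point, node))
    return nearest if math.dist(point, nearest) <= tolerance else point

nodes = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]

# An undershoot that stops 0.3 map units short of node (10, 0) is snapped
print(snap_to_nodes((9.7, 0.0), nodes, tolerance=0.5))  # (10.0, 0.0)
# A point well away from any node is left alone
print(snap_to_nodes((5.0, 5.0), nodes, tolerance=0.5))  # (5.0, 5.0)
```

A tolerance that is too large for the map scale would snap together features that are
genuinely separate, which is why the scale and accuracy of the data set must be understood
first.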


Duplicate lines. These usually occur when data has been digitized or converted from a
CAD system. The lack of topology in this type of drafting system permits the
inadvertent creation of elements that are exact duplicates.

However, most GIS packages automatically eliminate duplicate elements during
the topological building process. Accordingly, duplication may not be a concern with
vector-based GIS software. Users should be aware of the duplicate element that retraces
itself, e.g. a three-vertex line where the first point is also the last point.

Some GIS packages do not identify these feature inconsistencies and will build such a
feature as a valid polygon. This is because the topological definition is mathematically
correct, however it is not geographically correct. Most GIS software will provide the
capability to eliminate bow ties and slivers by means of a feature elimination command
based on area, e.g. polygons less than 100 square metres.
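
A minimal sketch of area-based feature elimination is given below. It computes polygon areas
with the standard shoelace formula and drops anything smaller than a threshold. The names
`polygon_area` and `drop_small_polygons` and the sample coordinates (in metres) are
assumptions for illustration only.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its vertex list (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def drop_small_polygons(polygons, min_area):
    """Keep only polygons whose area is at least min_area."""
    return {name: pts for name, pts in polygons.items() if polygon_area(pts) >= min_area}

polygons = {
    "stand_1": [(0, 0), (100, 0), (100, 100), (0, 100)],          # 10 000 square metres
    "sliver": [(100, 0), (100.5, 0), (100.5, 100), (100, 100)],   # 50 square metres
}
kept = drop_small_polygons(polygons, min_area=100.0)
print(sorted(kept))  # ['stand_1']
```

A line that retraces itself has zero area under the same formula, so a threshold of this kind
would also remove it.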

(b) Attribute Data Errors


These include simple errors of linkage, e.g. missing or duplicate records, which
become evident during the linking operation between spatial and attribute data.

Solution: Most GIS software contains functions that check for and clearly identify
problems of linkage during attempted operations.
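
The kind of linkage check described above can be sketched in a few lines. The example
compares the labels attached to map features with the keys in an attribute table and reports
missing and duplicate records. The function name `check_linkage` and the parcel labels are
hypothetical.

```python
from collections import Counter

def check_linkage(feature_ids, attribute_ids):
    """Compare feature labels with attribute record keys and report mismatches."""
    features = set(feature_ids)
    counts = Counter(attribute_ids)
    return {
        "features_without_record": sorted(features - set(counts)),
        "records_without_feature": sorted(set(counts) - features),
        "duplicate_records": sorted(k for k, c in counts.items() if c > 1),
    }

# Hypothetical parcel labels from the map and keys from the attribute table
report = check_linkage(["P1", "P2", "P3"], ["P1", "P1", "P3", "P4"])
print(report)
```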

(E). DATA VERIFICATION


Data verification ensures the integrity between the spatial and attribute data.
Six clear steps stand out in the data editing and verification process for spatial data.
These are:
1. Visual review. This is usually done by check plotting.

2. Cleanup of lines and junctions. This process is usually done by software first and
interactive editing second.

3. Weeding/removal of excess coordinates. This process involves the removal of
redundant vertices by the software for linear and/or polygonal features.

4. Correction for distortion and warping. Most GIS software has functions for scale
correction and rubber sheeting. However, the distinct rubber sheet algorithm used will
vary depending on the spatial data model, vector or raster, employed by the GIS.
Some raster techniques may be more intensive than vector-based algorithms.

5. Construction of polygons. Since the majority of data used in GIS is polygonal, the
construction of polygon features from lines/arcs is necessary. Usually this is done in
conjunction with the topological building process.

6. The addition of unique identifiers or labels. Often this process is manual.
However, some systems do provide the capability to automatically build labels for a
data layer.

These data verification steps occur after the data input stage and prior to or during the
linkage of the spatial data to the attributes. Verification should include some brief
querying of attributes and cross checking against known values.


DATA STORAGE AND DATABASE MANAGEMENT


Data has traditionally been stored in a file system.
A file is a collection of records such as parcels of mapped land. A collection of records
is called a logical file. Logical files are sometimes referred to as datasets.

In general, logical files are categorized as;


a) Simple files
b) Ordered sequential files
c) Indexed files

Simple file (Serial Access Files)


The simplest form of a database is a simple list of all items.
A new item is simply placed at the end of the list, which gets longer and longer.
It is very easy to add data to such a system, but retrieval is inefficient.
Searching through unstructured lists is time consuming.
Keys are provided to speed up data retrieval.

Ordered sequential file


Words in a dictionary or names in telephone books are structured alphabetically.
Addition of new item means that extra room must be created to insert it. However, the
advantage is that stored items can be reached very much faster.

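These files are accessed by binary search procedures. Instead of beginning the search
at the beginning of the list, the record in the middle is examined first, and the half
that cannot contain the item is discarded at each step. A binary search of n records
requires about $\log_2(n+1)$ steps. E.g. if the file holds 10000 names and it takes
1 second to examine each name, then the time required is about
$\log_2(10000+1) \approx 13.29$ s, i.e. at most 14 names are examined.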
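
The difference between a serial scan and a binary search can be checked with a small
experiment. The sketch below counts the comparisons each method makes on a sorted list of
10000 keys; for n records a binary search needs at most about log2(n + 1) of them. The
function names are illustrative only.

```python
import math

def linear_search_steps(records, target):
    """Count comparisons made by a serial scan from the start of the list."""
    for steps, value in enumerate(records, start=1):
        if value == target:
            return steps
    return len(records)

def binary_search_steps(records, target):
    """Count comparisons made by a binary search of a sorted list."""
    low, high, steps = 0, len(records) - 1, 0
    while low <= high:
        mid = (low + high) // 2
        steps += 1
        if records[mid] == target:
            return steps
        if records[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps

records = list(range(10000))                    # a sorted file of 10 000 keys
print(linear_search_steps(records, 9999))       # 10000
print(binary_search_steps(records, 9999))       # 14
print(math.ceil(math.log2(len(records) + 1)))   # 14
```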

Indexed files
Indexed files contain index tables listing keys and the addresses of the
corresponding records. For instance, a land parcel may be referenced to the particular
volume and folio on which its data is stored. Access to the original data is fast.
There are two sub-types of indexed files:

If the data items in the files provide the main order of the file, then these are
called direct files.
Inverted files are those where the location of items in the main file can also be
specified according to topic, normally given in a second file. Index files permit
rapid access to databases.
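
A small sketch may make the two kinds of index more concrete. Here the main file is a list
of records, a direct index maps each key to the address (list position) of its record, and an
inverted index maps a topic value to the addresses of all matching records. The parcel codes,
owner names and field names are invented for the example.

```python
# Main file: records stored in arrival order (list position = address)
main_file = [
    {"parcel": "P3", "owner": "Okello", "use": "residential"},
    {"parcel": "P1", "owner": "Nakato", "use": "commercial"},
    {"parcel": "P2", "owner": "Okello", "use": "residential"},
]

# Direct index: key -> address of the record in the main file
index = {rec["parcel"]: addr for addr, rec in enumerate(main_file)}

# Inverted index: topic value -> addresses of all matching records
inverted = {}
for addr, rec in enumerate(main_file):
    inverted.setdefault(rec["use"], []).append(addr)

print(main_file[index["P1"]]["owner"])                             # Nakato
print([main_file[a]["parcel"] for a in inverted["residential"]])   # ['P3', 'P2']
```

Both lookups go straight to the stored address without scanning the main file, at the cost
of keeping the index tables up to date whenever records are added or removed.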

DATABASE
A database is a collection of structured, non-redundant data 'sharable' between
different application systems. The definition highlights the essence of sharing data
between the given application systems, which calls for data consistency. A
geo-database is hence a collection of structured, non-redundant geospatial data.

The simplest way to reduce the incidence of inconsistent data is to eliminate
unnecessary duplication of data; this in turn implies that data should be stored as a
common pool of data sharable between application systems. This pool of data is the
enterprise database.

Role of a database
Database or information systems allow the user to add new information, retrieve
information, and change or edit information easily, which gives them a prime
role in handling related data.
A database comes with a number of useful functions:
 The database can be used by multiple users at the same time, i.e. it allows for
concurrent use.
 The database offers a number of techniques for storing data and allows the most
efficient one to be used, i.e. it supports storage optimization.
 The database allows rules to be imposed on the stored data, which are automatically
checked after each update to the data, i.e. it supports data integrity.
 The database offers an easy-to-use data manipulation language, which allows all
sorts of data extraction and data updates to be performed, i.e. it has a
query facility.
 The database will try to execute each query in the data manipulation language
in the most efficient way, i.e. it offers query optimization.

Objectives of creating a Cadastral Information System database


The design of any database has a number of objectives to be achieved, among them:

 Structuring the parcel data to permit various methods of access.
 Storing the land-related data in formats that are independent of current and
potential applications.
 Controlling access to the comprehensive parcel data, including who is allowed
to access or alter any data entry.
 Facilitating record updating, changing or modification, including the
insertion of new records and the deletion of unwanted ones.
 Minimizing parcel data redundancy.

DATABASE MANAGEMENT SYSTEM (DBMS)


A Database Management System (DBMS) is a computer program (software) used to
control the input, output, storage, retrieval and modification of data in a database. The
basic subsystems of the DBMS are the file handling and file management systems.

A DBMS acts in support of data storage and processing.

Database design
Databases are built from tables that can be related to one another.
A table is an elementary building block used to describe conceptual models, a
"model" being a representation of the actual phenomena.

Database design involves identifying what is to be included in these tables, i.e. the
entities.

An entity is a distinct object (a person, place, thing, concept or event) in the
organization that is to be represented in the database. An "alias" is just another name
for an entity.


Enterprise rules are rules that govern data in a database and are applicable to the
conceptual model of the data base.

An identifier/primary key is an attribute (possibly composite) which can never have
duplicated values within a table occurrence (row), and whose value is therefore always
sufficient to identify a row.

Relationships between entities


A relationship is an association between entities. Relationships are indicated by verbs
or verbal expressions. Once the entities representing actual phenomena have been
created, they are 'related' using their identifiers.
When a 'posted identifier' from one table is used to link to the data in another table,
this is called joining.
Relationships are of three main types, described below:
 One-to-One (1:1) Relationship: this exists when one row of the first table
matches one and only one row of the second table, with both tables having the
same unique identifier (primary key). For example, one person owning only one
parcel, especially in the leasehold tenure system.

 One-to-Many (1:N) Relationship: this exists when one row of the first table
matches multiple rows in the second table. For example, one parcel being
owned by more than one person (many people), especially in the customary tenure
system.

 Many-to-Many (N:M) Relationship: this exists when one row in the first
table matches multiple rows in the second table and one row in the second
table matches multiple rows in the first table.
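
The sketch below shows one common way of holding such relationships in tables, using
Python's built-in `sqlite3` module. A separate link table stores one row per
(parcel, person) pair, so a parcel can have many owners and a person many parcels. The table
names, column names and sample data are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE parcel (parcel_id TEXT PRIMARY KEY, area_ha REAL);
    -- Link table: one row per (parcel, owner) pair, so it can hold N:M ownership
    CREATE TABLE ownership (
        parcel_id TEXT REFERENCES parcel(parcel_id),
        person_id INTEGER REFERENCES person(person_id),
        PRIMARY KEY (parcel_id, person_id)
    );
    INSERT INTO person VALUES (1, 'Amina'), (2, 'Brian');
    INSERT INTO parcel VALUES ('P1', 0.5), ('P2', 1.2);
    INSERT INTO ownership VALUES ('P1', 1), ('P1', 2), ('P2', 1);
""")

# Join the tables through their posted identifiers
rows = con.execute("""
    SELECT parcel.parcel_id, person.name
    FROM ownership
    JOIN parcel ON parcel.parcel_id = ownership.parcel_id
    JOIN person ON person.person_id = ownership.person_id
    ORDER BY parcel.parcel_id, person.name
""").fetchall()
print(rows)  # [('P1', 'Amina'), ('P1', 'Brian'), ('P2', 'Amina')]
```

Parcel P1 has two owners and Amina owns two parcels, so the same three tables cover the
one-to-many and many-to-many cases.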

RELATIONAL DATABASE
The term relational database was originally defined and coined by Edgar Codd at IBM's
San Jose Research Laboratory (the predecessor of the IBM Almaden Research Center) in 1970.
Key Terms


Relational database theory uses a set of mathematical terms, which are roughly
equivalent to Structured Query Language (SQL) database terminology.

The table below summarizes some of the most important relational database terms
and their SQL database equivalents.

Relational term            SQL equivalent
relation, base relvar      table
derived relvar             view, query result, result set
tuple                      row
attribute                  column

Relations or Tables
A relation is defined as a set of tuples that have the same attributes. A tuple usually
represents an object and information about that object. Objects are typically physical
objects or concepts. A relation is usually described as a table, which is organized into
rows and columns. All the data referenced by an attribute are in the same domain and
conform to the same constraints.

The relational model specifies that the tuples (rows) of a relation have no specific
order (ordering of rows is not significant) and that the tuples, in turn, impose no order
on the attributes.

Applications access data by specifying queries, which use operations such as select to
identify tuples, project to identify attributes, and join to combine relations.


Relations can be modified using the insert, delete, and update operators. New tuples
can supply explicit values or be derived from a query. Similarly, queries identify tuples
for updating or deleting. It is necessary for each tuple of a relation to be uniquely
identifiable by some combination (one or more) of its attribute values. This
combination is referred to as the primary key.
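
To make the select, project and join operations concrete, the sketch below implements
simplified versions of them over relations held as lists of Python dictionaries. This is an
illustration of the ideas only, not how a real DBMS is built; the relation contents and
function signatures are assumptions.

```python
def select(relation, predicate):
    """Select: keep the tuples (rows) that satisfy a condition."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Project: keep only the named attributes (columns), dropping duplicate rows."""
    seen, result = set(), []
    for row in relation:
        reduced = tuple((a, row[a]) for a in attributes)
        if reduced not in seen:
            seen.add(reduced)
            result.append(dict(reduced))
    return result

def join(left, right, key):
    """Join: combine tuples from two relations that share the same key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

parcels = [{"parcel": "P1", "district": "Kampala"}, {"parcel": "P2", "district": "Wakiso"}]
owners = [{"parcel": "P1", "owner": "Amina"}, {"parcel": "P2", "owner": "Brian"}]

joined = join(parcels, owners, "parcel")
print(project(select(joined, lambda r: r["district"] == "Wakiso"), ["owner"]))
# [{'owner': 'Brian'}]
```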

2.4 TABLES
An example of a table and its characteristics

Table name: stock

Part #   Part-description   Quantity-in-stock     <- attribute types
P2       nut                5200
P1       bolt               8700
P3       screw              9750                  <- a row/tuple
P4       nut                2350

Each column holds one attribute; each entry in a column (e.g. 8700) is an attribute
occurrence.

Tables are the elementary building blocks used to describe conceptual models in a
database. The table name above is stock. It contains three columns, headed by
attribute types, and four rows (tuples). The intersection of a row and a column in
a table contains an attribute occurrence or an attribute value.

A table type contains only table name and attribute types but not attribute
occurrences/values

e.g. stock(part#, part-description, quantity-in-stock)


RULES/RESTRICTIONS WHEN SETTING UP RELATIONAL DATA (NORMALISED DATA)
There are three basic rules:
i. Each row in the table must be distinct i.e. No two rows can have the same
attribute values throughout. (No two rows must be exactly the same.)
ii. Multiple attribute values are prohibited. Each row/column intersection contains a
single attribute value.
iii. Ordering of rows and columns is not significant, i.e. rows and columns can be
interchanged without affecting the information content in the table. Each
column has a single attribute type name.

NORMALIZATION
The table that satisfies the above rules/restrictions is called a normalized table. One
that violates the rules above is called unnormalized table. Unnormalized tables have
redundant attribute values.

NULL VALUES
An attribute may be null, that is, "Not yet known" or "Not applicable". Any null value in
the table will usually be represented by a blank; this does not mean that null and
blank are the same.

REDUNDANT DATA
This is any data which, if removed, will not cause any loss of information, i.e. a data
value is redundant if its deletion leads to no loss of information.

Redundancy is hence unnecessary duplication of data.

DUPLICATE VS REPEATED DATA


Reg No.   Name     Age   Qualification
01        John     27    Diploma in Surveying
02        Josh     25    Diploma in Land Management
03        Julius   27    BSc. in Surveying
04        Joseph   21    BSc. in Physical Planning

From the table we have the following types of data;


1. DUPLICATED data: this occurs when an attribute (column) has two or more
identical values, i.e. attribute values are repeated. E.g. the age of John and of
Julius (27) is duplicated data. However, when duplicated data are removed
there is a loss of information: if we delete the age of John, we cannot tell his
age by looking at the age of Julius. Duplicated data is therefore not
necessarily redundant.
2. REPEATED data: this occurs when a single row/column intersection holds multiple
attribute values, e.g. one student's Qualification entry listing both a Diploma
in Surveying and a BSc. in Surveying. To solve such a problem the table is split
so that each qualification is written separately.

When the table contains no multiple values and no redundant data we say the table is
fully normalized.
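
Splitting a table to remove repeated data can be sketched as follows. The example starts
from an unnormalized student list in which one cell holds several qualifications, and
produces two tables with a single value in every cell, linked by the registration number.
The second qualification given to John is invented purely to create a repeating group.

```python
# Unnormalized: one cell holds several qualifications (a repeating group)
students = [
    {"reg_no": "01", "name": "John", "age": 27,
     "qualifications": ["Diploma in Surveying", "BSc in Surveying"]},
    {"reg_no": "02", "name": "Josh", "age": 25,
     "qualifications": ["Diploma in Land Management"]},
]

# Split into two tables, each with a single value per cell
student_table = [
    {"reg_no": s["reg_no"], "name": s["name"], "age": s["age"]} for s in students
]
qualification_table = [
    {"reg_no": s["reg_no"], "qualification": q}
    for s in students
    for q in s["qualifications"]
]

for row in qualification_table:
    print(row)
```

The registration number appears in both tables and acts as the posted identifier that lets
the two be joined again when needed.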

ADVANTAGES OF RELATIONAL DATA


- It is the simplest form of database and can easily be understood.
- It is quite flexible and can meet the demands of all queries that can be formulated
using the rules of mathematical expressions.
- It is easy to make queries in the database using logic operations, e.g. select
students whose age is 27 and who are qualified with a degree from the table given.
- It allows different forms of data to be manipulated.
- Addition or removal of data is easy because it involves adding or removing a
tuple or even a whole table.
- Querying across different relational tables is done by joining them through
common fields. This is good for situations where all records have the same
number of attributes and there is no natural hierarchy.


DISADVANTAGES
1. Many searches are sequential, so a considerable amount of time can be spent
searching a large database.
2. A relational database system has to be very skillfully designed in order to support
its capabilities with reasonable speed; that is, it is very expensive to design a
relational database structure.
3. Relational databases are not very good at storing more complex types of data.
4. Relational databases are not inherently set up for dealing with spatial data.

Base and derived relations


In a relational database, all data is stored and accessed via relations. Relations that
store data are called "base relations", and in implementations are called "tables".

Other relations do not store data, but are computed by applying relational
operations to other relations. These relations are sometimes called "derived
relations". In implementations, these are called "views" or "queries".

Derived relations are convenient in that though they may grab information from
several relations, they act as a single relation. Also, derived relations can be used as
an abstraction layer.

Domain
A domain describes the set of possible values for a given attribute. Because a
domain constrains the attribute's values and name, it can be considered a constraint.
Mathematically, attaching a domain to an attribute means that "all values for this
attribute must be an element of the specified set."

Topology
A core feature of Geographical Information Systems (GIS) is the ability to create and
manipulate topological data structures for vector-based data.

Topology is typically defined as the spatial relationships between adjacent or
neighboring features, or as '...properties which define relative relationships between
spatial elements... including adjacency, connectivity, and containment'.

Therefore, a topological data structure is typically defined as a data structure in which
the inherent spatial connectivity and adjacency relationships of features are explicitly
stored.

Advantages of the topological data model are:

 Data storage for polygons is reduced because boundaries between adjacent
polygons are not stored twice;
 explicit neighborhood relations are maintained; and
 data entry and map production are improved by providing a rigorous, automated
method to handle island and self-intersecting polygons, overshoots and
undershoots, and gaps.
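
A minimal arc-node sketch shows how adjacency and connectivity can be read directly from
stored topology rather than computed from coordinates. Each arc is stored once, with its end
nodes and the polygons on its left and right. The arc and node labels follow Map M used
later in these notes; the left/right assignments and the label "outside" are assumptions
made for the example.

```python
# Each arc is stored once with its end nodes and the polygons on either side.
# "outside" stands for the area beyond the map; left/right values are assumed.
arcs = {
    "a": {"from": 1, "to": 2, "left": "outside", "right": "I"},
    "b": {"from": 2, "to": 3, "left": "outside", "right": "I"},
    "c": {"from": 3, "to": 4, "left": "II", "right": "I"},
    "d": {"from": 4, "to": 1, "left": "outside", "right": "I"},
    "e": {"from": 3, "to": 5, "left": "outside", "right": "II"},
    "f": {"from": 5, "to": 6, "left": "outside", "right": "II"},
    "g": {"from": 6, "to": 4, "left": "outside", "right": "II"},
}

def adjacent_polygons(arcs):
    """Adjacency: pairs of polygons that share an arc, read from the arc table."""
    pairs = set()
    for arc in arcs.values():
        if "outside" not in (arc["left"], arc["right"]):
            pairs.add(tuple(sorted((arc["left"], arc["right"]))))
    return pairs

def arcs_at_node(arcs, node):
    """Connectivity: arcs that start or end at a given node."""
    return sorted(name for name, arc in arcs.items() if node in (arc["from"], arc["to"]))

print(adjacent_polygons(arcs))  # {('I', 'II')}
print(arcs_at_node(arcs, 3))    # ['b', 'c', 'e']
```

The shared boundary c appears only once in the table, which is the storage saving listed
among the advantages.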

DATABASE STRUCTURES/ATTRIBUTE DATA MODELS


There are mainly four (4) database structures/attribute data models that may be
used:
 Hierarchical database structures
 Network database structures
 Relational database structures
 Object-oriented database structures

Hierarchical attribute data model/structure


Hierarchical data structures provide quick and convenient means of data access where
data has a parent-to-child or a one-to-many relation. This structure assumes that each
part of the hierarchy (like a tree) can be accessed using an identifier that fully
describes the data structure.

Hierarchical systems assume there is a good correlation between the key attributes
and the associated attributes.


Advantages are:

 Data access via unique identifiers is easy for key attributes.
 These systems are easy to understand and easy to update.
 Data retrieval is easy if all possible queries are known beforehand.

Disadvantages are:
 Data access via unique identifiers is difficult for associated attributes.
 Large index files have to be maintained, and certain attribute values may have
to be repeated many times, leading to data redundancy and thus increasing storage
and access overheads.

Illustration

Consider Map M with parts I and II.

[Figure 1: Map M, showing polygons I and II with their boundary lines and nodes]

[Figure 2: Polygons I and II (from M), drawn separately with line c repeated in each]

Map M
    Polygon I
        a (nodes 1, 2)   b (nodes 2, 3)   c (nodes 3, 4)   d (nodes 4, 1)
    Polygon II
        c (nodes 3, 4)   e (nodes 3, 5)   f (nodes 5, 6)   g (nodes 6, 4)

Figure 3: Hierarchical data structure (tree structure)
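
The tree in Figure 3 can be sketched as a nested dictionary, which makes both the strength
and the weakness of the hierarchical structure visible. Access along the key path (polygon,
then line) is direct, a query on an associated attribute such as a node number has to walk
the whole tree, and the shared line c is stored under both parents. The variable and
function names are illustrative.

```python
# Map M as a tree: map -> polygons -> lines -> end nodes
map_m = {
    "I": {"a": (1, 2), "b": (2, 3), "c": (3, 4), "d": (4, 1)},
    "II": {"c": (3, 4), "e": (3, 5), "f": (5, 6), "g": (6, 4)},
}

# Access through the key path (polygon, then line) is direct
print(map_m["II"]["e"])  # (3, 5)

def polygons_using_node(tree, node):
    """A query on an associated attribute must walk the whole tree."""
    return sorted(
        poly for poly, lines in tree.items()
        if any(node in ends for ends in lines.values())
    )

print(polygons_using_node(map_m, 3))  # ['I', 'II']

# Line c is stored under both parents: 8 stored lines, only 7 distinct ones
stored = [line for lines in map_m.values() for line in lines]
print(len(stored), len(set(stored)))  # 8 7
```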

Network data base Structures


In many cases a much more rapid linkage is required, particularly in data structures for
graphic features, where adjacent items in a map or figure need to be connected
together even if the actual data about their coordinates may be written in very
different parts of the database. Network systems fulfill this requirement.

[Figure: network data structure for Map M, linking polygons I and II to lines a to g
and nodes 1 to 6]


Advantage:
 Network systems are very useful when the tables or linkages can be specified
beforehand. They avoid data redundancy and make good use of the available data.

Disadvantage:
 The database is enlarged by the overhead of the pointers, which in complex systems
can become quite a substantial part of the database. These pointers must be updated
every time a change is made to the database.

Relational Database Structures


This is a database that is viewed as a set of relations in the form of tables. The data
is laid out in two-dimensional tables. It is the simplest form of database structure
and the most commonly used in GIS.

Data are stored in simple records called tuples; a complete row is one record (tuple).
E.g.

No.   Name   Area   Population   Number of Students     <- attribute types (columns)
01                                                      <- tuple (row)
02
.

Each cell, where a row and a column intersect, is an attribute occurrence.

E.g. relational database structure representation of Map M

Map
M     I    II

Polygons
I     a    b    c    d
II    c    e    f    g

Lines
I     a    1    2
I     b    2    3
I     c    3    4
I     d    4    1
II    e    3    5
II    f    5    6
II    g    6    4
II    c    4    3

Characteristics
 It consists of rows and columns.
 When a row and a column intersect we call the place an attribute
occurrence.
 Columns contain attribute types e.g. code, name, area etc.

These structures store no pointers and express no hierarchy. Instead, data are stored
in simple records known as „tuples‟ containing an ordered set of attribute values
grouped together in two-dimensional tables called relations. Each table or relation is
usually a separate file.
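
As a sketch of how such tables are queried, the Map M example can be loaded into an
in-memory SQLite database through Python's `sqlite3` module. The table and column names
(`polygon_lines`, `lines`, `from_node`, `to_node`) are assumptions for this example; the
values follow the polygon and line listings for Map M.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE polygon_lines (polygon TEXT, line TEXT);
    CREATE TABLE lines (line TEXT PRIMARY KEY, from_node INTEGER, to_node INTEGER);
    INSERT INTO polygon_lines VALUES
        ('I','a'), ('I','b'), ('I','c'), ('I','d'),
        ('II','c'), ('II','e'), ('II','f'), ('II','g');
    INSERT INTO lines VALUES
        ('a',1,2), ('b',2,3), ('c',3,4), ('d',4,1), ('e',3,5), ('f',5,6), ('g',6,4);
""")

# Lines bounding polygon II, found by joining on the common line code
boundary = con.execute("""
    SELECT lines.line, from_node, to_node
    FROM polygon_lines JOIN lines ON lines.line = polygon_lines.line
    WHERE polygon = 'II' ORDER BY lines.line
""").fetchall()
print(boundary)  # [('c', 3, 4), ('e', 3, 5), ('f', 5, 6), ('g', 6, 4)]

# Lines shared by more than one polygon
shared = con.execute("""
    SELECT line FROM polygon_lines GROUP BY line HAVING COUNT(*) > 1
""").fetchall()
print(shared)  # [('c',)]
```

No pointers or hierarchy are stored; the link between the two tables exists only through the
repeated line codes.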

The pointer structures in network models and the identifiers in hierarchical structures
are in this case replaced by data redundancy in the form of 'identification codes' that
are used as unique identifiers for the records in each file. This database structure
has been employed in this project to design the current Land Information System in
the Ministry of Lands, Housing and Urban Development.


Object-Oriented structure:
The object-oriented database model manages data through objects.

An object is a collection of data elements and operations that together are considered
a single entity. The object-oriented database is a relatively new data structure.

This approach has the attraction that querying is very natural, as features can be
bundled together with attributes at the database administrator's discretion. To date,
only a few GIS packages are promoting the use of this attribute data model.

Many of the most widely used programming languages are multi-paradigm
programming languages that support object-oriented programming to a greater or
lesser degree, typically in combination with imperative, procedural programming.

Significant object-oriented languages include;


 C++,
 Objective-C,
 Smalltalk, Delphi,
 Java, C#,
 Perl,
 Python,
 PHP.

Database design in Microsoft Access


Microsoft Access (MS Access) is Microsoft software used to design relational databases.
During physical modeling, objects such as tables and columns are created based on
the entities and attributes that have been defined.

The tables designed are fully 'normalized' and 'referential integrity' is enforced. In
the process, constraints are defined, including primary identifiers (keys), posted
identifiers, other unique identifiers, and check constraints. This ensures the integrity
of the data and the removal of redundant data.


One such form of integrity is referential integrity. Its rules keep the relationships
between tables intact and unbroken in a relational database management system:
referential integrity prohibits you from changing existing data in ways that invalidate
the links between tables.

Referential integrity preserves the defined relationships between tables when records
are added, modified or deleted, by ensuring that identifier values are consistent
across tables. Such consistency requires that there are no references to non-existent
values and that, if an identifier's value changes, all references to it change
consistently throughout the database; otherwise the identifier cannot be changed.
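
The behaviour described above can be demonstrated with any relational DBMS that enforces
foreign keys. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in
(it is not MS Access, and the table names and data are invented). Changes that would leave a
parcel pointing at a non-existent owner are rejected.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
con.executescript("""
    CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE parcel (
        parcel_id TEXT PRIMARY KEY,
        owner_id INTEGER NOT NULL REFERENCES owner(owner_id)
    );
    INSERT INTO owner VALUES (1, 'Amina');
    INSERT INTO parcel VALUES ('P1', 1);
""")

def try_sql(statement):
    """Run a statement and report whether the integrity rules allowed it."""
    try:
        con.execute(statement)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

print(try_sql("INSERT INTO parcel VALUES ('P2', 99)"))  # rejected: owner 99 does not exist
print(try_sql("DELETE FROM owner WHERE owner_id = 1"))  # rejected: parcel P1 still refers to it
print(try_sql("INSERT INTO parcel VALUES ('P3', 1)"))   # accepted
```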

Physical database design in ArcGIS


ArcGIS is a family of Geographical Information System software packages developed by
the Environmental Systems Research Institute (ESRI). It offers a wizard-based
approach to a variety of more sophisticated tasks, such as advanced geoprocessing and
advanced map production, together with a shapefile utility that supports easy data
transfer, map projection and datum transformation.

This software has a relational database extension that helps to design databases of
given attributes within its environment, allows tables to be joined and related, and
supports tasks such as statistical analysis, spatial analysis and querying, proximity
analysis and connectivity analysis, with a visually appealing presentation of the
results.

Data storage/backup media


Common storage and backup media include magnetic tapes, CDs, DVDs, memory sticks, hard
drives/disks and floppy disks.

SPATIAL DATA ANALYSIS


Spatial data analysis is a set of techniques for analyzing spatial data. It involves
data queries performed on geo-referenced information to answer complex questions, e.g.
how many people live within a certain hazardous area?


Spatial analysis can also be defined as the process of extracting or creating new
information about a set of geographic features to perform routine examination,
assessment, evaluation, analysis or modeling of data in a geographic area based on
established and computerized criteria and standards.

Spatial analysis can also be defined as a process of modeling, examining, and
interpreting model results. It is useful for evaluating suitability and capability,
for estimating and predicting, and for interpreting and understanding.

Spatial analysis helps us to:

• Identify trends in data
• Create new relationships between data sets
• View complex relationships in the data
• Make better decisions

The main analytical tools in Geographical Information Systems are:


1. Connectivity analysis
2. Proximity analysis
3. Overlay analysis
4. Network analysis
5. Query.

Connectivity analysis
This is the analysis of connectivity between points, lines and polygons in terms of
distance, area, travel time, optimum paths etc. Examples include proximity analysis
by buffering and network analysis.

Proximity analysis
This is primarily concerned with closeness of one feature to another. Proximity analysis
means the ability to identify any feature that is near any other feature based on
location, attribute values, or specific distance. It may also be defined as,
“measurement of distances from points, lines and boundaries of polygons.”


Typical questions answered by proximity analysis include: “Which parcels are within
80 m of a certain road?”, “Which parcels are within 60 m of the railway line?”, “How
many houses lie within 100 meters of this water main?”, “What is the total number of
customers within 10 kilometers of this store?” and “What proportion of the alfalfa
crop is within 500 meters of the well?”

To answer such questions, GIS technology uses a process called buffering to determine the
proximity between features.

Proximity to roads and engineering infrastructure is typically important for decision
making involving:

• Development planning
• Tax calculations
• Utility billing

Further examples of proximity questions are “How many houses lie within 30 km of this
reservoir?” and “What is the total number of patients within 20 km of a health care
facility?” These questions, too, are answered by buffering, which determines the
proximity between features.

GIS analysis functions create buffers around selected features, for example a radius
of 15 km around a health centre to denote a catchment area. Proximity analysis is not
always based on distance; it can also be based on time, such as travel time.

Buffering
A buffer operation is one of the most common spatial analysis tools.

“A buffer is a map feature that represents a uniform distance around a feature.”

When creating a buffer, the user selects the feature to buffer from as well as the
distance to be buffered. The buffer operation creates a new polygon dataset, in which
a zone of the specified distance is drawn around specific features within a layer.
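As a minimal sketch of how a buffer answers a proximity question such as “How many houses lie within 100 meters of this water main?”, the following pure-Python example tests whether points fall inside the buffer of a line. It assumes planar coordinates in metres; the coordinates, the house names and the function names are made up for illustration, and a real GIS would build an actual buffer polygon instead.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Position of the nearest point along the segment, clamped to the range [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffer(point, polyline, distance):
    """True if the point lies inside the buffer of the given distance around the polyline."""
    return any(
        distance_to_segment(point, a, b) <= distance
        for a, b in zip(polyline, polyline[1:])
    )


# Hypothetical water main (a polyline) and house locations, in metres.
water_main = [(0, 0), (500, 0), (500, 400)]
houses = {"H1": (100, 50), "H2": (250, 150), "H3": (580, 200), "H4": (620, 410)}

# Houses that lie within the 100 m buffer of the water main.
near = sorted(name for name, xy in houses.items() if within_buffer(xy, water_main, 100))
```

Here only H1 and H3 fall within the buffer; H4 lies just beyond the end of the main, which shows why the distance must be measured to the segment and not to the infinite line through it.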


Buffers are a frequently used analysis tool. In effect, buffering calculates distances
from spatial objects and produces polygons that cover the objects and the area around
them. Buffer zones are frequently used to mitigate environmental hazards.

Network analysis
This type of analysis examines how linear features are connected and how easily resources can
flow through them. Many analyses can be carried out on a network, for example for
transportation planning, utility management, airline scheduling and navigation.

Network analysis includes the determination of optimum paths using specified decision
rules. These decision rules are likely to be based on minimum time or distance, e.g.
“Find the shortest path through a network between selected points”, “Determine
whether one point on a stream network is downstream or upstream of another” and
“Find the parts of a network which can be reached within a given travel time”.
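One common way to find an optimum path under a minimum-distance or minimum-time rule is Dijkstra's algorithm. The sketch below is illustrative only: the small road network, its node names and its weights are invented, and the notes do not say which algorithm a particular GIS package uses.

```python
import heapq


def shortest_path(network, start, goal):
    """Dijkstra's algorithm: return (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]  # (cost so far, current node, path taken)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in network.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical road network: node -> {neighbour: travel time in minutes}.
roads = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {"B": 5, "C": 8},
}

cost, path = shortest_path(roads, "A", "D")
```

With these weights the direct-looking routes A-B-D (9) and A-C-D (10) lose to A-C-B-D (8), which illustrates that the optimum path depends on the decision rule and not on the number of links.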

Examples of where network analysis is used:

• Street network analysis
• Traffic flow modeling
• Telephone cable networking

Overlay
An overlay process combines the features of two layers to create a new layer that contains
the attributes of both. This resulting layer can be analyzed to determine which features
overlap, or to find out how much of a feature is in one or more areas.

An overlay could be done to combine soil and vegetation layers to calculate the area of a
certain vegetation type on a specific type of soil.

Overlay analysis integrates spatial data layers with attribute data. This is done by
combining information from different GIS layers and finally deriving an attribute
(another layer).

In fact, all types of spatial objects can be overlain in order to analyse the spatial
relationships between sets of objects and their surroundings.

Overlays/spatial joins can for example link land use and environmental data to
population and disease data. It could also integrate data of different types such as
soils, vegetation and land ownership.

Two types exist:

1. Raster overlay (overlay of raster data)
This is a relatively straightforward operation, and often many datasets can be
combined and displayed at once.

2. Vector overlay (overlay of vector data)
This type is far more difficult and complex and involves more processing, because it
must update the topological tables of spatial relationships between points, lines and
polygons.
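The reason raster overlay is straightforward is that cells at the same row and column in each layer cover the same ground, so layers can be combined cell by cell. The following sketch overlays a soil layer and a vegetation layer to find the area of one vegetation type on one soil type. The layer values and the 30 m cell size are assumptions made up for the example.

```python
# Two hypothetical raster layers covering the same area with the same cell size.
soil = [
    ["clay", "clay", "sand"],
    ["clay", "loam", "sand"],
    ["loam", "loam", "sand"],
]
vegetation = [
    ["forest", "grass", "grass"],
    ["forest", "forest", "grass"],
    ["grass", "forest", "forest"],
]
CELL_AREA_M2 = 30 * 30  # assumed cell size of 30 m by 30 m


def raster_overlay(layer_a, layer_b):
    """Combine two aligned rasters cell by cell into a raster of (a, b) pairs."""
    return [
        [(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


combined = raster_overlay(soil, vegetation)

# Area of forest growing on clay soil: count matching cells, then multiply by cell area.
forest_on_clay_cells = sum(row.count(("clay", "forest")) for row in combined)
forest_on_clay_area = forest_on_clay_cells * CELL_AREA_M2
```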

Vector overlay
It is a combination of two separate spatial datasets to create a new output vector
dataset. Such overlays are similar to mathematical Venn diagrams.

In the process, map features and associated attributes are integrated to produce
composite maps.

Logical rules can be applied to how maps are combined.

It is subdivided into union and intersect overlays.

Union overlay
It combines the geographical features and attribute tables of both inputs into a single
new output.

Intersect overlay
It defines the area where both inputs overlap and retains the attribute fields of
each input.
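The Venn-diagram analogy can be made concrete with Python sets. In the sketch below each polygon is simplified to the set of grid cells it covers, which is a deliberate simplification: real vector overlay computes new polygon geometry and rebuilds topology. The layer names and attributes are hypothetical.

```python
# Each "polygon" is simplified to the set of grid cells (column, row) that it covers.
land_use = {"attrs": {"land_use": "farm"}, "cells": {(0, 0), (1, 0), (0, 1), (1, 1)}}
flood_zone = {"attrs": {"flood_risk": "high"}, "cells": {(1, 1), (2, 1), (1, 2), (2, 2)}}


def intersect_overlay(a, b):
    """Keep only the area common to both inputs, carrying the attributes of both."""
    return {cell: {**a["attrs"], **b["attrs"]} for cell in a["cells"] & b["cells"]}


def union_overlay(a, b):
    """Keep the area of either input; an attribute is None where its input is absent."""
    empty_a = dict.fromkeys(a["attrs"])  # same keys as a, all values None
    empty_b = dict.fromkeys(b["attrs"])
    result = {}
    for cell in a["cells"] | b["cells"]:
        attrs_a = a["attrs"] if cell in a["cells"] else empty_a
        attrs_b = b["attrs"] if cell in b["cells"] else empty_b
        result[cell] = {**attrs_a, **attrs_b}
    return result


intersection = intersect_overlay(land_use, flood_zone)
union = union_overlay(land_use, flood_zone)
```

The intersect result contains only the single shared cell with both attributes, while the union result contains every cell of either input, with missing attributes left empty.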


DEVELOPING INFORMATION FROM A GIS DATABASE


This is mainly done through data extraction, data retrieval and querying.

Data extraction
It is a GIS process similar to vector overlay, but it can be used in either vector or
raster data analysis. Rather than combining the properties and features of both
datasets, data extraction uses a “clip” or “mask” to extract the features of one
dataset that fall within a given extent.
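A clip can be sketched in its simplest form as selecting the point features that fall inside a rectangular extent. The well locations, the extent and the function name below are hypothetical; clipping lines or polygons, or clipping to an irregular boundary, needs proper geometry routines.

```python
def clip_points(features, extent):
    """Return the point features that fall inside extent = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    return [f for f in features if xmin <= f["x"] <= xmax and ymin <= f["y"] <= ymax]


# Hypothetical wells and study-area extent, in arbitrary map units.
wells = [
    {"id": 1, "x": 2.0, "y": 3.0},
    {"id": 2, "x": 8.5, "y": 1.0},
    {"id": 3, "x": 5.0, "y": 5.0},
]
study_area = (0.0, 0.0, 6.0, 6.0)

clipped = clip_points(wells, study_area)
clipped_ids = [f["id"] for f in clipped]
```

Note that the attributes of the clipped features are unchanged; only the set of features is reduced, which is what distinguishes extraction from overlay.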

Identifying features based on conditions


GIS querying involves determining the locations that satisfy certain conditions. In this
case the user knows what characteristics are important and wants to find out where the
features that have those characteristics are located.

One can perform analysis to obtain the answer to a particular question or to find
solutions to a particular problem.

Retrieval
GIS analysis allows the user to retrieve data. Retrieval occurs in both spatial and
attribute data.

Data retrieval involves the capability to easily select data for graphic or attribute
editing, updating, querying, analysis and/or display.

Retrieval involves the selection, search, manipulation and output of data without the
requirement to modify the geographic location of the features involved. Often, data is
selected by attribute and viewed graphically.

The ability to retrieve data is based on the unique structure of the DBMS and
command interfaces are commonly provided with the software. Most GIS software also
provides a programming subroutine library, or macro language, so the user can write
their own specific data retrieval routines if required.


Querying
This is the capability to retrieve data, usually a data subset, based on some user
defined formula. These data subsets are often referred to as logical views. Often the
querying is closely linked to the data manipulation and analysis subsystem.

Many GIS software offerings have attempted to standardize their querying capability
by using the Structured Query Language (SQL). This is especially true of systems that
make use of an external relational DBMS. Through the use of SQL, GIS software can
interface with a variety of different DBMS packages.
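A minimal sketch of an attribute query expressed in SQL is shown below, using Python's built-in sqlite3 module as a stand-in for the external relational DBMS. The parcels table, its columns and its values are hypothetical.

```python
import sqlite3

# In-memory attribute table; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, land_use TEXT, area_ha REAL)"
)
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [
        (1, "residential", 0.4),
        (2, "agricultural", 12.5),
        (3, "agricultural", 3.2),
        (4, "commercial", 1.1),
    ],
)

# User-defined condition: agricultural parcels larger than 5 hectares.
rows = conn.execute(
    "SELECT parcel_id, area_ha FROM parcels "
    "WHERE land_use = ? AND area_ha > ? ORDER BY parcel_id",
    ("agricultural", 5.0),
).fetchall()
```

The rows returned form the data subset, or logical view, described above; in a GIS the matching parcel identifiers would then be used to highlight the corresponding features on the map.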

Reclassification
Reclassification involves the selection and presentation of a selected layer of data
based on the classes or values of a specific attribute e.g. cover group. It involves
looking at an attribute, or a series of attributes, for a single data layer and classifying
the data layer based on the range of values of the attribute.

Accordingly, features adjacent to one another that have a common value, e.g. cover
group, but differ in other characteristics, e.g. tree height, species, will be treated and
appear as one class.


Reclassification in raster-based GIS software


Numerical values are often used to indicate classes. Reclassification is an attribute
generalization technique. Typically this function makes use of polygon patterning
techniques such as cross-hatching and/or color shading for graphic representation.
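Raster reclassification can be sketched as replacing each cell value with the number of the class whose value range contains it. The tree-height raster and the class breaks below are invented for the example, and the function name is hypothetical.

```python
def reclassify(raster, class_breaks):
    """Replace each cell value by the class whose [lower, upper) range contains it."""

    def classify(value):
        for (lower, upper), new_class in class_breaks:
            if lower <= value < upper:
                return new_class
        return None  # value falls outside every class

    return [[classify(value) for value in row] for row in raster]


# Hypothetical raster of tree heights in metres.
tree_height_m = [
    [2, 4, 11],
    [7, 9, 15],
    [1, 12, 25],
]
# Assumed classes: 1 = low (0-5 m), 2 = medium (5-10 m), 3 = tall (10-30 m).
breaks = [((0, 5), 1), ((5, 10), 2), ((10, 30), 3)]

classes = reclassify(tree_height_m, breaks)
```

After reclassification, neighbouring cells with different heights but the same class value are indistinguishable, which is the generalization effect described above.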

Reclassification in a vector-based GIS


Boundaries between polygons of common reclassed values should be dissolved to
create a cleaner map of homogeneous continuity. Raster reclassification, by contrast,
inherently involves boundary dissolving.

The dissolving of map boundaries based on a specific attribute value often results in a
new data layer being created. This is often done for visual clarity in the creation of
derived maps. Almost all GIS software provides the capability to easily dissolve
boundaries based on the results of a reclassification. Some systems allow the user to
create a new data layer for the reclassification while others simply dissolve the
boundaries during data output.

Note:
The querying capability of the DBMS (software) is a necessity in the reclassification
process. The ability and process for displaying the results of reclassification, a map or
report, will vary depending on the GIS.

In some systems the querying process is independent from data display functions,
while in others they are integrated and querying is done in a graphics mode. The exact
process for undertaking a reclassification varies greatly from GIS to GIS.

NETWORKS AND NETWORK ELEMENTS


Collections of (connected) lines may represent phenomena that are best viewed as
networks. With networks, specific types of interesting questions arise that have to do
with connectivity and network capacity. These relate to activities such as traffic
monitoring and watershed management.


Extra values are commonly associated with network elements, that is, the lines that
make up the network. Examples are distance, quality of the link, and carrying capacity.

Triangulated Irregular Network (TIN)

A commonly used data structure in GIS software is the Triangulated Irregular Network,
or TIN. It is one of the standard implementation techniques for digital terrain models,
but it can be used to represent any continuous field.

Principle of a TIN
• It is built from a set of locations (coordinated points) for which we have a
measurement, for instance an elevation.
• The locations can be arbitrarily scattered in space.
• From these three-dimensional points, an irregular tessellation made of triangles
is constructed.

This is illustrated in figure 2.8. Two such tessellations are illustrated in figure 2.9.

NOTE:
In three-dimensional space, three points uniquely determine a plane, as long as they
are not collinear, i.e., they must not be positioned on the same line.

A plane fitted through these points has a fixed aspect and gradient, and can be used
to approximate the elevation at other locations within the triangle.

The triangulation of figure 2.9(b) happens to be a Delaunay triangulation, which in
a sense is an optimal triangulation.

Properties of a Delaunay triangulation


1. Triangles are as equilateral (equal-sided) as they can be, given the set of
anchor points,
2. For each triangle the circumcircle through its three anchor points does not
contain any other anchor point. (One such circumcircle is depicted on the right
of figure 2.9 (b)).
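The second property, the empty circumcircle, can be checked numerically. The sketch below computes the circumcircle of a triangle and tests whether any other anchor point falls strictly inside it. The coordinates and function names are made up for illustration, and the small tolerance is an assumption to absorb floating-point rounding.

```python
import math


def circumcircle(a, b, c):
    """Return (centre_x, centre_y, radius) of the circle through three points."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("points are collinear, so no circumcircle exists")
    a_sq, b_sq, c_sq = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (a_sq * (by - cy) + b_sq * (cy - ay) + c_sq * (ay - by)) / d
    uy = (a_sq * (cx - bx) + b_sq * (ax - cx) + c_sq * (bx - ax)) / d
    return ux, uy, math.hypot(ax - ux, ay - uy)


def satisfies_delaunay(triangle, other_points, tolerance=1e-9):
    """True if no other anchor point lies strictly inside the triangle's circumcircle."""
    ux, uy, radius = circumcircle(*triangle)
    return all(
        math.hypot(px - ux, py - uy) >= radius - tolerance for px, py in other_points
    )


# Hypothetical triangle; its circumcircle is centred at (2, 2).
triangle = [(0, 0), (4, 0), (0, 4)]
centre_x, centre_y, radius = circumcircle(*triangle)

ok_with_far_point = satisfies_delaunay(triangle, [(5, 5)])   # point outside the circle
ok_with_near_point = satisfies_delaunay(triangle, [(3, 3)])  # point inside the circle
```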

A TIN clearly is a vector representation: each anchor point has a stored reference
(coordinate). It is also called an irregular tessellation, as the chosen triangulation
provides a partitioning of the entire study space.

However, in this case the cells do not have an associated stored value, as is typical
of tessellations, but rather a simple interpolation function that uses the elevation
values of their three anchor points.
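One simple interpolation function of this kind is linear interpolation on the plane through the three anchor points, which can be written with barycentric weights. This is a sketch under that assumption; the notes do not state which interpolation function a given GIS uses, and the anchor coordinates below are invented.

```python
def interpolate_elevation(p, triangle):
    """Linear interpolation of z at p = (x, y) on the plane through three (x, y, z) points."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    x, y = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("anchor points are collinear, so no unique plane exists")
    # Barycentric weights: each anchor's influence on the location p
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1 - w1 - w2
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical anchor points (x, y, elevation) lying on the plane z = 100 + 2x + 4y.
anchors = [(0, 0, 100), (10, 0, 120), (0, 10, 140)]

z_inside = interpolate_elevation((5, 5), anchors)
z_at_anchor = interpolate_elevation((0, 0), anchors)
```

At an anchor point the function returns the measured elevation itself, and elsewhere in the triangle it returns the height of the fitted plane.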

DATA QUALITY
As geo-information is intended to reduce uncertainty in decision making, any errors
and uncertainties in spatial information products may have practical, financial and even
legal implications for the user.

For this reason, it is important that those involved in the acquisition and processing
of spatial data are able to assess the quality of the base data and of the derived
information products.

The International Organization for Standardization (ISO) considers quality to be the
totality of characteristics of a product that bear on its ability to satisfy stated
and implied needs. The extent to which errors and other shortcomings of a data set
affect decision making depends on the data and on the purpose for which it is used.


Spatial data quality


Quality is often defined as “fitness for use”. Traditionally, most spatial data were
collected and held by individual, specialized organizations. In recent years, increasing
availability and decreased cost of data capture equipment has resulted in many users
collecting their own data.

However, the collection and maintenance of base data remains the responsibility of the
various governmental agencies, such as National Mapping Agencies (NMAs), which are
responsible for collecting topographic data for the entire country. Supply companies,
local government departments and many others also collect and maintain spatial data
for their own particular purposes.

If data is to be shared among different users, these users need to know not only what
data exists, where and in what format it is held, but also whether the data meets their
particular quality requirements. This “data about data” is known as metadata.

Since the real power of GIS lies in its ability to combine and analyze georeferenced
data from a range of sources, we must pay attention to the issues of data quality and
error, as data from different sources are also likely to contain different kinds of error.

These may include mistakes or variation in the measurement of position and/or
elevation, in the quantitative measurement of attributes, or in the labeling or
classification of features. Some degree of error is present in every spatial data set.

It is important, however, to distinguish between gross errors (blunders or mistakes),
which must be detected and removed before the data is used, and variations in the data
caused by unavoidable measurement and classification errors.

Key components of spatial data quality include:

Positional accuracy (both horizontal and vertical),
Temporal accuracy (how up to date the data is),
Attribute accuracy (e.g. in the labeling of features or of classifications),
Lineage (history of the data, including its source),
Completeness (whether the dataset represents all related features of reality),

Logical consistency (that the data is logically structured),
Applicability (suitability of a particular dataset to meet a purpose),
Precision (degree of closeness of values to each other).

These components play an important role in assessment of data quality for several
reasons:
1. Even when source data, such as official topographic maps have been subject to
stringent quality control, errors are introduced when these data are input to GIS.
2. Unlike a conventional map, which is essentially a single product, a GIS database
normally contains data from different sources of varying quality.
3. Unlike topographic or cadastral databases, natural resource databases contain
data that are inherently uncertain and therefore not suited to conventional
quality control procedures.
4. Most GIS analysis operations will themselves introduce errors.

DATA PRESENTATION/OUTPUT
The end result is best visualized as maps, images, 3D views, reports or graphs. Maps are
efficient for storing and communicating geographic information. GIS provides new and
exciting tools to extend the art and science of map making. Maps can be integrated with
reports, three-dimensional views, photographic images, and other digital media.


Sharing the results of your geographical analysis is one of the primary justifications
for investing resources in GIS. Taking displays created through a GIS and
outputting them into distributable formats is a great way to do this.

The more avenues for output a GIS can offer, the greater the potential for
reaching the right audience with the right information.

MAP PROJECTION
This is a system in which locations on the curved surface of the earth are displayed
on a flat surface. There are three basic types of map projection:

a. Planar projections: also called azimuthal projections. Here the curved surface
of the earth is projected onto a plane. A flat surface is placed in contact with
the globe at a single point, for example a pole, and all points are then
projected onto that flat surface.

b. Conical projections: Here points are projected onto the surface of a cone that
is tangent to the globe along a circle.


c. Cylindrical projections: Here points are projected onto the surface of a cylinder
that is tangent to the globe along a circle.

UNIVERSAL TRANSVERSE MERCATOR (UTM)


This is the most commonly used system in topographic and cadastral mapping, i.e.
mapping specific things such as drainage and settlement. To limit distortion, the
world is divided into 60 zones, each covering 6° of longitude, hence 60 × 6° = 360°
(a full circle). The zones are numbered eastwards beginning at 180° west, and each
zone is divided into strips of 8° latitude.

[Figure: a UTM zone 6° of longitude wide, divided into strips of 8° latitude lettered C, D, E, F, ... upwards]

The strips are lettered upwards from C to X, omitting I and O; the northernmost strip,
X, spans 12°. Coordinates in UTM are expressed in metres as Eastings and Northings.
The central meridian of each zone is given an easting of 500,000 m. In the northern
hemisphere, northings are measured from the equator, which is given the value zero
(0); in the southern hemisphere, the equator is given a value of 10,000,000 metres.
The UTM system only works between 84° N and 80° S. Beyond these latitudes it is not
used, because distortion becomes too great.
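The zone numbering described above reduces to simple arithmetic. The sketch below derives the zone number, the latitude strip letter and the central meridian from a longitude and latitude in decimal degrees. It ignores the special zone exceptions around Norway and Svalbard, the function name is hypothetical, and the Kampala coordinates are approximate.

```python
BAND_LETTERS = "CDEFGHJKLMNPQRSTUVWX"  # latitude strips from 80°S upwards; I and O omitted


def utm_zone(longitude, latitude):
    """Return (zone number, strip letter, central meridian) for a position in degrees."""
    if not -80 <= latitude <= 84:
        raise ValueError("UTM is defined only between 80°S and 84°N")
    shifted = (longitude + 180) % 360  # 180°W becomes 0, so zones count eastwards
    zone = int(shifted // 6) + 1       # 60 zones, each 6° wide
    # Strips are 8° high; the last letter, X, also covers 80°N to 84°N
    band_index = min(int((latitude + 80) // 8), len(BAND_LETTERS) - 1)
    central_meridian = zone * 6 - 183  # middle of the zone, in degrees east
    return zone, BAND_LETTERS[band_index], central_meridian


# Approximate position of Kampala: 32.58°E, 0.32°N.
kampala = utm_zone(32.58, 0.32)
first_zone = utm_zone(-177.0, -79.0)
```

In this sketch Kampala falls in zone 36, strip N, with a central meridian of 33°E; a point at that meridian would have the easting of 500,000 m mentioned above.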


Advantages of UTM
a. It is the most frequently used system in mapping.
b. It is a universal approach to geo-referencing.
c. It is consistent across most parts of the globe.

Disadvantages of UTM

Distortion becomes excessive at the edges (beyond 80° S and 84° N).
