Data Input
• There are different methods to get data into a GIS. These include keyboard
entry, digitizing, scanning and electronic data transfer.
Analogue and Digital Sources
• If details of the hotels in Happy Valley were obtained from a tourist guide, both
spatial data (the locations of the hotels – probably given as postal or zip codes)
and attributes of the hotels (number of rooms, standard and full address) would
be entered at a keyboard. For a small number of hotels keyboard entry is a
manageable task. If there were hundreds of hotels to be coded, an alternative
method would probably be sought.
• Text scanners and optical character recognition (OCR) software can be used to
read in data automatically. Attribute data, once in a digital format, are linked to
the relevant map features in the spatial database using identification codes.
These are unique codes that are allocated to each point, line and area feature in
the data set.
• Flat-bed or desktop scanners are currently found in many offices. They are of
relatively small format so that larger maps must be scanned in several parts and
joined in the computer. The document is placed face down on a glass plate and
the camera and light source move along the document beneath the glass.
• The strength of flat-bed scanners is their low cost and easy set-up and maintenance.
They are useful for scanning text documents, for example data tables, which are
later interpreted using optical character recognition software. They also provide a
means to bring small graphics and maps into a computer.
• They are less suitable for large scale map conversion tasks, where many large
format topographic and thematic maps need to be scanned. Scanning such maps in
sections and joining the pieces later in the computer is time consuming and may
introduce a large number of errors.
Drum scanners
• Drum scanners are more expensive and are used for professional applications that
require very high precision (e.g., photogrammetry or medical applications).
• The map is fixed on a rotating drum. A sensor system then moves along the map and
registers the light intensity or colour of each pixel.
• While drum scanners provide very high precision, they are also very expensive and
fairly slow. A single scan may take from 15 to 20 minutes.
Feed scanners
• Feed scanners are currently the most commonly used scanner type for large-scale
GIS applications.
• In feed scanners the sensor system is static. Instead, the map is moved across a
sensor array.
• Their accuracy is lower than that of drum scanners, since the map feed can be less
precisely controlled than the scanner movement. But their accuracy is usually
sufficient for GIS applications, their cost is lower and they typically produce
images in less than five minutes. A limitation is that older or fragile documents
might be damaged by the feed scanner’s rollers.
Scanning Parameters
• The scanner settings chosen by the operator have a large impact on the output image
characteristics. Choosing the optimal parameters requires a certain amount of
experimentation, since it depends on the scanner options, the characteristics of the
base maps or photos that are scanned and the anticipated further processing steps. The
most important parameters are the following:
– Vectorization: The output from scanned maps is often used to generate vector
data. This process involves either automatic or interactive (user-controlled)
raster to vector conversion. Problems occur with this process due to
topological effects at intersections, cartographic annotation such as contour
heights or road names, generalization of features smaller than the resolution of
the scanner and the subsequent coding of attributes.
• Stream mode
– Points are automatically sampled at a fixed time interval, e.g., once each second.
– It is not appropriate when digitizing point features, because it is usually not
possible to find and locate points at a uniform rate.
– It may be advantageous when large numbers of lines are digitized, because
points may be sampled more quickly and there may be less operator fatigue.
– Sampling rate must be specified with care to avoid over- or undersampled
lines.
• Too rapid a collection frequency results in redundant points not needed
to accurately represent line or polygon shape.
• Too slow a collection frequency in stream mode digitizing may result in
the loss of important spatial detail.
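The effect of the sampling rate in stream mode can be illustrated with a small simulation. This is only a sketch under stated assumptions: `cursor_position` is a hypothetical stand-in for the operator's hand tracing a curved line, and the 12-second duration and the two intervals are invented for illustration. A real digitizing tablet would supply the coordinates.

```python
import math

def cursor_position(t):
    """Hypothetical cursor path: the operator traces a sine-shaped line over time t (seconds)."""
    return (t, math.sin(t))

def stream_digitize(duration, interval):
    """Record the cursor position every `interval` seconds for `duration` seconds."""
    n = int(round(duration / interval))
    return [cursor_position(i * interval) for i in range(n + 1)]

def line_length(points):
    """Length of the polyline through the sampled points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Fast sampling: many points, some of them redundant on nearly straight stretches.
dense = stream_digitize(duration=12.0, interval=0.1)
# Slow sampling: few points, so bends in the line are cut off and detail is lost.
sparse = stream_digitize(duration=12.0, interval=4.0)

print(len(dense), round(line_length(dense), 2))
print(len(sparse), round(line_length(sparse), 2))
```

The sparse line has far fewer vertices and a shorter measured length than the dense one, because the curves between sampled points are replaced by straight segments.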
Advantages and disadvantages of manual digitizing
• Given the difficulties and the time associated with keyboard encoding, manual
digitizing and automatic digitizing, the prospect of using data already in digital
form is appealing. If a digital copy of the data required is available in a form
compatible with your GIS, the input of these data into your GIS is merely a
question of electronic data transfer. However, it is more than likely that the data you
require will be in a different digital format to that recognized by your GIS.
Therefore, the process of digital data transfer often has to be followed by data
conversion. During conversion the data are changed to an appropriate format for
use in your GIS.
• Spatial data may be collected in digital form and transferred from devices such as
GPS receivers, total stations (electronic distance-metering theodolites), and data
loggers attached to all manner of scientific monitoring equipment. All that may be
required is wireless data transfer for a user to download the data to a file on their
computer. In some cases it may be possible to output data from a collection device
in a GIS format.
• Electronic data transfer will also be necessary if data have been purchased from a
data supplier, or obtained from another agency that originally encoded the data.
• Users must address a number of questions if they wish to obtain data in digital form
from another source:
1. What data are available?
– There are few ‘data hypermarkets’ where you can go to browse, select and
purchase spatial data. Instead, you usually have to rummage around in the data
marketplace trying to find what you need at an affordable price.
Advertisements for data in digital format can be found in trade magazines, data
can be obtained from national mapping agencies and a range of data is
available from organizations via the Internet. Several organizations have set up
data ‘clearing houses’ where you can browse for and purchase data online.
• Data editing can be likened to the filter between the fuel tank and the engine that
keeps the fuel clean and the engine running smoothly. Four topics are covered here:
– detection and correction of errors;
– reprojection, transformation and generalization;
– edge matching and rubber sheeting; and
– updating of spatial databases.
Detecting and correcting errors
• Errors in spatial data are often more difficult to identify and correct than
errors in attribute data. These errors take many forms, depending on the
data model being used (vector or raster) and the method of data capture.
Examples of errors that may arise during encoding (especially during
manual digitizing) are presented in the table on the next slide. The figure
on the next slide illustrates some of the errors that may be encountered in
vector data.
Box 5.9 Methods of attribute data checking
• Several methods may be used to check for errors in the encoding of attribute data.
These include:
1. Impossible values. Simple checks for impossible data values can be made when the
range of the data is known. Data values falling outside this range are obviously
incorrect. For example, a negative rainfall measurement is impossible, as is a slope
of 100 degrees.
2. Extreme values. Extreme data values should be cross-checked against the source
document to see if they are correct. An entry in the attribute database that says the
Mountain View Hotel has 2000 rooms needs to be checked. It is more likely that this
hotel has 200 rooms and that the error is the result of a typing mistake.
3. Internal consistency. Checks can be made against summary statistics provided with
source documents where data are derived from statistical tables. Totals and means for
attribute data entered into the GIS should tally with the totals and means reported in
the source document. If a discrepancy is found, then there must be an error somewhere
in the attribute data.
4. Scattergrams. If two or more variables in the attribute data are correlated, then
errors can be identified using scattergrams. The two variables are plotted along the
x and y axes of a graph and values that depart noticeably from the regression line are
investigated. Examples of correlated variables from Happy Valley might be altitude and
temperature, or the category of a hotel and the cost of accommodation.
5. Trend surfaces. Trend surface analyses may be used to highlight points with values
that depart markedly from the general trend of the data. This technique may be useful
where a regional trend is known to exist. For example, in the case of Happy Valley most
ski accidents occur on the nursery slopes and the general trend is for accidents to
decrease as the ski piste becomes more difficult. Therefore, an advanced piste recording
a high number of accidents reflects either an error in the data set or an area requiring
investigation.
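The first three checks in the box (impossible values, extreme values and internal consistency) lend themselves to simple automated tests. The sketch below is illustrative only. The attribute table is hypothetical apart from the Mountain View Hotel entry taken from the text, and the 0 to 90 degree slope range, the "five times the median" threshold and the reported total of 840 rooms are assumptions chosen for the example, not rules from the source.

```python
from statistics import median

# Hypothetical attribute table. Only the Mountain View Hotel entry (2000 rooms)
# comes from the text; the other rows and the slope column are invented.
hotels = [
    {"id": 1, "name": "Mountain View Hotel", "rooms": 2000, "slope_deg": 12},
    {"id": 2, "name": "Hotel A", "rooms": 150, "slope_deg": 8},
    {"id": 3, "name": "Hotel B", "rooms": 180, "slope_deg": 100},
    {"id": 4, "name": "Hotel C", "rooms": 220, "slope_deg": 5},
    {"id": 5, "name": "Hotel D", "rooms": 90, "slope_deg": 15},
]

def impossible_values(records, field, low, high):
    """Return the ids of records whose value lies outside the known valid range."""
    return [r["id"] for r in records if not low <= r[field] <= high]

def extreme_values(records, field, factor=5):
    """Return the ids of records whose value exceeds `factor` times the median.

    The threshold is an assumed rule of thumb; flagged values are candidates
    for cross-checking against the source document, not proven errors.
    """
    typical = median(r[field] for r in records)
    return [r["id"] for r in records if r[field] > factor * typical]

def totals_agree(records, field, reported_total):
    """Check the entered total against the total reported in the source document."""
    return sum(r[field] for r in records) == reported_total

print(impossible_values(hotels, "slope_deg", 0, 90))  # slope of 100 degrees is flagged
print(extreme_values(hotels, "rooms"))                # the 2000-room entry is flagged
print(totals_agree(hotels, "rooms", 840))             # assumed source total of 840 rooms
```

If the 2000-room entry is corrected to 200, the entered total matches the assumed source total and the internal consistency check passes.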
Table 5.4 Common errors in spatial data
• Edge matching
– When a study area extends across two or more map sheets, small differences or
mismatches between adjacent map sheets may need to be resolved. Normally,
each map sheet would be digitized separately and then the adjacent sheets
joined after editing, re-projection, transformation and generalization. The
joining process is known as edge matching.
• Rubber sheeting
– It involves stretching the map in various directions as if it were drawn on a rubber
sheet. Objects on the map that are accurately placed are ‘tacked down’ and kept
still whilst others that are in the wrong location or have the wrong shape are
stretched to fit with the control points. These control points are fixed features that
may be easily identified on the ground and on the image. Their true co-ordinates
may be determined from a map covering the same area or from field observations
using GPS. Distinctive buildings, road or stream intersections, peaks or coastal
headlands may be useful control points.
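One way to picture how control points drive the stretching is a piecewise approach, in which the control points are joined into triangles and each triangle is given its own affine transformation. The sketch below handles a single triangle only and is an assumption-laden illustration, not the method of any particular GIS package. The function names `solve3` and `fit_affine` and all coordinates are hypothetical.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m * x = v using Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        solution.append(det(replaced) / d)
    return solution

def fit_affine(digitized, true):
    """Build a transform from three digitized control points to their true co-ordinates."""
    m = [[x, y, 1.0] for x, y in digitized]
    a, b, c = solve3(m, [tx for tx, _ in true])
    d, e, f = solve3(m, [ty for _, ty in true])
    def transform(x, y):
        return (a * x + b * y + c, d * x + e * y + f)
    return transform

# Hypothetical control points: positions on the digitized map and their
# true co-ordinates (for example from a reference map or GPS observations).
digitized_pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pts = [(100.0, 200.0), (120.0, 200.0), (100.0, 220.0)]

to_true = fit_affine(digitized_pts, true_pts)
print(to_true(5.0, 5.0))  # a feature inside the triangle is moved along with it
```

The control points themselves map exactly onto their true co-ordinates, which corresponds to the 'tacked down' features, while every other location inside the triangle is shifted proportionally.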
Updating and maintaining spatial databases
• A great deal of effort is put into creating spatial databases and so it makes good
sense to keep important data as up to date as possible.
• The world is a very dynamic place and things change, often rapidly and especially
in urban areas where new buildings and roads are being built. This means that
spatial data can go out of date and so need regular updating.
• National mapping agencies such as the Ordnance Survey spend a lot of time and
resources maintaining the currency of their national mapping products so that they
are fit for use by their customers.