Data Input

The document discusses data input methods in GIS, including obtaining data from both analog and digital sources. Analog data like paper maps must be converted to digital formats before use in GIS. There are various methods for inputting data, such as keyboard entry of attribute data, scanning of paper maps, and electronic data transfer of digital data. Scanning parameters like resolution and scanning mode must be chosen appropriately based on the source data and intended use of the scanned raster files.

Uploaded by

kmj
Copyright
© All Rights Reserved

Data Input

• Data encoding is the process of getting data into the computer. It is a
process that is fundamental to almost every GIS project.

• Spatial data can be obtained from many different sources, in different
formats, and can be input to GIS using a number of different methods.

• In GIS, data almost always need to be corrected and manipulated to ensure
that they can be structured according to the required data model.

• There are different methods to get data into a GIS. These include keyboard
entry, digitizing, scanning and electronic data transfer.
Analogue and Digital Sources

• There is a difference between analogue (non-digital) and digital sources of
spatial data.
• Analogue data are normally in paper form. They include paper maps, tables of
statistics and hard-copy (printed) aerial photographs. These data need to be
converted to digital form before use in a GIS, so the data encoding and
correction procedures are longer than those for digital data.
• Digital data are already in computer-readable formats and are supplied on
CD-ROM or across a computer network or the Internet. Map data, aerial
photographs, satellite imagery, data from databases and automatic data
collection devices (such as data loggers and GPS) are all available in digital
form.
• If data were all of the same type, format, scale and resolution, then data
encoding and integration would be simple. However, this task is complicated as
the characteristics of spatial data are as varied as their sources. This variety has
implications for the way data are encoded and manipulated to develop an
integrated GIS database. Much effort has been made in recent years to develop
universal GIS data standards and common data exchange formats.
Methods Of Data Input

• Data in analogue or digital form need to be encoded to be compatible with
the GIS being used. This would be a relatively straightforward exercise if
all GIS packages used the same spatial and attribute data models. However,
there are many different GIS packages and many different approaches to
the handling of spatial and attribute data.
• The following input methods are discussed:
– Keyboard entry
– Scanning
– Manual (heads-down) digitizing and heads-up digitizing
– Electronic data transfer
Keyboard entry
• Keyboard entry, often referred to as key-coding, is the entry of data into a file
at a computer terminal. This technique is used for attribute data that are only
available on paper.

• If details of the hotels in Happy Valley were obtained from a tourist guide, both
spatial data (the locations of the hotels – probably given as postal or zip codes)
and attributes of the hotels (number of rooms, standard and full address) would
be entered at a keyboard. For a small number of hotels keyboard entry is a
manageable task. If there were hundreds of hotels to be coded, an alternative
method would probably be sought.

• Text scanners and optical character recognition (OCR) software can be used to
read in data automatically. Attribute data, once in a digital format, are linked to
the relevant map features in the spatial database using identification codes.
These are unique codes that are allocated to each point, line and area feature in
the data set.

• The co-ordinates of spatial entities can be encoded by keyboard entry, although
this method is used only when co-ordinates are known and there are not too
many of them.
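
The linking of key-coded attribute data to map features through identification codes can be sketched in a few lines of Python. This is only an illustration: the hotel records, the co-ordinates and the function name `link_attributes` are hypothetical and are not taken from any particular GIS package.

```python
# Hypothetical key-coded attribute records, one per hotel (values are illustrative).
attributes = {
    1: {"name": "Mountain View", "rooms": 200},
    2: {"name": "Palace Lodge", "rooms": 45},
}

# Hypothetical point features in the spatial database: (feature ID, x, y).
features = [(1, 4512.0, 2210.5), (2, 4630.2, 2198.0), (3, 4701.9, 2305.4)]


def link_attributes(features, attributes):
    """Attach attribute records to the features that share the same ID code."""
    linked, unmatched = [], []
    for fid, x, y in features:
        record = attributes.get(fid)
        if record is None:
            unmatched.append(fid)  # feature with no attribute record yet
        else:
            linked.append({"id": fid, "x": x, "y": y, **record})
    return linked, unmatched


linked, unmatched = link_attributes(features, attributes)
print(linked)
print("Features without attributes:", unmatched)
```

Reporting the unmatched IDs is a design choice of this sketch: it makes missing or mistyped identification codes visible straight after key-coding.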
Raster Data Capture
• Primary data capture:
– It involves the direct measurement of objects.
– The most popular form of primary raster data capture is remote sensing.
– Sources
• Satellite images
• Aerial photographs

• Secondary data capture:
– Secondary sources are digital and analog datasets that were originally captured for
another purpose and need to be converted into a suitable digital format for use in a
GIS project.
– Typical secondary sources include raster scanned color aerial photographs of urban
areas and United States Geological Survey (USGS) or SOI paper maps that can be
scanned and vectorized.
Raster data capture using scanners
• A scanner is a piece of hardware for converting an analogue source document into
digital raster format.
• All scanners work on the same principle: they sample the source document using
transmitted or reflected light. A scanner has a light source, a background (or
source document), a lens and a charge-coupled device (CCD) sensor. It scans
successive lines across a map or document and records the amount of light
reflected from it. The differences in reflected light are normally scaled into
bilevel black and white (1 bit per pixel) or multiple gray levels (8, 16, or 32 bits).
• The result of the scanning process is a raster image of the original map, which can be
stored in a standard image format such as Graphics Interchange Format (GIF) or TIFF.
After georeferencing the image—this involves specifying the coordinates of an image
corner and the pixel size both in real-world units—it can be displayed in many GIS
packages as a backdrop to existing vector data. Usually, however, geographic features
from the image are extracted either manually or automatically and converted to vector
data.
• There are three different types of scanner in widespread use:
– flat-bed scanners
– rotating drum scanners
– large-format feed scanners
Flat-bed scanners

• Flat-bed or desktop scanners are currently found in many offices. They are of
relatively small format so that larger maps must be scanned in several parts and
joined in the computer. The document is placed face down on a glass plate and
the camera and light source move along the document beneath the glass.

• The strength of flat-bed scanners is their low cost and easy set-up and maintenance.
They are useful for scanning text documents—for example data tables—which are
later interpreted using optical character recognition software. They also provide a
means to bring small graphics and maps into a computer.

• They are less suitable for large scale map conversion tasks, where many large
format topographic and thematic maps need to be scanned. Scanning such maps in
sections and joining the pieces later in the computer is time consuming and may
introduce a large number of errors.
Drum scanners

• Drum scanners are more expensive and are used for professional applications that
require very high precision (e.g., photogrammetry or medical applications).

• The map is fixed on a rotating drum. A sensor system then moves along the map and
registers the light intensity or colour of each pixel.

• While drum scanners provide very high precision, they are also very expensive and
fairly slow. A single scan may take from 15 to 20 minutes.
Feed scanners

• Feed scanners are currently the most commonly used scanner type for large-scale
GIS applications.

• In feed scanners the sensor system is static. Instead, the map is moved across a
sensor array.

• Their accuracy is lower than that of drum scanners, since the map feed can be less
precisely controlled than the scanner movement. But their accuracy is usually
sufficient for GIS applications, their cost is lower and they typically produce
images in less than five minutes. A limitation is that older or fragile documents
might be damaged by the feed scanner’s rollers.
Scanning Parameters
• The scanner settings chosen by the operator have a large impact on the output image
characteristics. Choosing the optimal parameters requires a certain amount of
experimentation, since it depends on the scanner options, the characteristics of the
base maps or photos that are scanned and the anticipated further processing steps. The
most important parameters are the following:

– Scanning mode. Binary or “line art” is appropriate for monochrome drawings or
sketches, as well as for colour separations, where all features are basically of the
same type. Grey-scale mode preserves variation on a map and subsequent image
manipulation can be used to extract only features that have a certain reflectance
value in a graphics or image processing system. This is even easier when the
maps are scanned in colour mode, where, for instance, all features drawn in green
on the map can be extracted using a few simple commands.

– Image resolution is measured in dots per inch (dpi). Common scanning
resolutions are between 100 and 400 dpi (although air photos are usually scanned
at higher resolution on special-purpose scanners). A higher scanning resolution
preserves more details of the original map and results in smoother lines in the
vectorized GIS data set. But the resulting images will be larger and will require
more memory and disk space; doubling the scanning resolution quadruples the
uncompressed image size. The choice depends on the properties of the source
document, available hardware and the intended use of the resulting image.
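
The relationship between scanning resolution, scanning mode and image size can be checked with a short calculation. The sketch below assumes an uncompressed raster and a hypothetical 20 x 30 inch map sheet; real file sizes depend on the file format and any compression used.

```python
def scan_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed raster size in bytes for a scanned sheet."""
    cols = round(width_in * dpi)   # pixels across the sheet
    rows = round(height_in * dpi)  # pixels down the sheet
    return cols * rows * bits_per_pixel // 8


# Hypothetical 20 x 30 inch map sheet scanned in 8-bit grey-scale mode.
for dpi in (100, 200, 400):
    size = scan_size_bytes(20, 30, dpi, 8)
    print(f"{dpi} dpi: {size / 1e6:.0f} MB")
```

Each doubling of the resolution doubles both the number of columns and the number of rows, which is why the size grows by a factor of four.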
Scanning Parameters

– Vectorization: The output from scanned maps is often used to generate vector
data. This process involves either automatic or interactive (user-controlled)
raster to vector conversion. Problems occur with this process due to
topographical effects at intersections, cartographic annotation such as contour
heights or road names, generalization of features smaller than the resolution of
the scanner and the subsequent coding of attributes.

– Georeferencing: Ultimately the output from a scanner needs to be correctly
referenced according to the co-ordinate system used in the GIS. Normally this
process is controlled using linear transformation from the row and column
number of the scanned image to the chosen geographic co-ordinate system.
• Practical problems faced when scanning source documents include:
– the possibility of optical distortion when using flat-bed scanners;
– the automatic scanning of unwanted information (for example, hand-drawn
annotations, folds in maps or coffee stains);
– the selection of appropriate scanning tolerances to ensure important data are
encoded, and background data ignored;
– the format of files produced and the input of data to GIS software; and
– the amount of editing required to produce data suitable for analysis.
• Scanners work best when very clean map materials are available. Even the
most expensive scanners may report a significant number of spurious lines
or points when old, marked, folded, or wrinkled maps are used. These
spurious features must be subsequently removed via manual editing, thus
negating the speed advantage of scanning over manual digitizing.
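
The georeferencing step mentioned earlier, a linear transformation from row and column numbers to map co-ordinates, can be sketched as follows. The sketch assumes the simplest case: a north-up image with square pixels and no rotation, referenced by the co-ordinates of its upper-left corner and the pixel size. The function name and the numbers are hypothetical.

```python
def pixel_to_map(row, col, x_origin, y_origin, pixel_size):
    """Map co-ordinates of the centre of pixel (row, col).

    (x_origin, y_origin) is the outer upper-left corner of the image in
    real-world units. Rows are counted downward, so y decreases as row grows.
    """
    x = x_origin + (col + 0.5) * pixel_size
    y = y_origin - (row + 0.5) * pixel_size
    return x, y


# Hypothetical image: upper-left corner at (500000, 4200000), 2 m pixels.
print(pixel_to_map(0, 0, 500000.0, 4200000.0, 2.0))    # (500001.0, 4199999.0)
print(pixel_to_map(10, 20, 500000.0, 4200000.0, 2.0))  # (500041.0, 4199979.0)
```

A scanned map that is rotated or distorted needs more parameters than this, usually estimated from several control points.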
Vector data capture

• Primary Data Source


– Land surveying
– GPS
• Secondary Data Source
– manual digitizing or heads-down digitizing
– heads-up digitizing and vectorization
Manual Digitizing/ Heads-down Digitizing
• Manually operated digitizers are the simplest, cheapest, and most commonly
used means of capturing vector objects from hardcopy maps.
• It requires a digitizing table that is linked to a computer workstation.
• The digitizing table is essentially a large flat table, which may range in size from
small tables of 30 x 30 cm to large tables of 120 x 180 cm. Larger
digitizing tables facilitate the digitization of larger map sheets.
• The surface of the table is underlain by a very fine mesh of wires. This grid
creates an electromagnetic field. The cursor contains a metal coil so that the
digitizing board and cursor act as a transmitter and receiver. This allows the
cursor to determine the nearest wires in the x and y direction.
• A cursor, which can be moved freely over the surface of the table, is attached
to the digitizer via a cable. Buttons on the cursor allow the user to send
instructions to the computer. The position of the cursor on the table is registered
by reference to its position above the wire mesh.
• Digitizers operate on the principle that it is possible to detect the location of a
cursor or puck passed over a table inlaid with a fine mesh of wires.
• Digitizing table accuracies typically range from 0.0004 inch (0.01 mm) to 0.01
inch (0.25 mm).
Using a manual digitizing table
• Before starting to digitize information from a paper map, care needs to be taken to
ensure that the source document is in good condition and free from physical defects
that could affect the digitizing process. Creases and folds can prevent the map from
lying flat on the digitizer, while coffee stains, handwritten annotations etc can
obscure detail. The procedure followed when digitizing a paper map using a manual
digitizer has five stages:
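1 Registration. The map to be digitized is fixed firmly to the table top with sticky tape.
Control points with known map co-ordinates are then digitized so that positions on
the table can be converted to map co-ordinates. Once the map has been registered
the user may begin digitizing the desired features from the map.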
2 Digitizing point features. Point features are recorded as a single digitized point. A
unique code number or identifier is added so that attribute information may be
attached later. For instance, the hotel with ID number ‘1’ would later be identified
as ‘Mountain View’.
3 Digitizing line features. Line features are digitized as a series of points that the
software will join with straight line segments. In some GIS packages lines are
referred to as arcs, and their start and end points as nodes. As with point
features, a unique code number or identifier is added to each line during the
digitizing process and attribute data are attached using this code. For example,
when digitizing a road network, data describing road category, number of
carriageways, surface type, date of construction and last resurfacing might be
added.
4 Digitizing area (polygon) features. Area features are digitized as a series of
points linked together by line segments in the same way as line features. Here it
is important that the start and end points join to form a complete area.
5 Adding attribute information. Attribute data may be added to digitized polygon
features by linking them to a centroid (or seed point) in each polygon. These are
either digitized manually (after digitizing the polygon boundaries) or created
automatically once the polygons have been encoded. Using a unique identifier
or code number, attribute data can then be linked to the centroids of appropriate
polygons. In this way, a forest stand may have data relating to tree species,
tree ages, tree numbers and timber volume attached to a point within the
polygon.
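
Stages 4 and 5 above can be illustrated with a small sketch that closes a digitized polygon when its start and end points nearly meet, then computes a centroid to which attributes could be attached. The function names, the snapping tolerance and the sample co-ordinates are assumptions made for the example, not the behaviour of any specific GIS package.

```python
def close_ring(points, tolerance):
    """Return a closed ring, snapping the last point to the first if it is
    within `tolerance`; raise an error if the gap is larger."""
    (x0, y0), (xn, yn) = points[0], points[-1]
    gap = ((xn - x0) ** 2 + (yn - y0) ** 2) ** 0.5
    if gap == 0:
        return list(points)
    if gap <= tolerance:
        return list(points[:-1]) + [points[0]]
    raise ValueError(f"polygon not closed: gap of {gap:.3f} exceeds tolerance")


def centroid(ring):
    """Area-weighted centroid of a closed, non-self-intersecting ring."""
    area2 = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        cross = x1 * y2 - x2 * y1
        area2 += cross               # twice the signed area
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical digitized square whose end point slightly misses the start point.
digitized = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.05, 0.02)]
ring = close_ring(digitized, tolerance=0.1)
print(centroid(ring))  # (5.0, 5.0)
```

For strongly concave polygons the centroid can fall outside the polygon, which is one reason a manually digitized seed point is sometimes preferred.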
Modes of manual digitizer
• Point mode
– The operator must depress a button to sample each point.
– The person digitizing decides where to place each point so as to represent
the line as accurately as possible within the accepted tolerances of the
digitizer. Points are placed closer together where the line is most complex
and where the line changes direction. Points are placed further apart where
the line is less complex or made up of straight line segments.

• Stream mode
– Points are automatically sampled at a fixed time interval, e.g., once each second.
– It is not appropriate when digitizing point features, because it is usually not
possible to find and locate points at a uniform rate.
– It may be advantageous when large numbers of lines are digitized, because
points may be sampled more quickly and there may be less operator fatigue.
– Sampling rate must be specified with care to avoid over- or undersampled
lines.
• Too rapid a collection frequency results in redundant points not needed
to accurately represent line or polygon shape.
• Too slow a collection frequency in stream mode digitizing may result in
the loss of important spatial detail.
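
The effect of the sampling rate in stream mode can be seen by thinning a recorded cursor trace at different intervals. This is a simulation under stated assumptions: the trace is a hypothetical curve recorded every 100 ms, times are kept as integer milliseconds, and `stream_sample` is an illustrative name rather than a digitizer function.

```python
import math


def stream_sample(trace, interval_ms):
    """Keep one point from a timestamped cursor trace every `interval_ms`.

    `trace` is a list of (time_ms, x, y) tuples ordered by time.
    """
    kept = []
    next_t = trace[0][0]
    for t, x, y in trace:
        if t >= next_t:
            kept.append((x, y))
            next_t = t + interval_ms
    return kept


# Hypothetical cursor movement along a curved line, recorded every 100 ms for 10 s.
trace = [(i * 100, i / 10, math.sin(i / 10)) for i in range(101)]

for interval in (100, 1000, 5000):
    points = stream_sample(trace, interval)
    print(f"interval {interval} ms: {len(points)} points")
```

With the shortest interval most points are redundant, while the longest interval leaves only three points to describe the whole curve.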
Advantages and disadvantages of manual digitizing

• The advantages of digitizing include the following:


– Digitizing is easy to learn and thus does not require expensive skilled
labour.
– Attribute information can be added during the digitizing process;
– High accuracy can be achieved through manual digitizing; that is, there
is usually no loss of accuracy compared to the source map.
• The disadvantages are as follows:
– Digitizing is tedious, possibly leading to operator fatigue and resulting
quality problems that may require considerable post-processing;
– Manual digitizing is quite slow. Large-scale data conversion projects
may thus require a large number of operators and digitizing tables;
– In contrast to primary data collection using GPS or aerial photography,
the accuracy of digitized maps is limited by the quality of the source
material.
Heads-up digitizing and vectorization

• One of the main reasons for scanning maps is as a prelude to vectorization,
the process of converting raster data into vector data. The reverse is called
rasterization.
• The simplest way to create vectors from raster layers is to digitize vector
objects manually straight off a computer screen using a mouse or digitizing
cursor. This method is called heads-up digitizing because the map is
vertical and can be viewed without bending the head down.
• It is widely used for the selective capture of, for example, land parcels,
buildings, and utility assets.
Electronic data transfer

• Given the difficulties and the time associated with keyboard encoding, manual
digitizing and automatic digitizing, the prospect of using data already in digital
form is appealing. If a digital copy of the data required is available in a form
compatible with your GIS, the input of these data into your GIS is merely a
question of electronic data transfer. However, it is more than likely that the data you
require will be in a different digital format to that recognized by your GIS.
Therefore, the process of digital data transfer often has to be followed by data
conversion. During conversion the data are changed to an appropriate format for
use in your GIS.
• Spatial data may be collected in digital form and transferred from devices such as
GPS receivers, total stations (electronic distance-metering theodolites), and data
loggers attached to all manner of scientific monitoring equipment. All that may be
required is wireless data transfer for a user to download the data to a file on their
computer. In some cases it may be possible to output data from a collection device
in a GIS format.
• Electronic data transfer will also be necessary if data have been purchased from a
data supplier, or obtained from another agency that originally encoded the data.
• Users must address a number of questions if they wish to obtain data in digital form
from another source:
1. What data are available?
– There are few ‘data hypermarkets’ where you can go to browse, select and
purchase spatial data. Instead, you usually have to rummage around in the data
marketplace trying to find what you need at an affordable price.
Advertisements for data in digital format can be found in trade magazines, data
can be obtained from national mapping agencies and a range of data is
available from organizations via the Internet. Several organizations have set up
data ‘clearing houses’ where you can browse for and purchase data online.

2. What will the data cost?


– Data are very difficult to price, so whilst some digital data are expensive,
others are freely available over the Internet. The pricing policy varies
depending on the agency that collected the data in the first place, and this in
turn may be affected by national legislation. Because of the possibility of
updating digital data sets, many are bought on an annual licensing agreement,
entitling the purchaser to new versions and corrections. The cost of digital data
may be an inhibiting factor for some GIS applications.
3. On what media will the data be supplied?
– Data may be available in a number of ways. These range from optical disks
(CD-ROM) to network transfers across internal local area networks (LAN) or
the Internet. Even these methods of data input are not without their problems:
networks are often subject to faults or interruptions, which may lead to data
being lost or corrupted.
4. What format will the data be in – will standards be adhered to?
– In recent years, a great deal of effort has been put into the development of
national and international data standards to ensure data quality and to improve
compatibility and transfer of data between systems. The GIS software vendors
have developed their own standards. As a result, software is becoming
increasingly compatible and there is a plethora of standards, developed by
vendors, users and national and international geographic information agencies.
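
The data conversion that often follows electronic data transfer can be as simple as rewriting a delimited text file in a format the GIS recognizes. The sketch below is one hedged illustration using only the Python standard library: it assumes a hypothetical CSV export of point locations (the column names and values are invented) and converts it to GeoJSON, chosen here only as an example of a common exchange format.

```python
import csv
import io
import json

# Hypothetical CSV export from a GPS receiver or data supplier.
csv_text = """id,lat,lon,name
1,46.02,7.75,Mountain View
2,46.03,7.76,Palace Lodge
"""


def csv_points_to_geojson(text):
    """Convert CSV rows with lat/lon columns to a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        features.append({
            "type": "Feature",
            # GeoJSON stores positions as [longitude, latitude].
            "geometry": {"type": "Point",
                         "coordinates": [float(row["lon"]), float(row["lat"])]},
            "properties": {"id": int(row["id"]), "name": row["name"]},
        })
    return {"type": "FeatureCollection", "features": features}


collection = csv_points_to_geojson(csv_text)
print(json.dumps(collection, indent=2))
```

Even in this small case the conversion has to get the co-ordinate order and data types right, which is where errors can creep in during transfer.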
DATA EDITING
• Data encoding may not deliver an error-free data set to the GIS. Data may include errors
derived from the original source data, as well as errors that have been introduced
during the encoding process. There may be errors in co-ordinate data as well as
inaccuracies and uncertainty in attribute data. Good practice in GIS involves
continuous management of data quality, and it is normal at this stage in the data
stream to make special provision for the identification and correction of errors. It is
better to intercept errors before they contaminate the GIS database and go on to
infect (propagate) the higher levels of information that are generated. The process is
known as data editing or ‘cleaning’.

• Data editing can be likened to the filter between the fuel tank and the engine that
keeps the fuel clean and the engine running smoothly. Four topics are covered here:
– detection and correction of errors;
– reprojection, transformation and generalization;
– Edge matching and rubber sheeting; and
– updating of spatial databases.
Detecting and correcting errors

• Errors in input data may derive from three main sources:


– Errors in the source data
• Errors in source data may be difficult to identify. For example, there
may be subtle errors in a paper map source used for digitizing because
of the methods used by particular surveyors, or there may be printing
errors in paper based records used as source data.
– Errors introduced during encoding
• During encoding a range of errors can be introduced. During keyboard
encoding it is easy for an operator to make a typing mistake; during
digitizing an operator may encode the wrong line; and folds and stains
can easily be scanned and mistaken for real geographical features.
– Errors propagated during data transfer and conversion
• During data transfer, conversion of data between different formats
required by different packages may lead to a loss of data.
• Errors in attribute data are relatively easy to spot and may be identified
using manual comparison with the original data. Various methods, in
addition to manual comparison, exist for the correction of attribute errors.
These are described in the next slide.

• Errors in spatial data are often more difficult to identify and correct than
errors in attribute data. These errors take many forms, depending on the
data model being used (vector or raster) and the method of data capture.
Examples of errors that may arise during encoding (especially during
manual digitizing) are presented in Table 5.4. Figure 5.13
illustrates some of the errors that may be encountered in vector data.

BOX 5.9 Methods of attribute data checking
(Chapter 5, Data input and editing, p. 156)

Several methods may be used to check for errors in the encoding of attribute
data. These include:

1 Impossible values. Simple checks for impossible data values can be made when
the range of the data is known. Data values falling outside this range are
obviously incorrect. For example, a negative rainfall measurement is impossible,
as is a slope of 100 degrees.

2 Extreme values. Extreme data values should be cross-checked against the
source document to see if they are correct. An entry in the attribute database
that says the Mountain View Hotel has 2000 rooms needs to be checked. It is
more likely that this hotel has 200 rooms and that the error is the result of a
typing mistake.

3 Internal consistency. Checks can be made against summary statistics provided
with source documents where data are derived from statistical tables. Totals
and means for attribute data entered into the GIS should tally with the totals
and means reported in the source document. If a discrepancy is found, then
there must be an error somewhere in the attribute data.

4 Scattergrams. If two or more variables in the attribute data are correlated,
then errors can be identified using scattergrams. The two variables are plotted
along the x and y axes of a graph and values that depart noticeably from the
regression line are investigated. Examples of correlated variables from Happy
Valley might be altitude and temperature, or the category of a hotel and the
cost of accommodation.

5 Trend surfaces. Trend surface analyses may be used to highlight points with
values that depart markedly from the general trend of the data. This technique
may be useful where a regional trend is known to exist. For example, in the
case of Happy Valley most ski accidents occur on the nursery slopes and the
general trend is for accidents to decrease as the ski piste becomes more
difficult. Therefore, an advanced piste recording a high number of accidents
reflects either an error in the data set or an area requiring investigation.
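
The first three checks in Box 5.9 lend themselves to simple automation. The sketch below is a minimal illustration, assuming hypothetical hotel records, a hypothetical plausible range for room counts and a hypothetical total taken from the source document; the function name `check_attributes` is invented for the example.

```python
def check_attributes(records, field, valid_range, expected_total=None):
    """Flag values outside a plausible range and compare the field total
    with the total reported in the source document."""
    low, high = valid_range
    problems = []
    for rec in records:
        value = rec[field]
        if not low <= value <= high:
            problems.append((rec["id"], f"{field}={value} outside {low}..{high}"))
    if expected_total is not None:
        total = sum(rec[field] for rec in records)
        if total != expected_total:
            problems.append(
                (None, f"total {field} is {total}, source reports {expected_total}"))
    return problems


# Hypothetical key-coded records; hotel 3 contains a typing mistake (2000 for 200).
hotels = [
    {"id": 1, "rooms": 200},
    {"id": 2, "rooms": 45},
    {"id": 3, "rooms": 2000},
]

for problem in check_attributes(hotels, "rooms", (1, 500), expected_total=445):
    print(problem)
```

A flagged value is only a candidate error. As the box notes, it still has to be cross-checked against the source document.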

TABLE 5.4 Common errors in spatial data

Error                     Description
Missing entities          Missing points, lines or boundary segments
Duplicate entities        Points, lines or boundary segments that have been digitized twice
Mislocated entities       Points, lines or boundary segments digitized in the wrong place
Missing labels            Unidentified polygons
Duplicate labels          Two or more identification labels for the same polygon
Artefacts of digitizing   Undershoots, overshoots, wrongly placed nodes, loops and spikes

Figure 5.13 Examples of spatial error in vector data (labels shown: unclosed
polygon, duplicate line segments, pseudonode, unlabelled polygon, overshoot,
loop or knot, spike, duplicate label points, dangling node)
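
Some of the digitizing artefacts listed above can be found automatically. As a hedged sketch, the function below reports dangling nodes, that is, line end points that are not shared with any other line end, which are candidate undershoots or overshoots. It assumes exact co-ordinate matches between connected lines, whereas GIS software normally applies a snapping tolerance; the line data are hypothetical.

```python
from collections import Counter


def dangling_nodes(lines):
    """Return end points that occur only once among all line end points."""
    counts = Counter()
    for line in lines:
        counts[line[0]] += 1    # start node
        counts[line[-1]] += 1   # end node
    return sorted(pt for pt, n in counts.items() if n == 1)


# Hypothetical digitized network: a closed triangle plus one line that
# stops short of (or runs past) the feature it was meant to meet.
lines = [
    [(0, 0), (5, 0)],
    [(5, 0), (5, 5)],
    [(5, 5), (0, 0)],
    [(5, 0), (8, 0.2)],
]

print(dangling_nodes(lines))  # [(8, 0.2)]
```

Not every dangling node is an error, since a dead-end road legitimately ends in one, so the output is a list of places to inspect rather than a list of mistakes.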
Re-projection, transformation and generalization
• Once spatial and attribute data have been encoded and edited, it may be necessary to
process the data geometrically in order to provide a common framework of reference. The
scale and resolution of the source data are also important and need to be taken into
account when combining data from a range of sources into a final integrated database.
• Re-projection
– Data derived from maps drawn on different projections will need to be converted to a
common projection system before they can be combined or analysed. If not re-
projected, data derived from a source map drawn using one projection will not plot in
the same location as data derived from another source map using a different
projection system.
• Transformation
– Data derived from different sources may also be referenced using different co-
ordinate systems. The grid systems used may have different origins, different units of
measurement or different orientation. If so, it will be necessary to transform the co-
ordinates of each of the input data sets onto a common grid system.
• Generalization
– Data may be derived from maps of different scales. The accuracy of the output from a
GIS analysis can only be as good as the worst input data. Thus, if source maps of
widely differing scales are to be used together, data derived from larger-scale
mapping should be generalized to be comparable with the data derived from smaller-
scale maps.
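
The transformation between grid systems with different origins, units and orientations can be sketched as a scale, rotation and shift applied to every co-ordinate pair. This is a planar illustration only, with hypothetical parameter values. Converting between map projections is a different and more involved calculation that is normally left to GIS software.

```python
import math


def transform(points, scale, rotation_deg, dx, dy):
    """Apply a scale, an anticlockwise rotation and a shift to (x, y) points."""
    theta = math.radians(rotation_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        (scale * (x * cos_t - y * sin_t) + dx,
         scale * (x * sin_t + y * cos_t) + dy)
        for x, y in points
    ]


# Hypothetical local grid in feet, converted to a grid in metres whose origin
# is offset by (1000, 2000) m and whose axes are not rotated.
local_points = [(100.0, 200.0), (150.0, 260.0)]
print(transform(local_points, scale=0.3048, rotation_deg=0.0, dx=1000.0, dy=2000.0))
```

In practice the scale, rotation and shift are usually estimated from control points whose co-ordinates are known in both systems.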
Edge matching and rubber sheeting

• Edge matching
– When a study area extends across two or more map sheets, small differences or
mismatches between adjacent map sheets may need to be resolved. Normally,
each map sheet would be digitized separately and then the adjacent sheets
joined after editing, re-projection, transformation and generalization. The
joining process is known as edge matching.
• Rubber sheeting
– It involves stretching the map in various directions as if it were drawn on a rubber
sheet. Objects on the map that are accurately placed are ‘tacked down’ and kept
still whilst others that are in the wrong location or have the wrong shape are
stretched to fit with the control points. These control points are fixed features that
may be easily identified on the ground and on the image. Their true co-ordinates
may be determined from a map covering the same area or from field observations
using GPS. Distinctive buildings, road or stream intersections, peaks or coastal
headlands may be useful control points.
Updating and maintaining spatial databases

• A great deal of effort is put into creating spatial databases and so it makes good
sense to keep important data as up to date as possible.

• The world is a very dynamic place and things change, often rapidly and especially
in urban areas where new buildings and roads are being built. This means that
spatial data can go out of date and so need regular updating.

• National mapping agencies such as the Ordnance Survey spend a lot of time and
resources maintaining the currency of their national mapping products so that they
are fit for use by their customers.
