Lecture 6 - GIS Functions - Part 2
Systems
Lecture 6
GIS Basic Functions - Part II
Data Management & Exploration
Prepared by
Dr. Naglaa Fathy
Image source: Westfield State University
Agenda
• GIS Data Management
✓Vector Data
✓Raster Data
• GIS Data Quality
• GIS Data Exploration
GIS Data Management
• A geographic information system (GIS) involves both spatial and
attribute data:
➢ Spatial data relate to the geometries (locations) of spatial features, and
➢ Attribute data describe the characteristics of the spatial features.
GIS Data Management - Vector file formats
• The georelational data model
• Stores spatial data and attribute data separately and links the two by the
feature ID
• The two data sets are synchronized so that they can be queried, analyzed, and
displayed in unison.
• It is available in two file formats: shapefiles and coverages.
• The object-based data model (e.g., geodatabase)
• Combines both geometries and attributes in a single system.
• Each spatial feature has a unique object ID and an attribute to store its
geometry.
• Although the two data models handle the storage of spatial data
differently, both operate in the same relational database environment.
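The linking step in the georelational model can be sketched in a few lines. This is a minimal illustration, not how any particular GIS package stores its data: the two tables, their contents, and the function name link_by_feature_id are all hypothetical.

```python
# Hypothetical data: geometry and attributes are kept in separate tables,
# as in the georelational model, and share only a feature ID.
geometries = {
    1: [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
    2: [(2.0, 2.0), (3.0, 2.0)],
}
attributes = {
    1: {"name": "Parcel A", "landuse": "residential"},
    2: {"name": "Road B", "landuse": "transport"},
}


def link_by_feature_id(geoms, attrs):
    """Join the two tables on feature ID so they can be queried together."""
    return {
        fid: {"geometry": geom, **attrs[fid]}
        for fid, geom in geoms.items()
        if fid in attrs
    }


features = link_by_feature_id(geometries, attributes)
print(features[1]["landuse"], len(features[1]["geometry"]))
```

In an object-based model the geometry would simply be one more field of each record, so no separate join step would be needed.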
GIS Data Management - Vector file format (Shape)
• Despite being called a “shapefile,” this format is actually a compilation
of many different files.
• One shapefile must have at least 3 files, but most shapefiles have
around 6 files. A shapefile must have:
➢ .shp – this file stores the geometry of the feature
➢ .shx – this file stores the index of the geometry
➢ .dbf – this file stores the attribute information for the feature
• All files for the shapefile must be stored in the same location with the
same name or else the shapefile will not load.
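The "same location, same name" rule can be checked before loading a shapefile. The sketch below uses only the Python standard library; the function name missing_shapefile_parts is hypothetical, and it assumes lowercase file extensions.

```python
from pathlib import Path

# The three files every shapefile must have.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that are absent next to shp_path.

    The check looks in the same folder for files with the same base name,
    which is where a shapefile's parts are expected to be.
    """
    base = Path(shp_path)
    return [
        ext for ext in REQUIRED_EXTENSIONS
        if not base.with_suffix(ext).exists()
    ]
```

An empty list means all three required parts are present. A result such as [".shx"] means the index file is missing or has a different name.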
GIS Data Management - Vector file format (Coverage)
• The ArcInfo coverage is the earliest vector file format used in GIS software
packages, and it is still in use today.
• This georelational file format supports multiple feature types (e.g.,
points, lines, polygons, annotations) while also storing the topological
information associated with those features.
➢ Information for Arc-Node topology, Polygon-Arc topology, and Left-Right
topology is stored in coverage files.
• Attribute data are stored as multiple files in a separate directory
labeled “Info”.
GIS Data Management - Raster file format
• The raster data model presents a different scenario in terms of data
management.
• The cell value corresponds to the value of a continuous or discrete variable
at the cell location. The Value Attribute Table (VAT) summarizes the distinct
cell values and their frequencies rather than listing a value for every cell.
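To make the idea of a VAT concrete, the sketch below builds one from a small raster held as a list of rows. The grid, its class codes, and the function name value_attribute_table are hypothetical; real GIS packages build the table for you.

```python
from collections import Counter


def value_attribute_table(grid):
    """Summarize a raster as sorted (cell value, cell count) rows."""
    counts = Counter(value for row in grid for value in row)
    return sorted(counts.items())


# Hypothetical 3 x 4 land-cover raster (1 = water, 2 = forest, 3 = urban).
landcover = [
    [1, 1, 2, 2],
    [1, 2, 2, 3],
    [2, 2, 3, 3],
]

for value, count in value_attribute_table(landcover):
    print(value, count)
```

The table has three rows for twelve cells, which is why a VAT stays small even for very large rasters with few distinct values.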
• Native JPEG, TIFF, and PNG files do not have georeferencing information
associated with them, and therefore cannot be placed on a map in a GIS
as they are.
• In order to employ these files in a GIS, a world file must first be created.
• A world file is a separate, plaintext data file that specifies the locations and
transformations that allow the image to be projected into a standard coordinate
system.
• Other raster file formats, which do carry explicit georeferencing
information, include the MrSID (Multiresolution Seamless Image Database) format
and the ECW (Enhanced Compression Wavelet) format.
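As an illustration of what a world file contains, the sketch below reads the six numbers of a world file and converts a pixel position to map coordinates. The six lines are, in order, A (x pixel size), D and B (rotation terms), E (y pixel size, normally negative), and C and F (x and y of the center of the upper-left pixel). The sample values and function names are hypothetical.

```python
def parse_world_file(text):
    """Read the six coefficients of a world file, in file order: A, D, B, E, C, F."""
    a, d, b, e, c, f = (float(line) for line in text.split())
    return a, d, b, e, c, f


def pixel_to_map(col, row, coeffs):
    """Map a pixel (column, row) to the map coordinates of that pixel's center."""
    a, d, b, e, c, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical world file: 10 m cells, no rotation, upper-left pixel center
# at (500005, 4199995). E is negative because row numbers increase downward.
sample = "10.0\n0.0\n0.0\n-10.0\n500005.0\n4199995.0\n"
coeffs = parse_world_file(sample)
print(pixel_to_map(0, 0, coeffs))
print(pixel_to_map(3, 2, coeffs))
```

Note that a world file does not name the coordinate system itself; it only gives the transformation from pixel positions to coordinates.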
GIS Data Management - Hybrid file formats
GIS Data Quality
• Data quality refers to the ability of a given dataset to satisfy the
objective for which it was created. It can be characterized by its accuracy.
• Accuracy describes how close a measurement is to its actual value and is
often expressed as a probability.
➢ (e.g., 80 percent of all points are within +/− 5 meters of their true locations)
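A statement such as "80 percent of all points are within 5 meters" can be computed directly when true locations are known. The sketch below assumes planar coordinates in meters; the point lists and the function name share_within_tolerance are hypothetical.

```python
import math


def share_within_tolerance(measured, true, tolerance):
    """Fraction of measured points lying within `tolerance` of their true location."""
    hits = sum(
        1
        for (mx, my), (tx, ty) in zip(measured, true)
        if math.hypot(mx - tx, my - ty) <= tolerance
    )
    return hits / len(measured)


# Hypothetical coordinates in meters.
true_pts = [(0, 0), (10, 10), (20, 20), (30, 30), (40, 40)]
measured_pts = [(3, 4), (10, 12), (26, 28), (30, 29), (41, 40)]

print(share_within_tolerance(measured_pts, true_pts, 5.0))  # 0.8
```

Four of the five points are within 5 meters (the third is 10 meters off), which gives the 80 percent figure.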
GIS Data Quality - Positional accuracy
• Positional accuracy of a digitized map is evaluated by calculating the
root mean square (RMS) error.
➢ This statistic measures the deviation between the actual (true) and estimated
(digitized) locations of the control points.
• For example
• This figure illustrates the inaccuracies of lines representing soil types.
• By applying an RMS error calculation to the dataset, one could determine the
accuracy of the digitized map and thus determine its suitability for inclusion in
a given study.
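The RMS error calculation itself is short. The sketch below takes the square root of the mean squared distance between true and digitized control points. The control point coordinates and the function name rms_error are hypothetical, and map units are assumed to be meters.

```python
import math


def rms_error(true_pts, digitized_pts):
    """Root mean square of the distances between paired control points."""
    squared = [
        (tx - dx) ** 2 + (ty - dy) ** 2
        for (tx, ty), (dx, dy) in zip(true_pts, digitized_pts)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical control points: two are exact, two are off by 5 m.
control_true = [(0, 0), (100, 0), (100, 100), (0, 100)]
control_digitized = [(3, 4), (100, 0), (97, 104), (0, 100)]

print(round(rms_error(control_true, control_digitized), 3))
```

Whether the resulting value is acceptable depends on the study: a threshold has to be chosen by the analyst, it does not come from the formula.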
• Positional errors can also arise when features to be mapped are
inherently vague.
• Take the example of a wetland: what defines a wetland boundary?
Data quality - Attribute & Temporal
• Attribute accuracy
➢ Attribute errors can occur when an incorrect value is recorded within the
attribute field or when a field is missing a value.
• Temporal accuracy
➢ It addresses the age or timeliness of a dataset. No dataset is ever
completely current.
➢ There are several dates to be aware of when using a dataset (publication
date, collection date, etc.).
➢ To address temporal accuracy, many datasets undergo a regular data
update regime.
Data quality - Logical & Completeness
• Logical consistency
➢ It requires that the data are topologically correct.
➢ For example, do roadways connect at nodes? Do all the connections and
flows point in the correct direction in a network?
• Data completeness
• All the data must be present for a dataset to be accurate.
• For example, are all of the counties in the state represented? Are all of the
stream segments included in the river network?
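Both kinds of check can be automated in a simple form. The sketch below covers only part of each idea: it flags network edges whose end nodes do not exist (it does not check flow direction), and it lists expected features that are absent. All names and data are hypothetical.

```python
def edges_with_missing_nodes(nodes, edges):
    """Logical consistency: edges that refer to a node not in the node list."""
    node_ids = set(nodes)
    return [(a, b) for a, b in edges if a not in node_ids or b not in node_ids]


def missing_features(expected, present):
    """Completeness: expected features that are absent from the dataset."""
    return sorted(set(expected) - set(present))


# Hypothetical road network and county list.
road_nodes = ["n1", "n2", "n3"]
road_edges = [("n1", "n2"), ("n2", "n3"), ("n3", "n9")]
expected_counties = ["Adams", "Baker", "Clark"]
loaded_counties = ["Adams", "Clark"]

print(edges_with_missing_nodes(road_nodes, road_edges))
print(missing_features(expected_counties, loaded_counties))
```

Here the edge ending at "n9" does not connect to a known node, and "Baker" is missing from the loaded counties.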
GIS data exploration
GIS data exploration - Statistics
GIS data exploration - Graphs
Data exploration - Spatial query
Combining attribute and spatial queries:
• In many cases data exploration requires both types of queries:
• Example combined queries:
Find gas stations that are within 1 mile of a freeway exit in southern
California and have an annual revenue exceeding $2 million each.
➢ Given the layers of gas stations and freeway exits, there are at least two ways
to answer the question. (next slides)
Combining attribute and spatial queries - Solution 1
Use freeway exits as source (selecting) features and gas stations as target
features in a spatial query to find gas stations that are within a distance of 1
mile of a freeway exit.
We can then use an attribute query to find those gas stations that
have annual revenues exceeding $2 million.
(A join query is performed because the attributes of selected gas
stations are joined to the attribute table of freeway exits.)
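The combined query can be sketched with plain Python lists. This is an illustration under stated assumptions, not the GIS software's own query tools: coordinates are assumed to be planar and in meters, and the station records, exit locations, and function names are hypothetical. The second function runs the two queries in the reverse order, which is assumed here to be the alternative approach.

```python
import math

MILE_IN_METERS = 1609.344

# Hypothetical layers: gas stations with a revenue attribute, and freeway exits.
stations = [
    {"id": "S1", "xy": (0.0, 0.0), "revenue": 2_500_000},
    {"id": "S2", "xy": (1000.0, 0.0), "revenue": 1_200_000},
    {"id": "S3", "xy": (5000.0, 5000.0), "revenue": 3_000_000},
]
exits = [(500.0, 0.0), (9000.0, 9000.0)]


def near_any_exit(station, exit_pts, max_dist):
    """Spatial test: is the station within max_dist of at least one exit?"""
    x, y = station["xy"]
    return any(math.hypot(x - ex, y - ey) <= max_dist for ex, ey in exit_pts)


def spatial_then_attribute(stns, exit_pts, max_dist, min_revenue):
    """Spatial query first, then attribute query on the selected stations."""
    near = [s for s in stns if near_any_exit(s, exit_pts, max_dist)]
    return [s["id"] for s in near if s["revenue"] > min_revenue]


def attribute_then_spatial(stns, exit_pts, max_dist, min_revenue):
    """Assumed alternative: attribute query first, then spatial query."""
    rich = [s for s in stns if s["revenue"] > min_revenue]
    return [s["id"] for s in rich if near_any_exit(s, exit_pts, max_dist)]


print(spatial_then_attribute(stations, exits, MILE_IN_METERS, 2_000_000))
print(attribute_then_spatial(stations, exits, MILE_IN_METERS, 2_000_000))
```

Both orders select the same stations, since each station must pass both tests. In practice the order may still matter for speed, because the first query decides how many features the second one has to examine.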
Combining attribute and spatial queries - Solution 2