
Data Classification

Data classification involves organizing raw data into distinct classes for better visualization, particularly in choropleth maps. Various methods of classification include equal intervals, quantiles, mean standard deviation, natural breaks, optimal classification, and head/tail breaks, each with its own approach to grouping data. The document discusses the principles and algorithms behind these methods, emphasizing the importance of spatial distribution in data classification.

Uploaded by

Mariam Kariam
Data Classification
• Data classification involves grouping raw data into
classes, with each resulting class depicted by a different
symbol. Data classification is particularly appropriate for
choropleth maps because of the difficulty of
differentiating areal symbols (e.g., lightnesses of a single
hue) on an unclassed map.
A. Equal Intervals
• In the equal intervals (or equal steps) method of
classification, each class occupies an equal interval along
the number line.
The steps for computing our six-class map are as follows:
• Determine the class interval (width): the range of the data divided by the number of classes.
• Determine the lower limit of each class.
• Determine the upper limit of each class.
• Specify the class limits actually shown in the legend.
• Determine which observations fall in each class.
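The steps above can be sketched in Python. This is a minimal illustration, not the deck's own worked example: the function names and data values are made up, and placing a value that falls exactly on an interior limit into the upper class is an assumption.

```python
def equal_interval_breaks(values, n_classes):
    """Return the n_classes + 1 class limits of an equal-interval scheme."""
    low, high = min(values), max(values)
    width = (high - low) / n_classes  # class interval (width)
    # Lower limit of every class, followed by the upper limit of the last class.
    return [low + i * width for i in range(n_classes)] + [high]


def assign_class(value, limits):
    """Return the 0-based class index of value, given ascending class limits."""
    # Assumption: a value equal to an interior limit goes to the upper class.
    for i in range(1, len(limits) - 1):
        if value < limits[i]:
            return i - 1
    return len(limits) - 2  # last class, which includes the maximum


data = [0, 3, 7, 12, 18, 24, 30, 41, 52, 60]  # hypothetical values
limits = equal_interval_breaks(data, 6)
classes = [assign_class(v, limits) for v in data]
```

The limits printed in a legend are usually adjusted so that adjacent classes neither overlap nor leave gaps; that adjustment is not shown here.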
B. Quantiles
• In the quantiles method of classification, data are rank
ordered and the same number of observations is placed
in each class. Common variants are named for the number of classes:
• Quartiles (four classes)
• Quintiles (five classes)
• Sextiles (six classes)
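As a rough sketch of the quantiles method, the Python below rank-orders the data and cuts it into groups of (nearly) equal size. The function name and data are hypothetical, and tied values that straddle a class boundary are not given any special treatment, which a real map would need to address.

```python
def quantile_classes(values, n_classes):
    """Split rank-ordered values into n_classes groups of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    groups = []
    for k in range(n_classes):
        start = k * n // n_classes
        stop = (k + 1) * n // n_classes
        groups.append(ordered[start:stop])
    return groups


data = [41, 3, 60, 12, 0, 24, 7, 52, 18, 30, 35, 9]  # 12 hypothetical values
quartiles = quantile_classes(data, 4)  # four classes of three observations each
```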
C. Mean–Standard Deviation
• The mean–standard deviation method is one of several
classification techniques that do consider how data are
distributed along the number line. In this method, classes
are formed by repeatedly adding or subtracting the
standard deviation from the mean of the data.
• When data are normally distributed (or nearly so), the mean serves
as a useful dividing point, enabling a contrast of values above
and below it.

• For our six-class map, Calculated Limits are computed using the
mean and standard deviation values listed under Normal
Distribution Limits.

• For a five-class map, the two middle classes could be combined,
and the mean would fall in the middle of the resulting class.
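A minimal Python sketch of how such limits might be computed follows. The data are hypothetical, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption; the sample standard deviation would be an equally plausible reading of the method.

```python
from statistics import mean, pstdev


def mean_sd_limits(values, n_sd=2):
    """Class limits at mean - n_sd*sd, ..., mean, ..., mean + n_sd*sd."""
    m = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    return [m + k * sd for k in range(-n_sd, n_sd + 1)]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values: mean 5, SD 2
limits = mean_sd_limits(data)    # five limits, which bound six classes
```

With `n_sd=2` the five limits define six classes: two open-ended outer classes and four classes one standard deviation wide. Dropping the limit at the mean merges the two middle classes and gives the five-class variant described above.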
D. Natural Breaks
• In natural breaks classification, graphs (e.g., a dispersion
graph or histogram) are examined visually to determine
logical breaks (or, alternatively, clusters) in the data.
• The goal is to minimize differences between data values in the
same class and to maximize differences between classes.
E. Optimal Classification
• The optimal classification method is a solution to the
subjectivity of natural breaks. The optimal method
places similar data values in the same class by
minimizing an objective measure of classification error.
Jenks–Caspall algorithm
• The Jenks–Caspall algorithm, developed by George Jenks
and Fred Caspall (1971), is an empirical solution to the
problem of determining optimal classes.
• We assume that we wish to minimize the total map error
(ADCM).
Fisher–Jenks algorithm
• The Fisher–Jenks algorithm has a mathematical
foundation that guarantees an optimal solution. Walter
Fisher (1958) was responsible for developing the
mathematical foundation, and George Jenks (1977)
introduced the idea to cartographers, hence the term
Fisher–Jenks algorithm.
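To illustrate how an optimal solution can be guaranteed, the sketch below uses dynamic programming to find the grouping of sorted data into contiguous classes with the smallest total error. It is written in the spirit of the Fisher–Jenks approach and is not claimed to reproduce the published procedure. The error measure (absolute deviations about class medians), the function names, and the data are assumptions made for illustration.

```python
from statistics import median


def abs_dev_about_median(values):
    """Sum of absolute deviations of values about their median."""
    med = median(values)
    return sum(abs(v - med) for v in values)


def optimal_breaks(values, n_classes):
    """Return (total error, classes) with minimum summed within-class error."""
    x = sorted(values)
    n = len(x)
    # cost[i][j]: error of a single class holding x[i:j]
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            cost[i][j] = abs_dev_about_median(x[i:j])
    inf = float("inf")
    # best[k][j]: minimum error when x[:j] is split into k classes
    best = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    best[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # the last class is x[i:j]
                candidate = best[k - 1][i] + cost[i][j]
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    split[k][j] = i
    # Walk the stored split points backwards to recover the classes.
    classes, j = [], n
    for k in range(n_classes, 0, -1):
        i = split[k][j]
        classes.append(x[i:j])
        j = i
    classes.reverse()
    return best[n_classes][n], classes


data = [1, 2, 3, 10, 11, 12, 30, 31]  # hypothetical values with obvious clusters
error, classes = optimal_breaks(data, 3)
```

Because every contiguous grouping is considered through the table, the result is the minimum for the chosen error measure. The sketch favors clarity over speed and assumes there are at least as many observations as classes.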
Optimal (Median)
• ADCM is the sum of absolute deviations about class medians
for a particular number of classes, and ADAM is the sum of
absolute deviations about the median for the entire data set.
• The goodness of absolute deviation fit (GADF) is computed from
these two sums as 1 − ADCM/ADAM. GADF ranges from 0 to 1, with
0 representing the lowest accuracy (a one-class map) and 1
representing the highest accuracy.
• An analogous measure can be computed when the mean is
used as the measure of central tendency (and the error in a
class is the sum of squared deviations about the mean) and is
known as the goodness of variance fit (GVF).
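The measures above can be computed directly. The Python sketch below assumes the usual definition GADF = 1 − ADCM/ADAM; the function names and the example classes are hypothetical.

```python
from statistics import median


def sum_abs_dev(values):
    """Sum of absolute deviations of values about their median."""
    med = median(values)
    return sum(abs(v - med) for v in values)


def gadf(classes):
    """Goodness of absolute deviation fit, assuming GADF = 1 - ADCM/ADAM."""
    all_values = [v for group in classes for v in group]
    adcm = sum(sum_abs_dev(group) for group in classes)  # about class medians
    adam = sum_abs_dev(all_values)                       # about the overall median
    return 1 - adcm / adam


three_classes = [[1, 2, 3], [10, 11, 12], [30, 31]]  # hypothetical classification
score = gadf(three_classes)
one_class = gadf([[1, 2, 3, 10, 11, 12, 30, 31]])    # ADCM equals ADAM here
```

For a one-class map ADCM equals ADAM, so the measure is 0; it approaches 1 as the values within each class become more alike.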
F. Head/Tail Breaks
• Bin Jiang (2013) has developed a new data classification
method known as head/tail breaks, which Jiang argues is
appropriate for data that are heavy-tailed (they have a
strong positive skew).
• Head/tail breaks uses the mean of the data to recursively
divide the data as long as the data above the mean are
heavy-tailed (strongly positively skewed). In addition to
automatically determining an appropriate number of classes,
head/tail breaks is unique in the sense that each lower-valued
class should be viewed as a background for higher-valued
classes.
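The recursion can be sketched in a few lines of Python. How "heavy-tailed" is tested is an assumption here: the sketch keeps splitting only while the values above the mean (the head) make up at most 40% of the current data, a commonly quoted threshold that does not come from this document. The function name, the `head_limit` parameter, and the data are hypothetical.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Return the successive means used as class breaks."""
    breaks = []
    current = list(values)
    while len(current) > 1:
        m = sum(current) / len(current)
        head = [v for v in current if v > m]
        # Assumption: stop once the head is empty or no longer a small minority.
        if not head or len(head) / len(current) > head_limit:
            break
        breaks.append(m)
        current = head  # recurse on the values above the mean
    return breaks


# Hypothetical heavy-tailed data: many small values, few large ones.
data = [1] * 27 + [3] * 9 + [9] * 3 + [27]
breaks = head_tail_breaks(data)  # three breaks, which define four classes
```

The number of classes is not supplied by the user; it follows from how many times the head remains a small minority.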
Criteria
Considering the Spatial Distribution of the Data