Integration of Principal Components Analysis and Cellular Automata For Spatial Decisionmaking and Urban
Abstract This paper examines how correlation among spatial variables affects spatial
decisionmaking that combines multicriteria evaluation (MCE) with cellular automata (CA).
Correlated variables can distort MCE because similar factors are effectively counted more than
once. In urban simulation, spatial factors often exhibit a high degree of correlation, which is an
undesirable property for MCE. This study uses principal components analysis (PCA) to remove data
redundancy among a large set of spatial variables and to determine ‘ideal points’ for land
development. PCA is integrated with cellular automata and geographical information systems (GIS)
to simulate idealized urban forms for planning purposes.
Keywords: principal components analysis, cellular automata, geographical information systems, urban simulation.
Cellular automata (CA) were first introduced in 1948 by von Neumann and Ulam to model
complex dynamic systems, such as biological reproduction and crystal growth. Although CA
models only use very simple rules, they can generate very complex behavior and global structures.
In this way, the role of local rules can be compared to that of DNA in life sciences. CA models
have been increasingly used in the simulation of complex systems, such as biological reproduction,
chemically self-organizing systems, propagation phenomena, and human settlements[1, 2].
CA are well suited to simulating land use change and the evolution of urban systems because of
their powerful spatial modeling capabilities. In recent years, many studies on urban CA models
have been reported with interesting outcomes[2–7]. CA models can be used for testing hypotheses
and theories, such as fractal properties and the evolution of dynamic systems. GIS and CA benefit
from each other, and their integration can help to solve complex decision problems. A series of
constraints can be defined and obtained from GIS to address environmental concerns so that
sustainable cellular cities can be simulated[5, 8]. Multicriteria evaluation (MCE) techniques can
be incorporated into CA models to deal with various complex spatial variables in urban
simulation[4].
Numerous complex and conflicting factors are involved in spatial analysis and decisionmaking
processes. MCE techniques can be employed to handle a number of criteria in decisionmaking[9].
They began to emerge in the early 1970s as a way of solving decisionmaking and planning
problems[10]. The planning process is becoming more complicated in its technical, physical, social
and economic aspects, and MCE can be used to analyze the complex trade-offs between different
alternatives. MCE typically requires that the evaluation criteria be independent of each other; a
high degree of correlation between evaluation criteria is an undesirable property for
decisionmaking[11].
522 SCIENCE IN CHINA (Series D) Vol. 45
This paper examines the correlation of spatial variables in urban simulation and uses principal
components analysis (PCA) to remove the resulting data redundancy. PCA is among the most widely
used methods for spatial data handling, owing to its simplicity and straightforward
interpretation. It transforms a set of correlated variables into uncorrelated orthogonal
variables. The paper integrates PCA with CA models to reduce data redundancy among a large set of
spatial variables for urban planning.
It is difficult to determine weights when many factors are involved, and it is inadequate to
base CA simulations directly on MCE when the spatial variables are correlated. Correlation among
factors can distort the MCE weighting by ‘double counting’ similar variables. PCA can be
integrated into CA simulation to tackle the problem of correlation among many layers of spatial
data. PCA is a linear transformation that rotates the axes of variable space to lie along the
directions of maximum variance. The transformation is based on the following equation[12]:
$$ pc_{ij} = \sum_{k=1}^{n} X_{ik} E_{kj}, \tag{1} $$
where $pc_{ij}$ is the component score of the jth principal component for cell i, $X_{ik}$ is the
value of the kth criterion or layer for cell i, n is the number of criteria, and $E_{kj}$ is the
element of the eigenvector matrix at row k and column j.
The eigenvectors and eigenvalues for the linear transformation are mathematically derived
from the covariance matrix by the following equation:
$$ E \, \mathrm{Cov} \, E^{T} = V, \tag{2} $$
where Cov is the covariance matrix, V is the diagonal matrix of eigenvalues, E is the matrix of
eigenvectors, and the superscript T denotes the matrix transpose.
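The two equations above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `pca_scores` and the synthetic three-criterion data are assumptions made for illustration. Eigenvectors are stored as the columns of `E`, which is what the indexing of eq. (1) implies; under that storage convention the diagonalisation of eq. (2) reads `E.T @ Cov @ E = V` (eq. (2) as printed corresponds to storing eigenvectors as rows). Eq. (1) is applied literally, without centring the data first.

```python
import numpy as np

def pca_scores(X):
    """Return (eigenvalues, E, scores) for an (n_cells, n_criteria) array X.

    Columns of E are unit eigenvectors of the covariance matrix, sorted by
    decreasing eigenvalue; scores follow eq. (1): pc = X @ E.
    """
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)           # criteria are stored as columns
    eigenvalues, E = np.linalg.eigh(cov)    # symmetric input, ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1]   # reorder: largest variance first
    eigenvalues, E = eigenvalues[order], E[:, order]
    scores = X @ E                          # eq. (1), applied without centring
    return eigenvalues, E, scores

# Hypothetical data: 200 cells, 3 criteria, the first two strongly correlated.
rng = np.random.default_rng(0)
base = rng.random(200)
X = np.column_stack([base, base + 0.05 * rng.random(200), rng.random(200)])

eigenvalues, E, scores = pca_scores(X)
score_cov = np.cov(scores, rowvar=False)
off_diagonal = score_cov - np.diag(np.diag(score_cov))
print("share of variance:", eigenvalues / eigenvalues.sum())
print("largest off-diagonal covariance:", np.abs(off_diagonal).max())
```

The off-diagonal covariances of the component scores vanish up to rounding error, which is the sense in which the components are uncorrelated.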
PCA produces independent, compressed components that can be used in CA simulation, which
overcomes the difficulty that general MCE methods have with correlated variables. PCA can
therefore be integrated with CA for better urban simulation. A standard cellular automaton may be
written as[8]
$$ S^{t+1} = f(S^{t}, N), \tag{3} $$
where S is a set of all possible states of the cellular automata, N is a neighbourhood of the cells
providing input values for the function f, and f is a transition function that defines the change of
the state from time t to t +1.
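As a concrete instance of eq. (3), the sketch below performs one synchronous update of a binary grid. The specific rule is an assumption chosen for illustration (an 8-cell Moore neighbourhood and a conversion threshold of three developed neighbours); the paper does not prescribe it, and the function name `step` is hypothetical.

```python
def step(grid, threshold=3):
    """One synchronous update S^{t+1} = f(S^t, N) on a binary grid.

    N is the 8-cell Moore neighbourhood; f converts an undeveloped cell (0)
    to developed (1) when at least `threshold` neighbours are developed.
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]      # copy so all cells read the state at time t
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                continue                     # developed cells stay developed
            developed = sum(
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if developed >= threshold:
                new_grid[r][c] = 1
    return new_grid

seed = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
after_one = step(seed)
for row in after_one:
    print(row)
```

Only the cell at row 2, column 2 has three developed neighbours in this seed, so it is the only cell converted in the first step.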
No. 6 PRINCIPAL COMPONENTS ANALYSIS & CELLULAR AUTOMATA 523
CA models usually use discrete states for simulation. Traditionally, a CA simulation uses only
a binary value to record whether a cell has been converted. The probability of conversion is
calculated from some kind of neighborhood function and is usually compared with a random value to
decide whether the cell is converted (1 for converted and 0 for non-converted). In our model, the
state of a cell is instead a continuous ‘grey value’ between 0 and 1 that represents a stepwise
selection or conversion process, so that a cell is not suddenly ‘selected’ or converted for land
development. The ‘grey value’ is calculated with the following cumulative equation:
$$ G_i^{t+1} = G_i^{t} + \Delta G_i^{t}, \tag{4} $$
where $G_i^{t}$ is the ‘grey value’ for development at time t, which falls within the range 0–1,
and i is the location of the cell. The simulation stops when t reaches the final time $T_0$. A
candidate cell is not regarded as a developed cell until its ‘grey value’ reaches 1.
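The stepwise conversion described by eq. (4) can be sketched in a few lines. The constant increment of 0.3 is a hypothetical value, and capping the accumulated value at 1 is an assumption that keeps it inside the stated 0–1 range; the helper names are illustrative only.

```python
def accumulate(grey, increment):
    """Eq. (4): add the increment to a cell's grey value (capped at 1 by assumption)."""
    return min(1.0, grey + increment)

def is_developed(grey):
    """A cell counts as developed only once its grey value has reached 1."""
    return grey >= 1.0

grey = 0.0
history = []
for t in range(5):                 # five iterations with a constant, hypothetical increment
    grey = accumulate(grey, 0.3)
    history.append(grey)

print(history)
print([is_developed(g) for g in history])
```

With this increment the cell is still a candidate after three iterations and becomes developed at the fourth.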
The increase of the ‘grey value’ depends on two parts: the neighborhood function and the
similarity between a candidate cell and the ‘ideal point’. The first part is the traditional
neighborhood function, which counts the number of developed cells in the neighborhood. A cell
surrounded by a larger number of developed cells has a higher probability of conversion[13]. The
second part is the similarity between a candidate cell and the ‘ideal point’, the point that would
yield the greatest benefit if it were developed. Development suitability can be obtained from
various criteria using land evaluation[14], and the ‘ideal point’ achieves the maximum score for
every criterion (fig. 1). The more similar a cell is to the ‘ideal point’, the higher the growth
rate of ‘grey value’ applied to it, in proportion to the similarity.

Fig. 1. Principal components transformation and the ‘ideal point’.

The ‘ideal point’ in the variable space can be expressed as
$$ \xi = (X_1^{\max}, X_2^{\max}, \ldots, X_j^{\max}, \ldots, X_K^{\max}), \tag{5} $$
where $X_j^{\max}$ is the maximum score for the jth criterion.
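A short sketch of how such a point can be built and then carried into component space with eq. (1). The three cells and the 2 × 2 eigenvector matrix (a 45-degree rotation) are hypothetical values, and the function names are illustrative, not taken from the paper.

```python
import math

def ideal_point(X):
    """Eq. (5): the per-criterion maximum over all cells (rows of X)."""
    return [max(column) for column in zip(*X)]

def project(x, E):
    """Eq. (1) for one point: pc_j = sum_k x_k * E[k][j]."""
    return [sum(x[k] * E[k][j] for k in range(len(x))) for j in range(len(E[0]))]

# Hypothetical scores for three cells on two criteria.
X = [
    [0.2, 1.0],
    [0.9, 0.4],
    [0.5, 0.5],
]
# Hypothetical eigenvector matrix: a 45-degree rotation of the two axes.
s = math.sqrt(0.5)
E = [
    [s, -s],
    [s,  s],
]

xi = ideal_point(X)      # combines the best score of each criterion
pc0 = project(xi, E)     # the same point expressed in component space
print(xi, pc0)
```

No single cell in this example attains both maxima, which illustrates why the ‘ideal point’ is a virtual reference rather than an observed cell.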
In fact, the ‘ideal point’ is a virtual point. Its transformed coordinates in component space
can be obtained using eq. (1). A series of factors for environmental protection and sustainable
development can be incorporated in the model through the ‘ideal point’ approach. A candidate cell
whose site attributes are more similar to the ‘ideal point’ will have a faster rate of urban
growth, which ensures that greater benefits are achieved. As mentioned before, the attributes have
been compressed into a few major principal components that still retain most of the original
information. The principal components are then used to calculate the similarity based on a form of
Euclidean ‘distance’ given by
$$ d_{i\xi} = \sum_{j=1}^{m} w_j^{2} \left( pc_{ij} - pc_j^{0} \right)^{2}, \tag{6} $$
where $d_{i\xi}$ is the ‘distance’ between cell i and the ‘ideal point’ ξ based on the attributes
of m components, $pc_{ij}$ is the value of the jth component for cell i, $w_j$ is the weight for
the jth component, and $pc_j^{0}$ is the transformed score of the ‘ideal point’ for the jth
principal component.
The similarity (SIM) is given by
$$ \mathrm{SIM} = 1 - \frac{d_{i\xi}}{d_{i\xi}^{\max}}, \tag{7} $$
where $d_{i\xi}^{\max}$ is the maximum value of $d_{i\xi}$.
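The distance and similarity calculations can be sketched as follows. The distance follows eq. (6) as it appears in this text, that is, a weighted sum of squared differences with no square root taken. Taking the maximum distance over the set of cells passed in is an assumption, and the scores, weights and function names are hypothetical.

```python
def weighted_distance(pc, pc0, weights):
    """Eq. (6) as printed: sum_j w_j^2 * (pc_j - pc0_j)^2."""
    return sum((w ** 2) * (a - b) ** 2 for w, a, b in zip(weights, pc, pc0))

def similarities(cells, pc0, weights):
    """Eq. (7): SIM_i = 1 - d_i / d_max, with d_max taken over the given cells."""
    distances = [weighted_distance(pc, pc0, weights) for pc in cells]
    d_max = max(distances)
    if d_max == 0:                       # every cell coincides with the ideal point
        return [1.0 for _ in distances]
    return [1.0 - d / d_max for d in distances]

# Hypothetical component scores for three cells on two components.
cells = [[1.0, 2.0], [0.0, 0.0], [1.0, 0.0]]
pc0 = [1.0, 2.0]          # hypothetical transformed ideal point
weights = [1.0, 0.5]      # hypothetical component weights

sim = similarities(cells, pc0, weights)
print(sim)
```

The cell that coincides with the ‘ideal point’ receives a similarity of 1 and the most distant cell receives 0, so SIM always lies in the range 0–1.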
The increase of ‘grey value’ should be proportional to the neighborhood function and the
similarity:
$$
\begin{aligned}
\Delta G_i^{t} &= f_i(q^{t}, N) \times \left(\mathrm{SIM}^{t}\right)^{k} \\
&= \frac{q^{t}}{\pi l^{2}} \times \left( 1 - \frac{d_{i\xi}^{t}}{d_{i\xi}^{\max}} \right)^{k},
\end{aligned}
\tag{8}
$$
where $q^{t}$ is the total number of developed cells in the neighborhood N at time t, l is the
radius of the circular neighborhood, and k is the parameter for power transformation.
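A minimal sketch of eq. (8) on a small grid. It assumes that cells are unit squares, that the circular neighbourhood is measured between cell centres, and that the centre cell itself is excluded from the count; none of these details is stated in the text, and the function names are hypothetical.

```python
import math

def developed_in_radius(grid, r, c, radius):
    """q^t: developed cells (value 1) within `radius` of cell (r, c), centre excluded."""
    count = 0
    reach = int(math.floor(radius))
    for rr in range(max(0, r - reach), min(len(grid), r + reach + 1)):
        for cc in range(max(0, c - reach), min(len(grid[0]), c + reach + 1)):
            if (rr, cc) == (r, c):
                continue
            if (rr - r) ** 2 + (cc - c) ** 2 <= radius ** 2 and grid[rr][cc] == 1:
                count += 1
    return count

def grey_increment(q, radius, sim, k):
    """Eq. (8): (q / (pi * l^2)) * SIM^k."""
    return q / (math.pi * radius ** 2) * sim ** k

grid = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
q = developed_in_radius(grid, 1, 1, radius=1.5)   # counts the three developed neighbours
low = grey_increment(q, 1.5, sim=0.5, k=2)
high = grey_increment(q, 1.5, sim=0.9, k=2)
print(q, low, high)
```

For the same neighbourhood, the cell with the higher similarity grows faster, and a larger k widens the gap between the two.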
The parameter k is used to generate more discriminated growth results[4,5,8]. A stochastic
disturbance term is also added to represent unknown errors during the simulation, which brings the
generated patterns closer to reality[3]. The error term (RA) is given by
$$ RA = 1 + (-\ln \gamma)^{\alpha}, \tag{9} $$
where γ is a uniform random variable within the range (0, 1), and α is a parameter that controls
the size of the stochastic perturbation. α can be used as a dispersion factor in this simulation.
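The error term of eq. (9) can be sampled with the standard library alone. This sketch assumes α ≥ 0 and draws γ from (0, 1] so that the logarithm is always defined; the seed and the value α = 2 are arbitrary illustrative choices.

```python
import math
import random

def stochastic_factor(alpha, rng):
    """Eq. (9): RA = 1 + (-ln(gamma))**alpha, with gamma uniform on (0, 1]."""
    gamma = 1.0 - rng.random()     # random() is in [0, 1), so gamma is never 0
    return 1.0 + (-math.log(gamma)) ** alpha

rng = random.Random(42)            # fixed seed so runs are repeatable
samples = [stochastic_factor(alpha=2.0, rng=rng) for _ in range(1000)]
print(min(samples), max(samples))
```

Because −ln γ is never negative, RA is never smaller than 1: the perturbation can only accelerate growth, and larger values of α lengthen the tail of occasional large values.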
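Finally, by introducing the error term of eq. (9) as a multiplier, eq. (8) is revised as
$$
\begin{aligned}
\Delta G_i^{t} &= RA \times \frac{q^{t}}{\pi l^{2}} \times \left( 1 - \frac{d_{i\xi}^{t}}{d_{i\xi}^{\max}} \right)^{k} \\
&= \left( 1 + (-\ln \gamma)^{\alpha} \right) \times \frac{q^{t}}{\pi l^{2}} \times \left( 1 - \frac{d_{i\xi}^{t}}{d_{i\xi}^{\max}} \right)^{k}.
\end{aligned}
\tag{10}
$$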
At each iteration, the increase of ‘grey value’ is calculated to determine urban growth. Cells
are converted into urban areas when their ‘grey values’ reach 1. Complex urban systems can be
simulated by iterating the CA.
The model is applied to the simulation of urban development in Shenzhen and Dongguan in
the Pearl River Delta of southern China. The first step was to obtain and examine the spatial
factors that play an important role in influencing urban development. Distance-based variables can
be used to represent spatial influences. The amenities for urban development may be measured by
proximity to major urban centres, sub-centres, roads, expressways, railways, parks and rivers.
Distance gradient functions can be used to estimate such influences[15]: the closer a site is to
these features, the greater the benefit. In addition, a spectrum of environmental suitability can
be used as constraints in CA simulation to reduce development costs. Environmental suitability can
be defined using distance decay functions according to various objectives, such as the protection
of drinking water (reservoirs), cropland, orchards, vegetable land, fishponds, forest and wetland.
The closer a site is to these land uses, the greater the cost of developing it.
Remote sensing and GIS can be used to obtain spatial variables. A first set of six spatial
variables was identified to capture the benefits of proximity to sources of development
attraction: a) distance to the major urban center (city proper); b) distance to town sub-centres
(town centers); c) distance to railways; d) distance to expressways; e) distance to roads;
f) distance to rivers.
A closer distance to these sources of attraction is more beneficial to urban development
because energy and construction costs can be saved. These spatial variables ($X_{ik}$) can be
defined using the negative exponential function
$$ X_j = e^{-\beta_j \, dist_j}, \tag{11} $$
where $X_j$ is the spatial variable for the positive criterion j, $dist_j$ is the distance to the
source of development attraction for criterion j, and $\beta_j$ is the corresponding parameter of
the distance decay function. The second set of variables covers the negative factors:
a) distance to cropland; b) distance to orchard; c) distance to vegetable land; d) distance to
fishpond; e) distance to reservoir (drinking water); f) distance to forest; g) distance to wetland.
Development close to these land uses disturbs, or otherwise harms, environmental and resource
protection. These spatial variables can be defined using the complement of the negative
exponential function:
$$ X_j = 1 - e^{-\beta_j \, dist_j}. \tag{12} $$
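The two distance decay forms can be compared directly. In this sketch the decay parameter β = 0.5 per kilometre and the sample distances are hypothetical values, and the function names are illustrative.

```python
import math

def attraction_score(distance, beta):
    """Eq. (11): the benefit decays with distance from a source of attraction."""
    return math.exp(-beta * distance)

def protection_score(distance, beta):
    """Eq. (12): the score rises with distance from a protected land use."""
    return 1.0 - math.exp(-beta * distance)

beta = 0.5                        # hypothetical decay parameter, per km
for km in (0.0, 1.0, 5.0):
    print(km, round(attraction_score(km, beta), 3), round(protection_score(km, beta), 3))
```

Both variables lie between 0 and 1 and both are oriented so that a larger value is better for development, which is why the ‘ideal point’ can take the maximum score on every criterion.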
These spatial variables are commonly used as site attributes in GIS site selection and urban
simulation. However, they are usually correlated with each other, which causes problems when they
are used for MCE. It is also difficult to provide weights when the number of spatial variables may
run to several hundred[16]. PCA should therefore be incorporated in CA simulation to remove data
redundancy.
Table 1 lists the principal components created from the thirteen layers of distance variables
for Shenzhen and Dongguan. It is found that the first 5 components account for more than 90% of
the variance of the original thirteen variables (93.9% for Shenzhen and 92% for Dongguan). Even
the first three components contain more than 80% of the total variance (88.8% for Shenzhen and
81.4% for Dongguan). These spatial distance variables therefore exhibit severe data redundancy.
PCA should be carried out to remove the data redundancy in CA simulations that deal with a large
number of spatial variables.
Table 1 Principal components and their variance
                             Shenzhen                      Dongguan
Principal component    Eigenvalue   Variance (%)     Eigenvalue   Variance (%)
I                         90.4         64.1             62.9         44.4
II                        25.9         18.4             38.9         27.5
III                        8.8          6.2             13.5          9.5
IV                         3.7          2.6              8.5          6.0
V                          3.6          2.5              6.5          4.6
VI                         3.1          2.2              3.2          2.3
VII                        1.8          1.3              2.6          1.9
VIII                       1.2          0.9              1.9          1.4
IX                         1.0          0.7              1.7          1.2
X                          0.5          0.4              0.9          0.7
XI                         0.5          0.4              0.5          0.3
XII                        0.4          0.3              0.3          0.2
XIII                       0.1          0.1              0.1          0.1
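The cumulative shares of variance can be recomputed from the eigenvalue column of Table 1. The sketch below uses the Shenzhen eigenvalues as listed in the table; the function names and the 90% target are illustrative choices.

```python
from itertools import accumulate

# Eigenvalues for Shenzhen, copied from Table 1.
shenzhen = [90.4, 25.9, 8.8, 3.7, 3.6, 3.1, 1.8, 1.2, 1.0, 0.5, 0.5, 0.4, 0.1]

def cumulative_share(eigenvalues):
    """Cumulative fraction of total variance carried by the leading components."""
    total = sum(eigenvalues)
    return [running / total for running in accumulate(eigenvalues)]

def components_needed(eigenvalues, target):
    """Smallest number of leading components whose share reaches `target`."""
    for count, share in enumerate(cumulative_share(eigenvalues), start=1):
        if share >= target:
            return count
    return len(eigenvalues)

share = cumulative_share(shenzhen)
needed = components_needed(shenzhen, 0.90)
print([round(100 * value, 1) for value in share[:6]])
print("components needed for 90%:", needed)
```

Shares recomputed from the rounded eigenvalues can differ in the last digit from sums of the rounded percentages in the table.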
Table 2 gives the component loadings of the thirteen spatial variables for Dongguan. The first
component is mainly related to agriculture and ecology, such as fishpond, vegetable land and
wetland. The second component is mainly related to transport conditions, such as expressways,
roads and rivers. The third component is mainly related to centers, such as the city proper and
town centres. The principal components transformation has two advantages. First, it groups
similar variables together, with a large proportion of their loadings falling in the same
component. Second, suitable weights can be defined easily because the principal components are
independent of each other, which avoids the repeated counting that may take place in general MCE.
The ‘ideal point’ is used to address economic, environmental and resource factors in CA
simulation. These factors are represented by principal components to reduce data redundancy. The
‘ideal point’ is a virtual point that has the maximum score for each criterion with regard to
development suitability, and it serves as the reference point for urban development. The ‘ideal
point’ for urban development is therefore (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1).
Only the first six principal components are used to calculate the similarity because they
contain 94.3% of the original information for Dongguan (table 1). After the PCA transformation,
the ‘ideal point’ expressed in the six principal components becomes (1.2, 2.6, 1.9, −0.2, −0.4,
0.1).
Weights should be assigned to the components according to their importance in the simulation.
Different planning objectives call for different combinations of weights, which lead to different
simulation results. It is very difficult to provide weights when there are many variables in the
simulation, but PCA solves this problem because it greatly reduces the number of variables.
The first six components were used to calculate the similarity. The factor loadings in table 2
were examined to help define weights that reflect various planning objectives. Weights are usually
decided by experts according to the importance of each factor. For example, if the planning
objective is to protect agriculture and ecology, component I should be assigned the highest weight
of 1. This study uses only five planning objectives to illustrate the methodology (table 3).
It is easy to make various development plans for different planning objectives. Fig. 2 shows the
simulation of transport-based development for Shenzhen in 1988–1997. A higher weight was
used for the second component which has a large proportion of loadings for the variables of ex-
Fig. 3. CA simulation for urban development in Dongguan. (a) Urban-center-based; (b) cropland-conservation-based;
(c) ecological and agricultural protection; (d) environmental conservation and economic development.
3 Conclusion
A large set of spatial variables is used in MCE during spatial decisionmaking. These spatial
criteria can be retrieved from GIS. The principal components analysis in this study shows that
these criteria are highly correlated, which causes problems when MCE is applied to them:
correlation violates the independence that MCE requires, because some variables are effectively
counted repeatedly. The study proposes the use of PCA and the ‘ideal point’ approach to deal with
the common problem of spatial correlation. The PCA-CA model provides a useful planning tool for
exploring various possible urban forms based on a large set of environmental constraints that
could be considered in land use planning. Planning objectives are easy to incorporate in the urban
simulation. Further studies are required to incorporate more factors, such as development density,
in the model for more realistic simulation.
Acknowledgements This project was supported by the National Natural Science Foundation of China (Grant No.
40071060) and the Croucher Foundation of Hong Kong (Grant No. 21009619).
References
1. Binder, P., Evidence of Lagrangian tails in a lattice gas, in Cellular Automata and Modeling of Complex Physical Systems
(eds. Manneville, P., Boccara, N., Vichniac, G. Y. et al.), Berlin: Springer-Verlag, 1989, 155–160.
2. Batty, M., Xie, Y., From cells to cities, Environment and Planning B: Planning and Design, 1994, 21: 531–548.
3. White, R., Engelen, G., Uljee, I., The use of constrained cellular automata for high-resolution modelling of urban land-use
dynamics, Environment and Planning B, 1997, 24: 323–343.
4. Wu, F., Webster, C. J., Simulation of land development through the integration of cellular automata and multicriteria
evaluation, Environment and Planning B, 1998, 25: 103–126.
5. Li Xia, Yeh, G. O., Constrained cellular automata for modelling sustainable urban forms, Acta Geographica Sinica (in
Chinese), 1999, 54(4): 289–298.
6. Li Xia, Yeh, G. O., Zoning for agricultural land protection using cellular automata, Chinese Environmental Science (in
Chinese), 20(4): 318–322.
7. Zhou Chenghu, Sun Zhanli, Xie Yichun, Geo-cellular Automata (in Chinese), Beijing: Science Press, 1999, 1–163.
8. Li, X., Yeh, G. O., Modelling sustainable urban development by the integration of constrained cellular automata and GIS,
International Journal of Geographical Information Science, 2000, 14(2): 131–152.
9. Carver, S. J., Integrating multi-criteria evaluation with geographical information systems, International Journal of Geographical
Information Systems, 1991, 5(3): 321–339.
10. Nijkamp, P., van Delft, A., Multi-Criteria Analysis and Regional Decision-Making, The Netherlands: H.E. Stenfert Kroese
B.V., 1977.
11. Malczewski, J., GIS and Multicriteria Decision Analysis, New York: John Wiley & Sons, Inc., 1999.
12. Gonzalez, R. C., Wintz, P., Digital Image Processing, Reading, Massachusetts: Addison-Wesley Publishing Company,
1977.
13. Batty, M., Cellular automata and urban form: A primer, Journal of the American Planning Association, 1997, 63(2):
266–274.
14. Yeh, G. O., Li, X., Sustainable land development model for rapid growth areas using GIS, International Journal of
Geographical Information Science, 1998, 12(2): 169–189.
15. Batty, M., Xie, Y. C., Sun, Z. L., Modeling urban dynamics through GIS-based cellular automata, Computers, Environment
and Urban Systems, 1999, 23: 205–233.
16. Bauer, V., Wegener, M., A community information feedback system with multiattribute utilities, in Conflicting Objectives
in Decisions (eds. Bell, D. E., Keeney, R. L., Raiffa, H.), West Sussex: John Wiley & Sons, Inc., 1977, 323–357.