Assessing spatial heterogeneity in
crime prediction
Using geographically weighted regression to explore local patterns
in crime prediction in Belgian municipalities
November 2016
Johan Blomme
Leenstraat 11
8340 Damme-Sijsele
URL : www.johanblomme.com
Email : j.blomme@telenet.be
ASSESSING SPATIAL HETEROGENEITY IN CRIME PREDICTION
Using geographically weighted regression to explore local patterns
in crime prediction in Belgian municipalities
Contents
1. Analytical framework
2. Exploratory spatial data analysis
3. Global non-spatial regression model
4. Global spatial regression model
5. Local spatial regression model
5.1. Visualising GWR results
5.2. Cluster analysis
6. Conclusions
Assessing spatial heterogeneity in crime
prediction
Using geographically weighted regression to explore local patterns
in crime prediction in Belgian municipalities
Traditional regression analysis describes a modelled relationship between a dependent variable
and a set of independent variables. When applied to spatial data, the regression analysis often
assumes that the modelled relationship is stationary over space and produces a global model
which is supposed to describe the relationship at every location in the study area. This would be
misleading, however, if relationships being modelled are intrinsically different across space. One
of the spatial statistical methods that attempts to solve this problem and explain local variation
in complex relationships is Geographically Weighted Regression (GWR).
In a global regression model, the dependent variable is often modelled as a linear combination
of independent variables that is stationary over the whole area (i.e. the model returns one value
for each parameter). GWR extends this framework by dropping the stationarity assumption : the
parameters are assumed to be continuous functions of location. The result of the GWR analysis
is a set of continuous localised parameter estimate surfaces, which describe the geography of
the parameter space. These estimates are usually mapped or analysed statistically to examine
the plausibility of the stationarity assumption of the traditional regression and different possible
causes of nonstationarity.
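The contrast between the two model forms can be written out explicitly. The notation below is an assumption made for illustration (it does not appear in the original text): (u_i, v_i) denotes the coordinates of location i and x_ik the value of predictor k at that location.

```latex
% Global model: one coefficient per predictor for the whole study area
y_i = \beta_0 + \sum_{k=1}^{p} \beta_k \, x_{ik} + \varepsilon_i

% GWR: every coefficient is a function of the location (u_i, v_i)
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i) \, x_{ik} + \varepsilon_i
```

The global model is the special case in which each function \beta_k(u, v) is constant over space.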
The use of linear regression is common in many areas of science. Ordinary linear regression
implicitly assumes spatial stationarity of the regression model, that is, the relationships between
the variables remain constant over geographical space. We refer to a model in which the
parameter estimates for every observation in the sample are identical as a global model.
Spatial nonstationarity occurs when a relationship (or pattern) that applies in one region does
not apply in another. Global models are statements about processes or patterns which are
assumed to be stationary and as such are location-independent, i.e. are assumed to apply to all
locations. In contrast, local models are spatial disaggregations of global models, the results of
which are location-specific. The template of the model is the same : the model is a linear
regression model with certain variables, but the coefficients alter geographically. If the
parameter estimates are allowed to vary across the study area such that every observation has
its own separate set of parameter estimates we have a local model.
GWR does not assume the relationships between independent and dependent variables are
constant across space. Instead, GWR explores whether the relationships between a set of
predictors and an outcome vary by geographical location. GWR has been proposed as a powerful
tool for investigating spatial nonstationarity in the relationship between predictors and the
outcome variable.
Theoretically, spatial nonstationarity is based on the concept of the social construction of
space. The interaction of individuals with each other and with their physical environment
produces space. Human beings are just as much spatial as temporal beings. By spatial, we
mean that we are most influenced by what is immediate in space. What happens near us
matters more than non-proximal events. Human spatiality and temporality are essential and
equally powerful in explaining human behavior. Consequently, everything that is social is
inherently spatial, just as everything spatial is inherently socialized.
From this perspective, we analyse how the macro-level relationship between crime and various
socio-economic and demographic variables unfolds over geographical space.
1. Analytical framework
Our analysis strategy entails estimating regression models that summarize the “global”, or
average, effects of the predictor variables on crime rates across our sample of Belgian
municipalities. Given the well-known spatial autocorrelation evident in crime data, we generate
the global models using Ordinary Least Squares (OLS) and spatial autoregressive (SAR)
estimators. OLS and the spatial autoregressive model are “global” models in the sense that
they both assume that a single set of parameters sufficiently describes the relationships between
predictor variables and crime rates.
The classical ordinary least squares (OLS) model is widely used to model the global relationship
between a response variable and one or more explanatory variables. OLS assumes, among
other things, that residuals are spatially independent. Residual autocorrelation captures
unexplained similarities between neighboring municipalities, which can be the result of omitted
variables or a misspecification of the regression model. Assuming a global model does exist, an
exploration of spatial patterns in the data can help determine whether a global model is
misspecified, that is, whether the model is missing important predictor variables (spatial error
model) or whether a spatial term should be included in the model (spatial lag model). Correcting
such a misspecification would improve the accuracy of the global model in explaining crime levels
across the study area.
Global models that account for spatial effects are spatial autoregressive (SAR) models. The
spatial error model addresses the presence of spatial autocorrelation by defining a spatial
autoregressive process for the error term and, by doing so, captures unexplained similarities.
The spatial lag model extends the standard OLS regression model by including a spatially lagged
dependent variable, whose effect can mostly be interpreted as a spill-over effect.
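In matrix form, the two specifications are commonly written as follows. The symbols are assumptions chosen for illustration: W is the spatial weights matrix, \rho the spatial lag coefficient and \lambda the spatial error coefficient (only rho is named in the text).

```latex
% Spatial lag model: spatially lagged dependent variable Wy on the right-hand side
y = \rho W y + X\beta + \varepsilon

% Spatial error model: spatial autoregressive process in the disturbance term
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

Setting \rho = 0 or \lambda = 0 reduces either model to the standard OLS specification.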
Global regression models assume a homogeneous behavior of the estimated parameters across
space. We expect spatial homogeneity to be rare and assume that most social phenomena are
not geographically stationary. A way to deal with spatial heterogeneity is the application of
geographically weighted regression (GWR) to investigate spatially varying relationships.
GWR models spatial autocorrelation and spatial heterogeneity for subsets of the entire data set.
Each subset is established around a regression point with near data points exhibiting a higher
influence than more distant data points. This weighting is often based on a bi-square kernel
function. Of crucial importance is the specification of an appropriate bandwidth length. The
most common is the adaptive bandwidth, whose length is allowed to vary across space,
depending on the density of the data points. Where data points are densely clustered, the kernel
has a shorter bandwidth, in contrast to regions with larger inter-point distances, where the
bandwidth is longer.
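The weighting scheme described above can be sketched in a few lines of base R. This is a minimal illustration, not the GWR4 implementation: the function names bisquare and gwr_at_point are hypothetical, the adaptive bandwidth is taken as the distance to the k-th nearest point, and the data are simulated.

```r
# Bi-square kernel: weight 1 at distance 0, falling smoothly to 0 at bandwidth b.
bisquare <- function(d, b) ifelse(d < b, (1 - (d / b)^2)^2, 0)

# Local weighted least squares fit at regression point i (hypothetical helper).
# Adaptive bandwidth: b is the distance from point i to its k-th nearest point
# (point i itself counts as the first, at distance 0), so b is short where
# points are dense and long where they are sparse.
gwr_at_point <- function(i, coords, X, y, k) {
  d <- sqrt((coords[, 1] - coords[i, 1])^2 + (coords[, 2] - coords[i, 2])^2)
  b <- sort(d)[k]
  w <- bisquare(d, b)
  fit <- lm.wfit(x = cbind(1, X), y = y, w = w)  # intercept + predictors
  unname(fit$coefficients)
}

# Simulated example (NOT the study data): a relationship that is the same everywhere.
set.seed(1)
n <- 60
coords <- cbind(runif(n), runif(n))
x <- rnorm(n)
y <- 2 + 3 * x
gwr_at_point(1, coords, x, y, k = 20)  # local intercept and slope at point 1
```

Repeating the local fit for every regression point yields one set of coefficients per location, which is what the parameter surfaces in a GWR analysis are built from.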
While it is often argued that GWR is more suitable for exploratory analysis, it also offers a way to
test whether local models yield a significant improvement in fit over the global models.
The following analysis models both spatial autocorrelation and nonstationarity by means of
global and local spatial statistical models. An exploratory spatial data analysis, a global
non-spatial regression model, a global spatial regression model and finally a local spatial
regression model are applied to explore the association between various predictors and crime in Belgian
municipalities. We rely on crime data in municipalities, the main political and administrative
unit of the Belgian territory.
The dependent variable in this study is the crime rate/1000 residents (calculated as a mean
over the period 2008-2012) in Belgian municipalities (N= 589, source data : statistics Belgian
Federal Police).
To test the impact of social deprivation on crime, we collected data at municipality level about
various indicators of inequality. Besides mean family income and the percentage unemployed,
we use the Gini coefficient as a measure of income variation, indicating the distribution of
income in each municipality (between the extremes of 0, absolute equality, and 1, maximum
inequality). As control variables we include various socio-demographic indicators : population
density, the share of males in the age group 15 to 64, the percentage of young people (15-24) in
the population, the percentage of residents that are foreign born, the percentage of non-Euro
foreign born residents and the degree of female labour force participation (source data :
statistics Federal Government Belgium, 2011).
Since the original data for the dependent variable and five of the independent variables are not normally distributed (skewness marked in red in the above table) and
normality of data is a basic assumption for both ordinary least squares regression and spatial regression, natural log values (ln) were used for these variables.
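The effect of the log transformation on skewness can be illustrated with a short base R sketch. The data below are simulated from a log-normal distribution purely for illustration; they are not the crime rates used in this study, and the skewness helper is a hypothetical convenience function.

```r
# Moment-based sample skewness (hypothetical helper, base R only).
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Simulated right-skewed variable (NOT the study data), one value per municipality.
set.seed(42)
crime_rate <- rlnorm(589, meanlog = 4, sdlog = 0.5)

skew_raw <- skewness(crime_rate)       # clearly positive: long right tail
skew_log <- skewness(log(crime_rate))  # close to zero after the ln transformation
c(raw = skew_raw, log = skew_log)
```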
2. Exploratory spatial data analysis
The first step in an exploratory spatial data analysis (ESDA) is to verify whether spatial data are
randomly distributed. This requires global autocorrelation statistics. Global indicators of
spatial autocorrelation, however, cannot identify local patterns of spatial association, such as
statistically significant local spatial clusters or local outliers in the data.
To overcome this obstacle, a spatial clustering analysis is needed (we made
use of GeoDa, the open-source spatial regression software of the GeoDa Center for Geospatial
Analysis and Computation, http://geodacenter.asu.edu).
A significant Moran’s I statistic is a first clue that parameter estimates in an OLS regression can
be affected by spatial residual autocorrelation. For this reason, the Moran’s I statistic was
calculated for the dependent variable and the nine independent variables included in this study.
The neighborhood relationships for calculating the Moran’s I statistic are defined as first order
queen contiguity, which is commonly used (a municipality’s spatial lag is a weighted average of
its neighboring localities ; neighbors are typically defined in terms of their physical proximity to
the local geographic unit).
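As an illustration of the statistic, the following base R sketch computes the global Moran's I with row-standardised first order queen contiguity weights. It is an assumption-laden toy example: a regular grid stands in for the municipal polygons, the function names are hypothetical, and the analysis reported here was carried out in GeoDa, not with this code.

```r
# Global Moran's I for values x and spatial weights matrix W.
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# First order queen contiguity on an nr x nc grid: cells sharing an edge or a
# corner are neighbours. Rows are standardised so that each row sums to 1,
# which makes W %*% x the average of the neighbouring values (the spatial lag).
queen_weights <- function(nr, nc) {
  idx <- expand.grid(r = seq_len(nr), c = seq_len(nc))
  W <- matrix(0, nrow(idx), nrow(idx))
  for (i in seq_len(nrow(idx))) {
    adj <- abs(idx$r - idx$r[i]) <= 1 & abs(idx$c - idx$c[i]) <= 1
    adj[i] <- FALSE  # a cell is not its own neighbour
    W[i, adj] <- 1
  }
  W / rowSums(W)
}

grid <- expand.grid(r = 1:6, c = 1:6)
W <- queen_weights(6, 6)
i_gradient <- morans_i(grid$r, W)  # smooth spatial trend: positive autocorrelation
i_gradient
```

A value well above zero indicates that similar values cluster together; significance is usually assessed by comparing the observed value with values obtained under random permutations of the data.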
Results indicate that both the dependent and all independent variables exhibit significant
positive spatial autocorrelation. The hypothesis of spatial randomness is clearly rejected. A
positive and significant spatial dependence in the dependent variable (crime rate) indicates that
the crime rate in a particular municipality is associated with (not independent of) crime rates in
surrounding municipalities. The value of the spatial autocorrelation coefficient (0,297) suggests
that a 10 percent increase in the crime rate in a municipality is associated with an increase of
nearly 3% in the crime rate in a neighboring municipality. This, together with the results of the LISA
cluster analysis, is evidence of the existence of significant spillover effects between
municipalities with respect to crime, and implies that there is a need for coordination of the
municipal efforts to fight criminal activities that spill over the municipal borders.
Prevalence of crime in Belgian municipalities (N = 589)
(crimerate/1000 inhabitants ; crimerate percentiles)
Global Moran’s I statistic for variables included in this study
Cartograms of the geographical distribution of independent variables
LISA cluster map for criminality in Belgian municipalities, N= 589
3. Global non-spatial regression model
Exploring the relationship between the independent variables and crime rates starts with a
multivariate OLS regression model. None of the correlations between the predictors is
high enough to raise a major concern about multicollinearity. Nevertheless, we
evaluated the diagnostics to assess the issue of multicollinearity more formally. In particular,
Variance Inflation Factors (VIFs) were investigated. Since all VIF scores are below the critical
value of 5, multicollinearity is rejected (see note 1).
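The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors; the tolerance is its reciprocal. The base R sketch below illustrates the computation on simulated data. The diagnostics reported in this study were produced with SPSS, so the function name and the data here are assumptions for illustration only.

```r
# Variance inflation factor for every column of a data frame of predictors.
vif <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Simulated predictors (NOT the study data): x3 is almost a copy of x1.
set.seed(123)
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + rnorm(n, sd = 0.1)

v <- vif(data.frame(x1, x2, x3))
v      # x1 and x3 far above the threshold of 5, x2 close to 1
1 / v  # corresponding tolerances
```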
Results show that the nine predictors explain about 54,2% of the variance in crime rates. Of
those, the variables representing the percentage of males in the age group 15-64, the
percentage of the age group 15-24 in the population and the percentage of foreign born
residents do not contribute significantly to the explanation of the variability in crime rates
between municipalities (see note 2).
A more detailed analysis of the residuals reveals that they are not normally distributed
(Jarque-Bera test = 410.059 ; p < 0.001) but show no heteroscedasticity (Koenker-Bassett test = 14.115 ;
p=0.118). Finally, residual independence is tested by means of Moran’s I statistic. This test shows
significant spatial residual autocorrelation (Moran’s I = 0.155 ; p < 0.001), violating the model’s
independence assumption. This residual pattern in the OLS model can be the result of existing
spatial effects and can be accounted for by means of a spatial regression model.
______________________________________________________________________________
1. Collinearity diagnostics were estimated using SPSS Base Statistics and no problems of multicollinearity were
found among the independent variables. The collinearity diagnostics used were the variance inflation factors (VIF)
and tolerances for individual variables. Multicollinearity is said to exist if the VIF is 5 or higher (or equivalently,
if the tolerance is 0,20 or less). The highest VIF-value in this analysis was 4,852 and the lowest tolerance was 0,206,
both for mean income.
2. Initially, two dummy variables representing the regions in Belgium were added to the regression equation.
However, VIF scores indicated the presence of multicollinearity. Therefore, these dummy variables were not
retained in the OLS regression.
4. Global spatial regression model
The clustering of crime rates indicates that the data are not randomly distributed, but
instead follow a systematic pattern. The spatial clustering of variables, and the
possibility of omitted variables that relate to the connectivity of neighboring localities,
raise model specification issues. Evidence for the latter also comes from the residual
autocorrelations present in the OLS model.
We employ two alternative specifications to correct for spatial dependence. One is the
spatial lag model. This specification is relevant when the spatial dependence works
through a spatial lag of the dependent variable. The other specification is the spatial
error model. This specification is relevant when the spatial dependence works through
the disturbance term (spatial regression models were developed by making use of
GeoDa, regression software of the GeoDa Center for Geospatial Analysis and
Computation, http://geodacenter.asu.edu).
The value of the LM-lag test is only weakly significant (LM-lag = 3.598 ; p < 0.1) but the results of
the LM-error test (56.900 ; p < 0.001) suggest that a spatial error must be considered in the
global spatial regression model.
The results from the spatial lag model, shown in the table “Global OLS versus global spatial
regression models”, suggest that this model does not perform as well as the spatial error model.
The effect of the spatial lag term is statistically weak (rho = 0.084 ; p= 0.101). The robust
Lagrange Multiplier (LM) test also recommends the use of the spatial error model, and the lower
AIC value combined with the higher R² value for the spatial error model signals that this model
outperforms the spatial lag model. In the spatial error model, all predictor variables except one
(the percentage of foreign born residents) yield a statistically significant effect.
Global OLS versus global spatial regression models
Based on the results of the global spatial regression model it is difficult to defend similarities in
municipality-level crime as arising from imitation of one’s neighbors, that is, a spatial lag
process. Criminality results from a complex mix of social, economic and cultural factors, only a
small number of which can be brought into a statistical model of the process. Much of it
remains unaccounted for and is summarized in the model’s error term.
Although we observe a very small Moran’s I value (-0.022) for the residuals of the spatial error
model, indicating little remaining spatial autocorrelation, the residuals do not comply with the
assumption of constant variance (Breusch-Pagan test for heteroscedasticity = 54.060 ; p< 0.001).
5. Local spatial regression model
Like any global model, the spatial regression models of the previous section carry the assumption
that the processes being modeled are uniform throughout the study area : the relationships
between the dependent and the independent variables remain stationary (constant) across the
entire study area of Belgium.
Local spatial regression models take nonstationarity into account. We use GWR4 to perform
geographically weighted regression analysis (GWR4 is a Microsoft Windows based
application for calibrating geographically weighted regression models, which can be used to
explore geographically varying relationships between dependent/response variables and
independent/explanatory variables ; see Nakaya, 2012).
The results of fitting the dataset to different GWR descriptive models are shown below. Four
alternatives of GWR modeling were applied considering the four possible combinations
between two different types of kernels (fixed or adaptive) and two different bandwidth
methods (AICc and CV). GWR models 3 and 4 (both models use an adaptive kernel) offered
lower residual squares, meaning that these models provided a better fit to the data. The R²
value of both GWR-models is nearly the same. We chose GWR model 3 with the lowest AICc
value to provide an exploratory analysis of the data.
                                GWR model 1   GWR model 2   GWR model 3   GWR model 4
Kernel                          fixed         fixed         adaptive      adaptive
Bandwidth method                AICc          CV            AICc          CV
Adjusted R²                     0,628         0,570         0,633         0,639
Residual squares                342571,621    444329,261    337812,559    323290,833
AICc                            5584,352      5630,776      5578,562      5581,763
ANOVA test residuals OLS/GWR    p < 0,01      p < 0,01      p < 0,01      p < 0,01
GWR models applied to dataset of Belgian municipalities
Results reveal that the GWR model exhibits a significant improvement in explained variance as
compared to the OLS regression model (63,3% vs. 54,2%). The AICc score for the GWR model
(5578.562) is substantially lower than the AIC score for the global OLS model (5657.654), which
reflects a better goodness of fit (AIC is a measure of relative model quality that balances
goodness of fit against model complexity. The lower its value, the better the fit of the model to
the observed data).
Another method to evaluate the GWR model is the ANOVA test which verifies the null
hypothesis that the GWR model represents no improvement over the global model. The
computed F-value of 2.753 is in excess of the critical value of F (2.41 ; α = 0.01) with 10 and 496
degrees of freedom. The ANOVA test thus suggests that the GWR model is a significant
improvement on the global model for the data of Belgian municipalities.
The results obtained by the GWR method provide information about locally differing estimation
coefficients. Therefore, the GWR results do not report a global estimate for each explanatory
variable but rather they provide insights into local ranges of the estimates (minimum, 25%
quantile, median, 75% quantile and maximum). The 5-number summary (see the table below) is
helpful to get a feel of the degree of spatial nonstationarity in a relationship by comparing the
range of the local parameter estimates with a confidence interval around the global estimate of
the parameter. This is accomplished by dividing the interquartile range of the GWR coefficient
by twice the standard error of the same variable from the global regression (OLS). Ratio values
> 1 suggest nonstationarity in the relationship between an independent variable and the
dependent variable.
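The ratio described above is simple arithmetic and can be sketched in base R. The quartiles in the example are the GWR quartiles reported for ln(unemployment) in the 5-number parameter summary; the global standard error of 0.05 is a hypothetical placeholder, because the OLS standard errors are not listed in the text, so the resulting number is illustrative only.

```r
# Ratio of the interquartile range of the local (GWR) estimates to twice the
# standard error of the global (OLS) estimate. Values above 1 suggest
# nonstationarity.
stationarity_ratio <- function(lower_q, upper_q, global_se) {
  (upper_q - lower_q) / (2 * global_se)
}

lower_q <- 0.216     # lower quartile of the local estimates, ln(unemployment)
upper_q <- 0.444     # upper quartile of the local estimates, ln(unemployment)
se_assumed <- 0.05   # HYPOTHETICAL global standard error, for illustration only

ratio <- stationarity_ratio(lower_q, upper_q, se_assumed)
ratio  # (0.444 - 0.216) / (2 * 0.05) = 2.28 under the assumed standard error
```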
The results of the Monte Carlo test indicate that the parameter estimates do vary significantly
across space. As shown on the map of local R² values, the total variance explained by the local
model ranges from 47,8% to 83,4%. In general, there is a north-south divide with higher R²
values in the northern part of the country. Explained variance is lowest in the southern part of
the province of East Flanders and its surrounding municipalities in Wallonia.
                                    5-number parameter summary                                     Monte Carlo test
                                    minimum    lower quartile   median    upper quartile   maximum   status                   significance
Intercept                           -513,965   -28,093          210,299   361,918          512,299   non-stationary           p < 0.001
ln(Gini inequality)                 -0,706     0,285            1,062     1,439            2,529     non-stationary           p < 0.001
mean income                         -0,010     -0,006           -0,004    -0,002           0,001     non-stationary           p < 0.001
ln(unemployment)                    -0,012     0,216            0,325     0,444            0,720     non-stationary           p < 0.001
ln(population density)              -0,057     0,006            0,056     0,132            0,227     non-stationary           p < 0.001
% male in age group 15-64           -0,100     0,019            0,044     0,065            0,129     non-stationary           p < 0.001
% 15-24 in population               -0,119     -0,058           -0,011    0,022            0,087     non-stationary           p < 0.001
ln(% foreign born)                  -0,119     -0,009           0,036     0,083            0,167     non-stationary           p < 0.001
ln(% non-Euro foreign)              -0,015     0,076            0,106     0,147            0,319     no spatial variability   p < 0.001
female labour force participation   0,008      0,038            0,048     0,064            0,096     non-stationary           p < 0.001
Geographically weighted regression 5-number parameter summary results and
Monte Carlo significance test for spatial variability of parameters
(Belgian municipalities, N = 589)
Local R² values of the GWR model (Belgian municipalities, N = 589)
5.1. Visualising GWR results
To better understand and interpret nonstationarity in individual parameters it is necessary to
visualize the local parameter estimates and their associated diagnostics. The output of a GWR
analysis includes data that can be used to generate surfaces for each model parameter that can
be mapped, where each surface depicts the spatial variation of the relationship between a
predictor and the outcome variable.
A challenge in GWR analysis is to visually represent the large number of results through the use
of cartographic design. Mapping only the parameter estimates is misleading, as the map reader
has no way of knowing whether the local parameter estimates are significant. As Mennis
(2006: 172) notes, a main issue is that “the spatial distribution of the parameter estimates must be
presented in concert with the distribution of significance, as indicated by the t-value, in order to
yield meaningful interpretation of results”.
There are several possibilities. The most popular and easiest way to visualise the results of
GWR is to make use of choropleth maps and colour the regions according to the values of
parameter estimates or the associated t-values in order to interpret the significance of the
parameters. Because the patterns of t-values for the parameter estimates are important to
reveal which areas have statistically significant estimates, we initially mapped the t-values for all
variables (see appendix).
Another possibility to map the GWR results is to create raster surfaces for both the parameter
estimates and the t-values. Interpolation methods, e.g. inverse distance weighting (IDW) and
ordinary kriging (OK), are applied in spatial interpolation from point measurements to
continuous surfaces. Both IDW and OK estimate values at unmeasured points by the weighted
average of observed data at surrounding points. The weight of each measured value is a
function of its distance from the point we are trying to predict. The difference between both
methods is that in IDW the weights follow from a power parameter chosen by the analyst, while
in OK the weights are estimated from the data itself (see note 1).
____________________________________________________________________________
1. See Dorman (2014, chapter 8) for an extensive explanation of spatial interpolation.
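To make the IDW weighting concrete, here is a minimal base R sketch in which each observed value is weighted by 1 / distance^idp. It is a stand-in for illustration, not the gstat implementation: the function name idw_predict and the two toy observations are assumptions.

```r
# Inverse distance weighting: prediction = weighted average of observed values,
# with weights 1 / d^idp (idp is the inverse distance power).
idw_predict <- function(obs_xy, obs_z, new_xy, idp = 2) {
  apply(new_xy, 1, function(p) {
    d <- sqrt((obs_xy[, 1] - p[1])^2 + (obs_xy[, 2] - p[2])^2)
    if (any(d == 0)) return(mean(obs_z[d == 0]))  # exactly on an observed point
    w <- 1 / d^idp
    sum(w * obs_z) / sum(w)
  })
}

# Toy data (NOT the study data): two observations with values 0 and 10.
obs_xy <- rbind(c(0, 0), c(1, 0))
obs_z <- c(0, 10)
target <- matrix(c(0.25, 0), nrow = 1)  # closer to the first observation

p1 <- idw_predict(obs_xy, obs_z, target, idp = 1)
p3 <- idw_predict(obs_xy, obs_z, target, idp = 3)
c(idp1 = p1, idp3 = p3)  # a higher power pulls the prediction towards the nearest point
```

With a higher power, distant observations lose influence faster, so the surface follows the nearest observations more closely.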
To create raster surfaces of estimated coefficients and local t-values for each of the
parameters, we use R’s gstat package and the gstat function that lets us create a spatial
prediction model. The latter is then applied to a grid that represents the area we are working
with, to yield a new raster with predicted values (this new raster is obtained through the use of
the interpolate function in R’s raster package). For IDW, we created prediction models with
IDP-parameters set to 1, 2 and 3. A low IDP-parameter (1) results in a smoother surface while
higher values result in sharper boundaries. For OK, a model is automatically created through
the use of the autofitVariogram function of the automap package in R.
To evaluate the predictive ability of the interpolation models the process of cross-validation is
used to compare the predicted values to the observed ones. The raster surfaces with the
lowest root mean square error (RMSE) are finally chosen for the visual representation of
parameter estimates and t-values (see note 1).
_____________________________________________________________________________________________
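1. R code for the various steps to construct a raster surface (belgie is assumed to be a
SpatialPolygonsDataFrame of the Belgian municipalities that has already been loaded) :
library(sp)      # coordinates(), bbox(), proj4string()
library(raster)  # raster(), interpolate(), mask()
library(gstat)   # gstat(), gstat.cv()
# GWR parameter estimates for a variable (e.g. income) ; the file is assumed to
# contain the columns id and inc
gwr_income <- read.csv("parameter_estimates_income.csv", header = TRUE, sep = ";")
# extract centroids of municipalities
mun.centroids <- data.frame(coordinates(belgie), belgie@data$ID_4)
names(mun.centroids) <- c("x", "y", "id")
# add centroid coordinates to data
gwr_income <- merge(gwr_income, mun.centroids, by = "id")
# make datafile a SpatialPointsDataFrame with the projection of the polygons
coordinates(gwr_income) <- ~ x + y
proj4string(gwr_income) <- CRS(proj4string(belgie))
# create grid (grid and datafile must have the same projection)
r <- raster(nrow = 500, ncol = 500,
            xmn = bbox(belgie)["x", "min"], xmx = bbox(belgie)["x", "max"],
            ymn = bbox(belgie)["y", "min"], ymx = bbox(belgie)["y", "max"],
            crs = proj4string(belgie))
# model creation with gstat (IDW with inverse distance power 3)
model <- gstat(formula = inc ~ 1, data = gwr_income, set = list(idp = 3))
print(model)
# interpolate to the grid and clip the surface to the national border
z <- interpolate(r, model)
z <- mask(z, belgie)
# cross-validation and root mean square error of the residuals
cv <- gstat.cv(model)
rmse <- function(x) sqrt(mean(x$residual^2))
rmse(cv)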
For a selected parameter, the surface created for the estimated coefficients and the local t-
values can be mapped together. In the map below, the t-values for the income parameter are
added as contour lines on top of the parameter estimate surface. While it is possible for the
reader to distinguish significant parameter estimates from those that are not significant, the
contour lines may not always be easy to interpret.
Overlay of t-values as contour lines on parameter estimate map for income
In order to identify directly zones with significant parameter estimates, it is possible to set up a
mask. Insignificant values (between -1.96 and 1.96) in the raster surface layer of t-values are
set to NA, and subsequently using the mask function removes all values from the parameter
raster layer that are NA in the t-surface layer. This allows the visualisation of only the
significant parameter estimates. We used R’s plotGoogleMaps package to map significant
parameter estimates with a Google Maps background (the resulting html files also allow
interactive exploration of the GWR results). The maps provide strong evidence of significant spatial
heterogeneity in the effect of predictor variables on crime across municipalities (significant
positive parameter estimates are coloured yellow to red while significant negative estimates
are shown in shades of blue).
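The masking step can be sketched with plain matrices standing in for the two raster layers. This is an illustrative assumption: in the analysis the layers are raster objects and the mask function of the raster package is used, whereas the hypothetical helper below only shows the logic on toy values.

```r
# Keep a parameter estimate only where the corresponding t-value is significant,
# i.e. outside the interval (-crit, crit). Everything else becomes NA.
mask_insignificant <- function(est, tval, crit = 1.96) {
  stopifnot(all(dim(est) == dim(tval)))
  est[is.na(tval) | abs(tval) < crit] <- NA
  est
}

# Toy 2 x 2 surfaces (NOT the study data).
est  <- matrix(c(0.5, -0.2, 0.1, 0.8), nrow = 2)
tval <- matrix(c(2.5, -1.0, 0.3, -3.1), nrow = 2)

masked <- mask_insignificant(est, tval)
masked  # only the cells with |t| >= 1.96 keep their estimate
```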
Significant GWR-estimates for Gini inequality (ordinary kriging)
Significant GWR-estimates for income (IDW, β = 3)
Significant GWR-estimates for unemployment (ordinary kriging)
Significant GWR-estimates for population density (ordinary kriging)
Significant GWR-estimates for female labour force participation (IDW, β = 3)
Significant GWR-estimates for % males in age group 15-64 (ordinary kriging)
Significant GWR-estimates for % age group 15-24 in population (ordinary kriging)
Significant GWR-estimates for % foreign born in population (IDW, β = 3)
Significant GWR-estimates for % non-Euro in population (IDW, β = 3)
The results of the geographically weighted regression analysis indicate that spatially varying
processes operate in Belgian municipalities with respect to the relationships between socio-
economic and socio-demographic variables and crime rates.
Several local results are of particular note. First, when we examine the incidence of significant
parameter estimates at the local level, 61 % of all parameter estimates are insignificant (see
the graphs of significant GWR model estimates). With the exception of unemployment and female
labour force participation, the majority of parameter estimates for all other independent
variables and the intercept are insignificant. Positively or negatively signed global effects of
covariates do not hold across all municipalities. This shows that it is important to look beyond
the global level (OLS) and to examine variation at the local level (GWR).
Secondly, the global parameter estimates mask a great deal of variation at the local level. For
example, while the global parameter estimate for unemployment is 0,217, the parameter
estimates at the local level range from -0,012 to 0,720. Where the global estimate for the
percentage of non-Euro foreign born inhabitants is 0,114, the local parameter estimates range
from -0,015 to 0,319.
Finally, insignificant global results mask countervailing positive and negative effects of
covariates at the local level. The negatively signed but insignificant global effect of the
percentage of young people aged 15-24 in the population (age) is significantly negative in
23,2 % of the municipalities, while the effect of this covariate is significantly positive in
a minority (2,9 %) of all municipalities. In a similar way, the positively signed but insignificant
global effect of the percentage of males aged 15-64 in the local population (gender) is
significantly positive in 39,2 % of the municipalities, while the effect of this variable is
significantly negative in 2 % of the municipalities.
GWR model significant estimates
5.2. Cluster analysis
We can further explore the results of the GWR analysis by clustering locations with similar
parameter values for the variables considered. This synthesizes the output that is generated by
the GWR model and can help to interpret the results .
A two-step cluster analysis based on the nine parameter estimates and the intercept was
applied. We experimented with solutions of between 4 and 8 clusters. The optimal choice in
terms of the number of clusters was 6 (municipalities were divided into evenly sized clusters).
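The idea of grouping municipalities by their local coefficients can be sketched as follows. The
analysis above used a two-step cluster procedure; the sketch swaps in plain k-means (Lloyd's
algorithm) on standardized coefficients purely as an illustration, and it is not the procedure
used in this study. The data, the function names and the deterministic start are all assumptions.

```python
import statistics
from collections import Counter

def standardize(rows):
    """Z-score each column so that no coefficient dominates the distance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two coefficient vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iterations=100):
    """Plain k-means with a deterministic start (the first k rows)."""
    if k > len(rows):
        raise ValueError("k cannot exceed the number of rows")
    centers = [list(r) for r in rows[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centers[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels

# Hypothetical local coefficients (two covariates, six municipalities)
coefs = [(0.10, 0.50), (0.70, -0.20), (0.12, 0.48),
         (0.68, -0.22), (0.11, 0.52), (0.72, -0.18)]
z = standardize(coefs)
labels2 = kmeans(z, 2)
print(labels2)

# Inspect cluster sizes for several candidate numbers of clusters
for k in (2, 3):
    print(k, sorted(Counter(kmeans(z, k)).values()))
```

Comparing cluster sizes across candidate values of k mirrors the criterion of evenly sized
clusters mentioned above; with real data the range would be 4 to 8 rather than the 2 to 3 used
for this small example.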
Although latitude and longitude were not included in clustering municipalities’ parameter
estimates, the six clusters are geographically coherent. A discriminant analysis with cluster
membership as the dependent variable and both lat/lon-coordinates as predictors shows that
70,6% of the cluster members are correctly classified based on their location, which means that
70,6% of the municipalities are geographically near other members of the same cluster. By
cluster, the percentage of correctly classified members varies from 57,8% to 83,8%.
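The logic of checking geographical coherence by predicting cluster membership from location can
be sketched in a few lines. The sketch uses a nearest-centroid rule on the coordinates, which is
a simplification of discriminant analysis (it ignores the covariance between the predictors) and
is not the procedure used above. The coordinates are hypothetical, are treated as planar, and the
rates are computed in-sample.

```python
from collections import defaultdict

def centroids(points, labels):
    """Mean coordinate pair of each cluster."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), lab in zip(points, labels):
        s = sums[lab]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}

def predict(point, cents):
    """Assign a point to the cluster with the nearest centroid."""
    x, y = point
    return min(cents, key=lambda lab: (x - cents[lab][0]) ** 2
                                      + (y - cents[lab][1]) ** 2)

def classification_rates(points, labels):
    """Overall and per-cluster percentage of correctly classified points."""
    cents = centroids(points, labels)
    hits = defaultdict(int)
    totals = defaultdict(int)
    for p, lab in zip(points, labels):
        totals[lab] += 1
        if predict(p, cents) == lab:
            hits[lab] += 1
    per_cluster = {lab: 100.0 * hits[lab] / totals[lab] for lab in totals}
    overall = 100.0 * sum(hits.values()) / len(labels)
    return overall, per_cluster

# Hypothetical (lon, lat) pairs; the fourth member of cluster "A" lies
# geographically inside cluster "B"
points = [(3.0, 51.0), (3.2, 51.1), (3.1, 50.9), (4.9, 50.4),
          (5.0, 50.5), (5.2, 50.4), (5.1, 50.6), (4.8, 50.5)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall, per_cluster = classification_rates(points, labels)
print(overall, per_cluster)
```

A high percentage of correctly classified members indicates that a cluster occupies a compact
area, while a low percentage points to members scattered among other clusters.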
Distribution of t-values within clusters
Although the parameter estimate for non-Euro inhabitants does not vary spatially (see the
five-number summary of the parameter estimates), it is by far the most important predictor of
criminality in cluster 1. In comparison with the other clusters, the effect of the percentage of
males in the age group 15-64 is significant in a large majority of municipalities covered by
cluster 1.
Like cluster 1, cluster 2 represents a contiguous area of municipalities, but its percentage of
correctly classified municipalities (57,8%) is the lowest of all clusters. Within this cluster the
percentage of explained variance differs strongly when moving from west to east (R² between
47,7% and 80,7%).
Cluster 3 covers large parts of Wallonia, where local R² values are relatively low. In cluster 3
as well as in cluster 4 and cluster 6, the parameter estimates for socio-economic variables (Gini
inequality, mean income and unemployment) are significant in respectively 80,6 %, 65,9 % and
80,6 % of the municipalities. In the other clusters, the effect of these variables is significant in
fewer than one third of the municipalities.
Apart from the effect of socio-economic variables, the effect of non-Euro inhabitants on crime is
also significant in a majority of municipalities in cluster 4.
In cluster 5 the local R² values are also relatively low and the estimate of the intercept is
significant. In the east cantons of cluster 5, criminality also correlates significantly, and
independently of other predictors, with population density, and the presence of young people in
the population inhibits criminality in this area.
As stated, in the area that represents cluster 6 (the largest cluster in terms of the number of
municipalities), the measures of inequality are the most significant determinants of crime.
Criminality also varies in an independent way with population density.
6. Conclusions
This study examined the extent of geographic variation in the relationship between
socio-economic and demographic variables on the one hand and crime rates on the other. Its
goals were (i) to compare the performance of global and local spatial regression with OLS
regression, (ii) to examine spatial nonstationarity through the use of GWR, (iii) to map the
parameter coefficients of GWR for further interpretation and (iv) to examine whether there are
spatial groupings of parameter estimates.
The analysis revealed that there is evidence of overall clustering in crime rates in Belgium. Local
spatial analysis uncovered that places with the highest crime rates are often proximate.
The finding of the existence of local spatial autocorrelation in crime rates suggests that failing to
utilize spatially-oriented methodologies may result in biased parameter values in explanatory
models. As far as global models are concerned, this analysis demonstrated that a spatial error
model adds significantly to the understanding and interpretation of spatially varying crime rates.
The use of a GWR model allowed for an assessment of spatial heterogeneity when exploring the
relationships between predictor variables and crime rates by local area. Geographically
weighted estimations provided the best fit to the data. Predictor variables and crime rates that
show strong local variation point to problems that policy makers can best address at the local
level, taking into account the situation in particular areas.
Significant local parameter estimates were found for the predictor variables, confirming spatial
heterogeneity in the effects of these variables on crime and providing insights into the spatial
scale at which processes may be operating. Furthermore, a two-step cluster analysis revealed
distinct zones of spatial effects.
APPENDIX
Spatial mapping of the coefficients from GWR modeling
References
Anselin, L., Spatial regression analysis in R. A workbook, Spatial Analysis Laboratory, Dep. Of Geography,
University of Illinois, Urbana-Champaign, may 2007.
Arnio, A.N. & Baumer, E.P., Demography, foreclosure and crime : Assessing spatial heterogeneity in contemporary
models of neighborhood crime rates, Demographic Research, 26, 2012, pp.449-488.
Cahill, M. & Mulligan, G., Using geographically weighted regression to explore local crime patterns, Social Science
Computer Review, 25, 2007, pp. 174-193.
Dorman, M., Learning R for geospatial analysis, Birmingham, Packt Publishing, 2014.
Fotheringham, A.S., Brunsdon, C. & Charlton, M.E., Geographically weighted regression : The analysis of spatially
varying relationships, Chichester UK, John Wiley & Sons, 2002.
Helbich, M., Leitner, M. & Kapusta, N.D., Geospatial examination of lithium in drinking water and suicide mortality,
International Journal of Health Geography, 2012, pp . 11-19.
Matthews, S.A. & Yang, T.Ch., Mapping the results of local statistics : Using geographically weighted regression,
Demographic Research, 26, 2012, pp . 151-166.
Matthews, S.A. & Parker, D.M., Progression in spatial demography, Demographic Research, 28, 2013, pp . 271-312.
Mennis, J.L., Mapping the results of geographically weighted regression, The Cartographic Journal, 43, 2006, pp.
171-179.
Nakaya, T., GWR4 User Manual, update 7 may 2012.
Shoff, C., Yang, T.CH & Matthews, S.A., What has geography got to do with it ? Using GWR to explore place-specific
associations with prenatal care utilization, Geo Journal, 77, june 2012, pp. 331-341.
Siordia, C., Saenz, J. & Tom, S.E., An introduction to macro-level spatial nonstationarity : A geographically weighted
regression analysis of diabetes and poverty, Journal of Studies and Research in Human Geography, 6, 2012, pp. 5-
13.
Tita, G.E. & Radil, S.M., Making space for theory : The challenges of theorizing space and place for spatial analysis
in criminology, Journal of Quantitative Criminology, 26, 2010, pp. 467-479.
Tobler, W., A computer movie simulating urban growth in the Detroit region, Economic Geography, 46, 1970, pp.
234-240.
Vilalta, C.J., How exactly does place matter in crime analysis ? Place, space and spatial heterogeneity, Journal of
Criminal Justice Education, 2012, pp. 1-26.
Voss, P.R., Long, D.D., Hammer, R.B. & Friedman, S., County child poverty rates in the U.S. : A spatial regression
approach, Population Research Policy Review, 25, 2006, pp. 369-391.
Yamashita, K., Understanding urban fire : Modeling fire incidence using classical and geographically weighted
regression, ProQuest, UMI Dissertation Publishing, 2012.
  • 1. Assessing spatial heterogeneity in crime prediction Using geographically weighted regression to explore local patterns in crime prediction in Belgian municipalities November 2016 Johan Blomme Leenstraat 11 8340 Damme-Sijsele URL : www.johanblomme.com Email : [email protected]
  • 2. 1 Assessingspatialheterogeneityincrimeprediction ASSESSING SPATIAL HETEROGENEITY IN CRIME PREDICTION Using geographically weighted regression to explore local patterns in crime prediction in Belgian municipalities Contents 1. Analytical framework 4 2. Exploratory spatial data analysis 7 3. Global non-spatial regression model 11 4. Global spatial regression model 13 5. Local spatial regression model 15 5.1. Visualising GWR results 18 5.2. Cluster analysis 28 6. Conclusions 36
  • 3. 2 Assessingspatialheterogeneityincrimeprediction Assessing spatial heterogeneity in crime prediction Using geographically weighted regression to explore local patterns in crime prediction in Belgian municipalities Traditional regression analysis describes a modelled relationship between a dependent variable and a set of independent variables. When applied to spatial data, the regression analysis often assumes that the modelled relationship is stationary over space and produces a global model which is supposed to describe the relationship at every location in the study area. This would be misleading, however, if relationships being modelled are intrinsically different across space. One of the spatial statistical methods that attempts to solve this problem and explain local variation in complex relationships is Geographically Weighted Regression (GWR). In a global regression model, the dependent variable is often modelled as a linear combination of independent variables that is stationary over the whole area (i.e. the model returns one value for each parameter). GWR extends this framework by dropping the stationarity assumption : the parameters are assumed to be continuous functions of location. The result of the GWR analysis is a set of continuous localised parameter estimate surfaces, which describe the geography of the parameter space. These estimates are usually mapped or analysed statistically to examine the plausibility of the stationarity assumption of the traditional regression and different possible causes of nonstationarity. The use of linear regression is common in many areas of science. Ordinary linear regression implicitly assumes spatial stationarity of the regression-model that is, the relationships between the variables remain constant over geographical space. We refer to a model in which the parameter estimates for every observation in the sample are identical as a global model.
  • 4. 3 Assessingspatialheterogeneityincrimeprediction Spatial nonstationarity occurs when a relationship (or pattern) that applies in one region does not apply in another. Global models are statements about processes or patterns which are assumed to be stationary and as such are local independent, i.e. are assumed to apply to all locations. In contrast local models are spatial disaggregations of global models, the results of which are location-specific. The template of the model is the same : the model is a linear regression model with certain variables, but the coefficients alter geographically. If the parameter estimates are allowed to vary across the study area such that every observation has its own separate set of parameter estimates we have a local model. GWR does not assume the relationships between independent and dependent variables are constant across space. Instead, GWR explores whether the relationships between a set of predictors and an outcome vary by geographical location. GWR is suggested to be a powerful tool for investigating spatial nonstationarity in the relationship between predictors and the outcome variable. Theoretically, spatial nonstationarity is based on the concept of the social construction of space. The interaction between individuals with each other and their physical environment produces space. Human beings are just as much spatial as temporaral beings. By temporal, we mean that we are most influenced by what is immediate in space. What happens near us matters more than non-proximal events. Human’s spatiality and temporality are essential and equal powerful in explaining human behavior. Consequently, everything that is social is inherently spatial, just as everything spatial is inherently socialized. From this perspective, we analyse how the macro-level relationship between crime and various socio-economic and demographic variables unfolds over geographical space.
  • 5. 1. Analytical framework
Our analysis strategy entails estimating regression models that summarize the “global”, or average, effects of the predictor variables on crime rates across our sample of Belgian municipalities. Given the well-known spatial autocorrelation evident in crime data, we generate the global models using Ordinary Least Squares (OLS) and Spatial AutoRegression (SAR) estimators. OLS and the spatial autoregression model are “global” models in the sense that they both assume that a single set of parameters sufficiently describes the relationships between predictor variables and crime rates.
The classical ordinary least squares (OLS) model is widely used to model the global relationship between a response variable and one or more explanatory variables. OLS assumes, among other things, that residuals are spatially independent. Residual autocorrelation captures unexplained similarities between neighboring municipalities, which can be the result of omitted variables or a misspecification of the regression model. Assuming a global model does exist, an exploration of spatial patterns in the data can help determine whether a global model is misspecified: whether the model is missing important predictor variables (spatial error model) or whether a spatial term should be included in the model (spatial lag model). Either correction would improve the accuracy of the global model in explaining crime levels across the study area.
Global models that account for spatial effects are spatial autoregressive models (SAR). The spatial error model addresses the presence of spatial autocorrelation by defining a spatial autoregressive process for the error term and, by doing so, captures unexplained similarities. The spatial lag model extends the standard OLS regression model by including a spatially lagged dependent variable, whose effect can mostly be interpreted as a spill-over effect.
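The two spatial autoregressive specifications described above are commonly written as follows. This is a sketch in standard textbook notation, not taken from the source: W is assumed to be a row-standardised spatial weights matrix, rho the spatial lag coefficient and lambda the spatial error coefficient.

```latex
% Spatial lag model: the spatially lagged dependent variable Wy enters as a regressor
y = \rho W y + X\beta + \varepsilon

% Spatial error model: the disturbance term follows a spatial autoregressive process
y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

With rho = 0 or lambda = 0 both specifications collapse to the ordinary OLS model, which is why tests on these two coefficients are used to choose between them.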
Global regression models assume a homogeneous behavior of the estimated parameters across space. We expect spatial homogeneity to be rare and assume that most social phenomena are not geographically stationary. One way to deal with spatial heterogeneity is to apply geographically weighted regression (GWR) to investigate spatially varying relationships. GWR models spatial autocorrelation and spatial heterogeneity for subsets of the entire data set. Each subset is established around a regression point, with nearby data points exerting a higher influence than more distant data points. This weighting is often based on a bi-square kernel function. Of crucial importance is the specification of an appropriate bandwidth. The most common choice is the adaptive bandwidth, whose length is allowed to vary across space depending on the density of the data points. Where data points are dense the kernel has a shorter bandwidth, whereas in regions with larger inter-point distances the bandwidth is longer.
  • 6. While it is often argued that GWR is more suitable for exploratory analysis, it can also be used to test whether local models yield a significant improvement in fit over global models.
The following analysis models both spatial autocorrelation and nonstationarity by means of global and local spatial statistical models. An exploratory spatial data analysis, a global non-spatial regression model, a global spatial regression model and finally a local spatial regression model were applied to explore the association between various predictors and crime in Belgian municipalities.
We rely on crime data for municipalities, the main political and administrative unit of the Belgian territory. The dependent variable in this study is the crime rate per 1000 residents (calculated as a mean over the period 2008-2012) in Belgian municipalities (N = 589, source data: statistics Belgian Federal Police). To test the impact of social deprivation on crime, we collected data at municipality level on various indicators of inequality. Besides mean family income and the percentage unemployed, we use the Gini coefficient as a measure of income variation, indicating the distribution of income in each municipality (between the extremes of 0 (absolute equality) and 1 (maximum inequality)).
As control variables we include various socio-demographic indicators : population density, the share of males in the age group 15 to 64, the percentage of young people (15-24) in the population, the percentage of residents that are foreign born, the percentage of non-Euro foreign born residents and the degree of female labour force participation (source data : statistics Federal Government Belgium, 2011).
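The bi-square weighting with an adaptive bandwidth described in the analytical framework can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the GWR4 implementation: the function name `bisquare_adaptive`, the toy coordinates and the choice of k are hypothetical, and the bandwidth is taken as the distance to the k-th nearest neighbour of the regression point.

```r
# Bi-square kernel weights with an adaptive bandwidth (illustrative sketch).
# coords: n x 2 matrix of point coordinates (hypothetical toy data below)
# i:      index of the regression point
# k:      number of nearest neighbours that defines the local bandwidth
bisquare_adaptive <- function(coords, i, k) {
  # Euclidean distance from regression point i to every data point
  d <- sqrt((coords[, 1] - coords[i, 1])^2 + (coords[, 2] - coords[i, 2])^2)
  # adaptive bandwidth: distance to the k-th nearest neighbour
  # (position k + 1 in the sorted distances, because d[i] = 0 comes first)
  h <- sort(d)[k + 1]
  # bi-square weights: (1 - (d/h)^2)^2 inside the bandwidth, 0 at or beyond it
  ifelse(d < h, (1 - (d / h)^2)^2, 0)
}

# five hypothetical points on a line; the last one is far away
coords <- cbind(x = c(0, 1, 2, 3, 10), y = c(0, 0, 0, 0, 0))
w <- bisquare_adaptive(coords, i = 1, k = 3)
print(round(w, 3))
```

Where points are sparse, the k-th neighbour lies further away, so the bandwidth grows automatically; this is the behaviour the adaptive kernel is chosen for.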
  • 7. The original data for the dependent variable and for five of the independent variables are not normally distributed (skewness is marked in red in the table of descriptive statistics). Because normality of the data is a basic assumption for both ordinary least squares regression and spatial regression, natural log values (ln) were used for these variables.
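The effect of a natural log transformation on a right-skewed variable can be checked with a small base R sketch. The values below are hypothetical and only illustrate the idea; `skewness` is a helper defined here (the moment-based coefficient), not a function from the source.

```r
# Moment-based skewness coefficient: third central moment / variance^1.5
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# hypothetical right-skewed values (e.g. a rate with a few very high units)
x <- c(1, 2, 2, 3, 3, 4, 5, 8, 15, 40)

skew_raw <- skewness(x)        # strongly positive for the raw values
skew_log <- skewness(log(x))   # much closer to zero after the ln transform
print(c(raw = skew_raw, log = skew_log))
```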
  • 8. 2. Exploratory spatial data analysis
The first step in an exploratory spatial data analysis (ESDA) is to verify whether spatial data are randomly distributed. To do this, it is necessary to use global autocorrelation statistics. Global indicators of spatial autocorrelation are not capable of identifying statistically significant local patterns of spatial association, such as local spatial clusters or local outliers. To overcome this obstacle, it is necessary to implement a local spatial clustering analysis (LISA). We made use of GeoDa, the open-source spatial regression software of the GeoDa Center for Geospatial Analysis and Computation (http://geodacenter.asu.edu).
A significant Moran’s I statistic is a first clue that parameter estimates in an OLS regression can be affected by spatial residual autocorrelation. For this reason, the Moran’s I statistic was calculated for the dependent variable and the nine independent variables included in this study. The neighborhood relationships for calculating the Moran’s I statistic are defined as first order queen contiguity, which is commonly used (a municipality’s spatial lag is a weighted average of its neighboring localities; neighbors are typically defined in terms of their physical proximity to the local geographic unit). Results indicate that the dependent variable and all independent variables exhibit significant positive spatial autocorrelation. The hypothesis of spatial randomness is clearly rejected. A positive and significant spatial dependence in the dependent variable (crime rate) indicates that the crime rate in a particular municipality is associated with (not independent of) crime rates in surrounding municipalities.
The value of the spatial autocorrelation coefficient (0,297) indicates that a 10 percentage point increase in the crime rate in a municipality is associated with an increase of nearly 3% in the crime rate in a neighboring municipality. This, together with the results of the LISA cluster analysis, is evidence of significant spillover effects between municipalities with respect to crime, and implies that there is a need for coordination of municipal efforts to fight criminal activities that spill over municipal borders.
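The global Moran's I statistic used above can be computed directly from its definition. The following base R sketch uses a hypothetical toy layout (four units on a line, each neighbouring the next) rather than the queen contiguity matrix of the 589 municipalities, and it omits the permutation-based significance test that GeoDa reports.

```r
# Global Moran's I for a variable x and a spatial weights matrix W
morans_i <- function(x, W) {
  n <- length(x)
  z <- x - mean(x)                       # deviations from the mean
  (n / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# hypothetical toy layout: 4 units on a line (1-2-3-4), binary contiguity
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1
W <- W / rowSums(W)                      # row-standardise the weights

x <- c(1, 2, 3, 4)                       # values increase smoothly along the line
lag_x <- as.vector(W %*% x)              # spatial lag: weighted mean of the neighbours
I <- morans_i(x, W)
print(I)
```

Similar values sit next to each other in this toy example, so the statistic is positive, which is the pattern the text reports for crime rates.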
  • 9. Figure: Prevalence of crime in Belgian municipalities (N = 589) (crime rate per 1000 inhabitants; crime rate percentiles)
Table: Global Moran’s I statistic for variables included in this study
  • 10. Figure: Cartograms of the geographical distribution of independent variables
  • 11. Figure: LISA cluster map for criminality in Belgian municipalities, N = 589
  • 12. 3. Global non-spatial regression model
Exploring the relationship between the independent variables and crime rates starts with a multivariate OLS regression model. None of the correlations between the predictors is high enough to raise a major concern about multicollinearity. Nevertheless, we evaluated the diagnostics to assess the issue of multicollinearity more formally. In particular, Variance Inflation Factors (VIFs) were investigated. Since all VIF scores are below the critical value of 5, we conclude that multicollinearity is not a problem¹.
Results show that the nine predictors explain about 54,2% of the variance in crime rates. Of those, the variables representing the percentage of males in the age group 15-64, the percentage of the age group 15-24 in the population and the percentage of foreign born residents do not contribute significantly to the explanation of the variability in crime rates between municipalities².
A more detailed analysis of the residuals reveals that they are not normally distributed (Jarque-Bera test = 410.059; p < 0.001) but not heteroscedastic (Koenker-Bassett test = 14.115; p = 0.118). Finally, residual independence is tested by means of the Moran’s I statistic. This test shows significant spatial residual autocorrelation (Moran’s I = 0.155; p < 0.001), violating the model’s independence assumption. This residual pattern in the OLS model can be the result of existing spatial effects and can be accounted for by means of a spatial regression model.
______________________________________________________________________________
¹ Collinearity diagnostics were estimated using SPSS Base Statistics and no problems of multicollinearity were found among the independent variables. The collinearity diagnostics used were the variance inflation factors (VIF) and tolerances for individual variables. Multicollinearity is said to exist if the VIF is 5 or higher (or, equivalently, if the tolerance is 0,20 or less). The highest VIF value in this analysis was 4,852 and the lowest tolerance was 0,206, both for mean income.
² Initially, two dummy variables representing the regions in Belgium were added to the regression equation. However, VIF scores indicated the presence of multicollinearity. Therefore, these dummy variables were not retained in the OLS regression.
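The VIF and tolerance diagnostics mentioned in the footnote follow from auxiliary regressions: each predictor is regressed on all the others, and VIF = 1 / (1 - R²), with tolerance = 1 / VIF. The source used SPSS; the base R sketch below is an assumed equivalent on simulated data, and the helper name `vif` and the variables `x1` to `x3` are hypothetical.

```r
# Variance inflation factors from auxiliary regressions (base R only)
vif <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    # R-squared of predictor v regressed on all other predictors
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# simulated predictors: x2 is built to be strongly collinear with x1
set.seed(1)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.3)
x3 <- rnorm(200)

v <- vif(data.frame(x1, x2, x3))
tolerance <- 1 / v
print(round(rbind(VIF = v, tolerance = tolerance), 3))
```

With the threshold used in the text, `x1` and `x2` would be flagged (VIF above 5, tolerance below 0,20) while `x3` would not.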
  • 14. 4. Global spatial regression model
The clustering of crime rates indicates that the data are not randomly distributed, but instead follow a systematic pattern. The spatial clustering of variables, and the possibility of omitted variables that relate to the connectivity of neighboring localities, raise model specification issues. Evidence for the latter also comes from the residual autocorrelation present in the OLS model.
We employ two alternative specifications to correct for spatial dependence. One is the spatial lag model. This specification is relevant when the spatial dependence works through a spatial lag of the dependent variable. The other specification is the spatial error model. This specification is relevant when the spatial dependence works through the disturbance term (spatial regression models were estimated with GeoDa, the regression software of the GeoDa Center for Geospatial Analysis and Computation, http://geodacenter.asu.edu).
The value of the LM-lag test is only weakly significant (LM-lag = 3.598; p < 0.1), but the results of the LM-error test (56.900; p < 0.001) suggest that a spatial error term must be considered in the global spatial regression model. The results from the spatial lag model, shown in the table on the next page, suggest that this model does not perform as well as the spatial error model. The effect of the spatial lag term is statistically weak (rho = 0.084; p = 0.101). The robust Lagrange Multiplier (LM) test also recommends the use of the spatial error model, and the lower AIC value combined with the higher R² value for the spatial error model signals that this model outperforms the spatial lag model. In the spatial error model, all predictor variables except one (the percentage of foreign born residents) yield a statistically significant effect.
  • 15. Table: Global OLS versus global spatial regression models
Based on the results of the global spatial regression model it is difficult to defend similarities in municipality-level crime as arising from imitation of one’s neighbors, that is, a spatial lag process. Criminality results from a complex mix of social, economic and cultural factors, only a small number of which can be brought into a statistical model of the process. Much of it remains unaccounted for and is summarized in the model’s error term. Although the very small Moran’s I value (-0.022) associated with the spatial error model indicates that little residual spatial autocorrelation remains, the residuals still violate the assumption of constant variance (Breusch-Pagan test for heteroscedasticity = 54.060; p < 0.001), which may point to spatial heterogeneity that a global model cannot capture.
  • 16. 5. Local spatial regression model
A global regression model carries the assumption that the processes being modeled are uniform throughout the study area: the relationships between the dependent and the independent variables remain stationary (constant) across the entire study area of Belgium. Local spatial regression models take nonstationarity into account. We use GWR4 to perform the geographically weighted regression analysis (GWR4 is a Microsoft Windows-based application for calibrating geographically weighted regression models, which can be used to explore geographically varying relationships between dependent/response variables and independent/explanatory variables; see Nakaya, 2012).
The results of fitting the dataset to different GWR models are shown below. Four GWR models were estimated, covering the four possible combinations of two types of kernel (fixed or adaptive) and two bandwidth selection methods (AICc and CV). GWR models 3 and 4 (both use an adaptive kernel) have a lower residual sum of squares, meaning that these models provide a better fit to the data. The R² value of both models is nearly the same. We chose GWR model 3, which has the lowest AICc value, for an exploratory analysis of the data.
                                 GWR model 1   GWR model 2   GWR model 3   GWR model 4
Kernel                           fixed         fixed         adaptive      adaptive
Bandwidth method                 AICc          CV            AICc          CV
Adjusted R²                      0,628         0,570         0,633         0,639
Residual sum of squares          342571,621    444329,261    337812,559    323290,833
AICc                             5584,352      5630,776      5578,562      5581,763
ANOVA test residuals OLS/GWR     p < 0,01      p < 0,01      p < 0,01      p < 0,01
Table: GWR models applied to the dataset of Belgian municipalities
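The model choice described above (lowest AICc among the four candidates) can be reproduced from the reported figures. The sketch below only re-enters the values of the table in a data frame; the column names are assumptions made for this illustration.

```r
# Reported fit statistics of the four GWR models (values from the table above)
models <- data.frame(
  model  = paste("GWR model", 1:4),
  kernel = c("fixed", "fixed", "adaptive", "adaptive"),
  method = c("AICc", "CV", "AICc", "CV"),
  adj_r2 = c(0.628, 0.570, 0.633, 0.639),
  rss    = c(342571.621, 444329.261, 337812.559, 323290.833),
  aicc   = c(5584.352, 5630.776, 5578.562, 5581.763),
  stringsAsFactors = FALSE
)

# difference of each model's AICc from the best (lowest) AICc
models$delta_aicc <- models$aicc - min(models$aicc)

best <- models$model[which.min(models$aicc)]
print(models[, c("model", "kernel", "method", "aicc", "delta_aicc")])
print(best)
```

Model 4 has a slightly lower residual sum of squares, but the AICc also penalises model complexity, which is why model 3 comes out ahead on that criterion.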
  • 17. Results reveal that the GWR model exhibits a significant improvement in explained variance compared to the OLS regression model (63,3% vs. 54,2%). The AIC score for the GWR model (5578.562) is substantially lower than the AIC score for the global OLS model (5657.654), which reflects a better goodness of fit (AIC is a measure of relative model quality that balances goodness of fit against model complexity; the lower its value, the better the model describes the observed data). Another method to evaluate the GWR model is the ANOVA test, which verifies the null hypothesis that the GWR model represents no improvement over the global model. The computed F-value of 2.753 exceeds the critical value of F (2.41; α = 0.01) with 10 and 496 degrees of freedom. The ANOVA test thus suggests that the GWR model is a significant improvement on the global model for the data of Belgian municipalities.
The results obtained by the GWR method provide information about locally differing coefficient estimates. Therefore, the GWR results do not report a global estimate for each explanatory variable; rather, they provide insights into the local range of the estimates (minimum, 25% quantile, median, 75% quantile and maximum). The 5-number parameter summary is helpful to get a feel for the degree of spatial nonstationarity in a relationship by comparing the range of the local parameter estimates with a confidence interval around the global estimate of the parameter. This is accomplished by dividing the interquartile range of the GWR coefficient by twice the standard error of the same variable from the global regression (OLS). Ratio values > 1 suggest nonstationarity in the relationship between an independent variable and the dependent variable. The results of the Monte Carlo test indicate that the parameter estimates do vary significantly across space.
As shown on the map on the next page, the total variance explained by the local model ranges from 47,8% to 83,4%. In general, there is a north-south divide with higher R² values in the northern part of the country. Explained variance is lowest in the southern part of the province of East Flanders and its surrounding municipalities in Wallonia.
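The informal stationarity check described above (interquartile range of the local estimates divided by twice the global standard error) is simple arithmetic. In the base R sketch below the quartiles are the reported values for ln(unemployment); the global standard error of 0.05 is a hypothetical placeholder, because the source does not report it, and the function name is an assumption.

```r
# Ratio of the interquartile range of local GWR estimates
# to twice the standard error of the global (OLS) estimate
stationarity_ratio <- function(lower_q, upper_q, global_se) {
  (upper_q - lower_q) / (2 * global_se)
}

# quartiles of the local ln(unemployment) estimates as reported;
# global_se = 0.05 is a HYPOTHETICAL value used only for illustration
ratio <- stationarity_ratio(lower_q = 0.216, upper_q = 0.444, global_se = 0.05)
print(ratio)

# values above 1 suggest nonstationarity
nonstationary <- ratio > 1
```

Under this assumed standard error the ratio is well above 1, so the check would point to a nonstationary relationship; with the actual OLS standard error the number would differ.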
  • 18. Geographically weighted regression: 5-number parameter summary and Monte Carlo significance test for spatial variability of parameters (Belgian municipalities, N = 589)
                                     5-number parameter summary                               Monte Carlo test
Variable                             minimum    lower quartile  median    upper quartile  maximum   status                  significance
Intercept                            -513,965   -28,093         210,299   361,918         512,299   non-stationary          p < 0.001
ln(Gini inequality)                  -0,706     0,285           1,062     1,439           2,529     non-stationary          p < 0.001
mean income                          -0,010     -0,006          -0,004    -0,002          0,001     non-stationary          p < 0.001
ln(unemployment)                     -0,012     0,216           0,325     0,444           0,720     non-stationary          p < 0.001
ln(population density)               -0,057     0,006           0,056     0,132           0,227     non-stationary          p < 0.001
% male in age group 15-64            -0,100     0,019           0,044     0,065           0,129     non-stationary          p < 0.001
% 15-24 in population                -0,119     -0,058          -0,011    0,022           0,087     non-stationary          p < 0.001
ln(% foreign born)                   -0,119     -0,009          0,036     0,083           0,167     non-stationary          p < 0.001
ln(% non-Euro foreign)               -0,015     0,076           0,106     0,147           0,319     no spatial variability  p < 0.001
female labour force participation    0,008      0,038           0,048     0,064           0,096     non-stationary          p < 0.001
Figure: Local R² values of the GWR model (Belgian municipalities, N = 589)
  • 19. 5.1. Visualising GWR results
To better understand and interpret nonstationarity in individual parameters it is necessary to visualize the local parameter estimates and their associated diagnostics. The output of a GWR analysis includes data that can be used to generate a mappable surface for each model parameter, where each surface depicts the spatial variation of the relationship between a predictor and the outcome variable.
A challenge in GWR analysis is to represent the large number of results visually through cartographic design. Mapping only the parameter estimates is misleading, as the map reader has no way of knowing whether the local parameter estimates are significant. As Mennis (2006 : 172) notes, a main issue is that “the spatial distribution of the parameter estimates must be presented in concert with the distribution of significance, as indicated by the t-value, in order to yield meaningful interpretation of results”.
There are several possibilities. The most popular and easiest way to visualise the results of GWR is to use choropleth maps and colour the regions according to the values of the parameter estimates or of the associated t-values, so that the significance of the parameters can be interpreted. Because the patterns of t-values for the parameter estimates reveal which areas have statistically significant estimates, we initially mapped the t-values for all variables (see appendix).
Another possibility to map the GWR results is to create raster surfaces for both the parameter estimates and the t-values. Spatial interpolation methods, e.g. inverse distance weighting (IDW) and ordinary kriging (OK), are applied to interpolate from point measurements to continuous surfaces. Both IDW and OK estimate values at unmeasured points as a weighted average of the observed data at surrounding points. The weight of each measured value is a function of its distance from the point we are trying to predict. The difference between both methods is that in IDW the weights are specified in advance by the analyst, while in OK the weights are estimated from the data itself¹.
____________________________________________________________________________
¹ See Dorman (2014, chapter 8) for an extensive explanation of spatial interpolation.
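The weighted-average idea behind IDW can be made concrete with a short base R sketch. This is an illustration on two hypothetical observations, not the gstat implementation used in the study; `idw_predict` is a helper defined here, and `idp` is the inverse distance power that controls how quickly the influence of an observation decays.

```r
# Inverse distance weighting: predict at new points as a weighted mean of
# observed values, with weights 1 / distance^idp
idw_predict <- function(obs_xy, obs_val, new_xy, idp = 2) {
  apply(new_xy, 1, function(p) {
    d <- sqrt((obs_xy[, 1] - p[1])^2 + (obs_xy[, 2] - p[2])^2)
    # a prediction point that coincides with an observation takes its value
    if (any(d == 0)) return(obs_val[which(d == 0)[1]])
    w <- 1 / d^idp
    sum(w * obs_val) / sum(w)
  })
}

# two hypothetical observations on a line and two prediction points
obs_xy  <- cbind(c(0, 2), c(0, 0))
obs_val <- c(10, 20)
new_xy  <- cbind(c(1, 0.5), c(0, 0))

pred_idp2 <- idw_predict(obs_xy, obs_val, new_xy, idp = 2)
pred_idp3 <- idw_predict(obs_xy, obs_val, new_xy, idp = 3)
print(rbind(idp2 = pred_idp2, idp3 = pred_idp3))
```

At the midpoint both observations count equally; nearer to the first observation a higher power pulls the prediction more strongly towards that observation, which is why larger powers give sharper boundaries.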
  • 20. To create raster surfaces of estimated coefficients and local t-values for each of the parameters, we use R’s gstat package and its gstat function, which lets us create a spatial prediction model. The latter is then applied to a grid that represents the area we are working with, to yield a new raster with predicted values (this new raster is obtained with the interpolate function in R’s raster package). For IDW, we created prediction models with the inverse distance power (IDP) parameter set to 1, 2 and 3. A low IDP parameter (1) results in a smoother surface, while higher values result in sharper boundaries. For OK, a model is created automatically with the autofitVariogram function of the automap package in R. To evaluate the predictive ability of the interpolation models, cross-validation is used to compare the predicted values to the observed ones. The raster surfaces with the lowest root mean square error (RMSE) are finally chosen for the visual representation of parameter estimates and t-values¹.
_____________________________________________________________________________________________
¹ R code for the various steps to construct a raster surface:
    # packages used below: sp (coordinates, bbox, proj4string), raster, gstat
    library(sp)
    library(raster)
    library(gstat)
    # GWR parameter estimates for a variable (e.g. income)
    gwr_income <- read.csv("parameter_estimates_income.csv", header = TRUE, sep = ";")
    # extract centroids of municipalities (belgie is the municipality polygon layer)
    mun.centroids <- data.frame(coordinates(belgie), belgie@data$ID_4)
    names(mun.centroids) <- c("lon", "lat", "id")
    # add lat lon to data (after the merge, lon and lat are columns 3 and 4)
    gwr_income <- merge(gwr_income, mun.centroids, by = "id")
    names(gwr_income)[3] <- "x"
    names(gwr_income)[4] <- "y"
    # make datafile a SpatialPointsDataFrame
    coordinates(gwr_income) <- ~ x + y
    # create grid (grid and datafile must have the same projection)
    r <- raster(nrow = 500, ncol = 500,
                xmn = bbox(belgie)["x", "min"], xmx = bbox(belgie)["x", "max"],
                ymn = bbox(belgie)["y", "min"], ymx = bbox(belgie)["y", "max"],
                crs = proj4string(gwr_income))
    # model creation with gstat (IDW with inverse distance power 3)
    model <- gstat(formula = inc ~ 1, data = gwr_income, set = list(idp = 3))
    print(model)
    # predict on the grid and clip the surface to the Belgian territory
    z <- interpolate(r, model)
    z <- mask(z, belgie)
    # cross-validation: root mean square error of the residuals
    cv <- gstat.cv(model)
    rmse <- function(x) sqrt(mean(x$residual^2))
    rmse(cv)
  • 21. For a selected parameter, the surfaces created for the estimated coefficients and the local t-values can be mapped together. In the map below, the t-values for the income parameter are added as contour lines on top of the parameter estimate surface. While it is possible for the reader to distinguish significant parameter estimates from those that are not significant, the contour lines may not always be easy to interpret.
Figure: Overlay of t-values as contour lines on the parameter estimate map for income
In order to identify zones with significant parameter estimates directly, it is possible to set up a mask. Insignificant values (between -1.96 and 1.96) in the raster surface layer of t-values are set to NA; the mask function then removes all values from the parameter raster layer that are NA in the t-surface layer. This allows the visualisation of only the significant parameter estimates. We used R’s plotGoogleMaps package to map significant parameter estimates on a Google Maps background (the resulting html files also make it possible to explore the GWR results interactively). The maps provide strong evidence of significant spatial heterogeneity in the effect of predictor variables on crime across municipalities (significant positive parameter estimates are coloured yellow to red, while significant negative estimates are shown in shades of blue).
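The masking step described above can be illustrated without any spatial packages. In the base R sketch below, two small matrices stand in for the parameter raster and the t-value raster; all values are hypothetical, and the study itself applied the same logic to raster layers with the mask function of the raster package.

```r
# hypothetical 2 x 2 "rasters": local parameter estimates and their t-values
est  <- matrix(c(0.5, -0.2, 0.1, 0.8), nrow = 2)
tval <- matrix(c(2.5, -1.0, 0.3, -3.1), nrow = 2)

# keep only estimates whose t-value lies outside (-1.96, 1.96)
sig_est <- est
sig_est[abs(tval) < 1.96] <- NA

print(sig_est)
```

Cells set to NA are simply not drawn, so the resulting map shows coefficients only where they are statistically significant at the 5% level.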
  • 22. Figure: Significant GWR-estimates for Gini inequality (ordinary kriging)
Figure: Significant GWR-estimates for income (IDW, β = 3)
  • 23. Figure: Significant GWR-estimates for unemployment (ordinary kriging)
Figure: Significant GWR-estimates for population density (ordinary kriging)
  • 24. Figure: Significant GWR-estimates for female labour force participation (IDW, β = 3)
Figure: Significant GWR-estimates for % males in age group 15-64 (ordinary kriging)
  • 25. Figure: Significant GWR-estimates for % age group 15-24 in population (ordinary kriging)
Figure: Significant GWR-estimates for % foreign born in population (IDW, β = 3)
  • 26. Figure: Significant GWR-estimates for % non-Euro in population (IDW, β = 3)
The results of the geographically weighted regression analysis indicate that spatially varying processes operate in Belgian municipalities with respect to the relationships between socio-economic and socio-demographic variables and crime rates. Several local results are of particular note.
First, when we examine the incidence of significant parameter estimates at the local level, 61 % of all parameter estimates are insignificant (see graphs on pages 25-26). With the exception of unemployment and female labour force participation, the majority of parameter estimates for all other independent variables and the intercept are insignificant. Positively or negatively signed global effects of covariates do not hold across all municipalities. This shows that it is important to look beyond the global level (OLS) and to examine variation at the local level (GWR).
Secondly, the global parameter estimates mask a great deal of variation at the local level. For example, while the global parameter estimate for unemployment is 0,217, the parameter estimates at the local level range from -0,012 to 0,720. Where the global estimate for the
  • 27. percentage of non-Euro foreign born inhabitants is 0,114, the local parameter estimates range from -0,015 to 0,319.
Finally, insignificant global results mask countervailing positive and negative effects of covariates at the local level. The negatively signed but insignificant global effect of the percentage of 15-24 year olds in the population (age) is significantly negative in 23,2 % of the municipalities, while the effect of this covariate reverses to a significantly positive one in a minority (2,9 %) of all municipalities. In a similar way, the positively signed but insignificant global effect of the percentage of males aged 15-64 in the local population (gender) is significantly positive in 39,2 % of the municipalities, while the effect of this variable is significantly negative in 2 % of the municipalities.
Figure: GWR model significant estimates
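Shares such as those quoted above (percentage of municipalities with a significantly positive or negative local estimate) follow from counting local t-values beyond the critical value. The base R sketch below uses ten hypothetical t-values; `sig_share` is a helper defined for this illustration and the 1.96 threshold matches the one used for the maps.

```r
# Percentage of local estimates that are significantly positive,
# significantly negative or insignificant at a given critical t-value
sig_share <- function(tval, crit = 1.96) {
  100 * c(positive      = mean(tval >  crit),
          negative      = mean(tval < -crit),
          insignificant = mean(abs(tval) <= crit))
}

# hypothetical local t-values for one predictor in ten municipalities
tval <- c(2.4, 0.5, -2.2, 3.0, 1.0, -0.3, 2.1, -1.5, 0.0, 2.8)
shares <- sig_share(tval)
print(shares)
```

A predictor whose positive and negative significant shares are both non-zero can have a global coefficient close to zero, which is the masking effect the text describes.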
  • 29. 5.2. Cluster analysis
We can further explore the results of the GWR analysis by clustering locations with similar parameter values for the variables considered. This synthesizes the output generated by the GWR model and can help to interpret the results. A two-step cluster analysis based on the nine parameter estimates and the intercept was applied. We experimented with a number of clusters ranging from 4 to 8. The optimal number of clusters was 6 (municipalities were divided into evenly sized clusters).
  • 30. Although latitude and longitude were not included in clustering the municipalities’ parameter estimates, the six clusters are geographically coherent. A discriminant analysis with cluster membership as the dependent variable and the lat/lon coordinates as predictors shows that 70,6% of the cluster members are correctly classified on the basis of their location, which means that 70,6% of the municipalities are geographically near other members of the same cluster. By cluster, the percentage of correctly classified members varies from 57,8% to 83,8%.
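Grouping municipalities by the similarity of their local coefficients can be sketched with base R. Note the substitution: the study used a two-step cluster analysis, whereas the example below uses k-means (`kmeans` from the stats package) as a simpler stand-in, on simulated coefficients for two hypothetical predictors and three groups.

```r
# simulated local coefficients for 60 hypothetical municipalities
# that fall into three well-separated groups
set.seed(42)
group  <- rep(1:3, each = 20)
coef_a <- rnorm(60, mean = c(0, 5, 10)[group], sd = 0.3)
coef_b <- rnorm(60, mean = c(10, 0, 5)[group], sd = 0.3)

# standardise so that no coefficient dominates the distance measure
est <- scale(cbind(coef_a, coef_b))

# k-means with several random starts (stand-in for the two-step procedure)
km <- kmeans(est, centers = 3, nstart = 20)

# cross-tabulate the simulated groups against the recovered clusters
tab <- table(group, km$cluster)
print(tab)
```

Coordinates are deliberately left out of the clustering input, as in the study, so any geographical coherence of the resulting clusters has to come from the coefficients themselves.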
  • 35. Figure: Distribution of t-values within clusters (cont’d)
Although the parameter estimate for non-Euro inhabitants does not vary spatially (see 5-number parameter summary), it is by far the most important predictor of criminality in cluster 1. In comparison with the other clusters, the effect of the percentage of males in the age group 15-64 is significant in a large majority of the municipalities covered by cluster 1.
Like cluster 1, cluster 2 represents a contiguous area of municipalities, but its percentage of correctly classified municipalities is the lowest of all clusters (57,8%). Within this cluster the percentage of explained variance differs strongly when moving from west to east (R² between 47,7% and 80,7%).
Cluster 3 covers large parts of Wallonia, where local R² values are relatively low. In cluster 3, cluster 4 and cluster 6, the parameter estimates for the socio-economic variables (Gini inequality, mean income and unemployment) are significant in 80,6 %, 65,9 % and 80,6 % of the municipalities respectively. In the other clusters, the effect of these variables is significant in less than one third of the municipalities.
  • 36. Apart from the effect of socio-economic variables, the effect of non-Euro inhabitants on crime is also significant in a majority of municipalities in cluster 4.
In cluster 5 the local R² values are also relatively low and the estimate of the intercept is significant. In the east cantons of cluster 5, criminality also correlates significantly, and independently of other predictors, with population density, while the presence of young people in the population inhibits criminality in this area.
As stated, in the area that represents cluster 6 (the largest cluster in terms of the number of municipalities), the measures of inequality are the most significant determinants of crime. Criminality also varies independently with population density.
  • 37. 6. Conclusions
The objective of this study was to examine the extent of geographic variation in the relationship between socio-economic and demographic variables on the one hand and crime rates on the other. The goals of our study were (i) to compare the performance of global and local spatial regression with OLS regression, (ii) to examine spatial nonstationarity through the use of GWR, (iii) to map the parameter coefficients of GWR for further interpretation and (iv) to examine whether there are spatial groupings of parameter estimates.
The analysis revealed evidence of overall clustering in crime rates in Belgium. Local spatial analysis uncovered that places with the highest crime rates are often proximate. The existence of local spatial autocorrelation in crime rates suggests that failing to use spatially oriented methodologies may result in biased parameter values in explanatory models. As far as global models are concerned, this analysis demonstrated that a spatial error model adds significantly to the understanding and interpretation of spatially varying crime rates.
The use of a GWR model allowed for an assessment of spatial heterogeneity when exploring the relationships between predictor variables and crime rates by local area. Geographically weighted estimations provided the best fit to the data. Predictor variables and crime rates that show strong local variation point to problems that policy makers can best address at the local level, taking the situation in particular areas into account. Significant local parameter estimates were found for the predictor variables, confirming spatial heterogeneity in the effects of these variables on crime and providing insights into the spatial scale at which processes may be operating. Furthermore, a two-step cluster analysis revealed distinct zones of spatial effects.
[Figures: Spatial mapping of the coefficients from GWR modeling (continued)]