
Progress report

Quantitative methods I: Reproducible research and quantitative geography

Chris Brunsdon
National Centre for Geocomputation, Maynooth University, Ireland

Progress in Human Geography 2016, Vol. 40(5) 687–696
© The Author(s) 2015
Reprints and permission: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0309132515599625
phg.sagepub.com

Corresponding author:
Chris Brunsdon, National Centre for Geocomputation, NCG, Maynooth University, Iontas Building, NUI Maynooth, Ireland.
Email: [email protected]

Abstract
Reproducible quantitative research is research that has been documented sufficiently rigorously that a third party can replicate any quantitative results that arise. It is argued here that such a goal is desirable for quantitative human geography, particularly as trends in this area suggest a turn towards the creation of algorithms and codes for simulation and the analysis of Big Data. A number of examples of good practice in this area are considered, spanning a time period from the late 1970s to the present day. Following this, practical aspects such as tools that enable research to be made reproducible are discussed, and some beneficial side effects of adopting the practice are identified. The paper concludes by considering some of the challenges faced by quantitative geographers aspiring to publish reproducible research.

Keywords
Big Data, computational paradigm, geocomputation, programming, reproducibility

I Reproducibility in research

A great deal of practical quantitative work in human geography relies on the analysis of data – and it is often the case that published results are the final exposition of a great deal of behind-the-scenes data collation, re-formatting, coding, statistical modelling and visualization. It might be said that although published articles in this area exist to outline underlying questions and draw conclusions from the data analysis, the conclusions will depend greatly on the behind-the-scenes work as well. This is why those carrying out this work are generally listed as authors. However, although the publication itself is a platform for discourse and debate around its content, it is sometimes harder to incorporate the behind-the-scenes activities into such debate, despite the fact that they too can influence conclusions and recommendations.

The term reproducible research (Claerbout, 1992) is used to describe an approach that may be used to address this problem. Although not noted greatly by geographers at the time of writing (but see Brunsdon and Singleton, 2015), it has gained attention in a number of areas where quantitative data analysis is used, for example statistics (Buckheit and Donoho, 1995; Gentleman and Temple Lang, 2004), econometrics (Koenker, 1996) and signal processing (Barni et al., 2007). It is argued here that there is a strong case for a focus on this topic in quantitative geography. The goal of reproducible research is that complete details of any reported results, and of the computation used to obtain them, should be available, so that others following the same procedures and using the same data can obtain identical results.

This article considers the relevance and implications of this for geographical data analysis and GIS. Although the idea was put forward over two decades ago, the need to adopt reproducible practices is more relevant than ever. It has been argued that, in addition to the two 'classical' paradigms of science commonly acknowledged at the time of the Claerbout (1992) paper (Hey et al., 2009; Kitchin, 2014b), two further paradigms are emerging. The two classical paradigms are:

1. Deductive (mathematics and formal logic)
2. Empirical (data collection, statistical model calibration and testing of hypotheses)

In chronological order of emergence, the third, computational paradigm uses algorithmic approaches such as large-scale simulation (for example, agent-based modelling; Heppenstall et al., 2012) as a tool to gain insight into complex systems. Next, a fourth, exploratory paradigm is emerging (Kelling et al., 2009), typified by the use of 'data mining' or, more generally, data-intensive approaches to identify interesting (arguably useful?) patterns in very large and structurally complex data sets. This emergence is in part due to the fact that advanced data collection, measurement and observational technology have made it possible to collect very large (but often 'messy') data sets, while parallel advances in computer technology, such as cloud computing, mean it is possible to process such data sets in efficient ways. Just as the two 'traditional' paradigms interact, there are interactions between all four of the paradigms listed; for example, large-scale simulations are a way of exploring the consequences of certain mathematical assumptions arising from deductive approaches.

One thing linking the newer paradigms is their reliance on computer code (either created by the researcher or a third party) as an enabling technology. In both of the newer paradigms, although important ideas may be articulated in published texts, distinct intellectual contributions are embedded in software code, where the ideas are represented in their most detailed form. Given this, a full critical engagement with researchers working within these paradigms is inhibited if code is not available openly. This is generally the case for quantitative science and social science, and for the digital humanities. Here attention will be focused on the implications for quantitative geography, geocomputation and geographical information science.

II Geographical examples of reproducible research

For geographers, a consideration of the implications of the computational and exploratory paradigms is key in making the case for reproducibility. In terms of the computational paradigm, there is already a long tradition of the use of this approach. Although pre-dating the time when the idea of a computational paradigm in science was common currency, work such as Openshaw and Taylor (1979), exploring the variation in correlation coefficients as areal units change, demonstrates its use impressively. A key idea in that paper is the exploration of this variability, but the comprehensive and accurate record of how it was achieved lies in the underlying FORTRAN code.
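
The flavour of that exploration is easy to convey in a few lines of R. The sketch below is purely illustrative, using synthetic data and an arbitrary random zoning scheme rather than Openshaw and Taylor's actual design, but it shows how a correlation computed from aggregated zones can differ from the individual-level value:

    # Synthetic individual-level data: two weakly related variables
    set.seed(42)
    n <- 1000
    x <- rnorm(n)
    y <- 0.3 * x + rnorm(n)
    cor(x, y)  # correlation at the individual level

    # Aggregate to 10 arbitrary 'zones' and correlate the zone means
    zone <- sample(1:10, n, replace = TRUE)
    cor(tapply(x, zone, mean), tapply(y, zone, mean))

Re-running the last two lines under different zoning schemes yields noticeably different coefficients; it is the systematic exploration of exactly this variability that was recorded in the original FORTRAN code.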

Further examples include those related to microsimulation (Clarke and Holm, 1987). Lovelace and Ballas (2013) modify microsimulation techniques to provide simulations guaranteed to produce integer-based weighting for iterative proportional fitting (Ballas et al., 2005), and again the key ideas are those reflected in code. In this case a fully reproducible approach is taken: in a supplement to the main article by Lovelace and Ballas, a document outlines the technique in detail, incorporating the R code (R Core Team, 2015) used to implement the algorithm.

This enables others to interact with the algorithm specified, and either modify it or apply it in a different situation – one sufficiently similar that the same analytical framework would be meaningful. A similarly open approach is found in Ren and Karimi (2012), who present a fuzzy logic approach to GPS-based wheelchair navigation; here a link is provided to the Java code used to implement their proposed algorithm.

An epidemiological example may be found in Parker and Epstein (2011), which uses agent-based models to simulate disease transmission on a global scale. In discussion, the authors provide a detailed outline of the underlying code and, in particular, provide details (including several code chunks) to assist in reproducing it, again making it possible to understand the underlying model (the key idea embodied in the article) more thoroughly, and to consider the effects of relaxing or modifying the model's assumptions by modifying the code and re-running it.

Other articles, although not providing full reproducibility – as they do not make the exact code used available – do provide very detailed descriptions, so that there is a strong chance that a third party could reproduce the results. Although arguably this implies that full reproducibility is not achieved, papers adopting this approach demonstrate some of the advantages outlined above. For instance, Bergmann (2013) combines quantitative and qualitative approaches to consider global geographies of carbon emissions from a number of perspectives; for the quantitative part, full details of the input-output models are provided, which could be used to reconstruct and run the analyses. A very different paper by Wood et al. (2012) similarly provides a highly detailed computational description – in this case of algorithms for data graphics with the appearance of being hand-drawn. Although the direct code used to produce the results seen in the paper is not shared, an open source library of tools is made available. In terms of the reproducibility of the algorithm discussed, the author of this article was able to recreate it in R, producing, for example, the results shown in Figure 1.

[Figure 1. Map obtained by reproducing the algorithm of Wood et al. (2012).]

III A geographical case for reproducibility

Clearly, this idea is more practical in some areas of study than others, and resources are an important factor: it would not be feasible to re-run an entire census, including data collection, collation and distribution, for example. However, in the area of quantitative human geography (assuming we accept census data 'as seen'!), and particularly in spatial data analysis and GIS, it is a practical proposal in many cases.

The above may be seen as sufficient justification for reproducible research. However, if a more detailed case is to be made, the following scenarios (taken from Brunsdon and Singleton, 2015) help to reinforce the argument:

1. You have a data set that you would like to analyse using the same technique as described in a paper recently published by another researcher in your area. In that paper the technique is outlined in prose form, but no explicit algorithm is given. Although you have access to the data used in the paper, and have attempted to recreate the technique, you are unable to reproduce the results reported there.

2. You published a paper five years ago in which an analytical technique was applied to a data set. You now discover an alternative method of analysis, and wish to compare the results.

3. A particular form of analysis was reported in a paper; subsequently it was discovered that one software package offered an implementation of this method that contained errors. You wish to check whether this affects the findings in the paper.

4. A data set used in a reported analysis was subsequently found to contain rogue data, and has now been corrected. You wish to update the analysis with the newer version of the data.

Articles providing precise verbal descriptions of algorithms are useful in these scenarios – as exemplified in the earlier examples – and this is certainly a great improvement on vaguer descriptions that provide insufficient information to reproduce the initial analyses. However, one could argue that the code itself is a much stronger aid to reproduction, a verbal description being prone both to incorrect interpretation and to the omission of necessary detail. In addition, there is the possibility that the code used in an article may contain an error, so that the precise description is in fact precise only in outlining what the author thinks it does – only the code itself will yield what it actually does. In most cases, the omission of such information is not done with malice aforethought on the part of researchers. Until the issue was raised in the article by Claerbout (1992) and those following, providing such detail was not considered standard practice in many disciplines. Indeed, even some time later, few journals (none in geography, although this could be changing soon) insist that such precise details are provided, and it could perhaps be argued that there is some contributory negligence on their part.

Similarly, although it is usually required that researchers cite the sources of secondary data, such citations often consist of an acknowledgement of the agency that supplied the data, possibly with a link to a general website, rather than an explicit link (or links) to the file (or files) containing the actual data used in the research, or details of any re-formatting of the data (including code) prior to analysis. However, both pieces of information would allow published results to be critically assessed and scrutinized – ultimately leading to more trustworthy research conclusions.

IV The case for reproducible quantitative geography

The above is a general argument for reproducibility. However, one could ask whether it is relevant or practical for applications in quantitative human geography. In terms of relevance, it is worth noting that a great deal of analysis of social and economic data is inherently spatial – whether focusing on the regional, local or street level – and that the results of such analyses are often used to inform policy-makers and are used in decision-making processes. In many cases, the data being analysed is publicly available: for example, the US Census Bureau provides a number of APIs to access official statistics such as economic time series indicators and the decennial census for 1990, 2000 and 2010; the UK provides public access to census and reported crime data; and Ireland provides access to Irish census data. However, not all reports or articles analysing this and other publicly available data provide precise details of the analysis.

There are a number of arguments as to why such information should be provided. The first is a purely academic one: a useful and informed critical discourse around any analytical work can only take place when full details are provided. When the data analysis is a black box, it is difficult either to uphold or to argue against any conclusions reached. One cannot tell whether the underlying models or techniques are appropriate or, even if they are, whether the underlying code or other computational approach faithfully reflects them.

A second argument is one of accountability. Many quantitative studies inform policy decisions by governments and other institutions, and different quantitative analyses with different outcomes could well lead to different policy decisions. Providing information not only about the sources of data used but also about the methods used to analyse the data is a key strategy of open government and democratic decision-making. As suggested earlier, this in turn leads to a more trustworthy approach: although it does not guarantee that an analysis is without error, it provides a mechanism by which the analysis is open to public scrutiny, so that the probability of any error being identified and corrected is notably increased. Also, relating to the earlier point, it means that any assumptions made in the analysis are open to scrutiny, so that public discussion and debate regarding the basis of policy decisions become possible.

A reminder of the relevance of this is provided by the recent controversy surrounding a paper by Reinhart and Rogoff (2010), whose published findings have been widely cited as an argument for fiscal austerity. However, in an article by Herndon, Ash and Pollin (2013), flaws were identified in the data analysis carried out in the paper. Quoting from the abstract of the latter article:

We replicate ... and find that selective exclusion of available data, coding errors and inappropriate weighting of summary statistics lead to serious miscalculations that inaccurately represent the relationship between public debt and GDP growth among 20 advanced economies ... Our overall evidence refutes RR's claim that public debt/GDP ratios above 90% consistently reduce a country's GDP growth. (2013: 1)

This arose after a student, Thomas Herndon, unsuccessfully attempted to reproduce the analysis in Reinhart and Rogoff's paper as a coursework exercise. Investigation unearthed that the analysis was flawed, in part due to an error in an Excel spreadsheet. In this case measures were not taken to ensure reproducibility in the original paper, and it took an amount of forensic computing to discover the problem. Following this, an erratum was published (Reinhart and Rogoff, 2013), although Rogoff and Reinhart have defended their conclusions, if not their original analysis. The debate continues, however, as the authors of the critique continue to challenge a number of assumptions in the corrected analysis.

Putting aside any criticisms I may have of the original paper, the outcome here is perhaps one of cautious optimism, in that an open debate about the underlying analysis is now taking place – albeit after a great deal of public controversy. Again quoting Herndon, Ash and Pollin:

Beyond these strictly analytical considerations, we also believe that the debate generated by our critique of RR has produced some forward progress in the sphere of economic policy making. (2013: 279)

However, a reproducible approach here could have resulted in a smoother path to the final situation of public debate and a resolution of the erroneous analysis. Indeed, the spirit of the exercise set to the student was precisely that of reproducing the published analysis.

V Achieving reproducibility

To address these problems, one proposed approach is that of literate programming (Knuth, 1984). This was initially conceived as a tool for documenting code, where a single file contained both the code and its documentation, and was used to generate both a human-readable document and computer-readable content from which to build the software. The purpose was that the human-readable output provided an explanation of the working of the program (together with neatly printed listings of the code), offering an accessible overview of the program's function. However, such compendium files can also be used in a slightly different way, where rather than describing the code, the human-readable output is an article containing some data analysis performed by the incorporated code. Tabulated results, graphs and maps are created by the embedded code. As before, two operations can be applied to the files: document creation and code extraction. The embedded code is also visible in the original file. Thus information about both the reporting and the processing can be contained in a single document, and if this document is shared then a reproducible analysis (together with the associated discussion) is achieved.

Examples of this approach are the NOWEB system (Ramsey, 1994) and the Sweave and knitr packages (Leisch, 2002; Xie, 2013). The first of these incorporates code into LaTeX documents using two very simple extensions to the markup language. The latter two are extended implementations of this system using R as the language for the embedded code. Knitr also offers the possibility of embedding code into Markdown, a simpler markup language than LaTeX, which facilitates very quick production of reproducible documents. The fact that R is used in the latter two approaches is encouraging for geographers, since R offers a number of packages for spatial analysis, for geographical data manipulation of the kind provided by geographical information systems, and for spatial statistics (Brunsdon and Comber, 2015). Furthermore, as R is open source software, the code underlying any of these packages is also publicly available. Thus it is possible to share not only high-level data analysis operations, but also the code used to build the tools at the higher level.

Another possibility is an approach using Pweave (Pastell, 2014), a similar extension of NOWEB that embeds Python code rather than R. Again, Python offers many tools for geographical data analysis, such as the PySAL package (Rey, 2015).
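
A worked miniature may help here. The following is a sketch of a knitr-compatible R Markdown source file; the file name, data set and variable names are hypothetical, invented purely for illustration:

    ---
    title: "Regional unemployment: a reproducible note"
    output: html_document
    ---

    Mean unemployment rates by region are computed below.

    ```{r}
    # Read the (hypothetical) source data, then average the rate by region
    unemp <- read.csv("unemployment.csv")
    tapply(unemp$rate, unemp$region, mean)
    ```

Processing the file executes the chunk and weaves its output into the human-readable document (for example via knitr::knit()), while knitr::purl() extracts the bare R code: precisely the document creation and code extraction operations described above.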

VI Beneficial side effects

Although much of the justification of a reproducible approach so far has been defensive, the approach also provides a number of benefits. Many of these occur as side effects of using the kinds of approach outlined above. In particular:

- Reproducible analyses can be compared: Different analytical approaches attempting to address the same hypothesis can be compared on the same data set, to assess the robustness of any conclusions drawn. In particular, a third party can take an existing reproducible document and add an alternative analysis to it.

- Methods are documented: One option with many reproducibility tools is to incorporate the code itself – as well as its outputs – in the documents produced. This allows for transparency in the way that results are obtained.

- Methods are portable: Since the code may be extracted from the documents, others may use it and apply it to other data sets, or modify it and combine it with other methods. This allows approaches to be assessed in terms of their generality, and encourages further dialogue around the interpretation of existing data.

- Results may be updated: If updated versions of the data used in an analysis are published (for example, new census data), the methods applied to the old data may be re-applied and the updated results compared to the original ones. Also, if the original data required amendment, an updated analysis could easily be carried out.

- Reports may have greater impact: Recent work has shown that papers in a number of fields that include reproducible analyses have higher impact and visibility; this is discussed in Vandewalle, Kovačević and Vetterli (2009).

VII Challenges

The above sections argue that reproducible approaches offer a number of benefits. However, their adoption requires challenging changes to current practice. Perhaps one of the most notable is that the knitr, Sweave and Pweave approaches all require the use of code to carry out statistical analysis, visualization and data manipulation, rather than commonly adopted GUI-based tools such as Excel. Unfortunately, this is an inherent characteristic of reproducibility: after a series of point-and-click operations, results are cut and pasted into a Word document (or similar), and the link between the reported result and the analytical procedure is lost. It is perhaps no surprise that the Reinhart and Rogoff affair was seeded by an error in Excel.

Despite this, perhaps it is more realistic to consider ways in which the divide between GUI-based tools and reproducibility could be bridged than to propose that such tools be abandoned. One possibility might be to provide GUI-based software in which every interactive event is echoed by a recorded code equivalent; the recorded code could then be embedded in a document. One tool that does this in a web-based interface is Radiant (Radiant News, 2015). However, it is perhaps also worth noting a general turn towards coding and away from GUI solutions in data analysis, as indicated by the popularity of a number of books such as O'Neill and Schutt (2013) and McKinney (2012), suggesting that there is a current wave of practitioners for whom the adoption of coding as a tool for data analysis does not imply a change of culture. Recent attendance at GIS conferences by the author would suggest, at least anecdotally, that these trends are reflected in geocomputation and geographical information science.

Other, more minor practical challenges also exist: for example, how can a sequence of random numbers used in simulations be reproduced? Many of these, however, can be resolved by examples of 'best practice'. In the given example, random sequences may be made reproducible by noting that they are actually pseudo-random, and by specifying the code used to produce them together with the seed value(s).
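
A minimal sketch of this in R (the simulation here is a stand-in for a real model):

    set.seed(20150722)  # the published seed value makes the sequence repeatable
    x <- rnorm(1000)    # pseudo-random inputs to the simulation
    mean(x)             # a third party running this code obtains the identical value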

However, a more significant challenge is created by the so-called 'Data Revolution' (Kitchin, 2014b) and the idea of Big Data, relating to the new exploratory paradigm and its data-intensive search for empirical pattern. Not only referring to the size of data sets, the term Big Data also refers to the diversity of applications, the complexity of the data, and the fact that data is produced in a real-time 'firehose' environment in which sensors and other data-gathering devices stream vast quantities of data every second. This is of importance to geographers applying quantitative techniques, since much of this data has a geographical component. The exploratory paradigm is not without controversy: while the computational paradigm could be viewed as working in co-operation with deductive and empirical approaches, some propose the exploration of Big Data as a superior competitor to theory-led approaches (see Mayer-Schonberger and Cukier, 2013, or Anderson, 2008), suggesting that working with near-universal data sets and identifying pattern supplants the need for theory and experiment. The title of the Anderson piece leaves little doubt as to the magnitude of the claim being made!

However, such boosterish claims have not gone unchallenged – notably, in the discipline of geography, by Miller and Goodchild (2014), who argue, among other things, that there is still a need to understand the nature of the data being used and to discriminate between spurious and meaningful patterns.
Kitchin (2014) warns of the risks of ignoring contextual knowledge in the analysis of Big Data. Although reproducibility in research involving Big Data analysis would not fully address any of these issues, it may be argued that it can provide a foothold. Giving precise details of the assumptions made in coding (for example, what kinds of patterns are being sought by a particular data mining algorithm?) will certainly provide an entry point into dialogues addressing the issues raised above.

Despite this, many current examples of reproducible research use fairly 'traditional' approaches to data analysis, where a data set consists of a static file containing a rectangular table of cases by variables. More complex data poses less of a conceptual problem per se in terms of reproducibility: the challenge here is to devise appropriate analytical methods, but if that can be achieved then code can be created and reproducible research can be carried out in the ways outlined above. Similarly, diversity of applications presents no further conceptual difficulties for reproducibility. The real-time aspect, however, does present some challenges: clearly, even with the same code, two people accessing the same data stream at different points in time will not obtain identical results. One possibility might be to acknowledge that the data used in a given publication is a static entity, consisting of data obtained from a stream at a given point in time, and to time stamp and archive the data obtained and used in the analysis at the moment it was carried out. Although it would be impossible for a third party to obtain identical data from the stream, and consequently impossible to obtain identical analytical results, it would at least be possible to see the code used to access the stream, note the time the stream was accessed, and access a copy of the data obtained at that time. This would also enable scrutiny of the representativeness of the data, one contextual factor that may enable more meaningful analysis of Big Data.
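
A sketch of how this might look in R, assuming a hypothetical stream endpoint (the URL and file names are placeholders invented for illustration):

    # Record when the stream was accessed and archive a time-stamped snapshot
    stream_url <- "http://example.org/sensor-feed.csv"  # placeholder endpoint
    accessed_at <- Sys.time()
    archive_file <- paste0("feed-", format(accessed_at, "%Y%m%d-%H%M%S"), ".csv")
    download.file(stream_url, archive_file)

    # All subsequent analysis reads from the archived snapshot, not the live stream
    feed <- read.csv(archive_file)

Publishing the archived snapshot alongside code of this kind then gives third parties the exact data analysed, together with the time at which it was obtained.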

VIII Conclusion

There are strong arguments for reproducibility in the quantitative analysis of human geography data – not just for academics, but also for public agencies and private consultancies charged with analysing data that may influence policy. Achieving this in some situations is clearly within reach, although there are also some challenges ahead as the diversity and volume of geographically referenced information increase. Arguably there is also a role for such methods in addressing the Big Data Revolution. However, the adoption of reproducible approaches does call for some changes in practice, from both researchers – in adopting reproducible research practices – and publishers – in providing a medium where reproducible documents may be easily submitted, handled and distributed.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Anderson C (2008) The end of theory: The data deluge makes the scientific method obsolete. Wired. Available at: http://www.wired.com/science/discoveries/magazine/16-07/pb_theory (accessed 22 July 2015).
Ballas D, Clarke G, Dorling D, Eyre H, Thomas B and Rossiter D (2005) SimBritain: A spatial microsimulation approach to population dynamics. Population, Space and Place 11(1): 13–34.
Barni M, Perez-Gonzalez F, Comesaña P and Bartoli G (2007) Putting reproducible signal processing into practice: A case study in watermarking. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing. Available at: http://gpsc.uvigo.es/sites/default/files/publications/icassp07reproducible.pdf (accessed 22 July 2015).
Bergmann L (2013) Bound by chains of carbon: Ecological-economic geographies of globalization. Annals of the Association of American Geographers 103(6): 1348–1370. DOI: 10.1080/00045608.2013.779547.
Brunsdon C and Comber A (2015) An Introduction to R for Spatial Analysis and Mapping. London: SAGE.
Brunsdon C and Singleton A (2015) Reproducible research: Concepts, techniques and issues. In: Brunsdon C and Singleton A (eds) Geocomputation: A Practical Primer. London: SAGE, 254–264.
Buckheit JB and Donoho DL (1995) WaveLab and Reproducible Research. Tech. Rep. 474, Dept. of Statistics, Stanford University.
Claerbout J (1992) Electronic documents give reproducible research a new meaning. In: Proc. 62nd Ann. Int. Meeting of the Soc. of Exploration Geophysics, 601–604.
Clarke M and Holm E (1987) Microsimulation methods in spatial analysis and planning. Geografiska Annaler Series B, Human Geography 69(2): 145–164.
Gentleman R and Temple Lang D (2004) Statistical analyses and reproducible research. Bioconductor Project Working Paper 2.
Heppenstall A, Crooks A, See L and Batty M (2012) Agent-Based Models of Geographical Systems. New York: Springer.
Herndon T, Ash M and Pollin R (2013) Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff. Cambridge Journal of Economics 38: 257–279.
Hey T, Tansley S and Tolle K (2009) Jim Gray on eScience: A transformed scientific method. In: Hey T, Tansley S and Tolle K (eds) The Fourth Paradigm: Data-Intensive Scientific Discovery. Redmond: Microsoft Research. Available at: http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_jim_gray_transcript.pdf (accessed 22 July 2015).
Kelling S, Hochachka WH, Fink D, Riedewald M, Caruana R, Ballard G and Hooker G (2009) Data-intensive science: A new paradigm for biodiversity studies. BioScience 59(7): 613–620. DOI: 10.1525/bio.2009.59.7.12.
Kitchin R (2014) Big Data and human geography: Opportunities, challenges and risks. Dialogues in Human Geography 3(3): 262–267.
Kitchin R (2014a) Big Data, new epistemologies and paradigm shifts. Big Data & Society 1(1). DOI: 10.1177/2053951714528481.
Kitchin R (2014b) The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. London: SAGE.
Knuth D (1984) Literate programming. Computer Journal 27(2): 97–111.
Koenker R (1996) Reproducible Econometric Research. Department of Econometrics, University of Illinois.
Leisch F (2002) Dynamic generation of statistical reports using literate data analysis. In: Härdle W and Rönz B (eds) Compstat 2002: Proceedings in Computational Statistics. Heidelberg: Physika Verlag, 575–580.
Lovelace R and Ballas D (2013) 'Truncate, replicate, sample': A method for creating integer weights for spatial microsimulation. Computers, Environment and Urban Systems 41: 1–11.
Mayer-Schonberger V and Cukier K (2013) Big Data: A Revolution That Will Change How We Live, Work and Think. London: John Murray.
McKinney W (2012) Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. New York: O'Reilly.
Miller HJ and Goodchild M (2014) Data-driven geography. GeoJournal. DOI: 10.1007/s10708-014-9602-6.
O'Neill C and Schutt R (2013) Doing Data Science: Straight Talk from the Frontline. New York: O'Reilly.
Openshaw S and Taylor PJ (1979) A million or so correlation coefficients: Three experiments on the modifiable areal unit problem. In: Statistical Applications in the Spatial Sciences 21. London: Pion, 127–144.
Parker J and Epstein J (2011) A distributed platform for global-scale agent-based models of disease transmission. ACM Transactions on Modeling and Computer Simulation 22(1). DOI: 10.1145/2043635.2043637.
Pastell M (2014) Pweave: Reports from data with Python. Available at: http://mpastell.com/pweave/docs.html (accessed 22 July 2015).
R Core Team (2015) R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. Available at: http://www.R-project.org/ (accessed 22 July 2015).
Radiant News (2015) Introducing Radiant: A shiny interface for R. Available at: http://www.r-bloggers.com/introducing-radiant-a-shiny-interface-for-r-2/ (accessed 22 July 2015).
Ramsey N (1994) Literate programming simplified. IEEE Software 11(5): 97–105.
Reinhart CM and Rogoff KS (2010) Growth in a time of debt. American Economic Review: Papers and Proceedings 100(May): 573–578.
Reinhart CM and Rogoff KS (2013) Errata: Growth in a time of debt. Harvard University, 5 May. Available at: http://www.carmenreinhart.com/user_uploads/data/36_data.pdf (accessed 22 July 2015).
Ren M and Karimi HA (2012) A fuzzy logic map matching for wheelchair navigation. GPS Solutions 16: 274–282. DOI: 10.1007/s10291-011-0229-5.
Rey S (2015) Python Spatial Analysis Library (PySAL): An update and illustration. In: Brunsdon C and Singleton A (eds) Geocomputation: A Practical Primer. London: SAGE, 233–254.
Vandewalle P, Kovačević J and Vetterli M (2009) Reproducible research in signal processing. IEEE Signal Processing Magazine 26(3): 37–47.
Wood J, Isenberg P, Isenberg T, Dykes J, Boukhelifa N and Slingsby A (2012) Sketchy rendering for information visualization. IEEE Transactions on Visualization and Computer Graphics 18(12): 2749–2758.
Xie Y (2013) Dynamic Documents with R and knitr. New York: Chapman and Hall/CRC.
