Available online at www.sciencedirect.com

ScienceDirect
Energy Procedia 40 (2013) 464 – 471

European Geosciences Union General Assembly 2013, EGU Division Energy, Resources & the Environment, ERE

Coupling R and PHREEQC: efficient programming of geochemical models
Marco De Lucia*, Michael Kühn
GFZ German Research Centre for Geosciences, Section 5.3 Hydrogeology, Telegrafenberg, 14473 Potsdam, Germany

Abstract

We present a new interface between the geochemical simulator PHREEQC and the open source language R. It
is a tool to flexibly and efficiently program and automate every aspect of geochemical modelling. The
interface is particularly helpful for setting up and running large numbers of simulations and visualising
the results. Moreover, taking advantage of the numerous high-quality R extension packages, performing
sensitivity analyses or Monte Carlo simulations becomes straightforward. Further, an algorithm to speed up
reactive transport simulations starting from a homogeneous or zone-homogeneous state is programmed and
successfully evaluated through the interface. It proved effective and could therefore be included in any
reactive transport simulator.

© 2013 The Authors. Published by Elsevier Ltd. Open access under CC BY-NC-ND license.
Selection and peer-review under responsibility of the GFZ German Research Centre for Geosciences

Keywords: geochemical modelling; PHREEQC; language R; sensitivity analysis; automation; reactive transport.

1. Introduction

PHREEQC [1] is a widely used multi-platform open source software for geochemical calculations. It
represents the tool of choice for many researchers and practitioners for a broad set of geochemical
problems. Its open source nature, the flexibility to program arbitrary kinetic laws for the chemical
reactions, as well as a thorough implementation of the Pitzer formalism for concentrated solutions explain
its success and longevity in many branches of hydrogeochemistry and geochemical modelling. Graphical
interfaces have been developed over the years to provide interactivity while performing geochemical
modelling, freeing the modeller from the need to edit text-based input scripts. Through such interfaces
it is possible to perform the usual tasks and to evaluate the results graphically in a user-friendly
manner. However, such graphical interfaces do not satisfy a primary need of more advanced modelling: the
flexible and automatable setup of large numbers of simulations (for example, exploring wide ranges of
conditions or running Monte Carlo simulations) and non-interactive data input. Graphical interfaces
require the user to input the data manually, which becomes tedious, error-prone and cumbersome as soon
as the number of required entries increases. Moreover, a fully programmable geochemical library, coupled
to a high-level language, opens up a broad spectrum of applications involving geochemical modelling,
including reactive transport. The developers of PHREEQC have addressed this issue by including coupling
capabilities in recent releases of the software, starting with version 2.18 [2]. In practice, however,
the potential of this development is not yet fully realised since, to our knowledge, no counterpart
library exists for any high-level language that provides, out of the box, a set of convenient functions
to really take advantage of the coupling.

* Corresponding author. Tel.: +49 331 288-2829; fax: +49 331 288-1529.
E-mail address: [email protected]

ISSN 1876-6102
doi:10.1016/j.egypro.2013.08.053
GNU R [3] is a multi-platform free software environment and programming language for statistical
computing and graphics. It is widely used in the scientific community and, following a number of
evaluation criteria, is often cited as one of the most successful open-source collaborative projects [4].
Its powerful and flexible numerics and visualisation capabilities, on top of an easily extensible
architecture, have attracted a huge user base, which in turn has produced over the last years an
impressive number of high-quality, user-contributed extension packages. R has thus become a de facto
common language for many applications of mathematics and statistics, including, among others, big data
analysis, bioinformatics, chemometrics, ecological modelling and geostatistics. State-of-the-art
algorithms such as Markov Chain Monte Carlo simulation, Bayesian estimation, advanced experimental
design, and multivariate sensitivity and uncertainty analysis are readily available to modellers. These
characteristics made R our environment of choice for integrating geochemical modelling through a
programming interface to PHREEQC functionalities.
The development originated in the framework of the research project CLEAN [5-6], which
investigated the feasibility of enhanced gas recovery combined with CO2 storage in a depleted gas
reservoir. Thanks to the interface, we were able to perform a number of tasks efficiently, such as
database comparisons and multivariate sensitivity analyses of models, as exemplified in section 3 of
the paper. It is important to stress that such an interface makes it possible to automate virtually all
aspects of geochemical modelling, including programming full-fledged reactive transport simulators. Of
course, a high-level interpreted language such as R should not be the tool of choice for large-scale,
massively parallel reactive transport simulations, for which low-level compiled languages like C/C++ or
FORTRAN attain higher computational efficiency, albeit at a much higher cost in terms of programming
effort and flexibility. Due to its compact syntax and the large amount of available additional software,
a high-level language such as R is an ideal tool wherever speed of development is decisive: for example,
to prototype and evaluate methods and algorithms involving geochemical modelling. We present a showcase
in section 4, where a strategy to reduce the computational time of reactive transport simulations is
outlined, implemented through the R/PHREEQC interface and benchmarked.

2. Implementation details

The interface between R and PHREEQC is a platform-independent solution providing high-level
functions to set up and run geochemical simulations and to collect and manipulate the results with
minimal effort from the modeller, so that simple tasks can ideally be performed with little to no
programming, while more complex tasks can be efficiently coded from the provided fundamental building
blocks. The high-level functions create and manipulate standard PHREEQC input scripts, which are then
evaluated through a tight coupling with PHREEQC. Therefore, good familiarity with both R and PHREEQC
syntax is still required to fully benefit from the coupling.
The development of the interface started in 2009, based on PHREEQC version 2.17 (svn 4799), the
last version not providing a coupling module. Porting the interface to the recent, almost completely
rewritten release 3 of PHREEQC, possibly adopting its new built-in coupling capability, is under
evaluation at the time of writing. However, this shall remain completely transparent to the end user.
In its current state, the interface consists of minor modifications to PHREEQC's C source code,
pertaining only to data I/O. PHREEQC itself is packaged as a library which, together with an appropriate
entry function and the corresponding headers, is then linked to R's executable. On the R side, one set of
low-level functions ensures the correct formatting of input and output and handles the calls, while
another set of R routines provides the fundamental functionalities needed to set up and run the
simulations, manipulate the input scripts, extract the results and perform some visualisation. The input
for the calculations takes the form of a regular PHREEQC input script stored in a string vector, one line
per element, while the results of simulations are returned as standard R objects such as lists, data
frames and named matrices. At runtime, all communication between the main R process and the underlying
PHREEQC library happens in RAM, with no need to write files to disk, which greatly enhances the speed of
extensive calculations. Optionally, the user can enable the usual forms of PHREEQC output, e.g. formatted
ASCII output or BASIC-programmable "PUNCH" output, for instance for debugging purposes. Since the input
manipulated through the high-level functions is a syntactically valid PHREEQC script, the user can store
it in a normal file at any time and run a stand-alone PHREEQC executable on it. This deferred evaluation
mechanism is particularly useful for long calculations, which may not be suited to the interactivity
offered by the R environment. The standard PHREEQC output files produced in this way can then be read
back into R, thus completing the infrastructure for deferred evaluation. Note that any kind of output,
provided the simulations comply with the supported functionalities (see below), can be imported into R
this way.
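To make the deferred-evaluation round trip concrete, the following Python sketch (not the authors' R code, which is not public) writes an in-memory input, kept as a list of lines like the R string vector described above, to a regular file, and parses a tab-separated table of the kind produced by PHREEQC's SELECTED_OUTPUT keyword. The function names are hypothetical, and the parser assumes one header line followed by tab-separated rows; it does not handle PHREEQC's formatted ASCII main output.

```python
from pathlib import Path
from typing import Dict, List, Union

Value = Union[float, str]


def write_input(lines: List[str], path: Path) -> None:
    """Store an in-memory input (one line per element) as a regular PHREEQC file."""
    path.write_text("\n".join(lines) + "\n")


def _convert(token: str) -> Value:
    """Return a float when the token is numeric, otherwise keep the raw string."""
    try:
        return float(token)
    except ValueError:
        return token


def read_selected_output(text: str) -> List[Dict[str, Value]]:
    """Parse a tab-separated table with a single header line (assumed layout).

    Trailing tabs and padding spaces around each field are stripped.
    """
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    header = [name.strip() for name in lines[0].split("\t")]
    records = []
    for ln in lines[1:]:
        fields = [_convert(f.strip()) for f in ln.split("\t")]
        records.append(dict(zip(header, fields)))
    return records
```

For instance, parsing the two-line table `"sim\tstate\tpH\n 1\ti_soln\t7\n"` gives one record mapping `sim` to 1.0, `state` to `"i_soln"` and `pH` to 7.0. Returning plain dictionaries keeps the sketch dependency-free; in R the natural target would be a data frame.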

Currently, only part of PHREEQC's functionality is supported natively by the interface, namely the
features expressed in the blocks SOLUTION, EQUILIBRIUM_PHASES and KINETICS. Achieving reactions or
solution mixing based on these building blocks requires only a trivial programming effort. Thanks to the
design choices of the interface, in particular the deferred evaluation mechanism, virtually all
capabilities are accessible through the interface with additional programming.
Implemented among others are high-level functions to:
- replicate simulations with varying parameters (e.g. aqueous concentrations or kinetic parameters);
- add or delete mineral phases or elements from the simulations;
- read standard PHREEQC output files into R, including those of simulations not run through the interface;
- extract the variables of interest from a set of simulations;
- manipulate thermodynamic databases;
- automatically transform an output into a new input (e.g. to restart kinetic simulations or emulate the
  built-in SAVE/USE directives).
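As an illustration of the first item in the list above, the Python sketch below replicates a PHREEQC input, held as a list of lines, with different values for one element in the SOLUTION block. The helper names and the example chemistry are hypothetical, and the sketch assumes that keyword lines start at column 0 while data lines are indented; it is not the interface's actual R implementation.

```python
from typing import List, Sequence

# Hypothetical base input: one line per element, as in the R string vector.
BASE_INPUT: List[str] = [
    "SOLUTION 1",
    "    temp 25",
    "    pH 7",
    "    units mol/kgw",
    "    Ca 1e-3",
    "    C(4) 2e-3",
    "EQUILIBRIUM_PHASES 1",
    "    Calcite 0 10",
    "END",
]


def set_solution_value(lines: Sequence[str], element: str, value: float) -> List[str]:
    """Return a copy of `lines` with the first `element` entry of a SOLUTION block set to `value`.

    Assumes keywords start at column 0 and data lines are indented.
    Any tokens after the value (e.g. "as HCO3") are preserved.
    """
    out = list(lines)
    in_solution = False
    for i, line in enumerate(out):
        tokens = line.split()
        if not tokens:
            continue
        if not line[0].isspace():  # a new keyword block begins
            in_solution = tokens[0].upper() == "SOLUTION"
            continue
        if in_solution and tokens[0] == element:
            indent = line[: len(line) - len(line.lstrip())]
            out[i] = indent + " ".join([element, f"{value:g}", *tokens[2:]])
            return out
    raise KeyError(f"{element} not found in any SOLUTION block")


def replicate(lines: Sequence[str], element: str, values: Sequence[float]) -> List[List[str]]:
    """Build one input script per value, leaving the base input untouched."""
    return [set_solution_value(lines, element, v) for v in values]
```

`replicate(BASE_INPUT, "Ca", [1e-3, 2e-3])` yields two scripts whose calcium line reads `Ca 0.001` and `Ca 0.002`; each could then be evaluated in memory or written to a file for deferred evaluation.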

The interface is platform-independent, like both R and PHREEQC, which are available for all major
operating systems. It is already used on a daily basis on GNU/Linux (several distributions) and
regularly tested on Windows systems. Although this aspect is not directly related to the interface, it
should be stressed that the latter system lacks a comparable range of tools, particularly regarding
parallelisation and ease of integration with High Performance Computing environments, which limits the
achievable performance for heavy calculations.

3. Application: multivariate sensitivity analysis of thermodynamic database entries

The phenomenological model implemented in PHREEQC is based on the Law of Mass Action and requires a
collection of parameters describing the equilibrium points of reactions and, optionally, additional
parameters for the calculation of activities following an extended Debye-Hückel or Pitzer [7] model.
Such parameters are collected in thermodynamic databases. All these parameters are affected by
significant errors and uncertainties: measurement errors, different ranges of validity of the data
(pressure, temperature and ionic strength), lack of consistency among data derived from different
sources or, particularly for the Pitzer activity model for concentrated solutions, incompleteness of the
database and lack of temperature dependence of parameters. The uncertainty associated with the databases
is usually disregarded by end users, partly because of the large number of different parameters that
influence the results and the complexity of the models, and partly because the databases themselves do
not include the variances of the parameters.
On the other hand, several practical applications of geochemical modelling (e.g. geothermal energy,
CO2 storage) lie at the limits of, or even outside, the theoretical validity range of the available
underlying data [8], for example due to very high ionic strengths of the formation fluids or high
temperatures, thus introducing non-negligible errors in the results; in the absence of experimental data
for complex systems, the predictions of geochemical models cannot be directly validated. A way to at
least quantify the expected uncertainty of the results originating directly from the databases is
offered by sensitivity analysis: for a given geochemical model, the database parameters are varied by
small predetermined quantities, and the resulting variations in the calculated results define confidence
intervals. However, even for geochemical models of medium complexity, the number of underlying
parameters that need to be explicitly considered in the sensitivity analysis is large (easily 30-40),
requiring efficient algorithms of experimental design combined with multivariate statistical analysis of
the outcomes to keep the computing effort limited. In the presence of many parameters, One-At-a-Time
(OAT) methods such as the elementary effects method of Morris [9] are only suitable for quickly screening
the entire parameter space in order to rank the parameters by importance. A complete multivariate global
sensitivity analysis, which is computationally much more intensive since it accounts for
cross-correlations and co-dependencies, can be performed afterwards including only the most influential
parameters, this time varying multiple parameters simultaneously according to a more elaborate strategy
for sampling the parameter space, for example a Latin hypercube [10].
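The OAT screening step can be sketched as follows. This is a simplified variant of the Morris elementary effects method written in Python for illustration (factors scaled to [0, 1], only upward steps of size delta, random factor order per trajectory); in the study the corresponding routines came from R packages, and `model` here is a placeholder for a PHREEQC run mapping scaled database parameters to one response.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def morris_screening(
    model: Callable[[Sequence[float]], float],
    k: int,
    r: int = 20,
    levels: int = 4,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Simplified Morris elementary-effects screening on the unit hypercube.

    Builds r one-at-a-time trajectories; in each, every factor is increased
    once by delta, in random order. Returns (mu_star, sigma) per factor:
    mean absolute elementary effect and its standard deviation.
    """
    rng = random.Random(seed)
    delta = levels / (2 * (levels - 1))
    grid = [i / (levels - 1) for i in range(levels)]
    # Only grid points from which a +delta step stays inside [0, 1].
    starts = [g for g in grid if g + delta <= 1 + 1e-12]
    effects: List[List[float]] = [[] for _ in range(k)]
    for _ in range(r):
        x = [rng.choice(starts) for _ in range(k)]
        y = model(x)
        for i in rng.sample(range(k), k):
            x_next = list(x)
            x_next[i] += delta
            y_next = model(x_next)
            effects[i].append((y_next - y) / delta)
            x, y = x_next, y_next
    mu_star = [statistics.fmean(abs(e) for e in ee) for ee in effects]
    sigma = [statistics.stdev(ee) for ee in effects]
    return mu_star, sigma
```

Parameters with a small `mu_star` would be dropped before the global analysis; a large `sigma` flags non-linear or interacting parameters. For a linear test model such as `10*x[0] + x[1]`, `mu_star` recovers the coefficients and `sigma` is essentially zero.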
The R environment is ideal for this task. Using packages such as sensitivity [11] and FME [12], the
modeller already has state-of-the-art routines available to perform the analysis. The only requirement
is to write R functions that parse and manipulate the databases to apply the variations to the selected
entries, and that collect and extract the outcomes of the PHREEQC calculations in a form suitable for use
with the above-mentioned packages.
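The database-manipulation function mentioned above could look roughly like the Python sketch below, which scales the `log_k` of one mineral in a PHASES block by a multiplicative factor. It assumes the layout of PHREEQC's bundled databases (keywords and phase names at column 0, reaction and `log_k` lines indented) and recognises only a partial, illustrative set of keywords; Pitzer coefficients, stored in a different block, would need a separate parser.

```python
from typing import List, Sequence

# Partial, illustrative list of database keywords that terminate a PHASES block.
KEYWORDS = {
    "SOLUTION_MASTER_SPECIES", "SOLUTION_SPECIES", "PHASES",
    "EXCHANGE_MASTER_SPECIES", "EXCHANGE_SPECIES",
    "SURFACE_MASTER_SPECIES", "SURFACE_SPECIES", "RATES", "PITZER", "END",
}


def scale_phase_log_k(db_lines: Sequence[str], phase: str, factor: float) -> List[str]:
    """Return a copy of the database with log_k of `phase` multiplied by `factor`."""
    out = list(db_lines)
    in_phases = False
    current = None
    for i, line in enumerate(out):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not line[0].isspace():  # column 0: either a keyword or a phase name
            word = stripped.split()[0]
            if word.upper() in KEYWORDS:
                in_phases = word.upper() == "PHASES"
                current = None
            elif in_phases:
                current = word
            continue
        tokens = stripped.split()
        if in_phases and current == phase and tokens[0].lstrip("-").lower() == "log_k":
            indent = line[: len(line) - len(line.lstrip())]
            new_value = float(tokens[1]) * factor
            # Keep the original spelling ("log_k" or "-log_k") and any trailing tokens.
            out[i] = indent + " ".join([tokens[0], f"{new_value:.6g}", *tokens[2:]])
            return out
    raise KeyError(f"log_k of phase {phase!r} not found")
```

Working on a copy of the line vector means each perturbed database can be handed to a separate run while the reference database stays intact.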
In this way we were able to perform a sensitivity analysis with respect to database errors for a
real-life equilibrium model describing the effects of CO2 injection into a depleted gas reservoir. The
model covered all major elements and five equilibrium phases besides CO2: the aluminosilicates
K-feldspar and albite, the redox-sensitive mineral haematite, and the cements anhydrite and calcite.
Since the investigated formation fluid is an extremely saline brine, the Pitzer activity model (and a
corresponding database) had to be used [8]: the EQ3/6 database [13], translated into PHREEQC format by
Quintessa Ltd [14]. The Pitzer model requires up to eight different coefficients for each pair and
triplet of anions, cations and neutral dissolved species, which multiplies the number of parameters to be
included in the sensitivity analysis: in this case 39 parameters, of which 8 were equilibrium constants
(log K) of aqueous and mineral reactions. Applying the OAT screening method of Morris [9], 14 distinct
parameters were selected as the most influential with respect to three different responses. A subsequent
global sensitivity analysis including only those 14 database parameters was run with the maximum
amplitude of the perturbations arbitrarily set to ±5 %, which can even be considered optimistic given the
elevated temperature (120 °C) and concentration of the initial aqueous solution. The results, shown in
Fig. 1, highlight a significant propagation of variance through the chemical model: depending on the
outcome considered, the uncertainty reached up to 20 %. In terms of programming, the whole exercise
required around 200 lines of code.

Fig. 1. Quantification of model uncertainty based on perturbations of at most 5 % of database entries. Even
for systems of medium complexity, the calculated responses can show an uncertainty envelope of about 20 %.
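The global analysis described above perturbs the selected database entries by at most ±5 %. A minimal Latin hypercube sketch in Python producing multiplicative factors in [0.95, 1.05) for 14 parameters is given below; the study delegated sampling to R packages, so the function names, the seed and the sample size are illustrative assumptions.

```python
import random
from typing import List


def latin_hypercube(n_samples: int, n_params: int, rng: random.Random) -> List[List[float]]:
    """Return n_samples points in [0, 1)^n_params.

    Each 1D projection has exactly one point in each of the n_samples
    equal-width strata (the defining Latin hypercube property).
    """
    columns = []
    for _ in range(n_params):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_samples for s in strata])
    # Transpose: one row per sample, one column per parameter.
    return [[col[i] for col in columns] for i in range(n_samples)]


def perturbation_factors(
    n_samples: int, n_params: int, max_rel: float = 0.05, seed: int = 1
) -> List[List[float]]:
    """Map the unit hypercube to multiplicative factors in [1 - max_rel, 1 + max_rel)."""
    rng = random.Random(seed)
    unit = latin_hypercube(n_samples, n_params, rng)
    return [[1 - max_rel + 2 * max_rel * u for u in row] for row in unit]
```

Each row of `perturbation_factors(100, 14)` would be applied to the 14 selected entries (for instance with a log_k scaling helper) before one PHREEQC run; the spread of the responses over all rows gives the uncertainty envelope.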

4. A strategy to speed up reactive transport simulations

A further illustrative example of the usefulness of a programmable interface to PHREEQC is the
implementation and evaluation of an algorithm to speed up operator-splitting reactive transport
simulations. Indeed, most simulations start from initially homogeneous or zone-homogeneous media.
Typically, a reaction front forms and moves along the domain following the injection of a reactive
solution; behind the front, once enough reactant has flushed the system, the medium has exhausted its
reactive potential and becomes inert. Therefore, reactions are expected to occur only within a defined
moving window rather than over the whole domain, the width of this window depending on transport
properties (diffusion/dispersion) and reaction kinetics. This consideration can be used to restrict the
evaluation of chemistry to a definite portion of the simulation domain, but the idea can be pushed further.
Consider an initially homogeneous 1D medium flushed at one inlet with a reactive solution. If the
Darcy velocity of the fluid is small enough, after the first transport step only the first element of the
grid is reached by the reactive solution and thus displays dissolved concentrations different from the
others. This means that the first transport-chemistry iteration can be completed by solving only two
distinct chemical systems: one for the first grid element, reached by the inflow solution, and one for the
rest of the column. At a later time step, the second element is also reached by the injected solution,
possibly giving rise to a third unique system that needs to be evaluated, and so on. The number of
distinct chemical systems varies dynamically as the simulation progresses, requiring a new search, and
thus additional CPU time, at each iteration; however, it yields the smallest possible number of distinct
chemistry simulations that fully determine the state of the whole domain at a given time. The CPU time
spent on this search must therefore be outweighed by the CPU time saved by eliminating duplicated
chemical simulations.
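The counting argument above can be checked with a deliberately crude Python toy model: integer-valued concentrations, pure advection with a Courant number of one, and a made-up 1:1 dissolution reaction standing in for PHREEQC. None of these choices comes from the paper; the sketch only illustrates that, for an initially homogeneous column, the number of distinct cell states per step stays far below the number of grid elements.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (dissolved reactant, mineral amount), arbitrary units


def transport(column: List[Cell], inflow: int) -> List[Cell]:
    """Pure advection with Courant number 1: solutes move one cell downstream."""
    concs = [inflow] + [c for c, _ in column[:-1]]
    return [(c, m) for c, (_, m) in zip(concs, column)]


def chemistry(cell: Cell) -> Cell:
    """Toy reaction: reactant dissolves mineral 1:1 until one of them runs out."""
    c, m = cell
    reacted = min(c, m)
    return (c - reacted, m - reacted)


def run(n_cells: int = 100, steps: int = 60, inflow: int = 2, mineral: int = 5) -> List[int]:
    """Return the number of distinct chemical problems found at each step."""
    column: List[Cell] = [(0, mineral)] * n_cells
    counts = []
    for _ in range(steps):
        column = transport(column, inflow)
        counts.append(len(set(column)))  # unique rows of the state before chemistry
        column = [chemistry(cell) for cell in column]
    return counts
```

`run()[0]` is 2, matching the two systems of the first iteration described above, and in this toy the count remains a handful of systems per step, whereas a naive scheme solves all 100 cells every step.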
In mathematical terms, the state S of the domain at a time t*, after the transport step but before the
reactive step, is fully represented by a matrix in which each of the n rows represents a grid element and
the columns hold the m concentrations of solute elements C and the l mineral phases P:

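$$
S(t^*) =
\begin{pmatrix}
C_{1,1} & \cdots & C_{1,m} & P_{1,1} & \cdots & P_{1,l} \\
\vdots  &        & \vdots  & \vdots  &        & \vdots  \\
C_{n,1} & \cdots & C_{n,m} & P_{n,1} & \cdots & P_{n,l}
\end{pmatrix}
\tag{1}
$$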

In the following, we call each row of the state matrix a problem. The point of the proposed speedup is to
find the smallest possible number of unique problems before each chemistry iteration. Thus, a compressed
state Sc can be defined as:

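$$
S_c(t^*) =
\begin{pmatrix}
C^{c}_{1,1} & \cdots & C^{c}_{1,m} & P^{c}_{1,1} & \cdots & P^{c}_{1,l} \\
\vdots      &        & \vdots      & \vdots      &        & \vdots      \\
C^{c}_{r,1} & \cdots & C^{c}_{r,m} & P^{c}_{r,1} & \cdots & P^{c}_{r,l}
\end{pmatrix},
\qquad S_{i,\cdot} = (S_c)_{I_c(i),\cdot}, \quad i = 1, \dots, n
\tag{2}
$$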

where Ic is an n-element index vector that maps each row of S to the corresponding row of Sc. Typically,
the number r of rows of Sc is much smaller than the number n of grid elements, i.e. r/n << 1, although
this ratio tends to increase as the simulation progresses. After the compression, only r distinct
simulations need to be run; their results are then distributed back to the whole state matrix before the
next transport step. In algorithmic terms, the search for unique problems is equivalent to searching for
the unique rows of a matrix, which is not a trivial task. An efficient implementation is provided by the
uniquecombs() function of the R extension package mgcv [15].
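The compression of Eq. (2) and its inverse can be sketched in a few lines. The paper relies on mgcv's uniquecombs() in R; the Python version below is a swapped-in, dictionary-based equivalent that assumes rows are compared for exact equality (in practice one might round concentrations to a tolerance first, which is an untested assumption here). `solve` stands for a hypothetical function running one PHREEQC problem.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, ...]


def compress(state: Sequence[Sequence[float]]) -> Tuple[List[Row], List[int]]:
    """Return (Sc, Ic): unique rows in first-occurrence order and, for each of
    the n rows of S, the index of the matching row in Sc."""
    index_of: Dict[Row, int] = {}
    unique_rows: List[Row] = []
    mapping: List[int] = []
    for row in state:
        key = tuple(row)
        if key not in index_of:
            index_of[key] = len(unique_rows)
            unique_rows.append(key)
        mapping.append(index_of[key])
    return unique_rows, mapping


def decompress(results: Sequence[Row], mapping: Sequence[int]) -> List[Row]:
    """Distribute the r solved rows back to all n grid elements: S[i] = Sc[Ic[i]]."""
    return [results[j] for j in mapping]


def chemistry_step(state: Sequence[Sequence[float]], solve: Callable[[Row], Row]) -> List[Row]:
    """Run the chemistry only on unique problems (r calls instead of n)."""
    unique_rows, mapping = compress(state)
    solved = [solve(row) for row in unique_rows]
    return decompress(solved, mapping)
```

For the four-cell state `[(1.0, 2.0), (0.0, 2.0), (0.0, 2.0), (1.0, 2.0)]`, `compress` returns two unique rows and the index vector `[0, 1, 1, 0]`, so `chemistry_step` calls `solve` only twice.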
For testing purposes, this compression algorithm was implemented through the R/PHREEQC interface in
about 50 lines of R code, including parallelisation but not counting a trivial transport function. The
test consisted of taking the 1D reactive transport problem calcite_pqc from the OpenGeoSys book of
benchmarks [16] and measuring the CPU time of each chemistry iteration with and without
parallelisation, with and without the new algorithm, and for different refinements of the same column,
discretised into 500, 1000, 2000, 5000 and 10000 elements. Fig. 2 (a) shows a log-log plot of the average
CPU time of the chemistry iterations as a function of grid size and number of CPUs used; the two families
of lines distinguish runs with and without the compression algorithm, which clearly provides a great
advantage over the normal run. The speedup obtained through compression becomes less pronounced when
multiple CPUs are used in parallel; nevertheless, it clearly outperforms the speedup achievable by
parallelisation alone.
Fig. 2. (a) average CPU-time for chemistry iterations with or without compression, as function of number of CPUs and grid size
on a 1D initially homogeneous benchmark problem; (b) details about the behavior of simulations using the compression, one point
per time step, chemistry step alone. The number of unique problems increases along with the progression of the reaction front,
reaches a peak and then decreases again. The correlation with the overall CPU-time needed to perform the chemistry step is evident,
as well as the relative magnitude of the other tasks, namely dispatching (and retrieving) the simulations across multiple CPUs and
the search algorithm itself

Fig. 2 (b) shows the actual behaviour of one exemplary simulation (2 CPUs, 500 grid elements) with
compression activated, profiling the computational effort required at each time step (each point
represents one time step). The CPU time needed for the entire chemistry run (forming the input for
PHREEQC, running the simulations and collecting the results to prepare the next transport step) is
compared to the CPU time needed for compression and restoration at the end of the chemistry step and for
splitting and dispatching the simulations across multiple CPUs. The overhead due to compression and
restoration is clearly of second order compared to the chemistry time, while the overhead produced by
splitting and dispatching the simulations for parallelisation increases with the number of unique
problems found, following the same trend as the chemistry time. This overhead is clearly compensated by
the speedup given by parallelisation, although it is also clear that parallelisation itself is only
justified above a threshold number of unique simulations. Such a threshold depends on the nature and
complexity of the chemical system and has to be determined case by case.
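The threshold rule above can be sketched as a dispatcher that runs small batches serially and splits larger ones into near-equal chunks for several workers. This Python version is only an illustration: the paper does not state how its R code parallelised the runs, and the default threshold of 50 is an arbitrary placeholder. Threads are adequate when `solve` launches an external PHREEQC process or calls a library that releases the GIL; pure-Python chemistry would instead need a process pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")
R = TypeVar("R")


def split_chunks(items: Sequence[P], n_chunks: int) -> List[List[P]]:
    """Split items into at most n_chunks contiguous chunks of near-equal size."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        end = start + size + (1 if k < extra else 0)
        chunks.append(list(items[start:end]))
        start = end
    return chunks


def dispatch(
    problems: Sequence[P],
    solve: Callable[[P], R],
    n_workers: int = 2,
    threshold: int = 50,
) -> List[R]:
    """Solve unique problems, parallelising only above a (hypothetical) threshold."""
    if n_workers <= 1 or len(problems) < threshold:
        return [solve(p) for p in problems]  # dispatch overhead not worth it
    chunks = split_chunks(problems, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda chunk: [solve(p) for p in chunk], chunks))
    # Chunks are contiguous and map() preserves order, so results stay aligned.
    return [result for part in parts for result in part]
```

Because the chunks are contiguous and `Executor.map` returns results in submission order, the output list stays aligned with the compressed state and can be decompressed directly.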
Note that the considered benchmark assumes local equilibrium for chemistry, but the whole procedure and
algorithm can also be applied to kinetic simulations, possibly including the reactive surface areas,
alongside the mineral amounts, as further columns of the state matrix S, depending on the implemented
kinetic law.

5. Conclusion

Automation of computational tasks involving geochemical modelling can be achieved thanks to the
newly developed interface between PHREEQC and the high-level language and environment R. The
interface was successfully used to perform multivariate sensitivity analysis, to manipulate and compare
databases, and to program and evaluate algorithms. In particular, it was possible to demonstrate that a
significant speedup of operator-splitting reactive transport simulations, for the favourable but not
infrequent case of an initially homogeneous medium, can be achieved simply by scanning the simulation
grid at each time step for repeated occurrences of identical chemical systems; demonstrating this speedup
required only a moderate and thus quick programming effort.

The interface is not yet publicly available on a structured R repository but can be obtained by
contacting the authors.

Acknowledgements

This study was part of the joint research project CLEAN, funded by the German Federal Ministry of
Education and Research (BMBF) within the framework of the geoscientific research and development
programme GEOTECHNOLOGIEN. This is publication GEOTECH-2099.

References

[1] Parkhurst D, Appelo C. User's guide to PHREEQC (version 2). Tech. rep., U.S. Geological Survey, 1999.
[2] Charlton S, Parkhurst D. Modules based on the geochemical model PHREEQC for use in scripting and programming
languages. Computers & Geosciences 2011; 37: 1653-1663.
[3] R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna,
Austria. 2013. ISBN 3-900051-07-0, URL http://www.R-project.org/.
[4] Fox J. Aspects of the Social Organization and Trajectory of the R Project. The R Journal 2009; 1(2): 5-13.
[5] Kühn M, Münch U. CLEAN: CO2 Large-Scale Enhanced Gas Recovery. GEOTECHNOLOGIEN Science Report, No. 19.
Series: Advanced Technologies in Earth Sciences, 199 p, ISBN 978-3-642-31676-0, 2013.
[6] Kühn M, Tesmer M, Pilz P, Meyer R, Reinicke K, Förster A, Kolditz O, Schäfer D, CLEAN Partners. CLEAN: project
overview on CO2 large-scale enhanced gas recovery in the Altmark natural gas field (Germany). Environ. Earth Sci. 2012; 67(2):
311-321. doi: 10.1007/s12665-012-1714-z.
[7] Pitzer KS. Thermodynamics of electrolytes. I. Theoretical basis and general equations. The Journal of Physical Chemistry
1973; 77(2): 268-277.
[8] De Lucia M, Bauer S, Beyer C, Kühn M, Nowak T, Pudlo D, Reitenbach V, Stadler S. Modelling CO2-induced fluid-rock
interactions in the Altensalzwedel gas reservoir. Part I: from experimental data to a reference geochemical model. Environmental
Earth Sciences 2012; 67(2): 563-572.
[9] Morris M. Factorial sampling plans for preliminary computational experiments, Technometrics 1991; 33(2).
[10] Saltelli A, Ratto M, Tarantola S, Campolongo F. Sensitivity Analysis for Chemical Models. Chem. Rev. 2005; 105:
2811-2827.
[11] Pujol G, Iooss B, Janon A. "sensitivity" package, http://cran.r-project.org/web/packages/sensitivity/index.html. 2013.
[12] Soetaert K, Petzoldt T. Inverse Modelling, Sensitivity and Monte Carlo Analysis in R Using Package FME. Journal of
Statistical Software 2010; 33(3): 1-28. http://www.jstatsoft.org/v33/i03/.
[13] Wolery T. EQ3/6, a software package for geochemical modeling of aqueous systems: Package overview and installation
guide (version 7.0) ucrl-ma-110662. Technical Report, Lawrence Livermore National Laboratory. 1992.
[14] Benbow S, Metcalfe R, Wilson J. Pitzer databases for use in thermodynamic modeling. Quintessa Technical Memorandum,
unpublished, 2008.
[15] Wood SN. Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized
linear models. Journal of the Royal Statistical Society (B) 2011; 73(1): 3-36.
[16] Kolditz O, Görke UJ, Shao H, Wang W. Thermo-Hydro-Mechanical-Chemical Processes in Porous Media, Springer, 2012.
