
PETROPHYSICS, VOL. 64, NO. 2 (APRIL 2023); PAGES 287–302; 10 FIGURES, 5 TABLES. DOI: 10.30632/PJV64N2-2023a9

Spatial Data Analytics-Assisted Subsurface Modeling: A Duvernay Case Study


Jose J. Salazar1,5, Jesus Ochoa2, Léan Garland3, Larry W. Lake1, and Michael J. Pyrcz1,4

ABSTRACT

Data analytics facilitate the examination of spatial data sets by using multiple techniques to find and understand patterns to guide decision making. However, standard data analysis tools assume that the data are independent and identically distributed, an assumption that spatial data sets usually do not fulfill. Furthermore, the usual methods neglect spatial continuity and the inherent data paucity that should be considered in the data analytics workflow.

We present a new approach that combines data analytics, geostatistics, and optimization techniques to provide an end-to-end workflow to analyze two-dimensional (2D) data sets. The proposed workflow identifies outliers based on their spatial location or distribution, models geological trends using a Gaussian kernel, models the semivariogram, and performs sequential Gaussian simulation applying kriging or cokriging for cosimulation. Moreover, it provides metrics and diagnostic plots to evaluate the goodness of the results at each step. It is also semiautomatic because it leverages the user's judgment for subsequent operations. For optimization, the workflow uses Bayesian optimization and evolutionary algorithms.

We demonstrate the use of the workflow by analyzing 1,152 wells over the Duvernay Formation in Canada. The examples include the simulation of density-porosity as the secondary feature and the cosimulation of total organic content constrained by the former. The proposed workflow helps focus more on interpreting the results than on the modeling parameters, reducing workforce time and subjective errors. Moreover, the spatial simulation includes multiple realizations to assess uncertainty and support decision making in data paucity scenarios. Overall, the proposed workflow is a valuable and complementary tool for evaluating uncertainty in mature geospatial data.

INTRODUCTION

Given the economic, accessibility, and other difficulties related to data collection, most subsurface studies work with sparse or low-resolution sample data. Sampling campaigns seldom obtain enough samples for statistical representativity because project economic drivers favor delineating regions that are believed to significantly impact extraction and resource exploitation (Pyrcz and Deutsch, 2014). Moreover, subsurface data, having a geological origin, generally exhibit spatial correlation and trends that preclude the application of naïve statistics or off-the-shelf machine-learning algorithms (Lovelace et al., 2019; Liu and Pyrcz, 2021; Salazar et al., 2022). Ignoring the impact of data paucity, the geospatial context, and departures from independent, identically distributed data often results in biased spatial estimates, biased uncertainty models, and, ultimately, suboptimal decision making.

We propose an end-to-end workflow that efficiently combines spatial data analytics, geostatistical knowledge, and optimization methods for geostatistical simulation and diagnostics of spatial data sets to overcome the previously mentioned issues. Robust integration of geostatistics is essential to address these challenges and to provide spatial models with accurate estimates as well as accurate and precise uncertainty models (Pyrcz and Deutsch, 2014; Barnett et al., 2018). We now summarize these prerequisites.

Spatial Mapping Concepts

A critical judgment in geostatistics is deciding whether stationarity is present in the area of interest. The semivariogram, a statistic commonly applied in spatial

Manuscript received by the Editor July 24, 2022; revised manuscript received September 14, 2022; manuscript accepted October 10, 2022.
1 Hildebrand Department of Petroleum and Geosystems Engineering, Cockrell School of Engineering, The University of Texas at Austin, USA, [email protected], [email protected], [email protected]
2 Technology, Digital & Innovation, Equinor US, Houston, USA, [email protected]
3 Exploration & Production International, Equinor ASA, Oslo, Norway, [email protected]
4 Department of Geological Sciences, Jackson School of Geosciences, The University of Texas at Austin, USA, [email protected]
5 Facultad de Ingeniería en Ciencias de la Tierra, Escuela Superior Politécnica del Litoral, ESPOL, Campus Gustavo Galindo Km. 30.5 Vía Perimetral, P.O. Box 09-01-5863, Guayaquil, Ecuador, [email protected]


models, describes the degree of spatial correlation of the variable with itself as half the variance of values separated by a lag vector h. The equation for the semivariogram is:

γ(h) = [1 / (2N(h))] Σᵢ₌₁^N(h) [Z(uᵢ) − Z(uᵢ + h)]²    (1)

where Z is the feature of interest, h is the lag vector that separates the samples, N(h) is the number of available data pairs, and uᵢ are the sample locations.

Geostatistical techniques for spatial estimation (e.g., simple kriging) and spatial simulation (e.g., sequential Gaussian simulation) assume the feature of interest's histogram and variogram are stationary (i.e., invariant under translation in the domain of interest or geographical location). However, it is common for subsurface features to exhibit geological trends (e.g., a local shift in the feature's mean). Regardless of the procedure used to model the trend, it is essential to include geostatistical theory to obtain a reliable trend model (Salazar and Pyrcz, 2021). For instance, for nonstationary means, the standard geostatistical workflow decomposes trend and residual components that account for the variance's additivity (Isaaks and Srivastava, 1989; Pyrcz and Deutsch, 2014). The trend component is deterministic and assumed to be known, and the residual part is considered stochastic and stationary. One technique to model the trend component is convolution using moving window averaging, where the window scans the region of interest and assigns the kernel-weighted average to the centroid of the window.

Given the data paucity and biased sampling commonly encountered in subsurface applications, the computation of an experimental semivariogram entails two caveats (Chilès and Delfiner, 2012; Pyrcz and Deutsch, 2014; Ma, 2019). First, the experimental semivariogram fails to fulfill the semipositive definite condition (specifically, the variogram must be conditionally negative definite). Second, spatial interpolation methods (e.g., kriging and stochastic simulation) require variogram values at any lag distance or direction. Therefore, geoscientists and engineers must fit a positive definite variogram model guided by experimental values and expert judgment to characterize the feature's spatial continuity. Although simple kriging is an exact interpolator at sample locations, spatial mapping using simple kriging conveys maps that are too smooth (Pyrcz and Deutsch, 2014). The smoothness arises because the kriging estimates are locally optimum (Ma, 2019), and the smoothing is directly proportional to the kriging variance. Moreover, kriging maps are deterministic, making them incompatible with uncertainty analysis (Jensen et al., 2000; Journel et al., 2000).

Sequential Gaussian simulation overcomes the kriging limitations by 1) creating stochastic realizations that reproduce the global distribution, 2) replicating the spatial continuity from Step 1, and 3) honoring the data at the sample locations (Pyrcz and Deutsch, 2014; Ma, 2019). First, the feature is transformed to standard normal, and a random path through all the cells of interest is assigned. Then, at each cell, Monte Carlo simulation is applied to draw a local realization from a Gaussian distribution composed of the simple kriging estimate and the kriging variance to ensure the reproduction of the variance. Third, the simulated values are sequentially added to the data to reproduce the correct autocovariance between the simulated realizations. Finally, once all model cells have been visited along the random path, the realizations are back-transformed to the original feature distribution.

Modeling the subsurface often involves analyzing more than one feature. A common practice is to cosimulate the primary feature of interest constrained to a secondary feature. One cosimulation method is collocated cokriging, which prioritizes the reproduction of the primary feature's histogram and variogram while maintaining Pearson's correlation coefficient with the collocated secondary feature. Cosimulation uses the Markov-Bayes assumption to avoid the computation of the secondary variogram for cokriging. Under the Markov screening assumption, only the collocated secondary feature is considered significant. Bayesian updating calculates the cross variogram by rescaling the primary feature variogram by the correlation coefficient (Deutsch and Journel, 1997; Pyrcz and Deutsch, 2014). The cost of the Markov screening may be a larger (i.e., inflated) variance for the simulated realizations. Thus, an ad hoc solution multiplies the distributions by a variance reduction factor (Deutsch and Journel, 1997). Nonetheless, the results are better than simulating the variables independently because of the replication of the linear relationship. The spatial data analytics could benefit from using optimization techniques to select the best modeling parameters and remove feature outliers to obtain more reliable and less variable results.

Optimization Techniques

Geostatistical modeling requires careful model parameter selection in different workflow steps (e.g., variogram modeling, simulation). Moreover, the multivariate nature and geostatistical constraints add complexity to the solution space, requiring optimization methods to solve the problem. Optimization involves a criterion function that scores the quality of a solution and requires a search algorithm to maximize or minimize the criterion function.
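To make Eq. 1 concrete, here is a minimal NumPy sketch of an isotropic experimental semivariogram, together with the average 1-nearest-neighbor distance that the workflow later uses as a data-driven lag spacing. The function names, the lag-tolerance binning, and the inputs are ours for illustration; this is not the paper's implementation.

```python
import numpy as np

def experimental_semivariogram(coords, values, lags, tol):
    """Isotropic experimental semivariogram, Eq. 1:
    gamma(h) = 1/(2 N(h)) * sum of squared differences over the
    N(h) data pairs separated by approximately h."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # pairwise separation distances and squared feature differences
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)  # count each pair once
    d, sq = d[iu], sq[iu]
    gamma = []
    for h in lags:
        pairs = np.abs(d - h) <= tol        # pairs falling in this lag bin
        gamma.append(0.5 * sq[pairs].mean() if pairs.any() else np.nan)
    return np.array(gamma)

def mean_nn_distance(coords):
    """Average 1-nearest-neighbor Euclidean distance: a data-driven
    default lag spacing for the experimental semivariogram."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)             # ignore self-distances
    return d.min(axis=1).mean()
```

For three collinear wells at x = 0, 1, 2 with values 0, 1, 2, the lag-1 semivariogram is 0.5 and the lag-2 value is 2.0, matching a hand calculation of Eq. 1.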


However, it is infeasible to explore all of the solution space for some problems, where it is necessary to settle for a near-optimal solution. Ubiquitous near-optimal optimization methods include evolutionary algorithms and Bayesian optimization.

Evolutionary algorithms are nature-inspired algorithms based on natural selection and survival of the fittest (Burke and Kendall, 2014). First, the algorithms initialize a population of solutions (randomly or carefully initialized), and the number of individuals is a parameter defined by the user. Each model parameter of an individual is represented as a gene, and the combination of model parameters as a chromosome. Second, the algorithms evaluate each individual's fitness using a criterion function. Third, the methods perform selection, discarding those solutions whose fitness is unsatisfactory. Fourth, the methods recombine the surviving individuals' genes, emulating reproduction, to obtain better solutions (i.e., offspring). Fifth, the methods can randomly modify an individual's genes by mutation, with a recommended probability smaller than that of recombination. Finally, the new population obtained from Steps 3 to 5 replaces the starting population. For a more detailed explanation of the different methods for evolutionary algorithms, the authors refer the reader to Chopard and Tomassini (2018).

Bayesian optimization approximates the criterion function using a surrogate function that is cheaper to evaluate and is based on sampled data. Potential regions that optimize the criterion function can be identified, and therefore, more samples are drawn from those promising regions to refine the surrogate model. Moreover, the optimization uses an acquisition function that suggests the next parameters to evaluate (Archetti and Candelieri, 2019). An often-used surrogate model is a Gaussian process, a probability distribution over functions defined by a mean function and a covariance function (i.e., the kernel). Therefore, the values of the criterion function are treated as realizations from a multivariate Gaussian process, and any unobserved parameter combination has a corresponding mean (prediction) and variance (uncertainty). The acquisition function offers a tradeoff between exploration and exploitation based on the predictions and uncertainties. Specifically, it searches for the parameters most likely to improve the criterion function's value. The optimization then updates the surrogate model based on the acquisition function's suggestions and repeats until convergence.

Outlier Identification

The proposed workflow relies on five outlier identification algorithms that have proven reliable for similar tasks and data sets. The algorithms suggest removing samples that could negatively impact the statistical analysis (e.g., the variogram calculation in Eq. 1) and, therefore, the model predictions. First, the Mahalanobis distance measures the dissimilarity among samples by converting the original features to standard space, removing apparent correlations and differences in standard deviations (Gyebnár et al., 2019). Therefore, a Euclidian distance in the new standard space indicates a sample's remoteness with respect to the original multivariate distribution. The second algorithm classifies samples three standard deviations away from the mean as outliers. The third algorithm, the isolation forest, categorizes an instance as an outlier by measuring its predisposition to being isolated (i.e., separated from the remaining samples) (Liu et al., 2012). An outlier requires fewer partitions than non-outliers and is more likely to become isolated early in the partition process of most decision trees in a random forest.

The fourth algorithm is the elliptic envelope, which assumes the data follow a Gaussian distribution. It requires the percentage of outliers as a hyperparameter and fits an ellipse to the central samples using a robust covariance estimate. Then, the algorithm measures the distance of each sample to the central samples to estimate its outlier degree (Thomas and Judith, 2020). Finally, the last algorithm is the local outlier factor. It uses the distances from each sample to its k-nearest neighbors to compute a local density, and it classifies as outliers those samples with a smaller local density than their k-nearest neighbors (Breunig et al., 2000).

Duvernay Formation

The Duvernay Formation was deposited in a sub-equatorial epicontinental seaway during the Late Devonian, Frasnian time, at the maximum transgression of this Late Devonian sea into the western Canadian craton (Fig. 1). The paleo-lows (within the Leduc reefs' boundaries) contain shale deposited in a slope and basin environment. The outcome is a series of sub-basin deposits, from the west to the east shale basins. The depth ranges from 2,000 to 3,700 m, and the formation produces across the entire oil, condensate, and gas windows. The west shale basin harbors the greater Kaybob area, where the Duvernay Formation is the thickest, thinning to the east. The mineralogy also changes from west to east. For example, the east shale basin is less quartz-rich, with higher clay and carbonate content than the west, a more silica-rich region. In addition, there is a robust correlation between the silica content and the TOC (total organic content). Finally, most Duvernay production comes from the west shale basin, specifically the most developed Kaybob area.
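The feature-space detectors described in the Outlier Identification section above (three-standard-deviation rule, isolation forest, elliptic envelope, local outlier factor) can be run side by side for comparison. The scikit-learn sketch below is illustrative only: the estimator settings (contamination, n_neighbors) and the function name are our assumptions, not the paper's configuration, and the Mahalanobis screen of the spatial coordinates is omitted.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.covariance import EllipticEnvelope
from sklearn.neighbors import LocalOutlierFactor

def flag_outliers(X, contamination=0.05):
    """Run several detectors on feature matrix X (n_samples, n_features)
    and return a dict of boolean outlier masks for comparison."""
    X = np.asarray(X, dtype=float)
    flags = {}
    # 1) three-standard-deviation rule, applied per feature
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    flags["3sigma"] = (z > 3).any(axis=1)
    # 2) isolation forest: outliers are easier to isolate in random trees
    flags["iforest"] = IsolationForest(
        contamination=contamination, random_state=0).fit_predict(X) == -1
    # 3) elliptic envelope: robust Gaussian fit, distance to the center
    flags["envelope"] = EllipticEnvelope(
        contamination=contamination).fit_predict(X) == -1
    # 4) local outlier factor: low local density vs. k-nearest neighbors
    flags["lof"] = LocalOutlierFactor(
        n_neighbors=20, contamination=contamination).fit_predict(X) == -1
    return flags
```

Comparing the masks (e.g., their agreement on each sample) mirrors the workflow's idea of suggesting, rather than silently removing, candidate outliers for the user's judgment.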


The primary Duvernay data set contains 1,152 wells and five features: unique well identifier, easting and northing coordinates, TOC (wt%), and density-porosity (%). Density-porosity (PHID) is a porosity measurement log based on density tools that radiate gamma rays into the formation. The gamma rays collide with electrons in the formation and disperse after subsequent collisions. The number of collisions is related to the electron density, which directly relates to the bulk density. Hence, the density tools measure the bulk density, a combined effect of the fluid and rock, to compute PHID (Liu, 2017).

The property values used at the easting and northing coordinates are the averages of each property over each interval in the well path. Although the property is averaged at the surface location, it is representative of that interval in the subsurface. This simplification assumes the landing interval of interest is continuous and the property variations in space are minimal. Other suitable mapping techniques use the bottomhole location instead of the surface location or project the midpoint of the reservoir interval where the average property is taken. Table 1 details the descriptive statistics of the two petrophysical properties; Figure 2 shows their scatterplot matrix.

Fig. 1—Duvernay Formation map showing computer-processed interpretation wells (CPI) in red and production wells in green.

Fig. 2—Matrix scatterplot of the TOC (wt%) and PHID log (%). A strong linear relationship exists between the two features (Pearson correlation coefficient = 0.85).

Previous Work and the Proposed Method

Previous work at the Duvernay Formation includes geostatistics, machine learning, and optimization. Barnett et al. (2018) discuss principles for the geostatistical analysis of the relationship between sample spacing and uncertainty. One of their conclusions is that simulation helps quantify the uncertainty of net shale, average porosity, and average TOC. Shen et al. (2019) use kriging to obtain 3D models of horizontal compressions and pore pressures from 57 wells to develop a predictive model of the complete set of stress components within the Duvernay. Similarly, Weides et al. (2013) evaluate the spatial distribution of porosity and permeability with ordinary kriging for geothermal exploration applications. Furthermore, they recommend simulation because it accounts for the uncertainty and provides different realizations of the petrophysical properties.

Hamdi et al. (2021) perform Bayesian history matching as an optimization problem. They accelerate the Markov-Chain Monte Carlo sampling of the posterior using kriging and an adaptive sampling algorithm. Their results optimized the huff 'n' puff process in a Duvernay well. Although previous

Table 1—Descriptive Statistics of 1,152 Wells From the Duvernay Formation
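The per-well averaging described above can be sketched in pandas. The column names and the toy readings below are hypothetical; the point is that each well collapses to one record carrying its surface coordinates and the interval-averaged properties, as in the simplification the text describes.

```python
import pandas as pd

# Hypothetical log samples: several readings per well along the landing interval.
logs = pd.DataFrame({
    "well_id":    ["W1", "W1", "W1", "W2", "W2"],
    "easting":    [100.0, 100.0, 100.0, 250.0, 250.0],
    "northing":   [50.0, 50.0, 50.0, 80.0, 80.0],
    "toc_wt_pct": [3.2, 3.6, 3.4, 2.0, 2.4],
    "phid_pct":   [5.0, 6.0, 5.5, 4.0, 4.4],
})

# Collapse each well to one record: average each property over the interval
# and keep the (surface) coordinates.
wells = (logs.groupby("well_id", as_index=False)
             .agg(easting=("easting", "first"),
                  northing=("northing", "first"),
                  toc_wt_pct=("toc_wt_pct", "mean"),
                  phid_pct=("phid_pct", "mean")))
```

The resulting `wells` table has one row per well, which is the shape of the 1,152-well data set the workflow consumes.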


research highlights the importance of quantifying uncertainty with geostatistics, there is a lack of end-to-end workflows that integrate data analytics and geostatistics and streamline them with optimization to assist with the inference.

We propose a simplified and semiautomated workflow that leverages robust spatial data analytics, geostatistics theory and practice, and heuristics for geostatistical modeling, from a two-dimensional (2D) data set to spatial mapping. The proposed workflow assists geoscientists and engineers in identifying spatial and feature outliers, modeling geological trends if present, modeling the semivariogram, obtaining multiple realizations from simulation or cosimulation, and delivering diagnostic summary visualizations of the results. All the steps are subject to thorough geostatistical and statistical model diagnostics and checking. Furthermore, the developed code includes an optimization method to suggest a variance reduction factor (i.e., a model parameter) that yields a variance from the realizations similar to that of the input samples. The main advantages of the proposed workflow are:
• It replaces a single best estimate with quantitative representations of uncertainty through multiple realizations, bringing geostatistical model-building methods to the regional scale (i.e., a 2D map).
• It integrates the linear correlation between primary and secondary features through cosimulation.
• It reduces workforce time using a semiautomatic approach through optimization. Therefore, the overall process is efficient and reproducible, and errors are less subjective.
• The final results are compatible with other commercial software.

The workflow is available as Python code on GitHub for reproducible research. The following section specifies the geostatistical, data analytics, and optimization methods used. Then, the results and discussion section shows the proposed workflow's findings in the 1,152 wells from the Duvernay Formation and analyzes the outcomes obtained. Lastly, the conclusions break down the most critical findings and offer suggestions for future work.

METHODOLOGY

Our proposed method has the following steps:
1. Identify and remove outliers based on their spatial coordinates and feature values.
2. Plot the experimental variograms to detect trend structures quantitatively.
   a. If a trend is present, model it using a moving Gaussian window and evolutionary algorithms.
3. Identify the direction of major spatial continuity using Bayesian optimization.
4. Model the variogram using evolutionary algorithms.
5. Perform sequential Gaussian simulation:
   a. If modeling an unconstrained feature, the required inputs are the number of realizations and the variogram model.
   b. For cosimulation, the algorithm also requires realizations of the secondary feature. If necessary, optimize the variance reduction factor using Bayesian optimization.
6. Produce the proposed diagnostic visualizations:
   a. Model checking to assess the minimum acceptance criteria.
   b. Result summaries: visualize 2D maps of the P10, P50, P90, uncertainty, expectation, and trend array (if present) of the realizations. Also, the local probability of exceedance map computes the probability of surpassing a specific threshold at all locations.

Step 1 is optional but recommended. The proposed workflow uses the Mahalanobis distance and a confidence interval to suggest samples that could be outliers based on their spatial location. It identifies candidate data for removal far from the central cluster of samples, contributing to automatically computing an optimal lag distance. The lag distance is the average of the 1-nearest-neighbor distances using the Euclidian distance. The filtering of outliers from the sample data results in reliable experimental semivariograms due to the sensitivity of the squared feature difference in the variogram calculation (Eq. 1). Regarding the feature of interest, the proposed workflow runs four outlier identification algorithms for comparison: standard deviation, isolation forest, elliptic envelope, and the local outlier factor; however, the last three require an estimate of the percentage of outliers. The user can select any previous result to update the subsurface data set.

In Step 2, plotting the experimental semivariograms indicates the presence of trends. If the decision is to model a spatial trend, the user must compute the declustered mean and variance before trend modeling. Among the multiple


available kernels, we select the Gaussian kernel because it gives more importance to data closer to the centroid than to those further away, a common consideration for spatial prediction (Pyrcz and Deutsch, 2014). Figure 3 compares the PHID trend maps for the Duvernay using Gaussian, Ricker wavelet, top hat, and box kernels. Moreover, Table 2 describes the metrics used in the proposed workflow to evaluate the quality of the trend map. Although the metrics show similar errors and variance ratios, the resulting trend maps are visually different. The Gaussian kernel has superior smoothing properties compared to the top hat and box kernels; the top hat kernel is isotropic, whereas the box kernel is anisotropic, and both kernels produce spatial artifacts because of sharp changes in data weighting over the prediction location. The Ricker wavelet kernel has good smoothing properties but yields negative values around the data clusters (here, replaced with null values for visualization purposes).

The optimization finds the near-optimal θ₂* dimensions of the Gaussian window and the rotation angle in degrees (0° on the y-axis). The dimensions are the standard deviations of the Gaussian kernel, expressed in grid cells. For the evolutionary algorithm, we require the number of initial solutions and generations to optimize the criterion functions. Additionally, the dimension bounds of the Gaussian kernel must be defined. Finally, the criterion function combines two absolute percentage error functions. Equation 2 optimizes the mean of the trend, whereas Eq. 3 aims to reduce the variance error from the splitting into trend and residual components:

e_mean = |(μ_declus − μ_T) / μ_declus|    (2)

e_var = |(σ_Z² − σ_T² − σ_R² − 2C_R-T(0)) / σ_Z²|    (3)

Table 2—Goodness Metrics for PHID Trend Maps for the Duvernay Formation Using Four Kernels

Fig. 3—Comparison of PHID trend maps for the Duvernay Formation using four kernels.
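A minimal sketch of the Gaussian-kernel trend model described above: each grid node receives a kernel-weighted average of the data, with anisotropic standard deviations (sx, sy). The rotation angle that the optimization also tunes is omitted for brevity, and the function name and interface are ours, not the paper's code.

```python
import numpy as np

def gaussian_trend(coords, values, grid_xy, sx, sy):
    """Gaussian-kernel moving-window average (trend model sketch).
    coords: (n, 2) data locations; values: (n,) data values;
    grid_xy: (m, 2) prediction locations; sx, sy: kernel std. devs."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)
    # separation of each grid node from each datum, per axis
    dx = grid_xy[:, None, 0] - coords[None, :, 0]
    dy = grid_xy[:, None, 1] - coords[None, :, 1]
    # Gaussian weights decay with anisotropic standardized distance
    w = np.exp(-0.5 * ((dx / sx) ** 2 + (dy / sy) ** 2))
    # weighted mean of the data at each grid node
    return (w * values[None, :]).sum(axis=1) / w.sum(axis=1)
```

Because the weights are strictly positive and normalized, the trend stays within the data range, which is one reason the Gaussian kernel avoids the negative-value artifact noted for the Ricker wavelet.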


where μ_declus is the declustered mean of the feature of interest and μ_T is the mean of the proposed trend. Moreover, σ_Z² (constant), σ_T², and σ_R² are the variances of the primary feature, trend, and residuals at the sample locations, and C_R-T(0) is the covariance between the trend and residuals. The near-optimal solution θ₂* is the one that minimizes both Eqs. 2 and 3. In other words, the goal is to minimize Eq. 4:

θ₂* = argmin over θ₂ of (Eq. 2 + Eq. 3)    (4)

where θ₂* is a three-dimensional vector containing the near-optimal window sizes and rotation angle; its subscript refers to Step 2. Once the algorithm finds an optimal solution, diagnostics are provided, including the residual and trend correlation, the variance ratio of the trend to the feature, C_R-T(0), the trend mean percentage error, the residual mean, and the final variance loss.

Step 3 uses Bayesian optimization to find the azimuth of maximum spatial continuity. The required inputs are the number of trials and the azimuth bounds. First, the optimization suggests an azimuth within the bounds, and then the algorithm computes the experimental semivariogram in that direction. The criterion function (Eq. 5) finds a near-optimal azimuth by maximizing the range r subject to the restriction that it must be the first γ point that intercepts the sill; that is, where γ_major(r) = 1. However, if zonal anisotropy is present, the method computes the shortest intercept for all those azimuths that depict zonal anisotropy. Once the Bayesian optimization converges, the variogram is ready for modeling.

azimuth* = argmax over azimuth of {r : γ_azimuth(r) = 1}    (5)

Step 4 models the 2D variogram with up to two nested structures, including the nugget effect, using the GeostatsPy format (Pyrcz et al., 2019), requiring eight or 12 modeling parameters if one or two structures are used. For clarification, a nested structure is a variogram function that provides a variance contribution in variogram modeling (e.g., a Gaussian model with a variance contribution of 0.25) and is a prerequisite for spatial mapping. On the other hand, a feature is a measurable property of the data (e.g., PHID or TOC). Table 3 shows the 12 parameters, variable types, and feature space. Given that the near-optimal azimuth and the major h_maj and minor h_min ranges of spatial continuity were obtained in Step 3, those values are fixed and require no further optimization. Consequently, the feature space θ₄ is reduced to seven dimensions by including the variance contribution of the second structure.

The evolutionary algorithm proposes a starting population and optimizes θ₄ using two criterion functions. Equation 6 is a geostatistical constraint that guarantees that the nugget effect and the variance contributions add up to one. Equation 7 evaluates the goodness of fit of the model using the proposed θ₄* parameters and the experimental variogram values (Li et al., 2018),

c₀ + c₁ + c₂ = 1    (6)

(7)

where N is the number of lags of the experimental variogram; γ(hᵢ, θ₄*) and γ̂(hᵢ) are the modeled and experimental variogram values corresponding to the i-th lag; c₀ is the nugget effect; and c₁ and c₂ are the variance contributions of the two nested structures.

Table 3—Feature Space θ₄, Ranges, and Details Required for Semivariogram Modeling Assuming Two Structures
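To illustrate Step 4's constrained fit, the sketch below swaps the paper's evolutionary algorithm for SciPy's differential evolution (itself an evolutionary method) and fits a nugget plus one spherical structure. The unit-sill constraint (Eq. 6) is enforced by parameterization (the structure's contribution is 1 − c₀), and the pair-count weighting is a simplified stand-in for the weighting factor of Eq. 8; none of this is the paper's exact configuration.

```python
import numpy as np
from scipy.optimize import differential_evolution

def spherical(h, c, a):
    """Spherical variogram structure with contribution c and range a."""
    h = np.asarray(h, dtype=float)
    s = np.where(h < a, 1.5 * h / a - 0.5 * (h / a) ** 3, 1.0)
    return c * s

def fit_variogram(h_exp, g_exp, npairs, max_range):
    """Fit nugget + one spherical structure to experimental points by
    minimizing a pair-count-weighted squared misfit; the sill is forced
    to one so the Eq. 6 analogue (c0 + c1 = 1) always holds."""
    g_exp = np.asarray(g_exp, dtype=float)
    w = np.asarray(npairs, dtype=float) / np.sum(npairs)  # simple weights

    def misfit(theta):
        c0, a = theta                               # nugget, range
        model = c0 + spherical(h_exp, 1.0 - c0, a)  # unit sill by design
        return np.sum(w * (model - g_exp) ** 2)

    res = differential_evolution(misfit,
                                 bounds=[(0.0, 1.0), (1e-6, max_range)],
                                 seed=0, tol=1e-10)
    return res.x  # (nugget, range)
```

Fitting synthetic experimental points generated from a known model recovers the nugget and range closely, which is a useful sanity check before applying any such fitter to real data.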


σ_E is the standard deviation of all the experimental variogram values, and θ4* is the near-optimal solution space for Step 4. w(h_i) is a weighting factor to express the importance of the i-th squared difference between two variogram values. The factor used here is the same as Li et al. (2018):

w(h_i) = n(h_i) / Σ_{j=1..N} n(h_j),    (8)

where n(h_i) is the number of sample pairs for the i-th lag. In short, the optimization's goal (Eq. 9) is to satisfy Eq. 6 and minimize Eq. 7:

θ4* = argmin_{θ4} Σ_{i=1..N} w(h_i) [ γ(h_i, θ4) − γ̂(h_i) ]², subject to Eq. 6.    (9)

Step 5 focuses on spatial simulations. If the cosimulations yield a higher variance than the original input, the proposed workflow can optimize the variance reduction factor (varred) using Bayesian optimization. Equation 10 is the root mean squared error between the variance of the samples σ²_Z and the variance of the cosimulated feature σ²_cosim at the well locations:

RMSE = sqrt( (1/L) Σ_{l=1..L} ( σ²_cosim,l − σ²_Z )² ),    (10)

where l indexes the L cosimulated realizations.

Finally, Step 6 includes visualization tools for model checking and result reporting. The model-checking section follows the minimum acceptance criteria developed by Leuangthong et al. (2004). Minimum acceptance aims to reproduce: (i) the data values at their location, (ii) the distribution of the variable of interest, (iii) the spatial continuity of the input variogram, and (iv) the bivariate correlation for cosimulations. The first criterion is fulfilled because the initialization of the grid starts by assigning the data values at their location without further resimulation at those locations (Pyrcz and Deutsch, 2014). Next, the distribution reproduction is evaluated using histograms, empirical cumulative distribution functions, and q-q plots. Lastly, the proposed workflow plots the variogram model and variogram realizations for the spatial continuity reproduction criteria, both being the residuals in Gaussian space. The check uses the Gaussian residual space because it is the required input for simulation.

The proposed workflow uses open-source packages like GeostatsPy for geostatistical analysis (Pyrcz et al., 2019) and scikit-learn for outlier identification (Pedregosa et al., 2012). Moreover, the proposed workflow also uses the adaptive experimentation platform for Bayesian optimization (Bakshy et al., 2018) and Distributed Evolutionary Algorithms in Python for evolutionary algorithms (Fortin et al., 2012). The specific selection of one optimization technique over another (evolutionary algorithms or Bayesian optimization) is determined by a computational speed criterion after comparing them head to head. For example, multiple trials, cheap evaluations, and an extensive feature space are ideal for evolutionary algorithms (trend modeling, variogram modeling). On the other hand, computer-intensive calculations and a smaller feature space work well using Bayesian optimization (direction of spatial continuity, variance reduction factor). The authors follow a similar approach to suggest outlier identification algorithms and the criterion functions at different steps in the proposed workflow.

Additionally, we follow a geostatistical approach that uses all the available data for spatial mapping and performs a quality check of the predictions by comparing the spatial continuity and histogram reproduction. Nevertheless, the workflow could use the validation set approach by splitting the data into train and test sets or k-fold cross validation (James et al., 2013). However, the previous methods ignore the spatial autocorrelation of the data. To address the issue, the modeler can implement spatial cross validation (Lovelace et al., 2019), the jackknife (Pyrcz and Deutsch, 2014), or the fair train-test split (Salazar et al., 2022) if there is a specific test pattern for evaluation. The proposed workflow is compatible with any validation method because it only requires the coordinates and feature values.

RESULTS AND DISCUSSION

The proposed workflow is demonstrated on 1,152 wells over the Duvernay Formation, selecting TOC as the primary feature constrained by PHID as the secondary feature. Furthermore, we use a cell size of 1,000 m because it provides a tradeoff between detailed simulations and faster computation times; it is similar to the average distance between wells and their closest neighbors.

Data Preprocessing and Feature Ranking
The original data set has nine potential predictor features, and because the Markov-Bayes assumption requires a single secondary feature, we perform feature ranking. First, we standardize all predictor features to evaluate linear relationships and confirm the lack of correlated features. Then, we compute Pearson's correlation, rank correlation, partial correlation, and semipartial correlation coefficients; moreover, we use least absolute shrinkage and selection operator (LASSO) regression for feature ranking (James et al., 2013). As a result, PHID is the feature with the largest


linear correlation and the top predictor for TOC in all five analyses. Next, we use mutual information to evaluate the nonlinear relationship (Hastie et al., 2009), concluding that PHID reduces the uncertainty of TOC prediction. Finally, the use of SHapley Additive exPlanations (SHAP) values elucidates that PHID is the feature that contributes the most when predicting TOC with a gradient boosting tree model. Hence, the workflow can confidently use PHID as a predictor feature of TOC.

Outlier Detection
There are no spatial outliers for the primary and secondary features (i.e., TOC and PHID) because most wells are drilled in the developed Kaybob area, forming dense well clusters. On the other hand, we follow a conservative approach to remove value outliers for PHID using the standard deviation method because it eliminates fewer samples than the other algorithms (i.e., 0.35% against 0.52%). More specifically, it removes large values that could impair the semivariogram computation (Eq. 1). Therefore, the resulting data set is ready for cosimulation, and no additional outlier removal is required. However, additional outlier removal in TOC might delete a unique secondary-primary pair that realizations will fail to replicate.

Trend Detection and Modeling
The directional semivariograms and the variogram maps depict trend structures for the secondary and primary features. For instance, trend structures appear for PHID in the 120° to 135° range, whereas for TOC, a combination of trend structures and cyclicity is present. Next, we compute the declustered mean, resulting in corrections of –13.04% and 4.11% for the secondary and primary features. The corrections imply that most wells are drilled in large-PHID and small-TOC regions.

Then, the trend modeling for both features has the following configuration: a starting population of 20 candidate solutions, eight generations, 12 offspring solutions, and a selection of the best eight solutions for the next generations. Table 4 provides the optimized trend model results for both features and five metrics to evaluate their goodness. The mean error for trend and residuals at the sample locations is close to the declustered mean and zero. The variance ratio of the trend to the input feature confirms the correct modeling of the trend, avoiding overfitting or underfitting. Moreover, the covariance ratio is smaller than 15%, reducing the significance of artifacts that may occur (Pyrcz and Deutsch, 2014).

Next, the proposed workflow requires a variogram model in Gaussian space; therefore, it automatically transforms the residuals. Furthermore, subsequent operations occur in the residual Gaussian space unless stated otherwise.

Major Direction of Spatial Continuity and Variogram Modeling
Bayesian optimization is an iterative process that involves exploration and exploitation phases. First, we run multiple optimizations for the exploration phase, setting the azimuth bounds to [0, 180] to escape local minima and find an optimal region of azimuths that yields large spatial continuity values. For example, the best azimuth region for PHID is 140° to 155°, whereas for TOC it is 110° to 125°. Then, the exploitation phase seeks the near-optimal solution in the previous region. Once the optimization converges, the workflow will use the near-optimal azimuth and its corresponding major and minor ranges. Furthermore, it suggests a lag distance as input for the variogram modeling in the next step.

The chosen configuration for variogram modeling is a starting population of 100 individuals, 14 generations, the computed lag distance from the previous step, the selection of the top 40 individuals, and the generation of 75 offspring. Equation 9 is computationally inexpensive, allowing the workflow to start with a population of 100 variogram model solutions. Table 5 lists the optimized variogram models for the secondary and primary features.
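As an illustration of the weighted fitting in Eqs. 7 to 9, the sketch below fits a single spherical structure to a synthetic experimental variogram; the pair-count weights and the brute-force scan (standing in for the evolutionary search) are assumptions, not the paper's exact implementation.

```python
import numpy as np

def spherical(h, a, c=1.0):
    """Spherical variogram structure with range a and sill contribution c."""
    s = np.clip(np.asarray(h, dtype=float) / a, 0.0, 1.0)
    return c * (1.5 * s - 0.5 * s**3)

def weighted_sse(a, lags, gamma_exp, npairs):
    """Eq. 7-style objective: pair-count-weighted squared misfit (the exact
    weight definition follows Li et al., 2018; this form is an assumption)."""
    w = npairs / npairs.sum()
    return float(np.sum(w * (spherical(lags, a) - gamma_exp) ** 2))

# Synthetic experimental variogram generated from a "true" range of 5,000 m.
rng = np.random.default_rng(7)
lags = np.arange(500.0, 10_001.0, 500.0)
npairs = rng.integers(50, 500, size=lags.size)
gamma_exp = spherical(lags, 5_000.0) + rng.normal(0.0, 0.01, lags.size)

# Brute-force scan over candidate ranges stands in for the evolutionary loop.
grid = np.arange(1_000.0, 10_001.0, 100.0)
a_star = min(grid, key=lambda a: weighted_sse(a, lags, gamma_exp, npairs))
print(a_star)  # close to the true 5,000 m range
```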

Table 4—Trend Modeling Results and Metrics for Trend Modeling Optimization
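The goodness metrics reported in Table 4 can be computed directly from a feature and its fitted trend. The definitions below (residual mean, trend-to-feature variance ratio, and residual-trend covariance ratio) are assumptions in the spirit of Pyrcz and Deutsch (2014), evaluated on synthetic data.

```python
import numpy as np

def trend_diagnostics(z, trend):
    """Table 4-style diagnostics: mean of the residuals, the trend-to-feature
    variance ratio, and a residual-trend covariance ratio (definitions are
    assumptions, not the paper's exact formulas)."""
    residual = z - trend
    return {
        "residual_mean": float(residual.mean()),
        "variance_ratio": float(trend.var() / z.var()),
        "covariance_ratio": float(np.cov(residual, trend)[0, 1] / z.var()),
    }

# Synthetic well values: a smooth trend plus uncorrelated residual noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 500)
trend = 4.0 + 3.0 * x                      # deterministic trend component
z = trend + rng.normal(0.0, 0.5, x.size)   # observed feature at the wells

d = trend_diagnostics(z, trend)
# Expect: residual mean near 0, variance ratio below 1 (no overfitting),
# and a small residual-trend covariance ratio (few artifacts).
```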


Table 5—Variogram Model Results

Simulation
Figure 4 compares the histograms (with 15 bins) and empirical cumulative distribution function (ECDF) of the input feature (red) and 30 realizations (gray) for PHID. The realizations correctly replicate the ECDF in the 4 to 6% PHID region and the overall distribution shape. Nevertheless, the simulations fail to replicate the distribution in the bins of 3 to 5% and 6 to 8%, producing fewer values than expected within those ranges. Moreover, the realizations output smaller values than the smallest input value of 1.23%, although the method is robust in avoiding nonphysical PHID (negative tails). Finally, Fig. 5 confirms the correct replication of the input variogram model (Table 5) for the 90° and 135° azimuths, complying with Leuangthong et al.'s (2004) work.

Given that all realizations are equiprobable, it is feasible to assess the uncertainty of the results. Figure 6 shows PHID's P10, P50, and P90 maps, and Fig. 7 shows the probability of exceeding 7.5% for all realizations. Both figures show that the trend computation agrees with the local magnitudes (small or large) of PHID. For instance, 12 wells heavily influence a sizeable PHID region in the middle of the Duvernay Formation, and all three percentile maps replicate the high values. Similarly, two PHID regions divide the northwest: wells with PHID smaller than 7.5% influence the northernmost area (blue in Fig. 6), whereas wells with more than 7.5% impact the southern region (yellow in Fig. 6). Thus, the realizations reproduce high and low PHID regions.

Fig. 4—Histogram and empirical cumulative distribution function of input data (red) and realizations (gray) for PHID.
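The distribution-reproduction check behind Fig. 4 can be summarized by one number per realization, e.g., the maximum gap between the input and realization ECDFs (a KS-style statistic); the PHID values and realizations below are synthetic stand-ins.

```python
import numpy as np

def ecdf_distance(data, realization):
    """Maximum ECDF gap (a KS-style statistic) between input data and one
    realization -- a quick numeric check of distribution reproduction."""
    grid = np.sort(np.concatenate([data, realization]))
    f1 = np.searchsorted(np.sort(data), grid, side="right") / data.size
    f2 = np.searchsorted(np.sort(realization), grid, side="right") / realization.size
    return float(np.abs(f1 - f2).max())

# Hypothetical stand-ins: input PHID values and 30 simulated realizations
# drawn from the same distribution, so the gaps should be small.
rng = np.random.default_rng(1)
phid = rng.normal(7.0, 1.5, 1_000)
reals = [rng.normal(7.0, 1.5, 1_000) for _ in range(30)]

worst = max(ecdf_distance(phid, r) for r in reals)
# A small worst-case gap indicates the realizations honor the input histogram.
```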


Fig. 5—Comparison of the variogram model (red) and the experimental realizations (gray) for PHID.
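The spatial-continuity check in Fig. 5 amounts to recomputing experimental variograms on each realization and overlaying them on the input model. A simplified sketch on a regular 1D transect, with synthetic normal-score values, is:

```python
import numpy as np

def experimental_variogram(values, max_lag):
    """1D experimental semivariogram along a regular transect:
    gamma(h) = mean of 0.5 * (z(x) - z(x + h))^2 for each integer lag h."""
    return np.array([0.5 * np.mean((values[h:] - values[:-h]) ** 2)
                     for h in range(1, max_lag + 1)])

# Synthetic check: uncorrelated normal-score values on a long transect.
rng = np.random.default_rng(4)
z = rng.standard_normal(5_000)
gam = experimental_variogram(z, 10)
# For uncorrelated data the experimental points scatter around the unit sill.
```

In the workflow, GeostatsPy computes 2D directional variograms of the realizations along the major and minor directions instead of this 1D simplification.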

Fig. 6—Uncertainty maps (P10, P50, and P90 maps) of PHID.
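Percentile and exceedance maps like Figs. 6 and 7 are cell-wise statistics over the stack of realizations; in this sketch P10/P90 are taken as the 10th/90th percentiles, and the realization stack is synthetic.

```python
import numpy as np

# Stack of simulated maps: 30 realizations on a (ny, nx) grid (synthetic).
rng = np.random.default_rng(2)
reals = 7.0 + rng.normal(0.0, 1.0, size=(30, 50, 60))  # e.g., PHID in %

# Percentile maps (Fig. 6 style): cell-wise quantiles across realizations.
p10, p50, p90 = np.percentile(reals, [10, 50, 90], axis=0)

# Exceedance map (Fig. 7 style): fraction of realizations above a cutoff.
prob_above = (reals > 7.5).mean(axis=0)
# Each output is a (50, 60) map ready for plotting.
```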

We use a similar approach for the cosimulation of the main feature. Nonetheless, the default value varred = 1 yields a variance that is too large, with an RMSE of 1.50 %². Therefore, we run 10 Bayesian trials to reduce the error from Eq. 10, obtaining varred = 0.6, which decreases the error by 16% relative to the previous result. Figure 8 shows that the optimization is successful because all realizations closely resemble the input TOC distribution, with slight bias only for the smallest and largest values.

Figure 9 shows the uncertainty model over 30 equiprobable realizations. Similar to Fig. 6, the 12 wells in the middle region of the Duvernay Formation show high TOC values, and the realizations surround those wells with high TOC values. Furthermore, the realizations replicate the direct relationship between the primary and secondary features (Fig. 2) due to cosimulation. Figure 10 shows a hexagonal binning scatterplot that compares all the realizations of primary and secondary features with the input data.

Fig. 7—Probability of exceeding 7.5% PHID.

The bin count from the realizations is larger in regions with more input data (2.5 to 10% PHID and 1 to 4 wt% TOC) than in regions with few input samples. Although the realizations yield PHID-TOC pairs absent in the data set, these pairs are fewer than those from the central input cluster (red squares). Also, a region from the realizations expands from 6 to 12 wt% TOC that is absent in the input data. The authors hypothesize that this region is prone to exist, given there are only four samples (0.35% of the input data after outlier removal) with more than 6 wt% TOC. The PHID feature lacks a lithology-TOC effect correction; therefore, the positive linear relationship between porosity and TOC is expected, and the proposed workflow replicates that relationship.
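The varred tuning via Eq. 10 can be sketched as a scan over candidate factors. Here `cosim_variance` is a hypothetical stand-in for rerunning the cosimulation at a given varred (the paper uses Bayesian optimization instead of a scan), with numbers chosen so the search recovers a value near the reported varred = 0.6.

```python
import numpy as np

def variance_rmse(sample_var, cosim_vars):
    """Eq. 10-style RMSE between the data variance and the variance of each
    cosimulated realization (the exact form in the paper is reconstructed)."""
    cosim_vars = np.asarray(cosim_vars, dtype=float)
    return float(np.sqrt(np.mean((cosim_vars - sample_var) ** 2)))

sample_var = 1.8  # variance of the TOC samples, wt%^2 (illustrative value)

def cosim_variance(varred, n_real=30, seed=3):
    """Hypothetical stand-in: realization variance grows with varred."""
    rng = np.random.default_rng(seed)
    return 1.2 + varred * 1.0 + rng.normal(0.0, 0.02, n_real)

# Scan candidate factors and keep the one minimizing the Eq. 10 error.
candidates = np.arange(0.1, 1.01, 0.1)
best = min(candidates, key=lambda v: variance_rmse(sample_var, cosim_variance(v)))
print(round(float(best), 1))  # 0.6 -> 1.2 + 0.6 matches the sample variance 1.8
```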

Fig. 8—Histograms and empirical cumulative distribution functions comparing input TOC (red) and its realizations (gray).

Fig. 9—P10, P50, and P90 uncertainty maps of TOC.


Fig. 10—Hexagonal binning scatterplot comparing all realizations and input data of TOC and PHID.

There are three primary limitations to the proposed workflow. First, it requires enough samples to reliably compute the residuals' geological trend and variogram model, a characteristic that impairs its use in greenfields. Second, given the random initialization of Bayesian optimization and evolutionary algorithms, the algorithms fail to replicate the same optimal solution in each run; additionally, if not correctly initialized, the solutions could fail to find the global minimum, i.e., result in a local minimum. Third, the Markov-Bayes assumption uses a single secondary feature to cosimulate a primary feature, limiting the use of more informative features in a multivariate problem.

Regarding future work, the authors are evaluating the use of cross validation or the fair train-test split validation method to evaluate the generalization error, and the combination of other algorithms to support the simulation predictions.

CONCLUSIONS

The proposed workflow presents a novel combination of geostatistics, spatial data analytics, and optimization methods to assist in the 2D modeling of spatial and sparse data sets. The workflow is well documented and has a user-friendly design to apply geostatistical methods, Bayesian optimization, and evolutionary algorithms, allowing users to focus on the meaning of the results and interpretation and less on the modeling. In addition, data analytics without spatial context domain knowledge could negatively impact the results, so we included comprehensive geostatistical and spatial analysis to improve results.

The proposed workflow assists in trend modeling, variogram modeling, and simulation/cosimulation of features. Most importantly, the realizations agree with the four parameters of the minimum acceptance criteria. In addition, the diagnostic plots assist the analysis of all steps, including evaluating the uncertainty of multiple realizations. Future work could include new functionalities, such as integrating seismic surfaces to assist cosimulation, evaluating different scenarios, and performing economic analysis.

ACKNOWLEDGMENTS

The authors thank Equinor and the DIRECT consortium's industry partners at The University of Texas at Austin for supporting this work. In addition, we thank Sofia Campo and Hallstein Lie from Equinor for their helpful recommendations and the reviewers for improving the quality of the manuscript. Larry W. Lake holds the Shahid and Sharon Chair at The University of Texas at Austin. Finally, we acknowledge Equinor for granting permission to use the data set presented here.


NOMENCLATURE

Abbreviations

CPI = computer-processed interpretation
ECDF = empirical cumulative distribution function
LASSO = least absolute shrinkage and selection operator
PHID = density-porosity
RMSE = root mean square error
SHAP = Shapley additive explanations
TOC = total organic carbon
varred = variance reduction factor

Symbols

C_R-T(0) = covariance between residuals and trend, wt% * %
h = lag vector, m
h_maj = major direction of continuity, m
h_min = minor direction of continuity, m
σ²_ratio = variance ratio of residuals to feature of interest
n(h_i) = number of sample pairs for the i-th lag
N(h) = number of available data pairs
r = range, m
w(h_i) = weighting factor of the i-th squared difference between two variogram values
Z = feature of interest, wt% or %
γ̂(h_i) = experimental variogram values, wt%² or %²
γ(h_i, θ_i*) = modeling variogram values of the near-optimal solution for feature space i, wt%² or %²
γ_Z(h) = semivariogram, wt%² or %²
θ_i = feature space i
Z̄_declus = declustered mean of the feature of interest, wt% or %
Z̄_T = trend mean, wt% or %
σ²_cosim = variance of cosimulated features, wt%²
σ_E = standard deviation of all experimental variogram values, wt% or %
σ²_R = variance of the residuals, wt%² or %²
σ²_T = variance of the trend, wt%² or %²
σ²_Z = variance of the feature of interest, wt%² or %²

REFERENCES

Archetti, F., and Candelieri, A., 2019, Bayesian Optimization and Data Science, Springer Cham, New York. DOI: 10.1007/978-3-030-24494-1.
Bakshy, E., Dworkin, L., Karrer, B., Kashin, K., Letham, B., Murthy, A., and Singh, S., 2018, AE: A Domain-Agnostic Platform for Adaptive Experimentation, Paper presented at the Conference on Neural Information Processing Systems, Montreal, Canada, 2–8 December. URL: http://eytan.github.io/papers/ae_workshop.pdf. Accessed January 13, 2023.
Barnett, R.M., Lyster, S., Pinto, F., MacCormack, K., and Deutsch, C.V., 2018, Principles of Data Spacing and Uncertainty in Geomodeling, Bulletin of Canadian Petroleum Geology, 66(3), 575–594.
Breunig, M.M., Kriegel, H.-P., Ng, R.T., and Sander, J., 2000, LOF: Identifying Density-Based Local Outliers, Association for Computing Machinery, 29(2), 93–104. DOI: 10.1145/335191.335388.
Burke, E.K., and Kendall, G., 2014, Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques, Springer, New York. DOI: 10.1007/978-1-4614-6940-7.
Chilès, J.-P., and Delfiner, P., 2012, Geostatistics: Modeling Spatial Uncertainty, second edition, John Wiley & Sons, Inc., Hoboken, New Jersey, USA. DOI: 10.1002/9781118136188.
Chopard, B., and Tomassini, M., 2018, An Introduction to Metaheuristics for Optimization, Springer Cham. DOI: 10.1007/978-3-319-93073-2.
Deutsch, C.V., and Journel, A.G., 1997, GSLIB: Geostatistical Software Library and User's Guide, second edition, Oxford University Press, New York. ISBN: 978-0195100150.
Fortin, F.-A., De Rainville, F.-M., Gardner, M.-A., Parizeau, M., and Gagné, C., 2012, DEAP: Evolutionary Algorithms Made Easy, The Journal of Machine Learning Research, 13(1), 2171–2175. URL: https://www.jmlr.org/papers/volume13/fortin12a/fortin12a.pdf. Accessed January 13, 2022.
Gyebnár, G., Klimaj, Z., Entz, L., Fabó, D., Rudas, G., Barsi, P., and Kozák, L.R., 2019, Personalized Microstructural Evaluation Using a Mahalanobis-Distance Based Outlier Detection Strategy on Epilepsy Patients' DTI Data – Theory, Simulations and Example Cases, PLoS One, 14(9), e0222720. DOI: 10.1371/journal.pone.0222720.
Hamdi, H., Clarkson, C.R., Esmail, A., and Sousa, M.C., 2021, Optimizing the Huff 'n' Puff Gas Injection Performance in Shale Reservoirs Considering the Uncertainty: A Duvernay Shale Example, Paper SPE-195438, SPE Reservoir Evaluation & Engineering, 24(1), 219–237. DOI: 10.2118/195438-PA.
Hastie, T., Tibshirani, R., and Friedman, J., 2009, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second edition, Springer New York, New York, NY. DOI: 10.1007/978-0-387-84858-7.
Isaaks, E.H., and Srivastava, R.M., 1989, An Introduction to Applied Geostatistics, Oxford University Press, New York. ISBN: 978-0195050134.
James, G., Witten, D., Hastie, T., and Tibshirani, R., 2013, An Introduction to Statistical Learning, Springer New York, New York, NY. DOI: 10.1007/978-1-4614-7138-7.
Jensen, J., Lake, L.W., Corbett, P.W.M., and Goggin, D., 2000, Statistics for Petroleum Engineers and Geoscientists With Applications in R, Springer, New York. ISBN: 978-0-444-50552-1.
Journel, A.G., Kyriakidis, P.C., and Mao, S., 2000, Correcting the Smoothing Effect of Estimators: A Spectral Postprocessor, Mathematical Geology, 32(32), 787–813. DOI: 10.1023/A:1007544406740.


Leuangthong, O., McLennan, J.A., and Deutsch, C.V., 2004, Minimum Acceptance Criteria for Geostatistical Realizations, Natural Resources Research, 13, 131–141. DOI: 10.1023/B:NARR.0000046916.91703.bb.
Li, Z., Zhang, X., Clarke, K.C., Liu, G., and Zhu, R., 2018, An Automatic Variogram Modeling Method With High Reliability Fitness and Estimates, Computers & Geosciences, 120, 48–59. DOI: 10.1016/j.cageo.2018.07.011.
Liu, F.T., Ting, K.M., and Zhou, Z.-H., 2012, Isolation-Based Anomaly Detection, Association for Computing Machinery Transactions of Knowledge Discovery From Data, 6(1), 1–39. DOI: 10.1145/2133360.2133363.
Liu, H., 2017, Principles and Applications of Well Logging, Springer Berlin, Heidelberg, Germany. DOI: 10.1007/978-3-662-54977-3.
Liu, W., and Pyrcz, M.J., 2021, A Spatial Correlation-Based Anomaly Detection Method for Subsurface Modeling, Mathematical Geosciences, 53, 809–822. DOI: 10.1007/S11004-020-09892-z.
Lovelace, R., Nowosad, J., and Muenchow, J., 2019, Geocomputation With R, CRC Press. ISBN: 1-138-30451-4.
Ma, Y.Z., 2019, Quantitative Geosciences: Data Analytics, Geostatistics, Reservoir Characterization and Modeling, Springer Cham, Switzerland. DOI: 10.1007/978-3-030-17860-4.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Müller, A., Nothman, J., Louppe, G., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, É., 2012, Scikit-Learn: Machine Learning in Python, Journal of Machine Learning Research, 12(85), 2825–2830. URL: https://arxiv.org/pdf/1201.0490.pdf. Accessed January 13, 2023.
Pyrcz, M.J., and Deutsch, C.V., 2014, Geostatistical Reservoir Modeling, second edition, Oxford University Press, New York. ISBN: 9780199358830.
Pyrcz, M.J., Jo, H., Kupenko, A., Liu, W., Gigliotti, A.E., Salomaki, T., and Santos, J., 2019, GeostatsPy [Python package], PyPI, Python Package Index. URL: https://pypi.org/project/geostatspy/. Accessed January 13, 2023.
Salazar, J.J., Garland, L., Ochoa, J., and Pyrcz, M.J., 2022, Fair Train-Test Split in Machine Learning: Mitigating Spatial Autocorrelation for Improved Prediction Accuracy, Journal of Petroleum Science and Engineering, 209, 109885. DOI: 10.1016/j.petrol.2021.109885.
Salazar, J.J., and Pyrcz, M.J., 2021, Geostatistical Significance of Differences for Spatial Subsurface Phenomenon, Journal of Petroleum Science and Engineering, 203, 108694. DOI: 10.1016/j.petrol.2021.108694.
Shen, L.W., Schmitt, D.R., and Haug, K., 2019, Quantitative Constraints to the Complete State of Stress From the Combined Borehole and Focal Mechanism Inversions: Fox Creek, Alberta, Tectonophysics, 764, 110–123. DOI: 10.1016/j.tecto.2019.04.023.
Thomas, R., and Judith, J.E., 2020, Voting-Based Ensemble of Unsupervised Outlier Detectors, in Jayakumari, J., Karagiannidis, G., Ma, M., and Hossain, S., editors, Advances in Communication Systems and Networks, 656, Springer, Singapore. DOI: 10.1007/978-981-15-3992-3_42.
Weides, S., Moeck, I., Majorowicz, J., Palombi, D., and Grobe, M., 2013, Geothermal Exploration of Paleozoic Formations in Central Alberta, Canadian Journal of Earth Sciences, 50(5), 519–534. DOI: 10.1139/cjes-2012-0137.

ABOUT THE AUTHORS

Jose Julian Salazar is a PhD student at the Hildebrand Department of Petroleum and Geosystems Engineering of The University of Texas at Austin. His research focuses on geostatistics, machine learning, and deep learning, incorporating spatial context to check the quality of data-driven applications applied to the subsurface.

Jesus Ochoa is a principal researcher working for the technology, development, and innovation division of Equinor. Prior to joining Equinor, he worked at Marathon Oil Corporation. Before moving to the United States, Jesus worked as a geologist for PDVSA and as an independent consultant for firms in Venezuela. His interests include seismic stratigraphy and reservoir characterization. He earned a BSc degree from the Universidad de Los Andes, Venezuela (2001), and an MSc degree from Montana State University (2008).

Léan Garland is a principal geologist currently working in the subsurface digital projects and products department in Equinor, with a project focused on spatial analytics and implementing machine-learning solutions for production forecasting. Before studying for an MSc degree in petroleum geology, Léan worked in the bedrock mapping department of the Geological Survey of Ireland. He has a BSc Hons degree in geology from University College Dublin, Ireland (2002), an MSc degree in petroleum geology from the University of Aberdeen, Scotland (2004), and a Certificate in petroleum engineering from Heriot-Watt University, Edinburgh (2014).


Larry W. Lake is a professor in the Department of Petroleum and Geosystems Engineering at The University of Texas at Austin, where he holds the Shahid and Sharon Ullah Endowed Chair. He holds BSE and PhD degrees in chemical engineering from Arizona State University and Rice University, respectively. He is the author or coauthor of more than 100 technical papers and four textbooks and the editor of three bound volumes. He has been a member of the US National Academy of Engineering since 1997.

Michael J. Pyrcz is an associate professor at the Hildebrand Department of Petroleum and Geosystems Engineering and affiliated faculty at the Jackson School of Geosciences, both of The University of Texas at Austin. Dr. Pyrcz's current research is focused on improving reservoir characterization and modeling for enhanced development planning, minimized environmental impact, more robust profitability, and better utilization of valuable natural resources.
