Introduction To Geostatistics For Site Characterization and Safety Assessment

This document provides an introduction to geostatistics for site characterization and safety assessment. It discusses how geostatistics can be used to model spatial variability in subsurface properties using limited sampling data, and how that variability affects uncertainty in performance assessment results. Geostatistical techniques like variogram analysis and simulation methods are introduced as tools for characterizing spatial correlation and creating multiple equally probable representations of the subsurface.


SAND2013-4769C

Introduction to Geostatistics for Site Characterization and Safety Assessment
IAEA Training Course – July 1-5, 2013
Bill Arnold

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed
Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Acknowledgements

Course materials and exercises were developed in collaboration with Sean McKenna, formerly with Sandia National Laboratories. Sean McKenna is currently with IBM Research in the Smarter Cities Technology Centre, Dublin, Ireland.

2
Topics

 Ubiquitous nature of geological heterogeneity


 Relationship of geostatistics to classical statistics
 Estimation methods versus simulation methods in
geostatistics
 Relation between spatial variability and uncertainty
 Uses of geostatistics in site characterization
 Uses of geostatistics in risk assessment

3
Background

Geologic materials are ubiquitously heterogeneous

4
Background

Limited sampling of subsurface results in uncertainty in spatial distribution of hydraulic and physical parameters

5
Background
 Many geological media are produced by processes that are
very complex and occur on scales ranging from microscopic
to 100s of km
 Sampling of subsurface media is extremely limited relative
to the volume of material of interest, because of economic
and practical limitations
 The challenge for geoscientists and engineers is to characterize and model geological media adequately for
predictive analysis and decision making
 Characterization and prediction is necessarily uncertain
because exact description of the system is impossible
 The goal is to make full use of available geological
information and to understand the limitations of our
knowledge and the related uncertainty
6
Background
 Fortunately, geological media have spatial structure that can
be characterized with subjective and quantitative knowledge
 Geological knowledge and interpretation are important
sources of information on spatial structure and continuity,
particularly for large-scale features
 Material properties also tend to have spatial correlation
related to the continuity of processes that operated during
formation of the geological media
 Intuitively, the values for a particular property tend to be
more similar for locations that are closer together than for
locations that are more widely spaced
 This characteristic of geological media forms the basis for
the field of geostatistics

7
Geostatistics - Overview
 Primary objective of geostatistics is
the characterization of spatial or
temporal systems that are
incompletely known
 Classical univariate statistics considers
only the population of values for a
particular variable
 Geostatistics is an extension of
bivariate statistics that uses the
sampling location (in space or time) of
every measurement
 Geostatistical analysis is only
meaningful if the measurements show
some spatial (or temporal) correlation
8
Geostatistics - Background
 Geostatistics is a very broad field; this workshop provides
only a brief introduction to the topic
 Geostatistics has developed over a long time frame, starting with theoretical developments in the 1950s and expanding significantly in the era of digital computation
 Original applications included estimation methods applied
to calculation of ore reserves in mining
 More recent applications have been more focused on
simulation methods, as applied to reservoir modeling in
petroleum engineering applications
 Geostatistical estimation methods are generally used for interpolation of measurements, but simulation methods can be used in extrapolation of parameters (with caution!)

9
Geostatistics - Overview
Geostatistics provides a set of tools for modeling spatial
distributions of parameters based on the available data and
the two-point spatial covariance

[Figure: histogram of the data and variogram (gamma versus distance)]

10
Geostatistics

 The study of spatially and/or temporally correlated data


 Suite of tools for quantifying the amount and style of
spatial correlation
 Adaptations to classical regression techniques to take
advantage of spatial correlation
 Includes both interpolation (estimation) techniques and
Monte-Carlo simulation techniques

11
Geostatistics Applications
Three-dimensional model of fracture permeability at the JNC MIU site, Japan
Cross-section model of fracture frequency at the JNC MIU site
12
Geostatistics – Problem Statement

Site Characterization and Performance Assessment Concerns:

 How can we model spatial variability in hydraulic and physical properties?
 How does uncertainty in spatial distribution of properties affect uncertainty in performance assessment results?
 If more site characterization boreholes are worthwhile, where are the best locations for them?
 Optimal Site Characterization

13
Geostatistics – Approach

Once the site characterization goals and PA performance measures have been established, Geostatistical Simulation is used to:

 Accurately estimate the values of properties (K, T, porosity, fracture frequency, thermal conductivity) at unsampled locations
 Provide a measure of the uncertainty in that estimate
 Create equiprobable maps of the spatial distribution of
properties with associated uncertainty levels
 Provide sampling locations that will contribute the most to
reducing uncertainty

14
Geostatistics – Example Variogram

S&P 500 Index, Daily Close Values (8/30/93–2/3/95)

The sill is the gamma value corresponding to the total variability in the dataset.
The range is the separation distance (time) at which the sill is reached.
The nugget accounts for variability at zero lag distance.

[Figure: variogram of the daily close values with the sill (average variability), range, and nugget annotated; gamma 0–140 versus time (year fraction) 0–0.6]
15
Geostatistics - Estimation

 Estimation is an interpolation technique


 Weighted combinations of the surrounding data are used to
determine an estimate of the value at an unsampled location
 Kriging is a geostatistical estimation procedure that uses the
information in the variogram to determine the weights used in
estimating unsampled locations.
 Estimation procedures determine the “best-guess value” at any
location

16
Geostatistics - Estimation

[Figure: kriged estimate of concentration (ppm, 20–160) versus distance (m, 0–1000)]

17
Geostatistics - Simulation

Geostatistical simulation is a Monte-Carlo technique for producing multiple, equally probable, realizations of a sampled variable.

 Simulation reproduces the variability of the initial data and does not have the smoothing effect of estimation
 Simulation reproduces the actual values, histogram, and variogram of the input data
 Each realization is a plausible model of the reality from which the samples were obtained

18
Geostatistics - Simulation

[Figure: multiple simulated realizations of concentration (ppm, 20–160) versus distance (m, 0–1000)]

19
Geostatistics - Simulation Example
Sample data showing locations and activity of Th-232; 20 realizations of Th-232 activity levels [color scale 0.0–4.0]

Example of soil contamination from the U.S. DOE Mound Site

20
Geostatistics – Groundwater Flow

One realization of non-uniform flow through heterogeneous material. Transport is convective, not dispersive (large Peclet numbers).

21
Geostatistics – Groundwater Flow
Three Realizations conditioned to same 96 boreholes

22
Geostatistics – Realizations

How many realizations are enough to make an informed decision?

 The answer is dependent on the performance metric (where the performance metric falls relative to the uncertainty distribution)
 If the metric lies near the center of the distribution, fewer realizations are needed to get an accurate picture
 If the metric lies in the tail of the distribution, which is hard to define, more realizations are needed to get an accurate picture

[Figure: frequency distribution of expected dose, with alternative performance measures marked]

 Is an idea of the best, worst, and expected conditions enough?
 Test how many is “enough” with the concept of a representative elementary volume (REV) from groundwater hydrology

23
Geostatistics – Additional Sampling
Where to locate additional boreholes? (3 Approaches)

Traditional Technique
 Reduce estimation error or kriging variance by putting
boreholes in unsampled locations
Decision Based Technique
 Areas of maximum uncertainty defined by probability
mapping
New Idea
 Consider K to be a stochastic input parameter to transport
model and use sensitivity analysis

24
Geostatistics – Summary
 Uncertainty is due to limited sampling of a spatially
heterogeneous variable
 Spatial uncertainty creates uncertainty in performance
assessment results
 Geostatistical simulation provides a technique for examining and quantifying the amount of uncertainty
 Estimates of uncertainty can be propagated through
performance assessment models
 Relationship between PA uncertainty and spatial uncertainty can
be used to guide site characterization

25
Geostatistics

Translating uncertainty due to limited sampling of a spatially variable property into uncertainty in PA results

[Figure: workflow from the data histogram, sample data, and variogram, through geostatistical simulations of transmissivity, to the dose histogram, breakthrough curves, and sensitivity map]

26
Geostatistics – Exercises Objectives

 Become familiar with SGeMS and GSLIB software


 Learn the basics of exploratory data analysis
 Learn the basics of spatial correlation analysis and variogram
analysis
 Learn the basics of spatial estimation methods
 Learn the basics of spatial simulation methods
 Apply these analytical techniques and methods to example 2D and
3D data sets

27
Geostatistics – PA Decision Framework
[Flowchart: Goals (URL, Safety Assessment) → Data Collection → Information Management → Conceptual Model, Parameter Development → Decision Point → Uncertainty Analysis → Data Worth Analysis (How Much Data to Collect?) → Uncertainty Sensitivity Analysis → Decision Point (What Data to Collect? Where to Collect Data?) → Stop]

28
Geostatistics – PA Decision Framework

[Figure: schematic cross-section showing a sedimentary layer, river, aquifer, and rock mass]

29
Geostatistics – Exercise Preview

 Exploratory data analysis of these data
 Analyze spatial correlation of parameters by constructing and modeling variograms
 Create estimates of the porosity and permeability fields
 Create multiple realizations of the porosity and permeability fields

30
SAND2013-xxxx

Spatial Variability: Exploratory Data Analysis and Spatial Correlation Analysis
IAEA Training Course – July 1-5, 2013
Bill Arnold

Outline

 Introduction
 Exploratory data analysis
 Visualization
 Univariate statistics
 Data correlations
 Clustering, transformations, and trends
 Spatial correlation analysis
 Experimental variograms
 Correlation anisotropy
 Variogram models

2
Introduction

 Spatial correlation and heterogeneity are important characteristics of geological systems with potentially important consequences for repository performance
 Objectives of spatial correlation analysis include:
 Data checking
 Conceptual understanding of heterogeneity
 Univariate statistical characterization
 Understanding complicating factors in spatial correlation (e.g., data transformation, anisotropy, trends)
 Quantitative analysis of spatial correlation
 Modeling of spatial correlation

3
Exploratory Data Analysis
Exploratory Data Analysis (EDA) is everything you do to understand
your data. It includes both objective and subjective analyses.

Three essential functions of EDA:


- error checking
- understanding physical processes for use in modeling
- statistical validation of results

Topics
- mapping the data
- histogram techniques
- probability-plotting techniques
- correlations among multivariate data
- data transformations
4
EDA - Mapping the Data

Plot it! Humans are very good at processing visual data.


 Look for spatial patterns, correlations with other variables
 Spot problems in data: keypunch errors,
transposed coordinates
 Beware of pitfalls: data clustering, preferential sampling
 Use:

- Plots with colors or symbols proportional to value


- Indicator plots
- Quick contouring with a simple program

5
EDA - Data Posting

Data posting shows locations and values in a single image


[Figure: data posting of nitrate contamination on a 0–50 by 0–50 map, color scale 0.0–35.0]

Coordinates of the lower left data point may contain an error. The data posting tool should be used more often.

6
Data Posting

Data posting examples from workshop exercises

7
EDA – Contour Mapping

Contour mapping provides a quick way to map the data.

Sharp high or low values lead to steep gradients in the contour map and may indicate bad data.

Different contouring algorithms give different maps. Which one is best?

[Figure: contour map over rows 20–28 and columns 20–26, color scale 0–100]

8
EDA – Indicator Plots

I(x)=1 if Z(x) > Z* ; I(x) = 0 otherwise
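As an illustration, a minimal sketch of this threshold rule in Python (numpy assumed; the function name is ours):

import numpy as np

def indicator(Z, Zstar):
    # I(x) = 1 if Z(x) > Z*, 0 otherwise
    return (np.asarray(Z) > Zstar).astype(int)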

9
EDA – Histograms

 Simple Histograms
- Probability density function (PDF), shown as a histogram
- Cumulative distribution function (CDF)
- Check for outliers (and their cause)
- Multimodality: evidence for multiple processes
- Clustering of data or preferential sampling

• CDF format provides an important conceptual link to


downstream modeling (Monte Carlo Models)
• Relationship of the histogram and other “ensemble statistics” to
model output
• Need to decluster data to remove effects of preferential
sampling (need an unbiased estimate of sampled property)

10
EDA – PDF and CDF plots

[Figure: PDF (frequency versus variable value) and CDF (cumulative frequency versus variable value)]
11
EDA – PDF and CDF plots

• PDF plots of the same data


may appear significantly
different depending on the
binning of the data
• Care should be taken to
examine PDF plots for
different numbers of bins
to avoid misinterpretation
• CDF plot is unique for a
given data set and not
prone to misinterpretation
from plotting preferences

12
EDA – Sample Clustering and Plots
[Figure: data posting of lead concentrations on a 0–50 by 0–50 map, color scale 0–500]

Clustered versus declustered lead data (number of data = 140 in each case):

Statistic         Clustered Lead Data    Declustered Lead Data
mean              406.9022               231.5361
std. dev.         600.6014               414.9137
coef. of var      1.4760                 1.7920
maximum           4972.3999              4972.3999
upper quartile    573.8152               229.9890
median            152.9998               88.5706
lower quartile    48.4950                36.6273
minimum           0.0700                 0.0700

[Figure: histograms (frequency versus variable, 0–2000) of the clustered and declustered lead data]
13
EDA – Probability Plotting

Simple Probability Plots


• plot sample value against probability value rather than against
simple frequency
• implicit comparison to theoretical Gaussian distribution
• different underlying populations will plot as straight line segments
• can compare to populations other than Gaussian
Quantile-Quantile Plots
• plot corresponding quantiles of any two populations
• use in understanding cross-correlated variables
• use in “validating” output models
EDA – Normal Probability Plot

Plots sample values vs. Gaussian probability

[Figure: two normal probability plots, cumulative probability (0.01–99.99%) versus variable value]
EDA – Normal Probability Plot
Example probability plot from workshop exercises – using
normal score transform value for the Gaussian probability axis
EDA – Correlation of Multivariate Data

Multiple properties of interest


Multiple measurement methods
• For our purposes: Scatterplot Analysis
- direct and inverse correlation
- strength of correlation
- correlation coefficient (r)
- coefficient of determination (r²)
- rank-order correlation coefficient
• Concept of conditional expectation
EDA – Correlation of Multivariate Data

Nitrate–DCPA correlation (number of data = 32, number trimmed = 10):
X variable (NO3): mean 15.819, std. dev. 8.609
Y variable (DCPA): mean 118.666, std. dev. 112.406
correlation 0.752, rank correlation 0.846

[Figure: scatterplot of DCPA versus NO3 (0–30), and data postings of nitrate and DCPA concentrations]
EDA – Correlation of Multivariate Data

Example scatter plots from workshop exercises


EDA – Conditional Expectation
EDA – Data Transformations

• A powerful tool for understanding data:
- reduce numerical artifacts that can obscure relationships
- simplify portions of numerical modeling
• A “two-edged sword” (benefits and difficulties)
- back-transformation may have negative implications

Examples:
- logarithmic: U = log(Z)
- indicator: I = 1 if Z < Z*; I = 0 otherwise
- rank-order: Z in order 1, 2, 3, …, N
- normal-scores: μ = 0, σ² = 1
- uniform-scores: [0, 1]
EDA – Normal-score Transform

Graphical conceptualization of the quantile-preserving process

[Figure: cumulative frequency (0–1) mapped from variable space (0.0–0.6) to normal-score space (−3 to 3)]
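A minimal sketch of this quantile mapping (the operation the GSLIB NSCORE program performs), assuming Python with numpy and scipy; the plotting-position formula is one common convention:

import numpy as np
from scipy.stats import norm

def nscore(z):
    # rank each datum, convert ranks to cumulative probabilities,
    # then map those quantiles onto the standard normal N(0, 1)
    z = np.asarray(z, float)
    ranks = np.argsort(np.argsort(z))     # 0 .. n-1
    p = (ranks + 0.5) / len(z)            # empirical CDF values
    return norm.ppf(p)                    # normal-score transforms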


EDA – Trends

What is a trend?
• Geostatistics assumes second-order stationarity
• “Deterministic Geologic Processes”
- Trend analysis and modeling must make geologic sense
• Removing a trend - Analysis of residuals
[Figure: data value versus distance, before and after trend removal]
EDA – Summary

Focused principally on understanding the data


• Error checking
• Physical process responsible for deposition
• Development of reasonable target characteristics for models
(also used for validation)
Techniques
Mapping
Transformations - pros/cons
Histograms
Distributional analysis
Declustering
Trend removal
Spatial Correlation Analysis

 Measurements of an earth science variable are rarely independent. Independence is the premise underlying sampling theory based on traditional statistics.

 It is this emphasis on spatial correlation that sets


geostatistics apart from traditional statistics
 The traditional measurement of spatial correlation within
geostatistics is the semi-variogram, commonly called the
variogram.
Spatial Correlation

How does the correlation coefficient, r, behave as h increases?

[Figure: correlation coefficient r decreasing with separation h, with scatterplots at separations h1 and h2]

 The greater the distance between points, the less correlated the
values.
 Since h is a vector, direction matters. Separation and differences
may be different in different directions.
Scatterplot Example

[Figure: h-scatterplots of Z(x+h) versus Z(x) at h = 5, 10, 15, and 20]

A scatter plot is a means of seeing the variability of sample values for all sample points separated by a distance h.

• At small separations between any pair, correlation is strong
• As the separations between samples increase, correlation decreases
Stationarity
Stationarity is the invariance of a property (e.g., the mean) across
space or time.
A statistically homogeneous field is the result of a stationary
process.

• First order stationarity refers to the mean value remaining


constant in space
• Second order stationarity refers to the mean and variance being
constant in space

A Non-Stationary data set will show a trend in space or time of


the mean or variance of the data set.
Stationarity

• Measurements collected in a small area should be strongly


correlated because there is a relatively small separation
distance between samples
• Measurements collected in another area a couple of miles
away should also be strongly correlated to each other
because of small separation distances.
• But if the two sample groups are compared and do not
show correlation this may be evidence of non-stationarity.
1st Order Stationarity

[Figure: mean value remaining constant with distance or time]
Analyzing Spatial Correlation
• In geostatistics we tend to look at the opposite of correlation,
which is variability.
• At very close distances variability is low, and as the separation
distance increases, so does variability.

[Figure: variability versus separation distance, with the nugget, sill, and range annotated]
Variogram
The variogram is a measure of variability as
a function of separation distance h.

Nugget: some amount of variability at zero separation; a representation of measurement error or of variability at separations smaller than the sample distance.
Range: the distance at which we reach the total amount of variability.
Sill: the total variability level at which the variogram value becomes constant.

[Figure: variogram (variability versus separation distance) with the nugget, sill, and range annotated]
Variogram Equation

One-half the average squared difference between all values separated by distance h:

$\gamma(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[ z(x_i) - z(x_i + h) \right]^2$

Where: γ is the variability, z(x) is the value at location x, z(x+h) is the value h away from location x, and n(h) is the number of pairs separated by h.

This gives a value for variability at the given h, and that value is one point on the experimental variogram. Repeat for each value of h.
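As an illustration, a minimal sketch of this calculation for 1-D data in Python (numpy assumed; the lag list and tolerance are illustrative choices):

import numpy as np

def experimental_variogram(x, z, lags, tol):
    x, z = np.asarray(x, float), np.asarray(z, float)
    dist = np.abs(x[:, None] - x[None, :])       # all pairwise separations
    sqdiff = (z[:, None] - z[None, :]) ** 2      # squared value differences
    gamma = []
    for h in lags:
        mask = (dist > h - tol) & (dist <= h + tol)
        # each pair appears twice in the matrices, so the factors of 2 cancel:
        # gamma(h) = (sum over pairs) / (2 * n(h))
        gamma.append(sqdiff[mask].sum() / (2.0 * mask.sum()) if mask.any() else np.nan)
    return np.array(gamma)

Each returned value is one point on the experimental variogram at the corresponding lag.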
Variogram – Covariance Relationship
C(h) = Sill-(h)
1.2

Sill 1.0
Covariance is the
0.8 Semi-variance inverse of the
Gamma/Cova

0.6 variogram
0.4

Covariance
0.2

0.0
0.0 25.0 50.0 75.0 100.0 125.0 150.0
Distance (meters)

This simple relationship between variogram and covariance is true


under the assumption of second order stationarity
Variogram – Search Neighborhood

• To determine how many samples are a given h away from a certain location, a search neighborhood is used.
• The simplest search neighborhood (isotropic) includes all locations in a specified concentric ring away from the current location.
• Determine the average spacing of all values lying between h−1/2 and h+1/2. This average spacing is the x coordinate of the point on the experimental variogram.

[Figure: experimental variogram points (gamma versus separation distance) at lags 1, 2, and 3]
Variogram – Search Neighborhood

Properties in the earth and environmental sciences are often deposited/produced in anisotropic patterns.

• Rather than using a circular search neighborhood, we may want to use an ellipse oriented along the principal direction of correlation.
• For example, with sedimentary layers: vertically there are changes in rock type and large sample variations; horizontally the beds are very similar even at large distances.

[Figure: elliptical search neighborhood]
Variogram – Search Neighborhood
Use geological knowledge of genetic processes to customize search
along a preferred orientation. Orient search along this direction, the
search direction.
[Figure: search template in the X–Y plane, showing the search angle direction, half-angle, bandwidth, and the first and second lags; we have a point at the origin and want to search for nearby points]
Variogram – Search Neighborhood
The search neighborhood diagram is a template which can be
moved to different points and different directions.

[Figure: the same search template (search angle direction, half-angle, bandwidth, lags), translated to a different point and direction]

One point on the variogram will be generated for each lag


Experimental Variogram

After employing the search neighborhood and entering the points into the variogram equation, the experimental variogram (shown by crosses) is produced.

Typically the experimental variogram starts at small values, increases, then follows, on average, the sill. In modeling, the emphasis is placed on fitting the experimental variogram prior to the sill.
Experimental Variogram

Generally, employ one of a limited number of theoretical models that always yield positive definite matrices.

• Fitting a model is still an art. Usually the emphasis is on fitting the model to the experimental variogram at smaller h.
Variogram Models: Spherical

For h < a: $\gamma(h) = C \left[ 1.5 \frac{h}{a} - 0.5 \left( \frac{h}{a} \right)^{3} \right]$
For h ≥ a: $\gamma(h) = C$

Where C = sill value, a = range, h = lag distance.

[Figure: spherical variogram model, range = 100.0, sill = 1.0; gamma versus distance 0–250]

The spherical model is linear at the origin, and the range parameter is exactly the correlation length.
Variogram Models: Gaussian

$\gamma(h) = C \left[ 1 - e^{-3h^{2}/a^{2}} \right]$

Where C = sill value, a = effective range (95% of sill), h = lag distance.

[Figure: Gaussian variogram model, range = 100.0, sill = 1.0; gamma versus distance 0–250]

The Gaussian model has a very slow increase near the origin (low variability at small values of h).
Variogram Models: Exponential

$\gamma(h) = C \left[ 1 - e^{-3h/a} \right]$

Where C = sill value, a = effective range (95% of sill), h = lag distance.

[Figure: exponential variogram model, range = 100.0, sill = 1.0; gamma versus distance 0–250]

The exponential model displays a continuously increasing level of variability and an asymptotic approach to the sill.
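A minimal sketch of the three model curves in Python (numpy assumed), using the conventions above (C = sill, a = range or effective range, h = lag distance):

import numpy as np

def spherical(h, C, a):
    h = np.asarray(h, float)
    # linear near the origin; constant at the sill beyond the range
    return np.where(h < a, C * (1.5 * h / a - 0.5 * (h / a) ** 3), C)

def exponential(h, C, a):
    # a is the effective range: gamma reaches 95% of the sill at h = a
    return C * (1.0 - np.exp(-3.0 * np.asarray(h, float) / a))

def gaussian(h, C, a):
    # slow (parabolic) increase near the origin; effective range at 95% of sill
    h = np.asarray(h, float)
    return C * (1.0 - np.exp(-3.0 * h ** 2 / a ** 2))

Nested models (see below) can be built by summing such components, e.g. spherical(h, 0.10, 8.0) + spherical(h, 0.21, 45.0) + exponential(h, 0.40, 67.0) for the three-structure example that follows.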
Common Variogram Models

It is often difficult to select a unique variogram model. Experimental variograms typically show variability, so the exact choice of model may be difficult.

The exponential and Gaussian models have a practical range where the model hits the sill, but often a different definition of the range is used: the point at which 95% of the sill is reached (the effective range).

[Figure: variogram models, range = 100.0, sill = 1.0; gamma versus distance 0–250]

Generally the spherical model is used most often and is most straightforward.
Nested Variogram Models

• Sometimes it is necessary to fit complex structures that may be caused by a combination of processes.
• You can add models together to capture a particular curve that you may want to interpret:

$\gamma(h)_{total} = \gamma_1(h) + \gamma_2(h) + \ldots + \gamma_n(h)$

Nested semivariogram models can be created using any linear combination of admissible models (after Olea, 1994). Any linear combination will produce a model with a positive definite covariance matrix.
Nested Variogram Models

Example shows three nested models: two spherical and one exponential.

Model parameters:

Nest     Range   C      Model
3        67.0    0.40   Exponential
2        45.0    0.21   Spherical
1        8.0     0.10   Spherical
Nugget = 0

[Figure: nested variogram model, gamma (0–0.6) versus distance (feet, 0–200)]
Anisotropy
• Variograms that show variation as a function of search direction
are anisotropic
• Anisotropy in the variable requires fine tuning of search
neighborhood

The general types of anisotropy are:

Geometric: Constant Sill, Range changes with direction


Software almost always requires anisotropy to be
geometric.

Zonal: Constant Range, Sill changes with direction


The level of variability is different in different directions.
Variogram Map

A variogram map can provide a visual check for variogram anisotropy.

[Figure: variogram map, distance (feet) from −20000 to 20000 in both directions, gamma color scale 0.5–2.5]
Nugget Effect Variogram

[Figure: sampling transect of random, uncorrelated data (property value versus distance, 0–300), and the resulting “nugget effect” variogram]

No spatial correlation.
Hole Effect Variogram

Porosity as a function of depth: periodic data produce a hole-effect variogram.

[Figure: porosity (0.10–0.15) versus depth (1700–1800 m), and the resulting variogram, gamma (0–4.0e-04) versus distance (0–120 m)]
Trend Effect Variogram

[Figure: LSI Incorporated daily close (dollars, 5–65) versus Julian date (94.0–96.0), and the resulting trend-effect variogram, gamma (0–700) versus time (0–2 years)]
Spatial Correlation Summary

• Determination of spatial correlation


• Relationship between variogram and covariance
• Calculation and modeling of experimental variograms
including definition of search neighborhoods
• Differences between several permissible variogram models
• Concept of anisotropy and how to find it with a variogram
map
• Review of special variograms (nugget-effect, hole-effect and
trend)
SAND2013-xxxx

Exercise 1: Exploratory Data Analysis

IAEA Training Course – July 1-5, 2013


Bill Arnold

Outline

 Software
 Exercise data files
 Exploratory data analysis exercise tasks

2
Software
 SGeMS (Stanford Geostatistical Modeling Software)
 Windows-based software with graphical users interface (GUI)
 Open source software from Stanford University
 Based on the original GSLIB suite of DOS-based software
 Available for download from website
http://sgems.sourceforge.net/
 Two GSLIB software codes – NSCORE and LOCMAP
 DOS-based codes run using parameter input files
 Available for download from website
http://scrf.stanford.edu/resources.software.gslib.php
 GSview software
 Used for viewing postscript output files generated by GSLIB codes
 Available for download from website
http://pages.cs.wisc.edu/~ghost/
3
Exercise Data Files
 Four data sets are provided for use in the geostatistics
exercises
 2-D data set 1
 2-D data set 2
 2-D data set 2 - exhaustive
 3-D data set 3
 Data sets were randomly extracted from a hypothetical
synthetic geological system
 Data are given for rock porosity and permeability
 Data sets differ in spatial correlation structure in ways
that the students will explore and discover

4
Data File Format
 SGeMS can read data from the GSLIB format. Exercise
data files are provided in this format.

surface data set 1


6
X coordinate
Y Coordinate
Z Coordinate
Porosity
Permeability
Log10 Permeability
25.0 225.0 200.0 0.1362 2.904E-13 -12.537
940.0 175.0 200.0 0.1596 8.615E-13 -12.065
630.0 345.0 200.0 0.1647 1.223E-12 -11.912
835.0 705.0 200.0 0.1562 7.652E-13 -12.116
980.0 690.0 200.0 0.1617 9.353E-13 -12.029
170.0 105.0 200.0 0.1517 7.856E-13 -12.105

5
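A minimal sketch of reading this GSLIB/GeoEAS format in Python (numpy assumed; read_gslib is an illustrative helper name, and the file name matches the exercise data):

import numpy as np

def read_gslib(path):
    with open(path) as f:
        title = f.readline().strip()            # e.g. "surface data set 1"
        ncols = int(f.readline().split()[0])    # number of variables (6 here)
        names = [f.readline().strip() for _ in range(ncols)]
        data = np.loadtxt(f)                    # remaining numeric rows
    return title, names, data

# title, names, data = read_gslib("data_set_1_2D.dat")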
SGeMS Software Introduction
 The main screen of SGeMS contains the algorithm window, objects window,
and the visualization window

6
SGeMS Software Introduction
 The first task is to open a data file by selecting “Load Objects” from the
“Objects” pulldown menu
 Navigate to the data file, open it, select object type “point set”, go to next
screen, enter a Pointset name in the dialog box, and confirm that x, y, and z
columns are correctly identified

7
SGeMS Software Introduction
 Now that the data set is loaded, various actions can be taken, including data
visualization (shown below), data analysis tasks, and algorithms.
 Practice manipulating the visualization, adding a colorbar, setting the color scale, and
exporting an image of the plot (use the camera icon)
 Note that SGeMS does not manage screen real estate very well and windows may
need to be resized to show what you want to see
GSLIB Software - LOCMAP
 Use the LOCMAP DOS program to generate
a 2D plot of the sample data
 The LOCMAP program is executed with a
parameter control file named locmap.par
Parameters for LOCMAP
*********************
START OF PARAMETERS:
data_set_1_2D.dat \file with data
1 2 4 \ columns for X, Y, variable
-1.0e21 1.0e21 \ trimming limits
locmap.ps \file for PostScript output
0.0 1000. \xmn,xmx
0.0 1000. \ymn,ymx
0 \0=data values, 1=cross validation
0 \0=arithmetic, 1=log scaling
1 \0=gray scale, 1=color scale
0 \0=no labels, 1=label each location
0.12 0.20 0.005 \gray/color scale: min, max, increm
0.5 \label size: 0.1(sml)-1(reg)-10(big)
Porosity Data Map - 2D Data \Title

 Plot other parameter and data files using the LOCMAP program
SGeMS Software Introduction
 Use the “Data Analysis” dropdown menu to choose the “Histogram” option to
examine the distributions of parameters in the data set
 Choose the data set you want to examine from the “Grid” dropdown menu
 Vary the number of bins in the plot, plot the various parameters, change the axis
limits and type, and practice exporting plot images
SGeMS – Exploratory Data Analysis
 Use the “Data Analysis” dropdown menu to choose the “Scatter-plot” option to
examine the correlation of parameters in the data set
 Choose the data set you want to examine from the “Grid” dropdown menu
 Plot the various parameters, change the axis limits and type, and practice exporting
plot images
GSLIB Software - NSCORE
 Use the NSCORE DOS program to calculate
the normal score transform of one
parameter in the data set
 The NSCORE program is executed with a
parameter control file named nscore.par
Parameters for NSCORE
*********************

START OF PARAMETERS:
data_set_3_3D.dat \file with data
4 0 \ columns for variable and weight
-1.0e21 1.0e21 \ trimming limits
0 \1=transform according to specified ref. dist.
unknown.out \ file with reference dist.
1 2 \ columns for variable and weight
nscore.out \file for output
nscore.trn \file for output transformation table

 The NSCORE program writes the normal score transform values to a new column
 Make a probability plot with the scatterplot
option and check for a Gaussian distribution
SGeMS – Exploratory Data Analysis
 Now load the data set contained in the file “data_set_3_3D.dat”.
 This is a 3D point data set that contains parameter values at many surface locations
and in 8 boreholes
 Practice manipulating the visualization image and plot options
SGeMS – Exploratory Data Analysis
 Now load the data set contained in the file “data_set_2_2D_exhaustive.dat”.
 This is an exhaustive 2D data set that must be loaded in as an object using the
“cartesian grid” option. By looking at the data file you can see that it consists of
parameter values on a regular 201 X 201 grid, which must be entered to load the file.
 Practice changing the various options presented under the “Preferences” tab to
control the plot
SAND2013-xxxx

Exercise 2: Spatial Correlation Analysis

IAEA Training Course – July 1-5, 2013


Bill Arnold

SGeMS – Variogram Analysis
 Load the data sets that were used in the exploratory data analysis exercise.
 Start the variogram analysis by choosing the “Variogram” option under the “Data
Analysis” pulldown menu.
 Choose the data set to analyze from the “Grid Name” pulldown menu and choose the
parameter of interest from both the “Head Property” and “Tail Property” menus.
SGeMS – Variogram Analysis
 Enter the number of lags, lag separation, lag tolerance, azimuth, dip, directional
tolerance, and bandwidth to control the construction of the experimental variogram.
 Note that multiple variogram plots can be generated simultaneously by increasing
“Number of directions”.
 Search direction is defined by azimuth and dip, as explained in the figures to the right.
SGeMS – Variogram Analysis
 The experimental variogram is displayed in the next window.
 The number of data pairs that were used to calculate each plotted value are displayed
by right clicking the mouse.
SGeMS – Variogram Analysis
 Fit the experimental variogram with a model created with the parameters entered in
the panel to the right of the plot.
 Try different variogram types and increasing the number of structures to create
variograms using linear combinations of models.
SGeMS – Variogram Analysis
 Save or write down the parameters for the variogram model that you have fit to the
experimental variogram for later use in the estimation and simulation exercises.
 Load the other data sets ( 2-D data set 2, 2-D data set 2–exhaustive, and 3-D data set
3) and examine directional variograms to examine anisotropy in the horizontal and
vertical directions.
SGeMS – Variogram Analysis
 Fit a model to the experimental variograms created in different directions to evaluate
the anisotropy in the spatial correlation.
 Save or write down the parameters for the variogram models that you have fit to the
directional variograms for later use in the estimation and simulation exercises.
SAND2013-xxxx

Spatial Estimation and Simulation Methods

IAEA Training Course – July 1-5, 2013


Bill Arnold

Outline

 Spatial estimation objectives


 Interpolation methods of estimation
 Evaluation of estimation methods
 Kriging
 Spatial simulation versus estimation
 Sequential simulation methods

2
Point Estimation

Point estimation is used to estimate the value of a property (porosity, permeability, fracture frequency, etc.) at some point in the ground or in a space, based on linear combinations of the surrounding data.

Kriging is a form of estimation (interpolation algorithm). The kriging algorithm is also the basis of the simulation techniques used in geostatistics.

[Figure: sample points with the estimation location marked by +]

3
Estimation Techniques
Example data-driven point estimation techniques (require
some data to exist already) that interpolate to the
surrounding locations:
• Nearest Neighbor Polygons
(aka Thiessen or Voronoi polygons)
• Local Mean (using surrounding data)
• Inverse Distance Squared

We will look at the three techniques above, as well as kriging.


There are other techniques such as: trend surfaces (polynomial
fit) and splines.

4
Estimation Example

[Figure: porosity measured at six points (0.200, 0.215, 0.203, 0.261, 0.174, 0.241) posted in a 500 × 500 domain, with the unknown point marked]

• Porosity measured at 6 points.
• Make an estimate of porosity at the unknown point, x0.

5
Nearest Neighbor Polygons
• Construct polygons around the samples that divide the space
into regions
• Everywhere inside of the polygon is closer to the sample point
enclosed by that polygon than to any other sample point

Advantages:
•Simple, fast, exact interpolator (at a point where the value is
known, it returns that exact value)
Disadvantages:
•Discontinuities at polygon boundaries
•If data are sparse and somewhat unevenly spaced, the global
estimation is dominated by the sparsely located points

6
Nearest Neighbor Polygons
- Connect each sample point to the neighboring sample points to create a series of triangles
- Draw a perpendicular bisector through each line

[Figure: triangulation of the six sample points and the resulting polygons, with the unknown point inside the polygon of the 0.261 sample]

Estimated value at Unknown = 0.261
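Equivalently, the polygonal estimate is the value of the nearest sample. A minimal sketch in Python (numpy and scipy assumed; coordinates and values are taken from this worked example):

import numpy as np
from scipy.spatial import cKDTree

xy = np.array([[195, 225], [355, 225], [355, 345],
               [265, 365], [185, 75], [25, 445]], float)
z = np.array([0.261, 0.174, 0.203, 0.215, 0.241, 0.200])

_, idx = cKDTree(xy).query([235.0, 155.0])   # the unknown location
print(z[idx])                                # 0.261, the enclosing polygon's value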


Nearest Neighbor Polygons
Example from workshop exercise data set

[Figures: nearest-neighbor polygon estimate of porosity and the synthetic “reality” porosity]
Local Mean Estimation

• Use the mean of surrounding data as an estimate of the value at the target location

Advantages:
• Simple, fast, few large errors (near the edges of the domain)
Disadvantages:
• Not an exact interpolator (the average of the few surrounding data points won't necessarily return the exact value for a known point)
• Definition of “surrounding data” may be difficult
• It has a smoothing effect on the data values: any extreme values, high or low, will get smoothed out as they are averaged in with the surrounding values

For the example shown: mean = (1/n)·SUM(data) = 0.216


Local Mean Estimation
Example from workshop exercise data set

[Figures: local-mean estimate of porosity and the synthetic “reality” porosity]
Local Mean Estimation

• Should each surrounding point be weighted evenly?
• Do closer points have a greater influence on the estimate than data that are farther away?
• Should the more distant points be included in the average? If so, should they be given less weight?

[Figure: the six sample points, with a close point and a far point marked relative to the unknown location]

Next we will examine a technique that weights the data relative to their distance away from the point being estimated.
Inverse Distance Estimation

• Create weights for the data values that are inversely proportional to the distance from the unknown location.
• The weighting function is the inverse distance raised to a power, β:

$z_{est} = \frac{\sum_{i=1}^{n} \left( 1/d_i^{\beta} \right) z_i}{\sum_{i=1}^{n} 1/d_i^{\beta}}$

Where: d_i is the distance and z_i is the sample value.

Advantages:
• Simple, fast, and includes distance in the calculation of weights
Disadvantages:
• Not an exact interpolator
• As d goes to 0, the estimator “blows up”.
Inverse Distance Estimation

Inverse Distance Squared ( = 2)


Estimation of Value at (235.0, 155.0)

Distance from Normalized Weighted


Sample # X Y Value
X0'Y0 Weight Value

1 195 225 0.261 80.62 0.403 0.105


2 355 225 0.174 123.69 0.172 0.03
3 355 345 0.203 224.72 0.052 0.01
4 265 365 0.215 212.13 0.058 0.013
5 185 75 0.241 94.34 0.295 0.071
6 25 445 0.2 358.05 0.02 0.004

The final estimate is the sum of weighted values.

Estimate of value at Unknown = 0.233
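A minimal sketch reproducing this calculation in Python (numpy assumed; xy and z as in the nearest-neighbor example above):

import numpy as np

def idw(xy, z, x0, power=2.0):
    d = np.linalg.norm(np.asarray(xy, float) - x0, axis=1)
    if np.any(d == 0.0):
        return z[np.argmin(d)]           # return the datum itself at zero distance
    w = 1.0 / d ** power                 # inverse-distance weights
    return np.sum(w * z) / np.sum(w)     # normalize and sum

# idw(xy, z, np.array([235.0, 155.0]))   # ~0.233, matching the table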


Inverse Distance Estimation
Example from workshop exercise data set

[Figures: inverse-distance estimate of porosity and the synthetic “reality” porosity]
Evaluating Estimation Methods

What attributes/statistics could be used to determine whether or not a technique is worthwhile?

• Estimate a large number of points (100s–1000s) and then take a sample at each location. Look at how well the estimates and actual values compare. (Practically possible with a subset of a large data set.)
• Look at the mean error as a measure of bias across all of those locations. Want as many over-estimates as under-estimates so that the mean error = 0.
• Look at the spread, or variance, of the errors. Want it to be minimal.
Evaluating Estimation Methods

Cross-validation: Pull each datum out of the model individually and use the surrounding data to re-estimate the removed datum. Then compare the estimate to the actual value.

Jackknifing: Hold back some of the original data and use the remaining data to estimate those locations. Then compare the real values with the estimates.

Both techniques: Examine a scatterplot of the actual values vs. estimates. Map the residuals to make sure they are not always overestimated in one region and underestimated in another.
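A minimal leave-one-out cross-validation sketch in Python (numpy assumed), usable with any point estimator such as the idw() function above:

import numpy as np

def cross_validate(xy, z, estimator):
    errors = np.empty(len(z))
    for i in range(len(z)):
        keep = np.arange(len(z)) != i                     # withhold one datum
        errors[i] = estimator(xy[keep], z[keep], xy[i]) - z[i]
    return errors.mean(), errors.var()                    # bias and spread

A mean error near zero indicates an unbiased estimator; a small error variance indicates a precise one.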
Evaluating Estimation Methods

• Build a model of the concentration at each point: the estimate.
• Take a sample and see how well the estimate and true values correlate.

Optimal: the distribution is centered on the 45-degree line (accurate, unbiased) with a small spread.

Large variance: the distribution is accurate and unbiased, but the estimates are more variable, causing a wider spread in the distribution (imprecise).

[Figures: scatterplots of estimate versus true value for the optimal and large-variance cases]
Evaluating Estimation Methods

High bias / low bias: the distribution is precise but biased. High bias overestimates the true value; low bias underestimates the true value.

Heteroscedastic: the variance changes as a function of the value, so as the values increase, the quality of the fit about the 45-degree line deteriorates.

Conditional bias: a small subsection appears to be optimal, but the low values tend to be overestimated and the high values tend to be underestimated.

[Figures: scatterplots of estimate versus true value for the high-bias, low-bias, heteroscedastic, and conditionally biased cases]
Precision and Accuracy

Precise and accurate: mean error = 0, narrow range
Imprecise and accurate: mean error = 0, wide distribution
Precise and inaccurate: bias to the low side, narrow range
Imprecise and inaccurate: bias to the high side, wide distribution

[Figure: 2 × 2 grid of error distributions illustrating the four combinations]
Geostatistical Estimation: Kriging

Kriging is an estimator that uses a weighted linear combination of surrounding data to produce unbiased, minimum-variance estimates. Kriging weights are not based on Euclidean distance, but use the geometry defined by the variograms.

Ordinary Kriging (OK): Allows for local re-estimation of the global mean. The estimate is the sum of the products of the weights and the z values.

Simple Kriging (SK): Enforces the global mean on each estimate. Sums the weighted residuals from the surrounding data points and adds that sum onto the global mean.
Kriging

Best Linear Unbiased Estimator (B.L.U.E.)

Does Ordinary Kriging (OK) fit the requirements of a B.L.U.E.?

Best: Minimizes the variance of the residuals (precise estimate)
Linear: Employs a weighted linear combination of the surrounding data
Unbiased: Attempts to make the mean residual equal to zero
Kriging – Matrix Formulation

Calculation of kriging weights:

$\begin{bmatrix} C_{11} & C_{12} & C_{13} & C_{14} & C_{15} & C_{16} & 1 \\ C_{21} & C_{22} & C_{23} & C_{24} & C_{25} & C_{26} & 1 \\ C_{31} & C_{32} & C_{33} & C_{34} & C_{35} & C_{36} & 1 \\ C_{41} & C_{42} & C_{43} & C_{44} & C_{45} & C_{46} & 1 \\ C_{51} & C_{52} & C_{53} & C_{54} & C_{55} & C_{56} & 1 \\ C_{61} & C_{62} & C_{63} & C_{64} & C_{65} & C_{66} & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \lambda_4 \\ \lambda_5 \\ \lambda_6 \\ \mu \end{bmatrix} = \begin{bmatrix} C_{10} \\ C_{20} \\ C_{30} \\ C_{40} \\ C_{50} \\ C_{60} \\ 1 \end{bmatrix}$

C is the local covariance matrix that describes the covariance between all samples in the local search neighborhood; D is the vector of covariances between each point in the search neighborhood and the location being estimated.

To solve for the vector of weights, use matrix algebra: $\lambda = C^{-1} D$
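A minimal sketch of this solve in Python (numpy assumed; cov is any admissible covariance function, e.g. sill minus one of the variogram models from the earlier slides):

import numpy as np

def ok_weights(xy, x0, cov):
    n = len(xy)
    C = np.ones((n + 1, n + 1))
    C[n, n] = 0.0                                    # Lagrange row and column
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    C[:n, :n] = cov(d)                               # data-to-data covariances
    D = np.ones(n + 1)
    D[:n] = cov(np.linalg.norm(xy - x0, axis=1))     # data-to-target covariances
    sol = np.linalg.solve(C, D)                      # lambda = C^-1 * D
    return sol[:n], sol[n]                           # weights and Lagrange parameter mu

# lam, mu = ok_weights(xy, x0, cov); estimate = lam @ z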


Kriging – Matrix Formulation

In this system, C_11 is the covariance at zero separation (the sill value), and C_61 is the covariance between points 6 and 1. An extra row and column, the Lagrange parameter, are added to assure unbiasedness.
Kriging – Covariance Matrix

• There is no guarantee of a unique solution to the matrix system.


To ensure that there is only one unique solution, the system
must be positive definite
• For estimates that are weighted linear combinations of other
values, the variance about those estimates must be greater than
or equal to zero
• Positive definite condition can be achieved by modeling
variograms with positive definite functions, as long as one of the
standard models is used (which are positive definite functions)
the covariances will create a positive definite set of matrices
Kriging – Unbiasedness

We want the mean, or expectation, of the errors to equal zero.

• Define the error as the estimated value minus the actual value:

$E(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i) - Z(x_0)$

• Realize that the mean is stationary, so both the estimate and the actual value have the same mean.
• If the average error is set to zero, then:

$\sum_{i=1}^{n} \lambda_i = 1.0$
Kriging – Lagrange Parameter

• The Lagrange parameter, μ, solves the problem of n+1 equations and only n unknowns created by the unbiasedness constraint
• The Lagrange parameter is essentially another unknown:

$\sum_{i=1}^{n} \lambda_i = 1 \;\Rightarrow\; \sum_{i=1}^{n} \lambda_i - 1 = 0 \;\Rightarrow\; 2\mu \left( \sum_{i=1}^{n} \lambda_i - 1 \right) = 0$
Kriging – Summary

The covariances in D act like inverse-distance weights (two points close together have a high covariance; as the distance increases, the covariance approaches zero).
• However, the weight as a function of distance is not limited to simple powers; it can be fit with more complex variogram models.

Distances are not in Euclidean space, but are relative to the variogram range.
• This can provide anisotropy and weights of zero: if a point is beyond the range of the variogram, it will be assigned a zero weight.

The covariances in C act to decluster the data.
• C⁻¹ adjusts the weights in D for data redundancy.
Kriging – Results

[Figure: the six porosity samples (0.200, 0.215, 0.203, 0.261, 0.174, 0.241) in the 500 × 500 domain, with the unknown point x0 marked]

• Porosity measured at 6 points.
• Make an estimate of porosity at the unknown point, x0.
Kriging – Results

For the example problem, the normal-score variogram model is spherical, with a range of 125.0 (N–S) and 100.0 (E–W) and a nugget of 0.0.

Location   Distance from X0   Kriging Weight (radii = range)
1          80.6               0.541
2          123.7              0
3          224.7              0
4          212.1              0
5          94.3               0.459
6          358.1              0

Note that only points 1 and 5 fall within the search ellipse.

Estimate of value at x0 = 0.250


Kriging – Results
Example from workshop exercise data set

[Figures: kriged estimate of porosity and the synthetic “reality” porosity]


Kriging – Error Variance

• Kriging is unique among spatial estimation techniques in attempting to minimize errors, making the error distribution tight.
• Concisely, it is the variance of the errors that is minimized:

$\sigma_e^2 = \frac{1}{n} \sum_{i=1}^{n} \left( e_i - \bar{e} \right)^2$

where each error $e_i$ = the estimate − the true value, and $\bar{e}$ is the mean error across the domain.
Kriging – Error Variance

Long version: Derive a model of the error variance and minimize the modeled variance by setting the partial derivative of each weighted covariance (with respect to each weight) between datum and estimation location to zero.

Short version: The estimate of the error variance is the total variance minus the weighted sum of covariances in D plus the Lagrange parameter:

$\hat{\sigma}_e^2 = \sigma^2_{Data} - \left[ \sum_{i=1}^{n} \lambda_i C_{i0} + \mu \right]$, or in matrix form: $\hat{\sigma}_e^2 = \sigma^2_{Data} - \lambda D$

where $\hat{\sigma}_e^2$ is the local estimate of the error variance.
Kriging – Error Variance

• The error variance is also called the kriging variance or estimation variance
• The error variance is not a function of the data values but only of the sample configuration
• The error variance is equal to zero at data locations
• Like the entire kriging system, the distribution of errors is non-parametric (although a Gaussian distribution is often assumed for errors)
Kriging – Error Variance
Example from workshop exercise data set

[Figures: kriging error variance map and the synthetic “reality” porosity]


Spatial Estimation Summary

•Review of three traditional point estimation techniques


on a simple example problem
•Introduction of techniques that can be used to assess the
accuracy and precision of estimators
•Introduction to kriging including the matrix formulation of
the kriging system
•Assessment of kriging system as a “BLUE”
•Introduction to kriging variance
Spatial Simulation

• Probability-based techniques (Monte Carlo process) on


spatially correlated distributions
• Sacrifices the local best estimate for the reproduction of
global statistics and features
• Simulation process can create any number of equally
probable realizations (maps), all of which honor the
available information
• Simulation allows for evaluation of joint uncertainty
(accuracy) at multiple locations
• A large suite of geostatistical simulation methods are
available
Spatial Simulation
• Simulation provides a more realistic picture of natural
complexity and heterogeneity relative to kriging
• Simulation can provide an idea of “best,” “most likely” and
“worst” cases for a given problem
- bounding cases
• Simulation is a basis for Monte Carlo risk analysis where a full
distribution of results is necessary
- full distribution
• Simulation reproduces the observed level of variability or
heterogeneity at a site
• Effectively extrapolates parameter values, whereas kriging only
interpolates parameter values
Estimation versus Simulation

Estimation (Kriging) versus Simulation

[Figures: concentration (ppm, 20–160) versus distance (m, 0–1000) for the kriged estimate and one simulation]
Estimation versus Simulation
Example from workshop exercise data set
Ordinary Kriging (Estimation) Sequential Gaussian Simulation
Simulation Example

The Monte Carlo process can create multiple images of the activity at the site. Every image honors:
- the sample data
- the histogram
- the variogram

Given our knowledge of the site, every image is a plausible depiction of the real parameter distribution.
Simulation Example
Example from workshop exercise data set

Synthetic “Reality”
Transport Example
Uncertainty in spatial distribution of hydraulic properties leads to
uncertainty in transport results
Transfer Function

Radionuclides are being released from a repository. What will the concentration of a given radionuclide be at a downgradient water supply well in 50 years? (An equivalent framing: there is a water supply well in the vicinity of a leaking landfill; what will the concentration of a given contaminant be at that well in 50 years?)

Multiple realizations of porosity and permeability → groundwater flow and transport model → frequency distribution of concentration
General Types of Simulation

Parametric: requires a transform of the data to a parametric space, simulation in that space, and then a back-transform to raw data space.
Example: Gaussian simulation using the normal-score transform
Advantage: requires only one variogram model
Disadvantage: does not reproduce the variogram at the extremes of the distribution

Non-parametric: requires discretization of the data into classes and a variogram model at each threshold or class.
Example: indicator simulation of geologic facies (sand, silt, clay)
Advantage: reproduces each variogram at each class/threshold
Disadvantage: requires variogram modeling for each class/threshold
Sequential Simulation

• Map the conditioning data onto a grid


• Randomly visit all other grid nodes
• Use kriging system to create a local cdf based on surrounding
data for each node
• Draw a random value from the cdf to get the simulated value at
that location
• Consider each simulated point as a conditioning value for future
cdf construction
• Continue until all nodes have a simulated value
• Reinitialize random number generator and begin next
realization
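A condensed sketch of this loop for a 1-D grid in Python (numpy assumed), reusing the ok_weights() helper from the kriging section; it is illustrative only (normal-score data assumed, all previously simulated points used as conditioning, no back-transform or neighborhood limiting):

import numpy as np

def sgs(grid_x, cond_x, cond_z, cov, sill, rng):
    x, z = list(cond_x), list(cond_z)
    sim = {}
    for xg in rng.permutation(grid_x):             # random path over the nodes
        xy = np.array(x, float)[:, None]           # current conditioning set
        lam, mu = ok_weights(xy, np.array([xg]), cov)
        D = cov(np.abs(np.array(x) - xg))
        mean = lam @ np.array(z)                   # local kriged mean
        var = max(sill - (lam @ D + mu), 0.0)      # local kriging variance
        sim[xg] = rng.normal(mean, np.sqrt(var))   # draw from the local cdf
        x.append(xg); z.append(sim[xg])            # simulated node becomes data
    return sim

# rng = np.random.default_rng(seed): reinitialize the seed for each realization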
Sequential Simulation

• Use the kriging system to create a local cdf based on the surrounding data points of the first node.
• This cdf can be parametric or indicator.
• Draw a random number between 0 and 1, and assign the value for that probability to the node.
• For the remainder of this realization, the newly defined node is treated as a sample point.

Local cdfs reflect proximity to data locations: well defined close to data, poorly defined far from data, and a spike on top of (or right next to) a data point.
CDF Construction

Gaussian: the normal-score transform allows the kriging estimate and kriging variance to define the local cdf:

$\text{variance} = \hat{\sigma}_e^2 = \sigma^2_{Data} - \left[ \sum_{i=1}^{n} \lambda_i C_{i0} + \mu \right]$

Indicator: construct the cdf through indicator kriging at each threshold z_k. The expected value of the cdf at any threshold is estimated by the weighted linear combination of surrounding indicator data.
Gaussian Simulation Example
Aquifer bottom elevation (Sunbury Shale), Portsmouth, Ohio. The simulation shows greater variability in elevation values relative to the kriged map.

[Figures: kriged map and Realization 1 of Sunbury Shale elevation (roughly 635–650 feet), northing versus easting (feet)]
Indicator Simulation Example
Kriging versus Simulation

Kriging: the smoothing effect of interpolation will produce:
1) A longer-range variogram in the output than the input model
2) Less variability in the output field than in the input data (the distribution gets squeezed)

Simulation: attempts to reproduce the input histogram and variogram (the input univariate and bivariate data distributions, respectively) within the limits of “ergodic fluctuations”.
Kriging versus Simulation

Sample Data
and Kriging
Estimate

Two example
realizations
from
simulation
Kriging versus Simulation

Kriging variograms versus simulation variograms

[Figures: kriging variograms (Krig EW, Krig NS) and simulation variograms (Sim1 and Sim2, EW and NS) compared with the model; variogram (0–1.4) versus distance (0–25)]
Kriging versus Simulation

Histogram Histogram of
of raw data kriged data

Histogram of Histogram of
Realization 1 Realization 2
(simulation) (simulation)
Kriging versus Simulation

Parameter            Raw Data (n=214)   Kriged Map (n=5329)   Realization 1 (n=5329)   Realization 2 (n=5329)
Mean                 7.53               7.49                  7.41                     7.52
Median               6.79               7.34                  6.70                     6.81
Standard Deviation   1.81               0.86                  1.75                     1.80
Minimum              5.20               5.46                  3.18                     3.10
Maximum              13.24              13.13                 14.93                    14.88
10th Percentile      5.80               6.46                  5.80                     5.80
90th Percentile      10.47              8.75                  10.39                    10.47
Kriging versus Simulation

• Kriging reduces variance but retains the mean of the input data
• Kriging, as an interpolator, does not produce values outside the minimum and maximum of the sample data
• Simulation can produce values above and below the maximum and minimum sample data because it draws from a fully defined cdf [0, 1] at each location

[Figures: kriged and simulated concentration (ppm, 20–160) versus distance (m, 0–1000)]
Ergodic Fluctuations

• Ergodic fluctuation is defined as the difference between the input model and the statistics of a realization.
• Input models are generally based on data from a limited sample size.
• The underlying model is said to be ergodic in the parameter θ if the realization statistics tend toward θ as the size of the field increases.
Ergodic Fluctuations

In this region, the domain size is nearly infinite relative to the variogram range.
Ergodic Fluctuations

Range = 80.0 Range = 10.0


Simulation Post Processing
Expectation (or conditional average) map and variance map.

In the limit of an infinite number of realizations, these two maps will be identical to the kriging map and the kriging variance map.
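A minimal post-processing sketch in Python (numpy assumed; reals is an illustrative stack of realizations of shape (n_realizations, ny, nx)):

import numpy as np

def post_process(reals):
    etype = reals.mean(axis=0)   # expectation (E-type) map
    cvar = reals.var(axis=0)     # conditional variance map
    return etype, cvar

With more realizations, these maps approach the kriging map and the kriging variance map, as noted above.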
Simulation – Nugget Effect

Nugget = 0.00
Nugget = 0.40
Simulation – Variogram Models

Spherical Exponential Gaussian


Simulation – Anisotropy

Isotropic Anisotropy = 4.0 Anisotropy = 12.0


Simulation Summary

• Introduction to geostatistical simulation and different simulation algorithms
• Mechanics of sequential simulation
• Differences between parametric and non-parametric simulation
• Comparison of simulation to kriging
• The ergodic assumption
• Postprocessing of simulations for probability maps
