Practical Sensitivity Analysis For Environmental Modeling: Thorsten Wagener With Francesca Pianosi
[1] INTRODUCTION TO
SENSITIVITY ANALYSIS
I will use the following terminology:

A model maps N uncertain input factors (Factor 1, Factor 2, …, Factor N) onto M outputs (Output 1, Output 2, …, Output M).

Sensitivity Analysis
The study of how uncertainty in the output of a model
can be apportioned to different sources of uncertainty
in the model input (factors) (Saltelli et al., 2004).
What is the relationship to the material
of the previous days?
• Output variance (example: flood forecasting)
• Output variance decomposition (example: model calibration)
Sampling each of k factors at 100 levels requires 100^k model runs:

k = 1: 100
k = 2: 10,000
k = 3: 1,000,000
k = 4: 100,000,000
k = 5: 10,000,000,000
k = 6: 1,000,000,000,000

Many environmental models, of course, have many more uncertain
parameters and inputs and will be subject to the problem of making
enough runs to characterize the whole model space.
Question 3: What value does a factor have
to take to achieve the desired model output?
Question 4: How can we reduce the
output variance below a chosen threshold?
The 4 possible questions (objectives) in
summary:
Factors prioritization (FP) Assume that, in principle, the uncertain input factors can be ‘discovered’, i.e. determined or measured,
so as to find their true value. One legitimate question is then “which factor should one try to determine first in order to have
the largest expected reduction in the variance of the model output”? This defines the ‘factors prioritization’ setting. Saltelli and
Tarantola (2002) have shown that the variance-based main effect provides the answer to the Factor Prioritization setting.
Factors fixing (FF) Another aim of sensitivity analysis is to simplify models. If a model is used systematically in a Monte Carlo
framework, so that input uncertainties are systematically propagated into the output, it might be useful to ascertain which
input factors can be fixed, anywhere in their range of variation, without appreciably affecting a specific output of interest. This may
be useful for simplifying a model in a larger sense, because we may then be able to condense entire sections of our models if
all factors entering a section are non-influential. Saltelli and Tarantola (2002) also showed that the variance-based total effect
provides the answer to the Factor Fixing setting. A null total effect is a sufficient condition for an input factor to be irrelevant,
and therefore fixable.
Factors Mapping (FM) In this case, the analyst is interested in obtaining as much information as possible, both global and local: which
values of an input factor (or group of factors) are responsible for driving the model output into a given region? Which
conditions are able to drive a specified model behaviour? Here, a full array of methods, from local ones, to Monte Carlo
Filtering, to model emulators, to variance-based and entropy-based methods, can provide useful insights about model
properties.
Variance Cutting (VC) In other cases the objective of SA can be the reduction of the output variance to below a chosen threshold
(variance cutting setting) by simultaneously fixing the smallest number of input factors. This setting can be of use when SA is
part of a risk assessment study, e.g. when a regulatory authority finds the width of the impact estimate distribution
too wide. Note that the variance cutting and factor prioritization settings may appear very similar, as they both aim at
reducing the output variance. However, in factor prioritization the aim is to identify the most influential factors one
by one, while in variance cutting the objective is to reduce the output variance down to a pre-established level by
fixing the smallest subset of factors at once.
In general, there are just a few basic
steps in any sensitivity analysis
[2] SENSITIVITY TO WHAT?
There are two major distinctions in how
we approach this question.
[1] Analyze the model output directly
(e.g. flash flooding in Africa)
We can directly estimate the sensitivity
to the simulated output
• This approach means that we put all our stock into our model!
• It works only if we are rather confident in the realism of our model.
• For example, integrated assessment models.
Or [2]. We can do sensitivity analysis in
either case, but with different objectives.
[2] Analyze some error metric (objective function, likelihood, etc.)
In this case we typically estimate some
type of (statistical) error metric
e_t = y_t^obs − y_t^sim(θ)

[Figure: observed series y^obs and simulated series y^sim(θ) over time, with the residual e_t between them at each time step]
A typical error metric (objective or cost or
likelihood function) is the Root Mean Squared Error, e.g.

RMSE(θ) = sqrt( (1/T) Σ_{t=1..T} ( y_t^obs − y_t^sim(θ) )² )
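As a minimal sketch of the formula above (the variable names are illustrative, and the series are assumed to be stored as NumPy arrays or array-likes):

```python
import numpy as np

def rmse(y_obs, y_sim):
    """Root Mean Squared Error between observed and simulated series."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_sim = np.asarray(y_sim, dtype=float)
    return np.sqrt(np.mean((y_obs - y_sim) ** 2))
```

Evaluating this metric for every sampled parameter set θ yields the scalar output that the sensitivity analysis then decomposes.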
Keep in mind that part of the sensitivity
analysis has to be a performance analysis
• The model sensitivities are more likely to be
meaningful if the model shows a good performance!
• Performance of your model (i.e. how well it
reproduces the data) is a large part of the
justification for trusting your model in the first place.
• TIP: Assess the performance for all the samples you
estimate during your sensitivity analysis (check the
histogram).
[3] VISUAL, SCREENING AND
VARIANCE-BASED METHODS
We can distinguish between local and
global approaches to sensitivity analysis
[1] Local methods analyze sensitivity around some (often optimal) point in the factor space.
[2] Global methods attempt to analyze variability across the full factor space.
Local methods require a good ‘baseline’ or
‘nominal point’
Output Plots
Scatter Plots
Regional Sensitivity Analysis Plots
A pitch for looking at your results!
Scatter plots (factor vs output-metric)
[Figure: two scatter plots of RMSE (m³/s), range ≈ 4–12, against sampled factor values: factor N (range 2–10, left panel) and factor K (range 0.0–1.0, right panel)]
Scatter plots (factor vs factor)
[Figure: scatter plot of sampled factor pairs, N (2–10) on the x-axis against K (0.0–1.0) on the y-axis; two individual samples are annotated with their performance, RMSE = 13.1 and RMSE = 3.37]
Method of Morris
What if we have a very large number of
parameters?
e.g. large scale
groundwater
pollution model
We’d like to reduce the number of parameters in a first step, so that we can
then assess the key parameters more thoroughly
In such cases we can apply what is
called a ‘screening method’
A popular strategy to implement this is
the Method of Morris
• Derives measures of global sensitivity from a set of local
derivatives, or elementary effects (EE)
• Each factor xi is perturbed along a grid of size Δi to
create a trajectory through the factor space, where f(x)
is the baseline
• Each trajectory yields one estimate of the elementary
effect for each factor, i.e. EE_i = [ f(x_1, …, x_i + Δ_i, …, x_k) − f(x) ] / Δ_i,
the ratio of the change in model output to the change in that parameter
The computational cost is r(k+1) model runs,
where k is the no. of parameters and r=4 to 10
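The trajectory scheme can be sketched as follows. This is a hedged illustration (the toy model, the fixed step Δ = 0.5, and the factor count are invented for the example; production codes use the randomised grid designs of Morris 1991 and Campolongo et al. 2007):

```python
import numpy as np

def morris_ee(model, k, r, rng, delta=0.5):
    """Elementary effects from r trajectories: costs r*(k+1) model runs."""
    ee = np.zeros((r, k))
    for t in range(r):
        x = rng.random(k) * (1.0 - delta)    # baseline point, leaving room for +delta
        fx = model(x)
        for i in rng.permutation(k):         # perturb each factor once, in random order
            x_new = x.copy()
            x_new[i] = x[i] + delta
            f_new = model(x_new)
            ee[t, i] = (f_new - fx) / delta  # elementary effect of factor i
            x, fx = x_new, f_new             # continue along the trajectory
    mu_star = np.abs(ee).mean(axis=0)        # mean |EE|: overall importance
    sigma = ee.std(axis=0)                   # spread: interactions / nonlinearity
    return mu_star, sigma

# Toy model (hypothetical): factor 0 is influential, factor 1 is nearly inert
toy = lambda x: 10.0 * x[0] + 0.1 * x[1] ** 2
mu_star, sigma = morris_ee(toy, k=2, r=6, rng=np.random.default_rng(0))
```

Ranking factors by mu_star (and inspecting sigma for interaction effects) is the screening step: factors with small mu_star are candidates for fixing before a more expensive variance-based analysis.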
Key References
Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., and Tarantola, S., 2007. Global Sensitivity Analysis: Gauging the Worth of Scientific Models. John Wiley & Sons. In press.
Morris, M.D., 1991. Factorial sampling plans for preliminary computational experiments. Technometrics, 33, 161–174.
Campolongo, F., Cariboni, J. and Saltelli, A., 2007. An effective screening design for sensitivity analysis of large models. Environmental Modelling and Software, 22, 1509–1518.
[3.3] VARIANCE-BASED
METHODS
Sobol’s Method
Variance-based methods quantify sensitivity by
decomposing the variance of model outputs into
factor-related components
In particular, the variance is decomposed into main effects (or
first-order effects) and interaction effects. The main effect of a
parameter quantifies the portion of the variance of the model
output which is explained by that parameter alone, averaged
over variations in all other parameters.
The total effect of a parameter measures the residual variance of
the model output that remains after removing the portion
explained by all other parameters, i.e. it quantifies the variance
(i.e. the uncertainty) in the model output that would be left by
fixing all other factors to their ‘true’, albeit unknown, values.
A variance-based approach is called FAST
(and extended versions of it)
FAST (Fourier Amplitude Sensitivity Test) is a methodology that
allows estimation of the entire set of main effect sensitivities by
Fourier transformation (Koda et al., 1979; McRae et al., 1982),
using a single sample of size N.
Extensions of the FAST method are described in Saltelli et al.
(1999) and Tarantola et al. (2006). In classic FAST only the main
effect terms are computed. Extended FAST also allows the
computation of higher-order terms; in particular, it allows the
computation of the entire set of main and total effects, at the cost of
kN model runs.
FAST decomposes the output variance V(Y) by means of spectral
analysis:

V(Y) = V_1 + V_2 + … + V_k + K

where V_i is the amount of variance explained by factor X_i and K is the residual.
[Pappenberger and Vandenberghe, 2007]
Sobol’ is becoming a very popular strategy
in environmental modeling
The Sobol’ method is a Monte Carlo procedure that allows the
computation of any term of the variance decomposition, each at the
cost of N model runs (Sobol’, 1993).
Following Saltelli (2002), the cost of estimating the entire set of
main and total effects is (2+k)N model evaluations, which
roughly halves the computational cost with respect to the
original Sobol’ algorithm.
[Pappenberger and Vandenberghe, 2007]
Sobol’ attributes the variance in the
model output as follows:

V(Y) = Σ_i V_i + Σ_{i<j} V_ij + … + V_12…k
The first-order and total sensitivity
indices are defined as:

S_i = V_i / V(Y) = V[ E(Y | X_i) ] / V(Y)
ST_i = 1 − V[ E(Y | X_~i) ] / V(Y)

where X_~i denotes all factors except X_i.
Interpretation of the sensitivity indices
The main (or first-order) effect (Si) measures the contribution to the output variance
from varying the i-th factor alone (but averaged over variations in the other factors)
(i) the higher the value of Si, the higher the influence of the i-th factor on the output
(ii) if Si = 0, then the i-th parameter has no direct influence on the output (but it might still have
some in interaction with other parameters!)
(iii) the sum of all Si is always lower than or equal to 1. If it is equal to 1, then there are no interactions
between the parameters (“additive” model)
The total effect (STi) measures the total contribution to the output variance of the i-th
factor, including its direct effect and interactions with other factors
(i) STi must be higher than or equal to Si. If they are equal, then the parameter has no interactions
with the other parameters
(ii) if STi = 0, the i-th parameter has no influence (neither direct nor indirect) on the model output
(iii) the sum of all STi is always higher than or equal to 1. If it is equal to 1, then there are no
interactions between the parameters
[Pappenberger and Vandenberghe, 2007]
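To make the indices concrete, here is a hedged sketch of the Saltelli (2002) sampling scheme at the (2+k)N cost quoted above, using a Jansen-type estimator for the total effect; the toy additive model and sample size are invented for illustration, not taken from the slides:

```python
import numpy as np

def sobol_indices(model, k, N, rng):
    """First-order (Si) and total (STi) effects at a cost of (2+k)*N runs."""
    A = rng.random((N, k))                 # two independent input samples
    B = rng.random((N, k))
    fA, fB = model(A), model(B)
    V = np.var(np.concatenate([fA, fB]))   # total output variance
    Si, STi = np.empty(k), np.empty(k)
    for i in range(k):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                # A with column i taken from B
        fABi = model(ABi)
        Si[i] = np.mean(fB * (fABi - fA)) / V         # first-order estimator
        STi[i] = 0.5 * np.mean((fA - fABi) ** 2) / V  # total-effect (Jansen)
    return Si, STi

# Toy additive model Y = X1 + 2*X2, Xi ~ U(0,1): analytically S1 = 0.2, S2 = 0.8
toy = lambda X: X[:, 0] + 2.0 * X[:, 1]
Si, STi = sobol_indices(toy, k=2, N=100_000, rng=np.random.default_rng(42))
```

Because the toy model is additive, STi ≈ Si for each factor and the Si sum to approximately 1, matching points (iii) above.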
Advantages and disadvantages
Advantages
Extremely robust: they work with any type of discontinuous (even randomised)
mapping between input factors and the output. The Sobol’ estimator is unbiased. They
do not rely on any hypothesis about the smoothness of the mapping. The only key
assumption is that variance (i.e. the second moment) is an adequate measure for
quantifying the uncertainty of the model output.
Computing main effects and total effects for each factor, while still being far from a full
factor mapping, gives a fairly instructive description of the system. Moreover, they
provide unambiguous and clear answers to well-specified sensitivity settings
(prioritisation and fixing).
Disadvantages
The computational cost is relatively high, which implies that these methods cannot be
applied to computationally expensive models. They do not provide any mapping,
i.e. they decompose the output uncertainty but they do not provide information
about, e.g., the input factors responsible for producing Y values in specified
regions, such as extreme high/low or any behavioural classification.
[Pappenberger and Vandenberghe, 2007]
Sobol’ sequences of quasi-random points
sample spaces very evenly
256 points from a pseudorandom number source (left), compared with the
first 256 points from the (2,3) Sobol’ sequence (right). The Sobol’ sequence
covers the space more evenly (red = 1–10, blue = 11–100, green = 101–256).
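Assuming SciPy is available (its `scipy.stats.qmc` module provides a Sobol’ generator; this is an illustration, not the code behind the figure), the more even coverage can be checked numerically by comparing discrepancies, where lower discrepancy means more even coverage of the unit square:

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)

pseudo = rng.random((256, 2))                                     # pseudorandom points
sobol = qmc.Sobol(d=2, scramble=True, seed=0).random_base2(m=8)   # 2^8 = 256 points

d_pseudo = qmc.discrepancy(pseudo)   # discrepancy of the pseudorandom sample
d_sobol = qmc.discrepancy(sobol)     # discrepancy of the Sobol' sample
```

This evenness is why quasi-random sequences are a popular base sample for the variance-based estimators above.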
[3.4] WHICH METHOD
SHOULD I SELECT?
There is no single best strategy for all
problems!
AD means “automatic
differentiation”.
[Cariboni et al., 2007. Ecological Modelling]
MCF: Monte Carlo Filtering
FF: Factor Fixing
FP: Factor Prioritization
VC: Variance Cutting
FM: Factor Mapping
[4] VALIDATION AND
ROBUSTNESS ANALYSIS
Andres (1997) suggested a simple
strategy to verify that all important
factors have been identified
[Figure: model output from a sample varying all factors, plotted against (a) output from a sample varying only the identified important factors (optimal result: perfect correlation) and (b) output from a sample varying only the remaining, supposedly unimportant factors (optimal result: no correlation)]
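A hedged sketch of one half of this check (the toy model and the ‘important’ set are invented for illustration): rerun the model on the same sample but with the supposedly unimportant factors frozen at a nominal value; if the screening caught every influential factor, the two output sets should be almost perfectly correlated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model: factors 0 and 1 matter, factor 2 is nearly non-influential
model = lambda X: np.sin(X[:, 0]) + 5.0 * X[:, 1] + 0.01 * X[:, 2]

k, N = 3, 500
important = [0, 1]                       # factors flagged by the screening step
unimportant = [i for i in range(k) if i not in important]

X_full = rng.random((N, k))              # run 1: vary all factors
X_fixed = X_full.copy()
X_fixed[:, unimportant] = 0.5            # run 2: freeze the 'unimportant' ones

r = np.corrcoef(model(X_full), model(X_fixed))[0, 1]
```

A correlation r well below 1 would signal that an influential factor was wrongly screened out; the complementary check (varying only the unimportant factors) should show no correlation.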
[a] Better visualization of complex spaces
and interactions
Total effect = Identifiability (first order) + Interactions
[Figure: total-order, first-order, and interaction sensitivity indices plotted for the model parameters A, As, D and α]