
Technical Notes

Santander Meteorology Group (CSIC-UC)


SMG:2.2011

User Guide of the ENSEMBLES Downscaling Portal (version 2)


J.M. Gutiérrez 1, D. San Martín 1,2, A.S. Cofiño 3, S. Herrera 1,2, R. Manzanas 1, M.D. Frías 3

1 Instituto de Física de Cantabria, CSIC-Universidad de Cantabria, Santander, Spain.
2 Predictia Intelligent Data Solutions, Santander, Spain.
3 Dpto. Matemática Aplicada y C.C., Universidad de Cantabria, Santander, Spain.
correspondence: [email protected], [email protected], [email protected]
version: v3, November 2012

Abstract
This report describes the structure and usage of the statistical downscaling portal (http://www.meteo.unican.es/ensembles) developed by the Santander Meteorology Group (http://www.meteo.unican.es) with the technical assistance of Predictia (http://www.predictia.es) as part of the activities of the EU-funded ENSEMBLES project (2004-2009, see http://ensembles-eu.metoffice.com). The current operational version (version 2) is a complete reimplementation of the portal, allowing particular adaptations and views for supporting projects and institutions (see the acknowledgements at the end). The three main actions necessary to create a downscaling method (defining the predictors, choosing the local/regional target variable to be downscaled and creating the downscaling method) are described step by step, illustrating the different options available from the portal. Afterwards, the application of the method to downscale GCM climate scenarios is described and some information about validating and interpreting the results is provided. Therefore, this document is intended to be a brief user guide for the downscaling portal and requires additional good-practice documents to learn about the optimum regions and predictors for the statistical downscaling process.

1 Introduction

Statistical downscaling is a sound and mature field which allows adapting the coarse-resolution (typically 250 km) global climate change scenarios provided by the Global Climate Models (GCMs) to regional or local scale. These methods link the large-scale outputs of GCMs (typically large-scale fields such as 500 mb geopotential height) with simultaneous local historical observations (typically surface variables such as precipitation or temperature) in the region of interest. Therefore, these techniques allow filling the gap between the low-resolution GCM outputs and the models used in different impact sectors, such as agriculture, energy or health, which require daily meteorological inputs on special high-resolution grids or gauge networks.

Statistical downscaling is nowadays a complex multidisciplinary field involving a cascade of different scientific applications to access and process large amounts of heterogeneous data. Therefore, interactive user-friendly tools are necessary in order to ease the downscaling process for end users, thus maximizing the exploitation of the available climate projections. The ENSEMBLES Downscaling Portal described in this document was initially developed within the EU-funded ENSEMBLES project (2004-2009) following an end-to-end approach. Afterwards, a complete reimplementation (version 2) was performed to ensure the appropriate adaptation of the portal (different views for different users) to the needs of future supporting projects and institutions (see the acknowledgements at the end for the current list of supporting projects and institutions).

This user guide is intended for end-users with some basic knowledge of statistical downscaling and focuses on the steps to be followed to undertake a particular downscaling experiment using the downscaling portal. As an illustrative example, the portal includes a demo experiment, Iberia demo, which focuses on maximum temperature in five locations/cities for the 2091-2100 decade. This experiment is available for all users (in write-protected mode) and can be followed step by step through the different panels of the portal in order to see a typical application.

The know-how about selecting appropriate predictors, calibrating/validating the downscaling method, selecting the appropriate GCMs and scenarios, the assumptions of the statistical downscaling methodology, etc., is not dealt with in this document. Thus, before using the portal, we strongly recommend the user to read the Guidelines for Use of Climate Scenarios Developed from Statistical Downscaling Methods [1] (which constitute supporting material of the Intergovernmental Panel on Climate Change, IPCC).

Finally, we want to remark that this portal should not be used as a black-box input-output tool since, otherwise, the obtained regional projections could be misleading or even wrong. Therefore, some background knowledge about the meteorological conditions in the area of interest and the main large-scale drivers influencing the climate is needed to appropriately use the downscaling tool and to obtain meaningful results. Moreover, the results obtained from the ENSEMBLES Downscaling Portal should not be directly used in impact applications without the necessary knowledge about the assumptions and limitations of this methodology. Thus, we strongly advise end-users to work in collaboration with downscaling groups, or at least have some support from them, in order to define the experiments and to appropriately analyze and use the results.

[1] http://www.ipcc-data.org/guidelines/dgm_no2_v1_09_2004.pdf

Figure 1: Scheme of the downscaling process using either Statistical Downscaling Methods (SDM) or Regional Climate Models (RCM);
in the former case, besides the Global Circulation Model (GCM) scenarios, reanalysis and observed local data are necessary to perform the
downscaling. Details of the definition/calibration of the statistical downscaling approach are shown.

2 Downscaling Elements

To fill the gap between the coarse-scale GCM outputs and the local/regional needs of end-users, a number of dynamical models (Regional Climate Models, RCMs) and statistical methods (Statistical Downscaling Methods, SDMs) have been developed. On the one hand, RCMs are directly coupled to the outputs of the GCMs (GCM datasets) and provide high-resolution (typically 25 km) gridded downscaled datasets for the variables of interest, as simulated from the physical equations and parameterizations included in the RCM (see the scheme of this downscaling process in Fig. 1, left panel). On the other hand, SDMs combine the information of retrospective GCM analysis/forecast databases (Reanalysis datasets) with simultaneous historical observations of the variables of interest (Observed datasets, either station networks or grids of interpolated observations) to infer appropriate statistical transfer models. Therefore, besides the GCM datasets, two basic ingredients of the statistical downscaling methodology are the Reanalysis and Observations datasets, which are required to define and calibrate the statistical downscaling methods.

The diagram in Fig. 1 (right panel) shows how the different ingredients of the statistical downscaling process are used to define a SDM for a particular application. A particular subset (geographical region, variables and historical temporal window) of the reanalysis constitutes the predictor dataset, whereas the historical records (for the same temporal window) of a goal variable at a number of stations over the region of interest form the predictand dataset. These data are used to calibrate and validate a particular downscaling method before using it for downscaling purposes (i.e. for projecting GCM datasets). These three basic ingredients are the basis of the portal workflow, as described in the following sections.

The skill of the downscaling methods depends on the variable, season and region of interest, with the latter variation dominating. Thus, for each particular application and case study, an ensemble of statistical downscaling methods needs to be tested and validated to achieve the maximum skill and a proper representation of uncertainties. Thus, validation is a key issue in the ENSEMBLES downscaling portal and, as we will show later, it is automatically performed when a downscaling method is defined.

3 Structure of the Portal

The portal has been organized in different windows (tabs) to gradually access the information necessary to define a downscaling task: (1) Predictor, (2) Predictand, (3) Downscaling Method and (4) Downscale. Tabs (1-3) correspond to the calibration/validation of a particular downscaling method, whereas (4) corresponds to the actual downscaling process, applying the calibrated method to different GCMs and scenarios. These windows can be accessed from the corresponding upper tabs of the portal, as shown in Fig. 2 (1).

A first window (My home) is shown after login to the portal (see Fig. 2) and provides information about the existing downscaling experiments (2) and the status of the submitted jobs (3), as well as the user's account profile (4). The Experiment manager panel shows the details of the experiments already created by the user (a unique experiment, Iberia demo, in this case; see Fig. 2 (5)), each including a set of predictors (6) defined in a particular region from a reanalysis project (MSLP, T850, Q850 and Z500 from ERA40, in this case) and one, or several, predictands (maximum temperature at five stations in the Iberian peninsula from the GSOD Europe database, labeled as Tmax cities), as shown in Fig. 2 (7). Each of the predictands may have one or several associated downscaling methods (in this case, only the default analog method (8)). The user can browse the information and navigate through the panel by clicking on the different components.
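The workflow in Fig. 1 (calibrate a transfer function linking reanalysis predictors to simultaneous station observations, then apply it to GCM predictors) can be sketched in a few lines. This is a minimal illustration with synthetic data and a simple least-squares transfer function; none of the names or modelling choices below come from the portal itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the three ingredients of Fig. 1:
# reanalysis predictors (days x grid points), simultaneous station
# observations (days x stations), and GCM predictors for a scenario.
reanalysis = rng.normal(size=(1000, 20))
true_w = rng.normal(size=(20, 5))
observations = reanalysis @ true_w + 0.1 * rng.normal(size=(1000, 5))
gcm_scenario = rng.normal(size=(365, 20))

# Calibration: fit a linear transfer function predictor -> predictand
# on the common historical period (least squares, one model per station).
w, *_ = np.linalg.lstsq(reanalysis, observations, rcond=None)

# Downscaling: apply the calibrated model to the GCM predictors.
downscaled = gcm_scenario @ w
print(downscaled.shape)  # (365, 5): daily series at the 5 stations
```

Any SDM in the portal follows this same two-phase pattern; only the statistical model linking predictors and predictands changes.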


Figure 2: Main window of the downscaling portal. Management of the experiments (left) and the jobs/tasks (right).

The My Jobs panel allows monitoring the status (starting, reading, running, finished, etc.) and type (predictors, validation, downscaling) of the jobs, which are run in parallel by the portal through a queue of computational resources that allows handling and monitoring several requests simultaneously [2]. Moreover, a thread with the different execution stages (reading, performing downscaling, writing results, etc.) and the corresponding execution times can be displayed for each job. Finally, a job can be killed during its execution when it is taking longer than expected or when the user needs extra computational slots. The information about the account details, including the restrictions holding on the resources (number of simultaneous jobs, etc.), can be consulted at any time in the My Account tab (see figure 3) in the upper-right corner of the window. It also gives information about the databases available for the current user.

Each downscaling experiment contains all the information needed for the downscaling process: a unique set of predictors, a number of predictands and a number of downscaling methods. To define an experiment the following three sequential steps must be followed, each of them corresponding to one of the tabs shown in Fig. 2 (1):

1. Predictors: Definition of the geographical region and predictors to be used in the experiment.

2. Predictands: Definition of one or several predictands of interest to be downscaled in this experiment (i.e. with this particular configuration of predictors).

3. Downscaling Method: Definition and validation of one or several downscaling methods to be applied in the experiment.

Once the Predictor -> Predictand -> Downscaling Method chain of tasks has been completed, the downscaling methods will be ready to downscale the control and future scenarios of any of the available GCMs (see the scheme in Fig. 1). This final task is done in the Downscale window.

Figure 3: My Account tab. It gives information about the databases and resources available for the current user.

[2] The current version of the portal runs in the computing infrastructure of the Santander Meteorology Group; http://www.meteo.unican.es/computing


4 Selecting the Predictors

Each particular experiment (shown in the Experiment manager panel of the My Home window) is based on a single predictor dataset defined from reanalysis data over a particular region with a particular resolution. Therefore, a one-to-one correspondence is established in the portal between an experiment and the particular predictor dataset used [3]. New predictors (i.e. new experiments) can be defined from the Experiment manager window (New predictor button) or from the Predictor window (second tab of the portal) by specifying a reanalysis (ERA40 by default), a geographical area, a grid resolution (the original reanalysis resolution by default) and a set of large-scale variables (variable-level pairs).

In order to manage a homogeneous basic set of parameters for the different GCM outputs (reanalysis and climate change projections), a dataset of commonly-used predictor variables on a daily basis has been defined (see Table 1).

  Variable (Code)            Levels (mb)             Time
  Geopotential (Z)           1000,850,700,500,300    00
  V velocity (V)             850,700,500,300         00
  U velocity (U)             850,700,500,300         00
  Temperature (T)            850,700,500,300         00
  Specific humidity (Q)      850,700,500,300         00
  Relative Vorticity (VO)    850,700,500,300         00
  Divergence (D)             850,700,500,300         00
  MSLP (MSL)                 surface                 daily
  2m Temperature (2T)        surface                 00

Table 1: Description of the variables, height levels and times (UTC) of the common set of parameters used in the portal. The time value "daily" refers to daily mean values, whereas the time "00" refers to instantaneous values.

As a compromise among the different native horizontal resolutions of the models that will be used to project future climate, a common 2.5 x 2.5 degree grid was considered. Reanalysis and models are interpolated to this grid using standard bilinear interpolation. In particular, we have downloaded, post-processed and stored data for the ERA40 ECMWF reanalysis and the NCEP/NCAR Reanalysis1 (see Brands et al., 2012, for a comparison of these two reanalyses for downscaling purposes), and from different GCMs from the ENSEMBLES project [4], both for control (20c3m, for 1961-2000) and future (B1, A1B and A2, for 2001-2100) scenarios; these models will be described later in the downscaling section.

As shown in Fig. 1, predictor datasets are defined based on reanalysis data (since day-to-day correspondence with observations is required in order to establish the statistical transfer functions used for downscaling).

Figure 4 shows the view (top) and create (bottom) panels from the Predictor window, allowing to visualize the predictor datasets of the existing experiments, or to create new ones, respectively. Note that online help (label 1 in the figure) is provided in all windows to give relevant information about the different tasks to be performed. For a particular experiment selected from the pop-up menu (2), the view panel shows the following information (3): Dataset (reanalysis used), Dates (time period), Time resolution (24h for daily data), Lon and Lat (geographical domain), Resolution (horizontal and vertical resolution) and, finally, Predictors (variables used as predictors for the experiment). In this example (Iberia demo) we have considered an area of interest covering the Iberian peninsula and included basic predictor parameters covering the period 1960-1999; this information constitutes the predictor dataset (as shown in Fig. 1).

The create panel allows defining new experiments by defining the associated predictor dataset (see Figure 4, bottom). In the following we illustrate the definition of the Iberia demo predictor dataset described above. First, a reanalysis must be chosen (4), and the time window and grid (longitude/latitude area and grid resolution) to be used in the experiment must be specified; the map (5) shows the resulting grid. Alternatively, the region to be used can be graphically selected by shift-clicking and dragging in this window, and the resolution can be manually configured in (4). Afterwards, the particular predictors must be selected (6) by choosing the variable, the level (when required) and, for instantaneous variables, the base hour (by default 00 UTC) (see Table 1); in the example shown, the selected predictors are Z at 500 mb, T and Q at 850 mb, and SLPd (d denotes daily mean). Moreover, since the GCM models to be used later for downscaling may lack some of these predictors, a panel (7) indicates the GCMs (among those available for the user) compatible with the selected set of predictors, i.e. the models with the scenario data required to be downscaled with the current predictor dataset (i.e. within the current experiment). Once this information has been defined, a name can be given (8) and the create new predictor button can be clicked to define the new experiment. Note that the name of the experiment can be changed afterwards from the My Home window.

Note that the definition of a predictor dataset involves several calculations to prepare the data in order to speed up the downscaling process; for instance, PCs explaining 99% of the variance are computed (and stored) for the selected period. Thus, when creating a new predictor/experiment, a job is launched to the portal (labeled as PREDICTOR) and its execution can be followed in the Jobs panel until termination. For instance, the My Jobs panel shown in Fig. 2 shows a job with ID code 606, run on 9th March 2011 to define a PREDICTOR dataset (Iberia demo in this case), which lasted 5 minutes (not shown in the figure).

[3] Note that this restriction could be problematic for a friendly use of the portal, since running a downscaling method for a given predictand with different predictors would imply defining a new experiment. However, the flexibility to freely combine predictors, predictands and downscaling techniques leads to data-compatibility problems which cannot be solved in a user-friendly form. This restriction may change in future versions of the portal, if the development team finds a solution to overcome these problems.

[4] Both the IPCC-AR4 simulations (ENSEMBLES Stream1) and the new simulations done in the project (Stream2); see https://cera-www.dkrz.de/WDCC/ui/BrowseExperiments.jsp?proj=ENSEMBLES


Figure 4: Windows to visualize an existing predictor (above) and to create new ones (below). Numbers refer to the different elements of the
windows and are explained in the running text.


Figure 5: Window to create a predictand for a particular experiment (Iberia demo in this case).

5 Selecting the Predictand(s)

The statistical downscaling portal contains different sources of historical data which can be used as predictands (targets) in the downscaling process. For instance, open-access datasets such as GSN (Global Stations Network) or GSOD (Global Summary of the Day) have been included in order to have a minimum set of historical information to test the downscaling methods worldwide (consult the information about these datasets in the portal). Moreover, the user will be able to include new observation datasets into the portal; this option will be available in the new version of the portal [5].

The Predictands window allows viewing and creating predictands for an experiment from the available historical datasets. Each predictand must be defined by considering a single variable of interest (e.g. maximum temperature) and a number of points/stations among the ones lying within the region defined while creating the experiment (e.g. five cities in the Iberian peninsula). Figure 5 illustrates the steps to be followed to create a new predictand for a particular experiment, selected from the list of available experiments (1). First, the historical dataset to be used must be selected (2), in this case the GSOD Europe dataset, and the variable of interest must be chosen among those existing for the dataset (3), in this case maximum daily temperature. Afterwards, the points/stations of interest must be graphically selected by adding (or removing) points (4) and shift-clicking and dragging on the map to define an inclusion (or exclusion) square (5); the labels of the stations can be optionally displayed on the map to facilitate this task. Moreover, information about the stations currently selected can be consulted at any time (6). According to the restrictions of the user's account, there is a maximum number of stations/points that can be selected for a particular predictand. For instance, users with a basic profile (i.e. those not involved in the supporting projects or institutions) can only select five stations (see Fig. 3 for additional information).

Once the dataset, variable and stations have been defined, a name can be given to the predictand and it can be included in the corresponding experiment by clicking on the Create new Predictand button (8). Note that if the create default downscaling method checkbox is selected, then a default statistical downscaling method (a pre-defined analog method) will be defined and validated for this predictand (see the next section); in this case a VALIDATION job will be run by the portal and a new default downscaling method will be automatically associated to the predictand.

Once the predictor and predictand have been defined for a particular experiment, the common historical dataset will be used to calibrate and validate the different downscaling methods, as explained in the next section.

[5] Note that the downscaling portal is compatible with the MeteoLab observation datasets format; see http://www.meteo.unican.es/trac/MLToolbox/wiki/NewObs
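Conceptually, the graphical selection of stations amounts to filtering the station list with inclusion and exclusion rectangles. A minimal sketch (the station list, coordinates and helper function are made up for illustration, not taken from the portal):

```python
# Hypothetical station list: (name, lon, lat). The two rectangles mimic
# the shift-click-and-drag inclusion/exclusion squares on the map.
stations = [("Madrid", -3.7, 40.4), ("Lisboa", -9.1, 38.7),
            ("Sevilla", -6.0, 37.4), ("Paris", 2.3, 48.9)]

def in_box(lon, lat, box):
    """True if (lon, lat) falls inside box = (lon0, lon1, lat0, lat1)."""
    lon0, lon1, lat0, lat1 = box
    return lon0 <= lon <= lon1 and lat0 <= lat <= lat1

include = (-10.0, 4.0, 35.0, 44.0)   # roughly the Iberian peninsula
exclude = (-7.0, -5.0, 36.0, 38.0)   # drop a sub-region again

selected = [name for name, lon, lat in stations
            if in_box(lon, lat, include) and not in_box(lon, lat, exclude)]
print(selected)  # Sevilla falls in the exclusion box, Paris outside the inclusion box
```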


Figure 6: Configuration panels for the different statistical downscaling techniques included in the statistical downscaling portal: (a) analogs, (b) weather typing, (c) linear regression, (d) neural networks. Note that when the predictand is precipitation, linear regression is replaced by generalized linear models (GLMs), with the same configuration options.
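The analog technique of Fig. 6a can be sketched as a nearest-neighbour search: for each day to be downscaled, find the most similar historical large-scale pattern(s) and return the corresponding observed local values. This is a minimal sketch with synthetic data and a plain Euclidean distance; the portal's actual distance metric and preprocessing are not specified here:

```python
import numpy as np

rng = np.random.default_rng(2)
hist_pred = rng.normal(size=(500, 30))   # historical large-scale patterns
hist_obs = rng.normal(size=(500, 5))     # simultaneous station observations
new_pred = rng.normal(size=(90, 30))     # e.g. one season of GCM predictors

def analog_downscale(x, library, obs, n_analogs=1):
    """Return the mean observation of the n closest historical patterns."""
    dist = np.linalg.norm(library - x, axis=1)   # Euclidean distance
    nearest = np.argsort(dist)[:n_analogs]
    return obs[nearest].mean(axis=0)

downscaled = np.array([analog_downscale(x, hist_pred, hist_obs)
                       for x in new_pred])
print(downscaled.shape)  # (90, 5)
```

With n_analogs=1 this reproduces the closest-analog-day default; larger values correspond to the multi-neighbour configurations selectable in the Analogs tab.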

6 The Downscaling Method(s)

Different statistical methods have been proposed in the literature to adapt the coarse predictions provided by global climate models to the finer scales required by impact studies. These methods usually work in two steps (perfect prognosis approach [6]): firstly, an empirical relationship (a statistical model) is established between the large-scale reanalysis variables (predictors) and the small-scale observed variables of interest (predictands) using data from a common historical period (the intersection of the reanalysis time-window and the observations availability period, typically between 15 and 30 years). Then, the resulting statistical model is applied to data from different GCM climate change simulations under different scenarios to obtain the projected local values (in this case the predictor data is built considering the predictor variables from the GCM outputs).

[6] Thus, systematic model errors are not taken into account with this methodology and they will be a component of the downscaling error. Recently, MOS-like approaches have been tested in the climate change context with promising results. These methods will be included as an alternative to Perfect Prognosis in a future version of the downscaling portal.

Usually, the different statistical downscaling methodologies are broadly categorized into three classes (see, e.g., Gutiérrez et al., 2012, and references therein):

- Weather typing (analogs), based on nearest neighbors or on a pre-classification of the reanalysis into a finite number of weather types obtained according to their synoptic similarity; these methods are usually non-generative, since they consist of an algorithmic procedure to obtain the prediction, such as the method of analogs.

- Transfer functions (regression), based on linear regression or nonlinear models (e.g., neural networks) to infer the relationships between predictands and the large-scale predictors; these methods are generative in the sense that the projections are derived from a model obtained from data.

- Weather generators, which stochastically simulate daily climate values based on the available monthly average projections or on resampling or simulation procedures applied to the daily data. These techniques are temporal disaggregation methods.

The downscaling portal includes techniques from the first two categories (weather generators will also be implemented in a future version of the portal), thus allowing to test and compare the performance of several approaches (note that the skill of statistical downscaling methods varies from variable to variable and from region to region). For a particular experiment, a number of methods can be selected and configured from the Downscaling Method window, as shown in figure 6. The default configuration corresponds to an analog downscaling method, from the weather-typing category, considering the closest analog day (Fig. 6a); additional configurations with a different number of analogs/neighbors (1) and inference methods (2) can be selected by the user in this window. A comment can be included in (3) (this is optional) and a name for the particular technique in (4). These text boxes are defined for each downscaling method as shown in Fig. 6. Finally, the button (5) allows creating the defined technique. The status of the downscaling process can be checked at any time in the My jobs panel of the My home window (Fig. 2). From the second tab of the weather-typing category, it is also possible to perform a statistical downscaling method based on a pre-classification of the reanalysis data into a finite number of weather types defined according to their synoptic similarity (Fig. 6b). The clustering method implemented in the portal is the k-means approach (1) and it is possible to select the number of clusters or weather types to be considered (2). As in the Analogs tab, different approaches are provided (3), in this case to infer a prediction from the observations within the corresponding cluster.

The portal also includes linear regression methods (from the transfer functions category) (Fig. 6c). In the default configuration, only the first five PCs of the predictors are considered for downscaling; however, the user can modify this number, including an arbitrary number of PCs (1). A number of neighbor grid-boxes can also be introduced in the model (2). This option tries to solve the underestimation of the predicted variability by taking into account local effects by means of the selected nearest grid-point data. However, in this case the raw model output values will be used as predictors and the spatial coherence of the method will be lost, since different local points will have different downscaling models, with different predictors (the same variables, but over different grid points). It is also possible to apply the linear regression method conditioned on clusters, which is based on the k-means approach (3). The number of clusters can be indicated in (4); then a regression model is derived for each cluster. A nonlinear transfer function model based on neural networks is also implemented in the portal (Fig. 6d). In the Neural Network (ELM) tab, the user can also define the number of Principal Components of the predictors (1) or the number of nearest neighbors to be considered in the model (2). The default configuration of this method takes 100 hidden neurons and applies a sigmoidal function as activation function of the neural network. This configuration can be modified by the user, introducing an arbitrary number of hidden neurons (3) or selecting another activation function among those implemented in the portal (4). A clustering method based on k-means can also be defined (4), indicating the number of clusters to be considered (6) in this case. Note that the Generalized Linear Model (GLM) is also provided within the transfer functions category. The GLM is a generalization of ordinary linear regression to variables that are not normally distributed (e.g. precipitation). Thus, this approach is available in the portal to downscale precipitation. The configuration for this approach is similar to the options available in the linear regression tab.

As mentioned above, once a downscaling method is defined, a name must be assigned in the corresponding text-box labeled Downscaling method name. Then, the method will be automatically validated by clicking on the Create new Method button. Note that every new downscaling method is automatically validated by the portal. Therefore, a job (labeled as VALIDATION) will be submitted to the portal and its execution can be followed in the Jobs panel until termination.

6.1 Validation of the SDM

Every downscaling method defined in the portal is automatically validated using a train/test validation approach. The common historical period for predictors (reanalysis; note that this validation is done in Perfect Prognosis conditions) and predictands (local observations) is split into training (75% of the data) and test (the remaining 25%) subsets. In the training phase the downscaling method is calibrated using the training data (e.g. the regression coefficients are fitted to the data), whereas in the test phase the method is validated on the test data (note that the test data is not used in the calibration phase and, thus, the results can be extrapolated to new datasets).

The validation results are given in the View panel for a particular predictor, predictand and downscaling method of interest (1 in figure 7). A description of the downscaling method is given in (2). The results of the validation are given both as a summary PDF file (3) and in tabular form in the application window (4).

The validation is performed both on a daily and on a 10-day aggregated basis (4). In both cases, basic statistics (mean, standard deviation, minimum and maximum values and percentage of missing data) of the observations (Obs. stats) and the downscaled predictions (Pred. stats) are calculated and displayed (5). Furthermore, other scores, such as percentiles, are also computed, but they are not shown in the default view for the sake of clarity (this can be configured in

Tech. Notes Santander Meteorology Group (CSIC-UC): GMS:2.2011;116


User Guide of the ENSEMBLES Downscaling Portal (version 2) 9

Figure 7: Window to access to the results from the validation of the downscaling method.

the columns choice menu in (6)). Similarly, the Accuracy and Distributional Similarity tabs show different validation scores related to the accuracy (the default ones are correlation, MAE, RMSE and normalized RMSE; see Appendix 1) and the reliability (bias, normalized bias, ratio of variances and p-value of the Kolmogorov-Smirnov test) of the method. In the case of precipitation, some additional scores related to the occurrence character of precipitation are also shown, in particular the ratio of observed and predicted non-precipitation frequencies and the Hit and False Alarm Rates (HIR and FAR, respectively; see Appendix 1 for details).

By clicking on the right arrow in any of the score labels, a menu will appear (6). From it, the user can choose which scores (columns) to visualize and the ascending/descending ranking of the stations; moreover, there is the possibility to display the spatial distribution of the score on the map on the right-hand side.

By clicking on any row/station (Navacerrada in this case), a new panel will be displayed (7). This panel shows the basic descriptive and validation scores together with some graphical representations, providing a summary of the performance of the downscaling method. Note that this panel is slightly different for temperature and precipitation, due to the mixed (discrete/continuous) behavior of the latter variable, which must be taken into account for validation. The two plots on the left show the accuracy of the method at the selected station by displaying a scatterplot of observed vs. predicted values for the test data, on a daily (top) and 10-day aggregated (bottom) basis. The more accurate the method, the more linear the plot and the higher the correlation (rho for temperature and r for precipitation). Note that for precipitation the HIR and FAR scores are also given in the daily case, thus characterizing the discrete part of the distribution (see Appendix 1 for details on the validation scores). The two plots on the right show the distributional similarity of the observed and predicted values, on a 10-day aggregated basis; the upper figure shows the observed and predicted PDFs, including the KS-pValue and PDF-Score (see Appendix 1 for details), whereas the figure at the bottom shows the quantile-quantile plot of the observations and predictions. In the case of precipitation, the plots correspond to rainy days; moreover, the numbers at the top of the figures show the scores for non-rain days and, thus, the combination of both pieces of information gives a general idea of the performance of the method for this mixed (discrete and continuous) variable.

6.2 Downscaling Reanalysis

The validation utilities described in the previous section perform an automatic validation of the downscaling methods by comparing the observations with the corresponding downscaled values from the reanalysis in the historical period, considering at random 75% of the data for training and the remaining 25% for testing. In order to allow further validation analysis, there is the possibility to apply the statistical downscaling methods in retrospective perfect-prog mode, by considering reanalysis data as input to the statistical downscaling method. In this way the whole predicted series for the historical period can be obtained. Note that, since the training and downscaling periods can overlap in this case, special care is to be taken in the definition of the time window for the predictors, considering a time slice of the available reanalysis data. However, in order to avoid problems with the analog method, a one-month temporal exclusion window centered on the downscaling date is considered in this case. Moreover, the sensitivity of the SD methods to reanalysis uncertainty (Brands et al., 2012) can be tested by using different reanalysis datasets as input data at this point.

This option is available in the Hindcast tab (1) of the Downscale window (see Fig. 8), where different reanalysis datasets (2) can be selected (e.g. ERA40, NCEP). The downscaling method is selected in (3) from those already validated for the selected Predictor and Predictand. Note that the information of the predictor (Predictor tab) includes the particular reanalysis and periods used; this information is to be considered when performing hindcast experiments, since the training and downscaling (test) datasets might overlap. Different panels are available for viewing (and downloading) the existing downscalings or for creating new ones (4), as in the present case. The available periods for downscaling are organized in decades, which can be directly selected for downscaling by clicking on the corresponding check-boxes. In the next section, further details are given on the downscaling jobs and the access to the resulting data.

Figure 8: Downscaling from reanalysis data (hindcast) for particular time slices.

7 Downscaling GCM Scenarios

Once a target predictand to be downscaled has been selected for a particular experiment (predictor set), and the statistical downscaling method has been calibrated and validated using reanalysis data (under Perfect Prognosis conditions), the downscaling method is ready to be applied to future climate change scenarios, considering GCM outputs for the control run (20c3m, for 1961-2000) and future scenarios (B1, A1B and A2, for 2001-2100). This option is available in the Downscale window (the last tab of the application). The portal contains daily data from the following four GCMs: BCM2, CNCM3, MPEH5 (ENSEMBLES Stream1) and HADGEM2 (ENSEMBLES Stream2 [7]), which have been validated on a daily basis for the different upper-level fields included as predictors in the portal (Brands et al., 2011). The available variables and scenarios, as well as information about the spatial coverage of each particular GCM, can be consulted by clicking on the Info label of the corresponding model or on the My Account tab (as shown in Fig. 3).

[7] The details of the models are given in http://cera-www.dkrz.de/WDCC/ui/BrowseExperiments.jsp?proj=ENSEMBLES. Preferably, Stream1 models were selected for this version of the portal; however, HADGEM2 was selected from Stream2 because the availability of daily data for the Stream1 MetOffice models was limited.

Figure 9 shows the Downscale window with the create tab selected, as shown in (1). This window allows creating new downscalings for a particular predictor, predictand and downscaling method, selected from (2), as well as the scenario of interest (A1B in this case). For the particular
selection, the window shows a downscaling matrix including the possible combinations of GCMs with available data [8] (in columns), as shown in (3), and the corresponding time periods with available simulations (organized in rows, decade by decade), as shown in (4). In this case all the GCM simulations span the whole period of 10 decades but, in general, different models may have different simulated periods (e.g. the models downloaded from the IPCC database, which include only certain time slices, e.g. 2081-2100).

[8] Note that some of the variables included in the predictor definition may be missing for some of the GCMs, e.g. 1000 mb levels in the HADGEM2 model; in those cases, the GCM will not be available for downscaling for this predictor; note that this information is available when creating the predictor, as shown in Fig. 4 (7).

Figure 9: Downscale create window to apply downscaling methods using GCM scenario data for particular time slices.

Each of the elements in the matrix (a decade for a particular GCM for a given scenario) is considered a downscaling cell and is run by the portal as an independent job. The same criterion applies in the hindcast tab. One or several of these cells (jobs) can be selected by clicking on them, as in Fig. 9 (5); note that by clicking on a decade label or on a GCM label, all the corresponding cells are automatically selected; afterwards, the corresponding downscaling jobs can be submitted by clicking on the run button, as shown in (6). Note that the portal will submit one job per cell, so the account restrictions will determine the maximum number of cells that can be selected/submitted simultaneously [9]. For instance, users with a basic profile (i.e. those not involved in the supporting projects or institutions) can only run two jobs simultaneously, including the creation of predictors, predictands (with the basic downscaling method) and downscaling methods, as well as the downscaling jobs. Therefore, downscaling the A1B scenario for the whole 2001-2100 period for a particular GCM would require five run steps (two decades each) in the portal (in case the user is not running any other task). We strongly advise users to first downscale, download and analyze a single decade before performing more exhaustive downscaling tasks, as we did in the Iberia demo experiment (the 2091-2100 decade for the ECHAM5 A1B scenario).

[9] See Fig. 3 for more information about your account restrictions; in particular, you may consult the number of simultaneous jobs allowed for your account: ConcurrentJobs. These limitations have been considered to keep the downscaling jobs at a reasonable level of complexity, in terms of the memory needed and the duration of the task.

The status of the jobs can be checked at any time with the Jobs info button (in the upper right corner of the window) or in the My jobs panel of the My home window. A typical downscaling job will access the required data (the GCM scenario simulations and the reanalysis and observed data) and apply the downscaling method, producing the local projections for the defined locations/stations and period; this process typically takes some minutes and goes through different stages, which are indicated in the Jobs info panel: STARTING, RUNNING, etc., until the job finishes normally (FINISHED) or abnormally (ERROR). The different stages are also indicated with a background color in the corresponding downscaling cell: yellow for STARTING (i.e., the job is waiting in the execution queue), blue for RUNNING (i.e. the job is running on the cluster), green for FINISHED and red for ERROR (indicating some failure of the process). In this last case, we advise the user to wait a couple of hours and re-submit the job (in order to avoid possible spurious errors in the computing infrastructure) and, if the error persists, to contact the portal development team using the email contact form included in the upper left corner of the portal.

Figure 10: Cells become green once the downscaling is finished.

The completed downscaled projections (downscaling matrix cells) can be consulted and downloaded through the View panel. By clicking on the existing ones (those with a check box), the user can select the downscaling cells of interest and download them in a .csv file (Download selected downscalings). This file can be easily converted to a commonly used Excel .xls in which the daily predictions for all the stations (in columns) selected in the Predictand window are displayed in rows; note that the dates may not be consecutive and, therefore, you may need to sort the rows by the first column (the date) to obtain a chronological file. This allows the user to easily manipulate the data, drawing projected time series, etc. For instance, Fig. 11 shows the .csv file downloaded with the projections of the 2091-2100 decade for the ECHAM5 model shown in Fig. 10. A graph of the daily temperatures for two out of the four stations (Madrid and Navacerrada) has been drawn by simply using the drawing facilities in Excel. The .csv file includes some header lines (the first 22 lines in Fig. 11) describing the predictors, the GCM and scenario, the downscaling method and the predictands/stations (labelled as c1, c2, etc.) corresponding to the particular downscaling. The remaining rows correspond to the data, including the date in the first column and the stations in the remaining ones, following the order c1, c2, etc. defined in the header. Note that the name of the file is also informative of the particular downscaling details (Iberia demo - Tmax 5cities - Analogues (default) - MPEH5 - A1B.csv in this case).

Figure 11: Downscaled projections can be downloaded in a .csv file and loaded in, e.g., Excel.


8 Acknowledgments

The authors are grateful to the 6th FP EU project ENSEMBLES (GOCE-CT-2003-505539) for partial support of the development of the downscaling portal (see Van der Linden and Mitchell, 2009, available at http://ensembles-eu.metoffice.com). The authors are also grateful to the 7th FP EU projects FUME (No. 243888), CLIM-RUN (No. 265192), METAFOR (No. 211753) and QWECI (No. 243964), and to the MOSAICC initiative by FAO (http://www.fao.org/climatechange/mosaicc).

References

Brands, S., S. Herrera, D. San-Martín, and J.M. Gutiérrez, 2011: Validation of the ENSEMBLES Global Climate Models over southwestern Europe using probability density functions: A downscaler's perspective. Climate Research, 48, 145-161, doi:10.3354/cr00995 (open access). http://www.int-res.com/articles/cr_oa/c048p145.pdf

Brands, S., J.M. Gutiérrez, S. Herrera, and A. Cofiño, 2012: On the use of reanalysis data for downscaling. Journal of Climate, 25, 2517-2526. http://www.meteo.unican.es/node/73004

Gutiérrez, J.M., D. San-Martín, S. Brands, R. Manzanas, and S. Herrera, 2012: Reassessing statistical downscaling techniques for their robust application under climate change conditions. Journal of Climate, doi:10.1175/JCLI-D-11-00687.1. http://www.meteo.unican.es/en/node/73058

Kalnay, E., et al., 1996: The NCEP/NCAR 40-year reanalysis project. Bulletin of the American Meteorological Society, 77(3), 437-471.

Van der Linden, P. and Mitchell, J.F.B., 2009: ENSEMBLES: Climate change and its impacts: Summary of research and results from the ENSEMBLES project. Met Office Hadley Centre, FitzRoy Road, Exeter EX1 3PB, UK, 160 pp.

Perkins, S.E., Pitman, A.J., Holbrook, N.J., and McAneney, J., 2007: Evaluation of the AR4 climate models' simulated daily maximum temperature, minimum temperature and precipitation over Australia using probability density functions. Journal of Climate, 20, 4356-4376, doi:10.1175/JCLI4253.1.

Uppala, S., et al., 2005: The ERA-40 re-analysis. Quarterly Journal of the Royal Meteorological Society, 131(612, Part B), 2961-3012, doi:10.1256/qj.04.176.

Wilks, D., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed., Academic Press/Elsevier, Amsterdam.


9 Appendix 1: Validation scores

This section provides a more detailed description of the validation process implemented in the downscaling portal. It is intended to help the user properly analyse the statistical scores calculated for the validation of the applied downscaling method.

Validation is performed at two different time scales in the downscaling portal: daily and 10-daily aggregated data. Depending on the user's needs, both time scales might be useful, and the downscaling methods may show higher performance on the aggregated one, particularly for precipitation, being more informative for validation purposes. Note that additional validation scores are computed for precipitation, in order to take into account its dual (discrete/continuous) character. These scores are identified with an "only for precipitation" label in the following description. Labels in bold correspond to the codes used in the downscaling portal (Sec. 6.1). All statistics are computed using the period defined for the particular experiment, so these scores (including the descriptive ones) might change from experiment to experiment.

9.1 Descriptive Statistics

Basic descriptive statistics of the observed (forecast) series:

RR: Rainfall Rate (only for precipitation). This score measures the frequency of wet days and is calculated as the number of wet days divided by the size of the sample, n, expressed in %:

    RR = 100 \, \frac{n_{wet}}{n}    (1)

The threshold considered for defining wet days is 0.1 mm.

Mean: Arithmetic mean. It measures the central tendency of a sample. It is calculated as the sum of all data points (x_i, i = 1, ..., n) divided by the size of the sample, n:

    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i    (2)

The arithmetic mean is greatly influenced by outliers. For this reason, robust statistics such as the median may provide a better description of the central tendency.

Median: Median. The median is also a measure of location. It is the value separating the higher half of the sample from the lower one (50th percentile). It can be found by arranging all the values from the lowest to the highest and picking the middle one. For symmetrically-distributed data, the mean and the median are the same.

Min: Minimum. The smallest value in the series.

Max: Maximum. The largest value in the series.

Sigma: Standard Deviation (also denoted as Std). It shows how much variation or dispersion exists from the average. It is defined as the square root of the variance:

    \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}    (3)

The standard deviation is also greatly influenced by outliers. A useful property of the standard deviation is that, unlike the variance, it is expressed in the same units as the data.

IQR: Interquartile range. It is a robust score that also estimates the dispersion of a sample, but it is not influenced by outliers. It is defined as the difference between the upper (75th percentile) and lower (25th percentile) quartiles, Q_3 and Q_1 respectively:

    IQR = Q_3 - Q_1    (4)

The interquartile range is commonly used to build boxplots, simple graphical representations that show with a box the spread of the data falling between the 25th and 75th percentiles.

PX: Xth percentile. The value below which X% of the data points are found (X = 5, 10, 90, 95).

Missing: Percentage of missing values within the data: [0,100].

9.2 Accuracy

Accuracy is one of the main aspects that must be examined when looking at the quality of a forecast, since it measures the level of agreement between the forecast and observed time series. Note that some of the scores are presented in units of some descriptive statistic, which allows for direct comparison among stations and/or seasons, without worrying about their different regimes. In particular, the scores re-scaled by the Mean (Sigma) are named with an n (N) at the beginning of their names.

HIR: Hit Rate (only for precipitation). It is the probability that occurrences (o) (i.e. wet days) were correctly forecast (f). This score ranges in [0,1], 1 being the perfect score:

    HIR = P(f = 1 \,|\, o = 1)    (5)

FAR: False Alarm Rate (only for precipitation). It is the probability that non-occurrences were incorrectly forecast. This score ranges in [0,1], 0 being the perfect score:

    FAR = P(f = 1 \,|\, o = 0)    (6)

Note that both scores, HIR and FAR, are only calculated in the portal for the case of daily precipitation. They are not calculated for the 10-daily validation, since aggregated data are considered to be continuous. HIR and FAR must be considered together in order to validate the discrete part of precipitation. The threshold considered for defining wet days is 0.1 mm.

rho: Pearson's Product-Moment Correlation Coefficient. It measures the strength of the linear relationship
between observations and forecasts. It ranges in [-1,1]. Perfect score: 1. The Pearson's correlation coefficient between two variables (x and y, or observations (o) and forecasts (f) in our case) is defined as the covariance of the two variables divided by the product of their standard deviations:

    \rho_{o,f} = \frac{Cov(o,f)}{\sigma_o \sigma_f} = \frac{\sum_{i=1}^{n} (o_i - \bar{o})(f_i - \bar{f})}{\sqrt{\sum_{i=1}^{n} (o_i - \bar{o})^2} \, \sqrt{\sum_{i=1}^{n} (f_i - \bar{f})^2}}    (7)

The Pearson's correlation coefficient shows how close the points of a scatter plot (observations against forecasts) are to a straight line. A value of 1 (-1) implies that a linear equation describes the relationship between observations and forecasts perfectly, with all the data points lying on a line indicating that forecasts increase (decrease) as observations increase. A value of 0 implies that there is no linear correlation between the variables. This score does not take bias into account, i.e., it is possible for a forecast with large errors to still have a good Pearson's correlation coefficient with respect to the observations. This score is sensitive to outliers.

r: Spearman's Rank Correlation Coefficient. This score measures the dependence, through some monotonic function, between observations and forecasts. The Spearman's correlation coefficient is defined as the Pearson's correlation coefficient applied to the ranked variables. The sign of this score shows the direction of the association between observations and forecasts: a positive (negative) coefficient indicates that forecasts tend to increase (decrease) as observations increase. Its magnitude increases as observations and forecasts become closer to being perfect monotone functions of each other. It ranges in [-1,1]. A Spearman's correlation coefficient of 1 (-1) results when observations and forecasts keep a perfect monotone relationship, even if their relationship is not linear; note that this does not yield a perfect Pearson's correlation. The Spearman's correlation coefficient is less sensitive than the Pearson's one to outliers that may be in the tails of the observations and/or predictions. This score should be used, rather than the Pearson correlation coefficient, when validating precipitation.

MAE: Mean Absolute Error. It is an average of the absolute forecast errors. This score ranges in [0,∞), 0 being the perfect score:

    MAE = \frac{1}{n} \sum_{i=1}^{n} |f_i - o_i|    (8)

nMAE: Mean Absolute Error (MAE), in units of the observed mean. It ranges in [0,∞). Perfect score: 0.

    \frac{MAE}{\bar{o}}    (9)

It has a singularity at \bar{o} = 0 (which could occur for temperatures, for instance).

NMAE: Mean Absolute Error (MAE), in units of the observed standard deviation. It ranges in [0,∞). Perfect score: 0.

    \frac{MAE}{\sigma_o}    (10)

It has a singularity at \sigma_o = 0, but this is not a realistic situation.

RMSE: Root Mean Square Error. It measures the average magnitude of the forecast errors, weighted according to the square of the error. This score ranges in [0,∞). Perfect score: 0.

    RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (f_i - o_i)^2}    (11)

The Root Mean Square Error is more strongly influenced by large errors than by smaller ones, which may be appropriate if large errors are especially undesirable; however, it may also encourage conservative forecasting.

nRMSE: Root Mean Square Error (RMSE), in units of the observed mean. It ranges in [0,∞). Perfect score: 0.

    \frac{RMSE}{\bar{o}}    (12)

NRMSE: Root Mean Square Error (RMSE), in units of the observed standard deviation. It ranges in [0,∞). Perfect score: 0.

    \frac{RMSE}{\sigma_o}    (13)

9.3 Distributional Similarity

The analysis of the distributional similarity is also a characteristic that describes the quality of a forecast/simulation, particularly at temporal scales where no serial correspondence between observations and predictions is required (e.g. for climate change projections). Thus, these scores measure similarity in climatological terms. Note that distributional similarity must be carefully examined, especially for climate change studies. These are the scores shown by the portal:

Ratio: Ratio of wet days (only for precipitation). The ratio between the forecasted and observed frequencies of wet days. It ranges in [0,∞). Perfect score: 1.

    Ratio = \frac{P(f = 1)}{P(o = 1)}    (14)

The threshold considered for defining wet days is 0.1 mm. It presents a singularity when P(o = 1) = 0 (no rain occurs).

Bias: Additive Bias. This score measures the average forecast error. It ranges in (-∞,+∞). Perfect score: 0.

    Bias = \frac{1}{n} \sum_{i=1}^{n} (f_i - o_i)    (15)

It does not measure the pointwise correspondence between forecasts and observations, i.e., it is possible to get a perfect score for a bad forecast if the errors are compensated.
NBias: Bias, in units of the observed standard deviation. It ranges in (-∞,+∞). Perfect score: 0.

    \frac{Bias}{\sigma_o}    (16)

It has a singularity at \sigma_o = 0.

RV: Ratio of Variances. This score measures the ratio between the forecast and observed variances, in units of the observed one. It ranges in [0,∞). Perfect score: 1.

    RV = \frac{\sigma_f^2}{\sigma_o^2}    (17)

It has a singularity at \sigma_o = 0.

KS-pValue: p-value from the two-sample Kolmogorov-Smirnov test. This score ranges in [0,1]. The null hypothesis of equality of distributions is rejected when the significance level equals or exceeds this p-value. The Kolmogorov-Smirnov test for two samples of sizes n and n' measures a distance, D_{n,n'}, between both empirical cumulative distribution functions. D_{n,n'} is calculated as:

    D_{n,n'} = \sup_x |F_{1,n}(x) - F_{2,n'}(x)|    (18)

where F_{1,n} and F_{2,n'} are the empirical cumulative distribution functions of the first and second sample, respectively. This test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both the location and the shape of the empirical cumulative distribution functions. Therefore, it is a must to consider this score for validation, especially when projecting under climate change scenarios.

KSX-pValue: p-value from the two-sample Kolmogorov-Smirnov test restricted to observations and forecasts under their respective Xth percentiles (here X = 10, 90). It ranges in [0,1]. The null hypothesis of equality of distributions is rejected when the significance level equals or exceeds this p-value.

PDF Score: The PDF Score measures the overlap between the observed and forecasted empirical probability density functions. It ranges in [0,1]. Perfect score: 1. This score is calculated as in Perkins et al. (2007):

    PDF\;Score = \sum_{i=1}^{200} \min(PDF_f^i, PDF_o^i)

where PDF_f^i is the forecast probability density for the ith bin and PDF_o^i the observed probability density for the ith bin. 200 discrete bins (classes) are defined for the whole range of observations and predictions. Then, the probability density for each class is estimated by kernel density smoothing. Observed and forecast probability densities are compared for each class, retaining the minimum of each pair. The resulting sample of minima is finally summed up.

In the portal, Gaussian kernels and a width parameter optimized for normal distributions are used to estimate the probability densities. Therefore, the user must be aware that this score is more appropriate for validating temperature than precipitation (see Brands et al., 2012, for a critical analysis of this score). In addition, the PDF Score is hardly sensitive to failures in the tails of the distributions. Thus, the user should not rely exclusively on this score for validation, especially when projecting under climate change scenarios. We strongly recommend considering both the KS and PDF scores in conjunction.

Note that, for the special case of daily precipitation, and due to the high mass of probability density located at zero, the KS-pValue, KSX-pValue and the PDF Score are calculated for the continuous part of the distributions, by considering exclusively the observed and forecasted wet days. The discrete occurrence/non-occurrence event is validated through the above-explained HIR, FAR and Ratio scores. The latter scores are calculated over the entire observed and forecasted series for 10-daily precipitation and for temperature at both time scales.
