User Guide
Abstract
This report describes the structure and usage of the statistical downscaling portal (http://www.meteo.unican.es/ensembles) developed by the Santander Meteorology Group (http://www.meteo.unican.es) with the technical assistance of Predictia (http://www.predicita.es) as part of the activities of the EU-funded ENSEMBLES project (2004-2009, see http://ensembles-eu.metoffice.com). The current operational version (version 2) is a complete reimplementation of the portal, allowing particular adaptations and views for supporting projects and institutions (see the acknowledgements at the end). The three main actions necessary to create a downscaling method (defining the predictors, choosing the local/regional target variable to be downscaled, and creating the downscaling method) are described step by step, illustrating the different options available in the portal. Afterwards, the application of the method to downscale GCM climate scenarios is described, and some information about validating and interpreting the results is provided. This document is therefore intended as a brief user guide for the downscaling portal; it should be complemented with good-practice documents to learn about the optimum regions and predictors for the statistical downscaling process.
Figure 1: Scheme of the downscaling process using either Statistical Downscaling Methods (SDM) or Regional Climate Models (RCM);
in the former case, besides the Global Circulation Model (GCM) scenarios, reanalysis and observed local data are necessary to perform the
downscaling. Details of the definition/calibration of the statistical downscaling approach are shown.
Figure 2: Main window of the downscaling portal. Management of the experiments (left) and the jobs/tasks (right).
3. Downscaling Method: definition and validation of one or several downscaling methods to be applied in the experiment.

Once the Predictor - Predictand - Downscaling Method chain of tasks has been completed, the downscaling methods will be ready to downscale the control and future scenarios of any of the available GCMs (see the scheme in Fig. 1). This final task is done in the Downscale window.

The My Jobs panel allows monitoring the status (starting, reading, running, finished, etc.) and type (predictors, validation, downscaling) of the jobs, which are run in parallel by the portal through a queue of computational resources that allows handling and monitoring several requests simultaneously [2]. Moreover, a thread with the different execution stages (reading, performing downscaling, writing results, etc.) and the corresponding execution times can be displayed for each job. Finally, a job can be killed during its execution when it is taking longer than expected or when the user needs extra computational slots. The information about the account details, including the restrictions holding on the resources (number of simultaneous jobs, etc.), can be consulted at any time in the My Account tab (see figure 3) in the upper-right corner of the window. It also gives information about the databases available for the current user.
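The queue semantics described above (a limited number of concurrent slots per account, with the remaining submissions waiting in the STARTING state until a slot frees up) can be illustrated with a toy model. This is only a sketch for intuition, not the portal's actual implementation; the class and method names are invented for the example.

```python
from collections import deque

class JobQueue:
    """Toy model of the portal's job queue: at most `concurrent_jobs`
    jobs run at once; further submissions wait (STARTING state) and are
    promoted in order of arrival when a running job finishes."""

    def __init__(self, concurrent_jobs=2):
        self.limit = concurrent_jobs
        self.running = []        # jobs currently executing
        self.waiting = deque()   # jobs queued in STARTING state

    def submit(self, job_id):
        if len(self.running) < self.limit:
            self.running.append(job_id)
        else:
            self.waiting.append(job_id)

    def finish(self, job_id):
        """Also models killing a job: the freed slot goes to the queue."""
        self.running.remove(job_id)
        if self.waiting:
            self.running.append(self.waiting.popleft())
```

This mirrors why killing a long-running job frees a computational slot for the next waiting request.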
4 Selecting the Predictors

Each particular experiment (shown in the Experiment manager panel of the My Home window) is based on a single predictor dataset defined from reanalysis data over a particular region with a particular resolution. Therefore, a one-to-one correspondence is established in the portal between an experiment and the particular predictor dataset used [3]. New predictors (i.e. new experiments) can be defined from the Experiment manager window (New predictor button) or from the Predictor window (second tab of the portal) by specifying a reanalysis (ERA40 by default), a geographical area, a grid resolution (the original reanalysis resolution by default) and a set of large-scale variables (variable-level pairs).

In order to manage a homogeneous basic set of parameters for the different GCM outputs (reanalysis and climate change projections), a dataset of commonly used predictor variables on a daily basis has been defined (see Table 1).

Variable (Code)            Levels (mb)            Time
Geopotential (Z)           1000,850,700,500,300   00
V velocity (V)             850,700,500,300        00
U velocity (U)             850,700,500,300        00
Temperature (T)            850,700,500,300        00
Specific humidity (Q)      850,700,500,300        00
Relative vorticity (VO)    850,700,500,300        00
Divergence (D)             850,700,500,300        00
MSLP (MSL)                 surface                daily
2m Temperature (2T)        surface                00

Table 1: Description of the variables, height levels and times (UTC) of the common set of parameters used in the portal. The time value daily refers to daily mean values, whereas time 00 refers to instantaneous values.

As a compromise among the different native horizontal resolutions of the models that will be used to project future climate, a common 2.5 x 2.5 grid was considered. Reanalysis and models are interpolated to this grid using standard bilinear interpolation. In particular, we have downloaded, post-processed and stored data for the ERA40 ECMWF reanalysis, the NCEP/NCAR Reanalysis1 (see Brands et al., 2012, for a comparison of these two reanalyses for downscaling purposes) and for different GCMs from the ENSEMBLES project [4], both in control (20c3m, for 1961-2000) and future (B1, A1B and A2, for 2001-2100) scenarios; these models will be described later in the downscaling section.

As shown in Fig. 1, predictor datasets are defined based on reanalysis data, since day-to-day correspondence with observations is required in order to establish the statistical transfer functions used for downscaling.

Figure 4 shows the view (top) and create (bottom) panels of the Predictor window, which allow visualizing the predictor datasets of the existing experiments or creating new ones, respectively. Note that online help (label 1 in the figure) is provided in all windows to give relevant information about the different tasks to be performed. For a particular experiment selected from the pop-up menu (2), the view panel shows the following information (3): Dataset (reanalysis used), Dates (time period), Time resolution (24h for daily data), Lon and Lat (geographical domain), Resolution (horizontal and vertical resolution) and, finally, Predictors (variables used as predictors for the experiment). In this example (Iberia demo) we have considered an area of interest covering the Iberian peninsula and included basic predictor parameters covering the period 1960-1999; this information constitutes the predictor dataset (as shown in Fig. 1).

The create panel allows defining new experiments by defining the associated predictor dataset (see Figure 4, bottom). In particular, the following illustrates the definition of the Iberia demo predictor dataset described above. First, a reanalysis must be chosen (4), and the time window and grid (longitude/latitude area and grid resolution) to be used in the experiment must be specified; the map (5) shows the resulting grid. Alternatively, the region to be used can be graphically selected by shift-clicking and dragging in this window, and the resolution can be manually configured in (4). Afterwards, the particular predictors must be selected (6) by choosing the variable, the level (when required) and the base hour (by default 00 UTC) for instantaneous variables (see Table 1); in the example shown, the selected predictors are Z at 500 mb, T and Q at 850 mb and SLPd (d denotes daily mean). Moreover, since the GCM models to be used later for downscaling may lack some of these predictors, a panel (7) indicates the GCMs (among those available for the user) compatible with the selected set of predictors, i.e. the models with the scenario data required to be downscaled within the current predictor dataset (i.e. within the current experiment). Once the information has been defined, a name can be given (8) and the create new predictor button can be clicked to define the new experiment. Note that the name of the experiment can be changed afterwards from the My Home window.

Note that the definition of a predictor dataset involves several calculations to prepare the data in order to speed up the downscaling process; for instance, PCs explaining 99% of the variance are computed (and stored) for the selected period. Thus, when creating a new predictor/experiment, a job is launched to the portal (labeled as PREDICTOR) and its execution can be followed in the Jobs panel until termination. For instance, the My Jobs panel shown in Fig. 2 shows a job with ID code 606, run on 9 March 2011 to define a PREDICTOR dataset (Iberia demo in this case), which lasted 5 minutes (not shown in the figure).

[3] Note that this restriction could be problematic for a friendly use of the portal, since running a downscaling method for a given predictand with different predictors would imply defining a new experiment. However, the flexibility to freely combine predictors, predictands and downscaling techniques leads to data-compatibility problems which cannot be solved in a user-friendly form. This restriction may change in future versions of the portal if the development team finds a solution to overcome these problems.

[4] Both the IPCC-AR4 simulations (ENSEMBLES Stream1) and the new simulations done in the project (Stream2); see http://cera-www.dkrz.de/WDCC/ui/BrowseExperiments.jsp?proj=ENSEMBLES
Figure 4: Windows to visualize an existing predictor (above) and to create new ones (below). Numbers refer to the different elements of the
windows and are explained in the running text.
Figure 5: Window to create a predictand for a particular experiment (Iberia demo in this case).
5 Selecting the Predictand(s)

The statistical downscaling portal contains different sources of historical data which can be used as predictands (targets) in the downscaling process. For instance, open-access datasets such as GSN (Global Stations Network) or GSOD (Global Summary of the Day) have been included in order to have a minimum set of historical information to test the downscaling methods worldwide (consult the information about these datasets in the portal). Moreover, the user will be able to include new observation datasets into the portal; this option will be available in the new version of the portal [5].

The Predictands window allows viewing and creating predictands for an experiment from the available historical datasets. Each predictand must be defined by considering a single variable of interest (e.g. maximum temperature) and a number of points/stations among the ones lying within the region defined while creating the experiment (e.g. five cities in the Iberian peninsula). Figure 5 illustrates the steps to be followed to create a new predictand for a particular experiment, selected from the list of available experiments (1). First, the historical dataset to be used must be selected (2), in this case the GSOD Europe dataset, and the variable of interest must be chosen among those existing for the dataset (3), in this case maximum daily temperature. Afterwards, the points/stations of interest must be graphically selected by adding (or removing) points (4) and shift-clicking and dragging on the map to define an inclusion (or exclusion) square (5); the labels of the stations can be optionally displayed on the map to facilitate this task. Moreover, information about the currently selected stations can be consulted at any time (6). According to the restrictions of the user's account, there is a maximum number of stations/points that can be selected for a particular predictand. For instance, users with a basic profile (i.e. those not involved in the supporting projects or institutions) can only select five stations (see Fig. 3 for additional information).

Once the dataset, variable and stations have been defined, a name can be given to the predictand and it can be included in the corresponding experiment by clicking on the Create new Predictand button (8). Note that if the create default downscaling method checkbox is selected, then a default statistical downscaling method (a pre-defined analog method) will be defined and validated for this predictand (see the next section); in this case a VALIDATION job will be run by the portal and a new default downscaling method will be automatically associated with the predictand.

Once the predictor and predictand have been defined for a particular experiment, the common historical dataset will be used to calibrate and validate the different downscaling methods, as explained in the next section.

[5] Note that the downscaling portal is compatible with the MeteoLab observation datasets format; see http://www.meteo.unican.es/trac/MLToolbox/wiki/NewObs
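The graphical selection described above (inclusion/exclusion rectangles plus a per-account station limit) can be sketched as a simple filter. The station list, coordinates and function names below are hypothetical, chosen only to illustrate the selection logic.

```python
# Hypothetical station records (name, lon, lat); the portal draws these
# from the selected historical dataset (e.g. GSOD Europe).
STATIONS = [
    ("Madrid", -3.68, 40.41),
    ("Navacerrada", -4.01, 40.78),
    ("Lisboa", -9.13, 38.72),
    ("Paris", 2.35, 48.86),
]

def in_box(lon, lat, box):
    """box = (lon_min, lon_max, lat_min, lat_max)."""
    lon0, lon1, lat0, lat1 = box
    return lon0 <= lon <= lon1 and lat0 <= lat <= lat1

def select_stations(stations, include_box, exclude_box=None, max_stations=5):
    """Keep stations inside the inclusion rectangle, drop those falling
    in the optional exclusion rectangle, and enforce the account limit
    (five stations for a basic profile)."""
    chosen = [s for s in stations
              if in_box(s[1], s[2], include_box)
              and not (exclude_box and in_box(s[1], s[2], exclude_box))]
    if len(chosen) > max_stations:
        raise ValueError(f"account limit exceeded: at most {max_stations} stations")
    return chosen
```

For example, an inclusion rectangle roughly covering the Iberian peninsula keeps the Spanish and Portuguese stations and drops Paris.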
Figure 6: Configuration panels for the different statistical downscaling techniques included in the statistical downscaling portal: (a) analogs, (b) weather typing, (c) linear regression, (d) neural networks. Note that when the predictand is precipitation, linear regression is replaced by generalized linear models (GLMs), with the same configuration options.
6 The Downscaling Method(s)

Different statistical methods have been proposed in the literature to adapt the coarse predictions provided by global climate models to the finer scales required by impact studies. These methods usually work in two steps (perfect prognosis approach [6]): first, an empirical relationship (a statistical model) is established between the large-scale reanalysis variables (predictors) and the small-scale observed variables of interest (predictands) using data from a common historical period (the intersection of the reanalysis time-window and the observations availability period, typically between 15 and 30 years). Then, the resulting statistical model is applied to data from different GCM climate change simulations in different scenarios to obtain the projected local forecast (in this case the predictor data is built considering the predictor variables from the GCM outputs).

Usually, the different statistical downscaling methodologies are broadly categorized into three classes (see, e.g., Gutierrez et al., 2012, and references therein):

- Weather typing (analogs), based on nearest neighbors or on a pre-classification of the reanalysis into a finite number of weather types obtained according to their synoptic similarity; these methods are usually non-generative, since they consist of an algorithmic procedure to obtain the prediction, such as the method of analogs.

- Transfer functions (regression), based on linear regression or nonlinear models (e.g., neural networks) to infer the relationships between predictands and the large-scale predictors; these methods are generative in the sense that the projections are derived from a model obtained from data.

- Weather generators, which stochastically simulate daily climate values based on the available monthly average projections or on resampling or simulation procedures applied to the daily data. These techniques are temporal disaggregation methods.

The downscaling portal includes techniques from the first two categories (weather generators will also be implemented in a future version of the portal), thus allowing the user to test and compare the performance of several approaches (note that the skill of statistical downscaling methods varies from variable to variable and from region to region). For a particular experiment, a number of methods can be selected and configured from the Downscaling Method window, as shown in figure 6. The default configuration corresponds to an analog downscaling method, from the weather-typing category, considering the closest analog day (Fig. 6a); additional configurations with a different number of analogs/neighbors (1) and inference methods (2) can be selected by the user in this window. An optional comment can be included in (3) and a name for the particular technique in (4); these text boxes are defined for each downscaling method, as shown in Fig. 6. Finally, the button (5) allows creating the defined technique. The status of the downscaling process can be checked at any time in the My Jobs panel of the My Home window (Fig. 2). From the second tab of the weather-typing category, it is also possible to perform a statistical downscaling method based on a pre-classification of the reanalysis data into a finite number of weather types defined according to their synoptic similarity (Fig. 6b). The clustering method implemented in the portal is the k-means approach (1), and it is possible to select the number of clusters or weather types to be considered (2). As in the Analogs tab, different approaches are provided (3), in this case to infer a prediction from the observations within the corresponding cluster.

The portal also includes linear regression methods (from the transfer functions category; Fig. 6c). In its default configuration, only the first five PCs of the predictors are considered for downscaling; however, the user can modify this number, including an arbitrary number of PCs (1). A number of neighbor grid-boxes can also be introduced in the model (2). This option tries to solve the underestimation of the predicted variability by taking into account local effects by means of the selected nearest grid-point data. However, the raw model output values will be used as predictors and the spatial coherence of the method will be lost, since different local points will have different downscaling models, with different predictors (the same variables, but over different grid points). It is also possible to apply the linear regression method conditioned on clusters, which is based on the k-means approach (3). The number of clusters can be indicated in (4); then a regression model is derived for each cluster. A nonlinear transfer function model based on neural networks is also implemented in the portal (Fig. 6d). In the Neural Network (ELM) tab, the user can also define the number of Principal Components of the predictors (1) or the number of nearest neighbors to be considered in the model (2). The default configuration of this method takes 100 hidden neurons and applies a sigmoidal function as the activation function of the neural network. This configuration can be modified by the user, introducing an arbitrary number of hidden neurons (3) or selecting another activation function among those implemented in the portal (4). A clustering method based on k-means can also be defined (5), indicating the number of clusters to be considered (6) in this case. Note that the Generalized Linear Model (GLM) is also provided in the transfer functions category. The GLM is a generalization of ordinary linear regression to variables that are not normally distributed (e.g. precipitation); thus, this approach is available in the portal to downscale precipitation. The configuration for this approach is similar to the options available in the linear regression tab.

As mentioned above, once a downscaling method is defined, a name must be assigned in the corresponding text box labeled Downscaling method name. Then, the method will be automatically validated by clicking on the Create new Method button. Note that every new downscaling method is automatically validated by the portal: a job (labeled as VALIDATION) will be submitted to the portal and its execution can be followed in the Jobs panel until termination.

[6] Thus, systematic model errors are not taken into account with this methodology and they will be a component of the downscaling error. Recently, MOS-like approaches have been tested in the climate change context with promising results. These methods will be included as an alternative to Perfect Prognosis in a future version of the downscaling portal.

6.1 Validation of the SDM

Every downscaling method defined in the portal is automatically validated using a train/test validation approach. The common historical period for predictors (reanalysis; note that this validation is done under Perfect Prognosis conditions) and predictands (local observations) is split into training (75% of the data) and test (the remaining 25%) subsets. In the training phase the downscaling method is calibrated using the training data (e.g. the regression coefficients are fitted to the data), whereas in the test phase the method is validated on the test data (note that the test data is not used in the calibration phase and, thus, the results can be extrapolated to new datasets).

The validation results are given in the View panel, for a particular predictor, predictand and downscaling method of interest (1 in figure 7). A description of the downscaling method is given in (2). The results of the validation are given both as a summary PDF file (3) and in tabular form in the application window (4).

Figure 7: Window to access the results from the validation of the downscaling method.

The validation is performed both on a daily and on a 10-day aggregated basis (4). In both cases, basic statistics (mean, standard deviation, minimum and maximum values and percentage of missing data) of the observations (Obs. stats) and the downscaled predictions (Pred. stats) are calculated and displayed (5). Furthermore, other scores, such as percentiles, are also computed, but they are not shown in the default view for the sake of clarity (this can be configured in the columns choice menu in 6). Similarly, the Accuracy and Distributional Similarity tabs show different validation scores related to the accuracy (the default ones are correlation, MAE, RMSE and normalized RMSE; see Appendix 1) and the reliability (bias, normalized bias, ratio of variances and p-value of the Kolmogorov-Smirnov test) of the method. In the case of precipitation, some additional scores related to the occurrence character of precipitation will also be shown, in particular the ratio of observed and predicted non-precipitation frequencies and the Hit and False Alarm Rates (HIR and FAR, respectively; see Appendix 1 for details).

By clicking on the right arrow in any of the score labels, a menu will appear (6). From it, the user can choose which scores (columns) to visualize and the ascending/descending ranking of the stations; moreover, there is the possibility to display the spatial distribution of the score on the right-hand side map.

The plots on the left show the observed versus predicted values on a daily (top) and 10-day aggregated (bottom) basis. The more accurate the method, the more linear the plot and the higher the correlation (rho for temperature and r for precipitation). Note that for precipitation the HIR and FAR scores are also given in the daily case, thus characterizing the discrete part of the distribution (see Appendix 1 for details on validation scores). The two plots on the right show the distributional similarity of the observed and predicted values, on a 10-day aggregated basis; the upper figure shows the observed and predicted PDFs, including the KS-pValue and PDF-Score (see Appendix 1 for details), whereas the bottom figure shows the quantile-quantile plot of the observations and predictions. In the case of precipitation, the plots correspond to rainy days; moreover, the numbers on the top of the figures show the scores for non-rain days and, thus, the combination of both pieces of information gives a general idea of the performance of the method for this mixed (discrete and continuous) variable.
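The perfect-prognosis calibration/validation loop described in this section can be sketched numerically: calibrate on 75% of a common historical period, predict the remaining 25%, and compute a few of the scores from the validation tables. The sketch below uses synthetic data and the simplest analog method (closest day); it is an illustration of the workflow, not the portal's implementation.

```python
import numpy as np

def analog_downscale(pred_train, obs_train, pred_new):
    """1-nearest-analog method: for each new predictor pattern, return
    the observation of the most similar training day (Euclidean
    distance in predictor space)."""
    out = np.empty(len(pred_new))
    for i, x in enumerate(pred_new):
        out[i] = obs_train[np.argmin(np.linalg.norm(pred_train - x, axis=1))]
    return out

def validate(pred, obs):
    """A few of the accuracy/reliability scores shown in the portal."""
    return {
        "bias": float(np.mean(pred - obs)),
        "rmse": float(np.sqrt(np.mean((pred - obs) ** 2))),
        "rho": float(np.corrcoef(pred, obs)[0, 1]),
    }

# Synthetic 'common historical period': predictors (e.g. leading PCs)
# and a predictand linearly related to them plus noise.
rng = np.random.default_rng(1)
n = 400
predictors = rng.normal(size=(n, 3))
observations = predictors @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n)

cut = int(0.75 * n)  # 75% training / 25% test split
downscaled = analog_downscale(predictors[:cut], observations[:cut], predictors[cut:])
scores = validate(downscaled, observations[cut:])
```

The same loop applies to any of the portal's methods; only the calibration/prediction step changes (regression fit, k-means pre-classification, neural network, etc.).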
Figure 8: Downscaling from reanalysis data (hindcast) for particular time slices.
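The hindcast option described in this section uses a one-month temporal exclusion window centered on the downscaling date, so that a day cannot select itself (or its immediate neighbours) as an analog when the training and downscaling periods overlap. A minimal sketch of that candidate filtering, assuming daily NumPy datetime64 arrays (the portal's exact window handling may differ):

```python
import numpy as np

def analog_pool(dates, target_date, window_days=31):
    """Indices of candidate analog days: all days farther than half the
    exclusion window (one month by default) from the downscaling date."""
    delta = np.abs((dates - target_date) / np.timedelta64(1, "D"))
    return np.where(delta > window_days / 2)[0]

# Daily dates for a hypothetical overlapping training/hindcast period
dates = np.arange("1990-01-01", "1991-01-01", dtype="datetime64[D]")
pool = analog_pool(dates, np.datetime64("1990-06-15"))
```

The analog search is then restricted to `dates[pool]`, which removes the 31 days surrounding the target date from the candidate set.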
The downscaling methods can also be applied considering reanalysis data as input in the statistical downscaling method. In this way, the whole predicted series for the historical period can be obtained. Note that, since the training and downscaling periods can overlap in this case, special care is to be taken in the definition of the time-window for the predictors, considering a time-slice of the available reanalysis data. However, in order to avoid problems with the analog method, a one-month temporal exclusion window centered on the downscaling date is considered in this case. Moreover, the sensitivity of the SD methods to reanalysis uncertainty (Brands et al., 2012) can be tested by using different reanalysis datasets as input data at this point.

This option is available in the Hindcast tab (1) of the Downscale window (see Fig. 8), where different reanalysis datasets (2) can be selected (e.g. ERA40, NCEP). The downscaling method is selected in (3) from those already validated for the selected Predictor and Predictand. Note that the information of the predictor (Predictor tab) includes the particular reanalysis and periods used; this information is to be considered when performing hindcast experiments, since the training and downscaling (test) datasets might overlap. Different panels are available for viewing (and downloading) the existing downscalings or for creating new ones (4), as in the present case. The available periods for downscaling are organized in decades, which can be directly selected for downscaling by clicking on the corresponding check-boxes. In the next section, further details are given on the downscaling jobs and the access to the resulting data.

7 Downscaling GCM Scenarios

Once a target predictand to be downscaled has been selected for a particular experiment (predictor set), and the statistical downscaling method has been calibrated and validated using reanalysis data (under Perfect Prognosis conditions), the downscaling method is ready to be applied to future climate change scenarios, considering GCM outputs in different control (20c3m, for 1961-2000) and future scenarios (B1, A1B and A2, for 2001-2100). This option is available in the Downscale window (the last tab of the application). The portal contains daily data from the following four GCMs: BCM2, CNCM3, MPEH5 (ENSEMBLES Stream1) and HADGEM2 (ENSEMBLES Stream2 [7]), which have been validated on a daily basis for the different upper-level fields included as predictors in the portal (Brands et al., 2011). The available variables and scenarios, as well as information about the spatial coverage for each particular GCM, can be consulted by clicking on the Info label for the corresponding model or on the My Account tab (as shown in Fig. 3).

[7] The details of the models are given in http://cera-www.dkrz.de/WDCC/ui/BrowseExperiments.jsp?proj=ENSEMBLES. Preferably, Stream1 models were selected for this version of the portal; however, HADGEM2 was selected from Stream2 because the availability of daily data for the Stream1 MetOffice models was limited.

Figure 9 shows the Downscale window with the create tab selected, as shown in (1). This window allows creating new downscalings for a particular predictor, predictand and downscaling method, selected from (2), as well as the scenario of interest (A1B in this case). For the particular
Figure 9: Downscale create window to apply downscaling methods using GCM scenario data for particular time slices.
selection, the window shows a downscaling matrix includ- Fig. 9(5) note that by clicking on a decade label or on
ing the possible combinations of GCMs with available data8 a GCM label, all the corresponding cells are automatically
(in columns) as shown in (3) and the corresponding selected; afterwards, the corresponding downscaling jobs
time periods with available simulations (organized in rows, can be submitted by clicking on the run button, as shown
decade by decade) as shown in (4). In this case all the in (6); note that the portal will submit one job per cell, so the
GCM simulations span the whole period of 10 decades but, accounts restrictions will determine the maximum number
in general, different models may have different simulated pe- of cells that can be selected/submitted simultaneously9 . For
riods (e.g. the models downloaded from the IPCC database instance, users with a basic profile (i.e. those not involved
which include only certain time-slices, e.g. 2081-2100). in the supporting projects or institutions) can only run two
jobs simultaneously, which include the creation of predictor,
Each of the elements in the matrix (a decade for a partic- predictand (with the basic downscaling method), or down-
ular GCM for a given scenario) is considered a downscaling scaling method, as well as the downscaling jobs. Therefore,
cell and it is run by the portal as an independent job. Same downscaling the A1B scenario for the whole 2001-2100 pe-
criteria is applied in the hindcast tab. One or several of riod for a particular GCM would require five run steps (two
these cells (jobs) can be selected by clicking on them, as in decades each) in the portal (in case that the user is not run-
9
8
Note that some of the variables included in the predictor defini- See Fig. 3 for more information about your accounts restric-
tion may be missing for some of the GCMs, e.g. 1000 mb levels in tions; in particular you may consult the number of simultaneous
the HADGEM2 model; in those cases, the GCM will not be avail- jobs allowed for your account: ConcurrentJobs. These limitations
able for downscaling for this predictor; note that this information is have been considered to keep the downscaling jobs at a reasonable
available when creating the predictor as shown in Fig. 4 (7). level of complexity, in terms of the memory needed and duration of
the task.
Figure 11: Downscaled projections can be downloaded in a .csv file and loaded in, e.g. Excel.
ning any other task). We strongly advise the users to first View panel. By clicking on the existing ones (those with
downscale, download and analyze a single decade before per- a check box) the user can select those downscaling cells of
forming more exhaustive downscaling tasks, as we did in interest and download them in a .csv file (Download se-
the Iberia demo experiment (the 2091-2100 decade for the lected downscalings). This file can be easily converted to
ECHAM5 A1B scenario).

The status of the jobs can be checked at any time with the Jobs info button (in the upper right corner of the window) or in the My jobs panel of the My home window. A typical downscaling job accesses the required data (the GCM scenario simulations and the reanalysis and observed data) and applies the downscaling method, producing the local projections for the defined locations/stations and period; this process typically takes some minutes and goes through different stages, which are indicated in the Jobs info panel: STARTING, RUNNING, etc., until the job finishes normally (FINISHED) or abnormally (ERROR). The different stages are also indicated with a background color in the corresponding downscaling cell: yellow for STARTING (i.e., the job is waiting in the execution queue), blue for RUNNING (i.e., the job is running on the cluster), green for FINISHED and red for ERROR (indicating some failure of the process). In this last case, we advise the user to wait a couple of hours and re-submit the job (in order to avoid possible spurious errors in the computing infrastructure) and, if the error persists, to contact the portal development team using the email contact form included in the upper left corner of the portal.

a commonly used Excel .xls in which daily predictions for all the stations (in columns) selected in the Predictand window are displayed in rows; note that the dates may not be consecutive and, therefore, you may need to sort the rows by the first column (the date) to obtain a chronological file. This allows the user to easily manipulate the data, drawing projected time series, etc. For instance, Fig. 11 shows the csv file downloaded with the projections of the 2091-2100 decade for the ECHAM5 model shown in Fig. 10. A graph of the daily temperatures for two out of the four stations (Madrid and Navacerrada) has been drawn by simply using the drawing facilities in Excel. The .csv file includes some header lines (the first 22 lines in Fig. 11) describing the predictors, the GCM and scenario, the downscaling method and the predictands/stations (labelled as c1, c2, etc.) corresponding to the particular downscaling. The remaining rows correspond to the data, including the date in the first column and the stations in the remaining ones, following the order c1, c2, etc. defined in the header. Note that the name of the file is also informative of the particular downscaling details (Iberia demo - Tmax 5cities - Analogues (default) - MPEH5 - A1B.csv in this case).
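As a sketch of that post-processing step, the following Python snippet reproduces the chronological sorting described above for a portal-style .csv download. The header length (22 lines in the Fig. 11 example) and the date/station column layout are taken from the description above and may differ in your own file; the two-line sample header used here is a stand-in, not the portal's actual header format.

```python
import csv
from datetime import datetime

def read_portal_csv(text, n_header=22):
    """Parse a portal-style .csv export: skip the descriptive header
    lines, read rows of (date, c1, c2, ...) and sort them
    chronologically, since downloaded rows may not be consecutive.
    n_header defaults to the 22 header lines of the Fig. 11 example;
    adjust it to match your download."""
    lines = text.splitlines()[n_header:]
    rows = []
    for rec in csv.reader(lines):
        if not rec:
            continue
        date = datetime.strptime(rec[0], "%Y-%m-%d")
        values = [float(v) for v in rec[1:]]   # one value per station
        rows.append((date, values))
    rows.sort(key=lambda r: r[0])              # chronological order
    return rows

# Tiny illustration: a 2-line stand-in header and shuffled dates.
sample = "\n".join([
    "# downscaling: Analogues (default)",       # header line 1 (hypothetical)
    "# stations: c1=Madrid, c2=Navacerrada",    # header line 2 (hypothetical)
    "2091-01-02,12.1,3.4",
    "2091-01-01,11.5,2.9",
])
data = read_portal_csv(sample, n_header=2)
print(data[0][0].date())   # earliest date first after sorting
```

Once sorted, the rows can be exported back to .csv or plotted directly, mirroring the Excel workflow described above.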
8 Acknowledgments
References
9 Appendix 1: Validation scores

This section provides a more detailed description of the validation process generated in the downscaling portal. It attempts to help the user to properly analyse the statistical scores calculated for the validation of the applied downscaling method.

Validation is performed at two different time-scales in the downscaling portal: daily and 10-daily aggregated data. Depending on the user's needs, both time-scales might be useful, and the downscaling methods may show higher performance on the aggregated one, particularly for precipitation, being more informative for validation purposes. Note that additional validation scores are computed for precipitation, in order to take into account its dual (discrete/continuous) character. These scores are identified with an "only for precipitation" label in the following description. Labels in bold correspond to the codes used in the downscaling portal (Sec. 6.1). All statistics are computed using the period defined for the particular experiment, so these scores (including descriptive ones) might change from experiment to experiment.

9.1 Descriptive Statistics

Basic descriptive statistics of the observed (forecast) series.

RR: Rainfall Rate (only for precipitation). This score measures the frequency of wet days and is calculated as the number of wet days divided by the size of the sample, n, expressed in %:

    RR = 100 n_wet / n    (1)

The threshold considered for defining wet days is 0.1 mm.

Mean: Arithmetic mean. It measures the central tendency in a sample. It is calculated as the sum of all data points (x_i, i = 1, ..., n) divided by the size of the sample, n:

    x̄ = (1/n) Σ_{i=1}^{n} x_i    (2)

The arithmetic mean is greatly influenced by outliers. For this reason, robust statistics such as the median may provide a better description of central tendency.

Median: Median. The median is also a measure of location. It is the value separating the higher half of the sample from the lower one (50th percentile). It can be found by arranging all the values from the lowest to the highest and picking the middle one. For symmetrically-distributed data, the mean and the median are the same.

Min: Minimum. The smallest value in the series.

Max: Maximum. The largest value in the series.

Sigma: Standard Deviation (also denoted as Std). It shows how much variation or dispersion exists from the average. It is defined as the square root of the variance:

    σ = √( (1/n) Σ_{i=1}^{n} (x_i − x̄)² )    (3)

The standard deviation is also greatly influenced by outliers. A useful property of the standard deviation is that, unlike the variance, it is expressed in the same units as the data.

IQR: Interquartile range. It is a robust score that also estimates the dispersion in a sample, but it is not influenced by outliers. It is defined as the difference between the upper (75th percentile) and lower (25th percentile) quartiles, Q3 and Q1 respectively:

    IQR = Q3 − Q1    (4)

The interquartile range is commonly used to build box-plots, simple graphical representations that show with a box the spread of the data falling between the 25th and 75th percentiles.

PX: X-th percentile. Value below which X% of the data points are found. X = 5, 10, 90, 95.

Missing: Percentage of missing values within the data: [0, 100].

9.2 Accuracy

Accuracy is one of the main aspects that must be examined when assessing the quality of a forecast, since it measures the level of agreement between forecast and observed time series. Note that some of the scores are presented in units of some descriptive statistic, which allows for direct comparison among stations and/or seasons, without worrying about their different regimes. In particular, the scores re-scaled by the Mean (Sigma) are named with an n (N) at the beginning of their names.

HIR: Hit Rate (only for precipitation). It is the probability of occurrences (o) (i.e., wet days) that were correctly forecast (f). This score ranges in [0,1], 1 being the perfect score.

    HIR = P(f = 1 | o = 1)    (5)

FAR: False Alarm Rate (only for precipitation). It is the probability of non-occurrences that were incorrectly forecast. This score ranges in [0,1], 0 being the perfect score.

    FAR = P(f = 1 | o = 0)    (6)

Note that both scores, HIR and FAR, are only calculated in the portal for daily precipitation. They are not calculated for the 10-daily validation, since aggregated data are considered to be continuous. HIR and FAR must be considered together in order to validate the discrete part of precipitation. The threshold considered for defining wet days is 0.1 mm.

rho: Pearson's Product-Moment Correlation Coefficient. It measures the strength of the linear relationship
It has a singularity at o = 0 (could occur for temperatures, for instance).

NMAE: Mean Absolute Error (MAE), in units of the observed standard deviation. It ranges in [0, ∞). Perfect score: 0.

NBias: Bias, in units of the observed standard deviation. It ranges in [0, ∞). Perfect score: 0.

    NBias = Bias / σ_o    (16)

It has a singularity at σ_o = 0.

RV: Ratio of Variances. This score measures the ratio between the forecast and observed variances, in units of the observed one. It ranges in [0, ∞). Perfect score: 1.

    RV = σ_f² / σ_o²    (17)

It has a singularity at σ_o = 0.

KS-pValue: p-value from the two-sample Kolmogorov-Smirnov test. This score ranges in [0,1]. The null hypothesis of equality of distributions is rejected when the significance level equals or exceeds this p-value. The Kolmogorov-Smirnov test for two samples of sizes n and n′ measures a distance, D_{n,n′}, between both cumulative density functions. D_{n,n′} is calculated as:

    D_{n,n′} = sup_x |F_{1,n}(x) − F_{2,n′}(x)|

It does not measure the punctual correspondence between forecasts and observations, i.e., it is possible to get a perfect score for a bad forecast if errors are compensated.

In addition, the PDF Score is hardly sensitive to failures in the tails of the distributions. Thus, the user should not rely exclusively on this score for validation, especially when projecting under climate change scenarios. We strongly recommend considering both the KS and PDF scores in conjunction.

Note that, for the special case of daily precipitation, and due to the high mass of probability density located at zero, the KS-pValue, KSX-pValue and the PDF Score are calculated for the continuous part of the distributions, by considering exclusively the observed and forecasted wet days. The discrete occurrence/non-occurrence event is validated through the above-explained HIR, FAR and Ratio scores. The latter scores are calculated over the entire observed and forecasted series for the 10-daily precipitation and for temperature at both time-scales.
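As a minimal sketch of the normalized accuracy scores, the following Python functions implement NBias (Eq. 16), RV (Eq. 17) and the empirical two-sample KS distance D_{n,n′}. The bias sign convention (forecast mean minus observed mean) and the sup-distance form of D_{n,n′} are assumptions taken from standard usage, not from the portal's own (truncated) definitions, and no p-value computation is reproduced here.

```python
import math

def _sigma(xs):
    """Population standard deviation, as in Eq. (3)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def nbias(obs, fcst):
    """NBias, Eq. (16): bias in units of the observed standard
    deviation; singular when sigma_o = 0. The sign convention
    (forecast minus observed) is an assumption."""
    bias = sum(fcst) / len(fcst) - sum(obs) / len(obs)
    return bias / _sigma(obs)

def ratio_of_variances(obs, fcst):
    """RV, Eq. (17): sigma_f^2 / sigma_o^2; perfect score 1."""
    return _sigma(fcst) ** 2 / _sigma(obs) ** 2

def ks_distance(a, b):
    """Two-sample KS distance D_{n,n'}: the largest absolute gap
    between the two empirical CDFs, evaluated at the sample points
    (where step functions attain their supremum)."""
    def ecdf(xs, t):
        return sum(1 for x in xs if x <= t) / len(xs)
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in sorted(set(a) | set(b)))
```

For example, a forecast series shifted by a constant relative to the observations yields RV = 1 (identical spread) while NBias and the KS distance pick up the offset, illustrating why several scores must be read together.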