RClimDex User Manual
RClimDex is developed and maintained by Xuebin Zhang and Feng Yang at the
Climate Research Branch of the Meteorological Service of Canada. Its initial development
was funded by the Canadian International Development Agency through the Canada
China Climate Change Cooperation (C5) Project. Lisa Alexander, Francis Zwiers, Byron
Gleason, David Stephenson, Albert Klein Tank, Mark New, Lucie Vincent, and Tom
Peterson made important contributions to the development and testing of the package.
Jose Luis Santos at CIIFEN helped to translate this document into Spanish. Earlier
versions of RClimDex were used during CCl/CLIVAR ETCCDMI workshops in
Cape Town, South Africa, May 31-June 4, 2004, and in Maceio, Brazil, August 9-14,
2004. The lecturers and attendees of the workshops provided very valuable suggestions for
the improvement of RClimDex.
1. Introduction
The original objective was to port ClimDex into an environment that does not depend on
a particular operating system. R was a natural choice of platform, since it is free,
robust, and powerful software for statistical analysis and graphics that runs under both
Windows and Unix environments. In 2003 it was discovered that the method used for
computing percentile-based temperature indices in ClimDex and other programs resulted
in inhomogeneity in the indices series. Fixing the problem requires a bootstrap procedure
that is almost impossible to implement in an Excel environment, which made the
development of this R-based package more urgent.
A main objective of constructing climate extremes indices is their use in climate change
monitoring and detection studies. This requires that the indices be homogenized. Data
homogenization has been planned but is not implemented in this release. The current
RClimDex includes only a simple data quality control procedure that was provided in
ClimDex. As in ClimDex, we require that data be quality controlled before the indices
can be computed. This user manual provides step-by-step instructions on 1) the
installation of R and setting up the user environment, 2) quality control of daily climate
data, and 3) calculation of the 27 core indices.
RClimDex requires the base package of R and the graphical user interface package TclTk.
The installation of R is a very simple procedure: 1) connect to the R project website at
http://www.r-project.org, and 2) follow the links to download the most recent version of R
for your computer operating system from any mirror site of CRAN.
For Microsoft Windows (95, 98, 2000, and XP), download the Windows setup program.
Run that program and R will be automatically installed on your computer, with a shortcut
to R on your desktop. TclTk is included in the default installation of R 1.9.0 or later
versions. It may need to be installed separately if you are running an earlier version of R.
For Linux, download the proper precompiled binaries and follow the instructions to install R.
For other Unix systems, you may need to download the source code and compile it yourself.
Under the Windows environment, double click the R icon on your desktop, or launch it
through the Windows “Start” menu. This usually gets you into the R user interface. On some
computers, you may need to first set up an environment variable called “HOME”. See the R
for Windows FAQ (Appendix E) for details if you have any problems.
Exit from R by entering q() in the R console under both Windows and Unix. Under
Windows, you may also click the “File” menu and then “Exit”.
At the R console prompt “>”, enter source("rclimdex.r"). This will load RClimDex
into the R environment. You may need to include the full path before the filename
rclimdex.r.
Alternatively, if your computer is connected to the internet, you may load the most recent
version from the ETCCDMI web site by entering
source("http://cccma.seos.uvic.ca/ETCCDMI/RClimDex/rclimdex.r"). Under Windows,
RClimDex can also be loaded from the drop-down menu. Choose “File” from the RGui menu,
and then select “Source R code”. This will bring up a new pop-up window in which you can
select the R source code “rclimdex.r” from the directory where the program was saved, or type
http://cccma.seos.uvic.ca/ETCCDMI/RClimDex/rclimdex.r to download the latest version
from the web site.
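The choice between a local copy and the web copy can be wrapped in a small helper. The following is only a sketch: the function name rclimdex_location and the example path are hypothetical, and the URL is simply the one quoted above.

```r
# Hypothetical helper: prefer a local copy of rclimdex.r and fall back to the
# URL quoted in this manual when the local file does not exist.
rclimdex_location <- function(local_path = "rclimdex.r",
                              url = "http://cccma.seos.uvic.ca/ETCCDMI/RClimDex/rclimdex.r") {
  if (file.exists(local_path)) normalizePath(local_path) else url
}

# Not run here, because sourcing the script opens the RClimDex menu.
# "C:/climate/rclimdex.r" is a made-up path used only for illustration.
# source(rclimdex_location("C:/climate/rclimdex.r"))
```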
Once the source code is successfully loaded, the RClimDex main menu will appear.
Select “Load Data and Run QC” from the RClimDex Menu to open a window as shown
below. This allows users to select (load) the data file from which indices are to be
computed.
The filename should be of the form “stationname.txt”. The values in the file should be in
the format described in Appendix B. In this manual, we use data from a station whose data
are stored in an ASCII file “21946.txt” for the purpose of demonstration. A pop-up
window, as shown below, will appear once the data for station 21946 are successfully
loaded.
Error messages will appear in the R console if this step has not been completed
successfully. This is usually caused by the wrong input data format. Please compare your
format with our sample data if you see such messages.
The default value for n (the “Criteria” field in the “Set Parameters for Data QC” window)
is 3, but this number may be changed by the user. As a value of 3 may flag a very large
number of values, users may wish to start by setting this value to 4. There is no need to
fill in “Station name or code”, as this parameter is for later use. After setting the
parameter, click “OK” to continue.
Pop-up windows will appear if unreasonable values are found. For instance, when
minimum daily temperature is greater than maximum daily temperature, the following
message appears.
If there are any negative values (other than missing values coded as -99.9) in the daily
precipitation amount, the following message will appear.
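The two checks for unreasonable values can be sketched as follows. This is an illustration rather than the RClimDex source: the data frame, its column names, and the values are made up, and missing values are assumed to have been converted from -99.9 to NA already.

```r
# Made-up daily records; the column names are assumptions of this sketch.
daily <- data.frame(
  prcp = c(0.0, -1.2, 3.4, NA),
  tmax = c(10.5, 12.0, 4.0, 8.0),
  tmin = c(2.0, 3.5, 6.5, 1.0)
)

# Flag negative precipitation and days whose minimum exceeds the maximum.
bad_prcp <- !is.na(daily$prcp) & daily$prcp < 0
bad_temp <- !is.na(daily$tmax) & !is.na(daily$tmin) & daily$tmin > daily$tmax

# Replace the flagged values with NA and leave everything else untouched.
qc <- daily
qc$prcp[bad_prcp] <- NA
qc$tmax[bad_temp] <- NA
qc$tmin[bad_temp] <- NA
```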
If there are outliers, the following window appears.
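One plausible reading of the outlier check is sketched below. It assumes that a value counts as an outlier when it lies more than n standard deviations from the mean of all values recorded on the same calendar day; this interpretation of n, the function name flag_outliers, and the sample data are assumptions, not taken from the RClimDex source.

```r
# Flag values lying more than n standard deviations from the mean of the
# same calendar day (an assumed reading of the "Criteria" parameter n).
flag_outliers <- function(value, month, day, n = 3) {
  key <- paste(month, day)
  mu  <- ave(value, key, FUN = function(v) mean(v, na.rm = TRUE))
  sdv <- ave(value, key, FUN = function(v) sd(v, na.rm = TRUE))
  !is.na(value) & !is.na(sdv) & abs(value - mu) > n * sdv
}

# Twenty made-up values for 1 January; only the last one is extreme.
month <- rep(1, 20)
day   <- rep(1, 20)
tmax  <- c(rep(c(-1, 0, 1), length.out = 19), 30)
flagged <- flag_outliers(tmax, month, day, n = 3)
```

Raising n makes the test stricter, so fewer values are flagged, which is why starting with 4 instead of 3 can reduce the number of values to inspect.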
A pop-up window appears once the data QC is complete. At the same time, four CSV files
that can be opened in Excel, “21946tempQC.csv”, “21946prcpQC.csv”, “21946tepstdQC.csv”,
and “21946indcal.csv”, are created in a subdirectory called log. The first two files contain
information about unreasonable values for temperature and precipitation. The third file
flags all possible outliers in daily temperature with the dates on which those outliers
occur. The last file contains the QC’d data and will be used for the indices calculation.
Note that, in this file, only missing values and unreasonable values are replaced with NA;
flagged possible outliers are NOT changed. For easy visualization, 4 PDF files
containing time series plots (missing values are plotted as red dots) of daily precipitation
amount, daily maximum and minimum temperatures, and daily temperature range are also
stored in log.
At this point, the user may check the data in the file “21946tepstdQC.csv” to determine
whether any value marked as an outlier really is an outlier. If any action needs to be
taken, the file “21946indcal.csv” can be modified using Excel under Windows or any editor
under Unix. After completing this step, the user may click OK on the following window
to proceed with the indices calculation.
Note that the indices are computed from the QC’d data. The original input file is not
altered in any manner. So if a user chooses to modify the original data file to correct some
of the problematic values, the Load Data and Run QC procedure needs to be performed
again on the improved data set before the changes can be reflected in the indices
calculation.
RClimDex is capable of computing all 27 core indices listed in Appendix A. Users may,
however, compute only those indices they require.
After selecting “Indices Calculation” from the main menu, the user is asked to set some
parameters for the indices calculation. The “Set Parameter Values” window allows the
user to enter the first and last years of the base period for the threshold calculation; the
station latitude (negative in the Southern Hemisphere), which determines the hemisphere in
which the station is located; a user defined daily precipitation threshold, P (in mm), used
to compute the number of days when daily precipitation amounts exceed this threshold (the
Rnn indicator); and 4 user defined temperature thresholds. The “User defined Upper Limit of
Day High” allows the calculation of the number of days when daily maximum
temperature exceeds this threshold (SUmm). The “User defined Lower Limit of Day High”
allows the calculation of the number of days when daily maximum temperature is below
this value (IDmm). The “User defined Upper Limit of Day Low” allows the calculation of the
number of days when daily minimum temperature exceeds this threshold (TRmm). The “User
defined Lower Limit of Day Low” allows the calculation of the number of days when
daily minimum temperature is below this limit (FDmm). In the names SUmm, IDmm, TRmm,
and FDmm, “mm” corresponds to the user defined value. This step includes some
data processing, so it will take a few seconds to finish.
Once this step is completed, a window will appear to allow the user to select their desired
indices for calculation. All indices are selected by default.
Uncheck indices that are not needed, then click “OK” to perform the computation.
Depending on the indices selected, this procedure may take a while.
A pop-up window will appear once the selected indices are computed.
The resulting indices series are stored in a sub-directory called indices, in CSV format
that can be opened in Excel. The indices files have names “21946_XXX.csv”, where XXX
represents the name of the index. Data columns are separated by a comma (“,”). For the
purpose of visualization, we plot annual series, along with trends computed by linear
least squares (solid line) and locally weighted linear regression (dashed line). Statistics
of the linear trend fitting are displayed on the plots. These plots are stored in a
sub-directory called plots in JPEG format. The filenames for plots follow the same rule
except that “csv” is changed to “jpg”.
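The two trend lines can be reproduced for any annual series with base R. The sketch below uses a made-up series in place of a file from the indices directory, and it uses lowess() as a stand-in for the locally weighted regression, since the smoother settings used by RClimDex are not given in this manual.

```r
# Made-up annual index series standing in for a file in the indices directory.
set.seed(1)
index <- data.frame(year = 1961:1990)
index$value <- 20 + 0.3 * (index$year - 1961) + rnorm(30, sd = 2)

# Linear least-squares trend (the solid line on the plots).
fit     <- lm(value ~ year, data = index)
slope   <- unname(coef(fit)["year"])
p_value <- summary(fit)$coefficients["year", "Pr(>|t|)"]

# Locally weighted regression (the dashed line); lowess() with its default
# settings is an assumption of this sketch.
smooth <- lowess(index$year, index$value)
```

Calling plot(index$year, index$value), abline(fit), and lines(smooth, lty = 2) would draw a figure of the same general kind as those stored in the plots directory.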
Select “Indices Calculation” from the main menu to compute additional indices for the
same station. For additional stations, select “Load Data and Run QC” and repeat the above
process. Select “Exit” once all required calculations are completed.
4. Known bugs
There is a known bug in this and earlier versions of RClimDex. The program will stop
running if the first year of the available data is the same as the first year of the base
period. This is caused by some computations that require data beyond the boundary of the
base period. The calculation of percentile-based temperature indices is an example. One
way to avoid this problem is to add an extra record, with values marked as missing, for the
day just before the beginning of the base period. For example, if the base period is
1961-1990 and the data also start in 1961, one may add “1960 12 31 -99.9 -99.9 -99.9” as the
first line of the input data file.
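The workaround can be automated before the file is loaded. The helper below is a hypothetical sketch (pad_first_year is not part of RClimDex); it assumes each record starts with year, month, and day, as in the example record above.

```r
# Hypothetical helper: when the data start in the first year of the base
# period, prepend a record of missing values for the preceding day.
pad_first_year <- function(lines, base_start) {
  first <- as.integer(scan(text = lines[1], quiet = TRUE)[1:3])
  if (first[1] != base_start) return(lines)
  prev <- as.Date(sprintf("%d-%d-%d", first[1], first[2], first[3])) - 1
  pad <- paste(format(prev, "%Y"),
               as.integer(format(prev, "%m")),
               as.integer(format(prev, "%d")),
               "-99.9 -99.9 -99.9")
  c(pad, lines)
}

# Made-up records for illustration.
records <- c("1961 1 1 0.0 -3.1 -6.8", "1961 1 2 2.5 1.0 -7.2")
padded  <- pad_first_year(records, base_start = 1961)
```

In practice the lines would come from readLines() on the station file and be written back with writeLines() under a new name, so that the original file stays untouched.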
5. Bug report
Please report any bugs/errors to [email protected] with error messages and data being
used for the calculation of the indices. This will be helpful in producing a better release in
the near future. We would also appreciate your suggestions for further improvement.
APPENDIX A: List of ETCCDMI core Climate Indices
ID | Indicator name | Definition | Units
FD0 | Frost days | Annual count when TN (daily minimum) < 0ºC | Days
SU25 | Summer days | Annual count when TX (daily maximum) > 25ºC | Days
ID0 | Ice days | Annual count when TX (daily maximum) < 0ºC | Days
TR20 | Tropical nights | Annual count when TN (daily minimum) > 20ºC | Days
GSL | Growing season length | Annual (1st Jan to 31st Dec in NH, 1st July to 30th June in SH) count between first span of at least 6 days with TG > 5ºC and first span after July 1 (January 1 in SH) of 6 days with TG < 5ºC | Days
TXx | Max Tmax | Monthly maximum value of daily maximum temp | ºC
TNx | Max Tmin | Monthly maximum value of daily minimum temp | ºC
TXn | Min Tmax | Monthly minimum value of daily maximum temp | ºC
TNn | Min Tmin | Monthly minimum value of daily minimum temp | ºC
TN10p | Cool nights | Percentage of days when TN < 10th percentile | %
TX10p | Cool days | Percentage of days when TX < 10th percentile | %
TN90p | Warm nights | Percentage of days when TN > 90th percentile | %
TX90p | Warm days | Percentage of days when TX > 90th percentile | %
WSDI | Warm spell duration indicator | Annual count of days with at least 6 consecutive days when TX > 90th percentile | Days
CSDI | Cold spell duration indicator | Annual count of days with at least 6 consecutive days when TN < 10th percentile | Days
DTR | Diurnal temperature range | Monthly mean difference between TX and TN | ºC
RX1day | Max 1-day precipitation amount | Monthly maximum 1-day precipitation | mm
Rx5day | Max 5-day precipitation amount | Monthly maximum consecutive 5-day precipitation | mm
SDII | Simple daily intensity index | Annual total precipitation divided by the number of wet days (defined as PRCP >= 1.0 mm) in the year | mm/day
R10 | Number of heavy precipitation days | Annual count of days when PRCP >= 10 mm | Days
R20 | Number of very heavy precipitation days | Annual count of days when PRCP >= 20 mm | Days
Rnn | Number of days above nn mm | Annual count of days when PRCP >= nn mm, where nn is a user defined threshold | Days
CDD | Consecutive dry days | Maximum number of consecutive days with RR < 1 mm | Days
CWD | Consecutive wet days | Maximum number of consecutive days with RR >= 1 mm | Days
R95p | Very wet days | Annual total PRCP when RR > 95th percentile | mm
R99p | Extremely wet days | Annual total PRCP when RR > 99th percentile | mm
PRCPTOT | Annual total wet-day precipitation | Annual total PRCP in wet days (RR >= 1 mm) | mm
APPENDIX B: Input Data Format
All of the data files that are read or written use a list format. The exception is
the very first data file that is processed in the “Quality Control” step. This input data file
has several requirements:
Example data format for the initial data file (e.g. used in the “Quality Control” step):
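As an illustration only, the sketch below reads a station file laid out like the record quoted in the Known bugs section, that is, six space-separated fields per line with -99.9 marking missing values. The column order assumed here (year, month, day, precipitation, maximum temperature, minimum temperature), the function name read_station, and the sample values are assumptions of this sketch.

```r
# Hypothetical reader for a station file such as "21946.txt".
# Assumed column order: year, month, day, precipitation (mm),
# maximum temperature, minimum temperature; -99.9 marks missing values.
read_station <- function(file) {
  read.table(file, header = FALSE,
             col.names = c("year", "month", "day", "prcp", "tmax", "tmin"),
             na.strings = "-99.9")
}

# Made-up sample records written to a temporary file for illustration.
sample_lines <- c("1961 1 1 0.0 -3.1 -6.8",
                  "1961 1 2 2.5 -99.9 -7.2",
                  "1961 1 3 -99.9 1.4 -4.0")
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)
station <- read_station(tmp)
```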
APPENDIX C: Indices definitions
Definitions for indicators listed in Appendix A. For practical reasons, in this version of
the software, not all indices are calculated on a monthly basis. Monthly indices are
calculated if no more than 3 days are missing in a month, while annual values are
calculated if no more than 15 days are missing in a year. No annual value will be
calculated if any one month’s data are missing. For threshold indices, a threshold is
calculated if at least 70% of data are present. For spell duration indicators (marked with
a *), a spell can continue into the next year and is counted against the year in which the
spell ends, e.g. a cold spell (CSDI) in the Northern Hemisphere beginning on 31st
December 2000 and ending on 6th January 2001 is counted towards the total number of
cold spells in 2001.
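The monthly and annual missing-data limits can be checked as sketched below. The series is made up, every calendar day is assumed to be present as a row with missing observations set to NA, and the additional rule about a whole missing month is not covered.

```r
# Made-up daily series for one year; every calendar day is present as a row
# and missing observations are NA (assumptions of this sketch).
dates <- seq(as.Date("1961-01-01"), as.Date("1961-12-31"), by = "day")
year  <- as.integer(format(dates, "%Y"))
month <- as.integer(format(dates, "%m"))
value <- rep(1, length(dates))
value[which(month == 2)[1:4]] <- NA      # four missing days in February

# A month is usable with at most 3 missing days, a year with at most 15.
missing_by_month <- tapply(is.na(value), list(year = year, month = month), sum)
missing_by_year  <- tapply(is.na(value), year, sum)
month_ok <- missing_by_month <= 3
year_ok  <- missing_by_year <= 15
```

Here February 1961 fails the monthly limit, while 1961 as a whole still passes the annual limit of 15 missing days.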
1. FD0
Let $TN_{ij}$ be the daily minimum temperature on day $i$ in period $j$. Count the number of
days where:
$TN_{ij} < 0^{\circ}\mathrm{C}$
2. SU25
Let $TX_{ij}$ be the daily maximum temperature on day $i$ in period $j$. Count the number of
days where:
$TX_{ij} > 25^{\circ}\mathrm{C}$
3. ID0
Let $TX_{ij}$ be the daily maximum temperature on day $i$ in period $j$. Count the number of
days where:
$TX_{ij} < 0^{\circ}\mathrm{C}$
4. TR20
Let $TN_{ij}$ be the daily minimum temperature on day $i$ in period $j$. Count the number of
days where:
$TN_{ij} > 20^{\circ}\mathrm{C}$
5. GSL
Let $TG_{ij}$ be the mean temperature on day $i$ in period $j$. Count the number of days between
the first occurrence of at least 6 consecutive days with:
$TG_{ij} > 5^{\circ}\mathrm{C}$
and the first occurrence after 1st July (1st January in SH) of at least 6 consecutive days
with:
$TG_{ij} < 5^{\circ}\mathrm{C}$
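The four fixed-threshold counts (FD0, SU25, ID0, TR20) reduce to counting days per year that satisfy a comparison. The sketch below uses made-up data and a hypothetical helper count_days; replacing 0, 25, or 20 with another value gives the user defined variants.

```r
# Count the days per year that meet a condition; NA days are ignored.
count_days <- function(condition, year) {
  tapply(condition, year, sum, na.rm = TRUE)
}

# Made-up daily values for two short "years".
year <- rep(c(2000, 2001), each = 5)
tmax <- c(26, 24, -1, 30, 25, 27, 28, NA, -2, -3)
tmin <- c(21, 10, -5, 22, 0, -1, 19, 20, -8, NA)

fd0  <- count_days(tmin < 0, year)    # frost days
su25 <- count_days(tmax > 25, year)   # summer days
id0  <- count_days(tmax < 0, year)    # ice days
tr20 <- count_days(tmin > 20, year)   # tropical nights
```

Because the comparisons are strict, a day with a minimum of exactly 0 or 20, or a maximum of exactly 25, is not counted.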
6. TXx
Let $TX_{kj}$ be the daily maximum temperatures in month $k$, period $j$. The maximum daily
maximum temperature each month is then:
$TXx_{kj} = \max(TX_{kj})$
7. TNx
Let $TN_{kj}$ be the daily minimum temperatures in month $k$, period $j$. The maximum daily
minimum temperature each month is then:
$TNx_{kj} = \max(TN_{kj})$
8. TXn
Let $TX_{kj}$ be the daily maximum temperatures in month $k$, period $j$. The minimum daily
maximum temperature each month is then:
$TXn_{kj} = \min(TX_{kj})$
9. TNn
Let $TN_{kj}$ be the daily minimum temperatures in month $k$, period $j$. The minimum daily
minimum temperature each month is then:
$TNn_{kj} = \min(TN_{kj})$
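The four monthly extremes differ only in the variable and in the use of max or min, so one grouped summary covers them all. A minimal sketch with made-up data and a hypothetical helper monthly_stat:

```r
# Apply max or min within each year and month; NA days are ignored.
monthly_stat <- function(x, year, month, FUN) {
  tapply(x, list(year = year, month = month), FUN, na.rm = TRUE)
}

# Made-up values for two short "months".
year  <- rep(2000, 6)
month <- c(1, 1, 1, 2, 2, 2)
tmax  <- c(3.0, 5.5, NA, 8.0, 7.5, 9.0)
tmin  <- c(-4.0, -1.5, -2.0, 0.5, -0.5, 1.0)

txx <- monthly_stat(tmax, year, month, max)   # TXx
tnx <- monthly_stat(tmin, year, month, max)   # TNx
txn <- monthly_stat(tmax, year, month, min)   # TXn
tnn <- monthly_stat(tmin, year, month, min)   # TNn
```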
10. TN10p
Let $TN_{ij}$ be the daily minimum temperature on day $i$ in period $j$ and let $TN_{in}10$ be the
calendar day 10th percentile centred on a 5-day window (calculated using the method from
Appendix D). The percentage of time is determined where:
$TN_{ij} < TN_{in}10$
11. TX10p
Let $TX_{ij}$ be the daily maximum temperature on day $i$ in period $j$ and let $TX_{in}10$ be the
calendar day 10th percentile centred on a 5-day window (calculated using the method from
Appendix D). The percentage of time is determined where:
$TX_{ij} < TX_{in}10$
12. TN90p
Let $TN_{ij}$ be the daily minimum temperature on day $i$ in period $j$ and let $TN_{in}90$ be the
calendar day 90th percentile centred on a 5-day window (calculated using the method from
Appendix D). The percentage of time is determined where:
$TN_{ij} > TN_{in}90$
13. TX90p
Let $TX_{ij}$ be the daily maximum temperature on day $i$ in period $j$ and let $TX_{in}90$ be the
calendar day 90th percentile centred on a 5-day window (calculated using the method from
Appendix D). The percentage of time is determined where:
$TX_{ij} > TX_{in}90$
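A simplified sketch of the calendar-day threshold and the resulting percentage is given below. It is not the RClimDex implementation: leap days are assumed to have been dropped so that each year has 365 columns, the 5-day window wraps within the same year at the year ends, and the base-period bootstrap of Appendix D is left out, so the result is only appropriate for years outside the base period. The function names are hypothetical. R's quantile() with type = 8 is the median-unbiased estimator of Hyndman and Fan (1996).

```r
# temps: matrix with one row per year and 365 columns (leap days dropped).
# base_rows selects the base-period years; p is the percentile as a fraction.
calendar_day_threshold <- function(temps, base_rows, p) {
  n_days <- ncol(temps)
  sapply(seq_len(n_days), function(d) {
    window <- ((d - 3):(d + 1)) %% n_days + 1   # days d-2 .. d+2, wrapping
    quantile(temps[base_rows, window], probs = p, type = 8,
             na.rm = TRUE, names = FALSE)
  })
}

# Percentage of days in each year that fall below the calendar-day threshold.
percent_below <- function(temps, threshold) {
  below <- sweep(temps, 2, threshold, FUN = "<")
  100 * rowMeans(below, na.rm = TRUE)
}

# Made-up data: 40 years, the first 30 of which form the base period.
set.seed(1)
temps <- matrix(rnorm(40 * 365), nrow = 40)
thr   <- calendar_day_threshold(temps, base_rows = 1:30, p = 0.10)
tn10p <- percent_below(temps, thr)
```

For the 90th percentile indices the comparison in percent_below would be ">" and p would be 0.90.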
14. WSDI*
Let $TX_{ij}$ be the daily maximum temperature on day $i$ in period $j$ and let $TX_{in}90$ be the
calendar day 90th percentile centred on a 5-day window (calculated using the method from
Appendix D). Then the number of days per period is summed where, in intervals of at
least 6 consecutive days:
$TX_{ij} > TX_{in}90$
15. CSDI*
Let $TN_{ij}$ be the daily minimum temperature on day $i$ in period $j$ and let $TN_{in}10$ be the
calendar day 10th percentile centred on a 5-day window (calculated using the method
from Appendix D). Then the number of days per period is summed where, in intervals of
at least 6 consecutive days:
$TN_{ij} < TN_{in}10$
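Counting the days that belong to spells of at least 6 consecutive days is a run-length problem. The sketch below works on a logical vector that already marks the days beyond the percentile threshold; the helper name spell_days and the data are made up, a missing day is assumed to break a spell, and spells that cross a year boundary are not treated specially.

```r
# Total number of days that belong to runs of at least min_len consecutive
# TRUE values.
spell_days <- function(exceeds, min_len = 6) {
  exceeds[is.na(exceeds)] <- FALSE          # a missing day breaks a spell
  runs <- rle(exceeds)
  sum(runs$lengths[runs$values & runs$lengths >= min_len])
}

# Made-up exceedance flags: runs of 7, 5, and 6 days, then a gap and one day.
warm <- c(rep(TRUE, 7), FALSE, rep(TRUE, 5), FALSE, rep(TRUE, 6), NA, TRUE)
wsdi <- spell_days(warm)
```

Only the runs of 7 and 6 days qualify, so the run of 5 days and the isolated day after the gap do not contribute.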
16. DTR
Let $TX_{ij}$ and $TN_{ij}$ be the daily maximum and minimum temperature respectively on day $i$
in period $j$. If $I$ represents the number of days in $j$, then:
$DTR_j = \frac{\sum_{i=1}^{I} (TX_{ij} - TN_{ij})}{I}$
17. RX1day
Let $RR_{ij}$ be the daily precipitation amount on day $i$ in period $j$. Then maximum 1-day
values for period $j$ are:
$RX1day_j = \max(RR_{ij})$
18. Rx5day
Let $RR_{kj}$ be the precipitation amount for the 5-day interval ending $k$, period $j$. Then
maximum 5-day values for period $j$ are:
$Rx5day_j = \max(RR_{kj})$
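For a single period, DTR, RX1day, and Rx5day can be computed as sketched below. The data are made up and cover one short period; in practice the same calculation would be applied month by month.

```r
# Made-up daily values for one short period.
tmax <- c(10, 12, 11, 9, 14, 13, 8)
tmin <- c(2, 5, 4, 1, 6, 7, 3)
prcp <- c(0, 12, 3, 0, 20, 5, 1)

dtr    <- mean(tmax - tmin, na.rm = TRUE)   # mean daily temperature range
rx1day <- max(prcp, na.rm = TRUE)           # largest 1-day amount

# Rolling 5-day totals: element k is the sum of days k-4 .. k, so the first
# four elements are NA.
five_day <- stats::filter(prcp, rep(1, 5), sides = 1)
rx5day   <- max(five_day, na.rm = TRUE)     # largest 5-day amount
```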
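19. SDII
Let $RR_{wj}$ be the daily precipitation amount on wet days $w$ ($RR \ge 1$ mm) in period $j$. If
$W$ represents the number of wet days in $j$, then:
$SDII_j = \frac{\sum_{w=1}^{W} RR_{wj}}{W}$
20. R10
Let $RR_{ij}$ be the daily precipitation amount on day $i$ in period $j$. Count the number of
days where:
$RR_{ij} \ge 10$ mm
21. R20
Let $RR_{ij}$ be the daily precipitation amount on day $i$ in period $j$. Count the number of
days where:
$RR_{ij} \ge 20$ mm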
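A minimal sketch of SDII, R10, and R20 for one period, using made-up precipitation values and the wet-day definition from Appendix A (PRCP >= 1.0 mm):

```r
# Made-up daily precipitation (mm) for one period.
prcp <- c(0, 0.5, 12, 25, 1, NA, 10, 3.5)

wet  <- !is.na(prcp) & prcp >= 1           # wet days
sdii <- sum(prcp[wet]) / sum(wet)          # mean amount on wet days
r10  <- sum(prcp >= 10, na.rm = TRUE)      # days with at least 10 mm
r20  <- sum(prcp >= 20, na.rm = TRUE)      # days with at least 20 mm
```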
22. Rnn
Let $RR_{ij}$ be the daily precipitation amount on day $i$ in period $j$. If $nn$ represents any
reasonable daily precipitation value then, count the number of days where:
$RR_{ij} \ge nn$ mm
23. CDD*
Let $RR_{ij}$ be the daily precipitation amount on day $i$ in period $j$. Count the largest number
of consecutive days where:
$RR_{ij} < 1$ mm
24. CWD*
Let $RR_{ij}$ be the daily precipitation amount on day $i$ in period $j$. Count the largest number
of consecutive days where:
$RR_{ij} \ge 1$ mm
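Rnn is a simple count, while CDD and CWD need the longest run of days satisfying a condition. The sketch below uses made-up data and hypothetical helper names; a missing day is assumed to break a run, and runs crossing a year boundary are not treated specially.

```r
# Length of the longest run of consecutive TRUE values.
longest_run <- function(condition) {
  condition[is.na(condition)] <- FALSE   # a missing day breaks the run
  runs <- rle(condition)
  if (any(runs$values)) max(runs$lengths[runs$values]) else 0L
}

# Number of days with at least nn mm of precipitation.
rnn <- function(prcp, nn) sum(prcp >= nn, na.rm = TRUE)

# Made-up daily precipitation (mm).
prcp <- c(0, 0, 0.4, 5, 2, 1, 0, NA, 0, 0, 0, 0, 12)
cdd  <- longest_run(prcp < 1)    # consecutive dry days
cwd  <- longest_run(prcp >= 1)   # consecutive wet days
r5   <- rnn(prcp, 5)             # Rnn with nn = 5
```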
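25. R95pTOT
Let $RR_{wj}$ be the daily precipitation amount on a wet day $w$ ($RR \ge 1$ mm) in period $j$ and
let $RR_{wn}95$ be the 95th percentile of precipitation on wet days in the base period. If $W$
represents the number of wet days in the period, then:
$R95p_j = \sum_{w=1}^{W} RR_{wj}$ where $RR_{wj} > RR_{wn}95$
26. R99p
Let $RR_{wj}$ be the daily precipitation amount on a wet day $w$ ($RR \ge 1$ mm) in period $j$ and
let $RR_{wn}99$ be the 99th percentile of precipitation on wet days in the base period. If $W$
represents the number of wet days in the period, then:
$R99p_j = \sum_{w=1}^{W} RR_{wj}$ where $RR_{wj} > RR_{wn}99$
27. PRCPTOT
Let $RR_{wj}$ be the daily precipitation amount on a wet day $w$ ($RR \ge 1$ mm) in period $j$. If
$W$ represents the number of wet days in $j$, then:
$PRCPTOT_j = \sum_{w=1}^{W} RR_{wj}$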
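The percentile-based precipitation totals listed in Appendix A (R95p, R99p) and PRCPTOT can be sketched as follows. The data are made up, and the use of R's default quantile method for the wet-day percentile is an assumption, since this manual does not say which estimator is applied to precipitation.

```r
# Made-up base-period precipitation (mm); only wet days (>= 1 mm) are used
# for the percentile.
base_prcp <- c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 40, 0.5)
wet_base  <- base_prcp[!is.na(base_prcp) & base_prcp >= 1]
p95 <- quantile(wet_base, 0.95, names = FALSE)   # default method: an assumption

# Made-up precipitation for the year being summarised.
year_prcp <- c(0, 3, 50, 1, 0.2, 45, 7)
wet <- !is.na(year_prcp) & year_prcp >= 1

prcptot <- sum(year_prcp[wet])                    # total on wet days
r95p    <- sum(year_prcp[wet & year_prcp > p95])  # total on very wet days
```

R99p follows by replacing 0.95 with 0.99.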
APPENDIX D: Threshold estimation and base period temperature indices calculation
Let $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$ be a sample of size $n$ sorted in ascending order.
Hyndman and Fan (1996) suggest a formula to obtain a median-unbiased estimate of the
quantile for probability $p$, $Q_p = (1-f) X_{(j)} + f X_{(j+1)}$, by letting
$j = \mathrm{int}(p \cdot n + (1+p)/3)$ and letting $f = p \cdot n + (1+p)/3 - j$, where
int(u) is the largest integer not greater than u. The empirical quantile is set to the smallest
or largest value in the sample when j<1 or j>n respectively. That is, quantile estimates
corresponding to p<1/(n+1) are set to the smallest value in the sample, and those
corresponding to p>n/(n+1) are set to the largest value in the sample.
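The quantile rule can be written out directly and compared with R's built-in estimator. The sketch below assumes the interpolation form Q = (1 - f) X(j) + f X(j+1) with j = int(p n + (1 + p)/3), which is the median-unbiased definition of Hyndman and Fan (1996) and corresponds to type = 8 in R's quantile(). The function name hf_quantile is hypothetical.

```r
# Median-unbiased empirical quantile, written out step by step.
hf_quantile <- function(x, p) {
  xs <- sort(x)                 # ascending order; sort() drops NA values
  n  <- length(xs)
  u  <- p * n + (1 + p) / 3
  j  <- floor(u)                # int(u)
  f  <- u - j
  if (j < 1) return(xs[1])      # below the range: smallest value
  if (j >= n) return(xs[n])     # no X(j+1) exists: largest value
  (1 - f) * xs[j] + f * xs[j + 1]
}

# Made-up sample of 30 values, e.g. one value per base-period year.
set.seed(42)
x <- rnorm(30)
q10 <- hf_quantile(x, 0.10)
q90 <- hf_quantile(x, 0.90)
```

With this sample, hf_quantile(x, p) and quantile(x, p, type = 8, names = FALSE) should agree for any p between 0 and 1.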
Bootstrap procedure for the estimation of exceedance rate for the base period:
It is not possible to make an exact estimate of the thresholds due to sampling uncertainty.
To provide temporally consistent estimates of the exceedance rate throughout the base period
and the out-of-base period, we adopt the following procedure (Zhang et al. 2004) to estimate
the exceedance rate for the base period.
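The idea of the bootstrap can be sketched as follows, based on our reading of Zhang et al. (2004) rather than on the RClimDex source: each base-period year in turn is treated as out-of-base, the threshold is estimated from the remaining years with one of them repeated so that the block keeps its full length, and the exceedance rates obtained from all possible repeated years are averaged. The sketch is simplified in that each row holds the values of one year for a single pooled window; the function name and data are made up.

```r
# base: matrix with one row per base-period year; columns hold the values
# compared against the threshold (a single pooled window in this sketch).
bootstrap_exceedance <- function(base, p, upper = TRUE) {
  n_years <- nrow(base)
  sapply(seq_len(n_years), function(i) {
    rest <- setdiff(seq_len(n_years), i)          # years other than year i
    rates <- sapply(rest, function(k) {
      block <- base[c(rest, k), , drop = FALSE]   # remaining years, year k repeated
      thr <- quantile(block, probs = p, type = 8, na.rm = TRUE, names = FALSE)
      hit <- if (upper) base[i, ] > thr else base[i, ] < thr
      mean(hit, na.rm = TRUE)
    })
    100 * mean(rates)                             # average over all estimates
  })
}

# Made-up data: 30 base-period years with 5 values each.
set.seed(7)
base <- matrix(rnorm(30 * 5), nrow = 30)
rate90 <- bootstrap_exceedance(base, p = 0.90)
```

The result is one exceedance rate (in percent) per base-period year, each computed without using that year's own data in the threshold.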
Reference:
Hyndman, R.J., and Y. Fan, 1996: Sample quantiles in statistical packages. The American
Statistician, 50, 361-367.
Zhang, X., G. Hegerl, F.W. Zwiers, and J. Kenyon, 2004: Avoiding inhomogeneity in
percentile-based indices of temperature extremes. J. Climate, submitted.