Agri 225 Agricultural Meteorological Data

Uploaded by

Ramos, Keith A.

CHAPTER 3

AGRICULTURAL METEOROLOGICAL DATA,


THEIR PRESENTATION AND STATISTICAL ANALYSIS

3.1 INTRODUCTION

Agricultural meteorology is the science that applies knowledge of weather and climate to qualitative and quantitative improvement in agricultural production. Agricultural meteorology involves meteorology, hydrology, agrology and biology, and it requires a diverse, multidisciplinary array of data for operational applications and research.

Basic agricultural meteorological data are largely the same as those used in general meteorology. These data need to be supplemented with more specific data relating to the biosphere, the environment of all living organisms, and biological data relating to the growth and development of these organisms. Agronomic, phenological and physiological data are necessary for dynamic modelling, operational evaluation and statistical analyses. Most data need to be processed for generating various products that affect agricultural management decisions in matters such as cropping, the scheduling of irrigation, and so forth. Additional support from other technologies, such as geographical information and remote-sensing, as well as statistics, is necessary for data processing. Geographical information and remote-sensing data, such as images of the status of vegetation and crops damaged by disasters, soil moisture, and the like, should also be included as supplementary data. Derived agrometeorological parameters, such as photosynthetically active radiation and potential evapotranspiration, are often used in agricultural meteorology for both research and operational purposes. On the other hand, many agrometeorological indices, such as the drought index, the critical point threshold of temperature and soil water for crop development, are also important for agricultural operations. Weather and climate data play a crucial role in many agricultural decisions.

Agrometeorological information includes not only every stage of growth and development of crops, floriculture, agroforestry and livestock, but also the technological factors that affect agriculture, such as irrigation, plant protection, fumigation and dust spraying. Moreover, agricultural meteorological information plays a crucial role in the decision-making process for sustainable agriculture and natural disaster reduction, with a view to preserving natural resources and improving the quality of life.

3.2 DATA FOR AGRICULTURAL METEOROLOGY

Agrometeorological data are usually provided to users in a transformed format; for example, rainfall data are presented in pentads or in monthly amounts.

3.2.1 Nature of the data

Basic agricultural meteorological data may be divided into the following seven categories, which include data observed by instruments on the ground and by remote-sensing.
(a) Data relating to the state of the atmospheric environment. These include observations of rainfall, sunshine, solar radiation, air temperature, humidity, and wind speed and direction;
(b) Data relating to the state of the soil environment. These include observations of soil moisture, that is, the soil water reservoir for plant growth and development. The amount of water available depends on the effectiveness of precipitation or irrigation, and on the soil’s physical properties and depth. The rate of water loss from the soil depends on the climate, the soil’s physical properties, and the root system of the plant community. Erosion by wind and water depends on weather factors and vegetative cover;
(c) Data relating to organism response to varying environments. These involve agricultural crops and livestock, their variety, and the state and stages of their growth and development, as well as the pathogenic elements affecting them. Biological data are associated with phenological growth stages and physiological growth functions of living organisms;
(d) Information concerned with the agricultural practices employed. Planning brings the best available resources and applicable production technologies together into an operational farm unit. Each farm is a unique entity with combinations of climate, soils, crops, livestock and equipment to manage and operate within the farming system. The most efficient utilization of weather and climate data for the unique soils on a farm unit will help conserve natural resources, while at the same time promoting economic benefit to the farmer;
3–2 GUIDE TO AGRICULTURAL METEOROLOGICAL PRACTICES

(e) Information relating to weather disasters and their influence on agriculture;
(f) Information relating to the distribution of weather and agricultural crops, and geographical information, including digital maps;
(g) Metadata that describe the observation techniques and procedures used.

3.2.2 Data collection

The collection of data is very important as it lays the foundation for agricultural weather and climate data systems that are necessary to expedite the generation of products, analyses and forecasts for agricultural cropping decisions, irrigation management, fire weather management, and ecosystem conservation. The impact on crops, livestock, water and soil resources, and forestry must be evaluated from the best available spatial and temporal array of parameters. Agrometeorology is an interdisciplinary branch of science requiring the combination of general meteorological data observations and specific biological parameters. Meteorological data can be viewed as typically physical elements that may be measured with relatively high accuracy, while other types of observations (namely, biological or phenological) may be more subjective. In collecting, managing and analysing the data for agrometeorological purposes, the source of data and the methods of observation define their character and management criteria. Some useful suggestions with regard to the storage and processing of data can be offered, however:
(a) Original data files, which may be used for reference purposes (the daily register of observations, and so on), should be stored at the observation site; this applies equally to atmospheric, biological, crop and soil data;
(b) The most frequently used data should be collected at national or regional agrometeorological centres and reside in host servers for network accessibility. This may not always be practical, however, since stations or laboratories under the control of different authorities (meteorological services, agricultural services, universities, research institutes) often collect unique agrometeorological data. Steps should therefore be taken to ensure that possible users are aware of the existence of such data, either through some form of data library or computerized documentation, and that appropriate data exchange mechanisms are available to access and share these data;
(c) Data resulting from special studies should be stored at the place where the research work is undertaken, but it would be advantageous to arrange for exchanges of data among centres carrying out similar research work. At the same time, the existence of these data should be publicized at the national level and possibly at the international level, if appropriate, especially in the case of longer series of special observations;
(d) All the usual data storage media are recommended:
    (i) The original data records, or agrometeorological summaries, are often the most convenient format for the observing stations;
    (ii) The format of data summaries intended for forwarding to regional or national centres, or for dissemination to the user community, should be designed so that the data may be easily transferred to a variety of media for processing. The format should also facilitate either the manual preparation or automated processing of statistical summaries (computation of means, frequencies, and the like). At the same time, access to and retrieval of data files should be simple, flexible and reproducible for assessment, modelling or research purposes;
    (iii) Rapid advances in electronic technology facilitate effective exchange of data files, summaries and charts of recording instruments, particularly at the national and international levels;
    (iv) Agrometeorological data should be transferred to electronic media in the same way as conventional climatological data, with an emphasis on automatic processing.

The availability of proper agricultural meteorological databases is a major prerequisite for studying and managing the processes of agricultural and forest production. The agricultural meteorology community has great interest in incorporating new information technologies into a systematic design for agrometeorological management to ensure timely and reliable data from national reporting networks for the benefit of the local farming community. While much more information has become available to the agricultural user, it is essential that appropriate standards be maintained for basic instrumentation, collection and observations, quality control, and archiving and dissemination. After they have been recorded, collected and transferred to the data centres, all agricultural meteorological data need to be standardized or technically treated so that they can be used for various purposes. The data centres need to maintain special databases. These databases should include meteorological, phenological,
edaphic and agronomic information. Database management and processing and the quality control, archiving, timely accessing and dissemination of data are all important components that render the information valuable and useful in agricultural research and operational programmes.

After they have been stored in a data centre, the data are disseminated to users. There have been major advancements in making more data products available to the user community through automation. The introduction of electronic transfer of data files via the Internet using the file transfer protocol (FTP) and the World Wide Web (WWW) has brought this information transfer process up to a new level. The Web allows users to access text, images and even sound files that can be linked together electronically. The Web’s attributes include the flexibility to handle a wide range of data presentation methods and the capability to reach a large audience. Developing countries have some access to this type of electronic information, but limitations still exist in the development of their own electronically accessible databases. These limitations will diminish as the cost of technology decreases and its availability increases.

3.2.3 Recording of data

Recording of basic data is the first step for agricultural meteorological data collection. When the environmental factors and other agricultural meteorological elements are measured or observed, they must be recorded on the same media, such as agricultural meteorological registers, diskettes, and the like, manually or automatically.
(a) The data, such as the daily register of observations and charts of recording instruments, should be carefully preserved as permanent records. They should be readily identifiable and include the place, date and time of each observation, and the units used.
(b) These basic data should be sent to analysis centres for operational uses, such as local agricultural weather forecasts, agricultural meteorological information services, plant protection treatment and irrigation guidance. Summaries (weekly, 10-day or monthly) of these data should be made regularly from the daily register of observations according to the user demand and then distributed to interested agencies and users.
(c) Observers need to record all measurements in compliance with rules for harmonization. This will ensure that the data are recorded in a standard format so that they can readily be transferred to data centres for automatic processing. Data can be transferred in several ways, including by mail, telephone, telegraph, fax and Internet, and via Comsat; transmission via the Internet and Comsat is more efficient. After reaching the data centres, data should be identified and processed by means of a special program in order to facilitate their dissemination to other users.

3.2.4 Scrutiny of data and acquisition of metadata

It is very important that all agricultural meteorological data be carefully scrutinized, both at the observing station and at regional or national centres, by means of subsequent automatic computer processing. All data should be identified immediately. The code parameters should be specified, such as types, regions, missing values and possible ranges for different measurements. The quality control should be done according to Wijngaard et al. (2003), WMO-TD No. 1236 (WMO, 2004a) and the current Guide to Climatological Practices (WMO, 1983). Every measurement code must be checked to make certain that the measurement is reasonable. If the value is unreasonable, it should be corrected immediately. After being scrutinized, the data can be processed further for different purposes. In order to ascertain the quality of observation data and determine whether to correct or normalize them before analysis, metadata are needed. These are the details and history of local conditions, and instrumentation, operational, data-processing and other factors relevant to the observation process. Such metadata should be documented and treated with the same care as the data themselves (see WMO 2003a, 2003b). Unfortunately, observation metadata are often incomplete and poorly organized.

In Chapter 2 of this Guide, essential metadata are specified for individual parameters and the organization of their acquisition is reviewed in 2.2.5. Many kinds of metadata can be recorded as simple numbers, as is the case with observation heights, for example; but more complex aspects, such as instrument exposure, must also be recorded in a manner that is practicable for the observers and station managers. Acquiring metadata on present observations and inquiring about metadata on past observations are now a major responsibility of data managers. Omission of metadata acquisition implies that the data will have low quality for applications. The optimal set-up of a database for metadata is at present still in development, because metadata characteristics are so variable. To be manageable, the optimal database should not only be efficient for archiving, but also easily accessible for those who are recording the metadata. To allow for future
improvement and continuing accessibility, good metadata database formats are ASCII, SQL and XML, because they are independent of any presently available computing set-up.

3.2.5 Format of data

The basic data obtained from observing stations, whether specialized or not, are of interest to both scientists and agricultural users. A number of established formats and protocols are available for the exchange of data. A data format is a documented set of rules for the coding of data in a form for both visual and computer recognition. A format can be designed for real-time use, for historical or archival data transfer, or for both. All the critical elements for identification of data should be covered in the coding, including station identifiers, parameter descriptors, time encoding conventions, unit and scale conventions, and common fields.

Large amounts of data are typically required for processing, analysis and dissemination. It is extremely important that data are in a format that is both easily accessible and user-friendly. This is particularly pertinent as more and more data become available in electronic format. Some types of software, such as NetCDF (network common data form), process data in a common form and disseminate them to more users. NetCDF consists of software for array-oriented data access and a library that provides for implementation of the interface (Sivakumar et al., 2000). The NetCDF software was developed at the Unidata Program Center in Boulder, Colorado, United States. This is an open-source collection of tools that can be obtained by anonymous FTP from ftp://ftp.unidata.ucar.edu/pub/netcdf/ or from other mirror sites.

The NetCDF software package supports the creation, access and sharing of scientific data. It is particularly useful at sites with a mixture of computers connected by a network. Data stored on one computer may be read directly from another without explicit conversion. The NetCDF library generalizes access to scientific data so that the methods for storing and accessing data are independent of the computer architecture and the applications being used. Standardized data access facilitates the sharing of data. Since the NetCDF package is quite general, a wide variety of analysis and display applications can use it. The NetCDF software and documentation may be obtained from the NetCDF Website at http://www.unidata.ucar.edu/packages/netcdf/.

3.2.6 Catalogue of data

Very often, considerable amounts of agrometeorological data are collected by a variety of services. These data sources are not readily publicized or accessible to potential users, which means that users often have great difficulty in discovering whether such data exist. Coordination should therefore be undertaken at the global, regional and national levels to ensure that data catalogues are prepared periodically, while giving enough background information to users. The data catalogues should include the following information:
(a) The geographical location of each observing site;
(b) The nature of the data obtained;
(c) The location where the data are stored;
(d) The file types (for instance, manuscript, charts of recording instruments, automated weather station data, punched cards, magnetic tape, scanned data, computerized digital data);
(e) The methods of obtaining the data.

For a more extensive specification of these aspects, see Chapter 2, section 2.2.5.

3.3 DISTRIBUTION OF DATA

3.3.1 Requirements for research

In order to highlight the salient features of the influence of climatic factors on the growth and development of living things, scientists often have to process a large volume of basic data. These data might be supplied to scientists in the following forms:
(a) Reproductions of original documents (original records, charts of recording instruments) or periodic summaries;
(b) Datasets on a server or Website that are ready for processing into different categories, which can be read or viewed on a platform;
(c) Various kinds of satellite digital data and imagery for different regions and different times;
(d) Various basic databases, which can be viewed as reference for research.

3.3.2 Special requirements for agriculturists

Two aspects of the periodic distribution of agrometeorological data to agricultural users may be considered:
(a) Raw or partially processed operational data supplied after only a short delay (rainfall,
potential evapotranspiration, water balance or sums of temperature). These may be distributed by means of:
    i. Periodic publications, twice weekly, weekly or at 10-day intervals;
    ii. Telephone and note;
    iii. Special television programmes from a regional television station;
    iv. Regional radio broadcasts;
    v. Release on agricultural or weather Websites.
(b) Agrometeorological or climatic summaries published weekly, every 10 days, monthly or annually, which contain agrometeorological data (rainfall, temperatures above the ground, soil temperature and moisture content, potential evapotranspiration, sums of rainfall and temperature, abnormal rainfall and temperature, sunshine, global solar radiation, and so on).

3.3.3 Determining the requirements of users

The agrometeorologist has a major responsibility to ensure that effective use of this information offers an opportunity to enhance agricultural efficiency or to assist agricultural decision-making. The information must be accessible, clear and relevant. It is crucial, however, for an agrometeorological service to know who the specific users of information are. The user community ranges from global, national and provincial organizations and governments to agro-industries, farmers, agricultural consultants, and the agricultural research and technology development communities or private individuals. The variety of agrometeorological information requests emanates from this broad community. Therefore, the agrometeorological service must distribute the information that is available and appropriate at the right time.

Researchers invariably know exactly which agrometeorological data they require for specific statistical analyses, modelling or other analytical studies. Often, many agricultural users are not just unaware of the actual scope of the agrometeorological services available, but also have only a vague idea of the data they really need. Frequent contact between agrometeorologists and professional agriculturists, and enquiries through professional associations and among agriculturists themselves, or visiting professional Websites, can help enormously to improve the awareness of data needs. Sivakumar (1998) presents a broad overview of user requirements for agrometeorological services. Better applications of the type and quantity of useful agrometeorological data available and the selection of the type of data to be systematically distributed can be established on that basis. For example, when both the climatic regions and the areas in which different crops are grown are well defined, an agrometeorological analysis can illustrate which crops are most suited to each climate zone. This type of analysis can also show which crops can be adapted to changing climatic and agronomic conditions. Agricultural users require these analyses; they can be distributed by geographic, crop or climatic region.

3.3.4 Minimum distribution of agroclimatological documents

Since the large number of potential users of agrometeorological information is so widely dispersed, it is not realistic to recommend a general distribution of data to all users. In fact, the requests for raw agrometeorological data are rare. Not all of the raw agrometeorological data available are essential for those persons who are directly engaged in agriculture – farmers, ranchers and foresters. Users generally require data to be processed into an understandable format to facilitate their decision-making process. But the complete datasets should be available and accessible to the technical services, agricultural administrations and professional organizations. These professionals are responsible for providing practical technical advice concerning the treatment and management of crops, preventive measures, adaptation strategies, and so forth, based on collected agrometeorological information.

Agrometeorological information should be distributed to all users, including:
(a) Agricultural administrations;
(b) Research institutions and laboratories;
(c) Professional organizations;
(d) Private crop and weather services;
(e) Government agencies;
(f) Farmers, ranchers and foresters.

3.4 DATABASE MANAGEMENT

The management of weather and climate data for agricultural applications in the electronic age has become more efficient. This section will provide an overview of agrometeorological data collection, data processing, quality control, archiving, data analysis and product generation, and product delivery. A wide variety of database choices are available to the agroclimatological user community. To accompany the agroclimatological databases that are created, agrometeorologists and software engineers develop the special software for agroclimatological database
management. Thus, a database management system for agricultural applications should be comprehensive, bearing in mind the following considerations:
(a) Communication among climatologists, agrometeorologists and agricultural extension personnel must be improved to establish an operational database;
(b) The outputs must be adapted for an operational database in order to support specific agrometeorological applications at a national/regional/global level;
(c) Applications must be linked to the Climate Applications Referral System (CARS) project, spatial interpolated databases and a Geographical Information System (GIS).

Personal computers (PCs) are able to provide products formatted for easy reading and presentation, which are generated through simple processors, databases or spreadsheet applications. Some careful thought needs to be given, however, to what type of product is needed, what the product looks like and what it contains, before the database delivery design is finalized. The greatest difficulty often encountered is how to treat missing data or information (WMO, 2004a). This process is even more complicated when data from several different datasets, such as climatic and agricultural data, are combined. Some software programs for database management, especially the software for climatic database management, provide convenient tools for agrometeorological database management.

3.4.1 CLICOM Database Management System

CLICOM (CLImate COMputing) refers to the WMO World Climate Data Programme Project, which is aimed at coordinating and assisting the implementation, maintenance and upgrading of automated climate data management procedures and systems in WMO Member countries (that is, the National Meteorological and Hydrological Services in these countries). The goal of CLICOM is the transfer of three main components of modern technology, namely, desktop computer hardware, database management software and training in climate data management. CLICOM is a standardized, automated database management system software for use on a personal computer and it is targeted at introduction of a system in developing countries. As of May 1996, CLICOM version 3.0 was installed in 127 WMO Member countries. Now CLICOM software is available in Czech, English, French, Spanish and Russian. CLICOM Version 3.1 Release 2 became available in January 2000.

CLICOM provides tools (such as stations, observations and instruments) to describe and manage the climatological network. It offers procedures for the key entry, checking and archiving of climate data, and for computing and analysing the data. Typical standard outputs include monthly or 10-day data from daily data; statistics such as means, maximums, minimums and standard deviations; and tables and graphs. Other products requiring more elaborate data processing include water balance monitoring, estimation of missing precipitation data, calculation of the return period and preparation of the CLIMAT message.

The CLICOM software is widely used in developing countries. The installation of CLICOM as a data management system in many of these countries has successfully transferred the technology for use with PCs, but the resulting climate data management improvements have not yet been fully realized. Station network density as recommended by WMO has not been fully achieved and the collection of data in many countries remains inadequate. CLICOM systems are beginning to yield positive results, however, and there is a growing recognition of the operational applications of CLICOM.

There are a number of constraints that have been identified over time and recognized for possible improvement in future versions of the CLICOM system. Among the technical limitations, the list includes (WMO, 2000):
(a) The lack of flexibility to implement specific applications in the agricultural field and/or at a regional/global level;
(b) The lack of functionality in real-time operations;
(c) Few options for file import;
(d) The lack of transparent linkages to other applications;
(e) The risk of overlapping of many datasets;
(f) A non-standard georeferencing system;
(g) Storage of climate data without the corresponding station information;
(h) The possibility of easy modification of the data entry module, which may destroy existing data.

3.4.2 Geographical Information System (GIS)

A Geographical Information System (GIS) is a computer-assisted system for the acquisition, storage, analysis and display of observed data on spatial distribution. GIS technology integrates common database operations such as query and statistical analysis with the unique visualization and geographic analysis benefits offered by mapping overlays. Maps have traditionally been used to explore the Earth and
its resources. GIS technology takes advantage of computer science technologies, enhancing the efficiency and analytical power of traditional methodologies.

GIS is becoming an essential tool in the effort to understand complex processes at different scales: local, regional and global. In GIS, the information coming from different disciplines and sources, such as traditional point sources, digital maps, databases and remote-sensing, can be combined in models that simulate the behaviour of complex systems.

The presentation of geographic elements is solved in two ways: using x, y coordinates (vectors), or representing the object as a variation of values in a geometric array (raster). The possibility of transforming the data from one format to the other allows fast interaction between different informative layers. Typical operations include overlaying different thematic maps; acquiring statistical information about the attributes; changing the legend, scale and projection of maps; and making three-dimensional perspective view plots using elevation data.

The capability to manage this diverse information, by analysing and processing the informative layers together, opens up new possibilities for the simulation of complex systems. GIS can be used to produce images – not only maps, but cartographic products, drawings, animations or interactive instruments as well. These products allow researchers to analyse their data in new ways, predicting the natural behaviours, explaining events and planning strategies.

For the agronomic and natural components in agrometeorology, these tools have taken the name Land Information Systems (LIS) (Sivakumar et al., 2000). In both GIS and LIS, the key components are the same, namely, hardware, software, data, techniques and technicians. LIS, however, requires detailed information on environmental elements, such as meteorological parameters, vegetation, soil and water. The final product of LIS is often the result of a combination of a large number of complex informative layers, whose precision is fundamental for the reliability of the whole system. Chapter 4 of this Guide contains an extensive overview of GIS.

developing future climate scenarios based on global climate model (GCM) simulations or subjectively introduced climate changes for climate change impact models. Weather generators project future changes in means (averages) onto the observed historical weather series by incorporating changes in variability; these projections are widely used for agricultural impact studies. Daily climate scenarios can be used to study potential changes in agroclimatic resources. Weather generators can calculate agroclimatic indices on the basis of historical climate data and GCM outputs. Various agroclimatic indices can be used to assess crop production potentials and to rate the climatic suitability of land for crops. A methodologically more consistent approach is to use a stochastic weather generator, instead of historical data, in conjunction with a crop simulation model. The stochastic weather generator allows temporal extrapolation of observed weather data for agricultural risk assessment and provides an expanded spatial source of weather data by interpolation between the point-based parameters used to define the weather generators. Interpolation procedures can create both spatial input data and spatial output data. The density of meteorological stations is often low, especially in developing countries, and reliable and complete long-term data are scarce. Daily interpolated surfaces of meteorological variables rarely exist. More commonly, weather generators can be used to generate the weather variables in grids that cover large geographic regions and come from interpolated surfaces of weekly or monthly climate variables. On the basis of these interpolated surfaces, daily weather data for crop simulation models are generated using statistical models that attempt to reproduce series of daily data with means and a variability similar to those that would be observed at a given location.

Weather generators have the capacity to simulate statistical properties of observed weather data for agricultural applications, including a set of agroclimatic indices. They are able to simulate temperature, precipitation and related statistics. Weather generators typically calculate daily precipitation risk and use this information to guide the generation of other weather variables, such as daily solar radiation, maximum and minimum temperature, and potential evapotranspiration. They can also simulate statistical properties of
daily weather series under a changing/changed
climate through modifications to the weather genera-
3.4.3 Weather generators (WG)
tor parameters with optimal use of available
Weather generators are widely used to generate information on climate change. For example, weather
synthetic weather data, which can be arbitrarily long generators can simulate the frequency distributions of
for input into impact models, such as crop models the wet and dry spells fairly well by modifying the
and hydrological models that are used for assessing four transition probabilities of the second-order
agroclimatic long-term risk and agrometeorological Markov chain. Weather generators are generally based
analysis. Weather generators are also the tool used for on the statistics. For example, to generate the amount

of precipitation on wet days, a two-parameter gamma distribution function is commonly used. The two parameters, α and β, are directly related to the average amount of precipitation per wet day. They can, therefore, be determined with the monthly means for the number of rainy days per month and the amount of precipitation per month, which are obtained either from compilations of climate normals or from interpolated surfaces.

The popular weather generators are, inter alia, WGEN (Richardson, 1984, 1985), SIMMETEO (Geng et al., 1986, 1988), and MARKSIM (Jones and Thornton, 1998, 2000). They include a first- or high-order Markov daily generator that requires long-term (at least 5 to 10 years) daily weather data or climate clusters of interpolated surfaces for estimation of their parameters. The software allows for three types of input to estimate parameters for the generator:
(a) Latitude and longitude;
(b) Latitude, longitude and elevation;
(c) Latitude, longitude, elevation and long-term monthly climate normals.

3.5 AGROMETEOROLOGICAL INFORMATION

The impacts of meteorological factors on crop growth and development are consecutive, although sometimes they do not emerge over a short time. The weather and climatological information should vary according to the kind of crop, its sensitivity to environmental factors, water requirements, and so on. Certain statistics are important, such as sequences of consecutive days when maximum and minimum temperatures or the amount of precipitation exceed or are less than certain critical threshold values, and the average and extreme dates when these threshold values are reached.

The following are some of the more frequent types of information that can be derived from the basic data:
(a) Air temperature
    i. Temperature probabilities;
    ii. Chilling hours;
    iii. Degree-days;
    iv. Hours or days above or below selected temperatures;
    v. Interdiurnal variability;
    vi. Maximum and minimum temperature statistics;
    vii. Growing season statistics, that is, dates when threshold temperature values for the growth of various kinds of crops begin and end.
(b) Precipitation
    i. Probability of a specified amount during a period;
    ii. Number of days with specified amounts of precipitation;
    iii. Probabilities of thundershowers;
    iv. Duration and amount of snow cover;
    v. Dates on which snow cover begins and ends;
    vi. Probability of extreme precipitation amounts.
(c) Wind
    i. Windrose;
    ii. Maximum wind, average wind speed;
    iii. Diurnal variation;
    iv. Hours of wind less than selected speed.
(d) Sky cover, sunshine, radiation
    i. Per cent possible sunshine;
    ii. Number of clear, partly cloudy, cloudy days;
    iii. Amounts of global and net radiation.
(e) Humidity
    i. Probability of a specified relative humidity;
    ii. Duration of a specified threshold of humidity.
(f) Free water evaporation
    i. Total amount;
    ii. Diurnal variation of evaporation;
    iii. Relative dryness of air;
    iv. Evapotranspiration.
(g) Dew
    i. Duration and amount of dew;
    ii. Diurnal variation of dew;
    iii. Association of dew with vegetative wetting;
    iv. Probability of dew formation based on the season.
(h) Soil temperature
    i. Mean and standard deviation at standard depth;
    ii. Depth of frost penetration;
    iii. Probability of occurrence of specified temperatures at standard depths;
    iv. Dates when threshold values of temperature (germination, vegetation) are reached.
(i) Weather hazards or extreme events
    i. Frost;
    ii. Cold wave;
    iii. Hail;
    iv. Heatwave;
    v. Drought;
    vi. Cyclones;
    vii. Flood;
    viii. Rare sunshine;
    ix. Waterlogging.

(j) Agrometeorological observations
    i. Soil moisture at regular depths;
    ii. Plant growth observations;
    iii. Plant population;
    iv. Phenological events;
    v. Leaf area index;
    vi. Above-ground biomass;
    vii. Crop canopy temperature;
    viii. Leaf temperature;
    ix. Crop root length.

3.5.1 Forecast information

Operational weather information is defined as real-time data that provide conditions of past weather (over the previous few days), present weather, as well as predicted weather. It is well known, however, that the forecast product deteriorates with time, so that the longer the forecast period, the less reliable the forecast. Forecasting of agriculturally important elements is discussed in Chapters 4 and 5.

3.6 STATISTICAL METHODS OF AGROMETEOROLOGICAL DATA ANALYSIS

The remarks set out here are intended to be supplementary to WMO-No. 100, Guide to Climatological Practices, Chapter 5, “The use of statistics in climatology”, and to WMO-No. 199, Some Methods of Climatological Analysis (WMO Technical Note No. 81), which contain advice generally appropriate and applicable to agricultural climatology.

Statistical analyses play an important role in agrometeorology, as they provide a means of interrelating series of data from diverse sources, namely biological data, soil and crop data, and atmospheric measurements. Because of the complexity and multiplicity of the effects of environmental factors on the growth and development of living organisms, and consequently on agricultural production, it is sometimes necessary to use rather sophisticated statistical methods to detect the interactions of these factors and their practical consequences.

It must not be forgotten that advice on long-term agricultural planning, selection of the most suitable farming enterprise, the provision of proper equipment and the introduction of protective measures against severe weather conditions all depend to some extent on the quality of the climatological analyses of the agroclimatic and related data, and hence, on the statistical methods on which these analyses are based. Another point that needs to be stressed is that one is often obliged to compare measurements of the physical environment with biological data, which are often difficult to quantify.

Once the agrometeorological data are stored in electronic form in a file or database, they can be analysed using public-domain or commercial statistical software. Some basic statistical analyses can be performed in widely available commercial spreadsheet software. More comprehensive basic and advanced statistical analyses generally require specialized statistical software. Basic statistical analyses include simple descriptive statistics, distribution fitting, correlation analysis, multiple linear regression, non-parametrics and enhanced graphic capabilities. Advanced software includes linear/non-linear models, time series and forecasting, and multivariate exploratory techniques such as cluster analysis, factor analysis, principal components and classification analysis, classification trees, canonical analysis and discriminant analysis. Commercial statistical software for PCs would be expected to provide a user-friendly interface with self-prompting analysis selection dialogues. Many software packages include electronic manuals that provide extensive explanations of analysis options with examples and comprehensive statistical advice.

Some commercial packages are rather expensive, but some free statistical analysis software can be downloaded from the Web or made available upon request. One example of freely available software is INSTAT, which was developed with applications in agrometeorology in mind. It is a general-purpose statistics package for PCs that was developed by the Statistical Service Centre of the University of Reading in the United Kingdom. It uses a simple command language to process and analyse data. The documentation and software can be downloaded from the Web. Data for analysis can be entered into a table or copied and pasted from the clipboard. If CLICOM is used as the database management software, then INSTAT, which was designed for use with CLICOM, can readily be used to extract the data and perform statistical analyses. INSTAT can be used to calculate simple descriptive statistics, including minimum and maximum values, range, mean, standard deviation, median, lower quartile, upper quartile, skewness and kurtosis. It can be used to calculate probabilities and percentiles for standard distributions, normal scores, t-tests and confidence intervals, chi-square tests, and non-parametric statistics. It can be used to plot data for regression and correlation analysis and analysis of time series. INSTAT is designed to provide a range of

climate analyses. It has commands for 10-day, monthly and yearly statistics. It calculates water balance from rainfall and evaporation, start of rains, degree-days, wind direction frequencies, spell lengths, potential evapotranspiration according to Penman, and the crop performance index according to methodology used by the Food and Agriculture Organization of the United Nations (FAO). The usefulness of INSTAT for agroclimatic analysis is illustrated in Sivakumar et al. (1993): the major part of the analysis reported here was carried out using INSTAT.

3.6.1 Series checks

Before selecting a series of values for statistical treatment, the series should be carefully examined for validity. The same checks should be applied to series of agrometeorological data as to conventional climatological data; in particular, the series should be checked for homogeneity and, if necessary, gaps should be filled in. It is assumed that the individual values will have been carefully checked beforehand (for consistency and coherence) in accordance with section 4.3 of the Guide to Climatological Practices (WMO-No. 100).

Availability of good metadata is essential during analysis of the homogeneity of a data series. For example, a large number of temperature and precipitation series were analysed for homogeneity (WMO, 2004b). Because some metadata are archived in the country where those observations were made, the research could show that at least two thirds of the homogeneity breaks in those series were not due to climate change, but rather to instrument relocations, including changes in observation height.

3.6.2 Climatic scales

In agriculture, perhaps more than in most economic activities, all scales of climate need to be considered (see 3.2.1):
(a) For the purpose of meeting national and regional requirements, studies on a macroclimatic scale are useful and may be based mainly on data from synoptic stations. For some atmospheric parameters with little spatial variation, for example, duration of sunshine over a week or 10-day period, such an analysis is found to be satisfactory;
(b) In order to plan the activities of an agricultural undertaking, or group of undertakings, it is essential, however, to change over to the mesoclimatic or topoclimatic scale, in other words, to take into account local geomorphological features and to use data from an observational network with a finer mesh. These complementary climatological series of data may be for much shorter periods than those used for macroclimatic analyses, provided that they can be related to some long reference series;
(c) For bioclimatic research, the physical environment should be studied at the level of the plant or animal, or the pathogenic colony itself. Obtaining information about radiation energy, moisture and chemical exchanges involves handling measurements on the much finer scale of microclimatology;
(d) For research on the impacts of a changing climate, past long-term historical and future climate scenarios should be used.

3.6.2.1 Reference periods

The length of the reference period for which the statistics are defined should be selected according to its suitability for each agricultural activity. Calendar periods of a month or a year are not, in general, suitable. It is often best either to use a reduced timescale or, alternatively, to combine several months in a way that will show the overall development of an agricultural activity. The following periods are thus suggested for reference purposes:
(a) Ten-day or weekly periods for operational statistical analyses, for instance, evapotranspiration, water balance, sums of temperature, frequency of occasions when a value exceeds or falls below a critical threshold value, and so forth. Data for the weekly period, which has the advantage of being universally adopted for all activities, are difficult to adjust for successive years, however;
(b) For certain agricultural activities, the periods should correspond to phenological stages or to the periods when certain operations are undertaken in crop cultivation. Thus, water balance, sums of temperature, sequences of days with precipitation or temperature below certain threshold values, and the like, could be analysed for:
    i. The mean growing season;
    ii. Periods corresponding to particularly critical phenological stages;
    iii. Periods during which crop cultivation, plant protection treatment or preventive measures are found to be necessary.

These suggestions, of course, imply a thorough knowledge of the normal calendar of agricultural activities in an area.

3.6.2.2 The beginning of reference periods

In agricultural meteorology, it is best to choose starting points corresponding to the biological

rhythms, since the arbitrary calendar periods (month, year) do not coincide with these. For example, in temperate zones, the starting point could be autumn (sowing of winter cereals) or spring (resumption of growth). In regions subject to monsoons or the seasonal movement of the intertropical convergence zone, it could be the onset of the rainy season. It could also be based on the evolution of a significant climatic factor considered to be representative of a biological cycle that is difficult to assess directly, for example, the summation of temperatures exceeding a threshold temperature necessary for growth.

3.6.2.3 Analysis of the effects of weather

The climatic elements do not act independently on the biological life cycle of living things: an analytical study of their individual effects is often illusory. Handling them all simultaneously, however, requires considerable data and complex statistical treatment. It is often better to try to combine several factors into single agroclimatic indices, considered as complex parameters, which can be compared more easily with biological data.

3.6.3 Population parameters and sample statistics

The two population characteristics μ and σ are called parameters of the population, while each of the sample characteristics, such as sample mean x̄ and sample standard deviation s, is called a sample statistic.

A sample statistic used to provide an estimate of a corresponding population parameter is called a point estimator. For example, x̄ may be used as an estimator of μ, the median may be used as an estimator of μ and s² may be used as an estimator of the population variance σ².

Any one of the statistics mean, median, mode and mid-interquartile range would seem to be suitable for use as an estimator of the population mean μ. In order to choose the best estimator of a parameter from a set of estimators, three important desirable properties should be considered. These are unbiasedness, efficiency and consistency.

3.6.4 Frequency distributions

When dealing with a large set of measured data, it is usually necessary to arrange it into a certain number of equal groupings, or classes, and to count the number of observations that fall into each class. The number of observations falling into a given class is called the frequency for that class. The number of classes chosen depends on the number of observations. As a rough guide, the number of classes should not exceed five times the logarithm (base 10) of the number of observations. Thus, for 100 observations or more, there should be a maximum of 10 classes. It is also important that adjacent groups do not overlap. Table 3.1 serves as the basis for Table 3.2, which displays the result of this operation as a grouped frequency table.

The table has columns showing limits that define classes and another column giving lower and upper class boundaries, which in turn give rise to class widths or class intervals. Another column gives the mid-marks of the classes, and yet another column gives the totals of the tally known as the group or class frequencies.

Another column contains entries that are known as the cumulative frequencies. They are obtained from the frequency column by entering the number of observations with values less than or equal to the value of the upper class boundary of that group.

The pattern of frequencies obtained by arranging data into classes is called the frequency

Table 3.1. Climatological series of annual rainfall (mm) for Mbabane, Swaziland (1930–1979)

Year       0       1       2       3       4       5       6       7       8       9
193-   1 063   1 237   1 495   1 160   1 513     912   1 495   1 769   1 319   2 080
194-   1 350   1 033   1 707   1 570   1 480   1 067   1 635   1 627   1 168   1 336
195-   1 102   1 195   1 307   1 118   1 262   1 585   1 199   1 306   1 220   1 328
196-   1 411   1 351   1 115   1 256   1 226   1 062   1 546   1 545   1 049   1 830
197-   1 018   1 690   1 800   1 528   1 285   1 727   1 704   1 741   1 667   1 260
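The point estimators described in 3.6.3 can be illustrated with the rainfall series of Table 3.1. The following is a minimal sketch, assuming Python and its standard `statistics` module; neither is part of the Guide. The function name `point_estimates` is hypothetical and chosen only for this illustration. The sample variance uses the n − 1 divisor.

```python
import statistics

# Annual rainfall (mm), 1930-1979, transcribed row by row from Table 3.1.
rainfall_mm = [
    1063, 1237, 1495, 1160, 1513, 912, 1495, 1769, 1319, 2080,
    1350, 1033, 1707, 1570, 1480, 1067, 1635, 1627, 1168, 1336,
    1102, 1195, 1307, 1118, 1262, 1585, 1199, 1306, 1220, 1328,
    1411, 1351, 1115, 1256, 1226, 1062, 1546, 1545, 1049, 1830,
    1018, 1690, 1800, 1528, 1285, 1727, 1704, 1741, 1667, 1260,
]


def point_estimates(values):
    """Return sample statistics used as point estimators of population parameters."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),          # x-bar, an estimator of mu
        "median": statistics.median(values),      # an alternative estimator of mu
        "variance": statistics.variance(values),  # s^2 (n - 1 divisor), an estimator of sigma^2
        "std_dev": statistics.stdev(values),      # s, the sample standard deviation
    }


estimates = point_estimates(rainfall_mm)
for name, value in estimates.items():
    print(f"{name}: {value:.1f}")
```

Comparing the mean with the median gives a quick indication of skewness in the series, which is relevant when choosing between the two as estimators of μ.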

distribution of the sample. The probability of finding an observation in a class can be obtained by dividing the frequency for the class by the total number of observations. A frequency distribution can be represented graphically with a two-dimensional histogram, where the heights of the columns in the graph are proportional to the class frequencies.

3.6.4.1 Examples using frequency distribution

The probability of an observation’s falling in class number five is 10/50 = 0.2, or 20 per cent. That is the same as saying that the probability of getting between 1 480 mm and 1 629 mm of rain in Mbabane is 20 per cent, or once in five years. The probability of getting no more than 1 779 mm of rain in Mbabane, that is, a value up to the top of class six, is 0.94, which is arrived at by dividing the cumulative frequency up to this point by 50, the total number of observations or frequencies. This kind of probability is also known as relative cumulative frequency, which is given as a percentage in column seven. From column seven, one can see that the probability of getting between 1 480 mm and 1 929 mm of rain is 98 per cent minus 58 per cent, or 40 per cent. Frequency distribution groupings have the disadvantage that certain information is lost when they are used, such as the highest observation in the highest frequency class.

3.6.4.1.1 Probability based on normal distributions

A normal distribution is a highly refined frequency distribution with an infinite number of very narrow classes. The histogram from this distribution has smoothed-out tops that make a continuous smooth curve, known as a normal or bell curve. A normal curve is symmetric about its centre, having a horizontal axis that runs indefinitely both to the left and to the right, with the tails of the curve tapering off towards the axis in both directions. The vertical axis is chosen in such a way that the total area under the curve is exactly 1 (one square unit). The central point on the axis beneath the normal curve is the mean μ and the set of data that produced it has a standard deviation σ. Any set of data that tends to give rise to a normal curve is said to be normally distributed. The normal distribution is completely characterized by its mean and standard deviation. Sample statistics are functions of observed values that are used to infer something about the population from which the values are drawn. The sample mean x̄ and sample variance s², for instance, can be used as estimates of population mean and population variance, respectively, provided the relationship between these sample statistics and the populations from which the samples are drawn is known. In general, the sampling distribution of means is less spread out than the parent population.

Table 3.2. Frequency distribution of annual precipitation for Mbabane, Swaziland (1930–1979)

(1)     (2)                  (3)                              (4)           (5)            (6)                       (7)
Group   Group boundaries     Group limits or class interval   Mid-mark xi   Frequency fi   Cumulative frequency Fi   Relative cumulative frequency (%)
1       879.5–1 029.5        880–1 029                          954.5        2              2                          4
2       1 029.5–1 179.5      1 030–1 179                      1 104.5        8             10                         20
3       1 179.5–1 329.5      1 180–1 329                      1 254.5       15             25                         50
4       1 329.5–1 479.5      1 330–1 479                      1 404.5        4             29                         58
5       1 479.5–1 629.5      1 480–1 629                      1 554.5       10             39                         78
6       1 629.5–1 779.5      1 630–1 779                      1 704.5        8             47                         94
7       1 779.5–1 929.5      1 780–1 929                      1 854.5        2             49                         98
8       1 929.5–2 079.5      1 930–2 079                      2 004.5        0             49                         98
9       2 079.5–2 229.5      2 080–2 229                      2 154.5        1             50                        100
Total                                                                       50              –                          –
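The cumulative and relative cumulative columns of Table 3.2, and the probabilities worked out in 3.6.4.1, can be checked with a short script. The following is a minimal sketch, assuming Python, which is not part of the Guide. The class frequencies are copied from column five of Table 3.2 rather than recounted from the raw series, and the helper names are hypothetical.

```python
from itertools import accumulate

# Class limits (mm) and class frequencies f_i, copied from Table 3.2.
class_limits = [
    (880, 1029), (1030, 1179), (1180, 1329), (1330, 1479), (1480, 1629),
    (1630, 1779), (1780, 1929), (1930, 2079), (2080, 2229),
]
frequencies = [2, 8, 15, 4, 10, 8, 2, 0, 1]

n_total = sum(frequencies)                  # total number of observations
cumulative = list(accumulate(frequencies))  # cumulative frequency F_i
relative_cumulative = [100 * F / n_total for F in cumulative]  # per cent


def class_probability(k):
    """Empirical probability of an observation falling in class k (1-based)."""
    return frequencies[k - 1] / n_total


def probability_up_to(k):
    """Empirical probability of a value at or below the upper limit of class k."""
    return cumulative[k - 1] / n_total


def probability_between(first, last):
    """Empirical probability of a value falling in classes first..last inclusive."""
    return sum(frequencies[first - 1:last]) / n_total


for (low, high), f, F, rel in zip(class_limits, frequencies, cumulative, relative_cumulative):
    print(f"{low:>5}-{high:<5} f={f:>2}  F={F:>2}  {rel:5.1f}%")

print(class_probability(5))       # class five
print(probability_up_to(6))       # up to the top of class six
print(probability_between(5, 7))  # classes five to seven (98 per cent minus 58 per cent)
```

Working from the class frequencies keeps the sketch tied to the printed table. The same functions apply unchanged to any other grouped frequency table once `class_limits` and `frequencies` are replaced.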

This fact is embodied in the central limit theorem; it states that if random samples of size n are drawn from a large population (hypothetically infinite), which has mean μ and standard deviation σ, then the theoretical sampling distribution of x̄ has mean μ and standard deviation σ/√n. The theoretical sampling distribution of x̄ can be closely approximated by the corresponding normal curve if n is large. Even for quite small samples, particularly if one knows that the parent population is itself approximately normal, the theorem can be confidently applied. If one is not sure that the parent population is normal, application of the theorem should, as a rule, be restricted to samples of size ≥30. The standard deviation of a sampling distribution is often called the standard error of the sample statistic concerned. Thus

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

is the standard error of x̄.

A comparison among different distributions with different means and different standard deviations requires that they be transformed. One way would be to centre them about the same mean by subtracting the mean from each observation in each of the populations. This will move each of the distributions along the scale until they are centred about zero, which is the mean of all transformed distributions. Each distribution will still maintain a different bell shape, however.

3.6.4.1.2 The z-score

A further transformation is done by subtracting the mean of the distribution from each observation and dividing by the standard deviation of the distribution, a procedure known as standardization. The result is a variable Z, known as a z-score and having the standard normal form:

$$Z = \frac{X - \mu}{\sigma} \qquad (3.1)$$

This will give identical bell-shaped curves with normal distribution around zero mean and standard deviation equal to unity.

The z-scale is a horizontal scale set up for any given normal curve with some mean μ and some standard deviation σ. On this scale, the mean is marked 0 and the unit measure is taken to be σ, the particular standard deviation of the normal curve in question. A raw score X can be converted into a z-score by the above formula.

For instance, with μ = 80 and σ = 4, in order to formally convert the X-score 85 into a z-score, the following equation is used:

$$Z = \frac{X - \mu}{\sigma} = \frac{85 - 80}{4} = \frac{5}{4} = 1.25 \qquad (3.2)$$

The meaning here is that the X-score lies 1.25 standard deviations (that is, five units) to the right of the mean. If a z-score equivalent of X = 74 is computed, one obtains:

$$Z = \frac{X - \mu}{\sigma} = \frac{74 - 80}{4} = \frac{-6}{4} = -1.5 \qquad (3.3)$$

The meaning of this negative z-score is that the original X-score of 74 lies 1.5 standard deviations (that is, six units) to the left of the mean. A z-score tells how many standard deviations removed from the mean the original x-score is, to the right (if Z is positive) or to the left (if Z is negative).

There are many different normal curves due to the different means and standard deviations. For a fixed mean μ and a fixed standard deviation σ, however, there is exactly one normal curve having that mean and that standard deviation.

Normal distributions can be used to calculate probabilities. Since a normal curve is symmetrical, having a total area of one square unit under it, the area to the right of the mean is half a square unit, and the same is true for the area to the left of the mean. The characteristics of the standard normal distribution are extremely well known, and tables of areas under specified segments of the curve are available in almost all statistical textbooks. The areas are directly expressed as probabilities. The probability of encountering a sample, by random selection from a normal population, whose measurement falls within a specified range can be found with the use of these tables. The variance of the population must, however, be known. The fundamental idea connected with the area under a normal curve is that if a measurement X is normally distributed, then the probability that X will lie in some range between a and b on any given occasion is equal to the area under the normal curve between a and b.

To find the area under a normal curve between the mean μ and some x-value, convert the x into a z-score and look it up in the table. The number indicated is the desired area. If z turns out to be negative, just look it up as if it were positive. If the data are normally distributed, then about 68 per cent of the data in the series will fall within ±1σ of the mean, that is, z = ±1. Also, about 95 per cent of the data fall within ±2σ of the mean, or z = ±2, and more than 99 per cent within ±3σ of the mean, or z = ±3.

3.6.4.1.3 Examples using the z-score

Suppose a population of pumpkins is known to have a normal distribution with a mean and

standard deviation of its length equal to 14.2 cm and 4.7 cm, respectively. What is the probability of finding, by chance, a specimen shorter than 3 cm? To find the answer, 3 cm must be converted to units of standard deviation and the result looked up in the Standard Normal Distribution Table.

$$Z = \frac{3.0 - 14.2}{4.7} \approx -2.4 \qquad (3.4)$$

These tables can be found in many statistical textbooks (Wilks, 1995; Steel and Torrie, 1980). There are, however, various types of normal tables (left-tail, right-tail) that require specific and detailed explanations of their use. In order to simply demonstrate the statistical concepts and not provide additional confusion about which type of distribution table one has available, the Excel function NORMSDIST(z), which is equivalent to NORMDIST(z, 0, 1, TRUE), can be used to calculate the standard normal cumulative distribution.

The probability of finding a specimen more than 2.4 standard deviations below the mean is the cumulative probability to this point. By using NORMSDIST(–2.4) one obtains 0.0082, which is very small indeed. Now, what is the probability of finding one longer than 20 cm? Again, converting to standard normal form:

$$Z = \frac{20.0 - 14.2}{4.7} \approx 1.2 \qquad (3.5)$$

Because the question concerns the upper tail, the cumulative probability is subtracted from 1. By using 1 – NORMSDIST(1.2), one obtains 0.1151, or slightly greater than one chance out of 10.

Here is a slightly more complicated example. If the heights of all the rice stalks in a farm are thought to be normally distributed with mean x̄ = 38 cm and standard deviation s = 4.5 cm, find the probability that the height of a stalk taken at random will be between 35 and 40 cm. To solve this problem, one must find the area under a portion of the appropriate normal curve, between X = 35 and X = 40 (see Figure 3.1). It is necessary to convert these x-values into z-scores as follows.

For X = 35:

$$Z = \frac{X - \bar{x}}{s} = \frac{35 - 38}{4.5} = \frac{-3}{4.5} \approx -0.67 \qquad (3.6)$$

For X = 40:

$$Z = \frac{X - \bar{x}}{s} = \frac{40 - 38}{4.5} = \frac{2}{4.5} \approx 0.44 \qquad (3.7)$$

To determine the probability or area (Figure 3.1), one first needs to obtain the cumulative distribution for Z = 0.44, which is 0.6700. Remember that this is the cumulative probability to the left of Z. For Z = –0.67, the probability is 0.2514. But the probability between Z = –0.67 and Z = 0.44 needs to be determined. Therefore one subtracts probabilities, 0.6700 – 0.2514, to obtain 0.4186. Thus, the probability that a stalk chosen at random will have height X between 35 and 40 cm is 0.4186. In other words, one would expect 41.86 per cent of the paddy field’s rice stalks to have heights in that range.

[Figure 3.1. Probabilities for a normal distribution with x̄ = 38 and s = 4.5: the area under the curve between x = 35 (z = –0.67) and x = 40 (z = 0.44) is 0.4186]

Elements that are not normally distributed may easily be transformed mathematically to the normal distribution, an operation known as normalization. Among the moderate normalizing operators are the square root, the cube root and the logarithm for positively skewed data such as rainfall. The transformation reduces the higher values by proportionally greater amounts than smaller values.

3.6.4.2 Extreme value distributions

Certain crops may be exposed to lethal conditions (frost, excessive heat or cold, drought, high winds, and so on), even in areas where they are commonly grown. Extreme value analysis typically involves the collection and analysis of annual maxima of parameters that are observed daily, such as temperature, precipitation and wind speed. The process of extreme value analysis involves data gathering; the identification of a suitable probability model, such as the Gumbel distribution or generalized extreme value (GEV) distribution (Coles, 2001), to represent the distribution of the observed extremes; the estimation of model parameters; and the estimation of the return values for periods of fixed length.

The Gumbel double exponential distribution is the one most used for describing extreme values. An event that has occurred m times in a long series of n independent trials, one per year say, has an estimated probability p = m/n; conversely, the average interval between recurrences of the event during a
long period would be n/m; this is defined as the return period T where:

$$ T = \frac{1}{p} \tag{3.8} $$

For example, if there is a 5 per cent chance that an event will occur in any one year, its probability of occurrence is 0.05. This can be expressed as an event occurring on average five times in 100 years, that is, with a return period of once in 20 years.

For a valid application of extreme value analysis, two conditions must be met. First, the data must be independent, that is, the occurrence of one extreme is not linked to the next. Second, the data series must be trend-free and the quantity of data must be large, usually not fewer than 15 values.

3.6.4.3 Probability and risk

Frequency distributions, which provide an indication of risk, are of particular interest in agriculture due to the existence of ecological thresholds which, when reached, may result either in a limited yield or in irreversible reactions within the living tissue. Histograms can be fitted to the most appropriate distribution function and used to make statements about probabilities or risk of critical climate conditions, such as freezing temperatures or dry spells of more than a specified number of days. Cumulative frequencies are particularly suitable and convenient for operational use in agrometeorology. Cumulative distributions can be used to prepare tables or graphs showing the frequencies of occasions when the values of certain parameters exceed (or fall below) given threshold values during a selected period. If a sufficiently long series of observations (10 to 20 years) is available, it can be assumed to be representative of the total population, so that mean durations of the periods when the values exceed (or fall below) specified thresholds can be deduced.

When calculating these mean frequencies, it is often an advantage to extract information regarding the extreme values observed during the period chosen, such as the growing season, growth stage or period of particular sensitivity. Some examples are:
(a) Threshold values of daily maximum and minimum temperatures, which can be used to estimate the risk of excessive heat or frost and the duration of this risk;
(b) Threshold values of 10-day water deficits, taking into account the reserves in the soil. The quantity of water required for irrigation can then be estimated;
(c) Threshold values of relative humidity from hourly or 3-hour observations.

3.6.4.4 Distribution of sequences of consecutive days

The distribution of sequences of consecutive days in which certain climatic events occur is of special interest to the agriculturist. From such data one can, for example, deduce the likelihood of being able to undertake cultural operations requiring specific weather conditions and lasting for several days (haymaking, gathering grapes, and the like). The choice of protective measures to be taken against frost or drought may likewise be based on an examination of their occurrence and the distribution of the corresponding sequences. For whatever purpose the sequences are to be used, it is important to specify clearly the periods to which they refer (also whether or not they are for overlapping periods). Markov chain probability models have frequently been used to estimate the probability of sequences of certain consecutive days, such as wet days or dry days. Under many climate conditions, the probability, for example, that a day will be dry is significantly larger if the previous day is known to have been dry. Knowledge of the persistence of weather events such as wet days or dry days can be used to estimate the distribution of consecutive days using a Markov chain. INSTAT includes algorithms to calculate Markov chain models, to simulate spell lengths and to estimate probability using climatological data.

3.6.5 Measuring central tendency

One descriptive aspect of statistical analysis is the measurement of what is called central tendency, which gives an idea of the average or middle value about which all measurements coming from the process will cluster. To this group belong the mean, the median and the mode. Their symbols are as listed below:

x̄ arithmetic mean of a sample;
μ population mean;
x̄_w weighted mean;
x̄_h harmonic mean.

3.6.5.1 The mean

While frequency distributions are undoubtedly useful for operational purposes, mean values of the main climatic elements (10-day, monthly or seasonal) may be used broadly to compare climatic regions. To show how the climatic elements are distributed, however, these mean values should be supplemented by other descriptive statistics, such as the standard deviation, coefficient of variation (variability), quintiles and extreme values. In
agroclimatology, series of observations that have not been made simultaneously may have to be compared. To obtain comparable means in such cases, adjustments are applied to the series so as to fill in any gaps (see Some Methods of Climatological Analysis, WMO-No. 199). Sivakumar et al. (1993) illustrate the application of INSTAT in calculating descriptive statistics for climate data and discuss the usefulness of the statistics for assessing agricultural potential. They produce tables of monthly mean, standard deviation, and maximum and minimum for rainfall amounts and for the number of rainy days for available stations. Descriptive statistics are also presented for the maximum and minimum air temperatures.

The arithmetic mean is the most commonly used measure of central tendency, defined as:

$$ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad i = 1, 2, \ldots, n \tag{3.9} $$

This consists of adding all data in a series and dividing their sum by the number of data. The mean of the annual precipitation series from Table 3.1 is:

$$ \bar{X} = \frac{\sum x_i}{n} = \frac{69\,449}{50} = 1\,388.98 \tag{3.10} $$

The arithmetic mean may be computed using other labour-saving methods such as the grouped data technique (Guide to Climatological Practices, WMO-No. 100), which estimates the mean from the average of the products of class frequencies and their midpoints.

Another version of the mean is the weighted mean, which takes into account the relative importance of each variate by assigning it a weight. An example of the weighted mean can be seen in the calculation of areal averages such as yields, population densities or areal rainfall over non-uniform surfaces. The value for each subdivision of the area is multiplied by the subdivision area, and then the sum of the products is divided by the total area. The formula for the weighted mean is expressed as:

$$ \bar{X}_w = \frac{\sum_{i=1}^{k} n_i x_i}{\sum_{i=1}^{k} n_i} \tag{3.11} $$

For example, the average yield of maize for the five districts in the Ruvuma Region of Tanzania was 1.5, 2.0, 1.8, 1.3 and 1.9 tonnes per hectare (t/ha), respectively. The respective areas under maize were 3 000, 7 000, 2 000, 5 000 and 4 000 ha. If the values n1 = 3 000, n2 = 7 000, n3 = 2 000, n4 = 5 000 and n5 = 4 000 are substituted into equation (3.11), the overall mean yield of maize for these 21 000 ha of land is as follows:

$$ \bar{X}_w = \frac{3\,000(1.5) + 7\,000(2.0) + 2\,000(1.8) + 5\,000(1.3) + 4\,000(1.9)}{3\,000 + 7\,000 + 2\,000 + 5\,000 + 4\,000} = \frac{36\,200}{21\,000} \approx 1.72\ \text{t/ha} \tag{3.12} $$

In operational agrometeorology, the mean is normally computed for 10 days, known as dekads, as well as for the day, month, year and longer periods. This is used in agrometeorological bulletins and for describing current weather conditions. At agrometeorological stations where the maximum and minimum temperatures are read, a useful approximation of the daily mean temperature is given by taking the average of these two temperatures. These averages should be used with caution when comparing data from different stations, as such averages may differ systematically from each other.

Another measure of the mean is the harmonic mean, which is defined as n divided by the sum of the reciprocals or multiplicative inverses of the numbers:

$$ \bar{X}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \tag{3.13} $$

If five sprinklers can individually water a garden in 4 h, 5 h, 2 h, 6 h and 3 h, respectively, the time required for all sprinklers working together to water the garden is the reciprocal of the sum of the reciprocals, that is, the harmonic mean divided by n:

$$ \frac{\bar{X}_h}{n} = \frac{1}{\frac{1}{4} + \frac{1}{5} + \frac{1}{2} + \frac{1}{6} + \frac{1}{3}} = \frac{1}{1.45}\ \text{h} \approx 0.69\ \text{h} \tag{3.14} $$

or about 41 minutes and 23 seconds.

Means of long-term periods are known as normals. A normal is defined as a period average computed for a uniform and relatively long period comprising at least three consecutive 10-year periods. A climatological standard normal is the average of climatological data computed for consecutive periods of 30 years as follows: 1 January 1901 to 31 December 1930, 1 January 1931 to 31 December 1960, and so on.

3.6.5.2 The mode

The mode is the most frequent value in any array. Some series have even more than one modal value. Mean annual rainfall patterns in some sub-equatorial countries have bimodal distributions, meaning they exhibit two peaks. Unlike the mean, the mode is an actual value in the series. It is used mainly to describe the most typical value.
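The weighted and harmonic means of section 3.6.5.1 can be checked with a few lines of code. The following is a minimal sketch using only the Python standard library; the function name `weighted_mean` is an assumption made for the example. It recomputes the area-weighted maize yield from the Ruvuma figures and the combined watering time for the five sprinklers.

```python
from statistics import harmonic_mean


def weighted_mean(values, weights):
    """Weighted mean: sum(n_i * x_i) / sum(n_i), as in equation (3.11)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Maize yields (t/ha) and areas (ha) for the five districts in the text.
yields = [1.5, 2.0, 1.8, 1.3, 1.9]
areas = [3000, 7000, 2000, 5000, 4000]
overall_yield = weighted_mean(yields, areas)

# Sprinkler times (h): the harmonic mean is n / sum(1 / x_i).
times = [4, 5, 2, 6, 3]
h_mean = harmonic_mean(times)
# Working together, the combined rate is the sum of the individual rates.
combined_hours = 1 / sum(1 / t for t in times)

print(f"area-weighted yield: {overall_yield:.2f} t/ha")
print(f"harmonic mean time:  {h_mean:.2f} h")
print(f"combined time:       {combined_hours * 60:.1f} min")
```

The combined time equals the harmonic mean divided by the number of sprinklers, which is why the harmonic mean is the appropriate average for rates of this kind.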
3.6.5.3 The median

The median is obtained by selecting the middle value in an odd-numbered series of ranked variates or taking the average of the two middle values of an even-numbered series. For large volumes of data it is easiest to obtain a close approximation of their median by graphical or numerical interpolation of their cumulative frequency distribution.

3.6.6 Fractiles

Fractiles such as quartiles, quintiles and deciles are obtained by first ranking the data in ascending order and then counting an appropriate fraction of the integers in the series (n + 1). For quartiles, n + 1 is divided by four, for deciles by 10, and for percentiles by a hundred. Thus if n = 50, the first decile is the (1/10)(n + 1)th or the 5.1th observation in the ascending order, and the seventh decile is the (7/10)(n + 1)th in the rank or the 35.7th observation. Interpolation is required between observations. The median is the 50th percentile. It is also the fifth decile and the second quartile. It lies in the third quintile. In agrometeorology, the first decile means that value below which one-tenth of the data falls and above which nine-tenths lie.

3.6.7 Measuring dispersion

Other parameters give information about the spread or dispersion of the measurements about the average. These include the range, the variance and the standard deviation.

3.6.7.1 The range

This is the difference between the largest and the smallest values. For instance, the annual range of mean temperature is the difference between the mean daily temperatures of the hottest and coldest months.

3.6.7.2 The variance and the standard deviation

The variance is the mean of the squares of the deviations from the arithmetic mean. The standard deviation s is the square root of the variance and is defined as the root-mean-square of the deviations from the arithmetic mean. To obtain the standard deviation of a given sample, the mean x̄ is computed first, and then the deviations from the mean (x_i – x̄) are squared, summed and divided by n – 1:

$$ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \tag{3.15} $$

Alternatively, with only a single computation run summating data values and their squares:

$$ s = \sqrt{\frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n - 1}} \tag{3.16} $$

This standard deviation has the same units as the mean; together they may be used to make precise probability statements about the occurrence of certain values of a climatological series. The influence of the actual magnitude of the mean can be easily eliminated by expressing s as a percentage of the mean to derive a dimensionless quantity called the coefficient of variation:

$$ C_v = \frac{s}{\bar{x}} \times 100 \tag{3.17} $$

For comparing values of s between different places, this can be used to provide a measure of relative variability for such elements as total precipitation.

3.6.7.3 Measuring skewness

Other parameters can provide information on the skewness, or asymmetry, of a population. Skewness represents a tendency of a data distribution to show a pronounced tail to one side or another. With these populations, there is a good chance of finding an observation far from the mode, and the mean may not be representative as a measure of the central tendency.

3.7 DECISION-MAKING

3.7.1 Statistical inference and decision-making

Statistical inference is a process of inferring information about a population from the data of samples drawn from it. The purpose of statistical inference is to help a decision-maker to be right more often than not, or at least to give some idea of how much danger there is of being wrong when a particular decision is made. It is also meant to ensure that long-term costs through wrong decisions are kept to a minimum.

Two main lines of attacking the problem of statistical inference are available. One is to devise sample statistics that may be regarded as suitable estimators of corresponding population parameters. For example, the sample mean X̄ may be used as an estimator of the population mean μ, or else the sample median may be used. Statistical estimation theory deals with the issue of selecting best estimators. The steps to be taken to arrive at a decision are as follows:
Step 1. Formulate the null and alternative hypotheses

Once the null hypothesis has been clearly defined, one can calculate what kind of samples to expect under the supposition that it is true. Then if a random sample is drawn, and if it differs markedly in some respect from what is expected, the observed difference is said to be significant and one is inclined to reject the null hypothesis and accept the alternative hypothesis. If the difference observed is not too large, one might accept the null hypothesis or call for more statistical data before coming to a decision. One can make the decision in a hypothesis test depending upon a random variable known as a test statistic, such as the z-score used in finding confidence intervals, and critical values of this test statistic can be specified that can be used to indicate not only whether a sample difference is significant, but also the strength of the significance.

For instance, in a coin experiment to determine if the coin is fair or loaded:

Null H0: p = 0.5 (namely, the coin is fair).

And alternative H1: p ≠ 0.5 (namely, the coin is biased).

(Or equivalently H1: p < 0.5 or p > 0.5; this is called a two-sided alternative).

Step 2. Choose an appropriate level of significance

The probability of wrongly rejecting a null hypothesis is called the level of significance (α) of the test. The value for α is selected first, before any experiments are carried out; the values most commonly used by statisticians are 0.05, 0.01 and 0.001. The level of significance α = 0.05 means that the test procedure has only 5 chances in 100 of leading one to decide that the coin is biased if in fact it is not.

Step 3. Choose the sample size n

It is fairly clear that if bias exists, a large sample will have more chance of demonstrating its existence than a small one. So one should make n as large as possible, especially if one is concerned with demonstrating a small amount of bias. Cost of experimentation, time involved in sampling, necessity of maintaining statistically constant conditions, amount of inherent random variation and possible consequences of making wrong decisions are among the considerations on which the sample sizes depend.

Step 4. Decide upon the test statistic to be used

The decision in a hypothesis test can be made depending upon a random variable known as a test statistic, such as z or t, as used in finding confidence intervals. Its sampling distribution, under the assumption that H0 is true, must be known. It can be normal, binomial or another type of sampling distribution.

Step 5. Calculate the acceptance and rejection regions

Assuming that the null hypothesis is true, and bearing in mind the chosen values of n and α, an acceptance region of values for the test statistic is now calculated. Values outside this region form the rejection region. The acceptance region is so chosen that if a value of the test statistic, obtained from the data of a sample, fails to fall inside it, then the assumption that H0 is true must be strongly doubted. In general, there is a test statistic X, whose sampling distribution, defined by certain parameters such as μ and σ, is known. The values of the parameters are specified in the null hypothesis H0. From integral tables of the sampling distribution, critical values X1, X2 are obtained such that

$$ P[X_1 < X < X_2] = 1 - \alpha \tag{3.18} $$

These determine an acceptance region, which gives a test for the null hypothesis at the appropriate level of significance (α).

Step 6. Formulate the decision rule

The general decision rule, or test of hypothesis, may now be stated as follows:

(a) Reject H0 at the α significance level if the sample value of X lies in the rejection region (that is, outside [X1, X2]). This is equivalent to saying that the observed sample value is significant at the 100α % level. The alternative hypothesis H1 is then to be accepted.

(b) Accept H0 if the sample value of X lies in the acceptance region [X1, X2]. (Sometimes, especially if the sample size is small, or if X is close to one of the critical values X1 and X2, the decision to accept H0 is deferred until more data are collected.)

Step 7. Carry out the experiment and make the test

The n trials of the experiment may now be carried out, and from the results, the value of the chosen
test statistic may be calculated. The decision rule described in Step 6 may then be applied. Note: All statistical test procedures should be carefully formulated before experiments are carried out. The test statistic, the level of significance, and whether a one- or two-tailed test is required, must be decided before any sample data are looked at. Switching tests in midstream, as it were, leads to invalid probability statements about the decisions made.

3.7.2 Two-tailed and one-tailed tests

The determination of whether one uses a two-tailed or a one-tailed test depends on how the hypothesis was characterized. If H1 was defined as μ ≠ 0, the critical region would occupy both extremes of the test distribution. This is a two-tailed test, where the values could be on either side of μ. If H1 was defined as μ > 0 or μ < 0, the critical region occurs only at high or low values of the test statistic. This is known as a one-tailed test.

With a two-tailed test, the critical region containing 5 per cent of the area of the normal distribution is split into two equal parts, each containing 2.5 per cent of the total area. If the computed value of Z falls into the left-hand region, the sample came from a population having a smaller mean than the known population. Conversely, if it falls into the right-hand region, the mean of the sample's parent population is larger than the mean of the known population. From the standardized normal distribution table found in most statistical textbooks, one can find that approximately 2.5 per cent of the area of the curve is to the left of a Z value of –1.96 and 97.5 per cent of the area of the curve is to the left of +1.96. An example of a normal table can be accessed from http://www.isixsigma.com/library/content/zdistribution.asp.

Once the null hypothesis has been clearly defined, one can calculate what kind of samples to expect under the supposition that it is true. Then, if a random sample is drawn, and if it differs markedly in some respect from what is expected, one can say that the observed difference is significant, and one is inclined to reject the null hypothesis and accept the alternative hypothesis. If the difference observed is not too large, one might accept the null hypothesis, or one might call for more statistical data before coming to a decision. The decision in a hypothesis test can be made depending upon a random variable known as a test statistic, such as z or t, as used in finding confidence intervals, and critical values of this test statistic can be specified, which can be used to indicate not only whether a sample difference is significant but also the strength of the significance.

3.7.3 Interval estimation

Confidence interval estimation is a technique of calculating intervals for population parameters and measures of confidence placed upon them. If one has chosen an unbiased sample statistic b as the point estimator of β, the estimator will have a sampling distribution with mean E(b) = β and standard deviation S.D.(b) = σ_b. Here the parameter β is the unknown and the purpose is to estimate it. Based on the remarkable fact that many sample statistics used in practice have a normal or approximately normal sampling distribution, from the tables of the normal integral one can obtain the probability that a particular sample will provide a value of b within a given interval (β – d) to (β + d).

This is indicated in the diagram below. Conversely, for a given amount of probability, one can deduce the value d. For example, for 0.95 probability, one knows from standard normal tables that d/σ_b = 1.96. In other words, the probability that a sample will provide a value of b in the interval [β – 1.96σ_b, β + 1.96σ_b] is 0.95. This is written as P[β – 1.96σ_b ≤ b ≤ β + 1.96σ_b] = 0.95. After rearranging the inequalities inside the brackets to the equivalent form [b – 1.96σ_b ≤ β ≤ b + 1.96σ_b], one obtains the 95 per cent confidence interval for β, namely the interval [b – 1.96σ_b, b + 1.96σ_b]. In general, confidence intervals are expressed in the form [b – z·σ_b, b + z·σ_b], where z, the z-score, is the number obtained from tables of the sampling distribution of b. This z-score is chosen so that the desired percentage confidence may be assigned to the interval; it is now called the confidence coefficient, or sometimes the critical value. The endpoints of a confidence interval are known as the lower and upper confidence limits. The probable error of estimate is half the interval length of the 50 per cent confidence interval, namely, 0.674σ_b. Table 3.3 is an abbreviated table of confidence values for z.

The most commonly required point and interval estimates are for means, proportions, differences between two means, and standard deviations. Table 3.4 gives all the formulae needed for these estimates. The reader should note the standard form of b ± z·σ_b for each of the confidence interval estimators.

For the formulae to be valid, sampling must be random and the samples must be independent. In
some cases, σ_b will be known from prior information. Then the sample estimator will not be used. In each of the confidence interval formulae, the confidence coefficient z may be found from tables of the normal integral for any desired degree of confidence. This will give exact results if the population from which the sampling is done is normal; otherwise, the errors introduced will be small if n is reasonably large (n ≥ 30).

What should one do when samples are small? It is clear that the smaller the sample, the smaller the amount of confidence one can place on a particular interval estimate. Alternatively, for a given degree of confidence, the interval quoted must be wider than for larger samples. To bring this about, one must have a confidence coefficient that depends upon n. The letter t shall be used for this coefficient and confidence interval formulae shall be provided for the population mean and for the difference of two population means.

The reader will note that these are the same as for large samples, except that t replaces z. When the sample estimators for σ_x̄ and σ_(x̄1 – x̄2) are used, the correct values for t are obtained from what is called the Student t-distribution. For convenience, they are related not directly to sample sizes, but to a number known as "degrees of freedom"; this shall be denoted by ν. An abbreviated table of t-values is given in Table 3.5.

Table 3.4. Formulae for confidence interval estimates

Parameter                 Confidence interval              Degrees of freedom (ν)
1. Mean μ                 x̄ ± t·s_x̄                        n – 1
2. Difference μ1 – μ2     (x̄1 – x̄2) ± t·s_(x̄1 – x̄2)        n1 + n2 – 2

3.7.4 The z-test

The nature of the standard normal distribution allows one to test hypotheses about the origin of certain samples. The test statistic Z has a normal frequency distribution, which is a standardized normal distribution defined as:

$$ Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \tag{3.19} $$

The observations in the sample were selected randomly from a normal population whose variance is known.

3.7.5 Tests for normal population means

A random sample of size n is drawn from a normal population having unknown mean μ and known standard deviation σ. The objective is to test the hypothesis H0: μ = μ′, that is, the assumption that the population mean has value μ′.

The variate

$$ Z = \frac{\bar{X} - \mu'}{\sigma / \sqrt{n}} $$

has a standard normal distribution if H0 is true. Z (or X̄) may be used as the test statistic.

Example 1

Suppose that the shelf life of one-litre bottles of pasteurized milk is guaranteed to be at least 400 days, with a standard deviation of 60 days. If a sample of 25 bottles is randomly chosen from a production batch, and a sample mean shelf life of 375 days is calculated after testing has been performed, should the batch be rejected as not meeting the guarantee?

Solution: Let μ be the batch mean.

Step 1. Null hypothesis H0: μ = 400.

Alternative hypothesis H1: μ < 400 (one-sided: one is only interested in whether or not the mean is up to the guaranteed minimum value).

Steps 2 and 3. n = 25 (given); choose α = 0.05.

Step 4. If X̄ is the sample mean, the quantity

$$ Z = \frac{\bar{X} - 400}{60 / \sqrt{25}} \tag{3.20} $$

is a standard normal variate (perhaps approximately) if H0 is true. Z shall be used as the test statistic.

Step 5. For a one-tailed test, standard normal tables give Z = –1.65 as the lowest value to be allowed before H0 must be rejected, at the 5 per cent significance level. The acceptance region is therefore [–1.65, ∞).

Table 3.3. Abbreviated table of confidence values for z

Confidence level    Confidence coefficient z
50%                 0.674
60%                 0.84
80%                 1.28
86.8%               1.50
90%                 1.645
92%                 1.75
93.4%               1.84
94.2%               1.90
95%                 1.96
95.6%               2.01
96%                 2.05
97.4%               2.23
98%                 2.33
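The entries of Table 3.3 need not be read from printed tables. As a minimal sketch (the function names are assumptions made for this example, not part of the Guide), the `NormalDist` class of the Python standard library can return both the two-sided confidence coefficient z and the one-tailed critical value of the kind used in Step 5:

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def confidence_coefficient(level):
    """Two-sided z for a confidence level given as a fraction, e.g. 0.95."""
    return STANDARD_NORMAL.inv_cdf(0.5 + level / 2.0)


def one_tailed_critical_value(alpha):
    """Lower-tail critical z such that P(Z < z) = alpha."""
    return STANDARD_NORMAL.inv_cdf(alpha)


for level in (0.50, 0.80, 0.90, 0.95, 0.98):
    print(f"{level:.0%} confidence: z = {confidence_coefficient(level):.3f}")

print(f"one-tailed 5% critical value: {one_tailed_critical_value(0.05):.3f}")
```

The two-sided coefficient leaves half of the excluded probability in each tail, which is why the 95 per cent level corresponds to the 97.5th percentile of the standard normal distribution. The one-tailed value at α = 0.05 is the figure that the text rounds to –1.65.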
Step 6. Decision rule:

(a) Reject the production batch if the value of Z calculated from the sample is less than –1.65.
(b) Accept the batch otherwise.

Step 7. Carry out the test:

From the sample data one finds that

$$ Z = \frac{375 - 400}{60 / \sqrt{25}} = -2.083 \tag{3.21} $$

Decision: the production batch must be rejected, since –2.083 < –1.65. It is highly unlikely that the mean shelf life of milk bottles in the batch will be 400 days or more. The chance that this decision is wrong is smaller than 5 per cent.

Example 2

A sample of 66 seeds of a certain plant variety was planted on a plot using a randomized block design. Before planting, 30 of the seeds were subjected to a certain heat treatment. The times from planting to germination were observed. The 30 treated seeds took 52 days to germinate, while the 36 untreated seeds took 47 days. If the common standard deviation for time to germination applicable to individual seeds, calculated from several thousand seeds, may be taken as 12 days, can it be said that the heat treatment significantly speeds up a seed's germination rate?

From the data given, it is clear that the heat-treated seeds had an earlier start in growth. One may consider, however, the wider question as to whether heat-treated seeds are significantly faster germinating generally than untreated seeds.

The test is as follows:

Step 1. Let μA, μB be the germination period population means for heat-treated and untreated seeds, respectively.

Null hypothesis H0: μA = μB (that is, μA – μB = 0).

Alternative hypothesis H1: μA > μB.

The specific question was whether the heat-treated seeds were faster germinating than the untreated seeds, so the one-sided alternative hypothesis is used.

Steps 2 and 3. nA = 30 and nB = 36 (given).

The significance level shall be α = 0.05.

Step 4. Test statistic:

No information other than the two sample means is given. Even if the individual seeds' results were known, the paired comparison test could not be used, since there would be no possible reason for linking the results in pairs.

The difference in means (x̄A – x̄B) is approximately normally distributed, with mean (μA – μB) and standard deviation

$$ \sigma' = \sqrt{\frac{\sigma^2}{n_A} + \frac{\sigma^2}{n_B}} \tag{3.22} $$

So one may use as a test statistic the standard normal variate

$$ z = \frac{(\bar{x}_A - \bar{x}_B) - (\mu_A - \mu_B)}{\sigma'} \tag{3.23} $$

And if H0 is true, μA – μB = 0; so the test statistic reduces to

$$ z = \frac{\bar{x}_A - \bar{x}_B}{\sigma'} \tag{3.24} $$

Step 5. Acceptance region:

The critical value of z at 5 per cent level of significance, for a one-tailed test, is 1.65. Therefore, the acceptance region for the null hypothesis is the set of values z less than or equal to 1.65.

Table 3.5. Abbreviated table of confidence values for t (columns: degrees of freedom ν)

ν      3      4      5      7      9      10     15     20     25     30
90%    2.35   2.13   2.02   1.89   1.83   1.81   1.75   1.72   1.71   1.70
95%    3.18   2.78   2.57   2.36   2.26   2.23   2.13   2.09   2.06   2.04
99%    5.84   4.60   4.03   3.50   3.25   3.17   2.95   2.85   2.79   2.75
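Table 3.5 can be combined with the first formula of Table 3.4 to obtain a small-sample confidence interval for a mean. The sketch below is illustrative only: the germination times are invented, and the names `T_95` and `mean_confidence_interval_95` are assumptions made for the example. It looks up t for n – 1 degrees of freedom in the 95 per cent row of Table 3.5 and forms x̄ ± t·s/√n, taking s/√n as the sample estimate of the standard error of the mean.

```python
import math
import statistics

# 95 per cent row of Table 3.5, keyed by degrees of freedom.
T_95 = {3: 3.18, 4: 2.78, 5: 2.57, 7: 2.36, 9: 2.26,
        10: 2.23, 15: 2.13, 20: 2.09, 25: 2.06, 30: 2.04}


def mean_confidence_interval_95(sample):
    """95% interval x_bar +/- t * s / sqrt(n), with n - 1 degrees of freedom."""
    n = len(sample)
    dof = n - 1
    if dof not in T_95:
        raise ValueError(f"Table 3.5 has no entry for {dof} degrees of freedom")
    x_bar = statistics.fmean(sample)
    # statistics.stdev uses the n - 1 divisor, as in equation (3.15).
    std_error = statistics.stdev(sample) / math.sqrt(n)
    half_width = T_95[dof] * std_error
    return x_bar - half_width, x_bar + half_width


# Hypothetical germination times (days) for ten seeds.
days = [44, 51, 47, 55, 49, 42, 53, 48, 50, 46]
low, high = mean_confidence_interval_95(days)
print(f"mean = {statistics.fmean(days):.1f} days, 95% CI = [{low:.1f}, {high:.1f}]")
```

With ten observations there are nine degrees of freedom, so the coefficient used is 2.26 rather than the large-sample value of 1.96, and the interval is correspondingly wider.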
Step 6. Decision rule:

(a) If the sample value of z > 1.65, conclude that heat-treated seeds germinate significantly earlier (at the 5 per cent level) than untreated seeds.
(b) If z ≤ 1.65, the germination rates of both heat-treated and untreated seeds may well be the same.

Step 7. Carry out the test.

The value of σ is given as 12.

Therefore

$$ \sigma' = \sigma \sqrt{\frac{1}{n_A} + \frac{1}{n_B}} = 12 \sqrt{\frac{1}{30} + \frac{1}{36}} \cong 2.966 \tag{3.25} $$

And so the sample value of the test statistic is

$$ z = \frac{\bar{x}_A - \bar{x}_B}{\sigma'} = \frac{52 - 47}{2.966} \cong 1.69 \tag{3.26} $$

Decision: The heat-treated seed is just significantly earlier germinating at the 5 per cent level than the untreated seed.

3.7.6 The t-test

The uncertainty introduced into estimates based on samples can be accounted for by using a probability distribution that has a wider spread than the normal distribution. One such distribution is the t-distribution, which is similar to the normal distribution, but dependent on the size of sample taken. When the number of observations in the sample is infinite, the t-distribution and the normal distribution are identical. Tables of the t-distribution and other sample-based distributions are used in exactly the same manner as tables of the cumulative standard normal distribution, except that two entries are necessary to find a probability in the table. The two entries are the desired level of significance (α) and the degrees of freedom (ν), defined as the number of observations in the sample minus the number of parameters estimated from the sample.

Then for the test statistic one uses

$$ t = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} $$

which has a Student's t-distribution with n – 1 degrees of freedom.

Example using the z-test

A farmer was found to be selling pumpkins that looked like ordinary pumpkins except that these were very large, with an average diameter of 30.0 cm for 10 samples. The mean and standard deviation for pumpkins are 14.2 cm and 4.7 cm, respectively. The intent is to test whether the pumpkins that the farmer is selling are ordinary pumpkins.

One can hypothesize that the mean of the population from which the farmer's pumpkins were taken is the same as the mean of the ordinary pumpkins by the null hypothesis

$$ H_0: \mu_1 = \mu_0 \tag{3.27} $$

An alternative hypothesis must also be given:

$$ H_1: \mu_1 \neq \mu_0 $$

stating that the mean of the population from which the sample was drawn does not equal the specified population mean. If the two parent populations are not the same, one must conclude that the pumpkins that the farmer was selling were not drawn from the ordinary pumpkin population, but from the population of some other genus. One needs to specify levels of probability of correctness, or level of significance, denoted by α. A probability level of 5 per cent may be applied; this means a willingness to risk rejecting the hypothesis when it is correct 5 times out of 100 trials. One must have the variance of the population against which one is checking. A formal statistical test may now be set up in the following manner:

1. The hypothesis and alternative:

$$ H_0: \mu_1 = \mu_0 \tag{3.28} $$

$$ H_1: \mu_1 \neq \mu_0 \tag{3.29} $$

2. The level of significance:

$$ \alpha = 0.05 \tag{3.30} $$

3. The test statistic:

$$ Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \tag{3.31} $$

The test statistic, Z, has a frequency distribution that is a standardized normal distribution, provided that the observations in the sample were selected randomly from a normal population whose variance is known. If that has been specified, one is willing to reject the hypothesis of the equality of means when they actually are equal one time out of twenty: that is, one will accept a 5 per cent risk of being wrong. On the standardized normal distribution curve, therefore, the extreme regions that contain 5 per cent of the area of the curve need to be determined. This part of the probability curve is called the area of rejection or the critical region. If the computed value of the test statistic falls into this area, the null hypothesis will be

rejected. The hypothesis will be rejected if the test statistic is either too large or too small. The critical region, therefore, occupies the extremities of the probability distribution and each subregion contains 2.5 per cent of the total area of the curve.

Working through the pumpkin example, the outline takes the following form:

1. H0: μ of pumpkins = 14.2 cm

   H1: μ of pumpkins ≠ 14.2 cm

2. α level = 0.05

3. $$Z = \frac{30 - 14.2}{4.7/\sqrt{10}} = 10.6$$

The computed test value of 10.6 exceeds the two-tailed critical value of 1.96, so one concludes that the means of the two populations are not equal, and the plants must represent some genus other than that of ordinary pumpkins.

3.7.7 Estimators using pooled samples

Let two random samples of sizes n1, n2, respectively, be drawn from a large population that has mean μ and variance σ². Suppose that the samples yield unbiased estimates, x̄1 and x̄2 of μ, and s1², s2² of σ². The problem arises of combining these pairs of estimates to obtain single unbiased estimates of μ and σ². The process of combining estimates from two or more samples is known as pooling. The correct ways to pool unbiased estimates of means and variances, to yield single unbiased estimates, are

Means:

$$\hat{\mu} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} \qquad (3.32)$$

Variances:

$$\hat{\sigma}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \qquad (3.33)$$

Example:

A soil scientist made six determinations of the strength of dilute sulphuric acid. His results showed a mean strength of 9.234 with a standard deviation of 0.12. Using acid from another bottle, he made eleven determinations, which showed a mean strength of 8.86 with a standard deviation of 0.21. Obtain 95 per cent confidence limits for the difference in mean strengths of the acids in the two bottles. Could the bottles have been filled from the same source?

Working:

The difference in mean strengths of the acids is estimated by

$$\bar{x}_1 - \bar{x}_2 = 9.234 - 8.86 = 0.374 \qquad (3.34)$$

The standard deviation of the sampling distribution of x̄1 – x̄2, written as σ(x̄1 – x̄2), is estimated first. The data from the two samples are pooled, thus:

$$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = s\sqrt{\frac{1}{6} + \frac{1}{11}} \qquad (3.35)$$

where

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \qquad (3.36)$$

$$= \frac{5 \times 0.0144 + 10 \times 0.0441}{6 + 11 - 2} = \frac{0.072 + 0.441}{15} = \frac{0.513}{15} = 0.0342 \qquad (3.37)$$

and so s = 0.1849.

Therefore

$$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = 0.1849\sqrt{\frac{1}{6} + \frac{1}{11}} = 0.1849 \times 0.5075 \cong 0.0939 \qquad (3.38)$$

With 15 degrees of freedom, the confidence coefficient is t = 2.13 for 95 per cent confidence. Therefore the required limits for μ1 – μ2 are

$$(\bar{x}_1 - \bar{x}_2) \pm t\,\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = 0.374 \pm 2.13 \times 0.0939 = 0.374 \pm 0.200 \qquad (3.39)$$

Thus, the 95 per cent confidence limits for the difference in mean strengths of the acids in the two bottles are 0.174 and 0.574. This indicates that one is 95 per cent confident that the difference in mean strengths of the acids in the two bottles lies between 0.174 and 0.574. Since this interval does not include zero, it is unlikely that the bottles were filled from the same source.

3.7.8 The paired comparison test and the difference between two means test

Example: Paired comparison test

The yields from two varieties of wheat were compared. The wheat was planted on 25 test plots. Each plot was divided into two equal parts; one part was chosen randomly and planted with the first variety and the other part was planted with the second variety of wheat. This process was repeated for all 25 plots. When the crop yields were measured, the difference in yields from each plot was recorded (second variety minus first variety). The sample mean plot yield difference was found to be 3.5 t/ha, and the variance of these differences was calculated to be 16 (t/ha)².
(a) Does the second variety produce significantly higher yields than the first variety?
(b) Test the hypothesis that the population mean plot yield difference is as high as 5 t/ha.
(c) Obtain 95 per cent confidence limits for the population mean plot yield difference.

It is clear that there is a good deal of variation in yields from plot to plot. This variation tends to

confound the main issue, which is to determine whether yields are increased by using a second variety. This confusion has been avoided by considering only the change in yields for each plot. If the second variety has no effect, the average change will be zero.

Data of this kind, in which results are combined in pairs and each pair arises from one experimental unit or has some clear reason for being linked in this way, are analysed by the paired comparison test. Each pair provides a single comparison as a measure of the effect of the treatment applied (for example, growing a different variety). Let D denote the difference in a given pair of results. D will have a normal distribution with mean μ and standard deviation σ (both parameters are unknown in this case).

Example using the t-test

Step 1.

Null hypothesis H0: μ = 0 (namely, the yield of the two wheat varieties is the same).

Alternative hypothesis H1: μ > 0 (that is, the second variety yields are higher than the first variety yields).

H1 is one sided; a one-tailed test must be applied.

Steps 2 and 3. Twenty-five plots were used, which means that n = 25. A significance level of α = 0.05 shall be used.

Step 4. The quantity

$$z = \frac{\bar{D} - 0}{\sigma/\sqrt{25}}$$

is a standard normal variate and may be used as the test statistic if σ is known from previous experimentation.

The parameters of a population are rarely known. In this case, σ is not given, so it must be estimated from the sample data, and the t statistic with n – 1 = 24 degrees of freedom is used instead.

Step 5. Acceptance region: the critical level of t at the 0.05 level of significance (one-tailed test) is the same as the upper 90 per cent confidence coefficient, as provided in Table 3.5. With 24 degrees of freedom, this value is 1.71. The acceptance region is therefore all values of t from –infinity to 1.71.

Step 6. Decision rule:

(a) If the value of t calculated from the sample is greater than 1.71, one may conclude that the second wheat variety gives higher yields than the first variety.
(b) If the value of t is less than 1.71, one may not reject (at the 5 per cent level) the hypothesis that the observed increases in yield in the second wheat variety were due to chance variation in the experiment.

Step 7. Carry out the test:

From the sample data, with S = √16 = 4 and n = 25,

$$t = \frac{\bar{D} - 0}{S/\sqrt{n}} = \frac{3.5 - 0}{\sqrt{16}/\sqrt{25}} = \frac{3.5}{0.8} = 4.375 \qquad (3.40)$$

Decision: since 4.375 > 1.71, one can conclude at the 5 per cent level that the second variety produces significantly higher yields than the first variety.

3.7.9 Difference between two means

A sampling result, which is frequently used in inference tests, is one concerning the distribution of the difference in means of independent samples drawn from two different populations. Let a random sample of size n1 be drawn from a population having mean μX and standard deviation σX; and let an independent sample of size n2 be drawn from another population having mean μY and standard deviation σY. Consider the random variable D = X̄ – Ȳ; that is, the difference in means of the two samples. The theorem states that D has a sampling distribution with mean μD = μX – μY and variance

$$\mathrm{Var}(D) = \frac{\sigma_X^2}{n_1} + \frac{\sigma_Y^2}{n_2} \qquad (3.41)$$

3.7.10 The F-Test

It seems reasonable that the sample variances will range more from trial to trial if the number of observations used in their calculation is small. Therefore, the shape of the F-distribution would be expected to change with changes in sample size. The degrees of freedom idea comes to mind, except in this situation the F-distribution is dependent on two values of γ, one associated with each variance in the ratio. Since the F-ratio is the ratio of two positive numbers, the F-distribution cannot be negative. If the samples are large, the average of the ratios should be close to 1.0.

Because the F-distribution describes the probabilities of obtaining specified ratios of sample variances drawn from the same population, it can be used to test the equality of variances that are obtained in statistical sampling. One may hypothesize that two samples are drawn from populations having equal variances. After computing the F-ratio, one can then ascertain the probability of obtaining, by chance, that specific value from two samples from one normal population. If it is

unlikely that such a ratio could be obtained, this can be seen to indicate that the samples come from different populations having different variances.

For any pair of variances, two ratios can be computed (S1²/S2² and S2²/S1²). If one arbitrarily decides that the larger variance will always be placed in the numerator, the ratio will always be greater than 1.0 and the statistical tests can be simplified. Only one-tailed tests need to be utilized, and the alternative hypothesis actually is a statement that the absolute difference between the two sample variances is greater than expected if the population variances are equal. This is shown in Figure 3.2, a typical F-distribution curve in which the critical region or area of rejection has been shaded.

Figure 3.2. A typical F-distribution with γ1 = 10 and γ2 = 25 degrees of freedom with the critical region (shown by shading), which contains 5 per cent of the area under the curve. The critical value of F = 2.24.

As an example of an elementary application of the F-distribution, consider a comparison between the two sample sets of porosity measurements on soils of two areas of a certain district. The aim is to determine if the variation in porosity is the same in the two areas. For these purposes, a level of significance of 5 per cent will be satisfactory. That is, a risk of concluding that the porosities are different when actually they are the same, one time out of every twenty trials, is acceptable.

$$F = \frac{S_1^2}{S_2^2} \qquad (3.42)$$

where S1² is the larger variance and S2² is the smaller. Now the hypothesis

$$H_0: \sigma_1^2 = \sigma_2^2 \qquad (3.43)$$

is tested against

$$H_1: \sigma_1^2 \neq \sigma_2^2 \qquad (3.44)$$

The null hypothesis states that the parent populations of the two samples have equal variances: the alternative hypothesis states that they do not. Degrees of freedom associated with this test are (n1 – 1) for γ1 and (n2 – 1) for γ2. The critical value of F with γ1 = 9 and γ2 = 9 degrees of freedom and a level of significance of 5 per cent (α = 0.05) is 3.18.

The value of F calculated from (3.42) will fall into one of the two areas shown in Figure 3.2. If the calculated value of F exceeds 3.18, the null hypothesis is rejected and one concludes that the variation in porosity is not the same in the two groups. If the calculated value is less than 3.18, there would be no evidence, at the 0.05 level, for concluding that the variances are different.

In most practical situations, one ordinarily has no knowledge of the parameters of the population, except for estimates made from samples. In comparing two samples, it is appropriate to first determine if their variances are statistically equivalent. If they appear to be equal and the samples have been selected without bias from a naturally occurring population, it is probably safe to proceed to additional statistical tests.

The next step in the procedure is to test equality of means. The appropriate test is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \qquad (3.45)$$

where the quantity Sp is the pooled estimate of the population standard deviation based on both samples. The estimate is found from the pooled estimated variance, given by:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} \qquad (3.46)$$

where the subscripts refer, respectively, to the samples from area A and area B of the district.

3.7.11 Relationship between variables

3.7.11.1 Correlation methods

Correlation methods are used to discover objectively and quantitatively the relationship that may exist between several variables. The correlation coefficient determines the extent to which values of two variables are linearly related; that is, the correlation is high if the relationship can be approximated by a straight line (sloped upwards or downwards). This line is called the regression line. Correlation analysis is especially valuable in agrometeorology because of the many factors that may be involved, simultaneously or successively, during the development of a crop and

also because for many of them, climatic factors in particular, it is impossible to design accurate experiments, since their occurrence cannot be controlled. There are two sets of circumstances in which, more particularly, the correlation and simple regression method can be used:
(a) In completing climatological series that have gaps. Comparisons of data for different atmospheric elements (such as precipitation, evapotranspiration, duration of sunshine) allow estimates of the missing data to be made from the other measured elements;
(b) In comparing climatological data and biological or agronomical data, such as yields and quality of crops (sugar content, weight of dry matter, and so on).

Care should be exercised in interpreting these correlations. Graphs and scatter plots should be used to give much more information about the nature of the relationship between variables. The discovery of a significant correlation coefficient should encourage the agrometeorologist, in most cases, to seek a physical or biological explanation for the relationship and not just be content with the statistical result.

Having discovered that there is a relationship between variables, one hopes to establish the closeness of this relationship. This closeness of agreement between two or more variables is called correlation. The closeness is expressed by a correlation coefficient whose value lies between +1 (perfect, positive correlation) and –1 (perfect, negative correlation). It is used to measure the linear relationship between two random variables that are represented by pairs of numerical values. The most commonly used formula is:

$$r = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{\sqrt{\left[n\sum X_i^2 - (\sum X_i)^2\right]\left[n\sum Y_i^2 - (\sum Y_i)^2\right]}} \qquad (3.47)$$

If the number of pairs is small, the sample correlation coefficient between the two series is subject to large random errors, and in these cases numerically large coefficients may not be significant.

The statistical significance of the correlation may be determined by seeing whether the sample correlation r is significantly different from zero. The test statistic is:

$$t = r\sqrt{\frac{n - 2}{1 - r^2}} \qquad (3.48)$$

and t is compared to the tabulated value of Student’s t with n – 2 degrees of freedom.

3.7.11.2 Regression

After the strength of the relationship between two or more variables has been quantified, the next logical step is to find out how to predict specific values of one variable in terms of another. This is done by using regression models. A simple linear regression model is of the form:

$$Y = a + bX \qquad (3.49)$$

where Y is the dependent variable;

X is the independent variable;

a is the intercept on the Y axis;

b is the slope of the regression.

The least squares criterion requires that the line be chosen to fit the data so that the sum of the squares of the vertical deviations separating the points from the line will be a minimum.

The recommended formulae for estimating the two sample coefficients for least squares are:

the slope of the line

$$b = \frac{n\sum X_iY_i - (\sum X_i)(\sum Y_i)}{n\sum X_i^2 - (\sum X_i)^2} \qquad (3.50)$$

the y-axis intercept

$$a = \frac{(\sum Y_i)(\sum X_i^2) - (\sum X_i)(\sum X_iY_i)}{n\sum X_i^2 - (\sum X_i)^2} \qquad (3.51)$$

Example

Compute a and b coefficients of the Angstrom formula.

Angstrom’s formula:

$$R/R_A = a + b\,n/N \qquad (3.52)$$

is used to estimate the global radiation at surface level (R) from the radiation at the upper limit of the atmosphere (RA), the actual hours of bright sunshine (n) and the day length (N). RA and N are taken from appropriate tables or computed; n is an observational value obtained from the Campbell–Stokes sunshine recorder.

The data in Table 3.6 are sunshine normals from Lyamungu, United Republic of Tanzania (latitude 3° 14’ S, longitude 27° 17’ E, elevation 1 250 m).
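The formulae (3.47), (3.48), (3.50) and (3.51) can be applied directly to the twelve monthly pairs of Table 3.6. The following is a minimal sketch in Python, not a prescribed procedure: the function name linear_fit and the variable names are illustrative assumptions, and only the standard library is used. Run on the Table 3.6 values, it should reproduce, to rounding, the coefficients a, b and r quoted with that table.

```python
import math

# Monthly normals from Table 3.6 (Lyamungu): X = n/N, Y = R/RA, January to December.
X = [0.660, 0.647, 0.536, 0.366, 0.251, 0.319,
     0.310, 0.409, 0.448, 0.542, 0.514, 0.602]
Y = [0.620, 0.578, 0.504, 0.395, 0.368, 0.399,
     0.395, 0.442, 0.515, 0.537, 0.503, 0.582]


def linear_fit(x, y):
    """Return (a, b, r, t) from equations (3.51), (3.50), (3.47) and (3.48).

    linear_fit is a hypothetical helper name chosen for this sketch.
    """
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(v * v for v in x)
    sum_yy = sum(v * v for v in y)
    sum_xy = sum(u * v for u, v in zip(x, y))

    den_x = n * sum_xx - sum_x ** 2   # n*sum(X^2) - (sum X)^2
    den_y = n * sum_yy - sum_y ** 2   # n*sum(Y^2) - (sum Y)^2
    num = n * sum_xy - sum_x * sum_y  # shared numerator of b and r

    b = num / den_x                                  # slope, equation (3.50)
    a = (sum_y * sum_xx - sum_x * sum_xy) / den_x    # intercept, equation (3.51)
    r = num / math.sqrt(den_x * den_y)               # correlation, equation (3.47)
    t = r * math.sqrt((n - 2) / (1 - r ** 2))        # test statistic, equation (3.48)
    return a, b, r, t


a, b, r, t = linear_fit(X, Y)
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}, t = {t:.2f} with {len(X) - 2} d.f.")
```

The value of t printed by the sketch is then compared with the tabulated Student’s t for n – 2 = 10 degrees of freedom, as described for equation (3.48).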

Table 3.6. Sunshine normals from Lyamungu, United Republic of Tanzania

Month    n/N (X)    R/RA (Y)
Jan      0.660      0.620
Feb      0.647      0.578
Mar      0.536      0.504
Apr      0.366      0.395
May      0.251      0.368
Jun      0.319      0.399
Jul      0.310      0.395
Aug      0.409      0.442
Sept     0.448      0.515
Oct      0.542      0.537
Nov      0.514      0.503
Dec      0.602      0.582

N = 12, X̄ = 0.467, sx = 0.132, Ȳ = 0.487, sy = 0.081, b = 0.603, a = 0.205, r = 0.973

The regression explains r² = 95 per cent of the variance of R/RA and is significant at the p = 0.01 level.

There are cases where a scatter diagram suggests that the relationship between variables is not linear. This can be turned into a linear regression by taking the logarithms of the relationship if it is exponential, or by turning it into a reciprocal if it is square, and so forth. For example, when the saturation vapour pressure is plotted against temperature, the curve suggests that a function like y = p·e^(bX) could probably be used to describe the relationship. This is turned into a linear regression ln (y) = ln (p) + bX, where X is the temperature and y is the saturation vapour pressure. An expression of the form y = aX² can be turned into a linear form by taking the reciprocal:

$$\frac{1}{y} = \frac{X^{-2}}{a}$$

3.7.11.3 Multiple regressions

The general purpose of multiple regression is to learn more about the relationship between several independent or predictor variables and a dependent variable. A linear combination of predictor factors is used to predict the outcomes or response factor. For example, multiple regression has been successfully used to estimate crop yield as a function of weather, or to estimate soil temperatures as a function of air temperature, soil characteristic and soil cover. It has been used to perform a trend analysis of agrometeorological parameters using a polynomial expansion of time. The general linear model is a generalization of the linear regression model, such that effects can be tested for categorical predictor variables, as well as effects for continuous predictor variables. An objective in performing multiple regression analysis is to specify a parsimonious model whose factors contribute significantly to variation in response. Statistical software such as INSTAT provides tools to select independent factors for a regression model. These programs include forward stepwise regression to individually add or delete the independent variables from the model at each step of the regression until the “best” regression model is obtained, or backward stepwise regression to remove the independent variables from the regression equation one at a time until the “best” regression model is obtained. It is generally recommended that one should have at least 10 times as many observations or cases as one has variables in a regression model.

Residual analysis is recommended as a tool to assess the multiple regression models and to identify violations of assumptions that threaten the validity of results. Residuals are the deviations of the observed values of the dependent variable from the predicted values. Most statistical software provides extensive residuals analyses, allowing one to use a variety of diagnostic tools in inspecting different residual and predicted values, and thus to examine the adequacy of the prediction model, the need for transformations of the variables in the model, and the existence of outliers in the data. Outliers (that is, extreme cases) can seriously bias the results.

3.7.11.4 Stepwise regression

This will be explained by using an example for yields. A combination of variables may work together to produce the final yield. These variables could be the annual precipitation, the temperature of a certain month, the precipitation of a certain month, the potential evapotranspiration of a certain month, or the difference between precipitation and potential evapotranspiration for a given month.

In stepwise regression, a simple linear regression for the yield is constructed on each of the variables and their coefficients of determination found. The variable that produces the largest r² statistic is selected. Additional variables are then brought in one by one and subjected to a multivariate regression with the best variable to see how much that variable would contribute to the model if it were to be included. This is done by calculating the F statistic for each variable. The variable with the largest F statistic that

is significant at the specified significance level for entry is included in the multivariate regression model. Other variables are included in the model one by one. If the partial F statistic of a variable is not significant at a specified level for staying in the regression model, it is left out. Only those variables that have produced significant F statistics are included in the regression. A more in-depth explanation can be found in Draper and Smith (1981).

3.7.11.5 Cluster analysis

Cluster analysis is a technique for grouping individuals or objects into unknown groups. In biology, cluster analysis has been used for taxonomy, which is the classification of living things into arbitrary groups on the basis of their characteristics. In agrometeorology, cluster analysis can be used to analyse historical records of the spatial and temporal variations in pest populations in order to classify regions on the basis of population densities and the frequency and persistence of outbreaks. The analysis can be used to improve regional monitoring and control of pest populations.

Clustering techniques require that one define a measure of closeness or similarity between two observations. Clustering algorithms may be hierarchical or non-hierarchical. Hierarchical methods can be either agglomerative or divisive. Agglomerative methods begin by assuming that each observation is a cluster and then, through successive steps, the closest clusters are combined. Divisive methods begin with one cluster containing all the observations and successively split off cases that are the most dissimilar to the remaining ones. K-means clustering is a popular non-hierarchical clustering technique. It begins with user-specified clusters and then reassigns data on the basis of the distance from the centroid of each cluster. See von Storch and Zwiers (2001) for more detailed explanations.

3.7.11.6 Classification trees

The goal of classification trees is to predict or explain responses on a categorical dependent variable. They have much in common with discriminant analysis, cluster analysis, non-parametric statistics and non-linear estimation. They are one of the main techniques used in data mining. The ability of classification trees to perform univariate splits, examining the effects of predictors one at a time, contributes to their flexibility. Classification trees can be computed for categorical predictors, continuous predictors or any mix of the two types of predictors when univariate splits are used. They readily lend themselves to graphical display, which makes them easier to interpret. Classification trees are used in medicine for diagnosis and in biology for classification. They have been used to predict levels of winter survival of overwintering crops using weather and categorical variables related to topography and crop cultivars.

3.7.12 Climatic periodicities and time series

Data are commonly collected as time series, namely, observations made on the same variable at repeated points in time. The INSTAT software provides facilities for descriptive analysis and display of such data. The goals of time series analysis include identifying the nature of the phenomenon represented by the sequence of observations and predicting future values of the time series. Moving averages are frequently used to “smooth” a time series so that trends and other patterns are seen more easily. Sivakumar et al. (1993) present a number of graphs showing the five-year moving averages of monthly and annual rainfall at selected sites in Niger. Most time series can be described in terms of trend and seasonality. When trends, such as seasonal or other deterministic patterns, have been identified and removed from a series, the interest focuses on the random component. Standard techniques can be used to look at its distribution. The feature of special interest, resulting from the time-series nature of the data, is the extent to which consecutive observations are related. A useful summary is provided by the sample autocorrelations at various lags, the autocorrelation at lag m being the correlation between observations m time units apart. In simple applications this is probably most useful for determining whether the assumption of independence of successive observations used in many elementary analyses is valid. The autocorrelations also give an indication of whether more advanced modelling methods are likely to be helpful. The cross-correlation function provides a summary of the relationship between two series from which all trend and seasonal patterns have been removed. The lag m cross-correlation is defined as the correlation between x and y lagged by m units.

More than any other user of climatic data, the agrometeorologist may be tempted to search for climatic periodicities that could provide a basis for the management of agricultural production. It should be noted that the Guide to Climatological Practices (WMO-No. 100) (section 5.3) is more than cautious with

regard to such periodicities and that, although they may be of theoretical interest, they have been found to be unreliable, having amplitudes that are too small for any practical conclusions to be drawn.

3.8 PUBLICATION OF RESULTS

3.8.1 General methods

For statistical analyses to have practical value, they must be distributed to users in a readily understandable format that does not require an advanced knowledge of statistics. Adequate details should be given in each publication to avoid any ambiguity in interpreting the numerical tables or graphs.

3.8.2 Tables

Numerical tables of frequencies, averages, distribution parameters, return periods of events, and so on, should state clearly:
(a) The geographical location (including elevation of the observation site);
(b) The period on which the statistical analysis is based (necessary to estimate how representative the data are);
(c) The number of data (enabling the continuity of the series to be assessed);
(d) The units;
(e) The meaning of any symbols.

For frequency tables, it is better to give relative (percentage) frequencies in order to facilitate the comparison of populations consisting of different numbers of observations. In this case, it must be made quite clear whether the percentages refer to the total population or to separate classes.

3.8.3 Contingency tables

Estimates of the simultaneous occurrence of given values of several elements or events are often needed. The resulting contingency tables should be as simple as possible.

3.8.4 Graphs

Graphs are used to show in a concise format the information contained in numerical tables. They are a useful adjunct to the tables themselves and facilitate the comparison of results. Cumulative frequency curves, histograms and climograms give a better overall picture than the multiplicity of numerical data obtained by statistical analysis. The scales used on the graph must be specified and their graduations should be shown. Publications intended for wide distribution among agricultural users should not have complicated scales (for instance, logarithmic, Gaussian, and so forth) with which the users may be unfamiliar, and which might lead to serious errors in interpreting the data. Furthermore, giving too much information on the same graph and using complicated conventional symbols should be avoided.

3.8.5 Maps

To present concisely the results of agroclimatological analysis covering an area or region, it is often better to draw isopleths or colour classification from the data plotted at specific points. The interpolation between the various locations can be used in a digital map plotted by special plotting tools such as Graph, Grids, Surfer and GIS. Many climatic parameters useful to agriculture can be shown in this way, for example:
(a) Mean values of climatic elements (temperature, precipitation, evapotranspiration, water balance, radiation balance, and the like);
(b) Frequencies: number of consecutive days without frost, without thawing, without rain, and so forth; return periods of atmospheric events;
(c) Dispersion parameters: standard deviations, coefficients of variation;
(d) Agrometeorological indices.

Depending on the scale adopted, this type of supplementary chart can be drawn more or less taking geomorphological factors into account. The users of the charts should be made aware of their generalized nature, however, and in order to interpret them usefully, users should know that corrections for local conditions must be made. This is particularly important for hilly regions.

3.8.6 The agrometeorological bulletin

Because of the diverse nature of the users, the content of an agrometeorological bulletin (agmet bulletin) cannot be standardized. But the basic objectives of all successful agmet bulletins are the same: the provision of the right agmet information to the right users at the right time. To attain this objective, the following guidelines are suggested. For a complete discussion of the matter, readers are referred to WMO (2004c).

First, it is essential to determine who the users are. One category of users may be farmers who need daily information to assist them in day-to-day activities such as sowing, spraying and irrigating. Another category may be more interested in

long-term agricultural decisions such as crop adaptation to weather patterns, marketing decisions or modelling.

Second, the users’ requirements must be clearly established, so that the most appropriate information is provided. This is possible only after discussion with them. In most cases, they do not have a clear picture of the type of information that is best suited for their purpose; the role of the agrometeorologist is crucial here.

Third, the methods of dissemination of information must be decided upon after consultation with the users. Some farmers may have full access to the Internet, while others have only limited access, and others have no such access to this technology. Obviously, the presentation of data for these categories will not be the same. Furthermore, some information must be provided as quickly as possible, while other information may be provided two or three weeks later.

Fourth, it is very important to consider the cost of the agmet bulletin that is proposed to the users, especially in developing countries where the financial burden is becoming heavier.

3.8.6.1 Some examples

Some examples of the presentation of agmet information are given below to illustrate the points mentioned.

3.8.6.1.1 Data in pentads

Table 3.7 shows part of an agmet bulletin issued by a government service in a tropical country where agriculture is an important component of the economy. This bulletin was developed to cater for all crops, ranging from tomatoes to sugar cane. It is issued on a half-monthly basis and is sent to the users by post and is also available on the service’s Website. Bearing in mind the time taken to collect the data, it would not be before the 20th day of the month at the very earliest that the bulletin would reach the users. To provide farmers (tomato growers, for example) with data relevant to their day-to-day activities, the agmet bulletin is supplemented by daily values of rainfall and maximum and minimum temperatures, which are broadcast on radio and television. Of course, data relevant to different geographical localities can be included.

Rainfall amounts (RR) and maximum temperature (MxT) are shown for a given area of a tropical country. Total rainfall amounts and the mean maximum temperature observed during the three pentads are compared to their respective normal values.

In agmet bulletins, extreme weather events, which are masked by the averaging procedure involved in the calculation of the pentad, must be highlighted, probably in the form of a footnote, to draw the attention of users. For example, in Table 3.7 it can be seen that during the period 6–15 July, the maximum temperature was below the normal by not more than 1.8°C. But in fact, during the period 9–12 July, maximum temperature was below the normal by 2.8°C to 3.0°C; this can be of importance to both animals and plants.

The presentation of data in this format, together with the broadcast of daily values on the radio and on television, is very effective. It can be used by farmers interested in day-to-day activities and by research workers and model builders. It is suitable for all types of crops, ranging from tomatoes and lettuce to sugar cane and other deep-root crops.

3.8.6.1.2 Data in 10-day intervals

On the basis of the agrometeorological requirements for a Mediterranean climate with two main seasons and two transitional seasons, the main climatic parameters should be published on a year-round basis. The selection of agromet parameters/indices should be published according to the season and the agricultural situation of the crops, including data representing the various agricultural regions of the country.

The bulletin should include daily data, 10-day means or totals, and deviation or per cent from average. In parameters such as maximum and minimum temperature and maximum and minimum relative humidity, absolute values of the decade based on a long series of years are also recommended.

The list of recommended data to be published in the agrometeorological bulletin is as follows:
(a) Daily data of maximum and minimum temperature and relative humidity;
(b) Temperature near the ground;
(c) Soil temperature;
(d) Radiation and/or sunshine duration in hours;
(e) Class A pan evaporation and/or Penman evapotranspiration;
(f) Rainfall amount;
(g) Accumulated rainfall from the beginning of the rainy season;
(h) Number of rainy days;
(i) Accumulated number of dry days since the last rainy day;
(j) Number of hours below different temperature thresholds depending on the crop;
(k) Number of hours temperature is below 0°C.

Examples of agrometeorological parameters or indices that should be published are:
(a) Accumulated number of dynamic model units since the beginning of the winter as an indication of budbreak in deciduous trees;
(b) Accumulated number of units above 13°C since the beginning of spring as an indication of citrus growth;
(c) Physiological days – the accumulated number of units above 12°C since the beginning of spring as an indication of cotton growth.

3.8.6.1.3 Short-range weather outlook

With the availability on the Internet of short-range weather forecasts (5 to 10 days) provided by World Weather Centres (WWCs), many Agrometeorological Services are providing 5- to 10-day weather forecasts to farmers. An example is given below, showing expected rainfall (in millimetres) for two rainfed farming areas. The information was released to users through e-mail and posted on the Website.

This weather outlook, based on model output received early on Thursday 11 December from WWCs, was released in the afternoon of 11 December; it was sent by e-mail to users through farming centres and posted on the Website. This outlook was not broadcast on either radio or television. The issue of such a weather outlook is important, but it must be carefully planned. Otherwise, it can lead to financial losses, as shown below.

Little rainfall was observed during the first two pentads of December 2003 and farmers were starting to get worried. The indication that significant rain was expected on Sunday 14 December (Table 3.8) had given great hope to the farmers and, because it was a weekend, they made plans on Friday to do some fieldwork on Saturday and on Monday. Such plans are costly because they imply the booking of manpower and transport, the purchase of fertilizers, and so forth. But model output received on Friday 12 December indicated that the probability of having rain during the following five days was negligible, and in fact, it was not before 31 December that significant rainfall was observed.

Table 3.8. Example of weather outlook (expected rainfall in millimetres)

December 2003    West        North
Friday 12        <1.0        <1.0
Saturday 13      1.1–5.0     <1.0
Sunday 14        5.1–25.0    5.1–25.0
Monday 15        1.1–5.0     1.1–5.0
Tuesday 16       <1.0        <1.0

Here, it is not the validity of the weather outlook that is questioned. The point to be noted is that no update of the outlook could reach the farmers because the farming centres were closed for the weekend. If, besides being sent by e-mail and posted on the Website, the outlook had been broadcast on radio and television, the updated version would have reached the farmers and appropriate measures could have been taken. To avoid similar incidents, it is advisable to decide on the methods for dissemination of information.

Table 3.7. Part of an agmet bulletin issued by a government service in a tropical country

Agmet bulletin in pentads

Rainfall data: rainfall amounts in millimetres         | Maximum temperature (deg. C)
November 2003   Observed values   Normal values        | July 2003   Observed values   Normal values
1–5             3.6               4.7                  | 1–5         23.6              23.7
6–10            7.1               4.7                  | 6–10        21.9              23.7
11–15           3.6               4.7                  | 11–15       22.1              23.7
Total RR        24.3              14.2                 | Mean MxT    22.5              23.7
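
The masking of extremes by pentad averaging, described in 3.8.6.1.1, can be illustrated with a short sketch. The daily values used here are hypothetical: they were constructed only so that the two pentad means agree with the July figures in Table 3.7 (21.9 and 22.1) and so that 9–12 July lies 2.8°C to 3.0°C below the normal, as stated in the text. The function name pentad_summary and the 2.5°C flagging threshold are also assumptions made for illustration, not part of any bulletin procedure.

```python
from statistics import mean

NORMAL_MXT = 23.7  # normal maximum temperature (deg. C), from Table 3.7

# Hypothetical daily maximum temperatures (deg. C) for 6-15 July,
# constructed so that the pentad means agree with Table 3.7.
daily_mxt = [
    (6, 22.6), (7, 22.6), (8, 22.6), (9, 20.9), (10, 20.8),
    (11, 20.7), (12, 20.9), (13, 23.0), (14, 23.0), (15, 22.9),
]

def pentad_summary(daily, normal, threshold):
    """Summarize consecutive 5-day blocks of (day, value) pairs."""
    rows = []
    for i in range(0, len(daily), 5):
        block = daily[i:i + 5]
        pentad_mean = round(mean(v for _, v in block), 1)
        rows.append({
            "dates": f"{block[0][0]}-{block[-1][0]}",
            "mean": pentad_mean,
            "deviation": round(pentad_mean - normal, 1),
            # Days whose departure from the normal reaches the threshold;
            # these are candidates for a footnote in the bulletin.
            "flagged_days": [d for d, v in block if abs(v - normal) >= threshold],
        })
    return rows

rows = pentad_summary(daily_mxt, NORMAL_MXT, threshold=2.5)
for row in rows:
    print(row)
```

With these assumed values, the pentad deviations stay within 1.8°C of the normal while four individual days are flagged, which is the kind of information a footnote would carry.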


3.8.6.1.4 Seasonal forecast

An extract from a seasonal forecast issued in the first half of October 2003 for a country situated in the southern hemisphere, for summer 2003/2004 (summer in that country is from November to April), reads as follows: “The rainfall season may begin by November. The summer cumulative rainfall amount is expected to reach the long-term mean of 1 400 millimetres. Heavy rainfall is expected in January and February 2004.” This seasonal forecast was published in the newspapers and read on television.

The question is: who is qualified to interpret and use this forecast? Can it be misleading to farmers? To show the problems that such a forecast can create, real data for an agricultural area covering the period October 2003 to January 2004 are presented in Table 3.9, which shows rainfall amounts recorded during the period. Out of the 35.7 mm of rainfall recorded during the second half of November, 35.0 mm fell during the period 16–25 November.

Table 3.9. Real data for the period October 2003 to January 2004

Rainfall amounts in millimetres (mm)

               Oct. 2003   Nov. 2003   Dec. 2003   Jan. 2004
First half     1.8         4.1         5.2         176.4
Second half    12.8        35.7        12.8        154.1

Given that October and the first half of November 2003 were relatively dry and that a significant amount of rainfall was recorded during the second half of November, and noting that the seasonal forecast opted for normal rainfall during summer and that the rainfall season may start in November, the farmers thought that the rainy season had begun. Most of them started planting their crops during the last pentad of November. Unfortunately, the rainfall during the second half of November was a false signal: December was relatively dry. The rainy season started in January 2004.

To prevent seasonal forecasts from falling into the wrong hands, it is not advisable to have them published in the newspapers; these seasonal forecasts must be sent to specialists who are trained to interpret them and they should be supplemented by short-range weather forecasts.

3.8.6.2 How costly should the agmet bulletin be?

Today, agricultural systems in some small and poor countries are in great turmoil. Developed countries have dismantled the safety net, which had provided some protection to the agricultural products of these countries. It is in relation to the bleak future of agriculture in these areas that the question of the cost of issuing agmet bulletins is raised.

Sooner or later, the financial situation in these countries will not be able to sustain the issuing of costly agmet bulletins by local personnel. So agrometeorologists must think carefully about the cost–benefit of the agmet bulletin, especially when developed countries are getting ready to offer their services for free. (And one must ask how long they will continue to be free.)

Already, shipping bulletins, cyclone warnings and aviation forecasts are being offered for free on a global scale by a few developed countries. But how long will these services be free? Sooner or later, the small and poor countries will have to pay for these services. It is very important to keep the cost of the agmet bulletin to a minimum.

ANNEX

Table of the Normal Distribution

Probability Content from -∞ to Z

Z | 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

-------------------------------------------------------------------------------------------------------------------------------------------

0.0 | 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 | 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 | 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 | 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 | 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 | 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 | 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 | 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 | 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 | 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 | 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 | 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 | 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 | 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 | 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 | 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 | 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 | 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 | 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 | 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 | 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 | 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 | 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 | 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 | 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 | 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 | 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 | 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 | 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 | 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 | 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
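
The tabulated values can be reproduced, or obtained for intermediate values of Z, from the error function, since the probability content from -∞ to Z equals 0.5 × [1 + erf(Z/√2)]. The following is a minimal sketch using the standard math.erf function of Python; the function name normal_cdf is chosen here for illustration only.

```python
import math

def normal_cdf(z):
    """Standard normal probability content from -infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Print a few values in the same four-decimal form as the table.
for z in (0.0, 1.0, 1.5, 1.96, 3.0):
    print(f"Z = {z:4.2f}  P = {normal_cdf(z):.4f}")
```

The probability of exceeding a given Z is obtained as 1 - normal_cdf(Z).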
REFERENCES

Bessemoulin, G., 1973: Sur la statistique des valeurs extrêmes. Monographie No. 89. Paris, Météorologie Nationale.
Carruthers, N. and C.E.P. Brooks, 1953: Handbook of Statistical Methods in Meteorology. Publ. No. 538. London, Her Majesty’s Stationery Office.
Coles, S., 2001: An Introduction to Statistical Modeling of Extreme Values. London, Springer.
Draper, N.R. and H. Smith, 1981: Applied Regression Analysis. Second edition. New York, John Wiley and Sons.
Gumbel, E.J., 1959: Statistics of Extremes. New York, Columbia University Press.
Hartkamp, A.D., J.W. White and G. Hoogenboom, 2003: Comparison of three weather generators for crop modeling: a case study for subtropical environments. Agric. Syst., 76:539–560.
Sivakumar, M.V.K., U.S. De, K.C. Simharay and M. Rajeevan (eds), 1998: User Requirements for Agrometeorological Services. Proceedings of an International Workshop held at Pune, India, 10–14 November 1997. Shivajinagar, India Meteorological Department.
Sivakumar, M.V.K., A. Maidoukia and R.D. Stern, 1993: Agroclimatology of West Africa: Niger. Information Bulletin No. 5. Patancheru, ICRISAT.
Sivakumar, M.V.K., C.J. Stigter and D. Rijks (eds), 2000: Agrometeorology in the 21st Century – Needs and Perspectives. Papers from the International Workshop held in Accra, Ghana, 15–17 February 1999. Agric. For. Meteorol., 103(1–2). Special Issue.
Steel, R.G.D. and J.H. Torrie, 1980: Principles and Procedures of Statistics: A Biometrical Approach. New York, McGraw-Hill.
von Storch, H. and F. Zwiers, 1999: Statistical Analysis in Climate Research. Cambridge, Cambridge University Press.
Wijngaard, J.B., A.M.G. Klein Tank and G.P. Konnen, 2003: Homogeneity of 20th century European daily temperature and precipitation series. Int. J. Climatol., 23:679–692.
Wilks, D.S., 1995: Statistical Methods in the Atmospheric Sciences. San Diego, Academic Press.
World Meteorological Organization, 1966: Some Methods of Climatological Analysis (H.C.S. Thom). Technical Note No. 81 (WMO-No. 199), Geneva.
World Meteorological Organization, 1983: Guide to Climatological Practices (WMO-No. 100), Geneva.
World Meteorological Organization, 2000: Agrometeorological Data Management (R.P. Motha, ed.) (WMO/TD-No. 1015), Geneva.
World Meteorological Organization, 2003a: Guidelines on Climate Observation Networks and Systems (N. Plummer, T. Allsopp and J.A. Lopez). WCDMP-No. 52 (WMO/TD-No. 1185), Geneva.
World Meteorological Organization, 2003b: Guidelines on Climate Metadata and Homogenization (E. Aguilar, I. Auer, M. Brunet, T.C. Peterson and J. Wieringa). WCDMP-No. 53 (WMO/TD-No. 1186), Geneva.
World Meteorological Organization, 2004a: Fourth Seminar for Homogenization and Quality Control in Climatological Databases. (WMO/TD-No. 1236), Geneva.
World Meteorological Organization, 2004b: Statistical analysis of results of homogeneity testing and homogenisation of long climatological time series in Germany (G. Müller-Westermeier). In: Proceedings of the 4th Seminar for Homogenization and Quality Control in Climatological Databases (Budapest, October 2003). WCDMP-No. 56 (WMO/TD-No. 1236), Geneva.
World Meteorological Organization, 2004c: Improving agrometeorological bulletins (M.V.K. Sivakumar, ed.). In: Proceedings of the Inter-Regional Workshop, 15–19 October 2001, Bridgetown, Barbados. AGM-5 (WMO/TD-No. 1108), Geneva.
