Methods, Algorithms, and Software
3.1 Modeling and Simulation
M.-J. Cros, F. Garcia,
R. Martin-Clouaire, and J.-P. Rellier
Abstract. Modeling and simulation are major assets in the engineering of systems
of all types, in particular for production management problems. This section provides
an overview of the basic aspects and key issues in modeling and simulation of agricul-
tural production systems. Particular attention is given to the kind of structures, proc-
esses, requirements, and difficulties that have to be dealt with in the modeling of the
interactive processes underlying management activities and biophysical components.
Simulation can be used in combination with optimization techniques to find settings
that optimize expected performance criteria; various such techniques are surveyed.
The issue of calibration and validation of an agricultural production system model is
also addressed.
Keywords. Agricultural production systems, Production management, Decision,
Modeling, Simulation, Optimization.
3.1.1 Introduction
The field of computer simulation is usually combined with the term modeling to
form modeling and simulation or simulation modeling. To model is to create a struc-
ture that captures interesting or noteworthy attributes and processes of an object of
study. A model is a simplified description of a system through a computerized repre-
sentation that enables us to conduct virtual experiments about its behavior. Basically,
computer simulation is the imitation, over some period of interest, of the function-
ing of a system that accepts inputs, produces outputs, and interacts with its environ-
ment. A model of a system in its environment is an abstraction of reality, and as such,
certain details are excluded from it. It consists of a set of instructions, rules, equations
or constraints that generate input/output behavior once processed by a simulation
mechanism.
Simulation modeling is an amazingly diverse topic whose coverage can be found in
just about every discipline. In particular, modeling and simulation are major assets in
the engineering of systems of all types, notably for production management prob-
lems. This section focuses on the modeling and simulation of agricultural production
systems, which constitute one of the main application domains in agricultural engineering.
follow interactions as time goes on. Statistical relations may suffice for some phenom-
ena, whereas mechanistic models of the functioning of some processes may be re-
quired in order to be able to extrapolate outside the range of data available and to per-
form fine-grained simulations, where order and time of occurrence of events matter. A
dynamic model may be continuous (e.g., a system of differential equations) or dis-
crete-event, in which properties change instantaneously at separate instants. The model
of the external environment is another important point of difference between simula-
tion systems, in terms of what is taken into account (e.g., climatic factors) and whether there
are stochastic inputs. Finally, another important discriminant feature concerns the rep-
resentation of the decision-making behavior and the various production constraints to
take into account in decision making and implementation of decisions.
The software systems that, to some extent, simulate farm or field production sys-
tems can be looked at from the standpoint of their use during simulation: users may
interact with the simulator to provide decision inputs as the execution progresses or
may passively watch the results. Examples of the first type of use include on-line deci-
sion-making support (guidance based on real data) by exploiting the predictive power
of biophysical models. Crop problems or requirements (water stress, fertilizer needs)
can be foreseen ahead of time, and an appropriate anticipatory response can be tested
on computer or sought with computer help [1]. Another application where on-line
use of a simulator might be helpful is to study the on-line reaction of an operator in the
course of realizing a particular task: the user can enter decisions as the simulation pro-
ceeds step by step (by day or week, for instance) and can learn to operate better from
their mistakes or poor performance results. In these cases, only a dynamic model, ei-
ther mechanistic or empirical, of the biophysical system is required; no decision-
making model is needed.
Simulation software working in the passive mode is more common. The simplest ex-
amples concern models aimed at budgeting analysis and production system configura-
tion. A static model constituted by simple algebraic relations (a spreadsheet model)
involving decision variables may suffice. In case some parameters are uncertain, prob-
ability distributions can be assumed and the prediction can be done by Monte Carlo
simulation. This computes the probability distributions of the variables connected to
these parameters. In another class of applications, simulation is used as a device for
helping capture and communicate intelligibly and convincingly how things work (i.e.,
interactions). By providing a well-founded, encompassing and shared body of knowl-
edge about the behavior of a production system of interest, this capability can serve as
a virtual experimentation platform for multiparticipant training sessions involving
extension agents and farmers; see for instance the DSSAT family of systems [2].
Thanks to insightful visual presentation, simulation is helpful in giving an overall un-
derstanding about how the system functions and creating objective opinion and con-
sensus about where some important issues lie. The underlying model, in particular as
regards the decision-making aspects, may be more or less complex depending on the
target population. Such systems may also be able to provide a rough estimate of the
economic performance or environmental impact of the management options across a
range of climatic scenarios. They might also be used to support diagnosis and for ex-
planatory purposes. Indeed, they can help in understanding why certain physiological
phenomena occur (Why did this growth problem happen on this crop?) by exhibiting
the chronology of interactions that are responsible for the observed results.
Usually these systems are essentially crop models, and only highly simplified man-
agement situations are dealt with in the form of a set of fixed management options
(e.g., crop species, sowing date, rate of nitrogen fertilization). Systems belonging to
this line of work have very limited capabilities to represent management practices and
to take into account production constraints during a season. They do not allow simu-
lated decisions to be planned and made in relation to the dynamic conditions on the
farm both as regards biophysical states and operational constraints; decisions have to
be made before each simulation, regardless of what might happen and of feasibility
considerations. Consequently, they provide little help in evaluating realistic management
strategies and operating procedures, identifying production resource bottlenecks (e.g.,
peak labor force demand), and designing, by iterative local improvements, new man-
agement solutions adapted to changes in the technical, economic, and social context.
Overcoming these limitations to address management issues with simulation ap-
proaches is still a challenge that requires modeling human activities as an integral part
of the production system model. What is involved in the modeling of human activities
is the subject of the next section. The subsequent section deals with the modeling of
biophysical aspects.
3.1.3 Simulation Mechanisms
As discussed in the preceding section, the modeler’s way of representing systems
may differ depending on the model purpose. The underlying simulation algo-
rithms are specific to the modeling approach chosen. The book by Zeigler et al. [3] is a
good technical reference about the continuous and discrete systems approaches that
are briefly presented here together with the popular spreadsheet approach.
Spreadsheet Simulation
Spreadsheet simulation refers to the use of a spreadsheet as a platform for repre-
senting the simulation model and performing the simulation experiments. For agricul-
tural production system applications (e.g., [4]), this approach is often employed when
the study can be done with a simple static model that represents mathematical and
logical relationships between variables (e.g., feedlot ration analysis). Essentially a
spreadsheet is an electronic grid consisting of rows and columns of cells. Each cell can
be addressed individually and can contain data or a formula. The resolution mecha-
nism propagates any change in a data cell to the formula cells that refer to it directly
or indirectly. Data cells can also contain random numbers, in which case propagation
employs Monte Carlo computation techniques.
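This propagation mechanism can be sketched outside a spreadsheet as well. In the following Python fragment, two uncertain "data cells" feed a gross-margin "formula cell" and Monte Carlo re-evaluation yields the output distribution; all prices, rates, and distributions are illustrative, not taken from a real ration model.

```python
import random
import statistics

# A "spreadsheet" reduced to one formula cell: gross margin of a feedlot
# ration as a function of feed price and daily gain (both uncertain).
# All names and numbers are illustrative, not from a real model.
def gross_margin(feed_price, daily_gain, sale_price=2.5, days=120):
    revenue = sale_price * daily_gain * days  # value of weight gained
    cost = feed_price * 8.0 * days            # 8 kg feed/day assumed
    return revenue - cost

random.seed(42)
# Data cells replaced by random draws: propagation is re-evaluation.
samples = [gross_margin(random.gauss(0.30, 0.03),   # feed price ($/kg)
                        random.gauss(1.2, 0.15))    # daily gain (kg/day)
           for _ in range(10_000)]

print(round(statistics.mean(samples), 1))   # expected margin
print(round(statistics.stdev(samples), 1))  # spread of the distribution
```

The mean and standard deviation summarize the distribution of the formula cell induced by the assumed input distributions, which is exactly what a spreadsheet Monte Carlo add-in computes.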
Spreadsheet simulation is popular thanks to the wide availability of spreadsheet
software, its intuitive interface, and its ease of use. At the same time, this approach is
not recommended for complex models, which require dedicated simulation tools
implementing one of the approaches presented next.
Continuous Systems
The continuous systems approach assumes continuous state variables and time
through differential equation formulation: the rates of change of the state variables are
defined by derivative functions. See [5] for an example of the use of this approach in
modeling an agricultural production system. The simulation mechanism is based on a
numerical integrator [6] such as the Euler or Runge-Kutta method that uses a discre-
tized time base.
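As a minimal sketch of this mechanism, the following Python fragment integrates a logistic growth equation with the Euler method on a fixed time step; the crop biomass interpretation and all parameter values are illustrative.

```python
# Fixed-step Euler integration of a logistic growth model
# dB/dt = r*B*(1 - B/K); r, K, and the step size are illustrative.
def euler(deriv, state, t_end, dt):
    t, trajectory = 0.0, [state]
    while t < t_end:
        state = state + dt * deriv(state)  # one Euler step
        trajectory.append(state)
        t += dt
    return trajectory

r, K = 0.08, 10.0  # relative growth rate (1/day), plateau (t/ha)
growth = lambda b: r * b * (1.0 - b / K)
biomass = euler(growth, 0.5, t_end=120, dt=1.0)
print(round(biomass[-1], 2))  # biomass approaches the plateau K
```

A higher-order scheme such as Runge-Kutta follows the same pattern but combines several derivative evaluations per step to reduce the discretization error.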
System dynamics [7] software implements this approach within a framework sup-
porting graphical representation of the structure of the model. System dynamics mod-
els are centrally concerned with cyclic relationships that represent positive and nega-
tive feedback loops. This approach looks at systems at a very high level of abstraction,
which makes it more appropriate for strategic analysis than for a detailed analysis as
required, for instance, in process re-engineering. The next two approaches are more
often used for this latter purpose.
Discrete Time Systems
Discrete time systems (e.g., [8]) assume a stepwise mode of execution. The dynam-
ics of the system are represented by difference equations or more generally by transi-
tion functions that each express how to update a state variable on the basis of the state
at the previous time step and the inputs (influencing factors). The simulation mecha-
nism relies on a fixed-step iterative algorithm that jumps from one simulation step to
another and computes the next state given the state and input at the current time. All
the model variables are scanned at each step.
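A minimal sketch of this fixed-step mechanism, here applied to a toy daily soil water balance (all quantities and bounds illustrative):

```python
# Discrete time simulation: every state variable is updated once per
# step by a transition function of the previous state and the inputs.
def step(soil_water, rain, et, capacity=150.0):
    updated = soil_water + rain - et          # difference equation
    return max(0.0, min(capacity, updated))   # bounded store (mm)

rain_series = [0, 12, 0, 0, 30, 0, 5]   # daily rainfall inputs (mm)
et_series   = [4, 3, 5, 5, 2, 4, 4]     # daily evapotranspiration (mm)

state = 80.0
history = [state]
for rain, et in zip(rain_series, et_series):  # fixed-step iteration
    state = step(state, rain, et)
    history.append(state)
print(history[-1])
```

Every variable is recomputed at every step whether or not anything interesting happened, which is the key contrast with the discrete event approach below.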
The finite state automaton formalism is subsumed by discrete time systems. Cellu-
lar automata are particular examples widely used for studying physical spreading phe-
nomena (e.g., infestation propagation) or group phenomena (e.g., population dynam-
ics).
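As a toy illustration of the cellular automaton idea, the following sketch propagates an infestation along a one-dimensional row of plants; the neighborhood rule and grid size are illustrative.

```python
# 1-D cellular automaton as a toy infestation-spread model: an
# infested cell (1) infests its immediate neighbors at the next step.
def spread(grid):
    n = len(grid)
    return [1 if grid[i] or grid[max(i - 1, 0)] or grid[min(i + 1, n - 1)]
            else 0
            for i in range(n)]

field = [0] * 11
field[5] = 1                 # single infested plant in mid-field
for _ in range(3):           # three synchronous update steps
    field = spread(field)
print(field)                 # infestation has spread 3 cells each way
```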
Discrete Event Systems
In the discrete event system approach [9], the modeling of the dynamic behavior of
the system is similar to that used in discrete time systems: transition functions specify
local changes. In addition, this approach relies on the identification of events that
cause the transitions to be fired at their occurrence time. The important difference
from discrete time systems is in the mechanism of event processing, which jumps from one
interesting point in time (when an event occurs) to another. It scans only time points
and variables concerned with the current event. As long as no events occur, no state
changes are made. The simulation clock is event-driven. Discrete event simulation
works by maintaining a list of events (agenda) sorted by their scheduled event times.
Events are treated by taking them from the event list in sequential order and executing
the associated processes, which produce state transitions. Processing events results in
new events being scheduled and inserted into the event list, as well as events being
canceled or removed from the event list. Events can also be caused by the environ-
ment, which is not under the control of the system itself.
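The agenda-based mechanism can be sketched with a priority queue; the event names and durations below are illustrative.

```python
import heapq

# Minimal discrete event loop: the agenda is a priority queue of
# (time, event) pairs; the clock jumps from one event to the next.
agenda = []
heapq.heappush(agenda, (0.0, "start_sowing"))
heapq.heappush(agenda, (3.0, "rain"))  # external (environment) event

log = []
while agenda:
    clock, event = heapq.heappop(agenda)  # next scheduled event
    log.append((clock, event))
    if event == "start_sowing":
        # processing an event may schedule new ones
        heapq.heappush(agenda, (clock + 2.0, "end_sowing"))
    elif event == "rain":
        # an external event triggers a reaction scheduled for later,
        # e.g., resuming field work once the field has dried
        heapq.heappush(agenda, (clock + 1.5, "resume_work"))
print(log)
```

Between two entries of the log no state change occurs, so simulated time advances in irregular jumps rather than fixed steps.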
In his management task, the farmer has to deal with several interacting dynamics:
those of the biophysical components, of the environment of the system of interest (ex-
ternal events), and of the unfolding of the technical actions resulting from the decisions
he has made. Most actions are durative; their execution may be disturbed, interrupted,
and even never finished. The farmer has to combine planned and reactive behaviors in
order to organize the work as a function of known and exploitable regularities and
adapt to contingencies as they occur. Therefore, a tight and timely coupling between
sensing and decision making is of primary importance. Managing an agri-
cultural production process is a dynamic process in which plan revision and execution
must be interwoven because the external environment changes dynamically beyond
the control of the farmers and because relevant aspects are revealed incrementally.
The decision-making behavior of farmers in their management task has been the
subject of different kinds of investigation, but the mental processes that intervene be-
tween stimulus and response, and by which that behavior is exhibited, are still largely
unknown. Nevertheless, the concept of management strategy is often used as a means
to express beforehand the farmer’s management behavior. These management strate-
gies do not pretend to reproduce what is happening in the manager’s head. More mod-
estly they attempt to enable the derivation of what the farmer does depending on the
current state of affairs. A management strategy can be seen informally as a manually
elaborated construct that specifies a flexible plan together with its context-responsive
adaptations and the implementation details necessary to constrain the stepwise deter-
mination and execution of the actions to perform. Due to evolving and un-
predictable circumstances the plans are flexible with respect to the temporal organiza-
tion of the constituent activities. The commitments to particular activities are delayed
until run-time conditions are known. In particular, what can be executed is strongly
constrained by the availability of resources and state-dependent requirements on the
operations suggested by the plan.
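Such a strategy can be represented, in a highly simplified sketch, as an ordered set of state-conditioned decision rules whose commitments are resolved only at run time; all state fields, thresholds, and rule names below are hypothetical.

```python
# A management strategy as explicit, state-conditioned decision rules:
# commitment to an activity is delayed until run-time conditions are
# known. Thresholds and state fields are illustrative.
def irrigation_rule(state):
    # applicable only in its temporal window and if resources are free
    if not (150 <= state["day"] <= 240):
        return None
    if state["soil_water"] < 60 and state["laborers_free"] > 0:
        return "irrigate"
    return None

def harvest_rule(state):
    if state["grain_moisture"] < 0.15 and not state["rain_forecast"]:
        return "harvest"
    return None

def decide(state, rules=(harvest_rule, irrigation_rule)):
    # rules are scanned in priority order; the first applicable fires
    for rule in rules:
        action = rule(state)
        if action:
            return action
    return "wait"

state = {"day": 200, "soil_water": 45, "laborers_free": 1,
         "grain_moisture": 0.22, "rain_forecast": False}
print(decide(state))
```

At each simulation step the current biophysical and organizational state is fed to `decide`, so the same strategy produces different action sequences under different weather and resource histories.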
The modeling of human activities should not be of a black-box type when manage-
ment aspects are central to the study carried out with simulation. How can we deter-
mine that a management strategy is likely to succeed or can be improved if there is no
explicit representation of it? A model of human activities requires a computational and
declarative representation of the activities, execution processes, information, re-
sources, constraints, and behavior involved in the production process. The model
should enable us to determine the impact of changes on all parts of the production
process. For example, if some activities are inserted, deleted, modified, or coordinated
differently with others, the model must be able to represent how this will affect the
resource use, economic performance, and other aspects of interest. The model should
make explicit what is planned, what might happen, and, more generally, any informa-
tion and knowledge necessary in managing and operating the production process.
The agent modeling [12, 13] and enterprise modeling [14] fields of artificial intelli-
gence provide a set of inspiring and useful formalisms for representing such knowl-
edge structures and the associated processing mechanisms. See [15-21] for agricultural pro-
duction system examples involving some modeling of human activities.
Stochastic weather generators are playing an increasingly important role in deci-
sion support systems in conjunction with simulation models of agricultural production
systems. They are generally used to estimate, for every management strategy, the
probability distribution of different output variables of interest (such as yield, eco-
nomic margin, and labor hours) in order to compare them and to select the best man-
agement strategy. Stochastic weather generators can also be used in real time. Given
the observed data at the current time and a set of possible future weather data pro-
duced by the weather generator, the distribution of simulated crop yields, for instance,
can be obtained.
Several stochastic weather generators, such as WGEN [24] or LARS-WG [25, 26],
are now available and can be used in agricultural production models. They differ in
structure and complexity. However, these generators generally work similarly by es-
timating the model parameters on the basis of some observed daily weather data for a
given site, and then by producing daily weather data, using a pseudo-random number
generator.
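A toy generator in that spirit, with a first-order Markov chain for rainfall occurrence and exponentially distributed amounts, can be sketched as follows; the transition probabilities and mean amount are illustrative and would normally be estimated from observed daily records for the site.

```python
import random

# Toy WGEN-style daily rainfall generator: occurrence follows a
# two-state first-order Markov chain, amounts an exponential law.
# All parameter values are illustrative, not fitted to any site.
P_WET_GIVEN_WET = 0.65   # P(rain today | rain yesterday)
P_WET_GIVEN_DRY = 0.20   # P(rain today | dry yesterday)
MEAN_AMOUNT = 6.0        # mean rainfall on a wet day (mm)

def generate(n_days, seed=0):
    rng = random.Random(seed)   # pseudo-random number generator
    wet, series = False, []
    for _ in range(n_days):
        p = P_WET_GIVEN_WET if wet else P_WET_GIVEN_DRY
        wet = rng.random() < p
        series.append(rng.expovariate(1.0 / MEAN_AMOUNT) if wet else 0.0)
    return series

year = generate(365)
wet_days = sum(1 for r in year if r > 0)
print(wet_days)  # roughly 365 times the stationary wet-day frequency
```

Repeating `generate` with different seeds yields the ensemble of synthetic climatic series needed to estimate output distributions by simulation.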
3.1.7 Simulation Optimization
The problem of optimizing an agricultural production system consists in finding the
management strategy that yields the best expected performance. When a strategy is
defined as a sequence of decision rules, this optimization problem can be seen as a
control problem (see Section 3.2), for which dynamic programming methods exist
[27]. In this approach simulation models can be used for estimating the rewards [28]
or the probabilities of the state transitions [29] when a stochastic dynamic program-
ming model is used. However, dynamic programming methods are often inappropriate
in cases of large state and decision spaces, despite some promising improvements that
have been obtained recently with the reinforcement learning approach. This approach,
also called neuro-dynamic programming, directly approximates the solution of the
dynamic programming procedure during simulation [30].
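A minimal sketch of this idea is one-step Q-learning on a toy two-state problem, in which the value function of the dynamic programming problem is approximated directly from simulated transitions; the stand-in "simulator", rewards, and learning parameters are all illustrative.

```python
import random

# One-step Q-learning on a tiny two-state, two-action problem.
rng = random.Random(0)
states, actions = [0, 1], [0, 1]
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma = 0.1, 0.9  # learning rate, discount factor

def simulate(s, a):
    # stand-in for one step of a production-system simulator:
    # action 1 in state 0 pays off and moves the system to state 1
    reward = 1.0 if (s == 0 and a == 1) else 0.1
    next_s = 1 if a == 1 else 0
    return reward, next_s

s = 0
for _ in range(5000):
    a = rng.choice(actions) if rng.random() < 0.2 else \
        max(actions, key=lambda a_: Q[(s, a_)])   # epsilon-greedy
    r, s2 = simulate(s, a)
    # temporal-difference update toward the Bellman target
    target = r + gamma * max(Q[(s2, a_)] for a_ in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    s = s2

print(max(actions, key=lambda a_: Q[(0, a_)]))  # learned action in state 0
```

The same update rule scales, in principle, to simulated crop-management transitions, although large state spaces then require a function approximator in place of the table `Q`.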
A more flexible and realistic formulation of the optimization problem consists in
searching, with the use of simulation, for the best values for the parameters of a pre-
defined strategy. In that case the simulation model is considered as a black-box func-
tion where the vector of strategy parameters θ = (θ1, …, θp) is taken as input variable
and where the output is the objective function J. In most simulation models of
agricultural production systems, the objective function J also depends on the current
climatic series, which has to be considered as an unknown and uncontrollable random
variable ξ. Optimizing a strategy thus consists in searching for the set of parameters θ*
that maximize the expected value of the objective function J (assuming J models a
benefit function) according to the random climatic series. In mathematical terms:
θ* = argmax E(J(θ,ξ)), with θ ∈ Θ (1)
When the value domain Θ is in R^p (the p-dimensional Cartesian product of reals),
different efficient methods have been developed for solving this stochastic simulation
optimization problem, which is among the most difficult problems of mathematical
programming [31-33]. Until the beginning of the 1990s, these methods involved
When the ξi are fixed, the objective function J thus becomes a deterministic func-
tion of the input variables θ, and a range of classical optimization algorithms can be
used. Note however that a good estimate of E(J(θ,ξ)) may require a large number N of
samples and thus multiple simulation run replications. This technique, called sample
path optimization, thus converts a stochastic problem into a deterministic one. It was
originally developed for solving continuous parameter simulation optimization prob-
lems, and has recently been studied in a variety of contexts.
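A minimal sketch of sample path optimization: freeze N climatic draws, average the objective over them, and hand the resulting deterministic function to any optimizer (a plain grid search here); the objective function is a toy stand-in for a simulation model, and all names are illustrative.

```python
import random

# Sample path optimization: with fixed samples xi_1..xi_N, the
# expectation E[J(theta, xi)] becomes a deterministic average.
def J(theta, xi):
    # toy stand-in for the simulated benefit of a strategy
    # parameter theta under climatic draw xi
    return -(theta - 0.6 * xi) ** 2 + xi

rng = random.Random(1)
xi_samples = [rng.uniform(0.5, 1.5) for _ in range(200)]  # fixed paths

def J_bar(theta):
    # deterministic once the xi_i are fixed
    return sum(J(theta, xi) for xi in xi_samples) / len(xi_samples)

candidates = [i / 100 for i in range(0, 151)]   # theta grid on [0, 1.5]
theta_star = max(candidates, key=J_bar)
print(theta_star)
```

For this toy objective the optimizer of the averaged function lies near 0.6 times the mean of the fixed samples; with a real simulation model each evaluation of `J_bar` costs N simulation runs, which is why N must balance estimation accuracy against computing time.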
More recently, stochastic versions of branching methods for discrete global optimi-
zation problems have been developed including, in particular, a stochastic version of
the branch-and-bound method and the nested partition method, which hierarchically
partitions the search space and evaluates the partitions by random sampling.
Evolutionary algorithms and other standard deterministic optimization methods
have all been applied to agricultural simulation models in recent years [34], where
they have shown good performance. However, most of these optimization problems
consist in determining optimal postseason management decisions, based on complete
knowledge of the weather (e.g., [35-39]). Conversely, very few optimization methods
have been applied so far to agricultural systems for optimizing the expected value of
an objective function, as described above. See, for instance, [40] where the Kiefer-
Wolfowitz stochastic gradient method is used to derive the best values of some pa-
rameters involved in a grazing management strategy, or [41] where irrigation strate-
gies are optimized by using the Nelder-Mead simplex algorithm. The easy availability
of stochastic weather generators should lead to a fast development of this stochastic
optimization approach.
3.1.8 Calibration and Validation
Calibration involves estimating the values of various parameters in the model struc-
ture on the basis of real world data. Such a task is commonly accomplished with spe-
cialized statistical computer programs designed for just such purposes. In some cases
model calibration is also performed on-line by exploiting data acquired from real field
measurements while the model is in use. The errors between the model predictions
and the subsequent measurements are fed back for tuning the model. This procedure
is very useful when little data are available for initial model
calibration or when the modeled process evolves in time by itself. Model calibration
can also be accomplished by using values of parameters from models estimated for
another context that is similar to the one being considered; this strategy is referred to
as importing model parameters and should be employed only by experienced practi-
tioners.
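As a minimal sketch of calibration by error minimization, the following fragment fits two parameters of a toy linear yield-response model to a handful of observations by searching for the smallest sum of squared errors; the model form, data, and parameter grid are illustrative (a statistical package would normally do this, e.g. by regression).

```python
# Calibration: estimate model parameters by minimizing the squared
# error between predictions and observed data (all values illustrative).
observations = [(0, 2.1), (50, 4.0), (100, 5.8), (150, 8.1)]  # (dose, yield)

def model(dose, baseline, response):
    return baseline + response * dose

def sse(baseline, response):
    return sum((model(d, baseline, response) - y) ** 2
               for d, y in observations)

# brute-force search over a coarse parameter grid
best = min(((b / 10, r / 1000)
            for b in range(0, 51)        # baseline in [0, 5.0]
            for r in range(0, 101)),     # response in [0, 0.1]
           key=lambda p: sse(*p))
print(best)
```

The same error-minimization principle underlies on-line calibration: each new measurement updates the error term, and the parameters are nudged toward values that reduce it.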
Classically, validation of a model involves experiments comparing model behavior
against actual measurements. Obviously this approach is feasible only in cases of
models involving a relatively small number of variables and parameters. Validation
becomes increasingly difficult for an agricultural production model which is expected
to provide realistic estimates of different dynamic aspects varying over a period of
several months. The extent of variation of most input variables (weather and manage-
ment) is large and precludes any systematic exploration. Consequently, it is impossible
for this kind of simulation model to be validated over its entire domain of application
[42]. Moreover, records from the observation of the natural system (the production
system and weather) are scarce and incomplete with respect to the set of aspects listed
above and the time frame of interest. Although these data might be used to perform
standard statistical validations of some parts of the model there is no assurance that the
assembled model will necessarily behave acceptably well. Some errors may be intro-
duced through linking model components at a higher level. The various parts of the
model may be unequally checked and some interactions may not be predictable.
Hence, the only possible approach is a subjective one in which scientists and ex-
perts in the agricultural production domain are provided with simulations of cases fa-
miliar to them and asked if the model behavior is consistent and reasonably accurate.
Validation then consists in checking that the results of simulation are in agreement
with those expected by the knowledgeable people involved in the validation process.
Furthermore, validation has to form part of the development of the whole simulation
system; the results of evaluation provide feedback to make corrective changes to the
biophysical model. Ultimately validation must provide substantiation that a computer-
ized model within its domain of applicability possesses a satisfactory range of accu-
racy consistent with the intended application of the model.
The validity of the model is defined in terms of its ability to reproduce faithfully
and accurately the biophysical system behavior induced by a particular management
strategy, and of its consistency and sensitivity across different weather patterns. Strategies are
evaluated with respect to user-defined criteria that are valued as a function of output
results provided by the model. Typically, the criteria are concerned with the dynamics
and timing of key events or the magnitude or trend of relevant quantities. More spe-
cifically the kinds of outputs that might be used in evaluation include:
• a time series of daily values of model variables;
• the date and duration of operations;
• a time series of daily values of user-defined variables, that is, variables defined
as functions of model variables for inspection purposes or for synthesizing deci-
sion-relevant indicators;
• a calendar of key events, that is, the date of occurrence of decision-relevant
situations;
• histograms of variable values simulated for different climatic scenarios.
The analysis of outputs obtained with a particular farm configuration and manage-
ment options must take into account the weather pattern assumed for the simulation.
3.1.9 Conclusion
Thanks to their ability to model complex things as they are, simulation models of-
fer great potential for the study and design of agricultural production systems and for
education and training. Simulation, in the spirit advocated in this section, enables us to
evaluate management practices and understand why and in what cases they may per-
form acceptably well or fail. The approach can be used to provide more compelling
evidence to demonstrate beforehand that the production processes are in compliance
with norms of social acceptability. Simulators can make insightful and reliable projec-
tions in a range of external environment scenarios, which nobody would even attempt
due to the complexity of the system. In particular, they can support uncertainty analy-
sis (how robust is a management strategy?), timing analysis (when does a particular
phenomenon occur, in absolute time or relative to another phenomenon?), and
resource use analysis (what are the critical needs?). Simulation can stimulate farmers’
thoughts about the management problems and potentially augment their innate knowl-
edge-handling abilities. From the point of view of the model developers, the activity of
modeling may reveal new ways of thinking about the decision domain and help par-
tially formalize aspects of decision making.
Simulation modeling is no longer limited by computational difficulties thanks to
computing power advances. Comprehensive systems that integrate all the aspects dis-
cussed in this section are, however, just starting to emerge, in particular for the model-
ing of human activities. Developing integrative farm-level models is new and little
methodological support is available to support the learning/discovery process: how to
choose what to feed to the simulator (what to experiment) and how to analyze outputs
when inputs involve management strategies. Some aspects of agricultural production,
such as precision agriculture and beyond farm-level interactions, are in need of further
investigation with the simulation modeling approach. The modeling task could be
eased by promoting reuse of knowledge and software modules and developing generic
frameworks dedicated to agricultural production systems.
Simulation modeling of agricultural production systems requires a multidisciplinary
collaboration between specialists of various domains including crop and animal sci-
ences, farming systems, ergonomics of agricultural production systems, agent model-
ing, and software engineering.
This section has focused on the modeling and simulation of an isolated agricultural
production system. There is, however, growing interest in using computer simulation
to explore and clarify the relationship between small-scale features observable at the
level of the individual production system and large-scale, societal (or macro) phenom-
ena that emerge from the micro-level interactions. This is the subject of multi-agent
simulation that has its roots in distributed artificial intelligence [43]. Multi-agent simu-
lation approaches [44] have great potential to study agricultural problems involving
resource sharing (e.g., irrigation water) or environmental consequences of the activi-
ties of a community.
References
1. Cabelguenne, M., C. A. Jones, and J. R. Williams. 1995. Strategies for limited
irrigation of maize in south-western France: A modeling approach. Trans. ASAE
38: 507-511.
2. Jones, J. W., G. Y. Tsuji, G. Hoogenboom, L. A. Hunt, P. K. Thornton, P. W.
Wilkens, D. T. Imamura, W. T. Bowen, and U. Singh. 1998. Decision support
system for agrotechnology transfer: DSSAT v3. Understanding Options for
Agricultural Production, eds. G. Y. Tsuji, G. Hoogenboom, and P. K. Thornton,
157-177. Dordrecht, The Netherlands: Kluwer Academic Publishers.
3. Zeigler, B. P., H. Praehofer, and T. G. Kim. 2000. Theory of Modeling and
Simulation. San Diego, CA: Academic Press.
4. Coléno, F. C., and M. Duru. 1999. A model to find and test decision rules for
turnout date and grazing area allocation for a dairy cow system in spring.
Agricultural Systems 61: 151-164.
5. Guerrin, F. 2001. MAGMA: A simulation model to help manage animal wastes at
the farm level. Computers and Electronics in Agriculture 33: 35-54.
6. Burden, R. L., and J. D. Faires. 1989. Numerical Analysis. Boston, MA: PWS-
KENT Publishing Company.
7. Roberts, N., D. F. Andersen, R. M. Deal, M. S. Grant, and W. A. Shaffer. 1983.
Introduction to Computer Simulation: A System Dynamics Modeling Approach.
Boston, MA: Addison-Wesley.
8. Cros, M.-J., M. Duru, F. Garcia, and R. Martin-Clouaire. 2001. Simulating
rotational grazing management. Environment International 27: 139-145.
9. Rellier, J.-P. 1992. Prediction and design problems in crop management. Proc.
12th International Conference on Artificial Intelligence, Expert Systems and
Natural Language 2: 739-748.
10. Aubry, C., F. Papy, and A. Capillon. 1998. Modeling of decision-making
processes for annual crop management. Agricultural Systems 56: 45-65.
11. Papy, F. 2000. Farm models and decision support: A summary review. Research
on Agricultural Systems: Accomplishments, Perspective and Issues, eds. J.-P.
Colin, and E. W. Crawford, 89-107. Huntington, NY: Nova Science Publishers.
12. Wooldridge, M. J., and N. R. Jennings. 1995. Intelligent agents: Theory and
practice. Knowledge Engineering Review 10(2): 115-152.
13. De Giacomo, G., Y. Lespérance, and H. Levesque. 2000. ConGolog, a concurrent
programming language based on the situation calculus. Artificial Intelligence
121: 109-169.
14. Fox, M. S., and M. Gruninger. 1998. Enterprise modeling. AI Magazine (Fall 1998):
109-121.
15. Papy, F., J.-M. Attonaty, C. Laporte, and L.-G. Soler. 1988. Work organization
simulation as a basis for farm management advice. Agricultural Systems 27: 295-
314.
16. Sherlock, R. A., and K. Bright. 1999. An object framework for farm system
simulation. Proc. MODSIM ’99, eds. L. Oxley, F. Scrimgeour, and A. Jakeman,
3: 783-788. The Modeling and Simulation Society of Australia and New Zealand
Inc.
17. Martin-Clouaire, R., and J.-P. Rellier. 2000. Modeling needs in agricultural
decision support systems. Electronic Proc. of the XIVth CIGR Memorial World
Congress.
18. Bergez, J.-E., P. Debaeke, J.-M. Deumier, B. Lacroix, D. Leenhardt, P. Leroy,
and D. Wallach. 2001. MODERATO: An object-oriented decision tool for
designing maize irrigation schedules. Ecological Modelling 137: 43-60.
19. Jeannequin, B., R. Martin-Clouaire, M. Navarrete, and J.-P. Rellier. 2003.
Modeling management strategies for greenhouse tomato production. Proc. of
CIOSTA-CIGR V Congress, 506-513.
20. Martin-Clouaire R., and J.-P. Rellier. 2003. A conceptualization of farm
management strategies. Proc. of EFITA-03 Conference, 719-726.
21. Cros, M.-J., M. Duru, F. Garcia, and R. Martin-Clouaire. 2004. Simulating
management strategies: the rotational grazing example. Agricultural Systems 80:
23-42.
22. Boote, K. J., J. W. Jones, and G. Hoogenboom. 1998. Simulation of crop growth:
CROPGRO model. Agricultural Systems Modeling and Simulation, eds. R. Peart,
and R. Curry, 651-691. New York, NY: Marcel Dekker Inc.
23. Cros, M.-J., M. Duru, F. Garcia, and R. Martin-Clouaire. 2003. A biophysical
dairy farm model to evaluate rotational grazing management strategies.
Agronomie 23(2): 105-122.
24. Richardson, C. W., and D. A. Wright. 1984. WGEN: A model for generating daily
weather variables. U.S. Department of Agriculture, Agricultural Research
Service, ARS-8.
25. Racsko, P., L. Szeidl, and M. Semenov. 1991. A serial approach to local
stochastic weather models. Ecological Modelling 57: 27-41.
26. Semenov, M. A., R. J. Brooks, E. M. Barrow, and C. W. Richardson. 1998.
Comparison of the WGEN and LARS-WG stochastic weather generators in
diverse climates. Climatic Research 10: 95-107.
27. Kennedy, J. O. 1986. Dynamic Programming: Application to Agricultural and
Natural Resources. London, UK: Elsevier Applied Science.
28. Epperson, J. E., J. E. Hook, and Y. Mustafa. 1993. Dynamic programming for
improving irrigation scheduling strategies of maize. Agricultural Systems 42: 85-
101.
29. Bergez, J.-E., M. Eigenraam, and F. Garcia. 2001. Comparison between dynamic
programming and reinforcement learning: A case study on maize irrigation man-
agement. Proc. 3rd European Conference for Information Technology in
Agriculture, Food and the Environment, ed. J. Steffe, (2): 343-348. France: Agro
Montpellier.
30. Sutton, R. S., and A. G. Barto. 1998. Reinforcement Learning: An Introduction.
Cambridge, MA: MIT Press.
31. Andradóttir, S. 1998. Simulation optimization. Handbook of Simulation.
Principles, Methodology, Advances, Applications, and Practice, ed. J. Banks,
307-334. New York, NY: John Wiley and Sons, Inc.
32. Swisher, J. R., P. D. Hyden, S. H. Jacobson, and L. W. Schruben. 2000. A survey
of simulation optimization techniques and procedures. Proc. of the 2000 Winter
Simulation Conference, 119-128.
33. Fu, M. C. 2001. Simulation optimization. Proc. of the 2001 Winter Simulation
Conference, 53-61.
34. Mayer, D. G., J. A. Belward, and K. Burrage. 1998. Optimizing simulation
models of agricultural systems. Annals of Operations Research 82: 219-231.
35. Li, M., and R. S. Yost. 2000. Management-oriented modeling: Optimizing
nitrogen management with artificial intelligence. Agricultural Systems 65: 1-27.
36. Mayer, D. G., J. A. Belward, and K. Burrage. 1996. Use of advanced techniques
to optimize a multi-dimensional dairy model. Agricultural Systems 50: 239-253.
37. Mayer, D. G., J. A. Belward, and K. Burrage. 1998. Tabu search not an optimal
choice for models of agricultural systems. Agricultural Systems 58: 243-251.
38. Mayer, D. G., J. A. Belward, and K. Burrage. 2001. Robust parameter settings of
evolutionary algorithms for the optimisation of agricultural systems models.
Agricultural Systems 69: 199-213.
39. Parsons, D. J. 1998. Optimizing silage harvesting plans in a grass and grazing
simulation using the revised simplex method and a genetic algorithm.
Agricultural Systems 56: 29-44.
40. Cros, M.-J., M. Duru, F. Garcia, and R. Martin-Clouaire. 2001. Simulation
optimization of grazing management strategies. Proc. 3rd European Conference
3.2 Control and Optimization
3.2.1 Introduction
Control is a subset of automation in the wider sense. Automation aims at perform-
ing operations without constant human effort. This section provides an overview of
control methods and algorithms, including dynamic optimization, in agricultural appli-
cations with a focus on continuous processes. There is another main stream in automa-
tion that deals with discrete operations like harvesting, packing, and internal transport.
This field is characterized by the use of programmed sequences and programmable logic controllers (PLCs) in combination with suitable on/off sensors and simple actuators. Though important and widespread, this special field is not covered in this section.
There are two major fields of applications of control in the agricultural area: me-
chanics and mechatronics (power, implements, field operations, robotics) and proc-
essing (drying, storage, greenhouse cultivation).
Despite differences in time scales and targets, there are many similarities in ap-
proach. Therefore, in this section we concentrate on generic concepts. Our aim is to
present an overview of the various controller paradigms, in order to help the reader in
making the proper choice in practical situations. Section 4.1 Automation and Control
of this handbook provides an overview of components and techniques of practical con-
trol systems.
3.2.2 The Purpose of Control
Before designing or purchasing any control system, it is important to ask what the
main goal or objective of the controlled system should be. The goal is a very central
issue. Control can be defined as “to make the system behave as we desire.” The desire
expresses the objective. Possible control objectives are:
• To provide precision, accuracy and quality. Keeping things constant and repeat-
able is a major derived objective of this overall goal.
• To provide comfort and relieve people of monotonous work. This includes tasks
that without control would be hard to do.
• To ensure safety, and to avoid risks. The control system is used to provide
alarms and to stabilize a system within safety bounds.
• To prevent waste and abuse of valuable resources. The control system allows
operations to be performed near the margins, thus saving energy, preventing
quality loss, and making the best use of scarce resources. Such goals often ask
for more than just set-point control: constraint satisfaction and optimization
methods are important here.
• To enhance the economic result of the process, in the wide sense. Often, this en-
tails a trade-off between purely economic factors and constraints due to envi-
ronment and product quality. This may include savings on labor. It is obvious
that optimization methods play a major role here.
• To design systems that without control would not be possible, e.g., precision
farming. The control in these applications is not just an additional tool, but is an
essential element of the system as a whole.
3.2.3 Systems, Signals, and Models
The first step in designing or choosing a controller for a system is to define the sys-
tem boundaries. Once the system boundaries have been defined, it is necessary to
specify how the system interacts with its environment. This process can be viewed as
modeling. We need a model in order to know how the system responds to actions that
we impose upon it, so that we can make the system behave as desired. In practice,
sometimes successful control can be achieved without explicit models, by trial and
error. Yet, modeling is a very central issue in the science-based design and develop-
ment of controllers.
From the point of view of modeling for control purposes, physical streams such as
energy and mass exchange can all be viewed as information streams or signals, i.e.,
entities that vary in time. A dominant approach in control theory is to distinguish be-
tween input signals and output signals. Inputs, or forcing variables, are exerted upon
the system from the outside. Outputs represent the response of the system. Inputs can
be further subdivided into unobserved disturbance inputs (v), observed disturbance
inputs (d), and control inputs (u) (see Figure 1). Outputs can be subdivided into ob-
served outputs (y) and outputs of interest (q). These are not observed, but can be calcu-
lated from known variables of the system. From the point of view of the system dy-
namics, the subdivision is not important, but from the point of view of controller de-
sign it is, because each control input is associated with an actuator, and each observed
variable requires a sensor. In practice, actuators and sensors determine the cost of the
control system, and are therefore subject to trade-off between costs and achievable
performance.
Figure 1. Input and output signals of a system: unobserved disturbance inputs v(t), observed disturbance inputs d(t), control inputs u(t), observed outputs y(t), and outputs of interest q(t).

Table 1. Standard form for continuous input-output (I/O) models and for state space models (SS). The non-linear and linear differential equation forms are given, as well as the transfer function form in the linear case.

Non-linear I/O:   f{y, dy/dt, d^2 y/dt^2, ...} = g{u, du/dt, d^2 u/dt^2, ...}
Non-linear SS:    dx/dt = f{x, u, t},   y = g{x, u, t}
Linear I/O:       a_n d^n y/dt^n + a_{n-1} d^{n-1} y/dt^{n-1} + ... + a_1 dy/dt + a_0 y
                  = b_q d^q u/dt^q + b_{q-1} d^{q-1} u/dt^{q-1} + ... + b_1 du/dt + b_0 u
Linear SS:        dx/dt = Ax + Bu,   y = Cx + Du
Linear TF (I/O):  G_{u→y} = (b_q s^q + b_{q-1} s^{q-1} + … + b_1 s + b_0) / (a_n s^n + a_{n-1} s^{n-1} + … + a_1 s + a_0)
Linear TF (SS):   ȳ = (C(sI − A)^{-1} B + D) ū
becomes irrelevant. State variables can have a physical meaning, but they can also be
virtual mathematical constructs. It should be noted that there may exist several state
space representations that yield the same input-output behavior, showing that the state
is not unique. The state space approach is very powerful, in particular in the frame of
design and optimization.
Table 1 presents the common form of continuous models in I/O form and in state
space form, both for the general, non-linear case and for the linear case. In this table
u(t) is the input and y(t) is the output. The I/O equations are given for a single input
and single output only, but the equations can easily be expanded to multiple inputs and
multiple outputs. In the state space representation u(t) is an m-dimensional input and
y(t) is a p-dimensional output, whereas x(t) is an n-dimensional state vector. For linear
models, the transfer function form (TF) is also frequently used. The transfer function
shows how the Laplace-transformed output ȳ depends upon the Laplace-transformed input ū. There are as many transfer functions as there are input-output combinations. The expression between brackets in the state space case is a p × m matrix. The linear
I/O differential equation form, the transfer function form, and the linear state-space
form can be transformed in one another. This means that controller designs based on
one of these forms can also be made applicable to any of the other forms.
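As an illustration of this interchangeability, the sketch below converts a state space model to its transfer function with SciPy's `ss2tf`. The second-order system used here is a made-up example, not one from the text:

```python
import numpy as np
from scipy.signal import ss2tf

# Hypothetical second-order system  d2y/dt2 + 3 dy/dt + 2 y = u,
# written in state space form with state x = [y, dy/dt]:
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

# Convert (A, B, C, D) to transfer function coefficients: this computes
# the same object as C (sI - A)^-1 B + D in Table 1.
num, den = ss2tf(A, B, C, D)
print(num)  # numerator coefficients, highest power of s first
print(den)  # denominator coefficients: s^2 + 3 s + 2
```

The returned denominator coefficients are exactly the a-coefficients of the linear I/O differential equation form, showing that the two descriptions carry the same information.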
3.2.4 PID Control
Before introducing a generic framework to control problems, it is appropriate to
present briefly the well-known classical Proportional-Integral-Derivative (PID) con-
troller. This controller uses a single input and a single output (SISO). The goal is to
reject disturbances, and to track a reference trajectory r(t), which in many applications
is just a constant set-point r. The controller is given by:
Figure 3. Signal flows for a system with feed-forward and feedback controller.

u(t) = c + K [ ε(t) + (1/τ_I) ∫ ε(t) dt + τ_d dε(t)/dt ]    (1)
The main idea is to check whether the output deviates from its reference value, and
to correct the input u in proportion to the error ε(t) = r(t) – y(t), using controller gain
K. This idea of feedback (FB) is central to control. If the error is zero, the input is at its
nominal value c. Because load variations in practice may require the steady input to be adjusted in order to maintain the output, an integral action is added to achieve this automatically. The derivative action is introduced to speed up the controller response to fast disturbances.
Figure 3 shows the signal flow for this type of controller. The scheme also shows
the option of feed-forward (FF) compensation of measured disturbances.
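A minimal discrete-time sketch of the PID law of Equation 1, closed around a hypothetical first-order process dx/dt = −x + u and tracking a constant set-point; the gains and the process are illustrative assumptions, not values from the text:

```python
import numpy as np

def pid_step(error, state, K=2.0, tau_i=1.0, tau_d=0.0, c=0.0, dt=0.01):
    """One evaluation of the PID law of Equation 1 in discrete time.
    `state` carries (integral of the error, previous error)."""
    integral, prev_error = state
    integral += error * dt
    derivative = (error - prev_error) / dt
    u = c + K * (error + integral / tau_i + tau_d * derivative)
    return u, (integral, error)

# Closed loop on the hypothetical process dx/dt = -x + u, set-point r = 1,
# simulated with simple Euler integration.
r, x, dt = 1.0, 0.0, 0.01
state = (0.0, 0.0)
for _ in range(1000):              # simulate 10 time units
    u, state = pid_step(r - x, state, dt=dt)
    x += dt * (-x + u)
print(round(x, 3))                 # integral action removes the steady-state error
```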
3.2.5 A Generic Framework
General System Model Description
A generic model description in state space form is:
dx(t)/dt = f{x(t), u(t), d(t)},    x(t_o) = x_o    (2a)
y(t) = g_y{x(t), u(t), d(t)},    q(t) = g_q{x(t), u(t), d(t)}    (2b)
Here, x is the state vector, with an initial value xo at the initial time to. The time
evolution of the state is given by differential equations (Equation 2). The vector val-
ued function f represents the rate of change, as a function of the state itself and the
external forcing variables. These input signals are separated into manipulated variables
u(t), and disturbance variables d(t). From the state, output signals can be computed. The outputs are separated into observed outputs y(t) and variables of interest (“quality” variables) q(t) that cannot be measured on-line and can only be computed.
The functions gy and gq are vector valued algebraic read-out functions. In practice,
sometimes some of the variables of interest are computed by integration of states or
outputs, for instance total energy consumption in a greenhouse model. In the framework above, such integrated values are handled as states. By proper spatial discretization, the state space description can also accommodate spatially distributed systems.
Prior to setting up this scheme, the designer has to make decisions about the choice
of suitable control signals and output signals. This choice depends upon the availabil-
ity and cost of actuators and sensors in relation to their expected effectiveness. Making
this choice is a very important step in control system design, and often requires an
iterative procedure.
Ideal Control over a System
Provided we have a model without error, including full knowledge of future distur-
bances, and a suitable goal function J (u, d̂) to be optimized, then it is possible—in
theory—to compute a control trajectory u*(t), such that J is minimal. Also, the optimal
paths of the states x*(t) and the signals y*(t) and q*(t) will become available. Note that
the answer depends upon the future disturbances, so that a prediction is needed for d.
The goal function can be formulated as a cost function in which case its minimum
must be found, or as a benefit function that needs to be maximized. Since one can be
converted into the other by simply changing the sign, we will use the terms cost mini-
mization and goal optimization as equivalents.
A Typical Goal Function
A generic form of a goal function is:
J = Φ{x(t_f)} + ∫_{t_o}^{t_f} L{x(τ), u(τ), d(τ), q(τ), τ} dτ    (3)
Here Φ represents the costs associated with the final state x(t_f). Usually these are benefits (i.e., negative costs), such as the value of the crop at harvest time, or the biomass value at completion of the cultivation in batch reactors. L represents the running costs. An example is the heating in a greenhouse, where the costs relate to the heating power u(t) and the temperature x(t). The
final time tf can be fixed, free, or infinite.
Dynamic Optimization
The task is to find the optimal control trajectory u*(t), such that the cost J is mini-
mized, while satisfying the model equations. In mathematical terms:
u*(t) = arg min_u J    (4)
subject to the model equations (Equation 2). In addition, there may be constraints on x, u, and y.
There is software available to solve these kinds of problems, both for continuous
systems as presented here, as well as for discrete time and sampled data systems. The
solutions differ depending upon the nature of the final time and the presence of state or
control constraints [1].
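As a toy illustration of such a computation, the sketch below applies a direct method (see Section 3.2.8): the control trajectory is discretized into piecewise-constant samples and J is minimized with a general-purpose optimizer. The problem is invented so that the analytical optimum, the constant control u*(t) = −0.5 with J* = 0.5, is known in advance:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical problem: minimize J = x(1)^2 + int_0^1 u(t)^2 dt
# subject to dx/dt = u, x(0) = 1.
N = 20
dt = 1.0 / N

def cost(u):
    x = 1.0
    for uk in u:                  # exact integration: u is piecewise constant
        x += uk * dt
    return x**2 + np.sum(u**2) * dt

# Direct method: optimize the N samples of the discretized control trajectory.
res = minimize(cost, np.zeros(N))
print(res.fun)                    # close to the analytical J* = 0.5
print(res.x[:3])                  # each sample close to u* = -0.5
```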
Open Loop Control
Under the assumptions made above, the calculated optimal control u*(t) can be supplied to the system. If the model is correct, the system will behave in an optimal way, such that x(t) = x*(t), y(t) = y*(t), q(t) = q*(t), and the goal function J = J*. This is
called open loop optimal control. An example is the steering of a robot to perform a
time-optimal movement under fixed load conditions. Another example is the optimal
feed rate control to a fed-batch bioreactor, so as to optimize biomass yield.
The Need for Feedback
In real life, the picture outlined above is spoiled for several reasons:
• modeling errors,
• initial state deviations,
• deviations from the predicted disturbances, and
• unpredictable and unknown disturbances.
Deviations from predicted disturbances can also be treated as modeling errors, because disturbance prediction can be viewed as a special case of modeling. Modeling
errors arise, for instance, from neglect of sub-processes, aggregation of variables, and
erroneous parameters.
Because of uncertainties, the system behavior will deviate from the ideal behavior
x*(t). The answer to uncertainty is feedback. The system is observed, via the observ-
able outputs y(t), and if deviations occur from the expected trajectory, a correction is
made. So, the scheme is enhanced with a controller, which can be viewed as a de-
vice—possibly in the form of an algorithm implemented on the computer—that maps
the observed output y(t), and/or the systems state x(t) derived from it, into a control
input u(t), such that the goal function J is optimized. The controller may also use the
observed disturbance inputs d(t). Finding such maps is called feedback controller de-
sign. In order to be effective, feedback needs to be computed on-line.
3.2.6 Feedback Control Families
Optimal Control with Linear Feedback Compensator
Adding a feedback compensator on top of the optimization leads to the scheme of Figure 4a [2]. The dynamic optimization is performed off-line. The compensator corrects the pre-calculated optimal controls with a correction signal, on the
basis of deviations between predicted output and observed output. If the disturbance
deviations are small, the compensator can be designed as a linear quadratic (LQ) con-
troller. The underlying model is a locally linearized transfer function or state space
model of the system. The goal function is the following quadratic expression:
J = ∫_{t_0}^{t_f} ( δu(τ)ᵀ Q(τ) δu(τ) + δx(τ)ᵀ R(τ) δx(τ) ) dτ    (5)
where δu and δx are deviations from the nominal optimal trajectory, and Q and R are weighting matrices that allow the designer to balance between tracking properties and
control effort. It can be shown that for normally distributed independent stochastic
modeling and measurement errors the closed loop optimal control law is a state feed-
back law of the form:
δu(t) = −F(t) δx(t)    (6)
where F follows from the matrices Q and R [2][3].
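As a sketch of how F follows from Q and R, the code below solves the continuous-time algebraic Riccati equation with SciPy. Note that in Equation 5 Q weights the control and R weights the state, which is the opposite of SciPy's naming convention; the double-integrator model and the weights are made-up examples:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical linearized model: a double integrator dx/dt = A x + B u.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

# Equation 5: Q weights du, R weights dx.  SciPy's solve_continuous_are
# takes the state weight third and the control weight fourth, so the
# document's R goes in the third slot and the document's Q in the fourth.
R_state = np.eye(2)            # R(t) of Equation 5, taken constant here
Q_control = np.array([[1.0]])  # Q(t) of Equation 5

P = solve_continuous_are(A, B, R_state, Q_control)
F = np.linalg.inv(Q_control) @ B.T @ P   # gain of the law du = -F dx

# The closed loop A - B F is asymptotically stable:
print(np.linalg.eigvals(A - B @ F).real)
```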
Application of control law (Equation 6) requires that the state can be observed. The
system state is often not accessible directly. In that case, state reconstruction is
needed. This can be done with a Kalman filter (LQG, linear quadratic Gaussian state
reconstruction) or by an observer [4]. The principal idea is that the state δx in Equation 6 is replaced by an estimate, obtained as:
dδx̂(t)/dt = A(t) δx̂(t) + B(t) δu(t) + L(t) δy(t)    (7)
where δy is the difference between the optimal output y*(t) and the observed output
y(t). In case of the Kalman filter the gain L is computed from the assumed system and
measurement noise variances. If such information is not available, the observer gain
can be designed such that the state error vanishes in a prescribed way.
Note that the compensator scheme fails when the optimal trajectory hits control
constraints.
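A sketch of the observer alternative mentioned above: when noise variances are unknown, the gain L can be chosen by pole placement on the dual system so that the estimation error dynamics de/dt = (A − LC)e decay at a prescribed rate. The system matrices and pole locations below are illustrative assumptions:

```python
import numpy as np
from scipy.signal import place_poles

# Hypothetical observable system matrices:
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
C = np.array([[1.0, 0.0]])

# Observer design is dual to state feedback: placing the poles of
# (A - L C) is equivalent to placing the poles of (A^T - C^T L^T).
desired_poles = [-5.0, -6.0]
L = place_poles(A.T, C.T, desired_poles).gain_matrix.T

# The estimation error now vanishes with the prescribed eigenvalues:
print(np.sort(np.linalg.eigvals(A - L @ C).real))  # approx. [-6, -5]
```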
Feedback Control with a Supervisor (Decision Maker)
A very common scheme appears if the off-line dynamic optimization is replaced by a supervisory decision maker. This is shown in Figure 4b. The observed output is compared to some set-point or set-point trajectory, which is dictated by a decision maker as supervisor. This can be a human operator, a scheduler (e.g., the program sequence in common greenhouse climate control computers), or the result of static or dynamic optimization.

Figure 4. (a) Optimal control with linear feedback compensator, (b) standard feedback/feed-forward control, (c) receding horizon optimal control.
Note that the overall benefits or costs of this scheme depend entirely upon the su-
pervisor. This can be sub-optimal, no matter how well the controller has been de-
signed. In many practical applications, sub-optimal control is quite satisfactory. The
reason why the main stream of control applications belongs to this scheme is that re-
sponsibilities are split: the (economic) optimality is put into the hands of the supervi-
sor, while the controller “only” needs to make sure that set-points are tracked and that
disturbances are rejected. Any measured disturbance can be accommodated by feed-
forward compensation in the controller.
A large range of controller designs is available, as outlined below, depending upon
the number of controls and observations and the available knowledge about the system
dynamics. The common classical single input-single output designs based on a linear
transfer function description belong to this family. But, also, the multiple input-
multiple output LQ-controller of Equation 6 based on the off-line optimization of
Equation 5 can be used in this frame. In that case, δx represents the deviation from a
predefined set-point, and δu is taken with respect to a pre-set nominal control value.
Model Predictive Receding Horizon Control with On-line Dynamic Optimization
This scheme is presented in Figure 4c. Instead of off-line optimization in conjunc-
tion with a compensator, on-line optimization is performed, which directly calculates
the controls needed in the forthcoming control interval, from an optimization of the
type outlined before over a specific horizon. The length of the horizon must be sufficiently long compared to the slowest time constant of the system. At the next interval, the optimization is repeated, and the final time is “receded” [5, 6]. Receding horizon schemes are particularly applicable when the disturbances are unpredictable but
can be measured, and when there are control and state constraints. A typical example
is greenhouse climate control. Because on-line optimizations must be carried out, this
method can only be applied to systems that are slow enough, compared to the required
computation time. With the ever-increasing speed of computers the class of potential
applications is growing.
On-line optimization can be avoided under the assumption of linearity in conjunc-
tion with a quadratic goal function. In those situations closed loop solutions can be
found, as for instance in generalized predictive control (GPC) [7]. It is clear that the
original optimality might be lost in this way. However, such designs can still be used
as the controller in the supervisory scheme of Figure 4b, as an alternative to a classical
design or LQ design. The advantage of these model predictive feedback controllers
over classical feedback designs is that they can easily handle constraints.
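The receding horizon idea can be sketched in a few lines: at every sampling instant a finite-horizon optimization is solved, and only the first control of the optimal sequence is applied before the optimization is repeated. The unstable scalar plant and the cost weights are hypothetical choices for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical unstable scalar plant in discrete time: x[k+1] = 1.1 x[k] + u[k].
a, horizon, rho = 1.1, 10, 0.1

def horizon_cost(u_seq, x0):
    """Quadratic cost over the prediction horizon (regulation to zero)."""
    x, J = x0, 0.0
    for u in u_seq:
        J += x**2 + rho * u**2
        x = a * x + u
    return J + x**2            # terminal cost

x = 1.0
for _ in range(20):            # receding horizon loop
    res = minimize(horizon_cost, np.zeros(horizon), args=(x,))
    u0 = res.x[0]              # apply only the first control of the sequence
    x = a * x + u0             # plant step; the optimization is then repeated
print(abs(x))                  # the unstable plant is regulated close to zero
```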
3.2.7 Controller Paradigms and Design Methods
It is now possible to place the various control methods available in the literature in
perspective. The choice of a method is dictated essentially by the available informa-
tion about a problem. Important issues are:
1. Is the system already available, or should it still be designed? In the first case,
data-based modeling methods can be used. System identification tools are avail-
able to estimate linear dynamic models from the data [8, 9]. If the system is sup-
posed to be non-linear, neural-net modeling yields good results [10].
2. Is the model structure known, partially known, qualitatively known, or not
known at all? If the model structure is known, state space methods are superior.
Many controllers can be designed on the basis of transfer function information,
which can be derived from the state space model (possibly after linearization) or
from data [11, 12]. If only qualitative knowledge is available, fuzzy models may
be used. If nothing is known at all, nothing remains but trial and error, probably
best by starting with off-the-shelf PID control.
3. Are reliable model parameters available? If not, experiments for parameter esti-
mation are needed, or adaptive methods may be selected that obtain the parame-
ters “on the fly.” Experimental design and identifiability are important issues
here. The field of parameter estimation is related to regression, where methods
are available to assess the confidence in the estimates [13].
4. If disturbances are measurable, they can be incorporated by feed-forward con-
trol. A model is needed. Feed-forward control is particularly useful for measur-
able load variations, in order to compute the necessary pre-compensation in ad-
vance [12].
5. The states of a system may not be fully accessible. In that case they need to be
reconstructed by state estimation methods (e.g., observers, Kalman filter). For
linear systems, observability conditions can be checked. Additional sensors may
be needed.
6. For linear systems, a proposed scheme can be checked for controllability. A re-
arrangement of actuators and sensors may be required.
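Point 3 above, the link between parameter estimation and regression, can be sketched with a least-squares fit of a first-order step response; the covariance returned by the fit quantifies the confidence in the estimates. The process parameters and noise level are invented for the example:

```python
import numpy as np
from scipy.optimize import curve_fit

def step_response(t, K, tau):
    """First-order step response y(t) = K (1 - exp(-t / tau))."""
    return K * (1.0 - np.exp(-t / tau))

# Synthetic "measurements" from a hypothetical process with K = 2, tau = 3,
# plus a little noise to mimic a real identification experiment.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 15.0, 60)
y = step_response(t, 2.0, 3.0) + 0.01 * rng.standard_normal(t.size)

popt, pcov = curve_fit(step_response, t, y, p0=[1.0, 1.0])
stderr = np.sqrt(np.diag(pcov))   # standard errors: confidence in the estimates
print(popt)                       # close to the true [2, 3]
print(stderr)
```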
Table 2 summarizes the most important situations and associates with each the most suitable control methods. It is important to notice that the most important step in improving the performance of a controlled system is in improving the understanding of the
system, i.e., in improving the model.
Tables 3 to 6 give a non-exhaustive summary of a number of well-known control
methods. They are intended to help the reader in finding an appropriate method. Some
features are explained below.
The Model
In many methods, the model plays a key role, yet the final controller may or may
not explicitly contain the model itself. Sometimes the model is used only to find the
controller parameters, such as in PID control. Sometimes the model is used to derive
the controller form, such as in LQ and robust control, in conjunction with the control-
ler parameters, but in all these cases the model itself is not part of the controller. In
contrast, in optimizing control, including receding horizon model predictive control,
and some adaptive control schemes, the model itself is embedded in the controller
algorithm.
Table 6. Non-linear state space design methods and artificial intelligence design methods.

Optimal Control (non-linear state space method):
- Model: non-linear state space
- Typical problem area: accurate general model; any goal function
- Design method: Hamiltonian and co-states, or non-linear programming
- Result: state and control trajectories
- Advantage: truly optimal
- Drawback: computationally demanding
- Remarks: usually in combination with a compensator
- References: [1, 22, 23]

Non-Linear Control (non-linear state space method):
- Model: non-linear state space
- Typical problem area: non-linear systems with a wide operating range
- Design method: back stepping; feedback linearization
- Result: non-linear feedback
- Remarks: theory under development
- References: [17]

Neural Control (AI method):
- Model: neural net (NARMAX)
- Typical problem area: many data, no mechanistic model
- Design method: MPC with a neural model, or direct mimicking of desired behavior
- Analysis: simulation
- Result: non-linear feedback
- Advantage: no need for a mechanistic model
- Drawback: requires many data
- Remarks: many variants; suitable for iterative learning
- References: [10]

Fuzzy Control (AI method):
- Model: none
- Typical problem area: prior qualitative knowledge
- Design method: membership function tuning on available behavior
- Analysis: simulation
- Result: fuzzy feedback rules
- Advantage: exploits human knowledge
- Drawback: subject to prejudice
- Remarks: many variants
- References: [18]
Both neural nets and fuzzy models can be seen as non-linear mappings. It should be noted that neural nets require lots of measurement data to train the network, whereas the formulation of fuzzy rules requires
good qualitative knowledge about the system. In the control context, dynamic models
are needed, which can be achieved in two ways. In one approach the state equations
are as usual, but the right-hand side is approximated by a static neural net or fuzzy
model. In the other approach the dynamics are approximated by discretization in time,
and a mapping is sought between the output and past values of output and input. Apart
from these model-based approaches, it is also possible to have neural or fuzzy control-
lers. In these cases the controllers are tuned to obtain the desired behavior or to mimic
the control strategies of a skilled operator.
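The second approach, fitting a mapping from past outputs and inputs to the next output, can be sketched with a plain linear least-squares fit standing in for the neural net; a trained network would replace the linear map for genuinely non-linear systems. The data-generating process here is hypothetical:

```python
import numpy as np

# Hypothetical data generated by a discrete-time process
# y[k+1] = 0.8 y[k] + 0.2 u[k]; in practice y and u would be measurements.
rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, 200)
y = np.zeros(201)
for k in range(200):
    y[k + 1] = 0.8 * y[k] + 0.2 * u[k]

# Fit the one-step-ahead mapping y[k+1] = f(y[k], u[k]) by least squares.
X = np.column_stack([y[:-1], u])
theta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(theta)   # recovers approximately [0.8, 0.2]
```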
3.2.8 Methods for Dynamic Optimization
In the previous paragraphs optimality has been chosen as the fundamental idea. The
question remains how optimal patterns can be computed.
In what are called direct methods, the (discretized) control sequence is obtained by
common optimization techniques. Within this class, one group of methods is formed
by gradient algorithms using Newton’s method and variants. They are relatively fast,
but they require the gradients and may get stuck in a local minimum. To overcome
this, other algorithms have been developed, such as controlled random search [19] or
evolutionary algorithms, and in particular the relatively efficient differential evolution
method [20]. They are usually slower, but are better in locating a global optimum. The
second group of direct methods tries to find the optimal path by variants of dynamic
programming, which is basically a planning method. A fairly efficient method is itera-
tive dynamic programming [21].
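The contrast between the two groups of direct methods can be sketched with SciPy's differential evolution on a deliberately multimodal, made-up cost function, where a local gradient method could easily stall in one of the side basins:

```python
import numpy as np
from scipy.optimize import differential_evolution

def multimodal_cost(u):
    """A hypothetical one-dimensional cost with many local minima;
    the global minimum is at u = 0 with value 0."""
    return u[0]**2 + 10.0 * (1.0 - np.cos(2.0 * np.pi * u[0]))

# Population-based global search over the admissible control range:
result = differential_evolution(multimodal_cost, bounds=[(-5.0, 5.0)], seed=2)
print(result.x, result.fun)
```

The evolutionary search is slower than a gradient step but escapes the local minima near the non-zero integers, locating the global optimum.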
Indirect methods exploit the dynamic nature of the problem. They work with ad-
joint variables and a Hamiltonian function. The conditions for a minimum are derived
from what are called the Euler-Lagrange equations [22]. An additional advantage is that these methods provide insight into the sensitivity of the solutions. Depending upon
the various control, state and end constraints, there are many variants of this algorithm
(see, e.g., [23]).
References
1. Bryson, A. E., Jr. 1999. Dynamic Optimization. Menlo Park, CA: Addison-
Wesley.
2. Athans, M. 1971. The role and use of the stochastic linear-quadratic-Gaussian problem in control system design. IEEE Transactions on Automatic Control, AC-
16(6): 529-552.
3. Kwakernaak H., and R. Sivan. 1972. Linear Optimal Control Systems. New
York, NY: Wiley.
4. Lewis, F. L. 1986. Optimal Estimation. New York, NY: Wiley Interscience.
5. Mayne, D. Q., and H. Michalska. 1990. Receding horizon control of nonlinear
systems. IEEE Transactions on Automatic Control 35(7): 814-824.
3.3.1 Introduction
The enormous evolution of computer systems in recent decades has two equally important components: hardware and software. The software development process has received considerable attention because, compared to hardware, software costs have increased significantly. Besides cost, demands on software quality and user-friendliness, to mention just a few aspects, have required a more professional approach. A few points
related to these issues are presented in this section, which covers operating systems,
software languages, object-oriented software modeling and its Unified Modeling Lan-
guage, Extensible Markup Language for data interchange, and open software concepts.
3.3.2 Operating Systems
Each computer needs some type of system software developed to make it more
convenient to use. This system, called its operating system (OS), is responsible for
controlling the hardware and software resources of a computer, allowing the application programs to handle them properly to perform their specific tasks. The user's program is not allowed to directly control the system's resources (processor, memory, and
peripherals); instead, multiple programs must share standard software drivers incorporated in the operating system. Consequently, OSs are developed with different features according to the application requirements: some are designed to simplify computer use, while others are more complex in order to use the hardware more efficiently, e.g., by supporting multitasking, multiple users, or real-time applications.
The evolution of OSs has been very important for the success of the microcomputer.
The following paragraphs present the most important OSs used in microcomputers
today [1].
Desktop Computer OS
DOS (Disk Operating System) was used for many years, ever since personal
computers (PCs) were launched. Because of its command-oriented interface, it is not
intuitive and is quite difficult for most users. In the 1970s the first graphical user
interface (GUI) was invented at Xerox Corporation's Palo Alto Research Center. Early
GUIs were called WIMP interfaces, after their Windows, Icons, Menus, and Pointing
device; nowadays they are all known simply by the acronym GUI. In a GUI, pictures
and graphic symbols represent the commands,
choices, or actions. Another common aspect of those interfaces is the desktop meta-
phor: the area on the display screen where icons are grouped is referred to as the desk-
top because the icons are intended to represent real objects on a real desktop. The
desktop is divided into windows, and in each of them different programs can be exe-
cuted or different files can be displayed.
Well-designed graphical user interfaces can free the user from learning complex
command languages, although some expert users feel that they work more effectively
with a command-driven interface. GUIs became commercially important from 1983
onward, first with Mac OS from Apple and then with Microsoft Windows, which has
become the most popular OS for IBM-compatible microcomputers.
The following paragraphs outline the evolution and the main characteristics of the
most popular OSs for desktop computers.
Mac OS
The Apple Mac OS was developed to be simple and intuitive, with few compatibility
and configuration problems. Many consumers consider Mac OS much better than
Windows because they find it easier to use, more attractive, and more productive,
especially for multimedia editing; it is also less subject to viruses and other malicious
code attacks. Nevertheless, it is used on fewer than ten percent of computers. Although
Mac OS has been continually improved, its core has not changed substantially since
the first version: it has limited multitasking capabilities and presents difficulties in
running multiple large applications at the same time.
An important update, called Mac OS X, has been created, with many new features
and enhancements. Its kernel, called Darwin, is based on BSD UNIX, the version of
UNIX developed at the University of California, Berkeley. Unlike Microsoft, Apple
has decided to adopt the idea of open source code, so that it can encourage the
community to submit modifications and enhancements and so that it will be easier to
customize Mac OS X to meet specific needs. The main features of Mac OS X are a
virtual memory manager, isolation between application programs, robust multitasking,
availability of most of the services users need, and significant Internet resources. Its
appearance is quite different, providing a more photo-realistic look to the
3.3 Topics on Software Evolution 141
desktop. Many other interesting features are provided, although an important dilemma
must be faced: few Mac-based products are available, usually with high performance
but also high cost, so only a small community uses the Mac compared to other OSs.
Attempts to increase its market share have not been successful; perhaps the release of
the Mac Mini, a fairly powerful computer with a low price, can help change that
picture.
Windows
The first versions of Windows were really an operating environment that required
MS-DOS to be loaded first; later, the MS-DOS functions were incorporated into
Windows.
Although more than 90% of PCs run some version of Windows, it has frequently
been criticized, mainly for its performance and faults. Compatibility with MS-DOS
and the desire to gain market share quickly are the main reasons for these problems.
Windows 95 started to solve them: its Plug and Play feature, although not always
perfect, was an important step forward in the usability of Windows PCs. One of
Windows' main advantages, however, is the huge number of programs developed for it.
Other specialized versions of Windows that are available now include:
• Windows ME (Millennium Edition)—This is a consumer-oriented OS, designed
to perform the wide variety of functions that people expect from PCs, such as
home and office applications, simple networking and connectivity to the Inter-
net, playing games, and multimedia applications (listening to MP3s, editing
movies, etc.). Based on the 9x kernel (the core part of the OS), ME will probably be
the last version to support MS-DOS programs. ME's most interesting new features are
the System Restore tool, which returns the system to a previous state after a serious
problem; Auto Update, which checks for critical OS updates and fixes (and downloads
them) whenever the PC is connected to the Internet; and facilities for home networking
and multimedia.
• Windows NT and Windows 2000—These products compete in the corporate
server and workstation market that requires a reliable and secure environment.
New Technology (NT) was designed as a 32-bit OS, unlike Windows 9x. Win-
dows 2000 is an evolution of NT with the same reliability, but with the flexibil-
ity of a consumer OS. It is faster and more reliable than Windows 9x, reducing
the chance that one software application interferes with another. Also, connecting
to network resources is easier. Unlike NT, Windows 2000 offers full support for
Plug and Play, supporting a large number of products (although a smaller num-
ber than ME).
• Windows XP—The intention in creating Windows XP was to provide a more
stable and simplified computing environment by merging Windows 2000’s sta-
bility with ME’s versatility, and also simplifying software and hardware devel-
opment. It is upgradeable from Windows 98/ME/2000. Two versions are avail-
able: Windows XP Professional for business and power users and Windows XP
Home Edition for consumers. Besides ME features, many others were added: PC
quick resume operation with low power consumption in standby mode, and other
enhancements.
Handheld Computer OS
Windows CE, Microsoft's compact OS for handheld devices, supports multimedia
applications such as MP3 and video playback. The availability of popular applications
such as Word, Excel, and Outlook is also very attractive to users. However, Windows
CE requires more memory and its battery consumption is high. The newer versions are
similar to Windows XP, with Internet Explorer, network access, MSN Messenger, and
a new look based on the Windows XP desktop.
Other OSs are available, such as EPOC in Psion PDAs, and recently the development
of Linux-based PDAs has been encouraged; Agenda Computing, Samsung, and Sharp
are examples.
Real-Time Operational Systems and Embedded Software
Many applications interact with environments that have time-varying properties
and require predictable time-dependent responses. So the systems designed for these
applications and their software must not only produce correctly calculated responses,
but also exhibit predictable time-dependent behavior regardless of the system load and
other conditions. They are called Real-Time Systems (RTS) [5, 6].
Several important features are desirable in an RTS, e.g., time management,
compliance with task timing requirements, predictable response times, and support for
peak loads. RTSs are frequently used in critical applications, such as industrial
processes, aircraft, and automotive systems, so fault tolerance is also a very important
feature.
They are not necessarily faster than a standard OS, but they must have a predictable
time response, independently of whether or not the external stimuli are predictable.
The processor’s interrupt mechanisms can be used to assure predictability in many
cases. Task scheduling must be carefully designed in an RTOS, assigning the most
suitable task to each processor at every instant. Pre-emptive priority scheduling should
be supported, allowing a task to be interrupted at any point so that a more important
task can gain the processor immediately. Table 1 shows some examples of RTOSs.
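The pre-emptive behavior just described can be sketched in a few lines of code. The
following Python toy scheduler (an illustration only, not taken from any particular
RTOS; the task names and tuple layout are invented for the example) runs, in every
time slot, the highest-priority ready task, so a newly arrived higher-priority task
immediately preempts the running one:

```python
import heapq

def schedule(tasks, horizon):
    """Simulate pre-emptive priority scheduling over discrete time slots.
    tasks: list of (name, priority, arrival, duration); lower number = higher priority.
    Returns the name of the task running in each slot (None = idle)."""
    remaining = {name: dur for name, prio, arr, dur in tasks}
    pending = sorted(tasks, key=lambda t: t[2])   # tasks ordered by arrival time
    ready = []                                    # heap of (priority, arrival, name)
    timeline = []
    i = 0
    for t in range(horizon):
        while i < len(pending) and pending[i][2] <= t:   # admit newly arrived tasks
            name, prio, arr, dur = pending[i]
            heapq.heappush(ready, (prio, arr, name))
            i += 1
        while ready and remaining[ready[0][2]] == 0:     # discard finished tasks
            heapq.heappop(ready)
        if ready:                                        # run highest-priority task
            name = ready[0][2]
            remaining[name] -= 1
            timeline.append(name)
        else:
            timeline.append(None)
    return timeline

# A high-priority task arriving at t=2 preempts the low-priority one:
print(schedule([("low", 2, 0, 4), ("high", 1, 2, 2)], 6))
# ['low', 'low', 'high', 'high', 'low', 'low']
```

A real RTOS does this with hardware interrupts and bounded-latency context
switches rather than a time-slot loop, but the scheduling policy is the same.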
Nowadays there are many electronic products based on microprocessors (embedded
systems, or ESs), such as cellular telephones, MP3 players, automobile devices,
computer peripherals, and many others. Embedded software is used to control them,
and it must be compact, efficient, reliable, and precise in treating inputs and outputs.
With the advent of systems-on-silicon, RTOSs will become a component of ESs,
expanding their applications even more.
3.3.3 Software Languages
Many programming languages are available: some are suited for system programming
or application design, and still others can produce stand-alone applications for
distribution.
Rapid application development (RAD) tools such as Visual Basic, Visual C++,
Delphi, Kylix, and others are powerful languages that help develop software at a
higher level of abstraction. They are based on textual languages and use a graphical
GUI builder to make the programming of interfaces easier. They are especially
interesting for developing Windows interfaces in event-driven programs, as they
invoke fragments of code when the user performs certain operations on graphical
objects on-screen. They have been
widely used for in-house application program development and for prototyping. Proto-
types can be powerful tools for eliciting requirements with the user, especially con-
cerning man-machine interfaces and basic functionalities. When used at early phases
of the development process, they can improve software quality and reduce the devel-
opment costs; however, it is claimed that code maintainability is poor. In any case,
they let users put more effort into solving their particular problems rather than learning
about a programming language. Their success and the increasing complexity of
applications, including web and distributed systems, promoted the development of
new and more complete platforms such as .NET and Java 2.
.NET
.NET is a new programming model from Microsoft that incorporates almost every-
thing related to the Windows environment and the facilities of the Internet [8]. It is
claimed to have been developed from the ground up, based on a completely new
framework, within which most programming tasks can be easily accomplished.
What could be done on Windows, including data access, windowing, connecting to
the Internet, and much of the functionality of the Win32 API, is now mostly accessible
through a very simple object model. The VB language has been widely upgraded, so it
now includes classes and most of the features previously accessible in C++. A new
language, C Sharp or C#, has been introduced, which combines the efficiency of C++
with some of the ease of development of VB. Memory management for .NET applica-
tions is much more sophisticated, meaning that a badly behaved .NET component is
extremely unlikely to crash other components running in the same process. ASP.NET
has replaced ASP (Active Server Pages). ASP is a specification for a dynamically cre-
ated web page with the extension .ASP, which utilizes ActiveX scripting—usually VB
Script or Jscript code. When a browser requests an ASP page, the web server gener-
ates a page with HTML code and sends it back to the browser. ASPs are thus similar
to CGI scripts, but they allow Visual Basic programmers to work with familiar tools.
The new ASP.NET offers compiled web pages (making processing of web requests
much more efficient) and includes a large number of pre-written components that can
generate commonly used HTML form and user-interface items. The main program-
ming languages have been moved far closer together, so code written in VB, C++, and
C# can be intermixed. Components are wrapped up in a new unit called an assembly,
which is highly self-describing, making installation and use of components very easy.
The most significant aspect of the .NET architecture is that code in VB and C# is
compiled not to native executable code, but to an Intermediate Language (IL), with the
final step of converting to native executable code normally happening at runtime.
Such code is termed managed code. C++ code can also be compiled to managed code,
which makes it interoperable with VB and C# and allows it to take advantage of all
the .NET features, but restricts the use of some features of C++ (such as multiple
inheritance) that are not supported on .NET.
Java
Another popular programming language is Java and its new Platform Editions. Java
is designed to solve a number of problems in modern programming processes. Java is
a simple, object-oriented, network-ready, interpreted, robust, secure, architecture-
independent, portable, high-performance, multithreaded, and dynamic language. Java
omits many rarely used, poorly understood, and confusing features of languages such
as C++. Another aspect of being simple is being small, which enables the construction
of software that can run stand-alone on small machines. Also, the Java interpreter and
standard libraries have a small footprint, i.e., they require a small amount of disk or
memory space. Java is object-oriented, which is very powerful because it facilitates
the clean definition of interfaces and makes it possible to provide reusable software
components. Java has an extensive library of routines for coping easily with TCP/IP
protocols such as HTTP and FTP. This makes it easy to create network connections
and applications that open and access objects across the web via URLs, much as when
accessing a local file system. With Java, the same ver-
sion of one application runs on all platforms. The use of general bytecode instructions,
which are not related to one specific computer architecture, makes the application
portable. Surprisingly, Java has high performance. This is achieved by translating
bytecode on the fly (at runtime) into machine code for the particular CPU the applica-
tion is running on.
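The idea of compiling to a platform-independent intermediate representation, as Java
does with bytecode, can be illustrated with Python's own bytecode (an analogy only,
not Java itself; the function below is invented for the example):

```python
import dis

def area(r):
    """Circle area; compiled once to platform-independent bytecode."""
    return 3.14159 * r * r

# The interpreter executes this bytecode on any CPU; a just-in-time
# compiler could translate it to native machine code at runtime.
dis.dis(area)                       # lists instructions such as LOAD_FAST
print(isinstance(area.__code__.co_code, bytes))  # the raw bytecode: True
```

Whatever the host architecture, the same bytecode runs unchanged; only the final
translation to machine instructions is platform-specific, which is exactly what makes
"write once, run anywhere" possible.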
Recently, Sun Microsystems redefined the architecture for the Java platform, now
named Java 2 [9]. Three products are part of the Java 2 Platform: Standard Edition
(J2SE), Enterprise Edition (J2EE), and Micro Edition (J2ME). Each of these editions
is composed of a Java virtual machine (JVM), Java programming language, technolo-
gies and features that are core to each product.
J2SE is optimized to run on individual desktops and workstations; it includes the
Java Foundation Classes (JFC) API, Java Plug-in software, internationalization
support, support for interoperability across heterogeneous environments, a 2D API, a
new security model, and the Java HotSpot performance engine.
Building on the J2SE base, J2EE adds full support for Enterprise JavaBeans com-
ponents, Java Servlets API, JavaServer Pages, and XML technology. The J2EE stan-
dard includes complete specifications to ensure portability of applications across the
wide range of existing enterprise systems capable of supporting J2EE.
J2ME is a runtime environment optimized for very small and limited-memory de-
vices, such as cellular phones, pagers, personal digital assistants, screenphones, digital
kiosks, and automobile systems. J2ME's key component is the tiny-footprint K virtual
machine (KVM). Its most important contribution is connecting small devices with
desktop and large enterprise systems.
3.3.4 Object-Oriented Software Modeling and UML
The Unified Modeling Language (UML) is a standard graphical language for
modeling object-oriented software; its models are built from diagrams containing
things and relationships that depict different views of a system. The diversity of
diagrams (there are nine different types) is useful to stress different aspects of a
system that oth-
erwise would not be clear. The nine types are: class, object, use case, sequence,
collaboration, statechart, activity, component, and deployment diagrams. UML rules
define semantics for building models so as to obtain well-formed models, which are
self-consistent and in harmony with all related models. An example is the set of
semantic rules for names (of things and relationships).
UML is a recent language but is increasingly used as it provides powerful, coherent
and comprehensive concepts and notation for software modeling. In addition, many
commercial tools support it and facilitate its use and integration with other software
development tools. However, it is only a part (though an important part) of the
software engineering process and must be used within a well-defined method. It is
particularly well suited to a process that is use-case driven, architecture-based,
iterative, and incremental. Current information about UML and its formal
specifications can be found at www.omg.org.
3.3.5 XML: The Universal Language for Data
XML or Extensible Markup Language can be summarized as a new tag-based lan-
guage for describing data. It is a subset of the Standard Generalized Markup Lan-
guage (SGML), a complex standard for describing document structure and content. It
is a non-proprietary specification, supervised by the XML Working Group of the
World Wide Web Consortium (W3C) [11]; it was issued as a recommendation in 1998
but is still evolving, with new functionalities and characteristics.
It is a metalanguage (a language for describing other languages), which means that
it can be used to define customized markup languages for specific application domains
(precision agriculture, for example) and their document classes. It is not a language for
presenting data (such as HTML, Hypertext Markup Language), but for organizing
data.
SGML, HTML, and XML
SGML, an ISO standard (ISO 8879), was published in 1986 and introduced a for-
mat for embedding descriptive markup in documents and a method for the description
of document structure. It allows the creation of hardware- and software-independent
documents and supports a range of document structures. However, as it is very general
it is also complex, and it is difficult to implement programs that process it.
HTML is an application of SGML that uses a very limited subset of tags that con-
form to a single SGML specification aimed at the presentation of data. Its simplicity
led it to be the web-publishing language, but its fixed formats restrict its usefulness.
XML takes only the most important and simplest features of SGML, so it is easier
to understand and to use for application development, and it is better suited for
delivery and interoperability over the web. Compared to HTML, XML focuses on the
content of the document, adding context and meaning to data. It allows the user to
define the tags to be used so that data can be represented logically and in a structured
way.
Its simplicity comes from its simple rules for creating a markup language to encap-
sulate data. The tags, which usually come in pairs, describe the information contained
in the document, so it becomes almost self-describing. XML data is stored between
tags as plain text, so it can be edited with any standard text editor. It supports the
Unicode Standard, a character-encoding standard that covers all major languages and
lets XML accept virtually all of the world's characters. This is especially interesting
for developing applications that can be accessed across cultures and nations. XML
documents have a rooted tree structure, which is powerful enough to represent
complex data for many applications yet remains easy to manipulate and to process
programmatically.
A document type definition (DTD) defines a grammar, i.e., the legal structure of an
XML document, specifying which tags are available, where they may occur, and how
they may be combined. Although using a DTD is not compulsory, it enables validity
checking and standardizes document naming and construction. An improve-
ment to DTDs is the XML Schema language, which specifies the valid structure, con-
straints, and data types for the various elements and attributes of an XML document.
Data typing is an important new feature, as it allows enforcement of proper syntax and
semantics in XML documents, instead of treating all data as plain text.
In order to use XML documents, an application program needs a parser, a software
module that acts as an interface between the application and the document, reading
the document and giving the application access to its content and structure. Parsers
are available for many languages, usually free of charge.
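As a small illustration of parsing, the following Python snippet uses the standard-
library ElementTree parser to read an XML fragment and walk its rooted tree
structure. The tags and values are invented here as a hypothetical precision-agriculture
example; they are not a standardized vocabulary:

```python
import xml.etree.ElementTree as ET

# A hypothetical XML fragment; the tag names are invented for the example.
doc = """
<field id="F12">
  <crop>maize</crop>
  <soil type="clay"/>
  <yield unit="t/ha">8.4</yield>
</field>
"""

root = ET.fromstring(doc)           # parse the text into a tree of elements
print(root.tag, root.get("id"))     # element name and attribute: field F12
print(root.find("crop").text)       # text content of a child element: maize
print(root.find("yield").get("unit"), float(root.find("yield").text))
```

The tag pairs make the data nearly self-describing, and the tree can be traversed or
searched without any knowledge of how the document will eventually be displayed.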
Applications and Limitations
XML separates content definition from display instructions, so the same content
can easily be used on different platforms or devices, such as personal computers,
personal digital assistants (PDAs), and cell phones, by applying different style sheets
(in the Extensible Stylesheet Language, XSL) to the same document.
Because XML documents also store meta-information (information about
information), online information search and retrieval become easier.
One of the main points of XML is its potential for information exchange between
different platforms and organizations. Since it is text-based, all platforms can
understand it easily; it is claimed to be the universal language for data description. It
also allows partners (for instance, agricultural organizations or companies) to define a
specific XML syntax, automating information transfer across the Internet.
However, XML is probably not a good choice for stand-alone systems. It does not
provide any security features by itself, which can be a problem in a public network
like the Internet unless some extra mechanism is used (cryptography, digital signa-
tures, etc.).
Finally, standard vocabularies or tag sets are missing and must be developed, at
least for specific industries (this is already occurring) to avoid misinterpretation of
data. An example is the development of GML, or Geography Markup Language,
which is an XML encoding for the transport and storage of geographic information,
including both the geometry and properties of geographic features (see http://www.
opengeospatial.org/specs/).
XML is becoming an important language for many applications and will probably
change the way information is used and delivered, especially on the Internet. It is still
in progress and new characteristics are being incorporated. More information on XML
can be found at http://www.w3.org/XML.
3.3.6 Open-Source Software
In a world where a few companies threaten to dominate software and the Internet,
the strongest rival to their dominance is the collection of free software tools and oper-
ating systems collectively called open-source software (OSS). Unlike most commer-
cial software, the core code of such software can be easily customized, modified, and
improved.
Open source does not just mean access to the source code. The distribution terms of
open source software must comply with the following criteria [12].
1. Free redistribution—The license shall not restrict any party from selling or giv-
ing away the software as a component of an aggregate software distribution con-
taining programs from several different sources.
2. Source code—The program must include source code, and must allow distribu-
tion in source code as well as compiled form.
3. Derived works—The license must allow modifications and derived works, dis-
tributed under the same terms as the license of the original software.
4. Integrity of the author’s source code—The license may restrict distribution of
modified source code only if it allows patch files to be distributed with the
original code; it may require derived or modified versions to carry a different
name or version number.
5. No discrimination against persons or groups—The license must not discriminate
against any person or group of persons.
6. No discrimination against fields of endeavor—The license must not restrict any-
one from making use of the program in a specific field of endeavor.
7. Distribution of license—The rights attached to the program must apply to all of
those to whom the program is redistributed without the need for execution of an
additional license by those parties.
8. The license must not be specific to a product—The rights attached to the pro-
gram must not depend on the program being part of a particular software distri-
bution.
9. The license must not restrict other software—The license must not place restric-
tions on other software that is distributed along with the licensed software.
Nowadays, there are many products that can be considered OSS and many groups
dedicated to its development [13]. The most popular one is the Linux Operating Sys-
tem and its related application software. Most UNIX-based operating systems observe
the OSS criteria, which is why UNIX is sometimes considered the father of OSS. The
Internet is full of open-source software in heavy commercial use: Apache, which runs
over half of the world’s web servers; Perl, which powers much of the dynamic content
on the World Wide Web; and others such as FreeBSD, MySQL, Kylix, Java, and,
more recently, OpenOffice, which is similar to MS Office.
Open-source software may seem to be a valiant fighter against a worldwide software
monopoly. In 2002, the market share of Linux for web server OSs went from 29% to
34% while the market share for all other major OSs declined, including Windows, Sun
Solaris, and other UNIX variants [14].
Some market researchers forecast that in the next few years, open source and free
software will become the standard in operating systems, as well as in much of the
commodity software in widespread use. This would mean that one of two things may
happen: either most computer users will switch to a new operating system and
commodity software, or Microsoft will release Windows under an open-source
license. If people switch operating systems, the most probable target seems to be
Linux; other possibilities are FreeBSD or an open-source release of BeOS. As for the
second possibility, Microsoft might recognize the new trend and embrace it, or it may
be forced to do so by a strong migration of users to OSS. Most people see both
scenarios as unlikely, but there is no doubt that OSS is gaining popularity and market
share.
3.3.7 Conclusion
There is no doubt that software has evolved enormously during the last decades,
bringing more comprehensive and friendlier programs to a much wider public and to a
wider range of applications, some of them very complex and demanding. However,
many issues remain to be addressed, related to methodology, technology,
compatibility, intellectual property, and cost, among others.
Software has become a very important industry, and this is one of the main changes
since the early days of programming. As a worldwide business involving huge
resources, it attracts many people and companies interested in that market. On the
other side, a growing free-software community interested in sharing knowledge and
information proposes an alternative scenario for the future of software.
References
1. Davis, S. J., J. MacCrisken, and K. M. Murphy. 1999. The evolution of the PC
operating system: An economic analysis of software design. Accessed on March
03, 2005, from http://gsbwww.uchicago.edu/fac/steven.davis/research/.
2. Hansmann, U., L. Merk, M. S. Niklous, and T. Stober. 2003. Pervasive
Computing, 2nd ed. New York, NY: Springer Professional Computing.
3. The Palm computing platform: An overview. Accessed on March 03, 2005, from
http://www.wirelessdevnet.com/channels/pda/training/palmoverview.html.
4. Microsoft Windows CE: An overview. Accessed on March 03, 2005, from
http://www.wirelessdevnet.com/channels/pda/training/winceoverview.html.
5. Li, Y., M. Potkonjak, and W. Wolf. 1997. Real-time operating systems for
embedded computing. International Conference on Computer Design (ICCD
’97).
3.4 Artificial Intelligence Methodologies 153
knowledge and procedures to draw conclusions from this knowledge. In doing so, we
have to place special emphasis on uncertainty and vagueness and on methods to
handle these phenomena, because most of our knowledge is neither certain nor
precise.
Formal Logic
If we want to convey information to someone else or if we want to carefully ana-
lyze the arguments for and against a decision we have to make, we need to express the
information or the arguments in a language. For this purpose, the language used must
exhibit a certain structure. Of course, all natural languages, like English, French, Ger-
man, etc., possess the necessary structure. However, natural language is most often not
precise enough for computer representations, because in human communication the
context in which some statement is made as well as common knowledge is implicitly
drawn upon to fix the interpretation of certain words. In addition, natural language is
very flexible, and allows for expressing the same thing in many different ways. Al-
though this is surely an advantage in human communication, it turns out to be a hin-
drance for representing knowledge in a computer, because it can make it very difficult
to check automatically whether two statements say the same thing.
Therefore, in artificial intelligence, formalized languages are used, in which the
meaning of each term is precisely defined and which only exhibit the core structure
needed for reasoning. These special languages are studied in the area of (formal) logic:
Logic describes the core structure of argumentative languages, i.e., of languages in
which one can argue. Formal logic reduces the complexity and ambiguity of natural
language to a level on which it becomes possible to manage knowledge in a computer.
There are different logical calculi, depending on how far the complexity reduction
goes. The most basic and simple system is propositional logic. It describes how the
truth value of combined statements depends on the truth values of the basic statements
they consist of. The basic statements considered are simple propositions like “Hanni-
bal is a dog.” Although such propositions have a structure, this structure is neglected.
They are represented by simple truth variables, which can take one of the values true
or false. Truth variables may be combined with logical connectives such as and, or,
not, if ...then ..., etc. Propositional logic states, for instance, that the statement “A or B”
is false if and only if both A and B are false. This truth functionality allows us to draw
simple inferences. If we know, for instance, that the combined statement “A or B” is
true and then find out that A is false, we can infer that B must be true.
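This truth-functional behavior can be checked mechanically. The following Python
sketch (an illustration only; the helper name `entails` and the encoding of formulas as
functions are choices made for this example) enumerates all truth assignments to
verify that from “A or B” and “not A” we may conclude B:

```python
from itertools import product

def entails(premises, conclusion, variables):
    """Semantic entailment by enumeration: the conclusion must be
    true in every truth assignment that makes all premises true."""
    for values in product([True, False], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(p(model) for p in premises) and not conclusion(model):
            return False        # found a countermodel
    return True

# Premises: "A or B" is true, and A is false.
premises = [lambda m: m["A"] or m["B"], lambda m: not m["A"]]
conclusion = lambda m: m["B"]

print(entails(premises, conclusion, ["A", "B"]))  # True: B must be true
```

Note that dropping the premise “not A” makes the entailment fail, since the
assignment A true, B false then satisfies the remaining premise but not the conclusion.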
However, propositional logic is not powerful enough for most applications, mainly
because it neglects the structure of the basic propositions. Therefore an extension of
propositional logic, namely (first order) predicate logic is frequently drawn upon. Like
propositional logic it models the truth functionality of logical connectives. In addition
it captures (part of) the structure of the basic statements using constants, variables,
functions, and predicates. Constants denote specific objects, basically like names in
natural language. The constant “John Steinbeck,” for instance, refers to a specific
American writer. Variables also reference objects, but in an unspecific way in order to
make universal or existential statements in connection with the so-called quantifiers
“For all ...” and “There exists ....” Functions model indirect characterizations, i.e., the
reference to an object through other objects. For instance, we may refer to John Stein-
beck also as “author(‘The Grapes of Wrath’),” where “author” is a function that is
applied to the constant “The Grapes of Wrath.” Finally, predicates are used to assign
properties to objects or to describe relations between objects. For instance, the fact that
John Steinbeck wrote the novel “The Grapes of Wrath” in 1939 can be expressed as
“wrote(‘John Steinbeck’, ‘The Grapes of Wrath’, 1939),” where “wrote” is a predicate
and its arguments are constants. Note that predicate expressions can be true or false
and thus correspond to the truth variables of propositional logic.
With predicate calculus more general inferences become possible. For example, us-
ing the fact that all humans are mortal, expressed as “For all x: if x is a human, then x
is mortal” we can infer from the fact that Socrates is a human, that Socrates must be
mortal. That is, with predicate calculus we can reason conveniently about properties of
and relations between objects.
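The Socrates inference can be sketched, for instance, by representing facts as predicate tuples and instantiating the universal rule for every known object (a minimal forward-chaining sketch; the fact base is hypothetical):

```python
# Facts are (predicate, argument) tuples; the universal rule
# "For all x: if human(x) then mortal(x)" is applied to every known object.
facts = {("human", "Socrates"), ("human", "Plato"), ("writer", "John Steinbeck")}

def apply_rule(facts):
    """One forward-chaining step for the rule human(x) -> mortal(x)."""
    derived = {("mortal", x) for (pred, x) in facts if pred == "human"}
    return facts | derived

facts = apply_rule(facts)
print(("mortal", "Socrates") in facts)  # True
```

A real theorem prover handles arbitrary quantified formulae via unification rather than one hard-coded rule; this sketch only illustrates the instantiation idea.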
In artificial intelligence predicate logic is used, for instance, in automatic theorem
provers, which can reason about situations described by logical formulae and can an-
swer questions about these situations. There are also special programming languages
like PROLOG that are based on specific subsets of predicate logic and are very power-
ful tools for artificial intelligence applications.
Further extensions of logic capture even more of the features of natural languages:
temporal logic, for example, makes it possible to reason about time and takes care of
changes of truth values in time (“The sun is shining” may be true today, but false
tomorrow). Modal logic allows us to label statements as possible or necessary and thus
provides the means for a, though limited, treatment of uncertainty. However, discussing
these extensions of standard (first order) predicate calculus is beyond the scope of
this section. More details about different logical calculi and their application can be
found, for instance, in [3-5].
Rule-Based Systems
Rules, i.e. if-then statements, are a very convenient representation of many forms of
knowledge, mainly because they are generally regarded as easily understandable.
Rules can be used to specify inferences that can be drawn or to associate actions with
conditions under which they have to be carried out. Therefore, rules are the most
popular building blocks of expert or decision support systems as well as knowledge-
based controllers. They seem to provide one of the best ways to reconcile human
thinking and the abstraction and formal precision needed for computer representations.
A rule-based system usually consists of a knowledge base, which contains the rules
representing the knowledge about the application domain; an inference engine, which
checks which rules can be applied and carries out inferences or triggers actions; and an
application interface, through which a user can communicate with the system or
through which sensor information is fed into the controller. A decision support system
may also comprise an explanation component, which generates justifications for the
suggestions made by the system. This is necessary, because the responsibility rests, of
course, with the human decision maker, who would like to know the reasons for a de-
cision in order to be able to make it conscientiously. In a knowledge-based controller
there may also be state variables as an additional component, which can be used to
make the behavior of the controller dependent on the preceding control state, i.e., to
program control strategies that can take care of, for instance, time lags.
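The interplay of knowledge base and inference engine described above can be sketched as follows (a toy forward-chaining engine; the rules and facts are hypothetical examples):

```python
# Knowledge base: each rule is a pair (set of condition facts, derived fact).
rules = [
    ({"bird"}, "has_feathers"),
    ({"bird", "can_fly"}, "can_reach_high_nest"),
]

def infer(facts, rules):
    """Toy inference engine: apply rules repeatedly (forward chaining)
    until no rule adds a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"bird", "can_fly"}, rules))
```

A production-quality engine would add conflict resolution, an explanation trace for the user, and, for a controller, the state variables mentioned above; none of that is shown here.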
Uncertain Knowledge
Human expert knowledge is rarely certain. Most rules are only generally true and
have exceptions. The best-known example is, of course, “All birds can fly.” Although
most birds can fly, there are some that cannot, such as penguins and ostriches. So if we
infer from “Tweety is a bird” that “Tweety can fly” we may be mistaken.
However, the fact that most knowledge is uncertain does not mean that it is useless.
In medicine, for example, certain symptoms often, but not always, point towards
certain diseases. Therefore a physician is frequently in a situation in which he cannot
infer the disease with certainty, but nevertheless his diagnoses are most often correct.
Unfortunately, rules that are only generally true, but have exceptions, cannot be
handled easily with pure logic-based approaches. The problem is that in classical logic
a statement must be either true or false. Since a “generally true” statement is, in a strict
sense, neither true nor false, it cannot be used without introducing consistency prob-
lems. With modal logic, which we mentioned above, it is possible to label “generally
true” rules as usable as long as there is no explicit information to the contrary. How-
ever, it lacks sufficiently powerful means to express and infer preferences between
conclusions drawn from only “generally true” knowledge.
The most natural approach to handle uncertain knowledge is to soften the notion of
truth by introducing additional truth values between true and false. For instance, one
may introduce “maybe” levels or one may use real numbers between 0 and 1, identifying
0 with false and 1 with true. Enhancing rule-based systems with certainty factors,
which specify the reliability of rules, together with combination rules for these factors,
has been successful in specific applications like the diagnosis of bacterial infections.
Unfortunately, however, they have been shown to lead to inconsistent results in the
general case. The main problems are implicit independence assumptions, which are
hidden in this approach and which are not satisfied in many applications.
A more promising approach to handle uncertainty, which has gained considerable
popularity in recent years, is Bayesian networks. The basic idea underlying Bayesian
networks is to make the dependence and independence relations (which hold between
the attributes used to describe the domain of interest) explicit, thus avoiding the prob-
lems that result from implicit independence assumptions. The dependence and inde-
pendence relations are encoded with the help of a graph, in which each node represents
an attribute and each edge a direct dependence between attributes. From this graph the
valid independences can be obtained with simple graph theory criteria. The graph also
specifies how the (probabilistic) knowledge about the domain can be decomposed. As
a consequence it prescribes the paths of probabilistic inference, so that mathematically
sound and efficient evidence propagation methods can be derived. However, the
mathematics of this decomposition and inference process are much too involved to be
discussed here. An interested reader can find a detailed treatment of Bayesian net-
works and related approaches in, for instance, [6, 7].
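To give a flavor of the decomposition, the following sketch encodes a tiny hypothetical network (Rain → Wet ← Sprinkler) and answers a query by brute-force enumeration; real systems use the far more efficient propagation methods mentioned above, and all probabilities here are invented:

```python
from itertools import product

# Hypothetical network Rain -> Wet <- Sprinkler; the graph structure says
# the joint probability factorizes as P(R) * P(S) * P(W | R, S).
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
P_wet_given = {(True, True): 0.99, (True, False): 0.9,
               (False, True): 0.8, (False, False): 0.05}

def joint(r, s, w):
    pw = P_wet_given[(r, s)]
    return P_rain[r] * P_sprinkler[s] * (pw if w else 1.0 - pw)

# Brute-force query: P(Rain = True | Wet = True).
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(round(num / den, 3))  # 0.645
```

The factorization is what makes larger networks tractable: instead of one table over all attributes, only the small local tables attached to each node have to be specified.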
Vague Knowledge
Human expert knowledge is not only uncertain, but also often vague. While uncer-
tainty means that we cannot decide which of a set of (precisely defined) possible alter-
native statements correctly describes the actual situation, usually because we are
lacking information, vagueness refers to situations in which the meaning or the appli-
cability of a statement is in doubt. Vagueness results from the fact that the words of
natural language usually have no precisely bounded domain of application. Although
there are some situations in which they are surely applicable and others in which they
are definitely not, there is a penumbra in between, where their applicability is not
clear. For instance, 35°C is surely hot weather, while 10°C is surely not. But what
about 25°C? Obviously, there is no precise boundary (a specific temperature) that
separates the temperatures to which the term “hot” is applicable from those to which it
is not, so that a decision is somewhat arbitrary.
The idea underlying fuzzy set theory and fuzzy logic is to describe the domain of
applicability of a linguistic term like “hot” not by a crisp set (which would presuppose
a sharp boundary), but by a fuzzy set, which has a soft or “fuzzy” boundary. This soft
boundary is brought about by allowing for gradual membership in a set, which is de-
scribed by a real number between 0 and 1. Zero means that an element is not contained
in the set, 1 that it is contained without restriction. With such a gradual membership,
linguistic terms like “cool,” “warm,” or “hot” may be interpreted as shown in Figure 1.
Temperatures between 0°C and 10°C are definitely cool and neither warm nor hot.
The temperatures between 10°C and 20°C, however, may be described as cool or
warm, but with different degrees of membership. The lower the temperature, the
higher the degree of membership for “cool” and the higher the temperature, the higher
the degree of membership for “warm.” Of course, the exact shape of the membership
functions depends on the application.
The main advantage of fuzzy logic and fuzzy set theory is that they allow us to express
our knowledge in rules that use linguistic terms, which are interpreted very intuitively
with membership functions. They also provide very convenient means to interpolate
between such rules if more than one rule is applicable. This is the reason why fuzzy
logic is very popular in knowledge-based control: it enables a human expert to con-
struct a controller by specifying some key input/output relations through linguistic
rules, between which the fuzzy logic inference engine then interpolates to complete
the control function. Fuzzy logic controllers have had enormous commercial success
and can be found today in many household appliances. An extensive treatment of
fuzzy theory and its applications is in, for example, [8-10].
Figure 1. Fuzzy sets for the linguistic terms “cool,” “warm,” and “hot.”
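Membership functions like those of Figure 1 can be sketched, for instance, with simple piecewise-linear shapes (the exact breakpoints below are assumptions for illustration):

```python
def shoulder_left(x, a, b):
    """Membership 1 up to a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def shoulder_right(x, a, b):
    """Membership 0 up to a, rising linearly to 1 at b."""
    return 1.0 - shoulder_left(x, a, b)

def trapezoid(x, a, b, c, d):
    """Membership 0 outside [a, d], 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

cool = lambda t: shoulder_left(t, 10.0, 20.0)     # fully "cool" up to 10 deg C
warm = lambda t: trapezoid(t, 10.0, 20.0, 25.0, 35.0)
hot = lambda t: shoulder_right(t, 25.0, 35.0)     # fully "hot" from 35 deg C

print(cool(15.0), warm(15.0))  # 15 deg C is partly cool, partly warm
```

As in the text, 15°C belongs to “cool” and “warm” with degree 0.5 each, 35°C is fully “hot,” and 10°C is not “hot” at all.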
Knowledge Acquisition
Unfortunately, most knowledge needed in artificial intelligence systems is “buried”
in human minds, and it has turned out to be very difficult to elicit it in such a way
that it can be fed into a computer. For instance, after an agricultural expert has looked
at the landscape, has examined the soil, and has checked the weather in the region, he
may be able to tell what crop the area is best suited for. But he may not be able to fully
explain his reasoning, let alone state it as clear and simple rules. In other words, knowing
something is not the same as teaching it, and we have to teach computers how to
do certain things if we want them to help us.
As a consequence, knowledge acquisition is an important area of artificial intelli-
gence. It is mainly concerned with the problem of how to get from a human expert the
knowledge that is needed to automate processes and to provide computer-based deci-
sion support. As such, it is sometimes more a part of psychology than of computer
science, because it involves interviewing human experts, setting up questionnaires,
mapping the domain-specific terms and notions to computer-processable functions and
predicates, etc. Since knowledge acquisition can be a very tedious and time-
consuming process, recent research, supported by considerable advances in computing
power, storage media, and sensor technology, has focused on learning from data, or, as
we may say, by observing what the human expert does and trying to find automatically
the rules that are needed to imitate him (see Section 3.4.3).
Case-Based Reasoning
A popular way to bypass the knowledge acquisition bottleneck is case-based rea-
soning. It draws on the idea that human beings often assess a new situation by compar-
ing it to situations they (or their fellow humans) experienced in the past, and then act
in a similar way as they acted in the most similar situations they have encountered or
heard about. As a consequence, in a case-based reasoning system, the knowledge is
embodied in a library of past cases, rather than being encoded in rules. Each entry in
the library describes a case or a problem together with its outcome or its solution, re-
spectively. Thus the knowledge or reasoning involved in deriving the outcome or solu-
tion is not made explicit, but is left implicit in the case descriptions. Therefore it is
sometimes also called lazy learning, as all knowledge acquisition consists in simply
storing example cases in a library.
Reasoning in case-based systems consists basically in retrieving those cases from
the library that are most similar to the currently investigated case. It is then conjec-
tured that the outcome of these cases or their solutions are also applicable to the new
case. By trial applications of the solutions it is then determined whether they have to
be revised or adapted in order to fit the current problem. Of course, after the outcome
for the current case or the solution to the current problem has been found, it is added
in a new case description to the library. A simple special form of case-based reasoning
is k-nearest neighbor classification. Based on a distance measure, which encodes
the similarity of cases, the k closest examples in the library are retrieved and the class
of the new case is predicted by a simple majority vote of these cases.
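A minimal k-nearest neighbor classifier might look as follows (the case library and the choice of Euclidean distance are illustrative assumptions):

```python
def knn_classify(library, query, k=3):
    """Retrieve the k cases most similar to the query and predict the
    class by majority vote. Cases are (feature vector, class) pairs;
    similarity is encoded as (small) Euclidean distance."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(library, key=lambda case: dist(case[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Invented case library: (temperature, humidity) -> irrigation decision.
library = [((30, 20), "irrigate"), ((28, 25), "irrigate"),
           ((18, 80), "wait"), ((15, 85), "wait"), ((20, 75), "wait")]
print(knn_classify(library, (29, 22)))  # irrigate
```

In a full case-based reasoning system the retrieved solution would additionally be adapted and, after use, stored back into the library; the sketch covers only the retrieval and voting step.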
This is, of course, a toy example. State graphs for real world problems tend to be
much larger, containing several hundred or even several thousand states. As a consequence
it sometimes becomes impossible to construct the full state graph, as shown in
this example. In such a case implicit representations of a state graph are used, consist-
ing of an initial state, operations to construct the states that can be reached directly
from a given state, and a function to identify goal states.
Search in State Graphs
State graphs are only representations of planning problems. They do not provide
solutions immediately, as we still have to search for a path from the initial state to the
goal state. In doing so we may have to take into account additional constraints; for
example, each action may incur a cost and we may desire a minimum-cost solution.
Straightforward solutions to the path search problem are the breadth-first search, or
its more general version, the uniform-cost search, as well as the depth-first search. In
breadth-first and uniform-cost searches the states are visited in the order of increasing
distance (measured as the number of edges) or increasing cost, respectively, from the
initial state. In depth-first searches one always proceeds from the state that has been
visited most recently, until a depth limit, specified as a number of edges, is reached. At
this limit, or if no new states can be reached, the search backtracks to a state visited
earlier, from which new states can be reached. The advantage of breadth-first (uni-
form-cost) search is that it always finds the solution that needs a minimum number of
actions (causes minimum costs). However, it has the disadvantage that a lot of states
have to be processed in parallel, so that a program proceeding in this way can con-
sume a lot of computer memory. The advantage of the depth-first search is that it is
much more memory-efficient, but it may not find the optimal solution if the search is
terminated as soon as a goal state is reached. In addition, it may fail to find a solution
if all goal states lie beyond the “search horizon” that is defined by the depth limit.
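A breadth-first search over a state graph can be sketched as follows (the toy road network is hypothetical; `successors` could just as well generate states implicitly):

```python
from collections import deque

def breadth_first_search(start, successors, is_goal):
    """Visit states in order of increasing distance (number of edges)
    from the initial state; return a shortest path to a goal state."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if is_goal(path[-1]):
            return path
        for nxt in successors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

# Invented road network as an adjacency list:
roads = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
path = breadth_first_search("A", lambda s: roads[s], lambda s: s == "E")
print(path)  # ['A', 'B', 'D', 'E']
```

The queue holding all partial paths of the current depth is exactly the memory cost discussed above; a depth-first variant would replace the queue with a stack and add a depth limit.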
All these approaches are called uninformed search, because none of them uses spe-
cial knowledge about the application domain to guide the search. In practice, however,
it is often indispensable to exploit such domain-specific knowledge to guide the
search, because otherwise the search would take prohibitively long. For instance, if we
searched for the shortest route from town A to city B with a breadth-first search on the
road network, treating each town or even each crossing as a state, the search would
explode unless A and B were very close together. Instead we use the heuristic of starting
the search by following only roads that emanate from A in the direction of city B
or lead to the nearest highway. That is, we restrict the search to the most promising
alternatives, considering other alternatives only after a failure.
A general method that makes use of such heuristics is the A* algorithm. It is based
on two functions g and h, which assess states. Function g measures the costs it takes to
reach a state from the initial state. These costs are usually estimated from the best path
that has been found up to now. Function h measures the costs needed to reach a goal
state from a state. It is estimated by a problem-specific heuristic function. In a naviga-
tion problem like the one discussed above, we may simply use the straight-line dis-
tance between a town or crossing and city B. The A* algorithm prescribes continuing
the search from the state for which the sum of the estimates of the two functions is
minimal. This has the advantage that a path to the goal state can be found in consid-
erably fewer steps, provided the heuristic estimate is appropriate. A nice property of
the A* algorithm is that it is guaranteed to find an optimal (i.e. minimum cost) solu-
tion, provided the function h never overestimates the costs.
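The A* scheme can be sketched as follows, with the straight-line distance as heuristic h (the towns, coordinates, and road costs are invented for illustration, and chosen so that h never overestimates):

```python
import heapq

def a_star(start, successors, h, is_goal):
    """A* search: always continue from the state with minimal f = g + h,
    where g is the cost accrued so far and h estimates the remaining cost.
    successors(s) yields (next_state, step_cost) pairs."""
    frontier = [(h(start), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path, g
        for nxt, cost in successors(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")

# Invented towns with coordinates; h is the straight-line distance to G.
coords = {"A": (0, 0), "B": (2, 0), "C": (1, 2), "G": (4, 0)}
edges = {"A": [("B", 2.0), ("C", 2.5)], "B": [("G", 2.0)],
         "C": [("G", 3.7)], "G": []}
line = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
path, cost = a_star("A", lambda s: edges[s],
                    lambda s: line(coords[s], coords["G"]),
                    lambda s: s == "G")
print(path, cost)  # ['A', 'B', 'G'] 4.0
```

With h set identically to zero, the same code degenerates to a uniform-cost search; the heuristic is what steers the expansion towards the goal.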
More information about search in general and especially about heuristic search
techniques can be found, for instance, in [13, 14].
Reasoning about States and Actions
More sophisticated approaches to planning use logic-based representations of states
and how actions modify states and their properties. In this case the results of an action
can be described in a generic way and it is left to the logical inference engine underly-
ing the planning system to determine the outcomes of an action for a specific situation.
Of course, this approach has the advantage of an immensely increased flexibility.
However, it also leads to heavy computational costs, which can make it infeasible for
some applications.
3.4.3 Learning
One of the most striking abilities of all higher natural organisms is their ability to
learn, that is, to modify their behavior as a result of past experience. Learning enables
a (natural or artificial) agent to perform the same tasks in a more efficient way and to
handle new tasks he could not manage before. Therefore, learning ability is commonly
regarded as one of the cornerstones of intelligent behavior. In this section we highlight
a few approaches to automated or machine learning. A more detailed treatment can be
found, for instance, in [15, 16].
Inductive Logic Programming
Inductive logic programming consists of finding logical formulae that aptly de-
scribe the given data by systematically searching the space of all logical formulae that
are applicable to the data. Most often the search is restricted to rules, i.e. if-then state-
ments, because such rules are most convenient in many applications (see Section
3.4.1).
The search through the space of logical formulae is usually carried out by proceed-
ing from the general to the specific (although there are also a few approaches which
work the other way round). That is, the search is started with a very general formula,
often an empty rule, which is applicable to all example cases. It is then made more
specific, but hopefully without lowering the support of the rule too severely, i.e., the
number of example cases to which it is applicable. The quality of each rule is then
assessed with heuristic measures, which provide a ranking for the presentation of the
rules found, but may also be used to guide the search. If the rules with the highest
scores are the first to be expanded, chances are better that the best and most useful
rules are found early in the search process. This has the advantage that the search can
be stopped early, before the whole search space has been visited, without losing too
much of the overall result.
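A drastically simplified version of this general-to-specific search might look as follows: starting from an empty rule, conditions are added greedily, with the precision of the rule serving as a crude heuristic measure (the loan data set is invented, and a real inductive logic programming system would search first-order clauses, not attribute-value pairs):

```python
def learn_rule(examples, target, conditions):
    """Greedy general-to-specific search for a single classification rule.
    The rule starts with an empty antecedent (covering all examples) and
    conditions are added as long as they improve the rule's precision."""
    covers = lambda ex, rule: all(ex[a] == v for a, v in rule)

    def precision(rule):
        covered = [ex for ex in examples if covers(ex, rule)]
        if not covered:
            return 0.0
        return sum(ex["class"] == target for ex in covered) / len(covered)

    rule = []
    improved = True
    while improved:
        improved = False
        best, best_p = None, precision(rule)
        for cond in conditions:
            p = precision(rule + [cond])
            if p > best_p:
                best, best_p = cond, p
        if best is not None:
            rule.append(best)
            improved = True
    return rule

# Invented loan data: which customers are credit-worthy ("good")?
examples = [
    {"income": "high", "debt": "low", "class": "good"},
    {"income": "high", "debt": "high", "class": "bad"},
    {"income": "low", "debt": "low", "class": "bad"},
    {"income": "high", "debt": "low", "class": "good"},
]
conditions = [("income", "high"), ("income", "low"),
              ("debt", "low"), ("debt", "high")]
print(learn_rule(examples, "good", conditions))
# -> [('income', 'high'), ('debt', 'low')]
```

Keeping only the single best condition per step is the extreme case of the beam search mentioned above (beam width 1); widening the beam would track the k best partial rules instead.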
Inductive logic programming is very well suited for classification and concept de-
scription tasks. In these cases the search starts from one rule for each class or concept,
each having an empty antecedent and the class or concept in the consequent. Condi-
tions are only added to the antecedent of the rule to make it more specific. The best
rules found in the search can be used to classify cases for which the class is not yet
known or to understand which properties are characteristic of the concept. Applica-
tions are, for instance, tasks like assessing what properties characterize customers of a
bank that are credit-worthy (in order to distinguish them from those who will pre-
sumably not pay back their loan) or the fault-prone vehicles of a car manufacturer (in
order to find the causes and to improve the product quality).
The advantage of inductive logic programming is that it is not bound to single table
representations of the training data, but can also handle multiple relations, which are
connected by references and keys. This can be very important in applications where
the data is often stored in relational database systems, divided into several relations
that refer to each other. In such a case it can be difficult, if possible at all, to transform
the data into tabular form, so that each line of a single table describes one example
case. However, decision trees and artificial neural networks, which are described be-
low, require such a single table representation.
The main disadvantage of inductive logic programming is that implementations are
usually fairly slow, partly due to the fact that most of them are programmed in
PROLOG, a language that is hardly renowned for fast execution. However, the core of
the problem is rather that the search space that has to be traversed by inductive logic
programming approaches is usually vast, so that prohibitively long execution times
can result. Therefore, it is very important to restrict the search space and to guide the
search by a declarative bias, which specifies the type of rules to look for by formal
grammars or rule templates, and by applying search heuristics such as beam search,
i.e., by concentrating in each step on the k best rules found in the preceding step.
More information about inductive logic programming approaches can be found, for
instance, in [17, 18].
Decision Trees
The decision tree is the most popular machine learning method, because it is very
simple and fast and produces easily understandable results. A decision tree is a classi-
fier, i.e., a method to assign a class from a predefined set to a case or object under
consideration, based on the values of a set of descriptive attributes.
As its name indicates, a decision tree has a tree structure. Each inner node (i.e., each
node having descendants) specifies a test of a descriptive attribute; each leaf (i.e., each
node not having descendants) assigns a class. A case or an object is classified with a
decision tree by starting at the root and descending along the branches that correspond
to the outcomes of the tests specified by the inner nodes, until a leaf is reached,
from which the predicted class can be read. As an illustration consider Figure 3, which
shows a very simple decision tree for a medical task: a decision about a drug to administer.
Figure 3. A simple decision tree which yields a suggestion for which drug to administer,
depending on the blood pressure and the age of the patient.
The tree prescribes checking the patient’s blood pressure first. If the blood pressure
is low, drug A should be used; if it is high, drug B. If the blood pressure is
normal, a second test is necessary. Depending on whether the patient is under or over
forty years of age, it suggests administering drug A or drug B.
Decision trees can be constructed automatically from a set of preclassified example
cases by a recursive procedure that is based on a “divide and conquer” approach to-
gether with a “greedy” selection of the test attribute. With this approach a decision tree
is constructed from top to bottom (in the representation of Figure 3), which is why it is
called top-down induction of decision trees (TDIDT). It works as follows: The amount
of information provided by each of the available descriptive attributes about the class
is assessed with an evaluation measure. The attribute (or rather a specific test of this
attribute) that receives the best score is selected and forms the root of the tree. The
training cases are then split into subsets according to the outcome of the chosen test.
The rest of the tree is constructed recursively by applying the same procedure to each
of the subsets. The recursion stops if all cases in a subset are assigned to the same
class, if there are too few cases for another split, or if no tests are left that could be
performed.
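The TDIDT procedure can be sketched as follows, with information gain as the evaluation measure (the training cases are invented so as to reproduce the tree of Figure 3):

```python
from collections import Counter
from math import log2

def entropy(examples):
    counts = Counter(ex["class"] for ex in examples)
    n = len(examples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def build_tree(examples, attributes):
    """Top-down induction of decision trees (TDIDT): select the attribute
    with the highest information gain, split the cases, and recurse."""
    classes = {ex["class"] for ex in examples}
    if len(classes) == 1 or not attributes:   # leaf: majority class
        return Counter(ex["class"] for ex in examples).most_common(1)[0][0]

    def gain(attr):
        remainder = 0.0
        for v in {ex[attr] for ex in examples}:
            subset = [ex for ex in examples if ex[attr] == v]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(examples) - remainder

    best = max(attributes, key=gain)          # "greedy" attribute selection
    branches = {}
    for v in {ex[best] for ex in examples}:   # "divide and conquer" split
        subset = [ex for ex in examples if ex[best] == v]
        branches[v] = build_tree(subset, [a for a in attributes if a != best])
    return (best, branches)

# Invented training cases consistent with the tree of Figure 3:
data = [
    {"pressure": "low", "age": "<=40", "class": "drug A"},
    {"pressure": "low", "age": ">40", "class": "drug A"},
    {"pressure": "high", "age": "<=40", "class": "drug B"},
    {"pressure": "high", "age": ">40", "class": "drug B"},
    {"pressure": "normal", "age": "<=40", "class": "drug A"},
    {"pressure": "normal", "age": ">40", "class": "drug B"},
]
root, branches = build_tree(data, ["pressure", "age"])
print(root)  # "pressure" carries the most information and becomes the root
```

On this data the blood pressure attribute has the higher information gain, so it is tested first, and the age test appears only below the “normal” branch, exactly as in Figure 3.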
It should be noted that a decision tree can also be seen as a compact representation
of a set of rules. Each path from the root to a leaf node describes one rule, with the
conjunction of the test outcomes along the path providing the antecedent and the class
assignment in the leaf providing the consequent of the rule. Therefore the induction of
decision trees can also be used to find rules, although it should be kept in mind that the
result may not contain the most expressive individual rules for a domain, due to the
way in which they are constructed. The goal is rather to find a set of (related) rules
that together yield a good classification of all training examples.
The advantage of decision trees is that learning is usually very fast and that the re-
sult is easily interpretable by human experts, so that it can be checked for plausibility.
Their disadvantage is that the induction process usually selects only a small number
of the available attributes, so that information that is distributed over a large number of
attributes, with each attribute carrying only limited information about the class, cannot
be handled adequately, resulting in suboptimal prediction accuracy. In such situations
(naive) Bayes classifiers and artificial neural networks are often superior.
A detailed treatment of the algorithms and mathematics underlying decision tree
construction can be found, for instance, in [19, 20]. The latter also treats regression
trees, which can predict numeric values instead of classes.
Figure 4. A simple three-layered perceptron with three inputs x1, x2, and x3;
two outputs y1 and y2; and seven neurons a through g.
functions. Indeed, it is fairly simple to prove that at most two hidden layers are neces-
sary to approximate any Riemann-integrable function with arbitrary accuracy, even
though this may require a fairly large number of neurons.
Multilayer perceptrons can be trained to approximate a function that is given only
as a set of input/output patterns by a method that is called error backpropagation. The
idea underlying this method is, in broad strokes, the following: The parameters of the
network (connection weights and threshold values) are initialized to random values.
Then the input patterns are processed with the network and the produced output is
compared to the desired one. The error, usually the sum of the squared differences
between the desired and actual outputs, is propagated backwards through the network
and the parameters are adapted in such a way that the error gets smaller. This proce-
dure is repeated until either the error is small enough or does not change anymore.
Mathematically, the backpropagation procedure is a gradient descent on the error
function. That is, small steps are repeatedly made in the direction in which the error
gets smaller. This direction is determined by differentiating the error function.
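The procedure can be sketched for a minimal network with one hidden layer (pure Python, trained here on the simple OR function as a toy set of input/output patterns; the network size, learning rate, and epoch count are arbitrary choices):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_mlp(patterns, hidden=2, epochs=5000, eta=0.5, seed=0):
    """Minimal multilayer perceptron with one hidden layer, trained by
    error backpropagation, i.e. gradient descent on the squared error.
    Returns the total error before and after training."""
    rnd = random.Random(seed)
    n_in = len(patterns[0][0])
    # Random initial parameters; each row carries the threshold as an
    # extra bias weight (matched by the constant input 1.0 below).
    w_h = [[rnd.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_o = [rnd.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_h]
        y = sigmoid(sum(w * v for w, v in zip(w_o, h + [1.0])))
        return h, y

    def total_error():
        return sum((t - forward(x)[1]) ** 2 for x, t in patterns)

    before = total_error()
    for _ in range(epochs):
        for x, t in patterns:
            h, y = forward(x)
            delta_o = (y - t) * y * (1.0 - y)        # output-layer error term
            delta_h = [delta_o * w_o[j] * h[j] * (1.0 - h[j])
                       for j in range(hidden)]        # propagated backwards
            for j, v in enumerate(h + [1.0]):         # small gradient steps
                w_o[j] -= eta * delta_o * v
            for j in range(hidden):
                for i, v in enumerate(x + [1.0]):
                    w_h[j][i] -= eta * delta_h[j] * v
    return before, total_error()

# Toy target: the logical OR function as input/output patterns.
patterns = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
before, after = train_mlp(patterns)
print(before, "->", after)  # total squared error before and after training
```

The inner loop is exactly the recipe of the text: compute the output, compare it with the desired one, propagate the error backwards, and adjust the weights by a small step against the gradient.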
The advantage of artificial neural networks is that they frequently produce the best
results with respect to accuracy compared to other methods. Their disadvantage is that the
trained network is a “black box.” Since its “knowledge” is stored in the connection
weights and threshold values, its computations are almost uninterpretable.
More detailed accounts of artificial neural network methodologies and especially of
other network types and the tasks they are suited for can be found, for instance, in
[21,22]. An interesting variant is neuro-fuzzy systems, which combine the learning
ability of artificial neural networks with the comprehensibility of fuzzy systems [23].
Genetic Algorithms
With genetic algorithms one tries to imitate the optimization process of biological
evolution. Random mutations and recombinations of candidate solutions as well as
quality dependent reproduction aim at finding (near-) optimal solutions of (combinato-
rial) optimization problems. Since most learning tasks can be reformulated as (combi-
natorial) optimization problems, genetic algorithms are almost universally applicable.
In general, a genetic algorithm approach consists of the following steps:
1. Find an appropriate way of encoding candidate solutions. An encoding of a can-
didate solution is called a chromosome.
2. Define a fitness function to assess the quality of a candidate solution.
3. Generate a random initial population of (encodings of) candidate solutions.
4. Evaluate the candidate solutions of the population.
5. Select the candidate solutions of the next generation according to their quality:
The higher the fitness of a candidate solution, the higher the probability that it
receives a child (a copy of itself in the next generation). The same candidate so-
lution may be selected several times.
6. Modify the chromosomes representing the selected candidate solutions by ap-
plying genetic operators like mutation (randomly modify a small part of a
chromosome) and crossover (randomly combine parts of two parent chromo-
somes).
7. Repeat steps 4 to 6 until some termination criterion is reached, for example the
best individual has a given minimum quality, no improvement occurred during a
certain number of steps, or a preset number of generations has been created.
8. The best candidate solution of the last generation (or, if it is recorded, the best
solution encountered during the whole process) is the solution found.
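Steps 1 to 8 can be sketched on a toy problem (the classic “OneMax” task of maximizing the number of 1-bits in a bit string; all parameter values are arbitrary choices):

```python
import random

def genetic_algorithm(length=20, pop_size=30, generations=80,
                      p_mut=0.02, seed=0):
    """Toy genetic algorithm for the "OneMax" problem: chromosomes are
    bit strings, the fitness of a chromosome is its number of 1-bits."""
    rnd = random.Random(seed)
    fitness = sum                                # step 2: fitness function
    pop = [[rnd.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]             # step 3: random population

    def select(pop):                             # step 5: fortune wheel
        total = sum(fitness(c) for c in pop)
        r = rnd.uniform(0, total)
        for c in pop:
            r -= fitness(c)                      # sector size = fitness
            if r <= 0:
                return c
        return pop[-1]

    best = max(pop, key=fitness)                 # step 4: evaluate
    for _ in range(generations):                 # step 7: repeat
        nxt = [best[:]]                          # elitism: keep the best
        while len(nxt) < pop_size:
            a, b = select(pop), select(pop)
            cut = rnd.randrange(1, length)       # step 6: one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rnd.random() < p_mut else g
                     for g in child]             # step 6: mutation
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best                                  # step 8: best solution found

best = genetic_algorithm()
print(sum(best), "of", len(best), "bits set")
```

The `select` function is the fortune wheel described below: a chromosome’s chance of being drawn is proportional to its share of the total fitness, and the same chromosome may be drawn several times.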
Which encoding of the candidate solutions is most suitable depends on the specific
problem. Usually strings of characters are used in analogy to the sequential organiza-
tion of information in biological chromosomes. Each position in the string corresponds
to a gene, each character that may be at the position of a given gene to an allele (or
allelomorph) of the gene. The practical advantage of such an encoding is that several
standard genetic operators are available for string representations. For example, muta-
tion can consist in randomly selecting a few positions in the string and altering the
characters at these positions (see Figure 5, left side). Crossover can be defined as se-
lecting a random cut point and exchanging the part of the strings on one side of the cut
point (called a one-point crossover, see Figure 5, right side).
A simple method to achieve fitness-dependent selection is fortune wheel selection.
A fortune wheel is set up on which each sector is associated with a candidate solution
in the current population. The size of the sector reflects the fitness of the candidate
solution. Each turn of the fortune wheel then selects one candidate solution for the
next generation. Since better candidate solutions are associated with larger sectors,
they have a better chance of getting selected (or getting selected more often).
More sophisticated genetic algorithms include elitism, i.e., that the best candidate
solutions are always copied unchanged to the next generation, and techniques to avoid
crowding, i.e. low diversity of the candidate solutions of a population, which can hin-
der improvements because of the limited amount of genetic material that is available.
The main advantages of genetic algorithms are that they are model-free (i.e., they
do not presuppose a specific model of the domain under consideration) and that there
are almost no limits to their applicability. Their disadvantages are that it often takes
them a long time to find a solution and that their success can depend heavily on the
chosen encoding of the problem. If the encoding is unsuitable, a genetic algorithm
may even fail completely. This reduces the advantage of its being model-free, because
it requires a lot of effort to find a good encoding, which can be just as costly as build-
ing a domain-specific model.
An extensive treatment of genetic algorithms and of related techniques, such as simulated annealing, genetic programming, and evolution strategies, can be found in [24-26].
References
1. Russell, S., and P. Norvig. 2003. Artificial Intelligence: A Modern Approach, 2nd
ed. Upper Saddle River, NJ: Prentice Hall.
2. Nilsson, N. J. 1998. Artificial Intelligence: A New Synthesis. San Francisco, CA:
Morgan Kaufmann.
3. Genesereth, M. R., and N. J. Nilsson. 1987. Logical Foundations of Artificial
Intelligence. San Mateo, CA: Morgan Kaufmann.
4. Levesque, H. J., and G. Lakemeyer. 2001. The Logic of Knowledge Bases.
Cambridge, MA: MIT Press.
5. Reiter, R. 2001. Knowledge in Action. Cambridge, MA: MIT Press.
6. Jensen, F. V. 2001. Bayesian Networks and Decision Graphs. New York, NY:
Springer-Verlag.
7. Borgelt, C., and R. Kruse. 2002. Graphical Models: Methods for Data Analysis
and Mining. Chichester, UK: J. Wiley and Sons.
8. Kruse, R., J. Gebhardt, and F. Klawonn. 1994. Foundations of Fuzzy Systems.
Chichester, UK: J. Wiley and Sons.
9. Klir, G. J., and B. Yuan. 1995. Fuzzy Sets and Fuzzy Logic: Theory and
Applications. Upper Saddle River, NJ: Prentice-Hall.
10. Zimmermann, H.-J. 1996. Fuzzy Set Theory and Its Applications. Dordrecht,
Netherlands: Kluwer.
11. Kolodner, J. 1993. Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann.
12. Watson, I. 1997. Applying Case-Based Reasoning: Techniques for Enterprise
Systems. San Francisco, CA: Morgan Kaufmann.
13. Pearl, J. 1984. Heuristics: Intelligent Search Strategies for Computer Problem
Solving. Reading, MA: Addison-Wesley.
14. Rayward-Smith, V. J., I. H. Osman, and C. R. Reeves, eds. 1996. Modern
Heuristic Search Methods. New York, NY: J. Wiley and Sons.
15. Langley, P. 1995. Elements of Machine Learning. San Mateo, CA: Morgan
Kaufmann.
16. Mitchell, T. 1997. Machine Learning. New York, NY: McGraw-Hill.
17. Lavrac, N., and S. Dzeroski. 1994. Inductive Logic Programming: Techniques
and Applications. Upper Saddle River, NJ: Prentice Hall.
18. Bergadano, F., and D. Gunetti. 1995. Inductive Logic Programming. Cambridge,
MA: MIT Press.
19. Quinlan, J. R. 1993. C4.5: Programs for Machine Learning. San Mateo, CA:
Morgan Kaufmann.
20. Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification
and Regression Trees. Belmont, CA: Wadsworth.
21. Haykin, S. 1994. Neural Networks: A Comprehensive Foundation. Upper Saddle
River, NJ: Prentice-Hall.
22. Anderson, J. A. 1995. An Introduction to Neural Networks. Cambridge, MA:
MIT Press.
ware that also stores account, product, and address information. A third system handles inventory management and processes products and addresses. In this scenario, data is stored redundantly: addresses are kept in all three systems.
A DBMS integrates all data into a single structured database, and can thus help to
avoid redundancies. To this end, uniform access to the data is provided for all applications and users in the form of a descriptive query language. Such a language specifies
what data are accessed without any consideration of the access paths. This allows for
an internal optimization of queries on large datasets.
As the data is integrated into a single database, multiple users and applications may access the database simultaneously. A DBMS supports the concurrent access of multiple
users by introducing the concept of transactions, which are operations consisting of
several database read and write actions, and their synchronization.
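The transaction concept can be illustrated with Python's built-in sqlite3 module (the account table and the transfer scenario are invented for illustration): the two write actions of a transfer either both take effect or, if an error occurs, are both rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (no INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

# A transfer is one transaction consisting of two write actions.
# "with conn" opens a transaction that is committed on success and
# rolled back if any statement inside it raises an error.
try:
    with conn:
        conn.execute("UPDATE account SET balance = balance - 30 WHERE no = 1")
        conn.execute("UPDATE account SET balance = balance + 30 WHERE no = 2")
except sqlite3.Error:
    pass  # on failure the database is left unchanged

balances = dict(conn.execute("SELECT no, balance FROM account"))
print(balances)  # {1: 70, 2: 80}
```

The synchronization of concurrent transactions (locking, isolation levels) is handled internally by the DBMS and is not visible in this sketch.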
Since each application has its own requirements for the data, a concept of data independence is needed. Data independence allows for the abstraction from the actual
physical storage scheme and the introduction of a standardized query interface. A
three-tier architecture realizing data independence comprises the following layers [1]:
• The internal schema describes the physical storage of the data.
• The middle tier specifies a logical, implementation-independent view of the entire database. It is called the conceptual schema.
• The last layer consists of external schemas that define special views on the con-
ceptual schema for different applications.
In addition, database management systems provide support with respect to the important concepts of data safety and security. Firstly, powerful recovery and backup mechanisms ensure the persistence of the data even in the case of an error. Secondly, access
control techniques such as fine-grained user roles and rights prevent unauthorized ac-
cess to the data.
All these mechanisms are combined as follows: A database is a structured dataset
that is managed by a DBMS. A DBMS consists of different software modules that
handle a database. The database system (DBS) denotes the combination of a DBMS
and a specific database.
Relational Databases
A conceptual schema defines a logical view of the data. The most popular concep-
tual data model is the relational model [2, 3]. It is used by many commercial database
systems like Oracle 10g, Microsoft SQL Server, or IBM DB2. A relational database
consists of a collection of tables. A table represents a relation. Figure 1 depicts the
concepts of a relation. The table header defines the structure of the table and is called
the relation schema. One row of the table is called a tuple. A relation is a set of all
tuples, i.e. the body of the table. A column represents an attribute. The name of an
attribute is given in the table header, and its values are stored in its column entries.
The relational model is a very simple data model. However, it makes use of additional integrity constraints. Integrity constraints are classified into local and global constraints. Local constraints describe conditions within a single table. For instance, a unique key condition on an attribute specifies that the attribute values identify each tuple uniquely. Global constraints affect multiple tables. An example is the foreign key constraint, which requires that an attribute value present in one table also be contained in a second, referenced table. Altogether, a relation schema consists of a structure description and integrity constraints.
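A sketch of both kinds of constraint, using Python's built-in sqlite3 module as a stand-in for a commercial DBMS (the tables and values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only on request

# Local constraint: the primary key makes cow numbers unique within one table.
conn.execute("CREATE TABLE cows (no INTEGER PRIMARY KEY, price INTEGER)")
# Global constraint: every milk record must refer to an existing cow.
conn.execute("""CREATE TABLE milkprod (
                    no   INTEGER REFERENCES cows (no),
                    milk INTEGER,
                    year INTEGER)""")
conn.execute("INSERT INTO cows VALUES (110, 1000)")
conn.execute("INSERT INTO milkprod VALUES (110, 3500, 2003)")  # ok: cow 110 exists

try:
    conn.execute("INSERT INTO cows VALUES (110, 800)")  # duplicate key
except sqlite3.IntegrityError as err:
    print("local constraint violated:", err)

try:
    conn.execute("INSERT INTO milkprod VALUES (999, 4000, 2004)")  # unknown cow
except sqlite3.IntegrityError as err:
    print("global constraint violated:", err)
```

In both cases the offending insertion is rejected and the database keeps satisfying its schema's constraints.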
As described above, one of the advantages of DBMSs is the utilization of a descrip-
tive query language. Different query languages have been proposed based on the rela-
tional model. These languages are required to be at least as powerful as the relational
algebra, which comprises operations such as projection, selection, renaming, and join,
as well as the set theoretic operations union, intersection, and difference [4]. Selection
extracts tuples from a relation that satisfy the condition defined in the selection state-
ment. Projection allows the selection of columns, and renaming assigns a new name to
a column in the query result. Join enables the combination of tables over common
attributes and values. The most important relational query language is SQL, which is
discussed in the next section.
SQL: Structured Query Language
The structured query language SQL is a relational query language that is supported by many vendors of commercial relational DBMSs. The current standard is SQL:2003 [5], which provides features for object-relational data processing; interfaces for data mining, information retrieval, and external data sources; and standardized processing of XML data in SQL databases. We give only a brief overview of the basic query concepts of SQL. Data or view definition features of SQL as well as advanced features are not discussed here. The interested reader is referred to the literature, e.g., [5].
Here, we use an example to illustrate the basics of SQL. Assume the relations
shown in Figure 2. The relation Milkprod holds all data about the milk production per cow in recent years. Information about the animals is stored in the second relation Cows. Both relations are linked to each other via the common attribute No.
A basic SQL query consists of three clauses: select, from, and where. The
from clause specifies the relations used in the query. Attributes required in the result
set, column renaming, and additional functions are specified in the select clause. A
selection condition can be defined in the where clause of a query.

Milkprod                      Cows
No    Milk   Year             No    Price   Birth_date
110   3500   2003             110   1000    05-06-1997
111   4000   2003             111   1200    31-05-1999
110   4500   2004             113   800     11-05-2000

Figure 2. Example tables “Milkprod” and “Cows.”

Consider the simple query: “Return all cows, denoted as Cow_no, with milk production of at least 4000 liters in the year 2004.” This query declared as an SQL statement is:
SELECT No AS Cow_no, Milk
FROM Milkprod
WHERE Milk >= 4000 AND Year = 2004
A second example illustrates the combination of two relations via a join operation.
Assume we want to know the correlation of price, age, and milk production per cow.
Thus, we have to connect both tables based on the “No” attribute. One possibility to express a join in SQL is to use the natural join keyword in the FROM clause. The relations Milkprod and Cows are then joined via the common attribute “No”. Finally, the required attributes are specified in the select clause:
SELECT No, Milk, Price, Birth_date
FROM Milkprod natural join Cows
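The natural join above can be reproduced with Python's built-in sqlite3 module and the data of Figure 2 (a sketch; SQLite serves here only as a convenient stand-in for a commercial DBMS). Note that cow 113 does not appear in the result, since the natural join keeps only tuples with matching No values and cow 113 has no milk records:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Milkprod (No INTEGER, Milk INTEGER, Year INTEGER);
    CREATE TABLE Cows (No INTEGER, Price INTEGER, Birth_date TEXT);
    INSERT INTO Milkprod VALUES (110, 3500, 2003), (111, 4000, 2003),
                                (110, 4500, 2004);
    INSERT INTO Cows VALUES (110, 1000, '05-06-1997'),
                            (111, 1200, '31-05-1999'),
                            (113,  800, '11-05-2000');
""")
# The join combines both tables over the common attribute "No".
rows = conn.execute(
    "SELECT No, Milk, Price, Birth_date FROM Milkprod NATURAL JOIN Cows"
).fetchall()
for row in rows:
    print(row)
```

Each milk record is extended with the price and birth date of the corresponding cow, which is exactly the data needed for the correlation analysis described above.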
Besides relational DBMSs, systems using other conceptual models exist. Spatial
database systems support spatial queries, typical for maps, with special data models,
for instance constraint databases [6]. Extensions to the relational model, e.g., the ob-
ject-relational data model [7], allow the inclusion of extended query concepts such as
data mining, information retrieval queries, and XML processing, as well as complex
data types, into database management systems. Advanced concepts of SQL query fea-
tures as well as extended SQL abilities like data mining and multimedia interfaces can
be found in [1,4,8].
3.5.2 Knowledge Discovery in Databases
Knowledge Discovery in Databases (KDD) aims at the discovery of useful information in large data collections usually stored in databases: for instance, unusual or frequently occurring patterns, subgroups with certain properties, clusters, or rules that provide a general description of the data or allow predictions about yet unknown objects of the same domain.
Nowadays, digital information is relatively easy to access and inexpensive to store, and rapidly growing amounts of data are collected. The number of gathered objects is soaring, as is the number of features describing each object. Human abilities to
analyze vast amounts of data lag far behind the technical means of data collection and
storage. Valuable information coded in the data may remain undiscovered by conven-
tional methods of analysis.
• Finally, the mined patterns have to be evaluated and interpreted. Some or all of
the previous steps may have to be repeated until the initially defined objective is
met.
Data Mining
Data mining is sometimes used synonymously with KDD. However, we will here follow the common approach [9, 11] and define data mining as the step of the KDD process, detailed in the preceding paragraph, in which data are analyzed by means of data mining algorithms.
Over the past years, data mining workbench systems, such as Weka [13], Enterprise
Miner [14] and Clementine [15], have been successfully put into practice in a wide
range of scientific (e.g., geophysics [16, 17], medicine [17-19]) and commercial areas
(e.g., fraud detection [17], investment [11], risk management [19], telecommunication
[17]). For a more complete overview of data mining techniques, systems, and applications, the interested reader is referred to [13,20,21].
The systems and underlying techniques designed to accomplish the data mining step of the KDD process differ depending on the knowledge discovery goal. Discovery goals can be broadly categorized as prediction or classification, and description [11].
Prediction or classification techniques aim at constructing classification schemes
from empirical data that can be employed to predict the behavior of yet unknown ob-
jects of the same domain. Based on the attributes of the given data, hypotheses in the
form of functions mapping a data item to one of several predefined classes [22] are
automatically generated. The objective is to categorize the data at hand as well as fu-
ture data points as accurately as possible. Predictive techniques comprise methods
such as decision trees, support vector machines, neural nets, rule learners, and prob-
abilistic nets.
In contrast, descriptive systems identify data regions of particular local interest and
present the discovered patterns in a human-understandable form. Popular descriptive
techniques include subgroup discovery, clustering, change and deviation detection, de-
pendency modeling, and summarization.
Applications of Data Mining to Agriculture
In agriculture, KDD has only recently emerged as an essential technology [23].
Little et al. [24] describe a project demonstrating the potential utility of KDD in the
administration of the United States Department of Agriculture (USDA) crop insurance
program. Their approach aims at detecting suspicious planting and harvesting patterns
in more than one million records from a USDA database containing information
about agricultural practices, cropping type (irrigated vs. non-irrigated, grain vs. si-
lage), acres planted and acres harvested, regional characteristics, and meteorology.
KDD methods are applied to identify patterns of exceptionally small yields compared
to the total crop planted, which cannot be explained by a regional climatological event,
and which thus might indicate misuse of the USDA crop insurance program.
In the Data Mining for Site-Specific Agriculture project of the Illinois Council on
Food and Agricultural Research (C-FAR) [25], KDD is applied in the area of precision
agriculture and variable rate application practices in an effort to improve crop yields.
The objective of the project is to predict the spatial characteristics of crop yield based
on interactions between spatial and temporal characteristics of weather, fertilizer, seed
variety, soil properties, planting date, cropping and management history.
Canteri et al. [26] explore the usefulness of data mining methods on precision agri-
culture databases which, because of their size and complexity, cannot be efficiently
analyzed by traditional methods. They investigate successful techniques to relate crop
yield and physical-chemical soil properties.
3.5.3 Information Retrieval
Information retrieval (IR) deals with searching for information in collections of unstructured and semantically fuzzy data, such as natural language texts, images, audio,
or video. While data retrieval systems deal with structured data stored, for example, in
a relational database, and retrieve all items containing the keywords defined in a
query, information retrieval systems (IRSs) aim at retrieving information about a sub-
ject rather than data that satisfy a given query. In fact, queries in information retrieval
are not crisp, and an IRS aims at retrieving all information that might be relevant to a
query describing the user’s information need. Relevance is the central aspect in IR.
In the following, we focus on classical text information retrieval and introduce ba-
sic concepts underlying retrieval models as well as measures for the evaluation of the
retrieval performance. More detailed information on IR can be found in the books by
van Rijsbergen [27], Salton and McGill [28], and Baeza-Yates and Ribeiro-Neto [29].
The IR Process
The information retrieval process, which is depicted in Figure 4, comprises the
preprocessing of the documents stored in a given data collection, their retrieval, and
the ranking of relevant documents.
• Stemming of words to reduce a word to its root form, and to allow the retrieval
of documents containing syntactic variations of query terms;
• Selection of index terms to determine which words will be used as index terms;
• Construction of a term categorization structure, such as a thesaurus, to expand
the query with related terms.
After the preprocessing step, the indexed documents can be queried. The issued query
is evaluated based on the underlying retrieval model. In what follows, we will discuss
several information retrieval models.
Information Retrieval Models
The three classical models in information retrieval are the Boolean, vector, and probabilistic models, which we will briefly present now. For a more detailed discussion, the interested reader is referred to [29].
The Boolean model is a simple retrieval model based on classical set theory and
Boolean algebra. A query is specified as a Boolean expression with a crisp semantics,
and the retrieval strategy is based on Boolean logic without any notion of a grading
scale. Thus, the Boolean model performs a data retrieval rather than an information retrieval task. It returns true if an index term is present in a document, and false otherwise. The
weights of the index terms are binary, and each document is either relevant or not.
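Because Boolean retrieval reduces to classical set theory, it can be sketched directly with set operations over an inverted index (the toy index and document ids below are invented for illustration):

```python
# Inverted index: each index term maps to the set of documents containing it.
index = {
    "milk":  {1, 2, 4},
    "yield": {2, 3},
    "soil":  {3, 4},
}
all_docs = {1, 2, 3, 4}

# Boolean queries translate directly into set operations:
both    = index["milk"] & index["yield"]   # milk AND yield
either  = index["milk"] | index["soil"]    # milk OR soil
without = index["milk"] - index["soil"]    # milk AND NOT soil
no_soil = all_docs - index["soil"]         # NOT soil
```

Every document is either in the result set or not; no ranking by degree of relevance is possible in this model.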
The vector model [30] is based on the similarity of multi-dimensional vectors
which reflect the occurrence of index terms in the query and the searched documents.
More specifically, a document’s relevance for a query is evaluated based on the cosine
of the angle between the index term vectors of the document and the query. The re-
trieved documents are ranked according to their relevance, and documents might be
retrieved even if they only partially match a query.
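A minimal sketch of the vector model's ranking step, using toy term-frequency vectors (the vocabulary and weights are invented; real systems typically use tf-idf weights):

```python
import math

def cosine(u, v):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0

# Toy term-frequency vectors over the vocabulary (milk, yield, soil).
doc1, doc2 = [2, 1, 0], [0, 1, 2]
query = [1, 1, 0]

# Documents are ranked by decreasing similarity to the query; doc2 is
# still retrieved although it matches the query only partially.
ranking = sorted([("doc1", cosine(doc1, query)),
                  ("doc2", cosine(doc2, query))],
                 key=lambda pair: pair[1], reverse=True)
```

Here doc1 is ranked first because it shares more query terms, while doc2, a partial match, receives a lower but nonzero similarity.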
The probabilistic model [31] estimates the probability that a term appears in a
document, or that a document satisfies the information need and is therefore relevant
for a given query. The retrieved documents are ranked based on the probability values.
In general, the Boolean model is considered the weakest classical model, since it cannot recognize partial matches [28]. Whether the vector model surpasses the probabilistic model is an ongoing discussion.
Over the past years, alternative models for each type of classical model have been
proposed. The Boolean model has been enhanced by the extended Boolean model and
the fuzzy model [29]. The generalized vector model [32] and the latent semantic index-
ing model [33] are based on the classical vector model. The inference network and the
belief network are advancements of the classical probabilistic model [29].
Ranking
After the retrieval process, the ordered list of relevant documents is presented to the
user. In the vector and the probabilistic model, the documents’ ranking order is based
on their relevance for the given query. In the Boolean model, ranking is not possible,
since a document is either in the result set or not, and all retrieved documents are
equally relevant.
At this point the user can interactively refine the query by marking documents as
relevant or non-relevant. Based on the user’s judgment the query is reformulated, e.g.,
represented as a new vector of index terms in the vector model. Then, the IRS re-
trieves all documents that are relevant according to the refined query. This iterative query refinement process is called relevance feedback [34].
Retrieval Performance Evaluation
It is possible that an IRS retrieves non-relevant documents or fails to retrieve relevant ones. For that reason, there exist measures to determine the performance of an IRS. The standard measures of IR performance are recall and precision.
Consider a query q, and a set Dr of documents in a given collection that are rele-
vant to q, where |Dr| is the number of documents in Dr. Based on the query q, the sys-
tem retrieves an answer set Da with |Da| documents. |Dra| is the number of documents
in the answer set Da that are relevant to q, i.e. |Dra| is the cardinality of the set Dra= Da
∩ Dr, as shown in Figure 5.
The recall R expresses how many of the relevant documents in the collection are
retrieved, and is defined as the number of relevant documents retrieved compared to
the total number of relevant documents in the collection: R = |Dra| / |Dr|.
The precision P describes how many of the retrieved documents are relevant, and is defined as the number of relevant documents retrieved compared to the total number of documents retrieved: P = |Dra| / |Da|.
Ideally, precision and recall should both be 100%. However, the two measures are interdependent, and an increase in recall usually results in a decrease in precision. Nevertheless, IR systems attempt to maximize precision and recall simultaneously.
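The two definitions translate directly into code (a sketch with invented document-id sets):

```python
def recall_precision(Dr, Da):
    """Recall R = |Dra| / |Dr| and precision P = |Dra| / |Da|,
    where Dra is the set of retrieved documents that are relevant."""
    Dra = Dr & Da
    return len(Dra) / len(Dr), len(Dra) / len(Da)

Dr = {1, 2, 3, 4, 5}   # documents in the collection relevant to query q
Da = {3, 4, 5, 6}      # answer set retrieved by the system
R, P = recall_precision(Dr, Da)
print(R, P)  # 0.6 0.75
```

Here three of the five relevant documents are retrieved (R = 0.6), and three of the four retrieved documents are relevant (P = 0.75).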
Applications of Information Retrieval to Agriculture
In the agricultural sector, information retrieval plays an important role in the dissemination of information and knowledge about agricultural practice, market trends, prices of agro-forestry products, international trade laws, legal documents, etc.
Otuka [35] applies IR techniques to share and reuse farming knowledge and experi-
ence among farmers and advisers. Textual documents describing both successful and
unsuccessful farming cases are stored in a web-based system, which can be queried for
information pertinent to a specific problem. The retrieval system’s problem-solving
abilities are improved by additionally enabling the system to retrieve agricultural spe-
cialists for a given problem, where specialists are characterized in the system by terms
occurring in their publications [36].
Hoa [37] describes the use of scientific, technological, and economic information to
build up an infrastructure for the agricultural development process in Vietnam. He
illustrates the use of IR in the Information Centre for Agriculture and Rural Develop-
ment (ICARD) of Vietnam, which maintains several databases on crop varieties, fertil-
izers and agro-chemicals, domestic and foreign agricultural production, and on inter-
national prices of rice, wheat, corn, coffee, rubber, and fertilizer at some main mar-
kets. Retrieved information is applied to support policy making, strategic planning and decision processes, to seek export markets for agricultural products, and to enlarge investment cooperation and joint ventures with foreign countries.
3.5.4 Web Mining
Since its invention in 1989 as an Internet-based hypermedia initiative for global information sharing, the World Wide Web (also called the web or WWW) has grown explosively. The number of web users has grown at an unknown but presumably
exponential rate; the information sources available on the WWW have proliferated;
and the web sites, programs, and technology have been in continuous change. All
these aspects have created the need for automatic tools and techniques to help the user
find the desired information, and to analyze the structure and usage of the web. Thus,
web mining, or the application of data mining techniques to the World Wide Web, has been the focus of several research projects in recent years. Scime [38] provides a record of current research and practical applications in web mining.
Web mining has been broadly defined as “the discovery and analysis of useful in-
formation from the World Wide Web” [39] or more precisely as “the extraction of
interesting and potentially useful patterns and implicit information from artifacts or
activity related to the World Wide Web” [40].
Web mining research can be classified into the categories of web content mining,
web structure mining, and web usage mining [41]. Web content mining concerns the
discovery of useful information from the content of web sites. Web structure mining
focuses on the web’s structure and the links between the web sites, and web usage
mining studies the access patterns of web users.
Web Content Mining
Web content mining is the process of extracting knowledge from the content of
web documents, including texts, images, audio, and video, or from the description of
such documents. It focuses on the automatic search of information sources available
online. Web content mining techniques go beyond keyword extraction used mostly in
the first generation of search engines on the web.
Within web content mining, the systems can be classified based on their mining
strategy into systems that directly extract information from the content of the docu-
ments, and systems that improve the search results of other tools, such as search en-
gines and web spiders [40]. In addition, one can differentiate between agent-based and
database approaches to web content mining [39].
Database approaches, such as WebLog [42] and ARANEUS [43], combine stan-
dard database querying mechanisms and data mining techniques to access and analyze
information from the web, and focus on the integration of heterogeneous and semi-
structured data from the web into more structured data collections, such as relational
databases.
Agent-based systems can, on behalf of a particular user, autonomously or semi-
autonomously search for and organize relevant information from the web. They can be
classified into three categories. Intelligent search agents, such as OCCAM [44], FAQ-
Finder [45], and ShopBot [46], search for relevant information using characteristics of
a particular domain and possibly a user profile. Information filtering agents, such as
HyPursuit [47], use information retrieval techniques to automatically retrieve, filter,
and categorize web documents. Personalized web agents, such as WebWatcher [48],
attempt to learn user preferences and to discover the appropriate web documents.
Here, we briefly describe some web content mining projects:
• Occam [44] is an information-gathering engine. The user can specify the re-
quired information as a database query, and Occam tries to use its knowledge
about various sites to derive an action plan to obtain the information.
• FAQ-Finder [45] is an automated question-answering system that uses files of
Frequently Asked Questions (FAQs) available on the web. The user poses a
question to the system about any topic, and FAQ-Finder attempts to find the
FAQ file most likely to yield an answer, searches within that file for similar
questions, and returns the given answers to the user.
• ShopBot [46] is a comparison-shopping agent. It receives as input the home
pages of online stores, learns how to buy from these sites, and is then able to visit these sites, obtain product information, and summarize the results for the user.
• WebWatcher [48] is a tour guide agent for assisting users to browse the web.
Based on the information the user seeks, WebWatcher accompanies the user
from page to page, highlights links believed to be relevant, and learns from ex-
perience.
• The CiteSeer [49] project at the NEC Research Institute provides algorithms,
techniques, and software to implement digital libraries. All these are imple-
mented in the NEC Research Index (http://citeseer.ist.psu.edu/), which crawls
the web to locate scientific articles, extracts information such as the citations, ci-
tation context, article title, authors, etc., and performs full-text indexing and
autonomous citation indexing.
Web Structure Mining
Web structure mining is the process of inferring information about web pages from
the web’s link structure. Links between web pages can generally be interpreted as in-
dicators for relevance or quality [50]. The rationale is that a web page that is fre-
quently referred to is likely to be more important than a page that is seldom refer-
enced. Accordingly, the number of web pages a document refers to may indicate the
richness or topic diversity of this document. Thus, a document with a large number of
links is likely to be a good source of information.
Web structure mining has applications in search, browsing, and traffic estimation.
For example, PageRank [51] is a global ranking of all web pages based on their loca-
tion in the web’s link structure, and provides the basis of the popular search engine
Google (http://www.google.com/).
The HITS (Hyperlink-Induced Topic Search) algorithm [50], which searches a col-
lection of web pages for authority pages (highly referenced web documents) and for
hub pages (documents that provide links to authority pages), has been used in various
web applications.
The web structure mining project Clever [52] incorporates several algorithms that
classify a web resource based on an analysis of its link structure. The SALSA [53] al-
gorithm is a stochastic approach to web structure mining based on random walks on
graphs derived from the link structure between web documents.
Web Usage Mining
Every web server records and maintains a log entry for every single access it gets.
The log files constitute a huge collection of structured information. Web usage min-
ing, also known as web log mining, is the process of automatic discovery of interest-
ing usage patterns from logs kept by web servers [40].
The knowledge obtained by web usage mining can be used to enhance the quality
of service provided by web servers, to customize web sites to users, to improve the
design and navigation of web sites, to target users for electronic commerce, and to
support marketing decisions. There are several research and commercial projects con-
cerned with web usage mining, such as WebSIFT [52], WebLogMiner [40], and Web
Utilization Miner (WUM) [54]. Web usage mining generally comprises the following
three steps:
1. Preprocessing, which consists of data cleaning, i.e., the elimination of irrelevant or noisy data, and of grouping the data into useful data abstractions;
2. Pattern discovery, which is the application of methods and algorithms to extract
knowledge from the data. Some of the methods used are statistical analysis, as-
sociation rules, clustering, and classification;
3. Pattern analysis, which consists of understanding the patterns or rules found in step 2, and of filtering out uninteresting patterns.
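As an illustration of step 1, the following sketch parses web server log lines and cleans out requests for images (it assumes logs in the Common Log Format; the sample lines and the filter rules are invented):

```python
import re

# Common Log Format (an assumption about the server's configuration):
# host ident user [date] "method url protocol" status bytes
LOG_RE = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)')

def parse(line):
    """Turn one raw log line into a structured record, or None if malformed."""
    m = LOG_RE.match(line)
    if not m:
        return None
    host, date, method, url, status, _ = m.groups()
    return {"host": host, "date": date, "method": method,
            "url": url, "status": int(status)}

lines = [
    '10.0.0.1 - - [01/Mar/2005:10:00:00 +0000] "GET /prices.html HTTP/1.0" 200 512',
    '10.0.0.1 - - [01/Mar/2005:10:00:02 +0000] "GET /logo.png HTTP/1.0" 200 916',
]
# Data cleaning: drop malformed lines and requests for embedded images,
# which say little about the user's actual navigation.
hits = [r for r in (parse(l) for l in lines)
        if r and not r["url"].endswith((".png", ".gif", ".jpg"))]
print([h["url"] for h in hits])  # ['/prices.html']
```

The cleaned records could then be grouped into user sessions, the "useful data abstractions" on which pattern discovery (step 2) operates.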
Applications of Web Mining to Agriculture
Web mining techniques form a vital basis for information and knowledge dissemi-
nation systems and decision support systems for the agricultural sector. Pan [55] dis-
cusses methods and techniques of collecting and managing information resources from
the World Wide Web to increase information dissemination for various users in the
agricultural sector.
Gandhi [56] describes a specific market information system which automatically
collects information from online sources, processes it and widely disseminates it to
farmers, retailers, wholesalers and the government.
References
1. Date, C. J. 2000. An Introduction to Database Systems. 7th ed. Reading, MA:
Addison-Wesley Publishing Company.
2. Codd, E. 1970. A relational model of data for large shared data banks.
Communications of the ACM 13(6): 377-387.
3. Codd, E. 1982. Relational database: A practical foundation for productivity.
Communications of the ACM 25(2): 109-117.
4. Elmasri, R., and S. B. Navathe. 2000. Fundamentals of Database Systems. 3rd ed.
Reading, MA: Addison-Wesley.
5. Melton, J., ed. 2003. Database Languages—SQL. ISO/IEC 9075-1:2003.
6. Kuper, G., L. Libkin, and J. Paredaens. 2000. Constraint Databases. Heidelberg,
Germany: Springer-Verlag.
7. Stonebraker, M., P. Brown, and D. Moore. 1999. Object-Relational DBMSs –
Tracking the Next Great Wave. 2nd ed. San Francisco, CA: Morgan Kaufmann
Publishers.
8. Ullman, J. D., and J. Widom. 1997. A First Course in Database Systems. Upper
Saddle River, NJ: Prentice-Hall, Inc.
9. Mannila, H. 1996. Data mining: Machine learning, statistics, and databases. Proc.
of the 8th International Conference on Scientific and Statistical Database
Management, 1-6.
10. Keim, D., and H. Kriegel. 1996. Visualization techniques for mining large
databases: A comparison. IEEE Transactions on Knowledge and Data Engineering,
Special Issue on Data Mining 8(6): 923-938.
11. Fayyad, U. M., G. Piatetsky-Shapiro, and P. Smyth. 1996. From data mining to
knowledge discovery in databases. AI Magazine 17(3): 37-54.
12. Mannila, H. 1997. Methods and problems in data mining. Proc. of International
Conference on Database Theory, 41-55.
13. Witten, I.H., and E. Frank. 2005. Data Mining: Practical Machine Learning
Tools and Techniques with Java Implementations. 2nd ed. San Francisco, CA:
Morgan Kaufmann.
14. http://www.sas.com/technologies/analytics/datamining/miner/
15. http://www.spss.com/spssbi/clementine/
16. Fayyad, U., D. Haussler, and P. Stolorz. 1996. KDD for Science data analysis:
Issues and examples. Proc. of the 2nd International Conference on Knowledge
Discovery and Data Mining (KDD-96), 50-56.
182 Chapter 3 Methods, Algorithms, and Software
17. Han, J., R. B. Altman, V. Kumar, H. Mannila, and D. Pregibon. 2002. Emerging
scientific applications in data mining. Communications of the ACM 45(8): 54-58.
18. Lloyd-Williams, M. 1997. Discovering the hidden secrets in your data–The data
mining approach to information. SCGISA & RRL.net Workshop on Health and
Crime Data Analysis.
19. Apte, C., B. Liu, E. P. D. Pednault, and P. Smyth. 2002. Business applications of
data mining. Communications of the ACM 45(8): 49-53.
20. Dunham, M. H. 2002. Data Mining: Introductory and Advanced Topics.
Englewood Cliffs, NJ: Prentice Hall, Inc.
21. Larose, D. T. 2004. Discovering Knowledge in Data: An Introduction to Data
Mining. New York, NY: John Wiley.
22. Weiss, S. M., and C. Kulikowski. 1991. Computer Systems That Learn:
Classification and Prediction Methods from Statistics, Neural Networks, Machine
Learning, and Expert Systems. San Francisco, CA: Morgan Kaufmann.
23. Persidis, A. 2000. Data mining in biotechnology. Nature Biotechnology 18: 237-
238.
24. Little, B., W. Johnston, A. Lovell, S. Steed, V. O’Conner, G. Westmorland, and
D. Stonecypher. 2001. Data mining U.S. corn fields. Proc. of the First SIAM
International Conference on Data Mining 1: 99-104.
25. http://www.gis.uiuc.edu/cfardatamining/default.htm
26. Canteri, M. G, B. C. Ávila, E. L. dos Santos, M. K. Sanches, D. Kovalechyn, J. P.
Molin, and Gimenez. 2002. Application of data mining in automatic description
of yield behavior in agricultural areas. Proc. of the World Congress of Computers
in Agriculture and Natural Resources, 183-189.
27. van Rijsbergen, C. J. 1975. Information Retrieval. London, UK: Butterworths.
28. Salton, G., and M. J. McGill. 1983. Introduction to Modern Information Retrieval.
New York, NY: McGraw-Hill Book Co.
29. Baeza-Yates, R., and B. Ribeiro-Neto. 1999. Modern Information Retrieval.
Harlow, UK: Addison-Wesley.
30. Salton, G. 1971. The SMART Retrieval System–Experiments in Automatic
Document Processing. Englewood Cliffs, NJ: Prentice-Hall Inc.
31. Robertson, S. E., and K. Sparck Jones. 1976. Relevance weighting of search
terms. J. American Society for Information Science 27: 129-146.
32. Wong, S. K. M., W. Ziarko, and P. C. N. Wong. 1985. Generalized vector space
model in information retrieval. Proc. of the 8th Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval.
33. Dumais, S. T. 1991. Improving the retrieval of information from external sources.
Behavior Research Methods, Instruments and Computers 23: 229-236.
34. Rocchio, J. J. 1971. Relevance feedback in information retrieval. The SMART
Retrieval System–Experiments in Automatic Document Processing, ed. G. Salton,
313-323. Englewood Cliffs, NJ: Prentice-Hall Inc.
35. Otuka, A. 1999. Case retrieval and management system for agricultural case base.
Proc. of the 2nd Conference of the European Federation for Information
Technology in Agriculture, Food and the Environment.
36. Otuka, A. 2000. Retrieval of specialists and their works for agriculture case base.
Proc. of the 2nd Asian Conference for IT in Agriculture.
37. Hoa, T. T. T. 1998. Database for Agriculture in Information Centre for
agriculture and rural development of Vietnam. Agricultural Information
Technology in Asia and Oceania, 25-28.
38. Scime, A., ed. 2004. Web Mining. Hershey, PA: Idea Group Inc. (IGI).
39. Cooley, R., J. Srivastava, and B. Mobasher. 1997. Web mining: Information and
pattern discovery on the World Wide Web. Proc. of the 9th IEEE International
Conference on Tools with Artificial Intelligence.
40. Zaïane, O. R. 1999. Resource and knowledge discovery from the internet and
multimedia repositories. PhD thesis. B.C., Canada: Simon Fraser University.
41. Kosala, R., and H. Blockeel. 2000. Web mining research: A survey. ACM
SIGKDD Explorations 2(1): 1-15.
42. Lakshmanan, L., F. Sadri, and I. N. Subramanian. 1996. A declarative language
for querying and restructuring the web. Proc. of the 6th International Workshop
on Research Issues in Data Engineering: Interoperability of Nontraditional
Database Systems (RIDE-NDS’96).
43. Merialdo P., P. Atzeni, and G. Mecca. 1997. Semistructured and structured data
in the web: Going back and forth. Proc. of the Workshop on the Management of
Semistructured Data.
44. Kwok, C., and D. Weld. 1996. Planning to gather information. Proc. of the 14th
National Conference on Artificial Intelligence.
45. Hammond, K., R. Burke, C. Martin, and S. Lytinen. 1995. FAQ-Finder: A case-
based approach to knowledge navigation. Working Notes of the AAAI Spring
Symposium: Information Gathering from Heterogeneous, Distributed
Environments. Menlo Park, CA: AAAI Press.
46. Doorenbos, R. B., O. Etzioni, and D. S. Weld. 1997. A scalable comparison-
shopping agent for the World-Wide Web. Proc. of the 1st International
Conference on Autonomous Agents.
47. Weiss, R., B. Velez, M. A. Sheldon, C. Namprempre, P. Szilagyi, A. Duda, and
D. K. Gifford. 1996. HyPursuit: A hierarchical network search engine that
exploits content-link hypertext clustering. Proc. of the 7th ACM Conf. on
Hypertext.
48. Joachims, T., D. Freitag, and T. Mitchell. 1997. WebWatcher: A tour guide for
the World Wide Web. Proc. of the 15th International Joint Conference on
Artificial Intelligence.
49. Lawrence, S., K. Bollacker, and C. L. Giles. 1999. Indexing and retrieval of
scientific literature. Proc. of the 8th International Conference on Information and
Knowledge Management.
50. Kleinberg, J. 1999. Authoritative sources in a hyperlinked environment. Journal of
the ACM 46(5): 604-632.
51. Page, L., S. Brin, R. Motwani, and T. Winograd. 1998. The PageRank citation
ranking: Bringing order to the web. Technical Report, Stanford Digital Library
Technologies Project.