AI CH 4-6
Sentence (1) can be tested by appealing to basic facts about arithmetic (and by tacitly
assuming an Arabic, decimal representation of natural numbers).
Sentence (2) is a bit more problematic. In order to give it a truth value, we need to know
who Bontu and Abebe are and perhaps to have a reliable account from someone who
witnessed the situation described. In principle, e.g., if we had been at the scene, we feel
that we would have been able to detect Bontu’s violent reaction, provided that it indeed
occurred in that way.
Sentence (3), known as Goldbach’s conjecture, seems straightforward on the face of it.
Clearly, a fact about all even numbers >2 is either true or false. But to this day nobody
knows whether sentence (3) expresses a truth or not. It is not even clear whether this
could be shown by some finite means, even if it were true. However, in this text we will
be content with a sentence as long as it can, in principle, attain some truth value,
regardless of whether this truth value reflects the actual state of affairs suggested by the
sentence in question.
Sentence (4) seems a bit silly, although we could say that every Ethiopian who eats
Thibs will either like Injira with it or not, since someone may instead like bread,
ambasha or rus with Thibs.
Sentences (5) and (6) are fine if you happen to read French and German a bit. Thus,
declarative statements can be made in any natural, or artificial, language. The kind of
sentences we won’t consider here are non-declarative ones, like
Could you please pass me the salt?
Other words or phrases may occur in statements, and they can be translated into the
basic logical connectives.
Consider the word "but", for example. If I say "Tesfaye is here, but Bontu is there", I
mean that Tesfaye is here and Bontu is there. My intention is that both of the
statements should be true. That is the same as what I mean when I say "Tesfaye is
here and Bontu is there".
In practice, mathematicians tend to use a small set of phrases over and over. It doesn't
make for exciting reading, but it allows the reader to concentrate on the meaning of
what is written. For example, a mathematician will usually say "if Q, then P", rather than
the logically equivalent "P whenever Q". The second statement is less familiar, and
therefore more apt to be misunderstood.
This is a good time to discuss the way the word "or" is used in mathematics. When you
say "I'll have dinner at Armahic Hotel or at Kanain Hotel", you probably mean "or" in its
exclusive sense: You'll have dinner either at Armahic Hotel or you'll have dinner at
Kanain Hotel, but not both. The "but not both" is what makes this an exclusive or.
Mathematicians use "or" in the inclusive sense. When "or" is used in this way, "I'll have
dinner at Armahic Hotel or at Kanain Hotel " means you'll have dinner at Armahic Hotel
or you'll have dinner at Kanain Hotel, or possibly both. Obviously, I'm not guaranteeing
that both will occur; I'm just not ruling it out.
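The contrast can be seen directly in code; in this sketch, Python's `or` plays the inclusive role and `!=` on booleans plays the exclusive one:

```python
# Inclusive "or" (Python's `or`) vs exclusive "or" (`!=` on booleans).
armahic = True    # "I'll have dinner at Armahic Hotel"
kanain = True     # "I'll have dinner at Kanain Hotel"

inclusive = armahic or kanain   # still true when both hold
exclusive = armahic != kanain   # false when both hold

print(inclusive)  # True
print(exclusive)  # False
```

When both statements are true, the inclusive reading accepts the sentence and the exclusive reading rejects it, which is exactly the "but not both" clause.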
Example 2: Translate the following statements into logical notation, using suitable
symbols for the atomic statements:
(a) "The Shiiroo is hot and the pizza will not be delivered."
(b) "If the laslasa is cold, then the pizza will be delivered."
(d) "If the pizza won't be delivered, then both the Shiiroo is hot and the laslasa is cold."
(e) "The laslasa isn't cold if and only if the Shiiroo isn't hot."
(g) "The Shiiroo is hot and the laslasa isn't cold, but the pizza will be delivered."
Logic format - “if it was so, it might be; and if it were so, it would be; but as it isn’t, it
ain’t. That’s logic!”
Propositional Logic
Our aim is to develop a language in which we can express sentences in a way that
brings out their logical structure. The language we begin with is the language of
propositional logic, which is based on propositions.
Let us assign certain distinct symbols p, q, r, . . ., or sometimes p1, p2, p3, . . ., to each
of these atomic sentences; we can then code up more complex sentences in a
compositional way. For example, given the atomic sentences
p: ‘I won the lottery last week.’
q: ‘I purchased a lottery ticket.’
r: ‘I won last week’s sweepstakes.’
We can form more complex sentences according to the rules below:
¬ : The negation of p is denoted by ¬ p and expresses ‘I did not win the lottery last
week,’ or equivalently ‘It is not true that I won the lottery last week.’
∨: Given p and r we may wish to state that at least one of them is true: ‘I won the lottery
last week, or I won last week’s sweepstakes;’ we denote this declarative sentence by p
∨ r and call it the disjunction of p and r.
∧: Dually, the formula p ∧ r denotes the rather fortunate conjunction of p and r: ‘Last
week I won the lottery and the sweepstakes.’
→: Last, but definitely not least, the sentence ‘If I won the lottery last week, then I
purchased a lottery ticket.’ expresses an implication between p and q, suggesting that q
is a logical consequence of p. We write p → q for that. We call p the assumption of p →
q and q its conclusion. Of course, we are entitled to use these rules of constructing
propositions repeatedly.
For example, we are now in a position to form the proposition p ∧ q → ¬r ∨ q which
means that ‘if p and q then not r or q’. You might have noticed a potential ambiguity in
this reading. One could have argued that this sentence has the structure ‘p is the case
and if q then . . .’ A computer would require the insertion of brackets, as in (p ∧ q) →
((¬r) ∨ q).
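With the brackets inserted, the formula becomes a truth function that can be checked by brute force over all eight assignments; the sketch below encodes → in the usual truth-functional way, as "not antecedent, or consequent":

```python
from itertools import product

def formula(p, q, r):
    # (p ∧ q) → ((¬r) ∨ q), with X → Y encoded as (not X) or Y
    return (not (p and q)) or ((not r) or q)

# The bracketed reading holds under every truth assignment,
# since whenever p ∧ q is true, q (and hence ¬r ∨ q) is true.
print(all(formula(p, q, r)
          for p, q, r in product([False, True], repeat=3)))  # True
```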
The Syntax of Propositional Logic
1) Any atomic proposition is a wff. (The atomic propositions are the propositional
symbols.)
2) For any wff W, ¬W is also a wff.
3) For any two wffs W1 and W2, the following are also wffs: W1 ∧ W2, W1 ∨ W2, W1 ⇒
W2, W1 ⇔ W2.
Rules (2) and (3) form compound wffs from other wffs.
The symbols ¬, ∧, ∨, ⇒ and ⇔ are the connectives.
Propositional Logic Statements
Example
If X, then Y
X
Therefore, Y
Where 'X' is "it is raining" and 'Y' is "the road is wet", with the first two statements being
the premises and the final statement the conclusion
Logical validity considers only the structure of the argument, not whether the argument's
premises are actually true. The above argument is valid because its structure
guarantees that the conclusion is true whenever the premises are. Another argument
with the same structure is the following, which is again logically valid, meaning that the
argument holds based only on its structure:
P: It is Saturday.
Q: It will rain.
Premises: P → Q, P
Conclusion: Q
Typically we write the argument like this:
P→Q
P
Q
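Validity of this argument form can be checked mechanically: an argument is valid when every truth assignment that makes all the premises true also makes the conclusion true. A small Python sketch (the helper names are illustrative):

```python
from itertools import product

def implies(p, q):
    return (not p) or q

# An argument form is valid when every assignment that makes all the
# premises true also makes the conclusion true.
def valid(premises, conclusion, n_vars):
    return all(conclusion(*vals)
               for vals in product([False, True], repeat=n_vars)
               if all(prem(*vals) for prem in premises))

# Modus ponens: from P -> Q and P, infer Q.
print(valid([lambda p, q: implies(p, q), lambda p, q: p],
            lambda p, q: q, 2))  # True: the form is valid
```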
Also, the pattern involved in certain logical equivalences cannot be captured by
propositional logic. Hence we need a more powerful logic to deal with these and other
problems. Predicate logic is one such logic, and it addresses these issues among
others. To cope with the deficiencies of propositional logic we introduce two new
features: predicates and quantifiers.
For example, the sentences "The car Tom is driving is blue", "The sky is blue", and "The
cover of this book is blue" come from the template "is blue" by placing an appropriate
noun/noun phrase in front of it. The phrase "is blue" is a predicate and it describes the
property of being blue. Predicates are often given a name. For example any of
"is_blue", "Blue" or "B" can be used to represent the predicate "is blue" among others. If
we adopt B as the name for the predicate "is_blue", sentences that assert an object is
blue can be represented as "B(x)", where x represents an arbitrary object.
Similarly the sentences "John gives the book to Mary", "Jim gives a loaf of bread to
Tom", and "Jane gives a lecture to Mary" are obtained by substituting an appropriate
object for variables x, y, and z in the sentence "x gives y to z". The template "... gives ...
to ..." is a predicate and it describes a relationship among three objects. This predicate
can be represented by Give( x, y, z ) or G( x, y, z ), for example.
Note: The sentence "John gives the book to Mary" can also be represented by another
predicate such as "gives a book to". Thus if we use B( x, y ) to denote this predicate,
"John gives the book to Mary" becomes B( John, Mary ). In that case, the other
sentences, "Jim gives a loaf of bread to Tom", and "Jane gives a lecture to Mary", must
be expressed with other predicates.
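Over a fixed collection of objects, a predicate can be sketched as a boolean-valued function; the object encodings below are illustrative assumptions, not part of the text:

```python
# A predicate modelled as a boolean-valued function. The names B and
# Give mirror the text; the stored facts are made-up illustration data.
def B(x):                      # "x is blue"
    return x in {"the sky", "the cover of this book"}

def Give(x, y, z):             # "x gives y to z"
    facts = {("John", "the book", "Mary"),
             ("Jim", "a loaf of bread", "Tom"),
             ("Jane", "a lecture", "Mary")}
    return (x, y, z) in facts

print(B("the sky"))                       # True
print(Give("John", "the book", "Mary"))   # True
print(Give("Mary", "the book", "John"))   # False
```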
Examples
P(x): “x is even”, over the universe {1, 2, . . ., 10}.
The truth set is: {2, 4, 6, 8, 10}
5.2 Quantification
A quantifier is a logical expression, such as “all” or “some,” that indicates the quantity of
a proposition; that is, a modifier which indicates the quantity of something.
Definition: The symbol ∀ is called the universal quantifier. The universal quantification of
P(x) is the statement “P(x) for all values x in the universe”, which is written in logical
notation as:
∀xP(x)
Ways to read ∀xP(x):
For every x, P(x)
For every x, P(x) is true
For all x, P(x)
A predicate with variables is not a proposition. For example, the statement x > 1 with
variable x over the universe of real numbers is neither true nor false since we don't
know what x is. It can be true or false depending on the value of x.
For x > 1 to be a proposition either we substitute a specific number for x or change it to
something like "There is a number x for which x > 1 holds", or "For every number x, x >
1 holds".
The expression ∀x P(x) reads "for all x, P(x) holds", "for each x, P(x) holds" or "for every
x, P(x) holds". The symbol ∀ is called the universal quantifier, and ∀x refers to all the
objects x in the universe. If this is followed by P(x) then the meaning is that P(x) is true
for every object x in the universe. For example, "All cars have wheels" could be
transformed into the propositional form ∀x P(x), where P(x) is "x has wheels" and the
universe is the set of cars.
For example, let the universe be the set of airplanes and let F(x, y) denote "x flies faster
than y". Then ∀x ∀y F(x, y) can be translated initially as "For every airplane x the
following holds: x is faster than every (any) airplane y". In simpler English it means
"Every airplane is faster than every airplane (including itself!)".
∃x ∀y F(x, y) can be read initially as "For some airplane x the following holds: for every
airplane y, x is faster than y".
In simpler English it says "There is an airplane which is faster than every airplane" or
"Some airplane is faster than every airplane".
∃x ∃y F(x, y) reads "For some airplane x there exists an airplane y such that x is faster
than y", which means "Some airplane is faster than some airplane".
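Over a finite universe these quantified statements can be checked directly, with all() playing the role of the universal quantifier and any() the role of the existential one; the airplane speeds below are made-up illustration data:

```python
# Over a finite universe, ∀ corresponds to all() and ∃ to any().
# The airplanes and their speeds are made-up illustration data.
speeds = {"a1": 900, "a2": 850, "a3": 700}
planes = speeds.keys()

def F(x, y):               # "x flies faster than y"
    return speeds[x] > speeds[y]

# ∀x ∀y F(x, y): every airplane faster than every airplane (incl. itself)
print(all(F(x, y) for x in planes for y in planes))       # False
# ∃x ∀y F(x, y): some airplane faster than every airplane; false here
# because no airplane is faster than itself
print(any(all(F(x, y) for y in planes) for x in planes))  # False
# ∃x ∃y F(x, y): some airplane faster than some airplane
print(any(F(x, y) for x in planes for y in planes))       # True
```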
The following four rules describe when and how the universal and existential quantifiers
can be added to or deleted from an assertion.
Universal Instantiation:
∀x P(x)
---------- Where c is an arbitrary element of the universe of discourse.
P(c)
Universal Generalization:
P(c)
---------- Where P(c) holds for every element c of the universe of discourse.
∀x P(x)
Existential Instantiation:
∃x P(x)
-----------
P(c)
Where c is some element of the universe of discourse. It is not arbitrary but must be
one for which P(c) is true.
Existential Generalization:
P(c)
---------- Where c is an element of the universe for which P(c) is true.
∃x P(x)
Example:
As an example of inference using these rules, let us consider the following reasoning:
A cheque is void if it has not been cashed for 30 days. This cheque has not been
cashed for 30 days. Therefore this cheque is void. You cannot cash a cheque
which is void. Therefore you cannot cash this cheque. We now have a cheque
which cannot be cashed.
This can be put into symbolic form using the following predicates assuming the universe
is the set of all objects:
C(x): x is a cheque.
T(x): x has not been cashed for 30 days.
V(x): x is void.
S(x): x can be cashed.
This_cheque represents a specific object in the universe which corresponds to "this
cheque”.
∀x [ C(x) ∧ T(x) → V(x) ]
C( This_cheque ) ∧ T( This_cheque )
-----------------------------------
V( This_cheque )
∀x [ C(x) ∧ V(x) → ¬S(x) ]
-----------------------------------------------
¬S( This_cheque )
-----------------------------------------------
∃x [ C(x) ∧ ¬S(x) ]
Hence, by universal instantiation followed by modus ponens,
∀x [ [ C(x) ∧ T(x) ] → V(x) ]
[ C( This_cheque ) ∧ T( This_cheque ) ] → V( This_cheque )
C( This_cheque ) ∧ T( This_cheque )
---------------------------------------------------------
V( This_cheque )
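The same chain of inferences can be sketched as a tiny forward-chaining loop; the rule encodings below (including "notS" standing for "cannot be cashed") are assumptions made for illustration:

```python
# A minimal forward-chaining sketch of the cheque argument. Predicate
# names follow the text; the tuple encoding is an assumption.
facts = {("C", "this_cheque"),   # it is a cheque
         ("T", "this_cheque")}   # not cashed for 30 days

def chain(initial):
    facts = set(initial)
    changed = True
    while changed:
        changed = False
        for (_, x) in list(facts):
            # Rule 1: C(x) and T(x) imply V(x) (the cheque is void)
            if ("C", x) in facts and ("T", x) in facts and ("V", x) not in facts:
                facts.add(("V", x)); changed = True
            # Rule 2: C(x) and V(x) imply notS(x) (it cannot be cashed)
            if ("C", x) in facts and ("V", x) in facts and ("notS", x) not in facts:
                facts.add(("notS", x)); changed = True
    return facts

result = chain(facts)
print(("V", "this_cheque") in result)     # True: the cheque is void
print(("notS", "this_cheque") in result)  # True: it cannot be cashed
```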
5.4 Knowledge Representation and Reasoning
Artificial Intelligence Cycle
Almost all AI systems have the following components in general:
Perception
Learning
Knowledge Representation and Reasoning
Planning
Execution
An AI system has a perception component that allows the system to get information
from its environment. As with human perception, this may be visual, audio or other
forms of sensory information. The system must then form a meaningful and useful
internal representation of this information. This knowledge representation may be static,
or it may be coupled with a learning component that is adaptive and draws trends from
the perceived data.
Knowledge representation (KR) and reasoning are closely coupled components. A
representation scheme is not meaningful on its own; it must be useful and helpful in
achieving certain tasks. The same information may be
represented in many different ways, depending on how you want to use that
information. For example, in mathematics, if we want to solve problems about ratios, we
would most likely use algebra, but we could also use simple hand drawn symbols. To
say half of something, you could use 0.5 x or you could draw a picture of the object with
half of it coloured differently. Both would convey the same information but the former is
more compact and useful in complex scenarios where you want to perform reasoning
on the information. It is important at this point to understand how knowledge
representation and reasoning are interdependent components, and as an AI system
designer, you have to consider this relationship when coming up with any solution.
Knowledge and Its Type
Knowledge is referred to as the “understanding of a subject area”. A well-focused subject
area is referred to as a knowledge domain, for example, medical domain, engineering
domain, business domain, etc. If we analyze the various types of knowledge we use in
everyday life, we can broadly define knowledge to be one of the following categories:
Procedural knowledge: Describes how to do things, provides a set of directions
of how to perform certain tasks, e.g., how to drive a car.
5.5 Reasoning
Having learned about knowledge representation, we now look at mechanisms to reason
over the knowledge once it has been represented using some logical scheme.
Reasoning is the process of deriving logical conclusions from given facts. It is referred
to as “the process of working with knowledge, facts and problem-solving strategies to
draw conclusions”.
Types of Reasoning
Deductive Reasoning
Deductive reasoning, as the name implies, is based on deducing new information
from logically related known information. A deductive argument offers assertions
that lead automatically to a conclusion.
Example: If there is dry wood, oxygen and a spark, there will be a fire.
Given: There is dry wood, oxygen and a spark
We can deduce: There will be a fire.
Inductive Reasoning
Inductive reasoning is based on forming, or inducing a ‘generalization’ from a
limited set of observations. Example:
Observation: All the crows that I have seen in my life are black.
Conclusion: All crows are black.
Thus the essential difference between deductive and inductive reasoning is that
inductive reasoning is based on experience while deductive reasoning is based
on rules; hence the latter will always be correct provided its premises are true.
Abductive Reasoning
Deduction is exact in the sense that deductions follow in a logically provable way
from the axioms. Abduction is a form of deduction that allows for plausible
(reasonable) inference, i.e. the conclusion might be wrong. Example:
Implication: She carries an umbrella if it is raining
Axiom: she is carrying an umbrella
Conclusion: It is raining
This conclusion might be false, because there could be other reasons that she is
carrying an umbrella, for instance she might be carrying it to protect herself from
the sun.
Analogical Reasoning
Analogical reasoning works by drawing analogies between two situations, looking
for similarities and differences, e.g. when you say driving a truck is just like
driving a car, by analogy you know that there are some similarities in the driving
mechanism, but you also know that there are certain other distinct characteristics
of each.
Common-Sense Reasoning
Common-sense reasoning is an informal form of reasoning that uses rules
gained through experience or what we call rules-of-thumb. It operates on
heuristic knowledge and heuristic rules.
Non-Monotonic Reasoning
Non-monotonic reasoning is used when the facts of the case are likely to change
after some time. Example:
Rule: IF the wind blows
THEN the curtains sway
When the wind stops blowing, the curtains should sway no longer. However, if
we use monotonic reasoning, this would not happen. The fact that the curtains
are swaying would be retained even after the wind stopped blowing. In non-monotonic
reasoning, we have a “truth maintenance system”. It keeps track of
what caused a fact to become true. If the cause is removed, that fact is removed
(retracted) also.
5.7 Facts
Facts are a basic block of knowledge (the atomic units of knowledge). They represent
declarative knowledge (they declare knowledge about objects). A proposition is the
statement of a fact. Each proposition has an associated truth value. It may be either true
or false. In AI, to represent a fact, we use a proposition and its associated truth value,
for example:
Proposition A: It is raining
Proposition B: I have an umbrella
Proposition C: I will go to school
Types of Facts
Single Valued or Multiple Valued
Facts may be single-valued or multi-valued, where each fact (attribute) can take one or
more values at the same time. For example, an individual can only have one eye color,
but may have many cars, so the value of the attribute cars may contain more than one
value.
Uncertain Facts
Sometimes we need to represent uncertain information in facts. These facts are called
uncertain facts, e.g. “it will probably be sunny today”. We may choose to store numerical
certainty values with such facts that tell us how much uncertainty there is in the fact.
Fuzzy Facts
Fuzzy facts are ambiguous in nature, e.g. the book is heavy/light. Here it is unclear what
heavy means because it is a subjective description. Fuzzy representation is used for
such facts. While defining fuzzy facts, we use certainty factor values to specify value of
“truth”. We will look at fuzzy representation in more detail later.
Object Attribute Value Triplets
Object-Attribute Value Triplets or OAV triplets are a type of fact composed of three
parts; object, attribute and value. Such facts are used to assert a particular property of
some object.
Example 1: Khalid’s Eye Color is Brown.
Object: Khalid
Attribute: Eye Color
Value: Brown
Example 2: Khalid’s daughter is Sara.
Object: Khalid
Attribute: Daughter
Value: Sara
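One minimal way to sketch OAV triplets in code is as named tuples; the lookup helper below is an illustrative assumption:

```python
from collections import namedtuple

# An OAV triplet as a named tuple; the example values come from the text.
OAV = namedtuple("OAV", ["obj", "attribute", "value"])

facts = [OAV("Khalid", "Eye Color", "Brown"),
         OAV("Khalid", "Daughter", "Sara")]

def values_of(facts, obj, attribute):
    # Collect all values asserted for an (object, attribute) pair;
    # multi-valued attributes simply yield more than one entry.
    return [f.value for f in facts if f.obj == obj and f.attribute == attribute]

print(values_of(facts, "Khalid", "Eye Color"))  # ['Brown']
print(values_of(facts, "Khalid", "Daughter"))   # ['Sara']
```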
This kind of uncertainty is typical of the medical domain, as well as most other
judgmental domains: law, business, design, automobile repair, gardening, dating, and
so on. The agent’s knowledge can at best provide only a degree of belief in the relevant
sentences. Our main tool for dealing with degrees of belief is probability theory.
5.9 Probabilistic Reasoning
Probabilistic reasoning is the concept of using logic and probability to handle uncertain
situations. An example of Probabilistic Reasoning is using past situations and statistics
to predict an outcome.
Bayes Theorem
One of the most significant developments in the probability field has been the
development of Bayesian decision theory which has proved to be of immense help in
making decisions under uncertain conditions. Bayes theorem was developed by the
British mathematician Rev. Thomas Bayes. The probability given under Bayes theorem
is also known by the name of inverse probability, posterior probability or revised
probability. This theorem finds the probability of an event by considering the given
sample information; hence the name posterior probability.
Bayes theorem describes the probability of an event based on other information that
might be relevant. Essentially, you are estimating a probability, but then updating that
estimate based on other things that you know. This is something that you already do
every day in real life. For instance, if your friend is supposed to pick you up to go out to
dinner, you might have a mental estimate of whether she will be on time, be 15 minutes
late, or be a half hour late. That would be your starting probability. Bayes theorem is a formal
way of doing that.
The equation for Bayes theorem is:
P(A|B) = P(B|A) P(A) / P(B)
Where,
A & B are events.
P(A) and P(B) are the probabilities of A and B without regard for each other.
P(A|B) is the conditional probability, the probability of A given that B is true.
P(B|A) is the probability of B given that A is true.
Example 1:
Suppose that a test for using a particular drug is 99% sensitive and 99% specific. That
is, the test will produce 99% true positive results for drug users and 99% true negative
results for non-drug users. Suppose that 0.5% of people are users of the drug. What is
the probability that a randomly selected individual with a positive test is a drug user?
Even if an individual tests positive, it is more likely that they do not use the drug than
that they do. This is because the number of non-users is large compared to the number
of users. The number of false positives outweighs the number of true positives. For
example, if 1000 individuals are tested, there are expected to be 995 non-users and 5
users. From the 995 non-users, 0.01 × 995 ≃ 10 false positives are expected. From the
5 users, 0.99 × 5 ≈ 5 true positives are expected. Out of 15 positive results, only 5 are
genuine.
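The same numbers fall out of Bayes theorem directly; a short sketch of the calculation:

```python
# Bayes theorem for Example 1: P(user | positive test).
sensitivity = 0.99      # P(positive | user)
specificity = 0.99      # P(negative | non-user)
prior = 0.005           # P(user)

# Total probability of a positive test: true positives + false positives.
p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
p_user_given_positive = sensitivity * prior / p_positive
print(round(p_user_given_positive, 3))  # 0.332
```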
The importance of specificity in this example can be seen by calculating that even if
sensitivity is raised to 100% and specificity remains at 99% then the probability of the
person being a drug user only rises from 33.2% to 33.4%, but if the sensitivity is held at
99% and the specificity is increased to 99.5% then the probability of the person being a
drug user rises to about 49.9%.
Example 2:
The entire output of a factory is produced on three machines. The three machines
account for 20%, 30%, and 50% of the factory output. The fraction of defective items
produced is 5% for the first machine; 3% for the second machine; and 1% for the third
machine. If an item is chosen at random from the total output and is found to be
defective, what is the probability that it was produced by the third machine?
Once again, the answer can be reached without recourse to the formula by applying the
conditions to any hypothetical number of cases. For example, if 100,000 items are
produced by the factory, 20,000 will be produced by Machine A, 30,000 by Machine B,
and 50,000 by Machine C. Machine A will produce 1000 defective items, Machine B
900, and Machine C 500. Of the total 2400 defective items, only 500, or 5/24 were
produced by Machine C.
A solution is as follows. Let Xi denote the event that a randomly chosen item was made
by the i th machine (for i = A,B,C). Let Y denote the event that a randomly chosen item is
defective. Then, we are given the following information:
P(XA) = 0.2, P(XB) = 0.3, P(XC) = 0.5.
If the item was made by the first machine, then the probability that it is defective is 0.05;
that is, P(Y | XA) = 0.05. Overall, we have
P(Y | XA) = 0.05, P(Y | XB) = 0.03, P(Y | XC) = 0.01.
To answer the original question, we first find P(Y). That can be done in the following
way:
P(Y) = P(Y | XA)P(XA) + P(Y | XB)P(XB) + P(Y | XC)P(XC)
     = 0.05 × 0.2 + 0.03 × 0.3 + 0.01 × 0.5 = 0.024.
We are given that Y has occurred, and we want to calculate the conditional probability
of XC. By Bayes' theorem,
P(XC | Y) = P(Y | XC) P(XC) / P(Y) = (0.01 × 0.5) / 0.024 = 5/24.
Given that the item is defective, the probability that it was made by the third machine is
only 5/24. Although machine C produces half of the total output, it produces a much
smaller fraction of the defective items. Hence the knowledge that the item selected was
defective enables us to replace the prior probability P(XC) = 1/2 by the smaller posterior
probability P(XC | Y) = 5/24.
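The calculation can be verified with a few lines of Python:

```python
# Bayes theorem for Example 2: P(machine C | defective item).
priors = {"A": 0.20, "B": 0.30, "C": 0.50}        # P(X_i)
defect_rates = {"A": 0.05, "B": 0.03, "C": 0.01}  # P(Y | X_i)

# Law of total probability: P(Y) = sum of P(Y | X_i) P(X_i).
p_defective = sum(defect_rates[m] * priors[m] for m in priors)
posterior_c = defect_rates["C"] * priors["C"] / p_defective
print(round(p_defective, 3))  # 0.024
print(round(posterior_c, 4))  # 0.2083  (= 5/24)
```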
The Hidden Markov Model (HMM) provides a framework for modelling daily rainfall
occurrences and amounts on multi-site rainfall networks. The HMM fits a model to
observed rainfall records by introducing a small number of discrete rainfall states. These
states allow a diagnostic interpretation of observed rainfall variability in terms of a few
rainfall patterns. They are not directly observable, or 'hidden' from the observer.
The time sequence of which state is active on each day follows a Markov chain. Thus,
the state which is active 'today' depends only on the state which was active 'yesterday'
according to transition probabilities.
The HMM also allows you to simulate rainfall at each of the station locations, such that
key statistical properties (e.g. rainfall probabilities, dry/wet spell lengths) of the simulated
rainfall match those of the observed rainfall records. This can be useful for generating
large numbers of synthetic realizations of rainfall for input into statistical analysis, or
input into a crop simulation model.
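The Markov-chain part of this idea can be sketched with a two-state (dry/wet) chain; the transition probabilities below are made-up illustration values, not fitted to any data:

```python
import random

# A two-state (dry/wet) Markov chain sketch of daily rainfall occurrence.
# Transition probabilities are made-up illustration values, not fitted.
transition = {"dry": {"dry": 0.8, "wet": 0.2},
              "wet": {"dry": 0.4, "wet": 0.6}}

def simulate(days, start="dry", seed=0):
    rng = random.Random(seed)
    state, sequence = start, []
    for _ in range(days):
        # Today's state depends only on yesterday's (Markov property).
        state = rng.choices(list(transition[state]),
                            weights=list(transition[state].values()))[0]
        sequence.append(state)
    return sequence

seq = simulate(10)
print(len(seq))                     # 10
print(set(seq) <= {"dry", "wet"})   # True
```

A real HMM adds a hidden state layer on top of such a chain, with rainfall amounts emitted from each hidden state.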
Example:
Goal: Be in Campus at 8:20 AM to give a lecture.
There are several plans that achieve the goal:
P1: Get up at 7:00, take the bus at 7:30 AM, arrive at 8:00.
P2: Get up at 7:30 AM, walk to campus and arrive at 8:00.
All these plans are correct, but
They imply different costs and different probabilities of actually achieving the
goal.
P2 is eventually the plan of choice, since giving a lecture is very important and
the success rate of P1 is only 90-95%.
If the environment were deterministic, a solution would be easy: the agent will
always reach +1 with moves [U, U, R, R, R]
Because actions are unreliable, a sequence of moves will not always lead to the
desired outcome
Let each action achieve the intended effect with probability 0.8, but with probability
0.1 each it moves the agent to one of the squares at right angles to the intended direction
If the agent bumps into a wall, it stays in the same square
Now the sequence [U, U, R, R, R] leads to the goal state with probability 0.8^5 =
0.32768
In addition, the agent has a small chance of reaching the goal by accident going the
other way around the obstacle with a probability 0.1^4 × 0.8, for a grand total of
0.32776
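The arithmetic behind these probabilities:

```python
# Probability that [U, U, R, R, R] reaches +1: five intended steps at
# 0.8 each, plus the accidental route the other way around the obstacle.
p_intended = 0.8 ** 5          # 0.32768
p_accident = 0.1 ** 4 * 0.8    # four sideways slips, then one intended step
print(round(p_intended + p_accident, 5))  # 0.32776
```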
A transition model specifies outcome probabilities for each action in each possible
state
Let P(s’ | s, a) denote the probability of reaching state s' if action a is done in state s
The transitions are Markovian in the sense that the probability of reaching s’
depends only on s and not the earlier states
We still need to specify the utility function for the agent
The decision problem is sequential, so the utility function depends on a sequence of
states — an environment history — rather than on a single state
For now, we will simply stipulate that in each state s, the agent receives a reward
R(s), which may be positive or negative
For our particular example, the reward is -0.04 in all states except in the terminal
states
The utility of an environment history is just (for now) the sum of rewards received
If the agent reaches the state +1, e.g., after ten steps, its total utility will be 0.6
The small negative reward gives the agent an incentive to reach [4, 3] quickly
A sequential decision problem for a fully observable environment with
A Markovian transition model and
Additive rewards
is called a Markov decision process (MDP)
Chapter 6 – Learning
6.1 Introduction to Learning
Learning can be described as a relatively permanent change that occurs in behavior as
a result of experience. Learning occurs in various regimes. For example, it is possible to
learn to open a lock as a result of trial and error, or to learn how to use a word
processor as a result of following particular instructions.
Once the internal model of what ought to happen is set, it is possible to learn by
practicing the skill until the performance converges on the desired model. One begins
by paying attention to what needs to be done, but with more practice, one will need to
monitor only the trickier parts of the performance. Automatic performance of some skills
by the brain points out that the brain is capable of doing things in parallel i.e. one part is
devoted to the skill whilst another part mediates conscious experience. Learning has
been defined as follows:
"Learning denotes changes in a system that ... enables a system to do the same
task more efficiently the next time." By Herbert Simon.
"Learning is constructing or modifying representations of what is being
experienced." By Ryszard Michalski.
"Learning is making useful changes in our minds." By Marvin Minsky.
The goal of machine learning is to build computer systems that can learn from their
experience and adapt to their environments. Obviously, learning is an important aspect
or component of intelligence. There are both theoretical and practical reasons to
support such a claim. Some people even think intelligence is nothing but the ability to
learn, though other people think an intelligent system has a separate "learning
mechanism" which improves the performance of other mechanisms of the system.
In simple words, we can say that machine learning is the competency of the software to
perform a single or series of tasks intelligently without being programmed for those
activities. This is part of Artificial Intelligence. Normally, the software behaves the way
the programmer programmed it; while machine learning is going one step further by
making the software capable of accomplishing intended tasks by using statistical
analysis and predictive analytics techniques.
These phases may not be distinct. For example, there may not be an explicit validation
phase; instead, the learning algorithm guarantees some form of correctness. Also in
some circumstances, systems learn "on the job", that is, the training and application
phases overlap.
Inductive Learning
Inductive learning takes examples and generalizes rather than starting with existing
knowledge. For example, having seen many cats, all of which have tails, one might
conclude that all cats have tails. This is an unsound step of reasoning but it would be
impossible to function without using induction to some extent. In many areas it is an
explicit assumption. There is scope of error in inductive reasoning, but still it is a useful
technique that has been used as the basis of several successful systems.
One major subclass of inductive learning is concept learning. This takes examples of a
concept and tries to build a general description of the concept. Very often, the examples
are described using attribute-value pairs. The example of inductive learning given here
is that of a fish. Look at the table below:
             Ex. 1   Ex. 2   Ex. 3   Ex. 4
Swims        No      No      Yes     No
Is a Fish    No      No      No      No
In the above example, there are various ways of generalizing from examples of fish and
non-fish. The simplest description can be that a fish is something that does not have
lungs. No other single attribute would serve to differentiate the fish.
This is just a glimpse of the applications that use intelligent learning components. The
current era has applied learning in domains ranging from agriculture to astronomy to
medical sciences.
General Model of Learning Agent (Pattern Recognition)
Any given learning problem is primarily composed of three things:
Input
Processing unit
Output
The input is composed of examples that help the learner learn the underlying problem
concept. Suppose we were to build the learner for recognizing spoken digits. We would
ask some of our friends to record their sounds for each digit [0 to 9]. Positive examples
of digit ‘1’ would be the spoken digit ‘1’, by the speakers. Negative examples for digit ‘1’
would be all the rest of the digits. For our learner to learn the digit ‘1’, it would need
positive and negative examples of digit ‘1’ in order to truly learn the difference between
digits ‘1’ and the rest. The processing unit is the learning agent in our focus of study.
6.4 Symbol Based Learning
Ours is a world of symbols. We use symbolic interpretations to understand the world
around us. For instance, if we saw a ship and were to tell a friend about its size, we
would not say that we saw a 254.756-meter-long ship; instead we’d say that we saw a
‘huge’ ship, about the size of the ‘Eiffel tower’. And our friend would understand the relationship
between the size of the ship and its hugeness with the analogies of the symbolic
information associated with the two words used: ‘huge’ and ‘Eiffel tower’.
Similarly, the techniques we are to learn now use symbols to represent knowledge and
information. Let us consider a small example to help us see where we’re headed. What
if we were to learn the concept of a GOOD STUDENT? We would first need to define
some attributes of a student, on the basis of which we could tell a good student apart
from an average one. Then we would require some examples of good students and
average students. To keep the problem simple, we can label all the students who are
“not good” (average, below average, satisfactory, bad) as NOT GOOD STUDENT. Let’s
say we choose two attributes to describe a student: grade and class participation. Both
attributes can take one of two values: High or Low. Our learner program will require
some examples of the student concept, for instance:
As you can see, the system is built on symbolic information, from which the learner can
even generalize that a student is a GOOD STUDENT if his/her grade is high, even if the
class participation is low:
Student (GOOD STUDENT): Grade (High) ^ Class Participation (?)
This is the final rule that the learner has learnt from the enumerated examples. Here the
‘?’ means that the attribute class participation can have any value, as long as the grade
is high.
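This kind of rule can be obtained by minimally generalizing over positive examples, in
the style of the Find-S algorithm. The sketch below is an assumption about how such a
learner might work; the training examples are made up to match the rule in the text:

```python
# Find-S-style generalization over (grade, participation) examples.
# '?' means "the attribute can take any value".

def generalize(h, example):
    """Minimally generalize hypothesis h to cover a positive example:
    keep attribute values that agree, replace the rest with '?'."""
    return tuple(hv if hv == ev else "?" for hv, ev in zip(h, example))

# (grade, participation, is_good_student) -- illustrative training data
examples = [("High", "High", True),
            ("High", "Low",  True),
            ("Low",  "High", False)]

h = None  # start with the most specific hypothesis: covers nothing
for grade, part, good in examples:
    if good:
        h = (grade, part) if h is None else generalize(h, (grade, part))

print(h)  # prints ('High', '?'): Grade (High) ^ Class Participation (?)
```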
6.5 Problem and Problem Spaces
In theoretical computer science there are two main classes of problems: tractable and
intractable. Problems that can be solved in polynomial time are termed tractable; the
rest are called intractable. Tractable problems are further divided into structured and
complex problems. Structured problems are those that have well-defined steps through
which the solution is reached. Complex problems usually don’t have well-defined steps.
Machine learning algorithms are particularly useful for complex problems, such as
recognizing patterns in images or speech, for which it is hard to come up with
procedural algorithms. The solution to any problem is
a function that converts its inputs to corresponding outputs. To understand this concept
let us discuss this example:
Let us consider the domain of HEALTH. The problem in this case is to distinguish
between a sick and a healthy person. Suppose we have some domain knowledge;
keeping a simplistic approach, we say that two attributes are necessary and sufficient to
declare a person as healthy or sick. These two attributes are: Temperature (T) and
Blood Pressure (BP). Any patient coming into the hospital can have one of three values
for each of T and BP: High (H), Normal (N) and Low (L). Based on these values, the
person is to be classified as Sick (SK). SK is a Boolean concept: SK = 1 means the
person is sick, and SK = 0 means the person is healthy. So the concept to be learnt by
the system is Sick, i.e., SK = 1.
Instance Space
How many distinct instances can the concept Sick have? Since there are two attributes,
T and BP, each having 3 values, there can be a total of 3 × 3 = 9 possible distinct
instances in all. If we were to list these, we’d get the following table:
X T BP SK
X1 L L -
X2 L N -
X3 L H -
X4 N L -
X5 N N -
X6 N H -
X7 H L -
X8 H N -
X9 H H -
This is the entire instance space, denoted by X, and the individual instances are
denoted by xi. |X| gives the size of the instance space, which in this case is 9:
|X| = 9
The set X is the entire data possibly available for any concept. However, in real-world
problems we often don’t have access to the entire set X; instead we have a subset of X,
known as the training data and denoted by D, on the basis of which we make our
learner learn the concept.
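The 9-instance space above can be enumerated directly as the Cartesian product of
the attribute values; a quick sketch using Python’s itertools:

```python
from itertools import product

T_values = ["L", "N", "H"]   # Temperature
BP_values = ["L", "N", "H"]  # Blood Pressure

# The instance space X is the Cartesian product of the attribute values.
X = list(product(T_values, BP_values))
print(len(X))  # prints 9, i.e. |X| = 9
```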
Concept Space
A concept is a representation of the problem with respect to the given attributes. For
example, for the concept SICK defined over the attributes T and BP, the concept space
consists of all possible assignments of SK values to the instances x. One possible
concept for SICK is listed in the following table:
X T BP SK
X1 L L 0
X2 L N 0
X3 L H 1
X4 N L 0
X5 N N 0
X6 N H 1
X7 H L 1
X8 H N 1
X9 H H 1
But there are a lot of other possibilities besides this one. The question is: how many
total concepts can be generated out of this given situation? The answer is: 2^|X| = 2^9 = 512.
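The count follows because each of the |X| = 9 instances can independently be labelled
0 or 1; a quick check in Python:

```python
from itertools import product

instances = 9                  # |X| for the SICK example
num_concepts = 2 ** instances  # each instance labelled 0 or 1 independently
print(num_concepts)            # prints 512

# Equivalently, enumerate every possible labeling explicitly:
all_concepts = list(product([0, 1], repeat=instances))
print(len(all_concepts))       # prints 512
```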
Hypothesis Space
In this space the learner applies a bias, either a search bias or a language bias, to
reduce the size of the concept space. This reduced concept space becomes the
hypothesis space. For example, the most common language bias is that the hypothesis
space uses conjunctions (AND) of the attributes, i.e. H = <T, BP>.
H denotes a hypothesis; here it is the conjunction of the attributes T and BP. Written in
English it would mean:
H = <T, BP>:
    IF “Temperature” = T AND “Blood Pressure” = BP
    THEN H = 1
    ELSE H = 0
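One way to read this conjunctive hypothesis in code (a sketch; following the text’s later
convention, ‘?’ is used as the “any value” wildcard):

```python
def h(hypothesis, instance):
    """Conjunctive hypothesis <T, BP>: returns 1 iff every attribute of the
    instance matches ('?' in the hypothesis matches any value)."""
    return 1 if all(hv in ("?", iv) for hv, iv in zip(hypothesis, instance)) else 0

# Hypothesis: Temperature must be High, Blood Pressure can be anything.
print(h(("H", "?"), ("H", "N")))  # prints 1
print(h(("H", "?"), ("L", "N")))  # prints 0
```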
Version Space and Searching
The version space is the set of all hypotheses that are consistent with all the training
examples. When we are given a set of training examples D, there may be more than
one hypothesis from the hypothesis space that is consistent with all of them. By
consistent we mean h(xi) = c(xi): if the true output of the concept, c(xi), is 1 or 0 for an
instance, then the output of our hypothesis, h(xi), is 1 or 0 as well, respectively. If this
holds for every instance in our training set D, we say that the hypothesis is consistent.
X T BP SK
X1 H H 1
X2 L L 0
X3 N N 0
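For the three training examples in this table, the version space of conjunctive
hypotheses can be computed by brute force. This is a sketch under the assumption
that a hypothesis is a pair of values over {L, N, H, ?}:

```python
from itertools import product

# Training data D from the table above: (T, BP, SK)
D = [("H", "H", 1), ("L", "L", 0), ("N", "N", 0)]

def h(hyp, inst):
    """Conjunctive hypothesis: 1 iff every attribute matches ('?' matches any)."""
    return 1 if all(a in ("?", b) for a, b in zip(hyp, inst)) else 0

values = ["L", "N", "H", "?"]
version_space = [hyp for hyp in product(values, repeat=2)
                 if all(h(hyp, (t, bp)) == sk for t, bp, sk in D)]
print(version_space)  # the hypotheses consistent with every example in D
```

For this D, only <H, H>, <H, ?> and <?, H> survive: each outputs 1 for X1 and 0 for
X2 and X3.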
sphere of knowledge and access to data in its local memory. Typically, a neural network
is initially "trained" or fed large amounts of data and rules about data relationships (for
example, "A grandfather is older than a person's father"). A program can then tell the
network how to behave in response to an external stimulus (for example, to input from a
computer user who is interacting with the network) or can initiate activity on its own
(within the limits of its access to the external world).
In making determinations, neural networks use several principles, including gradient-
based training, fuzzy logic, genetic algorithms, and Bayesian methods. Neural networks
are sometimes described in terms of knowledge layers, with, in general, more complex
networks having deeper layers. Current applications of neural networks include: oil
exploration data analysis, weather prediction, the interpretation of nucleotide sequences
in biology labs, and the exploration of models of thinking and consciousness.
The term neural network was traditionally used to refer to a network or circuit of
biological neurons. The modern usage of the term often refers to artificial neural
networks, which are composed of artificial neurons or nodes. Thus the term has two
distinct usages:
Biological neural networks are made up of real biological neurons that are
connected or functionally related in a nervous system. In the field of neuro-
science, they are often identified as groups of neurons that perform a specific
physiological function in laboratory analysis.
Artificial neural networks are composed of interconnecting artificial neurons
(programming constructs that mimic the properties of biological neurons).
Artificial neural networks may either be used to gain an understanding of
biological neural networks, or for solving artificial intelligence problems without
necessarily creating a model of a real biological system. The real, biological
nervous system is highly complex: artificial neural network algorithms attempt to
abstract this complexity and focus on what may hypothetically matter most from
an information processing point of view. Good performance (e.g. as measured by
good predictive ability, low generalization error), or performance mimicking
animal or human error patterns, can then be used as one source of evidence
towards supporting the hypothesis that the abstraction really captured something
important from the point of view of information processing in the brain.
While this clearly shows that the human information processing system is superior to
conventional computers, it is still possible to realize an artificial neural network that
exhibits the above-mentioned properties.
Directions and Classifications of Neural Network
Research on neural networks can roughly be classified into four different orientations,
each with its own applications to real-world problems.
In Cognitive science/Artificial intelligence, the interest is in modeling
intelligent behavior. The interest in neural networks is mostly to overcome the
problems and pitfalls of classical and symbolic methods of modeling intelligence.
Neurobiological modeling has the goal to develop models of biological
neurons. Here, the exact properties of the neurons play an essential role. In most
of these models there is a level of activation of an individual neuron.
Scientific modeling uses neural networks as modeling tools. In physics,
psychology, and sociology neural networks have been successfully applied.
Computer science views neural networks as an interesting class of algorithms
that have properties such as noise tolerance, fault tolerance, and generalization
ability, which make them well suited for application to real-world problems.
Learning Rule
Weights are modified by learning rules. The learning rules determine how a network’s
experiences influence its future behavior. There are basically three types of
learning rules.
Supervised Learning Rule: The term supervised is used both in a general and in a
narrow technical sense. In the narrow technical sense, supervised means the
following: if for a certain input the corresponding output is known, the network is