AI Unit 3&4
UNIT – III:
Strong Slot and Filler Structures, Conceptual Dependencies, Scripts. Introduction to Non-monotonic Reasoning, Logics for Non-monotonic Reasoning, Implementation: Depth-First Search, Dependency-Directed Backtracking, Justification-Based Truth Maintenance Systems, Logic-Based Truth Maintenance Systems, Statistical Reasoning, Probability and Bayes' Theorem, Certainty Factors, Rule-Based Systems, Bayesian Networks, Dempster-Shafer Theory.
UNIT – IV:
Minimax Search, Alpha-Beta Cutoffs, Planning Systems, Goal Stack Planning, Hierarchical Planning, Natural Language Processing, Syntactic Analysis, Semantic Analysis, Discourse and Pragmatic Processing. Introduction and Fundamentals of Artificial Neural Networks, Biological Prototype, Artificial Neuron, Single-Layer Artificial Neural Networks, Multilayer Artificial Neural Networks, Training of Artificial Neural Networks.
Unit-III
Strong Slot and Filler Structures
Strong slot and filler structures are knowledge representations in which the slots, and the rules for filling them, embody specific notions about what the structures represent. Two such representations are Conceptual Dependency (CD) and Scripts.
1. Conceptual Dependency (CD)
Conceptual Dependency has been used by many programs that purport to understand English (MARGIE, SAM, PAM). In CD, sentences are represented as a series of diagrams depicting actions, using both abstract and real physical situations. The actions are built from a small set of primitive acts:
ATRANS
-- Transfer of an abstract relationship. e.g. give.
PTRANS
-- Transfer of the physical location of an object. e.g. go.
PROPEL
-- Application of a physical force to an object. e.g. push.
MTRANS
-- Transfer of mental information. e.g. tell.
MBUILD
-- Construct new information from old. e.g. decide.
SPEAK
-- Utter a sound. e.g. say.
ATTEND
-- Focus a sense on a stimulus. e.g. listen, watch.
MOVE
-- Movement of a body part by owner. e.g. punch, kick.
GRASP
-- Actor grasping an object. e.g. clutch.
INGEST
-- Actor ingesting an object. e.g. eat.
EXPEL
-- Actor getting rid of an object from the body. e.g. spit, cry.
Six primitive conceptual categories provide the building blocks from which the allowable dependencies between the concepts in a sentence are constructed:
PP
-- Real world objects.
ACT
-- Real world actions.
PA
-- Attributes of objects.
AA
-- Attributes of actions.
T
-- Times.
LOC
-- Locations.
In addition, conceptual cases specify the role a concept plays in a dependency:
o
-- object.
R
-- recipient-donor.
I
-- instrument e.g. eat with a spoon.
D
-- destination e.g. going home.
Double arrows (⇔) indicate two-way links between the actor (PP) and the action (ACT).
The actions are built from the set of primitive acts (see above).
o These can be modified by tense etc.
p
-- past
f
-- future
t
-- transition
ts
-- start transition
tf
-- finished transition
k
-- continuing
?
-- interrogative
/
-- negative
delta
-- timeless
c
-- conditional
The absence of any modifier implies the present tense.
A double arrow thus links an actor (PP) and an action (ACT): PP ⇔ ACT. The triple arrow is also a two-way link, but between an object (PP) and its attribute (PA), i.e. PP ⇔ PA; it represents "is-a" type dependencies.
Primitive states are used to describe many state descriptions such as height, health, mental state, and physical state.
There are many more physical states than primitive actions. They use a numeric scale. E.g.:
John height(+10) -- John is the tallest.
John height(< average) -- John is short.
Frank Zappa health(-10) -- Frank Zappa is dead.
Dave mental_state(-10) -- Dave is sad.
Vase physical_state(-10) -- The vase is broken.
You can also specify things like the time of occurrence in the relationship.
Now let us consider a more complex sentence: "Since smoking can kill you, I stopped." Let's look at how we represent the inference that smoking can kill.
Advantages of CD:
Using these primitives involves fewer inference rules, and many inferences are already contained in the representation itself.
Disadvantages of CD:
Knowledge must be decomposed into fairly low-level primitives, and representations can become complex even for relatively simple actions, e.g.:
Dave bet Frank five pounds that Wales would win the Rugby World Cup.
Applications of CD:
MARGIE
(Meaning Analysis, Response Generation and Inference on English) -- model
natural language understanding.
SAM
(Script Applier Mechanism) -- Scripts to understand stories.
PAM
(Plan Applier Mechanism) -- Plans to understand stories.
2. Scripts
A script is a structure that prescribes a set of circumstances which could be expected
to follow on from one another.
Entry Conditions
-- these must be satisfied before events in the script can occur.
Results
-- Conditions that will be true after events in script occur.
Props
-- Slots representing objects involved in events.
Roles
-- Persons involved in the events.
Track
-- Variations on the script. Different tracks may share components of the same
script.
Scenes
-- The sequence of events that occur. Events are represented in conceptual
dependency form.
Scripts are useful in describing certain situations such as robbing a bank. This might involve:
Getting a gun.
Holding up the bank.
Escaping with the money.
Props:
Gun, G.
Loot, L.
Bag, B.
Get-away car, C.
Roles:
Robber, S.
Cashier, M.
Bank Manager, O.
Policeman, P.
Entry Conditions:
S is poor.
S is destitute.
Advantages of Scripts:
They can predict events that have not been explicitly observed and provide a single coherent interpretation for a collection of observations.
Disadvantages:
They are less general than frames and may not be suitable for representing all kinds of knowledge.
Note: Students are advised to follow Conceptual Dependency (CD) [all 14 rules] and Scripts from the Rich & Knight AI book.
Questions
1. Construct CD representation of the following:
1. John begged Mary for a pencil.
2. Jim stirred his coffee with a spoon.
3. Dave took the book off Jim.
4. On my way home, I stopped to fill my car with petrol.
5. I heard strange music in the woods.
6. Drinking beer makes you drunk.
7. John killed Mary by strangling her.
2. What is CD? Explain CD with examples.
3. Write the script of 1) Supermarket 2) Eating in a restaurant
4. Write a script for enrolling as a student.
3. Introduction to Non monotonic reasoning
Reasoning:
Reasoning is the mental process of deriving logical conclusions and making predictions from available knowledge, facts, and beliefs. Or we can say, "Reasoning is a way to infer facts from existing data." It is a general process of thinking rationally to find valid conclusions.
Reasoning is the act of deriving a conclusion from certain premises using a given methodology.
■ Any knowledge system must reason if it is required to do something which it has not been told explicitly.
■ For reasoning, the system must find out what it needs to know from what it already knows.
In artificial intelligence, reasoning is essential so that the machine can also think rationally, as a human brain does, and can perform like a human.
Types of Reasoning
In artificial intelligence, reasoning can be divided into the following categories:
o Monotonic Reasoning
o Non-monotonic Reasoning
1. Monotonic Reasoning:
In monotonic reasoning, once a conclusion is drawn, it remains the same even if we add other information to the existing information in our knowledge base.
In monotonic reasoning, adding knowledge does not decrease the set of propositions that can be derived.
To solve monotonic problems, we can derive the valid conclusion from the available facts only,
and it will not be affected by new facts.
Monotonic reasoning is not useful for real-time systems, as in real time facts get changed, so we cannot use monotonic reasoning.
Example: "The Earth revolves around the Sun."
This is a true fact, and it cannot be changed even if we add another sentence to the knowledge base such as "The moon revolves around the earth" or "The earth is not round."
2. Non-monotonic Reasoning
A logic is said to be non-monotonic if some conclusions can be invalidated by adding more knowledge to our knowledge base.
"Human perceptions for various things in daily life, "is a general example of non-monotonic
reasoning.
Example: Suppose the knowledge base contains the following knowledge:
Birds can fly.
Penguins cannot fly.
Tweety is a bird.
From the above sentences, we can conclude that Tweety can fly.
However, if we add another sentence to the knowledge base, "Tweety is a penguin", which concludes "Tweety cannot fly", it invalidates the above conclusion.
o For real-world systems such as Robot navigation, we can use non-monotonic reasoning.
o In Non-monotonic reasoning, we can choose probabilistic facts or can make assumptions.
o In non-monotonic reasoning, the old facts may be invalidated by adding new sentences.
o It cannot be used for theorem proving.
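To make the retraction concrete, here is a minimal Python sketch of the Tweety example above, using a hand-rolled negation-as-failure check; the predicate strings and the can_fly helper are illustrative, not from any real library:

```python
# A minimal sketch of non-monotonic (default) reasoning for the Tweety
# example: a bird flies by default, unless more specific knowledge
# (being a penguin) blocks the default.

def can_fly(kb):
    """Default rule: bird(x) and not provably penguin(x) => flies(x)."""
    return "bird(tweety)" in kb and "penguin(tweety)" not in kb

kb = {"bird(tweety)"}
print(can_fly(kb))           # True  -- Tweety can fly, by default

kb.add("penguin(tweety)")    # new knowledge arrives
print(can_fly(kb))           # False -- the old conclusion is retracted
```

Note how adding a sentence shrinks the set of conclusions, which is exactly what distinguishes non-monotonic from monotonic reasoning.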
Approaches to non-monotonic reasoning include:
■ Default reasoning
■ Circumscription
■ Truth Maintenance Systems
Default Reasoning
This is a very common form of non-monotonic reasoning. Conclusions are drawn based on what is most likely to be true.
There are two logic-based approaches to default reasoning: non-monotonic logic and default logic.
A default rule can be written as b ← c ∧ ¬ab_a, where ab_a is an atom that means abnormal with respect to some aspect a.
Given c, the agent can infer b unless it is told ab_a. Adding ab_a to the knowledge base can prevent the conclusion of b. Rules that imply ab_a can be used to prevent the default under the conditions of the body of the rule.
■ Non-monotonic logic
It has already been defined. It says, "the truth of a proposition may change when new information (axioms) is added, and a logic may be built that allows statements to be retracted."
Non-monotonic logic is predicate logic with one extension, called the modal operator M, which means "consistent with everything we know".
The purpose of M is to allow consistency. A way to define consistency with PROLOG notation is:
To show that fact P is consistent, we attempt to prove ¬P. If we fail, we may say that P is consistent, since ¬P could not be established.
Example :
∀x : plays_instrument(x) ∧ M manage(x) → jazz_musician(x)
This states that, for all x, if x plays an instrument and the fact that x can manage is consistent with all other knowledge, then we can conclude that x is a jazz musician.
Default logic
Default logic introduces a new inference rule of the form:
A : B / C
where
A is known as the prerequisite,
B as the justification, and
C as the consequent.
Read it as: "if A, and if it is consistent with the rest of what is known to assume B, then conclude C".
‡ The rule says that given the prerequisite, the consequent can be inferred, provided it is consistent with the rest of the data.
Circumscription
Circumscription is a rule of conjecture that allows you to jump to the conclusion that the objects you can show possess a certain property, p, are in fact all the objects that possess that property.
Circumscription is a formalized rule of conjecture (guess) that can be used along with the rules
of inference of first order logic.
Circumscription involves formulating rules of thumb with "abnormality" predicates and then
restricting the extension of these predicates, circumscribing them, so that they apply to only
those things to which they are currently known.
The rule of thumb "birds typically fly" is conditional. The predicate "Abnormal" signifies abnormality with respect to flying ability.
Observe that the rule ∀x (Bird(x) ∧ ¬Abnormal(x) → Flies(x)) does not allow us to infer that "Tweety flies", since we cannot show that Tweety is not abnormal with respect to flying ability.
But if we add axioms which circumscribe the abnormality predicate to only those things currently known to be abnormal, then from "Bird(Tweety)" the inference can be drawn. This inference is non-monotonic.
Implementations: Truth Maintenance Systems
When a problem-solving system gives an answer to a user's query, an explanation of that answer is often required.
Example : An advice to a stockbroker is supported by an explanation of the reasons for that
advice. This is constructed by the Inference Engine (IE) by tracing the justification of the
assertion.
■ Recognize inconsistencies
The Inference Engine (IE) may tell the TMS that some sentences are contradictory. Then, TMS
may find that all those sentences are believed true, and reports to the IE which can eliminate the
inconsistencies by determining the assumptions used and changing them appropriately.
Example : A statement that either Abbott, or Babbitt, or Cabot is guilty together with other
statements that Abbott is not guilty, Babbitt is not guilty, and Cabot is not guilty, form a
contradiction.
■ Support default reasoning
In the absence of any firm knowledge, in many situations we want to reason from default
assumptions.
Example : If "Tweety is a bird", then until told otherwise, assume that "Tweety flies" and for
justification use the fact that "Tweety is a bird" and the assumption that "birds fly".
Basically, a TMS works alongside a reasoning system (RS). The RS provides the reason maintenance system (RMS) with information about each inference it performs, and in return the RMS provides the RS with information about the whole set of inferences. Several implementations of RMS have been proposed for non-monotonic reasoning. The important ones are the Justification-Based TMS (JTMS), the Logic-Based TMS (LTMS), and the Assumption-Based TMS (ATMS).
The TMS maintains the consistency of the knowledge base as soon as new knowledge is added. It considers only one state at a time, so it is not possible to manipulate multiple environments.
The ATMS, by contrast, is intended to maintain multiple environments.
Justification-Based TMS (JTMS)
This is a simple TMS in that it does not know anything about the structure of the assertions themselves.
Each supported belief (assertion) in the network has a justification.
Each justification has two parts:
o An IN-List -- which supports beliefs held.
o An OUT-List -- which supports beliefs not held.
Nodes (assertions) assume no relationships among them except ones explicitly stated in justifications.
A JTMS can represent P and ¬P simultaneously; an LTMS would signal a contradiction here.
If this happens, the network has to be reconstructed.
JTMS and LTMS pursue a single line of reasoning at a time and backtrack (dependency-directed) when needed -- depth-first search.
An ATMS maintains alternative paths in parallel -- breadth-first search.
Backtracking is avoided at the expense of maintaining multiple contexts.
However, as reasoning proceeds, contradictions arise and the ATMS can be pruned:
o Simply find the assertions with no valid justification.
Statistical Reasoning
Statistical reasoning is the way people reason with statistical ideas and make sense of statistical information. Statistical reasoning may involve connecting one concept to another (e.g., center and spread) or may combine ideas about data and chance. Reasoning means understanding and being able to explain statistical processes, and being able to fully interpret statistical results.
To read more about statistical reasoning see Garfield (2002).
Symbolic versus statistical reasoning
In purely symbolic (logic-based) reasoning, every statement is taken to be True, False, or Neither True nor False. Symbolic methods run into difficulty with incomplete knowledge and with contradictions in the knowledge.
Statistical methods provide a method for representing beliefs that are not certain (or uncertain)
but for which there may be some supporting (or contradictory) evidence.
Genuine Randomness
-- Card games are a good example. We may not be able to predict any outcomes with
certainty but we have knowledge about the likelihood of certain items (e.g. like being
dealt an ace) and we can exploit this.
Exceptions
-- Symbolic methods can represent exceptions. However, if the number of exceptions is large, such systems tend to break down; many common-sense and expert reasoning tasks are like this. Statistical techniques can summarise large numbers of exceptions without resorting to explicit enumeration.
In the logic based approaches described, we have assumed that everything is either believed false
or believed true.
However, it is often useful to represent the fact that we believe something is probably true, or true with probability (say) 0.65.
This is useful for dealing with problems where there is randomness and unpredictability (such as
in games of chance) and also for dealing with problems where we could, if we had sufficient
information, work out exactly what is true.
To do all this in a principled way requires techniques for probabilistic reasoning.
We can find the probability of an uncertain event by using the formula: Probability of occurrence = (number of desired outcomes) / (total number of outcomes).
Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the
real world.
Posterior Probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of prior probability and new information.
■ Probability Experiment :
A process which leads to well-defined results called outcomes.
■ Independent Events :
Two events, E1 and E2, are independent if the fact that E1 occurs does not affect the probability
of E2 occurring.
■ Mutually Exclusive Events :
Events E1, E2, ..., En are said to be mutually exclusive if the occurrence of any one of them
automatically implies the non-occurrence of the remaining n − 1 events.
■ Disjoint Events :
Another name for mutually exclusive events.
Conditional probability:
Conditional probability is the probability of an event occurring given that another event has already happened.
Let's suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the conditions of B". It can be written as:
P(A|B) = P(A⋀B) / P(B)
If the probability of A is given and we need to find the probability of B, then it will be given as:
P(B|A) = P(A⋀B) / P(A)
This can be explained using a Venn diagram: where B is the event that has occurred, the sample space is reduced to set B, and we can calculate event A given B by dividing P(A⋀B) by P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What percentage of the students who like English also like mathematics?
Solution:
P(Math|English) = P(English ⋀ Math) / P(English) = 0.4 / 0.7 = 0.57
Hence, 57% of the students who like English also like Mathematics.
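As a quick check of the arithmetic, here is the same conditional-probability calculation in Python (the variable names are ours):

```python
# P(Math | English) = P(English and Math) / P(English)
p_english = 0.70            # P(B): fraction who like English
p_english_and_math = 0.40   # P(A and B): fraction who like both

p_math_given_english = p_english_and_math / p_english
print(round(p_math_given_english, 2))   # 0.57 -> about 57%
```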
■ Joint probability :
The probability of two events in conjunction. It is the probability of both events together. The
joint probability of A and B is written P(A ∩ B) ; also written as P(A, B).
■ Marginal Probability :
The probability of one event, regardless of the other event. The marginal probability of A is
written P(A), and the marginal probability of B is written P(B).
Mutually exclusive (disjoint) events have nothing in common: two events are mutually exclusive if they cannot occur at the same time.
(a) If two events are mutually exclusive, then probability of both occurring at same time
is P(A and B) = 0
(b) If two events are mutually exclusive , then the probability of either occurring is P(A
or B) = P(A) + P(B)
Given P(A)= 0.20, P(B)= 0.70, where A and B are disjoint then P(A and B) = 0
Bayes' theorem
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities of two
random events.
Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.
Example: If cancer corresponds to one's age then by using Bayes' theorem, we can
determine the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B:
P(A ⋀ B) = P(A|B) P(B) and, symmetrically, P(A ⋀ B) = P(B|A) P(A).
Equating the two and dividing by P(B) gives:
P(A|B) = P(B|A) P(A) / P(B) ...(a)
Equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the evidence.
P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.
In equation (a), in general we can write P(B) = Σi P(Ai) P(B|Ai); hence Bayes' rule can be written as:
P(Ai|B) = P(B|Ai) P(Ai) / Σj P(Aj) P(B|Aj)
where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
Example-1:
Question: What is the probability that a patient has the disease meningitis, given that they have a stiff neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to have a stiff neck, and it occurs
80% of the time. He is also aware of some more facts, which are given as follows:
Let a be the proposition that the patient has a stiff neck and b the proposition that the patient has meningitis. Then we have:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
By Bayes' rule: P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.0013 = 1/750.
Hence, we can assume that 1 patient out of 750 patients has meningitis disease with a stiff
neck.
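A small Python helper makes the calculation explicit; the function name and layout are ours, but the numbers are exactly those given above:

```python
# Bayes' rule: P(b|a) = P(a|b) * P(b) / P(a)
def bayes_posterior(likelihood, prior, evidence):
    return likelihood * prior / evidence

# Meningitis example: P(a|b) = 0.8, P(b) = 1/30000, P(a) = 0.02
posterior = bayes_posterior(0.8, 1 / 30000, 0.02)
print(posterior)        # ~0.001333..., i.e. about 1 in 750 patients
```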
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The
probability that the card is king is 4/52, then calculate posterior probability
P(King|Face), which means the drawn face card is a king card.
Solution:
P(King): probability that the card is a King = 4/52 = 1/13
P(Face): probability that the card is a face card = 12/52 = 3/13
P(Face|King): every King is a face card, so this is 1
P(King|Face) = P(Face|King) P(King) / P(Face) = (1 × 1/13) / (3/13) = 1/3
Example-3
Problem : Marie's wedding is tomorrow. In recent years, it has rained only 5 days each year. The weatherman has predicted rain for tomorrow. When it actually rains, the weatherman correctly forecasts rain 90% of the time. When it doesn't rain, he incorrectly forecasts rain 10% of the time.
The question : What is the probability that it will rain on the day of Marie's wedding?
Solution : The sample space is defined by two mutually exclusive events – "it rains" or "it does
not rain". Additionally, a third event occurs when the "weatherman predicts rain".
The events and probabilities are stated below.
◊ Event A1 : rains on Marie's wedding.
◊ Event A2 : does not rain on Marie's wedding
◊ Event B : weatherman predicts rain.
◊ P(A1) = 5/365 =0.0136985 [Rains 5 days in a year.]
◊ P(A2) = 360/365 = 0.9863014 [Does not rain 360 days in a year.]
◊ P(B|A1) = 0.9 [When it rains, the weatherman predicts rain 90% time.]
◊ P(B|A2) = 0.1 [When it does not rain, weatherman predicts rain 10% time.]
We want to know P(A1|B), the probability that it will rain on the day of Marie's wedding, given
a forecast for rain by the weatherman.
The answer can be determined from Bayes' theorem, shown below.
P(A1|B) = P(A1) P(B|A1) / [P(A1) P(B|A1) + P(A2) P(B|A2)]
        = (0.014)(0.9) / [(0.014)(0.9) + (0.986)(0.1)]
        = 0.111
So, despite the weatherman's prediction, there is a good chance that Marie will not get rain at her wedding.
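The same answer can be reproduced with the general form of Bayes' rule, expanding P(B) by total probability (a sketch using the numbers from the problem):

```python
p_rain = 5 / 365                 # P(A1)
p_dry = 360 / 365                # P(A2)
p_forecast_given_rain = 0.9      # P(B|A1)
p_forecast_given_dry = 0.1       # P(B|A2)

# P(B) by total probability, then Bayes' rule for P(A1|B)
p_forecast = p_rain * p_forecast_given_rain + p_dry * p_forecast_given_dry
p_rain_given_forecast = p_rain * p_forecast_given_rain / p_forecast
print(round(p_rain_given_forecast, 3))   # ~0.111
```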
Thus Bayes' theorem is used to calculate conditional probabilities. Some applications:
o It is used to calculate the next step of the robot when the already executed step is
given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.
Simple Bayes rule-based systems are often unsuitable for large-scale uncertain reasoning, because they require many prior and conditional probabilities that are hard to obtain.
However, Bayesian statistics still provide the core of reasoning in many uncertain-reasoning systems, with suitable enhancements to overcome these problems:
Certainty factors,
Dempster-Shafer theory,
Bayesian networks.
Certainty Factors and Rule-Based Systems
The certainty-factor approach was suggested by Shortliffe and Buchanan and used in their famous medical diagnosis system, MYCIN.
MYCIN is essentially an expert system. Here we only concentrate on the probabilistic reasoning aspects of MYCIN.
■ A rule is an expression of the form "if A then B" where A is an assertion and B can be either
an action or another assertion.
Example : Trouble shooting of water pumps
1. If pump failure then the pressure is low
2. If pump failure then check oil level
3. If power failure then pump failure
■ Rule based system consists of a library of such rules.
■ Rules reflect essential relationships within the domain.
■ Rules reflect ways to reason about the domain.
■ Rules draw conclusions and point to actions when specific information about the domain comes in. This is called inference.
■ The inference is a kind of chain reaction. For example, if there is a power failure then (see rules 1, 2, 3 above):
Rule 3 states that there is a pump failure, and
Rule 1 tells us that the pressure is low, and
Rule 2 gives a (useless) recommendation to check the oil level.
■ It can be very difficult to control such a mixture of inference back and forth in the same session and to resolve the resulting uncertainties.
A problem with rule-based systems is that often the connections reflected by the rules are not
absolutely certain (i.e. deterministic), and the gathered information is often subject to
uncertainty.
In such cases, a certainty measure is added to the premises as well as the conclusions in the rules
of the system.
A rule then provides a function that describes : how much a change in the certainty of the
premise will change the certainty of the conclusion. In its simplest form, this looks like :
If A (with certainty x) then B (with certainty f(x))
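The sketch below shows MYCIN-style certainty-factor propagation for rules of this form. The combination formula for two non-negative CFs is the standard one; the rule strengths and evidence values are invented for illustration:

```python
def propagate(cf_premise, cf_rule):
    """CF of the conclusion = CF of the premise * CF attached to the rule."""
    return cf_premise * cf_rule

def combine(cf1, cf2):
    """Combine two CFs (both >= 0) supporting the same conclusion."""
    return cf1 + cf2 * (1 - cf1)

# "If pump failure then the pressure is low", rule strength 0.8,
# with the premise believed with CF 0.7:
cf_low_pressure = propagate(0.7, 0.8)           # 0.56
# A second, independent rule also supports "pressure is low" with CF 0.5:
print(round(combine(cf_low_pressure, 0.5), 2))  # 0.78
```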
Dempster-Shafer Theory
DST is a mathematical theory of evidence based on belief functions and plausible reasoning. It
is used to combine separate pieces of information (evidence) to calculate the probability of an
event.
DST offers an alternative to traditional probabilistic theory for the mathematical representation
of uncertainty.
DST can be regarded as, a more general approach to represent uncertainty than the Bayesian
approach.
Bayesian methods are sometimes inappropriate
Example :
Let A represent the proposition "Moore is attractive".
Then the axioms of probability insist that P(A) + P(¬A) = 1.
Now suppose that Andrew does not even know who "Moore" is, then
‡ We cannot say that Andrew believes the proposition if he has no
idea what it means.
‡ Also, it is not fair to say that he disbelieves the proposition.
‡ It would therefore be meaningful to denote Andrew's beliefs B(A) and B(¬A) as both being 0.
‡ Certainty factors do not allow this.
For example, suppose we have a belief of 0.5 and a plausibility of 0.8 for a proposition, say “the cat
in the box is dead.” This means that we have evidence that allows us to state strongly that the
proposition is true with a confidence of 0.5. However, the evidence contrary to that hypothesis (i.e.
“the cat is alive”) only has a confidence of 0.2. The remaining mass of 0.3 (the gap between the 0.5
supporting evidence on the one hand, and the 0.2 contrary evidence on the other) is “indeterminate,”
meaning that the cat could either be dead or alive. This interval represents the level of uncertainty
based on the evidence in the system.
Beliefs from different sources can be combined with various fusion operators to model specific situations of belief fusion.
DS theory retains the first two axioms of probability; the third (additivity) condition, however, is subsumed by, but relaxed in, DS theory [2, p. 19].
For example, a Bayesian would model the color of a car as a probability distribution over (red,
green, blue), assigning one number to each color. Dempster–Shafer would assign numbers to
each of (red, green, blue, (red or green), (red or blue), (green or blue), (red or green or blue))
which do not have to cohere, for example Bel(red)+Bel(green) != Bel(red or green). This may be
computationally more efficient if a witness reports "I saw that the car was either blue or green" in
which case the belief can be assigned in a single step rather than breaking down into values for
two separate colors. However this can lead to irrational conclusions.
Equivalently, each of the following conditions defines the Bayesian special case of DS theory [2, pp. 37, 45]:
For finite X, all focal elements of the belief function are singletons.
Bayes' conditional probability is a special case of Dempster's rule of combination
Dempster-Shafer Calculus
This method allows for further additions to the set of knowledge and does not assume disjoint outcomes.
The mass m is a probability density function defined not just for the elements of the frame of discernment Θ but for all of its subsets.
So if Θ is the set { Flu (F), Cold (C), Pneumonia (P) } then 2^Θ is the set { ∅, {F}, {C}, {P}, {F, C}, {F, P}, {C, P}, {F, C, P} }.
The belief in a proposition P is Bel(P) = Σ over E ⊆ P of m(E), i.e. all the evidence that makes us believe in the correctness of P, and the plausibility of P is Pl(P) = 1 − Bel(¬P).
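The following Python sketch implements Dempster's rule of combination over this frame; the mass assignments are invented for illustration, and the conflict-renormalisation step follows the standard rule:

```python
# Dempster's rule of combination over Theta = {F, C, P}.
def combine(m1, m2):
    """Combine two mass functions given as dicts: frozenset -> mass."""
    combined, conflict = {}, 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + x * y
            else:
                conflict += x * y          # mass falling on the empty set
    norm = 1.0 - conflict                  # renormalise away the conflict
    return {s: v / norm for s, v in combined.items()}

m1 = {frozenset("F"): 0.6, frozenset("FCP"): 0.4}   # evidence for Flu
m2 = {frozenset("FC"): 0.7, frozenset("FCP"): 0.3}  # evidence for Flu-or-Cold
print(combine(m1, m2))   # most of the mass ends up on {F}
```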
Bayesian networks
A Bayesian network (or a belief network) is a probabilistic graphical model that represents a set
of variables and their probabilistic independencies.
For example, a Bayesian network could represent the probabilistic relationships between
diseases and symptoms. Given symptoms, the network can be used to compute the probabilities
of the presence of various diseases.
Bayesian networks are also called: Bayes nets, Bayesian Belief Networks (BBNs), simply Belief Networks, or Causal Probabilistic Networks (CPNs). They were initially developed by Pearl (1988).
A Bayesian network consists of :
a set of nodes and a set of directed edges between nodes.
the edges reflect cause-effect relations within the domain.
The effects are not completely deterministic (e.g. disease -> symptom).
the strength of an effect is modeled as a probability.
The basic idea is:
Knowledge in the world is modular -- most events are conditionally independent of most other events.
Adopt a model that uses a more local representation, allowing interactions only between events that directly affect each other.
Some influences may be unidirectional, others bidirectional -- make a distinction between these in the model.
Events may be causal and thus get chained together in a network.
Implementation
In order to decide whether to fix the car myself or send it to the garage, I make the following decision:
If the headlights do not work, then the battery is likely to be flat, so I fix it myself.
If the starting motor is defective, then send the car to the garage.
If both the battery and the starting motor have gone, send the car to the garage.
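A two-node network (flat battery → headlights fail) is enough to act on the first rule. The CPT numbers below are invented purely for illustration:

```python
# Minimal two-node Bayesian network: battery_flat -> headlights_fail.
p_flat = 0.1                     # prior P(battery flat) -- assumed
p_fail_given_flat = 0.9          # P(lights fail | flat) -- assumed
p_fail_given_ok = 0.05           # P(lights fail | not flat) -- assumed

# Infer P(flat | lights fail) by Bayes' rule over the network's CPTs.
p_fail = p_flat * p_fail_given_flat + (1 - p_flat) * p_fail_given_ok
p_flat_given_fail = p_flat * p_fail_given_flat / p_fail
print(round(p_flat_given_fail, 2))   # ~0.67 -> battery likely flat
```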
Unit-IV
Mini-Max Algorithm
o The mini-max algorithm is a recursive or backtracking algorithm which is used in decision-making and game theory. It provides an optimal move for the player, assuming that the opponent is also playing optimally.
o The mini-max algorithm uses recursion to search through the game tree.
o The min-max algorithm is mostly used for game playing in AI, such as chess, checkers, tic-tac-toe, Go, and various other two-player games. The algorithm computes the minimax decision for the current state.
o In this algorithm two players play the game; one is called MAX and the other is called MIN.
o Both players are opponents of each other: MAX will select the maximized value and MIN will select the minimized value, so each player gets its maximum benefit while the opponent gets the minimum.
o The minimax algorithm performs a depth-first search for the exploration of the complete game tree.
o The minimax algorithm proceeds all the way down to the terminal nodes of the tree, then backtracks up the tree as the recursion unwinds.
Step 1: In the first step, the algorithm generates the entire game tree and applies the utility function to get the utility values for the terminal states. In the tree diagram, let A be the initial state of the tree. Suppose the maximizer takes the first turn, with worst-case initial value = −∞, and the minimizer takes the next turn, with worst-case initial value = +∞.
Step 2: Now, first we find the utility value for the maximizer. Its initial value is −∞, so we compare each value in a terminal state with the initial value of the maximizer and determine the higher node values. It finds the maximum among them all.
o Complete -- The min-max algorithm is complete. It will definitely find a solution (if one exists) in a finite search tree.
o Optimal -- The min-max algorithm is optimal if both opponents are playing optimally.
o Time complexity -- As it performs a DFS of the game tree, the time complexity of the min-max algorithm is O(b^m), where b is the branching factor of the game tree and m is the maximum depth of the tree.
o Space complexity -- The space complexity of the mini-max algorithm is also similar to DFS, which is O(bm).
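The procedure can be sketched in a few lines of Python on a hand-built game tree, where nested lists are internal nodes and the numbers are terminal utilities (the tree itself is invented for illustration):

```python
def minimax(node, maximizing):
    if not isinstance(node, list):     # terminal node: utility value
        return node
    if maximizing:                     # MAX picks the largest child value
        return max(minimax(child, False) for child in node)
    return min(minimax(child, True) for child in node)   # MIN's turn

# MAX moves at the root, MIN at the next level, and so on.
tree = [[3, 5], [2, 9], [0, 1]]
print(minimax(tree, True))   # 3: MIN values the subtrees 3, 2, 0; MAX takes 3
```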
The main drawback of the minimax algorithm is that it gets really slow for complex games such as chess, Go, etc. These games have a huge branching factor, and the player has lots of choices to decide among. This limitation of the minimax algorithm can be improved by alpha-beta pruning.
Alpha-Beta Cutoffs (Pruning)
o Alpha-beta pruning is a modified version of the minimax algorithm. It is an optimization technique for the minimax algorithm.
o As we saw in the minimax search algorithm, the number of game states it has to examine is exponential in the depth of the tree. We cannot eliminate the exponent, but we can effectively cut it in half: there is a technique by which we can compute the correct minimax decision without checking each node of the game tree, and this technique is called pruning. It involves two threshold parameters, alpha and beta, for future expansion, so it is called alpha-beta pruning. It is also called the alpha-beta algorithm.
o Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes not only the tree leaves but entire sub-trees.
o The two parameters can be defined as:
a. Alpha: The best (highest-value) choice we have found so far at any point along the path of the maximizer. The initial value of alpha is -∞.
b. Beta: The best (lowest-value) choice we have found so far at any point along the path of the minimizer. The initial value of beta is +∞.
Alpha-beta pruning applied to a standard minimax algorithm returns the same move as the standard algorithm does, but it removes all the nodes which do not really affect the final decision but make the algorithm slow. Pruning these nodes makes the algorithm fast.
Note: To better understand this topic, kindly study the minimax algorithm first.
The main condition required for alpha-beta pruning is:
1. α >= β
Step 1: At the first step, the Max player starts the first move from node A, where α = -∞ and β = +∞. These values of alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and node B passes the same values to its child D.
Step 2: At node D, the value of α is calculated, as it is Max's turn. The value of α is compared first with 2 and then with 3, and max(2, 3) = 3 becomes the value of α at node D; the node value is also 3.
Step 3: The algorithm now backtracks to node B, where the value of β changes, as this is Min's turn. Now β = +∞ is compared with the available subsequent node value, i.e. min(∞, 3) = 3; hence at node B now α = -∞ and β = 3.
In the next step, the algorithm traverses the next successor of node B, which is node E, and the values α = -∞ and β = 3 are passed along.
Step 4: At node E, Max takes its turn, and the value of alpha changes. The current value of alpha is compared with 5, so max(-∞, 5) = 5; hence at node E, α = 5 and β = 3, where α >= β, so the right successor of E is pruned, and the algorithm does not traverse it. The value at node E becomes 5.
Step 5: At the next step, the algorithm again backtracks the tree, from node B to node A. At node A, the value of alpha changes to the maximum available value, 3, as max(-∞, 3) = 3, with β = +∞. These two values are now passed to the right successor of A, which is node C.
At node C, α = 3 and β = +∞, and the same values are passed on to node F.
Step 6: At node F, the value of α is again compared with the left child, which is 0, giving max(3, 0) = 3, and then with the right child, which is 1, giving max(3, 1) = 3; α remains 3, but the node value of F becomes 1.
Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of beta changes: it is compared with 1, so min(∞, 1) = 1. Now at C, α = 3 and β = 1, and again the condition α >= β is satisfied, so the next child of C, which is G, is pruned, and the algorithm does not compute the entire sub-tree G.
Step 8: C now returns the value 1 to A. Here the best value for A is max(3, 1) = 3.
The final game tree shows the nodes which were computed and the nodes which were never computed. The optimal value for the maximizer is 3 for this example.
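The whole procedure fits in a short Python sketch over the same kind of hand-built tree used for minimax above; the break statements are the alpha and beta cutoffs:

```python
def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):     # terminal node: utility value
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:          # cutoff: prune remaining siblings
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:              # cutoff on the MIN side
            break
    return value

tree = [[3, 5], [2, 9], [0, 1]]
print(alphabeta(tree, float("-inf"), float("inf"), True))   # 3, as before
```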
Move Ordering in Alpha-Beta pruning:
The effectiveness of alpha-beta pruning is highly dependent on the order in which each node
is examined. Move order is an important aspect of alpha-beta pruning.
o Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any of the leaves of the tree, and works exactly like the minimax algorithm. In this case it also consumes more time because of the alpha-beta bookkeeping; such an ordering is called worst ordering, and occurs when the best move is on the right side of the tree. The time complexity for such an ordering is O(b^m).
o Ideal ordering: The ideal ordering for alpha-beta pruning occurs when a lot of pruning happens in the tree, and the best moves occur on the left side of the tree. We apply DFS, so it searches the left of the tree first and can go twice as deep as the minimax algorithm in the same amount of time. The complexity for ideal ordering is O(b^(m/2)).
o We can bookkeep the states, as there is a possibility that states may repeat.
Planning System
Methods which focus on ways of decomposing the original problem into appropriate subparts, and on ways of recording and handling interactions among the subparts as they are detected during the problem-solving process, are often called planning.
Planning refers to the process of computing several steps of a problem-solving procedure before executing any of them.
Goal Stack Planning
This is one of the most important planning algorithms and is specifically used by STRIPS. A stack is used in the algorithm to hold the actions and the goals to be satisfied; a knowledge base is used to hold the current state and the actions.
The goal stack is similar to a node in a search tree, where branches are created if there is a choice of action.
The important steps of the algorithm are as stated below (see the sketch after this list):
i. Start by pushing the original goal onto the stack. Repeat the following until the stack becomes empty. If the stack top is a compound goal, push its unsatisfied subgoals onto the stack.
ii. If the stack top is a single unsatisfied goal, replace it by an action and push the action's preconditions onto the stack so they can be satisfied.
iii. If the stack top is an action, pop it from the stack, execute it, and change the knowledge base by applying the effects of the action.
iv. If the stack top is a satisfied goal, pop it from the stack.
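Below is a much-simplified Python sketch of this loop. The toy operators have precondition and add lists only (no delete lists, so this is not full STRIPS), and all names are invented:

```python
# goal -> (preconditions, effects); a toy block-stacking domain
operators = {
    "holding(A)": ({"clear(A)", "handempty"}, {"holding(A)"}),
    "on(A,B)":    ({"holding(A)", "clear(B)"}, {"on(A,B)", "handempty"}),
}

def goal_stack_plan(state, goal):
    stack, plan = [goal], []
    while stack:
        top = stack.pop()
        if top in state:                    # satisfied goal: pop and discard
            continue
        preconds, effects = operators[top]  # action that achieves the goal
        unmet = [p for p in preconds if p not in state]
        if unmet:                           # push action back, then subgoals
            stack.append(top)
            stack.extend(unmet)
        else:                               # all preconditions hold: execute
            state |= effects
            plan.append(top)
    return plan

start = {"clear(A)", "clear(B)", "handempty"}
print(goal_stack_plan(start, "on(A,B)"))    # ['holding(A)', 'on(A,B)']
```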
Non-linear planning
Non-linear planning uses a goal set instead of a goal stack, and includes in its search space all possible subgoal orderings. It handles goal interactions by interleaving.
Non-linear planning may produce an optimal solution with respect to plan length (depending on the search strategy used).
It has a larger search space, since all possible goal orderings are taken into consideration, and the algorithm is more complex to understand.
Algorithm
1. Choose a goal g from the goal set.
2. If g does not match the state, then
   - choose an operator o whose add-list matches goal g,
   - push o on the opstack,
   - add the preconditions of o to the goal set.
3. While all preconditions of the operator on top of the opstack are met in the state:
   - pop operator o from the top of the opstack,
   - state = apply(o, state),
   - plan = [plan; o].
Hierarchical Planning
In order to solve hard problems, a problem solver may have to generate long plans. To do so, it is important to be able to eliminate some of the details of the problem until a solution that addresses the main issues is found. Then an attempt can be made to fill in the appropriate details.
Early attempts to do this involved the use of macro-operators, in which larger operators were built from smaller ones. In this approach, however, no details were eliminated from the actual descriptions of the operators.
ABSTRIPS
ABSTRIPS actually planned in a hierarchy of abstraction spaces, in each of which preconditions at a lower level of abstraction were ignored.
Natural Language Processing
The input and output of an NLP system can be: Speech, or Written Text.
Components of NLP
There are two components of NLP: Natural Language Understanding (NLU), which involves mapping the given natural-language input into useful representations, and Natural Language Generation (NLG), the process of producing meaningful phrases and sentences from some internal representation.
Difficulties in NLU
NL has an extremely rich form and structure.
It is very ambiguous. There can be different levels of ambiguity −
Lexical ambiguity − It is at a very primitive level, such as the word level.
For example, should the word "board" be treated as a noun or a verb?
Syntax-level ambiguity − A sentence can be parsed in different ways.
For example, "He lifted the beetle with red cap." − Did he use a cap to lift the beetle, or did he lift a beetle that had a red cap?
Referential ambiguity − Referring to something using pronouns. For example: Rima went to Gauri. She said, "I am tired." − Exactly who is tired?
One input can mean different meanings.
Many inputs can mean the same thing.
NLP Terminology
Steps in NLP
There are five general steps: Lexical Analysis, Syntactic Analysis (Parsing), Semantic Analysis, Discourse Integration, and Pragmatic Analysis. Here we focus on syntactic analysis.
There are a number of algorithms researchers have developed for syntactic analysis, but we
consider only the following simple methods −
Context-Free Grammar
Top-Down Parser
Let us see them in detail −
Context-Free Grammar
It is grammar that consists of rules with a single symbol on the left-hand side of the rewrite rules. Let us create a grammar to parse a sentence −
“The bird pecks the grains”
Articles (DET) − a | an | the
Nouns − bird | birds | grain | grains
Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
= DET N | DET ADJ N
Verbs − pecks | pecking | pecked
Verb Phrase (VP) − NP V | V NP
Adjectives (ADJ) − beautiful | small | chirping
The parse tree breaks down the sentence into structured parts so that the computer can easily
understand and process it. In order for the parsing algorithm to construct this parse tree, a set of
rewrite rules, which describe what tree structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree by a sequence of other
symbols. According to first order logic rule, if there are two strings Noun Phrase (NP) and Verb
Phrase (VP), then the string combined by NP followed by VP is a sentence. The rewrite rules
for the sentence are as follows −
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
ADJ → beautiful | perching
N → bird | birds | grain | grains
V → peck | pecks | pecking
The parse tree for this sentence breaks S into NP (DET "the", N "bird") and VP (V "pecks", NP (DET "the", N "grains")).
Now consider the above rewrite rules. Since V can be replaced by both "peck" and "pecks", sentences such as "The bird peck the grains" are wrongly permitted, i.e. the subject-verb agreement error is accepted as correct.
Merit − The simplest style of grammar, therefore widely used one.
Demerits −
They are not highly precise. For example, "The grains peck the bird" is syntactically correct according to the parser, and even though it makes no sense, the parser takes it as a correct sentence.
To bring out high precision, multiple sets of grammar need to be prepared. It may require a completely different set of rules for parsing singular and plural variations, passive sentences, etc., which can lead to the creation of a huge, unmanageable set of rules.
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of terminal
symbols that matches the classes of the words in the input sentence until it consists entirely of
terminal symbols.
These are then checked against the input sentence to see if it matches. If not, the process is started over again with a different set of rules. This is repeated until a specific rule is found which describes the structure of the sentence (see the sketch after this list).
Merit − It is simple to implement.
Demerits −
It is not very efficient: rules may be tried and re-tried many times during backtracking, and it can loop on left-recursive rules.
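Here is a tiny recursive-descent (top-down) parser in Python for the bird grammar above; the parse function and its layout are ours, but the grammar and lexicon are exactly the rewrite rules given earlier:

```python
grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP": [["V", "NP"]],
}
lexicon = {
    "DET": {"a", "the"},
    "ADJ": {"beautiful", "perching"},
    "N":   {"bird", "birds", "grain", "grains"},
    "V":   {"peck", "pecks", "pecking"},
}

def parse(symbols, words):
    """True if the symbol list can be rewritten to exactly the word list."""
    if not symbols:
        return not words                   # success iff all input consumed
    head, rest = symbols[0], symbols[1:]
    if head in lexicon:                    # terminal category: match a word
        return bool(words) and words[0] in lexicon[head] \
               and parse(rest, words[1:])
    return any(parse(expansion + rest, words)   # try each rewrite of head
               for expansion in grammar[head])

print(parse(["S"], "the bird pecks the grains".split()))   # True
print(parse(["S"], "the bird the grains".split()))         # False
print(parse(["S"], "the bird peck the grains".split()))    # True (!)
```

The last line shows the agreement problem noted above: the grammar happily accepts "The bird peck the grains".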
Introduction and Fundamentals of Artificial Neural Networks
The idea of ANNs is based on the belief that the working of the human brain, which makes the right connections, can be imitated using silicon and wires as living neurons and dendrites.
The human brain is composed of about 86 billion nerve cells called neurons. They are connected to thousands of other cells by axons. Stimuli from the external environment, or inputs from sensory organs, are accepted by dendrites. These inputs create electric impulses, which quickly travel through the neural network. A neuron can then either send the message on to other neurons to handle the issue, or not send it forward.
ANNs are composed of multiple nodes, which imitate biological neurons of human brain. The
neurons are connected by links and they interact with each other. The nodes can take input data
and perform simple operations on the data. The result of these operations is passed to other
neurons. The output at each node is called its activation or node value.
Each link is associated with weight. ANNs are capable of learning, which takes place by
altering weight values. The following illustration shows a simple ANN −
Biological Neuron
A nerve cell (neuron) is a special biological cell that processes information. According to one estimate, there is a huge number of neurons, approximately 10^11, with numerous interconnections, approximately 10^15.
The terminology of biological (BNN) and artificial (ANN) networks corresponds as follows:
Soma -- Node
Dendrites -- Input
Synapse -- Weights or interconnections
Axon -- Output
The following table shows the comparison between ANN and BNN based on some criteria
mentioned.
Learning They can Very precise, structured and formatted data is required to tolerate ambiguity
tolerate
ambiguity
Fault Performance It is capable of robust performance, hence has the potential to be fault tolerant
tolerance degrades
with even
partial
damage
The following diagram represents the general model of ANN followed by its processing.
For the above general model of artificial neural network, the net input can be calculated as
follows −
y_in = x1·w1 + x2·w2 + x3·w3 + … + xm·wm
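For concreteness, the net input is just a weighted sum, computed here in Python with made-up input and weight values:

```python
def net_input(inputs, weights):
    """y_in = x1*w1 + x2*w2 + ... + xm*wm"""
    return sum(x * w for x, w in zip(inputs, weights))

x = [0.5, 1.0, -1.0]      # input vector (made-up values)
w = [0.4, 0.6, 0.2]       # connection weights (made-up values)
print(net_input(x, w))    # 0.2 + 0.6 - 0.2 = 0.6 (up to float rounding)
```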
Network Topology
Adjustments of Weights or Learning
Activation Functions
In this chapter, we will discuss these three building blocks of ANN in detail.
Network Topology
A network topology is the arrangement of a network along with its nodes and connecting lines.
According to the topology, ANN can be classified as the following kinds −
1. Feedforward Network
Single-layer feedforward network − The concept is of a feedforward ANN having only one weighted layer. In other words, the input layer is fully connected to the output layer.
Multilayer feedforward network − The concept is of a feedforward ANN having more than one weighted layer; the layers between the input and the output layer are called hidden layers.
2. Feedback Network
As the name suggests, a feedback network has feedback paths, which means the signal can flow
in both directions using loops. This makes it a non-linear dynamic system, which changes
continuously until it reaches a state of equilibrium. It may be divided into the following types −
Recurrent networks − They are feedback networks with closed loops. Following are the
two types of recurrent networks.
Fully recurrent network − It is the simplest neural network architecture because all
nodes are connected to all other nodes and each node works as both input and output.
Jordan network − It is a closed loop network in which the output will go to the input
again as feedback as shown in the following diagram.
Adjustments of Weights or Learning
Learning, in an artificial neural network, is the method of modifying the weights of connections between the neurons of a specified network. Learning in ANNs can be classified into three categories, namely supervised learning, unsupervised learning, and reinforcement learning.
Supervised Learning
As the name suggests, this type of learning is done under the supervision of a teacher. This
learning process is dependent.
During the training of ANN under supervised learning, the input vector is presented to the
network, which will give an output vector. This output vector is compared with the desired
output vector. An error signal is generated, if there is a difference between the actual output and
the desired output vector. On the basis of this error signal, the weights are adjusted until the
actual output is matched with the desired output.
Unsupervised Learning
As the name suggests, this type of learning is done without the supervision of a teacher. This
learning process is independent.
During the training of ANN under unsupervised learning, the input vectors of similar type are
combined to form clusters. When a new input pattern is applied, then the neural network gives
an output response indicating the class to which the input pattern belongs.
There is no feedback from the environment as to what should be the desired output and if it is
correct or incorrect. Hence, in this type of learning, the network itself must discover the patterns
and features from the input data, and the relation for the input data over the output.
Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen the network based on critic information. This learning process is similar to supervised learning, but we may have much less information.
During the training of network under reinforcement learning, the network receives some
feedback from the environment. This makes it somewhat similar to supervised learning.
However, the feedback obtained here is evaluative not instructive, which means there is no
teacher as in supervised learning. After receiving the feedback, the network performs
adjustments of the weights to get better critic information in future.
Activation Functions
An activation function may be defined as the extra force or effort applied over the input to obtain an exact output. In ANNs, we apply activation functions over the net input to produce the output. The following are some activation functions of interest −
Linear Activation Function
It is also called the identity function, as it performs no input editing. It can be defined as:
F(x) = x
Sigmoid Activation Function
Binary sigmoidal function − This activation function performs input editing between 0 and 1. It can be defined as:
F(x) = sigm(x) = 1 / (1 + exp(−x))
Bipolar sigmoidal function − This activation function performs input editing between -1 and 1. It can be positive or negative in nature. It is always bounded, which means its output cannot be less than -1 or more than 1, and it is strictly increasing, like the sigmoid function. It can be defined as:
F(x) = sigm(x) = 2 / (1 + exp(−x)) − 1 = (1 − exp(−x)) / (1 + exp(−x))
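The three functions are easy to state as runnable Python definitions (a sketch; the function names are ours):

```python
import math

def linear(x):                 # identity: F(x) = x
    return x

def binary_sigmoid(x):         # output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def bipolar_sigmoid(x):        # output in (-1, 1)
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

for f in (linear, binary_sigmoid, bipolar_sigmoid):
    print(f.__name__, round(f(1.0), 4))
# linear 1.0, binary_sigmoid 0.7311, bipolar_sigmoid 0.4621
```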
Areas of Application
The following are some of the areas where ANNs are being used. This suggests that ANN development and application is interdisciplinary by nature.
Speech Recognition
Speech occupies a prominent role in human-human interaction. Therefore, it is natural for
people to expect speech interfaces with computers. In the present era, for communication with
machines, humans still need sophisticated languages which are difficult to learn and use. To
ease this communication barrier, a simple solution could be, communication in a spoken
language that is possible for the machine to understand.
Great progress has been made in this field; however, such systems still face the problems of limited vocabulary or grammar, along with the issue of retraining the system for different speakers in different conditions. ANNs are playing a major role in this area. The following ANNs have been used for speech recognition −
Multilayer networks
Multilayer networks with recurrent connections
Kohonen self-organizing feature map
The most useful network for this is the Kohonen self-organizing feature map, which takes short segments of the speech waveform as its input. It maps the same kind of phonemes to the output array, a feature-extraction technique. After extracting the features, with the help of some acoustic models as back-end processing, it recognizes the utterance.
Character Recognition
It is an interesting problem which falls under the general area of pattern recognition. Many neural networks have been developed for the automatic recognition of handwritten characters, either letters or digits. Among the ANNs which have been used for character recognition are multilayer networks such as backpropagation networks and the neocognitron.