CSC 208_2023_NOTE
ARTIFICIAL INTELLIGENCE
INTRODUCTION TO ARTIFICIAL INTELLIGENCE
Artificial Intelligence is concerned with the design of intelligence in an artificial device. The term
was coined by John McCarthy in 1956.
There are probably as many definitions of intelligence as there are experts who study it. Simply put, however, "intelligence is the ability to learn about, learn from, understand, and interact with one's environment." This general ability consists of a number of more specific abilities.
ARTIFICIAL INTELLIGENCE
AI is a broad field and means different things to different people. It is concerned with getting computers to do tasks that require human intelligence, i.e. tasks that involve complex and sophisticated reasoning processes and knowledge.
AI is the study of the mental faculties through the use of computational models; equivalently, it is the study of intellectual/mental processes as computational processes. An AI program will demonstrate a high level of intelligence to a degree that equals or exceeds the intelligence required of a human in performing some task. AI is unique, sharing borders with Mathematics, Computer Science, Philosophy, Psychology, Biology, Cognitive Science and many others. Although there is no clear definition of AI or even of intelligence, AI can be described as an attempt to build machines that, like humans, can think and act, and that are able to learn and use knowledge to solve problems on their own.
The definitions on the top, (a) and (b) are concerned with reasoning, whereas those on the
bottom, (c) and (d) address behavior. The definitions on the left, (a) and (c) measure success in
terms of human performance, and those on the right, (b) and (d) measure the ideal concept of
intelligence called rationality.
HISTORY OF ARTIFICIAL INTELLIGENCE
In 1957, the General Problem Solver (GPS) was demonstrated by Newell, Shaw & Simon.
In 1958, John McCarthy (MIT) invented the Lisp language.
In 1959, Arthur Samuel (IBM) wrote the first game-playing program, for checkers, to
achieve sufficient skill to challenge a world champion.
In 1963, Ivan Sutherland's MIT dissertation on Sketchpad introduced the idea of
interactive graphics into computing.
In 1966, Ross Quillian (PhD dissertation, Carnegie Inst. of Technology; now CMU) demonstrated semantic nets.
In 1967, the Dendral program (Edward Feigenbaum, Joshua Lederberg, Bruce Buchanan, Georgia Sutherland at Stanford) was demonstrated; it interpreted mass spectra of organic chemical compounds and was the first successful knowledge-based program for scientific reasoning.
In 1967, Doug Engelbart invented the mouse at SRI.
In 1969, Marvin Minsky & Seymour Papert published Perceptrons, demonstrating the limits of simple neural nets.
In 1972, the Prolog language was developed by Alain Colmerauer.
In the mid-1980s, neural networks became widely used with the backpropagation algorithm (first described by Werbos in 1974).
In the 1990s, there were major advances in all areas of AI, with significant demonstrations in machine learning, intelligent tutoring, case-based reasoning, multi-agent planning, scheduling, uncertain reasoning, data mining, natural language understanding and translation, vision, virtual reality, games, and other topics.
In 1997, Deep Blue beat the world chess champion, Garry Kasparov.
In 2002, iRobot, founded by researchers at the MIT Artificial Intelligence Lab, introduced Roomba, a vacuum cleaning robot. By 2006, two million had been sold.
SUB AREAS OF ARTIFICIAL INTELLIGENCE
1) Game Playing:
The Deep Blue chess program beat world champion Garry Kasparov.
2) Speech Recognition:
PEGASUS is a spoken language interface to American Airlines' EAASY SABRE reservation system, which allows users to obtain flight information and make reservations over the telephone. The 1990s saw significant advances in speech recognition, so that many systems are now successful.
3) Computer Vision:
Face recognition programs are in use by banks, governments, etc. The ALVINN system from CMU autonomously drove a van from Washington, D.C. to San Diego (all but 52 of 2,849 miles), averaging 63 mph day and night in all weather conditions. Other applications include handwriting recognition, electronics and manufacturing inspection, photo interpretation, baggage inspection, and reverse engineering to automatically construct 3D geometric models.
4) Expert Systems:
Application-specific systems that rely on obtaining the knowledge of human experts in an area and
programming that knowledge into a system. Examples are:
a. Diagnostic Systems: MYCIN system for diagnosing bacterial infections of the blood and
suggesting treatments. Intellipath pathology diagnosis system (AMA approved). Pathfinder
medical diagnosis system, which suggests tests and makes diagnoses. Whirlpool customer
assistance center.
b. System Configuration: DEC's XCON system for custom hardware configuration.
Radiotherapy treatment planning.
c. Financial Decision Making: Credit card companies, mortgage companies, banks, and the
U.S. government employ AI systems to detect fraud and expedite financial transactions.
For example, AMEX credit check.
d. Classification Systems: Put information into one of a fixed set of categories using several
sources of information. e.g., financial decision making systems. NASA developed a system
for classifying very faint areas in astronomical images into either stars or galaxies with
very high accuracy by learning from human experts' classifications.
5) Mathematical Theorem Proving:
Use inference methods to prove new theorems.
6) Natural Language Understanding:
AltaVista's translation of web pages. Translation of Caterpillar truck manuals into 20 languages.
7) Scheduling and Planning:
Automatic scheduling for manufacturing. DARPA's DART system was used in the Desert Storm and Desert Shield operations to plan the logistics of people and supplies. American Airlines rerouting contingency planner. European Space Agency planning and scheduling of spacecraft assembly, integration and verification.
8) Artificial Neural Networks
9) Machine Learning etc.
INTELLIGENT SYSTEMS
In order to design intelligent systems, it is important to categorize them into four categories (Luger and Stubblefield, 1993; Russell and Norvig, 2003).
Scientific Goal: To determine which ideas about knowledge representation, learning, rule systems, search, and so on, explain various sorts of real intelligence.
Engineering Goal: To solve real world problems using AI techniques such as Knowledge
representation, learning, rule systems, search, and so on.
Traditionally, computer scientists and engineers have been more interested in the engineering
goal, while psychologists, philosophers and cognitive scientists have been more interested in the
scientific goal.
1. Cognitive Science: Think Human-Like
a. Requires a model of human cognition. Sufficiently precise models allow simulation by computers.
b. The focus is not just on behavior and I/O but also on the reasoning process itself.
c. The goal is not just to produce human-like behavior but to produce a sequence of steps of the reasoning process, similar to the steps followed by a human in solving the same task.
2. Laws of Thought: Think Rationally
a. The goal is to formalize the reasoning process as a system of logical rules and procedures of inference.
b. Develop systems of representation that allow inferences like:
"Socrates is a man. All men are mortal. Therefore Socrates is mortal."
3. Turing Test: Act Human-Like
Turing defined the intelligent behavior of a computer as the ability to achieve human-level performance in cognitive tasks. In other words, a computer passes the test if interrogators cannot distinguish the machine from a human on the basis of the answers to their questions.
The imitation game proposed by Turing originally included two phases. In the first phase, shown in Figure 1, the interrogator, a man and a woman are each placed in separate rooms and can communicate only via a neutral medium such as a remote terminal.
objective is to work out who is the man and who is the woman by questioning them. The rules of
the game are that the man should attempt to deceive the interrogator that he is the woman, while
the woman has to convince the interrogator that she is the woman.
Physical simulation of a human is not important for intelligence. Hence, in the Turing test the
interrogator does not see, touch or hear the computer and is therefore not influenced by its
appearance or voice. However, the interrogator is allowed to ask any questions, even provocative
ones, in order to identify the machine. The interrogator may, for example, ask both the human and
the machine to perform complex mathematical calculations, expecting that the computer will
provide a correct solution and will do it faster than the human. Thus, the computer will need to
know when to make a mistake and when to delay its answer. The interrogator may also attempt to discover the emotional nature of the human and thus might ask both subjects to examine a short novel, poem, or even a painting. Obviously, the computer would be required here to simulate a human's emotional understanding of the work. The Turing test has two remarkable qualities that make it really universal:
By maintaining communication between the human and the machine via terminals, the test gives us an objective standard view of intelligence. It avoids debates over the human nature of intelligence and eliminates any bias in favour of humans.
The test itself is quite independent of the details of the experiment. It can be conducted either as a two-phase game as just described, or even as a single-phase game in which the interrogator needs to choose between the human and the machine from the beginning of the test. The interrogator is also free to ask any question in any field and can concentrate solely on the content of the answers provided.
AI Problems:
AI problems (speech recognition, NLP, vision, automatic programming, knowledge representation, etc.) can be paired with techniques (NN, search, Bayesian nets, production systems, etc.). AI problems can be classified into two types:
1. Common-place tasks (Mundane Tasks)
2. Expert tasks
Common-Place Tasks:
1. Recognizing people, objects.
2. Communicating (through natural language).
3. Navigating around obstacles on the streets.
These tasks are done matter-of-factly and routinely by people and some other animals.
Expert tasks:
1. Medical diagnosis.
2. Mathematical problem solving
3. Playing games like chess
These tasks cannot be done by all people and can only be performed by skilled specialists. Clearly, tasks of the first type are easy for humans to perform, and almost all humans are able to master them. The second range of tasks requires skill development and/or intelligence, and only some specialists can perform them well. However, when we look at what computer systems have been able to achieve to date, we see that their achievements include performing sophisticated tasks like medical diagnosis, symbolic integration, proving theorems and playing chess.
INTELLIGENT AGENTS
Agents and Environments
An Agent is anything that can be viewed as perceiving its environment through sensors and acting
upon that environment through actuators. A human agent has eyes, ears, and other organs for
sensors and hands, legs, mouth, and other body parts for actuators. A robotic agent might have
cameras and infrared range finders for sensors and various motors for actuators. A software agent
receives keystrokes, file contents, and network packets as sensory inputs and acts on the
environment by displaying on the screen, writing files, and sending network packets.
The agent function maps from percept histories to actions:
f: P* → A
• The agent program runs on the physical architecture to produce f.
• Agent = architecture + program.
Percept:
We use the term percept to refer to the agent's perceptual inputs at any given instant.
Percept Sequence:
An agent's percept sequence is the complete history of everything the agent has ever perceived.
Agent function:
Mathematically speaking, we say that an agent's behavior is described by the agent function that
maps any given percept sequence to an action.
Agent program:
Internally, the agent function for an artificial agent will be implemented by an agent program. It is
important to keep these two ideas distinct. The agent function is an abstract mathematical
description while the agent program is a concrete implementation, running on the agent
architecture.
To illustrate these ideas, we will use a very simple example: the vacuum-cleaner world shown in Figure 4 below.
This particular world has just two locations: squares A and B. The vacuum agent perceives which
square it is in and whether there is dirt in the square. It can choose to move left, move right, suck
up the dirt, or do nothing. One very simple agent function is the following: if the current square is
dirty, then suck, otherwise move to the other square. A partial tabulation of this agent function is
shown in Figure 5.
Figure 5: Partial tabulation of a simple agent function for the example: vacuum-cleaner
world shown in the Figure 4
Figure 6: The REFLEX-VACUUM-AGENT program is invoked for each new percept (location, status) and returns an action each time.
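The agent program in Figure 6 can be written in a few lines. Here is a minimal Python sketch (the percept format (location, status) and the action names follow Figures 5 and 6; the function name itself is illustrative):

def reflex_vacuum_agent(percept):
    # The percept is a (location, status) pair, as in Figure 6.
    location, status = percept
    if status == 'Dirty':
        return 'Suck'          # if the current square is dirty, suck
    elif location == 'A':
        return 'Right'         # otherwise move to the other square
    else:
        return 'Left'

print(reflex_vacuum_agent(('A', 'Dirty')))   # -> Suck

This implements exactly the rule stated above: if the current square is dirty then suck; otherwise move to the other square.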
AGENT ENVIRONMENT IN AI
An environment is everything in the world which surrounds the agent, but it is not a part of an
agent itself. An environment can be described as a situation in which an agent is present.
The environment is where agent lives, operate and provide the agent with something to sense and
act upon it. An environment is mostly said to be non-feministic.
Some programs operate in an entirely artificial environment confined to keyboard input, databases, computer file systems and character output on a screen.
In contrast, some software agents (software robots or softbots) exist in rich, unlimited softbot domains. The simulator has a very detailed, complex environment, and the software agent needs to choose from a long array of actions in real time. For example, a softbot designed to scan the online preferences of a customer and show interesting items to the customer works in the real as well as an artificial environment.
The most famous artificial environment is the Turing test environment, in which a real and an artificial agent are tested on equal ground. This is a very challenging environment, as it is highly difficult for a software agent to perform as well as a human.
Features of Environment
An environment can have various features from the point of view of an agent: (Russell & Norvig,
2020):
Fully Observable vs Partially Observable
Static vs Dynamic
Discrete vs Continuous
Deterministic vs Stochastic
Single-agent vs Multi-agent
Episodic vs Sequential
Known vs Unknown
Accessible vs Inaccessible
1. Fully Observable vs Partially Observable:
If an agent's sensors give it access to the complete state of the environment at each point in time, the environment is fully observable. In other words, the environment is fully observable if everything an agent requires to choose its actions is available to it via its sensors. The environment is partially observable if parts of the environment are inaccessible, which implies that an agent must make informed guesses about the world.
A fully observable environment is easy, as there is no need to maintain an internal state to keep track of the history of the world. If an agent has no sensors at all, the environment is called unobservable. E.g., a crossword puzzle is fully observable, while poker is partially observable.
2. Deterministic vs Stochastic:
If an agent's current state and selected action can completely determine the next state of the environment, then such an environment is called a deterministic environment. If the next state cannot be completely determined from the current state and action (there is an element of randomness), the environment is stochastic.
3. Episodic vs Sequential:
In an episodic environment, there is a series of one-shot actions, and only the current percept is required for each action. However, in a sequential environment, an agent requires memory of past actions to determine the next best action, i.e. the choice of the current action depends on previous actions. Episodic environments are much simpler because the agent does not need to think ahead. E.g., crossword puzzles and poker are sequential, while image analysis is episodic.
4. Single-agent vs Multi-agent
If only one agent is involved in an environment and operates by itself, then such an environment is called a single-agent environment.
However, if multiple agents are operating in an environment, then such an environment is called a multi-agent environment. The agent design problems in a multi-agent environment are different from those in a single-agent environment.
5. Static vs Dynamic:
If the environment can change itself while an agent is deliberating, then such an environment is called a dynamic environment; otherwise it is called a static environment.
Static environments are easy to deal with because an agent does not need to keep looking at the world while deciding on an action. However, in a dynamic environment, agents need to keep looking at the world before each action; alternatively, they need to anticipate the change during deliberation or make decisions very fast.
Taxi driving is an example of a dynamic environment, whereas crossword puzzles are an example of a static environment.
6. Discrete vs Continuous:
If there is a finite number of percepts and actions that can be performed in an environment, then such an environment is called a discrete environment; otherwise it is called a continuous environment.
A chess game is a discrete environment, as there is a finite number of moves that can be performed. A self-driving car is an example of a continuous environment.
7. Known vs Unknown:
Known and unknown are not actually features of an environment; rather, they describe the agent's state of knowledge.
In a known environment, the results of all actions are known to the agent, while in an unknown environment the agent needs to learn how the environment works in order to perform actions.
8. Accessible vs Inaccessible:
If an agent can obtain complete and accurate information about the environment's state, then such an environment is called an accessible environment; otherwise it is called inaccessible.
An empty room whose state can be defined by its temperature is an example of an accessible environment. Information about an event on Earth is an example of an inaccessible environment.
Rationality
Rationality is the state of being reasonable, sensible, and having good sense of judgment.
Rationality is concerned with expected actions and results depending upon what the agent has
perceived. Performing actions with the aim of obtaining useful information is an important part of
rationality.
Rationality of an agent depends on the following:
1. The performance measure that defines success
2. Its built-in knowledge base (prior knowledge of the environment)
3. The percept sequence observed to date
4. The actions the agent is able to perform
A rational agent always performs the right action, where the right action means the action that causes the agent to be most successful for the given percept sequence. The problem the agent solves is characterized by Performance Measure, Environment, Actuators, and Sensors (PEAS).
Types of Agent
1. Simple Reflex Agents
Figure: Simple Reflex Agents
Simple reflex agents choose actions on the basis of the current percept only, ignoring the rest of the percept history.
2. Model-Based Reflex Agents
They use a model of the world to choose their actions and maintain an internal state.
Model − knowledge about "how things happen in the world".
Internal State − a representation of the unobserved aspects of the current state, depending on percept history.
3. Goal Based Agents
They choose their actions in order to achieve goals. The goal-based approach is more flexible than the reflex agent, since the knowledge supporting a decision is explicitly modeled, thereby allowing for modifications.
Goal − a description of desirable situations.
There may be conflicting goals, out of which only a few can be achieved. Goals also have some uncertainty of being achieved, so the agent needs to weigh the likelihood of success against the importance of each goal.
PEAS: Performance measure, Environment, Actuator, Sensor
The PEAS system is used to categorize similar agents together. It describes the performance measure of the respective agent with respect to its environment, actuators, and sensors. Most of the highest-performing agents are rational agents.
Rational Agent: A rational agent considers all possibilities and chooses the most efficient action. For example, it chooses the shortest path with the lowest cost for high efficiency.
PEAS stands for Performance measure, Environment, Actuator, Sensor.
Performance Measure: The performance measure is the unit used to define the success of an agent. Performance varies across agents based on their different percepts.
Environment: Environment is the surrounding of an agent at every instant. It keeps changing
with time if the agent is set in motion. There are 5 major types of environments:
Fully Observable & Partially Observable
Episodic & Sequential
Static & Dynamic
Discrete & Continuous
Deterministic & Stochastic
Actuator: Actuator is a part of the agent that delivers the output of an action to the environment.
Sensor: Sensors are the receptive parts of an agent which take in input for the agent.
Example: PEAS description of a subject tutoring agent:
Agent: Subject Tutoring
Performance Measure: Maximize scores, improvement in students
Environment: Classroom, desk, chair, board, staff, students
Actuators: Smart displays, corrections, exercises
Sensors: Eyes, ears, notebooks
SEARCH ALGORITHM TERMINOLOGIES
a. Search Space: The search space represents the set of possible solutions that a system may have.
b. Start State: The state from which the agent begins the search.
c. Goal Test: A function which observes the current state and returns whether the goal state is achieved or not.
d. Search Tree: A tree representation of the search problem. The root of the search tree is the root node, which corresponds to the initial state.
e. Actions: A description of all the actions available to the agent.
f. Transition Model: A description of what each action does, represented as a transition model.
g. Path Cost: A function which assigns a numeric cost to each path.
h. Solution: An action sequence which leads from the start node to the goal node.
i. Optimal Solution: A solution with the lowest path cost among all solutions.
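These terms can be made concrete in code. The following is a simplified sketch (the class and field names are ours, not from the notes) that bundles the start state, goal test, and transition model with path costs into one structure; the search sketches in the following sections use the same adjacency-dict idea:

class SearchProblem:
    # States are node names; the transition model is an adjacency dict
    # mapping each state to a list of (successor, step_cost) pairs.
    def __init__(self, start, goals, graph):
        self.start = start          # start state
        self.goals = goals          # set of goal states
        self.graph = graph          # transition model with step costs

    def goal_test(self, state):    # returns whether the goal is achieved
        return state in self.goals

    def successors(self, state):   # available actions and their results
        return self.graph.get(state, [])

problem = SearchProblem('A', {'G'}, {'A': [('B', 1), ('C', 2)], 'C': [('G', 5)]})
print(problem.goal_test('G'), problem.successors('A'))   # True [('B', 1), ('C', 2)]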
Properties of Search Algorithms:
Following are the four essential properties of search algorithms, used to compare their efficiency:
Completeness: A search algorithm is said to be complete if it is guaranteed to return a solution whenever at least one solution exists.
Optimality: If the solution found by an algorithm is guaranteed to be the best solution (lowest path cost) among all solutions, then it is said to be an optimal solution.
Time Complexity: Time complexity is a measure of the time an algorithm takes to complete its task.
Space Complexity: The maximum storage space required at any point during the search, expressed in terms of the complexity of the problem.
UNINFORMED (BLIND) SEARCH
Uninformed search does not use any domain knowledge, such as closeness or the location of the goal. It only includes information about how to traverse the tree and how to identify leaf and goal nodes. Uninformed search examines the search tree without any information about the search space, such as the initial state, operators, and tests for the goal, so it is also called blind search. It examines each node of the tree until it reaches the goal node.
1. Breadth-First Search (BFS)
o Breadth-first search is the most common search strategy for traversing a tree or graph. This algorithm searches breadthwise in a tree or graph, so it is called breadth-first search.
o The BFS algorithm starts searching from the root node of the tree and expands all successor nodes at the current level before moving to nodes of the next level.
o The breadth-first search algorithm is an example of a general-graph search algorithm.
o Breadth-first search is implemented using a FIFO queue data structure.
Advantages:
o BFS will provide a solution if any solution exists.
o If there is more than one solution for a given problem, then BFS will provide the minimal solution, i.e. the one requiring the least number of steps.
Disadvantages:
o It requires lots of memory since each level of the tree must be saved into memory to
expand the next level.
o BFS needs lots of time if the solution is far away from the root node.
BFS illustrated:
Step 1: Initially fringe contains only one node corresponding to the source state A.
Step 2: A is removed from fringe. The node is expanded, and its children B and C are generated.
They are placed at the back of fringe.
Figure: Fringe BC
Step 3: Node B is removed from fringe and is expanded. Its children D, E are generated and put
at the back of fringe.
Step 4: Node C is removed from fringe and is expanded. Its children D and G are added to the
back of fringe.
Step 5: Node D is removed from fringe. Its children C and F are generated and added to the back
of fringe.
Figure: Fringe GCFBF
Step 8: G is selected for expansion. It is found to be a goal node. So the algorithm returns the
path A C G by following the parent pointers of the node corresponding to G. The algorithm
terminates.
The example above shows the BFS algorithm traversing the tree from the root node A to the goal node G. The BFS algorithm traverses level by level (in layers).
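The trace above can be reproduced with a minimal BFS sketch in Python. The graph literal mirrors the example's A-to-G tree only loosely and is illustrative:

from collections import deque

def bfs(graph, start, goal):
    fringe = deque([start])            # FIFO queue
    parent = {start: None}             # parent pointers recover the path
    while fringe:
        node = fringe.popleft()        # remove the node at the front
        if node == goal:
            path = []
            while node is not None:    # follow parent pointers back
                path.append(node)
                node = parent[node]
            return list(reversed(path))
        for child in graph.get(node, []):
            if child not in parent:    # generate each node only once
                parent[child] = node
                fringe.append(child)   # place children at the back
    return None

graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['D', 'G'], 'D': ['C', 'F']}
print(bfs(graph, 'A', 'G'))            # -> ['A', 'C', 'G'], as in Step 8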
2. Depth-First Search (DFS)
Depth-first search starts from the root node and follows each path to its greatest depth before backtracking; it is implemented using a stack (or recursion).
Advantage:
o DFS requires very little memory, as it only needs to store a stack of the nodes on the path from the root node to the current node.
o It takes less time to reach the goal node than the BFS algorithm (if it traverses along the right path).
Disadvantage:
o There is the possibility that many states keep re-occurring, and there is no guarantee of finding the solution.
o The DFS algorithm goes deep down into the search and may sometimes enter an infinite loop.
DFS illustrated:
Step 1: Initially fringe contains only the node for A.
Step 2: A is removed from fringe. A is expanded and its children B and C are put in front of
fringe.
Figure: Fringe BC
Step 3: Node B is removed from fringe, and its children D and E are pushed in front of fringe.
Figure: Fringe DEC
Step 4: Node D is removed from fringe. C and F are pushed in front of fringe.
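The DFS behaviour above can be sketched recursively; the recursion stack plays the role of the fringe, and the visited set guards against the re-occurring states mentioned in the disadvantages (the graph literal is illustrative):

def dfs(graph, node, goal, visited=None):
    if visited is None:
        visited = set()
    visited.add(node)
    if node == goal:                  # goal found on this path
        return [node]
    for child in graph.get(node, []):
        if child not in visited:      # skip states already seen
            path = dfs(graph, child, goal, visited)
            if path:
                return [node] + path
    return None                       # dead end: backtrack

graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['G'], 'D': ['F']}
print(dfs(graph, 'A', 'G'))           # -> ['A', 'C', 'G']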
3. Iterative Deepening Depth-First Search (IDDFS), also called Iterative Deepening Search (IDS)
The iterative deepening algorithm is a combination of the DFS and BFS algorithms. In other words, IDS is an iterative graph searching strategy that takes advantage of the completeness of breadth-first search while using much less memory in each iteration (similar to depth-first search). It searches each branch of a node from left to right until it reaches the required depth. Once it has, IDS goes back to the root node and explores a different branch, similarly to DFS.
This search algorithm finds out the best depth limit and does it by gradually increasing the limit
until a goal is found. This algorithm performs depth-first search up to a certain "depth limit", and
it keeps increasing the depth limit after each iteration until the goal node is found.
Advantages:
o It combines the benefits of BFS and DFS search algorithm in terms of fast search and
memory efficiency.
Disadvantages:
o The main drawback of IDDFS is that it repeats all the work of the previous phase.
o The time taken to reach the goal node is exponential.
IDDFS illustrated:
As a general rule of thumb, we use iterative deepening when we do not know the depth of our
solution and have to search a very large state space.
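A minimal sketch of iterative deepening: a depth-limited DFS called repeatedly with a growing limit. The graph literal and the max_depth cap are illustrative:

def depth_limited_search(graph, node, goal, limit):
    if node == goal:
        return [node]
    if limit == 0:                    # depth limit reached: give up here
        return None
    for child in graph.get(node, []):
        path = depth_limited_search(graph, child, goal, limit - 1)
        if path:
            return [node] + path
    return None

def iterative_deepening_search(graph, start, goal, max_depth=50):
    # Gradually increase the depth limit until the goal is found.
    for limit in range(max_depth + 1):
        path = depth_limited_search(graph, start, goal, limit)
        if path:
            return path
    return None

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['G']}
print(iterative_deepening_search(graph, 'A', 'G'))   # -> ['A', 'C', 'G']

Note how each iteration repeats the work of the previous one, which is the drawback listed above; in exchange, memory use stays as low as DFS.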
INFORMED SEARCH
Informed search algorithms use domain knowledge (a heuristic function) to guide the search. A typical informed search keeps two lists of nodes: in the CLOSED list it places those nodes which have already been expanded, and in the OPEN list it places nodes which have not yet been expanded.
On each iteration, the node n with the lowest heuristic value is expanded, all its successors are generated, and n is placed on the closed list. The algorithm continues until a goal state is found.
Two main algorithms of informed search are given below:
1. Greedy Best-First Search
The greedy best-first algorithm is implemented using a priority queue in order to store the costs of nodes.
Best first search algorithm:
o Step 1: Place the starting node into the OPEN list.
o Step 2: If the OPEN list is empty, stop and return failure.
o Step 3: Remove from the OPEN list the node n which has the lowest value of h(n), and place it in the CLOSED list.
o Step 4: Expand the node n and generate its successors.
o Step 5: Check each successor of node n to see whether any of them is a goal node. If any successor is a goal node, return success and terminate the search; else proceed to Step 6.
o Step 6: For each successor node, the algorithm computes the evaluation function f(n) and then checks whether the node is already in the OPEN or CLOSED list. If the node is in neither list, add it to the OPEN list.
o Step 7: Return to Step 2.
Advantages:
o Best first search can switch between BFS and DFS by gaining the advantages of both the
algorithms.
o This algorithm is more efficient than BFS and DFS algorithms.
Disadvantages:
o This algorithm is not optimal.
o It is incomplete because it can start down an infinite path and never return to try other
possibilities.
o The worst-case time complexity for greedy search is O(b^m), where m is the maximum depth of the search space and b is the branching factor.
o Because greedy search retains all nodes in memory, its space complexity is the same as its time complexity.
Example:
Consider the search problem below; we will traverse it using greedy best-first search. At each iteration, the node with the lowest evaluation function value f(n) = h(n) is expanded.
Start from source "S" and search for goal "I" using the given costs and best-first search.
Remove H from Priority_Queue. Since our goal "I" is a neighbor of H, we return.
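The search just traced can be sketched with a priority queue ordered by h(n) alone. The figure with the S-to-I example is not reproduced in these notes, so the graph and heuristic values below are illustrative stand-ins:

import heapq

def greedy_best_first(graph, h, start, goal):
    open_list = [(h[start], start, [start])]    # OPEN: priority queue on h(n)
    closed = set()                              # CLOSED: already expanded
    while open_list:
        _, node, path = heapq.heappop(open_list)    # lowest h(n) first
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for child, _cost in graph.get(node, []):    # greedy: step costs ignored
            if child not in closed:
                heapq.heappush(open_list, (h[child], child, path + [child]))
    return None

graph = {'S': [('A', 1), ('B', 4)], 'A': [('I', 12)], 'B': [('I', 2)]}
h = {'S': 5, 'A': 6, 'B': 2, 'I': 0}
print(greedy_best_first(graph, h, 'S', 'I'))    # -> ['S', 'B', 'I']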
2. A* Search Algorithm
The Best First algorithm is a simplified form of the A* algorithm.
The A* search algorithm is a tree search algorithm that finds a path from a given initial node to a given goal node (or one passing a given goal test). It employs a "heuristic estimate" which ranks each node by an estimate of the best route that goes through that node, and it visits the nodes in order of this heuristic estimate. In other words, the A* algorithm searches for the shortest path between the initial and the final state.
In maps, the A* algorithm is used to calculate the shortest distance between the source (initial state) and the destination (final state). A* is one of the best and most popular techniques used in path-finding and graph traversals.
A* is similar to greedy best-first search but more accurate, because A* takes into account the cost already incurred on the nodes that have been traversed.
A* search properties:
o The algorithm A* is admissible. This means that, provided a solution exists, the first solution found by A* is an optimal solution. A* is admissible under the following condition:
o Heuristic function: for every node n, h(n) ≤ h*(n), where h*(n) is the true cost of reaching the goal from n.
o A* is also complete.
o A* is optimally efficient for a given heuristic.
o A* is much more efficient than uninformed search.
g: the cost of moving from the initial cell to the current cell; that is, the sum of the step costs of all cells visited since leaving the first cell. In other words, it is the movement cost to move from the starting point to a given square of the grid, following the path generated to get there.
h: also called the heuristic value, this is the estimated cost of moving from the current cell to the final cell. The actual cost cannot be calculated until the final cell is reached; hence, h is an estimate. We must make sure that the cost is never overestimated.
f: it is the sum of g and h. So, f = g + h.
Advantages:
o The A* search algorithm performs better than other search algorithms.
o A* search algorithm is optimal and complete.
o This algorithm can solve very complex problems.
Disadvantages:
o It does not always produce the shortest path, as it is mostly based on heuristics and approximation.
o A* search algorithm has some complexity issues.
o The main drawback of A* is memory requirement as it keeps all generated nodes in the
memory, so it is not practical for various large-scale problems.
The way the algorithm makes its decisions is by taking the f-value into account. The algorithm selects the cell with the smallest f-value and moves to that cell. This process continues until the algorithm reaches its goal cell. (In the grid example referred to here, notice how node B is never visited.)
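A minimal A* sketch: it has the same shape as the greedy search above but orders the queue by f(n) = g(n) + h(n). The graph and the (admissible) heuristic values are illustrative:

import heapq

def a_star(graph, h, start, goal):
    open_list = [(h[start], 0, start, [start])]   # entries: (f, g, node, path)
    best_g = {start: 0}                           # cheapest g found per node
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g                        # path and its total cost
        for child, step_cost in graph.get(node, []):
            g2 = g + step_cost                    # g: cost from start to child
            if g2 < best_g.get(child, float('inf')):
                best_g[child] = g2
                heapq.heappush(open_list,
                               (g2 + h[child], g2, child, path + [child]))
    return None

graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 2), ('G', 12)], 'B': [('G', 3)]}
h = {'S': 6, 'A': 5, 'B': 2, 'G': 0}              # h(n) <= h*(n) everywhere
print(a_star(graph, h, 'S', 'G'))                 # -> (['S', 'A', 'B', 'G'], 6)

Because h never overestimates, the first solution returned is optimal, exactly as the admissibility property above states.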
Formulation and Solution of Constraint Satisfaction Problems
Formally speaking, a constraint satisfaction problem (or CSP) is defined by a set of variables, X1, X2, ..., Xn, and a set of constraints, C1, C2, ..., Cm. Each variable Xi has a non-empty domain Di of possible values. Each constraint Ci involves some subset of the variables and specifies the allowable combinations of values for that subset. A state of the problem is defined by an assignment of values to some or all of the variables, {Xi = vi, Xj = vj, ...}. An assignment that does not violate any constraints is called a consistent or legal assignment. A complete assignment is one in which every variable is mentioned, and a solution to a CSP is a complete assignment that satisfies all the constraints. Some CSPs also require a solution that maximizes an objective function.
CSP can be given an incremental formulation as a standard search problem as follows:
1. Initial state: the empty assignment {}, in which all variables are unassigned.
2. Successor function: a value can be assigned to any unassigned variable, provided that it does
not conflict with previously assigned variables.
3. Goal test: the current assignment is complete.
4. Path cost: a constant cost for every step.
Example:
The map coloring problem: we are given the task of coloring each region of the map red, green, or blue in such a way that no two neighboring regions have the same color.
To formulate this as a CSP, we define the variables to be the regions: WA, NT, Q, NSW, V, SA, and T. The domain of each variable is the set {red, green, blue}.
The constraints require neighboring regions to have distinct colors: for example, the allowable
combinations for WA and NT are the pairs
{(red,green),(red,blue),(green,red),(green,blue),(blue,red),(blue,green)}.
(The constraint can also be represented as the inequality WA ≠ NT).
There are many possible solutions, such as {WA = red, NT = green, Q = red, NSW = green, V =
red, SA = blue, T = red}.
Figure: Map of Australia showing each of its states and territories
Constraint Graph: A CSP is usually represented as an undirected graph, called a constraint graph, where the nodes are the variables and the edges are the binary constraints.
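The incremental formulation above translates directly into a small backtracking sketch for the map-coloring example (assign one variable at a time and never violate a constraint):

def backtracking_search(variables, domains, neighbors, assignment=None):
    if assignment is None:
        assignment = {}                       # initial state: empty assignment
    if len(assignment) == len(variables):     # goal test: assignment complete
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # Constraint: neighboring regions must not share a color.
        if all(assignment.get(n) != value for n in neighbors[var]):
            assignment[var] = value
            result = backtracking_search(variables, domains, neighbors, assignment)
            if result:
                return result
            del assignment[var]               # undo and try the next value
    return None

variables = ['WA', 'NT', 'SA', 'Q', 'NSW', 'V', 'T']
domains = {v: ['red', 'green', 'blue'] for v in variables}
neighbors = {'WA': ['NT', 'SA'], 'NT': ['WA', 'SA', 'Q'],
             'SA': ['WA', 'NT', 'Q', 'NSW', 'V'], 'Q': ['NT', 'SA', 'NSW'],
             'NSW': ['Q', 'SA', 'V'], 'V': ['SA', 'NSW'], 'T': []}
print(backtracking_search(variables, domains, neighbors))
# -> {'WA': 'red', 'NT': 'green', 'SA': 'blue', 'Q': 'red',
#     'NSW': 'green', 'V': 'red', 'T': 'red'}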
Adversarial Search
Adversarial search, or game-tree search, is a technique for analyzing an adversarial game in order
to try to determine who can win the game and what moves the players should make in order to
win. Adversarial search is one of the oldest topics in Artificial Intelligence. The original ideas for
adversarial search were developed by Shannon in 1950 and independently by Turing in 1951, in
the context of the game of chess—and their ideas still form the basis for the techniques used today.
Adversarial search is a type of search in which we examine problems that arise when we try to plan ahead in a world where other agents are planning against us.
In previous topics, we studied search strategies which involve only a single agent that aims to find a solution, often expressed in the form of a sequence of actions. But there might be situations where more than one agent is searching for a solution in the same search space; this situation usually occurs in game playing.
An environment with more than one agent is termed a multi-agent environment, in which each agent is an opponent of the others and plays against them. Each agent needs to consider the actions of the other agents and the effect of those actions on its own performance.
So, searches in which two or more players with conflicting goals are trying to explore the same search space for a solution are called adversarial searches, often known as games. Games are modelled as a search problem together with a heuristic evaluation function; these are the two main factors which help to model and solve games in AI.
Perfect information: A game with perfect information is one in which agents can see the complete board. Agents have all the information about the game, and they can also see each other's moves. Examples are Chess, Checkers, Go, etc.
Imperfect information: If in a game agents do not have all the information about the game and are not aware of what is going on, such a game is called a game with imperfect information. Examples are Battleship, blind tic-tac-toe, Bridge, etc.
Deterministic games: Deterministic games are those games which follow a strict pattern and set of
rules for the games, and there is no randomness associated with them. Examples are chess,
Checkers, Go, tic-tac-toe, etc.
Non-deterministic games: Non-deterministic games are those which have various unpredictable events and a factor of chance or luck, introduced by dice or cards. These games are random, and the outcome of each action is not fixed. Such games are also called stochastic games. Examples: Backgammon, Monopoly, Poker, etc.
Zero-Sum Game
Zero-sum games are adversarial searches which involve pure competition. In a zero-sum game, each agent's gain or loss of utility is exactly balanced by the losses or gains of utility of the other agents. One player tries to maximize one single value, while the other player tries to minimize it. Each move by one player in the game is called a ply. Chess and tic-tac-toe are examples of zero-sum games.
In playing such a game, each player needs to consider:
What to do;
How to decide on the move;
The need to think about his opponent as well;
The opponent also thinks about what to do.
Each of the players tries to find out the response of his opponent to their actions. This requires embedded thinking or backward reasoning to solve game problems in AI.
Minimax Search
o Both players of the game are opponents of each other, where MAX will select the maximized value and MIN will select the minimized value.
o The minimax algorithm performs a depth-first search to explore the complete game tree.
o The minimax algorithm proceeds all the way down to the terminal nodes of the tree, then backs values up the tree as the recursion unwinds.
MiniMax Algorithm:
1. Generate the whole game tree.
2. Apply the utility function to leaf nodes to get their values.
3. Use the utility of nodes at level n to derive the utility of nodes at level n-1.
4. Continue backing up values towards the root (one layer at a time).
5. Eventually the backed up values reach the top of the tree, at which point Max chooses the move
that yields the highest value. This is called the minimax decision because it maximises the utility
for Max on the assumption that Min will play perfectly to minimise it.
o At the terminal nodes, the terminal values are given, so we compare those values and back them up the tree until the initial state is reached. Following are the main steps involved in solving the two-player game tree:
Step 1: In the first step, the algorithm generates the entire game tree and applies the utility function to get the utility values for the terminal states. In the tree diagram below, let's take A as the initial state of the tree. Suppose the maximizer takes the first turn, which has a worst-case initial value of -∞, and the minimizer takes the next turn, which has a worst-case initial value of +∞.
Step 2: Now, we first find the utility values for the maximizer. Its initial value is -∞, so we compare each terminal value with this initial value and take the higher of the two, i.e. the maximum among all child nodes:
o For node D: max(-1, -∞) => max(-1, 4) = 4
o For node E: max(2, -∞) => max(2, 6) = 6
o For node F: max(-3, -∞) => max(-3, -5) = -3
o For node G: max(0, -∞) => max(0, 7) = 7
Step 3: In the next step, it is the minimizer's turn, so it will compare each node's value with +∞ and determine the values of the 3rd-layer nodes:
o For node B: min(4, 6) = 4
o For node C: min(-3, 7) = -3
Step 4: Now it is the maximizer's turn again, and it will choose the maximum of all node values to find the value of the root node:
For node A: max(4, -3) = 4
In this game tree there are only 4 layers, so we reach the root node quickly, but in real games there will most likely be many more layers.
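The worked example above can be reproduced with a short recursive sketch. The tree and utility dictionaries encode the example's game tree; the leaf names are illustrative:

def minimax(node, maximizing, tree, utility):
    if node in utility:                    # terminal node: return its value
        return utility[node]
    values = [minimax(child, not maximizing, tree, utility)
              for child in tree[node]]
    return max(values) if maximizing else min(values)

tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G'],
        'D': ['d1', 'd2'], 'E': ['e1', 'e2'],
        'F': ['f1', 'f2'], 'G': ['g1', 'g2']}
utility = {'d1': -1, 'd2': 4, 'e1': 2, 'e2': 6,
           'f1': -3, 'f2': -5, 'g1': 0, 'g2': 7}
print(minimax('A', True, tree, utility))   # -> 4, as computed in Step 4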
Alpha-Beta Pruning
Pruning: eliminating a branch of the search tree from consideration without exhaustive
examination of each node.
• α-β Pruning: the basic idea is to prune portions of the search tree that cannot improve the utility
value of the max or min node, by just considering the values of nodes seen so far. Alpha-beta
pruning is used on top of minimax search to detect paths that do not need to be explored. In other
words, alpha-beta pruning is a modified version of the minimax algorithm. It is an optimization
technique for the minimax algorithm.
Terminologies:
• The MAX player is always trying to maximize the score. Call this α.
• The MIN player is always trying to minimize the score. Call this β.
• Alpha cut-off: Given a Max node n, cut off the search below n (i.e., don't generate or examine any more of n's children) if alpha(n) >= beta(n).
• Beta cut-off: Given a Min node n, cut off the search below n (i.e., don't generate or examine any more of n's children) if beta(n) <= alpha(n).
Example 1:
1. Setup phase:
Assign to each left-most (or right-most) internal node of the tree, variables: alpha = -infinity (-∞),
beta = +infinity (+∞).
2. Look at first computed final configuration value. It’s a 3. Parent is a min node, so set the beta
(min) value to 3.
3. Look at next value, 5. Since parent is a min node, we want the minimum of 3 and 5 which is 3.
Parent min node is done – fill alpha (max) value of its parent max node. Always set alpha for max
nodes and beta for min nodes. Copy the state of the max parent node into the second unevaluated
min child.
4. Look at the next value, 2. Since the parent node is a min node with beta = +∞ and 2 is smaller, change beta to 2.
5. Now the min parent node has a max (alpha) value of 3 and a min (beta) value of 2. The value of the 2nd child does not matter: if it is >2, 2 will be selected for the min node; if it is <2, it will be selected for the min node, but since it would then be <3 it would not be selected by the parent max node. Thus, we prune the right sub-tree of the min node and propagate the max value up the tree.
6. The max node is now done, and we can set the beta value of its parent and propagate the node state to the sibling subtree's left-most path.
7. The next node is 10. Since 10 is not smaller than 3, the state of the parent does not change. We still have to look at the 2nd child, since alpha is still -∞.
8. The next node is 4. The smallest value goes to the parent min node. The min subtree is done, so the parent max node gets the alpha (max) value from the child. Note that if the max node had a 2nd subtree, we could prune it, since alpha > beta.
9. Continue propagating value up the tree, modifying the corresponding alpha/beta values. Also
propagate the state of root node down the left-most path of the right subtree.
10. Next value is a 2. We set the beta (min) value of the min parent to 2. Since no other children
exist, we propagate the value up the tree.
11. We now have a value for the 3rd-level max node, so we can modify the beta (min) value of the min parent to 2. Now we have a situation where alpha > beta, and thus the value of the rightmost subtree of the min node does not matter, so we prune the whole subtree.
12. Finally, no more nodes remain, and we propagate the values up the tree. The root has a value of 3, which comes from the left-most child. Thus, the player should choose the left-most child's move in order to maximize his/her winnings. As you can see, the result is the same as in the minimax example, but we did not visit all the nodes of the tree.
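A minimal alpha-beta sketch: it is the minimax function from the previous section with the two cut-off tests added. Run on the same tree and utility dictionaries, it returns the same value while skipping the pruned subtrees:

def alphabeta(node, maximizing, tree, utility,
              alpha=float('-inf'), beta=float('inf')):
    if node in utility:                    # terminal node
        return utility[node]
    if maximizing:
        value = float('-inf')
        for child in tree[node]:
            value = max(value, alphabeta(child, False, tree, utility, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:              # cut-off: prune remaining children
                break
        return value
    else:
        value = float('inf')
        for child in tree[node]:
            value = min(value, alphabeta(child, True, tree, utility, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:              # cut-off: prune remaining children
                break
        return value

It is called exactly like minimax: alphabeta('A', True, tree, utility) on the tree defined earlier also returns 4.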
ADVANCED SEARCH
Genetic Algorithm
A genetic algorithm is a search heuristic that is inspired by Charles Darwin’s theory of natural
evolution. This algorithm reflects the process of natural selection where the fittest individuals are
selected for reproduction in order to produce offspring of the next generation.
Genetic algorithms are a type of optimization algorithm, meaning they are used to find the optimal
solution(s) to a given computational problem that maximizes or minimizes a particular function.
Genetic algorithms represent one branch of the field of study called evolutionary computation, in
that they imitate the biological processes of reproduction and natural selection to solve for the
‘fittest’ solutions. Like evolution, many of a genetic algorithm’s processes are random; however, this optimization technique allows one to set the level of randomization and the level of control.
These algorithms are far more powerful and efficient than random search and exhaustive search
algorithms, yet require no extra information about the given problem.
A genetic algorithm (or GA) is a search technique used in computing to find true or approximate
solutions to optimization and search problems. GAs are categorized as global search heuristics.
GAs are a particular class of evolutionary algorithms that use techniques inspired by evolutionary
biology such as inheritance, mutation, selection, and crossover (also called recombination).
Natural Selection
The process of natural selection starts with the selection of fittest individuals from a population.
They produce offspring which inherit the characteristics of the parents and will be added to the
next generation. If parents have better fitness, their offspring will be better than parents and have
a better chance at surviving. This process keeps on iterating and at the end, a generation with the
fittest individuals will be found.
This notion can be applied to a search problem: we consider a set of candidate solutions to a problem and select the best ones out of them. Five phases are considered in a genetic algorithm:
Initial population
Fitness function
Selection
Crossover
Mutation
1. Initial Population
The process begins with a set of individuals which is called a Population. Each individual is a
solution to the problem you want to solve.
An individual is characterized by a set of parameters (variables) known as Genes. Genes are joined
into a string to form a Chromosome (solution).
In a genetic algorithm, the set of genes of an individual is represented using a string, in terms of
an alphabet. Usually, binary values are used (string of 1s and 0s). We say that we encode the genes
in a chromosome.
2. Fitness Function
The fitness function determines how fit an individual is (the ability of an individual to compete
with other individuals). It gives a fitness score to each individual. The probability that an individual
will be selected for reproduction is based on its fitness score.
3. Selection
The idea of the selection phase is to select the fittest individuals and let them pass their genes to the
next generation. Two pairs of individuals (parents) are selected based on their fitness scores.
Individuals with high fitness have more chance to be selected for reproduction.
4. Crossover
Crossover is the most significant phase in a genetic algorithm. For each pair of parents to be mated,
a crossover point is chosen at random from within the genes.
For example, consider the crossover point to be 3 as shown below.
Offspring are created by exchanging the genes of parents among themselves until the crossover
point is reached. The new offspring are added to the population.
5. Mutation
In certain new offspring formed, some of their genes can be subjected to a mutation with a low
random probability. This implies that some of the bits in the bit string can be flipped. Mutation
occurs to maintain diversity within the population and prevent premature convergence.
Termination:
The algorithm terminates if the population has converged (does not produce offspring which are
significantly different from the previous generation). Then it is said that the genetic algorithm
has provided a set of solutions to our problem.
Note:
The population has a fixed size. As new generations are formed, individuals with least fitness
die, providing space for new offspring.
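The five phases can be seen working together in one toy sketch: a binary GA that maximizes the number of 1s in a bit string ("OneMax"). The population size, rates, and fitness function are illustrative choices, not prescriptions:

import random

def genetic_algorithm(fitness, length=10, pop_size=20,
                      generations=50, mutation_rate=0.01):
    # 1. Initial population: random bit-string chromosomes.
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # 2-3. Fitness scores drive roulette-wheel (fitness-proportionate) selection.
        weights = [fitness(c) + 1e-9 for c in pop]
        new_pop = []
        for _ in range(pop_size):
            p1, p2 = random.choices(pop, weights=weights, k=2)
            point = random.randrange(1, length)       # 4. single-point crossover
            child = p1[:point] + p2[point:]
            for i in range(length):                   # 5. mutation: flip bits
                if random.random() < mutation_rate:
                    child[i] = 1 - child[i]
            new_pop.append(child)
        pop = new_pop                                 # fixed population size
    return max(pop, key=fitness)

print(genetic_algorithm(sum))    # usually converges to [1, 1, ..., 1]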
Application Areas of Genetic Algorithms
Economics − GAs are used to characterize various economic models like the cobweb model, game theory equilibrium resolution, asset pricing, etc.
Neural Networks − GAs are also used to train neural networks, particularly recurrent neural
networks.
Parallelization − GAs also have very good parallel capabilities, and prove to be very effective
means in solving certain problems, and also provide a good area for research.
Image Processing − GAs are used for various digital image processing (DIP) tasks as well like
dense pixel matching.
Scheduling applications − GAs are used to solve various scheduling problems as well, particularly the timetabling problem.
Robot Trajectory Generation − GAs have been used to plan the path which a robot arm takes by
moving from one point to another.
Parametric Design of Aircraft − GAs have been used to design aircraft by varying the parameters and evolving better solutions.
DNA Analysis − GAs have been used to determine the structure of DNA using spectrometric
data about the sample.
Multimodal Optimization − GAs are obviously very good approaches for multimodal
optimization in which we have to find multiple optimum solutions.
Example:
The Traveling Salesman Problem:
The aim is to find a tour of a given set of cities such that each city is visited exactly once and the total distance traveled is minimized.
Representation: an ordered list of city numbers, known as an order-based GA.
1) London 3) Dunedin 5) Beijing 7) Tokyo
2) Venice 4) Singapore 6) Phoenix 8) Victoria
CityList1 (3 5 7 2 1 6 4 8)
CityList2 (2 5 7 6 8 1 3 4)
Crossover: combines inversion and recombination:
Parent1 (3 5 7 2 1 6 4 8)
Parent2 (2 5 7 6 8 1 3 4)
______________________________
Child (5 8 7 2 1 6 3 4)
This operator is called the Order1 crossover.
Mutation: involves reordering of the list:
Before: (5 8 7 2 1 6 3 4)
After: (5 8 6 2 1 7 3 4)
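Both order-based operators can be sketched directly. The Order-1 crossover below copies a random slice of one parent and fills the remaining positions in the order they appear in the other parent; this is a simplified illustration:

import random

def order1_crossover(parent1, parent2):
    a, b = sorted(random.sample(range(len(parent1)), 2))
    child = [None] * len(parent1)
    child[a:b] = parent1[a:b]                  # copy a slice of parent1
    fill = [c for c in parent2 if c not in child]
    for i in range(len(child)):                # fill gaps in parent2's order
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def swap_mutation(tour):
    i, j = random.sample(range(len(tour)), 2)  # reorder: swap two cities
    tour = list(tour)
    tour[i], tour[j] = tour[j], tour[i]
    return tour

print(order1_crossover([3, 5, 7, 2, 1, 6, 4, 8], [2, 5, 7, 6, 8, 1, 3, 4]))
print(swap_mutation([5, 8, 7, 2, 1, 6, 3, 4]))

Both operators always produce a valid tour (each city appears exactly once), which is why plain bit-string crossover and bit-flip mutation are not used for the TSP.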
Knowledge Representation
Humans are best at understanding, reasoning, and interpreting knowledge. Humans know things, and according to that knowledge they perform various actions in the real world. How machines do all of these things falls under knowledge representation and reasoning. Hence, we can describe knowledge representation as follows:
Knowledge representation and reasoning (KR, KRR) is the part of Artificial Intelligence which is concerned with how AI agents think and how thinking contributes to their intelligent behavior. It is responsible for representing information about the real world so that a computer can understand it and utilize this knowledge to solve complex real-world problems, such as diagnosing a medical condition or communicating with humans in natural language.
It is also a way which describes how we can represent knowledge in artificial intelligence.
Knowledge representation is not just storing data into some database, but it also enables an
intelligent machine to learn from that knowledge and experiences so that it can behave intelligently
like a human.
Types of Knowledge
Knowledge is awareness or familiarity gained by experiences of facts, data, and situations.
Following are the types of knowledge in artificial intelligence:
1. Declarative Knowledge:
Declarative knowledge is knowledge about something.
It includes concepts, facts, and objects.
It is also called descriptive knowledge and is expressed in declarative sentences.
It is simpler than procedural knowledge.
2. Procedural Knowledge:
It is also known as imperative knowledge.
Procedural knowledge is a type of knowledge which is responsible for knowing how to do
something.
It can be directly applied to any task.
It includes rules, strategies, procedures, agendas, etc.
3. Meta-knowledge:
Knowledge about the other types of knowledge is called Meta-knowledge.
4. Heuristic knowledge:
Heuristic knowledge represents the knowledge of some experts in a field or subject.
It consists of rules of thumb based on previous experience and awareness of approaches which tend to work well but are not guaranteed.
5. Structural knowledge:
Structural knowledge is basic knowledge used in problem-solving.
It describes relationships between various concepts, such as kind-of, part-of, and grouping of something.
It describes the relationships that exist between concepts or objects.
There are mainly four techniques of knowledge representation in AI:
Logical Representation
Semantic Network Representation
Frame Representation
Production Rules
1. Logical Representation
Logical representation is a language with some concrete rules which deals with propositions and has no ambiguity in representation. Logical representation means drawing conclusions based on various conditions. This representation lays down some important communication rules. It consists of precisely defined syntax and semantics which support sound inference. Each sentence can be translated into logic using syntax and semantics.
Syntax:
o Syntaxes are the rules which decide how we can construct legal sentences in the logic.
o It determines which symbol we can use in knowledge representation.
o How to write those symbols.
Semantics:
o Semantics are the rules by which we can interpret the sentence in the logic.
o Semantic also involves assigning a meaning to each sentence.
2. Semantic Network Representation
Semantic networks represent knowledge in the form of graphical networks consisting of nodes, which represent objects, and arcs, which describe the relationships between those objects.
Example:
Following are some statements which we need to represent in the form of nodes and arcs.
Statements:
a. Jerry is a cat.
b. Jerry is a mammal
c. Jerry is owned by Priya.
d. Jerry is white colored.
e. All mammals are animals.
In the diagram above, the different types of knowledge have been represented in the form of nodes and arcs. Each object is connected to other objects by some relation.
Advantages of Semantic network:
1. Semantic networks are a natural representation of knowledge.
2. Semantic networks convey meaning in a transparent manner.
3. These networks are simple and easily understandable.
3. Frame Representation
A frame is a record-like structure which consists of a collection of attributes and their values to describe an entity in the world. Frames are an AI data structure which divides knowledge into substructures by representing stereotyped situations. A frame consists of a collection of slots and slot values. These slots may be of any type and size. Slots have names and values, which are called facets.
Facets: The various aspects of a slot are known as facets. Facets are features of frames which enable us to put constraints on frames. Example: IF-NEEDED facets are called when the data of a particular slot is needed. A frame may consist of any number of slots, a slot may include any number of facets, and facets may have any number of values. A frame is also known as slot-filler knowledge representation in artificial intelligence.
Frames are derived from semantic networks and later evolved into our modern-day classes and
objects. A single frame is not much useful. Frames system consist of a collection of frames which
are connected. In the frame, knowledge about an object or event can be stored together in the
knowledge base. The frame is a type of technology which is widely used in various applications
including Natural language processing and machine visions.
Example:
Let's take an example of a frame for a book
Slots    Fillers
Year     1996
Page     1152
4. Production Rules
A production rules system consists of (condition, action) pairs, which mean "IF condition THEN action". It has mainly three parts: the set of production rules, working memory, and the recognize-act cycle.
In production rules, the agent checks for the condition, and if the condition exists, the production rule fires and the corresponding action is carried out. The condition part of the rule determines which rule may be applied to a problem, and the action part carries out the associated problem-solving steps. This complete process is called a recognize-act cycle.
The working memory contains the description of the current state of problem-solving, and rules can write knowledge to the working memory. This knowledge may then match and fire other rules.
If a new situation (state) is generated, multiple production rules may be triggered together; the set of triggered rules is called the conflict set. In this situation, the agent needs to select one rule from the set to apply; this is called conflict resolution.
Example:
o IF (at bus stop AND bus arrives) THEN action (get into the bus)
o IF (on the bus AND paid AND empty seat) THEN action (sit down).
o IF (on bus AND unpaid) THEN action (pay charges).
o IF (bus arrives at destination) THEN action (get down from the bus).
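A recognize-act cycle over these bus rules can be sketched in a few lines. Encoding each condition as a set of facts is an illustrative simplification:

rules = [
    ({'at bus stop', 'bus arrives'}, 'get into the bus'),
    ({'on the bus', 'paid', 'empty seat'}, 'sit down'),
    ({'on the bus', 'unpaid'}, 'pay charges'),
    ({'bus arrives at destination'}, 'get down from the bus'),
]

def recognize_act(working_memory):
    # Recognize: every rule whose condition matches forms the conflict set.
    conflict_set = [action for condition, action in rules
                    if condition <= working_memory]
    # Conflict resolution: here we simply take the first matching rule.
    return conflict_set[0] if conflict_set else None

print(recognize_act({'on the bus', 'unpaid'}))   # -> pay charges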
Propositional Logic
Propositional logic is the branch of logic that studies ways of joining and/or modifying entire propositions, statements, or sentences to form more complicated propositions, statements, or sentences, as well as the logical relationships and properties derived from these methods of combining or altering statements. A proposition is a declarative statement which is either true or false. Propositional logic is a technique for representing knowledge in logical and mathematical form.
Example:
1. It is Sunday.
2. The Sun rises in the West. (false proposition)
3. 3 + 3 = 7 (false proposition)
4. 5 is a prime number.
Basic facts about propositional logic:
o Propositional logic is also called Boolean logic, as it works on 0 and 1.
o In propositional logic, we use symbolic variables to represent the logic, and we can use any symbol to represent a proposition, such as A, B, C, P, Q, R, etc.
o A proposition can be either true or false, but it cannot be both.
o Propositional logic consists of objects, relations or functions, and logical connectives.
o These connectives are also called logical operators.
o Propositions and connectives are the basic elements of propositional logic.
o A connective can be seen as a logical operator which connects two sentences.
o A proposition formula which is always false is called a contradiction.
o A proposition formula which can take both true and false values is called a contingency.
o Statements which are questions, commands, or opinions are not propositions, e.g. "Where is Ruth?", "How are you?", "What is your name?".
Syntax of propositional logic:
The syntax of propositional logic defines the allowable sentences for knowledge representation. There are two types of propositions:
a. Atomic propositions, which consist of a single proposition symbol, and
b. Compound propositions, which are constructed by combining simpler propositions with logical connectives.
Logical Connectives
Logical connectives are used to connect two simpler propositions or to represent a sentence logically. We can create compound propositions with the help of logical connectives. There are mainly five connectives, which are given as follows:
1. Negation: A sentence such as ¬P is called the negation of P. A literal can be either a positive literal or a negative literal.
2. Conjunction: A sentence which has the ∧ connective, such as P ∧ Q, is called a conjunction.
Example: "Kemi is intelligent and hardworking". It can be written as:
P = Kemi is intelligent,
Q = Kemi is hardworking. → P ∧ Q.
3. Disjunction: A sentence which has the ∨ connective, such as P ∨ Q, is called a disjunction, where P and Q are the propositions.
Example: "Ahmed is a doctor or an engineer".
Here P = Ahmed is a doctor, Q = Ahmed is an engineer, so we can write it as P ∨ Q.
4. Implication: A sentence such as P → Q is called an implication. Implications are also known as if-then rules. For example: "If it is raining, then the street is wet."
Let P = It is raining and Q = The street is wet; then it is represented as P → Q.
5. Biconditional: A sentence such as P ⇔ Q is a biconditional sentence. Example: "I am breathing if and only if I am alive".
P = I am breathing, Q = I am alive; it can be represented as P ⇔ Q.
Following is a summary table for the propositional logic connectives:

Connective symbol   Technical term   Example
¬                   Negation         ¬P
∧                   Conjunction      P ∧ Q
∨                   Disjunction      P ∨ Q
→                   Implication      P → Q
⇔                   Biconditional    P ⇔ Q
Truth Table
In propositional logic, we need to know the truth values of propositions in all possible scenarios. We can combine all the possible combinations with logical connectives, and the representation of these combinations in a tabular format is called a truth table. Following is the truth table for all the logical connectives:

P       Q       ¬P      P ∧ Q   P ∨ Q   P → Q   P ⇔ Q
True    True    False   True    True    True    True
True    False   False   False   True    False   False
False   True    True    False   True    True    False
False   False   True    False   False   True    True
Precedence of connectives
Just like arithmetic operators, there is a precedence order for the propositional connectives (logical operators). This order should be followed while evaluating a propositional formula. Following is the precedence order for the operators:

First:  Parentheses
Second: Negation (¬)
Third:  Conjunction (∧)
Fourth: Disjunction (∨)
Fifth:  Implication (→)
Sixth:  Biconditional (⇔)
Logical equivalence
Logical equivalence is one of the features of propositional logic. Two propositions are said to be logically equivalent if and only if their columns in the truth table are identical.
Take two propositions A and B; for logical equivalence we write A ⇔ B. In the truth table below, the columns for ¬A ∨ B and A → B are identical, hence ¬A ∨ B is equivalent to A → B:

A       B       ¬A      ¬A ∨ B   A → B
True    True    False   True     True
True    False   False   False    False
False   True    True    True     True
False   False   True    True     True
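The equivalence can also be checked mechanically. Here is a minimal Python sketch (an illustration, not part of the notes) that enumerates the truth table rows and confirms that ¬A ∨ B and A → B always agree:

from itertools import product

def implies(a, b):
    # A -> B is false only when A is true and B is false.
    return (not a) or b

def equivalent(f, g, num_vars=2):
    # Two formulas are logically equivalent if they agree on
    # every row of the truth table.
    return all(f(*row) == g(*row)
               for row in product([True, False], repeat=num_vars))

print(equivalent(lambda a, b: (not a) or b, implies))  # True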
First-Order Logic
In propositional logic, we can only represent facts which are either true or false. Propositional logic is not sufficient to represent complex sentences or natural language statements; it has very limited expressive power.
First-order logic is another way of representing knowledge in artificial intelligence. It is an extension of propositional logic. FOL is sufficiently expressive to represent natural language statements in a concise way.
First-order logic is also known as predicate logic or first-order predicate logic. It is a powerful language that expresses information about objects in a natural way and can also express the relationships between those objects.
Like a natural language, first-order logic has two main parts:
a. Syntax
b. Semantics
Syntax of First-Order logic:
The syntax of FOL determines which collections of symbols form logical expressions in first-order logic. The basic syntactic elements of first-order logic are symbols. We write statements in shorthand notation in FOL.
Basic elements of first-order logic syntax:

Element       Symbols
Constants     1, 2, A, Tom, Priya, ...
Variables     x, y, z, a, b, ...
Predicates    Brothers, cat, ...
Connectives   ∧, ∨, ¬, ⇒, ⇔
Equality      ==
Quantifiers   ∀, ∃
Atomic Sentences
Atomic sentences are the most basic sentences of first-order logic. They are formed from a predicate symbol followed by a parenthesized sequence of terms. An atomic sentence is represented as: Predicate(term1, term2, ..., termN).
Example: Yemi and Ajayi are brothers: => Brothers(Yemi, Ajayi).
Tom is a cat: => cat (Tom).
Complex Sentences
Complex sentences are made by combining atomic sentences using connectives. First-order logic
statements can be divided into two parts:
Subject: Subject is the main part of the statement.
Predicate: A predicate can be defined as a relation, which binds two atoms together in a statement.
Consider the statement "x is an integer." It consists of two parts: the first part, x, is the subject of the statement, and the second part, "is an integer", is known as the predicate.
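As a rough illustration of how atomic sentences might be stored, here is a minimal Python sketch (the knowledge-base structure is an assumption, not part of the notes) that records atomic sentences as predicate/term tuples:

# Atomic sentences stored as (predicate, term1, term2, ...) tuples
# in a simple set-based knowledge base.
knowledge_base = {
    ("Brothers", "Yemi", "Ajayi"),  # Brothers(Yemi, Ajayi)
    ("cat", "Tom"),                 # cat(Tom)
}

def holds(predicate, *terms):
    # An atomic sentence is true here iff it appears in the knowledge base.
    return (predicate, *terms) in knowledge_base

print(holds("cat", "Tom"))               # True
print(holds("Brothers", "Tom", "Yemi"))  # False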
Forward and Backward Chaining
Forward chaining
Forward chaining is a method of reasoning in artificial intelligence in which inference rules are
applied to existing data to extract additional data until an endpoint (goal) is achieved.
In this type of chaining, the inference engine starts by evaluating existing facts, derivations, and
conditions before deducing new information. An endpoint (goal) is achieved through the
manipulation of knowledge that exists in the knowledge base. Forward chaining can be used in
planning, monitoring, controlling, and interpreting applications.
Examples:
A simple example of forward chaining can be explained in the following sequence.
A
A->B
B
A is the starting point. A->B represents a fact. This fact is used to achieve a decision B.
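This derivation can be sketched in Python as a simple data-driven loop; the rule and fact names are taken from the example, with an extra rule B -> C added as an assumption to show multiple conclusions:

# A minimal forward-chaining sketch: rules are (premises, conclusion) pairs,
# and we keep firing rules whose premises are all known until nothing new
# can be derived.
facts = {"A"}
rules = [({"A"}, "B"),   # A -> B
         ({"B"}, "C")]   # B -> C (assumed, to show chained conclusions)

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)  # the rule fires and derives new data
            changed = True

print(facts)  # {'A', 'B', 'C'} -- the goal B (and then C) is reached from A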
Advantages
It can be used to draw multiple conclusions.
It provides a good basis for arriving at conclusions.
It is more flexible than backward chaining because it is not limited to deriving only the data that is needed.
Disadvantages
The process of forward chaining may be time-consuming; it may take a lot of time to eliminate and synchronize the available data.
Unlike backward chaining, this type of chaining does not give a very clear explanation of its facts or observations; backward chaining, being goal-driven, arrives at conclusions more directly.
Backward chaining
Backward chaining involves backtracking from the endpoint or goal to steps that led to the
endpoint. This type of chaining starts from the goal and moves backward to comprehend the
steps that were taken to attain this goal.
The backtracking process can also enable a person to establish the logical steps that can be used to find other important solutions. Backward chaining can be used in debugging, diagnostics, and prescription applications.
Examples:
B
A->B
A
B is the goal or endpoint that is used as the starting point for backward tracking. A is the initial
state. A->B is a fact that must be asserted to arrive at the endpoint B.
A practical example of backward chaining will go as follows:
Tom is sweating (B).
If a person is running, he will sweat (A->B).
Tom is running (A).
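The Tom example can be sketched in Python as a goal-driven recursive search; this is only an illustration of the idea, not a full inference engine:

# A minimal backward-chaining sketch: start from the goal and recursively
# check whether it is a known fact or the conclusion of a rule whose
# premises can themselves be proven.
facts = {"Tom is running"}                          # A
rules = [({"Tom is running"}, "Tom is sweating")]   # A -> B

def prove(goal):
    if goal in facts:
        return True
    # Try every rule that concludes the goal and prove its premises.
    return any(all(prove(p) for p in premises)
               for premises, conclusion in rules
               if conclusion == goal)

print(prove("Tom is sweating"))  # True: B follows from A and A -> B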
Advantages
The result is already known, which makes it easy to deduce inferences.
It’s a quicker method of reasoning than forward chaining because the endpoint is
available.
In this type of chaining, correct solutions can be derived effectively if pre-determined
rules are met by the inference engine.
Disadvantages
The process of reasoning can only start if the endpoint is known.
It doesn’t deduce multiple solutions or answers.
It only derives data that is needed, which makes it less flexible than forward chaining.
PROBABILISTIC REASONING
Probabilistic reasoning is a way of knowledge representation where we apply the concept of
probability to indicate the uncertainty in knowledge. In probabilistic reasoning, we combine
probability theory with logic to handle the uncertainty.
Probability is used in probabilistic reasoning because it provides a way to handle the uncertainty that results from someone's laziness or ignorance. In the real world there are many scenarios where the certainty of something cannot be confirmed, such as "It will rain today", "the behaviour of someone in some situation", or "a match between two teams or two players". These are probable sentences: we can assume they will happen but cannot be sure, so probabilistic reasoning is used.
Probability
Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1, the two ideal certainties:
0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
P(A) = 0, indicates total uncertainty in an event A.
P(A) =1, indicates total certainty in an event A.
We can find the probability of an uncertain event by using the formula below:

P(A) = Number of favourable outcomes / Total number of outcomes

P(¬A) = probability of event A not happening (the complement of A).
P(¬A) + P(A) = 1.
Event: Each possible outcome of a variable is called an event.
Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the real
world.
Prior probability: The prior probability of an event is the probability computed before observing new information.
Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of the prior probability and the new information.
Conditional probability: Conditional probability is a probability of an event occurring when
another event has already happened.
Suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition B". It can be written as:

P(A|B) = P(A ⋀ B) / P(B)

where P(A ⋀ B) is the joint probability of A and B, and P(B) is the marginal probability of B.
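As a quick numerical illustration (the probabilities below are made-up values, not from the notes): if P(A ⋀ B) = 0.1 and P(B) = 0.5, then P(A|B) = 0.1 / 0.5 = 0.2.

# Conditional probability from the joint and marginal (illustrative values).
p_a_and_b = 0.1   # joint probability P(A and B), assumed for the example
p_b = 0.5         # marginal probability P(B), assumed for the example

p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 0.2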
BAYES' THEOREM
Bayes' theorem is also known as Bayes' rule, Bayes' law or Bayesian reasoning, which determines
the probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities of two
random events.
Bayes' theorem was named after the British mathematician Thomas Bayes. Bayesian inference is an application of Bayes' theorem and is fundamental to Bayesian statistics. The theorem gives a way to calculate the value of P(B|A) from knowledge of P(A|B).
Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.
Example
If the risk of cancer depends on a person's age, then by using Bayes' theorem we can determine the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using product rule and conditional probability of event A with
known event B:
As from the product rule we can write:
P(A ⋀ B) = P(A|B) P(B)
Similarly, the probability of event B with known event A:
P(A ⋀ B) = P(B|A) P(A)
Equating the right-hand sides of both equations, we get:

P(A|B) = P(B|A) P(A) / P(B)

The above equation is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here:
P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the evidence.
P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability: the probability of the evidence alone.
Example: A doctor knows that the disease meningitis causes a patient to have a stiff neck 80% of the time. The doctor also knows that the prior probability that any patient has meningitis is 1/30,000, and the prior probability that any patient has a stiff neck is 2%. Applying Bayes' theorem:

P(meningitis | stiff neck) = P(stiff neck | meningitis) × P(meningitis) / P(stiff neck)
= (0.8 × 1/30,000) / 0.02 ≈ 0.00133

Hence, we can conclude that about 1 patient out of 750 patients with a stiff neck has meningitis.
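The meningitis calculation above can be reproduced with a few lines of Python (a sketch of the arithmetic only):

# Bayes' rule applied to the meningitis example: P(A|B) = P(B|A) P(A) / P(B).
p_stiff_given_men = 0.8      # P(stiff neck | meningitis)
p_men = 1 / 30000            # prior probability P(meningitis)
p_stiff = 0.02               # marginal probability P(stiff neck)

p_men_given_stiff = p_stiff_given_men * p_men / p_stiff
print(p_men_given_stiff)     # ~0.00133, i.e. about 1 patient in 750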
References
Negnevitsky, M. (2005). Artificial Intelligence: A Guide to Intelligent Systems. Pearson Education Limited, Edinburgh Gate, Harlow, Essex, England.
Russell, S. J. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs, NJ.
https://www.javatpoint/