AI COURSE FILE
DEPARTMENT OF ______________________________________________
COURSE FILE INDEX – THEORY COURSE
Name of the Faculty                          :
Designation and Department                   :
Course Code and Name                         :
Department to which the Course is offered    :
Academic Year / Semester / Section           :

SL.No  List/Detail                                        Yes/No    SL.No  List/Detail                               Yes/No
1      Vision, Mission of the Institute & Department                9      Personal log book
2      Authenticated Syllabus Copy                                  10     Consolidated Internal Mark Statement
3      Student's Nominal Roll (Approved Copy)                       11     Course End Survey
Internal Assessment Test / End Semester Examination (tick the corresponding cell)

Test No.        Question Paper   Answer Key   Result Analysis   Sample Answer Scripts   Slow Learners List & Remedial Measures
1
2
3
End Semester

Other Assessments (Quiz / Open Book Test / Assignment etc.) (tick the corresponding cell)

Type            Question Paper   Answer Key with Rubrics   Mark Statement   Sample Answer Scripts   Details / Remarks (If any)
… Data Science, and produce technocrats who are able to provide solutions to … in the students' minds, thus empowering them to meet the global challenges.
COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: Explain intelligent agent frameworks
CO2: Apply problem solving techniques
CO3: Apply game playing and CSP techniques
CO4: Perform logical reasoning
CO5: Perform probabilistic reasoning under uncertainty
TOTAL:45 PERIODS
TEXT BOOKS:
1. Stuart Russell and Peter Norvig, “Artificial Intelligence – A Modern Approach”,
Fourth Edition, Pearson Education, 2021.
REFERENCES:
1. Dan W. Patterson, “Introduction to AI and ES”, Pearson Education, 2007.
2. Kevin Knight, Elaine Rich, and Nair B., “Artificial Intelligence”, McGraw Hill,
2008.
3. Patrick H. Winston, "Artificial Intelligence", Third Edition, Pearson Education,
2006
4. Deepak Khemani, “Artificial Intelligence”, Tata McGraw Hill Education, 2013.
5. http://nptel.ac.in/
VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN
Approved by AICTE, New Delhi & Affiliated to Anna University, Chennai
Elayampalayam, Tiruchengode Tk – 637205, Namakkal Dt.
S.No    Reg. No         Name of the Student                 Remarks
1 613023243001 Aashika S
2 613023243002 Abinaya K
3 613023243003 Agalya V
4 613023243004 Akila K
5 613023243006 Anusree M
6 613023243007 Arunaethi M S
7 613023243008 Arunika E
8 613023243009 Chella
9 613023243010 Devasri K
10 613023243011 Dharshini G S
11 613023243012 Dhivyalakshmi V
12 613023243013 Elangani E
13 613023243014 Hemadharshini M
14 613023243015 Hemamalini L
15 613023243016 Hemavathi S
16 613023243017 Indhuja P
17 613023243018 Indhumathi E
18 613023243019 Janani S
19 613023243020 Jaya S
20 613023243021 Joshika S
21 613023243022 Kalpana J
22 613023243023 Kanishka J
23 613023243024 Kanmani V
24 613023243025 Kaviya C
25 613023243026 Kaviya K
26 613023243027 Keerthana P
27 613023243028 Kiruthika R
28 613023243029 Kiruthika S
29 613023243030 Kowshika S
30 613023243031 Lathasree S
31 613023243032 Madhumitha M
32 613023243033 Mahalakshmi S
33 613023243034 Nandhini R
34 613023243035 Nasiha M
35 613023243036 Navya K
38 613023243039 Pavithra B
39 613023243040 Punitham S
40 613023243041 Reemas M
41 613023243042 Rithicka M
42 613023243043 Sandhiya R
43 613023243044 Sandhiya S
44 613023243045 Selvamithra S
46 613023243047 Sophiya B
47 613023243048 Sowbarnika S J
48 613023243049 Srivarshini K
49 613023243050 Suganthi S
50 613023243051 Suganya V
51 613023243052 Sujitha M
52 613023243053 Swetha C
55 613023243056 Thejeswini S
56 613023243057 Thennarasi A
58 613023243059 Varsha A
59 613023243060 Vethavalli J
VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN
Approved by AICTE, New Delhi & Affiliated to Anna University, Chennai
Elayampalayam, Tiruchengode Tk – 637205, Namakkal Dt.
Time Table

Period      I           II          III         IV          V           VI          VII
Time        09:30 AM    10:20 AM    11:15 AM    12:00 PM    01:40 PM    02:25 PM    03:20 PM
            to          to          to          to          to          to          to
            10:20 AM    11:05 AM    12:00 PM    12:45 PM    02:25 PM    03:10 PM    04:00 PM
Break: 11:05 AM to 11:15 AM    Lunch Break: 12:45 PM to 01:40 PM    Break: 03:10 PM to 03:20 PM

Day
Monday      AI
Tuesday     AI
Wednesday
Thursday    AI
Friday      AI
UNIT I
Introduction to AI:
In today's world, technology is growing very fast, and we come into contact with new technologies every day.
One of the booming technologies of computer science is Artificial Intelligence, which is ready to create a new revolution in the world by making intelligent machines. Artificial Intelligence is now all around us. It is currently applied in a variety of subfields, ranging from general to specific, such as self-driving cars, playing chess, proving theorems, playing music, painting, etc.
AI is one of the fascinating and universal fields of computer science and has great scope in the future.
Artificial Intelligence is composed of two words, Artificial and Intelligence, where Artificial means "man-made" and Intelligence means "thinking power"; hence AI means "a man-made thinking power."
"It is a branch of computer science by which we can create intelligent machines which can
behave like humans, think like humans, and are able to make decisions."
Artificial Intelligence exists when a machine can have human-based skills such as learning, reasoning, and solving problems.
With Artificial Intelligence you do not need to preprogram a machine to do some work; instead, you can create a machine with programmed algorithms which can work with its own intelligence, and that is the strength of AI.
AI is not really a new idea: according to Greek myth, there were mechanical men in early days which could work and behave like humans.
Before learning about Artificial Intelligence, we should know the importance of AI and why we should learn it. Following are some main reasons to learn about AI:
o With the help of AI, you can create such software or devices which can solve real-world
problems very easily and with accuracy such as health issues, marketing, traffic issues,
etc.
o With the help of AI, you can create your personal virtual Assistant, such as Cortana,
Google Assistant, Siri, etc.
o With the help of AI, you can build such Robots which can work in an environment where
survival of humans can be at risk.
o AI opens a path for other new technologies, new devices, and new Opportunities.
Artificial Intelligence is not just a part of computer science; it is vast and draws on many other fields that contribute to it. To create AI, we should first know how intelligence is composed: intelligence is an intangible faculty of our brain which is a combination of reasoning, learning, problem solving, perception, language understanding, etc.
To achieve these capabilities in a machine or software, Artificial Intelligence requires the following disciplines:
o Mathematics
o Biology
o Psychology
o Sociology
o Computer Science
o Neuroscience (the study of neurons)
o Statistics
Advantages of Artificial Intelligence:
o High accuracy with fewer errors: AI machines or systems make fewer errors and achieve high accuracy, as they take decisions based on prior experience or information.
o High speed: AI systems can make decisions very fast, which is why an AI system can beat a chess champion in the game of chess.
o High reliability: AI machines are highly reliable and can perform the same action
multiple times with high accuracy.
o Useful for risky areas: AI machines can be helpful in situations such as defusing a bomb or exploring the ocean floor, where employing a human can be risky.
o Digital assistant: AI can be very useful in providing digital assistance to users; for example, AI technology is currently used by various e-commerce websites to show products as per customer requirements.
o Useful as a public utility: AI can be very useful for public utilities such as a self-driving
car which can make our journey safer and hassle-free, facial recognition for security
purpose, Natural language processing to communicate with the human in human-
language, etc.
Every technology has some disadvantages, and the same goes for Artificial Intelligence. Despite being such an advantageous technology, it still has some disadvantages which we need to keep in mind while creating an AI system. Following are the disadvantages of AI:
o High Cost: The hardware and software requirement of AI is very costly as it requires lots
of maintenance to meet current world requirements.
o Can't think out of the box: Even though we are making smarter machines with AI, they still cannot work out of the box; a robot will only do the work for which it is trained or programmed.
o No feelings and emotions: AI machines can be outstanding performers, but they do not have feelings, so they cannot form any kind of emotional attachment with humans, and they may sometimes be harmful to users if proper care is not taken.
o Increased dependency on machines: As technology advances, people become more dependent on devices and hence lose some of their mental capabilities.
o No original creativity: Humans are highly creative and can imagine new ideas; AI machines cannot match this power of human intelligence and cannot be truly creative and imaginative.
Prerequisite
Before learning about Artificial Intelligence, you must have fundamental knowledge of the following so that you can understand the concepts easily:
o Any computer language such as C, C++, Java, Python, etc.(knowledge of Python will be
an advantage)
o Knowledge of essential Mathematics such as derivatives, probability theory, etc.
Right Education can enhance the power of individuals/nations; on the other hand, misuse of the
same could lead to devastating results.
3. AI in Finance
Quantification of growth for any country is directly related to its economic and financial condition. As AI has enormous scope in almost every field, it has great potential to boost the economic health of individuals and nations. Nowadays, AI algorithms are being used in managing equity funds.
An AI system can take a large number of parameters into account while figuring out the best way to manage funds, and it could perform better than a human manager. AI-driven strategies in the field of finance
are going to change the classical way of trading and investing. It could be devastating for some
fund managing firms who cannot afford such facilities and could affect business on a large scale,
as the decision would be quick and abrupt. The competition would be tough and on edge all the
time.
AI-assisted strategies would enhance mission effectiveness and provide the safest way to execute them. The concern with AI-assisted systems is that how their algorithms arrive at results is not easily explainable; deep neural networks learn fast and keep learning continuously, so the main problem here is explainable AI. Such a system could produce devastating results if it reaches the wrong hands or makes wrong decisions on its own.
Intelligent Agents:
Agents can be grouped into five classes based on their degree of perceived intelligence and capability. All these agents can improve their performance and generate better actions over time. These are given below:
1. Simple Reflex Agent
o The Simple reflex agents are the simplest agents. These agents take decisions on the basis of the current percepts and ignore the rest of the percept history.
o These agents only succeed in a fully observable environment.
o The Simple reflex agent does not consider any part of the percept history during its decision and action process.
o The Simple reflex agent works on the Condition-action rule, which means it maps the current state to an action; for example, a Room Cleaner agent works only if there is dirt in the room.
o Problems for the simple reflex agent design approach:
o They have very limited intelligence
o They do not have knowledge of non-perceptual parts of the current state
o Mostly too big to generate and to store.
o Not adaptive to changes in the environment.
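A minimal Python sketch of a condition-action (simple reflex) agent for a two-location vacuum world; the locations, percept format, and rule table are illustrative assumptions, not part of the syllabus text.

```python
# Simple reflex agent: maps the current percept directly to an action,
# ignoring all percept history (illustrative two-location vacuum world).

def simple_reflex_vacuum_agent(percept):
    location, status = percept          # percept = (location, status), an assumed format
    if status == "Dirty":
        return "Suck"                   # rule: current square is dirty -> clean it
    elif location == "A":
        return "Right"                  # rule: at A and clean -> move right
    else:
        return "Left"                   # rule: at B and clean -> move left

# Example percept sequence and the action chosen for each percept
for percept in [("A", "Dirty"), ("A", "Clean"), ("B", "Dirty"), ("B", "Clean")]:
    print(percept, "->", simple_reflex_vacuum_agent(percept))
```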
2. Model-based Reflex Agent
o The Model-based agent can work in a partially observable environment and track the situation.
o A model-based agent has two important factors:
o Model: It is knowledge about "how things happen in the world," so it is called a
Model-based agent.
o Internal State: It is a representation of the current state based on percept history.
o These agents have the model, "which is knowledge of the world" and based on the model
they perform actions.
o Updating the agent state requires information about:
a. How the world evolves
b. How the agent's actions affect the world
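A small hedged sketch of a model-based agent for the same assumed vacuum world: it keeps an internal state (a model of each location's status) that is updated from the percept history, so it can act sensibly even though it only ever sees one location at a time.

```python
# Model-based reflex agent: maintains an internal state built from percept history.

class ModelBasedVacuumAgent:
    def __init__(self):
        self.model = {"A": None, "B": None}     # internal state: last known status per location

    def update_state(self, percept):
        location, status = percept
        self.model[location] = status           # fold the new percept into the model

    def choose_action(self, percept):
        self.update_state(percept)
        location, status = percept
        if all(s == "Clean" for s in self.model.values()):
            return "NoOp"                       # the model says the whole world is clean
        if status == "Dirty":
            return "Suck"
        return "Right" if location == "A" else "Left"

agent = ModelBasedVacuumAgent()
for percept in [("A", "Dirty"), ("A", "Clean"), ("B", "Dirty"), ("B", "Clean")]:
    print(percept, "->", agent.choose_action(percept))
```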
3. Goal-based agents
o Knowledge of the current state of the environment is not always sufficient for an agent to decide what to do.
o The agent needs to know its goal which describes desirable situations.
o Goal-based agents expand the capabilities of the model-based agent by having the "goal"
information.
o They choose an action, so that they can achieve the goal.
o These agents may have to consider a long sequence of possible actions before deciding
whether the goal is achieved or not. Such considerations of different scenario are called
searching and planning, which makes an agent proactive.
4. Utility-based agents
o These agents are similar to the goal-based agent but provide an extra component of utility
measurement which makes them different by providing a measure of success at a given
state.
o Utility-based agents act based not only on goals but also on the best way to achieve the goal.
o The Utility-based agent is useful when there are multiple possible alternatives, and an
agent has to choose in order to perform the best action.
o The utility function maps each state to a real number to check how efficiently each action
achieves the goals.
5. Learning Agents
o A learning agent in AI is the type of agent which can learn from its past experiences, or it
has learning capabilities.
o It starts acting with basic knowledge and is then able to act and adapt automatically through learning.
o A learning agent has mainly four conceptual components, which are:
a. Learning element: It is responsible for making improvements by learning from
environment
b. Critic: The learning element takes feedback from the critic, which describes how well the agent is doing with respect to a fixed performance standard.
c. Performance element: It is responsible for selecting external action
d. Problem generator: This component is responsible for suggesting actions that
will lead to new and informative experiences.
Hence, learning agents are able to learn, analyze performance, and look for new ways to improve
the performance.
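The four components can be sketched in a few lines of Python; the toy environment, reward values, and learning rate below are illustrative assumptions, used only to show the learning element improving the performance element from the critic's feedback.

```python
# Illustrative sketch of a learning agent's four components working together.
import random

class LearningAgent:
    def __init__(self, actions):
        self.value = {a: 0.0 for a in actions}   # knowledge used by the performance element

    def performance_element(self):
        return max(self.value, key=self.value.get)   # select the external action that looks best

    def critic(self, reward):
        return reward                                # feedback w.r.t. a fixed performance standard

    def learning_element(self, action, feedback, rate=0.5):
        self.value[action] += rate * (feedback - self.value[action])   # improve from feedback

    def problem_generator(self, explore=0.2):
        # occasionally suggest an exploratory action to gain new, informative experience
        return random.choice(list(self.value)) if random.random() < explore else None

agent = LearningAgent(["left", "right"])
for _ in range(20):
    action = agent.problem_generator() or agent.performance_element()
    reward = 1.0 if action == "right" else 0.0       # toy environment: "right" is the good action
    agent.learning_element(action, agent.critic(reward))
print(agent.value)                                    # "right" ends up with the higher value
```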
Nature of Environments:
The environment is the Task Environment (problem) for which the Rational Agent is the
solution. Any task environment is characterised on the basis of PEAS.
1. Performance – What is the performance characteristic which would either make the agent
successful or not. For example, as per the previous example clean floor, optimal energy
consumption might be performance measures.
2. Environment – Physical characteristics and constraints expected. For example, wood floors,
furniture in the way etc
3. Actuators – The physical or logical constructs which would take action. For example for the
vacuum cleaner, these are the suction pumps
4. Sensors – Again physical or logical constructs which would sense the environment.
Rational Agents could be physical agents like the one described above or it could also be a
program that operates in a non-physical environment like an operating system. Imagine a bot
web site operator designed to scan Internet news sources and show the interesting items to its
users, while selling advertising space to generate revenue.
As another example, consider an online tutoring system
Environments can further be classified into various buckets. This would help determine the
intelligence which would need to be built in the agent. These are
Observable – Full or Partial? If the agent's sensors get full access to the state, then it does not need to pre-store any information. Partial observability may be due to inaccuracy of sensors or incomplete information about the environment, like limited access to enemy territory.
Number of Agents – For the vacuum cleaner, it works in a single agent environment but for
driver-less taxis, every driver-less taxi is a separate agent and hence multi agent environment
Deterministic – The number of unknowns in the environment which affect the predictability of
the environment. For example, floor space for cleaning is mostly deterministic, the furniture is
where it is most of the time but taxi driving on a road is non-deterministic.
Discrete – Does the agent respond when needed or does it have to continuously scan the
environment. Driver-less is continuous, online tutor is discrete
Static – How often does the environment change. Can the agent learn about the environment and
always do the same thing?
Episodic – If the response to a certain percept does not depend on the previous one, i.e. it is stateless (like static methods in Java), then the environment is episodic. If the decision taken now influences future decisions, then it is a sequential environment.
An AI system can be defined as the study of the rational agent and its environment. The agents
sense the environment through sensors and act on their environment through actuators. An AI
agent can have mental properties such as knowledge, belief, intention, etc.
What is an Agent?
An agent can be anything that perceives its environment through sensors and acts upon that environment through actuators. An agent runs in a cycle of perceiving, thinking, and acting.
An agent can be:
o Human-Agent: A human agent has eyes, ears, and other organs which work for sensors
and hand, legs, vocal tract work for actuators.
o Robotic Agent: A robotic agent can have cameras, infrared range finder, NLP for
sensors and various motors for actuators.
o Software Agent: Software agent can have keystrokes, file contents as sensory input and
act on those inputs and display output on the screen.
Hence the world around us is full of agents such as thermostat, cellphone, camera, and even
we are also agents.
Before moving forward, we should first know about sensors, effectors, and actuators.
Sensor: Sensor is a device which detects the change in the environment and sends the
information to other electronic devices. An agent observes its environment through sensors.
Actuators: Actuators are the component of machines that converts energy into motion. The
actuators are only responsible for moving and controlling a system. An actuator can be an
electric motor, gears, rails, etc.
Effectors: Effectors are the devices which affect the environment. Effectors can be legs, wheels,
arms, fingers, wings, fins, and display screen.
Intelligent Agents:
An intelligent agent is an autonomous entity which acts upon an environment using sensors and actuators to achieve goals. An intelligent agent may learn from the environment to achieve its goals. A thermostat is an example of an intelligent agent.
Rational Agent:
A rational agent is an agent which has clear preference, models uncertainty, and acts in a way to
maximize its performance measure with all possible actions.
A rational agent is said to perform the right things. AI is about creating rational agents, which are used in game theory and decision theory for various real-world scenarios.
For an AI agent, the rational action is most important because in AI reinforcement learning
algorithm, for each best possible action, agent gets the positive reward and for each wrong
action, an agent gets a negative reward.
Rationality:
The rationality of an agent is measured by its performance measure. Rationality can be judged on the basis of the following points:
o The performance measure which defines the success criterion.
o The agent's prior knowledge of its environment.
o The best possible actions that the agent can perform.
o The sequence of percepts observed so far.
Note: Rationality differs from Omniscience because an Omniscient agent knows the actual
outcome of its action and act accordingly, which is not possible in reality.
Structure of an AI Agent
The task of AI is to design an agent program which implements the agent function. The structure of an intelligent agent is a combination of architecture and agent program. It can be viewed as:
Agent = Architecture + Agent program
Following are the main three terms involved in the structure of an AI agent:
o Architecture: the machinery (hardware) that the AI agent executes on.
o Agent function: a map from the percept sequence to an action, f : P* → A
o Agent program: an implementation of the agent function, which executes on the physical architecture to produce f.
PEAS Representation
PEAS is a representation model on which an AI agent works. When we define an AI agent or rational agent, we can group its properties under the PEAS representation model. It is made up of four words:
o P: Performance measure
o E: Environment
o A: Actuators
o S: Sensors
Here performance measure is the objective for the success of an agent's behavior.
Agent                  Performance measure                Environment                  Actuators                        Sensors

1. Medical Diagnose    Healthy patient, Minimized cost    Patient, Hospital, Staff     Tests, Treatments                Keyboard (entry of symptoms)

2. Vacuum Cleaner      Cleanness, Efficiency,             Room, Table, Wood floor      Wheels, Brushes,                 Camera, Dirt detection sensor
                       Battery life                                                    Vacuum extractor

3. Part-picking Robot  Percentage of parts in             Conveyor belt with parts,    Jointed arms, Hand               Camera, Joint angle sensors
                       correct bins                       Bins
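As a small illustration, the PEAS description of an agent can be recorded as a plain data structure; the entries below simply mirror the vacuum-cleaner row of the table above.

```python
# Sketch: a PEAS task-environment description captured as a data structure.
from dataclasses import dataclass, field

@dataclass
class PEAS:
    performance: list = field(default_factory=list)
    environment: list = field(default_factory=list)
    actuators: list = field(default_factory=list)
    sensors: list = field(default_factory=list)

vacuum_cleaner = PEAS(
    performance=["Cleanness", "Efficiency", "Battery life"],
    environment=["Room", "Table", "Wood floor"],
    actuators=["Wheels", "Brushes", "Vacuum extractor"],
    sensors=["Camera", "Dirt detection sensor"],
)
print(vacuum_cleaner)
```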
An environment is everything in the world which surrounds the agent, but it is not a part of an
agent itself. An environment can be described as a situation in which an agent is present.
The environment is where the agent lives and operates; it provides the agent with something to sense and act upon. An environment is mostly said to be non-deterministic.
Features of Environment
Environment can have various features from the point of view of an agent:
1. Fully observable vs Partially observable:
o If an agent's sensors can sense or access the complete state of the environment at each point in time, then it is a fully observable environment; otherwise it is partially observable.
o A fully observable environment is easy as there is no need to maintain the internal state to
keep track history of the world.
o If an agent has no sensors in an environment, then such an environment is called unobservable.
2. Deterministic vs Stochastic:
o If an agent's current state and selected action can completely determine the next state of
the environment, then such environment is called a deterministic environment.
o A stochastic environment is random in nature and cannot be determined completely by an
agent.
o In a deterministic, fully observable environment, agent does not need to worry about
uncertainty.
3. Episodic vs Sequential:
o In an episodic environment, there is a series of one-shot actions, and only the current
percept is required for the action.
o However, in Sequential environment, an agent requires memory of past actions to
determine the next best actions.
4. Single-agent vs Multi-agent
o If only one agent is involved in an environment, and operating by itself then such an
environment is called single agent environment.
o However, if multiple agents are operating in an environment, then such an environment is
called a multi-agent environment.
o The agent design problems in the multi-agent environment are different from single agent
environment.
5. Static vs Dynamic:
o If the environment can change itself while an agent is deliberating then such environment
is called a dynamic environment else it is called a static environment.
o Static environments are easy to deal with because an agent does not need to continue looking at the world while deciding on an action.
o However, in a dynamic environment, agents need to keep looking at the world before each action.
o Taxi driving is an example of a dynamic environment whereas Crossword puzzles are an
example of a static environment.
6. Discrete vs Continuous:
o If in an environment there are a finite number of percepts and actions that can be
performed within it, then such an environment is called a discrete environment else it is
called continuous environment.
o A chess game comes under a discrete environment, as there is a finite number of moves that can be performed.
o A self-driving car is an example of a continuous environment.
7. Known vs Unknown
o Known and unknown are not actually a feature of an environment, but it is an agent's
state of knowledge to perform an action.
o In a known environment, the results for all actions are known to the agent. While in
unknown environment, agent needs to learn how it works in order to perform an action.
o It is quite possible for a known environment to be partially observable and for an unknown environment to be fully observable.
8. Accessible vs Inaccessible
o If an agent can obtain complete and accurate information about the environment's state, then such an environment is called an accessible environment; otherwise it is called inaccessible.
o An empty room whose state can be defined by its temperature is an example of an
accessible environment.
o Information about an event on earth is an example of Inaccessible environment.
Search algorithms are one of the most important areas of Artificial Intelligence.
Problem-solving agents:
In Artificial Intelligence, search techniques are universal problem-solving methods. Rational agents or problem-solving agents in AI mostly use these search strategies or algorithms to solve a specific problem and provide the best result. Problem-solving agents are goal-based agents and use an atomic representation. In this topic, we will learn various problem-solving search algorithms.
Following are the four essential properties of search algorithms, used to compare their efficiency:
Completeness: A search algorithm is said to be complete if it guarantees to return a solution whenever at least one solution exists for any random input.
Optimality: If a solution found by an algorithm is guaranteed to be the best solution (lowest path cost) among all other solutions, then such a solution is said to be an optimal solution.
Time Complexity: Time complexity is a measure of time for an algorithm to complete its task.
Space Complexity: It is the maximum storage space required at any point during the search, as
the complexity of the problem.
Based on the search problems we can classify the search algorithms into uninformed (Blind
search) search and informed search (Heuristic search) algorithms.
Uninformed search is a class of general-purpose search algorithms which operates in brute force-
way. Uninformed search algorithms do not have additional information about state or search
space other than how to traverse the tree, so it is also called blind search.
1. Breadth-first Search
2. Depth-first Search
3. Depth-limited Search
4. Iterative deepening depth-first search
5. Uniform cost search
6. Bidirectional Search
1. Breadth-first Search:
o Breadth-first search is the most common search strategy for traversing a tree or graph.
This algorithm searches breadthwise in a tree or graph, so it is called breadth-first search.
o BFS algorithm starts searching from the root node of the tree and expands all successor
node at the current level before moving to nodes of next level.
o The breadth-first search algorithm is an example of a general-graph search algorithm.
o Breadth-first search is implemented using a FIFO (first-in, first-out) queue data structure.
Advantages:
o BFS will provide a solution if any solution exists.
o If there is more than one solution for a given problem, then BFS will provide the minimal solution, i.e. the one that requires the least number of steps.
Disadvantages:
o It requires lots of memory since each level of the tree must be saved into memory to
expand the next level.
o BFS needs lots of time if the solution is far away from the root node.
Example:
In the below tree structure, we have shown the traversing of the tree using BFS algorithm from
the root node S to goal node K. BFS search algorithm traverse in layers, so it will follow the path
which is shown by the dotted arrow, and the traversed path will be:
S---> A--->B---->C--->D---->G--->H--->E---->F---->I---->K
Time Complexity: The time complexity of the BFS algorithm can be obtained from the number of nodes traversed by BFS until the shallowest goal node: T(b) = 1 + b + b^2 + ... + b^d = O(b^d), where d is the depth of the shallowest solution and b is the branching factor (the number of successors at every node).
Space Complexity: The space complexity of BFS is given by the memory size of the frontier, which is O(b^d).
Completeness: BFS is complete, which means if the shallowest goal node is at some finite
depth, then BFS will find a solution.
Optimality: BFS is optimal if path cost is a non-decreasing function of the depth of the node.
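A minimal Python sketch of BFS with a FIFO queue; the small graph below is an assumption for illustration and is not the exact tree of the figure referred to above.

```python
# Breadth-first search: expand all nodes at the current level before the next level.
from collections import deque

def bfs(graph, start, goal):
    frontier = deque([[start]])              # FIFO queue of paths
    explored = set()
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path                      # the shallowest goal is reached first
        if node in explored:
            continue
        explored.add(node)
        for successor in graph.get(node, []):
            frontier.append(path + [successor])
    return None

graph = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E", "F"],
         "C": [], "D": ["G"], "E": [], "F": ["K"], "G": [], "K": []}
print(bfs(graph, "S", "K"))                  # ['S', 'B', 'F', 'K']
```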
2. Depth-first Search
o Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
o It is called the depth-first search because it starts from the root node and follows each
path to its greatest depth node before moving to the next path.
o DFS uses a stack data structure for its implementation.
o The process of the DFS algorithm is similar to the BFS algorithm.
Note: Backtracking is an algorithm technique for finding all possible solutions using recursion.
Advantage:
o DFS requires much less memory, as it only needs to store the stack of nodes on the path from the root node to the current node.
o It takes less time to reach the goal node than the BFS algorithm (if it traverses along the right path).
Disadvantage:
o There is the possibility that many states keep re-occurring, and there is no guarantee of
finding the solution.
o The DFS algorithm searches deep down a path, and it may sometimes go into an infinite loop.
Example:
In the below search tree, we have shown the flow of depth-first search, and it will follow the
order as:
It will start searching from root node S, and traverse A, then B, then D and E, after traversing E,
it will backtrack the tree as E has no other successor and still goal node is not found. After
backtracking it will traverse node C and then G, and here it will terminate as it found goal node.
Completeness: DFS search algorithm is complete within finite state space as it will expand
every node within a limited search tree.
Time Complexity: The time complexity of DFS is equivalent to the number of nodes traversed by the algorithm. It is given by:
T(b) = 1 + b + b^2 + ... + b^m = O(b^m)
where m is the maximum depth of any node, and this can be much larger than d (the shallowest solution depth).
Space Complexity: DFS algorithm needs to store only single path from the root node, hence
space complexity of DFS is equivalent to the size of the fringe set, which is O(bm).
Optimal: DFS search algorithm is non-optimal, as it may generate a large number of steps or
high cost to reach to the goal node.
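A matching DFS sketch using an explicit LIFO stack; again, the graph is only an assumed example, chosen so that the traversal backtracks from a dead end before finding the goal.

```python
# Depth-first search: follow one path to its greatest depth before backtracking.
def dfs(graph, start, goal):
    stack = [[start]]                        # LIFO stack of paths
    explored = set()
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            return path
        if node in explored:
            continue
        explored.add(node)
        # push successors in reverse so the leftmost child is expanded first
        for successor in reversed(graph.get(node, [])):
            stack.append(path + [successor])
    return None

graph = {"S": ["A", "C"], "A": ["B", "D"], "B": [], "D": ["E"],
         "C": ["G"], "E": [], "G": []}
print(dfs(graph, "S", "G"))                  # ['S', 'C', 'G'] after exploring A's subtree
```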
Depth-limited Search:
A depth-limited search algorithm is similar to depth-first search with a predetermined depth limit ℓ; the node at the depth limit is treated as if it has no successors. Depth-limited search can be terminated with two conditions of failure:
o Standard failure value: It indicates that the problem does not have any solution.
o Cutoff failure value: It defines no solution for the problem within the given depth limit.
Advantages:
Disadvantages:
Example:
Completeness: DLS search algorithm is complete if the solution is above the depth-limit.
Optimal: Depth-limited search can be viewed as a special case of DFS, and it is also not optimal
even if ℓ>d.
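A hedged recursive sketch of depth-limited search that distinguishes the two failure values described above; the graph and limit values are illustrative assumptions.

```python
# Depth-limited search: DFS that stops at a given depth limit.
def depth_limited_search(graph, node, goal, limit, path=None):
    path = (path or []) + [node]
    if node == goal:
        return path                          # solution found within the limit
    if limit == 0:
        return "cutoff"                      # cutoff failure value: depth limit reached
    cutoff_occurred = False
    for successor in graph.get(node, []):
        result = depth_limited_search(graph, successor, goal, limit - 1, path)
        if result == "cutoff":
            cutoff_occurred = True
        elif result is not None:
            return result
    return "cutoff" if cutoff_occurred else None   # None = standard failure value

graph = {"S": ["A", "B"], "A": ["C"], "B": ["D"], "C": [], "D": ["G"], "G": []}
print(depth_limited_search(graph, "S", "G", limit=3))   # ['S', 'B', 'D', 'G']
print(depth_limited_search(graph, "S", "G", limit=2))   # 'cutoff'
```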
Uniform-cost Search:
Uniform-cost search is a searching algorithm used for traversing a weighted tree or graph. This
algorithm comes into play when a different cost is available for each edge. The primary goal of
the uniform-cost search is to find a path to the goal node which has the lowest cumulative cost.
Uniform-cost search expands nodes according to their path costs from the root node. It can be
used to solve any graph/tree where the optimal cost is in demand. A uniform-cost search
algorithm is implemented by the priority queue. It gives maximum priority to the lowest
cumulative cost. Uniform cost search is equivalent to BFS algorithm if the path cost of all edges
is the same.
Advantages:
o Uniform cost search is optimal because at every state the path with the least cost is
chosen.
Disadvantages:
o It does not care about the number of steps involved in the search and is only concerned about the path cost, due to which this algorithm may get stuck in an infinite loop.
Example:
Completeness:
Uniform-cost search is complete, such as if there is a solution, UCS will find it.
Time Complexity:
Let C* be the cost of the optimal solution and ε the minimum cost of each step toward the goal node. Then the number of steps is C*/ε + 1 (we take +1 because we start from state 0 and end at C*/ε). Hence, the worst-case time complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).
Space Complexity:
By the same logic, the worst-case space complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).
Optimal:
Uniform-cost search is always optimal as it only selects a path with the lowest path cost.
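A short sketch of uniform-cost search with a priority queue ordered by cumulative path cost g(n); the weighted graph is an assumed example, not the one from the omitted figure.

```python
# Uniform-cost search: always expand the node with the lowest cumulative path cost.
import heapq

def uniform_cost_search(graph, start, goal):
    frontier = [(0, start, [start])]             # priority queue of (path cost, node, path)
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)   # cheapest frontier entry
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for successor, step_cost in graph.get(node, []):
            heapq.heappush(frontier, (cost + step_cost, successor, path + [successor]))
    return None

graph = {"S": [("A", 1), ("G", 12)], "A": [("B", 3), ("C", 1)],
         "B": [("D", 3)], "C": [("D", 1), ("G", 2)], "D": [("G", 3)], "G": []}
print(uniform_cost_search(graph, "S", "G"))      # (4, ['S', 'A', 'C', 'G'])
```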
Iterative Deepening Depth-first Search:
The iterative deepening algorithm is a combination of the DFS and BFS algorithms. This search
algorithm finds out the best depth limit and does it by gradually increasing the limit until a goal
is found.
This algorithm performs depth-first search up to a certain "depth limit", and it keeps increasing
the depth limit after each iteration until the goal node is found.
This Search algorithm combines the benefits of Breadth-first search's fast search and depth-first
search's memory efficiency.
The iterative search algorithm is useful uninformed search when search space is large, and depth
of goal node is unknown.
Advantages:
o Itcombines the benefits of BFS and DFS search algorithm in terms of fast search and
memory efficiency.
Disadvantages:
o The main drawback of IDDFS is that it repeats all the work of the previous phase.
Example:
Following tree structure is showing the iterative deepening depth-first search. IDDFS algorithm
performs various iterations until it does not find the goal node. The iteration performed by the
algorithm is given as:
1'st Iteration-----> A
2'nd Iteration----> A, B, C
3'rd Iteration------>A, B, D, E, C, F, G
4'th Iteration------>A, B, D, H, I, E, C, F, K, G
In the fourth iteration, the algorithm will find the goal node.
Completeness:
This algorithm is complete if the branching factor is finite.
Time Complexity:
Let's suppose b is the branching factor and the depth is d; then the worst-case time complexity is O(b^d).
Space Complexity:
The space complexity of IDDFS is O(bd), where b is the branching factor and d is the depth of the shallowest goal.
Optimal:
IDDFS algorithm is optimal if path cost is a non- decreasing function of the depth of the node.
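A compact sketch of IDDFS that simply re-runs a depth-limited DFS with an increasing limit; the graph reproduces the branching of the iterations listed above, with the goal K assumed to sit at depth 3.

```python
# Iterative deepening DFS: repeated depth-limited search with growing depth limits.
def depth_limited(graph, node, goal, limit, path=None):
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None
    for successor in graph.get(node, []):
        result = depth_limited(graph, successor, goal, limit - 1, path)
        if result is not None:
            return result
    return None

def iterative_deepening_search(graph, start, goal, max_depth=10):
    for depth in range(max_depth + 1):           # limits 0, 1, 2, ... until the goal is found
        result = depth_limited(graph, start, goal, depth)
        if result is not None:
            return depth, result
    return None

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"], "D": ["H", "I"],
         "E": [], "F": ["K"], "G": [], "H": [], "I": [], "K": []}
print(iterative_deepening_search(graph, "A", "K"))   # (3, ['A', 'C', 'F', 'K'])
```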
Bidirectional Search:
Bidirectional search runs two simultaneous searches, one from the initial state (called the forward search) and the other from the goal node (called the backward search), to find the goal node.
Bidirectional search replaces one single search graph with two small subgraphs in which one
starts the search from an initial vertex and other starts from goal vertex. The search stops when
these two graphs intersect each other.
Bidirectional search can use search techniques such as BFS, DFS, DLS, etc.
Advantages:
o Bidirectional search is fast.
o Bidirectional search requires less memory.
Disadvantages:
o Implementation of the bidirectional search tree is difficult.
o In bidirectional search, one should know the goal state in advance.
Example:
In the below search tree, bidirectional search algorithm is applied. This algorithm divides one
graph/tree into two sub-graphs. It starts traversing from node 1 in the forward direction and starts
from goal node 16 in the backward direction.
So far we have talked about the uninformed search algorithms which looked through search
space for all possible solutions of the problem without having any additional knowledge about
search space. But an informed search algorithm contains knowledge such as how far we are from the goal, the path cost, how to reach the goal node, etc. This knowledge helps agents to explore less of the search space and find the goal node more efficiently.
The informed search algorithm is more useful for large search space. Informed search algorithm
uses the idea of heuristic, so it is also called Heuristic search.
Heuristics function: Heuristic is a function which is used in Informed Search, and it finds the
most promising path. It takes the current state of the agent as its input and produces an estimate of how close the agent is to the goal. The heuristic method, however, might not always give the best solution, but it is guaranteed to find a good solution in a reasonable time. The heuristic
function estimates how close a state is to the goal. It is represented by h(n), and it calculates the
cost of an optimal path between the pair of states. The value of the heuristic function is always
positive.
Pure heuristic search is the simplest form of heuristic search algorithms. It expands nodes based
on their heuristic value h(n). It maintains two lists, OPEN and CLOSED list. In the CLOSED
list, it places those nodes which have already expanded and in the OPEN list, it places nodes
which have yet not been expanded.
On each iteration, the node n with the lowest heuristic value is expanded, generating all its successors, and n is placed in the CLOSED list. The algorithm continues until a goal state is found.
In informed search we will discuss the two main algorithms given below:
o Greedy Best-first Search Algorithm
o A* Search Algorithm
Greedy Best-first Search:
Greedy best-first search algorithm always selects the path which appears best at that moment. It
is the combination of depth-first search and breadth-first search algorithms. It uses the heuristic
function and search. Best-first search allows us to take the advantages of both algorithms. With
the help of best-first search, at each step, we can choose the most promising node. In the best
first search algorithm, we expand the node which is closest to the goal node, and the closest cost is estimated by the heuristic function, i.e.
f(n) = h(n)
Advantages:
o Best first search can switch between BFS and DFS by gaining the advantages of both the
algorithms.
o This algorithm is more efficient than BFS and DFS algorithms.
Disadvantages:
o It can behave as an unguided depth-first search in the worst case scenario.
o It can get stuck in a loop as DFS.
o This algorithm is not optimal.
Example:
Consider the below search problem, and we will traverse it using greedy best-first search. At
each iteration, each node is expanded using evaluation function f(n)=h(n) , which is given in the
below table.
In this search example, we are using two lists which are OPEN and CLOSED Lists. Following
are the iteration for traversing the above example.
Time Complexity: The worst-case time complexity of greedy best-first search is O(b^m).
Space Complexity: The worst-case space complexity of greedy best-first search is O(b^m), where m is the maximum depth of the search space.
Complete: Greedy best-first search is also incomplete, even if the given state space is finite.
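A sketch of greedy best-first search ordered purely by h(n); since the heuristic table of the example above is not reproduced here, the graph and h values below are assumptions chosen only for illustration.

```python
# Greedy best-first search: always expand the node with the smallest heuristic h(n).
import heapq

def greedy_best_first_search(graph, h, start, goal):
    open_list = [(h[start], start, [start])]     # OPEN list ordered by h(n) only
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list)
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)                         # expanded nodes go to the CLOSED list
        for successor in graph.get(node, []):
            if successor not in closed:
                heapq.heappush(open_list, (h[successor], successor, path + [successor]))
    return None

graph = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E", "F"],
         "E": ["H"], "F": ["I", "G"], "C": [], "D": [], "H": [], "I": [], "G": []}
h = {"S": 13, "A": 12, "B": 4, "C": 7, "D": 3, "E": 8, "F": 2, "H": 4, "I": 9, "G": 0}
print(greedy_best_first_search(graph, h, "S", "G"))   # ['S', 'B', 'F', 'G']
```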
A* search is the most commonly known form of best-first search. It uses heuristic function h(n),
and cost to reach the node n from the start state g(n). It has combined features of UCS and
greedy best-first search, by which it solves the problem efficiently. The A* search algorithm finds the shortest path through the search space using the heuristic function. This search algorithm expands fewer nodes of the search tree and provides an optimal result faster. The A* algorithm is similar to UCS except that it uses g(n) + h(n) instead of g(n).
In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence we can combine both costs as f(n) = g(n) + h(n); this sum is called the fitness number.
Algorithm of A* search:
Step 1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not; if the list is empty, then return failure and stop.
Step 3: Select the node from the OPEN list which has the smallest value of evaluation function
(g+h), if node n is goal node then return success and stop, otherwise
Step 4: Expand node n and generate all of its successors, and put n into the closed list. For each
successor n', check whether n' is already in the OPEN or CLOSED list, if not then compute
evaluation function for n' and place into Open list.
Step 5: Else, if node n' is already in OPEN or CLOSED, then it should be attached to the back pointer which reflects the lowest g(n') value.
Advantages:
o The A* search algorithm is better than other search algorithms.
o A* search algorithm is optimal and complete.
o This algorithm can solve very complex problems.
Disadvantages:
o It does not always produce the shortest path, as it is mostly based on heuristics and approximation.
o A* search algorithm has some complexity issues.
o The main drawback of A* is memory requirement as it keeps all generated nodes in the
memory, so it is not practical for various large-scale problems.
Example:
In this example, we will traverse the given graph using the A* algorithm. The heuristic value of
all states is given in the below table so we will calculate the f(n) of each state using the formula
f(n)= g(n) + h(n), where g(n) is the cost to reach any node from start state.
Solution:
Iteration 3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7), (S-->G, 10)}
Iteration 4 will give the final result, as S--->A--->C--->G it provides the optimal path with cost
6.
Points to remember:
o A* algorithm returns the path which occurred first, and it does not search for all
remaining paths.
o The efficiency of A* algorithm depends on the quality of heuristic.
o A* algorithm expands all nodes which satisfy the condition f(n) < C*, where C* is the cost of the optimal solution.
o Admissible: the first condition required for optimality is that h(n) should be an admissible heuristic for A* tree search. An admissible heuristic is optimistic in nature.
o Consistency: Second required condition is consistency for only A* graph-search.
If the heuristic function is admissible, then A* tree search will always find the least cost path.
Time Complexity: The time complexity of A* search algorithm depends on heuristic function,
and the number of nodes expanded is exponential to the depth of solution d. So the time
complexity is O(b^d), where b is the branching factor.
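A hedged sketch of A* that expands by f(n) = g(n) + h(n); the edge costs and heuristic values are assumptions chosen so that the run reproduces the optimal path S→A→C→G with cost 6 stated in the iterations above.

```python
# A* search: expand the node with the smallest f(n) = g(n) + h(n).
import heapq

def a_star_search(graph, h, start, goal):
    open_list = [(h[start], 0, start, [start])]      # (f, g, node, path)
    best_g = {start: 0}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return g, path
        for successor, step_cost in graph.get(node, []):
            new_g = g + step_cost
            # keep only the cheapest known path (lowest g value) to each node
            if new_g < best_g.get(successor, float("inf")):
                best_g[successor] = new_g
                heapq.heappush(open_list,
                               (new_g + h[successor], new_g, successor, path + [successor]))
    return None

graph = {"S": [("A", 1), ("G", 10)], "A": [("B", 2), ("C", 1)],
         "B": [("D", 5)], "C": [("D", 3), ("G", 4)], "D": [("G", 2)], "G": []}
h = {"S": 5, "A": 3, "B": 4, "C": 2, "D": 6, "G": 0}
print(a_star_search(graph, h, "S", "G"))             # (6, ['S', 'A', 'C', 'G'])
```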
What is Heuristics?
A heuristic is a technique that is used to solve a problem faster than classic methods. These techniques are used to find an approximate solution to a problem when classical methods cannot.
Heuristics are said to be the problem-solving techniques that result in practical and quick
solutions.
Heuristics are used in situations in which there is the requirement of a short-term solution. On
facing complex situations with limited resources and time, Heuristics can help the companies to
make quick decisions by shortcuts and approximated calculations. Most of the heuristic methods
involve mental shortcuts to make decisions on past experiences.
The heuristic method might not always provide us the finest solution, but it is assured that it
helps us find a good solution in a reasonable time.
Based on context, there can be different heuristic methods that correlate with the problem's
scope. The most common heuristic methods are - trial and error, guesswork, the process of
elimination, historical data analysis. These methods involve simply available information that is
not particular to the problem but is most appropriate. They can include representative, affect, and
availability heuristics.
Direct Heuristic Search Techniques in AI:
It includes Blind Search, Uninformed Search, and Blind control strategy. These search
techniques are not always possible as they require much memory and time. These techniques
search the complete space for a solution and use the arbitrary ordering of operations.
The examples of Direct Heuristic search techniques include Breadth-First Search (BFS) and
Depth First Search (DFS).
Weak Heuristic Search Techniques in AI:
It includes Informed Search, Heuristic Search, and Heuristic control strategy. These techniques
are helpful when they are applied properly to the right types of tasks. They usually require
domain-specific information.
The examples of Weak Heuristic search techniques include Best First Search (BFS) and A*.
Before describing certain heuristic techniques, let's see some of the techniques listed below:
o Bidirectional Search
o A* search
o Simulated Annealing
o Hill Climbing
o Best First search
o Beam search
Hill Climbing:
It is a technique for optimizing mathematical problems. Hill climbing is widely used when a good heuristic is available.
It is also called greedy local search as it only looks to its good immediate neighbor state and not beyond that. The steps of a simple hill-climbing algorithm are listed below:
Step 1: Evaluate the initial state. If it is the goal state, then return success and stop.
Step 2: Loop until a solution is found or there is no new operator left to apply.
Step 3: Select and apply an operator to the current state.
Step 4: Check the new state:
If it is the goal state, then return success and quit.
Else if it is better than the current state, then assign the new state as the current state.
Else if it is not better than the current state, then return to Step 2.
Step 5: Exit.
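A tiny Python sketch of the hill-climbing loop (a steepest-ascent variant: it moves to the best neighbor and stops when no neighbor is better); the objective function and neighbor generator are purely illustrative assumptions.

```python
# Hill climbing: keep moving to a better neighbor until none exists (a local maximum).
def hill_climbing(start, neighbours, value):
    current = start
    while True:
        candidates = neighbours(current)
        if not candidates:
            return current
        best = max(candidates, key=value)
        if value(best) <= value(current):
            return current                 # no better neighbor: local or global maximum
        current = best                     # assign the better neighbor as the current state

value = lambda x: -(x - 7) ** 2            # toy objective with a single peak at x = 7
neighbours = lambda x: [x - 1, x + 1]      # neighbors of an integer state
print(hill_climbing(0, neighbours, value)) # climbs step by step from 0 up to 7
```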
Best-first Search Algorithm (Greedy Search):
This algorithm always chooses the path which appears best at that moment. It is the combination of depth-first search and breadth-first search algorithms. It lets us take the benefit of both algorithms. It uses the heuristic function and search. With the help of the best-first search, at each step, we can choose the most promising node.
Step 1: Place the starting node into the OPEN list.
Step 2: If the OPEN list is empty, stop and return failure.
Step 3: Remove the node n from the OPEN list which has the lowest value of h(n), and place it in the CLOSED list.
Step 4: Expand node n and generate its successors.
Step 5: Check each successor of node n, and find whether any node is a goal node or not. If any
successor node is the goal node, then return success and stop the search, else continue to next
step.
Step 6: For each successor node, the algorithm checks the evaluation function f(n) and then checks whether the node has been in either the OPEN or CLOSED list. If the node has not been in both lists, then add it to the OPEN list.
Step 7: Return to Step 2.
A* Search Algorithm
A* search is the most commonly known form of best-first search. It uses the heuristic function
h(n) and cost to reach the node n from the start state g(n). It has combined features of UCS and
greedy best-first search, by which it solves the problem efficiently.
It finds the shortest path through the search space using the heuristic function. This search algorithm expands fewer nodes of the search tree and gives optimal results faster.
Algorithm of A* search:
Step 1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not. If the list is empty, then return failure and stop.
Step 3: Select the node from the OPEN list which has the smallest value of the evaluation
function (g+h). If node n is the goal node, then return success and stop, otherwise.
Step 4: Expand node n and generate all of its successors, and put n into the closed list. For each
successor n', check whether n' is already in the OPEN or CLOSED list. If not, then compute the
evaluation function for n' and place it into the Open list.
Step 5: Else, if node n' is already in OPEN or CLOSED, then it should be attached to the back pointer which reflects the lowest g(n') value.
Some of the real-life examples of heuristics that people use as a way to solve a problem:
o Common sense: It is a heuristic that is used to solve a problem based on the observation
of an individual.
o Rule of thumb: In heuristics, we also use a term rule of thumb. This heuristic allows an
individual to make an approximation without doing an exhaustive search.
o Working backward: It lets an individual solve a problem by assuming that the problem has already been solved and working backward in their mind to see how such a solution could be reached.
o Availability heuristic: It allows a person to judge a situation based on the examples of
similar situations that come to mind.
Types of heuristics
There are various types of heuristics, including the availability heuristic, affect heuristic and
representative heuristic. Each heuristic type plays a role in decision-making. Let's discuss about
the Availability heuristic, affect heuristic, and Representative heuristic.
Availability heuristic
Availability heuristic is said to be the judgment that people make regarding the likelihood of an
event based on information that quickly comes into mind. On making decisions, people typically
rely on the past knowledge or experience of an event. It allows a person to judge a situation
based on the examples of similar situations that come to mind.
Representative heuristic
It occurs when we evaluate an event's probability on the basis of its similarity with another event.
Example: We can understand the representative heuristic through the example of product packaging, as consumers tend to associate a product's quality with its external packaging. If a company packages its products in a way that reminds you of a high-quality, well-known product, consumers will assume that product has the same quality as the branded product.
So, instead of evaluating the product based on its quality, customers judge its quality based on the similarity in packaging.
Affect heuristic
It is based on the negative and positive feelings that are linked with a certain stimulus. It includes
quick feelings that are based on past beliefs. Its theory is one's emotional response to a stimulus
that can affect the decisions taken by an individual.
When people take little time to evaluate a situation carefully, they might base their decisions on their emotional response.
If someone carefully analyzes the benefits and risks of consuming fast food, they might decide
that fast food is unhealthy. But people rarely take time to evaluate everything they see and
generally make decisions based on their automatic emotional response. So, Fast food companies
present advertisements that rely on such type of Affect heuristic for generating a positive
emotional response which results in sales.
Limitation of heuristics
o Although heuristics speed up our decision-making process and help us to solve problems, they can also introduce errors: just because something has worked accurately in the past does not mean that it will work again.
o It will be hard to find alternative solutions or ideas if we always rely on existing solutions or heuristics.
Heuristic Functions in AI: As we have already seen, an informed search makes use of heuristic functions in order to reach the goal node in a more prominent way. There are usually several pathways in a search tree to reach the goal node from the current node, so the selection of a good heuristic function certainly matters. A good heuristic function is determined by its efficiency; the more information it captures about the problem, the more processing time it tends to require.
Some toy problems, such as 8-puzzle, 8-queen, tic-tac-toe, etc., can be solved more efficiently
with the help of a heuristic function. Let’s see how
Consider the following 8-puzzle problem where we have a start state and a goal state. Our task is
to slide the tiles of the current/start state and place it in an order followed in the goal state. There
can be four moves either left, right, up, or down. There can be several ways to convert the
current/start state to the goal state, but, we can use a heuristic function h(n) to solve the problem
more efficiently.
It is seen from the above state space tree that the goal state is minimized from h(n)=3 to h(n)=0.
However, we can create and use several heuristic functions as per the requirement. It is also clear from the above example that a heuristic function h(n) can be defined as the information required to solve a given problem more efficiently. The information can be related to the nature of the state, the cost of transforming from one state to another, goal node characteristics, etc., and it is expressed as a heuristic function.
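As a concrete sketch, one commonly used heuristic for the 8-puzzle is the number of misplaced tiles; the start and goal configurations below are assumptions chosen to echo the h(n) = 3 to h(n) = 0 reduction mentioned above, since the original figure is not reproduced here.

```python
# Heuristic h(n) for the 8-puzzle: number of misplaced tiles (the blank, 0, is ignored).
def misplaced_tiles(state, goal):
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

# States are written row by row; 0 stands for the blank tile.
start = (1, 2, 3,
         4, 5, 0,
         6, 7, 8)
goal  = (1, 2, 3,
         4, 5, 6,
         7, 8, 0)
print(misplaced_tiles(start, goal))   # h(start) = 3
print(misplaced_tiles(goal, goal))    # h(goal)  = 0
```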
Local Search Algorithms and Optimization Problems:
The informed and uninformed search algorithms expand the nodes systematically, keeping different paths in memory and selecting the best suitable path, which leads to a solution state required to reach the goal node. But beyond these "classical search algorithms," we have some "local search algorithms" where the path cost does not matter; the focus is only on the solution state needed to reach the goal node.
A local search algorithm completes its task by traversing on a single current node rather than
multiple paths and following the neighbors of that node generally.
Although local search algorithms are not systematic, still they have the following two
advantages:
Local search algorithms use a very little or constant amount of memory as they operate
only on a single path.
Most often, they find a reasonable solution in large or infinite state spaces where the
classical or systematic algorithms do not work.
Does the local search algorithm work for a pure optimized problem?
Yes, the local search algorithm works for pure optimized problems. A pure optimization problem
is one where all the nodes can give a solution. But the target is to find the best state out of all
according to the objective function. Unfortunately, the pure optimization problem fails to find
high-quality solutions to reach the goal state from the current state.
Note: An objective function is a function whose value is either minimized or maximized in
different contexts of the optimization problems. In the case of search algorithms, an objective
function can be the path cost for reaching the goal node, etc.
Working of a Local search algorithm
Let's understand the working of a local search algorithm with the help of an example:
Consider the below state-space landscape having both:
Location: defined by the state.
Elevation: defined by the value of the objective function or heuristic cost function.
The local search algorithm explores the above landscape by finding the following two points:
Global Minimum: If the elevation corresponds to the cost, then the task is to find the
lowest valley, which is known as Global Minimum.
Global Maxima: If the elevation corresponds to an objective function, then it finds the
highest peak which is called as Global Maxima. It is the highest point in the valley.
Some different types of local search algorithms are:
Hill-climbing Search
Simulated Annealing
Local Beam Search
The state-space landscape of hill climbing has the following regions:
Global Maximum: It is the highest point on the hill, which is the goal state.
Local Maximum: It is the peak higher than all other peaks but lower than the global
maximum.
Flat local maximum: It is the flat area over the hill where it has no uphill or downhill. It
is a saturated point of the hill.
Shoulder: It is also a flat area where the summit is possible.
Current state: It is the current position of the person.
Hill climbing can get stuck in the following regions:
Local Maxima: It is a peak of the mountain which is higher than all its neighboring states but lower than the global maxima. It is not the goal peak because there is another peak higher than it.
Plateau: It is a flat surface area where no uphill exists. It becomes difficult for the
climber to decide that in which direction he should move to reach the goal point.
Sometimes, the person gets lost in the flat area.
Ridges: It is a challenging problem where the person finds two or more local maxima of
the same height commonly. It becomes difficult for the person to navigate the right point
and stuck to that point itself.
Simulated Annealing
Simulated annealing is similar to the hill climbing algorithm. It works on the current situation. It
picks a random move instead of picking the best move. If the move leads to the improvement
of the current situation, it is always accepted as a step towards the solution state, else it accepts
the move having a probability less than 1. This search technique was first used in 1980 to
solve VLSI layout problems. It is also applied for factory scheduling and other large
optimization tasks.
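A short hedged sketch of the simulated-annealing acceptance rule, where a worse move is accepted with a probability that falls as the temperature cools; the objective function, cooling rate, and step count are illustrative assumptions.

```python
# Simulated annealing: random moves; worse moves accepted with probability exp(delta / T).
import math, random

def simulated_annealing(start, neighbour, value, t0=10.0, cooling=0.95, steps=500):
    current, t = start, t0
    for _ in range(steps):
        if t < 1e-6:
            break
        candidate = neighbour(current)
        delta = value(candidate) - value(current)
        # an improving move is always accepted; a worse one only with probability < 1
        if delta > 0 or random.random() < math.exp(delta / t):
            current = candidate
        t *= cooling                        # gradually lower the "temperature"
    return current

# Toy objective: a small local bump near x = 2, global peak at x = 8 (illustrative only)
value = lambda x: -abs(x - 8) + (2 if abs(x - 2) < 1 else 0)
neighbour = lambda x: x + random.uniform(-1, 1)
print(round(simulated_annealing(0.0, neighbour, value), 2))   # typically ends near 8
```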
Local Beam Search
Local beam search is quite different from random-restart search. It keeps track of k states instead
of just one. It selects k randomly generated states, and expand them at each step. If any state is a
goal state, the search stops with success. Else it selects the best k successors from the complete
list and repeats the same process. In random-restart search, each search process runs independently, but in local beam search, the necessary information is shared between the parallel search processes.
Disadvantages of Local Beam search
This search can suffer from a lack of diversity among the k states.
It is an expensive version of hill climbing search.
1. Game Theory:
Game theory is basically a branch of mathematics that is used to model typical strategic interactions between different players (agents), all of which are equally rational, in a context with predefined rules (of playing or maneuvering) and outcomes. Every player or agent is a rational entity who is selfish and tries to maximize the reward to be obtained using a particular strategy. All the players abide by certain rules in order to receive a predefined payoff, a reward obtained after a certain outcome. Hence, a GAME can be defined as a set of players, actions, strategies, and a final payoff for which all the players are competing.
Game Theory has now become a describing factor for both Machine Learning algorithms and
many daily life situations.
Consider the SVM (Support Vector Machine) for instance. According to Game Theory, the
SVM is a game between 2 players where one player challenges the other to find the best hyper-
plane after providing the most difficult points for classification. The final payoff of this game is a solution that will be a trade-off between the strategic abilities of the two competing players.
Nash equilibrium:
Nash equilibrium can be considered the essence of Game Theory. It is basically a state, a point
of equilibrium of collaboration of multiple players in a game. Nash Equilibrium guarantees
maximum profit to each player.
Let us try to understand this with the help of Generative Adversarial Networks (GANs).
What is GAN?
It is a combination of two neural networks: the Discriminator and the Generator. The Generator Neural Network is fed random input from which it produces new sample images, which are made to resemble the actual training images as closely as possible. Once the images have been produced, they are sent to the Discriminator Neural Network. This neural
network judges the images sent to it and classifies them as generated images and actual input
images. If the image is classified as the original image, the DNN changes its parameters of
judging. If the image is classified as a generated image, the image is rejected and returned to
the GNN. The GNN then alters its parameters in order to improve the quality of the image
produced.
This is a competitive process which goes on until both neural networks do not require to make
any changes in their parameters and there can be no further improvement in both neural
networks. This state of no further improvement is known as NASH EQUILIBRIUM. In other
words, GAN is a 2-player competitive game where both players are continuously optimizing
themselves to find a Nash Equilibrium.
In any game, one of the agents is required to disclose their strategy in front of the other agents.
After the revelation, if none of the players changes their strategies, it is understood that the
game has reached Nash Equilibrium.
Now that we are aware of the basics of Game Theory, let us try to understand how Nash
Equilibrium is attained in a simultaneous game. There are many examples but the most famous
is the Prisoner’s Dilemma. There are some more examples such as the Closed-bag exchange
Game, the Friend or Foe Game, and the iterated Snowdrift Game.
In all these games, two players are involved and the final payoff is the result of a decision that
both players must make. Each player has to choose between defection and co-operation. If both
players cooperate, the final payoff is positive for both. If both defect, the final payoff is negative
for both. If one player defects while the other cooperates, the payoff is positive for one and
negative for the other.
Here, Nash equilibrium plays an important role: only if both players work out a strategy that
benefits each other and gives both a positive payoff is the solution to this problem optimal.
There are many more real examples and a number of pieces of code that try to solve this
dilemma. The basic essence, however, is the attainment of the Nash Equilibrium in an
uncomfortable situation.
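As an illustration, the short Python sketch below encodes one common textbook payoff matrix for the Prisoner's Dilemma (the particular numbers are an assumption, not taken from these notes) and checks which joint choices form a Nash equilibrium.

payoffs = {                      # (row player's choice, column player's choice)
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}
actions = ["cooperate", "defect"]

def is_nash(a1, a2):
    """Neither player can gain by unilaterally changing their own action."""
    u1, u2 = payoffs[(a1, a2)]
    best1 = all(payoffs[(alt, a2)][0] <= u1 for alt in actions)
    best2 = all(payoffs[(a1, alt)][1] <= u2 for alt in actions)
    return best1 and best2

print([(a1, a2) for a1 in actions for a2 in actions if is_nash(a1, a2)])
# -> [('defect', 'defect')]: mutual defection is the unique Nash equilibrium,
#    even though mutual cooperation would give both players a better payoff.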
Where is GAME THEORY now?
Game Theory is increasingly becoming a part of the real-world in its various applications in
areas like public health services, public safety, and wildlife. Currently, game theory is being
used in adversary training in GANs, multi-agent systems, and imitation and reinforcement
learning. In the case of perfect information and symmetric games, many Machine Learning and
Deep Learning techniques are applicable. The real challenge lies in the development of
techniques to handle incomplete information games, such as Poker. The complexity of the
game lies in the fact that there are too many combinations of cards and the uncertainty of the
cards being held by the various players.
Types of Games:
Currently, there are about 5 types of classification of games. They are as follows:
1. Zero-Sum and Non-Zero-Sum Games: In non-zero-sum games, there are multiple players
and all of them have the option to gain a benefit from any move by another player. In zero-
sum games, however, if one player gains something, the other players are bound to lose a
corresponding payoff.
2. Simultaneous and Sequential Games: Sequential games are the more popular games where
every player is aware of the movement of another player. Simultaneous games are more
difficult as in them, the players are involved in a concurrent game. BOARD GAMES are the
perfect example of sequential games and are also referred to as turn-based or extensive-form
games.
Humans’ intellectual capacities have been engaged by games for as long as civilization has
existed, sometimes to an alarming degree. Games are an intriguing subject for AI researchers
because of their abstract character. A game’s state is simple to depict, and actors are usually
limited to a small number of actions with predetermined results. Physical games, such as
croquet and ice hockey, contain significantly more intricate descriptions, a much wider variety
of possible actions, and rather ambiguous regulations defining the legality of activities. With
the exception of robot soccer, these physical games have not piqued the AI community’s
interest.
Games are usually intriguing because they are difficult to solve. Chess, for example, has an
average branching factor of around 35, and games frequently stretch to 50 moves per player,
so the search tree has roughly 35^100, or about 10^154, nodes (even though the search graph has
"only" about 10^40 unique nodes). As a result, games, like the real world, require the ability
to make some sort of decision even when calculating the best option is impossible.
Inefficiency is also heavily punished in games. Whereas a half-efficient implementation of A*
search will merely take twice as long to complete, a chess program that is half as efficient in
using its available time will almost certainly be beaten, all other factors being
equal. As a result of this research, a number of intriguing suggestions for making the most use
of time have emerged.
Let us start with games with two players, whom we’ll refer to as MAX and MIN for obvious
reasons. MAX is the first to move, and then they take turns until the game is finished. At the
conclusion of the game, the victorious player receives points, while the loser receives
penalties. A game can be formalized as a type of search problem that has the following
elements:
S0: The initial state of the game, which describes how it is set up at the start.
Player (s): Defines which player in a state has the move.
Actions (s): Returns a state’s set of legal moves.
Result (s, a): A transition model that defines a move’s outcome.
Terminal-Test (s): A terminal test that returns true if the game is over but false otherwise.
Terminal states are those in which the game has come to a conclusion.
Utility (s, p): A utility function (also known as a payout function or objective function )
determines the final numeric value for a game that concludes in the terminal state s for
player p. The result in chess is a win, a loss, or a draw, with values of +1, 0, or 1/2.
Backgammon’s payoffs range from 0 to +192, but certain games have a greater range of
possible outcomes. A zero-sum game is defined (somewhat confusingly) as one in which the total
reward to all players is the same for every instance of the game. Chess is a zero-sum game because
each game has a total payoff of 0 + 1, 1 + 0, or 1/2 + 1/2. "Constant-sum" would have been a
better name, but zero-sum is the traditional term and makes sense if you imagine each participant
is charged an entry fee of 1/2.
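These elements can be summarized by a small abstract interface. The Python sketch below only mirrors the definitions above; the class and method names are illustrative, not taken from any particular library.

class Game:
    def initial_state(self):            # S0: how the board is set up at the start
        raise NotImplementedError
    def player(self, s):                # Player(s): whose move it is in state s
        raise NotImplementedError
    def actions(self, s):               # Actions(s): legal moves in state s
        raise NotImplementedError
    def result(self, s, a):             # Result(s, a): state reached by making move a in s
        raise NotImplementedError
    def terminal_test(self, s):         # Terminal-Test(s): is the game over?
        raise NotImplementedError
    def utility(self, s, p):            # Utility(s, p): numeric outcome of terminal s for player p
        raise NotImplementedError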
The game tree for the game is defined by the beginning state, ACTIONS function, and
RESULT function—a tree in which the nodes are game states and the edges represent
movements. The figure below depicts a portion of the tic-tac-toe game tree (noughts and
crosses). MAX may make nine different maneuvers from his starting position. The game
alternates between MAX placing an X and MIN placing an O until we reach leaf nodes
corresponding to terminal states, such as one player having three in a row or all of the
squares being filled. The utility value of the terminal state from the perspective of MAX is
shown by the number on each leaf node; high values are thought to be beneficial for MAX
and bad for MIN
The game tree for tic-tac-toe is relatively small, with fewer than 9! = 362,880 terminal nodes.
However, because there are over 10^40 nodes in chess, the game tree is better viewed as a
theoretical construct that cannot be realized in the actual world. But, no matter how big the
game tree is, MAX’s goal is to find a solid move. A tree that is superimposed on the whole
game tree and examines enough nodes to allow a player to identify what move to make is
referred to as a search tree.
A sequence of actions leading to a goal state—a terminal state that is a win—would be the best
solution in a typical search problem. MIN has something to say about it in an adversarial
search. MAX must therefore devise a contingent strategy that specifies MAX's move from the
initial state, then MAX's moves in the states resulting from every possible MIN response,
then MAX's moves in the states resulting from every possible MIN reaction to those moves,
and so on. This is quite similar to the AND-OR search method, with MAX acting as OR and
MIN acting as AND. Against an infallible opponent, an optimal strategy produces results
that are at least as good as any other strategy. We'll start by demonstrating how to find the
best strategy.
We’ll move to the trivial game in the figure below since even a simple game like tic-tac-toe is
too complex for us to draw the full game tree on one page. MAX’s root node moves are
designated by the letters a1, a2, and a3. MIN's possible replies to a1 are b1, b2, b3, and so
on. This game is over after MAX and MIN each make one move. (In game terms, this tree
consists of two half-moves and is one move deep, each of which is referred to as a ply.) The
terminal states in this game have utility values ranging from 2 to 14.
The optimal strategy can be determined from the minimax value of each node, which we write as
MINIMAX(n), given a game tree. The minimax value of a node is the utility (for MAX) of being in
the corresponding state, assuming that both players play optimally from there to the end of the
game. The minimax value of a terminal state is simply its utility. Given a choice, MAX prefers to
move to a state of maximum value, whereas MIN prefers a state of minimum value. So we have:
MINIMAX(s) = UTILITY(s), if TERMINAL-TEST(s);
           = max over a in Actions(s) of MINIMAX(RESULT(s, a)), if PLAYER(s) = MAX;
           = min over a in Actions(s) of MINIMAX(RESULT(s, a)), if PLAYER(s) = MIN.
Let’s use these definitions to analyze the game tree shown in the figure above. The game’s
UTILITY function provides utility values to the terminal nodes on the bottom level. Because
the first MIN node, B, has three successor states with values of 3, 12, and 8, its minimax value
is 3. Similarly, the other two MIN nodes have minimax value 2. The root node is a MAX node;
its successors have minimax values of 3, 2, and 2, so it has minimax value 3. We can also identify
the minimax decision at the root: action a1 is the best choice for MAX since it leads to the state
with the highest minimax value.
This concept of optimal MAX play requires that MIN plays optimally as well—it maximizes
MAX’s worst-case outcome. What happens if MIN isn’t performing at its best? Then it’s a
simple matter of demonstrating that MAX will do at least as well, and possibly better. Other
strategies may outperform the minimax strategy against suboptimal opponents, but they
necessarily do worse against optimal opponents.
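The minimax computation described above can be sketched in a few lines of Python over the Game interface assumed earlier; this is an illustrative sketch, not an optimized implementation.

def minimax_decision(game, state):
    """Return the action leading to the successor with the highest minimax value."""
    player = game.player(state)
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a), player))

def max_value(game, state, player):
    if game.terminal_test(state):
        return game.utility(state, player)
    return max(min_value(game, game.result(state, a), player)
               for a in game.actions(state))

def min_value(game, state, player):
    if game.terminal_test(state):
        return game.utility(state, player)
    return min(max_value(game, game.result(state, a), player)
               for a in game.actions(state))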
o As we have seen, the number of game states the minimax search algorithm has to
examine is exponential in the depth of the tree. We cannot eliminate the exponent, but
we can effectively cut it in half. There is a technique by which we can compute the
correct minimax decision without checking every node of the game tree; this technique
is called pruning. Because it involves two threshold parameters, alpha and beta, for future
expansion, it is called alpha-beta pruning. It is also called the Alpha-Beta Algorithm.
o Alpha-beta pruning can be applied at any depth of a tree, and sometimes it prunes not
only leaves of the tree but entire sub-trees.
o The two-parameter can be defined as:
a. Alpha: The best (highest-value) choice we have found so far at any point along
the path of Maximizer. The initial value of alpha is -∞.
b. Beta: The best (lowest-value) choice we have found so far at any point along the
path of Minimizer. The initial value of beta is +∞.
Alpha-beta pruning applied to a standard minimax algorithm returns the same move as the
standard algorithm, but it removes all the nodes that do not affect the final decision and only
make the algorithm slow. By pruning these nodes, it makes the algorithm fast.
The condition under which a branch is pruned is:
α >= β
Let's take an example of two-player search tree to understand the working of Alpha-beta
pruning
Step 1: The Max player starts with the first move from node A, where α= -∞ and
β= +∞. These values of alpha and beta are passed down to node B, where again α= -∞ and β= +∞,
and node B passes the same values to its child D.
Step 2: At node D, the value of α is calculated, as it is Max's turn. The value of α is
compared first with 2 and then with 3; max(2, 3) = 3 becomes the value of α at node D,
and the node value will also be 3.
Step 3: Now the algorithm backtracks to node B, where the value of β changes because this is
Min's turn. Now β= +∞ is compared with the value of the subsequent node, i.e. min(∞, 3) = 3,
hence at node B we now have α= -∞ and β= 3.
In the next step, the algorithm traverses the next successor of node B, which is node E, and the
values α= -∞ and β= 3 are passed down as well.
Step 4: At node E, Max takes its turn, and the value of alpha changes. The current
value of alpha is compared with 5, so max(-∞, 5) = 5; hence at node E, α= 5 and β= 3.
Since α >= β, the right successor of E is pruned and the algorithm does not traverse it,
and the value at node E becomes 5.
Step 5: Next, the algorithm backtracks again, from node B to node A. At node A,
the value of alpha is updated; the maximum available value is 3, since max(-∞, 3) = 3,
and β= +∞. These two values are now passed to the right successor of A, which is node C.
At node C, α= 3 and β= +∞, and the same values are passed on to node F.
Step 6: At node F, the value of α is again compared with the left child, which is 0, giving
max(3, 0) = 3, and then with the right child, which is 1, giving max(3, 1) = 3. So α
remains 3, but the node value of F becomes 1.
Step 7: Node F returns the node value 1 to node C; at C, α= 3 and β= +∞. Here the value of
beta is updated: it is compared with 1, so min(∞, 1) = 1. Now at C, α= 3 and β= 1, which
again satisfies the condition α >= β, so the next child of C, which is G, is pruned and the
algorithm does not compute the entire sub-tree rooted at G.
Step 8: C now returns the value 1 to A; the best value for A is max(3, 1) = 3.
The final game tree shows the nodes that were computed and the nodes that were never
computed. For this example, the optimal value for the maximizer is 3.
The effectiveness of alpha-beta pruning is highly dependent on the order in which each
node is examined. Move order is an important aspect of alpha-beta pruning.
Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any of the
leaves of the tree and works exactly like the minimax algorithm. It then consumes
more time because of the alpha-beta bookkeeping; such a move ordering is called worst ordering.
In this case, the best move occurs on the right side of the tree. The time complexity for
such an ordering is O(b^m).
Ideal ordering: The ideal ordering for alpha-beta pruning occurs when a large amount of pruning
happens in the tree and the best moves occur on the left side of the tree. Because we apply DFS,
it searches the left of the tree first and can go twice as deep as the minimax algorithm in the same
amount of time. The complexity with ideal ordering is O(b^(m/2)).
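A compact sketch of alpha-beta pruning over the same assumed Game interface is given below; the pruning tests correspond directly to the α >= β condition discussed above.

import math

def alphabeta_decision(game, state):
    player = game.player(state)
    def max_value(s, alpha, beta):
        if game.terminal_test(s):
            return game.utility(s, player)
        v = -math.inf
        for a in game.actions(s):
            v = max(v, min_value(game.result(s, a), alpha, beta))
            if v >= beta:          # MIN already has a better option elsewhere: prune
                return v
            alpha = max(alpha, v)
        return v
    def min_value(s, alpha, beta):
        if game.terminal_test(s):
            return game.utility(s, player)
        v = math.inf
        for a in game.actions(s):
            v = min(v, max_value(game.result(s, a), alpha, beta))
            if v <= alpha:         # MAX already has a better option elsewhere: prune
                return v
            beta = min(beta, v)
        return v
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), -math.inf, math.inf))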
Monte Carlo Tree Search (MCTS) is a search technique in the field of Artificial Intelligence
(AI). It is a probabilistic and heuristic driven search algorithm that combines the classic tree
search implementations alongside machine learning principles of reinforcement learning.
In tree search, there is always the possibility that the current best action is actually not the
optimal action. In such cases, the MCTS algorithm is useful because it continues to evaluate
other alternatives periodically during the learning phase by executing them, instead of only the
currently perceived optimal strategy. This is known as the "exploration-exploitation trade-off".
MCTS exploits the actions and strategies that are found to be the best so far, but it must also
continue to explore the local space of alternative decisions to find out whether they could
replace the current best.
Exploration helps in exploring and discovering the unexplored parts of the tree, which could
result in finding a more optimal path. In other words, we can say that exploration expands the
tree’s breadth more than its depth. Exploration can be useful to ensure that MCTS is not
overlooking any potentially better paths. But it quickly becomes inefficient in situations with
large number of steps or repetitions. In order to avoid that, it is balanced out by exploitation.
Exploitation sticks to a single path that has the greatest estimated value. This is a greedy
approach and this will extend the tree’s depth more than its breadth. In simple words, UCB
formula applied to trees helps to balance the exploration-exploitation trade-off by periodically
exploring relatively unexplored nodes of the tree and discovering potentially more optimal
paths than the one it is currently exploiting.
For this characteristic, MCTS becomes particularly useful in making optimal decisions in
Artificial Intelligence (AI) problems.
Selection: In this process, the MCTS algorithm traverses the current tree from the root node
using a specific strategy. The strategy uses an evaluation function to optimally select nodes
with the highest estimated value. MCTS uses the Upper Confidence Bound (UCB) formula
applied to trees as the strategy in the selection process to traverse the tree. It balances the
exploration-exploitation trade-off. During tree traversal, a node is selected based on some
parameters that return the maximum value. The formula typically used for this purpose
(UCB1 applied to trees, often written as UCT) is:
Si = xi + C * sqrt(ln t / ni)
where:
Si = value of node i
xi = empirical mean (average reward) of node i
C = an exploration constant
t = total number of simulations performed so far
ni = number of times node i has been visited
When traversing a tree during the selection process, the child node that returns the greatest
value from the above equation will be one that will get selected. During traversal, once a
child node is found which is also a leaf node, the MCTS jumps into the expansion step.
Expansion: In this process, a new child node is added to the tree to that node which was
optimally reached during the selection process.
Simulation: In this process, a simulation is performed by choosing moves or strategies until
a result or predefined state is achieved.
Backpropagation: After determining the value of the newly added node, the rest of the tree
must be updated. In the backpropagation step, the result is propagated back from the new node
to the root node. During this process, the simulation count stored in each node along the path is
incremented, and if the new node's simulation results in a win, the win count is also
incremented.
The above steps can be visually understood by the diagram given below:
These types of algorithms are particularly useful in turn based games where there is no element
of chance in the game mechanics, such as Tic Tac Toe, Connect 4, Checkers, Chess, Go, etc.
This has recently been used by Artificial Intelligence Programs like AlphaGo, to play against
the world’s top Go players. But, its application is not limited to games only. It can be used in
any situation which is described by state-action pairs and simulations used to forecast
outcomes.
As we can see, the MCTS algorithm reduces to a small set of functions that can be reused for
any choice of game or for any optimization strategy.
1. Because the tree grows rapidly after a few iterations, the algorithm requires a huge amount of
memory.
2. There is a reliability issue with Monte Carlo Tree Search. In certain scenarios, a single branch
or path may lead to a loss against the opponent when the method is used for turn-based games.
This is mainly due to the vast number of move combinations: each node may not be visited
enough times to understand its long-run outcome.
3. The MCTS algorithm needs a huge number of iterations to decide the most efficient path
reliably, so there is also a speed issue.
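The four phases can be sketched in Python as follows. The Node fields (children, visits, wins, state, untried_actions, parent) and the add_child helper are assumed names for this illustration, not a standard library API.

import math, random

def ucb1(node, C=1.4):
    """Selection score: exploit the empirical mean, explore rarely visited nodes."""
    if node.visits == 0:
        return math.inf
    return node.wins / node.visits + C * math.sqrt(math.log(node.parent.visits) / node.visits)

def mcts_iteration(root, game):
    node = root
    # 1. Selection: descend using UCB1 until a node with untried actions is reached.
    while not node.untried_actions and node.children:
        node = max(node.children, key=ucb1)
    # 2. Expansion: add one new child for an untried action.
    if node.untried_actions:
        a = node.untried_actions.pop()
        node = node.add_child(a, game.result(node.state, a))
    # 3. Simulation: play random moves until a terminal state is reached.
    s = node.state
    while not game.terminal_test(s):
        s = game.result(s, random.choice(list(game.actions(s))))
    reward = game.utility(s, game.player(root.state))
    # 4. Backpropagation: update visit and win counts from the new node up to the root.
    while node is not None:
        node.visits += 1
        node.wins += reward
        node = node.parent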
5. Stochastic games:
Many unforeseeable external occurrences can place us in unforeseen circumstances in real life.
Many games, such as dice tossing, have a random element to reflect this unpredictability.
These are known as stochastic games. Backgammon is a classic game that mixes skill and luck:
the legal moves are determined by rolling dice at the start of each player's turn. White, for
example, has rolled a 6–5 and has four alternative moves in the backgammon position shown
in the figure below.
This is a standard backgammon position. The object of the game is to get all of one’s pieces off
the board as quickly as possible. White moves in a clockwise direction toward 25, while Black
moves in a counterclockwise direction toward 0. A piece can move to any position unless
multiple opponent pieces are there; if there is exactly one opponent piece, it is captured and must
start over. White has rolled a 6–5 and must pick between four valid moves: (5–10,5–11), (5–11,19–
24), (5–10,10–16), and (5–11,11–16), where the notation (5–11,11–16) denotes moving one
piece from position 5 to 11 and then another from 11 to 16.
Stochastic game tree for a backgammon position
White knows his or her own legal moves, but he or she has no idea how Black will roll, and
thus has no idea what Black’s legal moves will be. That means White won’t be able to build a
normal game tree-like in chess or tic-tac-toe. In backgammon, in addition to M A X and M I N
nodes, a game tree must include chance nodes. The figure below depicts chance nodes as
circles. The possible dice rolls are indicated by the branches leading from each chance node;
each branch is labelled with the roll and its probability. There are 36 different ways to roll two
dice, each equally likely, yet there are only 21 distinct rolls because a 6–5 is the same as a 5–6.
P (1–1) = 1/36 because each of the six doubles (1–1 through 6–6) has a probability of 1/36.
Each of the other 15 rolls has a 1/18 chance of happening.
The following phase is to learn how to make good decisions. Obviously, we want to choose the
move that will put us in the best position. Positions, on the other hand, do not have specific
minimum and maximum values. Instead, we can only compute a position’s anticipated value,
which is the average of all potential outcomes of the chance nodes.
As a result, we can generalize the deterministic minimax value to an expectiminimax value
for games with chance nodes. Terminal nodes, and MAX and MIN nodes (for which the dice roll
is known), work exactly as before. For chance nodes we compute the expected value, which is
the sum of the values of all outcomes, each weighted by the probability of the corresponding
chance action.
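A minimal sketch of the expectiminimax computation is shown below; it assumes the Game interface used earlier, extended with is_chance(s) and outcomes(s) methods (both assumed names) for chance nodes.

def expectiminimax(game, state, player):
    if game.terminal_test(state):
        return game.utility(state, player)
    if game.is_chance(state):                      # chance node: probability-weighted average
        return sum(p * expectiminimax(game, game.result(state, a), player)
                   for p, a in game.outcomes(state))
    values = [expectiminimax(game, game.result(state, a), player)
              for a in game.actions(state)]
    return max(values) if game.player(state) == player else min(values)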
In a partially observable card game, for example, a player can see neither the opponents' hidden
cards nor the cards that will be dealt at some stage in the future. A memory system can be used to
remember the previously dealt cards that are now on the used pile. This adds to the total sum of
knowledge that the observer can use to make decisions.
In contrast, a fully observable system would be that of chess. In chess (apart from the 'who is
moving next' state, and minor subtleties such as whether a side has castled, which may not be
clear) the full state of the system is observable at any point in time.
Partially observable is a term used in a variety of mathematical settings, including that of
artificial intelligence and partially observable Markov decision processes.
A constraint satisfaction problem (CSP) is characterized by two components:
A state-space, and
The notion of a solution.
Consistent or Legal Assignment: An assignment which does not violate any constraint
or rule is called Consistent or legal assignment.
Complete Assignment: An assignment where every variable is assigned with a value,
and the solution to the CSP remains consistent. Such assignment is known as Complete
assignment.
Partial Assignment: An assignment which assigns values to some of the variables only.
Such type of assignments are called Partial assignments.
Discrete Domain: A domain whose values are discrete (countable); it may be infinite, so a
variable can take any one of infinitely many discrete values (for example, any integer).
Finite Domain: A domain containing a finite number of possible values for a variable;
CSPs with finite domains are the most commonly studied.
Unary Constraints: It is the simplest type of constraints that restricts the value of a
single variable.
Binary Constraints: The constraint type that relates exactly two variables; for example, a
requirement that x2 lies between x1 and x3 can be expressed as two binary constraints.
Global Constraints: It is the constraint type which involves an arbitrary number of
variables.
Some special types of solution algorithms are used to solve the following types of
constraints:
Linear Constraints: These types of constraints are commonly used in linear
programming, where each variable (holding an integer value) appears in linear form
only.
Non-linear Constraints: These types of constraints are used in non-linear programming,
where each variable (an integer value) appears in a non-linear form.
Constraint Propagation
In regular state-spaces there is only one choice, i.e., to search for a solution. But in a CSP we
have two choices: we can search for a solution, or we can perform constraint propagation
(inference), and the two can also be interleaved.
Constraint propagation is a special type of inference which helps in reducing the legal number
of values for the variables. The idea behind constraint propagation is local consistency.
In local consistency, variables are treated as nodes, and each binary constraint is treated as
an arc in the given problem. There are following local consistencies which are discussed
below:
Node Consistency: A single variable is said to be node consistent if all the values in the
variable's domain satisfy its unary constraints.
Arc Consistency: A variable is arc consistent if every value in its domain satisfies the
variable's binary constraints (i.e., for each value there is some consistent value of the
other variable in the constraint).
Path Consistency: A set of two variables is path consistent with respect to a third variable
if every consistent assignment to the pair can be extended to the third variable so that all
binary constraints are satisfied. It generalizes arc consistency from single variables to pairs.
k-consistency: This type of consistency is used to define the notion of stronger forms of
propagation. Here, we examine the k-consistency of the variables.
CSP Problems
Constraint satisfaction includes those problems which contain some constraints that must be
respected while solving the problem. CSPs include the following problems:
Graph Coloring: The problem where the constraint is that no two adjacent regions
(vertices) can have the same color.
Sudoku Playing: The game in which the constraint is that no number from 1-9 may be
repeated in the same row, column, or 3×3 box.
n-queen problem: In the n-queen problem, the constraint is that no two queens may be
placed in the same row, column, or diagonal.
Crossword: In the crossword problem, the constraint is that the words must be formed
correctly and must be meaningful.
Latin square Problem: The task is to fill a grid so that every row and every column
contains the same set of symbols, each occurring exactly once (the symbols may be
shuffled between rows, but every row contains the same digits).
Cryptarithmetic Problem: This problem has one very important constraint: we cannot
assign the same digit to two different letters; every letter must stand for a unique digit.
Cryptarithmetic Problem
Cryptarithmetic Problem is a type of constraint satisfaction problem where the game is about
digits and its unique replacement either with alphabets or other symbols. In cryptarithmetic
problem, the digits (0-9) get substituted by some possible alphabets or symbols. The task in
cryptarithmetic problem is to substitute each digit with an alphabet to get the result
arithmetically correct.
Let’s understand the cryptarithmetic problem as well its constraints better with the help
of an example:
Follow the below steps to understand the given problem by breaking it into its subparts:
Starting from the left-hand side (L.H.S.), the leading letters are S and M. Assign digits that
can give a satisfactory result; let us assign S -> 9 and M -> 1.
Adding up these terms we get a satisfactory result, and we also obtain an assignment for O,
namely O -> 0.
Now, move on to the next column, where E and O are added to give N as the output.
Adding E and O gives 5 + 0 = 5, but N cannot be 5 because E has already been assigned 5, and
according to the cryptarithmetic constraints we cannot assign the same digit to two letters. So we
need to think further and assign some other value.
Note: When we solve further, this column will receive a carry; after applying the carry, the
constraint will be satisfied.
But we have already assigned E -> 5, so the result above does not satisfy the constraints,
because it would give a different value for E. So we need to think further.
Again, after solving the whole problem, we will find that a carry arrives at this column, so the
answer will be satisfied.
Finally, adding the last (rightmost) two terms, D and E, we get Y as the result.
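For illustration, the short Python sketch below solves SEND + MORE = MONEY by brute force over digit assignments; a real CSP solver would instead use backtracking with constraint propagation, but the brute-force version shows the constraints plainly.

from itertools import permutations

def solve_send_more_money():
    letters = "SENDMORY"                      # the 8 distinct letters in the puzzle
    for digits in permutations(range(10), len(letters)):
        a = dict(zip(letters, digits))
        if a["S"] == 0 or a["M"] == 0:        # leading letters cannot be zero
            continue
        send  = int("".join(str(a[c]) for c in "SEND"))
        more  = int("".join(str(a[c]) for c in "MORE"))
        money = int("".join(str(a[c]) for c in "MONEY"))
        if send + more == money:
            return a
    return None

print(solve_send_more_money())
# The satisfying assignment is S=9, E=5, N=6, D=7, M=1, O=0, R=8, Y=2,
# i.e. 9567 + 1085 = 10652.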
Backtracking search, a form of depth-first search, is commonly used for solving CSPs. Inference
can be interwoven with search.
Commutativity: CSPs are all commutative. A problem is commutative if the order of
application of any given set of actions has no effect on the outcome.
Backtracking search: A depth-first search that chooses values for one variable at a time and
backtracks when a variable has no legal values left to assign.
Backtracking algorithm repeatedly chooses an unassigned variable, and then tries all values in
the domain of that variable in turn, trying to find a solution. If an inconsistency is detected, then
BACKTRACK returns failure, causing the previous call to try another value.
There is no need to supply BACKTRACKING-SEARCH with a domain-specific initial state,
action function, transition model, or goal test.
BACKTRACKING-SEARCH keeps only a single representation of a state and alters that
representation rather than creating new ones.
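A minimal backtracking-search sketch in Python is given below. The CSP object is assumed to expose variables, domains, and a consistent() check (assumed names); the variable-selection function uses a simple static version of the MRV heuristic described next.

def backtracking_search(csp):
    return backtrack({}, csp)

def backtrack(assignment, csp):
    if len(assignment) == len(csp.variables):
        return assignment                              # complete, consistent assignment
    var = select_unassigned_variable(assignment, csp)
    for value in order_domain_values(var, assignment, csp):
        if csp.consistent(var, value, assignment):
            assignment[var] = value
            result = backtrack(assignment, csp)
            if result is not None:
                return result
            del assignment[var]                        # undo and try the next value
    return None                                        # failure: causes backtracking

def select_unassigned_variable(assignment, csp):
    unassigned = [v for v in csp.variables if v not in assignment]
    return min(unassigned, key=lambda v: len(csp.domains[v]))   # fewest values first

def order_domain_values(var, assignment, csp):
    return list(csp.domains[var])                      # naive ordering for this sketch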
SELECT-UNASSIGNED-VARIABLE
Variable selection—fail-first
Minimum-remaining-values (MRV) heuristic: The idea of choosing the variable with the
fewest “legal” value. A.k.a. “most constrained variable” or “fail-first” heuristic, it picks a
variable that is most likely to cause a failure soon thereby pruning the search tree. If some
variable X has no legal values left, the MRV heuristic will select X and failure will be detected
immediately—avoiding pointless searches through other variables.
E.g. After the assignment for WA=red and NT=green, there is only one possible value for SA, so
it makes sense to assign SA=blue next rather than assigning Q.
[Powerful guide]
Degree heuristic: The degree heuristic attempts to reduce the branching factor on future choices
by selecting the variable that is involved in the largest number of constraints on other unassigned
variables. [useful tie-breaker]
e.g. SA is the variable with highest degree 5; the other variables have degree 2 or 3; T has degree
0.
ORDER-DOMAIN-VALUES
Value selection—fail-last
If we are trying to find all the solution to a problem (not just the first one), then the ordering does
not matter.
Least-constraining-value heuristic: prefers the value that rules out the fewest choices for the
neighboring variables in the constraint graph. (Try to leave the maximum flexibility for
subsequent variable assignments.)
e.g. Suppose we have generated the partial assignment WA=red and NT=green and our next
choice is for Q. Blue would be a bad choice because it would eliminate the last legal value for Q's
neighbor SA; the heuristic therefore prefers red to blue.
INFERENCE
forward checking: [One of the simplest forms of inference.] Whenever a variable X is assigned,
the forward-checking process establishes arc consistency for it: for each unassigned variable Y
that is connected to X by a constraint, delete from Y’s domain any value that is inconsistent with
the value chosen for X.
There is no reason to do forward checking if we have already done arc consistency as a
preprocessing step.
Advantage: For many problems the search will be more effective if we combine the MRV
heuristic with forward checking.
Disadvantage: Forward checking only makes the current variable arc-consistent, but doesn’t look
ahead and make all the other variables arc-consistent.
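A sketch of the forward-checking step in Python, using the same assumed CSP fields plus neighbors and a binary constraint_ok predicate (assumed names):

def forward_check(csp, var, value, assignment, domains):
    """After assigning var = value, prune inconsistent values from unassigned neighbors."""
    for other in csp.neighbors[var]:
        if other in assignment:
            continue
        domains[other] = [v for v in domains[other]
                          if csp.constraint_ok(var, value, other, v)]
        if not domains[other]:          # a neighbor's domain became empty: fail early
            return False
    return True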
MAC (Maintaining Arc Consistency) algorithm: [More powerful than forward checking,
detect this inconsistency.] After a variable Xi is assigned a value, the INFERENCE procedure
calls AC-3, but instead of a queue of all arcs in the CSP, we start with only the arcs(Xj, Xi) for
all Xj that are unassigned variables that are neighbors of Xi. From there, AC-3 does constraint
propagation in the usual way, and if any variable has its domain reduced to the empty set, the call
to AC-3 fails and we know to backtrack immediately.
Intelligent backtracking
Forward checking can supply the conflict set with no extra work.
Whenever forward checking based on an assignment X=x deletes a value from Y’s domain, add
X=x to Y’s conflict set;
If the last value is deleted from Y’s domain, the assignment in the conflict set of Y are added to
the conflict set of X.
In fact, every branch pruned by backjumping is also pruned by forward checking. Hence simple
backjumping is redundant in a forward-checking search or in a search that uses stronger
consistency checking (such as MAC).
Conflict-directed backjumping:
e.g.
consider the partial assignment which is proved to be inconsistent: {WA=red, NSW=red}.
We try T=red next and then assign NT, Q, V, SA, no assignment can work for these last 4
variables.
Eventually we run out of values to try at NT, but simple backjumping cannot work because NT
doesn't have a complete conflict set of preceding variables that caused it to fail.
The set {WA, NSW} is a deeper notion of the conflict set for NT, caused NT together with any
subsequent variables to have no consistent solution. So the algorithm should backtrack to NSW
and skip over T.
A backjumping algorithm that uses conflict sets defined in this way is called conflict-direct
backjumping.
How to Compute:
When a variable’s domain becomes empty, the “terminal” failure occurs, that variable has a
standard conflict set.
Let Xj be the current variable, let conf(Xj) be its conflict set. If every possible value for Xj fails,
backjump to the most recent variable Xi in conf(Xj), and set
conf(Xi) ← conf(Xi)∪conf(Xj) – {Xi}.
The conflict set for a variable means that there is no solution from that variable onward, given the
preceding assignment to the conflict set.
e.g.
assign WA, NSW, T, NT, Q, V, SA.
SA fails, and its conflict set is {WA, NT, Q}. (standard conflict set)
Backjump to Q, its conflict set is {NT, NSW}∪{WA,NT,Q}-{Q} = {WA, NT, NSW}.
Backtrack to NT, its conflict set is {WA}∪{WA,NT,NSW}-{NT} = {WA, NSW}.
Hence the algorithm backjump to NSW. (over T)
After backjumping from a contradiction, how to avoid running into the same problem again:
Constraint learning: The idea of finding a minimum set of variables from the conflict set that
causes the problem. This set of variables, along with their corresponding values, is called a no-
good. We then record the no-good, either by adding a new constraint to the CSP or by keeping a
separate cache of no-goods.
Backtracking occurs when no legal assignment can be found for a variable. Conflict-directed
backjumping backtracks directly to the source of the problem.
Local search algorithms for CSPs use a complete-state formulation: the initial state assigns a
value to every variable, and the search change the value of one variable at a time.
The min-conflicts heuristic: In choosing a new value for a variable, select the value that results
in the minimum number of conflicts with other variables.
Local search techniques in Section 4.1 can be used in local search for CSPs.
The landscape of a CSP under the min-conflicts heuristic usually has a series of plateaux.
Simulated annealing and plateau search (i.e. allowing sideways moves to another state with the
same score) can help local search find its way off a plateau. This wandering on the plateau can
be directed with tabu search: keeping a small list of recently visited states and forbidding the
algorithm to return to those states.
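A min-conflicts sketch in Python, using the complete-state formulation and the assumed CSP fields from the earlier sketches:

import random

def min_conflicts(csp, max_steps=10000):
    # Start from a complete (random) assignment and repair one variable at a time.
    current = {v: random.choice(list(csp.domains[v])) for v in csp.variables}
    for _ in range(max_steps):
        conflicted = [v for v in csp.variables
                      if num_conflicts(csp, v, current[v], current) > 0]
        if not conflicted:
            return current                              # no violated constraints: solution
        var = random.choice(conflicted)
        current[var] = min(csp.domains[var],
                           key=lambda val: num_conflicts(csp, var, val, current))
    return None                                         # give up after max_steps

def num_conflicts(csp, var, value, assignment):
    return sum(1 for other in csp.neighbors[var]
               if not csp.constraint_ok(var, value, other, assignment[other]))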
Constraint weighting: a technique that can help concentrate the search on the important
constraints.
Each constraint is given a numeric weight Wi, initially all 1.
At each step, the algorithm chooses a variable/value pair to change that will result in the lowest
total weight of all violated constraints.
The weights are then adjusted by incrementing the weight of each constraint that is violated by
the current assignment.
Local search can be used in an online setting when the problem changes, this is particularly
important in scheduling problems.
Tree: A constraint graph is a tree when any two variables are connected by only one path.
Directed arc consistency (DAC): A CSP is defined to be directed arc-consistent under an
ordering of variables X1, X2, … , Xn if and only if every Xi is arc-consistent with each Xj for j>i.
By using DAC, any tree-structured CSP can be solved in time linear in the number of variables.
How to solve a tree-structure CSP:
Pick any variable to be the root of the tree;
Choose an ordering of the variable such that each variable appears after its parent in the tree.
(topological sort)
Any tree with n nodes has n-1 arcs, so we can make this graph directed arc-consistent in O(n)
steps, each of which must compare up to d possible domain values for 2 variables, for a total
time of O(nd^2).
Once we have a directed arc-consistent graph, we can just march down the list of variables and
choose any remaining value.
Since each link from a parent to its child is arc consistent, we won’t have to backtrack, and can
move linearly through the variables.
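The tree-CSP procedure above can be sketched as follows; the parent map and the other CSP fields are assumed names used only for this illustration.

def tree_csp_solver(csp, order):                 # `order` = topological order from a chosen root
    domains = {v: list(csp.domains[v]) for v in csp.variables}
    parent = csp.parent                          # assumed: maps each non-root variable to its parent
    for child in reversed(order[1:]):            # backward pass: make each parent arc-consistent
        p = parent[child]
        domains[p] = [vp for vp in domains[p]
                      if any(csp.constraint_ok(p, vp, child, vc) for vc in domains[child])]
        if not domains[p]:
            return None                          # some domain emptied: no solution
    assignment = {order[0]: domains[order[0]][0]}
    for child in order[1:]:                      # forward pass: any remaining value works
        p = parent[child]
        assignment[child] = next(vc for vc in domains[child]
                                 if csp.constraint_ok(p, assignment[p], child, vc))
    return assignment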
There are 2 primary ways to reduce more general constraint graphs to trees:
1. Based on removing nodes;
e.g. We can delete SA from the graph by fixing a value for SA and deleting from the domains of
other variables any values that are inconsistent with the value chosen for SA.
The general algorithm:
Choose a subset S of the CSP’s variables such that the constraint graph becomes a tree after
removal of S. S is called a cycle cutset.
For each possible assignment to the variables in S that satisfies all constraints on S,
(a) remove from the domain of the remaining variables any values that are inconsistent with the
assignment for S, and
(b) If the remaining CSP has a solution, return it together with the assignment for S.
Time complexity: O(d^c · (n-c)d^2), where c is the size of the cycle cutset.
Cutset conditioning: The name for this overall algorithmic approach. Finding the smallest cycle
cutset is NP-hard, but efficient approximation algorithms are known for the task.
2. Based on collapsing nodes together
Tree decomposition: construct a tree decomposition of the constraint graph into a set of
connected subproblems, each subproblem is solved independently, and the resulting solutions are
then combined.
UNIT IV
o An intelligent agent needs knowledge about the real world for taking decisions and reasoning to act
efficiently.
o Knowledge-based agents are those agents who have the capability of maintaining an internal state of
knowledge, reason over that knowledge, update their knowledge after observations and take actions.
These agents can represent the world with some formal representation and act intelligently.
o Knowledge-based agents are composed of two main parts:
o Knowledge-base and
o Inference system.
The above diagram represents a generalized architecture for a knowledge-based agent. The knowledge-based
agent (KBA) takes input from the environment by perceiving it. The input is taken by the inference engine of the
agent, which also communicates with the KB to decide on an action as per the knowledge stored in the KB. The
learning element of the KBA regularly updates the KB by learning new knowledge.
Knowledge base: Knowledge-base is a central component of a knowledge-based agent, it is also known as KB. It is a
collection of sentences (here 'sentence' is a technical term and it is not identical to sentence in English). These
sentences are expressed in a language which is called a knowledge representation language. The Knowledge-base of
a KBA stores facts about the world.
Knowledge-base is required for updating knowledge for an agent to learn with experiences and take action as per the
knowledge.
Inference system
Inference means deriving new sentences from old. Inference system allows us to add a new sentence to the
knowledge base. A sentence is a proposition about the world. Inference system applies logical rules to the KB to
deduce new information.
The inference system generates new facts so that an agent can update the KB. An inference system works mainly
with two reasoning methods, which are given as:
o Forward chaining
o Backward chaining
Following are three operations which are performed by KBA in order to show the intelligent behavior:
1. TELL: This operation tells the knowledge base what it perceives from the environment.
2. ASK: This operation asks the knowledge base what action it should perform.
3. Perform: It performs the selected action.
function KB-AGENT(percept) returns an action
    persistent: KB, a knowledge base
                t, a counter, initially 0, indicating time
    TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
    action ← ASK(KB, MAKE-ACTION-QUERY(t))
    TELL(KB, MAKE-ACTION-SENTENCE(action, t))
    t ← t + 1
    return action
The knowledge-based agent takes percept as input and returns an action as output. The agent maintains the
knowledge base, KB, and it initially has some background knowledge of the real world. It also has a counter to
indicate the time for the whole process, and this counter is initialized with zero.
Each time when the function is called, it performs its three operations:
MAKE-PERCEPT-SENTENCE generates a sentence asserting that the agent perceived the given percept at the
given time.
The MAKE-ACTION-QUERY generates a sentence to ask which action should be done at the current time.
MAKE-ACTION-SENTENCE generates a sentence which asserts that the chosen action was executed.
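A small, self-contained Python rendering of the KB-AGENT loop is shown below; the stub knowledge base and the helper functions are placeholders standing in for TELL, ASK, and the sentence constructors, not a real inference engine.

class StubKB:
    def __init__(self):
        self.sentences = []
    def tell(self, sentence):
        self.sentences.append(sentence)
    def ask(self, query):
        return "Forward"            # placeholder decision; a real KB would infer this

def make_percept_sentence(percept, t): return f"Percept({percept}, {t})"
def make_action_query(t):              return f"BestAction?({t})"
def make_action_sentence(action, t):   return f"Action({action}, {t})"

class KBAgent:
    def __init__(self, kb):
        self.kb, self.t = kb, 0     # time counter initialized to zero
    def __call__(self, percept):
        self.kb.tell(make_percept_sentence(percept, self.t))
        action = self.kb.ask(make_action_query(self.t))
        self.kb.tell(make_action_sentence(action, self.t))
        self.t += 1
        return action

agent = KBAgent(StubKB())
print(agent("stench"))              # -> Forward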
A knowledge-based agent can be viewed at different levels which are given below:
1. Knowledge level
The knowledge level is the first level of a knowledge-based agent; at this level we need to specify what the agent
knows and what the agent's goals are. With these specifications, we can fix its behavior. For example, suppose an
automated taxi agent needs to go from station A to station B, and it knows the way from A to B; this belongs to the
knowledge level.
2. Logical level:
At the logical level, the knowledge specified at the knowledge level is encoded into logical sentences (i.e. into one or
more logics). At the logical level we can expect the automated taxi agent to reach destination B.
3. Implementation level:
This is the physical representation of logic and knowledge. At the implementation level the agent performs actions as
per the logical and knowledge levels. At this level, the automated taxi agent actually implements its knowledge and
logic so that it can reach the destination.
1. Declarative approach: We can create a knowledge-based agent by initializing it with an empty knowledge
base and telling the agent all the sentences with which we want it to start. This approach is called the
declarative approach.
2. Procedural approach: In the procedural approach, we directly encode the desired behavior as program
code, which means we just need to write a program that already encodes the desired behavior of the agent.
However, in the real world, a successful agent can be built by combining both declarative and procedural approaches,
and declarative knowledge can often be compiled into more efficient procedural code.
2. Propositional logic
Propositional logic (PL) is the simplest form of logic where all the statements are made by propositions. A proposition
is a declarative statement which is either true or false. It is a technique of knowledge representation in logical and
mathematical form.
Example:
a) It is Sunday.
b) The Sun rises from West (False proposition)
c) 3+3= 7(False proposition)
d) 5 is a prime number.
The syntax of propositional logic defines the allowable sentences for the knowledge representation. There are two
types of Propositions:
a. Atomic Propositions
b. Compound propositions
o Atomic Proposition: Atomic propositions are the simple propositions. It consists of a single proposition
symbol. These are the sentences which must be either true or false.
Example: "2 + 2 = 4" and "The Sun rises in the East" are atomic propositions.
o Compound Proposition: Compound propositions are constructed by combining simpler (atomic)
propositions using logical connectives.
Example: "It is raining today, and the street is wet."
Logical Connectives:
Logical connectives are used to connect two simpler propositions or representing a sentence logically. We can create
compound propositions with the help of logical connectives. There are mainly five connectives, which are given as
follows:
1. Negation: A sentence such as ¬ P is called negation of P. A literal can be either Positive literal or negative
literal.
2. Conjunction: A sentence which has ∧ connective such as, P ∧ Q is called a conjunction.
Example: Rohan is intelligent and hardworking. It can be written as,
P= Rohan is intelligent,
Q= Rohan is hardworking. → P∧ Q.
3. Disjunction: A sentence which has ∨ connective, such as P ∨ Q. is called disjunction, where P and Q are the
propositions.
Example: "Ritika is a doctor or Engineer",
Here P= Ritika is Doctor. Q= Ritika is Doctor, so we can write it as P ∨ Q.
4. Implication: A sentence such as P → Q, is called an implication. Implications are also known as if-then rules.
It can be represented as
If it is raining, then the street is wet.
Let P= It is raining, and Q= Street is wet, so it is represented as P → Q
5. Biconditional: A sentence such as P ⇔ Q is a biconditional sentence; for example, "I am breathing if and
only if I am alive."
P = I am breathing, Q = I am alive; it can be represented as P ⇔ Q.
Truth Table:
In propositional logic, we need to know the truth values of propositions in all possible scenarios. We can combine all
the possible combinations with logical connectives, and the representation of these combinations in tabular format is
called a truth table. Truth tables can be constructed for each of the logical connectives.
We can build a compound proposition from three propositions P, Q, and R. Its truth table is made up of 8 rows (2^3),
since we have taken three proposition symbols.
Precedence of connectives:
Just like arithmetic operators, there is a precedence order for propositional connectors or logical operators. This order
should be followed while evaluating a propositional problem. Following is the list of the precedence order for
operators:
Precedence Operators
Note: For better understanding use parenthesis to make sure of the correct interpretations. Such as ¬R∨ Q, It
can be interpreted as (¬R) ∨ Q
Logical equivalence:
Logical equivalence is one of the features of propositional logic. Two propositions are said to be logically equivalent if
and only if the columns in the truth table are identical to each other.
Let's take two propositions A and B; for logical equivalence we can write A ⇔ B. In the truth table below we can
see that the columns for ¬A ∨ B and A → B are identical, hence ¬A ∨ B is equivalent to A → B.
Properties of Operators:
o Commutativity:
o P∧ Q= Q ∧ P, or
o P ∨ Q = Q ∨ P.
o Associativity:
o (P ∧ Q) ∧ R= P ∧ (Q ∧ R),
o (P ∨ Q) ∨ R= P ∨ (Q ∨ R)
o Identity element:
o P ∧ True = P,
o P ∨ True= True.
o Distributive:
o P∧ (Q ∨ R) = (P ∧ Q) ∨ (P ∧ R).
o P ∨ (Q ∧ R) = (P ∨ Q) ∧ (P ∨ R).
o DE Morgan's Law:
o ¬ (P ∧ Q) = (¬P) ∨ (¬Q)
o ¬ (P ∨ Q) = (¬ P) ∧ (¬Q).
o Double-negation elimination:
o ¬ (¬P) = P.
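Any of these equivalences can be verified mechanically by enumerating truth assignments; the short Python check below does this for De Morgan's law.

from itertools import product

for P, Q in product([True, False], repeat=2):
    lhs = not (P and Q)
    rhs = (not P) or (not Q)
    assert lhs == rhs          # ¬(P ∧ Q) = (¬P) ∨ (¬Q) holds in every row
print("De Morgan's law verified for all truth assignments.")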
Theorem proving: Applying rules of inference directly to the sentences in our knowledge base to
construct a proof of the desired sentence without consulting models.
Inference rules are patterns of sound inference that can be used to find proofs. The resolution rule yields
a complete inference algorithm for knowledge bases that are expressed in conjunctive normal
form. Forward chaining and backward chaining are very natural reasoning algorithms for knowledge
bases in Horn form.
Logical equivalence:
Two sentences α and β are logically equivalent if they are true in the same set of models. (write as α ≡ β).
Also: α ≡ β if and only if α ⊨ β and β ⊨ α.
·Modus Ponens: Whenever sentences of the form α⇒β and α are given, the sentence β can be inferred.
·And-Elimination: From a conjunction, any of the conjuncts can be inferred (from α∧β, infer α).
·All of logical equivalence (in Figure 7.11) can be used as inference rules.
We can apply any of the search algorithms in Chapter 3 to find a sequence of steps that constitutes a
proof. We just need to define a proof problem as follows:
·INITIAL STATE: the initial knowledge base;
·ACTION: the set of actions consists of all the inference rules applied to all the sentences that match the
top half of the inference rule.
·RESULT: the result of an action is to add the sentence in the bottom half of the inference rule.
·GOAL: the goal is a state that contains the sentence we are trying to prove.
In many practical cases, finding a proof can be more efficient than enumerating models, because the
proof can ignore irrelevant propositions, no matter how many of them there are.
Monotonicity: A property of logical systems which says that the set of entailed sentences can only increase as
information is added to the knowledge base.
For any sentences α and β,
If KB ⊨ α then KB ∧ β ⊨ α.
Monotonicity means that inference rules can be applied whenever suitable premises are found in the
knowledge base; whatever else is in the knowledge base cannot invalidate any conclusion already inferred.
Proof by resolution
Resolution: An inference rule that yields a complete inference algorithm when coupled with any complete
search algorithm.
Clause: A disjunction of literals. (e.g. A∨ B). A single literal can be viewed as a unit clause (a disjunction of
one literal ).
Unit resolution inference rule: Takes a clause and a literal and produces a new clause:
from l1 ∨ ... ∨ lk and m, infer l1 ∨ ... ∨ li-1 ∨ li+1 ∨ ... ∨ lk,
where each l is a literal, and li and m are complementary literals (one is the negation of the other).
The full resolution rule generalizes this to two clauses of any length.
Notice: The resulting clause should contain only one copy of each literal. The removal of multiple copies of literal is
called factoring.
e.g. resolve(A∨ B) with (A∨ ¬B), obtain(A∨ A) and reduce it to just A.
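The resolution-with-factoring step can be sketched in a few lines of Python; clauses are represented here as frozensets of string literals with "~" marking negation (a representation chosen for this example, not prescribed by the notes).

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return all resolvents of two clauses (factoring happens automatically with sets)."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

print(resolve(frozenset({"A", "B"}), frozenset({"A", "~B"})))
# -> [frozenset({'A'})]   i.e. (A ∨ B) resolved with (A ∨ ¬B) reduces to just A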
A resolution algorithm
e.g.
KB = (B1,1⟺(P1,2∨P2,1))∧¬B1,1
α = ¬P1,2
Notice: Any clause in which two complementary literals appear can be discarded, because it is always equivalent
to True.
e.g. B1,1∨¬B1,1∨P1,2 = True∨P1,2 = True.
PL-RESOLUTION is complete.
Definite clause: A disjunction of literals of which exactly one is positive. (e.g. ¬ L1,1∨¬Breeze∨B1,1)
Every definite clause can be written as an implication, whose premise is a conjunction of positive literals and whose
conclusion is a single positive literal.
Horn clause: A disjunction of literals of which at most one is positive. (All definite clauses are Horn clauses.)
In Horn form, the premise is called the body and the conclusion is called the head.
A sentence consisting of a single positive literal is called a fact, it too can be written in implication form.
Horn clauses are closed under resolution: if you resolve two Horn clauses, you get back a Horn clause.
Inference with horn clauses can be done through the forward-chaining and backward-chaining algorithms.
Deciding entailment with Horn clauses can be done in time that is linear in the size of the knowledge base.
Goal clause: A clause with no positive literals.
Fixed point: The algorithm reaches a fixed point where no new inferences are possible.
Data-driven reasoning: Reasoning in which the focus of attention starts with the known data. It can be used within
an agent to derive conclusions from incoming percept, often without a specific query in mind. (forward chaining is an
example)
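A sketch of forward chaining for definite clauses is shown below; each clause is written as a (premises, conclusion) pair, and facts are clauses with an empty premise list (a representation chosen for this example).

def pl_fc_entails(clauses, query):
    """Data-driven reasoning: keep applying rules whose premises are all known."""
    inferred = set()
    agenda = [head for body, head in clauses if not body]     # the known facts
    count = {i: len(body) for i, (body, head) in enumerate(clauses)}
    while agenda:
        p = agenda.pop()
        if p == query:
            return True
        if p in inferred:
            continue
        inferred.add(p)
        for i, (body, head) in enumerate(clauses):
            if p in body:
                count[i] -= 1
                if count[i] == 0:                              # all premises satisfied
                    agenda.append(head)
    return False

rules = [([], "A"), ([], "B"), (["A", "B"], "L"),
         (["L", "A"], "M"), (["M", "L"], "P"), (["P"], "Q")]
print(pl_fc_entails(rules, "Q"))   # -> True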
The set of possible models, given a fixed propositional vocabulary, is finite, so entailment can be checked by
enumerating models. Efficient model-checking inference algorithms for propositional logic include backtracking and
local search methods and can often solve large problems quickly.
2 families of algorithms for the SAT problem based on model checking:
a. based on backtracking
b. based on local hill-climbing search
DPLL embodies 3 improvements over the scheme of TT-ENTAILS?: Early termination, pure symbol heuristic, unit
clause heuristic.
Tricks that enable SAT solvers to scale up to large problems: Component analysis, variable and value
ordering, intelligent backtracking, random restarts, clever indexing.
On every iteration, the algorithm picks an unsatisfied clause, and chooses randomly between 2 ways to pick a symbol
to flip:
Either a. a “min-conflicts” step that minimizes the number of unsatisfied clauses in the new state;
Or b. a “random walk” step that picks the symbol randomly.
When the algorithm returns a model, the input sentence is indeed satisfiable;
When the algorithm returns failure, there are 2 possible causes:
Either a. The sentence is unsatisfiable;
Or b. We need to give the algorithm more time.
If we set max_flips=∞ and p>0, the algorithm will:
Either a. eventually return a model if one exists,
Or b. never terminate if the sentence is unsatisfiable.
Thus WALKSAT is useful when we expect a solution to exist, but cannot always detect unsatisfiability.
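A WALKSAT sketch in Python is given below; clauses are sets of string literals with "~" marking negation, and the model maps symbols to booleans (a representation chosen for this example).

import random

def satisfies(clause, model):
    return any(model[lit.lstrip("~")] != lit.startswith("~") for lit in clause)

def walksat(clauses, p=0.5, max_flips=10000):
    symbols = {lit.lstrip("~") for c in clauses for lit in c}
    model = {s: random.choice([True, False]) for s in symbols}
    for _ in range(max_flips):
        unsatisfied = [c for c in clauses if not satisfies(c, model)]
        if not unsatisfied:
            return model                       # the sentence is satisfiable
        clause = random.choice(unsatisfied)
        if random.random() < p:                # "random walk" step
            sym = random.choice(list(clause)).lstrip("~")
        else:                                  # "min-conflicts" step
            def broken_if_flipped(s):
                model[s] = not model[s]
                n = sum(1 for c in clauses if not satisfies(c, model))
                model[s] = not model[s]
                return n
            sym = min((lit.lstrip("~") for lit in clause), key=broken_if_flipped)
        model[sym] = not model[sym]
    return None                                # failure: unsatisfiable, or needs more time

print(walksat([{"A", "B"}, {"~A", "B"}, {"~B", "C"}]))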
The notation CNFk(m, n) denotes a k-CNF sentence with m clauses and n symbols. (with n variables and k literals per
clause).
Given a source of random sentences, where the clauses are chosen uniformly, independently and without
replacement from among all clauses with k different literals, which are positive or negative at random.
Hardness: problems right at the threshold > overconstrained problems > underconstrained problems
Satisfiability threshold conjecture: A conjecture which says that for every k≥3 there is a threshold ratio rk such that, as n goes
to infinity, the probability that CNFk(n, rn) is satisfiable becomes 1 for all values of r below the threshold and 0 for all
values above it. (The conjecture remains unproven.)
Frame problem: some information is lost because the effect axioms fail to state what remains unchanged as the
result of an action.
Solution: add frame axioms explicitly asserting all the propositions that remain the same.
Representation frame problem: The proliferation of frame axioms is inefficient, the set of frame axioms will be
O(mn) in a world with m different actions and n fluents.
Solution: because the world exhibits locality (each action typically changes no more than some number
k of the fluents), define the transition model with a set of axioms of size O(mk) rather than size O(mn).
Inferential frame problem: The problem of projecting forward the results of a t-step plan of action in time O(kt) rather
than O(nt).
Solution: change one’s focus from writing axioms about actions to writing axioms about fluents.
For each fluent F, we will have an axiom that defines the truth value of F^(t+1) in terms of the fluents at time t and the
actions that may have occurred at time t.
The truth value of F^(t+1) can be set in one of two ways:
Either a. the action at time t causes F to be true at t+1,
Or b. F was already true at time t and the action at time t does not cause it to be false.
An axiom of this form is called a successor-state axiom and has this schema:
F^(t+1) ⇔ ActionCausesF^t ∨ (F^t ∧ ¬ActionCausesNotF^t)
Qualification problem: specifying all unusual exceptions that could cause the action to fail.
2. A hybrid agent
Hybrid agent: combines the ability to deduce various aspect of the state of the world with condition-action rules, and
with problem-solving algorithms.
The agent maintains and update KB as a current plan.
The initial KB contains the atemporal axioms. (don’t depend on t)
At each time step, the new percept sentence is added along with all the axioms that depend on t (such as the
successor-state axioms).
Then the agent use logical inference by ASKING questions of the KB (to work out which squares are safe and which
have yet to be visited).
The main body of the agent program constructs a plan based on a decreasing priority of goals:
1. If there is a glitter, construct a plan to grab the gold, follow a route back to the initial location and climb out of the
cave;
2. Otherwise if there is no current plan, plan a route (with A* search) to the closest safe square unvisited yet, making
sure the route goes through only safe squares;
3. If there are no safe squares to explore and the agent still has an arrow, try to make a safe square by shooting at one of the
possible wumpus locations.
4. If this fails, look for a square to explore that is not provably unsafe.
5. If there is no such square, the mission is impossible, then retreat to the initial location and climb out of the cave.
We use a logical sentence involving the proposition symbols associated with the current time step and the temporal
symbols.
Logical state estimation involves maintaining a logical sentence that describes the set of possible states consistent
with the observation history. Each update step requires inference using the transition model of the environment,
which is built from successor-state axioms that specify how each fluent changes.
State estimation: The process of updating the belief state as new percepts arrive.
Exact state estimation may require logical formulas whose size is exponential in the number of symbols.
One common scheme for approximate state estimation: to represent belief state as conjunctions of literals (1-CNF
formulas).
The agent simply tries to prove Xt and ¬Xt for each symbol Xt, given the belief state at t-1.
The conjunction of provable literals becomes the new belief state, and the previous belief state is discarded.
(This scheme may lose some information as time goes along.)
The set of possible states represented by the 1-CNF belief state includes all states that are in fact possible given the
full percept history. The 1-CNF belief state acts as a simple outer envelope, or conservative approximation.
Precondition axioms: stating that an action occurrence requires the preconditions to be satisfied, added to avoid
generating plans with illegal actions.
Action exclusion axioms: added to avoid the creation of plans with multiple simultaneous actions that interfere with
each other.
Propositional logic does not scale to environments of unbounded size because it lacks the expressive power to deal
concisely with time, space and universal patterns of relationships among objects.
6. First-order logic
In the topic of propositional logic, we have seen how to represent statements using propositional logic. Unfortunately,
in propositional logic we can only represent facts that are either true or false. PL is not sufficient
to represent complex sentences or natural language statements; it has very limited
expressive power. Consider sentences such as "All students like mathematics" or "Some boys are intelligent", which we cannot represent using PL.
To represent such statements, PL is not sufficient, so we require a more powerful logic, such as first-
order logic.
First-Order logic:
o First-order logic is another way of knowledge representation in artificial intelligence. It is an extension to
propositional logic.
o FOL is sufficiently expressive to represent the natural language statements in a concise way.
o First-order logic is also known as Predicate logic or First-order predicate logic. First-order logic is a
powerful language that expresses information about objects in a natural way and can also express the
relationships between those objects.
o First-order logic (like natural language) does not only assume that the world contains facts like propositional
logic but also assumes the following things in the world:
o Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ......
o Relations: These can be unary relations such as: red, round, is adjacent, or n-ary relations such as: the
sister of, brother of, has color, comes between
o Function: Father of, best friend, third inning of, end of, ......
o As a natural language, first-order logic also has two main parts:
a. Syntax
b. Semantics
The syntax of FOL determines which collection of symbols is a logical expression in first-order logic. The basic
syntactic elements of first-order logic are symbols. We write statements in short-hand notation in FOL.
Variables: x, y, z, a, b, ...
Connectives: ∧, ∨, ¬, ⇒, ⇔
Equality: =
Quantifiers: ∀, ∃
Atomic sentences:
o Atomic sentences are the most basic sentences of first-order logic. These sentences are formed from a
predicate symbol followed by a parenthesis with a sequence of terms.
o We can represent atomic sentences as Predicate (term1, term2, ......, term n).
Complex Sentences:
o Complex sentences are made by combining atomic sentences using connectives.
Consider the statement "x is an integer." It consists of two parts: the first part, x, is the subject of the statement, and the
second part, "is an integer," is known as the predicate.
Universal Quantifier:
Universal quantifier is a symbol of logical representation, which specifies that the statement within its range is true for
everything or every instance of a particular thing.
o For all x
o For each x
o For every x.
Example:
Example: "All men drink coffee." Let x be a variable referring to a man; the statement can then be represented in the universe of discourse (UOD) as ∀x man(x) → drink(x, coffee).
It will be read as: For all x, if x is a man, then x drinks coffee.
Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within its scope is true for at least
one instance of something.
It is denoted by the logical operator ∃, which resembles a reversed E. When it is used with a predicate variable, it
is called an existential quantifier.
If x is a variable, then the existential quantifier will be ∃x or ∃(x). It will be read as: There exists an x / For some x / For at least one x.
Example:
Example: "Some boys are intelligent": ∃x boy(x) ∧ intelligent(x). It will be read as: There exists some x such that x is a boy and x is intelligent.
Points to remember:
o The main connective for universal quantifier ∀ is implication →.
o The main connective for existential quantifier ∃ is and ∧.
Properties of Quantifiers:
o In universal quantifier, ∀x∀y is similar to ∀y∀x.
o In Existential quantifier, ∃x∃y is similar to ∃y∃x.
o ∃x∀y is not similar to ∀y∃x.
Example: "Not all students like both Mathematics and Science." Since not all students do, we use ∀ with negation, giving the following representation:
¬∀ (x) [ student(x) → like(x, Mathematics) ∧ like(x, Science)].
The quantifiers interact with variables which appear in a suitable way. There are two types of variables in First-order
logic which are given below:
Free Variable: A variable is said to be a free variable in a formula if it occurs outside the scope of the quantifier.
Bound Variable: A variable is said to be a bound variable in a formula if it occurs within the scope of the quantifier.
What is knowledge-engineering?
The process of constructing a knowledge base in first-order logic is called knowledge engineering. In knowledge
engineering, someone who investigates a particular domain, learns the important concepts of that domain, and generates
a formal representation of the objects is known as a knowledge engineer.
In this topic, we will understand the Knowledge engineering process in an electronic circuit domain, which is already
familiar. This approach is mainly suitable for creating special-purpose knowledge base.
Following are the main steps of the knowledge-engineering process. Using these steps, we will develop a
knowledge base which will allow us to reason about a digital circuit (a one-bit full adder), which is given below.
The first step of the process is to identify the task, and for the digital circuit, there are various reasoning tasks
At the first level or highest level, we will examine the functionality of the circuit:
At the second level, we will examine the circuit structure details such as:
In the second step, we will assemble the relevant knowledge which is required for digital circuits. So for digital circuits,
we have the following required knowledge:
3. Decide on vocabulary:
The next step of the process is to select functions, predicate, and constants to represent the circuits, terminals, signals,
and gates. Firstly we will distinguish the gates from each other and from other objects. Each gate is represented as an
object which is named by a constant, such as, Gate(X1). The functionality of each gate is determined by its type,
which is taken as constants such as AND, OR, XOR, or NOT. Circuits will be identified by a predicate: Circuit (C1).
For gate input, we will use the function In(1, X1) for denoting the first input terminal of the gate, and for output
terminal we will use Out (1, X1).
The function Arity(c, i, j) is used to denote that circuit c has i inputs and j outputs.
The connectivity between gates can be represented by predicate Connect(Out(1, X1), In(1, X1)).
We use a unary predicate On (t), which is true if the signal at a terminal is on.
To encode the general knowledge about the logic circuit, we need some following rules:
o If two terminals are connected, then they have the same signal; it can be represented as:
∀ t1, t2 Terminal (t1) ∧ Terminal (t2) ∧ Connect (t1, t2) → Signal (t1) = Signal (t2).
o The signal at every terminal has either the value 0 or 1; it can be represented as:
∀ t Terminal (t) → Signal (t) = 1 ∨ Signal (t) = 0.
o The output of an XOR gate is 1 exactly when its two inputs differ:
∀ g Gate(g) ∧ Type(g) = XOR → Signal (Out(1, g)) = 1 ⇔ Signal (In(1, g)) ≠ Signal (In(2, g)).
o The output of a NOT gate is the inverse of its input:
∀ g Gate(g) ∧ Type(g) = NOT → Signal (Out(1, g)) ≠ Signal (In(1, g)).
Now we encode the problem of circuit C1. First we categorize the circuit and its gate components. This step is easy if
an ontology for the problem has already been worked out: it involves writing simple atomic sentences about instances
of the concepts in that ontology.
For the given circuit C1, we can encode the problem instance in atomic sentences as below:
Since in the circuit there are two XOR, two AND, and one OR gate so atomic sentences for these gates will be:
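Assuming the usual naming of the adder's gates (X1 and X2 for the XOR gates, A1 and A2 for the AND gates, and O1 for the OR gate), these atomic sentences can be written as:
Gate(X1) ∧ Type(X1) = XOR, Gate(X2) ∧ Type(X2) = XOR,
Gate(A1) ∧ Type(A1) = AND, Gate(A2) ∧ Type(A2) = AND,
Gate(O1) ∧ Type(O1) = OR, and Circuit(C1) ∧ Arity(C1, 3, 2).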
In this step, we will find all the possible sets of values of all the terminals for the adder circuit. The first query will be:
What should be the combination of input which would generate the first output of circuit C1, as 0 and a second
output to be 1?
∃ i1, i2, i3 Signal (In(1, C1))=i1 ∧ Signal (In(2, C1))=i2 ∧ Signal (In(3, C1))= i3
∧ Signal (Out(1, C1)) =0 ∧ Signal (Out(2, C1))=1
Now we will debug the knowledge base, which is the last step of the complete process. In this step, we try to
find and fix any errors in the knowledge base.
Inference in First-Order Logic is used to deduce new facts or sentences from existing sentences. Before understanding
the FOL inference rule, let's understand some basic terminologies used in FOL.
Substitution:
Substitution is a fundamental operation performed on terms and formulas. It occurs in all inference systems in first-
order logic. Substitution becomes more complex in the presence of quantifiers. If we write F[a/x], it means
substituting the constant "a" for the variable "x" in F.
Note: First-order logic is capable of expressing facts about some or all objects in the universe.
Equality:
First-Order logic does not only use predicate and terms for making atomic sentences but also uses another way,
which is equality in FOL. For this, we can use equality symbols which specify that the two terms refer to the same
object.
For example, the sentence Brother(John) = Smith says that the object referred to by Brother(John) is the same as the object referred to by Smith. The
equality symbol can also be used with negation to state that two terms do not refer to the same object.
As in propositional logic, we also have inference rules in first-order logic. Following are some basic inference rules in
FOL:
o Universal Generalization
o Universal Instantiation
o Existential Instantiation
o Existential introduction
1. Universal Generalization:
o Universal generalization is a valid inference rule which states that if premise P(c) is true for any arbitrary
element c in the universe of discourse, then we can have a conclusion as ∀ x P(x).
o This rule can be used if we want to show that every element has a similar property.
o In this rule, x must not appear as a free variable.
Example: Let's represent, P(c): "A byte contains 8 bits", so for ∀ x P(x) "All bytes contain 8 bits.", it will also be true.
2. Universal Instantiation:
o Universal instantiation, also called universal elimination or UI, is a valid inference rule. It can be applied
multiple times to add new sentences.
o The new KB is logically equivalent to the previous KB.
o As per UI, we can infer any sentence obtained by substituting a ground term for the variable.
o The UI rule states that from ∀ x P(x) we can infer any sentence P(c), obtained by substituting a ground term c (a constant within
the domain of x) for the variable, for any object in the universe of discourse.
For instance, from ∀ x King(x) ∧ Greedy(x) → Evil(x) we can infer any of the following statements using Universal Instantiation: King(John) ∧ Greedy(John) → Evil(John), King(Richard) ∧ Greedy(Richard) → Evil(Richard), and so on.
3. Existential Instantiation:
Existential instantiation, also called Existential Elimination, is a valid inference rule in first-order
logic.
Example:
From ∃ x Crown(x) ∧ OnHead(x, John) we can infer: Crown(K) ∧ OnHead(K, John), as long as the constant K does not appear elsewhere in the knowledge base.
4. Existential introduction
o An existential introduction is also known as an existential generalization, which is a valid inference rule in
first-order logic.
o This rule states that if there is some element c in the universe of discourse which has a property P, then we
can infer that there exists something in the universe which has the property P.
For the inference process in FOL, we have a single inference rule which is called Generalized Modus Ponens. It is lifted
version of Modus ponens.
Generalized Modus Ponens can be summarized as, " P implies Q and P is asserted to be true, therefore Q must be
True."
For atomic sentences pi, pi', and q, where there is a substitution θ such that SUBST(θ, pi') = SUBST(θ, pi) for all i, the rule can be represented as:
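Written out in this notation, the Generalized Modus Ponens rule is:
p1', p2', ..., pn', (p1 ∧ p2 ∧ ... ∧ pn → q)
⟹ SUBST(θ, q)
where SUBST(θ, pi') = SUBST(θ, pi) for all i.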
Example:
We will use this rule for "Greedy kings are evil": we find some x such that x is a king and x is greedy, and we can then
infer that x is evil.
In artificial intelligence, forward and backward chaining are important topics; before understanding
them, let us first see where these two terms come from.
Inference engine:
The inference engine is the component of the intelligent system in artificial intelligence, which applies logical rules to
the knowledge base to infer new information from known facts. The first inference engines were part of expert
systems. An inference engine commonly proceeds in two modes:
A. Forward chaining
B. Backward chaining
Horn clauses and definite clauses are forms of sentences that enable the knowledge base to use a more restricted
and efficient inference algorithm. Logical inference algorithms use forward and backward chaining approaches, which
require the KB to be in the form of first-order definite clauses.
Definite clause: A clause which is a disjunction of literals with exactly one positive literal is known as a definite
clause or strict horn clause.
Horn clause: A clause which is a disjunction of literals with at most one positive literal is known as horn clause.
Hence all the definite clauses are horn clauses.
Example: (¬p ∨ ¬q ∨ k). It is equivalent to p ∧ q → k.
A. Forward Chaining
Forward chaining is also known as a forward deduction or forward reasoning method when using an inference engine.
Forward chaining is a form of reasoning which starts with atomic sentences in the knowledge base and applies
inference rules (Modus Ponens) in the forward direction to extract more data until a goal is reached.
The forward-chaining algorithm starts from known facts, triggers all rules whose premises are satisfied, and adds their
conclusions to the known facts. This process repeats until the problem is solved.
Properties of Forward-Chaining:
Consider the following famous example which we will use in both approaches:
Example:
"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A, an enemy of
America, has some missiles, and all the missiles were sold to it by Robert, who is an American citizen."
To solve the above problem, first, we will convert all the above facts into first-order definite clauses, and then we will
use a forward-chaining algorithm to reach the goal.
Step-1:
In the first step we will start with the known facts and will choose the sentences which do not have implications, such
as: American(Robert), Enemy(A, America), Owns(A, T1), and Missile(T1). All these facts will be represented as
below.
Step-2:
At the second step, we add those facts that can be inferred from the available facts, i.e. rules whose premises are satisfied.
Rule-(1) does not have its premises satisfied yet, so it is not used in the first iteration.
Rule-(4) is satisfied with the substitution {p/T1}, so Sells(Robert, T1, A) is added; it is inferred from the conjunction of
sentences (2) and (3).
Rule-(6) is satisfied with the substitution {p/A}, so Hostile(A) is added; it is inferred from sentence (7).
Step-3:
At step-3, we can check that Rule-(1) is satisfied with the substitution {p/Robert, q/T1, r/A}, so we can add
Criminal(Robert), which follows from all the available facts. Hence we have reached our goal statement.
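A minimal Python sketch of forward chaining on a ground (propositionalized) version of this example; the fact strings and rule list are illustrative simplifications, not part of any library:

facts = {"American(Robert)", "Missile(T1)", "Owns(A,T1)", "Enemy(A,America)"}
rules = [
    ({"Missile(T1)"}, "Weapon(T1)"),                                   # missiles are weapons
    ({"Missile(T1)", "Owns(A,T1)"}, "Sells(Robert,T1,A)"),             # Robert sold A its missiles
    ({"Enemy(A,America)"}, "Hostile(A)"),                              # an enemy of America is hostile
    ({"American(Robert)", "Weapon(T1)", "Sells(Robert,T1,A)", "Hostile(A)"}, "Criminal(Robert)"),
]

def forward_chain(facts, rules, goal):
    # Repeatedly fire every rule whose premises are all known, until nothing new can be added.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return goal in facts

print(forward_chain(facts, rules, "Criminal(Robert)"))   # True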
B. Backward Chaining:
Backward-chaining is also known as a backward deduction or backward reasoning method when using an inference
engine. A backward chaining algorithm is a form of reasoning, which starts with the goal and works backward,
chaining through rules to find known facts that support the goal.
o In backward chaining, the goal is broken into sub-goal or sub-goals to prove the facts true.
o It is called a goal-driven approach, as a list of goals decides which rules are selected and used.
o The backward-chaining algorithm is used in game theory, automated theorem-proving tools, inference engines,
proof assistants, and various AI applications.
o The backward-chaining method mostly uses a depth-first search strategy for proofs.
Example:
In backward chaining, we will use the same example as above, and will rewrite all the rules.
Backward-Chaining proof:
In Backward chaining, we will start with our goal predicate, which is Criminal(Robert), and then infer further rules.
Step-1:
At the first step, we take the goal fact. From the goal fact we will infer other facts, and finally we will prove
those facts true. Our goal fact is "Robert is a criminal," so the following is its predicate.
Step-2:
At the second step, we infer other facts from the goal fact that satisfy the rules. As we can see in Rule-(1), the
goal predicate Criminal(Robert) is present with the substitution {Robert/p}. So we will add all the conjunctive facts below
the first level and replace p with Robert.
Step-3: At step-3, we extract the further fact Missile(q), which is inferred from Weapon(q), as it satisfies Rule-(5).
Weapon(q) is also true with the substitution of the constant T1 for q.
Step-4:
At step-4, we can infer the facts Missile(T1) and Owns(A, T1) from Sells(Robert, T1, r), which satisfies Rule-(4), with the
substitution of A in place of r. So these two statements are proved here.
Step-5:
At step-5, we can infer the fact Enemy(A, America) from Hostile(A), which satisfies Rule-(6). Hence all the
statements are proved true using backward chaining.
Following is the difference between the forward chaining and backward chaining:
o Forward chaining, as the name suggests, starts from the known facts and moves forward by applying inference
rules to extract more data, and it continues until it reaches the goal, whereas backward chaining starts
from the goal and moves backward by using inference rules to determine the facts that satisfy the goal.
o Forward chaining is called a data-driven inference technique, whereas backward chaining is called a goal-
driven inference technique.
o Forward chaining is known as a bottom-up approach, whereas backward chaining is known as a top-
down approach.
o Forward chaining uses breadth-first search strategy, whereas backward chaining uses depth-first
search strategy.
o Forward and backward chaining both apply the Modus Ponens inference rule.
o Forward chaining can be used for tasks such as planning, design process monitoring, diagnosis, and
classification, whereas backward chaining can be used for classification and diagnosis tasks.
o Forward chaining can be like an exhaustive search, whereas backward chaining tries to avoid the unnecessary
path of reasoning.
o In forward-chaining there can be various ASK questions from the knowledge base, whereas in backward
chaining there can be fewer ASK questions.
o Forward chaining is slower, as it checks all the rules, whereas backward chaining is faster, as it checks only the few
required rules.
1. Forward chaining starts from known facts and applies inference rules to extract more data until it reaches the goal, whereas backward chaining starts from the goal and works backward through inference rules to find the facts that support the goal.
5. Forward chaining tests all the available rules, whereas backward chaining tests only the few required rules.
9. Forward chaining can be aimed at any conclusion, whereas backward chaining is aimed only at the required data.
11. Resolution
Resolution in FOL
Resolution
Resolution is a theorem-proving technique that proceeds by building refutation proofs, i.e., proofs by contradiction.
It was invented by the mathematician John Alan Robinson in 1965.
Resolution is used when several statements are given and we need to prove a conclusion from those statements.
Unification is a key concept in proofs by resolution. Resolution is a single inference rule which can efficiently operate
on the conjunctive normal form or clausal form.
Clause: A disjunction of literals is called a clause. A clause containing a single literal is also known as a unit clause.
Conjunctive Normal Form: A sentence represented as a conjunction of clauses is said to be conjunctive normal
form or CNF.
The resolution rule for first-order logic is simply a lifted version of the propositional rule. Resolution can resolve two
clauses if they contain complementary literals, which are assumed to be standardized apart so that they share no
variables.
This rule is also called the binary resolution rule because it only resolves exactly two literals.
Example:
Here the two complementary literals are Loves(f(x), x) and ¬Loves(a, b).
These literals can be unified with the unifier θ = [a/f(x), b/x], which generates a resolvent clause:
To better understand all the above steps, we will take an example in which we will apply resolution.
Example:
a. John likes all kinds of food.
b. Apples and vegetables are food.
c. Anything anyone eats without being killed is food.
d. Anil eats peanuts and is still alive.
In the first step we convert all the given statements into first-order logic.
For resolution in first-order logic, it is required to convert the FOL sentences into CNF, since the CNF form makes
resolution proofs easier.
Note: Statements "food(Apple) Λ food(vegetables)" and "eats (Anil, Peanuts) Λ alive(Anil)" can be written in two separate statements.
In this step, we apply negation to the conclusion statement ("John likes peanuts"), which will be written as ¬likes(John, Peanuts).
Now in this step, we will solve the problem by resolution tree using substitution. For the above problem, it will be
given as follows:
Hence the negation of the conclusion has been proved as a complete contradiction with the given set of statements.
Uncertainty:
Till now, we have learned knowledge representation using first-order logic and propositional
logic with certainty, which means we were sure about the predicates. With this knowledge
representation, we might write A→B, which means if A is true then B is true; but consider a
situation where we are not sure whether A is true or not. Then we cannot express this
statement; this situation is called uncertainty.
So to represent uncertain knowledge, where we are not sure about the predicates, we need
uncertain reasoning or probabilistic reasoning.
Causes of uncertainty:
Following are some leading causes of uncertainty to occur in the real world.
2. Bayesian inference
Bayes' theorem:
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which determines the probability of
an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities of two random events.
Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian inference is an application
of Bayes' theorem, which is fundamental to Bayesian statistics.
Bayes' theorem allows updating the probability prediction of an event by observing new information of the real world.
Example: If cancer corresponds to one's age then by using Bayes' theorem, we can determine the probability of
cancer more accurately with the help of age.
Bayes' theorem can be derived using product rule and conditional probability of event A with known event B:
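The derivation uses the product rule: P(A ∧ B) = P(A|B) P(B) and also P(A ∧ B) = P(B|A) P(A); equating the two and dividing by P(B) gives
P(A|B) = P(B|A) P(A) / P(B)    ...(a)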
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems
for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate. It is read as the probability of hypothesis A given
that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of the
evidence.
P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability: the probability of the evidence under all possible hypotheses.
In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai); hence Bayes' rule can be written as:
P(Ai|B) = P(B|Ai) P(Ai) / Σk P(Ak) P(B|Ak),
Where A1, A2, A3,........, An is a set of mutually exclusive and exhaustive events.
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful in cases
where we have good estimates of these three terms and want to determine the fourth one. Suppose we want to
perceive the effect of some unknown cause and want to compute that cause; then Bayes' rule becomes:
Example-1:
Question: What is the probability that a patient has the disease meningitis, given that the patient has a stiff neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to have a stiff neck, and it occurs 80% of the time. He is
also aware of some more facts, which are given as follows:
Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis. We can
then calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a)= .02
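Applying Bayes' rule to these values:
P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133 ≈ 1/750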
Hence, we can assume that about 1 patient out of 750 patients with a stiff neck has meningitis.
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the card is king is
4/52, then calculate posterior probability P(King|Face), which means the drawn face card is a king card.
Solution:
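A worked solution (using the fact that a standard deck has 12 face cards and every king is a face card):
P(King|Face) = P(Face|King) × P(King) / P(Face) = 1 × (4/52) / (12/52) = 4/12 = 1/3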
o It is used to calculate the next step of the robot when the already executed step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.
3. Probabilistic reasoning
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the concept of probability to indicate
the uncertainty in knowledge. In probabilistic reasoning, we combine probability theory with logic to handle the
uncertainty.
We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that is the result of
someone's laziness and ignorance.
In the real world, there are lots of scenarios where the certainty of something is not confirmed, such as "It will rain
today," "the behavior of someone in some situations," or "a match between two teams or two players." These are probable
sentences for which we can assume that they will happen but cannot be sure, so here we use probabilistic reasoning.
In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
o Bayes' rule
o Bayesian Statistics
Since probabilistic reasoning uses probability and related terms, before understanding probabilistic reasoning, let's
first understand some common terms:
Probability: Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of
the likelihood that an event will occur. The value of a probability always lies between 0 and 1, the two extremes representing
complete impossibility and complete certainty.
We can find the probability of an uncertain event by using the below formula.
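The formula is:
P(A) = (number of outcomes favourable to A) / (total number of possible outcomes), with 0 ≤ P(A) ≤ 1 and P(¬A) = 1 − P(A).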
Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the real world.
Prior probability: The prior probability of an event is the probability computed before observing new information.
Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It is
a combination of the prior probability and the new information.
Conditional probability:
Conditional probability is the probability of an event occurring given that another event has already happened.
Let's suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the
condition B"; it can be written as below.
If the probability of A is given and we need to find the probability of B, then it will be given as below.
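By the definition of conditional probability, these are, respectively:
P(A | B) = P(A ⋀ B) / P(B)
P(B | A) = P(A ⋀ B) / P(A)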
It can be explained by using the Venn diagram below, where B is the event that has occurred, so the sample space is reduced to
set B; we can then calculate event A given that event B has already occurred by dividing the probability P(A ⋀ B)
by P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What
percentage of the students who like English also like mathematics?
Solution:
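Using the conditional probability formula:
P(Mathematics | English) = P(English ⋀ Mathematics) / P(English) = 0.4 / 0.7 ≈ 0.57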
Hence, 57% are the students who like English also like Mathematics.
A Bayesian belief network is a key technique for dealing with probabilistic events and for solving problems
that involve uncertainty. We can define a Bayesian network as:
"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional
dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic, because these networks are built from a probability distribution, and also use
probability theory for prediction and anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship between multiple events, we need
a Bayesian network. It can also be used in various tasks including prediction, anomaly detection, diagnostics,
automated insight, reasoning, time series prediction, and decision making under uncertainty.
A Bayesian network can be used for building models from data and experts' opinions, and it consists of two parts: a directed acyclic graph and a set of conditional probability tables.
The generalized form of Bayesian network that represents and solve decision problems under uncertain knowledge is
known as an Influence diagram.
A Bayesian network graph is made up of nodes and Arcs (directed links), where:
o Each node corresponds to the random variables, and a variable can be continuous or discrete.
o Arc or directed arrows represent the causal relationship or conditional probabilities between random
variables. These directed links or arrows connect the pair of nodes in the graph.
These links represent that one node directly influences the other node; if there is no directed link between two nodes, that
means the nodes are independent of each other.
o In the above diagram, A, B, C, and D are random variables represented by the nodes of the
network graph.
o If we are considering node B, which is connected with node A by a directed arrow, then node
A is called the parent of Node B.
o Node C is independent of node A.
Note: The Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic
graph, or DAG.
o Causal Component
o Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which quantifies the
effect of the parents on that node.
Bayesian network is based on Joint probability distribution and conditional probability. So let's first understand the
joint probability distribution:
If we have variables x1, x2, x3, ....., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the
joint probability distribution.
P[x1, x2, x3, ....., xn] can be written as follows in terms of conditional probabilities (the chain rule), and in general, for each
variable Xi, we can write the equation as shown below.
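The chain rule gives
P[x1, x2, x3, ....., xn] = P[x1 | x2, ....., xn] · P[x2 | x3, ....., xn] · ... · P[xn−1 | xn] · P[xn],
and in a Bayesian network this simplifies, for each variable Xi, to
P(Xi | Xi−1, ....., X1) = P(Xi | Parents(Xi)),
so that the joint distribution is the product of P(Xi | Parents(Xi)) over all nodes.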
Let's understand the Bayesian network through an example by creating a directed acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to a
burglary, but it also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken on the
responsibility of informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm,
but sometimes he gets confused with the phone ringing and calls then too. On the other hand, Sophia likes to
listen to loud music, so sometimes she misses the alarm. Here we would like to compute the probability of the
Burglary Alarm.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and David and Sophia have both called
Harry.
Solution:
o The Bayesian network for the above problem is given below. The network structure shows that burglary
and earthquake are the parent nodes of the alarm and directly affect the probability of the alarm going off,
while David's and Sophia's calls depend on the alarm.
o The network thus represents our assumptions that David and Sophia do not perceive the burglary directly, do not
notice the minor earthquake, and do not confer before calling.
o The conditional distributions for each node are given as conditional probabilities table or CPT.
o Each row in the CPT must sum to 1, because the entries in the row represent an exhaustive set of
cases for the variable.
o In a CPT, a boolean variable with k boolean parents requires 2^k probabilities. Hence, if there are two parents,
the CPT will contain 4 probability values.
o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)
We can write the events of the problem statement in the form of the probability P[D, S, A, B, E], and we can rewrite this
probability statement using the joint probability distribution:
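From the network structure described above (B and E are parents of A; A is the parent of D and S), this expands to:
P[D, S, A, B, E] = P[D | A] · P[S | A] · P[A | B, E] · P[B] · P[E]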
Let's take the observed probabilities for the Burglary and Earthquake components:
P(E = False) = 0.999, which is the probability that an earthquake has not occurred.
The conditional probability that David will call depends on the probability of the Alarm.
The conditional probability that Sophia calls depends on its parent node "Alarm."
From the formula of the joint distribution, and using the CPT values commonly quoted for this example
(P(¬B) = 0.998, P(¬E) = 0.999, P(A | ¬B, ¬E) = 0.001, P(D | A) = 0.91, P(S | A) = 0.75), we can write the problem statement as:
P(S, D, A, ¬B, ¬E) = P(S | A) · P(D | A) · P(A | ¬B, ¬E) · P(¬B) · P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045.
Hence, a Bayesian network can answer any query about the domain by using Joint distribution.
There are two ways to perform inference in a Bayesian network, which are described below:
1. Exact inference:
In exact inference, we analytically compute the conditional probability distribution over the
variables of interest.
But sometimes, that’s too hard to do, in which case we can use approximation techniques based on
statistical sampling
General question: What’s the whole probability distribution over variable X given evidence e, P(X |
e)?
In our discrete probability situation, the only way to answer a MAP query is to compute the
probability of x given e for all possible values of x and see which one is greatest
So, in general, we’d like to be able to compute a whole probability distribution over some variable or
variables X, given instantiations of a set of variables e
To answer any query involving a conjunction of variables, sum over the variables not involved in the
query
Given the joint distribution over the variables, we can easily answer any question about the value of
a single variable by summing (or marginalizing) over the other variables.
So, in a domain with four variables, A, B, C, and D, the probability that variable D has value d is the
sum over all possible combinations of values of the other three variables of the joint probability of
all four values. This is exactly the same as the procedure we went through in the last lecture, where
to compute the probability of cavity, we added up the probability of cavity and toothache and the
probability of cavity and not toothache.
In general, we’ll use the first notation, with a single summation indexed by a list of variable names,
and a joint probability expression that mentions values of those variables. But here we can see the
completely written-out definition, just so we all know what the shorthand is supposed to mean.
In the numerator, here, you can see that we’re only summing over variables A and C, because b and
d are instantiated in the query.
We’re going to learn a general purpose algorithm for answering these joint queries fairly efficiently.
We’ll start by looking at a very simple case to build up our intuitions, then we’ll write down the
algorithm, then we’ll apply it to a more complex case.
Here’s our very simple case. It’s a bayes net with four nodes, arranged in a chain.
So, we know from before that the probability that variable D has some value little d is the sum over
A, B, and C of the joint distribution, with d fixed.
Now, using the chain rule of Bayesian networks, we can write down the joint probability as a
product over the nodes of the probability of each node’s value given the values of its parents. So, in
this case, we get P(d|c) times P(c|b) times P(b|a) times P(a).
This expression gives us a method for answering the query, given the conditional probabilities that
are stored in the net. And this method can be applied directly to any other bayes net. But there’s a
problem with it: it requires enumerating all possible combinations of assignments to A, B, and C, and
then, for each one, multiplying the factors for each node. That’s an enormous amount of work and
we’d like to avoid it if at all possible.
So, we’ll try rewriting the expression into something that might be more efficient to evaluate. First,
we can make our summation into three separate summations, one over each variable.
Then, by distributivity of addition over multiplication, we can push the summations in, so that the
sum over A includes all the terms that mention A, but no others, and so on. It’s pretty clear that this
expression is the same as the previous one in value, but it can be evaluated more efficiently. We’re
still, eventually, enumerating all assignments to the three variables, but we’re doing somewhat
fewer multiplications than before. So this is still not completely satisfactory.
If you look, for a minute, at the terms inside the summation over A, you’ll see that we’re doing these
multiplications over for each value of C, which isn’t necessary, because they’re independent of C.
Our idea, here, is to do the multiplications once and store them for later use. So, first, for each value
of A and B, we can compute the product, generating a two dimensional matrix.
Then, we can sum over the rows of the matrix, yielding one value of the sum for each possible value
of b.
Now, we can substitute f1 of b in for the sum over A in our previous expression. And, effectively, we
can remove node A from our diagram. Now, we express the contribution of b, which takes the
contribution of a into account, as f_1 of b.
We can continue the process in basically the same way. We can look at the summation over b and
see that the only other variable it involves is c. We can summarize those products as a set of factors,
one for each value of c. We’ll call those factors f_2 of c.
We substitute f_2 of c into the formula, remove node b from the diagram, and now we’re down to a
simple expression in which d is known and we have to sum over values of c.
Given a Bayesian network and an elimination order X1, ..., Xm for the non-query variables, compute the query as follows:
for i = m down to 1, eliminate variable Xi (as described below).
That was a simple special case. Now we can look at the algorithm in the general case. Let’s assume
that we’re given a Bayesian network and an ordering on the variables that aren’t fixed in the query.
We’ll come back later to the question of the influence of the order, and how we might find a good
one.
We can express the probability of the query variables as a sum over each value of each of the non-
query variables of a product over each node in the network, of the probability that that variable has
the given value given the values of its parents.
So, we’ll eliminate the variables from the inside out. Starting with variable Xm and finishing with
variable X1.
To eliminate variable Xi, we start by gathering up all of the factors that mention Xi, and removing
them from our set of factors. Let’s say there are k such factors.
Now, we make a k+1 dimensional table, indexed by Xi as well as each of the other variables that is
mentioned in our set of factors.
We then sum the table over the Xi dimension, resulting in a k-dimensional table.
This table is our new factor, and we put a term for it back into our set of factors.
Once we’ve eliminated all the summations, we have the desired value.
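A minimal Python sketch of variable elimination on the four-node chain A → B → C → D used in the simple example above; the CPT numbers are invented purely for illustration:

# Hypothetical CPTs for binary variables (values 0 and 1); the numbers are illustrative only.
P_A   = {0: 0.6, 1: 0.4}
P_BgA = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}   # P(b | a), keyed by (b, a)
P_CgB = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}   # P(c | b), keyed by (c, b)
P_DgC = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.1, (1, 1): 0.9}   # P(d | c), keyed by (d, c)

# Eliminate A: f1(b) = sum_a P(a) * P(b | a)
f1 = {b: sum(P_A[a] * P_BgA[(b, a)] for a in (0, 1)) for b in (0, 1)}
# Eliminate B: f2(c) = sum_b f1(b) * P(c | b)
f2 = {c: sum(f1[b] * P_CgB[(c, b)] for b in (0, 1)) for c in (0, 1)}
# Eliminate C: P(D = d) = sum_c f2(c) * P(d | c)
P_D = {d: sum(f2[c] * P_DgC[(d, c)] for c in (0, 1)) for d in (0, 1)}

print(P_D)   # the two values sum to 1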
Here’s a more complicated example, to illustrate the variable elimination algorithm in a more
general case. We have this big network that encodes a domain for diagnosing lung disease.
(Dyspnea, as I understand it, is shortness of breath).
So, we start by eliminating V. We gather the two terms that mention V and see that they also
involve variable T. So, we compute the product for each value of T, and summarize those in the
factor f1 of T.
Now we can substitute that factor in for the summation, and remove the node from the network.
The next variable to be eliminated is X. There is actually only one term involving X, and it also
involves variable A. So, for each value of A, we compute the sum over X of P(x|a). But wait! We
know what this value is! If we fix a and sum over x, these probabilities have to add up to 1.
So, rather than adding another factor to our expression, we can just remove the whole sum. In
general, the only nodes that will have an influence on the probability of D are its ancestors.
Now, it’s time to eliminate S. We find that there are three terms involving S, and we gather them
into the sum. These three terms involve two other variables, B and L. So we have to make a factor
that specifies, for each value of B and L, the value of the sum of products.
Now we can substitute that factor back into our expression. We can also eliminate node S. But in
eliminating S, we’ve added a direct dependency between L and B (they used to be dependent via S,
but now the dependency is encoded explicitly in f2(b, l)). We’ll show that in the graph by drawing a line
between the two nodes. It’s not exactly a standard directed conditional dependence, but it’s still
useful to show that they’re coupled.
Now we eliminate T. It involves two terms, which themselves involve variables A and L. So we make
a new factor f3 of A and L.
Next we eliminate L. It involves these two factors, which depend on variables A and B. So we make a
new factor, f4 of A and B, and substitute it in. We remove node L, but couple A and B.
At this point, we could just do the summations over A and B and be done. But to finish out the
algorithm the way a computer would, it’s time to eliminate variable B.
It involves both of our remaining terms, and it seems to depend on variables A and D. However, in
this case, we’re interested in the probability of a particular value, little d of D, and so the variable d
is instantiated. Thus, we can treat it as a constant in this expression, and we only need to generate a
factor over a, which we’ll call f5 of a. And we can now, in some sense, remove D from our network
as well (because we’ve already factored it into our answer).
Finally, to get the probability that variable D has value little d, we simply sum factor f5 over all
values of a. Yay! We did it.
Let’s see how the variable elimination algorithm performs, both in theory and in practice.
First of all, it’s pretty easy to see that it runs in time exponential in the number of variables involved in
the largest factor. Creating a factor with k variables involves making a k+1 dimensional table. If you have
b values per variable, that’s a table of size b^(k+1). To make each entry, you have to multiply at most n
numbers, where n is the number of nodes. We have to do this for each variable to be eliminated (which
is usually close to n). So we have something like time = O(n^2 b^k).
How big the factors are depends on the elimination order. You’ll see in one of the recitation exercises
just how dramatic the difference in factor sizes can be. A bad elimination order can generate huge
factors.
So, we’d like to use the elimination order that generates the smallest factors. Unfortunately, it turns out
to be NP hard to find the best elimination order.
At least, there are some fairly reasonable heuristics for choosing an elimination order. It’s usually done
dynamically. So, rather than fixing the elimination order in advance, as we suggested in the algorithm
description, you can pick the next variable to be eliminated depending on the situation. In particular,
one reasonable heuristic is to pick the variable to eliminate next that will result in the smallest factor.
This greedy approach won’t always be optimal, but it’s not usually too bad.
There is one case where Bayes net inference in general, and the variable elimination algorithm in
particular, is fairly efficient, and that's when the network is a polytree. A polytree is a network with no
undirected cycles; that is, a network in which, for any two nodes, there is at most one (undirected) path between them. In a
polytree, inference is linear in the size of the network, where the size of the network is defined to be the
size of the largest conditional probability table (or exponential in the maximum number of parents of
any node). In a polytree, the optimal elimination order is to start at the root nodes, and work
downwards, always eliminating a variable that no longer has any parents. In doing so, we never
introduce additional connections into the network.
So, inference in polytrees is efficient, and even in many large non-polytree networks, it’s possible to
keep the factors small, and therefore to do inference relatively efficiently.
When the network is such that the factors are, of necessity, large, we’ll have to turn to a different class
of methods.
2. Approximate inference:
Sampling
Another strategy, which is a theme that comes up also more and more in AI actually, is to say, well, we
didn't really want the right answer anyway. Let's try to do an approximation. And you can also show that
it's computationally hard to get an approximation that's within epsilon of the answer that you want, but
again that doesn't keep us from trying.
So, the other thing that we can do is the stochastic simulation or sampling. In sampling, what we do is
we look at the root nodes of our graph, and attached to this root node is some probability that A is going
to be true, right? Maybe it's .4. So we flip a coin that comes up heads with probability .4 and see if we
get true or false.
We flip our coin, let's say, and we get true for A -- this time. And now, given the assignment of true to A,
we look in the conditional probability table for B given A = true, and that gives us a probability for B.
Now, we flip a coin with that probability. Say we get False. We enter that into the table.
Now, we look in the CPT for D given B and C, for the case where B is false and C is true, and we flip a coin
with that probability, in order to get a value for D.
So, there's one sample from the joint distribution of these four variables. And you can just keep doing
this, all day and all night, and generate a big pile of samples, using that algorithm. And now you can ask
various questions.
Estimate:
P*(D|A) = #D,A / #A
Let's say you want to know the probability of D given A. How would you answer - - given all the
examples -- what would you do to compute the probability of D given A? You would just count. You’d
count the number of cases in which A and D were true, and you’d divide that by the number of cases in
which A was true, and that would give you an unbiased estimate of the probability of D given A. The
more samples, the more confidence you’d have that the estimated probability is close to the true one.
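A minimal Python sketch of this sampling-and-counting idea on a small hypothetical network (roots A and C, with B depending on A and D depending on B and C); all the probability numbers are invented for illustration:

import random

P_A = 0.4                                        # P(A = true) for root node A
P_C = 0.5                                        # P(C = true) for root node C
P_B_given_A  = {True: 0.7, False: 0.1}           # P(B = true | A)
P_D_given_BC = {(True, True): 0.9, (True, False): 0.6,
                (False, True): 0.3, (False, False): 0.05}   # P(D = true | B, C)

def sample_once():
    # Sample each variable in topological order, using its parents' sampled values.
    a = random.random() < P_A
    c = random.random() < P_C
    b = random.random() < P_B_given_A[a]
    d = random.random() < P_D_given_BC[(b, c)]
    return a, b, c, d

samples = [sample_once() for _ in range(100000)]
count_A  = sum(1 for a, b, c, d in samples if a)
count_DA = sum(1 for a, b, c, d in samples if a and d)
print(count_DA / count_A)                        # estimate of P(D | A) = #(D, A) / #A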
Estimation
Exactly because of the process we’re using to generate the samples, the majority of them will be the
typical cases. Oh, it's someone with a cold, someone with a cold, someone with a cold, someone with a
cold, someone with a cold, someone with malaria, someone with a cold, someone with a cold. So the
rare results are not going to come up very often. And so doing this sampling naively can make it really
hard to estimate the probability of a rare event. If it's something that happens one in ten thousand
times, well, you know for sure you're going to need, some number of tens of thousands of samples to
get even a reasonable estimate of that probability.
Imagine that you want to estimate the probability of some disease given -- oh, I don't know -- spots on
your tongue and a sore toe. Somebody walks in and they have a really peculiar set of symptoms, and
you want to know what's the probability that they have some disease.
Well, if the symptoms are root nodes, it's easy. If the symptoms were root nodes, you could just assign
the root nodes to have their observed values and then simulate the rest of the network as before.
But if the symptoms aren't root nodes then if you do naïve sampling, you would generate a giant table
of samples, and you'd have to go and look and say, gosh, how many cases do I have where somebody
has spots on their tongue and a sore toe; and the answer would be, well, maybe zero or not very many.
There’s a technique called importance sampling, which allows you to draw examples from a distribution
that’s going to be more helpful and then reweight them so that you can still get an unbiased estimate of
the desired conditional probability. It’s a bit beyond the scope of this class to get into the details, but it’s
an important and effective idea.
Recitation Problem
• Do the variable elimination algorithm on the net below using the elimination order A,B,C (that is,
eliminate node C first). In computing P(D=d), what factors do you get?
Here’s the network we started with. We used elimination order C, B, A (we eliminated A first). Now
we’re going to explore what happens when we eliminate the variables in the opposite order. First, work
on the case we did, where we’re trying to calculate the probability that node D takes on a particular
value, little d. Remember that little d is a constant in this case. Now, do the case where we’re trying to
find the whole distribution over D, so we don’t know a particular value for little d.
Find an elimination order that keeps the factors small for the net below, or show that there is no such
order.
Here’s a pretty complicated graph. But notice that no node has more than 2 parents, so none of the
CPTs are huge. The question is, is this graph hard for variable elimination? More concretely, can you find
an elimination order that results only in fairly small factors? Is there an elimination order that generates
a huge factor?
Bayesian networks (or related models) are often used in computer vision, but they almost always
require sampling. What happens when you try to do variable elimination on a model like the grid below?
6. Causal Networks:
A causal network is an acyclic digraph arising from an evolution of a substitution system and
representing its history. The illustration above shows a causal network corresponding to the rules
(applied in a left-to-right scan) and the initial condition.
The figure above shows the procedure for diagrammatically creating a causal network from
a mobile automaton.
Exam Date: 23.09.2024 Reg. No.:
12. a Explain the Breadth First search, Uniform cost Search and 16 CO1 L2
Depth First search algorithms with examples.
(OR)
12.b Explain the Depth limited search, Iterative deepening and 16 CO1 L2
Bidirectional search algorithms with examples.
************
Initial state, Goal state, Possible actions, State transition model, Solution path.
5 What are the things that the agent knows in online search problems?
The agent knows the action model, current percepts, and its goal but lacks a
complete model of the environment.
A heuristic function estimates the cost from the current state to the goal state,
helping to guide search algorithms.
8 Define annealing.
Annealing is the metallurgical process of heating a material and then cooling it gradually so that it settles into a low-energy state; simulated annealing borrows this idea by allowing occasional worsening moves with a probability that decreases as a "temperature" parameter is lowered.
Part – B (5 x 16 = 80 marks)
11.a. What are informed search techniques? Explain any with examples.
Informed search techniques, also known as heuristic search techniques, use additional
knowledge beyond the problem description to guide the search towards the goal more
efficiently than uninformed search techniques. These algorithms make use of heuristic
functions, which provide estimates of how close a state is to the goal. The heuristic function
is used to prioritize which nodes or states are explored next.
1. Best-First Search: This algorithm selects the next node to explore based on a
heuristic function. A simple example is the Greedy Best-First Search, which
always selects the node that appears to be the closest to the goal based on a
heuristic function h(n).
Advantages: Heuristic searches, especially A*, are often more efficient and faster in finding
solutions compared to uninformed searches like Breadth-First or Depth-First Search.
Disadvantages: The efficiency of informed searches depends on the quality of the heuristic.
Poor heuristics can lead to suboptimal or slow searches.
11b. List the basic kinds of Intelligent Agents and explain with a neat schematic diagram.
Intelligent agents are systems that perceive their environment through sensors and act
upon it using actuators. They operate autonomously to achieve specific goals based on their
internal knowledge or beliefs. Below are the basic types of intelligent agents:
1. Simple Reflex Agents: These agents select actions based only on the current
percept, ignoring the rest of the percept history. They follow the condition-action
rules.
2. Model-Based Reflex Agents: These agents maintain an internal state to keep track
of aspects of the world that cannot be observed directly. The agent uses a model of
the world to make decisions.
o Example: A self-driving car that keeps track of nearby vehicles and road
conditions.
3. Goal-Based Agents: In addition to the state of the environment, these agents have
goals that they try to achieve. They decide their actions based on the distance from
the goal.
5. Learning Agents: These agents can improve their performance based on past
experiences. They have a learning component that allows them to update their
knowledge or behavior.
o Example: A robot that learns how to navigate a maze based on trial and error.
Schematic Diagram: [Insert diagram showing the flow between the environment, sensors,
actuators, and decision-making components like the model, goals, or utility.]
12a. Explain the Breadth-First Search (BFS), Uniform Cost Search (UCS), and Depth-First
Search (DFS) algorithms with examples.
o Description: BFS explores all nodes at the current depth before moving to the
next level. It is guaranteed to find the shortest path in an unweighted graph.
o Example: In a maze, BFS would explore all possible adjacent paths step by
step, ensuring that it finds the shortest path to the exit.
o Description: UCS is similar to BFS but takes into account varying costs of
edges. It expands the node with the lowest path cost first, ensuring the
optimal solution in weighted graphs.
o Example: Finding the shortest path between cities on a map where each edge
has a different travel cost (e.g., distances or time).
o Example: In solving puzzles like Sudoku, DFS tries a path until it fails and
then backtracks to try another.
o Time Complexity: O(b^m), where m is the maximum depth of
the search tree.
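To complement the BFS description above, here is a minimal Python sketch of BFS on an explicitly listed graph; the graph and node names are illustrative assumptions:

from collections import deque

def bfs(graph, start, goal):
    # Return the shortest path (fewest edges) from start to goal, or None if unreachable.
    frontier = deque([[start]])
    explored = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in explored:
                explored.add(neighbour)
                frontier.append(path + [neighbour])
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(bfs(graph, "A", "E"))   # ['A', 'B', 'D', 'E']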
12b. Explain the Depth-Limited Search, Iterative Deepening Search, and Bidirectional
Search algorithms with examples.
1. Depth-Limited Search:
3. Bidirectional Search:
13a. What is heuristic search technique in AI? How does heuristic search work? Explain its
advantages and disadvantages.
A heuristic search is an informed search strategy that uses heuristic functions to guide
the search process, speeding up the search for solutions in large problem spaces by
estimating which path is most promising. The key idea behind heuristic search is to
prioritize states that are likely to lead to a solution faster.
A heuristic function h(n) provides an estimate of the cost to reach the goal
from a given node n. This is used to evaluate which node should be expanded
next.
Best-First Search and A* are common heuristic search algorithms. In A*, the total
cost function f(n) = g(n) + h(n) is used, where:
o g(n) is the cost to reach node n from the start node.
o h(n) is the estimated cost to reach the goal from node n.
Example of A* Search:
Consider a robot trying to find the shortest path through a maze. Each cell in the maze has
a cost, and the goal is to minimize the path cost. The heuristic function could be the
Manhattan distance from the current cell to the goal cell. A* will expand the node that
minimizes the total estimated cost (sum of the cost to the node and the estimated
remaining cost).
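A minimal Python sketch of A* on a small grid with the Manhattan-distance heuristic described above; the grid, unit step costs and coordinates are illustrative assumptions:

import heapq

def astar(grid, start, goal):
    # A* on a 4-connected grid; cells marked 1 are walls. Returns the path cost or None.
    def h(p):                                    # Manhattan distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    rows, cols = len(grid), len(grid[0])
    frontier = [(h(start), 0, start)]            # entries are (f = g + h, g, cell)
    best_g = {start: 0}
    while frontier:
        f, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = cell[0] + dr, cell[1] + dc
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0:
                g2 = g + 1                       # each move costs 1
                if g2 < best_g.get((r, c), float("inf")):
                    best_g[(r, c)] = g2
                    heapq.heappush(frontier, (g2 + h((r, c)), g2, (r, c)))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))   # 6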
3. Faster than BFS and DFS: Since it is guided by the heuristic, the search process
can avoid exploring paths that are unlikely to lead to a solution.
1. Heuristic Design: The performance of the search depends on the quality of the
heuristic. Designing a good heuristic can be difficult and problem-specific.
2. Memory Intensive: Algorithms like A* need to keep track of all explored nodes,
which can lead to high memory consumption.
Local search algorithms are a class of search methods that operate by iteratively improving
a single solution. Unlike global search algorithms that explore a wide range of possibilities,
local search focuses on improving the current state by making small changes; the simplest such
method is hill climbing (a short code sketch of hill climbing follows at the end of this answer).
1. Hill Climbing:
o Drawback: Hill climbing can get stuck in local optima, where no neighboring
state appears better, but the solution is not globally optimal.
2. Simulated Annealing:
3. Genetic Algorithms:
o Advantage: Can explore a broad search space and avoid local optima by
maintaining diversity in the population.
Sketch: [Insert sketches showing hill climbing, simulated annealing, and genetic algorithm
processes with search states.]
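As a concrete illustration of the hill-climbing idea above, a minimal Python sketch that maximizes a simple one-dimensional objective; the objective function and step size are illustrative assumptions:

def hill_climb(objective, start, step=0.1, max_iters=1000):
    # Greedy hill climbing: move to the better neighbour until no neighbour improves.
    current = start
    for _ in range(max_iters):
        best = max((current - step, current + step), key=objective)
        if objective(best) <= objective(current):
            return current                       # local optimum (not necessarily global)
        current = best
    return current

f = lambda x: -(x - 3) ** 2                      # global maximum at x = 3
print(round(hill_climb(f, start=0.0), 2))        # approaches 3.0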
14a. Explain about searching with non-deterministic actions with an example to formulate
a problem solution.
In some real-world situations, the outcomes of an agent’s actions are not entirely
predictable, leading to non-deterministic search. This occurs when the result of applying
an action can vary, and the agent must account for these possibilities when planning.
Consider a robot navigating a warehouse where the ground may be slippery in some areas.
When the robot issues a command to move forward, the robot may slip and end up in a
different location than expected. The environment is non-deterministic because the same
action (move forward) can lead to different outcomes.
1. States: The possible locations of the robot in the warehouse.
2. Actions: The robot can attempt to move forward, backward, left, or right, but the outcome of these actions is uncertain.
3. Transition Model: For each action, the set of states the robot may end up in (for example, moving forward may leave the robot in the intended cell or, if it slips, in a neighbouring cell).
4. Goal State: The robot must reach a specific location in the warehouse.
5. Solution: A contingency plan that accounts for all possible outcomes of actions. For example, if the robot slips, it may try the same action again or choose a different action depending on where it ends up; a small sketch of such a plan is given below.
Partial Knowledge: The agent does not know the exact result of its actions but has
some probability distribution of the outcomes.
Strategies: The agent may need to adopt backup plans or conditional strategies
depending on the possible outcomes.
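A minimal sketch of how such a contingency plan might be represented (the action and outcome names are purely illustrative):
# A contingency plan branches on the outcome actually observed after each action.
contingency_plan = {
    "action": "move_forward",
    "outcomes": {
        "reached_next_cell": {"action": "move_forward", "outcomes": {}},  # continue as planned
        "slipped_left":      {"action": "move_right",   "outcomes": {}},  # correct the slip
        "slipped_back":      {"action": "move_forward", "outcomes": {}},  # retry the same action
    },
}

def execute(plan, observe):
    # observe(action) is an assumed callback that returns the outcome label that occurred
    while plan and plan.get("action"):
        outcome = observe(plan["action"])
        plan = plan["outcomes"].get(outcome)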
The AO* algorithm is used to solve problems represented as AND-OR graphs, where
some tasks (AND nodes) require multiple sub-tasks to be completed, while others (OR
nodes) can be solved by choosing one of several sub-tasks.
In an AND-OR graph, nodes can be solved either by solving all of their child nodes
(AND) or by solving any one of their child nodes (OR).
AO* updates the cost of each node based on the costs of its children, allowing it to
backtrack and revise decisions as better paths are discovered.
Example of AO*:
Consider a project where multiple tasks need to be completed. Some tasks depend on all of
their subtasks being completed (AND), while others can be solved by completing any one of
their subtasks (OR).
Solution: The AO* algorithm will guide the project manager to focus on the most
critical path that minimizes the overall project time or cost.
Alpha-Beta pruning is an optimization technique for the minimax algorithm used in two-
player games like chess. It reduces the number of nodes evaluated by the minimax
algorithm, cutting off branches that cannot influence the final decision.
Alpha-Beta pruning works by keeping track of two values during the minimax
search:
o Alpha: The best score that the maximizing player can guarantee.
o Beta: The best score that the minimizing player can guarantee.
The algorithm prunes (cuts off) branches of the game tree that cannot possibly affect
the outcome of the game, thus saving computation time.
Example:
In a chess game, if a branch shows that the maximizing player can guarantee a win with a
certain move, Alpha-Beta pruning will stop evaluating any other moves that are guaranteed
to be worse.
Advantages:
1. Efficiency: Prunes branches that cannot change the result, reducing the number of nodes evaluated and saving computation time.
2. Optimality: Does not affect the outcome of the minimax algorithm—still finds the optimal move.
Steps:
1. The game tree is generated, with each node representing a possible game state.
2. Minimax evaluates the game tree from the terminal states upwards, assigning
values based on the game's outcome.
3. The maximizing player selects the move that leads to the highest score, while the minimizing player selects the move that leads to the lowest score.
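A compact alpha-beta sketch in Python (the children/evaluate callbacks and the tiny nested-list game tree are assumptions made here for illustration):
def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    # children(state) -> successor states; evaluate(state) -> static score (assumed callbacks)
    succ = children(state)
    if depth == 0 or not succ:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in succ:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:          # beta cut-off: MIN will never allow this branch
                break
        return value
    value = float("inf")
    for child in succ:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, evaluate))
        beta = min(beta, value)
        if beta <= alpha:              # alpha cut-off: MAX will never choose this branch
            break
    return value

# Tiny game tree as nested lists; integer leaves are static scores (illustrative only)
tree = [[3, 5], [2, 9]]
kids = lambda s: s if isinstance(s, list) else []
score = lambda s: s if isinstance(s, int) else 0
print(alphabeta(tree, 2, float("-inf"), float("inf"), True, kids, score))   # prints 3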
Monte Carlo Search (MCS) is a heuristic search algorithm that uses randomness and
statistical sampling to estimate the value of different actions or decisions in games or other
problem-solving environments. It is particularly effective when the search space is large or
complex, and an exhaustive search is computationally impractical.
2. Simulation: In games, Monte Carlo Search often involves simulating many random
games from the current state to the end. Based on the outcomes of these
simulations, the algorithm estimates the value of different actions.
A common and more advanced form of Monte Carlo search is Monte Carlo Tree Search
(MCTS), which is widely used in game-playing AI, such as for Go and Chess. MCTS builds a
search tree by progressively refining the value estimates for different moves through four
key steps:
1. Selection: Starting from the root node (current game state), a path is selected
according to a certain strategy, often guided by the Upper Confidence Bound (UCB)
formula to balance exploration and exploitation.
2. Expansion: New nodes (representing game states) are added to the tree based on
possible moves from the selected node.
3. Simulation: Random simulations are run from the new node until a terminal state
(game end) is reached.
4. Backpropagation: The result of the simulation is used to update the values of the
nodes along the path that was selected, improving their estimates for future
decisions.
In a game like Go, where the number of possible moves is immense, it is impractical to
evaluate the game tree exhaustively in the way chess engines search with deep minimax. MCTS simulates thousands of
games, randomly playing out different scenarios from a given position. Based on which
simulations result in a win, the algorithm learns which moves are more likely to succeed
and prioritizes them.
Advantages:
1. Scalability: Monte Carlo search can handle large search spaces and complex
problems efficiently.
2. Anytime Algorithm: MCTS can be stopped at any time, providing a reasonably good
solution based on the simulations run so far.
Disadvantages:
1. Randomness: Since it relies on random sampling, the results may vary between
runs unless a very large number of simulations are performed.
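A simplified MCTS sketch in Python covering the four steps above (the game callbacks moves, apply_move, is_terminal and result are assumed interfaces, and the reward handling ignores alternating players for brevity):
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.wins = [], 0, 0.0

def ucb(node, c=1.4):
    # Upper Confidence Bound: balances exploitation (win rate) and exploration
    if node.visits == 0:
        return float("inf")
    return node.wins / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def mcts(root_state, moves, apply_move, is_terminal, result, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while every move of the current node is already expanded
        while node.children and len(node.children) == len(moves(node.state)):
            node = max(node.children, key=ucb)
        # 2. Expansion: add one previously untried child
        if not is_terminal(node.state):
            tried = {child.state for child in node.children}
            untried = [m for m in moves(node.state) if apply_move(node.state, m) not in tried]
            if untried:
                child = Node(apply_move(node.state, random.choice(untried)), parent=node)
                node.children.append(child)
                node = child
        # 3. Simulation: random playout from the new node to a terminal state
        state = node.state
        while not is_terminal(state):
            state = apply_move(state, random.choice(moves(state)))
        reward = result(state)
        # 4. Backpropagation: update statistics along the selected path
        while node:
            node.visits += 1
            node.wins += reward
            node = node.parent
    best = max(root.children, key=lambda child: child.visits) if root.children else root
    return best.state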
************
o Bayes' theorem: P(A | B) = P(B | A) · P(A) / P(B), where:
P(A | B) is the posterior probability of A given B.
P(B | A) is the likelihood of B given A.
P(A) is the prior probability of A.
P(B) is the evidence or marginal probability of B.
Uses:
o Medical Diagnosis: Given symptoms, update the probability of a disease.
o Spam Filtering: Given certain keywords, determine the probability of an
email being spam.
o Machine Learning: Used in algorithms like Naive Bayes for classification
tasks.
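As a small worked illustration of the spam-filtering use above (all numbers are assumed purely for the example): suppose P(spam) = 0.2, the word "offer" appears in 60% of spam emails, P(offer | spam) = 0.6, and in 5% of legitimate emails, P(offer | not spam) = 0.05. Then P(offer) = 0.6 × 0.2 + 0.05 × 0.8 = 0.16, and P(spam | offer) = (0.6 × 0.2) / 0.16 = 0.75, so observing the word raises the probability of spam from 0.2 to 0.75.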
Belief State: A representation of the agent's knowledge about the current state of
the world. It often includes the probability distribution over all possible states based
on available information.
State Estimation: The process of estimating the current state of the world based on
partial observations and prior knowledge. It may involve techniques like filtering,
smoothing, or using algorithms like the Kalman Filter to estimate the state from
noisy data.
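A tiny sketch of updating a belief state from a noisy observation (the rooms, sensor model, and probabilities are invented for illustration):
def update_belief(belief, likelihood):
    # belief: dict state -> prior probability; likelihood: dict state -> P(observation | state)
    posterior = {s: belief[s] * likelihood.get(s, 0.0) for s in belief}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()} if total > 0 else belief

# A robot is equally likely to be in rooms A, B or C; a noisy sensor reports "door",
# and only rooms A and C actually have doors (all numbers invented for the example).
belief = {"A": 1/3, "B": 1/3, "C": 1/3}
likelihood = {"A": 0.8, "B": 0.1, "C": 0.8}
print(update_belief(belief, likelihood))   # belief mass shifts toward A and C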
10. Justify why we cannot use traditional minimax for games with an element of chance, such as backgammon.
Traditional minimax assumes the players alternate deterministic moves, but in backgammon the legal moves depend on dice rolls; the game tree therefore contains chance nodes whose outcomes neither player chooses, so minimax must be extended to expectiminimax, which takes a probability-weighted average over chance-node outcomes instead of a pure max or min.
Part – B (5 x 16 = 80 marks)
11.a. What is Conjunctive Normal Form? Illustrate and explain the procedure to
convert sentences into Conjunctive normal form with neat example.
Key Terms
Literal: A literal is either an atomic proposition (like P, Q) or its negation (¬P, ¬Q).
Clause: A clause is a disjunction (OR) of literals, e.g., P ∨ ¬Q.
Conjunction of Clauses: CNF requires that a formula is a conjunction (AND) of clauses, such as (P ∨ Q) ∧ (¬P ∨ R).
Characteristics of CNF
To convert a logical sentence into CNF, you typically follow these steps: eliminate implications, move negations inward using De Morgan's laws, and distribute OR over AND.
Example Conversion
Rewrite implications using the rule A → B ≡ ¬A ∨ B.
This process ensures the logical expression is in a standardized CNF form suitable for
logical reasoning algorithms like resolution in propositional logic.
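As a short worked example (chosen here for illustration), consider converting (P → Q) → R to CNF:
1. Eliminate implications: (P → Q) → R becomes ¬(¬P ∨ Q) ∨ R.
2. Move negation inward with De Morgan's laws: (P ∧ ¬Q) ∨ R.
3. Distribute OR over AND: (P ∨ R) ∧ (¬Q ∨ R).
The result is a conjunction of clauses, i.e., a CNF formula ready for resolution.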
In First-Order Logic (FOL), quantifiers are symbols used to indicate the scope of a
statement over a domain of objects. They specify whether a statement applies universally to
all elements in the domain or only to some elements. The two standard quantifiers in FOL
are:
1. Universal Quantifier ( ∀ )
2. Existential Quantifier ( ∃ )
1. Universal Quantifier ( ∀ )
The universal quantifier (denoted by ∀) specifies that a statement is true for all elements in the domain. The phrase “for all” or “for every” is often used to express universal quantification.
Symbol: ∀
Meaning: “For all” or “For every”
Example in Mathematical Language: ∀x P(x)
o Here, P(x) is a predicate, and the statement ∀x P(x) means that P(x) is true for every possible value of x in the domain.
Example:
2. Existential Quantifier ( ∃ )
The existential quantifier (denoted by ∃) specifies that there exists at least one element in the domain for which a statement is true. The phrase “there exists” or “for some” is used to express existential quantification.
Symbol: ∃
Meaning: “There exists” or “For some”
Example in Mathematical Language: ∃x P(x)
o Here, P(x) is a predicate, and the statement ∃x P(x) means that there is at least one value of x in the domain for which P(x) is true.
Example:
Combining Quantifiers
Example of Combination:
Statement: “For every human, there exists a day they were born.”
FOL Expression: ∀x (Human(x) → ∃y BornOn(x, y))
o This reads as “For all x, if x is a human, then there exists some y such that x was born on y.”
o This statement means that for every human, we can find a specific day of birth, illustrating a relationship that involves both universal and existential quantifiers.
Quantifiers enable us to express general statements about objects and their properties in a
precise, logical way, making FOL a powerful tool in fields like mathematics, computer
science, and artificial intelligence.
12a. What is a Bayesian network? Explain the method for constructing Bayesian networks.
1. Identify Variables
D: Disease
S: Symptoms
T: Test results
The graph could show D influencing S and T, indicating that symptoms and test results depend on the presence of a disease.
2. Determine Dependencies
3. Create a Directed Acyclic Graph (DAG)
Make sure the graph is acyclic, i.e., there are no loops. Modify the structure if needed to eliminate cycles.
4. Quantify Relationships
5. Validate the Model
Verify that the structure and probabilities are consistent with the problem domain. Ensure the network can generate the desired joint probability distribution.
6. Perform Inference
Use the Bayesian Network for reasoning tasks, such as calculating marginal
probabilities, updating beliefs, or making predictions.
Example: Given observed symptoms (S), infer the likelihood of the disease (D).
You want to build a Bayesian Network for determining if a student will pass an exam based
on whether they study and their intelligence level.
Steps:
1. Variables:
o S: Study (Yes/No)
o I: Intelligence (High/Low)
o P: Pass Exam (Yes/No)
2. Dependencies:
o Intelligence (I) affects whether the student passes (P).
o Studying (S) also affects whether the student passes (P).
o I and S are independent of each other.
3. Graph:
o I → P
o S → P
4. CPDs:
o P(I) = [P(High Intelligence) = 0.7, P(Low Intelligence) = 0.3]
o P(S) = [P(Study) = 0.6, P(Not Study) = 0.4]
o P(P | I, S): Conditional probabilities for passing given intelligence and study status.
5. Joint Probability:
o Using the chain rule: P(I, S, P) = P(I) · P(S) · P(P | I, S)
6. Inference:
o Use the network to calculate the probability of passing given specific conditions, like P(P | S = Yes).
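A minimal Python sketch of this inference step (the P(Pass | I, S) entries are assumed values, since the original does not list them):
# CPTs from the example above; the P(Pass | I, S) table is assumed for illustration.
P_I = {"High": 0.7, "Low": 0.3}
P_S = {"Yes": 0.6, "No": 0.4}
P_pass = {("High", "Yes"): 0.9, ("High", "No"): 0.6,
          ("Low", "Yes"): 0.7, ("Low", "No"): 0.2}

def prob_pass_given_study(study):
    # Since I and S are independent, P(P=Yes | S) = sum over I of P(I) * P(P=Yes | I, S)
    return sum(P_I[i] * P_pass[(i, study)] for i in P_I)

print(round(prob_pass_given_study("Yes"), 2))   # 0.7*0.9 + 0.3*0.7 = 0.84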
A causal network, also known as a causal graph or causal Bayesian network, is a type of
graphical model used to represent and reason about causal relationships between variables
in a system. It combines the principles of causality with the structure of a Bayesian
network, making it a powerful tool for understanding, explaining, and predicting the effects
of interventions or changes in a system.
For example, if A causes B, there will be a directed edge from A to B.
1. Causality Representation:
o The direction of an edge indicates the cause-and-effect relationship.
o For example, A → B implies that changes in A cause changes in B.
2. Markov Property:
o A variable in the network is conditionally independent of its non-descendants
given its parents. This simplifies reasoning about relationships.
3. Joint Probability Factorization:
o Similar to Bayesian networks, the joint probability of all variables can be expressed as: P(X1, X2, …, Xn) = ∏i P(Xi | Parents(Xi))
4. Intervention Analysis:
o Causal networks allow for interventions to test "what if" scenarios. For
example, setting a variable A to a specific value and observing its effect on
other variables.
1. Nodes:
o Represent variables (discrete or continuous) in the system.
o Example: Smoking, Lung Cancer, Coughing.
2. Directed Edges:
o Indicate causal influence. For instance, Smoking → Lung Cancer means smoking directly affects lung cancer.
3. Conditional Probability Distributions (CPDs):
o Quantify the strength of causal relationships. Each node is associated with a
CPD that specifies the probability of that node's values given its parents.
Example of a Causal Network
Scenario:
Causal Relationships:
Graph Representation:
A     S
 \   /
  \ /
   L
   |
   C
Joint Probability:
P(A, S, L, C) = P(A) · P(S) · P(L | A, S) · P(C | L)
1. Identify Variables:
o Define all relevant variables in the system.
o Example: In a medical scenario, these could include symptoms, diseases, and
environmental factors.
2. Determine Causal Relationships:
o Use domain knowledge, data, or experiments to establish direct causal links
between variables.
o Example: Smoking causes lung cancer.
3. Create a Directed Acyclic Graph (DAG):
o Represent the variables as nodes and the causal relationships as directed
edges.
o Ensure there are no cycles.
4. Quantify Relationships:
o Assign a conditional probability distribution (CPD) to each node based on its
parents.
o Use historical data, expert knowledge, or statistical methods to estimate
probabilities.
5. Validate the Model:
o Ensure the network aligns with known causal relationships.
o Perform sensitivity analysis or compare predictions against observed data.
Applications of Causal Networks
1. Medical Diagnosis:
o Understanding causal relationships between symptoms, diseases, and
treatments.
o Example: Predicting how a treatment affects recovery.
2. Policy Decision-Making:
o Evaluating the effects of interventions, such as a new law or policy.
o Example: Studying the impact of reducing air pollution on public health.
3. Machine Learning and AI:
o Improving interpretability and robustness in decision-making systems.
o Example: Building explainable AI models.
4. Economics and Social Sciences:
o Analyzing the impact of variables like education, income, and policy changes
on economic outcomes.
A researcher wants to model how education affects earnings, mediated by skills, and how
work experience also contributes.
Variables:
E: Education
S: Skills
W: Work experience
Y: Earnings
Causal Relationships:
Graph Representation:
E
|
S     W
 \   /
  \ /
   Y
Joint Probability:
P(E, W, S, Y) = P(E) · P(W) · P(S | E) · P(Y | S, W)
This causal network helps in understanding how changes in education or work experience
influence earnings.
1. Intuitive Representation:
o Provides a clear graphical view of causal relationships.
2. Intervention Analysis:
o Enables testing "what if" scenarios by simulating interventions.
3. Efficient Inference:
o Allows for reasoning under uncertainty and predicting outcomes.
4. Insight into Dependencies:
o Helps identify independent and dependent variables for better decision-
making.
Causal networks are essential for reasoning in complex systems, providing a structured
way to understand cause-and-effect relationships and enabling data-driven decision-
making.
Types of Inference
1. Marginal Inference:
o Calculates the marginal probability of a variable.
o Example: P(A), the probability of A, without any evidence.
2. Conditional Inference:
o Computes the probability of a variable given evidence about other
variables.
o Example: P(A | B = b), the probability of A given B = b.
3. Most Probable Explanation (MPE):
o Finds the most likely values of all variables given evidence.
o Example: Given observed symptoms, find the most likely disease.
4. Maximum a Posteriori (MAP):
o Identifies the most likely values for a subset of variables given evidence.
1. Exact Inference
a. Inference by Enumeration
b. Variable Elimination
2. Approximate Inference
For large or complex networks, exact methods become intractable, and approximate
methods are used.
Variational Inference: Approximates the true distribution with a simpler one and minimizes the difference (e.g., using Kullback-Leibler divergence).
Example of Inference
Bayesian Network:
Variables:
o C: Cloudy
o R: Rain
o S: Sprinkler
o W: Wet Grass
Structure:
o C → R, C → S, R → W, S → W
Query:
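The query itself is not spelled out here; a typical one for this network is P(Rain | WetGrass = true). A small inference-by-enumeration sketch in Python (the CPT values are assumed for illustration):
import itertools

P_C = {True: 0.5, False: 0.5}                                            # P(C)
P_R = {True: {True: 0.8, False: 0.2}, False: {True: 0.2, False: 0.8}}    # P_R[c][r] = P(R=r | C=c)
P_S = {True: {True: 0.1, False: 0.9}, False: {True: 0.5, False: 0.5}}    # P_S[c][s] = P(S=s | C=c)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}                         # P(W=true | R=r, S=s)

def joint(c, r, s, w):
    pw = P_W[(r, s)] if w else 1 - P_W[(r, s)]
    return P_C[c] * P_R[c][r] * P_S[c][s] * pw

num = sum(joint(c, True, s, True) for c in (True, False) for s in (True, False))
den = sum(joint(c, r, s, True) for c, r, s in itertools.product((True, False), repeat=3))
print(round(num / den, 3))        # P(Rain = true | WetGrass = true)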
Applications of Inference
1. Medical Diagnosis:
o Compute the probability of a disease given symptoms.
2. Fault Detection:
o Identify the most likely cause of a system failure based on observed
anomalies.
3. Decision Support Systems:
o Make predictions or recommendations under uncertainty.
4. Natural Language Processing:
o Resolve ambiguities in text or speech.
Challenges of Inference
1. Complexity:
o Exact inference is NP-hard for general Bayesian networks.
2. Scalability:
o Large networks with many variables require approximate methods.
3. Data Sparsity:
o Incomplete data can make accurate inference challenging.
1. Probability Theory
Probability theory is one of the most widely used frameworks for representing uncertainty.
It assigns probabilities to events or propositions, reflecting the likelihood of their
occurrence.
Bayesian Networks: These are directed acyclic graphs where nodes represent
random variables and edges represent probabilistic dependencies. Bayesian
networks allow for inference, diagnostics, and prediction in uncertain environments.
Markov Networks: These are undirected graphical models that represent joint
probabilities without directed edges, useful for domains with symmetrical
relationships.
Hidden Markov Models (HMMs): Used for time-series data, HMMs represent
systems that evolve over time with states that are not directly observable but can be
inferred through observable data.
2. Fuzzy Logic
Fuzzy logic extends traditional Boolean logic to handle partial truth values between 0 and
1. It is particularly useful for reasoning in domains where information is vague or
imprecise.
Example: In control systems, such as air conditioning, fuzzy logic can interpret
ambiguous terms like "slightly warm" or "very hot" and make decisions accordingly.
3. Dempster-Shafer Theory
Belief Functions: Belief functions allow for expressing degrees of belief for propositions based on evidence.
Dempster’s Rule of Combination: Used to combine evidence from multiple sources
and update beliefs.
4. Certainty Factors
Certainty factors (CFs) are used to represent the strength of belief or disbelief in a
hypothesis given evidence. They are common in expert systems and provide a measure
between -1 (complete disbelief) and +1 (complete belief).
CF Rule-Based Systems: Certainty factors are often used with rule-based systems,
where each rule is associated with a certainty factor.
5. Rule-Based Systems with Heuristics
Example: A troubleshooting system for machinery may use rules like “If the engine won’t start, check the fuel level” without a strict probabilistic model, relying instead on empirical rules and heuristics.
6. Qualitative Reasoning
Example: A weather forecasting system may reason that an increase in cloud cover
often leads to cooler temperatures without specifying exact measurements.
7. Non-Monotonic Logic
In domains where knowledge is incomplete or can change over time, non-monotonic logic
allows systems to retract conclusions when new information contradicts them.
Example: A legal expert system may assume a suspect is innocent (by default) until
evidence suggests otherwise, updating its conclusions with new data.
8. Machine Learning Models
Machine learning models, especially neural networks, can model uncertainty by learning
patterns from large datasets. They are capable of capturing complex relationships that may
not be easily expressed through probabilistic or rule-based models.
Bayesian Networks, Markov Models, and Probability Theory are best for domains
with known probabilistic relationships.
Fuzzy Logic works well with vague, imprecise information.
Dempster-Shafer Theory and Certainty Factors allow belief representation in
cases of partial or conflicting evidence.
Rule-Based Systems with Heuristics are useful for empirical and expert-driven
reasoning.
Qualitative Reasoning provides relational knowledge when exact data is sparse.
Non-Monotonic Logic is ideal for evolving knowledge bases.
Machine Learning Models capture complex patterns and can quantify uncertainty
in predictions.
Choosing the right method depends on the nature of the domain, the type of uncertainty
involved, and the reasoning tasks required. Often, combining several approaches is
necessary to address different aspects of uncertainty effectively.
14a. Discuss the Knowledge Engineering Process with proper illustration. Depict the
concept of forward chaining.
1. Problem Definition: Define the system's goal as diagnosing plant diseases based on
symptoms.
2. Knowledge Acquisition: Interview agricultural experts to gather knowledge on
various plant diseases and their symptoms.
3. Knowledge Representation: Use a rule-based representation, where each disease is
associated with a set of symptoms.
4. Knowledge Encoding: Encode these rules in a system such as Prolog or a decision
tree model.
5. Knowledge Validation: Test the system with known cases to ensure accurate
diagnosis.
6. Knowledge Maintenance: Update the system with new diseases or symptoms based
on expert input or user feedback.
Forward Chaining
Consider an expert system for plant disease diagnosis. Assume the following rules:
Rule 1: If leaves are yellow and wilting, then consider nutrient deficiency.
Rule 2: If nutrient deficiency and soil is dry, then consider irrigation need.
Rule 3: If leaves have spots and temperature is high, then consider fungal infection.
1. Match Rule 1: Since "leaves are yellow and wilting," apply Rule 1 to conclude
"nutrient deficiency."
2. New Fact Added: The fact "nutrient deficiency" is added to the knowledge base.
3. Match Rule 2: If the system knows "soil is dry," Rule 2 would conclude "irrigation
need."
4. Goal Reached: Forward chaining continues until no more rules apply or the system
reaches a target conclusion.
Illustration of Forward Chaining in Flowchart
Start
  |
  v
Known Facts
  |
  v
Rule Matching ----(no rules apply)----> End
  |
  (apply next applicable rule)
  |
  v
Update Knowledge Base
  |
  v
New Facts / Conclusions (loop back to Rule Matching)
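A compact Python sketch of the forward-chaining loop using the three plant-disease rules above (the rule encoding is an assumption made here):
def forward_chaining(facts, rules):
    # rules: list of (set_of_conditions, conclusion) pairs
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)          # fire the rule and record the new fact
                changed = True
    return facts

rules = [({"leaves yellow", "wilting"}, "nutrient deficiency"),
         ({"nutrient deficiency", "soil dry"}, "irrigation need"),
         ({"leaf spots", "high temperature"}, "fungal infection")]
print(forward_chaining({"leaves yellow", "wilting", "soil dry"}, rules))
# -> includes 'nutrient deficiency' and 'irrigation need'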
Applications of Forward Chaining
1. Medical Diagnosis: Forward chaining can help determine diseases by starting with
symptoms and applying diagnostic rules to narrow down the possible conditions.
2. Troubleshooting Systems: For example, in an automotive diagnosis system,
starting from observed problems, forward chaining rules can help find probable
faults.
3. Agricultural Expert Systems: Diagnose plant issues by observing symptoms and
applying rules to reach a diagnosis.
The completeness of resolution refers to the ability of the resolution method to prove any
logical entailment in propositional or first-order logic. Specifically, if a set of clauses is
unsatisfiable (i.e., there is no interpretation that makes all the clauses true), the resolution method will eventually derive a contradiction (the empty clause, denoted as ⊥).
Resolution Method
1. Soundness:
o The resolution method is sound, meaning that any clause derived by
resolution is logically implied by the original set of clauses.
o This ensures that the resolution method does not produce incorrect results.
2. Refutational Completeness:
o The resolution method is refutationally complete. This means:
If the set of clauses is unsatisfiable, the resolution process will eventually derive the empty clause (⊥).
If the set of clauses is satisfiable, no contradiction will be derived.
Proof of Completeness
In propositional logic, every formula can be reduced to a finite set of clauses in CNF.
The resolution rule ensures that if two clauses contain complementary literals, a new
clause can be derived, progressively simplifying the problem.
Given the compactness theorem of propositional logic, if a set of clauses is unsatisfiable, the resolution method will eventually derive the empty clause (⊥).
Resolution Steps:
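A small illustrative derivation (this example is constructed here, not taken from the notes): to show that the clause set {P ∨ Q, ¬P ∨ R, ¬Q, ¬R} is unsatisfiable,
1. Resolve P ∨ Q with ¬Q to obtain P.
2. Resolve P with ¬P ∨ R to obtain R.
3. Resolve R with ¬R to obtain the empty clause ⊥.
Deriving ⊥ demonstrates unsatisfiability, which is exactly what refutational completeness guarantees is always possible for an unsatisfiable clause set.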
In first-order logic, the completeness proof is more complex because of the need to account
for infinite domains. The resolution method extends to first-order logic by:
Key Results:
1. Herbrand’s Theorem: Any unsatisfiable set of first-order clauses has a finite subset
of ground instances that is also unsatisfiable.
2. Resolution Completeness: If a set of first-order clauses is unsatisfiable, the
resolution method will derive the empty clause.
1. Clause Normalization:
o Convert all formulas into CNF.
o Ensure no syntactic errors in transformation (e.g., avoid losing information
during Skolemization).
2. Fairness of Selection:
o During resolution, ensure all possible pairs of clauses are considered. This
guarantees that no valid derivation is missed.
3. Termination:
o For finite domains, resolution terminates when either the empty clause is
derived (proving unsatisfiability) or no more resolutions are possible
(indicating satisfiability).
o For infinite domains, heuristics like breadth-first search ensure progress
toward refutation.
Limitations of Completeness
Computational Complexity:
o While complete, the resolution method may be computationally expensive,
especially for large or infinite domains.
o Finding a resolution refutation can involve exploring an exponential number
of possible clause pairs.
Decidability:
o While propositional logic is decidable, first-order logic is only semi-decidable.
If a set of first-order clauses is satisfiable, the resolution method may run
indefinitely without finding a refutation.
Components of a CSP
1. Variables:
o The set of variables to be assigned values.
o Denoted as X = {X1, X2, …, Xn}.
2. Domains:
o Each variable has a domain of possible values.
o Denoted as D = {D1, D2, …, Dn}, where Di is the domain of Xi.
3. Constraints:
o Rules that restrict the values variables can simultaneously take.
o Constraints can involve one or more variables and are expressed as relations,
such as:
Unary constraint: Constraints on a single variable (e.g., X1 ≠ 2).
Binary constraint: Constraints between two variables (e.g., X1 ≠ X2).
Global constraint: Constraints involving multiple variables (e.g., all variables must have distinct values).
Example: Map Coloring Problem
Representation:
1. Variables:
o X = {A, B, C, D, E}, where A, B, C, D, E are regions of the map.
2. Domains:
o DA = DB = DC = DD = DE = {R, G, B}.
3. Constraints:
o A ≠ B, A ≠ C, B ≠ D, C ≠ D, C ≠ E, D ≠ E.
A -- B
| |
C -- D -- E
Solution Techniques
1. Backtracking Search:
o Assign values to variables one at a time and backtrack when a constraint is
violated.
o Example:
Assign A = R.
Assign B = G (satisfies A ≠ B).
Assign C = B (satisfies A ≠ C).
Continue until all constraints are satisfied.
2. Constraint Propagation:
o Use techniques like arc-consistency (AC-3) to reduce the domain of variables
by eliminating values that violate constraints.
3. Heuristics:
o Use heuristics to optimize the search process:
Most Constrained Variable (MCV): Assign values to the variable with
the fewest legal values first.
Least Constraining Value (LCV): Choose a value that least restricts
the domains of other variables.
Example Solution
Solution: A = R, B = G, C = B, D = R, E = G
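A small backtracking sketch for this map-coloring CSP in Python (illustrative only; it may return a different valid coloring than the one listed above):
def backtracking(assignment, variables, domains, constraints):
    # constraints: list of (v1, v2) pairs whose values must differ
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        consistent = all(assignment.get(b if a == var else a) != value
                         for a, b in constraints if var in (a, b))
        if consistent:
            assignment[var] = value
            result = backtracking(assignment, variables, domains, constraints)
            if result:
                return result
            del assignment[var]        # undo the assignment and backtrack
    return None

variables = ["A", "B", "C", "D", "E"]
domains = {v: ["R", "G", "B"] for v in variables}
constraints = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E"), ("D", "E")]
print(backtracking({}, variables, domains, constraints))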
Applications of CSPs
1. Scheduling:
o Allocating time slots for tasks or resources while meeting constraints.
o Example: Exam timetabling, where no two exams in the same room occur
simultaneously.
2. Planning:
o Determining actions to achieve a goal under constraints.
o Example: Planning a delivery route with time and resource limitations.
3. Puzzles and Games:
o Solving problems like Sudoku, crossword puzzles, or n-queens.
4. Resource Allocation:
o Assigning limited resources to tasks under constraints.
o Example: Allocating workers to shifts while meeting availability and skill
constraints.
Local search is particularly effective for large CSPs where systematic search methods are
infeasible due to time or memory constraints.
1. Initialization:
o Start with a random or heuristic-based complete assignment of values to
variables.
2. Evaluation:
o Calculate the objective function (number of violated constraints).
3. Neighbor Selection:
o Select a neighboring solution by modifying the value of one or more variables.
o Choose the modification that reduces the number of violated constraints.
4. Iteration:
o Replace the current solution with the selected neighbor.
o Repeat until the solution satisfies all constraints or a stopping condition is met.
5. Escape Strategies (if stuck):
o Random Restart: Restart the search from a new random solution.
o Simulated Annealing: Accept worse solutions with some probability to escape
local optima.
o Tabu Search: Keep track of recently visited solutions to avoid cycles.
We need to color a map such that no two adjacent regions share the same color. Variables
represent regions, and domains are the colors available (Red, Green, Blue).
Steps:
1. Start with a random complete assignment: A = R, B = R, C = G, D = G, E = B (violates A ≠ B and C ≠ D).
2. Change a conflicted variable to reduce violations: A = R, B = G, C = G, D = G, E = B.
3. Continue until no constraints are violated: A = R, B = G, C = B, D = R, E = G.
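A minimal min-conflicts style local search for the same map-coloring CSP in Python (a sketch constructed here; the step limit and tie-breaking are arbitrary choices):
import random

def min_conflicts(variables, domains, constraints, max_steps=10000):
    # constraints: list of (v1, v2) pairs required to take different values
    assign = {v: random.choice(domains[v]) for v in variables}   # random complete assignment

    def conflicts(var, value):
        return sum(1 for a, b in constraints if var in (a, b)
                   and assign[b if a == var else a] == value)

    for _ in range(max_steps):
        violated = [v for a, b in constraints if assign[a] == assign[b] for v in (a, b)]
        if not violated:
            return assign
        var = random.choice(violated)                                        # pick a conflicted variable
        assign[var] = min(domains[var], key=lambda val: conflicts(var, val)) # min-conflicts value
    return None

variables = ["A", "B", "C", "D", "E"]
domains = {v: ["R", "G", "B"] for v in variables}
constraints = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E"), ("D", "E")]
print(min_conflicts(variables, domains, constraints))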
1. Efficiency:
o Scales well to large CSPs since it avoids exhaustive exploration of the search
space.
2. Simplicity:
o Easy to implement and can be applied to a wide range of problems.
3. Memory Usage:
o Requires less memory compared to systematic methods like backtracking.
4. Heuristics:
o Can incorporate domain-specific heuristics for faster convergence.
1. Local Optima:
o May get stuck in a local optimum without reaching a global solution.
2. Incomplete:
o Does not guarantee finding a solution even if one exists.
3. Parameter Sensitivity:
o Performance depends on parameters like the number of restarts or cooling
schedules in simulated annealing.
1. Timetabling:
o Assigning timeslots to exams or classes while satisfying constraints.
2. Sudoku and Puzzles:
o Solving combinatorial puzzles with constraints.
3. Job Scheduling:
o Allocating tasks to workers or machines while meeting deadlines and
dependencies.
4. Resource Allocation:
o Assigning resources to tasks in constrained environments.
Local search for CSPs is an optimization technique that focuses on iteratively improving a
complete assignment of values to variables by reducing the number of constraint violations.
While efficient and memory-friendly, it may struggle with local optima and incompleteness,
making escape strategies essential for robust performance. It is particularly effective for
large-scale or complex CSPs where systematic methods become computationally
prohibitive.
VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN
ASSIGNMENT-1
ANSWER KEY
4. AO* Algorithm
The AO* algorithm is used for solving problems that can be represented as an AND-
OR graph. It works by searching the graph in a best-first manner and is suited for
problems that involve decision-making where the outcome is dependent on multiple sub-
goals (AND) or where one of several sub-goals needs to be achieved (OR).
Steps in the AO* Algorithm:
1. Start at the root node.
2. Select the best node based on the heuristic function.
3. If the node is an OR node, expand it by selecting one of its children.
4. If the node is an AND node, expand all its children.
5. Continue expanding until a solution is found or no further nodes can be
expanded.
Example: AO* is used in game-playing and decision-tree problems, where
different strategies are explored, and combinations of outcomes (ANDs) or
alternatives (ORs) are considered.
5. Monte Carlo Search
Monte Carlo search is a technique used to make decisions in uncertain
environments, particularly in game-playing AI. It involves running many random
simulations from the current position to estimate the potential outcomes of different
actions.
Monte Carlo Tree Search (MCTS): This is a popular variant, where the search is
structured as a tree, and each node represents a game state. Simulations are run
by playing random moves, and the tree is updated based on the results. Over
time, better actions are explored more frequently.
Example: MCTS is widely used in AI systems for board games like Go and Chess,
where it helps the AI select moves that have the highest likelihood of success
based on multiple simulated outcomes.
VIVEKANANDHA COLLEGE OF TECHNOLOGY FOR WOMEN
ASSIGNMENT-2
ANSWER KEY